This is a cross-post to an article I’ve written about refactoring on Martin Fowler’s site – “This class is too large – An example of refactoring from a real (flawed) code base.“
Category: Coding skills
Some notes I made on the Liskov Substitution Principle (LSP) – which is the L in SOLID.
I can never remember the details of Liskov, which is why I made these notes.
These notes are also visible on my new personal wiki site, where I briefly describe the S, O, I and D in SOLID as well.
- “objects in a program should be replaceable with instances of their subtypes without altering the correctness of that program.” See also design by contract.
- Code that uses a base class must be able to substitute a subclass without knowing it
- The specific functionality of the subclass may be different but must conform to the expected behaviour of the base class.
- So, I can call FemaleMammal.GiveBirth and I can substitute a cow into that – Cow.GiveBirth without breaking anything
- No matter which version I choose, I should be confident that I will end up with a baby animal. Even though the specifics might be slightly different.
- These are the principles which must be adhered to:
- Contravariance with parameters
- There should be contravariance between parameters of the base class’s methods and the matching parameters in subclasses. This means that the parameters in subclasses must either be the same types as those in the base class or must be less restrictive.
- Parameters in subclasses are either the same or have more / add extra functionality
- FemaleMammal.GiveBirth(Mammal mothersBestFriend)
- Cow.GiveBirth(Animal mothersBestFriend)
- See Covariance and Contravariance
- Covariance with return values
- There must be covariance between method return values in the base class and its subclasses. This specifies that the subclass’ return types must be the same as, or more restrictive than, the base class’ return types.
- Return values in subclasses are either the same or have less functionality
- Mammal baby = FemaleMammal.GiveBirth();
- Bovine baby = Cow.GiveBirth();
- See Covariance and Contravariance
- The preconditions of a base class must not be strengthened by a subclass
- A precondition of a class is a rule that must be in place before an action can be taken.
- FemaleMammal must not be a virgin and must never have had a Caesarian before giving birth
- Cow must not be a virgin before giving birth
- Postconditions cannot be weakened in subclasses.
- Postconditions describe the state of objects after a process is completed.
- After female mammal gives birth, it must have an associated baby that is a mammal
- After cow gives birth, it must have an associated baby that is a bovine (this is fine)
- After cow gives birth, it must have an associated baby that is any kind of animal (this would not be fine)
- The invariants of a base class must not be changed by a subclass.
- An invariant describes a condition of a process that is true before the process begins and remains true afterwards.
- FemaleMammal is female both before and after birth
- Cow is female both before and after birth
- New or modified members should not modify the state of an object in a manner that would not be permitted by the base class
- FemaleMammal gains a newly populated Baby property and the baby has a head
- Cow gains a newly populated Baby property (this is fine)
- Cow gains a newly populated CowUdders property (this is not fine)
- A subclass should not throw exceptions that are not thrown by the base class unless they are subtypes of exceptions that may be thrown by the base class
- FemaleMammal.GiveBirth throws a BreachedBaby exception
- Cow.GiveBirth throws a BreachedBabyCow exception (this is fine)
- Cow.GiveBirth throws a NotEnoughGrass exception (this is not fine)
- Contravariance with parameters
- For instance, if you want a read-only file type, it should not inherit from File – and File should not have a Save method. Rather, the base File type has a Load method, and it has a subclass – WriteableFile – which has a Save method. Thus the base class cannot be saved. (otherwise you would have ReadOnlyFile which was not able to implement the Save method on its parent, and also left the object in a different state after calling Save than what you would expect – particularly if its Save method threw an exception which was not thrown by the base class)
We talked about a lot of things related to teaching and learning: different learning models, standards for education in software and how to create a culture that allows people to not know things.
During my talk at NDC London this week, I promised to publish a list of resources you can use if you are trying to persuade people of the efficacy of paired programming as a software development technique. Here it is!
(incidentally, the image is of Sal Freudenberg and me doing some remote pairing – we use Zoom and we find it works really well – I often forget we are not in the same room).
- Video on pairing from some ThoughtWorks colleagues
- Pair Programming Workshop – LeanPub book by Yves Hanoulle (partially edited by me)
- Pair Programming – The Most Extreme XP Practice? by Dave Farley
- Nawrocki/Wojciechowski, 2001 – “Experimental Evaluation of Pair Programming”
- On the Economic Feasibility of Pair Programming, Hakan Erdogmus & Laurie Williams
- Cockburn/Williams, 2001 – “The costs and benefits of pair programming”
- Good general reading – “So how good is pair programming, really?” by Rickard Dahlstrom
- Shorter reading, good for mgmt — “Pair Programming Benefits – Two Heads Are Better than One” by Jeff Langr, Tim Ottinger
- Old but still solid research — Strengthening the Case for Pair Programming” by Williams, Kessler, Cunningham, Jeffries
- Mini-book – “Pair Programming” by Naresh Jain
- Marlena Compton, Sustainable Programming
- ‘The Case for Collaborative Programming’ Nosek 1998
- ‘Pair Programming is about Business Continuity‘ Dave Hounslow
- letspair: A set of zines about pair programming, an artsy print of a framework to help you get going & be a plushy pear to be your pairing patronus
- Original discussion on PairProgramming in C2
- You will never work alone again, Naresh Jain
- Promiscuous Pairing and Beginners Mind, Mitch Lacey
- All I Really Need to Know about Pair Programming I Learned In Kindergarten, Laurie A Williams
- Pomodoro Technique, Francesco Cirillo
- The Core Protocols, Jim & Michele McCarthy
- The Costs and Benefits of Pair Programming⁷, Laurie Williams & Alistair Cockburn
- Pair Programming and expertise, Sal Freudenberg
- Pair Programming uptake and Peripheral Awareness, Sal Freudenberg
- Pair programming: pluses and minuses, bonuses and anti-patterns, Nadya Klim
- How to Piss off your pair
- Pair programming Anti Patterns, Naresh Jain, JB Rainsberger
- Pair Staircase
- Pair programming illuminated, Williams, Kessler
- Rethinking Pair-Programming, Sandro Mancuso
- Does PairProgramming work, Dave Nicolette
- Pair programming in teams with diverse skill and experience levels
- Roles in Pair-Programming Across Skill: Leader/Adopter Model, Irena 2017
- Root causes of friction in pairs: Expert Induced Amnesia, Irena 2017
- The needs aspects of Pair Programming, Irena 2017
- Remote Pair Programming, Todd Sedano
- Considerate Pair Programming – Part 1
- Considerate Pair Programming – Part 2
- Considerate Pair Programming – Part 3
- How to pair effectively in 6 steps
- Llewellyn’s strong-style pairing
- Pair Programming: Taking turns with your pair
- The Three Pairs of Pair Programming
- The power dynamics in Pair Programming – Twitter thread
- Power dynamics in pairs and mobs
- “Our First Experiment with Promiscuous Pairing Is a Success”
I got pinged by my colleague @j_f_green this morning with the following message: “Hey Clare. Hope all is well. Was wondering if I could be cheeky and get your input on a (small) thing I’m trying to solve? I am searching for an elegant solution and I’m starting to suspect the answer may involve maths.”
…and of course I was instantly interested.
It turned out to be a relatively simple problem, depending on your point of view. If I say “GCSE-level simultaneous equations” your reaction will tell you whether that’s simple for you or not (for people not in theUK, GCSEs are the exams our pupils take when they are 16 years old).
Here is the problem: We want to discover two whole numbers (aka positive integers) that will give us the height and width of a grid in cells, given the following constraints:
- We have a number, let’s call it capacity, which represents the minimum number of cells to fill the grid. We want at least that many cells. We’re happy if we have more, but not if we have fewer.
- We also have another number, we’ll call it aspectRatio, which is the aspect ratio of the grid. This is the relationship between height and width. So for instance, if the aspect ratio is 2:1, that means the width is twice the height (it’s twice as wide as it is high). To reduce that to a single number, aspectRatio is 2.
width = 2 * height = aspectRatio * height
(aspectRatio will often be a decimal or a fraction, so for instance if width is half height, aspectRatio is 0.5, or if the ratio is 2:3 then aspectRatio = 2/3 = 0.66, and width = 0.66 * height)
A concrete example
Here is a concrete example: Let’s say we have a grid that is twice as wide as it is high, so aspectRatio = 2. Let’s also say that our capacity is 200: We want to fit at least 200 cells into this space. I’ve deliberately chosen an easy example, and the answer in this case would be a width of 20 and a height of 10:
Apologies if that’s making your eyes hurt. Is it making your eyes hurt? It’s making my eyes hurt. I’ll try and make a better diagram later. It’s not even got the correct aspect ratio on the page…
If you enjoy attacking maths problems, look away now. Have a go at solving it before you read on. You may come up with a better solution anyway.
So how do we compute the answer?
One way would be a loop. Here is some code that would do the job:
var capacity = 1000;
var aspectRatio = 1000 / 800 = 1.25;
int width = 0;
int height = 0;
while (width * height < capacity)
height += 1;
width = RoundToNearestInteger(height * aspectRatio);
Here we have a grid that has at least 1000 cells:
36 * 29 = 1044
It has roughly the correct aspect ratio:
36 / 29 = 1.24
1000 / 800 = 1.25
The problem with this is that it does not scale well. The higher the numbers, the longer the loop. But fear not – maths to the rescue.
A simple use case
x * y = 200andx = 2 * y = 2y
2y² = 200y² = 100y = 10x = 20 (because it’s 2y)
A generalised solution
The first block above gives us our two simultaneous equations.
x * y = aandx = b * y
by² = a
y² = a/bandy = sqrt(a/b)
x = b * y
y >= sqrt(a/b)
height >= sqrt(capacity / aspectRatio)width = aspectRatio * height
height >= 28.28 [= sqrt(1000 / 1.25)]width = 36 [= RoundUp(1.25 * height)]So height = 29, width = 36, aspectRatio = 36 / 29 = 1.24, num cells = 29 * 36 = 1044
var heightUnrounded = SquareRoot(capacity / aspectRatio);var height = RoundUp(heightUnrounded);var width = RoundUp(heightUnrounded * aspectRatio);
These may well be the notiest notes I’ve ever published, but just in case they’re of any use to anyone… if nothing else they may whet your appetite for the new edition of Martin Fowler’s Refactoring book.
I confess I never read it first time round (not on purpose, just there are so many books in the world and so little time…), so I’m looking forward to reading it this time. It hasn’t come out yet in the UK but should be some time in the next few weeks. Mine is already on order [drums fingers impatiently].
So anyway, Martin did a webinar today on the topic of his new book, and here are my notes:
Refactoring should be done via small changes
- v small semantics-preserving changes
- So small they’re not worth doing on their own
- String together small changes to make a big change
When adding new functionality:
- Alternate between refactoring and adding functionality
- Often it’s easier to add new functionality if you refactor first
- Balance between adding new functionality and refactoring – it’s a matter of judgement
V1 of Martin’s Refactoring book vs v2
- Some things that were included in the first book he decided not to include I the second
- Eg Unidirectional vs bidirectional – not important
- Some things he thought were too trivial for the first book, he has now changed his mind about and included
- Eg moving statements within a function
- Most notable changes = no longer centrally object-oriented
- “Extract function” used throughout instead of extract method
- One of the refactorings is around the choice between OO and non-OO
- But language is not necessarily relevant anyway
- OO vs functional
- He doesn’t see that as a huge shift
- OO should still have functional elements – referentially transparent
- They are overlapping paradigms, not distinct
- “Split phase”
- Hunk of computation which can sensibly be divided into two phases with a data structure communicating between them
- Eg parsing – separate the tokenising out – deal with a series of tokens instead of a stream of text
- But if you split a loop, what if you are introducing performance problems? See below…
Performance and refactoring
- Split loop is something people often worry about because it will run the loop through twice
- Most of the time it doesn’t matter – a push for clarity will not change the performance of the code
- Most of the time by refactoring you open up an opportunity for performance improvements you would never have noticed otherwise
- Then again you should run performance tests frequently
- Not necessarily with every build, but every day or two
- But most micro-changes have no impact on performance
- You will only find real performance impact by performance testing
Recommendations for other books?
- Scott Ambler and Pramod Sadalage, Refactoring Databases
- Automated migrations to allow us to modify a database as we go
- Working effectively with legacy Code, Michael Feathers
- Always have to ask, how do you make the smallest possible change?
- Eg extracting a data-intensive microservice from a larger system
- Most of the moves are happening within the monolith BEFORE you start firing up the external system
- Then you move stuff in really small pieces
- Things that change for similar reasons should be together – you need to see the relationship between them
Branching / trunk-based development
- Develop on a branch and merge at the end, with branch being long-lived – days, weeks or months
- Still a really common pattern
- But if something has been refactored, you don’t find out until you merge – this can blow up on you
- It means you become scared to refactor! There’s a disincentive.
- Continuous integration: Integrate with each other fully at regular intervals – at least once a day
- Every day you receive everyone else’s code changes and push your own changes back
- Removes the big painful merge problem – not saving up for a big horrible crunch
- Adage of Apollo – it hurts if you do it more often (?)
- Sorry, I wrote this down but I’ve since Googled it and I don’t really know what it’s about – I may have misheard!
- Open source development:
- Actually this CAN be a good place to use branches, because everyone working separately and you can’t guarantee all changes should be merged
- People call it trunk-based development but “continuous integration” is a better name because that’s what it’s really about
- People worry about continuous integration but it’s less painful than they think
- If you do integration every day it becomes easier and less painful until you stop even noticing
- Keep code base in healthy state
- If not you are stealing from colleagues, clients and employers!
- But no point saying you should do it for moral reasons, that won’t motivate people to do it
- It’s economic: Want to get more changes out more quickly
- It’s professional: We’re being paid to produce features rapidly
- It’s not up to managers or business: They don’t know how to keep code healthy – that’s what they pay us to do
Refactoring and design patterns – are they incompatible?
- With a lot of patterns, you shouldn’t introduce them right away cos they’re not appropriate and you don’t require the full strength of them yet, particularly the more complicated ones
- Introduce patterns as a result of refactoring: Refactor towards them
- This was recognised by the original Design Patterns authors (Gamma, Helm, Johnson, Vlissides)
- Also Josh Kerievsky: Refactoring to Patterns
- Code smell of the week!
- “This week we will look at long methods” – whenever we find them we try to refactor them
When not to refactor!
- Code that is never touched
- Sitting behind an API, working quite happily, doesn’t need to change – doesn’t matter if it’s a mess!
- Don’t clean code all in one go – just a little bit more every time you go in
- Over time that will concentrate your efforts where they are most needed – ie areas modified most frequently
Refactor as a step in the red-green-refactor cycle
- You should only refactor when tests are green
- Sometimes that means you have to remove a failing test, refactor, then put the failing test back in again
- Ralph Johnson, John Brant, Don Roberts wrote the first Smalltalk refactoring book (A refactoring tool for smalltalk?)
If you don’t have tests you can’t refactor?
- “Probably safe” refactorings? Interesting route to go? Not a great thing for most people to focus on, requires a lot of expertise
- For most people you should have tests as well – after all, how do you know whether you’re breaking what’s currently there?
- Small steps, commit frequently.
- Then you can always roll back.
I couldn’t find anywhere all the Hack Manchester 2018 challenges / prizes were listed together, so I’m putting them all on one page for easy comparison.
! I’m not sure if they’re all there yet – there might be more to come, so keep an eye on the Hack Manchester site (scroll down to the “THE CHALLENGES” – yellow section just before the schedule).
Follow the links below to find out about the sponsors and see more detail / explanation about the challenges:
WebApp UK: Best in Show. (Our team came second for Best in Show last year. Just sayin’.)
MediaBurst: “The most bizarre use of ClockworkSMS!.”
Centre For Biological Timing: “make technology work for our body clocks rather than against them … hack your body clock!”
The other day I posted this problem.
In the process of chatting about it, an associated mathematics problem came up, about permutations and probabilities.
To remind you of the basic problem: We have 450 conference attendees. We have one day of workshops. There are five sessions throughout the day, and five workshops to attend. We want every attendee to attend every workshop. The workshops are being repeated throughout the day: In every session, there are 10 rooms available. There are five workshops, and each workshop is duplicated. So for instance, workshop 1 will be run in rooms A and B, workshop 2 in rooms C and D, etc. They are repeated throughout the day, so one attendee might do them in the order 15423, and another might do 42315.
There will be 45 people in each workshop. The nub of the problem is that the organisers want as much cross-pollenation as possible. People do not get to choose what order they do the workshops: they are told which room to be in at what time. The organisers would like each attendee to meet as many new people as possible in every single session.
So here’s the maths question: What is the minimum average number of repeat people that attendees have to meet in each session? By “repeat people” I mean “people they have already attended a previous session with.”
There is one extra constraint: There is one group of 30 people (labelled “FRIENDS” in the spreadsheet below) that have to be kept together in every session. So they only get 15 people in every session that they may not have met before, ie for them the min value of Repeat People is 30 in every session. This also has an impact on all the other people in the other sessions.
However this is still an interesting problem even without the extra 30-people constraint.
So in session one, Repeat People = 0. In every one of the workshops in session 1, attendees are in a brand new group of people that they have never attended this conference with before.
So what is the minimum average value of Repeat People for session 2? I’m specifying average, because it would be possible to keep things pure for at least one attendee, where they meet 44 new people in every single session. But that would have an adverse effect on everyone else. So I’m aiming for everyone to have roughly the same number of Repeat People per workshop.
I have two theories, one of which I can prove:
- Apart from in Session 1, the average Repeat People is definitely greater than zero. You can’t avoid meeting attendees more than once (if you’re spreading the load evenly) (I can prove this one).
- The average number of Repeat People will get gradually higher throughout the day. The first solution I came up with maintains a constant number for Repeat People after session 1, but it’s non-optimal (I think).
Here is a way of visualising the problem:
I’ll be very interested to hear any solutions!
The problem is described here. You could approach the problem as a coding kata, which I’d love to do at some point – I didn’t have the time on this occasion. I ended up just solving it using pen and paper.
My solution is below. It won’t be the only solution, there are probably better ones out there that rely more on randomisation and less on patterns (or just use better patterns).
Don’t scroll down if you want to have a go at it yourself first!
So, I found a solution which works pretty well. I’m pretty pleased with it, and it’s nice and simple and neat.
The participants are split into groups of 15. The FRIENDS are in two groups of 15, both marked “FRIENDS” in the spreadsheet below. All the other groups are non-FRIENDS (well they might be friends, who knows? But they still don’t get to stay together – we’re mean like that).
It does mean that the people move in groups of 15, and they will stay in those groups. Not that they will necessarily know it. Because of the numbers, no matter what you do, they will keep encountering some people repeatedly (the maths of this is a separate problem which I’ve blogged about here). If you shuffle the list and assign the non-FRIENDS to 28 groups of 15, hopefully they will already be mixed up with people they don’t know, so they needn’t be aware that they have been grouped. I’ve labelled the groups B1 to O1 and B2 to O2.
What it does mean is that each group of 15 never encounters another group of 15 twice – they meet a different two groups in every session. So everybody meets 30 new people in every session (I haven’t actually proved this but I’m pretty confident).
(Since I came up with this, I’ve realised I can improve it quite a lot so that people don’t have to stay with exactly the same 15 people – scroll down to the bottom to see this improvement).
So, there are three groups in every session, like this (this is my actual solution):
Here’s how I did it:
First, split the non-FRIENDS people into two halves, 225 in each half. The first half has groups B1 to O1, and the second group has B2 to O2.
For now, we will also split the FRIENDS into two groups, A1 and A2 (this was my “Aha” moment, all made possible because of the fact that you have duplicate workshops in each session).
So for each half, we have letters A to O. For the first session, we just spread them across the 5 workshops: ABC in the workshop 1, DEF in workshop 2… etc.
For the second session, the first group in each triplet just shuffles along to the next workshop. So A is now in workshop 2, D is in workshop 3, etc.
We keep doing this all the way down the sessions, for the first group in each triplet.
For the second group in each triplet (B, E, H, K and N), instead of shuffling them on by 1, we shuffle them 2 workshops along. So, B was in workshop 1 in session 1, then we add 2 so they are workshop 3 for session 2, then add 2 again so they are in workshop 5 for session 3, keep going (wrapping around) and they are in workshop 2 for session 4 and workshop 4 for session 5. Do this for all the second groups (B, E, H, K and N).
For the third group in each triplet (C, F, I, L, O), add 3 on each time. So group C has the following workshops: 1, then 4, then 2, then 5, then 3.
Do this for all of those third groups.
This is what you get:
Now, at this point you have a problem: Your FRIENDS group have been split into two groups of 15. But ooh! Look! You did the same thing for A1 to O1 as you did for A2 to O2!
A1 and A2 are always in the same workshops at the same time. So you can move A2 into the same workshop as A1, and swap another group out into the spot left blank by A2.
I then renamed both A1 and A2 to “FRIENDS”, and that’s how I arrived at the spreadsheet pasted above.
POST SCRIPT (also see separate maths problem here):
Since I came up with this, I’ve realised I can improve it quite a lot so that people don’t have to stay with exactly the same 15 people:
The “FRIENDS” group mess it up a bit, but in most cases you’ll have parallel groups moving independently in duplicate workshops, eg when group C1 is doing workshop 3, group C2 will be doing the other duplicate of workshop 3.
Well… If each group of 15 was split into two sub groups of 7 and 8, then they could shuffle around and meet each other. Also this could be done dynamically.
So, for instance, you have C1a (7 people), C1b (8 people), C2a (7 people) and C2b (8 people).
In session 2, swap C1b and C2b so now only 7 (or 8) people stay together. Do the same with all groups (except those affected by the “FRIENDS” group).
In session 3, put C1a with C2a and C1b with C2b. This adds up to 14 and 16, so somebody will have to switch sub groups. In sessions 4 and 5 you could split them again, but this time randomly. As long as all the Cs are doing the same workshops, you can split them how you like. Ultimately they will all meet each other so they’ll still have repetition with 15 people overall, but it won’t be the same 15 people in every workshop. And they’ll still get 30 brand new people in every workshop.
I think this would make a great code kata, although I found a solution using pen and paper. It is a real problem, which my friend is genuinely trying to solve.
She is organising a conference. She has been given the following very interesting, and non-negotiable, requirements:
There will be one day of workshops. There are 5 sessions and 5 workshops. There will be 45 people in each workshop. There are 450 conference attendees, so each workshop is duplicated in each session. That is to say, during each session there will be ten actual workshops, but only five distinct workshops. So workshop 1 is happening simultaneously in two different rooms, as are the rest of them.
The aim is for participants to meet as many new people as possible. So for each attendee, we want to minimise the number of people in each workshop that they have met in a previous workshop.
We also want every attendee to attend every workshop.
And there is one more special requirement: There is a group of 30 attendees, we’ll call them FRIENDS, that must be kept together at all times. So in their workshops, there will be a rotating number of 15 extra people. They must meet a different 15 people in every workshop.
I have a solution for this, but I’ll post it separately in case you want to try it for yourself.
Here is a diagram that might help you to visualise the problem:
There is an associated maths problem about the permutations and probabilities involved, here.