Avoiding Code Duplication
The biggest source of bugs in code in EECS 281, in my experience, is a lack of
clean code. “Hey, if my code works, it works — there’s no need to get all
fussy about how I write it,” you might say. It’s true that you can get full
points on the autograder with all of your code shoved in main
, but do you ever
wonder about how those ‘genius’ programmers get done so quickly? They somehow
seem to get it right the first time they write their code.
This guide is aimed at drilling just one critical idea into your head which I guarantee will make you a better programmer: never duplicate code. One of the best habits you can form is to keep looking for duplicated code and deduplicating it.
Why you should care
What’s the problem with duplicating code? In short, it’s a maintenance nightmare: it’s hard to both fix and build upon your code. In EECS 281, there will be a lot of both, so it’s important that your code is accessible.
When you duplicate code, you have a high risk of applying changes in one area but not its twin, leading to nefarious bugs. Sometimes you have two sections which are identical but shouldn’t have been. These risks can be entirely avoided by avoiding the duplication in the first place!
When you don’t consolidate your code, the control flow of your program is hard
to understand: when you branch, do you always run this block of code at the end
of both the if
branch and the else
branch? Or is it slightly different in
one branch? What if the different branches call out to different functions, and
each of those functions may or may not do the same thing? It becomes much harder
to reason about what your code will and won’t do.
This article isn’t an exhaustive list of all the ways that code can be duplicated. It’s just meant to help you begin to form your deduplication habits. You’ll need to constantly look over your code as you write it and ask yourself “Are there any repeated blocks of code here? Can I rewrite it so that there aren’t?”
Don’t repeat expressions
Suppose you had a point and a list of other points and you wanted to find the closest point in that list. Take a look at the following code and decide whether it’s bug-free or not:
In the closestDistance = ...
line, I’ve copied and pasted a pow
call, but I
forgot to change x
to y
. We can make it so that this bug can’t possibly
happen by defining a helper function:
If there’s a lot of code in the if
-block, I might change the distance
call
in the condition and forget that there’s a call in the body as well. To prevent
this from happening, we can introduce a temporary variable to hold the distance.
By naming the variable, it also makes the code clearer:
In practice, repeated expressions may not be as obvious as in this example. As a simple rule of thumb, if you have an expression in multiple places which has more than one term in it, you’ll want to use a helper function or temporary variable.
Don’t repeat variables
Sometimes you want to reference the same variable over and over again. In that case, use a reference for convenience.
You can see that we repeat myVector[i][j][k]
many times. If the indices are
more complicated, it’s really painful to keep typing it out. (In fact, this is
one of the already-legible cases; indices can often be much more verbose when
using a map
later in the course.)
The easiest way to avoid this duplication is to assign the element to a reference. This way, we can use and modify the variable without having to type out the whole name.
Lets say you switch to a two-dimensional vector later. It’s very easy to modify
the second version: just change the myValue
definition and your code compiles,
while in the first you have to go and change every usage site – and you might
make a subtle typo in one.
Additionally, if you know that myValue
won’t be modified, mark it as const
:
This can save you quite a bit of headache because it will prevent you from accidentally modifying the element where you didn’t mean to.
An example: handling different modes
A lot of the projects have several different input or output modes. The worst
way to handle them is to write the same exact code twice, in two different
if
-branches. Suppose we have a program that reads a list of integers and
prints them out in reverse order. Additionally, suppose that in MODE_EVEN
, we
only print the even numbers, and in MODE_ODD
, we only print the odd ones.
Here’s one way which it could be done:
You can see that we repeat a whole segment of code, and the only difference is
one innocuous little 1
versus 0
. What’s the best way to deal with this? The
first step is to identify which parts of the code are the exact same and which
aren’t. For example, the input routine is exactly the same for both. Let’s move
it out of the if
-block:
Now half of the repeated code is deduplicated. But the iteration over numbers
is still repeated. Let’s hoist the for
-loop out of the if
-block:
Now we don’t have the error-prone for
-loop repeated, but we still have most of
the modulus check and the entire output routine duplicated. Since the only thing
which differs is the right-hand side of the equality check, we can put that into
a variable:
Much better! Now all of the identical logic is written only once, and we’ve pulled out the differing part into a single variable.
The only thing left to do is calculate the remainder outside the for
-loop. It
doesn’t change between iterations, so it doesn’t make sense to recalculate it
every time.
When making changes to the code, we can now rest assured that changing the iteration bounds or the printing conditions will correctly update every mode.
Don’t pass around tens of parameters
If you’re coding cleanly, you’ll have many relatively small, self-contained functions that do one thing. One problem with this approach is that you might have to pass in many parameters to each of these functions so that it has all of the data it needs.
You can see that the first two parameters to both of these functions are the same. Typing out the same parameters everywhere is a headache, and you’re prone to bugs where you’ve switched around parameters that happen to have the same type.
When we have many methods that all operate on the same set of data, it’s a good idea to package them into a class:
Now all of the duplicated parameters are still accessible to each of the functions, and we can easily see what unique data each function requires. Additionally, we can’t accidentally switch around the order of parameters in a function call. As you progress through the projects, you’ll find you can save a lot of tedious typing by using classes instead of a lot of parameters.
What next?
By now, you hopefully have developed a little bit of rigor regarding duplicated code. As you write code, keep an eye out for things which could be deduplicated. (If you’re using copy-and-paste, there’s a good chance that you’re duplicating code.) If you avoid code duplication, you’ll gain a significant time advantage on the projects, since you won’t spend as much time looking for silly duplication errors – and this translates to a better final score.