about:drewcsillag

Feb 20, 2021 - 11 minute read - programming

Lies You Find In Code

When working in a code base that’s been around for a while, you will find lies in the code. These lies are usually not put there by nefarious actors, so there’s no conspiracy, or secret cabal. But either by engineers who don’t realize what they’ve done, or more commonly by code evolution, these things happen. 

These lies are the unaccounted time sucks when working in a code base, as people working on it have to dig deeper, search for things that don’t exist, maintain code that doesn’t need to exist, or worse make incorrect assumptions about the state of the world as they work through and around them. So it’s good to be aware of them and fix them when you see them.

Stale Comments

Stale comments are probably the most obvious of lies you find. They were usually true when they were written, but the code has moved on and the comments were never updated to reflect the state of the world.

The fix here is just to update or delete the offending comments. No comment is usually better than a wrong one, but fixing it is better. There may be important historical info in the comment even if what it says isn’t true, in which case retaining the historicity is important, but make sure that the comment is updated to make it clear this is no longer as true as it ought be.

Dead Code

Any code that exists implies that there’s a good reason for that code to be there. This one costs you because you make the assumption that this code needs to be understood, maintained, tests may cover the dead code which themselves need to be maintained, and people might assume that this code might be a valid pattern to follow and be cargo-culted from.

The worst of dead code is when it’s code that’s involved in plumbing as the number of places it’s referenced is pretty high - and this makes this kind of code hard to detect, but very rewarding to clean up. If you’re not sure about the longevity of plumbing, you might want to reconsider either documenting this somewhere, or finding another way of doing what you’re trying to do by adding the plumbing.

The fix is to delete the dead code. Often deleting dead code will reveal other code that can be killed off. Some of my favorite diffs were ones where the line count was a largish negative number.

Related to dead code are unused arguments. They imply that they’re used somewhere that your readers will go looking for. The fix is to name them something which makes it obvious they’re not used. Some languages allow you to use bare underscore for such cases. Unused variables are very closely related to these. If you’re not going to use it, don’t assign it. A number of tools nowadays pick up on this one, which is good. There have been a few odd cases I’ve encountered where assignment was necessary even if not used (I forget exactly why), but in such a case, leave a comment and name the variable to indicate it’s unused (e.g. unused_but_required_because_of_stupid_foo).

Another close relative to dead code are unused HTTP/rpc endpoints. If you have your code in multiple repos, it can be even more difficult to track these down. Cross language source code search tools (such as SourceGraph, LiveGrep and Hound) can definitely help, but because there are so many ways to call HTTP endpoints, these tools can suggest that they’re not used, but it won’t be definitive. Additionally, not everything that calls your endpoints is necessarily in your source control. Webhooks, integrations, and depending on your use case, other companies, may be calling them and they won’t show up in any of your repos.

One nearly definitive way you can know is by having metrics on your endpoints so you can see whether they’re called or not. But if you have legitimate endpoints that just aren’t called very often, so you’ll have to exercise care before pulling the plug on them.

Speculative Generality

When you encounter code that was written speculatively general, it has the presumption that there is more than one use case for it, which can cause readers to go on a fruitless search for them. Now you have something which not only has that false implication, but is always more complicated than it would be if it were specialized to the use case that actually exists. There is an argument for leaving space for future work, but that’s different than pre-generalizing the code.

Closely related to speculative generality are interfaces with only one implementation. An interface implies that there is more than one implementation, or else why would you bother with an interface, rather than just use the single implementation directly? Some frameworks (I’m thinking Java here) may require these for some cases, but far less now than in the past.

The fix is to delete the interface and rename the implementation class to the name of the interface you just deleted.

Lies About Mutability

When you work in a language that has ways to specify that things can be const, final, and so on, not specifying them, or in rust, specifying mut or using a Var in Clojure indicating you want mutability when you don’t need it lies to your reader that you plan to, or need to allow for mutations of those variables. 

Yes, there are times when being const-correct isn’t worth the trouble, but when practical, it’s better to be “mutability stating” correct. 

The fix is to use mutability modifiers correctly. Even better if you try to make your code more immutable in general. If you do, when you follow this rule consistently in a language where you specify const or final, when you see something without it, it then sticks out.

Bad Affordances

This is where code implies you can use it in ways you actually can’t, with varying forms of can’t - starting with: it won’t actually compile, to you won’t discover it until runtime, at the worst possible moment, and it will destroy all you care about. The classical IRL bad affordance is known as the “Norman Door” where the door has a pull handle on the push side of the door. One of the most common ways I see bad affordances in code is evolution of code that has had the DRY principle applied, and then not unapplied as the special cases showed up. You have a function with some common code, then parameters are added to turn some of it off or to turn on some extra bits, and then later you find out that only come combinations of those parameters work at all.

The solution is the refactor the function into the multiple functions yearning to break free.

Ambiguous or Confusing Boolean Arguments

These aren’t necessarily lies in code, but they can be very confusing. These are the boolean arguments to a function where true and false don’t quite fit. When you need to use this function, sometimes you’ll need to read the code to figure out what the booleans actually mean. Alternatively, the argument is named something like dont_do_a_thing where you have to think out the double negative for cases where false is passed. 

No matter which case you start with, it’s clear there are two options for a thing, but here, an enumeration type with clearly named values would be a whole lot better. Or at least get rid of the double negative, if that’s the case. Adding an enumeration handles evolution much better, especially when a third option becomes necessary. 

I’ve seen when an argument started as a boolean argument, instead of making the enumerated type, another boolean argument is added to refine the first argument, which then leads to a bad affordance problem. Here, one of the simple fixes is to create a well named enumerated type and values to indicate what you actually want the argument to do. The other benefit in using enumerated types is that during refactoring, or code spelunking, the callers don’t have to remember what true and false. Consider this:

someVar = doAThing(connection, true, false);

vs.

someVar = doAThing(connection, ThingCommit.COMMIT_ON_COMPLETE, ThingNotify.TO_SLACK); 

With the first, you want the function definition (at least!) to know what true and false mean, with the second, you don’t.

An alternate mechanism to fix these is to use function arguments instead of the enumerated type, where the function does the thing your enum is being used to select.

Bad Naming

It is said that the two hardest problems in computer science are:

  1. naming
  2. cache invali-CONCURRENCY!-dation
  3. off by one errors

So it’s not surprising really that naming finds its way into this list. Sometimes, it’s just that a name has a bunch of meanings, and readers might reasonably pick a one that’s different than what the code means. In such cases, just picking a less overloaded name can go a long way. Sometimes it’s just that the code evolved to a point where the name just isn’t true at all. In such cases, a rename will save future you a bunch of trouble. Fortunately, we live in a time where tooling to rename things in most common languages is available.

The overloaded meaning problem has been a common problem I’ve seen with service names in SOAs where people assume different sets of functions that a service provides based on different meanings and scope of its name. The solution I give for service names is to just pick an uncommon foreign word, or make one up, and you then you get to assign the desired meaning to it. Dead languages like classic Greek dialects or Latin work well, but others can work too.

Variables Assigned Too Early

This is one of the more innocent lies that I encounter, and usually pretty easy to fix. When you assign a variable too soon, it implies it’s needed sooner than it is, and may cause your reader to look for it and not find it. Code is often not initially written this way, but it’s more due to code evolution. The code that follows the assignment either gets deleted or more code get put in the way, but the assignment never gets relocated. The easy fix is to move the assignment closer to where it’s used.

Now there are occasions due to concurrency or side-effects where this is something you have to do. In such cases, a comment can point this out to your reader, so they’re not left wondering why things are the way they are.

Importing * When You Only Need Or Two Things

I’m not a big fan of importing * in general, but especially when you’re only using one or two things. A few reasons being, needing to figure out where things come from. It’s all fine and good that your IDE can click through the definition, but better if you don’t need to because you can see clearly where things come from. But the other part is that importing * implies that you’re going to be using many items from the module/package/whatever, and often enough so that you don’t want to have the namespace-qualified name. Thus, you have to think harder when you see names “are they from the * import?” when you shouldn’t have needed to.

Flaky Tests

This is more of a first derivative lie in code, but it’s a test that indicates that something is busted when it isn’t. These are the bane of engineers everywhere as these tests will often break submission of code into version control where tests must pass first. The submitter has to look at things and wonder, did my code break this? And the answer is usually no. 

So not only do you pay the price of the time required to submit a second time, but the time and mental load to evaluate whether the test break was because of their change or not. This is especially true when it’s an unfamiliar code base that you don’t contribute to often enough to know that certain tests tend to flake. When you have flaky tests and they legitimately break, then you usually waste more than a second submission, as my experience is that you submit it a second time, and maybe you believe the failure is legitimate, but sometimes not until the third submission. These are the tests that cry wolf.

Flaky tests are harder to fix, but the clock time saved alone is usually worth it, not to mention the developer frustration, confusion, and CI resources.

Help Out Future You

Consider it enlightened self-interest, but when you see lies in code, fix them. The code will be clearer, smaller, and easier to maintain. The time to fix many of the lies you encounter is not very much, and the payback is almost always worth it.