I’ve spent a lot of time over the course of my career in other people’s code, and as a result, I’ve learned a lot about what makes code easy to navigate and what makes reading and debugging it more difficult. The coding styles I promote reflect the things that made my life easier, and avoid the things that made it harder.
When I get into a code base, I want to be able to easily:
- read the code,
- assess it,
- and confidently make my change.
These move fastest when you can do the first two without having to run the code. They’re also much faster when things are transparent and discoverable.
Reading Code
For point 1, reading code, you need a good, consistent coding standard. If it’s a language that Google uses, you could do a lot worse than adopting one of their coding standards docs. If there’s a standard formatting tool for your chosen language, use it. If neither of those applies, you’re in for some grief, as a standard will rarely make everyone happy – adopt one anyway. You may have to deal with the tabs vs. spaces argument. Pick one, and only one (spaces, of course!). There will be religious arguments; may God have mercy on you.
Something that definitely aids here is manifest, or IDE-assisted, static typing. If you have either types manifest in the source code, or some IDE tool that can tell you what data item X is, figuring out what is going on can be orders of magnitude easier. If you have strong inferred types and no IDE tool, then you as the reader have to do the type inference. This is not something that humans do particularly well. Where there are no types defined at all, finding out what the type of X is can be even more challenging. In one case, I was porting a Python microservice to Java and ran into something named c. By simple inspection, I was able to divine that it was some kind of data connection thing, but I had to understand the entire microservice before I could find the specific type it was. In Java, it would have been a simple matter of looking at the type, or in the case of an inferred lambda argument, a hover in my IDE, rather than about two hours of grepping source code to find where c was originally instantiated.
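To make that concrete, here’s a minimal sketch (hypothetical names, nothing from the actual service) of what a manifest type buys the reader:

```java
// Hypothetical types, for illustration only.
interface DataConnection {
    void execute(String sql);
}

class AccountSync {
    // The manifest parameter type answers "what is c?" at a glance;
    // with no type in the source, the reader has to trace back to
    // wherever c was created.
    void syncAccounts(DataConnection c) {
        c.execute("SELECT id FROM accounts");
    }
}
```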
Assessing
For point 2, now that you’re able to read the code more easily, especially across multiple code bases, you need to assess things based upon what you’re actually trying to do. When doing this there are two or three questions, depending on how you count, that you need answers to. Those questions are:
- Where did you come from?
- Where did you go?
- What did you do to my data?
The Cotton-Eyed Joe Problem
I treat 1 and 2 as the same thing: it’s the Cotton-Eyed Joe problem, from the line in the song: “where did ya come from, where did ya go?” (AFAIK, I coined the term.) It applies both to execution and, by extension, to data objects. For example, given a controller endpoint, how does it flow down to the lower layers of the service? Or for a given method, where is it called from? Is the method even used?
There are a number of techniques used in libraries and frameworks that make answering these questions basically impossible without running the code. My top five, for languages that support these kinds of features, are:
- annotations that have runtime (or even compile time) effect
- reflection
- dynamic proxies and/or things that use cglib
- inheritance
- hidden code run by frameworks
This is why I hate Spring: it leans very heavily on all of these.
With annotations that have runtime effect, the question is: if it’s not doing what you wanted it to, can you find the code that is processing the annotation? In most cases, unless you happen to know where that code lives, or it’s throwing an exception, you’re basically out of luck.
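To illustrate the shape of the problem, here’s a rough sketch with a hypothetical annotation and processor (not from any real framework). The business code carries the annotation; the code that actually honors it lives somewhere else entirely, usually inside a jar you don’t own:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Hypothetical annotation, for illustration only.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Retry {
    int times() default 3;
}

class Job {
    @Retry(times = 5)
    public void run() { /* business logic */ }
}

// This is the kind of code you end up digging for when the annotation
// doesn't do what you expected; in a real framework it's buried in a
// third-party jar, nowhere near Job.run().
class RetryProcessor {
    static void invokeWithRetry(Object target, String methodName) throws Exception {
        Method m = target.getClass().getMethod(methodName);
        Retry retry = m.getAnnotation(Retry.class);
        int attempts = (retry == null) ? 1 : retry.times();
        for (int i = 0; i < attempts; i++) {
            try {
                m.invoke(target);
                return;
            } catch (Exception e) {
                if (i == attempts - 1) throw e;
            }
        }
    }
}
```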
Fun story: in a microservice I was working on, I added another constructor parameter of a type I was quite sure wasn’t already @Bean'd, so I wrote a @Bean method for it. Fired up the service, and Spring complained about a “duplicate bean for ….”. Hmmm, OK. I guess it’s already provided somewhere, so I’ll delete it. Run it again, and get “No bean provided for ….”. Wat!?!? I spent about an hour trying to dig into how Spring does autowiring, gave up, and, as something that shouldn’t have made any difference, put an @Qualifier on it, and voila! Awesome, right?
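For what it’s worth, the fix looked roughly like this (types renamed to hypothetical ones). By default a @Bean method’s name becomes the bean name, so qualifying by that name satisfied the autowiring, even though, as far as I could tell, it shouldn’t have been necessary:

```java
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.stereotype.Service;

class WidgetClient { }

@Configuration
class WidgetConfig {
    // The bean that was somehow both "duplicate" and "not provided".
    @Bean
    WidgetClient widgetClient() {
        return new WidgetClient();
    }
}

@Service
class WidgetService {
    private final WidgetClient client;

    // The @Qualifier that shouldn't have mattered, but did.
    WidgetService(@Qualifier("widgetClient") WidgetClient client) {
        this.client = client;
    }
}
```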
With reflection, you cannot tell, without running the code (and possibly not even then), what all the callers of a method are, if there are any. Somewhere in the monolith I used to work on, there’s a comment I wrote about a method that can’t be deleted even though it isn’t actually needed: it only exists because something in Spring required it to be there, which I discovered while firing the service up (it compiled just fine). Suffice it to say, refactoring code that uses reflection can be more challenging.
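Here’s a minimal sketch of why that is: with reflection the method name is just a string, so no static analysis can connect the call site to the method it ends up invoking.

```java
import java.lang.reflect.Method;

class Housekeeping {
    // "Find usages" and "safe delete" both report this as unused.
    public void cleanup() { /* ... */ }
}

class ReflectiveCaller {
    static void call(Object target, String methodName) throws Exception {
        Method m = target.getClass().getMethod(methodName);
        m.invoke(target); // the actual call to cleanup() happens here, invisibly
    }

    public static void main(String[] args) throws Exception {
        call(new Housekeeping(), "cleanup");
    }
}
```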
Now throw dynamic proxies or cglib on top. With proxies, you’re now having to trace through a reflection-using layer. I’ve had issues where I was debugging code that called through a reflected object to downstream code, and even after digging around, it was hard to see where it would wind up; I’d have to put breakpoints in all the places I thought it might go. Cglib takes it one step further: often there’s no Java source to see at all, and though IntelliJ can often decompile the generated classes, the result is even less human-readable than normal decompiled files.
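A bare-bones JDK dynamic proxy shows the debugging problem: step into repo.save() and you land in the handler’s invoke(), not in any concrete save() implementation, and where it goes from there is decided at runtime.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

interface Repository {
    void save(String record);
}

class ProxyDemo {
    public static void main(String[] args) {
        InvocationHandler handler = (proxy, method, methodArgs) -> {
            // The real target gets chosen here, at runtime; a breakpoint in a
            // concrete save() may or may not ever be hit.
            System.out.println("intercepted: " + method.getName());
            return null;
        };

        Repository repo = (Repository) Proxy.newProxyInstance(
                Repository.class.getClassLoader(),
                new Class<?>[] { Repository.class },
                handler);

        repo.save("row-1"); // stepping in lands in invoke(), not in a save() body
    }
}
```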
You might find it surprising that inheritance makes things difficult. Let me ask you a concrete question I’ve had to find the answer to in migrating stuff out of a monolith: if you have a bunch of classes that extend an abstract class named Foo, and Foo has a save method, and Bar is a descendant of Foo, how do you find out where <instance of Bar>.save is called? You’d think, no big deal, right? While the IDE can tell you about all the save calls across all descendants, that’s not really super helpful, is it? Turns out there’s a source code change you can make so that it can tell you, but again, now you have to fiddle with things. This is just one example of many that fit in this bag; it just happens to be the simplest to explain. BTW: the source code change is to put a copy of the save method in Bar. Then you can find usages of save just on Bar.
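In code, the workaround looks like this:

```java
abstract class Foo {
    public void save() { /* persist common state */ }
}

class Bar extends Foo {
    // A copy that only delegates, added purely so the IDE can answer
    // "find usages of save() on Bar" without dragging in every other
    // descendant of Foo.
    @Override
    public void save() {
        super.save();
    }
}
```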
Hidden code run by frameworks can make for really good fun in debugging. Some examples of what I’m talking about are things like request filters and AOP. By tracing around in your IDE, you’ll never find these. You have to know they’re there to find them. When these things misbehave, if you’re lucky, they throw an exception, but sometimes they don’t and all you have is a failing test case, or data in a column that’s not what you thought it should be. Unless you know to look for these things, you’re basically screwed.
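As a sketch of what I mean (assuming the javax.servlet API, 4.0+ where init and destroy have default implementations), here’s a request filter that quietly touches every request; no amount of tracing from the controller will lead you to it:

```java
import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;

class TenantFilter implements Filter {
    @Override
    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        // Applied to every request before the controller ever sees it;
        // you have to already know this filter exists to find it.
        req.setAttribute("tenantId", "default");
        chain.doFilter(req, res);
    }
}
```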
In any case, these reasons are why I’d rather have more boilerplate than magic. That way, you can trace through everything, all in your IDE, all without running any code. This is part of the reason I originally wrote this post, which provides a fully IDE-traceable dependency injection method. Is some magic OK to reduce some of the boilerplate? Probably, but I don’t want to do anything that breaks the ability to trace through everything in the IDE. The boilerplate can be generated code, which is 100% fine. Ideally, it’s checked in with the rest of your project, because you may change how it’s generated over time.
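To show what I mean by boilerplate over magic, here’s a hand-wired sketch with hypothetical classes: every dependency is constructed and passed explicitly, so the whole object graph is plain code the IDE can trace without running anything:

```java
class InvoiceRepository { }

class InvoiceService {
    private final InvoiceRepository repository;

    InvoiceService(InvoiceRepository repository) {
        this.repository = repository;
    }
}

class Application {
    public static void main(String[] args) {
        // Every "where did this come from?" has its answer right here.
        InvoiceRepository repository = new InvoiceRepository();
        InvoiceService service = new InvoiceService(repository);
    }
}
```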
What Happened To My Data?
As for what happens to a given data object, this is where data immutability is such a life saver. Without immutability, when you pass an object to a method, what has happened to it by the time the method returns? Without tracing all the way down the data path, you can’t know for sure whether something changed in it. Not only that, but the change might not affect the function you started in, but something fairly disconnected from it, i.e. spooky action at a distance. I’ve spent more than a few days tracking down bugs caused by this precise issue. Immutability avoids the problem entirely: if you want a changed object, you have to make a new one. Immutable data items also have the very nice property that they can be passed between threads without having to be super paranoid.
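A small sketch of the style (using Java records, available since Java 16, with hypothetical types): to “change” an Account you make a new one, so the caller’s copy can’t be mutated out from under it:

```java
record Account(String id, long balanceCents) {
    // The "modified copy" helper; the original Account is untouched.
    Account withBalanceCents(long newBalance) {
        return new Account(id, newBalance);
    }
}

class Transfer {
    static Account credit(Account account, long amountCents) {
        // Callers keep their original; they only see the change if they
        // take the returned value.
        return account.withBalanceCents(account.balanceCents() + amountCents);
    }
}
```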
Immutability isn’t a panacea. It generates more memory management overhead, which can matter if you’re trying to trim machine cycles. Depending on your setup, the boilerplate to create a modified object can be problematically large. But when you can use it, it can greatly reduce the amount of code you need to drill into to see what’s going on.
Confidently Making The Change
Confidently is the big word here. First, we’ll need tests. Somewhat recently, state-based testing has become more prevalent, and for good reason. State-based tests are basically black-box mini-integration tests. When I was at WorkMarket, they were much closer to the regular integration tests we ran on our microservices. In many services, we had very few unit tests but lots of state-based tests. The services started fast enough that the time to run the full test suite was tolerably short.
Why go against the conventional wisdom of many unit tests and fewer integration tests? Functional and integration tests tell you the thing you actually care about when testing: does the end product actually work? Unit tests are good and right when applied to the smaller atoms that are sufficiently complex to need them, but they don’t tell you whether your service actually works. Not only that, unit tests lock in the behavior and implementation of those lower-level atoms, which inhibits refactoring in the large, as you wind up having to fix up a bunch of tests that you’ll invariably break. Unit tests also tend to be mock-heavy, and I’ve seen wayyyy too many unit tests that only wind up testing that the mocks behave the way you told them to, rather than testing anything useful.
However, we most definitely want to lock in behavior in the large. State-based tests allow you to refactor the innards of your code with fewer (and in some cases no) changes to the tests. Win/win, really. Not only that, but the integration tests can and do find errors of the variety that no one would probably ever write a unit test for (at least not in advance).
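To show the shape of what I mean, here’s a toy state-based test (hypothetical service and store, JUnit 5): drive the public surface, then assert on the resulting state rather than on which internal methods got called:

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import java.util.HashMap;
import java.util.Map;
import org.junit.jupiter.api.Test;

class InMemoryOrderStore {
    private final Map<String, String> statuses = new HashMap<>();
    void put(String orderId, String status) { statuses.put(orderId, status); }
    String statusOf(String orderId) { return statuses.get(orderId); }
}

class OrderService {
    private final InMemoryOrderStore store;
    OrderService(InMemoryOrderStore store) { this.store = store; }
    void placeOrder(String orderId) { store.put(orderId, "PENDING"); }
}

class OrderServiceStateTest {
    @Test
    void placingAnOrderStoresItAsPending() {
        InMemoryOrderStore store = new InMemoryOrderStore();
        OrderService service = new OrderService(store);

        service.placeOrder("order-1");

        // Only the observable state is locked in; the innards of
        // OrderService are free to be refactored.
        assertEquals("PENDING", store.statusOf("order-1"));
    }
}
```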
Can you go whole hog state-based testing on a monolith? Probably not, at least in the large.
Strongly typed functional programming is also really helpful in making changes with confidence. I’ve found functional programming has two main benefits:
- if the code compiles, it often just works (or very nearly so)
- easier to get higher test coverage.
As to why it works the first time more often, my theory is this: in a science class some time in college, one of my profs mentioned that when doing equations on non-scalar quantities, you should carry the units along as you do the math. That way, if the units of the result of the equation matched what you thought they should be, your math was usually correct. The same thing holds with functional programming: if the function produces the correct type as a result, the code is usually correct, or at least not far wrong.
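A toy illustration of that analogy (hypothetical types, nothing deep): wrap the “units” in distinct types, and a formula that mixes them up simply won’t compile, which catches the mistake before anything runs:

```java
record Meters(double value) { }
record Seconds(double value) { }
record MetersPerSecond(double value) { }

class Kinematics {
    static MetersPerSecond speed(Meters distance, Seconds time) {
        // Returning the wrong wrapper type here would be a compile error,
        // much like mismatched units flag a wrong equation.
        return new MetersPerSecond(distance.value() / time.value());
    }
}
```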
As to higher test coverage, we seem to see much less branching in functionally oriented code, which makes it much easier to cover. This may just be anecdotal, though.
Summary
In any case, these are some of the things that make your coding life simpler. Apply and enjoy.