Refactoring Serial Calls To Leaf Calls

When dealing with the problem of a long method, the simplest thing you can usually do is to break it into it’s conceptual pieces and have a series of tail called functions that call the next chunk of the computation.

So given this:

private Thing methodThatIsTooLong(TypeThing arg1, OtherType arg2) {
   // chunk1
   // chunk2
   // ....
   // chunkN
}

You get:

private Thing methodThatWasTooLong(TypeThing arg1, OtherType arg2) {
  //chunk1
  IntermediateType intermediateResult = ....
  RandomType soOnAndSoForth = ....
  return doChunk2(arg1, intermediateResult, soOnAndSoForth);
}

private Thing doChunk2(IntermediateType intermediateResult,
                       OtherIntermediate soOnAndSoForth) {
  //chunk2
  // create whatever here...
  return doChunk3(intermediateResult, soOnAndSoForth, whatever);
}
// ....

The two major downsides to breaking down a large method this way are:

The code can be a bear to test as you need fully fleshed out TypeThings and OtherTypes and need to check vast details of Things. As we know, often in long methods, small perturbations in inputs can cause very unexpected changes in the outputs, so often, tests will miss things it shouldn’t. You can test bottom-up (from chunkN to chunk1), but it will still be hard to ensure you’ve covered the cases you care about.
The code, while easier to understand than methodThatIsTooLong, is still not easy to understand in isolation. That is, you still need to understand the whole flow, to know what the pieces are doing.
Data values that doChunk3 and so on need have to be passed through doChunk2, even though it could not care less and makes the functions where values are passed through more difficult to understand as you have to realize that these items are pass-through and not actually used by the function itself.

How else to break up a function like this? It’s better to break these into self contained functions with a top level function that coordinates them. That would look more like this:

private Thing methodThatWasTooLong(TypeThing arg1, OtherType arg2) {
  IntermediateType intermediate = doChunk1(arg1, arg2);
  NextIntermediate nextIntermediate = doChunk2(arg1, intermediate);
  // ...
  return doChunkN(nthIntermediate);
}

This has several strong advantages:

The doChunk methods can now be tested independently, as well as their intermediate values.
Because of this, the testing surface of the whole is much smaller, as the input and output values are now more tightly constrained.
While you may have to define intermediate value POJOs (and for the love of everything good don’t use Pair) the intermediate values are now well defined, and simplifies understanding of the code.
In languages with multiple return values, the extra return types for the intermediate values can be skipped which can make it easier to do in the first place, even though real intermediate objects in the end do make for more readable code in the end.
The only arguments needing to be passed to these functions are the data items needed by it’s computation, not all the data the rest of the whole computation needs.
It tends to reveal more opportunities for further refactorings, limiting mutability, rainbows and unicorns.

So in short, refactoring to leaf methods rather than chained tail calls is a good thing.

Happy Refactoring!

See here for other posts on refactoring.