My Adventures With Legacy & Spaghetti Code
One thorn in the side of many a developer is having to deal with poorly written or maintained code, like legacy code or spaghetti code.
Legacy code is the unwieldy and outdated code in your codebase that is in dire need of refactoring, but that is also too big and complex to just replace. It is often the result of favoring “get it done now” over “get it done right” too many times, until the codebase decays into something that becomes harder and harder to work with.
Spaghetti code, meanwhile, is code that is unstructured and difficult to maintain, with complex and tangled systems that (largely) ignore best coding practices like modularity and separation of concerns. As a result, only the original developer(s) may still understand how it works. It is not the same as legacy code, though the two are not mutually exclusive.
Here is an account of some of my adventures and lessons from working with legacy and/or spaghetti code over the years.
A Spaghetti GUI
I was tasked with adding new functionality to an existing desktop application. The application, written in Java with a Swing-based UI, was used for exploring different options and configurations for designing electronic and embedded systems.
The application was the result of a one-person project. While it was impressive in its functionality, it was made in a short amount of time and with a lot of (premature) optimization. To this day, it remains the biggest example of “spaghetti code” I’ve ever worked with.
Rather than having a list of objects, side-by-side arrays were used. Instead of
[
  { question: "What", answer: "Eiffel Tower" },
  { question: "Where", answer: "Paris" }
]
there was
{
  questions: [ "What", "Where" ],
  answers: [ "Eiffel Tower", "Paris" ]
}
Indexes were used to refer to positions in the side-by-side arrays. Whenever items in the arrays were added, moved or removed, every index that referred to them was updated across the codebase. Yet this update was not centralized anywhere: it happened in-line, wherever the items were modified. As you can imagine, there was a lot of code duplication.
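To make the hazard concrete, here is a minimal Java sketch of the two layouts. All names (QuizItem, SideBySideDemo and the methods) are made up for illustration and are not taken from the actual application. It only shows what can happen when one of the in-line updates is forgotten, compared to keeping each pair together in one object.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical type: one object keeps a question and its answer together.
record QuizItem(String question, String answer) {}

class SideBySideDemo {

    // Side-by-side lists: every modification must touch both lists.
    // Here the second removal is "forgotten", as can easily happen when
    // the same update is repeated in-line all over a codebase.
    static String forgetfulRemoval() {
        List<String> questions = new ArrayList<>(List.of("What", "Where"));
        List<String> answers = new ArrayList<>(List.of("Eiffel Tower", "Paris"));

        questions.remove(0);
        // answers.remove(0) is missing, so the pairs no longer line up.

        return questions.get(0) + " -> " + answers.get(0);
    }

    // List of objects: removing an item removes the whole pair at once.
    static String objectRemoval() {
        List<QuizItem> items = new ArrayList<>(List.of(
                new QuizItem("What", "Eiffel Tower"),
                new QuizItem("Where", "Paris")));

        items.remove(0);

        QuizItem first = items.get(0);
        return first.question() + " -> " + first.answer();
    }
}
```

In this sketch, forgetfulRemoval() returns "Where -> Eiffel Tower" while objectRemoval() returns "Where -> Paris". With a list of objects, that kind of mismatch cannot happen in the first place.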
There were also lots of short, non-descriptive names for classes and variables, and code documentation was all but missing. Separation of concerns was absent altogether: most of the logic was placed in two or three classes, each of which was thousands of lines long.
At first, I tried to separate the logic into its own systems, both to clean up the code and to gain a better understanding of it. Unfortunately, it was too tangled up to be separated, and I had to do a complete rewrite. The source code wasn’t even useful as reference material. So I simply ran the application to observe and copy its behavior, after which I could finally add my own features.
The Old Noodles Library
The next example is a mix of both legacy and spaghetti code. It was a client library that, essentially, wrapped another library whilst providing custom functionality: it provided its own UI, added commonly used features, used a specific (secondary) back-end for those features, and things of that nature.
This library was originally written in a hurry and likely by a too-small team, several years before it landed on my plate. By that point, however, the project had an almost-perfect case of “The One Developer”.
The One Developer was the only one who knew the system inside and out, and could communicate with their manager/PO without the need for writing important stuff down. Usually, a ticket on the Scrum board would have a name like “Fix API change” or “New submit button”, with little to no extra information or discussion deemed necessary. Any other developer wanting to contribute would have to ask The One Developer for the details on each ticket.
The One Developer was, of course, rarely available to answer questions or help with on-boarding, because they were always busy with something important. They didn’t seem to see the spaghetti-like nature of the codebase, and would be quick to look down on other developers for either asking too many questions (most of which went unanswered) or struggling with the project in general.
Even though they were quite capable, this One Developer had simply spent too much time being the only (real) maintainer. Any new developer added would never really be their equal, so there was no basis for constructive feedback. This led to an increasing pile of spaghettified code that no one could really fix.
The library was originally structured, roughly, for proper separation of concerns. But that aspect had eroded over years of “quick fixes” and “it makes sense to me” coding practices. Instead of discovering where the concerns could best be separated and adapting to that, the old separation was kept but constantly violated. As a result, concerns were mixed all over the project: UI components contained network logic, business logic contained UI logic, and so on.
Data was often explicitly passed through many unrelated intermediate classes on the way to its destination. Sometimes, showing a piece of text in the UI meant passing it from network to business logic, to other network logic, to an unrelated UI component, then to a mostly-but-not-entirely-unrelated “manager” class and then to the actual UI logic that rendered the text. And with the way the system was built, that would’ve been the shortest route.
Unfortunately, the library was used by ourselves as well as by third parties. Which third parties? We don’t really know. What do they use, exactly? Hard to tell. There was no clearly-defined API, and many internal details were exposed publicly.
Add to that the fact that there were numerous moving parts. Some things were specifically written to allow for special cases that we never used in-house, but that often stood in the way of fixing bugs or adding new functionality. We couldn’t change any of it without risking breaking it for someone else.
Eventually, The One Developer left, and ownership of the code went entirely to us “secondary” developers. We rewrote the whole system from the ground up. We used a more modular design that we adapted to changes as we went along. We tracked down some of the third parties that we knew used the old version and asked them for input. We included documentation and a user guide.
And most importantly, we refrained from having just One Developer work on it. Most of the important architectural decisions were made by two or more developers with diverse skills.
A Legacy Backend
Thankfully, this last example isn’t nearly as bad. This Java-based back-end system served quite a few people, but it started off as a side project. Since it was put together by skilled developers, it worked well. And since there wasn’t much chance that it’d be worked on by others, strict adherence to certain best practices wasn’t a priority, at least at first.
The side project grew in importance, however, and I was one of the lucky developers who got to work with the code base.
There was good use of some best practices, like a preference for clear names over short ones and good use of object-oriented modularity. However, the encapsulation of concerns wasn’t very effective in the end. It primarily used mutable data structures, had a lot of in-line logic to take care of other classes’ problems and underutilized dependency injection.
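For contrast, here is a small Java sketch of what I mean by immutable data and dependency injection. Everything in it (Order, OrderStore, OrderService, InMemoryOrderStore) is a made-up example and not code from the actual back-end. The assumption is simply a service that builds a value and hands it to some storage it receives from outside.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical immutable value: the item list is copied on construction,
// so later changes to the caller's list cannot leak into the order.
record Order(String id, List<String> items) {
    Order {
        items = List.copyOf(items); // unmodifiable defensive copy
    }
}

// Hypothetical dependency, expressed as an interface.
interface OrderStore {
    void save(Order order);
}

// The service receives its dependency through the constructor instead of
// creating it (or reaching into another class) itself.
class OrderService {
    private final OrderStore store;

    OrderService(OrderStore store) {
        this.store = store;
    }

    Order place(String id, List<String> items) {
        Order order = new Order(id, items);
        store.save(order);
        return order;
    }
}

// A trivial in-memory implementation, handy as a stand-in during tests.
class InMemoryOrderStore implements OrderStore {
    final List<Order> saved = new ArrayList<>();

    @Override
    public void save(Order order) {
        saved.add(order);
    }
}
```

Because OrderService only knows the OrderStore interface, a test can pass in InMemoryOrderStore and inspect what was saved, without needing a real database or any knowledge of another class’s internals.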
One prime example of typical “legacy code” was the existence of “unclear functions”. These functions would have a name that was almost-but-not-quite descriptive of their contents, or they’d have undocumented side effects that were still needed to make things work. It made unit testing nearly impossible.
Other “legacy code” examples were the many quirks: oddly written or structured logic where you couldn’t tell whether it was a bug, a deliberate choice to achieve a real goal or some no-longer-needed fix. Thanks to the differences between the test and production environments, you couldn’t be sure that fixing the quirks wouldn’t cause problems. At least, not until you tried it out in production.
Refactoring or rewriting the system wasn’t really possible either, because the code had grown into what it was for a reason. Because there was little documentation (aside from the code itself), rewriting it would not only mean having to distill the requirements again, but also rediscovering all the problems that the code’s quirks were fixes for.
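A contrived Java sketch of such an “unclear function” might look like the following. The class and method names (SessionRegistry, isKnownUser and so on) are invented for illustration and do not come from the real system. The first method reads like a pure check but quietly changes state. The methods below it show one possible way to split the question from the side effect.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical example class; not taken from the actual back-end.
class SessionRegistry {
    private final Map<String, Integer> loginCounts = new HashMap<>();

    // "Unclear" style: the name suggests a read-only check,
    // but the method also registers a login as a hidden side effect.
    boolean isKnownUser(String user) {
        boolean known = loginCounts.containsKey(user);
        loginCounts.merge(user, 1, Integer::sum); // undocumented mutation
        return known;
    }

    // Clearer style: the query has no side effects...
    boolean hasLoggedInBefore(String user) {
        return loginCounts.containsKey(user);
    }

    // ...and the mutation has a name that says what it does.
    void recordLogin(String user) {
        loginCounts.merge(user, 1, Integer::sum);
    }

    int loginCount(String user) {
        return loginCounts.getOrDefault(user, 0);
    }
}
```

Calling isKnownUser("ada") twice on a fresh registry returns false and then true, even though nothing visibly happened in between. That is exactly the kind of behavior that makes a unit test hard to write, and hard to trust once written.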
Plus, it would have to be worthwhile. If the project doesn’t actually have the potential for paying itself back, it’s not worth spending the time and money to refactor.
Ultimately, I learned to deal with the code as-is. Of course, changes and new additions would not have to follow the legacy-style patterns, but the old code became familiar enough to me that I could work with it without breaking anything important.
What I learned
Despite the annoyances, the experience of working with legacy and spaghetti code has been valuable.
One thing I’ve learned is that code smell doesn’t always come from developer incompetence, but from the context in which the code was written.
Single developers taking on projects by themselves with little to no involvement of other developers or viewpoints can lead to hard-to-read code, and can cause the developer to become overprotective about “their” work.
Also, skipping documentation and code encapsulation in favor of “quick fixes” and reaching deadlines can compound tech debt if left unchecked. The same goes for having a project start out too “small” or “secondary” to warrant proper documentation, and having it grow into something more important anyway.
My advice
My first piece of advice is to avoid sending developers in alone, whether to build something new or to do long-term maintenance on a still-growing project. In isolation, even really good developers can end up making something only they can understand, and that they may not be able to look at objectively once the team grows.
Secondly, always enforce a minimum standard of documentation and adherence to coding conventions, even on “small” side projects. You want new people to be able to pick it up at any time, especially once you start relying on the project. Of course, you’ll need to decide on a standard that strikes a good balance between cost and result, but once it’s set, you should enforce it strictly.
And finally, if you’re not documenting the code itself as a project grows, at least document the requirements. This is something that’s often already done with multi-platform projects (when building Android and iOS apps in parallel, for example), but it’s worth doing in other cases as well.