What is tech debt?
First of all, it has to hurt for it to be legitimate tech debt. It has to hurt your customers and your business, whether directly or indirectly. Polishing an area of your code that's not causing problems might be premature optimisation, and the effort will probably be better spent elsewhere.
Your customers can be directly impacted by a poor user experience, e.g., by missing functionality that they so badly want, the service being slow to respond, or some bug that prevents them from doing whatever it is they want to do. They may also be impacted indirectly if the outstanding tech debt slows developers down. This would mean that new features are not delivered as quickly as they could be, preventing your customers from experiencing the benefits of those features and reducing your company's competitive advantage.
How to prioritise your tech debt?
By first tackling the issues that actually result in wasted effort, your team will be in a place where they can integrate and release their code more quickly. This in turn will allow them to get much faster feedback on their work, which often leads to further reduction in waste as course correction is performed more frequently and the likelihood of building unnecessary functionality is reduced.
When changing things, I always try to ask myself, "Why am I doing this? Would my time be better spent elsewhere?" My thinking when reviewing other people's code is similar, i.e., if I don't have a strong argument why something might cause pain and needs to change, then it's most likely that it doesn't have to change, and I won't try to slow them down by engaging in nitpicking.
Pampering your code or spending a lot of time swapping an existing framework or library for the latest and shiniest one does not necessarily address legitimate tech debt, especially if at the same time you have slow or flaky tests, tests that are hard to write or read, missing automation, or anything else that hampers your effectiveness as a developer.
What causes tech debt?
Generally, I've observed tech debt accumulating in the following situations:
We as a team did things in a less-than-ideal way in favour of speed; this could be due to urgent fixes that needed to be released or speed-to-market requirements.
We've done things in a way that turns out to be, or eventually becomes, non-ideal; inevitably, services evolve, and what might've looked like a good idea at the time eventually becomes the area of the code that nobody wants to touch with a barge pole.
As you can see, "I won't write tests because we're in a rush" is not a legitimate reason for tech debt to accumulate; that's just a very bad practice, and you should avoid it at all costs.
How much does it cost?
It's important to understand that when your team deliberately incurs tech debt in favour of releasing a feature, that debt might slow down other features or lead to bugs, and those might be really, really urgent. So as a team, you have to be honest with each other about what is really urgent and what can wait. Don't prioritise tickets just because someone has an itch to try out a new framework or work in a fancy area if you have piled-up tech debt that should be taken care of first and that is holding back your team's effectiveness.
The silver bullet for managing tech debt
First off, you have to track your tech debt and make it visible. If it's just a TODO comment somewhere in the code or a Slack thread, it's less likely to get fixed. Making it visible makes the whole team accountable, not just the people who know about it.
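One way to start making hidden debt visible is to surface the TODO and FIXME comments already sitting in the code so they can be copied into whatever tracker the team uses. The following is a minimal sketch, not a definitive tool: the names `DebtItem`, `find_debt_markers`, and `format_report` are hypothetical, and it assumes comments start with `#` or `//`.

```python
import re
from dataclasses import dataclass
from typing import List

# Assumption: debt markers look like "# TODO: ..." or "// FIXME ...".
MARKER = re.compile(r"(?:#|//)\s*(TODO|FIXME)\b[:\s]*(.*)")


@dataclass(frozen=True)
class DebtItem:
    """One tech-debt marker found in a source file (hypothetical type)."""
    path: str
    line: int
    kind: str
    note: str


def find_debt_markers(path: str, source: str) -> List[DebtItem]:
    """Return every TODO/FIXME comment in `source`, with its line number."""
    items = []
    for number, text in enumerate(source.splitlines(), start=1):
        match = MARKER.search(text)
        if match:
            items.append(
                DebtItem(path, number, match.group(1), match.group(2).strip())
            )
    return items


def format_report(items: List[DebtItem]) -> str:
    """Render the markers as one line each, ready to paste into a ticket."""
    return "\n".join(f"{i.path}:{i.line} [{i.kind}] {i.note}" for i in items)


if __name__ == "__main__":
    sample = "x = 1  # TODO: remove retry hack\ny = 2\n// FIXME flaky under load\n"
    print(format_report(find_debt_markers("svc.py", sample)))
```

A report like this is only a starting point; each entry still needs an owner and a ticket to become something the whole team is accountable for.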
Continuous refactoring is the only way to manage tech debt, and that's impossible without having good test coverage. In a distributed services setup, it's also important to have contracts between your services to ensure compatibility is preserved as things change.
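To make the idea of a contract between services concrete, here is a minimal sketch of a consumer-side check, under stated assumptions: the consumer lists the fields it relies on, and a test verifies that a provider response still satisfies them. The `ORDER_CONTRACT` fields and the `contract_violations` helper are hypothetical, invented for illustration, and a real setup would more likely use a dedicated contract-testing tool.

```python
from typing import Any, Dict, List

# Hypothetical contract: the fields the consumer depends on, and their types.
ORDER_CONTRACT: Dict[str, type] = {
    "id": str,
    "total_pence": int,
    "status": str,
}


def contract_violations(payload: Dict[str, Any], contract: Dict[str, type]) -> List[str]:
    """Return a list of ways `payload` breaks `contract` (empty if compatible).

    Extra fields in the payload are allowed, so the provider can add data
    without breaking existing consumers.
    """
    problems = []
    for field, expected_type in contract.items():
        if field not in payload:
            problems.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected_type):
            actual = type(payload[field]).__name__
            problems.append(f"{field}: expected {expected_type.__name__}, got {actual}")
    return problems


if __name__ == "__main__":
    compatible = {"id": "o-1", "total_pence": 1250, "status": "paid", "currency": "GBP"}
    broken = {"id": "o-1", "total_pence": "12.50"}
    print(contract_violations(compatible, ORDER_CONTRACT))
    print(contract_violations(broken, ORDER_CONTRACT))
```

Run in the provider's build, a check like this fails as soon as a refactoring renames or retypes a field that a consumer still depends on, which is what makes it safe to keep changing things.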
Having good test coverage allows you to improve your code and make changes with more confidence, as the chances of breaking your production environment are reduced. The only way to achieve and maintain quality is by ensuring that it is safe to make changes. It's that simple! If you have inadequate test coverage, developers won't have the necessary confidence to make changes and improve the codebase. They'll be continuously patching the system as they implement new tasks, and things will only get worse.
Don't compromise on writing tests when developing features, thinking this will allow you to deliver faster. On the contrary, following TDD will allow you to build only the minimum required functionality and have it covered by tests at the same time, thus empowering engineers to continuously improve the quality and design of the code as they keep working on it.