Using Machine Learning to Measure and Manage Technical Debt – The New Stack

Ori Saporta

Ori is a co-founder of vFunction, where he serves as system architect. Prior to founding vFunction, Ori was WatchDox's lead systems architect until its acquisition by Blackberry, where he continued in that role. Before that, Ori was a systems architect at the Israeli intelligence agency Unit 8200. Ori holds a Bachelor of Science degree in computer engineering and a master's degree in computer science, both from Tel-Aviv University.

If you’re a software developer, “technical debt” is probably a term you’re familiar with. Technical debt, in plain terms, is an accumulation of many small compromises that hinder your coding efforts. Sometimes you (or your manager) choose to tackle these challenges “next time” because of the urgency of the current release.

This is a cycle that continues for many organizations until a real breaking point or crisis occurs. If software teams decide to tackle the technical debt right away, these brave software engineers may find that the situation has become so complex that they don’t know where to start.

The difficult part is that decisions about technical debt must balance the short-term and long-term consequences of accumulating it, which means technical debt needs to be properly assessed and addressed when planning development cycles.

The real-world implications of this can be seen in a recent study of 250 senior IT professionals, in which 97% predicted their organization would fall back on app modernization projects, and in which "risk" was the primary concern of executives and architects alike. For architects, we can think of this as "technical risk" – the threat that making changes to one part of an application will have unpredictable and unwanted downstream effects elsewhere.

The science behind measuring technical debt

In their groundbreaking 2012 article "In Search of a Metric for Managing Architectural Technical Debt," authors Robert L. Nord, Ipek Ozkaya, Philippe Kruchten, and Marco Gonzalez-Rojas provide a metric for measuring technical debt based on dependencies between architectural elements. They use this metric to show how an organization should plan development cycles, taking into account the effect that accumulating technical debt will have on the total resources required for each subsequent release.

Although this article was published nearly 10 years ago, its relevance today is hard to overstate. Earlier this March, it received the "Most Influential Paper" award at the 19th IEEE International Conference on Software Architecture.

In this post, we show that measuring technical debt is essential not only when making decisions about a specific application, but also when prioritizing work across multiple applications, especially modernization work.

In addition, we will demonstrate a method that can be used not only to compare the performance of different design paths for a single application, but also to compare the technical debt levels of multiple applications at any point in their development lifecycle.

Accurately measure system-wide technical debt

In the IEEE article mentioned above, technical debt is calculated using a formula that depends mainly on the dependencies between architectural elements in a given application. It is worth noting that the article does not define what an architectural element is or how to identify such elements when approaching an application.

We took a similar approach and developed a method to measure the technical debt of an application based on its interclass dependency graph. The dependency graph is a directed graph G = ⟨V, E⟩, where V = {c1, c2, …} is the set of all classes in the application, and an edge e = ⟨c1, c2⟩ ∈ E exists between two vertices if class c1 depends on class c2 in the original code. We perform a multifaceted analysis on this graph to ultimately arrive at a score that describes the technical debt of the application. Here are some of the statistics we extract from the raw graph:

  1. Mean/median out-degree of the vertices in the graph.
  2. Top N out-degrees of nodes in the graph.
  3. Longest paths between classes.
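The statistics above can be sketched with standard graph tooling. The snippet below is a minimal illustration using networkx on a hypothetical five-class dependency graph (the class names and edges are invented for the example); the longest-path computation assumes an acyclic graph, since longest simple paths are not tractable in general graphs.

```python
import statistics
import networkx as nx

# Hypothetical class-dependency edges: <c1, c2> means c1 depends on c2.
edges = [
    ("OrderService", "OrderRepo"),
    ("OrderService", "PaymentClient"),
    ("OrderService", "Logger"),
    ("PaymentClient", "Logger"),
    ("OrderRepo", "Logger"),
]
G = nx.DiGraph(edges)

# 1. Mean/median out-degree of the vertices.
out_degrees = [d for _, d in G.out_degree()]
mean_out = statistics.mean(out_degrees)
median_out = statistics.median(out_degrees)

# 2. Top-N out-degrees (N=2 here) -- high out-degree flags god-class candidates.
top_n = sorted(G.out_degree(), key=lambda kv: kv[1], reverse=True)[:2]

# 3. Longest dependency path (well-defined here because this graph is acyclic).
longest = nx.dag_longest_path(G)

print(mean_out, median_out)  # 1.25 1.0
```

In a real codebase the edge list would come from static analysis of the classes rather than being written by hand.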

Using standard graph clustering algorithms, we can identify class communities in the graph and measure additional metrics about them, such as:

  1. Average out-degree of the identified communities.
  2. Longest paths between communities.
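The article does not name a specific clustering algorithm, so as one plausible sketch, the community-level metrics could be computed with networkx's greedy modularity clustering on a small invented graph of two clusters joined by a single dependency:

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Hypothetical dependency graph: two tight clusters, one cross dependency.
edges = [
    ("A1", "A2"), ("A2", "A3"), ("A1", "A3"),   # community A
    ("B1", "B2"), ("B2", "B3"), ("B1", "B3"),   # community B
    ("A3", "B1"),                               # cross-community dependency
]
G = nx.DiGraph(edges)

# Cluster on the undirected view; direction is used afterwards when
# counting dependencies that leave each community.
communities = greedy_modularity_communities(G.to_undirected())

def community_of(node):
    return next(i for i, c in enumerate(communities) if node in c)

# 1. Average out-degree of the communities: cross-community edges per community.
cross = [(u, v) for u, v in G.edges if community_of(u) != community_of(v)]
avg_out = len(cross) / len(communities)

# 2. Longest path between communities, on the community-level graph
# (acyclic in this toy example).
CG = nx.DiGraph((community_of(u), community_of(v)) for u, v in cross)
longest_between = nx.dag_longest_path(CG)
```

On this toy graph the clustering recovers the two triangles, giving an average community out-degree of 0.5 and a community-level path of length two.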

The hypothesis here is that by using these generic metrics on the dependency graphs, we can identify architectural issues that represent real technical debt in the original code base. Moreover, by analyzing the dependencies on these two levels – class and community – we give a broad interpretation of what an architectural element is in practice without trying to define it formally.

To test this method, we created a dataset of more than 50 applications from different domains – financial services, e-commerce, automotive and others – and extracted the aforementioned statistics. We used this dataset in two ways.

First, we correlated specific instances of high out-degrees and long paths with local issues in the code – for example, identifying god classes by their high out-degree. This proved effective and increased our confidence that the approach is valid for identifying local technical debt issues.

Second, we attempted to provide a high-level score that can be used not only to identify technical debt in a single application, but also to compare technical debt between applications and prioritize which applications should be addressed and how. To do that, we introduced three indexes:

  1. Complexity — represents the effort required to add new features to the software.
  2. Risk — represents the potential risk that adding new features has on the stability of existing ones.
  3. Total debt — represents the total amount of extra work required when trying to add new features.

From graph theory to actionable insights

We manually analyzed the applications in our dataset, drawing on the expert knowledge of the individual architects and developers responsible for product development, and scored the complexity, risk, and total debt of each application on a scale of 1 to 5, where a score of 1 represents little effort required and 5 represents high effort. We used these benchmarks to train a machine learning model that correlates the values of the extracted metrics with the indexes and normalizes them to a score from 0 to 100.
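The article does not say which model family was used, so the following is only a sketch of the idea: a small regressor (here a random forest from scikit-learn) trained on invented rows of graph metrics against invented expert 1–5 scores, with the prediction rescaled to 0–100.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical training data: one row per benchmarked application.
# Columns: [mean_out_degree, top_out_degree, longest_path, avg_community_out]
X = np.array([
    [1.2,  5,  4, 0.5],
    [3.4, 20,  9, 2.1],
    [2.0,  8,  6, 1.0],
    [4.1, 35, 12, 3.3],
    [1.5,  6,  5, 0.7],
])
# Invented expert-assigned complexity scores on the article's 1-5 scale.
y = np.array([1, 4, 2, 5, 1])

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

def complexity_index(metrics):
    """Predict on the 1-5 scale, then normalize to 0-100."""
    raw = model.predict([metrics])[0]
    return round((raw - 1) / 4 * 100)

# Score a new, unseen application's metrics.
score = complexity_index([3.0, 18, 8, 2.0])
assert 0 <= score <= 100
```

The same pattern would be repeated for the risk and total-debt indexes, each trained against its own expert benchmark column.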

This allows us to use this ML model to provide a score per index for every new application we encounter, allowing us to analyze and compare entire portfolios of applications against each other and against our pre-calculated benchmarks. The following chart shows an example of 21 applications demonstrating the relationship between the above statistics:

Source: vFunction, Inc. 2022

The total debt levels were then converted to currency units, which reflect the level of investment required to add new functionality to the system. For every $1 invested in application development and innovation, for example, how much goes specifically toward servicing architectural technical debt? This is intended to help organizations build a business case for handling and removing architectural technical debt from their applications.
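As a purely illustrative sketch of that "per dollar" framing – the actual calibration is not given in the article, and the 60% ceiling below is an invented parameter – a normalized total-debt score could be mapped linearly to a debt-servicing cost per development dollar:

```python
# Hypothetical conversion of a 0-100 total-debt score into a
# "debt cost per development dollar" figure. The max_overhead
# ceiling is illustrative, not the article's actual calibration.
def debt_per_dollar(total_debt_score, max_overhead=0.60):
    """Map a 0-100 total-debt score to dollars of debt servicing
    incurred per $1 invested in new features (linear sketch)."""
    return round(total_debt_score / 100 * max_overhead, 2)

# An application scoring 75/100 on total debt:
print(debt_per_dollar(75))  # 0.45 -> $0.45 of every $1 services debt
```

A figure like this gives executives a concrete budget line to weigh against the cost of a modernization project.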

We have shown a method to measure the technical debt of applications based on the dependencies between the classes. We have successfully used this method to both identify local issues causing technical debt and provide a global score that can be compared between applications. By adopting this method, organizations can successfully assess the technical debt in their software, which can lead to improved decision making around it.

The New Stack is a wholly owned subsidiary of Insight Partners, an investor in the following companies mentioned in this article: Unit.

Feature image via Pixabay.
