
There are a lot of big numbers thrown around about the impact of generative AI on the software development lifecycle (SDLC). Studies show that it can double developer productivity, increase the volume of code produced by two-thirds, or halve the time required to ship a feature.
In our whitepaper, our headline figure was more conservative. The increase we see in end-to-end productivity across the SDLC is between 10% and 20%.
Why is there such a large discrepancy between these numbers, and why is this so confusing? I don’t have any quibbles with the methodologies of these studies. Instead, we’re seeing two of my favourite statistical pitfalls when trying to scientifically measure complex systems.
When tasked with measuring performance, our natural instinct is to look at whatever data is closest at hand. In statistics, this observational bias is known as the Streetlight Effect.

This is perfectly illustrated by a classic parable:
A policeman sees a drunk man searching for something under a streetlight and asks what the drunk has lost. He says he lost his keys and they both look under the streetlight together. After a few minutes the policeman asks if he is sure he lost them here, and the drunk replies, no, and that he lost them in the park. The policeman asks why he is searching here, and the drunk replies, “this is where the light is”.
This is exactly what happens when we evaluate GenAI in software development. The most explosive numbers dominating the headlines – lines of code produced, tokens consumed, tests created, or hours saved on isolated tasks – are simply the easiest data points to focus on. They are the numbers we pore over first.
But convenience doesn’t equal relevance.
These accessible metrics fail to illuminate what actually matters. They tell us nothing about code quality, whether the architecture is maintainable, or whether the software actually solves a valuable business problem.
At best, these surface-level metrics confirm that activity is happening. At worst, they become dangerous vanity metrics used to guide major business decisions.
Recommendation: Look at a wider suite of metrics and think about what each of them tells you, even if some of them are harder to get hold of or are less precise measurements.

To understand a complex system like the SDLC, it helps to frame it using Amdahl’s Law. Gene Amdahl formulated it in the 1960s to reason about how much parallel computing could speed up a computational task. The law can be roughly stated as:
The overall performance improvement gained by optimising a single part of a system is limited by the fraction of time that the improved part is actually used.
If you optimise a step within the process that isn’t a major time-sink, your overall timeline won’t budge. Piling effort into optimising code generation yields negligible end-to-end gains if the real bottleneck lies elsewhere in the chain.
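To put rough numbers on this, here is a minimal sketch of Amdahl’s Law in Python. The 25% share of time spent writing code and the 2x local speedup are assumptions I have picked for illustration, not measurements from any study or from our whitepaper, and the function name is my own.

```python
def overall_speedup(fraction_improved: float, local_speedup: float) -> float:
    """Amdahl's Law: end-to-end speedup when only part of the work gets faster.

    fraction_improved: share of total time spent in the part being optimised (0 to 1).
    local_speedup: how many times faster that part becomes.
    """
    if not 0.0 <= fraction_improved <= 1.0:
        raise ValueError("fraction_improved must be between 0 and 1")
    if local_speedup <= 0:
        raise ValueError("local_speedup must be positive")
    return 1.0 / ((1.0 - fraction_improved) + fraction_improved / local_speedup)


# Assumption for illustration: writing code is 25% of the end-to-end time,
# and GenAI makes that part twice as fast.
coding_share = 0.25
speedup = overall_speedup(coding_share, 2.0)
time_saved = 1.0 - 1.0 / speedup

print(f"Overall speedup: {speedup:.2f}x, end-to-end time saved: {time_saved:.1%}")
# prints "Overall speedup: 1.14x, end-to-end time saved: 12.5%"
```

Under these assumed inputs, doubling the speed of the coding step saves about an eighth of the total time. Even an infinitely fast coding step could never save more than the 25% it occupied in the first place.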
This aligns with a foundational concept taught to everyone at Skyscanner (one of my previous employers), where reading The Goal was set as homework. While the book’s narrative style has aged a bit, its core focus, the theory of constraints, remains vital. The theory is all about identifying the bottlenecks in your process, because they constrain the system’s maximum throughput.
To make the entire system faster, you have to address that bottleneck by disrupting the end-to-end process.
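A small sketch can make the constraint idea concrete. It assumes a simple serial pipeline where each stage has a fixed weekly capacity; the stage names and numbers are hypothetical, and real delivery pipelines are messier than this.

```python
# Hypothetical weekly capacity of each stage, in work items per week.
stage_capacity = {
    "refine requirements": 6,
    "write code": 10,
    "code review": 8,
    "test and release": 7,
}


def throughput(capacities: dict[str, float]) -> float:
    """A serial pipeline can only deliver as fast as its slowest stage."""
    return min(capacities.values())


def bottleneck(capacities: dict[str, float]) -> str:
    """Name of the stage with the lowest capacity, i.e. the constraint."""
    return min(capacities, key=capacities.get)


# Doubling coding capacity does nothing, because coding is not the constraint.
faster_coding = {**stage_capacity, "write code": 20}

# Raising capacity at the constraint lifts throughput and moves the bottleneck.
faster_refinement = {**stage_capacity, "refine requirements": 9}

print(bottleneck(stage_capacity), throughput(stage_capacity))
print(bottleneck(faster_coding), throughput(faster_coding))
print(bottleneck(faster_refinement), throughput(faster_refinement))
```

In this toy model, throughput stays at 6 items per week however fast the coding stage gets. Relieving the requirements stage raises it to 7, at which point testing and release becomes the next constraint to look at.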
Ignoring these constraints compounds the Streetlight Effect. As real SDLC bottlenecks are notoriously difficult to see, we often avoid measuring them altogether. Framed by these questions:
If your answers involve endless meetings where people talk at each other, or piles of unread documentation, then that’s where your bottleneck is.
Generative AI can be a boon in this space as well, simplifying how information flows around an organisation. Don’t let the fact that these systemic improvements are harder to measure deter you from pursuing them.
Look at the end-to-end process, as far back as you can and as far forward as you can, from idea to production. Get an understanding of where the bottlenecks are in the process, and put your effort into disrupting those parts.

By using value stream mapping to visualise your entire pipeline, you can locate exactly where throughput is restricted and focus your energy on disrupting those friction points. Crucially, that disruption may or may not use AI.
This is exactly how we approach engineering efficiency at Waracle. Rather than deploying GenAI tools in isolation, we use value stream mapping to diagnose where an organisation’s development pipeline is blocked. This practical assessment ensures that optimisation efforts are targeted where they will actually improve end-to-end delivery.
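As a rough illustration of what a value stream map lets you calculate, here is a minimal sketch. The stages, the split between active and waiting time, and every number in it are hypothetical; this is not the assessment we run with clients, only the basic arithmetic behind the idea.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    active_days: float   # time someone is actually working on the item
    waiting_days: float  # time the item sits in a queue around this stage


# Hypothetical value stream for a single feature, from idea to production.
value_stream = [
    Stage("idea and prioritisation", 1.0, 10.0),
    Stage("requirements", 2.0, 5.0),
    Stage("development", 5.0, 2.0),
    Stage("code review", 0.5, 3.0),
    Stage("testing", 2.0, 4.0),
    Stage("release", 0.5, 5.0),
]


def lead_time(stages: list[Stage]) -> float:
    """Total elapsed days from the first stage to the last."""
    return sum(s.active_days + s.waiting_days for s in stages)


def flow_efficiency(stages: list[Stage]) -> float:
    """Share of the lead time in which work is actually being done."""
    return sum(s.active_days for s in stages) / lead_time(stages)


def longest_wait(stages: list[Stage]) -> Stage:
    """The stage where items spend the most time queued."""
    return max(stages, key=lambda s: s.waiting_days)


print(f"Lead time: {lead_time(value_stream):.1f} days")
print(f"Flow efficiency: {flow_efficiency(value_stream):.1%}")
print(f"Longest wait: {longest_wait(value_stream).name}")
```

With these made-up figures, active development is 5 of 40 days, while the largest single delay is the wait before an idea is prioritised. That is the kind of finding that points effort away from code generation and towards a different part of the pipeline.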
So, if the huge numbers bandied around are often due to statistical misunderstanding, and the actual figure is likely somewhere between 10% and 20%, isn’t that a bit of a disappointment?
I don’t think so.
This is a fairly frugal figure in many respects, and one that I feel comfortable basing actual budget decisions on. It is a 10-20% increase in the entirety of what you do that relates to software development. If you had a magic button that gave everyone in your technology functions up to an extra day per week, you would press it.
But there’s a catch. That 10-20% is neither a silver bullet nor a given. You do not unlock it simply by purchasing licences, handing them out, and waiting for the magic to happen. To actually realise these gains, you have to do the hard work of critically examining both the data you are collecting and the process you are introducing AI into. If you inject GenAI into a broken, unmapped workflow, you will simply generate technical debt or misaligned features at a faster rate. AI cannot fix a broken system.
Achieving that meaningful 10-20% lift requires moving past convenient vanity metrics and looking deeply at how value actually moves through your pipeline. Only by auditing your current data collection practices and optimising your true process constraints can you ensure that AI is being applied where it will actually make a positive impact.



