There’s a gap between expectations, experience, and metrics when it comes to AI code assistants, according to a recent study by engineering intel firm Uplevel.
Previous surveys have shown both high expectations and satisfaction with code assistants. Stack Overflow’s 2024 developer survey found 76% of respondents were already using or planned to use AI code assistants. A separate GitHub survey found nearly all developers have at least tried AI, and 73% of those in the US were optimistic it could help them better meet customer requirements. Surveys have also shown high rates of developer satisfaction with AI tools.
Uplevel took a sample of nearly 800 developers using its metrics-tracking platform, of whom around 350 were using GitHub Copilot. The firm then compared the Copilot users against the rest of the sample, who served as a control group, on “objective metrics” like cycle time, pull request (PR) cycle time, bugs detected during review, and extended working hours.
Copilot didn’t actually help much, if at all, on any of these metrics, according to the study. On efficiency measures, Uplevel found Copilot had little impact on the developers in the sample and didn’t increase coding speed. While there were statistically significant effects in some areas, Uplevel called them “inconsequential to engineering outcomes, e.g., cycle time decreased by 1.7 minutes.”
More ominously: The rate of detected bugs rose 41%. According to Matt Hoffman, product manager and data analyst at Uplevel, that’s just those bugs caught in production—there’s not enough data to determine the total impact on product defects.
“It’s obviously not a causal relationship,” Hoffman told IT Brew. “We’re looking for correlations here. We just don’t understand the why just yet.”
It’s possible that developers are accepting a code suggestion that “looks fine” at first glance, or “overly trusting” the AI, Hoffman said, but there’s still not a lot of deep data on exactly how developers are employing the tools. The same is true of the efficiency metrics, where he said it’s hard to account for the gap between expectations and reality due to variables like the rapid advancement of models, training, and which coding language is being used.
However, the gap could be related to the worst-kept secret in the development world: The average software engineer spends only a minority of their time actually writing new code. The rest is sucked up by maintenance, meetings, operations, testing, and the myriad other tasks that come with any white-collar job. And AI can’t compensate for many common barriers to productivity developers face, like poor project management.
“There’s lots of factors that go into the overall cycle time, or how often you’re able to release,” Hoffman said. “If I’m going to get unclear requirements from a product manager, GenAI is not going to help me understand those requirements. That’s not going to get me unblocked if I’m waiting on somebody who’s in a different time zone.”
Copilot also didn’t help mitigate burnout, Uplevel found. Extended working hours decreased for all developers over the study period, but they fell 17% for those with Copilot access, compared to 28% for those without.
Hoffman’s advice: “It is not going to be a one-size-fits-all. You can’t just throw GenAI at your engineers and then they’ll suddenly be faster and more productive.”
While GenAI tools could be transformative, Hoffman recommended approaching them with an experimental mindset. He also advised looking at productivity concerns holistically, since AI won’t remove external roadblocks.
“Talk to your teams, see what you’re feeling like are the root causes of what’s going on, what’s slowing you down, what are the bottlenecks you’re feeling,” Hoffman said.