
AI is adding to, not lifting, burden for SRE professionals: report

“Everybody wants to jump on the bandwagon of automation and AI, and therefore there is more going back and fixing things,” one expert tells IT Brew.


Amelia Kinsinger


Top insights for IT pros

From cybersecurity and big data to cloud computing, IT Brew covers the latest trends shaping business tech in our 4x weekly newsletter, virtual events with industry experts, and digital guides.

AI is helping site reliability engineers do their jobs, but the realities of working in the field continue to add to their frustration.

A new site reliability engineering (SRE) report from Catchpoint, based on feedback from more than 300 professionals in the field, found that although AI adoption across the industry is increasing, so is the amount of “toil,” the manual, repetitive operational work that SRE teams aim to automate away. While the finding seems somewhat paradoxical, it makes sense, Catchpoint CEO Mehdi Daoudi told IT Brew.

“The resources are really spread very thin,” Daoudi said. “Then everybody wants to jump on the bandwagon of automation and AI, and therefore there is more going back and fixing things and dealing with systems that are a little bit outdated.”

Old tech, new tech. Outdated infrastructure, the kind that’s 20 or 30 years old, is a problem for SREs, especially in the banking industry, Daoudi said. Adding AI is often expected to solve these problems, but the issues tend to persist because change happens gradually, not overnight.

It’s a familiar story. Staff gets new tech, management urges implementation, and existing issues remain unfixed. In SRE, as in other tech sectors, the time freed up by using AI to automate processes can instead end up filled with “toilsome tasks,” the report said, though it stressed that this was only a hypothesis. Laura de Vesine, a senior staff engineer at Datadog, said in a comment for the report that “manual supervision of AI systems…can easily raise the operational load of a team for both day to day work and incidents.”

“AI systems are themselves a new source of operations we as an industry have yet to master: Maintaining and updating models and running massive GPU clusters are both new problems for most teams,” de Vesine commented. “For teams not running those AI systems, AI proponents are keen to tell us that its rollout will reduce toil, but the evidence may suggest that AI is actually a source of increased toil.”

What’s next? For Daoudi, the relationship between AI and SRE is likely to remain a push and pull. The fear is that rather than find practical use cases for the technology, SRE teams and management will focus on quick-fix projects “rather than actually empowering the SREs to do their job and to bring automation.” Daoudi grimly predicted that it might take a catastrophe to change the way business is done.

“Unfortunately, we’re going to have to wait for bad events for people to change,” Daoudi said.
