Wake up, human developers!
Google reported that its large language model (LLM) dubbed “Big Sleep” found a previously unknown software vulnerability in open-source database engine SQLite. The company considers the agent’s discovery a first, and an important demonstration of AI’s role in testing.
“Finding vulnerabilities in software before it’s even released means that there’s no scope for attackers to compete: The vulnerabilities are fixed before attackers even have a chance to use them,” Big Sleep team members wrote in a Nov. 1 blog post.
Jog my memory. The Cybersecurity and Infrastructure Security Agency (CISA) has urged developers to avoid memory-safety vulnerabilities, which affect “how memory can be accessed, written, allocated, or deallocated in unintended ways in programming languages.”
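In C, one of those “unintended ways” looks something like the hypothetical snippet below, a use-after-free, where memory is deallocated and then touched anyway. The code is invented purely for illustration:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical use-after-free: memory is deallocated, then read anyway.
 * Tools like AddressSanitizer flag this at runtime; in production code
 * it can corrupt memory or leak data. */
int main(void) {
    char *name = malloc(16);
    if (name == NULL) return 1;
    strcpy(name, "sqlite-demo");
    free(name);            /* memory is deallocated here...              */
    printf("%s\n", name);  /* ...but still accessed: undefined behavior  */
    return 0;
}
```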
US defense agency DARPA has led efforts like TRACTOR to replace legacy C code, a language CISA considers memory-unsafe. At the agency’s AIxCC event in August 2024, Team Atlanta researchers used their LLM-based tool to find a “null pointer dereference bug” in SQLite.
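For readers who want the flavor of that bug class, here is a generic sketch of a null pointer dereference in C. It is purely illustrative, not the code Team Atlanta actually flagged:

```c
#include <stdio.h>
#include <stdlib.h>

/* Generic illustration of a null pointer dereference (not the actual
 * SQLite bug): a failed allocation is never checked, so the program
 * can end up writing through a NULL pointer and crashing. */
typedef struct {
    int id;
} record_t;

record_t *make_record(void) {
    /* If malloc fails, this returns NULL and the caller never notices. */
    return malloc(sizeof(record_t));
}

int main(void) {
    record_t *r = make_record();
    r->id = 42;            /* crashes here if r is NULL: the dereference bug */
    printf("%d\n", r->id);
    free(r);
    return 0;
}
```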
“This kind of issue can crash a program, but it’s not usually something attackers can exploit to take control of a system,” Mark Brand, a Big Sleep pro and Project Zero researcher, shared with IT Brew in a statement on behalf of the team. “We thought it would be interesting to see if our tool could discover a more serious issue.”
Training Day. Brand and his team trained their AI agent to explore ways of running a program that trigger memory-safety weaknesses, he shared, such as buffer overflows, in which data is written beyond its designated space, potentially allowing code execution. Google’s team discovered an “underflow” in SQLite, in which data is read or written before the start of its buffer, which can crash programs and open the door to exploits.
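To make the distinction concrete, here is a hypothetical C sketch of both patterns (not the SQLite code Big Sleep analyzed): an overflow writes past the end of a buffer, while an underflow reads or writes before its start, for instance via a negative index:

```c
#include <string.h>

/* Hypothetical sketch of the two bug classes discussed above
 * (not the actual SQLite code Big Sleep analyzed). */
void overflow_example(const char *input) {
    char buf[8];
    /* Buffer overflow: if input is longer than 7 bytes, strcpy writes
     * past the end of buf, trampling adjacent stack memory. */
    strcpy(buf, input);
}

void underflow_example(int idx) {
    int table[4] = {0, 1, 2, 3};
    /* Buffer underflow: if idx is negative (say, -1), the write lands
     * before the start of table, corrupting whatever sits below it. */
    table[idx] = 99;
}
```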
Dustin Childs, head of threat awareness at Trend Micro Zero Day Initiative, finds Google’s demo “impressive,” while noting one important advantage for the LLM: It scanned an open-source tool, which means it had access to the full source code.
“You can’t do that with a closed source or a commercial product, like, say, Windows or Adobe Reader, or something along those lines. So, that, to me, is the biggest caveat: You have to have that full source code to do what they did,” Childs told IT Brew.
For that reason, Childs sees LLMs as an important tool for the prerelease phase of development. He also sees models supplementing other testing measures like fuzzing and manual code review.
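For a sense of what the fuzzing side looks like, a minimal libFuzzer-style harness is sketched below; the parse_config function is a hypothetical stand-in for whatever code is actually under test:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for real code under test (a parser, an API
 * entry point); in practice this would be the project's own function. */
static int parse_config(const uint8_t *data, size_t len) {
    /* Trivial placeholder logic so the harness compiles and runs. */
    return (len > 0 && data[0] == '[') ? 1 : 0;
}

/* Minimal libFuzzer-style harness: build with
 *   clang -fsanitize=fuzzer,address harness.c
 * The fuzzer repeatedly calls this entry point with mutated inputs,
 * watching for crashes and sanitizer reports. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    parse_config(data, size);
    return 0;  /* libFuzzer expects 0 from this callback */
}
```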
“The human effort is going to be more in writing the secure code than auditing the insecure code,” Childs said.
Almost three-quarters (74%) of polled developers plan to upskill in AI-assisted coding, according to an Oct. 2023 study from Pluralsight. (Gartner, too, sees developers as early enterprise adopters of AI.)
Google’s researchers said the team wants to scale the AI effort to more complex software like operating systems and web browsers, and the agents will need to root out more than just a single underflow error, a flaw Childs considers “a relatively simple bug” to find.
“When the vulnerability becomes a very small needle in a giant haystack, there are still many open-ended research questions left to investigate,” wrote Brand.