IT Operations

CrowdStrike overview of IT outage causes raises concerns about testing

Bug in update “could not be gracefully handled,” CrowdStrike admits.

Credit: Anadolu/Getty Images


Top insights for IT pros

From cybersecurity and big data to cloud computing, IT Brew covers the latest trends shaping business tech in our 4x weekly newsletter, virtual events with industry experts, and digital guides.

Days after the IT outage shut down systems around the world, affecting banks, airlines, and other organizations, CrowdStrike—the company whose update was behind the disruption—has issued guidance explaining what happened.

In guidance aimed at Windows users of the company’s Falcon defense platform, CrowdStrike detailed how its Rapid Response Content tool pushes updates to users multiple times a day. On July 19, a bad update went out to Channel File 291.

“Due to a bug in the Content Validator, one of the two Template Instances passed validation despite containing problematic content data,” CrowdStrike noted, adding that it “resulted in an out-of-bounds memory read triggering an exception. This unexpected exception could not be gracefully handled, resulting in a Windows operating system crash.”
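The failure CrowdStrike describes has two parts: a validator that passed bad content, and reading code that could not recover once it hit that content. The Python sketch below illustrates the general idea only and is not CrowdStrike's code. The names `TemplateInstance`, `validate`, `read_field`, `apply_content`, and the field count of 4 are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical: how many fields the reading code expects per template instance.
EXPECTED_FIELD_COUNT = 4


@dataclass
class TemplateInstance:
    name: str
    fields: List[str]


def validate(instance: TemplateInstance) -> bool:
    """Pass only content with exactly the number of fields the reader will access."""
    return len(instance.fields) == EXPECTED_FIELD_COUNT


def read_field(instance: TemplateInstance, index: int) -> Optional[str]:
    """Bounds-checked read: return None rather than reading past the end."""
    if 0 <= index < len(instance.fields):
        return instance.fields[index]
    return None


def apply_content(instance: TemplateInstance) -> str:
    """Apply an update, skipping it instead of crashing when the data is malformed."""
    if not validate(instance):
        return "rejected"
    for index in range(EXPECTED_FIELD_COUNT):
        if read_field(instance, index) is None:
            return "skipped"  # second line of defense if validation has a bug
    return "applied"
```

The point of the second check is that the reader does not rely on the validator being correct: even if malformed content slips through, the read fails safely instead of taking the system down.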

Root causes. Thomas Haver, a software developer and technologist, told IT Brew the problems for CrowdStrike likely go deeper than just getting people back online. To win customers, CrowdStrike would have needed to prove it had controls in place governing how it issues updates.

“Companies, when they’re doing their due diligence to enter into an agreement with them for what’s called a master services agreement, would request this documentation be provided to their legal and risk team to confirm that it’s safe and okay to proceed with them,” Haver said. “They almost certainly have these procedures in place, and these procedures were just not followed.”

CrowdStrike promised to make future updates safer. The company listed a number of areas of focus, including software resiliency and testing, Rapid Response Content validation checks, and third-party validation reviews.

Talal Haj Bakry, a Mysk security researcher, told Forbes the improvements were overdue.

“The crash would have been detected early in the first rollout stages and the number of impacted computers would have been significantly limited,” Bakry said.
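Bakry's reference to early rollout stages points to a staged deployment, in which an update reaches a small group of machines first and only continues if that group stays healthy. The Python sketch below shows one way that logic could look. It is an illustration under stated assumptions, not CrowdStrike's process: `staged_rollout`, the stage percentages, and the failure threshold are all hypothetical.

```python
from typing import Callable, Dict, Sequence


def staged_rollout(
    hosts: Sequence[str],
    apply_update: Callable[[str], bool],
    stage_percents: Sequence[int] = (1, 10, 100),  # hypothetical stage sizes
    max_failure_rate: float = 0.05,  # hypothetical halt threshold
) -> Dict[str, object]:
    """Push an update in growing stages, halting if a stage fails too often."""
    attempted = 0
    failures = 0
    for percent in stage_percents:
        # Cumulative number of hosts that should be covered after this stage.
        target = (len(hosts) * percent + 99) // 100
        batch = hosts[attempted:target]
        if not batch:
            continue
        batch_failures = sum(1 for host in batch if not apply_update(host))
        attempted += len(batch)
        failures += batch_failures
        if batch_failures / len(batch) > max_failure_rate:
            return {"halted": True, "attempted": attempted, "failures": failures}
    return {"halted": False, "attempted": attempted, "failures": failures}
```

With 1,000 hosts and an update that crashes every machine, this sketch stops after the first 10 hosts rather than reaching all 1,000, which is the kind of limited impact Bakry describes.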
