Over 40 health systems are using allegedly error-prone OpenAI transcription tool

Tens of thousands of clinicians across the US use a transcription tool experts say isn’t reliable.

Image: Moor Studio/Getty Images

November 8, 2024

Hospitals and other healthcare settings are relying on an OpenAI model named Whisper to automate transcriptions of recordings, despite a demonstrable risk the resulting text might contain errors or outright fabrications, the Associated Press reported.

OpenAI warns in its guidelines that Whisper should only be used “with caution in high-risk domains,” but that hasn’t stalled widespread adoption in the medical community. An AP investigation found some 30,000 clinicians and 40 health systems across the US are using a tool from Nabla, itself powered by Whisper, to transcribe notes.

The AP reported that interviews with “more than a dozen software engineers, developers, and academic researchers” showed Whisper regularly spits out error-laden text, even when given “well-recorded, short audio samples.” One University of Michigan researcher told the AP he had found hallucinations (incorrect, misleading, or made-up outputs resulting from flaws in an AI model or its training data) in transcriptions of around 80% of town-hall recordings.

Others reported similarly high error rates, with one machine-learning engineer saying he found errors in about half of Whisper’s transcriptions of 100 hours of audio and another telling the AP errors were almost universal in an analysis of 26,000 Whisper transcripts.

AI is a hot topic in the medical community; hospital administrators eager to cut costs are integrating AI tools into all kinds of processes, with minimal regulatory oversight. In some cases, the tools are bespoke, but the widespread use of Whisper reflects interest in off-the-shelf commercial products. National Nurses United (NNU) and its affiliates have sounded the alarm on what they view as unregulated experimentation on patients.

Troublingly, among nurses who responded to an NNU survey on AI and whose employers use automated hand-offs, some 48% said medical reports often didn’t reflect their professional assessments or didn’t include relevant information. Around four in ten reported they were unable to override AI-generated reports and/or disregard their diagnoses.

“The most harmful thing we’re seeing is the way it’s being used to redesign care delivery and usurp the skill of decision-makers,” NNU Assistant Director of Nursing Practice Michelle Mahon told IT Brew. Mahon also warned of the displacement of medical judgment via AI developers’ “claim of intelligence.”

Alondra Nelson, who led the White House Office of Science and Technology Policy until 2023, told the AP mistakes in transcriptions could result in “really grave consequences” for patients. Nelson added there should be a “higher bar.”
