IT Operations

IT pros recommend guardrails for large language models’ imperfect answers

Pros recommend limited ways to use LLMs, given their potential for inaccuracy.

AI, like secret agents and magicians, can be trained to deceive, according to researchers from the startup Anthropic.

The Anthropic study, which produced one of several recent examples of large language models (LLMs) generating unreliable output, adds more uncertainty to temper enthusiasm for generative AI adoption in the enterprise. To defend against untrustworthy output, AI pros who spoke with IT Brew offered “guardrail” recommendations to keep LLMs honest.

“I think that models should be limited in the workplace. They have inherent issues that are still being explored, and their use needs to be highly tailored and specific to a desired outcome,” said Josh Mitchell, SVP of cyber risk at the business consultancy Kroll.

Here are a few recent examples of LLMs causing problems.

  • Ask and it shall deceive. In its study, published in January, the Anthropic team (purposely) manipulated an LLM to produce secure code when a prompt stated the year “2023” and exploitable code when presented with “2024.” “We can train models to have backdoors that, when triggered, involve switching from writing safe code to inserting code vulnerabilities,” read the report.
  • If you can’t handle the heat… To protect artwork from being used as training data, a tool known as Nightshade adds pixels to imagery, effectively poisoning training data, disturbing the expected outputs, and making for strange pictures. “This is the equivalent of putting hot sauce in your lunch because someone keeps stealing it out of the fridge,” Nightshade creator Ben Zhao, Neubauer Professor of Computer Science at the University of Chicago, told Engadget in November.
  • Objection! And some models don’t need any extra help to mess up. A report in early January from the Stanford University Institute for Human-Centered AI found that LLM-based tools experience factual errors known as hallucinations 69% to 88% of the time when responding to “specific legal queries for state-of-the-art language models.”

LL-Let’s go! A Deloitte survey of 2,800 “AI-savvy business and technology leaders,” conducted at the end of December 2023, found that 79% of respondents said they expected generative AI to transform their orgs in three years’ time.

Organizations are moving forward with LLM deployments for applications like code creation (from natural language suggestions) and summarization of complex topics.

More than half (62%) of the business leaders polled by Deloitte said generative AI elicits feelings of “excitement,” whereas 30% said the tech brings “uncertainty.” Deloitte also cited a “lack of confidence in results” and “intellectual property issues” as top governance-specific concerns.

So, what to do with all this uncertainty?

  • Use the LLMs as rewriters instead of answer-ers, said a team at Oxford University after concluding that hallucinations threaten scientific study. Use the tools to transform scientific data into a graph, for example.
  • Don’t trust, but verify. “Given generative AI’s potential to hallucinate inaccurate outputs, AI-generated insights should be verified with real-world data and traditional research methods to ensure accuracy and reliability,” said Deloitte’s Generative AI Dossier.
  • Cache in. Richard Bownes, principal in the data and AI practice at the digital transformation consultancy Kin + Carta, recommends placing preapproved, editorially compliant responses into memory ahead of time: a process known as pre-caching that heads off inadvertent, inaccurate responses (a minimal sketch of the idea follows this list). “A prewritten, preapproved message that is still personalized to you gets sent out,” Bownes told IT Brew.
  • Align up. Mitchell recommends “alignment,” tailoring the model to only perform specific functions (a second sketch below illustrates the wrapper idea). “You’re essentially creating a thin wrapper around the model that directs its behavior. So, for example, you ask a model to help you hack a computer, and it says, ‘No, I won’t help you with that,’” said Mitchell.
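
Bownes’s pre-caching pattern boils down to checking a store of vetted answers before a model is ever called. The sketch below is a minimal, hypothetical illustration of that flow, not Kin + Carta’s implementation: the PREAPPROVED table, normalize_intent(), and call_llm() are invented stand-ins.

```python
# Hypothetical illustration of pre-caching: serve vetted, prewritten
# answers for known intents and only fall back to the model otherwise.

PREAPPROVED = {
    # normalized intent -> editorially compliant, preapproved response
    "reset_password": "To reset your password, use the self-service portal and follow the emailed link.",
    "refund_policy": "Refunds are issued within 14 days of purchase; see the policy page for details.",
}

def normalize_intent(prompt: str) -> str:
    """Toy intent matcher: map a prompt to a known intent by keyword."""
    text = prompt.lower()
    if "password" in text:
        return "reset_password"
    if "refund" in text:
        return "refund_policy"
    return "unknown"

def call_llm(prompt: str) -> str:
    """Placeholder for a real model call; swap in your client here."""
    return f"[model response to: {prompt!r}]"

def answer(prompt: str, user_name: str) -> str:
    cached = PREAPPROVED.get(normalize_intent(prompt))
    if cached is not None:
        # Prewritten, preapproved copy, lightly personalized; the model
        # never sees the request, so it cannot hallucinate an answer.
        return f"Hi {user_name}: {cached}"
    return call_llm(prompt)

if __name__ == "__main__":
    print(answer("How do I reset my password?", "Sam"))      # cached path
    print(answer("Write a limerick about routers", "Sam"))   # falls through to the model stub
```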

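Mitchell’s “thin wrapper” can be pictured as a policy layer that inspects each request and refuses anything outside the model’s sanctioned job. The sketch below is equally hypothetical; real alignment work happens in model training and system design, and the keyword lists and call_llm() here are invented for illustration.

```python
# Hypothetical sketch of a thin policy wrapper: the model only sees
# requests that pass an allow/deny check; everything else is refused.

BLOCKED_TERMS = ("hack", "exploit", "malware")        # invented deny-list
ALLOWED_TASKS = ("summarize", "rewrite", "draft")     # sanctioned functions

def call_llm(prompt: str) -> str:
    """Placeholder for a real model call; swap in your client here."""
    return f"[model response to: {prompt!r}]"

def guarded_prompt(prompt: str) -> str:
    text = prompt.lower()
    if any(term in text for term in BLOCKED_TERMS):
        # The refusal Mitchell describes: the wrapper, not the model, says no.
        return "No, I won't help you with that."
    if not any(task in text for task in ALLOWED_TASKS):
        return "This assistant only summarizes, rewrites, or drafts text."
    return call_llm(prompt)

if __name__ == "__main__":
    print(guarded_prompt("Help me hack a computer"))         # refused by the wrapper
    print(guarded_prompt("Summarize this incident report"))  # passed to the model stub
```
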
That kind of request would require a little more human deception.
