Who needs 50 billion parameters in a large language model (LLM) when 3 billion or so will do just fine?
Market intelligence firm Gartner sees security benefits in “small language models”: machine learning models with fewer than 10 billion parameters, the internal variables a model tunes during training.
“You don’t need your language model to write a poem about cats and dogs eating spaghetti under a bridge. You need it to answer an HR-related question,” Birgi Tamersoy, Gartner’s senior director analyst for AI technologies, said in a live presentation on September 12.
Known, domain-specific data—that HR-related info, for example—can be embedded into a small language model to solve a specific task, Tamersoy said.
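In practice, that grounding can be as lightweight as handing the known data to a locally run small model at question time. Here’s a minimal sketch, assuming the Hugging Face transformers library and Microsoft’s Phi-3 mini checkpoint; the HR policy text, question, and prompt format are illustrative:

```python
# A sketch of grounding a small model in known, domain-specific HR data.
# The model name is real, but the policy text, question, and prompt
# format are hypothetical stand-ins.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="microsoft/Phi-3-mini-4k-instruct",  # 3.8B parameters, runs locally
)

hr_policy = "Employees accrue 1.5 vacation days per month, capped at 30 days."
question = "How many vacation days do I earn each month?"

prompt = (
    "Answer using only the HR policy below.\n"
    f"Policy: {hr_policy}\n"
    f"Question: {question}\n"
    "Answer:"
)

print(generator(prompt, max_new_tokens=60)[0]["generated_text"])
```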
A July 2024 survey from another market intel firm, IDC, found that 20% of IT pro respondents said they “don’t expect to use small models”; 25% have deployed them; and 26% characterize their current use of small models as “learning.” (A further 17% said “evaluating,” and 13% said “testing.”)
The specialized small-model option offers potential security benefits when the lighter computational workloads can stay close to home, according to Tamersoy.
“I think the primary advantage is true local hosting, because the moment you have this thing in house, on premises, or on a private cloud, then nothing, no information, leaves the organization technically,” Tamersoy said during the webinar.
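Keeping everything in house is straightforward once the weights sit on local disk. A minimal sketch, again assuming the transformers library; the model directory is a hypothetical on-prem path, and the offline flag simply blocks any calls back to the Hugging Face Hub:

```python
# A sketch of fully offline hosting: after the weights are downloaded,
# nothing leaves the machine. The model directory path is hypothetical.
import os

os.environ["HF_HUB_OFFLINE"] = "1"  # refuse all Hugging Face Hub network calls

from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = "/srv/models/phi-3-mini"  # hypothetical local path to the weights
tokenizer = AutoTokenizer.from_pretrained(model_dir, local_files_only=True)
model = AutoModelForCausalLM.from_pretrained(model_dir, local_files_only=True)

inputs = tokenizer("Summarize our PTO policy in one sentence:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=60)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```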
What counts as “small”? The Gartner pros defined small language models as those containing fewer than 10 billion parameters. For comparison, OpenAI’s GPT-3 contains 175 billion parameters.
Small examples cited by the Gartner team included Meta’s Llama 3, available in an 8-billion-parameter version; Microsoft’s Phi-3 mini, at 3.8 billion parameters; and Google’s Gemma 2, offered in 2-billion- and 9-billion-parameter options.
Options like the 2.7-billion-parameter BioMedLM, developed by the Stanford University Center for Research on Foundation Models (CRFM), offer a security benefit, according to Tamersoy: you know the dataset’s source. The model pulls from PubMed’s free database of abstracts and full articles.
“With smaller models, like the BioMedLM, you can actually have full transparency on what they were trained on, so you don’t [have] more risks in terms of, what did go into this model,” Tamersoy said on the live call.
Data remember: The chatbots, by the way, occasionally hallucinate, creating data-quality, if not exactly security, concerns.
Bhavani Vangala, VP of engineering at software company Onymos, guides the company’s AI integrations, sometimes recommending smaller models and narrower datasets to ensure high-integrity data, without all the cats-and-dogs poetry. The data should be validated by subject matter experts early on, Vangala told IT Brew.
A MathBot from Babson College, for example, draws on about 10 open-source textbooks to answer business undergrads’ questions.
“It should be like a textbook’s content: Verifying that it’s all good, and then use that data set for creating this model,” Vangala said.
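A vetted corpus like that typically gets searched at question time, with the best-matching passage handed to the model as context. A minimal retrieval sketch, assuming the sentence-transformers library; the textbook passages and query here are illustrative stand-ins:

```python
# A retrieval sketch over a vetted textbook corpus. The passages and
# query are illustrative; in practice the corpus would be the
# expert-validated excerpts from the ~10 open-source textbooks.
from sentence_transformers import SentenceTransformer, util

passages = [
    "Net present value discounts future cash flows to today's dollars.",
    "The break-even point is where total revenue equals total cost.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # small model, runs locally
corpus_emb = embedder.encode(passages, convert_to_tensor=True)

query = "When does a business break even?"
query_emb = embedder.encode(query, convert_to_tensor=True)

best = util.cos_sim(query_emb, corpus_emb).argmax().item()
print(passages[best])  # the passage handed to the language model as context
```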