AI hype has mostly centered on massive, cloud-based generative AI services like OpenAI’s ChatGPT or Microsoft Copilot.
Yet many organizations may soon find it surprisingly feasible to run their own custom AI services on owned or rented hardware, experts told IT Brew. Other factors, such as European data regulations, may also ultimately affect where companies deploy their services.
Up to speed. Brandon Jung, VP of ecosystem and business development for AI assistant developer Tabnine, told IT Brew the most costly up-front components of AI are training data and deep learning. Once both are in place, the game switches to making inferences (the actual outputs of AI) cheaper and more efficient.
While broad-purpose AI tools like ChatGPT or Copilot are expensive to train and run, Jung said, “If I have my own data, and I can build a custom model—yes, it won’t solve a bunch of problems, but it’s going to be very efficient and cost-effective, and bring a whole lot more value to the problem area that it’s addressing.”
Moreover, Jung said open-source AI models like Llama 2 or Gemma that can be run on-premises are converging in performance with more popular proprietary AIs, and may soon be indistinguishable from the average user’s perspective.
David Linthicum, former chief cloud strategy officer at Deloitte turned independent expert, concurred that companies currently run AI in the cloud mostly out of convenience and a lack of alternatives. While most firms may end up running a hybrid model to balance cost against ease of use, he said, even generative AIs that are “piggies in terms of resources” may become viable to run on-premises given the right use case.
“Do the requirements processing, and the needs assessment, and the total cost of ownership in the cloud versus on premise, and see which one’s going to be cheaper to build,” Linthicum told IT Brew. “The ecosystem, you need the training, the cost, the egress fees, and data moving in and out of the cloud.”
A bevy of options. Tech giant Dell is bullish that on-prem is the future of enterprise AI, the company’s global CTO John Roese told IT Brew, though the company is agnostic as to which semiconductor architecture will run it. For example, IBM is building AI-focused application-specific integrated circuits, and PCs and laptops with dedicated AI chips could eventually handle some AI computation themselves.
Roese views large generative AIs like ChatGPT, Gemini, and Bard as essentially “search engine 2.0.” Then there’s AI features integrated into existing software, which Roese predicts will likely resemble a traditional hybrid environment where providers don’t directly access user data; and core enterprise use, where AI is the “primary engine” running on smaller, more focused training sets.
Demand on useful enterprise AI systems is driven by inferences per second rather than size of the model, Roese said. “You don’t want to be on the receiving end of having a public cloud environment that charges you per transaction, and then you build a wildly successful GPT to serve your customers.”
“It’s just more economically viable to run them on dedicated systems,” whether that’s a company-owned or colocated data center or an edge node, Roese said. “Our experience has been, in almost every scenario we’ve been able to fully model, the demand is significantly higher than we initially thought.”
For example, Roese said, Dell tested a prototype chatbot for a user base of around 1,000 senior systems engineers and found it would rack up 50 million transactions a month: “They didn’t really design the public cloud for an environment where every customer used every bit of performance all the time.”
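As a rough illustration of the economics Roese describes (using hypothetical prices that neither Dell nor any cloud provider quoted), a back-of-envelope sketch shows how per-call billing scales with inference volume while a dedicated system stays closer to a flat monthly cost:

```python
# Rough comparison of per-transaction cloud pricing vs. a flat-cost dedicated
# system. All prices are hypothetical assumptions for illustration only.

monthly_transactions = 50_000_000    # volume from Dell's prototype chatbot test
cloud_cost_per_call = 0.05           # hypothetical per-inference API charge, in dollars
dedicated_monthly_cost = 150_000     # hypothetical amortized hardware, power, and staff

cloud_monthly = monthly_transactions * cloud_cost_per_call
breakeven_calls = dedicated_monthly_cost / cloud_cost_per_call

print(f"Public cloud, pay per call:  ${cloud_monthly:,.0f}/month")
print(f"Dedicated system, flat cost: ${dedicated_monthly_cost:,.0f}/month")
print(f"Break-even volume: {breakeven_calls:,.0f} calls/month")
```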
Jung sees one other potential addition to the AI race: As the cost of training decreases, marketplaces for private enterprises could emerge. Enterprises could eventually run many different AIs, much as they’ve seen a proliferation of apps and SaaS.
“You’re going to see a multiplication, more and more models,” Jung said. “So, then you’ll start getting into model management and what goes to where, and how it’s controlled again. Now, if the world comes back to role-based access control…those are some areas we’ll go, they’re well understood.”