I’ve spent my career swimming in data — as former Chief Data Officer at Kaiser Permanente, UnitedHealthcare, and Optum — and at one point, I had oversight of almost 70% of all of America’s healthcare claims. So when I tell you the problem with enterprise AI isn’t the model architecture but the data the models are being fed, believe me: I’ve seen it firsthand.
LLMs are already peaking
The cracks are already showing in LLMs. Take GPT-5. Its launch was plagued with complaints: it failed basic math, missed context that earlier versions handled with ease, and left paying customers calling it “bland” and “generic.” OpenAI even had to restore an older model after users rejected its colder, checklist-driven tone. After two years of delays, many started asking whether OpenAI had lost its edge — or whether the entire LLM approach was simply hitting a wall.
Meta’s LLaMA 4 tells a similar story. In long-context tests — the kind of work enterprises actually need — Maverick showed no improvement over LLaMA 3, and Scout performed “downright atrociously.” Meta claimed these models could handle millions of tokens; in reality, they struggled with just 128,000. Meanwhile, Google’s Gemini sailed past 90% accuracy at the same scale.
The data problem nobody wants to admit
Instead of confronting the limits we’re already seeing with LLMs, the industry keeps scaling up — pouring more compute and electricity into these models. And yet, despite all that power, the results aren’t getting any smarter.
The reason is simple: the web data these models are built on has already been scraped, cleaned, and trained on over and over to death. That’s why new releases feel flat — there’s little new to learn. Each cycle just recycles the same patterns back into the model. They’ve already eaten the internet. Now they’re starving on themselves.
Meanwhile, the real gold mine of intelligence — private enterprise data — sits locked away. LLMs aren’t failing for lack of data — they’re failing because they don’t use the right data. Think about what’s needed in healthcare: claims, medical records, clinical notes, billing, invoices, prior authorization requests, call center transcripts — the information that actually reflects how businesses and industries are run.
Until models can train on that kind of data, they’ll always run out of gas. You can stack parameters, add GPUs, and pour electricity into bigger and bigger models, but it won’t make them smarter.
Small language models are the future
The way forward isn’t bigger models. It’s smaller, smarter ones. Small Language Models (SLMs) are designed to do what LLMs can’t: learn from enterprise data and handle specific problems.
Here’s why they work.
First, they’re efficient. SLMs have fewer parameters, which means lower compute costs and faster response times. You don’t need a data center full of GPUs just to get them running.
Second, they’re domain-specific. Instead of trying to answer every question on the internet, they’re trained to do one thing well — like HCC risk coding, prior authorizations, or medical coding. That’s why they deliver accuracy in places where generic LLMs stumble. (A brief sketch of what that kind of narrow training can look like follows this list.)
Third, they fit enterprise workflows. They don’t sit on the outside as a shiny demo. They integrate with the data that actually drives your business — billing data, invoices, claims, clinical notes — and they do it with governance and compliance in mind.
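To make the domain-specific point concrete, here is a minimal sketch of what that narrow training could look like: a small open model adapted to map clinical-note snippets to billing codes. The model name ("microsoft/phi-2"), the two synthetic examples, and the bare-bones training loop are illustrative assumptions, not a production recipe; real work would use de-identified enterprise records, proper evaluation, and governance controls.

```python
# Minimal sketch (illustrative only): adapt a small open model to one narrow
# healthcare task -- mapping a clinical-note snippet to a billing code.
# The model name, the two synthetic examples, and the single pass over them
# are placeholders, not a production training recipe.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "microsoft/phi-2"  # assumption: any small (~1-3B) causal LM

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# Synthetic stand-ins for private enterprise records (claims, notes, codes).
examples = [
    "Note: Type 2 diabetes with neuropathy. -> Code: E11.40",
    "Note: Essential hypertension, stable on medication. -> Code: I10",
]

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
for text in examples:
    batch = tokenizer(text, return_tensors="pt")
    # Standard causal-LM objective: learn to predict each next token,
    # including the target code at the end of the example.
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

The specifics matter less than the point: a model this small can be trained and run entirely inside an enterprise’s own environment, on the records it already owns.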
The future isn’t bigger — it’s smaller
I’ve seen this movie before: massive investments, endless hype, and then the realization that scale alone doesn’t solve the problem.
The way forward is to fix the data problem and build smaller, smarter models that learn from the information enterprises already own. That’s how you make AI useful — not by chasing size for its own sake. And I’m not the only one saying it. Even NVIDIA’s own researchers now say the future of agentic AI belongs to small language models.
The industry can keep throwing GPUs at ever-larger models, or it can build better ones that actually work. The choice is obvious.
Photo: J Studios, Getty Images
Fawad Butt is the co-founder and CEO of Penguin Ai. He previously served as the Chief Data Officer at Kaiser Permanente, UnitedHealthcare Group, and Optum, leading the industry’s largest team of data and analytics experts and managing a multi-hundred-million dollar P&L.
This post appears through the MedCity Influencers program. Anyone can publish their perspective on business and innovation in healthcare on MedCity News through MedCity Influencers. Click here to find out how.


