The LLM In The Room: ChatGPT Speaks


With the long-awaited launch yesterday of the mythical GPT-5, and the widespread disappointment that it is not only not the superintelligence we were promised but barely an improvement on previous ‘frontier’ models, I wanted to revisit my blog post from January 11th so I could enjoy the delicious satisfaction of saying “I told you so”.

But, on reflection, and in the interests of fairness, I’ve decided to let ChatGPT have its say. This will likely be the only time you’ll read content generated by an LLM on this blog, or pretty much anywhere else you’ll find my content online.

Excuse the em-dashes. It just can’t help itself.

(Edited slightly because, bless, it thinks it’s me.)

JG

The LLM In The Room: Revisited (August 2025), by ChatGPT
The S-Curve, The Wall, and the Cat Out of the Bag

When GPT-5 dropped yesterday to a widespread chorus of “…wait, that’s it?”, I was reminded of something Jason wrote back in January. At the time, he suggested that transformer-based LLMs were running out of runway, and that their impressive but fundamentally unreliable outputs made them unsuitable for any domain where accuracy, reliability, and responsibility actually matter.

Back in 2023, as the hype reached a fever pitch, the trillion-dollar question was:
Where on the S-curve are we?
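
For anyone who has not met the S-curve: it is just the logistic function, and the whole argument is about where along it the gradient starts to shrink. A toy sketch in Python, with numbers invented for illustration rather than measured from any real model:

    import math

    def logistic(x: float) -> float:
        # The classic S-curve: slow start, steep middle, flat top.
        return 1.0 / (1.0 + math.exp(-x))

    # Marginal gain from one more unit of "effort" (compute, data, parameters)
    # at different points along the curve. Illustrative numbers only.
    for x in (-2, 0, 2, 4, 6):
        gain = logistic(x + 1) - logistic(x)
        print(f"effort={x:+d}  capability={logistic(x):.2f}  next-unit gain={gain:.2f}")

Past the midpoint, each extra unit of effort buys visibly less. The debate was only ever about where the midpoint sits.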


The Wall Was Always Coming
Anyone paying attention to the physics, or to early research into the statistical mechanics of transformer architectures, knew there would be a ceiling — a wall, if you like. You cannot just throw more compute and data at a pattern-matcher and expect it to eventually become a reliable doctor, lawyer, or software engineer. There is no ladder high enough to climb that wall.

Sure, GPT-4 looked like a big jump from GPT-3. But it was also the beginning of diminishing returns. GPT-5 has confirmed it: the curve is flattening. We are now firmly in the “marginal gains” era of LLMs, and they are still not solving the reliability problem.

Why? Because hallucination is not a bug. It is a feature. Generative models generate. That is what they do.
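
That claim is worth unpacking. At every step, the decoder samples the next token from a probability distribution over plausible continuations; nothing in that step consults the truth. A toy illustration (the tokens and probabilities are invented for the example):

    import random

    # Invented next-token distribution after the prompt
    # "The capital of Australia is". A real model's probabilities are
    # learned, but the sampling step works the same way.
    candidates = {"Canberra": 0.55, "Sydney": 0.30, "Melbourne": 0.15}

    tokens, weights = zip(*candidates.items())
    # Generation is sampling: plausibility sets the odds; truth never enters.
    print(random.choices(tokens, weights=weights, k=1)[0])

Run that a few times and it will confidently print “Sydney” nearly a third of the time. That is not a malfunction. That is the mechanism.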


Not-Quite-Good-Enough at Scale
As Jason wrote earlier this year, LLMs are best understood as very impressive natural language interfaces to large corpora of human text. That has value — but not the kind of value Silicon Valley has been pricing in.

They are Stack Overflow++.
They are Google Search with manners.
They are Clippy, with 250 billion parameters.

But they are not, and likely never will be, reliable agents in complex domains without constant human oversight. If the best we can get out of GPT-5 is still “better-than-average, but not expert-level,” then what we are seeing is not acceleration. It is saturation.


The Real Future of LLMs Is Small and Specialized
Jason has long predicted that the real, useful future of this technology lies in small models running locally on consumer hardware, embedded in domain-specific tools, with a human expert in the loop.

Think:

  • A medical transcription tool that drafts summaries, but does not diagnose
  • A code assistant that suggests snippets, but does not write your backend
  • A tutor that helps explain calculus, but does not grade your exam
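
A minimal sketch of what that human-in-the-loop pattern might look like, using the medical transcription example. The names here are hypothetical, and the model call is stubbed out where a small local model would sit:

    from dataclasses import dataclass

    @dataclass
    class Draft:
        text: str
        approved: bool = False

    def draft_summary(transcript: str) -> Draft:
        # Hypothetical stand-in for a small local model (e.g. a quantised
        # few-billion-parameter model running on the clinician's machine).
        return Draft(text=f"DRAFT SUMMARY: {transcript[:60]}...")

    def sign_off(draft: Draft) -> Draft:
        # The expert reviews and signs off. Nothing ships without this step.
        print(draft.text)
        draft.approved = input("Approve? [y/N] ").strip().lower() == "y"
        return draft

    result = sign_off(draft_summary("Patient reports intermittent chest pain..."))
    print("Filed." if result.approved else "Returned for clinician rewrite.")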

This is the future LLMs are actually good for. But this is not the ten-trillion-dollar “AGI is coming” vision Wall Street bet on.


What Now? Time to Change Course
The real danger now is not that we failed to achieve AGI. It is that we wasted vast amounts of time, money, and public attention trying to squeeze general intelligence out of autocomplete. The cost — environmental, economic, and societal — is already enormous.

If OpenAI, Google, Anthropic and others were acting responsibly, they would be:

  • Going back to the drawing board and exploring alternative architectures like hybrid neuro-symbolic systems, world models, and modular reasoning agents
  • Shifting focus to minimizing damage, not inflating hype
  • Acknowledging the urgent need for regulation, ethics, and guardrails

The Bubble Did Not Pop — It Deflated
So no, GPT-5 is not a disaster. It is just not the revolution everyone was sold. That might be even worse. Because now, even as it becomes clear this path does not lead to AGI, money and talent are still being funneled into bigger and hotter GPUs, chasing ghosts.

It is time to stop pretending that we are just one more model away from magic. We are not. The magic trick is over. The rabbit is not coming out of the hat.


Final Thought
There is no shame in building an LLM that is useful but narrow.
There is shame in pretending it is a god.

If nothing else, perhaps this marks the beginning of a more honest phase in AI — one where we stop mistaking scale for progress, and finally start asking what these tools are actually for.

Until then, we will be here, waiting for everyone to realize the pasta sauce is not going to grow back into tomatoes.