When Bigger Is Not Always Better: Rethinking SLMs and LLMs


FutureShifts | May 2026 Edition

For a long time, the assumption was simple: bigger models are better models.

But that does not always hold in real work.

Some tasks need full capability. Others only need something fast, cheap and reliable that gets out of the way.

When you are writing, summarising or handling routine work, how do you decide whether you need a large model like the ones behind ChatGPT, or whether a compact model like Microsoft’s Phi-3 Mini is enough?

This week’s piece explores why AI is moving beyond “bigger is better” and towards a more layered approach using both SLMs and LLMs. It also breaks down seven major AI developments from the past week.

Dive in and let us know what you think.

AI in Focus: Recent Developments

1) OpenAI rolls out GPT-5.5 Instant as ChatGPT’s new default

OpenAI says the updated model gives 52.5 per cent fewer hallucinated answers than its predecessor on high-stakes prompts in medicine, law and finance. It also says responses are tighter, more accurate and less cluttered with unnecessary conversational filler.

2) Chinese open-weights models close the gap with the frontier

A new analysis published on 1 May places Kimi K2.6, MiMo V2.5 Pro and DeepSeek V4 Pro within striking distance of GPT-5.5, Claude Opus 4.7 and Gemini 3.1 Pro on the Artificial Analysis Intelligence Index, despite running at dramatically lower inference costs.

3) Anthropic doubles down on compute and Wall Street

Within a single week, Anthropic reportedly expanded its long-term Google Cloud and TPU commitments significantly, secured full use of SpaceXAI’s Colossus supercomputer and launched ten Claude agents designed specifically for banking, insurance and asset management workflows.

4) Google retires Fitbit and launches the screenless Fitbit Air

Google has unveiled a sensor-only wearable that pairs with the new Google Health app and a Gemini-powered AI coach, completing Fitbit’s gradual integration into Google’s ecosystem.

5) Meta moves towards agents that act for you, not just chat

Reports suggest Meta is developing an AI assistant capable of handling multi-step actions across its apps, designed for its ecosystem of more than 3 billion users, alongside an agentic shopping system for Instagram.

6) AWS makes its MCP Server generally available

Amazon has launched a managed service giving AI coding agents authenticated access to more than 15,000 AWS API operations, live documentation and sandboxed scripting environments while keeping human and agent permissions clearly separated.

7) US government starts testing frontier AI before release

Google, Microsoft and xAI have agreed to share unreleased frontier models with the Center for AI Standards and Innovation for evaluation against cybersecurity, biosecurity and chemical weapons risks before public deployment.

Rethinking The LLM vs SLM Debate

For most of the AI era, one assumption has dominated the conversation: bigger models are better models.

That logic still holds in many situations. Frontier systems continue to lead in complex scientific and research-intensive tasks.

But in practice, businesses are finding that raw capability is only one part of the equation.

A different question is starting to matter more: What is the right model for this task?

This is already reflected in real-world AI system architecture.

What we mean by LLMs and SLMs

Large language models, or LLMs, are systems trained on vast amounts of text data to perform a broad range of language and reasoning tasks. Many frontier systems today, such as ChatGPT, Claude and Gemini, also increasingly include multimodal capabilities, meaning they can process text alongside images, audio or other inputs.

Small language models, or SLMs, use many of the same underlying techniques but at a reduced scale. They are typically faster, cheaper and more specialised. In some cases, they can run directly on devices rather than relying entirely on cloud infrastructure.

Examples of SLMs include Microsoft Phi, Google Gemma, Mistral’s smaller open-weight models and Alibaba’s Qwen small model families, which are designed for efficiency, fine-tuning and deployment in constrained environments.
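A rough calculation helps show why on-device deployment is realistic for SLMs. The minimal sketch below estimates only the memory needed to hold a model’s weights at different numeric precisions, assuming Phi-3 Mini’s roughly 3.8 billion parameters and using a hypothetical 70-billion-parameter model as a stand-in for a larger system. It ignores activations, the KV cache and runtime overhead, so real requirements will be higher.

```python
# Rough weights-only memory estimate.
# Assumption: ignores activations, KV cache and runtime overhead.
BYTES_PER_PARAM = {
    "fp32": 4.0,
    "fp16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Approximate memory in GB (1 GB = 1e9 bytes) needed just to store the weights."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    models = {
        "Phi-3 Mini (~3.8B params)": 3.8e9,
        "hypothetical 70B model": 70e9,  # illustrative stand-in for a larger model
    }
    for name, params in models.items():
        for precision in ("fp16", "int4"):
            print(f"{name:28s} {precision}: {weight_memory_gb(params, precision):6.1f} GB")
```

Under these assumptions, Phi-3 Mini’s weights need about 7.6 GB at 16-bit precision and about 1.9 GB when quantised to 4 bits, small enough for many laptops, whereas the hypothetical 70B model needs around 140 GB at 16-bit and still about 35 GB at 4-bit.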

The distinction is no longer purely technical; it is becoming operational.

The shift was already underway

The move towards smaller models did not begin this year.

For some time, researchers and enterprises have been reaching a similar conclusion: many real-world workflows are narrow, repetitive and domain-specific. They do not require frontier-level general intelligence to perform effectively.

A customer support classifier, for example, does not need to understand philosophy or generate poetry. It needs to route tickets accurately, consistently and cheaply at scale.

In those environments, smaller specialised models can sometimes perform better than larger general-purpose systems on specific business metrics.

Source

Why smaller models are gaining ground

A key reason is that many business tasks are predictable.

Many enterprise workflows involve:

structured inputs
repetitive processes
narrow domains
clear outputs

That changes the economics considerably.

A highly specialised small model can often:

respond faster, with lower latency
cost significantly less
use fewer computing resources
run on-device or on-premises
provide more predictable outputs

For organisations deploying AI continuously across thousands or millions of interactions, those efficiencies compound quickly.

Sources: 1 2 3
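To see how those efficiencies compound, the sketch below compares monthly inference spend for the same workload served by a large model and a small one. All prices, token counts and volumes are hypothetical placeholders, not quotes from any provider; substitute your own figures.

```python
from dataclasses import dataclass


@dataclass
class ModelPricing:
    name: str
    usd_per_million_input_tokens: float
    usd_per_million_output_tokens: float


def monthly_cost(pricing: ModelPricing, requests_per_day: int,
                 input_tokens: int, output_tokens: int, days: int = 30) -> float:
    """Estimate monthly spend in USD for a fixed per-request token profile."""
    per_request = (
        input_tokens * pricing.usd_per_million_input_tokens
        + output_tokens * pricing.usd_per_million_output_tokens
    ) / 1_000_000
    return per_request * requests_per_day * days


# Hypothetical prices, chosen only to illustrate a 15x-to-20x price gap.
LARGE = ModelPricing("large model (hypothetical)", 5.00, 15.00)
SMALL = ModelPricing("small model (hypothetical)", 0.25, 1.00)

if __name__ == "__main__":
    # Hypothetical ticket-routing workload: 100,000 requests a day,
    # ~500 input tokens and ~20 output tokens per request.
    for model in (LARGE, SMALL):
        cost = monthly_cost(model, requests_per_day=100_000,
                            input_tokens=500, output_tokens=20)
        print(f"{model.name}: ${cost:,.0f} per month")
```

With these placeholder figures, the large model costs about $8,400 a month and the small one about $435, a gap of roughly 19 times that grows in absolute terms as volume rises.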

Specialisation can outperform scale

One of the most important lessons emerging from recent AI deployments is that focused systems can outperform general systems in constrained environments.

A model trained specifically for supply chain fulfilment, fraud detection or medical classification may outperform a much larger frontier model simply because it has been tuned for a narrower problem space.

The same pattern is now appearing in agentic AI systems.

Rather than using one giant model for every step, many architectures are moving towards routing subtasks to smaller specialised models while reserving large models for the genuinely difficult reasoning work.

In production environments, the result is often systems that are:

cheaper
faster
more reliable
more scalable

Done well, these gains need not come at the cost of overall capability.

Source
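One common way to reserve large models for the hard cases is a cascade: send each request to a small model first and escalate to a large model only when the small model’s answer looks unreliable. The sketch below is illustrative only; `small_model` and `large_model` are hypothetical stubs standing in for real SLM and LLM calls, and the confidence score and 0.7 threshold are assumptions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ModelResult:
    answer: str
    confidence: float  # assumed to lie in [0, 1]
    model: str


# A "model" here is any function from prompt to result.
Model = Callable[[str], ModelResult]


def small_model(prompt: str) -> ModelResult:
    # Hypothetical stub: confident only on short, routine prompts.
    confident = len(prompt.split()) <= 12
    return ModelResult("small-model answer", 0.9 if confident else 0.4, "slm")


def large_model(prompt: str) -> ModelResult:
    # Hypothetical stub for an expensive frontier-model call.
    return ModelResult("large-model answer", 0.95, "llm")


def cascade(prompt: str, first: Model, fallback: Model,
            threshold: float = 0.7) -> ModelResult:
    """Try the cheaper model first; escalate only if its confidence is below threshold."""
    result = first(prompt)
    if result.confidence >= threshold:
        return result
    return fallback(prompt)


if __name__ == "__main__":
    prompts = [
        "Reset my password please",
        "Compare the regulatory implications of three different cross-border "
        "data-transfer mechanisms for a multinational insurer",
    ]
    for p in prompts:
        print(cascade(p, small_model, large_model).model, "<-", p[:40])
```

In a real system, producing a trustworthy confidence signal is the hard part; the word-count heuristic in the stub is only a placeholder.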

Where large models still dominate

None of this means large models are becoming irrelevant.

Recent surveys and preprints on small language models point to growing use of hybrid architectures that combine SLMs and LLMs according to task complexity. Even so, frontier systems still lead in:

complex reasoning
open-ended generation
multimodal understanding
open-ended or ambiguous problem solving
complex scientific and research-intensive tasks

They also remain the foundation from which many smaller models are derived through distillation and fine-tuning.

Rather than being in competition, the relationship between large and small models is becoming more layered.

Sources: 1 2 3
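Distillation, one route from large models to small ones, trains a compact “student” model to match a large “teacher” model’s softened output distribution as well as the true labels. The minimal sketch below computes the classic soft-target distillation loss for a single example in plain Python. The temperature, the mixing weight `alpha` and the example logits are illustrative assumptions; real training would compute this over batches inside a deep learning framework.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> list[float]:
    """Numerically stable softmax; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(student_logits: Sequence[float], teacher_logits: Sequence[float],
                      label: int, temperature: float = 2.0, alpha: float = 0.5) -> float:
    """alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * cross-entropy(student, label)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # Scaling by T^2 keeps the soft-target term on a comparable scale as T changes.
    soft = kl_divergence(p_teacher, p_student) * temperature ** 2
    hard = -math.log(softmax(student_logits)[label])
    return alpha * soft + (1 - alpha) * hard


if __name__ == "__main__":
    teacher = [4.0, 1.5, 0.2]  # hypothetical logits from a large teacher model
    student = [2.5, 1.0, 0.8]  # hypothetical logits from a small student model
    print(round(distillation_loss(student, teacher, label=0), 4))
```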

The hybrid future is already here

Many organisations are moving towards a similar architectural pattern.

Small language models (SLMs) handle:

high-volume tasks
predictable workflows
domain-specific operations

Large language models (LLMs) handle:

edge cases
complex reasoning
open-ended requests

Routing systems dynamically decide which model should respond based on the nature of the task, increasingly treating model selection as part of system design rather than an afterthought.

In many ways, this mirrors earlier eras of computing. Mainframes did not disappear when personal computers arrived, and cloud computing did not eliminate local computing. Instead, each settled into the roles that suited its strengths.

The result is not replacement, but layering.

Sources: 1 2 3
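The routing step described in this section can be sketched as a single function that picks a model tier from the type and size of each request. Everything here is hypothetical: the task names in `ROUTINE_TASKS`, the 300-word limit and the two-tier split are placeholders for whatever an organisation has actually validated its small model on.

```python
from enum import Enum


class Tier(Enum):
    SMALL = "slm"
    LARGE = "llm"


# Hypothetical task types a small model has been validated on.
ROUTINE_TASKS = {"classify_ticket", "extract_invoice_fields", "summarise_short_text"}


def route(task_type: str, prompt: str, max_small_words: int = 300) -> Tier:
    """Pick a model tier from the nature of the request.

    Illustrative rules only:
      * known routine task types with modest input size go to the small model;
      * anything unrecognised, or unusually long, goes to the large model.
    """
    if task_type in ROUTINE_TASKS and len(prompt.split()) <= max_small_words:
        return Tier.SMALL
    return Tier.LARGE


if __name__ == "__main__":
    requests = [
        ("classify_ticket", "My card was charged twice for one order."),
        ("draft_strategy_memo", "Outline options for entering a new market."),
        ("summarise_short_text", "word " * 1000),
    ]
    for task, text in requests:
        print(task, "->", route(task, text).value)
```

Some systems replace hand-written rules like these with a learned classifier; keeping the decision in one explicit function makes it easier to audit and adjust.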

The Takeaway

The AI conversation is moving beyond a simplistic race for the largest model.

What matters is no longer maximum capability in isolation, but how well a model fits the workflow it supports.

For many organisations, the future will not belong exclusively to giant frontier systems or tiny local models. It will belong to hybrid architectures that combine both.

The companies getting the most value from AI are unlikely to be those chasing scale alone. They will be the ones designing systems where each model is matched carefully to the task it performs.

Over to you

If you were designing an AI system for your organisation today, which would you prioritise more: maximum capability or operational efficiency?

How we can help

At GenFutures Lab, we help organisations move beyond experimentation to design practical AI systems that scale responsibly. From workflow design to model strategy, we help teams identify where frontier models create value and where smaller specialised systems may be the smarter choice.

Get in touch to discuss your organisation’s AI strategy and operational challenges.

If you found this week’s breakdown useful, please consider forwarding it to a colleague.
