One of our recent initiatives at Signal AI has been to push frontier models to their limits. We've spent the last few months leaning heavily on LLMs (predominantly Claude) in our daily workflows, observing what works, what doesn't, what breaks, and where they excel. We've had some great experiences, coupled with multiple revelations, numerous 'back-to-earth' moments, and lots of insights in between.
As an organisation, we're pretty comfortable with AI workloads (it's in our name), but LLMs are a disruptive technology, and this was really a playground for discovering the art of the possible. Since around November 2025, model capabilities have taken a noticeable step change. This was a time-bound exercise in which we deliberately gave little consideration to token usage, and ultimately cost, to see where we landed. It's fair to say we've become fairly Claude-pilled. This is not something that is going away, and it will dramatically affect organisations far beyond engineering*.
* Naturally, engineering teams will be the first to experiment with LLMs. We're still early enough that most advancements continue to land in coding and development, going beyond raw model intelligence. Models continue to improve, but the surrounding tooling is getting better by the day. Other business functions will lag behind, but not by much.
As with any technology, there is always a risk in overreliance on a single provider: vendor lock-in, a single point of failure, opportunity cost. We'll continue to move ahead with Claude, and will pay a visit inside the walled garden, but won't be getting into bed with it. The main remaining question for us, and the wider industry, is how to accurately measure value creation from LLMs.
That question matters because, for now, the real grounding force in this equation is spend. You can find endless horror stories online about extreme token spend: General Intelligence Co, Peter Steinberger, Uber. It's becoming common knowledge that subscription plans are loss leaders. There is an argument that efficiency gains will continue to push token costs down, or at least keep them stable. Whilst I believe this holds in the long term, consumers will want to see it play out before making any long-term commitments. The counterpoint is that as models become more capable, whether through direct inference or the surrounding tooling, they ironically become more token-hungry, simply because they take on harder problems and longer-horizon tasks.
This raises a further interesting question: "Where are the local models?" Excluding for a moment the big players (Anthropic, Amazon, Google, Facebook), for whom this is their bread and butter, I've personally yet to see anything suggesting that a company is successfully running its own models internally for day-to-day use cases (coding, product work, admin, etc.). Assuming that price hikes are possible, or simply that token demand will balloon, running models locally seems like an attractive proposition. My personal stance is that local models will become ubiquitous, but that it is still too early. Below are what I believe to be some of the reasons.
Speed
The strongest current argument is that the pace of development is too fast whilst costs are artificially low. Competing labs release new models continually. If you're scaffolding infrastructure and running models from scratch, this is a significant overhead: you take on the extra burden of hot-swapping models each time Kimi releases 2.7, 2.8, 18.1..., or you risk 'being behind'.
And 'being behind' is genuinely detrimental here. The improvement between successive models is large enough that not using a SoTA model means you're likely learning the wrong lessons. Last year the leading model couldn't generate a decent PDF report, and now it can. You don't want to draw the wrong conclusion about where the limit is when the line keeps shifting so dramatically.
Hardware
Running a SoTA model is no small engineering task. These are huge models, and they are not run on a single machine. That means for most practical setups you have to run a smaller model instead. Capable models that run on more readily available hardware do exist: Gemma 4 at 31B parameters, for example, can run on a newer high-end MacBook (a statement that will probably age quickly). It won't be SoTA-grade, or nearly as fast, however*. These smaller models are likewise being released continually and keep getting better, and it's a fair prediction that this trend will continue. It could be argued that we'll never compress even today's SoTA models onto consumer-grade hardware, but we're still so early in the journey that today's SoTA models will themselves eventually feel rudimentary.
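A rough back-of-the-envelope sketch helps show why parameter count matters so much for consumer hardware. Assuming memory is dominated by the model weights (this ignores the KV cache, activations and runtime overhead, all of which add more), the footprint is simply parameters × bits per parameter ÷ 8. The 16/8/4-bit precisions below are illustrative assumptions, not any particular release's published configuration, and `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate memory (in GB, 1 GB = 1e9 bytes) needed just to hold the weights.

    Assumption: weights dominate. KV cache, activations and runtime
    overhead are deliberately ignored, so real usage will be higher.
    """
    # params_billion * 1e9 params * (bits / 8) bytes, divided by 1e9 bytes per GB
    return params_billion * bits_per_param / 8


# Illustrative precisions only; real quantisation formats vary by runtime.
for bits in (16, 8, 4):
    print(f"31B params at {bits}-bit: ~{weight_memory_gb(31, bits):.1f} GB")
```

Halving the bits roughly halves the footprint (about 62 GB at 16-bit versus about 15.5 GB at 4-bit for the weights alone), which is why quantised builds are the usual route onto laptop-class hardware, typically at some cost in quality.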
An interesting anecdote: it's already funny to recall when 'reasoning' was first released. There was a whole discourse online that the 'thinking' delay it added to response times would put users off. A few years on, that voice has gone quiet. It seems people are happy to wait once they understand what the models are capable of.
This is a fairly reasonable argument for a single-user, single-machine scenario. Thinking beyond this, even a small organisation centrally hosting a model to share internally, or to serve relatively small internal workloads, still faces significant capex to get these models up and running. This is the bet that leading labs are willing to take. They will happily plough in further capital whilst demand is high, because high demand keeps their hardware highly utilised, and any spare capacity can always be put towards training newer models. This is a risky move if you're not an AI lab. What happens to the hardware you've purchased when models become increasingly capable on leaner requirements and you're left sitting on a fleet of GPUs?
Knowledge / tooling
The next point is that it's not easy for someone to understand what is required to run any given model. The terminology in this space can be fairly arcane, and even in 2026, most people couldn't tell you what specs their own machine has. Downloading, running, and accessing a model is a tall hurdle. If you're not already fairly familiar with engineering, I'd say this is pretty much a no-go, at least for now. Perhaps this is somewhat overstated, particularly now that LLMs can do almost the whole setup for you, but the conceptual overhead nonetheless exists and is a barrier to going any further.
This builds on my earlier point about engineers being the first to really build the ecosystem around the models themselves. Apps like LM Studio make this much more accessible, but there will be a lag before the tooling abstracts away enough of the internals for most users to feel confident getting started. It's also so early in the 'post-GPT' world that mindsets have not yet fully shifted to treating LLMs as the norm, so there is still a lot of work to be done here.
Where they fit
The final point brings us back to the question of how we measure value creation from LLMs: it is still unclear which tasks LLMs are best suited to, and where it ultimately makes sense to use them. From the perspective of an individual contributor, 'state of the art' isn't necessary for day-to-day work. Most tasks are fairly menial, and trivial even for today's models. From this perspective, Gemma 4 is already enough.
As this space evolves, we'll gain a better collective understanding of which models fit where and when. The average person will come to know this too, and smaller, fine-tuned models will become accessible as the factors above play out.
The waiting game
These seem to be the biggest factors in where local models currently sit. It's still too early in the journey for them to make sense, and there are many surrounding factors at play before that changes. As I said at the start, however, I believe local models will become ubiquitous, and events like a cost squeeze could well accelerate these trends.
These seem like good and worthwhile trends. Greater access to these technologies is a net positive, both in terms of output and personal independence. The big players aren't going to disappear; SoTA models have their place and are incredibly attractive for tackling the biggest problems that need solving. But the use cases at the individual level are equally attractive, in a world where almost everyone has significant intelligence at their fingertips.