
Mar 30, 2026

From agent to assistant

I had intended for this next post to be a deep dive into Warlock, a control plane for Firecracker, built in Rust, that I have been chipping away at for a little while. It was a way for me to experiment with giving an LLM a fully owned environment in which to really flex its capabilities: a full filesystem, resources, and network, whilst controlling its isolation (sandboxing). That certainly deserves a dedicated post, yet through this project I noticed a(nother) step-change in agent-assisted coding that felt worth sharing.

I was first made aware of Firecracker by a post from Julia Evans. I thought the technology looked very cool, but didn't have a particular use case where it made sense at the time. A few years on, an independent environment for an LLM appeared to be a perfect fit. LLMs, as agents, are becoming increasingly capable, and subsequently more autonomous in nature. Reasoning from the premise that an agent is much like a human, it logically makes sense for it to own its environment, e.g. a virtual machine, much the same as a developer has their own PC. Having numerous developers share a machine, or requiring them to be continually observed in every action they take, is a sure way to throttle speed and progression. However, we cannot always be sure of what actions will be taken, and so guardrails are important, in much the same way as the principle of least privilege. It's also great to see companies taking this concept further at a significantly larger scale, such as Sprites from Fly.io & exe.dev.

Bridging the gap

Firecracker makes use of the Kernel Virtual Machine (KVM). This is a virtualization technology that essentially allows a virtual machine, with its own 'hardware', networking, and filesystem, to be created within an existing physical machine. Firecracker does a great job of abstracting this, and although I know my way around a Linux box, it introduces a lot of unfamiliar concepts: chroot, cgroups, and networking internals such as iptables.
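To give a flavour of how these concepts fit together, here is a minimal sketch of how Firecracker's jailer could be invoked: it chroots the firecracker process into a per-VM directory, drops privileges, and places it in a cgroup. The flags follow the jailer's CLI, but the paths, IDs, and cgroup values here are hypothetical, chosen purely for illustration.

```rust
use std::process::Command;

// Build (but don't spawn) a jailer invocation for one micro-VM.
// The jailer combines the concepts above: chroot for filesystem isolation,
// uid/gid dropping for privilege separation, and cgroups for resource limits.
fn jailer_command(vm_id: &str) -> Command {
    let mut cmd = Command::new("jailer");
    cmd.args([
        "--id", vm_id,                          // unique VM identifier
        "--exec-file", "/usr/bin/firecracker",  // binary to run inside the jail
        "--uid", "10000",                       // drop to an unprivileged user
        "--gid", "10000",
        "--chroot-base-dir", "/srv/jailer",     // per-VM chroot lives under here
        "--cgroup", "cpu.weight=512",           // example cgroup v2 limit
    ]);
    cmd
}

fn main() {
    let args: Vec<String> = jailer_command("vm-0")
        .get_args()
        .map(|a| a.to_string_lossy().into_owned())
        .collect();
    println!("jailer {}", args.join(" "));
}
```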

An LLM is a powerful tool here, and one I could lean on extensively to quickly familiarise myself with these terms and find the right resources for further reading. On these personal projects, I try to stay away from letting the LLM write the code. My main objective is learning, and having the agent complete this for me makes that both more difficult and less satisfying. With a lot of back-and-forth, I gradually built up a control plane over Firecracker, allowing me to spin up multiple Firecracker micro-VMs on a single host, giving me, or an LLM, essentially its own PC to interact with.
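For a sense of what such a control plane does under the hood: Firecracker is configured before boot via a handful of PUT requests to its API over a unix socket. The sketch below just builds the request bodies with the standard library; the endpoints and field names follow the Firecracker API, while the kernel and rootfs paths are hypothetical.

```rust
// Request bodies a control plane would PUT to Firecracker's API socket
// before booting a micro-VM:
//   PUT /machine-config, PUT /boot-source, PUT /drives/rootfs,
// then PUT /actions with {"action_type": "InstanceStart"} to boot.
fn machine_config(vcpus: u8, mem_mib: u32) -> String {
    format!(r#"{{"vcpu_count": {vcpus}, "mem_size_mib": {mem_mib}}}"#)
}

fn boot_source(kernel: &str) -> String {
    // Serial console on ttyS0 is the usual way to see boot output.
    format!(
        r#"{{"kernel_image_path": "{kernel}", "boot_args": "console=ttyS0 reboot=k panic=1"}}"#
    )
}

fn root_drive(path: &str) -> String {
    format!(
        r#"{{"drive_id": "rootfs", "path_on_host": "{path}", "is_root_device": true, "is_read_only": false}}"#
    )
}

fn main() {
    println!("{}", machine_config(2, 512));
    println!("{}", boot_source("/srv/vmlinux"));
    println!("{}", root_drive("/srv/rootfs.ext4"));
}
```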

I then took this a step further, introducing a gateway controller on top, allowing me to deploy a VPC with a bastion node responsible for managing hosts within the network (all running Firecracker), orchestrating the creation of these micro-VMs, and SSH-forwarding for secure access from the outside as required. I set this up using Terraform and DigitalOcean. DigitalOcean kindly offers nested virtualization, allowing Firecracker VMs to be created on what is already a VM, and Terraform greatly simplifies the creation and management of the infrastructure without needing to touch a web dashboard.
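The SSH-forwarding piece can be sketched with ssh's ProxyJump (-J) flag, hopping through the bastion to reach a micro-VM on the private network. The hostnames, usernames, and IP below are all hypothetical; a controller would spawn something like this via std::process::Command.

```rust
// Build the argument vector for reaching a micro-VM through the bastion.
// ProxyJump (-J) tunnels the connection via the bastion node, so the
// micro-VM's private IP never needs to be exposed publicly.
fn ssh_via_bastion(bastion: &str, vm_ip: &str) -> Vec<String> {
    vec![
        "ssh".to_string(),
        "-J".to_string(),          // hop through the bastion first
        format!("ops@{bastion}"),  // first hop: the bastion node
        format!("agent@{vm_ip}"),  // final hop: the micro-VM's private IP
    ]
}

fn main() {
    println!("{}", ssh_via_bastion("bastion.example", "10.0.0.5").join(" "));
}
```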

LLMs were an indispensable tool on a project like this, where many of the concepts are new and a lot of learning is required. It is certainly something that would have been more daunting and time-consuming, particularly for a solo developer, in a pre-GPT world.

From agent to assistant

Developing on top of these VM technologies already introduces some difficulties on a single machine, but by the stage of provisioning remote infrastructure and trying to observe the lifecycle of micro-VMs, it was becoming quite tedious. I needed to SSH into an individual box and observe logs, then debug the root cause to find the right fix, whether that was in the Firecracker configuration, the control plane, or the network setup.

This is where I tried something new. The LLM I was working with already had a good handle on what I was trying to achieve, as well as the existing components of the project. Rather than back-and-forth messaging on these individual issues, could I just give it access to the network and let it figure things out? Turns out the answer is yes! I gave the LLM a key to the network and off it went on its merry way, exploring the infrastructure, observing configurations, and reasoning its way through the problem. I emphasise reasoning, as we have really reached a point where these LLMs can take multiple steps towards an individual task and continually react to new information to inform their next actions. And I can confidently say it attempted things that I, or even an experienced sysadmin, wouldn't have thought to try, at an unmatched pace.

This was my first real attempt at just giving the LLM the tools and letting it figure out the rest, and with great success. A great indicator for the initial hypothesis of this project as a whole (although earlier than anticipated, and in a different form). I was merely the observer in the move from the LLM being what we might term an 'agent' to an 'assistant': one capable of doing much more independently, requiring less hand-holding, and potentially far more capable than most human operators.

Where next?

This brings me to the other key point of this project: these longer-running agents become more difficult to observe, both in their speed of execution and in what actions they decide to take, and whether those actions can always be interpreted correctly by a human. The guardrails become more pertinent as this technology continues to develop, along with the systems around it, all while the base models become increasingly more powerful.

If you've been paying close attention, you'll also notice this behaviour contradicts my earlier statement about effectively using the LLM as a tool for learning. Whilst I could let the LLM loose at the latter stages of this project, it no doubt hindered my ability to learn how to debug these increasingly complex processes myself, particularly in unfamiliar territory where I am unable to fully verify the actions the LLM decides to take. Fine for my personal project and an insignificant development environment, but the same cannot be said for a critical production setup.

This is, without a doubt, the next step on the path of these new technologies. There are still many unanswered questions, but it is exciting to already see what is possible and the opportunities this unlocks. The big players, Fly.io, Cloudflare, and OpenAI, will take this further on an unprecedented scale, and more personally, I'm keen to see what I can achieve myself now that I have a setup where I can 'let these agents loose'.