What We Learned by Building Our Own Coding Agent
My team and I did not originally set out to build a coding agent.
Our goal was simply to understand why existing coding agents were behaving inconsistently when connected to our internal vLLM deployment. Building our own agent turned out to be the most effective way to answer that question, and along the way we learned far more than expected.
When ICS first launched an internal instance of gpt-oss-120b running on vLLM, we knew we were operating slightly ahead of the curve. Getting that system running reliably was already a challenge, and the details of that effort deserve their own post. Once the model was online and connected to OpenWebUI, the next step was to integrate coding agents such as Qwen Coder CLI, Aider and OpenCode so teams could begin using the system in day-to-day work.
That’s when problems began to surface.
Early Signs of Trouble
Users started reporting dropped connections, corrupted sessions and agents that would fail partway through otherwise normal tasks. Nothing was failing in an obvious or reproducible way, which made debugging particularly difficult. The coding agents themselves provided little insight into what was going wrong, largely because they are designed to be productive tools rather than diagnostic ones.
To make progress, we needed to better understand how conversations were actually flowing between the agents and the vLLM backend. We wanted to know where context was being lost, how malformed responses were handled, and what specific conditions caused an agent to fail completely rather than recover.
It became clear that we needed a simpler tool that gave us full visibility and control.
Building ICS Agent
ICS Agent was created to provide direct, low-level access to gpt-oss-120b through vLLM. The initial goal was intentionally narrow: inspect raw requests and responses, adjust parameters, and observe how the model behaved without the added complexity of a full coding agent layered on top.
The tool was written in Python and used the rich library to provide a basic terminal interface. Even in this minimal form, it quickly proved useful. We were able to see that many of the issues users experienced were not caused by the coding agents themselves, but by edge cases in how vLLM handled certain generations and session states. Having precise control over how requests were constructed and responses were interpreted made these problems much easier to identify.
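To give a sense of the shape of that first version, here is a minimal sketch of a raw-access client in the same spirit. It assumes vLLM's OpenAI-compatible chat completions endpoint; the URL, model name and prompt shown are placeholders rather than our actual configuration.

```python
# Minimal sketch of a raw-access client in the spirit of ICS Agent (not the
# actual implementation). Assumes vLLM's OpenAI-compatible
# /v1/chat/completions endpoint; URL and model name are placeholders.
import json

import requests
from rich.console import Console
from rich.panel import Panel

VLLM_URL = "http://localhost:8000/v1/chat/completions"  # hypothetical endpoint
MODEL = "gpt-oss-120b"

console = Console()


def ask(prompt: str, temperature: float = 0.2) -> None:
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }
    # Show the exact request body, with nothing hidden behind an agent layer.
    console.print(Panel(json.dumps(payload, indent=2), title="request"))

    resp = requests.post(VLLM_URL, json=payload, timeout=120)
    resp.raise_for_status()

    # Show the raw response before any interpretation or normalization.
    console.print(Panel(json.dumps(resp.json(), indent=2), title="raw response"))


if __name__ == "__main__":
    ask("Write a one-line Python function that reverses a string.")
```

Seeing both sides of every exchange in this unfiltered form is what made the backend's edge cases visible in the first place.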
Stabilizing the System
As a result of this work, we ended up with a significantly patched vLLM Docker configuration and an additional gateway layer between agents and the model. This gateway ensures that requests conform to what vLLM expects and that responses are normalized before being passed back to the client.
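As a rough illustration of what that gateway does (not our production code), the sketch below shows a small FastAPI proxy that strips unsupported parameters on the way in and fills in missing response fields on the way out. The upstream URL, the accepted parameter list and the endpoint path are assumptions made for the example.

```python
# Illustrative gateway sketch: sanitize requests before they reach vLLM and
# normalize responses before they return to the client. Upstream URL, allowed
# parameters and paths are assumptions, not our actual deployment.
import httpx
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

UPSTREAM = "http://vllm:8000/v1/chat/completions"  # hypothetical upstream

app = FastAPI()


def sanitize_request(body: dict) -> dict:
    # Drop parameters the backend does not accept and clamp the rest.
    allowed = {"model", "messages", "temperature", "max_tokens", "tools", "stream"}
    clean = {k: v for k, v in body.items() if k in allowed}
    if "temperature" in clean:
        clean["temperature"] = min(max(float(clean["temperature"]), 0.0), 2.0)
    return clean


def normalize_response(data: dict) -> dict:
    # Guarantee the fields clients rely on exist, even for odd generations.
    for choice in data.get("choices", []):
        message = choice.setdefault("message", {})
        message.setdefault("role", "assistant")
        message.setdefault("content", "")
    return data


@app.post("/v1/chat/completions")
async def proxy(request: Request) -> JSONResponse:
    body = await request.json()
    async with httpx.AsyncClient(timeout=120) as client:
        upstream = await client.post(UPSTREAM, json=sanitize_request(body))
    return JSONResponse(normalize_response(upstream.json()),
                        status_code=upstream.status_code)
```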
Some of the issues we encountered were subtle but impactful. In certain situations, the model would return only internal reasoning text without any usable output. While a human user might recognize this and retry, most coding agents cannot. A single malformed response can halt an agent session entirely, forcing the user to restart and rebuild context from scratch. By detecting and handling these cases centrally, we were able to make agent interactions far more reliable.
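The check itself can be simple. Below is a sketch of the kind of guard we are describing, assuming an OpenAI-style response shape; the retry policy and the injected send_request callable are illustrative only.

```python
# Sketch of a central guard against reasoning-only responses, assuming an
# OpenAI-style response shape. The send_request callable and retry policy
# are illustrative, not our exact implementation.
def is_usable(response: dict) -> bool:
    """Return True if the response carries actual output, not just reasoning."""
    for choice in response.get("choices", []):
        message = choice.get("message", {})
        content = (message.get("content") or "").strip()
        tool_calls = message.get("tool_calls") or []
        if content or tool_calls:
            return True
    return False


def complete_with_retry(send_request, payload: dict, max_attempts: int = 3) -> dict:
    """Call the backend via send_request, retrying reasoning-only responses."""
    for _ in range(max_attempts):
        response = send_request(payload)
        if is_usable(response):
            return response
    # Surface a clear error instead of letting an agent stall on empty output.
    raise RuntimeError("model returned no usable output after retries")
```

Handling this in the gateway means every agent benefits from the retry logic without each one needing to implement it.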
Tool-Calling Lessons
Tool calling presented a separate set of challenges. Our initial attempts at defining tools were unsuccessful, with the model frequently failing to invoke them correctly. Progress came when we changed our approach and involved the model more directly in the process.
By iteratively shaping tools based on what gpt-oss-120b appeared to expect, we improved tool-calling behavior noticeably. This reinforced the idea that effective tool integration is not just an API design problem, but also one of alignment with the model’s internal assumptions.
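To make that concrete, here is an illustrative example of the style of tool definition that worked best for us: a flat, OpenAI-style function schema with explicit descriptions. The read_file tool and its parameters are hypothetical, not one of our actual tools.

```python
# Illustrative, deliberately simple OpenAI-style tool definition. The tool
# itself (read_file) is hypothetical; the point is a flat parameter schema
# with explicit descriptions.
READ_FILE_TOOL = {
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Read a text file from the workspace and return its contents.",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {
                    "type": "string",
                    "description": "Path to the file, relative to the workspace root.",
                },
            },
            "required": ["path"],
        },
    },
}

# Passed to the backend alongside the conversation, for example:
# payload = {"model": MODEL, "messages": messages, "tools": [READ_FILE_TOOL]}
```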
At this stage, the work was exploratory rather than production-focused, aimed at helping the ML team better understand agentic loops and tooling behavior before deploying more specialized agents.
Should You Build Your Own Coding Agent?
Building a custom agent is not a replacement for mature tools, and it is not something most teams need to do. However, for anyone trying to deeply understand how agents, tools and LLM backends interact, the exercise is extremely valuable.
For us, ICS Agent served as a way to remove layers of abstraction until the system became understandable again. That understanding made it much easier to integrate existing agents reliably and to plan future agent-based systems with confidence. Sometimes the most effective way to move forward is to simplify, observe closely, and learn from what is actually happening under the hood.
For more on AI, read Building a Device Driver from Scratch, with an AI Wingman.