Operations February 18, 2026 by Olivier Chartier

When agents fail — what actually goes wrong and what to do about it

Agent failures are rarely dramatic. Most of the time they are quiet misbehaviors that accumulate over weeks. Here is what to watch for and how to structure your response.


There is a common expectation that AI agents fail loudly — that you will see an error, get an alert, or notice something clearly broken. In practice, the failures that cause real damage are almost always quiet. The agent keeps running. It just starts doing the wrong thing, slowly, and nobody notices for a while.

I want to be specific about the types of failures we actually see, because the categories matter for how you respond.

Prompt drift

This is probably the most common failure mode for agents that have been running for more than a few months. The agent's behavior changes because something in the environment changed — the data format, the upstream system's output, the structure of the documents being passed as context — and the original prompt assumptions no longer hold.

The agent does not error. It just starts interpreting things slightly differently. If you are not logging outputs and reviewing them periodically, you will not catch this until the downstream consequences pile up.

The fix is boring: scheduled output reviews, at a cadence that matches the stakes involved. Once a week is reasonable for low-stakes processes; review more often if the agent is taking consequential actions. Each review should be a structured comparison between expected behavior and actual behavior, done by a human who knows the domain.
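
A structured comparison does not need much tooling. The following is a minimal sketch, assuming you log agent outputs as dictionaries and that a domain expert has written down the expected output for a small set of reference cases. The names `ReviewCase`, `diff_fields`, and `build_review_report`, along with the invoice fields, are hypothetical and only there for illustration.

```python
from dataclasses import dataclass


@dataclass
class ReviewCase:
    case_id: str
    expected: dict  # what a domain expert says the agent should produce
    actual: dict    # what the agent actually produced, taken from the output log


def diff_fields(case: ReviewCase) -> dict:
    """Return {field: (expected, actual)} for every field that differs."""
    diffs = {}
    for key in sorted(case.expected.keys() | case.actual.keys()):
        exp = case.expected.get(key)
        act = case.actual.get(key)
        if exp != act:
            diffs[key] = (exp, act)
    return diffs


def build_review_report(cases: list) -> dict:
    """Summarize mismatches so the reviewer only reads what changed."""
    flagged = {}
    for case in cases:
        diffs = diff_fields(case)
        if diffs:
            flagged[case.case_id] = diffs
    rate = len(flagged) / len(cases) if cases else 0.0
    return {"total": len(cases), "flagged": flagged, "mismatch_rate": rate}


# Illustrative data only; these records are made up.
cases = [
    ReviewCase("invoice-1",
               {"category": "travel", "amount": 120.0},
               {"category": "travel", "amount": 120.0}),
    ReviewCase("invoice-2",
               {"category": "software"},
               {"category": "other"}),
]
report = build_review_report(cases)
print(report["mismatch_rate"], report["flagged"])
```

The report is an aid for the human reviewer, not a replacement. Tracking the mismatch rate from one review to the next is what makes slow drift visible.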

Upstream schema changes

Someone updates HubSpot, renames a field, adds a new required property, or changes how a date is formatted. The agent breaks at the input stage. Depending on how your error handling is set up, it either fails silently, retries indefinitely, or throws an exception that nobody sees because the alert goes to a Slack channel that is mostly muted.

The pattern here is that data integrations need ownership. Someone should be responsible for knowing when upstream schemas change and updating the integration accordingly. This sounds obvious but is consistently deprioritized until it becomes urgent.

One practical approach: write explicit schema validation at the agent input boundary. Not just type checking — actual assertions about what you expect to be present. If something changes, you want a clear failure at the boundary rather than a confusing failure three steps later.
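
As a rough illustration of what validation at the boundary can look like, here is a sketch in plain Python. The field names, the allowed lifecycle stages, and the ISO date expectation are all assumptions made up for this example; they are not taken from any particular CRM's schema. The point is that every expectation is written down and checked before the agent sees the record.

```python
from datetime import date


class InputSchemaError(ValueError):
    """Raised at the agent's input boundary when a record is not what we expect."""


# Hypothetical expectations for a contact record; adjust to your own integration.
REQUIRED_FIELDS = {"email": str, "lifecycle_stage": str, "created_at": str}
ALLOWED_STAGES = {"lead", "customer", "churned"}


def validate_contact(record: dict) -> dict:
    """Return the record unchanged if it passes, else raise with every problem listed."""
    problems = []

    # Presence and type checks.
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(
                f"{field}: expected {expected_type.__name__}, "
                f"got {type(record[field]).__name__}"
            )

    # Assertions that go beyond types: allowed values and date format.
    stage = record.get("lifecycle_stage")
    if isinstance(stage, str) and stage not in ALLOWED_STAGES:
        problems.append(f"lifecycle_stage: unexpected value {stage!r}")

    created = record.get("created_at")
    if isinstance(created, str):
        try:
            date.fromisoformat(created)
        except ValueError:
            problems.append(f"created_at: not an ISO date: {created!r}")

    if problems:
        raise InputSchemaError("; ".join(problems))
    return record


good = {"email": "a@example.com", "lifecycle_stage": "lead", "created_at": "2026-02-18"}
validate_contact(good)  # passes silently

try:
    validate_contact({"email": "a@example.com", "lifecycle_stage": "prospect",
                      "created_at": "02/18/2026"})
except InputSchemaError as err:
    print(f"rejected at boundary: {err}")
```

Collecting all problems before raising, rather than stopping at the first one, tends to make the resulting alert more useful to whoever owns the integration.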

Context window issues

Your agent was designed for a certain volume of context. Usage grows, documents get longer, you add more fields to the prompt. At some point the input gets truncated, or the model starts summarizing in ways you did not anticipate or losing track of instructions that were in the middle of a long prompt.

Context overflow usually manifests as degraded quality rather than an error. The agent is still "working" in the technical sense. But the outputs get worse, or become inconsistent in ways that are hard to diagnose if you do not know to look at prompt length.

Monitor your token usage. Build in headroom. If you are regularly hitting 80% of the context limit, that is something to address before it becomes a problem.
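
A headroom check can be as simple as the sketch below. It assumes you already record the prompt token count for each run (most provider APIs report this in their usage metadata) and that you know the context limit of the model you run. The function name `headroom_report`, the token counts, and the 100,000-token limit are illustrative assumptions; the 0.8 threshold mirrors the 80% figure above.

```python
def headroom_report(token_counts: list, context_limit: int,
                    threshold: float = 0.8) -> dict:
    """Summarize how close recent runs came to the context limit."""
    if context_limit <= 0:
        raise ValueError("context_limit must be positive")
    if not token_counts:
        return {"runs": 0, "peak_utilization": 0.0, "share_over_threshold": 0.0}

    utilizations = [count / context_limit for count in token_counts]
    over = [u for u in utilizations if u >= threshold]
    return {
        "runs": len(token_counts),
        "peak_utilization": max(utilizations),
        "share_over_threshold": len(over) / len(token_counts),
    }


# Made-up prompt sizes for four recent runs against a hypothetical 100k limit.
recent = headroom_report([40_000, 52_000, 81_000, 90_000], context_limit=100_000)
print(recent)
```

What counts as "regularly" is a judgment call. One option is to alert when the share of runs over the threshold stays high across several review periods, rather than on a single large prompt.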

When a model version changes

This catches teams off guard more than it should. Model providers update their models. Behavior changes in ways that are not always documented clearly. An agent that has been running reliably for six months starts producing subtly different outputs after a model update, and nobody connects the cause to the effect because the update was automatic and silent.

Pin model versions where possible. If you are using an API that automatically serves the latest model, evaluate whether that is actually what you want for a production agent with established behavior expectations.
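
Two small checks can help connect cause to effect here, sketched below under some loud assumptions. First, the pattern used to decide whether an identifier "looks pinned" is a heuristic: it assumes pinned identifiers end in a date-like suffix, which is a naming convention that varies by provider and should be adapted. Second, the model identifiers themselves (`example-model-...`) are invented for the example. The more dependable half is the second function, which assumes you log the model identifier with every run and reports where it changed.

```python
import re

# Heuristic assumption: a pinned identifier ends in a date-like suffix such as
# -2025-06-01 or -20250601. Check your provider's naming scheme before relying on it.
PINNED_PATTERN = re.compile(r"-\d{4}-?\d{2}-?\d{2}$")


def looks_pinned(model_id: str) -> bool:
    """Return True if the identifier appears to name a fixed model snapshot."""
    return bool(PINNED_PATTERN.search(model_id))


def find_model_changes(run_log: list) -> list:
    """Given (run_id, model_id) pairs in chronological order, return
    (run_id, previous_model, current_model) for each run where the model changed."""
    changes = []
    previous = None
    for run_id, model_id in run_log:
        if previous is not None and model_id != previous:
            changes.append((run_id, previous, model_id))
        previous = model_id
    return changes


# Hypothetical identifiers and run log, for illustration only.
print(looks_pinned("example-model-2025-06-01"))
print(looks_pinned("example-model-latest"))

log = [
    ("run-101", "example-model-2025-06-01"),
    ("run-102", "example-model-2025-06-01"),
    ("run-103", "example-model-2025-09-15"),
]
print(find_model_changes(log))
```

If the model identifier is stored next to each logged output, a change in output quality can be lined up against the run where the model changed instead of being discovered by accident.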

Rate limits and cost surprises

Not a failure of the agent itself, but a failure of planning. Usage grows faster than expected. You hit rate limits at a critical moment. Or the cost of a process that seemed cheap at prototype scale turns out to be significant at production volume.

Build cost monitoring into the agent from the start, not as an afterthought. Know what a normal run costs. Set alerts when costs deviate significantly. This is infrastructure thinking, not AI thinking, but it is part of operating agents responsibly.
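
"Know what a normal run costs" can be made concrete with a baseline and a deviation check. The sketch below assumes you record a cost figure per run; it uses the median of recent runs as the baseline and flags any run that exceeds it by a chosen factor. The function name `check_run_cost`, the factor of 2, and the cost figures are illustrative assumptions, not recommendations.

```python
from statistics import median
from typing import Optional


def check_run_cost(history: list, current: float,
                   max_ratio: float = 2.0) -> Optional[dict]:
    """Return alert details if `current` exceeds the baseline by `max_ratio`,
    otherwise None. The baseline is the median cost of the runs in `history`."""
    if not history:
        return None  # no baseline yet, nothing to compare against
    baseline = median(history)
    if baseline > 0 and current > baseline * max_ratio:
        return {
            "baseline": baseline,
            "current": current,
            "ratio": current / baseline,
        }
    return None


# Made-up per-run costs in dollars.
recent_costs = [0.10, 0.12, 0.11, 0.13, 0.09]
normal = check_run_cost(recent_costs, 0.12)
spike = check_run_cost(recent_costs, 0.30)
print(normal, spike)
```

The median is used here because a single expensive run in the history will not shift it much. A mean or a percentile would also work, depending on how noisy your costs are.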

What maintenance actually involves

When we take on ongoing maintenance for a client's agents, the work breaks down roughly like this: most of the time nothing is actively broken, but we are running those output reviews, watching for schema drift, monitoring costs and token usage, and staying aware of model updates. A smaller portion of the time we are making adjustments — prompt refinements, input validation updates, small integration fixes. Occasionally there is a real incident that needs fast attention.

The value of that ongoing attention is not visible when everything is running smoothly. It becomes visible when something starts drifting and gets caught early rather than after three months of bad outputs.

Agents are not software you deploy and forget. They require a different maintenance model, one that is more like monitoring a live data pipeline than maintaining a static web application. Teams that treat agents like a static web application tend to discover this the hard way.