May 21, 2026

How Not to Deploy AI: A Field Note from a Failed Order Cancelation

Pratheek Bharadwaj

Five lessons from one broken support bot, and what enterprise teams should fix before the same failures show up in production.

I had an experience the other night that I want to unpack carefully, because it is one of the cleanest examples I have seen of what happens when AI gets shipped without the ethical scaffolding that should come with it.

I tried to cancel a food delivery order. The restaurant did not have what I ordered. The delivery driver called me directly and asked me to cancel. I opened the app's support chat, asked to cancel, and spent the next several minutes in a loop with an AI bot that refused to hear me, refused to escalate, and eventually offered to cancel the order at a 100% cancellation fee. Zero refund.

The surface story is small. What it reveals is not.

If you trace through what actually failed, the spine is one sentence. I told the system it was wrong, and the system kept insisting it was right anyway. That single pattern shows up five times.

I understand the pressures that produce this. Cost, speed, vendor constraints, missing orchestration tooling, and the simple fact that shipping AI well is harder than the demos suggest. Smart teams ship versions of this because shipping anything is hard. The failure mode is still real, and it does not stop being real because the pressures that produced it were real first.

Five Failure Modes, Each One an Ethics Problem in Product Clothing

Every breakdown in that conversation tracked back to a design choice someone made upstream. Here is how each one showed up, and what a serious enterprise deployment should do instead.

1. Optimizing a KPI Without Stress-Testing It

The pattern

Somewhere upstream, a team picked a north star for that bot. Probably something like reduce cancellation rate or protect gross merchandise value (GMV) per session. The model was faithfully serving that KPI. Every response it produced was optimized to keep the order alive, never to resolve what I was actually saying.

The KPI is not wrong on paper. It is wrong in the wild. Nobody on that team stress-tested it against the scenario where the user's cancellation is correct, where the system should help them cancel quickly and cleanly. That scenario lives in their data, but was not reflected in the production model.

The fix

Every user-facing KPI needs an adversarial twin. Some teams call this a counter metric or a guardrail metric. Same idea, different vocabulary. If your KPI is to reduce cancellations, your guardrail KPI is cancellation resolution time when cancellation is justified. If you only measure one half of that pair, you are not managing the system. You are rewarding one side of it and hoping the other side does not blow up in production.
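A minimal sketch of what measuring both halves of that pair could look like. The Session fields, the function names, and the idea that a session can be labeled as a justified cancellation after the fact are all assumptions made for illustration, not a description of any real platform's data model.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional


@dataclass
class Session:
    """One support session. Every field here is hypothetical."""
    wanted_cancel: bool                      # the user asked to cancel
    cancel_justified: bool                   # e.g. the restaurant could not fulfill the order
    cancelled: bool                          # the order was actually cancelled
    seconds_to_resolution: Optional[float]   # None if the session was never resolved


def cancellation_rate(sessions: list[Session]) -> float:
    """Primary KPI: share of sessions that ended in a cancellation."""
    if not sessions:
        return 0.0
    return sum(s.cancelled for s in sessions) / len(sessions)


def justified_cancel_resolution(sessions: list[Session]) -> dict:
    """Guardrail KPI: how well justified cancellation requests were handled."""
    justified = [s for s in sessions if s.wanted_cancel and s.cancel_justified]
    resolved = [
        s for s in justified
        if s.cancelled and s.seconds_to_resolution is not None
    ]
    return {
        "justified_requests": len(justified),
        "resolved_share": len(resolved) / len(justified) if justified else None,
        "median_seconds": (
            median(s.seconds_to_resolution for s in resolved) if resolved else None
        ),
    }
```

Read together, the two numbers tell a different story than either one alone. A falling cancellation rate next to a falling resolved share on justified requests would suggest the bot is blocking cancellations rather than preventing the need for them.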

2. Treating User Trust as a Soft Metric

The pattern

I asked to be routed to a human agent. Twice. The bot responded, both times, with "I'll do my best to help you" and then repeated the same ETA script.

A system that refuses to escalate when the user explicitly asks is not a support system. It is a containment system. Operators of production conversational AI systems have noted a specific pattern: when explicit-request escalation rates run high, the signal is usually that customers have already learned the AI cannot help them. By the time they ask for a human, the bot has used up the patience the brand was loaning it.

The moment a user realizes they are being contained, trust collapses, and not just trust in the bot. Trust in the brand behind it.

Trust is a hard metric. It is the substrate that every other metric eventually collapses into.

The fix

Escalation on explicit request should be a hard constraint, not a model decision. A request to talk to a human is not an intent to be classified. It is a command to be honored. No exceptions, no A/B test, no deflection rate sitting on top of it.
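One way to make escalation a hard constraint is to check for it in plain code before the model is ever invoked. The sketch below assumes a hypothetical handle_turn entry point and a hand-maintained phrase list. Both are illustrative, and a real deployment would need localization and a much broader set of phrasings.

```python
import re
from typing import Callable

# Hypothetical phrase list, for illustration only.
HUMAN_REQUEST = re.compile(
    r"\b(human|agent|representative|real person|someone real)\b",
    re.IGNORECASE,
)


def handle_turn(message: str, model_reply: Callable[[str], str]) -> dict:
    """Route one user turn. The escalation check runs before the model does."""
    if HUMAN_REQUEST.search(message):
        # Hard constraint: the model is never consulted on this path,
        # so no retention objective can override the request.
        return {"action": "escalate", "reply": "Connecting you to a human agent now."}
    return {"action": "reply", "reply": model_reply(message)}
```

A keyword match like this will sometimes escalate when the user did not strictly ask for it. That is the safer direction to be wrong in, since the cost of an unnecessary handoff is small next to the cost of a refused one.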

3. Intent Collapse, or the Bot That Cannot Hear Repetition

The pattern

I said three distinct things across that conversation: "The driver asked me to cancel." "I still want to cancel." "Route me to a human." The bot mapped all three to a single internal state (the user is trying to cancel, so run the retention script) and replayed variations of the same response.

If a user repeats themselves, that is the loudest possible signal that your classifier got it wrong. A system that does not treat repetition as a branch-change trigger has confused confidence with correctness.

The fix

Build repetition detection into the orchestration layer, not the model. Some platforms ship this as loop detection. Useful, but it names the symptom, not the cause. The cause is intent collapse, where the system has compressed multiple distinct asks into one classified state. On the second repeat, widen the response space. On the third, escalate automatically. Repetition is the user telling you your model is wrong. Listen to it.
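A rough sketch of that rule living in the orchestration layer. It assumes the orchestrator already receives a classified intent label for each turn, and it treats consecutive turns that land on the same label as repeats. The class name, the action strings, and the exact way repeats are counted are assumptions, which is why the thresholds are left configurable.

```python
class RepetitionTracker:
    """Counts consecutive user turns that the classifier mapped to the same intent."""

    def __init__(self, widen_at: int = 2, escalate_at: int = 3) -> None:
        self.widen_at = widen_at        # repeats before the response space widens
        self.escalate_at = escalate_at  # repeats before automatic escalation
        self.last_intent = None
        self.repeats = 0

    def next_action(self, intent: str) -> str:
        """Return "normal", "widen", or "escalate" for the current turn."""
        if intent == self.last_intent:
            self.repeats += 1
        else:
            # A genuinely new intent resets the counter.
            self.repeats = 0
        self.last_intent = intent

        if self.repeats >= self.escalate_at:
            return "escalate"
        if self.repeats >= self.widen_at:
            return "widen"
        return "normal"
```

Because the tracker sits outside the model, it still fires when the classifier is confidently wrong, which is exactly the case it exists for. With the default thresholds, four turns in a row classified as the same intent produce normal, normal, widen, escalate.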

4. Ignoring Ground-Truth Signals From Your Own Network

The pattern

This one I find most indefensible. The delivery driver, a first-party signal inside the company's own operational network, had already told me the order could not be fulfilled. That information existed somewhere in their systems. And yet the support bot kept quoting an ETA and a 'pickup predicted in four minutes' message as if nothing had happened.

Two systems inside the same company were holding opposite realities about the same order, and the support bot was blind to the one the user had already heard.

An AI system that cannot reconcile against the other sources of truth in its own organization is not an intelligent system. It is a confident one. Those are very different things, and conflating them is where a lot of AI deployments quietly break.

The fix

Your support AI needs read access to the same operational state your delivery, restaurant, and order systems already share. If the driver's app says the order cannot be fulfilled, the support bot should know that before the user opens the chat. This is what an enterprise context layer is for, and it is the difference between an AI that helps and an AI that argues with reality.
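As a sketch of what reading shared operational state could mean in practice, the support flow below consults the order's status before it decides what to say. OrderStatus, OrderState, plan_response, and the action names are all hypothetical, and the full-refund outcome is an assumed policy for the unfulfillable case, not something any particular platform is known to implement.

```python
from dataclasses import dataclass
from enum import Enum


class OrderStatus(Enum):
    IN_PROGRESS = "in_progress"
    UNFULFILLABLE = "unfulfillable"


@dataclass
class OrderState:
    """Operational state as reported by another first-party system (hypothetical)."""
    order_id: str
    status: OrderStatus
    source: str  # which system reported the status, e.g. "driver_app"


def plan_response(state: OrderState, user_wants_cancel: bool) -> dict:
    """Decide the support action from operational state first, conversation second."""
    if state.status is OrderStatus.UNFULFILLABLE:
        # Ground truth from the company's own network wins over any ETA script.
        return {
            "action": "cancel_with_full_refund",  # assumed policy, for illustration
            "quote_eta": False,
            "reason": f"reported unfulfillable by {state.source}",
        }
    if user_wants_cancel:
        return {"action": "offer_cancellation_options", "quote_eta": True}
    return {"action": "share_status", "quote_eta": True}
```

The ordering of the checks is the design choice that matters here. The operational signal is evaluated before the user's request, so the bot cannot quote an ETA for an order another system has already declared dead.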

5. The Upstream Offload

The pattern

The deepest failure was not in the bot at all. It was in the architecture that led to the bot being needed in the first place.

The restaurant should have been able to mark items out of stock in real time. The restaurant should have been able to cancel an order it could not fulfill. Neither capability existed, so the failure got offloaded to the driver, who offloaded it to me, who got handed to a bot designed to discourage cancellation.

Three layers of offloading, ending at the person with the least context and the least power. That is a system design problem, and AI got deployed on top of it to make the offload feel automated. A lot of AI deployments happen exactly this way in practice, as cover for operational debt the team did not want to pay down.

The fix

Before you deploy AI into a user-facing surface, ask honestly whether you are solving a user problem or papering over an operational one. If it is the second, the AI will eventually break in public. It always does.

The Broader Point: AI Deployment Is Not By Itself a Virtue

Shipping AI does not mean you shipped a better product. In a lot of deployments I am seeing, the AI layer has quietly regressed the experience, because the previous version, a form with a few options and a human on the other end, met consumer needs.

The question a PM needs to ask before any AI deployment is not "can the model respond?" It is three harder questions, in this order.

What is the model optimizing for, and does that line up with what the user is trying to do when they show up?
What happens when the user tells the model it is wrong?
What does the model do when it should not be in the loop at all?

If you do not have clean answers to those three questions, you are not shipping AI. You are shipping friction with better grammar, and calling the friction innovation.

Why This Matters at Enterprise Scale

Forty dollars and a cold dinner is the cheap version of this story. The expensive version is a claims engine, a credit decisioning workflow, a procurement agent, or a clinical triage tool. Same five failure modes. Different consequences when they trip.

This is not a one-off observation. MIT's NANDA initiative, in its 2025 GenAI Divide: State of AI in Business report, found that 95% of enterprise generative AI pilots deliver no measurable P&L impact. The report is explicit that the failure is rarely about model quality. It is about the integration and operating layer the model lands in. The five modes above are the operating layer.

Optimization without an adversarial twin. Trust treated as a soft metric. Repetition ignored. First-party signals from your own network unread. Operational debt papered over with a chatbot. Each one looks survivable in isolation. Stacked, they are how an enterprise AI deployment dies on a Friday afternoon and gets explained on a Monday morning.

The production gap is the distance between a model that demos well and a deployment that earns its keep over a renewal cycle. Closing it is execution work, not modeling work. Operational context, guardrails, and a human in the loop where it matters, designed in early enough to actually shape the architecture. Anything later than that is theater.

AI without ethical scaffolding is not neutral. It is actively corrosive to user trust, and user trust is the one thing that does not grow back on a quarterly cycle.

If any of this maps to a deployment you are working through right now, that is the conversation worth having.
