Watch Out for the AI Agent Hype. Building Them Well Is Hard — Ask Someone Who's Actually Done It.

Soundtrack: I Still Haven’t Found What I’m Looking For — U2

I’ve had the privilege (and the burden!) of working on several agentic AI projects for months now, and the enthusiasm I see online for this technology strikes me as wildly overblown.

Given how hot the buzzword is, I’m sure everyone and their dog is trying to get an AI project on their résumé. What I’m far less convinced of is that most of the AI agents (or things claiming to be agents) being shipped these days can actually do what their authors promise, let alone operate with any genuine autonomy.

So, since I’ve built a few agents that actually work (and I’m not saying that to brag), in this #NerdPost I’ll share what I’ve learned building them, and why I remain deeply unimpressed by most of what I hear people say about AI agents.

Let me be clear: I believe in this technology and its potential, fiercely. What I don’t believe is that it’s as simple, reliable, and cheap as the hype merchants want you to think.

What I do believe, based on experience, is that making an AI agent work well requires a disproportionate amount of human effort: deciding what the agent will do, how it will do it, within what constraints, connected to what systems, and with what actions available. Answering those questions, in turn, requires the organisation to understand clearly the processes being automated, and to have already built at least the primitive functions that the automation hooks into. Without that foundation, what exactly is the agent supposed to automate?

But enough preamble. If after this TLDR you’re still hungry to find out how deep the rabbit hole goes, keep reading. Otherwise, stop here — and tomorrow you’re free to believe whatever you like, including whatever the self-appointed “AI expert” of the week is selling.

Why oh why, didn’t you choose the blue pill?

A concrete case: the customer journey of Peppino ‘u Suricillo

Let’s start by defining a customer journey.

A customer — let’s call him Peppino ‘u Suricillo — calls customer care and says he wants to replace a defective product he bought online 25 days ago, delivered three days later. The order has multiple line items: most SKUs arrived in perfect condition, but of the one problematic product, he ordered three units and only one arrived damaged. The damage: mould inside the primary packaging. He used the other two units in the days after purchase. Today he opened the third one and found the surprise.

This journey looks straightforward at first glance, but even an experienced human customer service agent can run into trouble here. And for an LLM-based bot? Let’s break it down step by step — the way a human would — and think carefully about what the AI agent needs to do and how.

Identifying the customer

This is the first thing that needs to happen, and it immediately surfaces the first problem: “Peppino ‘u Suricillo” is not a recognised user in the system. Maybe he gave a nickname, maybe there’s a typo somewhere. Either way, if the system can’t identify him, it will need to ask for an email address or an order number, or else escalate to a human agent.

For a bot, doing this means having access to a customer database, an order database (properly structured with headers and line items), and an agent registry — all connected to the LLM engine. Those databases obviously need to be up to date, and for the system to function, it needs at least four working primitives:

find_customer_by_name
find_customer_by_email
find_customer_by_order_number
escalate_to_human_agent

You also need logic telling the system that if one lookup method fails, it should try the others — and if all fail, hand off to a human. That logic can live in a prompt, but it’s fundamentally programming logic, and it needs to be hardened against every erroneous input a customer might throw at it. I won’t drag you into the depths of input validation, but in the context of LLMs this is a real prompt injection risk — the same way traditional systems have SQL injection vulnerabilities. The logic is the same, the danger is the same, and you need to be aware of it.
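The fallback logic described above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the four primitive names come from the list above, while the in-memory data, the record shape, and the `identify_customer` wrapper are assumptions made up for the example.

```python
from typing import Optional

# Hypothetical in-memory stand-ins for the real customer and order databases.
_CUSTOMERS = [{"id": 1, "name": "Giuseppe Esposito", "email": "peppino@example.com"}]
_ORDERS = {"ORD-1001": 1}  # order number -> customer id


def find_customer_by_name(name: str) -> Optional[dict]:
    wanted = name.strip().lower()
    return next((c for c in _CUSTOMERS if c["name"].lower() == wanted), None)


def find_customer_by_email(email: str) -> Optional[dict]:
    wanted = email.strip().lower()
    return next((c for c in _CUSTOMERS if c["email"] == wanted), None)


def find_customer_by_order_number(order_number: str) -> Optional[dict]:
    customer_id = _ORDERS.get(order_number.strip().upper())
    return next((c for c in _CUSTOMERS if c["id"] == customer_id), None)


def escalate_to_human_agent(reason: str) -> dict:
    return {"status": "escalated", "reason": reason}


def identify_customer(name=None, email=None, order_number=None) -> dict:
    """Try each lookup in turn; hand off to a human if none succeeds."""
    attempts = [
        (find_customer_by_name, name),
        (find_customer_by_email, email),
        (find_customer_by_order_number, order_number),
    ]
    for lookup, value in attempts:
        if not value:
            continue  # the customer has not provided this identifier yet
        customer = lookup(value)
        if customer is not None:
            return {"status": "identified", "customer": customer, "via": lookup.__name__}
    return escalate_to_human_agent("no lookup matched")


# The nickname alone matches nothing, so the call is escalated.
print(identify_customer(name="Peppino 'u Suricillo")["status"])
# With an order number as well, the third lookup succeeds.
print(identify_customer(name="Peppino 'u Suricillo", order_number="ord-1001")["via"])
```

Keeping this chain in ordinary code rather than in the prompt means the order of attempts and the escalation rule are deterministic and testable; the LLM only has to extract the identifiers from what the customer says.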

In this scenario alone, you’ve already run anywhere from 1 prompt (best case) to 4 (worst case). Every prompt has a token cost, however small.

The full flow: how many APIs do you actually need?

Once the customer is identified, the agent needs to:

Retrieve the customer profile and order details (getCustomerByPhoneNumber and getOrderDetails)
Determine whether a replacement is eligible, applying business rules that vary by product and time elapsed (checkReturnEligibility)
If everything checks out: create the return request (createReturnRequest), generate the return label (generateReturnLabel), and schedule a courier pickup (schedulePickup)
Trigger shipment of the replacement product (createReplacementOrder + generateShippingLabel)
Update the order status in the relevant systems (updateOrderStatus) and notify the customer clearly (sendCustomerNotification)

That’s 8–10 distinct API calls, each with its own potential errors and edge cases. If even one of them fails — a courier doesn’t confirm the pickup, a label doesn’t generate — the entire flow stalls.
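To make the "one failure stalls everything" point concrete, here is a rough Python sketch of a sequential flow runner. The step names are taken from the list above; the runner itself, the `FlowStalled` exception, the retry count, and the stubbed step bodies are all assumptions for illustration.

```python
import time
from typing import Any, Callable, Dict, List, Tuple

Step = Tuple[str, Callable[[Dict[str, Any]], Any]]


class FlowStalled(Exception):
    """Raised when a step keeps failing; records how far the flow got."""

    def __init__(self, step: str, completed: List[str], cause: Exception):
        super().__init__(f"{step} failed after retries: {cause}")
        self.step = step
        self.completed = completed


def run_flow(steps: List[Step], context: Dict[str, Any],
             max_attempts: int = 2, delay_s: float = 0.0) -> List[str]:
    """Run steps in order, retrying each one up to max_attempts times."""
    completed: List[str] = []
    for name, call in steps:
        for attempt in range(1, max_attempts + 1):
            try:
                context[name] = call(context)  # keep each result for later steps
                completed.append(name)
                break
            except Exception as exc:
                if attempt == max_attempts:
                    raise FlowStalled(name, completed, exc) from exc
                time.sleep(delay_s)  # simple fixed pause before retrying
    return completed


def courier_down(ctx: Dict[str, Any]) -> None:
    # Stub that simulates the courier never confirming the pickup.
    raise TimeoutError("courier did not confirm the pickup")


steps: List[Step] = [
    ("createReturnRequest", lambda ctx: "RET-1"),
    ("generateReturnLabel", lambda ctx: f"label-for-{ctx['createReturnRequest']}"),
    ("schedulePickup", courier_down),
    ("createReplacementOrder", lambda ctx: "ORD-2"),
]

try:
    run_flow(steps, {})
except FlowStalled as stalled:
    # The replacement order is never created: the flow stopped at the pickup.
    print(stalled.step, stalled.completed)
```

Even in this toy version, the runner has to know which steps already succeeded, because whoever picks up the stalled case (a human or a later retry) needs that state to avoid creating a second return request.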

With a 95% success rate per step, and assuming each step fails independently of the others, a 10-step process has an overall success rate of roughly 60% (0.95 multiplied by itself ten times). For a customer waiting on a product replacement, that’s not acceptable.
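The arithmetic is easy to check. The short Python sketch below assumes the steps fail independently (a simplification; real failures are often correlated), and the function names are made up for the example.

```python
def flow_success_rate(per_step_rate: float, steps: int) -> float:
    """Probability that every step succeeds, assuming independent steps."""
    return per_step_rate ** steps


def required_per_step_rate(target_rate: float, steps: int) -> float:
    """Per-step reliability needed to reach a target end-to-end rate."""
    return target_rate ** (1 / steps)


print(f"{flow_success_rate(0.95, 10):.1%}")       # 59.9%
print(f"{flow_success_rate(0.99, 10):.1%}")       # 90.4%
print(f"{required_per_step_rate(0.95, 10):.1%}")  # 99.5%
```

Read the last line the other way round: to get a ten-step flow right 95% of the time end to end, each individual step has to succeed about 99.5% of the time.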

The lesson I keep learning

Over the past few months I’ve had the privilege — and the hard work — of being involved in several projects built around complex AI agents. I’m thinking of Virtual CS Beauty, the customer care agent we’re developing for the beauty industry, and VirtualCami, our experiment with an intelligent assistant capable of integrating data, protocols, and business logic.

The longer I work on these, the more convinced I become that AI agents are not magic. They only work when they’re built on a solid foundation of primitive functions — those granular, reliable APIs that allow complex flows to be orchestrated without crashing every time something deviates from the happy path.

It’s a lesson I keep relearning: you can’t just “connect GPT to a CRM” and call it intelligent customer care or a business assistant. You need serious engineering, robust tooling, and an ecosystem that can handle errors, rollbacks, and partial failures.
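As one illustration of what "handling rollbacks and partial failures" can mean in practice, here is a minimal Python sketch of compensating actions: each step may register an undo operation, and a failure undoes the completed steps in reverse order. The step names come from the flow described earlier; the runner and the undo functions (`cancel_return`, `void_label`) are hypothetical, and a real system would also need to cope with compensations that fail themselves.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Action = Callable[[Dict[str, Any]], None]
Step = Tuple[str, Action, Optional[Action]]  # (name, action, compensation or None)


def run_with_rollback(steps: List[Step], context: Dict[str, Any]) -> Dict[str, Any]:
    """Run steps in order; on failure, undo completed steps in reverse order."""
    undo_stack: List[Tuple[str, Action]] = []
    for name, action, compensate in steps:
        try:
            action(context)
        except Exception as exc:
            undone: List[str] = []
            for done_name, undo in reversed(undo_stack):
                undo(context)
                undone.append(done_name)
            return {"status": "rolled_back", "failed_step": name,
                    "error": str(exc), "undone": undone}
        if compensate is not None:
            undo_stack.append((name, compensate))
    return {"status": "completed", "failed_step": None, "error": None, "undone": []}


# Stubbed steps standing in for real API calls.
def create_return(ctx): ctx["return_id"] = "RET-1"
def cancel_return(ctx): ctx.pop("return_id", None)
def make_label(ctx): ctx["label"] = f"label-{ctx['return_id']}"
def void_label(ctx): ctx.pop("label", None)
def schedule_pickup(ctx): raise RuntimeError("no courier slot available")


demo_steps: List[Step] = [
    ("createReturnRequest", create_return, cancel_return),
    ("generateReturnLabel", make_label, void_label),
    ("schedulePickup", schedule_pickup, None),
]

context: Dict[str, Any] = {}
result = run_with_rollback(demo_steps, context)
print(result["failed_step"], result["undone"], context)
```

None of this is specific to LLMs, which is rather the point: the agent sits on top of this plumbing, and if the plumbing does not exist, the model has nothing reliable to orchestrate.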

Enthusiasm isn’t enough. You need to do the work.
