Not Just Prompts but Engrams: The Birth of Context Engineering

Reading time: 5 minutes; Soundtrack: Isolated System - Muse

Soft artificial light fell from a white ceiling. Francesco worked in a modern, understated open-plan office: grey walls, wide windows, touches of pastel green, the faint smell of lemon and ozone. On the curved monitor in front of him, his own face reflected back as he typed into the company Copilot. The code GPT had suggested looked correct — elegant, even — but it was blind to what had happened that afternoon in the legal meeting. Applied as-is, it would have invalidated the CRM’s privacy consent records. All the machine needed was access to the meeting notes to catch the mistake, but the GPT that had written the code wasn’t connected to the LLM that had summarised the key points. Copilots are powerful, but naive: they don’t understand context, Francesco thought to himself, as he corrected the code and committed the branch. Without that intervention, the bug would have gone straight to production. He saved the file, looked up at the clear sunset west of Milan, aware that he had once again made the difference the machines couldn’t. He allowed himself a sip of lukewarm coffee, squinted at the distant Monte Rosa, and got back to work.

If you’re anything like me — and given that you’re reading this, you probably are — not a day has gone by without you opening your GPT editor and trying to get an LLM to do something you would have considered impossible the week before.

On quieter days, you may have managed to carve out an hour from the relentless conference call schedule, pointed your browser at n8n, and tried to build a prompt chain or wire up an agent to take some repetitive task off your plate, with mixed results.

Over the past two years, we’ve poured our energy into crafting ever more refined prompts and leaning ever harder on LLMs. But the further we’ve gone down the prompt engineering rabbit hole, the more we’ve noticed that AI responses become progressively less coherent as task complexity increases. We know the models’ limits, and we’ve learned to work around them gracefully.

The underlying problem is that LLMs don’t carry much context beyond what we explicitly hand them (Excel files, text documents, photographs) or what they can pull in through connectors that retrieve information online. Retrieving such material and adding it to the prompt is known as RAG, Retrieval-Augmented Generation, and it exists to sharpen the machine’s awareness of what we’re actually after. It works. Just not always, and not for everything.
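To make the idea concrete, here is a minimal sketch of retrieval-augmented prompting. It assumes a toy word-overlap score in place of a real embedding model, and the names `retrieve`, `build_prompt` and `NOTES` are invented for illustration; production systems use vector search, not this.

```python
import re
from collections import Counter

def tokenize(text):
    # Lowercase words and numbers only; a deliberately crude tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())

def score(query, document):
    # Count how many query words also appear in the document.
    q, d = Counter(tokenize(query)), Counter(tokenize(document))
    return sum(min(q[word], d[word]) for word in q)

def retrieve(query, documents, k=2):
    # Return the k documents that overlap most with the query.
    ranked = sorted(documents, key=lambda doc: score(query, doc), reverse=True)
    return ranked[:k]

def build_prompt(query, documents, k=2):
    # Prepend the retrieved text so the model sees it as context.
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return f"Context:\n{context}\n\nQuestion: {query}"

# Hypothetical notes standing in for a company knowledge base.
NOTES = [
    "Legal meeting: privacy consent records in the CRM must not be overwritten.",
    "Lunch menu for Friday: risotto and salad.",
    "Sprint review moved to Thursday afternoon.",
]

question = "Can this code overwrite CRM privacy consent records?"
prompt = build_prompt(question, NOTES, k=1)
print(prompt)
```

The point of the sketch is the shape of the flow: pick the relevant notes first, then hand them to the model together with the question.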

All the data we provide, plus whatever the model retrieves, flows into the LLM’s “context window”: the maximum amount of text, measured in tokens, that the model can take into account at once at inference time, that is, when we ask it questions. This limit is fixed for each model: newer models tend to have larger windows, older ones smaller. As users, we cannot expand it.
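A rough way to picture the limit is as a budget that every input draws from. The sketch below assumes one token per whitespace-separated word, which real tokenizers do not follow exactly, and the window sizes are made-up numbers, not those of any actual model.

```python
def estimate_tokens(text):
    # Assumption: one token per word. Real tokenizers split text differently.
    return len(text.split())

def fits_in_window(parts, window_tokens, reserved_for_answer=0):
    # Sum the estimated cost of every input and leave room for the reply.
    used = sum(estimate_tokens(part) for part in parts)
    return used + reserved_for_answer <= window_tokens, used

parts = ["Translate these slides into English.", "Slide one text here."]
ok, used = fits_in_window(parts, window_tokens=16, reserved_for_answer=6)
print(ok, used)
```

Everything counts against the same budget: the instructions, the attached files, the retrieved material and the space the answer itself will need.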

When the machine can’t move beyond the information available to it — or when the context window fills up — it loses the ability to correctly interpret intent. It starts making things up. It hallucinates.

The point is that once you move past basic prompt engineering, anyone trying to use LLMs to drive real productivity gains in an organisation has to reckon with this constraint. Especially if you’re chaining multiple prompts together, or starting to exploit the agentic capabilities that have emerged in recent months on Perplexity and OpenAI.

I’ve run into context window exhaustion regularly over the past few months. A few examples:

Last month I asked OpenAI GPT to translate a PowerPoint from Italian to English while preserving the layout. The machine translated the text and completely destroyed the slide formatting. Only by switching to agent mode and specifying that the file should be processed slide by slide did it actually work. My best guess: the single prompt blew through the context window immediately, while the agentic execution broke the job into manageable chunks.
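If that guess is right, the fix amounts to splitting the work into batches that each fit a budget. This is only a sketch of that idea, not a description of what the agent does internally; `batch_slides` is a hypothetical helper and word counts stand in for tokens.

```python
def batch_slides(slides, max_tokens_per_batch):
    # Group consecutive slides so each batch stays within the budget.
    batches, current, used = [], [], 0
    for slide in slides:
        cost = len(slide.split())  # crude token estimate: one per word
        if current and used + cost > max_tokens_per_batch:
            batches.append(current)
            current, used = [], 0
        # A slide larger than the budget still gets a batch of its own.
        current.append(slide)
        used += cost
    if current:
        batches.append(current)
    return batches

slides = ["one two three", "four five", "six seven eight nine", "ten"]
batches = batch_slides(slides, max_tokens_per_batch=5)
print(batches)
```

Each batch can then be sent as a separate request, so no single call has to hold the whole deck.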

Last week I needed to verify a set of cart rules on a site I manage. I prepared an Excel matrix with the expected outcomes and asked an OpenAI agent to navigate the site, add products to the cart, and check the results against expectations. A more complex task than a translation: after roughly the twentieth check out of fifty, the machine stopped and declared it had exhausted its context window. Relaunching the script and spending a few extra credits was enough to finish the job. Not a disaster — but the problem doesn’t go away.
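One way to make a relaunch cheaper is to save progress after every check, so a second run picks up where the first one stopped. The sketch below assumes a JSON checkpoint file and uses a `budget` parameter as a stand-in for running out of context; the rule names and totals are invented.

```python
import json
import os
import tempfile

def run_checks(cases, check, checkpoint_path, budget):
    """Run up to `budget` pending checks, saving progress after each one."""
    done = {}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as fh:
            done = json.load(fh)  # results from earlier runs
    processed = 0
    for case_id, expected in cases.items():
        if case_id in done:
            continue  # already verified in a previous run
        if processed >= budget:
            break  # stand-in for "context window exhausted"
        done[case_id] = check(case_id) == expected
        processed += 1
        with open(checkpoint_path, "w") as fh:
            json.dump(done, fh)
    return done

# Hypothetical expected cart totals and a fake lookup replacing the real site.
EXPECTED = {"rule-1": 10, "rule-2": 20, "rule-3": 30}
OBSERVED = {"rule-1": 10, "rule-2": 20, "rule-3": 31}

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "progress.json")
    first_run = run_checks(EXPECTED, OBSERVED.get, path, budget=2)
    second_run = run_checks(EXPECTED, OBSERVED.get, path, budget=2)

print(first_run)
print(second_run)
```

The second run skips the two checks already on file and only spends effort on the one that was left.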

Someone also pointed out to me that even when the machine manages to contextualise the task itself, it often struggles to contextualise how long execution is actually taking.

Then there’s the question of context portability. What happens if your developers use Claude for coding but the Teams GPT for meeting minutes? If the two systems don’t talk to each other, how is Claude supposed to know that an integration issue surfaced during the meeting, if that information stays siloed inside Microsoft’s context model?

And further: what happens if you use OpenAI in your daily work, where your prompts are stored in its context memory, and one day you decide to migrate to Perplexity? Do you lose everything? This is a data portability problem that rhymes uncomfortably with the antitrust cases Microsoft faced over bundling Internet Explorer with Windows, which made it harder for rival browsers to reach users. Should we expect future battles over the right to own your context memory and move it freely?

Finally, who owns the “context” generated by the prompts an employee runs inside a corporate GPT? I’m not talking about copyright — that’s already being regulated — but about the context itself: the vector memory, the machine’s accumulated ability to recall information relevant to a task. When an employee leaves the company, does the context memory go with them? What’s the privacy framework? Is it analogous to a corporate email inbox?

A footnote: just yesterday, while trying to work through these questions, I discovered that the problem is already well-known enough that several companies have built SaaS products to address at least the technical dimensions of it — and profit from them in the meantime — by observing the queries fed into LLMs, capturing the responses, and storing everything in external vector databases: OpenMemory, BrainAPI, Pinecone, BasicMemory.
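The general pattern these products describe (record each prompt and response, store it outside the model, recall it later) can be sketched in a few lines. This is not the API of any of the tools named above: `ContextMemory` is a hypothetical class, and word counts with cosine similarity stand in for real embeddings and a real vector database.

```python
import json
import math
import re
from collections import Counter

def embed(text):
    # Toy "vector": word counts. Real systems use learned embeddings.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

class ContextMemory:
    def __init__(self):
        self.records = []

    def remember(self, prompt, response):
        # Capture one exchange with the model.
        self.records.append({"prompt": prompt, "response": response})

    def recall(self, query, k=1):
        # Return the k stored exchanges most similar to the query.
        q = embed(query)
        def similarity(record):
            return cosine(q, embed(record["prompt"] + " " + record["response"]))
        return sorted(self.records, key=similarity, reverse=True)[:k]

    def export_json(self):
        # Plain JSON, so the memory can leave the system that created it.
        return json.dumps(self.records)

    @classmethod
    def import_json(cls, payload):
        memory = cls()
        memory.records = json.loads(payload)
        return memory

memory = ContextMemory()
memory.remember("Summarise the legal meeting",
                "Consent records in the CRM must not be overwritten.")
memory.remember("Suggest a lunch place", "Try the trattoria near the office.")

# Move the memory to another (hypothetical) tool and query it there.
moved = ContextMemory.import_json(memory.export_json())
best = moved.recall("What did legal say about CRM consent?")[0]
print(best["prompt"])
```

The export and import step is the part that matters for portability: once the memory lives in an open format outside the provider, moving it becomes a question of policy, not of engineering.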

How they handle the privacy implications, however, remains entirely unclear. And I’m starting to suspect that organisations will increasingly want to move LLM processing power and context memory storage out of the cloud and into their own datacentres, to keep tighter control over the data.

At the end of the day, LLM agents have opened up extraordinary possibilities — but the problem no longer seems to be model capability. It’s context management: how it’s used, whether it can be moved, and who it belongs to.

So I’ll open the floor, and ask you:

Who should own context memory? Should it be an asset belonging to the individual, the company, or the technology provider?
Are you worried about fragmentation across LLM platforms — and the risk of being locked into ecosystems that don’t talk to each other?
How do you think privacy law and employment law will evolve when part of an employee’s value lies in the body of context they’ve built up through their prompts?

Everyone is talking about compute power. But I think it’s time to start a serious conversation about context sovereignty.

In an isolated system, entropy can only increase. What do you think?

Subscribe to the evil newsletter.