Reading time: 7 minutes; Soundtrack: Explorers by Muse.
Last week I overdosed on Artificial Intelligence: between the AIweek, the AIRoadshow, and the main 4eCom event, I came home with more than enough to chew on.
The first topic was sustainability. Across every conversation I joined, one conviction held firm: AI is not green and never will be. Really?
That certainty — so solid, so unexamined — gave me the kind of nerd itch that can only be scratched by going to look at the actual numbers. The data, I’ll admit, is not abundant. It’s hard to find, sometimes contradictory, and often built on assumptions that deserve scrutiny. And yet it’s enough — even for a back-of-the-envelope reckoning — and it tells a surprisingly counterintuitive story.
Most people know that training Large Language Models requires a lot of energy. Far fewer know how much energy the inference phase consumes — that is, what happens when we actually interact with the model. And fewer still know that the energy figure needs to be converted into a carbon footprint, and that this value depends entirely on the energy mix used to produce that electricity. Which makes sense, right? Whether I train and run a model on solar power or burn “wonderfully ecological” anthracite coal, the amount of electricity consumed doesn’t change — but the carbon emissions do, dramatically.
Let’s start building a picture.
Phase 1. What does training the model actually cost?
Take the current default model on OpenAI: ChatGPT-4o. No official figures exist, but researchers have done the work of estimating how much energy and CO₂ were required to train it, combining information that has leaked from OpenAI, the technical specs of the servers used, and a set of working assumptions.
The full methodology is here: https://medium.com/data-science/the-carbon-footprint-of-gpt-4-d6c676eb21ae. In short, the pessimistic estimate lands at 62,318,750 kWh and 14,994 metric tonnes of CO₂, calculated assuming the electricity was generated using the current energy mix of data centres hosted in the western United States. By contrast, had the same amount of energy been produced by data centres in eastern Canada, the footprint could have been ten times lower.
We’ll use the worst-case figure anyway, for the sake of argument. Let’s proceed.
What does sixty-two million kilowatt-hours actually look like? Let’s convert it into something more tangible: how many Italian households consume that much energy in a year?
Electricity consumption per capita in Italy is 4,928 kWh/year. ISTAT tells us that the average Italian household has roughly 2.29 people. That puts the average Italian household at about 11,285 kWh per year, so sixty-two million kilowatt-hours corresponds to roughly the annual consumption of 5,500 Italian households: approximately the number of families living between Caulonia and Roccella Jonica, the small Calabrian towns on the Ionian coast where I was lucky enough to grow up.
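For anyone who wants to check the household conversion, here is a minimal Python sketch. It uses only the three figures quoted above; the function and variable names are illustrative.

```python
# Figures quoted in the text above.
TRAINING_ENERGY_KWH = 62_318_750    # pessimistic training estimate
KWH_PER_CAPITA_PER_YEAR = 4_928     # Italian per-capita electricity consumption
PEOPLE_PER_HOUSEHOLD = 2.29         # average household size (ISTAT)


def household_equivalents(energy_kwh, kwh_per_capita, people_per_household):
    """Return how many households consume `energy_kwh` in one year."""
    kwh_per_household = kwh_per_capita * people_per_household
    return energy_kwh / kwh_per_household


kwh_per_household = KWH_PER_CAPITA_PER_YEAR * PEOPLE_PER_HOUSEHOLD
households = household_equivalents(
    TRAINING_ENERGY_KWH, KWH_PER_CAPITA_PER_YEAR, PEOPLE_PER_HOUSEHOLD
)

print(f"Per household: {kwh_per_household:,.0f} kWh/year")
print(f"Household equivalents: {households:,.0f}")
```

The result lands a little above 5,500 households, in line with the rounded figure used in the text.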
Framed as the annual electricity draw of two small villages on the southern Calabrian coast, sixty-two million kilowatt-hours doesn’t sound quite so apocalyptic.
That said, electricity consumption is not the same as carbon footprint — it depends on how the energy was produced. Sticking with the worst case, 14,994 metric tonnes of CO₂: if we wanted to offset that entirely, and knowing that 6 trees absorb roughly one tonne of CO₂ per year, we’d need around 90,000 trees working for a full year to sequester the equivalent carbon.
Since a natural forest has a density of roughly 150 trees per hectare, those 90,000 trees would cover about 600 hectares — 6 square kilometres. A mature forest slightly smaller than the area enclosed by Milan’s historic city walls would, in a single year, absorb the (pessimistic) carbon footprint of generating the sixty-two million kilowatt-hours needed to train the model.
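The tree and forest arithmetic can be reproduced in a few lines of Python. This is a back-of-the-envelope sketch that takes the absorption rate (6 trees per tonne per year) and the forest density (150 trees per hectare) exactly as stated above; the names are illustrative.

```python
# Figures quoted in the text above.
TRAINING_CO2_TONNES = 14_994        # pessimistic training footprint
TREES_PER_TONNE_PER_YEAR = 6        # trees needed to absorb one tonne in a year
TREES_PER_HECTARE = 150             # density of a natural forest
HECTARES_PER_KM2 = 100


def offset_forest(co2_tonnes):
    """Return (trees, area_km2) needed to absorb `co2_tonnes` in one year."""
    trees = co2_tonnes * TREES_PER_TONNE_PER_YEAR
    hectares = trees / TREES_PER_HECTARE
    return trees, hectares / HECTARES_PER_KM2


trees, area_km2 = offset_forest(TRAINING_CO2_TONNES)
print(f"Trees needed for one year: {trees:,}")
print(f"Forest area: {area_km2:.1f} km^2")
```

Unrounded, that is 89,964 trees on just under 6 square kilometres, which the text rounds to 90,000 trees and 600 hectares.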
That’s significant — but it’s not stratospheric. And in any case, it’s a one-shot cost. Once trained, the model is ready for inference. Which brings us to the second phase.
Phase 2. What does using the model actually cost?
Once trained and deployed, the model starts answering our questions, writing our copy, generating our images, and so on. To do this, it runs inside data centres purpose-built for the task, which consume energy to operate — as does transporting data to and from our devices.
Recent estimates suggest that 14 billion searches are performed on Google every day. It’s also estimated that those same queries, if run on ChatGPT-4o, could consume somewhere between 0.3 Wh and 3 Wh per query.
Again, let’s take the pessimistic figure — 3 Wh — and do the maths. Take this article you’re reading right now: I used AI at several points during the writing process. Between brainstorming prompts, grammar checks, and image generation, I’d estimate I used about 20 prompts. At the pessimistic rate of 3 Wh per query, that’s roughly 60 Wh consumed.
That’s enough energy to keep a 10W LED bulb on for about six hours. You probably have two or three bulbs per room in your home. Not exactly a scandal. But the more striking point is this: those queries consumed no more electricity than my own body would have used to perform the same tasks.
The human body at rest dissipates roughly 100 W, that is, about 100 Wh every hour, and I spent about 5 hours writing this post. That means my body consumed 500 Wh during that time, while the AI queries I ran consumed just 60 Wh. Without the machine’s help, I would have done everything by hand and needed another 4–5 hours to research the article’s outline, brainstorm, proofread for grammatical errors, find and download images from an online service, and so on.
In practice, had I not used AI, I would have spent 9 to 10 hours on the article and consumed roughly 900 to 1,000 Wh, not 560. And I would have had to run Google searches anyway: additional energy consumption on top. The only difference is that instead of drawing that energy from the electrical grid, I would have absorbed it as food, which still moves carbon through production, distribution, consumption, and waste.
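Here is the same comparison as a short Python sketch. It treats the 100 figure as a resting power of 100 W (100 Wh per hour of work) and uses the article's own estimates for prompts and hours; the 4 to 5 extra hours without AI give a range of 900 to 1,000 Wh. All names are illustrative.

```python
# Estimates taken from the text above.
PROMPTS = 20                      # prompts used while writing the article
WH_PER_PROMPT = 3                 # pessimistic energy per query
LED_POWER_W = 10                  # LED bulb used for comparison
BODY_POWER_W = 100                # human body at rest, about 100 Wh per hour
HOURS_WITH_AI = 5                 # time actually spent writing
EXTRA_HOURS_WITHOUT_AI = (4, 5)   # estimated extra hours without AI

ai_energy_wh = PROMPTS * WH_PER_PROMPT             # energy used by the queries
led_hours = ai_energy_wh / LED_POWER_W             # hours of LED light for the same energy
with_ai_wh = BODY_POWER_W * HOURS_WITH_AI + ai_energy_wh
without_ai_wh = [
    BODY_POWER_W * (HOURS_WITH_AI + extra) for extra in EXTRA_HOURS_WITHOUT_AI
]

print(f"AI queries: {ai_energy_wh} Wh (a 10 W LED for {led_hours:.0f} hours)")
print(f"With AI: {with_ai_wh} Wh")
print(f"Without AI: {without_ai_wh[0]} to {without_ai_wh[1]} Wh")
```

The sketch ignores the Google searches that the manual route would still need, so the "without AI" range is, if anything, on the low side.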
In other words, it appears that AI has actually made my work less ecologically impactful, not more. But let’s push further. Let’s assume that all 14 billion daily searches currently happening on Google were instead happening on ChatGPT-4o. We already know the pessimistic inference cost is 3 Wh per query. A simple multiplication gives us:
If all 14 billion daily Google queries were run on ChatGPT-4o, the draw would be 42,000 MWh per day, or 15,330,000 MWh per year, equivalent to roughly 3.7 million metric tonnes of CO₂ annually.
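The multiplication is easy to verify with a short Python sketch: 14 billion queries at 3 Wh each is 42 billion Wh, or 42,000 MWh per day, which is the daily figure consistent with the 15,330,000 MWh annual total. The text does not say how the CO₂ figure was derived; the sketch assumes the carbon intensity implied by the pessimistic training estimate (14,994 tonnes for 62,318,750 kWh), which reproduces roughly 3.7 million tonnes. That intensity and all names are assumptions for illustration.

```python
# Figures quoted in the text.
QUERIES_PER_DAY = 14_000_000_000   # estimated daily Google searches
WH_PER_QUERY = 3                   # pessimistic energy per query
TRAINING_ENERGY_MWH = 62_318.75    # 62,318,750 kWh expressed in MWh
TRAINING_CO2_TONNES = 14_994       # pessimistic training footprint

WH_PER_MWH = 1_000_000
DAYS_PER_YEAR = 365

daily_mwh = QUERIES_PER_DAY * WH_PER_QUERY / WH_PER_MWH
annual_mwh = daily_mwh * DAYS_PER_YEAR

# Assumption: inference runs on the same energy mix as the training estimate.
tonnes_per_mwh = TRAINING_CO2_TONNES / TRAINING_ENERGY_MWH
annual_co2_tonnes = annual_mwh * tonnes_per_mwh

print(f"Daily draw: {daily_mwh:,.0f} MWh")
print(f"Annual draw: {annual_mwh:,.0f} MWh")
print(f"Annual CO2: {annual_co2_tonnes / 1e6:.2f} million tonnes")
```

Swapping in the optimistic 0.3 Wh per query, or a cleaner energy mix, scales the result down proportionally.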
First, notice the scale difference between training and inference:
- Training the model required 62 million kilowatt-hours (equivalent to 62,000 MWh, to keep the units consistent)
- Running inference for one year would require 15,330,000 MWh
Training: 62 thousand MWh. Inference at scale: 15.33 million MWh.
The real monster here is inference, not training. But as absurdly large as these numbers sound — are they actually unsustainable?
Let’s repeat the earlier exercise and calculate how many mature trees would be needed, over one year, to offset that volume of queries. If 90,000 trees were needed to offset the emissions from 62,000 MWh, then, at the same energy mix, 15,330,000 MWh needs 15,330,000 / 62,000 ≈ 247 times as many: 247 × 90,000 = 22.23 million trees. At the same density of 150 trees per hectare, that’s 148,200 hectares, or 1,482 square kilometres.
That sounds enormous, until you realise it’s just over the surface area of the municipality of Rome. That’s the forest you’d need to sequester a full year of every Google search being run on ChatGPT-4o instead, calculated using the most pessimistic assumptions possible, both for computation and for energy mix.
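The scaling step can be sketched in Python as follows. It assumes, as the text does, that inference runs on the same energy mix as training, so the number of trees scales linearly with energy. The sketch keeps the ratio unrounded, which is why it lands slightly above the 22.23 million trees and 1,482 square kilometres obtained with the ratio rounded to 247. Names are illustrative.

```python
# Rounded figures used in the text above.
TRAINING_MWH = 62_000              # training energy
TRAINING_TREES = 90_000            # trees offsetting the training footprint
INFERENCE_MWH = 15_330_000         # hypothetical annual inference energy
TREES_PER_HECTARE = 150
HECTARES_PER_KM2 = 100


def scaled_forest(energy_mwh):
    """Scale the training forest linearly with energy (same energy mix assumed)."""
    ratio = energy_mwh / TRAINING_MWH
    trees = ratio * TRAINING_TREES
    area_km2 = trees / TREES_PER_HECTARE / HECTARES_PER_KM2
    return ratio, trees, area_km2


ratio, trees, area_km2 = scaled_forest(INFERENCE_MWH)
print(f"Inference / training energy ratio: {ratio:.0f}")
print(f"Trees needed: {trees / 1e6:.2f} million")
print(f"Forest area: {area_km2:,.0f} km^2")
```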
These numbers should put the scale of the problem back into human perspective.
So what? What do we actually take from this?
First, we now have a concrete sense of how much LLMs consume during training and during use, and how those numbers compare to physical realities we can actually visualise: the size of a forest relative to a city, or the annual consumption of two small towns.
Which means we can say this: IT IS NOT NECESSARILY TRUE THAT AI WILL NEVER BE GREEN.
What it will take to get there:
- Optimising the energy mix to minimise CO₂ emissions during both training and inference. In my view, this is a job for national energy policy.
- Choosing carefully which tasks to hand to LLMs: prioritising prompts that genuinely reduce human hours, and avoiding those that don’t. AI is well-suited to brainstorming, surfacing signals from ticket queues or survey data, translation, summarisation, and similar tasks. At the other end of the spectrum: prompts generating cute guys posing as action figures in their original packaging are considerably less defensible.
- Optimising prompts to use as few tokens as possible, both in the request and in the response.
- Optimising the code that trains models and runs inference, the hardware doing the computation, the energy draw of the data centres hosting these infrastructures, and the efficiency of energy transport.
There are also a number of developments I expect to materialise in the near future, which will — for better or worse — increase cost awareness among users.
At some point, I expect someone to introduce a zero-carbon pricing tier for chatbot prompts. Right now, chat prompts are priced on a flat basis, while API token usage is already metered and billed.
Given the energy impact of inference at scale, it seems reasonable to expect that chat prompts will eventually be priced per token as well. If the LLM providers don’t get there first, some European regulator will — and we’ll probably end up with a carbon tax on AI queries.
As we move toward a global pay-per-token model, I also expect some kind of energy label to emerge — something like the efficiency ratings on washing machines and refrigerators.
“Hi Giuseppe, your query consumed 70.25 tokens. Your card will be charged €0.12.”
As costs rise, I expect that — at some point — it will make financial sense for many companies to bring some LLM inference functions in-house.
It’s already possible today to run LLMs “on the edge” (meaning locally, close to where the data is consumed in the network), inside specialised clusters that can be hosted in corporate data centres. These servers, depending on configuration, can draw several thousand watts and serve a few hundred concurrent LLM users throughout the day.
Now that we have a realistic sense of how much a single query actually consumes, it should be clear that this kind of edge deployment could easily be powered by rooftop solar panels on company premises — nothing that plenty of sharp Lombard entrepreneurs haven’t already been doing for years, and without needing AI to nudge them into it.
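To get a feel for what "powered by rooftop solar" means in practice, here is a rough Python sizing sketch. Every number in it is a hypothetical placeholder, not a figure from this article or from any vendor: a 3,000 W server (one reading of "several thousand watts") running around the clock, 400 W panels, and 3.5 equivalent full-sun hours per day. It also ignores storage, conversion losses, and night-time supply, so treat it as an order-of-magnitude check only.

```python
import math


def panels_needed(server_power_w, hours_per_day, panel_power_w, sun_hours_per_day):
    """Return the number of panels whose daily yield covers the server's daily draw."""
    server_daily_wh = server_power_w * hours_per_day
    panel_daily_wh = panel_power_w * sun_hours_per_day
    return math.ceil(server_daily_wh / panel_daily_wh)


# Hypothetical placeholder values, chosen only for illustration.
SERVER_POWER_W = 3_000       # assumed continuous draw of the edge cluster
HOURS_PER_DAY = 24           # assumed always-on operation
PANEL_POWER_W = 400          # assumed rated power of one rooftop panel
SUN_HOURS_PER_DAY = 3.5      # assumed equivalent full-sun hours per day

panels = panels_needed(
    SERVER_POWER_W, HOURS_PER_DAY, PANEL_POWER_W, SUN_HOURS_PER_DAY
)
print(f"Panels needed under these assumptions: {panels}")
```

Under these placeholder values the answer is a few dozen panels, which is the scale of a typical commercial rooftop installation rather than a dedicated power plant. Replace the inputs with real site data before drawing any conclusion.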
Closing the loop
After hours of calculations, I’ve reached the conclusion that artificial intelligence is not the energy-devouring monster that conventional wisdom would have us believe. Everything depends on how we use it, how we generate the energy to run it, and our ability to identify and capture efficiency gains across its value chain.
Now I want to hear from you:
- How much did you expect the energy draw for training and running an LLM to be?
- Have you already started thinking about this and doing your own analysis?
- Do you have a view on which AI use cases are environmentally “justifiable”?
- Have you already run the numbers to include AI in your corporate sustainability reporting — and do you want to share what you found?
Drop it in the comments. I’d like to open a real debate and reach a clearer, more pragmatic understanding of where we actually stand.