Enter your monthly token volume, pick a cloud model class and a local hardware shape, and get a real three-year total cost of ownership for both, plus the breakeven point where local starts to pay for itself, and the EU AI Act sovereignty trade-offs that cloud pricing alone never captures.
Defaults are May 2026 list pricing from Anthropic, OpenAI, and Google, converted at roughly USD 1 = EUR 0.93. Override every field with your real numbers. Not legal or procurement advice.
One million tokens is roughly 750,000 English words, or about 350 pages of dense text.
Pick a model class. EUR rates are May 2026 list pricing; override either field for your negotiated rate.
Source: anthropic.com/pricing
Pick a hardware shape, then tune the numbers. Runs 70B-class models comfortably. Needs a server room or colo.
At your current cloud rate of EUR 2.79 in / EUR 13.95 out per million tokens, local breaks even at roughly 422 million tokens per month. Below that, cloud wins on cost; above that, your local stack pays for itself.
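The breakeven arithmetic behind that figure can be sketched as follows. The 80/20 input/output mix and the EUR 2,270 monthly fixed cost for the local stack are illustrative assumptions, not the calculator's exact internals; a different token mix shifts the crossover point.

```python
def breakeven_mtok(local_fixed_eur, in_rate, out_rate, input_share):
    """Monthly volume (million tokens) where cloud spend equals local fixed cost."""
    blended = input_share * in_rate + (1 - input_share) * out_rate  # EUR per Mtok
    return local_fixed_eur / blended

# Assumed 80/20 input/output mix at the rates quoted above.
print(round(breakeven_mtok(2_270, 2.79, 13.95, 0.80)))  # breakeven volume in Mtok
```

A more output-heavy workload raises the blended rate and pulls the breakeven volume down; a cache-heavy, input-dominated workload pushes it up.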
Volume is only one axis. Latency, data residency, vendor lock-in, and Article 10 data governance all push the decision around.
If your AI system falls into Annex III high-risk, Article 10 of Regulation (EU) 2024/1689 requires that training, validation, and testing datasets are subject to data governance with explicit attention to data origin, processing operations, relevance, and bias correction. Sending personal data through a US-based cloud LLM under Standard Contractual Clauses requires a documented Transfer Impact Assessment; under the EU-US Data Privacy Framework, it requires careful verification of the provider's self-certification. Both add cost the calculator above does not show.
The deepest cost of cloud is not the per-token bill but the ongoing legal load. Local + EU-hosted is sometimes the cheapest answer once you add a fair DPIA and TIA cost back in.
We build private AI infrastructure on your data, your servers, your rules. EU-hosted, GDPR-compliant, full audit trail. 12-month embedded partnership.
The crossover depends on volume and your hardware shape. For a dual A100 server (roughly EUR 35,000 capex, 15% ops time, 60% utilisation), local typically beats Anthropic and OpenAI list pricing somewhere between 60 and 120 million tokens per month. Below that, cloud is cheaper and operational simplicity wins. Above that, the amortised hardware cost falls below per-token cloud billing and local pays for itself within the 36-month amortisation window.
Yes, materially. The EU AI Act does not require local hosting by default, but Article 10 of Regulation (EU) 2024/1689 imposes strict data governance obligations on high-risk AI systems, and GDPR transfer rules (Chapter V) restrict where personal data can be processed. Several EU regulators have flagged that sending personal data to US-based cloud LLMs without a valid transfer mechanism is a GDPR violation. If you are deploying a high-risk Annex III system or processing sensitive personal data, the real comparison is not cloud vs local at the same cost; it is cloud plus a robust DPIA and transfer impact assessment vs local with full data control.
A 50-person team running an internal AI assistant for 20-30 daily active users typically burns 30 to 80 million tokens per month, weighted heavily toward input (RAG context, prompt scaffolding). A 500-person team running a customer-facing chatbot at moderate volume sits around 200 to 500 million tokens per month. Heavy-use cases (transcription, embeddings, autonomous agents, multi-step pipelines) can hit 1 billion tokens per month or more.
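Those ranges fall out of back-of-envelope arithmetic. A hedged sketch, where the per-user message count and the RAG-heavy 4,000-token context are assumed figures rather than measurements:

```python
def monthly_mtok(daily_users, msgs_per_user_day, in_tok, out_tok, days=30):
    """Monthly token volume in millions for a chat-style workload."""
    return daily_users * msgs_per_user_day * (in_tok + out_tok) * days / 1e6

# 25 DAU, 20 messages each per day, RAG-heavy prompts (4,000 in / 500 out)
print(monthly_mtok(25, 20, 4_000, 500))  # lands inside the 30-80 Mtok band
```

Note how input dominates: at 4,000 in vs 500 out per message, roughly 89% of spend-relevant volume is context, which is why prompt caching moves the cloud bill so much.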
Patching the OS, updating CUDA / drivers, monitoring GPU temperature and memory pressure, model version upgrades, prompt cache invalidation, observability and incident response, periodic re-quantisation when newer model weights drop, and capacity planning. A well-run dual-GPU setup needs roughly 10 to 20% of an SRE FTE in steady state. A multi-node cluster needs more. The calculator defaults to 15%.
It depends on amortisation. At 36 months on a EUR 35,000 dual A100, hardware adds roughly EUR 970 per month to your TCO. Electricity at NL business rates (EUR 0.20 per kWh, 1.2 kW draw, 60% utilisation) adds about EUR 100. Ops at 15% of a EUR 95,000 FTE adds EUR 1,200. So ops time, not hardware, is usually the biggest single line. The further you can automate the stack, the more local pays off.
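The line items above reduce to simple arithmetic. A sketch using the same assumed defaults (36-month amortisation, NL business power rates, 15% of a EUR 95,000 FTE):

```python
HOURS_PER_MONTH = 730

hardware = 35_000 / 36                       # ~EUR 972: capex over 36 months
power = 1.2 * 0.60 * HOURS_PER_MONTH * 0.20  # ~EUR 105: kW x utilisation x hours x EUR/kWh
ops = 0.15 * 95_000 / 12                     # ~EUR 1,188: 15% of an SRE FTE
print(round(hardware + power + ops))         # fixed monthly local cost, ~EUR 2,265
```

Ops is the largest line, which is the point: automating away SRE hours does more for local TCO than cheaper hardware.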
The defaults are May 2026 list pricing on the Anthropic, OpenAI, and Google websites, converted to EUR at roughly USD * 0.93. Negotiated enterprise rates and committed-spend discounts can be 30 to 60% lower; batch and prompt-caching can knock another 30 to 90% off input cost. Override the cloud price fields with your actual negotiated rate for a real comparison.
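How far those discounts compound is easy to underestimate. A sketch with assumed percentages; the 40% committed-spend discount, 70% cache-hit rate, and 90% cached-read reduction are hypothetical stand-ins, not any vendor's actual terms:

```python
usd_in = 3.00                     # hypothetical USD list input rate per Mtok
eur_in = usd_in * 0.93            # rough USD -> EUR conversion used by the defaults
negotiated = eur_in * (1 - 0.40)  # assumed 40% committed-spend discount
cache_hit = 0.70                  # assumed share of input read from the prompt cache
effective = negotiated * (1 - cache_hit * 0.90)  # cached reads billed at ~10% of rate
print(round(effective, 2))        # effective EUR per million input tokens
```

Stacked, the two mechanisms cut the effective input rate by well over half of list, which is why comparing local against undiscounted list pricing overstates the case for local.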
No. The calculator outputs a financial TCO comparison. The EU AI Act, GDPR transfer rules, sector-specific obligations (NIS2, DORA, EHDS), and your company's own data classification policy all affect the legally defensible answer. For a binding view on whether cloud or local is the correct architecture for a specific use case, consult a qualified EU technology lawyer.
When to host locally, when to stay in the cloud, how EU-based providers compare, and how Article 10 data governance changes procurement.