Back

Mistral API Pricing Explained: How to Estimate and Control Your Token Costs

June 8, 2026

8 min read

Amnic

AI and LLM costs

No headings found on page

A rate card looks simple until your finance team forwards the first five-figure invoice. Mistral publishes per-token prices that fit on a single page, yet teams shipping production features still see the number drift every month. The reason is rarely the posted rate.

It is everything the rate card does not show: which Mistral model your code actually called, whether OCR pages and agent tool calls bypassed your token budget entirely and whether the 50% Batch discount finance assumed was active was ever turned on.

This guide breaks down current Mistral API pricing the way a FinOps for AI practitioner would read it. Not as a list of numbers, but as a model of where your money actually goes in production.

What You Actually Pay For

Mistral bills its API by tokens, not requests, words, or seconds. A token is a chunk of text, roughly four characters or three-quarters of an English word on average, per Mistral's tokenization guide. The unit is the same one every modern LLM vendor uses, which is why AI token management is the cost lever most teams underbuild for before their first paid invoice.

Every call has two billable sides:

Input tokens: Your system prompt, conversation history, retrieved context, function definitions and the user message. Anything you send to the model is counted before the response starts streaming, per Mistral's billing docs.
Output tokens: What the model writes back. On Mistral Large 3 and Medium 3.5 this also includes tool-call arguments and any structured-output JSON the model emits.

Output is the expensive side. Across the flagship Mistral family, output costs three to five times more per token than input, per Mistral's pricing page. That single asymmetry shapes every downstream cost decision, from how you write prompts to which model you route a workload to.

The same dynamic shows up across rate cards and the gap widens as you climb tiers, which is why the same workload prices very differently when you read OpenAI's per-token rates on the same axis.

Two structural notes before reading the table. First, Mistral runs a free tier through La Plateforme with roughly a billion tokens per month at rate-limited access, intended for evaluation and prototypes, per Mistral's plans overview.

The moment a workload exits eval, billing moves to pay-as-you-go at the rates below. Second, every production rate has a 50% Batch discount available for asynchronous workloads with a 24-hour SLA, per Mistral's batch processing docs.

Current Mistral API Pricing by Model

Pricing per 1M tokens, sourced from Mistral's rate card and cross-checked against pricepertoken's Mistral index and Artificial Analysis's Mistral provider page:

Model	Tier	Input	Output	Best fit
Mistral Medium 3.5	Flagship reasoning	$1.50	$7.50	Complex agents, long context
Mistral Large 3	Generalist flagship	$0.50	$1.50	Production chat, RAG
Mistral Small 4	Workhorse	$0.10	$0.30	High-volume chat, extraction
Devstral 2	Agentic coding	$0.40	$2.00	Code agents, IDE tools
Codestral	Code completion	$0.30	$0.90	Fill-in-the-middle, autocomplete
Mistral NeMo	Open multilingual	$0.15	$0.15	Translation, regional apps
Ministral 3 14B	Edge	$0.20	$0.20	On-device, mid-tier classification
Ministral 3 8B	Edge	$0.15	$0.15	Low-latency routing
Ministral 3 3B	Edge nano	$0.10	$0.10	Mass classification, filters

Specialist endpoints sit outside this table. OCR 3 is billed by document at $2 per 1,000 pages with annotations at $3 per 1,000 pages, per Mistral's OCR 3 launch post. Voxtral text-to-speech runs $0.016 per 1,000 characters and Voxtral Mini transcription at $0.002 per minute. Mistral Embed and Mistral Moderation are both $0.10 per 1M tokens, Codestral Embed is $0.15 per 1M tokens.

Three patterns to read into the model table before you pick one.

The spread between Ministral 3B and Medium 3.5 is roughly 75× on output. Ministral 3 3B runs $0.10 output while Medium 3.5 reaches $7.50, per llm-stats's Mistral provider page. Model choice is the largest cost lever you have on Mistral, more so than on OpenAI's family.
Edge tiers are symmetric. Input and output rates match across the Ministral line, which makes them unusually friendly to chatty agent loops where output cascades into next-turn input. Most other vendors price output higher and punish loops.
Code-specific models are real models, not the same model with a flag. Codestral and Devstral 2 are separate endpoints with separate billing, per Mistral's models overview. A copilot built on Codestral for completion plus Devstral 2 for agentic edits is paying on two rate cards inside one feature.

The Cost Formula and Three Worked Examples

The math itself is trivial:

cost per call = (input_tokens / 1,000,000 × input_price)

+ (output_tokens / 1,000,000 × output_price)

What is not trivial is what the inputs to that formula actually look like in production. Three scenarios.

Example 1: Customer support chatbot on Mistral Small 4

400-token system prompt, 100-token user message, 300-token reply. 10,000 conversations per day.

Input: 500 × 10,000 = 5M tokens/day × $0.10 = $0.50/day
Output: 300 × 10,000 = 3M tokens/day × $0.30 = $0.90/day
Daily total: $1.40. Monthly: ~$42.

Same scenario on Medium 3.5 instead: roughly $690 per month. Sixteen times more for the same conversation, before any quality measurement.

Example 2: RAG knowledge assistant on Mistral Large 3

8,000 tokens of retrieved context plus a 200-token question, returning a 600-token answer. 2,000 queries per day.

Input: 8,200 × 2,000 = 16.4M tokens/day × $0.50 = $8.20/day
Output: 600 × 2,000 = 1.2M tokens/day × $1.50 = $1.80/day
Daily total: $10. Monthly: ~$300.

The input side dominates here because of context length. On a comparable workload, Anthropic Sonnet 4.6 input runs at $3 versus Large 3 at $0.50, a 6× input-side gap that compounds on long-context features.

Example 3: High-volume classification on Ministral 3B and Batch

A 300-token input classifying 500,000 records per day into 50-token labels. Run through the Batch API at 50% off.

Input: 150M tokens/day × ($0.10 × 0.5) = $7.50/day
Output: 25M tokens/day × ($0.10 × 0.5) = $1.25/day
Daily total: ~$8.75. Monthly: ~$263 for half a billion classifications.

Same workload on Mistral Large 3 at standard pricing: roughly $4,650 per month. Model choice plus Batch is an 18× swing on identical functional output.

Where Mistral Pricing Diverges from OpenAI, Anthropic and Google

Mistral is the price-competitive option in the flagship tier and the price leader in the edge tier. The comparison below uses each vendor's current generalist workhorse, per OpenAI's API pricing and the Anthropic pricing rate card.

Provider	Workhorse model	Input ($/M)	Output ($/M)
Mistral	Large 3	$0.50	$1.50
OpenAI	GPT-5.4	$2.50	$15.00
Anthropic	Sonnet 4.6	$3.00	$15.00
Google	Gemini 3.1 Pro	$2.00	$12.00

Three implications for budget owners.

Mistral wins outright on EU-residency workloads. Enterprise APIs ship with regional data processing controls and system-level SLAs as a default rather than an add-on, per Mistral's enterprise overview. The same residency story on Azure OpenAI involves contract-level negotiation, covered in our Azure OpenAI true-cost breakdown.

The Large 3 to Medium 3.5 split is wider than it looks. Medium 3.5 is positioned as the reasoning flagship, not the generalist successor, per Mistral's Medium 3 announcement. Defaulting agents to Medium 3.5 for chat work is a 5× tax on output. Routing logic that auto-selects the largest model, like the patterns covered in the OpenAI versus Bedrock versus Vertex AI comparison, will burn more on Mistral than on OpenAI.

Edge tier pricing has no equivalent. Ministral 3 at $0.10 input and $0.10 output is the only mainstream symmetric edge tier on the market and it is the workload that most often gets over-provisioned to a flagship on competitor stacks. Gemini's nano tier is the closest analog, covered in the Gemini API pricing breakdown.

Why Your Mistral Bill Outruns the Rate Card

The rate card is honest. What inflates real invoices is usually one of these.

Three billing streams, one feature: A document workflow built on Mistral often charges through three meters at once: chat tokens on Large 3, OCR pages on OCR 3 and agent tool calls at $30 per 1,000 calls for web search or code execution. The token budget tells you nothing about the other two.

The 50% Batch discount that was never enabled: Batch is opt-in. The default chat completions endpoint runs at full rate. Teams model projections against the Batch column and ship against the standard column, leaving a 2× gap on every overnight job.

Free tier roll-off: La Plateforme's free billion-token tier ends at production usage. The handoff is silent. The same API key keeps working, but the next 1M tokens bill at full Large 3 rates. The first paid invoice typically lands two to three weeks after launch, not at month-end.

Model drift on -latest aliases. Pointing your client at mistral-large-latest rather than a pinned version means a major model revision can ship under your code and a 3× output multiplier lands in your bill the same week. The same risk hits any AI workload running on a default alias.

Output cascading into input on agent loops: A five-step Devstral 2 agent emitting 1,000 tokens per step pays input on 1,000, then 2,000, then 3,000, then 4,000 tokens of accumulated context. Symmetric edge pricing on Ministral changes the calculus, flagship pricing on Medium 3.5 does not.

Ways to Cut Mistral API Costs Without Cutting Quality

Five levers, ranked by realistic impact.

Route by task, not by default: Send classification, extraction and structured-output jobs to Ministral 3 or Small 4. Reserve Large 3 for production chat and Medium 3.5 for agents that genuinely need reasoning. Most production traffic on Mistral is over-provisioned by a full tier.

Turn on Batch for anything not real-time: The Batch API gives 50% off across every chat and embedding model, with a 24-hour SLA. Overnight evaluations, embeddings backfills, content classification and even OCR jobs through OCR 3 all qualify.

Pin model versions: Use the dated model ID rather than -latest. Mistral publishes deprecation timelines on its models overview. Pinning forces an intentional upgrade decision instead of a silent one.

Cap output, not input: Set max_tokens based on what the downstream UI actually renders. A 4,000-token ceiling on a feature that displays 200 tokens is a 20× tax on every call to a flagship model.

Separate meters before you separate budgets: Tag chat, OCR, embeddings and agent tool calls into distinct cost dimensions in your observability stack so finance can see all three streams in one view, the way the patterns covered in our FinOps tools for AI cost management guide treat multi-meter workloads.

Adjacent Cost Surfaces Worth Tracking

The rate card sits at the centre of Mistral spend, but several adjacent surfaces shape what finance actually sees on the invoice when you ship the feature end-to-end.

The proxy and routing layer: Teams fronting the Mistral API with their own gateway pay separately for that path. If that gateway runs on AWS, AWS API Gateway pricing adds per-million-request and per-GB cost on top of every Mistral call.
The side-compute layer: Retrieval pipelines, embeddings post-processing and agent orchestration run somewhere. The choice between Fargate and EC2 sets the floor for that side compute and the same workload can run double on the wrong primitive.
The cross-cloud layer: Mistral's residency story drives some teams to evaluate AWS vs Azure for the surrounding stack. That decision changes egress, latency and reserved-instance economics.
The network layer: Cross-region inference calls and large multimodal payloads (OCR, audio) move data. AWS data transfer costs often outrun the token bill on document-heavy workloads.
The self-hosting layer: Mistral's open-weight models can run on your own GPUs. That moves the conversation to Kubernetes vs Docker for orchestration and Karpenter for GPU autoscaling, where utilisation, not tokens, drives cost.

Where Amnic Fits

Mistral gives you the rate card. Amnic gives you the bill before it lands.

The Amnic AI agents attribute every Mistral API dollar to a model, an endpoint, a feature, a team and a customer in near real time. They flag the silent failure modes covered above: a workload that quietly upgraded from Large 3 to Medium 3.5, a Batch-eligible job that was never routed to Batch, an OCR meter that doubled when a new document pipeline shipped. Token, page and agent-call streams unify into one ledger so engineering and finance argue about the same number.

Key Takeaways

Mistral bills per 1M input and output tokens, with output running three to five times more than input on the flagship line.
Mistral is the price leader in the edge tier and competitive in the flagship tier, with symmetric pricing on Ministral 3 that no major vendor matches today.
The largest cost lever is model choice, then Batch enablement, then prompt structure, then output capping.
Real bills outrun rate cards because of three billing streams running in parallel, free-tier roll-off and unpinned -latest aliases.
Track Mistral spend by feature and customer rather than only by model, or the rate card optimization will not survive the next product launch.

Frequently Asked Questions

What is the difference between Mistral input and output tokens?

Input tokens cover your prompt, context and tool definitions. Output tokens cover what the model returns, including tool-call arguments. Output costs three to five times more than input across the flagship line, so output is where most cost decisions land.

How much does the Mistral API cost?

Pricing runs from $0.10 input on Mistral Small 4 and Ministral 3 3B to $1.50 input on Medium 3.5. Output ranges from $0.10 on Ministral 3 to $7.50 on Medium 3.5 per million tokens. Full rates are on Mistral's pricing page.

Does Mistral have a free tier?

Yes. La Plateforme includes a free tier with roughly one billion tokens per month at rate-limited access, intended for evaluation. Production usage moves to pay-as-you-go automatically once the limits are exceeded.

How does the Mistral Batch API discount work?

Batch jobs run at 50% off the standard rate with a 24-hour completion SLA, per Mistral's batch processing docs. It applies to chat completions, embeddings and OCR. The default chat endpoint does not use Batch unless you explicitly route to it.

Is Mistral cheaper than OpenAI or Anthropic?

Mistral Large 3 at $0.50 input and $1.50 output undercuts OpenAI GPT-5.4 and Anthropic Sonnet 4.6 by 5× to 10× on output. The comparison narrows on reasoning workloads where Medium 3.5 competes against Sonnet and o3.

What does Mistral OCR cost?

OCR 3 is $2 per 1,000 pages with annotations at $3 per 1,000 pages, per Mistral's OCR 3 launch post. Batch processing applies, dropping OCR to $1 per 1,000 pages on asynchronous jobs.

How can teams reduce Mistral API costs without sacrificing quality?

Route low-complexity work to Ministral or Small 4, turn on Batch for anything not real-time, pin model versions instead of using -latest, cap output tokens and tag chat, OCR and agent meters separately so each stream is independently visible.

Better visibility and management into AI Tokens?

Start with a 30 day trial

Connect leading LLMs

24 hour time to value

Stay ahead of AI Spend

Request a Demo