Perplexity API Pricing: Sonar Models, Request Fees and What You Actually Pay

8 min read

Amnic

Amnic

Tools

Table of Contents

No headings found on page

Perplexity bills its API two ways at once. You pay for input and output tokens like any other model provider and you pay a separate per-request fee tied to how much of the web each call searches. 

The base Sonar model runs $1 per million input and output tokens, while the flagship Sonar Pro reaches $15 per million output tokens. Miss the second layer and your bill lands at roughly double your estimate, a gap that disciplined AI in FinOps work catches before launch.

That two-layer structure is the single biggest source of pricing confusion for teams putting Sonar into production. This guide lays out every Sonar rate, the request fees by search context, a worked per-query example and the part most pricing pages skip: how to attribute and control the spend once real traffic hits the API. It also shows where Sonar lands in an LLM cost comparison against token-only providers.

What Is the Perplexity API?

The Perplexity API gives developers programmatic access to Sonar, the same search-grounded models that power Perplexity's consumer answers. Every Sonar call runs a live web search, pulls sources and returns an answer with citations attached. That built-in retrieval is the product and it is also why the pricing carries an extra fee that pure text models do not, which is why Sonar belongs in any GenAI cost management platform you already run.

Sonar fits applications that need fresh, sourced answers rather than static model knowledge. Research assistants, market-monitoring tools, customer-facing search and internal knowledge agents are the common patterns. Because retrieval is bundled, teams reach for it instead of stitching a separate search layer onto a general model. Strong FinOps practice treats that convenience as a line item worth watching, not a rounding error.

The catch is billing visibility. A Sonar call mixes token cost with a search fee that shifts based on a context setting, so two queries with identical token counts can cost different amounts. Getting clean AI cost visibility tools in place early saves a painful reconciliation later, especially once usage spreads across several product features.

Perplexity API Pricing Table

The numbers below come straight from the official pricing schedule. Token prices are per million tokens and request fees are per 1,000 requests, billed on top of tokens.

Model

Input / 1M

Output / 1M

Request fee (low / med / high)

Sonar

$1

$1

$5 / $8 / $12

Sonar Pro

$3

$15

$6 / $10 / $14

Sonar Reasoning Pro

$2

$8

$6 / $10 / $14

Sonar Deep Research

$2

$8

search at $5 / 1K

Sonar Pro adds a multi-step Pro Search mode that raises the request fee to $14, $18, or $22 per 1,000 calls across the same low, medium and high context sizes. Sonar Deep Research also bills $2 per million citation tokens and $3 per million reasoning tokens, which stack on the base token rate during long research runs. Disciplined FinOps for AI means reading these stacked rates before a feature ships, not after the invoice.

The Two-Layer Cost Model

The first layer is familiar. You pay for the tokens you send and the tokens the model returns, priced per million. This is where teams set their mental model, because every other LLM provider works the same way. On its own, this layer makes Sonar look cheap next to premium chat models.

The second layer is the request fee and it is unique to search-grounded calls. Perplexity charges per 1,000 requests based on the search context size you request: low, medium, or high. A higher context pulls more web content into the answer and costs more per call. This fee is flat per request and ignores token count entirely, so light queries carry the same search charge as heavy ones.

That decoupling is what surprises people. A high-context Sonar query at $12 per 1,000 requests adds $0.012 before a single token is counted. Across a million monthly calls, the request layer alone reaches $12,000. Treating Sonar like a token-only model is the most common budgeting mistake and it is why AI token management tools that ignore the request fee will understate your real cost.

Per-Model Breakdown

Each Sonar tier targets a different job and the cost driver to watch shifts with it. The model reference describes the lineup from lightweight grounded search up to exhaustive research.

Model

Token rate (in / out per 1M)

Best for

Cost driver to watch

Sonar

$1 / $1

high-volume factual lookups

request fee, not tokens

Sonar Pro

$3 / $15

complex queries and follow-ups

output length

Sonar Reasoning Pro

$2 / $8

multi-step reasoning tasks

reasoning inflates output

Sonar Deep Research

$2 / $8

exhaustive multi-source reports

searches fired per run

A few details decide which tier actually pays off:

  • Sonar is the right default for low-complexity traffic where you want sources without deep reasoning and the request fee drives its bill more than tokens do.

  • Sonar Pro makes output length the swing factor, so a product that returns long answers will see output tokens become its largest line item, far above the request fee.

  • Sonar Reasoning Pro runs cheaper per token than Sonar Pro, but chain-of-thought steps inflate output counts, so cost per answer often lands higher than the sticker rate suggests.

  • Sonar Deep Research stacks $2 per million citation tokens and $3 per million reasoning tokens plus $5 per 1,000 searches on the base rate, so a single report costs cents to dollars, not fractions of a cent.

  • Embeddings are separate and cheap, starting at $0.004 per million tokens for the smallest model, with no request fee attached.

What a Real Query Costs

Take a typical Sonar Pro call with 1,000 input tokens, 1,500 output tokens and medium search context. Input costs $0.003, output costs $0.0225 and the medium request fee adds $0.010. The query lands near $0.036 and the search fee is roughly 28 percent of that total. The table below runs three real query shapes end to end.

Query shape

Model

Tokens (in / out)

Context

Token cost

Request fee

Total / query

Light lookup

Sonar

500 / 500

low

$0.001

$0.005

~$0.006

Standard answer

Sonar Pro

1,000 / 1,500

medium

$0.0255

$0.010

~$0.036

Long deep dive

Sonar Pro

2,000 / 3,000

high

$0.051

$0.014

~$0.065

Scale the standard answer to 100,000 queries a month and the picture sharpens. Tokens cost about $2,550 and request fees add $1,000, for a bill near $3,550. The request layer you might have ignored is nearly a third of the spend. This is the math that any unit economics view has to include or it misranks Sonar against token-only providers.

Cost per query also swings with context size. Drop the same workload to low context and the request fee falls to $6 per 1,000, trimming the monthly bill by $400 with no change to token usage. Routing low-stakes queries to low context is one of the cleanest savings available and good AI cost tracking tools make that routing decision visible rather than guesswork.

Perplexity vs Other LLM API Pricing

Sonar is not directly comparable to a chat-only model, because the price includes retrieval that you would otherwise pay for separately. A bare token comparison flatters general models that have no search built in. The fair comparison asks what you would pay to add a search and citation layer on top of OpenAI API pricing before judging Sonar as expensive.

On raw output tokens, Sonar Pro at $15 per million sits in premium territory next to Anthropic API pricing and well above the cheaper search-grounded option in Gemini API pricing. Base Sonar at $1 both ways, however, undercuts most flagship models outright. The model you choose inside the Sonar family matters more than the provider-versus-provider headline.

Budget-sensitive teams often shortlist Sonar against lower-cost European options, where Mistral API pricing competes on token rate but does not bundle live search. The trade is real: Sonar costs more per token but removes an entire retrieval pipeline. Whether that nets out cheaper depends on how much you would have spent building and running search yourself.

How to Control Perplexity API Costs

Six levers move the bill, in rough order of impact:

  • Match the model to the job. Send routine lookups to base Sonar, reserve Sonar Pro for genuinely complex answers and gate Deep Research behind a deliberate trigger so it never runs by default.

  • Lower the search context. The request fee scales directly with context size, so default to low and step up only when an answer genuinely needs more retrieved content.

  • Cap output length. Sonar Pro output at $15 per million is where bills balloon, so a hard token ceiling on responses protects the largest line item.

  • Cache repeated queries. A cache hit removes both the token cost and the request fee, which cuts call volume in any app with query overlap.

  • Drop the free-credit assumption. The monthly API credit Perplexity once bundled with Pro is gone, so how credits work no longer offsets early experiments.

  • Attribute every call. Tag by feature, team and customer with a cloud cost allocation tool built for shared usage, then settle chargeback vs showback early so one feature cannot quietly consume the whole budget.

Why Perplexity API Spend Is Hard to Track

Sonar usage rarely lives in one place. It spreads across product features, internal tools and experiments, each firing calls at different models and context sizes. The native dashboard shows aggregate consumption, not which feature or customer caused a spike, so a sudden jump turns into a manual hunt through logs.

The two-layer billing makes this worse. A cost increase might come from more calls, longer outputs, higher context, or a shift toward Deep Research and the raw total cannot tell you which. Without a breakdown, teams cannot tell an efficiency regression from healthy growth. This is exactly the gap cloud cost anomaly detection tools close by flagging the spike and pointing at its driver.

Amnic gives FinOps and platform teams that breakdown across AI and cloud spend in one view, tagging Sonar usage by model, feature and team so the bill maps to who created it. It pairs that with cloud cost forecasting tools so a rising API line becomes a projection you can plan against rather than a month-end surprise. For teams already running multiple providers, the same approach behind OpenAI cost monitoring tools applies cleanly to Sonar.

The Bottom Line

Perplexity API pricing is straightforward once you read it as two layers: token rates that look modest and request fees that quietly add up. Base Sonar at $1 both ways is genuinely cheap for grounded search, while Sonar Pro output and high-context request fees are where production bills concentrate. Choose the model and context size deliberately, cap output and cache what repeats.

The harder work starts after the API is live, when spend fragments across features and the native dashboard stops answering who spent what. Putting attribution and anomaly detection in place from day one turns Sonar from an unpredictable line item into a managed one. A consolidated view across providers, the kind built into FinOps tools for AI cost management, keeps the bill readable as usage grows.

FAQs

How much does the Perplexity API cost?

Base Sonar costs $1 per million input and output tokens and Sonar Pro costs $3 input and $15 output per million. Every call also adds a per-request search fee of $5 to $22 per 1,000 requests depending on model and context size.

What is the Perplexity request fee?

It is a flat charge per 1,000 API requests, billed on top of tokens and set by your search context size. Low, medium and high context cost more as each pulls more web content into the answer, regardless of token count.

Is Perplexity Sonar cheaper than OpenAI or Anthropic?

Base Sonar undercuts most flagship models on token price and includes live search. Sonar Pro output is premium-priced, so the answer depends on which Sonar model you use and whether you would otherwise pay separately for retrieval.

Does the Perplexity API still include free credits?

No. The recurring monthly API credit that once came with a Perplexity Pro subscription has been discontinued, so API usage is billed from the first call with no bundled offset for development or testing.

What does a single Sonar Pro query cost?

A query with 1,000 input and 1,500 output tokens at medium context costs about $0.036, of which the request fee is roughly $0.010. At 100,000 queries a month that totals near $3,550, with request fees making up about a third.

How do I track Perplexity API spend by team?

Tag each call by feature, team, or customer and feed usage into a cost platform that attributes both token and request costs. This converts a single aggregate bill into a per-owner breakdown you can charge back or forecast.

Better visibility and management into AI Tokens?

Start with a 30 day trial

Connect leading LLMs

24 hour time to value

Stay ahead of AI Spend

Make AI spend visible, controllable, and accountable.

Gain insights into your AI token costs at a team, customer, business unit and individual user level to measure and manage AI utilization.

Can your engineering context keep up with the speed of AI?

Start with a 14-day Runtime Accountability Audit. Read-only access. No commitment.

No credit card · No migration · No agents

STAY AHEAD

Can your engineering context keep up with the speed of AI?

Start with a 14-day Runtime Accountability Audit. Read-only access. No commitment.

No credit card · No migration · No agents

STAY AHEAD

Can your engineering context keep up with the speed of AI?

Start with a 14-day Runtime Accountability Audit. Read-only access. No commitment.

No credit card · No migration · No agents

STAY AHEAD