ai-api

Groq pricing & features

Fastest LLM inference — LPU-powered (300-1000+ tokens/sec)

LPU (Language Processing Unit) inference infrastructure. Hosts Llama, Mixtral, gpt-oss, Whisper. OpenAI-compatible API. Blazing-fast: 300-1000+ tokens/sec.

Data sourced from vendor documentation · Last updated May 2026

Groq website ↗Docs ↗

Pricing

Tier	Price	Notes
Free Tier	Free	Generous free RPM / TPM by model. Great for dev + small apps.
On-Demand (paid)	Free	Pay-as-you-go per token. OpenAI-compatible API, no infrastructure to manage.
Developer Tier	Free	Higher rate limits for production apps.
Enterprise	Custom	Custom. Dedicated capacity, SLA, on-prem option.

Limits

Tier	Metric	Value	Notes
—	batch api discount	50% off	Batch API
—	cached input discount	50% off cached input	Input caching
—	function calling	supported on most models	Function calling
—	gpt-oss-20b input	$0.075/M tokens	gpt-oss 20B input
—	gpt-oss-20b output	$0.30/M tokens	gpt-oss 20B output
—	llama-3.1-8b-instant input	$0.05/M tokens	Llama 3.1 8B input
—	llama-3.1-8b-instant output	$0.08/M tokens	Llama 3.1 8B output
—	llama-3.3-70b input	$0.59/M tokens	Llama 3.3 70B input
—	llama-3.3-70b output	$0.79/M tokens	Llama 3.3 70B output
—	openai api compat	yes — swap base_url to https://api.groq.com/openai/v1	OpenAI SDK compatibility
—	speed gptoss20b tps	952 tokens/sec	gpt-oss 20B speed (high)
—	speed llama8b tps	640 tokens/sec	Llama 3.1 8B speed
—	streaming	SSE streaming supported	Streaming responses
—	whisper-large-v3	$0.111/hour audio	Whisper transcription

Features

Audio Transcription — Whisper endpoint.
Batch API — 50% discount.
Chat Completions (OpenAI-compat) — Standard /v1/chat/completions endpoint.
Function Calling
JSON Mode — Enforce JSON output format.
Prompt Caching — 50% discount on cached input.
Streaming — SSE streaming for chat.

Developer interfaces

Slug	Name	Kind	Version
rest-api	Groq API (OpenAI-compat)	rest	v1
sdk-python	groq-python	sdk	1.x
sdk-node	groq-sdk (Node)	sdk	0.x

Related ai-api platforms

ai-api

Anthropic API

API for Claude — frontier models for chat, tool use, agents, and long-context reasoning

ai-api

AssemblyAI

Best-in-class speech-to-text API — Universal models, 99 languages, low-latency streaming

ai-api

Deepgram

Enterprise-grade speech-to-text + voice agents — Nova + Flux + Aura TTS

ai-api

ElevenLabs

Best-in-class AI text-to-speech + voice cloning + Conversational AI

ai-api

Google Gemini API

Gemini 2.5 Pro, Flash, Flash-Lite — multimodal + 2M context

Compare Groq with

ai-api

Groq vs Anthropic API

Side-by-side breakdown.

ai-api

Groq vs AssemblyAI

Side-by-side breakdown.

ai-api

Groq vs Deepgram

Side-by-side breakdown.

ai-api

Groq vs ElevenLabs

Side-by-side breakdown.

ai-api

Groq vs Google Gemini API

Side-by-side breakdown.

ai-api

Groq vs OpenAI API

Side-by-side breakdown.

ai-api

Groq vs Replicate

Side-by-side breakdown.

ai-api

Groq vs Together AI

Side-by-side breakdown.

ai-coding

Groq vs Aider

Side-by-side breakdown.

ai-coding

Groq vs Bolt.new

Side-by-side breakdown.

ai-coding

Groq vs Claude Code

Side-by-side breakdown.

ai-coding

Groq vs Cody

Side-by-side breakdown.

Staxly is an independent catalog of developer platforms. The link to Groq above may be an affiliate link — Staxly may earn a commission if you sign up through it, at no extra cost to you. Pricing is verified at publication time — reconfirm on the vendor site before buying.