ai-api
Groq pricing & features
Fastest LLM inference — LPU-powered (300-1000+ tokens/sec)
LPU (Language Processing Unit) inference infrastructure. Hosts Llama, Mixtral, gpt-oss, Whisper. OpenAI-compatible API. Blazing-fast: 300-1000+ tokens/sec.
Data sourced from vendor documentation · Last updated May 2026
Pricing
| Tier | Price | Notes |
|---|---|---|
| Free Tier | Free | Generous free RPM / TPM by model. Great for dev + small apps. |
| On-Demand (paid) | Free | Pay-as-you-go per token. OpenAI-compatible API, no infrastructure to manage. |
| Developer Tier | Free | Higher rate limits for production apps. |
| Enterprise | Custom | Custom. Dedicated capacity, SLA, on-prem option. |
Limits
| Tier | Metric | Value | Notes |
|---|---|---|---|
| — | batch api discount | 50% off | Batch API |
| — | cached input discount | 50% off cached input | Input caching |
| — | function calling | supported on most models | Function calling |
| — | gpt-oss-20b input | $0.075/M tokens | gpt-oss 20B input |
| — | gpt-oss-20b output | $0.30/M tokens | gpt-oss 20B output |
| — | llama-3.1-8b-instant input | $0.05/M tokens | Llama 3.1 8B input |
| — | llama-3.1-8b-instant output | $0.08/M tokens | Llama 3.1 8B output |
| — | llama-3.3-70b input | $0.59/M tokens | Llama 3.3 70B input |
| — | llama-3.3-70b output | $0.79/M tokens | Llama 3.3 70B output |
| — | openai api compat | yes — swap base_url to https://api.groq.com/openai/v1 | OpenAI SDK compatibility |
| — | speed gptoss20b tps | 952 tokens/sec | gpt-oss 20B speed (high) |
| — | speed llama8b tps | 640 tokens/sec | Llama 3.1 8B speed |
| — | streaming | SSE streaming supported | Streaming responses |
| — | whisper-large-v3 | $0.111/hour audio | Whisper transcription |
Features
- Audio Transcription — Whisper endpoint.
- Batch API — 50% discount.
- Chat Completions (OpenAI-compat) — Standard /v1/chat/completions endpoint.
- Function Calling
- JSON Mode — Enforce JSON output format.
- Prompt Caching — 50% discount on cached input.
- Streaming — SSE streaming for chat.
Developer interfaces
| Slug | Name | Kind | Version |
|---|---|---|---|
| rest-api | Groq API (OpenAI-compat) | rest | v1 |
| sdk-python | groq-python | sdk | 1.x |
| sdk-node | groq-sdk (Node) | sdk | 0.x |
Related ai-api platforms
ai-api
Anthropic API
API for Claude — frontier models for chat, tool use, agents, and long-context reasoning
ai-api
AssemblyAI
Best-in-class speech-to-text API — Universal models, 99 languages, low-latency streaming
ai-api
Deepgram
Enterprise-grade speech-to-text + voice agents — Nova + Flux + Aura TTS
ai-api
ElevenLabs
Best-in-class AI text-to-speech + voice cloning + Conversational AI
ai-api
Google Gemini API
Gemini 2.5 Pro, Flash, Flash-Lite — multimodal + 2M context
Compare Groq with
ai-api
Groq vs Anthropic API
Side-by-side breakdown.
ai-api
Groq vs AssemblyAI
Side-by-side breakdown.
ai-api
Groq vs Deepgram
Side-by-side breakdown.
ai-api
Groq vs ElevenLabs
Side-by-side breakdown.
ai-api
Groq vs Google Gemini API
Side-by-side breakdown.
ai-api
Groq vs OpenAI API
Side-by-side breakdown.
ai-api
Groq vs Replicate
Side-by-side breakdown.
ai-api
Groq vs Together AI
Side-by-side breakdown.
ai-coding
Groq vs Aider
Side-by-side breakdown.
ai-coding
Groq vs Bolt.new
Side-by-side breakdown.
ai-coding
Groq vs Claude Code
Side-by-side breakdown.
ai-coding
Groq vs Cody
Side-by-side breakdown.
Staxly is an independent catalog of developer platforms. The link to Groq above may be an affiliate link — Staxly may earn a commission if you sign up through it, at no extra cost to you. Pricing is verified at publication time — reconfirm on the vendor site before buying.