ai-api
Groq
Fastest LLM inference — LPU-powered (300-1000+ tokens/sec)
LPU (Language Processing Unit) inference infrastructure hosting Llama, Mixtral, gpt-oss, and Whisper behind an OpenAI-compatible API, at 300-1000+ tokens/sec.
Pricing
| Tier | Price | Notes |
|---|---|---|
| Free Tier | Free | Generous free RPM / TPM by model. Great for dev + small apps. |
| On-Demand (paid) | Usage-based | Pay-as-you-go per token. OpenAI-compatible API, no infrastructure to manage. |
| Developer Tier | Usage-based | Higher rate limits for production apps. |
| Enterprise | Custom | Dedicated capacity, SLA, on-prem option. |
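As a worked example of the pay-as-you-go tier, here is a rough cost sketch using the llama-3.1-8b-instant per-token prices from the Limits table below ($0.05/M input, $0.08/M output). The request volume and token counts are illustrative assumptions, not vendor figures:

```python
# Rough pay-as-you-go cost sketch; prices are USD per million tokens
# (llama-3.1-8b-instant rates from the Limits table).
def monthly_cost(requests, in_tokens, out_tokens,
                 in_price=0.05, out_price=0.08):
    """Estimate monthly spend in USD for a chat workload."""
    total_in = requests * in_tokens    # total input tokens/month
    total_out = requests * out_tokens  # total output tokens/month
    return (total_in * in_price + total_out * out_price) / 1_000_000

# Hypothetical workload: 100k requests/month, ~500 input + ~200 output tokens each.
print(round(monthly_cost(100_000, 500, 200), 2))  # → 4.1
```

Real bills depend on actual tokenized lengths, so treat this as an order-of-magnitude estimate only.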
Limits
| Tier | Metric | Value | Notes |
|---|---|---|---|
| — | Batch API discount | 50% off | Batch API |
| — | Cached input discount | 50% off cached input | Input caching |
| — | Function calling | supported on most models | Function calling |
| — | gpt-oss-20b input price | $0.075/M tokens | gpt-oss 20B input |
| — | gpt-oss-20b output price | $0.30/M tokens | gpt-oss 20B output |
| — | llama-3.1-8b-instant input price | $0.05/M tokens | Llama 3.1 8B input |
| — | llama-3.1-8b-instant output price | $0.08/M tokens | Llama 3.1 8B output |
| — | llama-3.3-70b input price | $0.59/M tokens | Llama 3.3 70B input |
| — | llama-3.3-70b output price | $0.79/M tokens | Llama 3.3 70B output |
| — | OpenAI API compatibility | yes — swap base_url to https://api.groq.com/openai/v1 | OpenAI SDK compatibility |
| — | gpt-oss-20b speed | 952 tokens/sec | gpt-oss 20B speed (high) |
| — | llama-3.1-8b-instant speed | 640 tokens/sec | Llama 3.1 8B speed |
| — | Streaming | SSE streaming supported | Streaming responses |
| — | whisper-large-v3 price | $0.111/hour audio | Whisper transcription |
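The OpenAI-compatibility row above means an existing OpenAI client works after swapping the base URL. A minimal stdlib sketch of the request shape, assuming `GROQ_API_KEY` is set in the environment (the model ID is taken from the pricing rows; the sketch builds the request without sending it):

```python
import json
import os
import urllib.request

GROQ_BASE = "https://api.groq.com/openai/v1"  # swapped in for api.openai.com

def build_chat_request(prompt, model="llama-3.1-8b-instant"):
    """Build an HTTP request for Groq's OpenAI-compatible chat endpoint."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{GROQ_BASE}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {os.environ.get('GROQ_API_KEY', '')}",
            "Content-Type": "application/json",
        },
    )

# urllib.request.urlopen(build_chat_request("Hello")) would send it;
# in practice the openai or groq SDKs handle this for you.
```

The same base-URL swap works with the official OpenAI SDKs, which is the usual integration path.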
Features
- Audio Transcription — Whisper endpoint.
- Batch API — 50% discount.
- Chat Completions (OpenAI-compat) — Standard /v1/chat/completions endpoint.
- Function Calling — Tool use, supported on most models.
- JSON Mode — Enforce JSON output format.
- Prompt Caching — 50% discount on cached input.
- Streaming — SSE streaming for chat.
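Streamed chat responses arrive as OpenAI-style SSE chunks: `data: {...}` lines carrying a `choices[0].delta`, terminated by `data: [DONE]`. A small parser sketch over hypothetical sample chunks:

```python
import json

def extract_stream_text(sse_lines):
    """Concatenate the incremental text from OpenAI-style SSE chat chunks."""
    out = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip comments / blank keep-alive lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        delta = json.loads(payload)["choices"][0]["delta"]
        if "content" in delta:  # first chunk may carry only the role
            out.append(delta["content"])
    return "".join(out)

# Illustrative chunks (shape follows the OpenAI streaming format):
sample = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "data: [DONE]",
]
print(extract_stream_text(sample))  # → Hello
```

The SDKs expose this as an iterator when `stream=True`, so hand-parsing is only needed for raw HTTP clients.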
Developer interfaces
| Slug | Name | Kind | Version |
|---|---|---|---|
| rest-api | Groq API (OpenAI-compat) | rest | v1 |
| sdk-python | groq-python | sdk | 1.x |
| sdk-node | groq-sdk (Node) | sdk | 0.x |
Compare Groq with
- Groq vs Anthropic API
- Groq vs AssemblyAI
- Groq vs Deepgram
- Groq vs ElevenLabs
- Groq vs Google Gemini API
- Groq vs OpenAI API
- Groq vs Replicate
- Groq vs Together AI
Staxly is an independent catalog of developer platforms. Outbound links to Groq are plain references to their official pages. Pricing is verified at publication time — reconfirm on the vendor site before buying.