Replicate pricing & features

Run and fine-tune AI models in the cloud — pay-per-second GPU

Run thousands of open-source AI models (FLUX, Stable Diffusion, LLMs) via API, with per-second GPU billing. Package your own models with the Cog framework, then deploy and fine-tune them.

Data sourced from vendor documentation · Last updated June 2026

Replicate website ↗ · Docs ↗

Pricing

Tier	Price	Notes
Pay-as-you-go	No base fee	Per-second GPU billing with no minimum spend. Public models are billed by processing time or by tokens.
Enterprise	Custom	Dedicated capacity, private deployments, SOC 2, and HIPAA on request.

Limits

Item	Rate	Notes
CPU small	$0.000025/sec	1 vCPU, 2 GB
CPU standard	$0.000100/sec	4 vCPU, 8 GB
Nvidia T4	$0.000225/sec (~$0.81/hr)	GPU
Nvidia L40S 48GB	$0.000975/sec (~$3.51/hr)	GPU
Nvidia A100 80GB	$0.001400/sec (~$5.04/hr)	GPU
Nvidia H100 80GB	$0.001525/sec (~$5.49/hr)	GPU
Claude 3.7 Sonnet	$3/M input tokens + $15/M output tokens	Token-billed example
FLUX 1.1 Pro	$0.04 per output image	Image-billed example
Fast-booting fine-tunes	Only active processing time billed	Fine-tune billing
Private models	Dedicated hardware billed for setup + idle + active time	Private model billing
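The rates above can be turned into quick cost estimates. The following is a minimal sketch, not vendor code: the hardware keys and function names are hypothetical labels chosen for this example, and the numbers are copied from the table, so reconfirm them on the vendor site before relying on the results.

```python
# USD per second of hardware time, copied from the table above.
# The dictionary keys are hypothetical labels, not official hardware identifiers.
RATES_PER_SEC = {
    "cpu-small": 0.000025,
    "cpu-standard": 0.000100,
    "gpu-t4": 0.000225,
    "gpu-l40s-48gb": 0.000975,
    "gpu-a100-80gb": 0.001400,
    "gpu-h100-80gb": 0.001525,
}


def hourly_rate(hardware: str) -> float:
    """Convert a per-second rate into the approximate per-hour figure."""
    return round(RATES_PER_SEC[hardware] * 3600, 2)


def public_run_cost(hardware: str, active_seconds: float) -> float:
    """Public models: only active processing time is billed."""
    return RATES_PER_SEC[hardware] * active_seconds


def private_run_cost(hardware: str, setup_seconds: float,
                     idle_seconds: float, active_seconds: float) -> float:
    """Private models on dedicated hardware: setup, idle, and active time are all billed."""
    billed_seconds = setup_seconds + idle_seconds + active_seconds
    return RATES_PER_SEC[hardware] * billed_seconds


def token_cost(input_tokens: int, output_tokens: int,
               input_per_million: float = 3.0,
               output_per_million: float = 15.0) -> float:
    """Token-billed models; defaults are the Claude 3.7 Sonnet example rates."""
    return (input_tokens / 1_000_000 * input_per_million
            + output_tokens / 1_000_000 * output_per_million)


def image_cost(images: int, price_per_image: float = 0.04) -> float:
    """Image-billed models; default is the FLUX 1.1 Pro example rate."""
    return images * price_per_image


if __name__ == "__main__":
    print(hourly_rate("gpu-a100-80gb"))
    print(public_run_cost("gpu-t4", active_seconds=10))
    print(private_run_cost("gpu-t4", setup_seconds=60, idle_seconds=300, active_seconds=10))
```

Comparing `public_run_cost` with `private_run_cost` for the same active time shows why idle time matters on dedicated hardware: in the last two calls the active work is identical, but the private run is billed for 370 seconds instead of 10.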

Features

10k+ Models — Public catalog of image, video, audio, LLM, embedding, and speech models.
Batch Predictions — Parallel batch execution.
Cog — Open-source tool for containerizing ML models; the standard packaging format on Replicate.
Deployments — Private model endpoints with dedicated GPUs.
File Storage — Temporary output file hosting.
Fine-Tuning — Fine-tune FLUX, SDXL, Llama 2/3 with your data.
Per-Second Billing — Pay only while model runs. No idle cost for public models.
Playground — Interactive UI for every public model.
Predictions API — Asynchronous, synchronous, and streaming predictions.
Streaming Outputs — Server-sent events (SSE) streaming for LLM and audio output.
Webhooks — Notifies your endpoint when a prediction completes.
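To make the Predictions API and Webhooks entries above concrete, here is a minimal sketch that builds (but does not send) a prediction request using only the Python standard library. The base URL, the `version`, `input`, `webhook`, and `webhook_events_filter` body fields, and the `Prefer: wait` header for synchronous mode reflect my understanding of the vendor's REST API v1 and should be treated as assumptions to check against the official docs. The helper name, token, version ID, and webhook URL are hypothetical placeholders.

```python
import json
import urllib.request

# Assumed base URL for the REST API v1; verify against the vendor docs.
API_BASE = "https://api.replicate.com/v1"


def build_prediction_request(token, version, model_input,
                             webhook_url=None, wait=False):
    """Build a POST request that would create a prediction.

    Nothing is sent here; pass the result to urllib.request.urlopen to send it.
    """
    body = {"version": version, "input": model_input}
    if webhook_url:
        # Ask to be notified only when the prediction reaches a final state.
        body["webhook"] = webhook_url
        body["webhook_events_filter"] = ["completed"]

    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    if wait:
        # Assumed header for synchronous mode: hold the connection open
        # until the prediction finishes or a timeout is reached.
        headers["Prefer"] = "wait"

    return urllib.request.Request(
        f"{API_BASE}/predictions",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


if __name__ == "__main__":
    req = build_prediction_request(
        token="YOUR_API_TOKEN",                  # placeholder
        version="MODEL_VERSION_ID",              # placeholder
        model_input={"prompt": "a watercolor fox"},
        webhook_url="https://example.com/hook",  # placeholder
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Building the request separately from sending it keeps the sketch runnable without credentials and makes the asynchronous (webhook) and synchronous (`wait=True`) variants easy to compare.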

Developer interfaces

Slug	Name	Kind	Version
cog	Cog (package models)	cli	0.x
mcp	Replicate MCP	mcp	—
rest-api	Replicate REST API	rest	v1
webhooks	Webhooks	other	—
sdk-node	replicate (Node)	sdk	1.x
sdk-go	replicate-go	sdk	1.x
sdk-python	replicate-python	sdk	1.x
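On the receiving side of the webhooks interface listed above, a handler mostly needs to decide whether a prediction has finished and what to do with its result. The sketch below assumes the webhook body is a JSON prediction object with `id`, `status`, `output`, and `error` fields, and that `succeeded`, `failed`, and `canceled` are the final statuses. Those names are assumptions based on my reading of the vendor docs, and `handle_webhook` is a hypothetical helper. Signature verification is deliberately left out and should be added before production use.

```python
import json

# Assumed set of final prediction statuses; verify against the vendor docs.
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


def handle_webhook(raw_body: bytes) -> dict:
    """Summarize a webhook payload into the fields a caller usually needs."""
    payload = json.loads(raw_body)
    status = payload.get("status")
    return {
        "id": payload.get("id"),
        "status": status,
        # True once the prediction can no longer change state.
        "done": status in TERMINAL_STATUSES,
        # Only expose output for successful runs and error text for failed ones.
        "output": payload.get("output") if status == "succeeded" else None,
        "error": payload.get("error") if status == "failed" else None,
    }


if __name__ == "__main__":
    sample = json.dumps({
        "id": "example-id",  # hypothetical prediction ID
        "status": "succeeded",
        "output": ["https://example.com/out.png"],
    }).encode("utf-8")
    print(handle_webhook(sample))
```

Because the File Storage feature describes output hosting as temporary, a real handler would likely download any output files it needs to keep as soon as `done` is true.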

Related ai-api platforms

Platform	Description
Anthropic API	API for Claude: frontier models for chat, tool use, agents, and long-context reasoning
AssemblyAI	Best-in-class speech-to-text API: Universal models, 99 languages, low-latency streaming
Deepgram	Enterprise-grade speech-to-text + voice agents: Nova + Flux + Aura TTS
ElevenLabs	Best-in-class AI text-to-speech + voice cloning + Conversational AI
Google Gemini API	Gemini 2.5 Pro, Flash, Flash-Lite: multimodal + 2M context


Staxly is an independent catalog of developer platforms. The link to Replicate above may be an affiliate link — Staxly may earn a commission if you sign up through it, at no extra cost to you. Pricing is verified at publication time — reconfirm on the vendor site before buying.