Model catalog
Open-source and free-tier models across eight modalities — all behind one API.
Text Generation
Chat and completion with open-source LLMs. OpenAI-compatible.
Gemini 2.5 Flash
Google's fast multimodal model with a large context window.
Llama 3.1 8B Instant
Fast, capable general-purpose LLM. Great default for most tasks.
Llama 3.3 70B Versatile
High-quality reasoning and generation for complex tasks.
GPT-OSS 20B
Open-weight model with strong instruction following.
Qwen3 32B
Multilingual model with solid coding and math abilities.
Image to Text
Vision understanding, captioning, and OCR from images.
Gemini 2.5 Flash (Vision)
Image understanding, captioning, and OCR.
Llama 4 Scout (Vision)
Open multimodal model for visual question answering.
Text to Image
Generate images from prompts with FLUX and SDXL.
FLUX.1 [schnell]
Ultra-fast, high-quality text-to-image generation.
FLUX.1 [dev]
Highest-detail FLUX model for photorealistic images.
Stable Diffusion XL
Versatile open image model with broad style support.
Image to Image
Edit and transform images with a guiding prompt.
FLUX.1 Kontext [dev]
Prompt-guided image editing and transformation.
Text to Video
Create short video clips from text prompts.
LTX Video
Generate short video clips from a text prompt.
Text to Speech
Natural-sounding speech synthesis from text.
PlayAI TTS
Natural English speech synthesis.
Kokoro 82M
Lightweight open-source TTS via Hugging Face.
Speech to Text
Fast, multilingual transcription with Whisper.
Whisper Large v3 Turbo
Fast multilingual transcription (216x real-time).
Whisper Large v3
State-of-the-art accuracy for transcription & translation.
Embeddings
Vector embeddings for search and RAG (entity-to-entity).
BGE Base EN v1.5
Compact, high-quality English text embeddings.
Multilingual E5 Large
Multilingual embeddings for cross-language search.