Instead of managing a separate API key for every LLM provider, you route all inference through your agent’s single Nonhumans API key. The models primitive gives your agent access to the full landscape of frontier and open-source models — chat completions, embeddings, and streaming — with automatic fallback and per-token billing visible in your dashboard.Documentation Index
Fetch the complete documentation index at: https://docs.nonhumans.ai/llms.txt
Use this file to discover all available pages before exploring further.
Supported Providers
OpenAI
GPT-4o, GPT-4 Turbo, o1, o3, text-embedding-3, DALL·E
Anthropic
Claude 3.5 Sonnet, Claude 3.5 Haiku, Claude 3 Opus
Gemini 1.5 Pro, Gemini 1.5 Flash, Gemini 2.0
Mistral
Mistral Large, Mistral Small, Codestral
OpenRouter
Unified access to 200+ models via a single route
Hugging Face
Open-source and fine-tuned models via serverless inference
Chat Completions
Call any supported model with a consistent interface. Swap themodel field to switch providers — no other changes required.
Streaming
Enable streaming to receive tokens as they are generated, which is essential for responsive user-facing agents.Embeddings
Generate vector embeddings for any text input — pairs naturally with the memory primitive for building RAG pipelines.How Routing Works
You specify a model
Pass the model name in your request. Nonhumans resolves the correct provider endpoint automatically.
Automatic fallback
If the primary provider returns an error or is rate-limited, Nonhumans retries with a semantically equivalent model — without requiring any changes to your code.
Billing
Model usage is billed per token and charged to your Nonhumans account. Every inference call is logged with model name, token counts, and cost — visible in the dashboard under Usage → Models.Embedding calls are billed separately from chat completions and are typically 10–20× cheaper per token. Check the dashboard for current per-model rates.
Available Parameters
Chat
The model identifier to use. Examples:
gpt-4o, claude-3-5-sonnet, gemini-1.5-pro, mistral-large.An array of message objects with
role (system | user | assistant) and content fields.When
true, returns an async iterable of token chunks instead of a single response object.Embed
The embedding model identifier to use. Example:
text-embedding-3-small.The text to embed. Pass a single string to receive a single embedding vector.