Instead of managing a separate API key for every LLM provider, you route all inference through your agent’s single Nonhumans API key. The models primitive gives your agent access to frontier and open-source models from the providers listed below. It covers chat completions, embeddings, and streaming, with automatic fallback and per-token billing visible in your dashboard.

Supported Providers

OpenAI

GPT-4o, GPT-4 Turbo, o1, o3, text-embedding-3, DALL·E

Anthropic

Claude 3.5 Sonnet, Claude 3.5 Haiku, Claude 3 Opus

Google

Gemini 1.5 Pro, Gemini 1.5 Flash, Gemini 2.0

Mistral

Mistral Large, Mistral Small, Codestral

OpenRouter

Unified access to 200+ models via a single route

Hugging Face

Open-source and fine-tuned models via serverless inference

Chat Completions

Call any supported model with a consistent interface. Swap the model field to switch providers; no other changes are required.

const response = await agent.models.chat({
  model: 'claude-3-5-sonnet',
  messages: [
    { role: 'system', content: 'You are a helpful scheduling assistant.' },
    { role: 'user', content: 'Find a time for Alice and Bob to meet this week.' },
  ],
});

console.log(response.content);
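
To illustrate the point about swapping the model field, the sketch below builds the same request for two providers and changes nothing but the model name. The `ChatRole`, `ChatMessage`, and `ChatRequest` types and the `buildRequest` helper are assumptions made for this example, inferred from the call shown above; they are not part of the documented SDK.

```typescript
// Hypothetical types inferred from the example above; not an official SDK definition.
type ChatRole = 'system' | 'user' | 'assistant';

interface ChatMessage {
  role: ChatRole;
  content: string;
}

interface ChatRequest {
  model: string;
  messages: ChatMessage[];
}

// Build a request for the given model while keeping the conversation identical.
function buildRequest(model: string, userPrompt: string): ChatRequest {
  return {
    model,
    messages: [
      { role: 'system', content: 'You are a helpful scheduling assistant.' },
      { role: 'user', content: userPrompt },
    ],
  };
}

const schedulingPrompt = 'Find a time for Alice and Bob to meet this week.';
const viaAnthropic = buildRequest('claude-3-5-sonnet', schedulingPrompt);
const viaOpenAI = buildRequest('gpt-4o', schedulingPrompt);

// Only the model field differs between the two requests.
console.log(viaAnthropic.model, viaOpenAI.model);
```

Either object could then be passed to `agent.models.chat(...)` in the same way as the example above.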

Streaming

Enable streaming to receive tokens as they are generated, which is essential for responsive user-facing agents.

const stream = await agent.models.chat({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Write me a project brief.' }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.delta);
}
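
When the full text is also needed once streaming finishes (for logging or storage, say), the deltas can be accumulated as they arrive. This is a minimal sketch: `collectText` and `fakeStream` are hypothetical helpers, and the chunk shape `{ delta: string }` is assumed from the loop above. `fakeStream` stands in for the value returned by `agent.models.chat` with `stream: true`, so the sketch runs without network access.

```typescript
// Chunk shape assumed from the streaming loop above.
interface ChatChunk {
  delta: string;
}

// Consume an async iterable of chunks and return the concatenated text.
// onDelta is an optional callback, e.g. for rendering tokens as they arrive.
async function collectText(
  chunks: AsyncIterable<ChatChunk>,
  onDelta?: (delta: string) => void,
): Promise<string> {
  let text = '';
  for await (const chunk of chunks) {
    text += chunk.delta;
    if (onDelta) onDelta(chunk.delta);
  }
  return text;
}

// Hypothetical stand-in for a real stream; yields one chunk per string.
async function* fakeStream(parts: string[]): AsyncGenerator<ChatChunk> {
  for (const part of parts) {
    yield { delta: part };
  }
}

collectText(fakeStream(['Project ', 'brief: ', 'draft']))
  .then((fullText) => console.log(fullText));
```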

Embeddings

Generate vector embeddings for any text input. This pairs naturally with the memory primitive for building RAG pipelines.

const embedding = await agent.models.embed({
  model: 'text-embedding-3-small',
  input: 'Alice prefers morning meetings and async communication.',
});

// embedding.vector is a Float32Array ready to store
console.log('Dimensions:', embedding.vector.length);
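
In a retrieval step, stored vectors are commonly compared with a query vector using cosine similarity. The sketch below shows that comparison on small hand-written vectors. `cosineSimilarity`, `rankBySimilarity`, and the `StoredItem` shape are hypothetical helpers for illustration, not part of the Nonhumans SDK; in practice the vectors would come from `embedding.vector`.

```typescript
// Cosine similarity between two equal-length vectors: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same number of dimensions');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // treat zero vectors as dissimilar
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical record pairing a piece of text with its embedding.
interface StoredItem {
  text: string;
  vector: Float32Array;
}

// Return a copy of the items sorted from most to least similar to the query.
function rankBySimilarity(query: Float32Array, items: StoredItem[]): StoredItem[] {
  return [...items].sort(
    (x, y) => cosineSimilarity(query, y.vector) - cosineSimilarity(query, x.vector),
  );
}

// Toy 3-dimensional vectors for illustration only; real embeddings have many more dimensions.
const memories: StoredItem[] = [
  { text: 'Bob is based in Berlin.', vector: new Float32Array([0, 1, 0]) },
  { text: 'Alice prefers morning meetings.', vector: new Float32Array([1, 0, 0]) },
];
const queryVector = new Float32Array([0.9, 0.1, 0]);

console.log(rankBySimilarity(queryVector, memories)[0].text);
```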

How Routing Works

1. You specify a model

   Pass the model name in your request. Nonhumans resolves the correct provider endpoint automatically.

2. Automatic fallback

   If the primary provider returns an error or is rate-limited, Nonhumans retries with a semantically equivalent model, without requiring any changes to your code.

3. Cost optimization

   For requests where you specify a capability rather than an exact model, Nonhumans routes to the lowest-cost model that meets your requirements.

You don’t need separate API keys for OpenAI, Anthropic, Google, or any other provider. Nonhumans handles authentication and billing on your behalf: one key, every model.
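
The routing itself happens on the Nonhumans side and its internals are not described on this page. As a purely conceptual model of the fallback and cost-optimization steps, the sketch below picks the cheapest model that offers a capability and skips any model marked unavailable. The catalog shape, capability names, placeholder model names, and prices are all invented for illustration.

```typescript
// Conceptual model only: nothing here reflects the actual routing implementation.
type Capability = 'chat' | 'embedding';

interface CatalogEntry {
  model: string;
  capabilities: Capability[];
  costPerMillionTokens: number; // illustrative numbers, not real rates
}

// Placeholder catalog with made-up model names and prices.
const catalog: CatalogEntry[] = [
  { model: 'model-a', capabilities: ['chat'], costPerMillionTokens: 5 },
  { model: 'model-b', capabilities: ['chat'], costPerMillionTokens: 1 },
  { model: 'model-c', capabilities: ['embedding'], costPerMillionTokens: 0.1 },
];

// Pick the cheapest model that offers the capability, skipping any model
// currently marked unavailable (for example, because it is rate-limited).
function selectModel(
  entries: CatalogEntry[],
  capability: Capability,
  unavailable: Set<string> = new Set(),
): string | undefined {
  const candidates = entries
    .filter((e) => e.capabilities.includes(capability) && !unavailable.has(e.model))
    .sort((x, y) => x.costPerMillionTokens - y.costPerMillionTokens);
  return candidates.length > 0 ? candidates[0].model : undefined;
}

// Cheapest chat-capable model in the placeholder catalog.
console.log(selectModel(catalog, 'chat'));
// Next candidate when the cheapest one is unavailable.
console.log(selectModel(catalog, 'chat', new Set(['model-b'])));
```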

Billing

Model usage is billed per token and charged to your Nonhumans account. Every inference call is logged with model name, token counts, and cost, all visible in the dashboard under Usage → Models.

Embedding calls are billed separately from chat completions and are typically 10–20× cheaper per token. Check the dashboard for current per-model rates.
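
If you keep your own record of calls alongside the dashboard, per-model totals are a simple aggregation. The sketch below assumes a hypothetical `UsageEntry` shape with the three logged fields mentioned above (model name, token counts, cost); this page does not describe an export API, and the sample numbers are made up rather than real rates.

```typescript
// Hypothetical shape of a usage record; field names are assumptions for this sketch.
interface UsageEntry {
  model: string;
  inputTokens: number;
  outputTokens: number;
  cost: number; // in your account currency
}

interface UsageTotal {
  tokens: number;
  cost: number;
}

// Sum token counts and cost per model.
function summarizeUsage(entries: UsageEntry[]): Map<string, UsageTotal> {
  const totals = new Map<string, UsageTotal>();
  for (const entry of entries) {
    const current = totals.get(entry.model) ?? { tokens: 0, cost: 0 };
    current.tokens += entry.inputTokens + entry.outputTokens;
    current.cost += entry.cost;
    totals.set(entry.model, current);
  }
  return totals;
}

// Made-up sample records for illustration; costs are not real per-model rates.
const sampleEntries: UsageEntry[] = [
  { model: 'gpt-4o', inputTokens: 1000, outputTokens: 200, cost: 0.5 },
  { model: 'gpt-4o', inputTokens: 400, outputTokens: 100, cost: 0.25 },
  { model: 'text-embedding-3-small', inputTokens: 2000, outputTokens: 0, cost: 0.125 },
];

summarizeUsage(sampleEntries).forEach((total, model) => {
  console.log(model, total.tokens, total.cost);
});
```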

Available Parameters

Chat

model (string, required)
The model identifier to use. Examples: gpt-4o, claude-3-5-sonnet, gemini-1.5-pro, mistral-large.

messages (array, required)
An array of message objects with role (system | user | assistant) and content fields.

stream (boolean)
When true, returns an async iterable of token chunks instead of a single response object.
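
The chat parameters above can be checked before a request is sent. The following is a minimal sketch of such a check: `validateChatParams` is a hypothetical helper, not part of the SDK, and it enforces only the types and role values documented in this section.

```typescript
// Allowed role values, taken from the messages parameter description above.
const ROLES: string[] = ['system', 'user', 'assistant'];

// Return a list of problems found in the params; an empty list means they look valid.
function validateChatParams(params: Record<string, unknown>): string[] {
  const problems: string[] = [];
  const model = params['model'];
  const messages = params['messages'];
  const streamFlag = params['stream'];

  if (typeof model !== 'string') {
    problems.push('model is required and must be a string');
  }
  if (!Array.isArray(messages)) {
    problems.push('messages is required and must be an array');
  } else {
    messages.forEach((m: unknown, i: number) => {
      const msg = m as { role?: unknown; content?: unknown } | null;
      if (!msg || typeof msg.role !== 'string' || !ROLES.includes(msg.role)) {
        problems.push(`messages[${i}].role must be system, user, or assistant`);
      }
      if (!msg || typeof msg.content !== 'string') {
        problems.push(`messages[${i}].content must be a string`);
      }
    });
  }
  if (streamFlag !== undefined && typeof streamFlag !== 'boolean') {
    problems.push('stream must be a boolean when provided');
  }
  return problems;
}

console.log(validateChatParams({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Write me a project brief.' }],
  stream: true,
}).length);
```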

Embed

model (string, required)
The embedding model identifier to use. Example: text-embedding-3-small.

input (string, required)
The text to embed. Pass a single string to receive a single embedding vector.
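
Because the parameter list documents only a single string for input, one way to embed several texts is to issue one call per string. This is a sketch under that assumption. The `Embedder` interface is inferred from the embed example earlier on this page, and `embedAll` and `stubEmbedder` are hypothetical names; the stub returns a dummy vector so the code runs offline.

```typescript
// Minimal interface assumed from the embed example earlier on this page.
interface Embedder {
  embed(params: { model: string; input: string }): Promise<{ vector: Float32Array }>;
}

// Embed several strings by issuing one call per string, preserving input order.
async function embedAll(
  embedder: Embedder,
  model: string,
  inputs: string[],
): Promise<Float32Array[]> {
  const results = await Promise.all(
    inputs.map((input) => embedder.embed({ model, input })),
  );
  return results.map((r) => r.vector);
}

// Hypothetical stub: returns a one-dimensional vector holding the input length
// instead of a real embedding.
const stubEmbedder: Embedder = {
  async embed({ input }) {
    return { vector: new Float32Array([input.length]) };
  },
};

embedAll(stubEmbedder, 'text-embedding-3-small', ['morning meetings', 'async'])
  .then((vectors) => console.log('Embedded', vectors.length, 'inputs'));
```

With a real agent, `agent.models` could be passed as the embedder, assuming it matches the interface sketched here. `Promise.all` sends the calls concurrently, so a large batch may be better processed in smaller groups.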