Model Access: LLM Inference Through Nonhumans

Instead of managing a separate API key for every LLM provider, you route all inference through your agent’s single Nonhumans API key. The models primitive gives your agent access to frontier and open-source models from the providers below. It covers chat completions, streaming, and embeddings, with automatic fallback and per-token billing visible in your dashboard.

Supported Providers

OpenAI

GPT-4o, GPT-4 Turbo, o1, o3, text-embedding-3, DALL·E

Anthropic

Claude 3.5 Sonnet, Claude 3.5 Haiku, Claude 3 Opus

Google

Gemini 1.5 Pro, Gemini 1.5 Flash, Gemini 2.0

Mistral

Mistral Large, Mistral Small, Codestral

OpenRouter

Unified access to 200+ models via a single route

Hugging Face

Open-source and fine-tuned models via serverless inference

Chat Completions

Call any supported model through the same interface. To switch providers, change the model field. Nothing else in the request needs to change.

const response = await agent.models.chat({
  model: 'claude-3-5-sonnet',
  messages: [
    { role: 'system', content: 'You are a helpful scheduling assistant.' },
    { role: 'user', content: 'Find a time for Alice and Bob to meet this week.' },
  ],
});

console.log(response.content);

# Must run inside an async function (for example, one started with asyncio.run).
response = await agent.models.chat(
    model="claude-3-5-sonnet",
    messages=[
        {"role": "system", "content": "You are a helpful scheduling assistant."},
        {"role": "user", "content": "Find a time for Alice and Bob to meet this week."},
    ],
)

print(response.content)
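Only the model field changes between providers. The sketch below shows one way a router could map a model name to its provider by prefix. It is a hypothetical illustration. The names `resolve_provider`, `build_chat_request`, and the prefix table are assumptions based on the model names listed on this page. They are not the Nonhumans routing implementation.

```python
# Hypothetical sketch: map a model identifier to a provider by name prefix.
# Prefixes are taken from the model names on this page; the real Nonhumans
# routing table is not documented here and may differ.
PROVIDER_PREFIXES: list[tuple[str, str]] = [
    ("text-embedding-", "openai"),
    ("gpt-", "openai"),
    ("o1", "openai"),
    ("o3", "openai"),
    ("claude-", "anthropic"),
    ("gemini-", "google"),
    ("mistral-", "mistral"),
    ("codestral", "mistral"),
]


def resolve_provider(model: str) -> str:
    """Return the provider for a model name, or raise if no prefix matches."""
    for prefix, provider in PROVIDER_PREFIXES:
        if model.startswith(prefix):
            return provider
    raise ValueError(f"unknown model: {model!r}")


def build_chat_request(model: str, messages: list[dict[str, str]]) -> dict:
    """Build a request body; only the model field differs between providers."""
    return {"model": model, "messages": messages}


messages = [{"role": "user", "content": "Find a time for Alice and Bob to meet this week."}]
for model in ["claude-3-5-sonnet", "gpt-4o", "gemini-1.5-pro"]:
    request = build_chat_request(model, messages)
    print(resolve_provider(model), request["model"])
```

The messages list is shared unchanged across all three requests. This is the "swap the model field" pattern described above.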

Streaming

Set stream to true to receive tokens as they are generated, rather than waiting for the complete response. This is essential for responsive user-facing agents.

const stream = await agent.models.chat({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Write me a project brief.' }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.delta);
}

# Must run inside an async function (for example, one started with asyncio.run).
stream = await agent.models.chat(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write me a project brief."}],
    stream=True,
)

async for chunk in stream:
    print(chunk.delta, end="", flush=True)
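An agent often needs the complete reply as well as the live tokens, for example to store it after showing it to the user. Below is a minimal sketch that prints each delta and also accumulates the full text. It assumes each chunk exposes a string `delta`, as in the examples above. `Chunk` and `fake_stream` are hypothetical stand-ins so the code runs without an API key.

```python
import asyncio
from collections.abc import AsyncIterable, AsyncIterator
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical stand-in for a stream chunk; only `delta` is assumed."""

    delta: str


async def fake_stream(parts: list[str]) -> AsyncIterator[Chunk]:
    """Hypothetical stream that yields text pieces the way a model might."""
    for part in parts:
        await asyncio.sleep(0)  # hand control back to the event loop, like network I/O
        yield Chunk(delta=part)


async def collect(stream: AsyncIterable[Chunk], echo: bool = True) -> str:
    """Print each delta as it arrives (if echo) and return the full text."""
    pieces: list[str] = []
    async for chunk in stream:
        if echo:
            print(chunk.delta, end="", flush=True)
        pieces.append(chunk.delta)
    if echo:
        print()  # end the streamed line
    return "".join(pieces)


text = asyncio.run(collect(fake_stream(["Project ", "brief: ", "launch in Q3."])))
```

In real code, pass the stream returned by `agent.models.chat(..., stream=True)` to `collect` from inside your own async function.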

Embeddings

Generate vector embeddings for any text input. Embeddings pair naturally with the memory primitive for building retrieval-augmented generation (RAG) pipelines.

const embedding = await agent.models.embed({
  model: 'text-embedding-3-small',
  input: 'Alice prefers morning meetings and async communication.',
});

// embedding.vector is a Float32Array ready to store
console.log('Dimensions:', embedding.vector.length);

# Must run inside an async function (for example, one started with asyncio.run).
embedding = await agent.models.embed(
    model="text-embedding-3-small",
    input="Alice prefers morning meetings and async communication.",
)

# embedding.vector is a list of floats ready to store
print("Dimensions:", len(embedding.vector))
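In a RAG pipeline, stored embeddings are commonly ranked against a query embedding by cosine similarity. Below is a minimal pure-Python sketch. It assumes `embedding.vector` is a list of floats, as described above. The in-memory `store` dict, `top_k`, and the toy 3-dimensional vectors are hypothetical, and the memory primitive may provide its own search.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def top_k(query: list[float], store: dict[str, list[float]], k: int = 1) -> list[str]:
    """Return the keys of the k stored vectors most similar to the query."""
    ranked = sorted(store, key=lambda key: cosine_similarity(query, store[key]), reverse=True)
    return ranked[:k]


# Toy 3-dimensional vectors; real embeddings have far more dimensions.
store = {
    "alice-prefs": [0.9, 0.1, 0.0],
    "bob-prefs": [0.1, 0.9, 0.2],
}
print(top_k([0.8, 0.2, 0.1], store))  # ['alice-prefs']
```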

How Routing Works

You specify a model

Pass the model name in your request. Nonhumans resolves the correct provider endpoint automatically.

Automatic fallback

If the primary provider returns an error or rate-limits the request, Nonhumans retries with an equivalent model. No changes to your code are required.

Cost optimization

For requests where you specify a capability rather than an exact model, Nonhumans routes to the lowest-cost model that meets your requirements.
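The three steps above can be illustrated with a client-side sketch:

1. Keep only the models that have every required capability.
2. Order them by cost.
3. Fall back to the next candidate when a call fails.

Everything here is hypothetical, including the model names, capability tags, rates, and `ProviderError`. Nonhumans performs this routing on its side. The sketch only shows the idea and is not its implementation.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ModelInfo:
    name: str
    capabilities: frozenset[str]  # hypothetical capability tags, e.g. "chat"
    cost_per_million_tokens: float  # hypothetical rate, not a real price


class ProviderError(Exception):
    """Stand-in for a provider error or rate-limit response."""


def route(catalog: list[ModelInfo], required: set[str],
          call: Callable[[str], str]) -> tuple[str, str]:
    """Try the cheapest model with all required capabilities; fall back on errors."""
    candidates = sorted(
        (m for m in catalog if required <= m.capabilities),
        key=lambda m: m.cost_per_million_tokens,
    )
    if not candidates:
        raise ValueError(f"no model supports {sorted(required)}")
    last_error: Optional[ProviderError] = None
    for model in candidates:
        try:
            return model.name, call(model.name)
        except ProviderError as err:
            last_error = err  # move on to the next-cheapest candidate
    raise RuntimeError("all candidate models failed") from last_error


catalog = [
    ModelInfo("model-a", frozenset({"chat"}), 0.5),
    ModelInfo("model-b", frozenset({"chat", "vision"}), 2.0),
    ModelInfo("model-c", frozenset({"chat", "vision"}), 5.0),
]


def flaky_call(model: str) -> str:
    """Hypothetical provider call in which model-b is rate-limited."""
    if model == "model-b":
        raise ProviderError("rate limited")
    return f"reply from {model}"


print(route(catalog, {"chat", "vision"}, flaky_call))  # ('model-c', 'reply from model-c')
```

Here model-b is the cheapest model with both capabilities, but its call fails. The router therefore falls back to model-c, the next-cheapest candidate.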

You don’t need separate API keys for OpenAI, Anthropic, Google, or any other provider. Nonhumans handles authentication and billing on your behalf — one key, every model.

Billing

Model usage is billed per token and charged to your Nonhumans account. Every inference call is logged with model name, token counts, and cost — visible in the dashboard under Usage → Models.

Embedding calls are billed separately from chat completions and are typically 10–20× cheaper per token. Check the dashboard for current per-model rates.
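Per-token billing reduces to simple arithmetic on the token counts logged for each call. Below is a minimal sketch. It assumes rates quoted in USD per million tokens, with separate input and output rates, as many providers use. The rates shown are placeholders, not Nonhumans prices. Use the per-model rates in your dashboard.

```python
def call_cost(input_tokens: int, output_tokens: int,
              input_rate: float, output_rate: float = 0.0) -> float:
    """Cost in USD for one call; rates are USD per million tokens (hypothetical)."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


# Placeholder rates for illustration only.
chat = call_cost(input_tokens=1_200, output_tokens=300, input_rate=2.50, output_rate=10.00)
embed = call_cost(input_tokens=1_200, output_tokens=0, input_rate=0.02)

print(f"chat:  ${chat:.6f}")   # chat:  $0.006000
print(f"embed: ${embed:.6f}")  # embed: $0.000024
```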

Available Parameters

Chat

model (string, required)

The model identifier to use. Examples: gpt-4o, claude-3-5-sonnet, gemini-1.5-pro, mistral-large.

messages (array, required)

An array of message objects, each with a role (system, user, or assistant) and a content field.

stream (boolean, optional)

When true, returns an async iterable of token chunks instead of a single response object.

Embed

model (string, required)

The embedding model identifier to use. Example: text-embedding-3-small.

input (string, required)

The text to embed. Pass a single string to receive a single embedding vector.
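Before sending a request, an agent can check its payload against the parameter list above. Below is a minimal client-side sketch. The functions `validate_chat_request` and `validate_embed_request` are hypothetical helpers, and rejecting an empty model string is an assumption. The service may enforce more or different rules.

```python
VALID_ROLES = {"system", "user", "assistant"}


def validate_chat_request(req: dict) -> list[str]:
    """Return a list of problems with a chat request (empty if it looks valid)."""
    errors: list[str] = []
    if not isinstance(req.get("model"), str) or not req["model"]:
        errors.append("model: required non-empty string")
    messages = req.get("messages")
    if not isinstance(messages, list):
        errors.append("messages: required array")
    else:
        for i, msg in enumerate(messages):
            if not isinstance(msg, dict):
                errors.append(f"messages[{i}]: must be an object")
                continue
            if msg.get("role") not in VALID_ROLES:
                errors.append(f"messages[{i}].role: must be system, user, or assistant")
            if not isinstance(msg.get("content"), str):
                errors.append(f"messages[{i}].content: required string")
    # stream is optional, but if present it must be a boolean
    if "stream" in req and not isinstance(req["stream"], bool):
        errors.append("stream: must be a boolean")
    return errors


def validate_embed_request(req: dict) -> list[str]:
    """Return a list of problems with an embed request (empty if it looks valid)."""
    errors: list[str] = []
    if not isinstance(req.get("model"), str) or not req["model"]:
        errors.append("model: required non-empty string")
    if not isinstance(req.get("input"), str):
        errors.append("input: required string")
    return errors


print(validate_chat_request({"model": "gpt-4o", "messages": [{"role": "bot", "content": "hi"}]}))
```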
