.env file.
Default: Claude subscription
Out of the box, ollim-bot uses your Claude subscription via Claude Code OAuth. The model you get depends on your subscription tier:

| Subscription | Default model | Opus access |
|---|---|---|
| Pro | Sonnet 4.6 | Not available |
| Max | Opus 4.6 | Default |
Switch models at any time with the /model slash command in Discord: sonnet currently maps to Sonnet 4.6, opus to Opus 4.6, and haiku to Haiku 4.5.
Claude Code may fall back to Sonnet if you hit your Opus usage threshold
on a subscription plan.
Alternative subscriptions
Don’t want a Claude subscription? Several providers offer their own coding subscriptions with Anthropic Messages API-compatible endpoints. Set two environment variables and ollim-bot uses their models instead — no code changes.
All of these use the same pattern —
ANTHROPIC_BASE_URL and
ANTHROPIC_AUTH_TOKEN in your .env file:
.env (Z.AI example)
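A sketch of the two-variable pattern, using Z.AI's Anthropic-compatible endpoint as the example (the token value is a placeholder; verify the URL against Z.AI's current docs):

```shell
# Point Claude Code at the provider instead of Anthropic
ANTHROPIC_BASE_URL=https://api.z.ai/api/anthropic
# Provider API key (placeholder)
ANTHROPIC_AUTH_TOKEN=your-zai-api-key
```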
Z.AI GLM setup
.env
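A hedged sketch for Z.AI (endpoint from Z.AI's Claude Code integration docs; the key is a placeholder and the model pin is optional and illustrative):

```shell
ANTHROPIC_BASE_URL=https://api.z.ai/api/anthropic
# Placeholder API key
ANTHROPIC_AUTH_TOKEN=your-zai-api-key
# Optional: pin the sonnet alias to a GLM model ID
ANTHROPIC_DEFAULT_SONNET_MODEL=glm-4.7
```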
Qwen / Alibaba Cloud setup
.env
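A sketch assuming Alibaba Cloud's Model Studio (DashScope) Claude Code proxy; the URL varies by region, so confirm it in the Alibaba Cloud docs (the key is a placeholder):

```shell
# International DashScope endpoint; check the Alibaba Cloud docs for your region
ANTHROPIC_BASE_URL=https://dashscope-intl.aliyuncs.com/api/v2/apps/claude-code-proxy
# Placeholder API key
ANTHROPIC_AUTH_TOKEN=your-dashscope-api-key
```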
MiniMax setup
.env
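A sketch assuming MiniMax's Anthropic-compatible endpoint; verify the URL against MiniMax's platform docs (the key is a placeholder):

```shell
ANTHROPIC_BASE_URL=https://api.minimax.io/anthropic
# Placeholder API key
ANTHROPIC_AUTH_TOKEN=your-minimax-api-key
```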
Kimi / Moonshot AI setup
.env
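A sketch assuming Moonshot AI's Anthropic-compatible endpoint; verify the URL against Moonshot's platform docs (the key is a placeholder):

```shell
ANTHROPIC_BASE_URL=https://api.moonshot.ai/anthropic
# Placeholder API key
ANTHROPIC_AUTH_TOKEN=your-moonshot-api-key
```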
Pay-per-token providers
If you prefer paying for what you use instead of a flat subscription:

| Provider | Input cost | Output cost | Models | Notes |
|---|---|---|---|---|
| DeepSeek | $0.27/1M | $0.42/1M | DeepSeek V3.2 | Cheapest option available |
| OpenRouter | Varies | Varies | 400+ models | Gateway with unified billing |
DeepSeek setup
.env
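A sketch assuming DeepSeek's Anthropic-compatible endpoint; verify the URL against DeepSeek's API docs (the key is a placeholder):

```shell
ANTHROPIC_BASE_URL=https://api.deepseek.com/anthropic
# Placeholder API key
ANTHROPIC_AUTH_TOKEN=your-deepseek-api-key
```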
OpenRouter setup
.env
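A sketch only: the base URL below is a placeholder pattern, so confirm OpenRouter's current Anthropic-compatible endpoint in their docs before using it (the key is also a placeholder):

```shell
# Placeholder; confirm the Anthropic-compatible path in OpenRouter's docs
ANTHROPIC_BASE_URL=https://openrouter.ai/api
# Placeholder API key
ANTHROPIC_AUTH_TOKEN=your-openrouter-api-key
# Empty value stops Claude Code from authenticating with Anthropic directly
ANTHROPIC_API_KEY=
```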
Setting ANTHROPIC_API_KEY= to an empty value prevents Claude Code from authenticating directly with Anthropic. Only Claude models are guaranteed to work; non-Claude models require a translation proxy.

Self-hosted models
If you want full control over your data (no tokens leaving your network), you can run models locally and point ollim-bot at them.
Ollama
Ollama runs open models locally and exposes an
Anthropic-compatible endpoint as of v0.14+.

Recommended models for tool use: Qwen3-Coder (32B), GLM-4.7-Flash. Use q8 or fp16 quantization with at least 24GB VRAM for reliable results.

Limitations: tool use is still experimental; smaller models get stuck in reasoning loops, and inference is 50–70x slower than cloud providers. Expect tinkering. Not recommended as a primary backend for a bot that needs to respond reliably.
.env
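A sketch assuming a local Ollama instance on its default port (11434); the model pin is illustrative and should name a model you have pulled locally:

```shell
ANTHROPIC_BASE_URL=http://localhost:11434
# Local servers typically ignore the token; any non-empty value works
ANTHROPIC_AUTH_TOKEN=unused
# Illustrative: map the sonnet alias to a local model
ANTHROPIC_DEFAULT_SONNET_MODEL=qwen3-coder:32b
```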
vLLM / other local inference
Any local inference server that exposes an Anthropic Messages API-compatible endpoint works with the same ANTHROPIC_BASE_URL mechanism. vLLM and llama.cpp server both support this with the right configuration.

Route through a LiteLLM proxy if your inference server only speaks the OpenAI Chat Completions format; LiteLLM translates to Anthropic format automatically.

Model version pinning
By default, model aliases (opus, sonnet, haiku) resolve to the latest
version. Pin specific versions with these environment variables:
| Variable | Description |
|---|---|
| ANTHROPIC_DEFAULT_OPUS_MODEL | Pin the opus alias |
| ANTHROPIC_DEFAULT_SONNET_MODEL | Pin the sonnet alias |
| ANTHROPIC_DEFAULT_HAIKU_MODEL | Pin the haiku alias |
These variables also accept non-Claude model IDs when using an alternative provider (e.g. glm-4.7, deepseek-chat, kimi-k2.5).
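For example, a .env pinning all three aliases to specific Claude versions (the IDs shown are Anthropic API model IDs):

```shell
ANTHROPIC_DEFAULT_OPUS_MODEL=claude-opus-4-6
ANTHROPIC_DEFAULT_SONNET_MODEL=claude-sonnet-4-6
ANTHROPIC_DEFAULT_HAIKU_MODEL=claude-haiku-4-5-20251001
```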
Claude model IDs by provider
| Model | Anthropic API | Amazon Bedrock | Google Vertex AI |
|---|---|---|---|
| Opus 4.6 | claude-opus-4-6 | us.anthropic.claude-opus-4-6-v1 | claude-opus-4-6 |
| Sonnet 4.6 | claude-sonnet-4-6 | us.anthropic.claude-sonnet-4-6 | claude-sonnet-4-6 |
| Haiku 4.5 | claude-haiku-4-5-20251001 | us.anthropic.claude-haiku-4-5-20251001-v1:0 | claude-haiku-4-5@20251001 |
.env (Bedrock example)
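A sketch of pinning an alias to a Bedrock inference profile ID; the region is illustrative and the ID is taken from the table:

```shell
CLAUDE_CODE_USE_BEDROCK=1
AWS_REGION=us-east-1
# Pin the sonnet alias to the Bedrock inference profile ID
ANTHROPIC_DEFAULT_SONNET_MODEL=us.anthropic.claude-sonnet-4-6
```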
Per-routine model override
Background routines can override the model in their YAML frontmatter:

routines/quick-email-check.md
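A minimal sketch of such a routine file; only the model field is shown, the prompt body is illustrative, and other frontmatter fields are documented on the Routines page:

```yaml
---
model: haiku   # run this routine on the haiku alias instead of the default
---
Check the inbox and post a short summary of anything urgent.
```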
The model field accepts the aliases opus, sonnet, and haiku, and only applies to background routines. See Routines for all frontmatter fields.
Choosing a model
For most ollim-bot use, Sonnet 4.6 handles tool calling, scheduling, and conversation as well as Opus 4.6, at 40% lower cost. Opus pulls ahead on deep reasoning and complex multi-step debugging. Haiku 4.5 is ideal for lightweight background routines where speed matters more than depth.

| Model | Best for | Tool calling | Agentic coding | Deep reasoning | Speed | Cost |
|---|---|---|---|---|---|---|
| Opus 4.6 | Complex multi-step tasks, novel problem-solving | Excellent | 65.4% Terminal-Bench | 91.3% GPQA | Slowest | $$$ |
| Sonnet 4.6 | Daily conversations, routines, most agentic work | Excellent | 59.1% Terminal-Bench | 74.1% GPQA | Fast | $$ |
| Haiku 4.5 | Background routines, email triage, quick checks | Good | 41.8% Terminal-Bench | — | Fastest | $ |
Full agentic benchmark comparison
All scores use extended/adaptive thinking unless noted. Benchmarks are
selected for relevance to agentic tool-calling bots like ollim-bot.
Key pattern: Sonnet 4.6 matches or beats Opus on practical tool calling (tau2-bench, MCP Atlas, Finance Agent, GDPval-AA). Opus leads on deep reasoning (GPQA, ARC-AGI-2, Humanity’s Last Exam) and long-context retrieval: tasks that matter for complex debugging, not typical daily bot interactions.

Haiku 4.5 achieves 73.3% on SWE-bench Verified, matching Claude Sonnet 4.5, at one-third the cost and 4-5x the speed. It reaches ~90% of Sonnet 4.5’s agentic coding performance per Augment’s evaluation.

Sources:
Anthropic Opus 4.6,
Anthropic Sonnet 4.6,
Anthropic Haiku 4.5,
Vellum benchmarks,
Anthropic model overview.
Scores current as of February 2026.
| Benchmark | What it measures | Opus 4.6 | Sonnet 4.6 | Haiku 4.5 |
|---|---|---|---|---|
| SWE-bench Verified | Real-world software engineering | 80.8% | 79.6% | 73.3% |
| Terminal-Bench 2.0 | Agentic CLI coding | 65.4% | 59.1% | 41.8% |
| tau2-bench Retail | Multi-step tool calling (retail) | 91.9% | 91.7% | — |
| tau2-bench Telecom | Multi-step tool calling (telecom) | 99.3% | 97.9% | — |
| OSWorld | Agentic computer use | 72.7% | 72.5% | 22.0% |
| MCP Atlas | Scaled tool use | 59.5% | 61.3% | — |
| GDPval-AA (Elo) | Economically valuable knowledge work | 1606 | 1633 | — |
| Finance Agent | Financial tool use | 60.7% | 63.3% | — |
| ARC-AGI-2 | Novel problem-solving | 68.8% | 58.3% | — |
| GPQA Diamond | Graduate-level scientific reasoning | 91.3% | 74.1% | — |
| Humanity’s Last Exam | Hardest questions (with tools) | 53.1% | 19.1% | — |
| BrowseComp | Web search and information discovery | 84.0% | — | — |
| MRCR v2 8-needle @ 1M | Long-context retrieval accuracy | 76.0% | — | — |
Claude pricing
For most users, a Claude subscription is dramatically cheaper than API pay-as-you-go. The average Claude Code developer uses the equivalent of $130/month in API tokens, covered by a $20 Pro plan.

| Plan | Cost | Default model | Opus access | Rate limits |
|---|---|---|---|---|
| Pro | $20/mo | Sonnet 4.6 | Available (with fallback) | ~45 msgs/5hr |
| Max 5x | $100/mo | Opus 4.6 | Default | 5x Pro |
| Max 20x | $200/mo | Opus 4.6 | Default | 20x Pro |
On the Pro plan, Claude Code may fall back from Opus to Sonnet when you
hit a usage threshold. The exact limit is not published. Max plans have
higher thresholds — Max 20x rarely triggers fallback.
API token pricing and breakeven analysis
Per million tokens (standard on-demand):
Extended thinking tokens are billed at output token rates. Long context (>200K input) doubles the input cost and adds 50% to the output cost.

Breakeven analysis (assuming 3:1 input-to-output ratio):
For context, Anthropic reports the average Claude Code developer uses
$6/day ($130/month) in API-equivalent costs, and the 90th percentile
is under $12/day (~$260/month). Pro at $20/month covers what would be
$130+ on the API; a subscription is the clear winner for regular use.

API pay-as-you-go only wins at very low usage (under ~3M tokens/month on Sonnet) or when you need guaranteed access without rate limit resets.

Source: Anthropic API pricing,
Claude Code costs.
| Model | Input | Output | Cache read (input, 90% off) | Batch output (50% off) |
|---|---|---|---|---|
| Opus 4.6 | $5.00 | $25.00 | $0.50 | $12.50 |
| Sonnet 4.6 | $3.00 | $15.00 | $0.30 | $7.50 |
| Haiku 4.5 | $1.00 | $5.00 | $0.10 | $2.50 |
| Plan | Monthly cost | Breakeven on Sonnet | Breakeven on Opus |
|---|---|---|---|
| Pro | $20 | ~3.3M tokens/month | ~2M tokens/month |
| Max 5x | $100 | ~16.7M tokens/month | ~10M tokens/month |
| Max 20x | $200 | ~33.3M tokens/month | ~20M tokens/month |
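The breakeven figures above follow from a short calculation; this sketch assumes the standard on-demand prices and the 3:1 input-to-output ratio stated above:

```python
# Blended $/M tokens at a 3:1 input-to-output ratio
def blended_cost(input_per_m: float, output_per_m: float, ratio: float = 3.0) -> float:
    """Average price per million tokens when `ratio` input tokens accompany each output token."""
    return (ratio * input_per_m + output_per_m) / (ratio + 1)

def breakeven_m_tokens(plan_cost: float, input_per_m: float, output_per_m: float) -> float:
    """Monthly volume (millions of tokens) where API spend equals the subscription price."""
    return plan_cost / blended_cost(input_per_m, output_per_m)

print(round(breakeven_m_tokens(20, 3.00, 15.00), 1))   # Pro vs Sonnet -> 3.3
print(round(breakeven_m_tokens(20, 5.00, 25.00), 1))   # Pro vs Opus -> 2.0
print(round(breakeven_m_tokens(200, 3.00, 15.00), 1))  # Max 20x vs Sonnet -> 33.3
```

Sonnet's blended rate works out to $6/M tokens, which is why the reported $130/month average developer usage dwarfs the $20 Pro price.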
Advanced provider options
If you need pay-as-you-go API billing, cloud provider infrastructure, or a custom LLM gateway, these options are available but require more setup.

Anthropic API key
For pay-as-you-go billing instead of a subscription, set ANTHROPIC_API_KEY directly. This bypasses Claude Code OAuth entirely; you pay per token at Anthropic’s API rates.
.env
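A minimal sketch; the key value is a placeholder:

```shell
# Direct pay-as-you-go against the Anthropic API (placeholder key)
ANTHROPIC_API_KEY=sk-ant-your-key-here
```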
Amazon Bedrock
Set these environment variables in your .env file:

| Variable | Required | Description |
|---|---|---|
| CLAUDE_CODE_USE_BEDROCK | Yes | Set to 1 to enable Bedrock |
| AWS_REGION | Yes | AWS region (e.g. us-east-1); not read from .aws config |
| AWS_ACCESS_KEY_ID | Conditional | AWS access key (one auth method required) |
| AWS_SECRET_ACCESS_KEY | Conditional | AWS secret key |
| AWS_SESSION_TOKEN | No | Session token for temporary credentials |
| AWS_PROFILE | Conditional | AWS SSO profile name (alternative to access keys) |
.env
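A sketch using static access keys (one of several auth options; all values are placeholders):

```shell
CLAUDE_CODE_USE_BEDROCK=1
AWS_REGION=us-east-1
# One auth option: static access keys (alternatives: AWS_PROFILE, AWS_BEARER_TOKEN_BEDROCK)
AWS_ACCESS_KEY_ID=your-access-key-id
AWS_SECRET_ACCESS_KEY=your-secret-access-key
```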
Bedrock supports five authentication methods: AWS CLI config, environment variable access keys, SSO profiles, Management Console credentials, and Bedrock API keys (AWS_BEARER_TOKEN_BEDROCK).

IAM permissions required: bedrock:InvokeModel, bedrock:InvokeModelWithResponseStream, bedrock:ListInferenceProfiles.

For full IAM policy details, credential chain options, and guardrail configuration, see the Claude Code Bedrock docs.
Google Vertex AI
Set these environment variables in your .env file:

| Variable | Required | Description |
|---|---|---|
| CLAUDE_CODE_USE_VERTEX | Yes | Set to 1 to enable Vertex AI |
| CLOUD_ML_REGION | Yes | GCP region (e.g. us-east5) or global |
| ANTHROPIC_VERTEX_PROJECT_ID | Yes | Your GCP project ID |
| GOOGLE_APPLICATION_CREDENTIALS | No | Path to service account JSON (alternative to gcloud auth) |
.env
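A sketch using the variables from the table; region and project ID are placeholders:

```shell
CLAUDE_CODE_USE_VERTEX=1
CLOUD_ML_REGION=us-east5
# Placeholder GCP project ID
ANTHROPIC_VERTEX_PROJECT_ID=your-gcp-project-id
```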
Authenticate with gcloud auth application-default login or provide a service account key via GOOGLE_APPLICATION_CREDENTIALS.

IAM role required: roles/aiplatform.user.

Model access approval on Vertex AI can take 24–48 hours. Not all models are available in all regions.

For full GCP setup, region-specific configuration, and credential details, see the Claude Code Vertex AI docs.
Custom LLM gateway
Point ollim-bot at any endpoint that implements the Anthropic Messages API: a LiteLLM proxy, vLLM, or your own gateway.

| Variable | Required | Description |
|---|---|---|
| ANTHROPIC_BASE_URL | Yes | Base URL for the Messages API endpoint |
| ANTHROPIC_AUTH_TOKEN | No | Static API key sent as Authorization header |
.env
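A sketch assuming a LiteLLM proxy running on your own host (port 4000 is LiteLLM's default; the key is a placeholder):

```shell
ANTHROPIC_BASE_URL=http://localhost:4000
# Placeholder; omit if your gateway does not require auth
ANTHROPIC_AUTH_TOKEN=your-gateway-key
```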
The gateway must expose /v1/messages and forward the anthropic-beta and anthropic-version headers.

For LiteLLM-specific setup (unified endpoint, Bedrock/Vertex pass-through, dynamic key helpers), see the Claude Code LLM gateway docs.
Cross-provider pricing
Global endpoint pricing is identical across Anthropic API, Amazon Bedrock, and Google Vertex AI; no markup. Regional endpoints add a 10% premium for data residency compliance.

Per million tokens (global endpoints):
Feature availability:
Choose based on your infrastructure, not pricing — the per-token cost
is the same. Bedrock and Vertex AI add value through IAM integration,
compliance frameworks, and provisioned throughput for predictable
workloads. The Anthropic API gets new features and models first.

Source: Anthropic pricing,
Bedrock pricing,
Vertex AI pricing.
Pricing current as of February 2026.
| Model | Anthropic API | Bedrock | Vertex AI | Regional (+10%) |
|---|---|---|---|---|
| Opus 4.6 input | $5.00 | $5.00 | $5.00 | $5.50 |
| Opus 4.6 output | $25.00 | $25.00 | $25.00 | $27.50 |
| Sonnet 4.6 input | $3.00 | $3.00 | $3.00 | $3.30 |
| Sonnet 4.6 output | $15.00 | $15.00 | $15.00 | $16.50 |
| Haiku 4.5 input | $1.00 | $1.00 | $1.00 | $1.10 |
| Haiku 4.5 output | $5.00 | $5.00 | $5.00 | $5.50 |
| Feature | Anthropic API | Bedrock | Vertex AI |
|---|---|---|---|
| Prompt caching | Yes | Yes | Yes |
| Batch API (50% off) | Yes | Yes | Yes |
| Extended thinking | Yes | Yes | Yes |
| Fast mode (6x pricing) | Yes | Not confirmed | Not confirmed |
| 1M context (beta) | Yes | Verify | Verify |
| New model availability | First | Delayed | Delayed |
| Provisioned throughput | No | Yes | Yes |
