ollim-bot authenticates through Claude Code OAuth by default — no API key, no provider config, nothing to set up. Your Claude subscription handles everything. If you want to experiment with other models or reduce costs, swap providers with a couple of environment variables in your .env file.

Default: Claude subscription

Out of the box, ollim-bot uses your Claude subscription via Claude Code OAuth. The model you get depends on your subscription tier:
| Subscription | Default model | Opus access   |
| ------------ | ------------- | ------------- |
| Pro          | Sonnet 4.6    | Not available |
| Max          | Opus 4.6      | Default       |
Switch models at runtime with the /model slash command in Discord:
/model opus
/model sonnet
/model haiku
The agent resolves aliases to the latest version automatically — sonnet currently maps to Sonnet 4.6, opus to Opus 4.6, haiku to Haiku 4.5.
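Conceptually, alias resolution is a small lookup. The sketch below is an illustration, not ollim-bot's actual code; the full model IDs come from the model version pinning section:

```python
# Illustrative sketch of alias resolution (not ollim-bot's actual code).
# Full model IDs match the Anthropic API column of the pinning table.
ALIASES = {
    "sonnet": "claude-sonnet-4-6",
    "opus": "claude-opus-4-6",
    "haiku": "claude-haiku-4-5-20251001",
}

def resolve_model(name: str) -> str:
    """Map an alias to its current full model ID; pass full IDs through."""
    return ALIASES.get(name.lower(), name)

print(resolve_model("sonnet"))  # claude-sonnet-4-6
```

Pinning a version is then just replacing an entry in this mapping, which is what the ANTHROPIC_DEFAULT_*_MODEL variables do.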
Claude Code may fall back to Sonnet if you hit your Opus usage threshold on a subscription plan.

Alternative subscriptions

Don’t want a Claude subscription? Several providers offer their own coding subscriptions with Anthropic Messages API-compatible endpoints. Set two environment variables and ollim-bot uses their models instead — no code changes.
| Provider | Cost       | Models                   | Notes                              |
| -------- | ---------- | ------------------------ | ---------------------------------- |
| Z.AI     | $3–49/mo   | GLM-5, GLM-4.7           | Free tier (GLM-4.7-Flash)          |
| Qwen     | $10–50/mo  | Qwen3.5 + 5 other models | Multi-model under one subscription |
| MiniMax  | $10–150/mo | MiniMax M2.5             | 100+ tokens/sec throughput         |
| Kimi     | ~$7/week   | Kimi K2.5                | 256K context window                |
All of these use the same pattern — ANTHROPIC_BASE_URL and ANTHROPIC_AUTH_TOKEN in your .env file:
.env (Z.AI)
ANTHROPIC_BASE_URL=https://api.z.ai/api/anthropic
ANTHROPIC_AUTH_TOKEN=your-zai-api-key
ANTHROPIC_DEFAULT_SONNET_MODEL=glm-4.7
ANTHROPIC_DEFAULT_HAIKU_MODEL=glm-4.5-air
GLM-4.7 runs at roughly 5–7x cheaper than Claude Sonnet 4.6. Z.AI offers subscription plans (Lite $3/mo, Pro $15/mo, Max ~$60/mo) with prompt-based quotas, or pay-per-token. GLM-4.7-Flash and GLM-4.5-Flash are free.
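Before pointing ollim-bot at a new provider, you can smoke-test the endpoint yourself. The sketch below assembles the POST /v1/messages request these endpoints expect, with the token sent as a Bearer Authorization header; the wire details are an illustration based on the variables above, not official client code:

```python
# Smoke-test sketch for an Anthropic Messages API-compatible endpoint.
# Standard library only; wire details are illustrative, not official client code.
import json
import os
import urllib.request

def build_request(base_url: str, token: str, model: str) -> urllib.request.Request:
    """Assemble the POST /v1/messages call these endpoints expect."""
    body = {
        "model": model,
        "max_tokens": 64,
        "messages": [{"role": "user", "content": "Reply with OK."}],
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/messages",
        data=json.dumps(body).encode(),
        headers={
            "Authorization": f"Bearer {token}",  # ANTHROPIC_AUTH_TOKEN goes here
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
        },
        method="POST",
    )

if __name__ == "__main__" and "ANTHROPIC_BASE_URL" in os.environ:
    req = build_request(
        os.environ["ANTHROPIC_BASE_URL"],
        os.environ["ANTHROPIC_AUTH_TOKEN"],
        "glm-4.7",  # any model ID the provider serves
    )
    with urllib.request.urlopen(req) as resp:
        print(resp.status)  # 200 means the URL and token are good
```

The same check works for every provider on this page: swap the base URL, token, and model ID.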
.env (Qwen)
ANTHROPIC_BASE_URL=https://dashscope-intl.aliyuncs.com/apps/anthropic
ANTHROPIC_AUTH_TOKEN=your-dashscope-api-key
ANTHROPIC_MODEL=qwen3.5-plus
ANTHROPIC_SMALL_FAST_MODEL=qwen3.5-coder
The standout feature — one subscription ($10–50/mo), six models. Switch between Qwen3.5-Plus, Qwen3-Coder, GLM-4.7, Kimi-K2.5, and MiniMax M2.5 without changing providers.
.env (MiniMax)
ANTHROPIC_BASE_URL=https://api.minimax.io/anthropic
ANTHROPIC_AUTH_TOKEN=your-minimax-api-key
ANTHROPIC_MODEL=minimax-m2.5
ANTHROPIC_SMALL_FAST_MODEL=minimax-m2.5-lightning
Subscription tiers: Starter $10/mo (100 prompts/5h), Plus $20/mo (300/5h), Max $50/mo (1,000/5h). 100+ tokens per second throughput.
.env (Kimi)
ANTHROPIC_BASE_URL=https://api.moonshot.ai/anthropic
ANTHROPIC_AUTH_TOKEN=your-moonshot-api-key
ANTHROPIC_MODEL=kimi-k2.5
ANTHROPIC_SMALL_FAST_MODEL=kimi-k2
Weekly membership at ~$7/week with 300–1,200 API calls per 5-hour window. 256K context window and 100 tokens/second output speed.

Pay-per-token providers

If you prefer paying for what you use instead of a flat subscription:
| Provider   | Input cost | Output cost | Models        | Notes                        |
| ---------- | ---------- | ----------- | ------------- | ---------------------------- |
| DeepSeek   | $0.27/1M   | $0.42/1M    | DeepSeek V3.2 | Cheapest option available    |
| OpenRouter | Varies     | Varies      | 400+ models   | Gateway with unified billing |
.env (DeepSeek)
ANTHROPIC_BASE_URL=https://api.deepseek.com/anthropic
ANTHROPIC_AUTH_TOKEN=your-deepseek-api-key
ANTHROPIC_MODEL=deepseek-chat
ANTHROPIC_SMALL_FAST_MODEL=deepseek-chat
The cheapest per-token option — roughly 10x cheaper than Claude Sonnet 4.6. No subscription required. Tool use works but image input is not supported through the Anthropic compatibility endpoint.
.env (OpenRouter)
ANTHROPIC_BASE_URL=https://openrouter.ai/api
ANTHROPIC_AUTH_TOKEN=sk-or-v1-your-key
ANTHROPIC_API_KEY=
Gateway to 400+ models with unified billing. The empty ANTHROPIC_API_KEY= prevents Claude Code from authenticating directly with Anthropic. Only Claude models are guaranteed to work — non-Claude models require a translation proxy.
Alternative models are community-supported and not tested by Anthropic. ollim-bot’s agentic loop depends on reliable tool use — test thoroughly before relying on a non-Claude model for daily operations. Provider endpoints and pricing may change without notice.

Self-hosted models

If you want full control over your data — no tokens leaving your network — you can run models locally and point ollim-bot at them.
Ollama runs open models locally and exposes an Anthropic-compatible endpoint as of v0.14+:
.env (Ollama)
ANTHROPIC_BASE_URL=http://localhost:11434
ANTHROPIC_AUTH_TOKEN=ollama
ANTHROPIC_MODEL=qwen3-coder
ANTHROPIC_SMALL_FAST_MODEL=qwen3-coder
Recommended models for tool use: Qwen3-Coder (32B), GLM-4.7-Flash. Use q8 or fp16 quantization with at least 24GB VRAM for reliable results.

Limitations: tool use is still experimental — smaller models get stuck in reasoning loops, and inference is 50–70x slower than cloud providers. Expect tinkering. Not recommended as a primary backend for a bot that needs to respond reliably.
Any local inference server that exposes an Anthropic Messages API-compatible endpoint works with the same ANTHROPIC_BASE_URL mechanism. vLLM and llama.cpp server both support this with the right configuration.

Route through a LiteLLM proxy if your inference server only speaks the OpenAI Chat Completions format — LiteLLM translates to the Anthropic format automatically.
Self-hosting makes sense for data sovereignty and experimentation. For daily bot reliability, cloud providers still have the edge — tool use support in local models is improving but not yet production-grade.

Model version pinning

By default, model aliases (opus, sonnet, haiku) resolve to the latest version. Pin specific versions with these environment variables:
| Variable                       | Description          |
| ------------------------------ | -------------------- |
| ANTHROPIC_DEFAULT_OPUS_MODEL   | Pin the opus alias   |
| ANTHROPIC_DEFAULT_SONNET_MODEL | Pin the sonnet alias |
| ANTHROPIC_DEFAULT_HAIKU_MODEL  | Pin the haiku alias  |
These also work with alternative providers — set them to the provider’s model IDs (e.g. glm-4.7, deepseek-chat, kimi-k2.5).
| Model      | Anthropic API             | Amazon Bedrock                              | Google Vertex AI         |
| ---------- | ------------------------- | ------------------------------------------- | ------------------------ |
| Opus 4.6   | claude-opus-4-6           | us.anthropic.claude-opus-4-6-v1             | claude-opus-4-6          |
| Sonnet 4.6 | claude-sonnet-4-6         | us.anthropic.claude-sonnet-4-6              | claude-sonnet-4-6        |
| Haiku 4.5  | claude-haiku-4-5-20251001 | us.anthropic.claude-haiku-4-5-20251001-v1:0 | claude-haiku-4-5@20251001 |
.env (Bedrock example)
ANTHROPIC_DEFAULT_OPUS_MODEL=us.anthropic.claude-opus-4-6-v1
ANTHROPIC_DEFAULT_SONNET_MODEL=us.anthropic.claude-sonnet-4-6
ANTHROPIC_DEFAULT_HAIKU_MODEL=us.anthropic.claude-haiku-4-5-20251001-v1:0
Pin all three models when using Bedrock or Vertex AI. Without pinning, aliases resolve to the latest version — which may not be available in your deployment yet.

Per-routine model override

Background routines can override the model in their YAML frontmatter:
routines/quick-email-check.md
---
id: "d1e2f3a4"
cron: "0 */3 * * *"
description: "Email check"
background: true
model: "haiku"
---
Check for new important emails. Save a summary to pending updates.
The model field accepts aliases (opus, sonnet, haiku) and only applies to background routines. See Routines for all frontmatter fields.
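To make the override concrete, here is an illustrative parser sketch (not ollim-bot's actual implementation) showing how the model field wins for background routines, using the field names from the example above:

```python
# Illustrative sketch (not ollim-bot's actual parser): reading the frontmatter
# model override, with field names from the routine example above.

def parse_frontmatter(text: str) -> dict:
    """Extract simple key: value pairs from a ----delimited frontmatter block."""
    if not text.startswith("---"):
        return {}
    _, block, _ = text.split("---", 2)
    fields = {}
    for line in block.strip().splitlines():
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip().strip('"')
    return fields

routine = '''---
id: "d1e2f3a4"
cron: "0 */3 * * *"
background: true
model: "haiku"
---
Check for new important emails.'''

fm = parse_frontmatter(routine)
# The override only applies to background routines; otherwise use the default.
model = fm.get("model", "sonnet") if fm.get("background") == "true" else "sonnet"
print(model)  # haiku
```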

Choosing a model

For most ollim-bot use, Sonnet 4.6 handles tool calling, scheduling, and conversation as well as Opus 4.6 — at 40% lower cost. Opus pulls ahead on deep reasoning and complex multi-step debugging. Haiku 4.5 is ideal for lightweight background routines where speed matters more than depth.
| Model      | Best for                                         | Tool calling | Agentic coding        | Deep reasoning | Speed   | Cost |
| ---------- | ------------------------------------------------ | ------------ | --------------------- | -------------- | ------- | ---- |
| Opus 4.6   | Complex multi-step tasks, novel problem-solving  | Excellent    | 65.4% Terminal-Bench  | 91.3% GPQA     | Slowest | $$$  |
| Sonnet 4.6 | Daily conversations, routines, most agentic work | Excellent    | 59.1% Terminal-Bench  | 74.1% GPQA     | Fast    | $$   |
| Haiku 4.5  | Background routines, email triage, quick checks  | Good         | 41.8% Terminal-Bench  | –              | Fastest | $    |
Sonnet 4.6 is the sweet spot for ollim-bot. It matches Opus on tau2-bench tool calling (91.7% vs 91.9%), beats it on knowledge work tasks (GDPval-AA: 1633 vs 1606 Elo), and is 70% more token-efficient. It’s the default for a reason.
All scores use extended/adaptive thinking unless noted. Benchmarks are selected for relevance to agentic tool-calling bots like ollim-bot.
| Benchmark             | What it measures                     | Opus 4.6 | Sonnet 4.6 | Haiku 4.5 |
| --------------------- | ------------------------------------ | -------- | ---------- | --------- |
| SWE-bench Verified    | Real-world software engineering      | 80.8%    | 79.6%      | 73.3%     |
| Terminal-Bench 2.0    | Agentic CLI coding                   | 65.4%    | 59.1%      | 41.8%     |
| tau2-bench Retail     | Multi-step tool calling (retail)     | 91.9%    | 91.7%      | –         |
| tau2-bench Telecom    | Multi-step tool calling (telecom)    | 99.3%    | 97.9%      | –         |
| OSWorld               | Agentic computer use                 | 72.7%    | 72.5%      | 22.0%     |
| MCP Atlas             | Scaled tool use                      | 59.5%    | 61.3%      | –         |
| GDPval-AA (Elo)       | Economically valuable knowledge work | 1606     | 1633       | –         |
| Finance Agent         | Financial tool use                   | 60.7%    | 63.3%      | –         |
| ARC-AGI-2             | Novel problem-solving                | 68.8%    | 58.3%      | –         |
| GPQA Diamond          | Graduate-level scientific reasoning  | 91.3%    | 74.1%      | –         |
| Humanity’s Last Exam  | Hardest questions (with tools)       | 53.1%    | 19.1%      | –         |
| BrowseComp            | Web search and information discovery | 84.0%    | –          | –         |
| MRCR v2 8-needle @ 1M | Long-context retrieval accuracy      | 76.0%    | –          | –         |
Key pattern: Sonnet 4.6 matches or beats Opus on practical tool calling (tau2-bench, MCP Atlas, Finance Agent, GDPval-AA). Opus leads on deep reasoning (GPQA, ARC-AGI-2, Humanity’s Last Exam) and long-context retrieval — tasks that matter for complex debugging, not typical daily bot interactions.

Haiku 4.5 achieves 73.3% on SWE-bench Verified — matching Claude Sonnet 4.5 — at one-third the cost and 4–5x the speed. It reaches ~90% of Sonnet 4.5’s agentic coding performance per Augment’s evaluation.

Sources: Anthropic Opus 4.6, Anthropic Sonnet 4.6, Anthropic Haiku 4.5, Vellum benchmarks, Anthropic model overview. Scores current as of February 2026.

Claude pricing

For most users, a Claude subscription is dramatically cheaper than API pay-as-you-go. The average Claude Code developer uses the equivalent of $130/month in API tokens — covered by a $20 Pro plan.
| Plan    | Cost    | Default model | Opus access               | Rate limits  |
| ------- | ------- | ------------- | ------------------------- | ------------ |
| Pro     | $20/mo  | Sonnet 4.6    | Available (with fallback) | ~45 msgs/5hr |
| Max 5x  | $100/mo | Opus 4.6      | Default                   | 5x Pro       |
| Max 20x | $200/mo | Opus 4.6      | Default                   | 20x Pro      |
On the Pro plan, Claude Code may fall back from Opus to Sonnet when you hit a usage threshold. The exact limit is not published. Max plans have higher thresholds — Max 20x rarely triggers fallback.
Per million tokens (standard on-demand):
| Model      | Input | Output | Cache read (90% off) | Batch (50% off)        |
| ---------- | ----- | ------ | -------------------- | ---------------------- |
| Opus 4.6   | $5.00 | $25.00 | $0.50                | $2.50 in / $12.50 out  |
| Sonnet 4.6 | $3.00 | $15.00 | $0.30                | $1.50 in / $7.50 out   |
| Haiku 4.5  | $1.00 | $5.00  | $0.10                | $0.50 in / $2.50 out   |
Extended thinking tokens are billed at output token rates. Long context (>200K input) doubles the input cost and adds 50% to the output cost.

Breakeven analysis (assuming a 3:1 input-to-output ratio):
| Plan    | Monthly cost | Breakeven on Sonnet  | Breakeven on Opus  |
| ------- | ------------ | -------------------- | ------------------ |
| Pro     | $20          | ~3.3M tokens/month   | ~2M tokens/month   |
| Max 5x  | $100         | ~16.7M tokens/month  | ~10M tokens/month  |
| Max 20x | $200         | ~33.3M tokens/month  | ~20M tokens/month  |
For context, Anthropic reports the average Claude Code developer uses $6/day ($130/month) in API-equivalent costs, and the 90th percentile is under $12/day (~$260/month). Pro at $20/month covers what would be $130+ on the API — a subscription is the clear winner for regular use.

API pay-as-you-go only wins at very low usage (under ~3M tokens/month on Sonnet) or when you need guaranteed access without rate limit resets.

Source: Anthropic API pricing, Claude Code costs.
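The breakeven figures above follow from a blended per-token rate. This short sketch reproduces them, using the rates from the pricing table and the stated 3:1 input-to-output mix:

```python
# Reproduces the breakeven table: the monthly token volume at which a flat
# subscription becomes cheaper than API pay-as-you-go (3:1 input:output mix).

def blended_rate(input_per_m: float, output_per_m: float, ratio: float = 3.0) -> float:
    """Average $ per million tokens given an input:output ratio."""
    return (ratio * input_per_m + output_per_m) / (ratio + 1)

def breakeven_tokens(plan_cost: float, input_per_m: float, output_per_m: float) -> float:
    """Millions of tokens per month where API cost equals the plan cost."""
    return plan_cost / blended_rate(input_per_m, output_per_m)

print(f"Pro on Sonnet:     {breakeven_tokens(20, 3.00, 15.00):.1f}M tokens/month")  # 3.3M
print(f"Pro on Opus:       {breakeven_tokens(20, 5.00, 25.00):.1f}M tokens/month")  # 2.0M
print(f"Max 20x on Sonnet: {breakeven_tokens(200, 3.00, 15.00):.1f}M tokens/month") # 33.3M
```

Plug in your own provider's rates to see where a subscription stops making sense for your usage.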

Advanced provider options

If you need pay-as-you-go API billing, cloud provider infrastructure, or a custom LLM gateway, these options are available but require more setup.
For pay-as-you-go billing instead of a subscription:
.env (Anthropic API key)
ANTHROPIC_API_KEY=sk-ant-...
This bypasses Claude Code OAuth entirely. You pay per token at Anthropic’s API rates.
Set these environment variables in your .env file:
| Variable                | Required    | Description                                              |
| ----------------------- | ----------- | -------------------------------------------------------- |
| CLAUDE_CODE_USE_BEDROCK | Yes         | Set to 1 to enable Bedrock                               |
| AWS_REGION              | Yes         | AWS region (e.g. us-east-1) — not read from .aws config  |
| AWS_ACCESS_KEY_ID       | Conditional | AWS access key (one auth method required)                |
| AWS_SECRET_ACCESS_KEY   | Conditional | AWS secret key                                           |
| AWS_SESSION_TOKEN       | No          | Session token for temporary credentials                  |
| AWS_PROFILE             | Conditional | AWS SSO profile name (alternative to access keys)        |
.env (Bedrock)
CLAUDE_CODE_USE_BEDROCK=1
AWS_REGION=us-east-1
AWS_ACCESS_KEY_ID=AKIA...
AWS_SECRET_ACCESS_KEY=...
Bedrock supports five authentication methods: AWS CLI config, environment variable access keys, SSO profiles, Management Console credentials, and Bedrock API keys (AWS_BEARER_TOKEN_BEDROCK).
AWS_REGION is required and is not read from your AWS CLI configuration. Always set it explicitly.
IAM permissions required: bedrock:InvokeModel, bedrock:InvokeModelWithResponseStream, bedrock:ListInferenceProfiles.

For full IAM policy details, credential chain options, and guardrail configuration, see the Claude Code Bedrock docs.
Set these environment variables in your .env file:
| Variable                       | Required | Description                                                 |
| ------------------------------ | -------- | ----------------------------------------------------------- |
| CLAUDE_CODE_USE_VERTEX         | Yes      | Set to 1 to enable Vertex AI                                |
| CLOUD_ML_REGION                | Yes      | GCP region (e.g. us-east5) or global                        |
| ANTHROPIC_VERTEX_PROJECT_ID    | Yes      | Your GCP project ID                                         |
| GOOGLE_APPLICATION_CREDENTIALS | No       | Path to service account JSON (alternative to gcloud auth)   |
.env (Vertex AI)
CLAUDE_CODE_USE_VERTEX=1
CLOUD_ML_REGION=us-east5
ANTHROPIC_VERTEX_PROJECT_ID=my-project-id
Authenticate with gcloud auth application-default login or provide a service account key via GOOGLE_APPLICATION_CREDENTIALS.

IAM role required: roles/aiplatform.user
Model access approval on Vertex AI can take 24–48 hours. Not all models are available in all regions.
For full GCP setup, region-specific configuration, and credential details, see the Claude Code Vertex AI docs.
Point ollim-bot at any endpoint that implements the Anthropic Messages API — a LiteLLM proxy, vLLM, or your own gateway.
| Variable             | Required | Description                                   |
| -------------------- | -------- | --------------------------------------------- |
| ANTHROPIC_BASE_URL   | Yes      | Base URL for the Messages API endpoint        |
| ANTHROPIC_AUTH_TOKEN | No       | Static API key sent as the Authorization header |
.env (LLM gateway)
ANTHROPIC_BASE_URL=https://litellm-server:4000
ANTHROPIC_AUTH_TOKEN=sk-litellm-static-key
The gateway must expose /v1/messages and forward the anthropic-beta and anthropic-version headers. For LiteLLM-specific setup (unified endpoint, Bedrock/Vertex pass-through, dynamic key helpers), see the Claude Code LLM gateway docs.
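The forwarding requirement can be made concrete with a small header filter. This is a sketch, assuming your gateway framework hands you incoming request headers as a dict; the anthropic-beta value shown is just an example flag:

```python
# Sketch of the header-forwarding rule a custom gateway must follow.
# Assumption: the gateway framework exposes incoming headers as a dict.

FORWARDED = {"authorization", "anthropic-version", "anthropic-beta", "content-type"}

def headers_to_forward(incoming: dict) -> dict:
    """Keep only headers the upstream Messages API needs; drop hop-by-hop ones."""
    return {k: v for k, v in incoming.items() if k.lower() in FORWARDED}

incoming = {
    "Host": "litellm-server:4000",
    "Authorization": "Bearer sk-litellm-static-key",
    "anthropic-version": "2023-06-01",
    "anthropic-beta": "example-beta-flag",  # hypothetical value for illustration
    "Connection": "keep-alive",
}
print(headers_to_forward(incoming))
```

A gateway that silently drops the anthropic-* headers will appear to work until a feature gated behind them fails, so it is worth verifying this explicitly.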
Global endpoint pricing is identical across Anthropic API, Amazon Bedrock, and Google Vertex AI — no markup. Regional endpoints add a 10% premium for data residency compliance.

Per million tokens (global endpoints):
| Model             | Anthropic API | Bedrock | Vertex AI | Regional (+10%) |
| ----------------- | ------------- | ------- | --------- | --------------- |
| Opus 4.6 input    | $5.00         | $5.00   | $5.00     | $5.50           |
| Opus 4.6 output   | $25.00        | $25.00  | $25.00    | $27.50          |
| Sonnet 4.6 input  | $3.00         | $3.00   | $3.00     | $3.30           |
| Sonnet 4.6 output | $15.00        | $15.00  | $15.00    | $16.50          |
| Haiku 4.5 input   | $1.00         | $1.00   | $1.00     | $1.10           |
| Haiku 4.5 output  | $5.00         | $5.00   | $5.00     | $5.50           |
Feature availability:
| Feature                | Anthropic API | Bedrock       | Vertex AI     |
| ---------------------- | ------------- | ------------- | ------------- |
| Prompt caching         | Yes           | Yes           | Yes           |
| Batch API (50% off)    | Yes           | Yes           | Yes           |
| Extended thinking      | Yes           | Yes           | Yes           |
| Fast mode (6x pricing) | Yes           | Not confirmed | Not confirmed |
| 1M context (beta)      | Yes           | Verify        | Verify        |
| New model availability | First         | Delayed       | Delayed       |
| Provisioned throughput | No            | Yes           | Yes           |
Choose based on your infrastructure, not pricing — the per-token cost is the same. Bedrock and Vertex AI add value through IAM integration, compliance frameworks, and provisioned throughput for predictable workloads. The Anthropic API gets new features and models first.

Source: Anthropic pricing, Bedrock pricing, Vertex AI pricing. Pricing current as of February 2026.

Next steps