28.9T weekly volume · China-US shift · DeepSeek matrix leads · Token vs dollar truth · six-step weekly tracking
If you keep bouncing between MMLU charts and production reality but want to know who is actually getting called in 2026, OpenRouter Rankings weekly token throughput is more honest than any benchmark deck. For the week ending May 24, 2026, global weekly volume hit 28.9 trillion tokens (five straight weeks of growth). Chinese models reached 9.223T and have led the US for four weeks running. The DeepSeek trio totals 5.74T at the top of the vendor chart. This article is for developers and tech leads doing model routing and cost control. You get data source notes, that week's Top 10, the token share vs dollar revenue split, the a16z inverse-benchmark finding, a six-step weekly tracking runbook, and why a monthly Mac Mini M4 rental still makes sense for long-running Agents.
OpenRouter is the largest neutral AI model API aggregator: 300+ models, 60+ providers, 8M+ users, roughly 100T tokens per month. Its public leaderboard (openrouter.ai/rankings) ranks by 7-day rolling token throughput, counting both input and output. That is developers voting with wallets—not vendor radar charts.
A year ago OpenRouter processed about 2.4T tokens per week; one week now reaches 28.9T, roughly 12x growth. Token volume is a commercial weather vane: investors track AI monetization, developers pick multi-vendor routing.
Benchmarks can be gamed: High MMLU or HumanEval scores do not mean stable XML/JSON tool calls in Agent workflows—or thirty minutes of autonomous coding without drift.
Volume reflects production trust: Developers keep paying and burning compute when a model passes stability, latency, and price in real workloads.
Weekly cadence catches spikes: DeepSeek V4-Flash jumped +66% week over week—a signal monthly snapshots smooth away.
Free models skew the chart: Zero-price models like Owl Alpha inflate experiment traffic. Read both token share and dollar revenue share.
Coding is now the top use case: OpenRouter + a16z (100T tokens of anonymous metadata) show coding share rising from 11% in early 2025 to over 50%—Top 10 models optimize for Agents and code.
It is not who is smartest—it is who gets called that drives real AI adoption. Billing numbers are more honest than any eval leaderboard.
The tables below summarize OpenRouter public data (7-day rolling weekly stats, through May 24, 2026). The figures were cross-checked against NBD (2026-05-25), the official OpenRouter rankings, and MACCOME commentary from the same period.
| Metric | Value | WoW change |
|---|---|---|
| Global weekly volume | 28.9T tokens | +7.4% (five weeks up) |
| China model weekly volume | 9.223T tokens | +19.89% |
| US model weekly volume | 4.93T tokens | +16.27% |
| China vs US rank | China leads US for four straight weeks | Global #1 region |

| Period | China model traffic share |
|---|---|
| Early 2025 | < 2% |
| Feb 2026 | First time above US |
| May 2026 | ~45%+, four weeks ahead of US |
Scope note: OpenRouter assigns regional share by model vendor. DeepSeek, Tencent, MiniMax, StepFun count toward China; Anthropic, Google, xAI count toward the US.
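The week-over-week percentages in the metrics table can be inverted to recover the prior week's volumes, which makes it easier to see how the China/US gap moved. The following is a minimal sketch using only the figures quoted above; the variable and function names are illustrative and not part of any OpenRouter API.

```python
# Weekly volume (trillions of tokens) and week-over-week growth,
# copied from the metrics table above.
metrics = {
    "global": (28.9, 0.074),
    "china": (9.223, 0.1989),
    "us": (4.93, 0.1627),
}


def prior_week(current: float, wow: float) -> float:
    """Invert current = prior * (1 + wow) to recover the prior week's volume."""
    return current / (1.0 + wow)


prior = {name: prior_week(cur, wow) for name, (cur, wow) in metrics.items()}

# China-to-US volume ratio for the current week and the week before.
ratio_now = metrics["china"][0] / metrics["us"][0]
ratio_prior = prior["china"] / prior["us"]

for name, value in prior.items():
    print(f"{name}: prior week ~{value:.2f}T")
print(f"China/US ratio: {ratio_prior:.2f} -> {ratio_now:.2f}")
```

Because China's growth rate exceeded the US rate that week, the ratio widens from one week to the next, which is the same trend the table reports.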
Ranked by weekly tokens for May 18–24, 2026. DeepSeek V4-Flash, V3.2, and V4-Pro all land in the top seven (ranks 1, 4, and 7); the series totals 5.74T (+25.9% WoW), which keeps DeepSeek ahead of Anthropic and Google on the vendor chart for the second week running. Kimi K2.6, #6 the prior week, dropped out of the Top 10.
| Rank | Model | Vendor | Weekly tokens | WoW | Notes |
|---|---|---|---|---|---|
| 1 | DeepSeek-V4-Flash | DeepSeek | 3.43T | +66% | Agent workflows, ultra-low price |
| 2 | Tencent Hy3 Preview | Tencent | 3.07T | +16% | Still growing after promo ended |
| 3 | Claude Sonnet 4.6 | Anthropic | 1.35T | — | 1M context, enterprise coding |
| 4 | DeepSeek-V3.2 | DeepSeek | 1.31T | — | Low-cost long tail |
| 5 | Owl Alpha | OpenRouter | 1.15T | +29% | Free Agent model, 1M context |
| 6 | Gemini 3 Flash Preview | Google | 1.06T | — | Multimodal, academic/medical |
| 7 | DeepSeek-V4-Pro | DeepSeek | 1.00T | — | Matrix flagship (5.74T series total) |
| 8 | MiniMax M2.7 | MiniMax | 806B | — | Long-context value pick |
| 9 | Grok 4.1 Fast | xAI | 721B | — | 2M context, legal workloads |
| 10 | Step 3.5 Flash | StepFun | 673B | — | Fast, cheap batch jobs |
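The vendor-level claim (DeepSeek at 5.74T) can be checked by summing the Top 10 rows per vendor. This is a small sketch built on the table values; the helper names are hypothetical, and the Gemini row is attributed to Google, consistent with the scope note above.

```python
from collections import defaultdict

# (model, vendor, weekly tokens) copied from the Top 10 table above.
TOP10 = [
    ("DeepSeek-V4-Flash", "DeepSeek", "3.43T"),
    ("Tencent Hy3 Preview", "Tencent", "3.07T"),
    ("Claude Sonnet 4.6", "Anthropic", "1.35T"),
    ("DeepSeek-V3.2", "DeepSeek", "1.31T"),
    ("Owl Alpha", "OpenRouter", "1.15T"),
    ("Gemini 3 Flash Preview", "Google", "1.06T"),
    ("DeepSeek-V4-Pro", "DeepSeek", "1.00T"),
    ("MiniMax M2.7", "MiniMax", "806B"),
    ("Grok 4.1 Fast", "xAI", "721B"),
    ("Step 3.5 Flash", "StepFun", "673B"),
]


def to_trillions(text: str) -> float:
    """Convert a label such as '3.43T' or '806B' into trillions of tokens."""
    scale = {"T": 1.0, "B": 1e-3}
    return float(text[:-1]) * scale[text[-1]]


def totals_by_vendor(rows):
    """Sum weekly tokens per vendor, sorted from largest to smallest."""
    totals = defaultdict(float)
    for _model, vendor, tokens in rows:
        totals[vendor] += to_trillions(tokens)
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


vendor_totals = totals_by_vendor(TOP10)
top10_total = sum(vendor_totals.values())

for vendor, total in vendor_totals.items():
    print(f"{vendor:<10} {total:.3f}T")
print(f"Top 10 combined: {top10_total:.2f}T")
```

The three DeepSeek rows add up to the 5.74T series total quoted in the text, and the ten models together account for about half of the 28.9T global weekly volume.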
Token volume alone misses pricing. Anthropic shows a classic premium paradox: token share near 12% (down from 25% a year ago) while dollar revenue share stays near 46%. Enterprise users still pay premium rates for Claude, but traffic leadership moved elsewhere. Claude Opus 4.6 earns about $25M/month on a fraction of DeepSeek's token count.
| Segment | Example models | Token pattern | Revenue pattern |
|---|---|---|---|
| High value, low volume | Claude Opus series | Share declining | Complex enterprise reasoning, strong ARPU |
| Mid price, steady volume | Google Gemini Flash | Stable growth | Multimodal and academic use |
| Ultra-low price, high volume | DeepSeek / MiniMax / StepFun | Share expanding fast | Agents, coding, batch dominate |
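One way to read the token-versus-dollar split is as a relative price index: a vendor's revenue share divided by its token share equals its blended price per token relative to the platform-wide average. The sketch below applies that to the approximate Anthropic shares quoted above (about 12% of tokens, about 46% of revenue); the function name is illustrative, and the result is only as precise as those rounded shares.

```python
def relative_price_index(revenue_share: float, token_share: float) -> float:
    """Revenue share / token share.

    A value of 1.0 means the vendor's blended price per token equals the
    platform-wide average; above 1.0 means it charges more per token.
    """
    if not 0 < token_share < 1 or not 0 < revenue_share < 1:
        raise ValueError("shares must be fractions strictly between 0 and 1")
    return revenue_share / token_share


# Approximate shares quoted in the text.
anthropic = relative_price_index(0.46, 0.12)
# Everyone else holds the remaining revenue and the remaining tokens.
rest = relative_price_index(1 - 0.46, 1 - 0.12)

print(f"Anthropic: {anthropic:.2f}x the blended average price per token")
print(f"Everyone else: {rest:.2f}x")
print(f"Anthropic vs. the rest: {anthropic / rest:.1f}x")
```

On these rounded inputs Anthropic's blended price per token comes out at roughly 3.8 times the platform average and about six times that of all other vendors combined, which is the premium paradox expressed as a single number.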
The OpenRouter + a16z 2025 AI Usage Report adds a counter-intuitive point: benchmark scores and market share often move inversely. Developers optimize for inference cost and API stability over peak capability—matching DeepSeek and Hy3 atop the weekly chart while some benchmark champions sit outside the Top 10.
Rankings refresh weekly; routing should too. This runbook fits Claude Code, Cursor, OpenClaw, or a custom gateway—turning leaderboard insight into config changes.
1. Every Monday, open Rankings: Visit openrouter.ai/rankings. Log global totals, the China-US split, and Top 10 moves. Screenshot for team review.
2. Split token vs dollar views: Check both token share and revenue share so free models (Owl Alpha) are not mistaken for production defaults.
3. Map models to tasks: Agent/batch → DeepSeek-V4-Flash; enterprise reasoning → Claude Opus; multimodal → Gemini Flash; watch new entrants (Hy3, Owl Alpha) as breakout signals.
4. Run regression on a fixed prompt set: Each week, rerun the same coding issue subset. Track tool-call failure rate against leaderboard shifts.
5. Update routing JSON and budget caps: Raise Flash concurrency, hard-cap Opus monthly spend; fallback chain Sonnet → V4-Flash → human queue.
6. Bind a 24/7 host to validate routing: Routing can live anywhere; if Agents need macOS (Claude Code, OpenClaw), deploy daemons on a monthly Mac Mini rental instead of a sleeping laptop.
```json
{
  "weekly_review": "2026-05-24",
  "routes": {
    "agent_batch": "openrouter/deepseek/deepseek-v4-flash",
    "enterprise": "openrouter/anthropic/claude-sonnet-4.6",
    "complex_reasoning": "openrouter/anthropic/claude-opus-4.6",
    "multimodal": "openrouter/google/gemini-3-flash-preview",
    "experiment": "openrouter/owl-alpha"
  },
  "fallback": ["enterprise", "agent_batch"],
  "monthly_cap_usd": 800
}
```
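The routing config above can be consumed by a small resolver that walks the fallback chain and ends at the human queue. The sketch below is one possible reading, not a reference implementation: `pick_model`, the `failure_rates` input (the tool-call failure rates from the weekly regression step), the 5% threshold, and the choice to treat `monthly_cap_usd` as a cap on total spend are all assumptions made for illustration.

```python
# Routing config mirrored from the JSON above.
CONFIG = {
    "routes": {
        "agent_batch": "openrouter/deepseek/deepseek-v4-flash",
        "enterprise": "openrouter/anthropic/claude-sonnet-4.6",
        "complex_reasoning": "openrouter/anthropic/claude-opus-4.6",
        "multimodal": "openrouter/google/gemini-3-flash-preview",
        "experiment": "openrouter/owl-alpha",
    },
    "fallback": ["enterprise", "agent_batch"],
    "monthly_cap_usd": 800,
}


def pick_model(task, config, failure_rates, spent_usd, max_failure_rate=0.05):
    """Return a model slug for `task`, or None to hand the job to the human queue.

    Assumptions: `failure_rates` maps route name -> tool-call failure rate from
    the weekly regression run; routes without a measurement are allowed through;
    `monthly_cap_usd` is treated as a cap on total spend.
    """
    if spent_usd >= config["monthly_cap_usd"]:
        return None  # budget exhausted: stop routing automatically

    # Try the requested route first, then each fallback route in order.
    candidates = [task] + [r for r in config["fallback"] if r != task]
    for route in candidates:
        model = config["routes"].get(route)
        if model is None:
            continue  # unknown route name, skip it
        if failure_rates.get(route, 0.0) <= max_failure_rate:
            return model
    return None  # every candidate failed regression: human queue


# Hypothetical regression results for the week.
rates = {"complex_reasoning": 0.20, "enterprise": 0.02, "agent_batch": 0.04}
print(pick_model("complex_reasoning", CONFIG, rates, spent_usd=120.0))
```

With these made-up failure rates the Opus route is skipped and the call resolves to the Sonnet route, following the Sonnet → V4-Flash → human queue chain from the runbook.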
For internal memos or architecture reviews, these points are cross-checked against OpenRouter public data and contemporaneous press (week of May 18–24, 2026):
OpenRouter solves inference vendor switching; it does not replace process supervision, key boundaries, or Apple tooling. Teams crush API cost on Flash tiers yet lose overnight Agent runs when laptops sleep—or fight Metal/Keychain/Xcode gaps on Linux VPS hosts. Same pattern as the OpenRouter trends selection guide and renting Mac Mini for OpenClaw: models swap on token pricing; host uptime is an OpEx contract. For teams treating multi-model routing as infrastructure while running iOS CI/CD and overnight Agents, VpsMesh Mac Mini M4 cloud rental is usually steadier than a personal MacBook. Plans: Mac Mini M4 rental pricing. Setup: help center.
Weekly token volume reflects real paid production traffic—a market thermometer. Benchmarks suit peak capability comparisons; OpenRouter + a16z show they often invert vs share. Combine weekly trends with private regression on a fixed task set and monthly checks at openrouter.ai/rankings.
DeepSeek V4-Flash lists near $0.10/$0.40 per M tokens—ideal for Agent and batch at scale (3.43T that week). Claude runs 30–50x higher per token; low token share but ~46% dollar share. Pick by scenario, not hype—see the OpenRouter trends selection guide.
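To make the price gap concrete, here is a rough cost estimate for a hypothetical monthly Agent workload. It assumes the quoted $0.10/$0.40 refers to input and output prices per million tokens respectively, and it applies the 30–50x multiple to the whole bill as a bracket; the workload size is invented for illustration.

```python
def job_cost_usd(input_tokens: int, output_tokens: int,
                 in_price_per_m: float, out_price_per_m: float) -> float:
    """Cost in USD given per-million-token prices for input and output."""
    return (input_tokens / 1e6) * in_price_per_m + (output_tokens / 1e6) * out_price_per_m


# Hypothetical monthly Agent workload: 1B input tokens, 200M output tokens.
INPUT_TOKENS = 1_000_000_000
OUTPUT_TOKENS = 200_000_000

# Assumption: $0.10 is the input price and $0.40 the output price.
flash_cost = job_cost_usd(INPUT_TOKENS, OUTPUT_TOKENS, 0.10, 0.40)

# Apply the 30-50x multiple to the whole bill as a rough bracket.
premium_low, premium_high = flash_cost * 30, flash_cost * 50

print(f"Flash tier: ${flash_cost:,.0f}")
print(f"Premium tier: ${premium_low:,.0f} to ${premium_high:,.0f}")
```

Under these assumptions the Flash bill is $180, against $5,400 to $9,000 at the premium multiple, which is why scenario-based routing matters more than a single default model.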
Not always. Pure OpenRouter API usage works on Linux. If your stack includes Claude Code, Xcode, or OpenClaw daemons, a Mac Mini M4 monthly rental beats a sleeping laptop. Start with one month to validate weekly routing and daemons: see Mac Mini M4 rental pricing, and order at the order page.