Key Takeaways:
- DeepSeek V4 official version launches mid-July 2026 with peak-valley API pricing
- Peak hours (9 a.m.-12 p.m. and 2 p.m.-6 p.m.) will cost 2x the normal rate
- Move follows DSpark release boosting inference speed by as much as 85%

DeepSeek will launch its V4 official version in mid-July with peak-hour API pricing set at double the normal rate.
DeepSeek's V4 official version, arriving mid-July, will charge customers double during peak hours — a pricing strategy that builds on the Chinese lab's existing cost advantage over Anthropic and OpenAI.
"Peak-valley pricing lets us allocate compute capacity more efficiently during high-demand windows," a DeepSeek spokesperson said, without disclosing the base per-token rate for the official release.
Peak hours run 9 a.m. to 12 p.m. and 2 p.m. to 6 p.m. daily, with rates at 2x the off-peak price. The V4-Pro model, which activates 49 billion of its 1.6 trillion total parameters per forward pass, already costs roughly 1.5% of Anthropic's Claude Fable 5 for equivalent tasks, according to Deutsche Bank analyst Jim Reid.
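As a rough illustration of how the reported schedule could translate into a bill, the sketch below maps a request time to a price multiplier. DeepSeek has not disclosed the base rate, so `base_rate_per_million` is a placeholder. The function names, the treatment of each window as start-inclusive and end-exclusive, and the use of local wall-clock time are assumptions, not DeepSeek's published billing rules.

```python
from datetime import time

# Peak windows as reported: 9 a.m.-12 p.m. and 2 p.m.-6 p.m. daily.
# Treating each window as [start, end) is an assumption.
PEAK_WINDOWS = [(time(9, 0), time(12, 0)), (time(14, 0), time(18, 0))]
PEAK_MULTIPLIER = 2.0  # peak rate is 2x the off-peak rate


def price_multiplier(t: time) -> float:
    """Return 2.0 inside a peak window, otherwise 1.0."""
    for start, end in PEAK_WINDOWS:
        if start <= t < end:
            return PEAK_MULTIPLIER
    return 1.0


def request_cost(tokens: int, t: time, base_rate_per_million: float) -> float:
    """Cost of one request; base_rate_per_million is a hypothetical off-peak rate."""
    return tokens / 1_000_000 * base_rate_per_million * price_multiplier(t)
```

Under these assumptions, a 2-million-token job costs twice as much at 3 p.m. as the same job at 8 p.m., which is why batch workloads that can wait are the natural candidates for off-peak scheduling.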
The pricing move follows DeepSeek's June 27 release of DSpark, a speculative decoding framework that boosts V4-Flash generation speed by as much as 85% without hardware upgrades. For enterprises running V4 at scale, faster inference plus tiered pricing could reduce per-token costs further — pressuring margins at US model providers ahead of their expected IPOs.
The official version launch caps a period of rapid iteration. Compared with the prior MTP-1 baseline, DSpark improved per-user generation speed by 60% to 85% on V4-Flash and by 57% to 78% on V4-Pro, according to DeepSeek's internal production data. The company also open-sourced DeepSpec, the full training stack for speculative decoding draft models, under an MIT license, making the technology available to teams using Qwen3 and Gemma models.
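A speed gain does not shorten wait times by the same percentage. The short sketch below converts a throughput gain into the implied drop in generation time, assuming "generation speed" means tokens per second for a fixed output length. The function name is illustrative and the calculation is simple arithmetic, not a DeepSeek benchmark.

```python
def time_reduction(speedup_pct: float) -> float:
    """Fractional drop in generation time for a given throughput gain in percent."""
    return 1.0 - 1.0 / (1.0 + speedup_pct / 100.0)


# Reported V4-Flash range: 60% to 85% faster generation.
for pct in (60, 85):
    print(f"{pct}% faster -> {time_reduction(pct):.1%} less time per response")
```

On that reading, the 60% to 85% range for V4-Flash corresponds to responses finishing in roughly 37.5% to 45.9% less time.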
Chinese AI Labs Tighten the Pricing Screw
DeepSeek is not alone in challenging US pricing. Z.ai, formerly Zhipu AI, launched GLM5.2 this week — a model Jefferies strategist Christopher Wood called "almost equal to Anthropic as a competitor for the corporate market" at one-quarter the cost per token. Morgan Stanley traders noted that "the demand mix is clearly shifting towards lower-cost models."
The peak-valley mechanism could widen DeepSeek's cost gap further during off-peak hours, when rates drop to half the peak level. For roughly 90% of everyday tasks, DeepSeek's V4-Pro "does much the same job at roughly 1.5% of the cost" of Anthropic's Claude Fable 5, Deutsche Bank's Reid wrote on June 18.
What This Means for Investors
The shift toward cheaper models threatens the valuation narratives of US AI providers planning public listings. OpenAI is reportedly reconsidering its IPO timeline because of tech IPO underperformance and growing price competition, the New York Times reported. Anthropic faces similar pressure as enterprise customers evaluate lower-cost alternatives.
For GPU makers and cloud providers, the trend is double-edged. Lower per-token costs could drive broader adoption — Deloitte projected inference workloads would account for roughly two-thirds of all AI compute in 2026, up from one-third in 2023. But if enterprises shift workloads to cheaper or self-hosted models, revenue growth at hyperscale cloud providers could decelerate. DeepSeek's peak-valley pricing marks a new phase in the AI price war, one where Chinese labs use software optimization and aggressive pricing to capture market share from US incumbents.
This article is for informational purposes only and does not constitute investment advice.