posts/ideas/cost-of-switching-proprietary-to-open-weight-models.md

Post Idea: The Real Cost of Switching from Claude/OpenAI to Open-Weight Models

Status: Idea
Date: 2026-05-22
Source: AI-Powered Development slide series (Parts 1-3), Engineering Excellence talks, Dom/David & Dom/Vela 1:1s

Hook

I switched from Cursor (Composer 2 + Claude models) to DeepSeek Pro/Flash via ACP adapters for my personal development workflow. The result: my 4-week DeepSeek spend came to just $26.25 while processing over 1.9 billion tokens across 20,000+ requests. Here's what I learned, including why some engineers feel that open-weight models are "dumber."

The Numbers (Real, Not Hypothetical)

Before (Cursor + Claude)

Team of 4 engineers.

| Item | Cost |
| --- | --- |
| Cursor base plan | $40/mo/person → $160/mo team |
| Cursor on-demand usage | ~$94/mo team (low because some switched mid-month) |
| Total team | ~$254/mo |

Worth noting: the $94 on-demand figure was artificially low because some team members had already stopped using Cursor heavily partway through the month. A full month of Cursor with Claude models would have been significantly higher.

After (DeepSeek via ACP — Personal Usage)

| Item | Cost |
| --- | --- |
| DeepSeek V4 Flash | $0.14/M input, $0.28/M output |
| DeepSeek V4 Pro | $0.435/M input, $0.87/M output |
| 4-week personal usage (single person) | $26.25 (with 75% promo, now permanent) |
| Pro tokens consumed | 992,532,966 in 8,679 requests |
| Flash tokens consumed | 963,728,972 in 12,043 requests |
| Reasonix session with 94% cache hit | ~$0.01-0.04 per task |

Note: DeepSeek has since made the 75% introductory discount permanent, so the price above is the ongoing rate. The undiscounted equivalent (about $105/mo) is now a moot point.
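To make the usage figures easier to sanity-check, here is a minimal sketch that derives the totals, a blended cost per million tokens, and the undiscounted equivalent from the numbers above. The variable names are illustrative. The blended rate mixes cached input, uncached input, and output tokens, so it is not directly comparable to any single list price.

```python
# Figures copied from the usage table above.
pro_tokens = 992_532_966
flash_tokens = 963_728_972
pro_requests = 8_679
flash_requests = 12_043
spend_usd = 26.25   # 4-week spend at the discounted rate
discount = 0.75     # the 75% promo mentioned in the note

total_tokens = pro_tokens + flash_tokens
total_requests = pro_requests + flash_requests

# Blended cost across all token types (cached input, uncached input, output).
blended_per_million = spend_usd / (total_tokens / 1_000_000)
cost_per_request = spend_usd / total_requests

# What the same usage would have cost without the 75% discount.
full_price_equivalent = spend_usd / (1 - discount)

print(f"total tokens:          {total_tokens:,}")
print(f"total requests:        {total_requests:,}")
print(f"blended $/M tokens:    {blended_per_million:.4f}")
print(f"average $/request:     {cost_per_request:.5f}")
print(f"full-price equivalent: ${full_price_equivalent:.2f}")
```

The totals line up with the "over 1.9 billion tokens across 20,000+ requests" claim in the hook, and the full-price equivalent matches the roughly $105 figure in the note.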

Comparison with Major Models

| Model | Input cost | Output cost | vs DeepSeek V4 Pro (input / output) |
| --- | --- | --- | --- |
| Claude Opus 4.7 | $5.00/M | $25.00/M | 11x / 29x more |
| GPT-5.5 | $5.00/M | $30.00/M | 11x / 34x more |
| Claude Sonnet 4.6 | $3.00/M | $15.00/M | 7x / 17x more |
| DeepSeek V4 Pro | $0.435/M | $0.87/M | baseline |
| DeepSeek V4 Flash | $0.14/M | $0.28/M | 3x cheaper than Pro |
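The multiples in the last column can be reproduced from the listed prices. A small sketch follows, with prices copied from the table; the `PRICES` dictionary and the `ratio_vs` helper are illustrative names, not part of any vendor API.

```python
# USD per million tokens as (input, output), copied from the table above.
PRICES = {
    "Claude Opus 4.7": (5.00, 25.00),
    "GPT-5.5": (5.00, 30.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "DeepSeek V4 Pro": (0.435, 0.87),
    "DeepSeek V4 Flash": (0.14, 0.28),
}


def ratio_vs(baseline: str, model: str) -> tuple[float, float]:
    """Return (input, output) price multiples of `model` relative to `baseline`."""
    base_in, base_out = PRICES[baseline]
    model_in, model_out = PRICES[model]
    return model_in / base_in, model_out / base_out


for name in PRICES:
    in_x, out_x = ratio_vs("DeepSeek V4 Pro", name)
    print(f"{name:<20} input {in_x:5.2f}x  output {out_x:5.2f}x")
```

Rounded to whole numbers, these ratios give the 11x / 29x, 11x / 34x, and 7x / 17x figures in the table. The output-side gap is the larger one for every proprietary model listed.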

The Prefix Cache Multiplier

DeepSeek's byte-stable prefix caching is the hidden superpower. Reasonix achieves a 94% cache hit rate by using an append-only loop pattern: each request repeats the previous prompt byte-for-byte and only adds to the end. Cached input drops to ~$0.014/M, which works out to roughly $0.01-0.04 per coding task.
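Two ideas in this paragraph can be sketched in a few lines: how the cache hit rate blends into an effective input price, and why an append-only history keeps the prefix stable. This is an illustration under stated assumptions, not Reasonix's implementation. Pairing the $0.014/M cached rate with Flash's $0.14/M uncached rate is an assumption (the notes do not say which model the cached rate applies to), and `render` is a hypothetical stand-in for however the real client serializes a conversation.

```python
def effective_input_price(hit_rate: float, cached: float, uncached: float) -> float:
    """Blend cached and uncached input prices (USD per million tokens)."""
    return hit_rate * cached + (1 - hit_rate) * uncached


def render(messages: list[str]) -> bytes:
    """Hypothetical serialization: join messages with newlines, encode as UTF-8."""
    return "\n".join(messages).encode("utf-8")


# Assumption: the cached rate is paired with Flash's uncached input price.
blended = effective_input_price(hit_rate=0.94, cached=0.014, uncached=0.14)
print(f"effective input price: ${blended:.5f}/M")

# Append-only loop: each turn only adds to the end of the history.
turn_1 = ["system: you are a coding agent", "user: fix the failing test"]
turn_2 = turn_1 + ["assistant: reading test output"]

# Rewriting an earlier message changes the bytes at the start of the prompt.
rewritten = ["system: you are a careful coding agent"] + turn_2[1:]

print(render(turn_2).startswith(render(turn_1)))     # prefix preserved
print(render(rewritten).startswith(render(turn_1)))  # prefix broken
```

With an append-only history, the earlier request is a byte-for-byte prefix of the next one, which is the property a prefix cache needs in order to reuse it. Editing or reordering earlier messages would break that prefix from the point of the change onward.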

But Here's the Catch

Not everyone on the team felt the same way about the switch.

Vela (heaviest Claude Sonnet user): Thought DeepSeek was "dumber." We dug into it and found 3 distinct causes:

  1. Actual capability gap — DeepSeek is genuinely less capable for open-ended, poorly-scoped problems. Pro is ~77/100 vs Opus ~91/100 on one benchmark.
  2. Visible reasoning — DeepSeek exposes its thinking. Watching it second-guess makes it feel dumber, even when the final output is fine.
  3. Tone/personality — She literally said "it feels like I lost a friend." Sonnet's voice mattered to her workflow.

David (lighter Claude user): Found DeepSeek comparable. "It's good. I don't think it's dumber."

Dom (heavy planner): Prefers DeepSeek. Uses the visible reasoning to course-correct mid-thought. Built the entire Qwestly agent stack with DeepSeek Pro autonomously.

Key Insight

The cost savings are enormous, even at personal scale: compare $26.25 for four weeks with the $40 base plan plus usage you'd pay solo on Cursor with Claude. But the switching cost isn't just technical; it's psychological. The model's "personality," its reasoning style, and how its output feels to read all affect productivity. Some engineers can absorb that. Others can't. The math changes depending on the user.

Tags

engineering ai cost-optimization deepseek claude openai open-weight economics productivity