A damning report from an AMD AI director alleges that Anthropic's Claude Code AI has seen its performance degrade systematically since February, with its "thinking depth" plummeting by 67 percent and API costs for one team exploding 122-fold. The analysis, posted publicly on GitHub, has ignited a firestorm in the developer community, raising questions about the reliability of the AI coding assistant and sending developers toward competitor OpenAI's Codex.
"Claude has been unable to be trusted to perform complex engineering tasks," Stella Laurenzo, a leader in AMD's AI team, said in the GitHub issue report. She warned that her team has already switched to other service providers and that "other competitors need to be taken very seriously and evaluated."
Laurenzo's analysis is built on 6,852 session logs, and it reveals a sharp decline in performance. The model's median thinking depth, a measure of its reasoning process, fell from approximately 2,200 characters in early February to just 720 characters by the end of the month. This collapse in reasoning was accompanied by a 70% reduction in research effort before writing code, with the model's "read-modify" ratio dropping from 6.6 to 2.0. Errors surged as a result: in one out of every three edits, the model attempted to modify code without first reading the relevant files.
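The two headline metrics are straightforward to derive from session logs. The sketch below shows one plausible way to compute them; the record layout and field names (`thinking_chars`, `reads`, `modifies`) are assumptions for illustration, not the actual schema of the logs in the report.

```python
from statistics import median

# Hypothetical session-log records standing in for Claude Code transcripts.
# Field names are illustrative assumptions, not the report's real schema.
sessions = [
    {"thinking_chars": 2400, "reads": 7, "modifies": 1},
    {"thinking_chars": 2200, "reads": 13, "modifies": 2},
    {"thinking_chars": 700, "reads": 2, "modifies": 1},
    {"thinking_chars": 740, "reads": 4, "modifies": 2},
]

def median_thinking_depth(records):
    """Median length of the model's reasoning text, in characters."""
    return median(r["thinking_chars"] for r in records)

def read_modify_ratio(records):
    """File reads per file modification: a proxy for research effort."""
    total_reads = sum(r["reads"] for r in records)
    total_modifies = sum(r["modifies"] for r in records)
    return total_reads / total_modifies

print(median_thinking_depth(sessions))
print(round(read_modify_ratio(sessions), 2))
```

A ratio near 6.6 means the model read files far more often than it edited them; a drop toward 2.0 indicates it was editing with much less prior investigation.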
The performance drop had catastrophic cost implications. Laurenzo's team saw their estimated monthly API bill, based on Bedrock Opus pricing, surge from $345 to $42,121, a 122x increase, while producing worse results. The team was forced to shut down their entire agent cluster. The report suggests the degradation coincides with Anthropic's introduction of an "adaptive thinking" feature and a change in the default "effort" setting from high to medium.
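The reported multiple checks out against the two dollar figures:

```python
# Sanity check on the reported cost increase: $345/month -> $42,121/month.
before, after = 345, 42_121
factor = after / before
print(f"{factor:.1f}x")  # roughly 122x, matching the report's figure
```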
A member of the Claude Code team, identified as Boris, responded by stating the changes were not intended to degrade the model's underlying logic. He explained that a feature to hide the model's thinking process was a UI change and that users could manually revert to a higher "effort" setting. However, many developers in the community remain unconvinced, stating that even at the highest effort setting, the model's performance remains subpar. "The problem is much more than just the default thinking level being changed to medium," one user commented on Hacker News.
Developers Seek Alternatives
The incident has prompted many developers to abandon the platform, with some publicly stating they have switched to alternatives like OpenAI's Codex or open-source models like Qwen3.5-27b. As a temporary fix, some users are explicitly authorizing the model to edit files and breaking down complex tasks into smaller, more manageable chunks. Laurenzo's report calls for more transparency from Anthropic, including exposing thinking_tokens in the API response so users can monitor the model's reasoning depth themselves.