Tư duy mở rộng của Claude

Nghiên cứu khám phá khái niệm 'tư duy mở rộng' trong Claude, tìm hiểu cách mô hình xử lý và tạo ra các chuỗi suy nghĩ dài hơn.

  • 12 min read
Tư duy mở rộng của Claude
Nghiên cứu khám phá khái niệm 'tư duy mở rộng' trong Claude, tìm hiểu cách mô hình xử lý và tạo ra các chuỗi suy nghĩ dài hơn.
                    <h2 id="the-visible-thought-process"><strong>The visible thought process</strong></h2>
                    <p>As well as giving Claude the ability to think for longer and thus answer tougher questions, we’ve decided to make its thought process visible in raw form. This has several benefits:</p>
                    <ul>
                        <li><strong>Trust.</strong> Being able to observe the way Claude thinks makes it easier to understand and check its answers—and might help users get better outputs.</li>
                        <li><strong>Alignment.</strong> In some of our previous <a href="https://www.anthropic.com/research/alignment-faking">Alignment Science research</a>, we’ve used contradictions between what the model inwardly thinks and what it outwardly says to identify when it might be engaging in concerning behaviors like deception.</li>
                        <li><strong>Interest.</strong> It’s often fascinating to watch Claude think. Some of our researchers with math and physics backgrounds have noted how eerily similar Claude’s thought process is to their own way of reasoning through difficult problems: exploring many different angles and branches of reasoning, and double- and triple-checking answers.</li>
                    </ul>
                    <div>
                        <!-- Placeholder for media -->
                        <video controls>
                            <!-- If there was a video, it would be here -->
                            <source src="your_video_source.mp4" type="video/mp4">
                            Your browser does not support the video tag.
                        </video>
                    </div>
                    <p>But a visible thought process also has several downsides. First, users might notice that the revealed thinking is more detached and less personal-sounding than Claude’s default outputs. That’s because we didn’t perform our standard <a href="https://www.anthropic.com/research/claude-character">character</a> training on the model’s thought process. We wanted to give Claude maximum leeway in thinking whatever thoughts were necessary to get to the answer—and as with human thinking, Claude sometimes finds itself thinking some incorrect, misleading, or half-baked thoughts along the way. Many users will find this useful; others might find it (and the less characterful content in the thought process) frustrating.</p>
                    <p>Another issue is what’s known as “faithfulness”—we don’t know for certain that what’s in the thought process truly represents what’s going on in the model’s mind (for instance, English-language words, such as those displayed in the thought process, might simply not be able to describe why the model displays a particular behavior). The problem of faithfulness—and how to ensure it—is one of our active areas of research. Thus far, our results suggest that models very often make decisions based on factors that they <em>don’t</em> explicitly discuss in their thinking process. This means we can’t rely on monitoring current models’ thinking to make strong arguments about their safety<sup>2</sup>.</p>
                    <p>Third, it poses several safety and security concerns. Malicious actors might be able to use the visible thought process to build better strategies to jailbreak Claude. Much more speculatively, it’s also possible that, if models learn during training that their internal thoughts are to be on display, they might be incentivized to think in different, less predictable ways—or to deliberately hide certain thoughts.</p>
                    <p>These latter concerns will be particularly acute for future, more capable versions of Claude—versions that would pose more of a risk if misaligned. We’ll weigh the pros and cons of revealing the thought process for future releases<sup>3</sup>. In the meantime, the visible thought process in Claude 3.7 Sonnet should be considered a research preview.</p>

                    <h2 id="new-tests-of-claudes-thinking"><strong>New tests of Claude’s thinking</strong></h2>
                    <h3 id="claude-as-an-agent"><strong>Claude as an agent</strong></h3>
                    <p>Claude 3.7 Sonnet benefits from what we might call “action scaling”—an improved capability that allows it to iteratively call functions, respond to environmental changes, and continue until an open-ended task is complete. One example of such a task is using a computer: Claude can issue virtual mouse clicks and keyboard presses to solve tasks on a user’s behalf. Compared to its predecessor, Claude 3.7 Sonnet can allocate more turns—and more time and computational power—to computer use tasks, and its results are often better.</p>
                    <p>We can see this in how Claude 3.7 Sonnet has improved on <a href="https://os-world.github.io/">OSWorld</a>, an evaluation that measures the capabilities of multimodal AI agents. Claude 3.7 Sonnet starts off somewhat better, but the difference in performance grows over time as the model continues to interact with the virtual computer.</p>
                    <div>
                        <figure>
                            <img alt="A chart showing the performance of Claude 3.5 Sonnet (new) versus Claude 3.7 on the OSWorld evaluation." srcSet="/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2Fa86667fc2e2aab1061f8d1a7b48e0a2196dd94d6-3840x1986.png&amp;w=3840&amp;q=75 1x" src="/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2Fa86667fc2e2aab1061f8d1a7b48e0a2196dd94d6-3840x1986.png&amp;w=3840&amp;q=75" loading="lazy" width="3840" height="1986" decoding="async"/>
                            <figcaption>The performance of Claude 3.7 Sonnet versus its predecessor model on the OSWorld evaluation, testing multimodal computer use skills. “Pass @ 1”: the model has only a single attempt to solve a particular problem for it to count as having passed.</figcaption>
                        </figure>
                    </div>
                    <h3 id="claude-plays-pokmon"><strong>Claude plays Pokémon</strong></h3>
                    <p>Together, Claude’s extended thinking and agent training help it do better on many standard evaluations like OSWorld. But they also give it a major boost on some other, perhaps more unexpected, tasks.</p>
                    <p>Playing Pokémon—specifically, the Game Boy classic <em>Pokémon Red</em>—is just such a task. We equipped Claude with basic memory, screen pixel input, and function calls to press buttons and navigate around the screen, allowing it to play Pokémon continuously beyond its usual context limits, sustaining gameplay through tens of thousands of interactions.</p>
                    <p>In the graph below, we’ve plotted the Pokémon progression of Claude 3.7 Sonnet alongside that of previous versions of Claude Sonnet, which didn’t have the option for extended thinking. As you can see, the previous versions became stuck very early in the game, with Claude 3.0 Sonnet failing to even leave the house in Pallet Town where the story begins.</p>
                    <p>But Claude 3.7 Sonnet’s improved agentic capabilities helped it advance much further, successfully battling three Pokémon Gym Leaders (the game’s bosses) and winning their Badges. Claude 3.7 Sonnet is super effective at trying multiple strategies and questioning previous assumptions, which allow it to improve its own capabilities as it progresses.</p>
                    <div>
                        <figure>
                            <img alt="A chart showing the performance of the various Claude Sonnet models at playing Pokémon." srcSet="/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2F9a30c8288e402eee24d4ef60272cb6365a36207a-1920x1269.png&amp;w=1920&amp;q=75 1x, /_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2F9a30c8288e402eee24d4ef60272cb6365a36207a-1920x1269.png&amp;w=3840&amp;q=75 2x" src="/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2F9a30c8288e402eee24d4ef60272cb6365a36207a-1920x1269.png&amp;w=3840&amp;q=75" loading="lazy" width="1920" height="1269" decoding="async"/>
                            <figcaption>Claude 3.7 Sonnet demonstrates that it is the very best of all the Sonnet models so far at playing Pokémon Red. On the x-axis is the number of interactions Claude completes as it plays the game; on the y-axis are important milestones in the game involving collecting certain items, navigating to certain areas, and defeating certain game bosses.</figcaption>
                        </figure>
                    </div>
                    <p>Pokémon is a fun way to appreciate Claude 3.7 Sonnet’s capabilities, but we expect these capabilities to have a real-world impact far beyond playing games. The model's ability to maintain focus and accomplish open-ended goals will help developers build a wide range of state-of-the-art AI agents.</p>
                    <h3 id="serial-and-parallel-test-time-compute-scaling"><strong>Serial and parallel test-time compute scaling</strong></h3>
                    <p>When Claude 3.7 Sonnet is using its extended thinking capability, it could be described as benefiting from “serial test-time compute”. That is, it uses multiple, sequential reasoning steps before producing its final output, adding more computational resources as it goes. In general, this improves its performance in a predictable way: its accuracy on, for example, math questions improves logarithmically with the number of “thinking tokens” that it’s allowed to sample.</p>
                    <div>
                        <figure>
                            <img alt="A graph of Claude 3.7 Sonnet's mathematical performance according to how many tokens were used in its thinking process." src="/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2F15498c675ac25c07436cd67e2ef1014c4fb354f0-3840x2138.png&amp;w=3840&amp;q=75" loading="lazy" width="3840" height="2138" decoding="async"/>
                            <figcaption>Claude 3.7 Sonnet’s performance on questions from the 2024 American Invitational Mathematics Examination 2024, according to how many thinking tokens it’s allowed per problem. Note that even though we allow Claude to use the entire thinking budget, it generally stops short. We include in the plot the tokens sampled that are used to summarize the final answer.</figcaption>
                        </figure>
                    </div>
                    <p>Our researchers have also been experimenting with improving the model’s performance using <em>parallel</em> test-time compute. They do this by sampling multiple independent thought processes and selecting the best one without knowing the true answer ahead of time. One way to do this is with <a href="https://arxiv.org/abs/2206.14858">majority</a> or consensus voting; selecting the answer that appears most commonly as the 'best' one. Another is using another language model (like a second copy of Claude) asked to check its work or a learned scoring function and pick what it thinks is best. Strategies like this (along with similar work) have been reported in the evaluation <a href="https://developers.googleblog.com/en/the-next-chapter-of-the-gemini-era-for-developers/">results</a> of <a href="https://arxiv.org/abs/2502.06807">several</a> <a href="https://arxiv.org/abs/2403.05530">other</a> <a href="https://x.ai/blog/grok-3">AI</a> <a href="https://openai.com/index/openai-o3-mini/">models</a>).</p>
                    <p>We achieved striking improvements using parallel test-time compute scaling on the <a href="https://arxiv.org/abs/2311.12022">GPQA evaluation</a>, a commonly-used set of challenging questions on biology, chemistry, and physics. Using the equivalent compute of 256 independent samples, a learned scoring model, and a maximum 64k-token thinking budget, Claude 3.7 Sonnet achieved a GPQA score of 84.8% (including a physics subscore of 96.5%), and benefits from continued scaling beyond the limits of majority vote. We report our results for both our scoring model methods and the majority vote method below.</p>
                    <div>
                        <figure>
                            <img alt="Four graphs showing the performance of Claude 3.7 Sonnet on the GPQA evaluation when using parallel test-time compute scaling." src="/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2Fae30675be60436a7de8e1afbc1db347862b865d2-3840x3764.png&amp;w=3840&amp;q=75" loading="lazy" width="3840" height="3764" decoding="async"/>
                            <figcaption>Experimental results from using parallel test-time compute scaling to improve Claude 3.7 Sonnet’s performance on the GPQA evaluation. The different lines refer to different methods of scoring the performance. “Majority @ N”: where multiple outputs are generated from a model for the same prompt with the majority vote taken as the final answer; “scoring model”: a separate model which is used to assess the performance of the model being evaluated; “pass @ N”: where models “pass” a test if any of a given number of attempts succeeds.</figcaption>
                        </figure>
                    </div>
                    <p>Methods like these allow us to improve the quality of Claude’s answers, usually without having to wait for it to finish its thoughts. Claude can have multiple different extended thought processes simultaneously, allowing it to consider more approaches to a problem and ultimately get it right much more often. Parallel test-time compute scaling isn’t available in our newly-deployed model, but we're continuing to research these methods for the future.</p>

                    <h2 id="claude-37-sonnets-safety-mechanisms"><strong>Claude 3.7 Sonnet’s safety mechanisms</strong></h2>
                    <p><strong>AI Safety Level.</strong> Anthropic’s <a href="https://www.anthropic.com/news/announcing-our-updated-responsible-scaling-policy">Responsible Scaling Policy</a> commits us not to train or deploy models unless we have implemented appropriate safety and security measures. Our Frontier Red Team and Alignment Stress Testing team ran extensive tests on Claude 3.7 Sonnet to determine whether it required the same level of deployment and security safeguards as our previous models—known as the AI Safety Level (ASL) 2 standard—or stronger measures.</p>
                    <p>Our comprehensive evaluation of Claude 3.7 Sonnet confirmed that our current ASL-2 safety standard remains appropriate. At the same time, the model demonstrated increased sophistication and heightened capabilities across all domains. In controlled studies examining tasks related to the production of Chemical, Biological, Radiological, and Nuclear (CBRN) weapons, we observed some performance “uplift” among model-assisted participants compared to non-assisted participants. That is, participants were able to get further towards success than they would have just by using information that’s available online. However, all of the attempts to perform these tasks contained critical failures, completely impeding success.</p>
                    <p>Expert red-teaming of the model produced mixed feedback. Whereas some experts noted improvements in the model’s knowledge in certain areas of CBRN processes, they also found that the frequency of critical failures was too high for successful end-to-end task completion. We are proactively enhancing our ASL-2 measures by accelerating the development and deployment of targeted classifiers and monitoring systems.</p>
                    <p>In addition, the capabilities of our future models might require us to move to the next stage: ASL-3 safeguards. Our recent work on <a href="https://www.anthropic.com/research/constitutional-classifiers">Constitutional Classifiers</a> to prevent jailbreaks, along with other efforts, stands us in good stead to implement the requirements of the ASL-3 standard in the near future.</p>
                    <p><strong>Visible thought process.</strong> Even at ASL-2, Claude 3.7 Sonnet’s visible extended thinking feature is new, and thus requires new and appropriate safeguards. In rare cases, Claude’s thought process might include content that is potentially harmful (topics include child safety, cyber attacks, and dangerous weapons). In such cases, we will encrypt the thought process: this will not stop Claude from including the content in its thought process (which could still be important for the eventual production of perfectly benign responses), but the relevant part of the thought process will not be visible to users. Instead, they will see the message “the rest of the thought process is not available for this response”. We aim for this encryption to occur rarely, and only in cases where the potential for harm is high.</p>
                    <p><strong>Computer use.</strong> Finally, we have enhanced our safety measures for Claude’s computer use ability (which we discussed above: it allows Claude to see a user’s computer screen and take actions on their behalf). We have made substantial progress in defending against “prompt injection” attacks, where a malicious third party hides a secret message somewhere where Claude may see it while using the computer, potentially tricking it into taking actions the user didn’t intend. With new training to resist prompt injection, a new system prompt that includes instructions to ignore these attacks, and a classifier that triggers when the model encounters a potential prompt injection, we now prevent these attacks 88% of the time<sup>4</sup>, up from 74% of the time without the mitigations.</p>
                    <p>The above is just a short summary of some of our extensive safety work on Claude 3.7 Sonnet. For more information, analytic results, and several examples of the safeguards in action, see our full <a href="http://anthropic.com/claude-3-7-sonnet-system-card">System Card</a>.</p>

                    <h2 id="using-claude"><strong>Using Claude</strong></h2>
                    <p>You can use Claude 3.7 Sonnet now at <a href="http://claude.ai/redirect/website.v1.7bce1733-92a4-4087-91be-15690999445c">Claude.ai</a> or on <a href="https://docs.anthropic.com/en/api/getting-started">our API</a>. And just as Claude can now let you know what it thinks, we hope you’ll let us know what you think, too. Please send your feedback about the new model to <a href="mailto:feedback@anthropic.com">feedback@anthropic.com</a>.</p>
                </div>
            </div>
        </article>
    </div>
    <div>
        <div>
            <h4>Footnotes</h4>
            <p>1. Specifically, this is available for <a href="https://www.anthropic.com/news/claude-pro">Claude Pro</a>, <a href="https://www.anthropic.com/team">Team</a>, <a href="https://www.anthropic.com/enterprise">Enterprise</a>, and <a href="https://docs.anthropic.com/en/api/getting-started">API</a> users.</p>
            <p>2. Our faithfulness research is further described in our <a href="http://anthropic.com/claude-3-7-sonnet-system-card">System Card</a>. We also hope that a full understanding of the reasons for a model’s behavior, at the level of the activations in its neural network, might be achieved through future advances in <a href="https://www.anthropic.com/research/mapping-mind-language-model">mechanistic interpretability</a>.</p>
            <p>3. It’s possible that there’s a middle way between revealing the thought process in its entirety and keeping it entirely hidden. It might be preferable, for example, to train the model to always be truthful when asked about its internal thought process, but not to reveal those thoughts by default (and perhaps be able to refuse certain requests).</p>
            <p>4. This comes with a 0.5% false-positive rate (where the safeguards trigger even though there isn’t a prompt injection attack present). We’re working on reducing this rate as we develop our safety mechanisms.</p>
        </div>
        <div>
            <a href="https://twitter.com/intent/tweet?text=https://www.anthropic.com/research/visible-extended-thinking" target="_blank" rel="noopener" aria-label="Share on Twitter">
                <svg>
                    <!-- Twitter Icon -->
                </svg>
            </a>
            <a href="https://www.linkedin.com/shareArticle?mini=true&amp;url=https://www.anthropic.com/research/visible-extended-thinking" target="_blank" rel="noopener" aria-label="Share on LinkedIn">
                <svg>
                    <!-- LinkedIn Icon -->
                </svg>
            </a>
        </div>
    </div>
</article>
<section>
    <div>
        <div>
            <a href="/research/deprecation-commitments">
                <div>
                    <div>
                        <div>
                            <p>Research</p>
                            <h3 class="bold">Commitments on model deprecation and preservation</h3>
                        </div>
                        <p>Nov 04, 2025</p>
                    </div>
                </div>
            </a>
            <a href="/research/introspection">
                <div>
                    <div>
                        <div>
                            <p>Research</p>
                            <h3 class="bold">Signs of introspection in large language models</h3>
                        </div>
                        <p>Oct 29, 2025</p>
                    </div>
                </div>
            </a>
            <a href="/research/economic-policy-responses">
                <div>
                    <div>
                        <div>
                            <p>Research</p>
                            <h3 class="bold">Preparing for AI’s economic impact: exploring policy responses</h3>
                        </div>
                        <p>Oct 14, 2025</p>
                    </div>
                </div>
            </a>
        </div>
    </div>
</section>

Recommended for You

Claude 3.7 Sonnet và Claude Code

Claude 3.7 Sonnet và Claude Code

Anthropic giới thiệu Claude 3.7 Sonnet, bản cập nhật cho mô hình Sonnet của mình, và Claude Code, một công cụ được thiết kế để hỗ trợ các tác vụ lập trình.

Anthropic Ký cam kết của Nhà Trắng với Thanh niên Hoa Kỳ- Đầu tư vào Giáo dục AI

Anthropic Ký cam kết của Nhà Trắng với Thanh niên Hoa Kỳ- Đầu tư vào Giáo dục AI

Anthropic tham gia cùng Nhà Trắng và các tổ chức khác trong cam kết đầu tư vào giáo dục AI cho giới trẻ Mỹ, thúc đẩy tương lai của lực lượng lao động.