Claude 3.7 Thinking and Tool Use Dev Guide

Overview

Claude 3.7 Sonnet is the first Claude model to offer step-by-step reasoning, which Anthropic has termed "extended thinking". With Claude 3.7 Sonnet, this feature is optional: you can choose between standard responses and extended thinking for advanced reasoning tasks.

Thinking Blocks Structure

Thinking blocks represent Claude 3.7 Sonnet's internal thought process. When thinking is enabled, Claude will show its reasoning through thinking content blocks in the response.

Response Format

{
    'stream': {
        'contentBlockDelta': {
            'delta': {
                'text': 'string',
                'toolUse': {
                    'input': 'string'
                },
                'reasoningContent': {
                    'text': 'string',
                    'signature': 'string'
                }
            },
            'contentBlockIndex': 123
        }
    }
}
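
As a sketch, here is how a streaming response with thinking enabled might be consumed via boto3's Converse API; the model ID and region are placeholders, and the thinking configuration is passed through additionalModelRequestFields:

import boto3

client = boto3.client("bedrock-runtime", region_name="us-west-2")

response = client.converse_stream(
    modelId="us.anthropic.claude-3-7-sonnet-20250219-v1:0",  # placeholder model ID
    messages=[{"role": "user", "content": [{"text": "Why can't we see the far side of the Moon?"}]}],
    additionalModelRequestFields={"thinking": {"type": "enabled", "budget_tokens": 4096}},
)

reasoning_text = ""
answer_text = ""
for event in response["stream"]:
    delta = event.get("contentBlockDelta", {}).get("delta", {})
    if "reasoningContent" in delta:
        # Thinking deltas stream incremental reasoning text (and eventually a signature)
        reasoning_text += delta["reasoningContent"].get("text", "")
    elif "text" in delta:
        answer_text += delta["text"]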

Request Example

{
    "content": [
        {
            "reasoningContent": {
                "reasoningText": {
                    "text": "This is an astronomy question about why we can't see the far side of the Moon from Earth. \nI will use the search_wikipedia tool to find information about the Moon's rotation and tidal locking.",
                    "signature": "eyJhbGciOiJFUzI1NiIsImtpZCI6ImtleS0xMjM0In0.eyJoYXNoIjoiYWJjMTIzIiwiaWF0IjoxNjE0NTM0NTY3fQ...."
                }
            }
        }
    ]
}

Tool Use with Thinking

When using thinking with tool use, the conversation follows this pattern:

  1. First assistant turn: Initial user message → Assistant responds with thinking blocks followed by tool use requests
  2. Tool result turn: User message with tool results → Assistant responds with either more tool calls or just text (no thinking blocks in this response)

The complete flow typically follows these steps (a code sketch follows the list):

  1. User sends initial message
  2. Assistant responds with thinking blocks and tool requests
  3. Send tool results back as User message
  4. Assistant responds with either more tool calls or just text (no thinking blocks)
  5. If more tools are requested, repeat steps 3-4 until conversation is complete
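
A minimal sketch of this loop with boto3's Converse API; the search_wikipedia tool, the run_tool dispatcher, and the model ID are illustrative placeholders:

import boto3

client = boto3.client("bedrock-runtime")
MODEL_ID = "us.anthropic.claude-3-7-sonnet-20250219-v1:0"  # placeholder

tool_config = {
    "tools": [{
        "toolSpec": {
            "name": "search_wikipedia",  # hypothetical tool
            "description": "Search Wikipedia and return matching article extracts.",
            "inputSchema": {"json": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            }},
        }
    }]
}

messages = [{"role": "user", "content": [{"text": "Why can't we see the far side of the Moon from Earth?"}]}]

while True:
    response = client.converse(
        modelId=MODEL_ID,
        messages=messages,
        toolConfig=tool_config,
        additionalModelRequestFields={"thinking": {"type": "enabled", "budget_tokens": 4096}},
    )
    # Append the assistant message unmodified so reasoningContent blocks
    # (including signatures) are preserved for the next turn.
    messages.append(response["output"]["message"])
    if response["stopReason"] != "tool_use":
        break  # final text answer; the conversation is complete
    # Run each requested tool and send the results back as a user turn.
    tool_results = []
    for block in response["output"]["message"]["content"]:
        if "toolUse" in block:
            tool_use = block["toolUse"]
            result = run_tool(tool_use["name"], tool_use["input"])  # run_tool is a hypothetical dispatcher
            tool_results.append({"toolResult": {
                "toolUseId": tool_use["toolUseId"],
                "content": [{"json": result}],
            }})
    messages.append({"role": "user", "content": tool_results})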

If thinking is enabled but a final assistant message doesn't start with a thinking block (preceding the last set of tool_use and tool_result blocks), you may see:

validationException - The model returned the following errors: messages.1.content.0.type: Expected `thinking` or `redacted_thinking`, but found `tool_use`. When `thinking` is enabled, a final `assistant` message must start with a thinking block (preceeding the lastmost set of `tool_use` and `tool_result` blocks).

If a thinking block does not contain the complete thinking text, you will receive the following validation error; to resolve it, include the full accumulated thinking content in the reasoningText parameter:

validationException - The model returned the following errors: messages.1.content.0: When providing `thinking` or `redacted_thinking` blocks, the blocks must match the parameters during the original request.

Preserving Thinking Blocks

When passing thinking and redacted_thinking blocks back to the API in a multi-turn conversation, you must provide the complete, unmodified block for:

  • Reasoning continuity: Thinking blocks capture Claude's step-by-step reasoning that led to tool requests
  • Context maintenance: While tool results appear as user messages in the API structure, they're part of a continuous reasoning flow

Implementation Considerations

Thinking Budget

  • Minimum budget_tokens is 1,024 tokens
  • Anthropic recommends at least 4,000 tokens for comprehensive reasoning
  • budget_tokens is a target, not a strict limit - actual usage may vary
  • Expect potentially longer response times due to additional processing
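
For example, on the Converse API the thinking budget is set through additionalModelRequestFields; a sketch (model ID is a placeholder):

response = client.converse(
    modelId="us.anthropic.claude-3-7-sonnet-20250219-v1:0",  # placeholder
    messages=messages,
    additionalModelRequestFields={
        "thinking": {
            "type": "enabled",
            "budget_tokens": 4096,  # minimum 1,024; 4,000+ recommended for comprehensive reasoning
        }
    },
)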

Compatibility

  • Thinking isn't compatible with temperature, top_p, or top_k modifications
  • Thinking isn't compatible with forced tool use
  • You cannot pre-fill responses when thinking is enabled

Context Window and Token Usage

  • Thinking tokens count towards the context window and are billed as output tokens
  • Thinking tokens count towards your tokens-per-minute (TPM) service quota
  • In multi-turn conversations:
    • Thinking blocks from previous turns are stripped and do not count towards the context window
    • Exception: thinking blocks from the last turn, if it is an assistant turn
    • Only thinking blocks actually shown to the model are billed
  • Always send thinking blocks back with your requests - the system will validate and use them as needed
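
To see what you are billed for, you can inspect the usage block on a Converse response; a minimal sketch using the Converse API's usage field names:

usage = response["usage"]
# outputTokens includes any thinking tokens generated for this turn
print(f"input: {usage['inputTokens']}, output: {usage['outputTokens']}, total: {usage['totalTokens']}")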

Implementation Details for Signature Handling

  • Thinking blocks contain a signature field - a cryptographic token verifying the thinking block was generated by Claude

  • In streaming deltas, the signature arrives alongside the text on the reasoningContent delta (see the Response Format above). When sending accumulated thinking back in a request, however, the signature must be nested inside reasoningText - the reasoningContent object itself accepts only the reasoningText or redactedContent keys. The correct request structure is:

    {
        "reasoningContent": {
            "reasoningText": {
                "text": "thinking content here",
                "signature": "signature_value_here"
            }
        }
    }

  • Placing the signature directly on the reasoningContent object (or setting both union keys) produces errors like:

    Invalid number of parameters set for tagged union structure messages[1].content[0].reasoningContent. Can only set one of the following keys: reasoningText redactedContent.

    Unknown parameter in messages[1].content[0].reasoningContent: "signature" must be one of: reasoningText redactedContent
    
  • When accumulating thinking content from streaming responses (see the sketch after this list):

    • Preserve the signature from the last thinking block that contains one
    • Only update the signature variable when a non-empty signature is received
    • Include this signature when sending accumulated thinking back to the model
  • Occasionally Claude's internal reasoning will be flagged by automated safety systems. When this occurs, the thinking block is encrypted and returned as a redacted_thinking block (surfaced via the redactedContent key in the Converse API).
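
A sketch of that accumulation logic against the streaming deltas shown earlier, building the request-side block with the signature nested inside reasoningText:

reasoning_text, signature = "", ""
for event in response["stream"]:
    delta = event.get("contentBlockDelta", {}).get("delta", {})
    rc = delta.get("reasoningContent", {})
    reasoning_text += rc.get("text", "")
    if rc.get("signature"):
        signature = rc["signature"]  # only overwrite on a non-empty signature

# The complete accumulated block, ready to send back to the model.
thinking_block = {
    "reasoningContent": {
        "reasoningText": {
            "text": reasoning_text,
            "signature": signature,
        }
    }
}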
