Routes inference requests from Tokyo to Anthropic models in seven Asia-Pacific regions (Tokyo, Seoul, Osaka, Mumbai, Hyderabad, Singapore, and Sydney)
Provider: All Anthropic models | AWS Bedrock
Inference regions: ap-northeast-1, ap-northeast-2, ap-northeast-3, ap-south-1, ap-south-2, ap-southeast-1, ap-southeast-2
Endpoint: https://bedrock-runtime.ap-northeast-1.amazonaws.com

Install: `pip install boto3`
```python
import boto3

client = boto3.client("bedrock-runtime", region_name="ap-northeast-1")

response = client.converse(
    modelId="apac.anthropic.claude-3-7-sonnet-20250219-v1:0",
    messages=[
        {
            "role": "user",
            "content": [{"text": "Hello, how are you?"}],
        }
    ],
    inferenceConfig={
        "maxTokens": 1024,
        "temperature": 0.7,
    },
)

print(response["output"]["message"]["content"][0]["text"])
```

Additional examples: Basic invoke, Streaming, Extended Thinking
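Of the additional examples listed above, streaming is the one whose shape differs most from the basic call. A minimal sketch follows, using boto3's `converse_stream` and concatenating text deltas as they arrive; `build_request` is a hypothetical helper (not part of boto3), and the live call is guarded behind an AWS-credentials check so the script can be inspected without an account.

```python
import os

# Cross-region inference profile ID from the example above.
MODEL_ID = "apac.anthropic.claude-3-7-sonnet-20250219-v1:0"


def build_request(prompt: str, max_tokens: int = 1024, temperature: float = 0.7) -> dict:
    """Assemble the keyword arguments for client.converse_stream().

    Hypothetical helper: the same payload shape works for converse() too.
    """
    return {
        "modelId": MODEL_ID,
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {"maxTokens": max_tokens, "temperature": temperature},
    }


def stream_text(client, prompt: str) -> str:
    """Call converse_stream and print/collect text deltas as they arrive."""
    chunks = []
    response = client.converse_stream(**build_request(prompt))
    for event in response["stream"]:
        # Text arrives incrementally in contentBlockDelta events.
        delta = event.get("contentBlockDelta", {}).get("delta", {})
        if "text" in delta:
            print(delta["text"], end="", flush=True)
            chunks.append(delta["text"])
    return "".join(chunks)


if __name__ == "__main__" and os.environ.get("AWS_ACCESS_KEY_ID"):
    import boto3

    client = boto3.client("bedrock-runtime", region_name="ap-northeast-1")
    stream_text(client, "Hello, how are you?")
```

Streaming is worthwhile whenever responses are shown to end users, since the first tokens arrive well before the full completion finishes.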
| Parameter | Type | Description |
|---|---|---|
| max_tokens | integer | Maximum number of tokens to generate in the response. (≥1) |
| temperature | float | Controls randomness. Lower values make output more deterministic. (0–1) Default: 1. |
| top_p | float | Nucleus sampling threshold. Considers tokens with cumulative probability up to this value. (0–1) Default: 1. |
| stop_sequences | array of strings | Up to 4 sequences where the model will stop generating. |
| top_k | integer | Only sample from the top K most likely tokens at each step. (0–500) Default: 250. |
| thinking.type | enum | Enable or disable extended thinking. Default: disabled. |
| thinking.budget_tokens | integer | Token budget for extended thinking. Required when thinking is enabled. (1024–128000) |
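The ranges in the table can be enforced client-side before a request is sent, which surfaces mistakes without a round trip. A minimal sketch, assuming a hypothetical `validate_params` helper (not part of boto3) that checks a flat dict of the parameters above:

```python
def validate_params(p: dict) -> list[str]:
    """Return human-readable violations of the documented ranges (empty = valid)."""
    errors = []
    if "max_tokens" in p and p["max_tokens"] < 1:
        errors.append("max_tokens must be >= 1")
    if "temperature" in p and not (0 <= p["temperature"] <= 1):
        errors.append("temperature must be in [0, 1]")
    if "top_p" in p and not (0 <= p["top_p"] <= 1):
        errors.append("top_p must be in [0, 1]")
    if "stop_sequences" in p and len(p["stop_sequences"]) > 4:
        errors.append("at most 4 stop_sequences")
    if "top_k" in p and not (0 <= p["top_k"] <= 500):
        errors.append("top_k must be in [0, 500]")
    thinking = p.get("thinking", {})
    if thinking.get("type") == "enabled":
        # budget_tokens is only required (and only bounded) when thinking is on.
        budget = thinking.get("budget_tokens")
        if budget is None:
            errors.append("thinking.budget_tokens is required when thinking is enabled")
        elif not (1024 <= budget <= 128000):
            errors.append("thinking.budget_tokens must be in [1024, 128000]")
    return errors
```

For example, `validate_params({"max_tokens": 0})` reports the `max_tokens` violation, while a dict using only in-range values returns an empty list.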
On-demand pricing: Default mode. Pay per token with no upfront commitment.

Extended thinking: Enables chain-of-thought reasoning before generating the final response, improving performance on complex tasks.
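With the Converse API, extended thinking for Claude models is typically enabled through `additionalModelRequestFields`, using the `thinking.type` and `thinking.budget_tokens` parameters from the table above. A hedged sketch; `thinking_request` is a hypothetical helper, and the exact field placement should be confirmed against current Bedrock documentation:

```python
def thinking_request(prompt: str, budget_tokens: int = 2048) -> dict:
    """Build converse() kwargs with extended thinking enabled (assumed field layout)."""
    return {
        "modelId": "apac.anthropic.claude-3-7-sonnet-20250219-v1:0",
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        # maxTokens must exceed the thinking budget so the final answer fits
        # after the chain-of-thought tokens are spent.
        "inferenceConfig": {"maxTokens": budget_tokens + 1024},
        "additionalModelRequestFields": {
            "thinking": {"type": "enabled", "budget_tokens": budget_tokens}
        },
    }
```

The resulting dict can be passed as `client.converse(**thinking_request("..."))`; a larger budget gives the model more room to reason on complex tasks at the cost of extra tokens.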