Install
Instrument
What gets captured
Every call tomessages.create (sync and streaming, async and sync) becomes an llm span with:
input— the fullmessagesarray andsystempromptoutput— the response content blocks (text + tool_use)modelprompt_tokens→ Anthropic’sinput_tokenscompletion_tokens→ Anthropic’soutput_tokenscache_creation_input_tokens,cache_read_input_tokens→ attached as metadata when prompt caching is in usestop_reason— attached as metadata (end_turn,tool_use,max_tokens)latency_ms
Streaming
Tool use
Claude’s tool calls appear in the span output undercontent[].tool_use. To trace tool execution as its own span:
messages.create call — that becomes a second llm span, and the two-turn conversation appears as a clear waterfall in the dashboard.
Prompt caching
Anthropic’s prompt caching works transparently — the SDK recordscache_creation_input_tokens and cache_read_input_tokens as span metadata so you can see cache hit rates in the dashboard’s Metrics view.
Filter by metadata.cache_hit_rate or query via the API:
Disabling
Known gotchas
- Extended thinking — thinking content is captured as part of the span output (
content[].thinking). It’ll show up in trace detail views alongside the visible response. - Batch API (
messages.batches) — not auto-instrumented. Wrap batch submissions manually withtrace()if you want observability. - Vertex and Bedrock — use the respective Anthropic SDK client (
AnthropicVertex,AnthropicBedrock) and the sameinstrument_anthropic()call.