todo:
smol:
- collapse short threads into fewer tracks. (checkbox)
big:
Add blocking/await visualization to teleprof profiler
Problem: When profiling async Rust code, it's hard to see where a function is blocked waiting on another async operation. Tools like Superluminal show blocking with arrows; we currently show only when spans are active, not the waiting relationships between them. That makes it hard to tell why a function is taking so long: is it doing work or waiting? On top of that, our current thread-local parent tracking is fundamentally broken for async code, for two reasons. First, work-stealing executors can migrate an async function between threads, and thread-local state does not follow it. Second, when a function yields at an await point its SpanGuard is still in scope, so any other task that starts on that thread incorrectly appears as its child.
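The second failure mode can be reproduced in a few lines. This is a minimal sketch, not teleprof's actual code: the SpanGuard here is a stand-in that pushes onto a thread-local stack, YieldOnce and NoopWake are hypothetical helpers, and the two tasks are polled by hand in the order an executor might choose.

```rust
use std::cell::RefCell;
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

thread_local! {
    // Stack of open spans on this thread (the approach that breaks for async).
    static STACK: RefCell<Vec<&'static str>> = RefCell::new(Vec::new());
    // (span name, parent recorded at creation) pairs, in creation order.
    static LOG: RefCell<Vec<(&'static str, Option<&'static str>)>> = RefCell::new(Vec::new());
}

// Stand-in guard: the parent is whatever is on top of the thread-local stack.
struct SpanGuard;

impl SpanGuard {
    fn new(name: &'static str) -> SpanGuard {
        let parent = STACK.with(|s| s.borrow().last().copied());
        LOG.with(|l| l.borrow_mut().push((name, parent)));
        STACK.with(|s| s.borrow_mut().push(name));
        SpanGuard
    }
}

impl Drop for SpanGuard {
    fn drop(&mut self) {
        STACK.with(|s| { s.borrow_mut().pop(); });
    }
}

// Hypothetical helper: a future that returns Pending exactly once.
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            Poll::Pending
        }
    }
}

async fn task_a() {
    let _guard = SpanGuard::new("task_a");
    YieldOnce(false).await; // the guard stays in scope while task_a is suspended
}

async fn task_b() {
    let _guard = SpanGuard::new("task_b"); // unrelated to task_a
}

struct NoopWake;
impl Wake for NoopWake { fn wake(self: Arc<Self>) {} }

// Poll A until it yields, run B on the same thread, then finish A.
fn recorded_parents() -> Vec<(&'static str, Option<&'static str>)> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut a = Box::pin(task_a());
    let mut b = Box::pin(task_b());
    assert!(a.as_mut().poll(&mut cx).is_pending());
    assert!(b.as_mut().poll(&mut cx).is_ready());
    assert!(a.as_mut().poll(&mut cx).is_ready());
    LOG.with(|l| std::mem::take(&mut *l.borrow_mut()))
}
```

Calling recorded_parents() reports task_b with task_a as its parent, even though the two tasks are independent.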
Solution: We'll instrument async functions differently from sync functions. For an async function, the #[instrument] macro will capture the parent span ID once, at future creation time (before any awaits), by calling a new capture_parent() function, and store it in the future's state machine so that it moves with the future across threads. The macro will then transform the function body: segments between await points become spans created with SpanGuard::new_with_parent(), and the await points themselves become blocking spans created as AwaitGuard. Both take the captured parent explicitly instead of reading thread-local state. At UI render time we have the complete call tree, so we can tell whether an awaited function was properly instrumented by checking whether its blocking span has any child spans. If it has none, the UI shows a warning that the function appears uninstrumented. We'll render blocking spans with arrows pointing to their children to show the waiting relationships.
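A hand-expanded sketch of what the macro output could look like, under several assumptions. The names capture_parent(), SpanGuard::new_with_parent() and AwaitGuard come from the plan above, but their signatures, the Record type, the global RECORDS list and the AwaitGuard::enter() helper are all hypothetical. In particular, enter() is just one possible way to make the awaited function's spans show up as children of the blocking span: it makes the await span the current parent while the awaited future is being constructed.

```rust
use std::cell::Cell;
use std::future::Future;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

type SpanId = u64;

#[derive(Debug, Clone, PartialEq)]
struct Record { id: SpanId, parent: Option<SpanId>, name: &'static str, blocking: bool }

static NEXT_ID: AtomicU64 = AtomicU64::new(1);
static RECORDS: Mutex<Vec<Record>> = Mutex::new(Vec::new());

thread_local! {
    // Current parent on this thread; async code reads it only at future creation.
    static CURRENT: Cell<Option<SpanId>> = Cell::new(None);
}

fn capture_parent() -> Option<SpanId> {
    CURRENT.with(|c| c.get())
}

fn record(name: &'static str, parent: Option<SpanId>, blocking: bool) -> SpanId {
    let id = NEXT_ID.fetch_add(1, Ordering::Relaxed);
    RECORDS.lock().unwrap().push(Record { id, parent, name, blocking });
    id
}

// Timestamps and closing spans on Drop are omitted to keep the sketch short.
struct SpanGuard { id: SpanId }
struct AwaitGuard { id: SpanId }

impl SpanGuard {
    fn new_with_parent(name: &'static str, parent: Option<SpanId>) -> Self {
        SpanGuard { id: record(name, parent, false) }
    }
}

impl AwaitGuard {
    fn new_with_parent(name: &'static str, parent: Option<SpanId>) -> Self {
        AwaitGuard { id: record(name, parent, true) }
    }

    // Assumption: build the awaited future while this await span is the
    // current parent, so the callee's capture_parent() picks it up.
    fn enter<F>(&self, make: impl FnOnce() -> F) -> F {
        let prev = CURRENT.with(|c| c.replace(Some(self.id)));
        let fut = make();
        CURRENT.with(|c| c.set(prev));
        fut
    }
}

// Hand-written stand-in for `#[instrument] async fn inner() -> u32 { 41 }`.
fn inner() -> impl Future<Output = u32> {
    let parent = capture_parent(); // runs at future creation, before any poll
    async move {
        let _seg = SpanGuard::new_with_parent("inner segment", parent);
        41
    }
}

// Stand-in for `#[instrument] async fn outer() -> u32 { inner().await + 1 }`.
fn outer() -> impl Future<Output = u32> {
    let parent = capture_parent();
    async move {
        let value = {
            let wait = AwaitGuard::new_with_parent("outer awaits inner", parent);
            wait.enter(inner).await
        };
        let _seg = SpanGuard::new_with_parent("outer segment", parent);
        value + 1
    }
}

struct NoopWake;
impl Wake for NoopWake { fn wake(self: Arc<Self>) {} }

// Create the future on this thread, poll it on another, return result and spans.
fn run_demo() -> (u32, Vec<Record>) {
    let root = SpanGuard::new_with_parent("root", None);
    CURRENT.with(|c| c.set(Some(root.id)));
    let fut = outer(); // `root` is captured here and travels with the future
    CURRENT.with(|c| c.set(None));
    let worker = std::thread::spawn(move || {
        let waker = Waker::from(Arc::new(NoopWake));
        let mut cx = Context::from_waker(&waker);
        let mut fut = Box::pin(fut);
        let polled = fut.as_mut().poll(&mut cx);
        match polled {
            Poll::Ready(v) => v,
            Poll::Pending => unreachable!("nothing in this demo ever pends"),
        }
    });
    (worker.join().unwrap(), RECORDS.lock().unwrap().clone())
}
```

The future is polled on a thread whose thread-local parent is empty, yet the spans from outer still point at root, and the span from inner points at the blocking span. Whether the real macro also emits a function-level span around the segments is left open here; the sketch follows the plan literally and passes the captured parent to every span.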
Why this approach: By capturing the parent once at future creation and storing it in the future's state, we sidestep all the thread-migration and thread-local storage problems. The parent ID moves with the future automatically as part of its state machine. This works with any executor without requiring executor-specific integration. The interleaving problem also goes away: since async spans don't use thread-local parent tracking, multiple tasks can run on the same thread without interfering, because each has its parent baked into its own state. Sync code continues to use thread-local parent tracking, which works fine. We can't reliably warn about uninstrumented functions at compile time, so that warning is deferred to UI render time, where we have the full tree.
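The deferred check for uninstrumented awaits is a single pass over the finished tree. A minimal sketch, assuming a flat list of spans with a parent ID and a blocking flag; the Span struct, its field names and the sample data are illustrative, not teleprof's real data model.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, PartialEq)]
struct Span {
    id: u64,
    parent: Option<u64>,
    name: &'static str,
    blocking: bool,
}

// Returns the names of blocking (await) spans that have no child spans,
// i.e. awaits whose callee appears to be uninstrumented.
fn uninstrumented_awaits(spans: &[Span]) -> Vec<&'static str> {
    // Every ID that some span names as its parent.
    let parents: HashSet<u64> = spans.iter().filter_map(|s| s.parent).collect();
    spans
        .iter()
        .filter(|s| s.blocking && !parents.contains(&s.id))
        .map(|s| s.name)
        .collect()
}

// Illustrative tree: one await with an instrumented callee, one without.
fn sample_tree() -> Vec<Span> {
    vec![
        Span { id: 1, parent: None, name: "request", blocking: false },
        Span { id: 2, parent: Some(1), name: "await load_user", blocking: true },
        Span { id: 3, parent: Some(2), name: "load_user", blocking: false },
        Span { id: 4, parent: Some(1), name: "await third_party_call", blocking: true },
    ]
}
```

For the sample tree, only "await third_party_call" is flagged, which is the span the UI would mark with the warning.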
What we're not doing: We're not trying to track the full async lifecycle with individual poll calls and yields. We're not integrating with the async executor or wakers. We're not trying to show gaps between polls or track when futures are suspended vs resumed. We're not emitting compile-time warnings about uninstrumented functions, because we can't reliably detect them at macro expansion time without type information. We're not using thread-local parent tracking for async functions (only for sync functions). We're not trying to automatically instrument all expressions within async functions the way instrument_calls does (though we could add that as an enhancement). Those features could come in phase 2; the current approach gives us solid async profiling without deep runtime integration.