update todos

2025-12-17 08:33:22 +01:00
parent 4ef5f7b96f
commit 5adacce08f
1 changed files with 4 additions and 4 deletions
@@ -8,10 +8,10 @@

 ### Add blocking/await visualization to teleprof profiler

-Problem: When profiling async Rust code, it's hard to see where functions are actually blocked waiting on other async operations. Unlike tools like Superluminal that show blocking with arrows, we currently just show when spans are active but not the waiting relationships between them. This makes it difficult to understand why a function is taking so long - is it doing work or waiting?
+Problem: When profiling async Rust code, it's hard to see where functions are actually blocked waiting on other async operations. Unlike tools like Superluminal that show blocking with arrows, we currently just show when spans are active but not the waiting relationships between them. This makes it difficult to understand why a function is taking so long - is it doing work or waiting? Additionally, our current thread-local parent tracking is fundamentally broken for async code, for two reasons: async functions can migrate between threads in work-stealing executors, and when a function yields at an await point its SpanGuard is still in scope, so any other task that starts on that thread incorrectly appears as its child.

-Solution: We'll instrument both sides of async calls. The #[instrument] macro already wraps async function bodies to track their execution time. We'll enhance it to also transform await points to create blocking spans. When a user writes fetch_data().await, the macro will create an AwaitGuard that marks this as a blocking/waiting point. These blocking spans will be rendered differently in the UI with arrows pointing to the child spans (the actual work being done in the awaited function). At UI render time we have the complete call tree, so we can detect if an awaited function was properly instrumented by checking if the blocking span has any child spans. If there are no children, we show a warning in the UI that the function appears uninstrumented.
+Solution: We'll instrument async functions differently from sync functions. For async functions, the #[instrument] macro will capture the parent span ID once at future creation time (before any awaits) and store it in the future's state machine. This parent ID moves with the future across threads. Then we'll transform the async function body to instrument segments between await points, and instrument the await points themselves as blocking spans, all using an explicit parent ID rather than thread-local tracking. The macro will call a new capture_parent() function to grab the parent once, then use SpanGuard::new_with_parent() for all spans within the async function, passing the captured parent explicitly. Await points will create AwaitGuard spans that also use the explicit parent. At UI render time we have the complete call tree, so we can detect if an awaited function was properly instrumented by checking if the blocking span has any child spans. If there are no children, we show a warning in the UI that the function appears uninstrumented. We'll render blocking spans with arrows pointing to their children to show the waiting relationships.

-Why this approach: We're taking the minimal invasive approach that doesn't require executor or waker integration. By instrumenting at the await point rather than trying to track Poll::Pending returns, we avoid needing deep async runtime integration. The compile-time warning problem (can't detect if functions are already instrumented) is solved by deferring warnings to runtime where we have full tree information. For the async interleaving problem, our existing parent tracking should handle it correctly because we use thread-local parent span tracking with Cell, so when an async function suspends and another resumes on the same thread, each maintains its own parent chain through the guard drop/create cycle. The span guards properly save and restore the previous parent when they're created and dropped, so interleaved execution should nest correctly in the tree.
+Why this approach: By capturing the parent once at future creation and storing it in the future's state, we sidestep all the thread-migration and thread-local storage problems. The parent ID moves with the future automatically as part of its state machine. This works with any executor without requiring executor-specific integration. For the interleaving problem, since we're not using thread-local parent tracking for async spans, there's no issue with multiple tasks running on the same thread - each has its parent baked into its state. Sync code continues to use thread-local parent tracking, which works fine because a sync function never suspends or changes threads mid-span. The compile-time warning problem is solved by deferring warnings to runtime where we have full tree information.

-What we're not doing: We're not trying to track the full async lifecycle with individual poll calls and yields. We're not integrating with the async executor or wakers. We're not trying to show gaps between polls or track when futures are suspended vs resumed. We're not emitting compile-time warnings about uninstrumented functions because we can't reliably detect this at macro expansion time without type information. We're not trying to handle the case where an async function migrates between threads (that would require rethinking our thread-local parent tracking). Those features could be phase 2 but require much deeper integration with the async runtime.
+What we're not doing: We're not trying to track the full async lifecycle with individual poll calls and yields. We're not integrating with the async executor or wakers. We're not trying to show gaps between polls or track when futures are suspended vs resumed. We're not emitting compile-time warnings about uninstrumented functions because we can't reliably detect this at macro expansion time without type information. We're not using thread-local parent tracking for async functions (only for sync functions). We're not trying to automatically instrument all expressions within async functions like instrument_calls does (though we could add that as an enhancement). Those features could be phase 2 but the current approach gives us solid async profiling without deep runtime integration.
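
To make the capture-once idea concrete, here is a minimal hand-expanded sketch of what the macro output might look like for one async function. It is not the teleprof implementation: `capture_parent()` and `SpanGuard::new_with_parent()` are named in the plan above, but their signatures here are assumptions, and `SpanId`, `SpanRecord`, `record`, `RECORDED`, `block_on`, and `demo` are hypothetical helpers invented for the illustration. The sketch captures the parent on the creating thread, clears the thread-local, polls the future on a different thread, and still records the right parent.

```rust
use std::cell::Cell;
use std::future::Future;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

pub type SpanId = u64;
// (span id, parent id, name); a stand-in for the real trace sink (assumption).
pub type SpanRecord = (SpanId, Option<SpanId>, &'static str);

static NEXT_ID: AtomicU64 = AtomicU64::new(1);
static RECORDED: Mutex<Vec<SpanRecord>> = Mutex::new(Vec::new());

thread_local! {
    // Thread-local parent, used only by sync spans.
    static CURRENT: Cell<Option<SpanId>> = Cell::new(None);
}

fn record(name: &'static str, parent: Option<SpanId>) -> SpanId {
    let id = NEXT_ID.fetch_add(1, Ordering::Relaxed);
    RECORDED.lock().unwrap().push((id, parent, name));
    id
}

/// Reads the current thread-local parent once (assumed signature).
pub fn capture_parent() -> Option<SpanId> {
    CURRENT.with(|c| c.get())
}

pub struct SpanGuard {
    pub id: SpanId,
    // Some(prev) for sync spans that must restore the thread-local on drop.
    restore: Option<Option<SpanId>>,
}

impl SpanGuard {
    /// Sync path: parent comes from, and is written back to, the thread-local.
    pub fn new(name: &'static str) -> Self {
        let prev = capture_parent();
        let id = record(name, prev);
        CURRENT.with(|c| c.set(Some(id)));
        SpanGuard { id, restore: Some(prev) }
    }

    /// Async path: explicit parent, thread-local state is never touched.
    pub fn new_with_parent(name: &'static str, parent: Option<SpanId>) -> Self {
        SpanGuard { id: record(name, parent), restore: None }
    }
}

impl Drop for SpanGuard {
    fn drop(&mut self) {
        if let Some(prev) = self.restore {
            CURRENT.with(|c| c.set(prev));
        }
    }
}

/// Hand-written stand-in for an #[instrument]-ed `async fn fetch_data() -> u32`.
pub fn fetch_data() -> impl Future<Output = u32> {
    let parent = capture_parent(); // runs at future creation, before any poll
    async move {
        // `parent` lives in the future's state and travels with it.
        let _span = SpanGuard::new_with_parent("fetch_data", parent);
        42
    }
}

struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Tiny busy-polling executor so the sketch needs no runtime crate.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(v) = fut.as_mut().poll(&mut cx) {
            return v;
        }
        std::thread::yield_now();
    }
}

/// Creates the future under a sync span, then polls it on another thread.
pub fn demo() -> (SpanId, Vec<SpanRecord>) {
    let caller = SpanGuard::new("caller");
    let caller_id = caller.id;
    let fut = fetch_data(); // parent captured here, on this thread
    drop(caller); // thread-local parent is cleared again
    let value = std::thread::spawn(move || block_on(fut)).join().unwrap();
    assert_eq!(value, 42);
    (caller_id, RECORDED.lock().unwrap().clone())
}
```

Calling `demo()` returns the id of the `caller` span together with everything recorded. The `fetch_data` entry carries that id as its parent even though the future ran on a fresh thread whose thread-local parent was empty, which is the property the plan relies on.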