Compare commits

5 Commits

Author SHA1 Message Date
5adacce08f update todos 2025-12-17 08:33:22 +01:00
4ef5f7b96f track todo's 2025-12-17 01:10:06 +01:00
5a6ba43f78 ~simplify~ 2025-12-17 01:08:59 +01:00
873e6419b8 update docs 2025-12-17 00:36:21 +01:00
a9d0d7ec42 fix text rendering 2025-12-17 00:33:23 +01:00
3 changed files with 408 additions and 613 deletions

@@ -2,64 +2,57 @@
A lightweight, debug-only telemetry profiler for Rust applications. Shows thread activity and call stack hierarchy in real-time. A lightweight, debug-only telemetry profiler for Rust applications. Shows thread activity and call stack hierarchy in real-time.
Inspired by RAD Telemetry - built in ~1200 LOC with 'minimal' dependencies (for a Rust project). Inspired by RAD Telemetry - built in ~1200 LOC with minimal dependencies.
## Features ## Features
- **Icicle graph** showing call stack hierarchy (top half) - **Unified thread tracks** with expandable call stacks (click headers to toggle)
- **Thread timeline** showing per-thread activity over time (bottom half) - **Collapsed view** shows when threads are active (easy to spot blocking)
- **Expanded view** shows full call stack hierarchy with flame graph visualization
- **Ongoing span support** for long-running functions (main loops, render threads)
- **Pause mechanism** to freeze your application for inspection (Space bar)
- **Monokai color palette** for easy visual distinction - **Monokai color palette** for easy visual distinction
- **Pause mechanism** to freeze your application for inspection - **Incremental tree building** - only processes new spans each frame
- **Ringbuffer storage** (~16MB, 1M events) for recent history - **Ringbuffer storage** (~16MB, 1M events) for recent history
- **Lock-free event recording** via MPSC channels - **Lock-free event recording** via MPSC channels
## Dependencies ## Dependencies
Only 7 dependencies (~70 total including transitive): Only 7 direct dependencies (~70 total including transitive):
- `minifb` - Window and framebuffer - `minifb` - Window and framebuffer
- `crossbeam-channel` - Lock-free MPSC - `crossbeam-channel` - Lock-free MPSC
- `once_cell` - Lazy statics - `once_cell` - Lazy statics
- `fontdue` - `fontdue` - Font rasterization
- `procmacro2` - `procmacro2`, `syn`, `quote` - Proc macros
- `syn`
- `quote`
## Usage ## Usage
### Add to your `Cargo.toml`: ### Add to your `Cargo.toml`:
```toml ```toml
[dependencies] [dependencies]
teleprof = { path = "../teleprof" } teleprof = { path = "../teleprof" }
``` ```
### In your code: ### In your code:
```rust ```rust
fn main() { fn main() {
// Start the profiler window (separate thread) // Start the profiler window (separate thread)
#[cfg(debug_assertions)] #[cfg(debug_assertions)]
teleprof::start(); teleprof::start();
// Your application code // Name your thread (optional, shows in UI)
#[cfg(debug_assertions)]
teleprof::set_thread_name("main");
game_loop(); game_loop();
} }
fn game_loop() { fn game_loop() {
loop { loop {
// Profile a scope teleprof::span!("main_frame");
teleprof::span!("game_loop");
update(); update();
render(); render();
// Check if paused (optional)
if teleprof::PAUSE.try_lock().is_err() {
// Wait until unpaused
while teleprof::PAUSE.try_lock().is_err() {
std::thread::sleep(std::time::Duration::from_millis(100));
}
}
} }
} }
@@ -74,18 +67,13 @@ fn render() {
} }
``` ```
### For closures:
```rust
let work = || {
teleprof::span!("my_closure");
// work...
};
```
## Controls ## Controls
- **Space**: Toggle pause (acquires `PAUSE` lock to freeze your app) - **Space**: Toggle pause (freezes ongoing spans at current time)
- **Left click + drag**: Box select to zoom (click background not function)
- **Right click + drag**: Pan timeline
- **Scroll**: Zoom timeline horizontally
- **Click track header**: Expand/collapse thread's call stack
- **Escape**: Close profiler window - **Escape**: Close profiler window
## How it works ## How it works
@@ -94,19 +82,20 @@ let work = || {
2. When the guard drops, sends `SpanEnd` 2. When the guard drops, sends `SpanEnd`
3. Events are sent via lock-free MPSC channel 3. Events are sent via lock-free MPSC channel
4. Window thread drains events into a fixed-size ringbuffer 4. Window thread drains events into a fixed-size ringbuffer
5. Renders icicle graph (call hierarchy) and timeline (per-thread activity) 5. Incrementally builds per-thread call trees (only processes new spans)
6. Renders unified thread tracks with expandable call stacks
## Design Goals ## Design Goals
- **Minimal overhead**: Lock-free event recording - **Minimal overhead**: Lock-free event recording, incremental tree building
- **Debug-only**: Compile out in release builds with `#[cfg(debug_assertions)]` - **Debug-only**: Compile out in release builds with `#[cfg(debug_assertions)]`
- **Separate window**: Doesn't interfere with your app's rendering - **Separate window**: Doesn't interfere with your app's rendering
- **Simple API**: Just `span!("name")` and you're done - **Simple API**: Just `span!("name")` and you're done
- **Handle any thread pattern**: Long-lived, short-lived, thread pools (Rayon, etc.)
## Examples ## Examples
Run the included examples: Run the included examples:
```bash ```bash
# Multi-threaded physics simulation # Multi-threaded physics simulation
cargo run --example demo cargo run --example demo

File diff suppressed because it is too large

17
todo.md Normal file

@@ -0,0 +1,17 @@
# todo:
## smol:
- collapse short threads into fewer tracks. (checkbox)
## big:
### Add blocking/await visualization to teleprof profiler
Problem: When profiling async Rust code, it is hard to see where a function is blocked waiting on another async operation. Tools like Superluminal show blocking with arrows; teleprof currently shows only when spans are active, not the waiting relationships between them. That makes it hard to tell why a function takes so long: is it doing work or waiting? In addition, the current thread-local parent tracking is fundamentally broken for async code, for two reasons. First, async functions can migrate between threads in work-stealing executors. Second, when a function yields at an await point its SpanGuard is still in scope, so any other task that starts on that thread incorrectly appears as its child.
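The interleaving half of the problem can be reproduced without an executor. The sketch below is a minimal model, not teleprof's actual code: `OPEN_SPANS`, the numeric IDs, and this simplified `SpanGuard` are assumptions made for illustration. It keeps one guard alive (standing in for a task suspended at an await) and then opens a span for an unrelated task on the same thread.

```rust
use std::cell::RefCell;

thread_local! {
    // Hypothetical per-thread stack of currently open span IDs.
    static OPEN_SPANS: RefCell<Vec<u64>> = RefCell::new(Vec::new());
}

/// Simplified stand-in for a span guard that uses thread-local parent tracking.
struct SpanGuard {
    id: u64,
    parent: Option<u64>,
}

impl SpanGuard {
    fn new(id: u64) -> Self {
        // The parent is whatever span is currently on top of this thread's stack.
        let parent = OPEN_SPANS.with(|s| {
            let mut s = s.borrow_mut();
            let parent = s.last().copied();
            s.push(id);
            parent
        });
        SpanGuard { id, parent }
    }
}

impl Drop for SpanGuard {
    fn drop(&mut self) {
        // Remove this span from the thread's stack when the guard goes away.
        OPEN_SPANS.with(|s| {
            let mut s = s.borrow_mut();
            if let Some(pos) = s.iter().rposition(|&x| x == self.id) {
                s.remove(pos);
            }
        });
    }
}

/// Task A is "suspended at an await" (its guard is still alive) when the
/// executor polls unrelated task B on the same thread.
fn parent_seen_by_unrelated_task() -> Option<u64> {
    let _task_a = SpanGuard::new(1);
    let task_b = SpanGuard::new(2);
    task_b.parent
}

fn main() {
    // Prints Some(1): task B is wrongly recorded as a child of task A.
    println!("{:?}", parent_seen_by_unrelated_task());
}
```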
Solution: Instrument async functions differently from sync functions. For an async function, the #[instrument] macro captures the parent span ID once, at future creation time (before any awaits), and stores it in the future's state machine, so the parent ID moves with the future across threads. The macro then transforms the async function body so that the segments between await points are instrumented as ordinary spans and the await points themselves are instrumented as blocking spans, all using the explicit parent ID rather than thread-local tracking. Concretely, the macro calls a new capture_parent() function to grab the parent once, then uses SpanGuard::new_with_parent() for every span within the async function; await points create AwaitGuard spans that take the same explicit parent. At UI render time we have the complete call tree, so we can detect whether an awaited function was instrumented by checking whether its blocking span has any child spans. If it has none, the UI shows a warning that the function appears uninstrumented. Blocking spans are rendered with arrows pointing to their children to show the waiting relationships.
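A rough sketch of the explicit-parent idea follows. The names capture_parent() and SpanGuard::new_with_parent() come from the plan above, but their signatures, the `CURRENT` thread-local, and the `InstrumentedTask` struct are assumptions; the struct merely stands in for the compiler-generated future state, and a plain thread hop stands in for a work-stealing executor.

```rust
use std::cell::Cell;
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

static NEXT_ID: AtomicU64 = AtomicU64::new(1);

thread_local! {
    // Hypothetical "current span" slot used only by sync spans.
    static CURRENT: Cell<Option<u64>> = Cell::new(None);
}

/// Reads the current span of the calling thread, once.
fn capture_parent() -> Option<u64> {
    CURRENT.with(|c| c.get())
}

struct SpanGuard {
    id: u64,
    parent: Option<u64>,
    // Some(previous) for sync guards, which must restore the thread-local on drop.
    restore: Option<Option<u64>>,
}

impl SpanGuard {
    /// Sync path: parent comes from, and is written to, the thread-local.
    fn new() -> Self {
        let id = NEXT_ID.fetch_add(1, Ordering::Relaxed);
        let prev = CURRENT.with(|c| c.replace(Some(id)));
        SpanGuard { id, parent: prev, restore: Some(prev) }
    }

    /// Async path: parent is passed in and the thread-local is left untouched.
    fn new_with_parent(parent: Option<u64>) -> Self {
        let id = NEXT_ID.fetch_add(1, Ordering::Relaxed);
        SpanGuard { id, parent, restore: None }
    }
}

impl Drop for SpanGuard {
    fn drop(&mut self) {
        if let Some(prev) = self.restore {
            CURRENT.with(|c| c.set(prev));
        }
    }
}

/// Stand-in for a future's state: the parent is captured at creation time.
struct InstrumentedTask {
    parent: Option<u64>,
}

impl InstrumentedTask {
    fn create() -> Self {
        InstrumentedTask { parent: capture_parent() }
    }

    /// One segment between await points, possibly on another thread.
    fn run_segment(&self) -> Option<u64> {
        let segment = SpanGuard::new_with_parent(self.parent);
        segment.parent
    }
}

/// Creates the task under a caller span, runs it on a different thread,
/// and reports whether the segment still sees the caller as its parent.
fn parent_survives_migration() -> bool {
    let caller = SpanGuard::new();
    let task = InstrumentedTask::create();
    let seen = thread::spawn(move || task.run_segment()).join().unwrap();
    seen == Some(caller.id)
}

fn main() {
    println!("parent survives migration: {}", parent_survives_migration());
}
```

Because the async path never writes to the thread-local, a suspended task cannot leak its span into whatever runs next on that thread.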
Why this approach: By capturing the parent once at future creation and storing it in the future's state, we sidestep all the thread-migration and thread-local storage problems. The parent ID moves with the future automatically as part of its state machine. This works with any executor without requiring executor-specific integration. For the interleaving problem, since we're not using thread-local parent tracking for async spans, there's no issue with multiple tasks running on the same thread - each has its parent baked into its state. Sync code continues to use thread-local parent tracking which works fine. The compile-time warning problem is solved by deferring warnings to runtime where we have full tree information.
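The runtime check for uninstrumented awaits could look roughly like this. It is a sketch under assumptions: the `Span` and `SpanKind` types and the flat span list are hypothetical, and the real call tree may be stored differently. An await (blocking) span counts as uninstrumented when no other span names it as parent.

```rust
use std::collections::HashSet;

#[derive(Clone, Copy, PartialEq, Debug)]
enum SpanKind {
    Work,
    Await,
}

/// Hypothetical flattened view of a completed call tree.
struct Span {
    id: u64,
    parent: Option<u64>,
    kind: SpanKind,
}

/// Returns the IDs of await spans that have no child spans.
fn uninstrumented_awaits(spans: &[Span]) -> Vec<u64> {
    // Every ID that appears as someone's parent has at least one child.
    let has_children: HashSet<u64> = spans.iter().filter_map(|s| s.parent).collect();
    spans
        .iter()
        .filter(|s| s.kind == SpanKind::Await && !has_children.contains(&s.id))
        .map(|s| s.id)
        .collect()
}

/// Small example tree: await 2 has a child (span 4), await 3 has none.
fn sample_tree() -> Vec<Span> {
    vec![
        Span { id: 1, parent: None, kind: SpanKind::Work },
        Span { id: 2, parent: Some(1), kind: SpanKind::Await },
        Span { id: 3, parent: Some(1), kind: SpanKind::Await },
        Span { id: 4, parent: Some(2), kind: SpanKind::Work },
    ]
}

fn main() {
    // Prints [3]: only the childless await span would get a UI warning.
    println!("{:?}", uninstrumented_awaits(&sample_tree()));
}
```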
What we're not doing:
- We're not trying to track the full async lifecycle with individual poll calls and yields.
- We're not integrating with the async executor or wakers.
- We're not trying to show gaps between polls or track when futures are suspended vs resumed.
- We're not emitting compile-time warnings about uninstrumented functions, because we can't reliably detect this at macro expansion time without type information.
- We're not using thread-local parent tracking for async functions (only for sync functions).
- We're not trying to automatically instrument all expressions within async functions like instrument_calls does (though we could add that as an enhancement).

Those features could be phase 2, but the current approach gives us solid async profiling without deep runtime integration.