Compare commits
5 Commits
09457e56d4...master
| Author | SHA1 | Date | |
|---|---|---|---|
| | 5adacce08f | | |
| | 4ef5f7b96f | | |
| | 5a6ba43f78 | | |
| | 873e6419b8 | | |
| | a9d0d7ec42 | | |
63
README.md
@@ -2,64 +2,57 @@
|
|||||||
|
|
||||||
A lightweight, debug-only telemetry profiler for Rust applications. Shows thread activity and call stack hierarchy in real-time.
|
A lightweight, debug-only telemetry profiler for Rust applications. Shows thread activity and call stack hierarchy in real-time.
|
||||||
|
|
||||||
Inspired by RAD Telemetry - built in ~1200 LOC with 'minimal' dependencies (for a Rust project).
|
Inspired by RAD Telemetry - built in ~1200 LOC with minimal dependencies.
|
||||||
|
|
||||||
## Features
|
## Features
|
||||||
|
|
||||||
- **Icicle graph** showing call stack hierarchy (top half)
|
- **Unified thread tracks** with expandable call stacks (click headers to toggle)
|
||||||
- **Thread timeline** showing per-thread activity over time (bottom half)
|
- **Collapsed view** shows when threads are active (easy to spot blocking)
|
||||||
|
- **Expanded view** shows full call stack hierarchy with flame graph visualization
|
||||||
|
- **Ongoing span support** for long-running functions (main loops, render threads)
|
||||||
|
- **Pause mechanism** to freeze your application for inspection (Space bar)
|
||||||
- **Monokai color palette** for easy visual distinction
|
- **Monokai color palette** for easy visual distinction
|
||||||
- **Pause mechanism** to freeze your application for inspection
|
- **Incremental tree building** - only processes new spans each frame
|
||||||
- **Ringbuffer storage** (~16MB, 1M events) for recent history
|
- **Ringbuffer storage** (~16MB, 1M events) for recent history
|
||||||
- **Lock-free event recording** via MPSC channels
|
- **Lock-free event recording** via MPSC channels
|
||||||
|
|
||||||
## Dependencies
|
## Dependencies
|
||||||
|
|
||||||
Only 7 dependencies (~70 total including transitive):
|
Only 7 direct dependencies (~70 total including transitive):
|
||||||
- `minifb` - Window and framebuffer
|
- `minifb` - Window and framebuffer
|
||||||
- `crossbeam-channel` - Lock-free MPSC
|
- `crossbeam-channel` - Lock-free MPSC
|
||||||
- `once_cell` - Lazy statics
|
- `once_cell` - Lazy statics
|
||||||
- `fontdue`
|
- `fontdue` - Font rasterization
|
||||||
- `proc-macro2`
|
- `proc-macro2`, `syn`, `quote` - Proc macros
|
||||||
- `syn`
|
|
||||||
- `quote`
|
|
||||||
|
|
||||||
## Usage
|
## Usage
|
||||||
|
|
||||||
### Add to your `Cargo.toml`:
|
### Add to your `Cargo.toml`:
|
||||||
|
|
||||||
```toml
|
```toml
|
||||||
[dependencies]
|
[dependencies]
|
||||||
teleprof = { path = "../teleprof" }
|
teleprof = { path = "../teleprof" }
|
||||||
```
|
```
|
||||||
|
|
||||||
### In your code:
|
### In your code:
|
||||||
|
|
||||||
```rust
|
```rust
|
||||||
fn main() {
|
fn main() {
|
||||||
// Start the profiler window (separate thread)
|
// Start the profiler window (separate thread)
|
||||||
#[cfg(debug_assertions)]
|
#[cfg(debug_assertions)]
|
||||||
teleprof::start();
|
teleprof::start();
|
||||||
|
|
||||||
// Your application code
|
// Name your thread (optional, shows in UI)
|
||||||
|
#[cfg(debug_assertions)]
|
||||||
|
teleprof::set_thread_name("main");
|
||||||
|
|
||||||
game_loop();
|
game_loop();
|
||||||
}
|
}
|
||||||
|
|
||||||
fn game_loop() {
|
fn game_loop() {
|
||||||
loop {
|
loop {
|
||||||
// Profile a scope
|
teleprof::span!("main_frame");
|
||||||
teleprof::span!("game_loop");
|
|
||||||
|
|
||||||
update();
|
update();
|
||||||
render();
|
render();
|
||||||
|
|
||||||
// Check if paused (optional)
|
|
||||||
if teleprof::PAUSE.try_lock().is_err() {
|
|
||||||
// Wait until unpaused
|
|
||||||
while teleprof::PAUSE.try_lock().is_err() {
|
|
||||||
std::thread::sleep(std::time::Duration::from_millis(100));
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -74,18 +67,13 @@ fn render() {
|
|||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
### For closures:
|
|
||||||
|
|
||||||
```rust
|
|
||||||
let work = || {
|
|
||||||
teleprof::span!("my_closure");
|
|
||||||
// work...
|
|
||||||
};
|
|
||||||
```
|
|
||||||
|
|
||||||
## Controls
|
## Controls
|
||||||
|
|
||||||
- **Space**: Toggle pause (acquires `PAUSE` lock to freeze your app)
|
- **Space**: Toggle pause (freezes ongoing spans at current time)
|
||||||
|
- **Left click + drag**: Box select to zoom (click background not function)
|
||||||
|
- **Right click + drag**: Pan timeline
|
||||||
|
- **Scroll**: Zoom timeline horizontally
|
||||||
|
- **Click track header**: Expand/collapse thread's call stack
|
||||||
- **Escape**: Close profiler window
|
- **Escape**: Close profiler window
|
||||||
|
|
||||||
## How it works
|
## How it works
|
||||||
@@ -94,19 +82,20 @@ let work = || {
|
|||||||
2. When the guard drops, sends `SpanEnd`
|
2. When the guard drops, sends `SpanEnd`
|
||||||
3. Events are sent via lock-free MPSC channel
|
3. Events are sent via lock-free MPSC channel
|
||||||
4. Window thread drains events into a fixed-size ringbuffer
|
4. Window thread drains events into a fixed-size ringbuffer
|
||||||
5. Renders icicle graph (call hierarchy) and timeline (per-thread activity)
|
5. Incrementally builds per-thread call trees (only processes new spans)
|
||||||
|
6. Renders unified thread tracks with expandable call stacks
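
The first four steps above can be sketched in a few dozen lines. This is only an illustration, not teleprof's actual code: it uses `std::sync::mpsc` instead of `crossbeam-channel` so that it builds without dependencies, and the `Event`, `Kind`, and `RingBuffer` types (and the `SpanGuard::new` signature taking a sender) are assumed names for the example.

```rust
use std::collections::VecDeque;
use std::sync::mpsc::{Receiver, Sender};
use std::time::Instant;

/// Whether an event opens or closes a span.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Kind {
    SpanStart,
    SpanEnd,
}

/// One record sent from an instrumented thread (hypothetical layout).
#[derive(Debug, Clone, Copy)]
pub struct Event {
    pub kind: Kind,
    pub name: &'static str,
    pub at: Instant,
}

/// RAII guard: sends `SpanStart` when created and `SpanEnd` when dropped.
pub struct SpanGuard {
    name: &'static str,
    tx: Sender<Event>,
}

impl SpanGuard {
    pub fn new(name: &'static str, tx: &Sender<Event>) -> Self {
        // A failed send only means the receiver is gone; ignore it.
        let _ = tx.send(Event { kind: Kind::SpanStart, name, at: Instant::now() });
        SpanGuard { name, tx: tx.clone() }
    }
}

impl Drop for SpanGuard {
    fn drop(&mut self) {
        let _ = self.tx.send(Event { kind: Kind::SpanEnd, name: self.name, at: Instant::now() });
    }
}

/// Fixed-capacity buffer that evicts the oldest event once full.
pub struct RingBuffer {
    events: VecDeque<Event>,
    capacity: usize,
}

impl RingBuffer {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        RingBuffer { events: VecDeque::with_capacity(capacity), capacity }
    }

    pub fn push(&mut self, event: Event) {
        if self.events.len() == self.capacity {
            self.events.pop_front();
        }
        self.events.push_back(event);
    }

    /// Move every pending event out of the channel without blocking.
    pub fn drain_from(&mut self, rx: &Receiver<Event>) -> usize {
        let mut drained = 0;
        for event in rx.try_iter() {
            self.push(event);
            drained += 1;
        }
        drained
    }

    /// Kind and name of each stored event, oldest first.
    pub fn snapshot(&self) -> Vec<(Kind, &'static str)> {
        self.events.iter().map(|e| (e.kind, e.name)).collect()
    }
}
```

In this sketch the window thread would call `drain_from` once per frame and then build its per-thread trees from the buffer. Nested guards drop in reverse order of creation, which is what yields properly nested start/end pairs.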
|
||||||
|
|
||||||
## Design Goals
|
## Design Goals
|
||||||
|
|
||||||
- **Minimal overhead**: Lock-free event recording
|
- **Minimal overhead**: Lock-free event recording, incremental tree building
|
||||||
- **Debug-only**: Compile out in release builds with `#[cfg(debug_assertions)]`
|
- **Debug-only**: Compile out in release builds with `#[cfg(debug_assertions)]`
|
||||||
- **Separate window**: Doesn't interfere with your app's rendering
|
- **Separate window**: Doesn't interfere with your app's rendering
|
||||||
- **Simple API**: Just `span!("name")` and you're done
|
- **Simple API**: Just `span!("name")` and you're done
|
||||||
|
- **Handle any thread pattern**: Long-lived, short-lived, thread pools (Rayon, etc.)
|
||||||
|
|
||||||
## Examples
|
## Examples
|
||||||
|
|
||||||
Run the included examples:
|
Run the included examples:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
# Multi-threaded physics simulation
|
# Multi-threaded physics simulation
|
||||||
cargo run --example demo
|
cargo run --example demo
|
||||||
@@ -122,4 +111,4 @@ The bouncing ball example demonstrates:
|
|||||||
|
|
||||||
## License
|
## License
|
||||||
|
|
||||||
???
|
???
|
||||||
File diff suppressed because it is too large
17
todo.md
Normal file
@@ -0,0 +1,17 @@
# todo:

## smol:
- collapse short threads into fewer tracks. (checkbox)


## big:

### Add blocking/await visualization to teleprof profiler

Problem: When profiling async Rust code, it's hard to see where functions are blocked waiting on other async operations. Tools like Superluminal show blocking with arrows; we currently show only when spans are active, not the waiting relationships between them. This makes it difficult to tell why a function is taking so long: is it doing work or waiting? Additionally, our current thread-local parent tracking is fundamentally broken for async code, for two reasons. First, async functions can migrate between threads in work-stealing executors. Second, when a function yields at an await point its `SpanGuard` is still in scope, so any other task that starts on that thread would incorrectly appear as its child.

Solution: We'll instrument async functions differently from sync functions. For async functions, the `#[instrument]` macro will capture the parent span ID once at future creation time (before any awaits) and store it in the future's state machine, so the parent ID moves with the future across threads. The macro will then transform the async function body to instrument the segments between await points, and instrument the await points themselves as blocking spans, all using the explicit parent ID rather than thread-local tracking. Concretely, the macro will call a new `capture_parent()` function to grab the parent once, then use `SpanGuard::new_with_parent()` for all spans within the async function, passing the captured parent explicitly. Await points will create `AwaitGuard` spans that also use the explicit parent. At UI render time we have the complete call tree, so we can detect whether an awaited function was properly instrumented by checking whether its blocking span has any child spans. If there are none, we show a warning in the UI that the function appears uninstrumented. We'll render blocking spans with arrows pointing to their children to show the waiting relationships.
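
A minimal sketch of the explicit-parent idea, to make the difference between the two paths concrete. The names `capture_parent()` and `SpanGuard::new_with_parent()` come from the plan above, but everything else here is an assumption for illustration: the `SpanId` type, the `CURRENT` thread-local, the ID counter, and the `restore` field are hypothetical, and event emission and `AwaitGuard` are left out.

```rust
use std::cell::Cell;
use std::sync::atomic::{AtomicU64, Ordering};

pub type SpanId = u64;

/// Process-wide source of unique span IDs (hypothetical).
static NEXT_ID: AtomicU64 = AtomicU64::new(1);

thread_local! {
    /// Innermost open sync span on this thread, or None at top level.
    static CURRENT: Cell<Option<SpanId>> = Cell::new(None);
}

/// Read the thread-local parent once, e.g. when a future is created.
pub fn capture_parent() -> Option<SpanId> {
    CURRENT.with(|c| c.get())
}

pub struct SpanGuard {
    pub id: SpanId,
    pub parent: Option<SpanId>,
    /// Some(previous) for sync spans, which must restore the thread-local
    /// on drop; None for explicit-parent spans, which never touched it.
    restore: Option<Option<SpanId>>,
}

impl SpanGuard {
    /// Sync path: take the parent from the thread-local and become current.
    pub fn new() -> Self {
        let id = NEXT_ID.fetch_add(1, Ordering::Relaxed);
        let parent = CURRENT.with(|c| c.replace(Some(id)));
        SpanGuard { id, parent, restore: Some(parent) }
    }

    /// Async path: the parent is passed in, and the thread-local is left
    /// alone so other tasks on this thread cannot become children.
    pub fn new_with_parent(parent: Option<SpanId>) -> Self {
        let id = NEXT_ID.fetch_add(1, Ordering::Relaxed);
        SpanGuard { id, parent, restore: None }
    }
}

impl Drop for SpanGuard {
    fn drop(&mut self) {
        if let Some(previous) = self.restore {
            CURRENT.with(|c| c.set(previous));
        }
    }
}
```

Because the captured value is a plain `Option<SpanId>`, it can be stored in a future and moved to any worker thread; a segment created there with `new_with_parent` still reports the original parent, while `capture_parent()` on that worker is unaffected.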

Why this approach: By capturing the parent once at future creation and storing it in the future's state, we sidestep all the thread-migration and thread-local storage problems. The parent ID moves with the future automatically as part of its state machine. This works with any executor without requiring executor-specific integration. The interleaving problem also goes away: since async spans don't use thread-local parent tracking, multiple tasks can run on the same thread without affecting each other, because each has its parent baked into its state. Sync code continues to use thread-local parent tracking, which works fine there. Uninstrumented awaited functions can't be detected reliably at compile time, so we defer those warnings to runtime, where we have full tree information.

What we're not doing: We're not trying to track the full async lifecycle with individual poll calls and yields. We're not integrating with the async executor or wakers. We're not trying to show gaps between polls or track when futures are suspended vs resumed. We're not emitting compile-time warnings about uninstrumented functions, because we can't reliably detect this at macro expansion time without type information. We're not using thread-local parent tracking for async functions (only for sync functions). We're not trying to automatically instrument all expressions within async functions the way `instrument_calls` does (though we could add that as an enhancement). Those features could be phase 2, but the current approach gives us solid async profiling without deep runtime integration.