Compare commits
10 Commits
2cf75febdc...3da6ffaa77

| SHA1 |
|---|
| 3da6ffaa77 |
| 6d1c59fb99 |
| 4b348d12be |
| aeacaf6118 |
| 978678a46e |
| 078447539c |
| e9fdbb1160 |
| 8cbef1dfc1 |
| d3ab81b833 |
| 51bfccc3c2 |
2
.gitignore
vendored
@@ -1,2 +1,2 @@
/target
target
Cargo.lock
320
BENCHMARKS_AND_TUNING.md
Normal file
@@ -0,0 +1,320 @@
# smarm — Benchmarks & Tuning Recommendations

> Based on bench suite v0.3.0, Intel Xeon @ 2.80 GHz, 1-core sandbox,
> kernel 6.18.5, rustc 1.95.0. Multi-core conclusions are extrapolated from
> design reasoning and single-core sweep data; re-validate on real hardware.

---

## TL;DR

smarm is competitive with tokio for **channel-heavy, message-passing workloads**
and wins outright on **uncontended channels** and **panic/unwind isolation**.
It is significantly slower than tokio for **spawn-heavy** patterns and
**timer-heavy** workloads. The preemption knobs (`alloc_interval`,
`timeslice_cycles`) have minimal effect on single-core machines; they are
expected to matter on multi-core machines under scheduler-thread contention.

---

## Bench results summary

All medians in µs. The tokio column is `current_thread` unless noted; ratio is smarm time / tokio time.

| Bench                  | smarm  | tokio  | ratio  | winner         |
|------------------------|--------|--------|--------|----------------|
| `chained_spawn`        | 8 625  | 124    | 70×    | tokio          |
| `ping_pong_oneshot`    | 16 848 | 879    | 19×    | tokio          |
| `spawn_storm_busy`     | 126 k  | 2 772  | 45×    | tokio          |
| `yield_many`           | 41 622 | 15 085 | 2.8×   | tokio          |
| `yield_in_hot_loop`    | 190 k  | 153 k  | 1.25×  | tokio          |
| `many_timers`          | 143 k  | 14 462 | 10×    | tokio          |
| `fan_out_compute`      | 29 727 | 28 503 | 1.04×  | **even**       |
| `multi_thread_scaling` | 30 k   | 29 k   | 1.04×  | **even**       |
| `deep_recursion`       | 83     | 25     | 3.3×   | tokio          |
| `mpsc_contention`      | 9 062  | 17 570 | 0.52×  | **smarm** 1.9× |
| `uncontended_channel`  | 27 265 | 51 888 | 0.53×  | **smarm** 1.9× |
| `catch_unwind_panics`  | 142 k  | 682 k  | 0.21×  | **smarm** 4.8× |

---

## Where smarm wins

### Uncontended channels (1.9× faster)

When a single producer sends to a single consumer with no other actors
competing for the queue, smarm's channel is about 1.9× faster than
tokio's (`uncontended_channel`). This is the core use case smarm is designed for:
pipelines of actors passing owned data along a chain.

**Recommendation**: smarm is a good fit for any architecture where data
flows through a chain of stages, each stage is an actor, and the
channel between stages is the primary synchronisation point.

### Uncontended MPSC (1.9× faster, same reason)

Multi-producer single-consumer (`mpsc_contention`) works well for the same reason. On a
single-thread runtime, smarm's mutex is uncontended, so the lock is
essentially free. On multi-core this advantage is likely to shrink; re-measure.

### Panic isolation (4.8× faster recovery)

`catch_unwind_panics` creates 10 000 actors that each panic. smarm
recovers and delivers `Signal::Panic` to the supervisor 4.8× faster
than tokio. This matters if you're building a system that uses panics
as a fast abort path for malformed input or actor-level faults, or if
you rely heavily on supervision trees.

**Recommendation**: if your system expects panics to be a normal
operational event (not just bugs), smarm's supervision story is a
genuine advantage over tokio's task abort model.

---

## Where smarm loses, and why

### Spawn-heavy workloads (19–70×)

Every smarm actor `mmap`s a 64 KiB stack with a guard page, which costs at
least one syscall per spawn. Tokio tasks are heap-allocated state machines — no stack,
no syscall, ~100 bytes each. For workloads that spawn thousands of
short-lived actors per second, this is a structural disadvantage.

**Recommendations**:
- Avoid spawning actors for work that completes in microseconds.
  Use a worker-pool pattern: spawn N long-lived actors at startup,
  distribute work over channels.
- If you genuinely need high-frequency short-lived actors, the stack
  allocation cost is a known roadmap item (stack caching, slab alloc).
  It is not an inherent design flaw — just not implemented yet.
- `deep_recursion` shows the same problem at depth 500: smarm spawns
  a fresh actor per level, paying the mmap cost repeatedly. Recursive
  decomposition should use explicit stacks or iteration inside a single
  actor, not actor-per-level spawning.
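
The stack-caching roadmap item can be sketched as a bounded free list that hands a released stack to the next spawn instead of mapping a fresh one. This is a hypothetical illustration, not smarm code: `StackPool`, `acquire`, `release`, and `max_cached` are invented names, and a heap `Vec<u8>` stands in for the `mmap`'d region and its guard page.

```rust
/// Hypothetical stack cache. A `Vec<u8>` stands in for an mmap'd stack;
/// in a real runtime `release` would keep the mapping instead of unmapping.
pub struct StackPool {
    stack_size: usize,
    max_cached: usize,
    free: Vec<Vec<u8>>,
    fresh_allocs: usize, // stacks allocated from scratch (the expensive path)
}

impl StackPool {
    pub fn new(stack_size: usize, max_cached: usize) -> Self {
        StackPool { stack_size, max_cached, free: Vec::new(), fresh_allocs: 0 }
    }

    /// Reuse a cached stack if one exists; otherwise allocate a new one.
    pub fn acquire(&mut self) -> Vec<u8> {
        if let Some(stack) = self.free.pop() {
            return stack;
        }
        self.fresh_allocs += 1;
        vec![0u8; self.stack_size]
    }

    /// Called when an actor exits. Stacks beyond the cap are dropped.
    pub fn release(&mut self, stack: Vec<u8>) {
        if self.free.len() < self.max_cached && stack.len() == self.stack_size {
            self.free.push(stack);
        }
    }

    pub fn fresh_allocs(&self) -> usize {
        self.fresh_allocs
    }
}
```

Capping the cache bounds how much idle stack memory the runtime holds after a burst of spawns. The right cap would need to come from profiling.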

### Timer-heavy workloads (10×)

smarm uses a global min-heap of `(deadline, Pid)` pairs behind the
shared mutex. Tokio uses a sharded hierarchical timer wheel. With
10 000 pending timers, every O(log N) heap operation happens under that
lock, and smarm ends up about 10× slower than tokio in `many_timers`.

**Recommendations**:
- Do not use smarm `sleep()` in tight loops with many concurrent
  sleeping actors if timing precision matters.
- For IO timeouts: prefer a single timer actor that manages a priority
  queue and fans out wakeups over channels, rather than 1 000 actors
  each sleeping directly.
- The hierarchical timer wheel is listed in `LOOM.md` deferred work.
  It is the correct fix if timer performance becomes a bottleneck.
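
For reference, the min-heap design described above can be sketched with `std::collections::BinaryHeap`. The same structure is the priority queue a dedicated timer actor would keep. `TimerHeap`, `insert`, `pop_expired`, and `next_deadline` are illustrative names, and a plain `u64` stands in for `Pid`. Each push and pop is O(log N). In smarm, that cost is paid while holding the shared mutex.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;
use std::time::Instant;

/// Min-heap of (deadline, pid); `Reverse` turns std's max-heap into a min-heap.
pub struct TimerHeap {
    heap: BinaryHeap<Reverse<(Instant, u64)>>,
}

impl TimerHeap {
    pub fn new() -> Self {
        TimerHeap { heap: BinaryHeap::new() }
    }

    /// Register a sleeper: O(log N).
    pub fn insert(&mut self, deadline: Instant, pid: u64) {
        self.heap.push(Reverse((deadline, pid)));
    }

    /// Pop every entry whose deadline has passed, earliest first.
    pub fn pop_expired(&mut self, now: Instant) -> Vec<u64> {
        let mut woken = Vec::new();
        while let Some(Reverse((deadline, pid))) = self.heap.peek().copied() {
            if deadline > now {
                break; // everything left is in the future
            }
            self.heap.pop();
            woken.push(pid);
        }
        woken
    }

    /// Earliest pending deadline; bounds how long the scheduler may idle.
    pub fn next_deadline(&self) -> Option<Instant> {
        self.heap.peek().map(|Reverse((d, _))| *d)
    }
}
```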

### Yield overhead (2.8× in `yield_many`, 1.25× in `yield_in_hot_loop`)

Every `yield_now()` goes through the runtime mutex and run queue even
on a single-thread scheduler. Tokio's current_thread scheduler handles
yields with much lower overhead. smarm's naked context-switch is fast,
but the lock acquisition around it dominates for high-frequency yields.

**Recommendation**: minimise explicit `yield_now()` calls in hot paths.
In message-passing workloads this is natural — yield happens at
`recv()` and `send()`, which is appropriate. If you are using
`yield_now()` in a tight loop, consider whether the actor should
instead be blocking on a channel or sleeping.

---

## Preemption knob recommendations

The knobs are `Config::alloc_interval(n)` and `Config::timeslice_cycles(c)`.
Default: `alloc_interval = 128`, `timeslice_cycles = 300_000` (≈100 µs at 3 GHz).

### Findings from the sweep

The sweep varied `alloc_interval` over `{32, 64, 128, 256, 512}` and
`timeslice_cycles` (`tc` below) over `{150k, 300k, 600k, 1200k}`: 10
configurations in total, a subset of the full 5 × 4 grid.

On a single-CPU machine the knobs are almost inert: most benches move
less than 5% across the entire grid. The exceptions are meaningful:

**Longer timeslices hurt under contention.** At `tc=600k` and `tc=1200k`:

- `spawn_storm_busy` degrades +11–15%
- `catch_unwind_panics` degrades +10–12%

The cause: 8 background yielder actors hold the scheduler mutex longer
per timeslice, delaying the 10 000 actors waiting to be joined. A
longer timeslice amplifies the global-mutex bottleneck.

**Shorter timeslices marginally help timer-heavy work.** At `tc=150k`,
`many_timers` improves 3–4%. Actors that are sleeping get rescheduled
sooner because the runtime polls the timer heap more frequently.

**alloc_interval has no clear winner.** Moving from 32 to 512 causes
less than 3% variation on every bench. The check frequency is not the
bottleneck — the lock is.

### Recommended starting points

| Workload                          | alloc_interval | timeslice_cycles  |
|-----------------------------------|----------------|-------------------|
| Default (unknown)                 | 128 (default)  | 300 000 (default) |
| Many concurrent sleeping actors   | 128            | 150 000           |
| High-throughput channel pipeline  | 128            | 300 000           |
| Compute-heavy (few allocs)        | 32             | 300 000           |
| Strict fairness / many actors     | 64             | 150 000           |
| Long-running compute batches      | 256            | 600 000           |

**Note on `timeslice_cycles` calibration**: the default was tuned for
≈100 µs on a 3 GHz CPU. On a 2.8 GHz machine that's ≈107 µs. On a
4 GHz machine it's ≈75 µs. If you want a precise target timeslice,
measure your CPU's TSC frequency at startup and set the cycles value
accordingly:

```rust
use std::time::Duration;

// Approximate TSC frequency measurement (call once at startup).
// `thread::sleep` never undersleeps, so any oversleep inflates the
// estimate slightly; that is acceptable for a best-effort timeslice.
fn tsc_hz() -> u64 {
    let t0 = smarm::preempt::rdtsc();
    std::thread::sleep(Duration::from_millis(100));
    let t1 = smarm::preempt::rdtsc();
    (t1 - t0) * 10 // extrapolate 100 ms to 1 second
}

fn main() {
    let target_us = 100u64; // desired timeslice in microseconds
    let cycles = tsc_hz() / 1_000_000 * target_us;

    let rt = smarm::runtime::init(
        smarm::runtime::Config::default()
            .timeslice_cycles(cycles)
    );
    // ... then start actors with `rt.run(...)`
}
```

---

## Architecture recommendations

### Use actor pools, not per-request actors

```rust
// Avoid: spawning an actor per request (one 64 KiB stack mmap each)
for req in requests {
    spawn(move || handle(req));
}

// Prefer: a fixed pool with channel dispatch. smarm's channel is MPSC,
// so one receiver cannot be shared between workers; each worker gets
// its own channel and requests are dealt out round-robin.
let workers: Vec<_> = (0..num_workers)
    .map(|_| {
        let (tx, rx) = channel();
        spawn(move || { while let Ok(req) = rx.recv() { handle(req); } });
        tx
    })
    .collect();
for (i, req) in requests.into_iter().enumerate() {
    workers[i % workers.len()].send(req).unwrap();
}
```

The worker pool pattern amortises the 64 KiB mmap cost over the
lifetime of the pool. The `chained_spawn` bench shows this cost is
real: 8 625 µs for 1 000 sequential spawns vs tokio's 124 µs.

### Supervision for fault isolation

smarm delivers `Signal::Panic(pid, payload)` to the supervisor when an
actor panics. Use `spawn_under` to register a supervisor channel and
build restart logic:

```rust
use smarm::{channel, Signal};

let (sup_tx, sup_rx) = channel::<Signal>();
let _child = smarm::spawn_under(sup_tx.clone(), move || {
    // ... actor body ...
});

// Supervisor loop
loop {
    match sup_rx.recv() {
        Ok(Signal::Panic(_pid, _payload)) => {
            // restart, escalate, or record
        }
        Ok(Signal::Exit(_)) => break, // child finished normally
        Err(_) => break,              // channel closed
    }
}
```

This pattern has essentially zero overhead compared to unmonitored
spawning, and the `catch_unwind_panics` bench confirms it is 4.8×
faster than tokio's abort/recover cycle.

### Explicit preemption in no-alloc hot loops

The allocator-driven preemption mechanism fires every `alloc_interval`
allocations. Code that never allocates (tight numeric loops, parsing
fixed-size buffers) will never yield preemptively. Add `smarm::check!()`
at the natural loop boundary:

```rust
for chunk in data.chunks(4096) {
    process(chunk);  // no allocations
    smarm::check!(); // yield if timeslice expired
}
```

This is explicitly called out in `LOOM.md` as a known limitation.
The `yield_in_hot_loop` bench (1M iterations of `yield_now()`) shows
smarm is 1.25× slower than tokio even with explicit yields, which sets
the floor on how much `check!()` can help in truly tight loops.

### IO-bound work

smarm's IO path (`wait_readable`, `wait_writable`, `block_on_io`) parks
the actor without blocking the OS scheduler thread. This is correct and
works well. There is no specific bench for IO-bound workloads in the
current suite, but the architecture is sound for network servers and
file-IO pipelines.

---

## Known limitations and roadmap items

These are from `LOOM.md` plus observations from the bench suite.

| Limitation                    | Impact              | Roadmap status                |
|-------------------------------|---------------------|-------------------------------|
| No stack size caching / slab  | High spawn cost     | Deferred                      |
| Global single min-heap timers | Poor at many timers | Deferred (hierarchical wheel) |
| Global `Mutex<RunQueue>`      | Lock contention     | Deferred (per-thread queues)  |
| No `join!()` macro            | Ergonomics          | Deferred                      |
| x86-64 Linux only             | Portability         | ARM64 deferred                |
| No restart intensity caps     | Supervision safety  | Deferred                      |
| Yield overhead under lock     | Hot-loop fairness   | Structural / ongoing          |

The yield overhead and global mutex are the two issues most likely to
matter on a real multi-core workload. The sweep confirmed that
`timeslice_cycles` is a meaningful knob for controlling the mutex
hold time; the right long-term fix is per-thread run queues with
work stealing.

---

## Running the bench suite

```sh
# Run all benches once, print results
python3 benches/sweep.py run

# Save current results as regression baseline
python3 benches/sweep.py run --save-baseline

# Check for regressions (>10% slower than baseline → exit 1)
python3 benches/sweep.py regress

# Sweep preemption knobs across the grid defined in sweep.py
python3 benches/sweep.py sweep

# Sweep and save raw data as CSV
python3 benches/sweep.py sweep --save-csv results.csv

# Run a single knob configuration manually
SMARM_ALLOC_INTERVAL=64 SMARM_TIMESLICE_CYCLES=150000 \
  cargo bench --bench general
```

The regression threshold is 10% and is configurable in `sweep.py`
(`REGRESSION_THRESHOLD_PCT`). The sweep grid is `SWEEP_GRID` in the
same file.
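
The regression rule is simple enough to sketch: a bench fails if its median is more than `REGRESSION_THRESHOLD_PCT` percent slower than its baseline median. `sweep.py` itself is Python. This is only a Rust rendering of the same comparison for illustration, with a hypothetical function name (`find_regressions`), and it assumes medians are keyed by bench name.

```rust
use std::collections::BTreeMap;

/// Returns (bench, percent slower) for every bench whose current median
/// exceeds its baseline median by more than `threshold_pct` percent.
/// Benches missing from `current` are skipped. BTreeMap keeps output sorted.
pub fn find_regressions(
    baseline: &BTreeMap<String, f64>,
    current: &BTreeMap<String, f64>,
    threshold_pct: f64,
) -> Vec<(String, f64)> {
    let mut out = Vec::new();
    for (name, &base) in baseline {
        if let Some(&now) = current.get(name) {
            let pct_slower = (now - base) / base * 100.0;
            if pct_slower > threshold_pct {
                out.push((name.clone(), pct_slower));
            }
        }
    }
    out
}
```

A non-empty result corresponds to the script's "exit 1" case.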
12
Cargo.toml
@@ -1,14 +1,18 @@
[package]
name = "smarm"
version = "0.1.0"
version = "0.3.0"
edition = "2021"
rust-version = "1.95"

[features]
smarm-trace = []

[dependencies]
libc = "0.2"

[dev-dependencies]
tokio = { version = "1", features = ["rt", "macros", "sync"] }
libc = "0.2"
tokio = { version = "1", features = ["rt", "rt-multi-thread", "macros", "sync"] }

[profile.dev]
panic = "unwind"
@@ -21,3 +25,7 @@ codegen-units = 1
[[bench]]
name = "primes"
harness = false

[[bench]]
name = "multi_scheduler"
harness = false
210
LOOM.md
Normal file
@@ -0,0 +1,210 @@
# Loom

> Erlang-style actor concurrency for Rust, without the copies, the colors, or the GC pauses.

---

## Vision

Rust gives you the right ownership discipline for safe actor concurrency almost for free — `Send` already
draws the boundary, the borrow checker already enforces it. What it lacks is an execution model to match:
async/await is IO-centric, colors your functions, and trades stack simplicity for state-machine complexity;
OS threads are too heavy to spawn per actor.

Loom adds a third option: **green-thread actors on a shared heap**, scheduled cooperatively, with
message-passing as the only cross-actor communication primitive. You get Erlang's isolation model without
Erlang's copying GC, and you get Rust's zero-copy ownership transfers without async's cognitive overhead.
No function coloring. No `Box<dyn Future>`. Just actors, messages, and the borrow checker doing what it
already does.

---

## Do: Core Runtime

### Actors and scheduling

Each actor is a lightweight green thread with its own heap-allocated, growable stack. Stacks are
allocated via `mmap` with a guard page below the region; overflow is detected by the OS without Loom
polling for it. Initial stacks are small and grow by remapping on demand.

The scheduler runs one OS thread per CPU. Each scheduler thread pulls work from a single global
`Mutex<HashMap>` queue shared across all schedulers. If queue contention becomes a measured bottleneck,
this can be revisited; the interface will not change.

Loom requires `panic = unwind`. Users who set `panic = abort` accept that supervision and actor
isolation are silently degraded to process death.

### Process descriptor

Each actor has a descriptor that is hot while the actor runs and will typically live in L1 cache.
It holds:

- `stack_base: *mut u8` — bottom of the allocated stack region
- `stack_cap: usize` — total allocated size
- `stack_ptr: *mut u8` — current stack pointer (`rsp`), saved on yield
- `pid: (u32, u32)` — index and generation counter (see PIDs below)
- `alloc_count: u32` — countdown for preemption sampling
- `timeslice_start: u64` — `RDTSC` value written on every resume
- `resize_count: u16` — diagnostic counter for stack growth events
- `context: *mut ContextSaveArea` — pointer to the register save area (cold, touched only on switch)

### Context switching

Context switching is implemented in a `#[naked]` assembly shim, one per supported architecture.
The compiler cannot be asked to switch stacks.

**Suspend** (yield, preemption, or blocking):
1. Save callee-saved integer registers and SIMD registers into `ContextSaveArea`.
2. Save `rsp`/`sp` into the process descriptor.
3. Load the scheduler's stack pointer from a thread-local and jump back into the scheduler loop.

**Resume**:
1. Load `rsp`/`sp` from the process descriptor.
2. Restore registers from `ContextSaveArea`.
3. `ret` — the return address is already on the restored stack, execution resumes exactly where the
   actor yielded.

**x86-64**: saves `rbx`, `rbp`, `r12`–`r15` (6 × 8 = 48 bytes) and `xmm0`–`xmm15` (16 × 16 = 256
bytes) = 304 bytes total. Full SSE baseline is required; the compiler may autovectorise freely.
AVX-512 is deferred.

**ARM64**: saves `x19`–`x30` (12 × 8 = 96 bytes, including the link register `x30` which must be
saved explicitly — it holds the return address, unlike x86 where `call` pushes it to the stack) and
`d8`–`d15` (8 × 8 = 64 bytes) = 160 bytes total.

`ContextSaveArea` is a `Box<ContextSaveArea>` per actor. Lifetime equals the actor's lifetime;
no churn, no bulk deallocation, `Box` is correct.

Initial platform target is x86-64 Linux. ARM64 and macOS are natural follow-ons.

### Allocator-driven preemption

Every Nth allocation, the allocator reads `RDTSC` and compares it against `timeslice_start`. If the
threshold is exceeded the actor yields. The workloads that starve a scheduler — sustained compute,
data transformation — are precisely the ones doing frequent allocations, so this approximation is
correct by construction.

`RDTSC` is not monotonic across core migration; a slightly wrong timeslice is acceptable. Loom is
not a real-time scheduler.

Known failure mode: tight no-alloc loops are invisible to this mechanism. Actors doing sustained
allocation-free compute must call `loom::yield_now()` explicitly, or offload to a thread pool
outside the actor scheduler (e.g. rayon). This is documented and acceptable — such loops are rare
in message-passing workloads.
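
The sampling logic can be sketched separately from the allocator hook. This is a minimal illustration under stated assumptions: `PreemptState`, `on_resume`, and `on_alloc` are hypothetical names, and the clock is passed in as a closure returning a `u64` tick count, so the logic can be exercised without `RDTSC`. In the design above, the closure would read `RDTSC`, `on_alloc` would be called from the global allocator, and a `true` result would trigger a yield.

```rust
/// Hypothetical per-actor preemption state, mirroring the descriptor's
/// `alloc_count` and `timeslice_start` fields.
pub struct PreemptState {
    alloc_interval: u32,  // sample the clock every N allocations
    timeslice_ticks: u64, // budget, e.g. 300_000 TSC cycles
    alloc_count: u32,     // countdown to the next sample
    timeslice_start: u64, // clock value recorded on resume
}

impl PreemptState {
    pub fn new(alloc_interval: u32, timeslice_ticks: u64) -> Self {
        assert!(alloc_interval > 0, "alloc_interval must be at least 1");
        PreemptState {
            alloc_interval,
            timeslice_ticks,
            alloc_count: alloc_interval,
            timeslice_start: 0,
        }
    }

    /// Called by the scheduler each time it resumes the actor.
    pub fn on_resume(&mut self, now: u64) {
        self.timeslice_start = now;
        self.alloc_count = self.alloc_interval;
    }

    /// Called on every allocation. Only every `alloc_interval`-th call
    /// reads the clock; returns true when the actor should yield.
    pub fn on_alloc(&mut self, read_clock: impl FnOnce() -> u64) -> bool {
        self.alloc_count -= 1;
        if self.alloc_count > 0 {
            return false; // fast path: no clock read
        }
        self.alloc_count = self.alloc_interval;
        // saturating_sub: if the counter appears to go backwards (e.g. after
        // a core migration), treat it as "no time elapsed" and keep running.
        read_clock().saturating_sub(self.timeslice_start) > self.timeslice_ticks
    }
}
```

Taking the clock as a closure keeps the common path to a decrement and a branch. The counter is read only on the sampled allocation.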

### Yield points

An actor yields at:

- **Channel send/recv** — the primary communication primitive
- **Mutex contention** — attempting to lock a held `Arc<Mutex<>>` parks the actor
- **IO** — blocking on a socket or file descriptor parks the actor until the IO thread signals readiness
- **`loom::sleep(duration)`** — parks the actor; the timer wheel re-queues it on expiry
- **`loom::yield_now()`** — explicit cooperative yield
- **Allocator preemption** — as above
- **Spawn** — does not yield by default; the new actor is queued and the spawner continues

`std::thread::sleep` inside an actor blocks the entire OS thread and should never be used. Loom
may emit a warning if it can detect this.

### IO thread

A single dedicated IO thread runs an `epoll`/`kqueue` loop. Actors blocking on IO register their
file descriptor and PID; the IO thread moves them back into the global queue when the fd is ready.
A `HashMap<RawFd, Pid>` maps fds to parked actors. Cancellation (actor dies while waiting on IO)
deregisters the fd. This is intentionally simple and not pluggable; Loom is not a general async
executor.
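
The bookkeeping side of the IO thread can be sketched without `epoll` itself. `IoParkTable`, `park`, `on_ready`, and `cancel` are hypothetical names, `Pid` is simplified to a `u64`, and the sketch allows one waiter per fd, which is an assumption. `RawFd` comes from `std::os::unix::io`, so this compiles on Unix only. The epoll loop would call `on_ready` for each ready fd and push the returned PID onto the run queue.

```rust
use std::collections::HashMap;
use std::os::unix::io::RawFd;

type Pid = u64; // stand-in for the (index, generation) PID

/// Maps file descriptors to the actor parked on them.
#[derive(Default)]
pub struct IoParkTable {
    waiting: HashMap<RawFd, Pid>,
}

impl IoParkTable {
    /// Actor `pid` parks until `fd` is ready. Returns false if another
    /// actor already waits on this fd.
    pub fn park(&mut self, fd: RawFd, pid: Pid) -> bool {
        if self.waiting.contains_key(&fd) {
            return false;
        }
        self.waiting.insert(fd, pid);
        true
    }

    /// Called by the IO thread when `fd` becomes ready; the returned PID
    /// goes back onto the global run queue.
    pub fn on_ready(&mut self, fd: RawFd) -> Option<Pid> {
        self.waiting.remove(&fd)
    }

    /// Called when a parked actor dies: deregister every fd it waited on.
    pub fn cancel(&mut self, pid: Pid) {
        self.waiting.retain(|_, waiter| *waiter != pid);
    }
}
```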

### Communication

Messages must be `Send`; being `Copy` is not enough on its own (raw pointers are `Copy` but not
`Send`). Non-`Send` types cannot cross an actor boundary; this is enforced by the type system with
no runtime overhead.

Two primitives only:

- **Move** — transfer owned data across a channel. Zero copy. The sender relinquishes ownership
  at the type level. This is the default.
- **`Arc<Mutex<T>>`** — for genuinely shared long-lived state. Explicit and visible.

Cross-actor `Rc` or bare pointers are banned. There is no cycle detector. Cross-actor cycles are
banned by construction: either transfer ownership or use `Arc`.
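
The "Move" primitive can be checked directly: sending a `Vec` transfers ownership of its heap buffer, so the receiver sees the same allocation. The sketch below uses `std::sync::mpsc` and an OS thread as stand-ins for a Loom channel and actor. That substitution is an assumption; Loom's own channel is not used here. The sketch compares buffer addresses before and after the send.

```rust
use std::sync::mpsc;
use std::thread;

/// Sends `data` to another thread and reports whether the receiver got
/// the same heap allocation, i.e. no copy of the bytes was made.
/// Addresses travel as `usize` because raw pointers are not `Send`.
pub fn moved_without_copy(data: Vec<u8>) -> bool {
    let before = data.as_ptr() as usize;
    let (tx, rx) = mpsc::channel::<Vec<u8>>();
    let receiver = thread::spawn(move || {
        let got = rx.recv().unwrap();
        got.as_ptr() as usize
    });
    tx.send(data).unwrap(); // `data` is moved; using it after this line would not compile
    let after = receiver.join().unwrap();
    before == after
}
```

Changing the message type to `Rc<Vec<u8>>` makes this fail to compile, because `Rc` is not `Send`. That is the type-level boundary described above.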

### PIDs

A PID is a `(index, generation)` pair. The index may be reused after an actor dies; the generation
counter increments on every death. A stale handle holding the wrong generation is a detectable
error, not a silent misdirection. This avoids the ABA problem without reserving PID space forever.
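
A minimal sketch of the `(index, generation)` scheme follows. It assumes a slot table that owns per-actor data. `SlotTable`, its free list, and the `Pid` field names are illustrative, not smarm's `pid` module.

```rust
/// A PID is (index, generation). Indices are reused; generations move on.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Pid {
    pub index: u32,
    pub generation: u32,
}

struct Slot<T> {
    generation: u32,
    value: Option<T>, // None while the slot is free
}

/// Hypothetical slot table: lookups with a stale PID return None
/// instead of reaching whichever actor now owns the index.
pub struct SlotTable<T> {
    slots: Vec<Slot<T>>,
    free: Vec<u32>, // indices available for reuse
}

impl<T> SlotTable<T> {
    pub fn new() -> Self {
        SlotTable { slots: Vec::new(), free: Vec::new() }
    }

    pub fn insert(&mut self, value: T) -> Pid {
        if let Some(index) = self.free.pop() {
            let slot = &mut self.slots[index as usize];
            slot.value = Some(value);
            Pid { index, generation: slot.generation }
        } else {
            self.slots.push(Slot { generation: 0, value: Some(value) });
            Pid { index: (self.slots.len() - 1) as u32, generation: 0 }
        }
    }

    /// On actor death: bump the generation so old handles become stale.
    /// (Wrapping after 2^32 deaths of one index is accepted here.)
    pub fn remove(&mut self, pid: Pid) -> Option<T> {
        let slot = self.slots.get_mut(pid.index as usize)?;
        if slot.generation != pid.generation {
            return None; // stale handle
        }
        let value = slot.value.take()?;
        slot.generation = slot.generation.wrapping_add(1);
        self.free.push(pid.index);
        Some(value)
    }

    pub fn get(&self, pid: Pid) -> Option<&T> {
        let slot = self.slots.get(pid.index as usize)?;
        if slot.generation == pid.generation { slot.value.as_ref() } else { None }
    }
}
```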

### Supervision

Every actor has a supervisor, assigned at spawn. This is not optional. The root supervisor is
provided by the runtime; its death is a process exit.

A supervisor receives one of three signals when a child actor terminates:

- `Signal::Exit(pid)` — normal completion
- `Signal::Panic(pid, payload)` — caught via `catch_unwind` at the actor entry point boundary,
  before unwinding can reach the assembly shim
- `Signal::Timeout(pid)` — actor exceeded a budget (see below)

The supervisor decides: restart the actor, escalate to its own supervisor, or ignore. Restart
intensity is capped: if an actor panics more than N times within a time window, the supervisor
stops restarting and escalates. This prevents a bad prelude or corrupted input from spinning the
supervisor in a restart loop indefinitely. N and the window are configurable per supervisor with a
sensible global default.
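
Restart intensity can be tracked with a sliding window of recent panic times. The smarm README lists restart-intensity caps as not yet implemented, so this is purely a hypothetical sketch. `RestartIntensity`, `Decision`, and `on_panic` are invented names, and time is passed in as an `Instant` so the policy is testable. The sketch answers `Restart` until more than `max_restarts` panics fall inside `window`, then `Escalate`.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    Restart,
    Escalate,
}

/// Decides between restarting a crashed child and escalating.
pub struct RestartIntensity {
    max_restarts: usize,
    window: Duration,
    recent: VecDeque<Instant>, // panic times still inside the window
}

impl RestartIntensity {
    pub fn new(max_restarts: usize, window: Duration) -> Self {
        RestartIntensity { max_restarts, window, recent: VecDeque::new() }
    }

    /// Record a panic at `now` and decide what to do about it.
    pub fn on_panic(&mut self, now: Instant) -> Decision {
        // Drop panics that have aged out of the window.
        while let Some(&t) = self.recent.front() {
            if now.duration_since(t) > self.window {
                self.recent.pop_front();
            } else {
                break;
            }
        }
        self.recent.push_back(now);
        if self.recent.len() > self.max_restarts {
            Decision::Escalate
        } else {
            Decision::Restart
        }
    }
}
```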

### Mutex timeout

Every `loom::mutex` lock attempt is mediated by the scheduler. If the lock is not acquired within
a configurable timeout, the actor receives a `LockTimeout` error rather than parking forever. This
is a hard runtime guarantee, not a convention. Default timeout is global and configurable;
individual locks and individual call sites can override it.
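
The "time out instead of waiting forever" contract can be sketched with OS threads, using std's `Mutex` and `Condvar::wait_timeout_while`. This is an analogue, not Loom's mechanism: Loom parks green threads through the scheduler, and the README's `mutex` also guarantees FIFO waiters, which this sketch does not. `TimedLock`, `Guard`, and `lock_timeout` are hypothetical names.

```rust
use std::ops::{Deref, DerefMut};
use std::sync::{Condvar, Mutex};
use std::time::Duration;

#[derive(Debug, PartialEq, Eq)]
pub struct LockTimeout;

/// The value sits in `slot` while unlocked and inside the guard while
/// locked, so "locked" simply means "slot is None".
pub struct TimedLock<T> {
    slot: Mutex<Option<T>>,
    released: Condvar,
}

pub struct Guard<'a, T> {
    lock: &'a TimedLock<T>,
    value: Option<T>,
}

impl<T> TimedLock<T> {
    pub fn new(value: T) -> Self {
        TimedLock { slot: Mutex::new(Some(value)), released: Condvar::new() }
    }

    /// Wait at most `timeout` for the lock; never blocks indefinitely.
    pub fn lock_timeout(&self, timeout: Duration) -> Result<Guard<'_, T>, LockTimeout> {
        let slot = self.slot.lock().unwrap();
        let (mut slot, _) = self
            .released
            .wait_timeout_while(slot, timeout, |s| s.is_none())
            .unwrap();
        match slot.take() {
            Some(value) => Ok(Guard { lock: self, value: Some(value) }),
            None => Err(LockTimeout), // still held when the timeout expired
        }
    }
}

impl<T> Drop for Guard<'_, T> {
    fn drop(&mut self) {
        // Put the value back and wake one waiter.
        *self.lock.slot.lock().unwrap() = self.value.take();
        self.lock.released.notify_one();
    }
}

impl<T> Deref for Guard<'_, T> {
    type Target = T;
    fn deref(&self) -> &T {
        self.value.as_ref().unwrap()
    }
}

impl<T> DerefMut for Guard<'_, T> {
    fn deref_mut(&mut self) -> &mut T {
        self.value.as_mut().unwrap()
    }
}
```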

### Task joining

Actors can spawn children and wait on a group of handles:

```rust
let h1 = loom::spawn(|| compute_a());
let h2 = loom::spawn(|| compute_b());
let (a, b) = loom::join!(h1, h2);
```

`join!` parks the calling actor until all handles complete. The last child to finish re-queues the
parent. This is a countdown in the parent's descriptor; no polling, no waker registration. A
`join_timeout!` variant is a natural extension.
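
The countdown idea can be sketched with OS threads. The smarm README lists `join!` as not yet present, so the names here (`JoinCountdown`, `child_finished`, `wait`) are hypothetical. In this sketch, a `Condvar` stands in for re-queueing the parent's PID. The key property is that exactly one child, the one that takes the count to zero, performs the wake-up.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Condvar, Mutex};

/// Hypothetical join countdown. Each child calls `child_finished` once;
/// the child that takes the count to zero wakes the parent.
pub struct JoinCountdown {
    remaining: AtomicUsize,
    done: Mutex<bool>,
    wake: Condvar,
}

impl JoinCountdown {
    pub fn new(children: usize) -> Arc<Self> {
        Arc::new(JoinCountdown {
            remaining: AtomicUsize::new(children),
            done: Mutex::new(children == 0),
            wake: Condvar::new(),
        })
    }

    /// Called exactly once by each child as it finishes.
    pub fn child_finished(&self) {
        // fetch_sub returns the previous value: 1 means this was the last child.
        if self.remaining.fetch_sub(1, Ordering::AcqRel) == 1 {
            *self.done.lock().unwrap() = true;
            self.wake.notify_all();
        }
    }

    /// The parent blocks here until every child has finished.
    pub fn wait(&self) {
        let done = self.done.lock().unwrap();
        let _done = self.wake.wait_while(done, |d| !*d).unwrap();
    }
}
```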

### Timer wheel

`loom::sleep` and supervision timeouts are driven by a timer wheel in the scheduler. Sleeping
actors are parked and re-queued by the timer thread on expiry. The timer wheel is internal
infrastructure; its design is an implementation detail.
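
Since the wheel's design is left open, here is a rough illustration of a single wheel level. Insertion is O(1), and each tick touches only one slot, unlike the O(log N) heap operations smarm currently performs under its mutex. `TimerWheel`, one-tick slots, and the `u64` PID are assumptions; a hierarchical wheel would stack several such levels with coarser ticks.

```rust
/// Hypothetical single-level hashed timer wheel. Each slot is one tick;
/// timers more than `slots.len()` ticks away carry a `rounds` counter.
pub struct TimerWheel {
    slots: Vec<Vec<(u64, u64)>>, // per slot: (remaining rounds, pid)
    current: usize,              // slot index of the current tick
}

impl TimerWheel {
    pub fn new(num_slots: usize) -> Self {
        assert!(num_slots > 0);
        TimerWheel { slots: vec![Vec::new(); num_slots], current: 0 }
    }

    /// Schedule `pid` to fire `ticks` ticks from now (minimum 1). O(1).
    pub fn schedule(&mut self, ticks: u64, pid: u64) {
        let n = self.slots.len() as u64;
        let ticks = ticks.max(1);
        let slot = ((self.current as u64 + ticks) % n) as usize;
        let rounds = (ticks - 1) / n; // full laps before it is due
        self.slots[slot].push((rounds, pid));
    }

    /// Advance one tick; return the PIDs whose timers fired.
    pub fn tick(&mut self) -> Vec<u64> {
        self.current = (self.current + 1) % self.slots.len();
        let mut fired = Vec::new();
        self.slots[self.current].retain_mut(|(rounds, pid)| {
            if *rounds == 0 {
                fired.push(*pid);
                false // remove: timer fired
            } else {
                *rounds -= 1;
                true // keep: due on a later lap
            }
        });
        fired
    }
}
```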

---

## Defer: Later Work

- **Stack sizing policy** — initial size, growth factor, and whether stacks ever shrink are
  implementation decisions to be made with profiling data, not up front.
- **Queue contention** — if `Mutex<HashMap>` proves to be a bottleneck under profiling, evaluate
  `DashMap` or a lock-free work-stealing deque (e.g. `crossbeam-deque`). Not before.
- **AVX-512 context save** — extend `ContextSaveArea` when there is a concrete use case.
- **`loom::sleep` vs raw sleep semantics** — further control knobs deferred until the basic sleep
  is working and real use cases are understood.
- **Supervision tree API** — the contract is defined; the recursive hierarchy, restart strategies,
  and introspection API are implementation work.
- **no_std support** — the assembly shim is no_std friendly but the IO thread and allocator require
  OS primitives. Target is no_std + `alloc` on hosted platforms; bare metal is out of scope.
- **Distribution** — Loom is a single-process runtime. No distribution protocol, no BEAM-style
  clustering.

---

## What Loom is Not

- Not a drop-in replacement for Tokio. Loom does not implement `Future` or the async executor interface.
- Not a general allocator. Loom manages actor stacks; heap allocation for actor data goes through
  the system allocator.
- Not Erlang. No hot code reloading, no distribution protocol, no BEAM bytecode. Loom is a
  concurrency runtime, not a platform.
- Not a real-time scheduler. Timeslice accuracy is best-effort.
82
README.md
Normal file
@@ -0,0 +1,82 @@
# smarm

> Silly Marks Abstract Rust Machine. A prototype green-thread actor runtime for Rust.

Implements the core ideas in [`LOOM.md`](./LOOM.md): green-thread actors on a
shared heap, scheduled cooperatively, communicating only by `Send` messages.
Erlang's isolation model without Erlang's copying GC, Rust's zero-copy
ownership transfers without async's function colouring.

The scheduler is multi-threaded — one OS thread per available CPU, all drawing
from a shared run queue. The single-threaded `run()` entry point is kept as a
convenience wrapper around `runtime::init(Config::exact(1)).run(f)`.

## What's here

| Module       | What it does                                                            |
|--------------|-------------------------------------------------------------------------|
| `stack`      | `mmap`'d growable stack with guard page; SIGSEGV on overflow            |
| `context`    | `#[naked]` x86-64 context-switch shims, callee-saved regs only          |
| `preempt`    | Allocator-driven preemption; `check!()` macro for no-alloc loops        |
| `pid`        | `(index, generation)` PIDs; stale handles are detectable, not silent    |
| `actor`      | Trampoline + `catch_unwind` boundary at the actor entry point           |
| `scheduler`  | Run queue, slot table, spawn/join, parking, idle path                   |
| `channel`    | Unbounded MPSC channel; `recv` parks the actor                          |
| `mutex`      | `Mutex<T>` with mandatory timeout; FIFO waiters; parks the green thread |
| `timer`      | Min-heap of `(deadline, reason)`; `Sleep` and `WaitTimeout` reasons     |
| `io`         | `block_on_io` for blocking work; `wait_readable`/`wait_writable` + `read`/`write` via epoll |
| `supervisor` | `Signal::Exit` / `Signal::Panic` delivered to a parent actor's mailbox  |

## Quick taste

```rust
use smarm::{run, spawn, channel};

run(|| {
    let (tx, rx) = channel::<i64>();
    let h = spawn(move || {
        for _ in 0..3 {
            let v = rx.recv().unwrap();
            println!("got {v}");
        }
    });
    for v in 1..=3i64 {
        tx.send(v).unwrap();
    }
    h.join().unwrap();
});
```

## Layout

```
src/
  stack.rs  context.rs  preempt.rs  pid.rs  actor.rs
  scheduler.rs  channel.rs  mutex.rs  timer.rs  io.rs  supervisor.rs
  lib.rs
tests/
  per-module integration tests
benches/
  primes.rs     fan-out/fan-in compute, vs tokio current_thread
LOOM.md         design intent
```

## Building and running

Standard Cargo. Requires Rust 1.95 or newer (the `#[naked]` attribute went stable
in 1.88; we use a few unrelated post-1.88 features). x86-64 Linux only —
ARM64 and macOS are on the deferred list because of the assembly shim and the
epoll dependency.

```sh
cargo test                  # all tests
cargo test --test mutex     # one module
cargo bench                 # all bench targets, each compared against tokio
```

## What's not here

See the **Defer** section of `LOOM.md`. Notable absences: supervisor
restart-intensity caps, `join!` for handle groups, stack growth via remap,
hierarchical timer wheel, fd-wait timeouts, `Signal::Timeout`. Each is a
mechanism we know how to add; none belongs in this iteration.
44
benches/baseline-output/general.txt
Normal file
@@ -0,0 +1,44 @@
smarm general benchmarks
available parallelism: 1 threads
ITERS=15 (+1 warmup, discarded)
CHAIN_DEPTH=1000, YIELD_TASKS=200×1000, PRIME_N=400000/64 workers, PP_ROUNDS=1000

================================================================================
chained_spawn: depth 1000
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 1000 | 7136 | 6929 | 8347
smarm 1-thread | 1000 | 6979 | 6790 | 7364
tokio current_thread | 1000 | 113 | 112 | 322
tokio multi-thread | 1000 | 176 | 170 | 355

================================================================================
yield_many: 200 tasks × 1000 yields
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 200000 | 40079 | 39606 | 41913
smarm 1-thread | 200000 | 40073 | 39298 | 43173
tokio current_thread | 200000 | 14571 | 14430 | 14670
tokio multi-thread | 200000 | 14044 | 13306 | 14432

================================================================================
fan_out_compute: primes in [2, 400000) across 64
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 33860 | 19347 | 19185 | 19703
smarm 1-thread | 33860 | 19461 | 19202 | 21172
tokio current_thread | 33860 | 18616 | 18553 | 18987
tokio multi-thread | 33860 | 18905 | 18755 | 19035

================================================================================
ping_pong_oneshot: 1000 rounds
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 1000 | 13731 | 13555 | 15545
smarm 1-thread | 1000 | 14176 | 13870 | 14892
tokio current_thread | 1000 | 828 | 788 | 939
tokio multi-thread | 1000 | 3342 | 3233 | 3624
|
||||
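The `ITERS=15 (+1 warmup, discarded)` header and the `median µs | min µs | max µs` columns suggest a timing loop of roughly this shape. This is a minimal sketch, assuming each bench body is a closure timed with `std::time::Instant`; `measure` and `Summary` are hypothetical names, not taken from the smarm bench sources.

```rust
use std::time::{Duration, Instant};

/// Hypothetical summary matching the "median µs | min µs | max µs" columns.
#[derive(Debug, Clone, Copy)]
pub struct Summary {
    pub median_us: u128,
    pub min_us: u128,
    pub max_us: u128,
}

/// Hypothetical harness: runs `f` once as a discarded warmup, then `iters`
/// timed times, and reports wall-clock median/min/max in microseconds.
pub fn measure<F: FnMut()>(iters: usize, mut f: F) -> Summary {
    assert!(iters > 0, "need at least one timed iteration");
    f(); // warmup run; its timing is never recorded
    let mut samples: Vec<Duration> = (0..iters)
        .map(|_| {
            let start = Instant::now();
            f();
            start.elapsed()
        })
        .collect();
    samples.sort();
    Summary {
        // For an odd count such as ITERS=15 this is the exact middle sample.
        median_us: samples[iters / 2].as_micros(),
        min_us: samples[0].as_micros(),
        max_us: samples[iters - 1].as_micros(),
    }
}
```

Reporting the median rather than the mean keeps a single slow outlier (for example the 682 µs max in `deep_recursion`) from skewing the headline figure.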
34
benches/baseline-output/multi_scheduler.txt
Normal file
@@ -0,0 +1,34 @@
smarm multi-scheduler benchmarks
available parallelism: 1 threads
PRIME_N=400000, WORKERS=64, PING_ROUNDS=10000, SPAWN_COUNT=1000

================================================================================
Fan-out/fan-in: count primes in [2, 400000) across 64 workers
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
baseline (serial) | 33860 | 18581 | 18519 | 18905
smarm single-thread | 33860 | 19467 | 19354 | 22082
smarm 1-thread | 33860 | 19345 | 19287 | 19653
tokio current_thread | 33860 | 18681 | 18591 | 18982
tokio multi-thread | 33860 | 18948 | 18726 | 19212

================================================================================
Ping-pong: 10000 round-trips between two actors
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm single-thread | 10000 | 2547 | 2473 | 2841
smarm 1-thread | 10000 | 2546 | 2518 | 2702
tokio current_thread | 10000 | 1221 | 1168 | 1366
tokio multi-thread | 10000 | 1487 | 1316 | 2331

================================================================================
Spawn throughput: 1000 actors spawned and joined
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm single-thread | 1000 | 8934 | 8066 | 12204
smarm 1-thread | 1000 | 8102 | 8041 | 10849
tokio current_thread | 1000 | 212 | 210 | 331
tokio multi-thread | 1000 | 330 | 301 | 604
7
benches/baseline-output/primes.txt
Normal file
@@ -0,0 +1,7 @@
Counting primes in [2, 200000) across 16 workers, 5 iterations each

runtime | primes found | median | min | max
--------------------------------------------------------------------------------
baseline | primes: 17984 | median: 7244 µs | min: 7231 µs | max: 7509 µs
smarm | primes: 17984 | median: 7592 µs | min: 7505 µs | max: 8130 µs
tokio | primes: 17984 | median: 7263 µs | min: 7225 µs | max: 9067 µs
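The prime-counting benches split [2, N) across W workers and sum the partial counts; the `result` column (33860 for N=400000, 17984 for N=200000) is that sum. Below is a minimal sketch of the same fan-out/fan-in shape. It assumes plain `std::thread::scope` threads stand in for smarm actors or tokio tasks and that trial division stands in for whatever primality test the benches actually use; `count_primes_fanout` and `is_prime` are hypothetical names.

```rust
use std::thread;

/// Trial-division primality test; adequate for n below a few million.
fn is_prime(n: u64) -> bool {
    if n < 2 {
        return false;
    }
    if n % 2 == 0 {
        return n == 2;
    }
    let mut d = 3;
    while d * d <= n {
        if n % d == 0 {
            return false;
        }
        d += 2;
    }
    true
}

/// Hypothetical fan-out/fan-in: split [2, n) into `workers` contiguous,
/// non-overlapping chunks, count each chunk on its own scoped OS thread,
/// then sum the partial counts.
pub fn count_primes_fanout(n: u64, workers: u64) -> u64 {
    assert!(workers > 0, "need at least one worker");
    let lo = 2;
    let span = n.saturating_sub(lo);
    let chunk = (span + workers - 1) / workers; // ceiling division
    thread::scope(|s| {
        let handles: Vec<_> = (0..workers)
            .map(|w| {
                let start = (lo + w * chunk).min(n);
                let end = (start + chunk).min(n);
                s.spawn(move || (start..end).filter(|&k| is_prime(k)).count() as u64)
            })
            .collect();
        // Fan-in: join every worker and add up its partial count.
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    })
}
```

Because the chunks are disjoint and cover the whole range, the total does not depend on W, which is why every runtime row reports the same result and only the timing columns differ.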
40
benches/baseline-output/smarm_favored.txt
Normal file
@@ -0,0 +1,40 @@
smarm smarm-favored benchmarks
available parallelism: 1 threads
ITERS=15 (+1 warmup, discarded)
RECURSE_DEPTH=500, HOT_YIELDS=500000×2, UNCONT_MSGS=1000000, PANIC_TASKS=10000

================================================================================
deep_recursion: depth 500
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 1 | 62 | 59 | 682
smarm 1-thread | 1 | 71 | 61 | 210
tokio current_thread | 1 | 22 | 22 | 23
tokio multi-thread | 1 | 44 | 38 | 79

================================================================================
yield_in_hot_loop: 2 actors × 500000 yields (single thread)
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 1000000 | 182177 | 180380 | 184410
tokio current_thread | 1000000 | 138335 | 136097 | 141196

================================================================================
uncontended_channel: 1→1, 1000000 msgs (single thread)
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 1000000 | 31473 | 28719 | 33113
tokio current_thread | 1000000 | 51925 | 51205 | 53043

================================================================================
catch_unwind_panics: 10000 tasks, 50% panic
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 10000 | 112306 | 109702 | 119859
smarm 1-thread | 10000 | 114305 | 112030 | 121326
tokio current_thread | 10000 | 151443 | 150949 | 153800
tokio multi-thread | 10000 | 161344 | 160385 | 167573
126
benches/baseline-output/sweep/ai128_tc1200k.txt
Normal file
@@ -0,0 +1,126 @@
smarm general benchmarks
available parallelism: 1 threads
ITERS=15 (+1 warmup, discarded)
CHAIN_DEPTH=1000, YIELD_TASKS=200×1000, PRIME_N=400000/64 workers, PP_ROUNDS=1000

================================================================================
chained_spawn: depth 1000
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 1000 | 8720 | 8526 | 9319
smarm 1-thread | 1000 | 8662 | 8571 | 8991
tokio current_thread | 1000 | 123 | 123 | 152
tokio multi-thread | 1000 | 188 | 184 | 230

================================================================================
yield_many: 200 tasks × 1000 yields
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 200000 | 41530 | 41242 | 43501
smarm 1-thread | 200000 | 41575 | 41187 | 43323
tokio current_thread | 200000 | 15098 | 15020 | 15348
tokio multi-thread | 200000 | 15900 | 15827 | 16012

================================================================================
fan_out_compute: primes in [2, 400000) across 64
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 33860 | 29573 | 29435 | 31647
smarm 1-thread | 33860 | 29521 | 29453 | 29847
tokio current_thread | 33860 | 28495 | 28441 | 30150
tokio multi-thread | 33860 | 34384 | 34297 | 34745

================================================================================
ping_pong_oneshot: 1000 rounds
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 1000 | 17190 | 16994 | 17541
smarm 1-thread | 1000 | 17078 | 16916 | 19139
tokio current_thread | 1000 | 899 | 896 | 1000
tokio multi-thread | 1000 | 4198 | 4116 | 4573
smarm tokio-favored benchmarks
available parallelism: 1 threads
ITERS=15 (+1 warmup, discarded)
STORM_BACKGROUND=8, STORM_SPAWN=10000, MPSC=32×10000, TIMER_ACTORS=10000 (1–10 ms), SCALING_N=400000/64

================================================================================
spawn_storm_busy: 8 bg yielders + 10000 zero-work spawns
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 10000 | 138556 | 136165 | 140947
smarm 1-thread | 10000 | 140223 | 136325 | 146781
tokio current_thread | 10000 | 2671 | 2622 | 2913
tokio multi-thread | 10000 | 6004 | 4360 | 12576

================================================================================
mpsc_contention: 32 producers × 10000 msgs → 1 consumer
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 320000 | 9051 | 8967 | 11152
smarm 1-thread | 320000 | 9058 | 9008 | 9998
tokio current_thread | 320000 | 17375 | 17131 | 18514
tokio multi-thread | 320000 | 17955 | 17452 | 18508

================================================================================
many_timers: 10000 actors sleeping 1–10 ms
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 10000 | 156969 | 153124 | 167711
smarm 1-thread | 10000 | 150638 | 146070 | 168286
tokio current_thread | 10000 | 13823 | 13482 | 14796
tokio multi-thread | 10000 | 15034 | 14425 | 15320

================================================================================
multi_thread_scaling: primes in [2, 400000) across 64 workers
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 33860 | 30075 | 29707 | 30720
tokio multi 1-thread | 33860 | 29060 | 28835 | 44378
smarm smarm-favored benchmarks
available parallelism: 1 threads
ITERS=15 (+1 warmup, discarded)
RECURSE_DEPTH=500, HOT_YIELDS=500000×2, UNCONT_MSGS=1000000, PANIC_TASKS=10000

================================================================================
deep_recursion: depth 500
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 1 | 86 | 79 | 130
smarm 1-thread | 1 | 83 | 78 | 146
tokio current_thread | 1 | 25 | 25 | 31
tokio multi-thread | 1 | 49 | 46 | 85

================================================================================
yield_in_hot_loop: 2 actors × 500000 yields (single thread)
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 1000000 | 190902 | 187600 | 194333
tokio current_thread | 1000000 | 150279 | 148175 | 188184

================================================================================
uncontended_channel: 1→1, 1000000 msgs (single thread)
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 1000000 | 27687 | 27198 | 29555
tokio current_thread | 1000000 | 54465 | 54048 | 55954

================================================================================
catch_unwind_panics: 10000 tasks, 50% panic
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 10000 | 160308 | 154365 | 167009
smarm 1-thread | 10000 | 158662 | 155458 | 168896
tokio current_thread | 10000 | 267762 | 260876 | 294092
tokio multi-thread | 10000 | 275097 | 269344 | 287681
126
benches/baseline-output/sweep/ai128_tc150k.txt
Normal file
@@ -0,0 +1,126 @@
smarm general benchmarks
available parallelism: 1 threads
ITERS=15 (+1 warmup, discarded)
CHAIN_DEPTH=1000, YIELD_TASKS=200×1000, PRIME_N=400000/64 workers, PP_ROUNDS=1000

================================================================================
chained_spawn: depth 1000
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 1000 | 8596 | 8491 | 8805
smarm 1-thread | 1000 | 8552 | 8461 | 9003
tokio current_thread | 1000 | 125 | 125 | 260
tokio multi-thread | 1000 | 190 | 184 | 338

================================================================================
yield_many: 200 tasks × 1000 yields
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 200000 | 41885 | 41112 | 43292
smarm 1-thread | 200000 | 42174 | 41063 | 43145
tokio current_thread | 200000 | 15195 | 15010 | 15589
tokio multi-thread | 200000 | 16037 | 15869 | 17057

================================================================================
fan_out_compute: primes in [2, 400000) across 64
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 33860 | 29872 | 29629 | 31596
smarm 1-thread | 33860 | 29776 | 29528 | 30003
tokio current_thread | 33860 | 28705 | 28605 | 30287
tokio multi-thread | 33860 | 34655 | 34503 | 36596

================================================================================
ping_pong_oneshot: 1000 rounds
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 1000 | 16898 | 16574 | 17386
smarm 1-thread | 1000 | 16871 | 16677 | 18467
tokio current_thread | 1000 | 897 | 857 | 991
tokio multi-thread | 1000 | 4325 | 4228 | 4458
smarm tokio-favored benchmarks
available parallelism: 1 threads
ITERS=15 (+1 warmup, discarded)
STORM_BACKGROUND=8, STORM_SPAWN=10000, MPSC=32×10000, TIMER_ACTORS=10000 (1–10 ms), SCALING_N=400000/64

================================================================================
spawn_storm_busy: 8 bg yielders + 10000 zero-work spawns
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 10000 | 133462 | 129526 | 138685
smarm 1-thread | 10000 | 130118 | 127633 | 142344
tokio current_thread | 10000 | 2713 | 2608 | 2831
tokio multi-thread | 10000 | 7367 | 4345 | 11741

================================================================================
mpsc_contention: 32 producers × 10000 msgs → 1 consumer
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 320000 | 9077 | 8944 | 9287
smarm 1-thread | 320000 | 9100 | 9033 | 10604
tokio current_thread | 320000 | 17310 | 17122 | 18616
tokio multi-thread | 320000 | 17484 | 17413 | 17748

================================================================================
many_timers: 10000 actors sleeping 1–10 ms
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 10000 | 140039 | 135577 | 145123
smarm 1-thread | 10000 | 139931 | 135513 | 143841
tokio current_thread | 10000 | 14524 | 14378 | 14564
tokio multi-thread | 10000 | 15066 | 14677 | 15336

================================================================================
multi_thread_scaling: primes in [2, 400000) across 64 workers
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 33860 | 29620 | 29511 | 31347
tokio multi 1-thread | 33860 | 29046 | 28817 | 29687
smarm smarm-favored benchmarks
available parallelism: 1 threads
ITERS=15 (+1 warmup, discarded)
RECURSE_DEPTH=500, HOT_YIELDS=500000×2, UNCONT_MSGS=1000000, PANIC_TASKS=10000

================================================================================
deep_recursion: depth 500
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 1 | 94 | 79 | 371
smarm 1-thread | 1 | 183 | 83 | 317
tokio current_thread | 1 | 25 | 25 | 31
tokio multi-thread | 1 | 54 | 41 | 71

================================================================================
yield_in_hot_loop: 2 actors × 500000 yields (single thread)
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 1000000 | 189034 | 187674 | 192204
tokio current_thread | 1000000 | 151106 | 149564 | 155601

================================================================================
uncontended_channel: 1→1, 1000000 msgs (single thread)
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 1000000 | 26949 | 26838 | 30868
tokio current_thread | 1000000 | 52984 | 52149 | 55141

================================================================================
catch_unwind_panics: 10000 tasks, 50% panic
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 10000 | 145860 | 143015 | 152734
smarm 1-thread | 10000 | 144550 | 141592 | 149247
tokio current_thread | 10000 | 267500 | 265301 | 278751
tokio multi-thread | 10000 | 275320 | 268986 | 286891
126
benches/baseline-output/sweep/ai128_tc300k.txt
Normal file
@@ -0,0 +1,126 @@
smarm general benchmarks
available parallelism: 1 threads
ITERS=15 (+1 warmup, discarded)
CHAIN_DEPTH=1000, YIELD_TASKS=200×1000, PRIME_N=400000/64 workers, PP_ROUNDS=1000

================================================================================
chained_spawn: depth 1000
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 1000 | 8469 | 8414 | 8717
smarm 1-thread | 1000 | 8625 | 8479 | 10212
tokio current_thread | 1000 | 124 | 123 | 175
tokio multi-thread | 1000 | 194 | 184 | 317

================================================================================
yield_many: 200 tasks × 1000 yields
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 200000 | 41949 | 41419 | 43784
smarm 1-thread | 200000 | 42005 | 41491 | 45224
tokio current_thread | 200000 | 15139 | 15049 | 16352
tokio multi-thread | 200000 | 15985 | 15931 | 16306

================================================================================
fan_out_compute: primes in [2, 400000) across 64
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 33860 | 29640 | 29515 | 31229
smarm 1-thread | 33860 | 29777 | 29642 | 30056
tokio current_thread | 33860 | 28704 | 28584 | 30317
tokio multi-thread | 33860 | 34870 | 34569 | 35876

================================================================================
ping_pong_oneshot: 1000 rounds
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 1000 | 17098 | 16968 | 18688
smarm 1-thread | 1000 | 16918 | 16736 | 17326
tokio current_thread | 1000 | 915 | 882 | 1000
tokio multi-thread | 1000 | 4371 | 4265 | 4834
smarm tokio-favored benchmarks
available parallelism: 1 threads
ITERS=15 (+1 warmup, discarded)
STORM_BACKGROUND=8, STORM_SPAWN=10000, MPSC=32×10000, TIMER_ACTORS=10000 (1–10 ms), SCALING_N=400000/64

================================================================================
spawn_storm_busy: 8 bg yielders + 10000 zero-work spawns
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 10000 | 127075 | 124760 | 130259
smarm 1-thread | 10000 | 125976 | 125121 | 128728
tokio current_thread | 10000 | 2703 | 2646 | 2807
tokio multi-thread | 10000 | 7201 | 4267 | 12853

================================================================================
mpsc_contention: 32 producers × 10000 msgs → 1 consumer
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 320000 | 9116 | 8985 | 9237
smarm 1-thread | 320000 | 9062 | 8947 | 10648
tokio current_thread | 320000 | 17380 | 17192 | 18363
tokio multi-thread | 320000 | 17854 | 17554 | 18219

================================================================================
many_timers: 10000 actors sleeping 1–10 ms
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 10000 | 137944 | 132081 | 141862
smarm 1-thread | 10000 | 143773 | 137448 | 153703
tokio current_thread | 10000 | 14174 | 13751 | 15079
tokio multi-thread | 10000 | 15244 | 14625 | 16700

================================================================================
multi_thread_scaling: primes in [2, 400000) across 64 workers
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 33860 | 30832 | 30082 | 33360
tokio multi 1-thread | 33860 | 29736 | 29321 | 29958
smarm smarm-favored benchmarks
available parallelism: 1 threads
ITERS=15 (+1 warmup, discarded)
RECURSE_DEPTH=500, HOT_YIELDS=500000×2, UNCONT_MSGS=1000000, PANIC_TASKS=10000

================================================================================
deep_recursion: depth 500
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 1 | 84 | 78 | 122
smarm 1-thread | 1 | 90 | 79 | 157
tokio current_thread | 1 | 25 | 25 | 31
tokio multi-thread | 1 | 48 | 47 | 62

================================================================================
yield_in_hot_loop: 2 actors × 500000 yields (single thread)
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 1000000 | 190830 | 188562 | 196621
tokio current_thread | 1000000 | 151537 | 150038 | 165825

================================================================================
uncontended_channel: 1→1, 1000000 msgs (single thread)
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 1000000 | 27265 | 26969 | 29317
tokio current_thread | 1000000 | 53894 | 53380 | 56189

================================================================================
catch_unwind_panics: 10000 tasks, 50% panic
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 10000 | 145006 | 144092 | 149002
smarm 1-thread | 10000 | 144417 | 142000 | 148224
tokio current_thread | 10000 | 265376 | 260227 | 272279
tokio multi-thread | 10000 | 277432 | 270860 | 283266
126
benches/baseline-output/sweep/ai128_tc600k.txt
Normal file
@@ -0,0 +1,126 @@
|
||||
smarm general benchmarks
|
||||
available parallelism: 1 threads
|
||||
ITERS=15 (+1 warmup, discarded)
|
||||
CHAIN_DEPTH=1000, YIELD_TASKS=200×1000, PRIME_N=400000/64 workers, PP_ROUNDS=1000
|
||||
|
||||
================================================================================
|
||||
chained_spawn: depth 1000
|
||||
================================================================================
|
||||
runtime | result | median µs | min µs | max µs
|
||||
--------------------------------------------------------------------------------
|
||||
smarm 1-thread | 1000 | 8721 | 8398 | 8994
|
||||
smarm 1-thread | 1000 | 8587 | 8440 | 8810
|
||||
tokio current_thread | 1000 | 124 | 124 | 294
|
||||
tokio multi-thread | 1000 | 188 | 184 | 299
|
||||
|
||||
================================================================================
|
||||
yield_many: 200 tasks × 1000 yields
|
||||
================================================================================
|
||||
runtime | result | median µs | min µs | max µs
|
||||
--------------------------------------------------------------------------------
|
||||
smarm 1-thread | 200000 | 42588 | 42084 | 45080
|
||||
smarm 1-thread | 200000 | 42252 | 41963 | 43615
|
||||
tokio current_thread | 200000 | 15101 | 14994 | 15573
|
||||
tokio multi-thread | 200000 | 15979 | 15890 | 16356
|
||||
|
||||
================================================================================
|
||||
fan_out_compute: primes in [2, 400000) across 64
|
||||
================================================================================
|
||||
runtime | result | median µs | min µs | max µs
|
||||
--------------------------------------------------------------------------------
|
||||
smarm 1-thread | 33860 | 29686 | 29491 | 31263
|
||||
smarm 1-thread | 33860 | 29841 | 29586 | 30570
|
||||
tokio current_thread | 33860 | 28652 | 28510 | 30359
|
||||
tokio multi-thread | 33860 | 34677 | 34461 | 35318
|
||||
|
||||
================================================================================
|
||||
ping_pong_oneshot: 1000 rounds
|
||||
================================================================================
|
||||
runtime | result | median µs | min µs | max µs
|
||||
--------------------------------------------------------------------------------
|
||||
smarm 1-thread | 1000 | 16909 | 16579 | 20782
|
||||
smarm 1-thread | 1000 | 16888 | 16537 | 20808
|
||||
tokio current_thread | 1000 | 925 | 911 | 1021
|
||||
tokio multi-thread | 1000 | 4192 | 4079 | 4531
|
||||
smarm tokio-favored benchmarks
available parallelism: 1 threads
ITERS=15 (+1 warmup, discarded)
STORM_BACKGROUND=8, STORM_SPAWN=10000, MPSC=32×10000, TIMER_ACTORS=10000 (1–10 ms), SCALING_N=400000/64

================================================================================
spawn_storm_busy: 8 bg yielders + 10000 zero-work spawns
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 10000 | 145813 | 142042 | 152501
smarm 1-thread | 10000 | 145119 | 141282 | 161294
tokio current_thread | 10000 | 2968 | 2899 | 3231
tokio multi-thread | 10000 | 6288 | 4289 | 12226

================================================================================
mpsc_contention: 32 producers × 10000 msgs → 1 consumer
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 320000 | 9662 | 9254 | 11370
smarm 1-thread | 320000 | 9673 | 9331 | 9989
tokio current_thread | 320000 | 18015 | 17334 | 21096
tokio multi-thread | 320000 | 18384 | 17837 | 19534

================================================================================
many_timers: 10000 actors sleeping 1–10 ms
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 10000 | 160492 | 154795 | 180307
smarm 1-thread | 10000 | 161716 | 156498 | 191986
tokio current_thread | 10000 | 13895 | 13576 | 14913
tokio multi-thread | 10000 | 15074 | 14665 | 16070

================================================================================
multi_thread_scaling: primes in [2, 400000) across 64 workers
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 33860 | 30001 | 29600 | 38039
tokio multi 1-thread | 33860 | 29419 | 28906 | 30079
smarm smarm-favored benchmarks
available parallelism: 1 threads
ITERS=15 (+1 warmup, discarded)
RECURSE_DEPTH=500, HOT_YIELDS=500000×2, UNCONT_MSGS=1000000, PANIC_TASKS=10000

================================================================================
deep_recursion: depth 500
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 1 | 91 | 79 | 186
smarm 1-thread | 1 | 87 | 81 | 131
tokio current_thread | 1 | 25 | 25 | 103
tokio multi-thread | 1 | 56 | 47 | 64

================================================================================
yield_in_hot_loop: 2 actors × 500000 yields (single thread)
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 1000000 | 190023 | 188250 | 193824
tokio current_thread | 1000000 | 154681 | 152074 | 187328

================================================================================
uncontended_channel: 1→1, 1000000 msgs (single thread)
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 1000000 | 27264 | 26772 | 29512
tokio current_thread | 1000000 | 53324 | 51744 | 59282

================================================================================
catch_unwind_panics: 10000 tasks, 50% panic
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 10000 | 155983 | 152595 | 161438
smarm 1-thread | 10000 | 162122 | 156170 | 200357
tokio current_thread | 10000 | 276303 | 264291 | 296266
tokio multi-thread | 10000 | 271350 | 267654 | 285897
126
benches/baseline-output/sweep/ai256_tc300k.txt
Normal file
@@ -0,0 +1,126 @@
smarm general benchmarks
available parallelism: 1 threads
ITERS=15 (+1 warmup, discarded)
CHAIN_DEPTH=1000, YIELD_TASKS=200×1000, PRIME_N=400000/64 workers, PP_ROUNDS=1000

================================================================================
chained_spawn: depth 1000
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 1000 | 9130 | 8720 | 10611
smarm 1-thread | 1000 | 8808 | 8617 | 9659
tokio current_thread | 1000 | 126 | 125 | 164
tokio multi-thread | 1000 | 190 | 184 | 329

================================================================================
yield_many: 200 tasks × 1000 yields
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 200000 | 42270 | 41814 | 44737
smarm 1-thread | 200000 | 42999 | 42104 | 45424
tokio current_thread | 200000 | 15441 | 15196 | 16096
tokio multi-thread | 200000 | 16249 | 16070 | 17620

================================================================================
fan_out_compute: primes in [2, 400000) across 64
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 33860 | 29813 | 29627 | 30176
smarm 1-thread | 33860 | 29613 | 29440 | 31205
tokio current_thread | 33860 | 28637 | 28406 | 29179
tokio multi-thread | 33860 | 34472 | 34389 | 36092

================================================================================
ping_pong_oneshot: 1000 rounds
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 1000 | 16899 | 16804 | 17017
smarm 1-thread | 1000 | 17001 | 16704 | 19533
tokio current_thread | 1000 | 914 | 893 | 1021
tokio multi-thread | 1000 | 4198 | 4136 | 4297
smarm tokio-favored benchmarks
available parallelism: 1 threads
ITERS=15 (+1 warmup, discarded)
STORM_BACKGROUND=8, STORM_SPAWN=10000, MPSC=32×10000, TIMER_ACTORS=10000 (1–10 ms), SCALING_N=400000/64

================================================================================
spawn_storm_busy: 8 bg yielders + 10000 zero-work spawns
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 10000 | 128621 | 126503 | 132268
smarm 1-thread | 10000 | 131316 | 128354 | 133964
tokio current_thread | 10000 | 2763 | 2696 | 2996
tokio multi-thread | 10000 | 6023 | 4300 | 12908

================================================================================
mpsc_contention: 32 producers × 10000 msgs → 1 consumer
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 320000 | 9225 | 9071 | 11272
smarm 1-thread | 320000 | 9174 | 9028 | 9335
tokio current_thread | 320000 | 17210 | 17100 | 18404
tokio multi-thread | 320000 | 17550 | 17413 | 18080

================================================================================
many_timers: 10000 actors sleeping 1–10 ms
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 10000 | 136396 | 133330 | 142485
smarm 1-thread | 10000 | 137374 | 134345 | 141168
tokio current_thread | 10000 | 13789 | 13499 | 14621
tokio multi-thread | 10000 | 15036 | 14729 | 15359

================================================================================
multi_thread_scaling: primes in [2, 400000) across 64 workers
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 33860 | 30065 | 29819 | 32418
tokio multi 1-thread | 33860 | 29501 | 28916 | 30057
smarm smarm-favored benchmarks
available parallelism: 1 threads
ITERS=15 (+1 warmup, discarded)
RECURSE_DEPTH=500, HOT_YIELDS=500000×2, UNCONT_MSGS=1000000, PANIC_TASKS=10000

================================================================================
deep_recursion: depth 500
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 1 | 94 | 81 | 257
smarm 1-thread | 1 | 83 | 80 | 134
tokio current_thread | 1 | 25 | 25 | 33
tokio multi-thread | 1 | 57 | 48 | 109

================================================================================
yield_in_hot_loop: 2 actors × 500000 yields (single thread)
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 1000000 | 188506 | 187971 | 190121
tokio current_thread | 1000000 | 149663 | 148978 | 150733

================================================================================
uncontended_channel: 1→1, 1000000 msgs (single thread)
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 1000000 | 26945 | 26703 | 29430
tokio current_thread | 1000000 | 52332 | 51838 | 54062

================================================================================
catch_unwind_panics: 10000 tasks, 50% panic
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 10000 | 146192 | 143776 | 150609
smarm 1-thread | 10000 | 144012 | 140604 | 153892
tokio current_thread | 10000 | 268341 | 260941 | 275404
tokio multi-thread | 10000 | 272691 | 268094 | 307084
126
benches/baseline-output/sweep/ai32_tc150k.txt
Normal file
@@ -0,0 +1,126 @@
smarm general benchmarks
available parallelism: 1 threads
ITERS=15 (+1 warmup, discarded)
CHAIN_DEPTH=1000, YIELD_TASKS=200×1000, PRIME_N=400000/64 workers, PP_ROUNDS=1000

================================================================================
chained_spawn: depth 1000
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 1000 | 8653 | 8522 | 9163
smarm 1-thread | 1000 | 8908 | 8660 | 10606
tokio current_thread | 1000 | 124 | 123 | 175
tokio multi-thread | 1000 | 244 | 184 | 340

================================================================================
yield_many: 200 tasks × 1000 yields
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 200000 | 42597 | 41857 | 43492
smarm 1-thread | 200000 | 42621 | 42097 | 44386
tokio current_thread | 200000 | 15368 | 15144 | 16484
tokio multi-thread | 200000 | 16120 | 16012 | 19222

================================================================================
fan_out_compute: primes in [2, 400000) across 64
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 33860 | 30499 | 29657 | 33910
smarm 1-thread | 33860 | 31190 | 30105 | 32675
tokio current_thread | 33860 | 28748 | 28643 | 29398
tokio multi-thread | 33860 | 34714 | 34499 | 36338

================================================================================
ping_pong_oneshot: 1000 rounds
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 1000 | 16990 | 16853 | 17540
smarm 1-thread | 1000 | 16944 | 16740 | 18603
tokio current_thread | 1000 | 937 | 921 | 1056
tokio multi-thread | 1000 | 4342 | 4205 | 4549
smarm tokio-favored benchmarks
available parallelism: 1 threads
ITERS=15 (+1 warmup, discarded)
STORM_BACKGROUND=8, STORM_SPAWN=10000, MPSC=32×10000, TIMER_ACTORS=10000 (1–10 ms), SCALING_N=400000/64

================================================================================
spawn_storm_busy: 8 bg yielders + 10000 zero-work spawns
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 10000 | 130032 | 128075 | 153842
smarm 1-thread | 10000 | 126396 | 125101 | 131406
tokio current_thread | 10000 | 2685 | 2629 | 2841
tokio multi-thread | 10000 | 6014 | 4126 | 11484

================================================================================
mpsc_contention: 32 producers × 10000 msgs → 1 consumer
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 320000 | 9122 | 8987 | 9334
smarm 1-thread | 320000 | 9073 | 8956 | 10151
tokio current_thread | 320000 | 17259 | 17163 | 17673
tokio multi-thread | 320000 | 22771 | 17709 | 24514

================================================================================
many_timers: 10000 actors sleeping 1–10 ms
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 10000 | 137844 | 134570 | 157034
smarm 1-thread | 10000 | 141200 | 137494 | 156214
tokio current_thread | 10000 | 14809 | 14024 | 16518
tokio multi-thread | 10000 | 15089 | 14704 | 15331

================================================================================
multi_thread_scaling: primes in [2, 400000) across 64 workers
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 33860 | 30880 | 29931 | 32667
tokio multi 1-thread | 33860 | 29862 | 29116 | 31310
smarm smarm-favored benchmarks
available parallelism: 1 threads
ITERS=15 (+1 warmup, discarded)
RECURSE_DEPTH=500, HOT_YIELDS=500000×2, UNCONT_MSGS=1000000, PANIC_TASKS=10000

================================================================================
deep_recursion: depth 500
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 1 | 90 | 80 | 196
smarm 1-thread | 1 | 87 | 79 | 126
tokio current_thread | 1 | 25 | 25 | 53
tokio multi-thread | 1 | 52 | 47 | 88

================================================================================
yield_in_hot_loop: 2 actors × 500000 yields (single thread)
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 1000000 | 191187 | 187194 | 198269
tokio current_thread | 1000000 | 152531 | 151113 | 154462

================================================================================
uncontended_channel: 1→1, 1000000 msgs (single thread)
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 1000000 | 27413 | 27312 | 29463
tokio current_thread | 1000000 | 53620 | 52594 | 55332

================================================================================
catch_unwind_panics: 10000 tasks, 50% panic
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 10000 | 144199 | 141893 | 157984
smarm 1-thread | 10000 | 144857 | 142722 | 152275
tokio current_thread | 10000 | 268006 | 264666 | 274542
tokio multi-thread | 10000 | 271827 | 268740 | 290301
126
benches/baseline-output/sweep/ai32_tc300k.txt
Normal file
@@ -0,0 +1,126 @@
smarm general benchmarks
available parallelism: 1 threads
ITERS=15 (+1 warmup, discarded)
CHAIN_DEPTH=1000, YIELD_TASKS=200×1000, PRIME_N=400000/64 workers, PP_ROUNDS=1000

================================================================================
chained_spawn: depth 1000
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 1000 | 8950 | 8591 | 10655
smarm 1-thread | 1000 | 9688 | 8657 | 11720
tokio current_thread | 1000 | 123 | 123 | 256
tokio multi-thread | 1000 | 192 | 177 | 314

================================================================================
yield_many: 200 tasks × 1000 yields
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 200000 | 42965 | 41667 | 44850
smarm 1-thread | 200000 | 42881 | 41634 | 48864
tokio current_thread | 200000 | 15112 | 14986 | 15484
tokio multi-thread | 200000 | 16006 | 15915 | 16647

================================================================================
fan_out_compute: primes in [2, 400000) across 64
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 33860 | 29931 | 29750 | 31707
smarm 1-thread | 33860 | 29977 | 29670 | 30996
tokio current_thread | 33860 | 28615 | 28441 | 30188
tokio multi-thread | 33860 | 34371 | 34330 | 35176

================================================================================
ping_pong_oneshot: 1000 rounds
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 1000 | 16753 | 16498 | 18516
smarm 1-thread | 1000 | 16728 | 16599 | 16874
tokio current_thread | 1000 | 940 | 933 | 1037
tokio multi-thread | 1000 | 4317 | 4236 | 4427
smarm tokio-favored benchmarks
available parallelism: 1 threads
ITERS=15 (+1 warmup, discarded)
STORM_BACKGROUND=8, STORM_SPAWN=10000, MPSC=32×10000, TIMER_ACTORS=10000 (1–10 ms), SCALING_N=400000/64

================================================================================
spawn_storm_busy: 8 bg yielders + 10000 zero-work spawns
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 10000 | 132575 | 128629 | 136999
smarm 1-thread | 10000 | 130313 | 127372 | 157234
tokio current_thread | 10000 | 2689 | 2611 | 2833
tokio multi-thread | 10000 | 11337 | 4288 | 12635

================================================================================
mpsc_contention: 32 producers × 10000 msgs → 1 consumer
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 320000 | 9122 | 9000 | 11033
smarm 1-thread | 320000 | 9143 | 9015 | 9333
tokio current_thread | 320000 | 17705 | 17250 | 18111
tokio multi-thread | 320000 | 18044 | 17621 | 19484

================================================================================
many_timers: 10000 actors sleeping 1–10 ms
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 10000 | 141925 | 135531 | 188381
smarm 1-thread | 10000 | 139655 | 134291 | 146458
tokio current_thread | 10000 | 13837 | 13621 | 14877
tokio multi-thread | 10000 | 14992 | 14542 | 15237

================================================================================
multi_thread_scaling: primes in [2, 400000) across 64 workers
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 33860 | 29687 | 29554 | 31408
tokio multi 1-thread | 33860 | 28963 | 28742 | 30236
smarm smarm-favored benchmarks
available parallelism: 1 threads
ITERS=15 (+1 warmup, discarded)
RECURSE_DEPTH=500, HOT_YIELDS=500000×2, UNCONT_MSGS=1000000, PANIC_TASKS=10000

================================================================================
deep_recursion: depth 500
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 1 | 83 | 80 | 128
smarm 1-thread | 1 | 86 | 77 | 149
tokio current_thread | 1 | 25 | 25 | 50
tokio multi-thread | 1 | 53 | 47 | 84

================================================================================
yield_in_hot_loop: 2 actors × 500000 yields (single thread)
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 1000000 | 197474 | 194313 | 201690
tokio current_thread | 1000000 | 149289 | 148575 | 154319

================================================================================
uncontended_channel: 1→1, 1000000 msgs (single thread)
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 1000000 | 26884 | 26675 | 29436
tokio current_thread | 1000000 | 52594 | 51941 | 54495

================================================================================
catch_unwind_panics: 10000 tasks, 50% panic
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 10000 | 148321 | 146050 | 152943
smarm 1-thread | 10000 | 147961 | 144521 | 152158
tokio current_thread | 10000 | 264487 | 260848 | 274838
tokio multi-thread | 10000 | 272103 | 265687 | 285209
126
benches/baseline-output/sweep/ai512_tc300k.txt
Normal file
@@ -0,0 +1,126 @@
smarm general benchmarks
available parallelism: 1 threads
ITERS=15 (+1 warmup, discarded)
CHAIN_DEPTH=1000, YIELD_TASKS=200×1000, PRIME_N=400000/64 workers, PP_ROUNDS=1000

================================================================================
chained_spawn: depth 1000
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 1000 | 8574 | 8421 | 8729
smarm 1-thread | 1000 | 8675 | 8401 | 12686
tokio current_thread | 1000 | 125 | 125 | 148
tokio multi-thread | 1000 | 188 | 184 | 291

================================================================================
yield_many: 200 tasks × 1000 yields
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 200000 | 42389 | 41316 | 46466
smarm 1-thread | 200000 | 41776 | 41342 | 48940
tokio current_thread | 200000 | 15168 | 15094 | 15658
tokio multi-thread | 200000 | 15953 | 15862 | 17408

================================================================================
fan_out_compute: primes in [2, 400000) across 64
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 33860 | 29680 | 29572 | 30661
smarm 1-thread | 33860 | 29816 | 29597 | 30401
tokio current_thread | 33860 | 28657 | 28581 | 29488
tokio multi-thread | 33860 | 34837 | 34529 | 37270

================================================================================
ping_pong_oneshot: 1000 rounds
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 1000 | 16735 | 16601 | 17444
smarm 1-thread | 1000 | 16702 | 16500 | 17184
tokio current_thread | 1000 | 898 | 873 | 994
tokio multi-thread | 1000 | 4343 | 4241 | 4448
smarm tokio-favored benchmarks
available parallelism: 1 threads
ITERS=15 (+1 warmup, discarded)
STORM_BACKGROUND=8, STORM_SPAWN=10000, MPSC=32×10000, TIMER_ACTORS=10000 (1–10 ms), SCALING_N=400000/64

================================================================================
spawn_storm_busy: 8 bg yielders + 10000 zero-work spawns
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 10000 | 128408 | 126199 | 133268
smarm 1-thread | 10000 | 131599 | 129387 | 135080
tokio current_thread | 10000 | 2718 | 2661 | 2981
tokio multi-thread | 10000 | 7264 | 4608 | 11583

================================================================================
mpsc_contention: 32 producers × 10000 msgs → 1 consumer
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 320000 | 9289 | 9039 | 9751
smarm 1-thread | 320000 | 9510 | 9157 | 9677
tokio current_thread | 320000 | 17550 | 17290 | 18578
tokio multi-thread | 320000 | 18336 | 17527 | 18989

================================================================================
many_timers: 10000 actors sleeping 1–10 ms
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 10000 | 139111 | 136105 | 146606
smarm 1-thread | 10000 | 137302 | 133316 | 141350
tokio current_thread | 10000 | 13720 | 13455 | 14607
tokio multi-thread | 10000 | 14964 | 14546 | 15400

================================================================================
multi_thread_scaling: primes in [2, 400000) across 64 workers
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 33860 | 30048 | 29705 | 31530
tokio multi 1-thread | 33860 | 28894 | 28682 | 30094
smarm smarm-favored benchmarks
|
||||
available parallelism: 1 threads
|
||||
ITERS=15 (+1 warmup, discarded)
|
||||
RECURSE_DEPTH=500, HOT_YIELDS=500000×2, UNCONT_MSGS=1000000, PANIC_TASKS=10000
|
||||
|
||||
================================================================================
|
||||
deep_recursion: depth 500
|
||||
================================================================================
|
||||
runtime | result | median µs | min µs | max µs
|
||||
--------------------------------------------------------------------------------
|
||||
smarm 1-thread | 1 | 93 | 81 | 161
|
||||
smarm 1-thread | 1 | 103 | 80 | 178
|
||||
tokio current_thread | 1 | 25 | 25 | 28
|
||||
tokio multi-thread | 1 | 53 | 47 | 74
|
||||
|
||||
================================================================================
|
||||
yield_in_hot_loop: 2 actors × 500000 yields (single thread)
|
||||
================================================================================
|
||||
runtime | result | median µs | min µs | max µs
|
||||
--------------------------------------------------------------------------------
|
||||
smarm 1-thread | 1000000 | 188726 | 187640 | 192658
|
||||
tokio current_thread | 1000000 | 149332 | 148133 | 155745
|
||||
|
||||
================================================================================
|
||||
uncontended_channel: 1→1, 1000000 msgs (single thread)
|
||||
================================================================================
|
||||
runtime | result | median µs | min µs | max µs
|
||||
--------------------------------------------------------------------------------
|
||||
smarm 1-thread | 1000000 | 27630 | 27086 | 29749
|
||||
tokio current_thread | 1000000 | 54225 | 53355 | 56307
|
||||
|
||||
================================================================================
|
||||
catch_unwind_panics: 10000 tasks, 50% panic
|
||||
================================================================================
|
||||
runtime | result | median µs | min µs | max µs
|
||||
--------------------------------------------------------------------------------
|
||||
smarm 1-thread | 10000 | 144934 | 143038 | 163552
|
||||
smarm 1-thread | 10000 | 146614 | 143653 | 151325
|
||||
tokio current_thread | 10000 | 266330 | 263523 | 271639
|
||||
tokio multi-thread | 10000 | 274729 | 266323 | 285114
|
||||
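The `ratio` and `winner` columns in the summary table of `BENCHMARKS_AND_TUNING.md` can be recomputed from raw output like the run above. Divide smarm's median by tokio's median, so a value above 1 means smarm is slower. The following is a minimal Rust sketch. `compare` and `Winner` are hypothetical names. The 5% "even" band is an assumption: the summary does not state a threshold, but it calls a 1.04× result even. The medians come from the first `smarm 1-thread` row and the `tokio current_thread` row of each benchmark above. For `multi_thread_scaling`, which has no `current_thread` row, the `tokio multi 1-thread` row is used instead.

```rust
/// Which runtime a benchmark favours, judged on median wall time.
#[derive(Debug, PartialEq)]
enum Winner {
    Smarm,
    Tokio,
    Even,
}

/// Returns smarm's median divided by tokio's (above 1.0 = smarm slower)
/// plus a verdict. Ratios within `1.0 ± even_band` count as a tie.
fn compare(smarm_median_us: u64, tokio_median_us: u64, even_band: f64) -> (f64, Winner) {
    let ratio = smarm_median_us as f64 / tokio_median_us as f64;
    let winner = if (ratio - 1.0).abs() <= even_band {
        Winner::Even
    } else if ratio < 1.0 {
        Winner::Smarm
    } else {
        Winner::Tokio
    };
    (ratio, winner)
}

fn main() {
    // (benchmark, smarm median µs, tokio median µs) copied from the run above.
    let rows = [
        ("spawn_storm_busy", 128_408, 2_718),
        ("mpsc_contention", 9_289, 17_550),
        ("many_timers", 139_111, 13_720),
        ("multi_thread_scaling", 30_048, 28_894),
        ("uncontended_channel", 27_630, 54_225),
        ("catch_unwind_panics", 144_934, 266_330),
    ];
    for (name, smarm_us, tokio_us) in rows {
        // Assumed 5% tie band; not taken from the bench suite.
        let (ratio, winner) = compare(smarm_us, tokio_us, 0.05);
        println!("{name:>22}: {ratio:6.2}x  {winner:?}");
    }
}
```

With these rows, only `multi_thread_scaling` (about 1.04×) falls inside the even band. smarm wins the three channel/panic rows, and tokio wins the spawn- and timer-heavy ones.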
126
benches/baseline-output/sweep/ai64_tc150k.txt
Normal file
@@ -0,0 +1,126 @@
smarm general benchmarks
available parallelism: 1 threads
ITERS=15 (+1 warmup, discarded)
CHAIN_DEPTH=1000, YIELD_TASKS=200×1000, PRIME_N=400000/64 workers, PP_ROUNDS=1000

================================================================================
chained_spawn: depth 1000
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 1000 | 8849 | 8486 | 9224
smarm 1-thread | 1000 | 8841 | 8477 | 9108
tokio current_thread | 1000 | 124 | 124 | 219
tokio multi-thread | 1000 | 187 | 184 | 283

================================================================================
yield_many: 200 tasks × 1000 yields
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 200000 | 41681 | 41278 | 43685
smarm 1-thread | 200000 | 41721 | 41218 | 42261
tokio current_thread | 200000 | 14969 | 14940 | 15051
tokio multi-thread | 200000 | 16004 | 15868 | 17569

================================================================================
fan_out_compute: primes in [2, 400000) across 64
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 33860 | 29679 | 29516 | 30105
smarm 1-thread | 33860 | 29677 | 29594 | 31365
tokio current_thread | 33860 | 28656 | 28572 | 29239
tokio multi-thread | 33860 | 34783 | 34617 | 36531

================================================================================
ping_pong_oneshot: 1000 rounds
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 1000 | 17009 | 16822 | 17418
smarm 1-thread | 1000 | 16866 | 16723 | 17315
tokio current_thread | 1000 | 880 | 871 | 1035
tokio multi-thread | 1000 | 4263 | 4178 | 4391
smarm tokio-favored benchmarks
available parallelism: 1 threads
ITERS=15 (+1 warmup, discarded)
STORM_BACKGROUND=8, STORM_SPAWN=10000, MPSC=32×10000, TIMER_ACTORS=10000 (1–10 ms), SCALING_N=400000/64

================================================================================
spawn_storm_busy: 8 bg yielders + 10000 zero-work spawns
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 10000 | 126566 | 124995 | 130402
smarm 1-thread | 10000 | 128278 | 126209 | 135156
tokio current_thread | 10000 | 2680 | 2640 | 2787
tokio multi-thread | 10000 | 7411 | 4393 | 12421

================================================================================
mpsc_contention: 32 producers × 10000 msgs → 1 consumer
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 320000 | 9073 | 8937 | 9324
smarm 1-thread | 320000 | 9120 | 9018 | 9263
tokio current_thread | 320000 | 17245 | 17180 | 17574
tokio multi-thread | 320000 | 18518 | 17685 | 19621

================================================================================
many_timers: 10000 actors sleeping 1–10 ms
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 10000 | 141855 | 135415 | 145810
smarm 1-thread | 10000 | 138265 | 135535 | 142346
tokio current_thread | 10000 | 14441 | 13453 | 14650
tokio multi-thread | 10000 | 14956 | 14529 | 15451

================================================================================
multi_thread_scaling: primes in [2, 400000) across 64 workers
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 33860 | 30033 | 29659 | 31803
tokio multi 1-thread | 33860 | 29078 | 28963 | 30231
smarm smarm-favored benchmarks
available parallelism: 1 threads
ITERS=15 (+1 warmup, discarded)
RECURSE_DEPTH=500, HOT_YIELDS=500000×2, UNCONT_MSGS=1000000, PANIC_TASKS=10000

================================================================================
deep_recursion: depth 500
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 1 | 83 | 79 | 132
smarm 1-thread | 1 | 85 | 78 | 146
tokio current_thread | 1 | 25 | 25 | 73
tokio multi-thread | 1 | 51 | 47 | 64

================================================================================
yield_in_hot_loop: 2 actors × 500000 yields (single thread)
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 1000000 | 191352 | 188830 | 196235
tokio current_thread | 1000000 | 152382 | 150674 | 187815

================================================================================
uncontended_channel: 1→1, 1000000 msgs (single thread)
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 1000000 | 27552 | 27099 | 30612
tokio current_thread | 1000000 | 53160 | 52436 | 55255

================================================================================
catch_unwind_panics: 10000 tasks, 50% panic
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 10000 | 145243 | 143291 | 173727
smarm 1-thread | 10000 | 145242 | 142819 | 148457
tokio current_thread | 10000 | 266471 | 262904 | 269145
tokio multi-thread | 10000 | 274195 | 269312 | 286111
126
benches/baseline-output/sweep/ai64_tc300k.txt
Normal file
@@ -0,0 +1,126 @@
smarm general benchmarks
available parallelism: 1 threads
ITERS=15 (+1 warmup, discarded)
CHAIN_DEPTH=1000, YIELD_TASKS=200×1000, PRIME_N=400000/64 workers, PP_ROUNDS=1000

================================================================================
chained_spawn: depth 1000
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 1000 | 8735 | 8508 | 9314
smarm 1-thread | 1000 | 8808 | 8506 | 10346
tokio current_thread | 1000 | 123 | 123 | 172
tokio multi-thread | 1000 | 190 | 184 | 273

================================================================================
yield_many: 200 tasks × 1000 yields
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 200000 | 41619 | 41255 | 43489
smarm 1-thread | 200000 | 41544 | 41196 | 43259
tokio current_thread | 200000 | 15382 | 15233 | 16007
tokio multi-thread | 200000 | 16095 | 15999 | 16296

================================================================================
fan_out_compute: primes in [2, 400000) across 64
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 33860 | 30032 | 29838 | 31744
smarm 1-thread | 33860 | 29782 | 29653 | 30601
tokio current_thread | 33860 | 28754 | 28614 | 30700
tokio multi-thread | 33860 | 34988 | 34570 | 36871

================================================================================
ping_pong_oneshot: 1000 rounds
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 1000 | 17088 | 16868 | 18654
smarm 1-thread | 1000 | 16951 | 16797 | 17783
tokio current_thread | 1000 | 932 | 899 | 1019
tokio multi-thread | 1000 | 4340 | 4273 | 5245
smarm tokio-favored benchmarks
available parallelism: 1 threads
ITERS=15 (+1 warmup, discarded)
STORM_BACKGROUND=8, STORM_SPAWN=10000, MPSC=32×10000, TIMER_ACTORS=10000 (1–10 ms), SCALING_N=400000/64

================================================================================
spawn_storm_busy: 8 bg yielders + 10000 zero-work spawns
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 10000 | 129009 | 127353 | 132990
smarm 1-thread | 10000 | 128009 | 126554 | 140472
tokio current_thread | 10000 | 2666 | 2624 | 2794
tokio multi-thread | 10000 | 5974 | 4368 | 11517

================================================================================
mpsc_contention: 32 producers × 10000 msgs → 1 consumer
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 320000 | 9044 | 8970 | 10788
smarm 1-thread | 320000 | 9087 | 8995 | 12500
tokio current_thread | 320000 | 17185 | 17072 | 18440
tokio multi-thread | 320000 | 17720 | 17394 | 19182

================================================================================
many_timers: 10000 actors sleeping 1–10 ms
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 10000 | 145819 | 140671 | 150512
smarm 1-thread | 10000 | 139046 | 135846 | 146127
tokio current_thread | 10000 | 13866 | 13522 | 14670
tokio multi-thread | 10000 | 14900 | 14471 | 16378

================================================================================
multi_thread_scaling: primes in [2, 400000) across 64 workers
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 33860 | 30695 | 29720 | 33196
tokio multi 1-thread | 33860 | 29261 | 28895 | 31013
smarm smarm-favored benchmarks
available parallelism: 1 threads
ITERS=15 (+1 warmup, discarded)
RECURSE_DEPTH=500, HOT_YIELDS=500000×2, UNCONT_MSGS=1000000, PANIC_TASKS=10000

================================================================================
deep_recursion: depth 500
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 1 | 82 | 79 | 113
smarm 1-thread | 1 | 85 | 78 | 143
tokio current_thread | 1 | 25 | 25 | 56
tokio multi-thread | 1 | 50 | 47 | 63

================================================================================
yield_in_hot_loop: 2 actors × 500000 yields (single thread)
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 1000000 | 188698 | 187922 | 192263
tokio current_thread | 1000000 | 150231 | 148746 | 151723

================================================================================
uncontended_channel: 1→1, 1000000 msgs (single thread)
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 1000000 | 28461 | 27638 | 30283
tokio current_thread | 1000000 | 52224 | 51880 | 54732

================================================================================
catch_unwind_panics: 10000 tasks, 50% panic
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 10000 | 144604 | 143246 | 145585
smarm 1-thread | 10000 | 148208 | 142691 | 151076
tokio current_thread | 10000 | 265255 | 260637 | 271065
tokio multi-thread | 10000 | 273131 | 271313 | 300420
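The two sweep files above differ only in the suffix of their names (`ai64_tc150k` versus `ai64_tc300k`). That suffix may encode the preemption settings: it reads like `alloc_interval` = 64 and `timeslice_cycles` = 150k or 300k, but that is inferred from the file names, not stated in the output. If so, the relative change between matching medians shows how sensitive each benchmark is to the knob. The sketch below compares the first `smarm 1-thread` median of six benchmarks across the two files. `pct_change` is a hypothetical helper.

```rust
/// Percent change of a median from one sweep point to another
/// (positive = slower at the second point).
fn pct_change(before_us: u64, after_us: u64) -> f64 {
    (after_us as f64 - before_us as f64) / before_us as f64 * 100.0
}

fn main() {
    // (benchmark, median in ai64_tc150k.txt, median in ai64_tc300k.txt),
    // taken from the first "smarm 1-thread" row of each benchmark.
    let rows = [
        ("chained_spawn", 8_849, 8_735),
        ("yield_many", 41_681, 41_619),
        ("ping_pong_oneshot", 17_009, 17_088),
        ("spawn_storm_busy", 126_566, 129_009),
        ("many_timers", 141_855, 145_819),
        ("uncontended_channel", 27_552, 28_461),
    ];
    let mut largest = 0.0_f64;
    for (name, tc150k, tc300k) in rows {
        let change = pct_change(tc150k, tc300k);
        largest = largest.max(change.abs());
        println!("{name:>20}: {change:+6.2}%");
    }
    println!("largest absolute change: {largest:.2}%");
}
```

For these rows, the largest change is `uncontended_channel` at about +3.3%. That is smaller than the min-to-max range of some individual rows: `many_timers` spans 135415 to 145810 µs within a single row of the 150k run. So these two points alone do not show a measurable effect of the setting on one core.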
42
benches/baseline-output/tokio_favored.txt
Normal file
@@ -0,0 +1,42 @@
smarm tokio-favored benchmarks
available parallelism: 1 threads
ITERS=15 (+1 warmup, discarded)
STORM_BACKGROUND=8, STORM_SPAWN=10000, MPSC=32×10000, TIMER_ACTORS=10000 (1–10 ms), SCALING_N=400000/64

================================================================================
spawn_storm_busy: 8 bg yielders + 10000 zero-work spawns
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 10000 | 105512 | 102322 | 120552
smarm 1-thread | 10000 | 107113 | 104048 | 112377
tokio current_thread | 10000 | 2222 | 2124 | 2506
tokio multi-thread | 10000 | 4546 | 3833 | 7305

================================================================================
mpsc_contention: 32 producers × 10000 msgs → 1 consumer
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 320000 | 10456 | 10331 | 10639
smarm 1-thread | 320000 | 10395 | 9201 | 10549
tokio current_thread | 320000 | 17348 | 16639 | 19061
tokio multi-thread | 320000 | 18628 | 17499 | 19298

================================================================================
many_timers: 10000 actors sleeping 1–10 ms
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 10000 | 120242 | 116239 | 127200
smarm 1-thread | 10000 | 121023 | 113997 | 127826
tokio current_thread | 10000 | 13581 | 13182 | 14415
tokio multi-thread | 10000 | 14266 | 14084 | 14843

================================================================================
multi_thread_scaling: primes in [2, 400000) across 64 workers
================================================================================
runtime | result | median µs | min µs | max µs
--------------------------------------------------------------------------------
smarm 1-thread | 33860 | 19852 | 19601 | 22679
tokio multi 1-thread | 33860 | 19638 | 18994 | 20102
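In this file, as in every other run in this section, the `tokio multi-thread` row of `spawn_storm_busy` has a much wider min-to-max range than its neighbours (here 3833 to 7305 µs around a median of 4546 µs). Dividing the range by the median gives a quick noise indicator. It helps decide which rows need more iterations before their medians are compared. `relative_spread` and the 50% cutoff below are illustrative choices, not part of the bench harness.

```rust
/// Min-to-max range as a fraction of the median for one result row.
fn relative_spread(median_us: u64, min_us: u64, max_us: u64) -> f64 {
    (max_us - min_us) as f64 / median_us as f64
}

fn main() {
    // spawn_storm_busy rows of tokio_favored.txt: (runtime, median, min, max).
    let rows = [
        ("smarm 1-thread", 105_512, 102_322, 120_552),
        ("tokio current_thread", 2_222, 2_124, 2_506),
        ("tokio multi-thread", 4_546, 3_833, 7_305),
    ];
    for (runtime, median, min, max) in rows {
        let pct = relative_spread(median, min, max) * 100.0;
        // Arbitrary cutoff: a range wider than half the median is "noisy".
        let note = if pct > 50.0 { "  <- noisy" } else { "" };
        println!("{runtime:>22}: spread {pct:5.1}%{note}");
    }
}
```

Only `tokio multi-thread` (about 76%) crosses the cutoff. The smarm and `current_thread` rows both sit near 17%.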
224
benches/baseline.json
Normal file
@@ -0,0 +1,224 @@
{
  "chained_spawn": {
    "smarm 1-thread": {
      "result": 1000,
      "median": 8637,
      "min": 8553,
      "max": 8933
    },
    "tokio current_thread": {
      "result": 1000,
      "median": 124,
      "min": 124,
      "max": 153
    },
    "tokio multi-thread": {
      "result": 1000,
      "median": 188,
      "min": 183,
      "max": 229
    }
  },
  "yield_many": {
    "smarm 1-thread": {
      "result": 200000,
      "median": 41622,
      "min": 41063,
      "max": 44973
    },
    "tokio current_thread": {
      "result": 200000,
      "median": 15085,
      "min": 15013,
      "max": 15274
    },
    "tokio multi-thread": {
      "result": 200000,
      "median": 15964,
      "min": 15880,
      "max": 17959
    }
  },
  "fan_out_compute": {
    "smarm 1-thread": {
      "result": 33860,
      "median": 29727,
      "min": 29491,
      "max": 31634
    },
    "tokio current_thread": {
      "result": 33860,
      "median": 28503,
      "min": 28391,
      "max": 28866
    },
    "tokio multi-thread": {
      "result": 33860,
      "median": 34542,
      "min": 34396,
      "max": 36111
    }
  },
  "ping_pong_oneshot": {
    "smarm 1-thread": {
      "result": 1000,
      "median": 16848,
      "min": 16633,
      "max": 17301
    },
    "tokio current_thread": {
      "result": 1000,
      "median": 879,
      "min": 868,
      "max": 973
    },
    "tokio multi-thread": {
      "result": 1000,
      "median": 4328,
      "min": 4223,
      "max": 4461
    }
  },
  "spawn_storm_busy": {
    "smarm 1-thread": {
      "result": 10000,
      "median": 130058,
      "min": 126790,
      "max": 134475
    },
    "tokio current_thread": {
      "result": 10000,
      "median": 2772,
      "min": 2641,
      "max": 4367
    },
    "tokio multi-thread": {
      "result": 10000,
      "median": 7462,
      "min": 4469,
      "max": 12892
    }
  },
  "mpsc_contention": {
    "smarm 1-thread": {
      "result": 320000,
      "median": 9260,
      "min": 9095,
      "max": 10081
    },
    "tokio current_thread": {
      "result": 320000,
      "median": 17570,
      "min": 17213,
      "max": 18276
    },
    "tokio multi-thread": {
      "result": 320000,
      "median": 17593,
      "min": 17452,
      "max": 19564
    }
  },
  "many_timers": {
    "smarm 1-thread": {
      "result": 10000,
      "median": 135806,
      "min": 132573,
      "max": 141651
    },
    "tokio current_thread": {
      "result": 10000,
      "median": 14462,
      "min": 13555,
      "max": 15457
    },
    "tokio multi-thread": {
      "result": 10000,
      "median": 15011,
      "min": 14655,
      "max": 15368
    }
  },
  "multi_thread_scaling": {
    "smarm 1-thread": {
      "result": 33860,
      "median": 30029,
      "min": 29720,
      "max": 31351
    },
    "tokio multi 1-thread": {
      "result": 33860,
      "median": 28983,
      "min": 28908,
      "max": 29323
    }
  },
  "deep_recursion": {
    "smarm 1-thread": {
      "result": 1,
      "median": 83,
      "min": 78,
      "max": 587
    },
    "tokio current_thread": {
      "result": 1,
      "median": 25,
      "min": 25,
      "max": 33
    },
    "tokio multi-thread": {
      "result": 1,
      "median": 59,
      "min": 47,
      "max": 205
    }
  },
  "yield_in_hot_loop": {
    "smarm 1-thread": {
      "result": 1000000,
      "median": 188753,
      "min": 187007,
      "max": 194366
    },
    "tokio current_thread": {
      "result": 1000000,
      "median": 153929,
      "min": 152712,
      "max": 158749
    }
  },
  "uncontended_channel": {
    "smarm 1-thread": {
      "result": 1000000,
      "median": 26811,
      "min": 26498,
      "max": 29069
    },
    "tokio current_thread": {
      "result": 1000000,
      "median": 51888,
      "min": 51530,
      "max": 52708
    }
  },
  "catch_unwind_panics": {
    "smarm 1-thread": {
      "result": 10000,
      "median": 142215,
      "min": 140189,
      "max": 143570
    },
    "tokio current_thread": {
      "result": 10000,
      "median": 682295,
      "min": 670281,
      "max": 700774
    },
    "tokio multi-thread": {
      "result": 10000,
      "median": 662688,
      "min": 641453,
      "max": 681868
    }
  }
}
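`baseline.json` stores, for each benchmark and runtime, the `result` count and the `median`, `min` and `max` times in µs. The files in this diff do not say how it is consumed. One plausible use, sketched below purely as an assumption, is a regression gate that compares a fresh median against the stored one. `Stats`, `regressed`, the tolerance values and the extra rule (a new median must also exceed the baseline `max`) are all hypothetical.

```rust
use std::collections::HashMap;

/// Subset of one runtime's entry in baseline.json (times in µs).
#[derive(Clone, Copy, Debug)]
struct Stats {
    result: u64,
    median: u64,
    max: u64,
}

/// Hypothetical rule: flag a regression only if the new median is more than
/// `tolerance` (a fraction, e.g. 0.10) above the baseline median AND slower
/// than the slowest baseline iteration.
fn regressed(base: Stats, new_result: u64, new_median: u64, tolerance: f64) -> bool {
    // A different `result` means the workload changed; timings are not comparable.
    assert_eq!(base.result, new_result, "workload result changed");
    let limit = base.median as f64 * (1.0 + tolerance);
    new_median as f64 > limit && new_median > base.max
}

fn main() {
    // Entries copied from baseline.json, keyed by (benchmark, runtime).
    let mut baseline: HashMap<(&str, &str), Stats> = HashMap::new();
    baseline.insert(
        ("mpsc_contention", "smarm 1-thread"),
        Stats { result: 320_000, median: 9_260, max: 10_081 },
    );
    baseline.insert(
        ("many_timers", "smarm 1-thread"),
        Stats { result: 10_000, median: 135_806, max: 141_651 },
    );

    // Fresh medians: first "smarm 1-thread" rows of ai64_tc300k.txt.
    let checks = [
        ("mpsc_contention", 320_000, 9_044, 0.10),
        ("many_timers", 10_000, 145_819, 0.05),
        ("many_timers", 10_000, 145_819, 0.10),
    ];
    for (bench, result, median, tol) in checks {
        let base = baseline[&(bench, "smarm 1-thread")];
        let flag = regressed(base, result, median, tol);
        let pct = tol * 100.0;
        println!("{bench:>16} @ {pct:>3.0}% tolerance: regressed = {flag}");
    }
}
```

The `many_timers` median from `ai64_tc300k.txt` (145819 µs) exceeds both the baseline median plus 5% (about 142596 µs) and the baseline max (141651 µs), so the gate fires at 5% but not at 10%. That shows how a tight tolerance can flag ordinary run-to-run variation on a noisy benchmark.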
442
benches/general.rs
Normal file
@@ -0,0 +1,442 @@
|
||||
//! General benchmarks — workloads where neither runtime has a structural
|
||||
//! advantage. Both should be competitive; large gaps here indicate a real
|
||||
//! difference in per-task or per-yield overhead.
|
||||
//!
|
||||
//! Workloads:
|
||||
//! 1. chained_spawn — task N spawns N+1, depth 1000. Spawn+exit overhead in
|
||||
//! a serial chain. Adapted from tokio's bench of the same
|
||||
//! name.
|
||||
//! 2. yield_many — 200 actors × 1000 yields. Pure scheduling throughput
|
||||
//! with no allocation, no IO. Adapted from tokio.
|
||||
//! 3. fan_out_compute— count primes in [2, 400_000) across 64 workers. Same
|
||||
//! shape as multi_scheduler::primes but lives here for
|
||||
//! completeness.
|
||||
//! 4. ping_pong_oneshot — N rounds of (spawn pair, send oneshot, await).
|
||||
//! Closer to a request/response workload than channel
|
||||
//! ping-pong.
|
||||
|
||||
use std::sync::atomic::{AtomicU64, Ordering};
|
||||
use std::sync::Arc;
|
||||
use std::time::Instant;
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Shared harness
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
const ITERS: u32 = 15;
|
||||
|
||||
fn available_threads() -> usize {
|
||||
std::thread::available_parallelism().map(|n| n.get()).unwrap_or(1)
|
||||
}
|
||||
|
||||
fn print_header(title: &str) {
|
||||
println!("\n{}", "=".repeat(80));
|
||||
println!(" {title}");
|
||||
println!("{}", "=".repeat(80));
|
||||
println!(
|
||||
"{:>26} | {:>12} | {:>10} | {:>10} | {:>10}",
|
||||
"runtime", "result", "median µs", "min µs", "max µs"
|
||||
);
|
||||
println!("{}", "-".repeat(80));
|
||||
}
|
||||
|
||||
fn run_n<F: FnMut() -> (u64, u128)>(name: &str, n: u32, mut f: F) {
|
||||
let mut times = Vec::new();
|
||||
let mut last = 0u64;
|
||||
// One warmup iteration, discarded.
|
||||
let _ = f();
|
||||
for _ in 0..n {
|
||||
let (v, t) = f();
|
||||
times.push(t);
|
||||
last = v;
|
||||
}
|
||||
times.sort_unstable();
|
||||
let median = times[times.len() / 2];
|
||||
let min = *times.iter().min().unwrap();
|
||||
let max = *times.iter().max().unwrap();
|
||||
println!(
|
||||
"{:>26} | {:>12} | {:>10} | {:>10} | {:>10}",
|
||||
name, last, median, min, max
|
||||
);
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// 1. chained_spawn — depth 1000
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
const CHAIN_DEPTH: u64 = 1_000;
|
||||
|
||||
fn bench_chained_smarm(threads: usize) -> (u64, u128) {
|
||||
let counter = Arc::new(AtomicU64::new(0));
|
||||
let c2 = counter.clone();
|
||||
let start = Instant::now();
|
||||
smarm::runtime::init(bench_cfg(threads)).run(move || {
|
||||
// Fire-and-forget chain, matching tokio's bench shape: each link
|
||||
// spawns the next link and exits immediately; depth 0 signals done
|
||||
// via a channel. Crucially this does *not* nest joins on the
|
||||
// spawner's stack — important because smarm actor stacks are a
|
||||
// fixed 64 KiB.
|
||||
let (tx, rx) = smarm::channel::<()>();
|
||||
fn iter(c: Arc<AtomicU64>, tx: smarm::Sender<()>, n: u64) {
|
||||
if n == 0 {
|
||||
tx.send(()).unwrap();
|
||||
} else {
|
||||
let cc = c.clone();
|
||||
smarm::spawn(move || {
|
||||
cc.fetch_add(1, Ordering::Relaxed);
|
||||
iter(cc.clone(), tx, n - 1);
|
||||
});
|
||||
// Caller exits; JoinHandle dropped, no parking.
|
||||
}
|
||||
}
|
||||
iter(c2, tx, CHAIN_DEPTH);
|
||||
rx.recv().unwrap();
|
||||
});
|
||||
(counter.load(Ordering::Relaxed), start.elapsed().as_micros())
|
||||
}
|
||||
|
||||
fn bench_chained_tokio_current() -> (u64, u128) {
|
||||
let counter = Arc::new(AtomicU64::new(0));
|
||||
let c2 = counter.clone();
|
||||
let rt = tokio::runtime::Builder::new_current_thread().build().unwrap();
|
||||
let start = Instant::now();
|
||||
let local = tokio::task::LocalSet::new();
|
||||
local.block_on(&rt, async move {
|
||||
// Use a oneshot done channel like tokio's own chained_spawn bench.
|
||||
let (done_tx, done_rx) = tokio::sync::oneshot::channel();
|
||||
fn iter(
|
||||
c: Arc<AtomicU64>,
|
||||
done: tokio::sync::oneshot::Sender<()>,
|
||||
n: u64,
|
||||
) {
|
||||
if n == 0 {
|
||||
let _ = done.send(());
|
||||
} else {
|
||||
tokio::task::spawn_local(async move {
|
||||
c.fetch_add(1, Ordering::Relaxed);
|
||||
iter(c, done, n - 1);
|
||||
});
|
||||
}
|
||||
}
|
||||
iter(c2, done_tx, CHAIN_DEPTH);
|
||||
let _ = done_rx.await;
|
||||
});
|
||||
(counter.load(Ordering::Relaxed), start.elapsed().as_micros())
|
||||
}
|
||||
|
||||
fn bench_chained_tokio_multi() -> (u64, u128) {
|
||||
let counter = Arc::new(AtomicU64::new(0));
|
||||
let c2 = counter.clone();
|
||||
let rt = tokio::runtime::Builder::new_multi_thread()
|
||||
.worker_threads(available_threads())
|
||||
.build()
|
||||
.unwrap();
|
||||
let start = Instant::now();
|
||||
rt.block_on(async move {
|
||||
let (done_tx, done_rx) = tokio::sync::oneshot::channel();
|
||||
fn iter(c: Arc<AtomicU64>, done: tokio::sync::oneshot::Sender<()>, n: u64) {
|
||||
if n == 0 {
|
||||
let _ = done.send(());
|
||||
} else {
|
||||
tokio::spawn(async move {
|
||||
c.fetch_add(1, Ordering::Relaxed);
|
||||
iter(c, done, n - 1);
|
||||
});
|
||||
}
|
||||
}
|
||||
iter(c2, done_tx, CHAIN_DEPTH);
|
||||
let _ = done_rx.await;
|
||||
});
|
||||
(counter.load(Ordering::Relaxed), start.elapsed().as_micros())
|
||||
}
|
||||

// ---------------------------------------------------------------------------
// 2. yield_many — 200 actors × 1000 yields
// ---------------------------------------------------------------------------

const YIELD_TASKS: u64 = 200;
const YIELD_ROUNDS: u64 = 1_000;

fn bench_yield_smarm(threads: usize) -> (u64, u128) {
    let start = Instant::now();
    smarm::runtime::init(bench_cfg(threads)).run(|| {
        let mut handles = Vec::new();
        for _ in 0..YIELD_TASKS {
            handles.push(smarm::spawn(|| {
                for _ in 0..YIELD_ROUNDS {
                    smarm::yield_now();
                }
            }));
        }
        for h in handles {
            h.join().unwrap();
        }
    });
    (YIELD_TASKS * YIELD_ROUNDS, start.elapsed().as_micros())
}

fn bench_yield_tokio_current() -> (u64, u128) {
    let rt = tokio::runtime::Builder::new_current_thread().build().unwrap();
    let start = Instant::now();
    let local = tokio::task::LocalSet::new();
    local.block_on(&rt, async move {
        let mut handles = Vec::new();
        for _ in 0..YIELD_TASKS {
            handles.push(tokio::task::spawn_local(async move {
                for _ in 0..YIELD_ROUNDS {
                    tokio::task::yield_now().await;
                }
            }));
        }
        for h in handles {
            let _ = h.await;
        }
    });
    (YIELD_TASKS * YIELD_ROUNDS, start.elapsed().as_micros())
}

fn bench_yield_tokio_multi() -> (u64, u128) {
    let rt = tokio::runtime::Builder::new_multi_thread()
        .worker_threads(available_threads())
        .build()
        .unwrap();
    let start = Instant::now();
    rt.block_on(async move {
        let mut handles = Vec::new();
        for _ in 0..YIELD_TASKS {
            handles.push(tokio::spawn(async move {
                for _ in 0..YIELD_ROUNDS {
                    tokio::task::yield_now().await;
                }
            }));
        }
        for h in handles {
            let _ = h.await;
        }
    });
    (YIELD_TASKS * YIELD_ROUNDS, start.elapsed().as_micros())
}

// ---------------------------------------------------------------------------
// 3. fan_out_compute — primes, same shape as multi_scheduler::primes
// ---------------------------------------------------------------------------

const PRIME_N: u64 = 400_000;
const PRIME_WORKERS: u64 = 64;

fn is_prime(n: u64) -> bool {
    if n < 2 { return false; }
    if n < 4 { return true; }
    if n % 2 == 0 { return false; }
    let mut i = 3u64;
    while i * i <= n { if n % i == 0 { return false; } i += 2; }
    true
}

fn count_primes(lo: u64, hi: u64) -> u64 {
    (lo..hi).filter(|&n| is_prime(n)).count() as u64
}

fn primes_slice(w: u64) -> (u64, u64) {
    let per = PRIME_N / PRIME_WORKERS;
    let lo = w * per;
    let hi = if w + 1 == PRIME_WORKERS { PRIME_N } else { lo + per };
    (lo, hi)
}

fn bench_primes_smarm(threads: usize) -> (u64, u128) {
    let total = Arc::new(AtomicU64::new(0));
    let t2 = total.clone();
    let start = Instant::now();
    smarm::runtime::init(bench_cfg(threads)).run(move || {
        let mut handles = Vec::new();
        for w in 0..PRIME_WORKERS {
            let (lo, hi) = primes_slice(w);
            let tc = t2.clone();
            handles.push(smarm::spawn(move || {
                tc.fetch_add(count_primes(lo, hi), Ordering::Relaxed);
            }));
        }
        for h in handles { h.join().unwrap(); }
    });
    (total.load(Ordering::Relaxed), start.elapsed().as_micros())
}

fn bench_primes_tokio_current() -> (u64, u128) {
    let total = Arc::new(AtomicU64::new(0));
    let t2 = total.clone();
    let rt = tokio::runtime::Builder::new_current_thread().build().unwrap();
    let start = Instant::now();
    let local = tokio::task::LocalSet::new();
    local.block_on(&rt, async move {
        let mut handles = Vec::new();
        for w in 0..PRIME_WORKERS {
            let (lo, hi) = primes_slice(w);
            let tc = t2.clone();
            handles.push(tokio::task::spawn_local(async move {
                tc.fetch_add(count_primes(lo, hi), Ordering::Relaxed);
            }));
        }
        for h in handles { let _ = h.await; }
    });
    (total.load(Ordering::Relaxed), start.elapsed().as_micros())
}

fn bench_primes_tokio_multi() -> (u64, u128) {
    let total = Arc::new(AtomicU64::new(0));
    let t2 = total.clone();
    let rt = tokio::runtime::Builder::new_multi_thread()
        .worker_threads(available_threads())
        .build()
        .unwrap();
    let start = Instant::now();
    rt.block_on(async move {
        let mut handles = Vec::new();
        for w in 0..PRIME_WORKERS {
            let (lo, hi) = primes_slice(w);
            let tc = t2.clone();
            handles.push(tokio::spawn(async move {
                tc.fetch_add(count_primes(lo, hi), Ordering::Relaxed);
            }));
        }
        for h in handles { let _ = h.await; }
    });
    (total.load(Ordering::Relaxed), start.elapsed().as_micros())
}

// ---------------------------------------------------------------------------
// 4. ping_pong_oneshot — 1000 rounds of spawn-pair-await
// ---------------------------------------------------------------------------

const PP_ROUNDS: u64 = 1_000;

fn bench_pp_smarm(threads: usize) -> (u64, u128) {
    let start = Instant::now();
    smarm::runtime::init(bench_cfg(threads)).run(|| {
        for _ in 0..PP_ROUNDS {
            // smarm has no oneshot, so each round uses two channel<()>s:
            // A sends a ping, B replies with a pong, and the root actor then
            // joins A and B.
            let (tx_ping, rx_ping) = smarm::channel::<()>();
            let (tx_pong, rx_pong) = smarm::channel::<()>();
            let hb = smarm::spawn(move || {
                rx_ping.recv().unwrap();
                tx_pong.send(()).unwrap();
            });
            let ha = smarm::spawn(move || {
                tx_ping.send(()).unwrap();
                rx_pong.recv().unwrap();
            });
            ha.join().unwrap();
            hb.join().unwrap();
        }
    });
    (PP_ROUNDS, start.elapsed().as_micros())
}

fn bench_pp_tokio_current() -> (u64, u128) {
    let rt = tokio::runtime::Builder::new_current_thread().build().unwrap();
    let start = Instant::now();
    let local = tokio::task::LocalSet::new();
    local.block_on(&rt, async move {
        for _ in 0..PP_ROUNDS {
            let (tx1, rx1) = tokio::sync::oneshot::channel::<()>();
            let (tx2, rx2) = tokio::sync::oneshot::channel::<()>();
            let hb = tokio::task::spawn_local(async move {
                rx1.await.unwrap();
                tx2.send(()).unwrap();
            });
            let ha = tokio::task::spawn_local(async move {
                tx1.send(()).unwrap();
                rx2.await.unwrap();
            });
            let _ = ha.await;
            let _ = hb.await;
        }
    });
    (PP_ROUNDS, start.elapsed().as_micros())
}

fn bench_pp_tokio_multi() -> (u64, u128) {
    let rt = tokio::runtime::Builder::new_multi_thread()
        .worker_threads(available_threads())
        .build()
        .unwrap();
    let start = Instant::now();
    rt.block_on(async move {
        for _ in 0..PP_ROUNDS {
            let (tx1, rx1) = tokio::sync::oneshot::channel::<()>();
            let (tx2, rx2) = tokio::sync::oneshot::channel::<()>();
            let hb = tokio::spawn(async move {
                rx1.await.unwrap();
                tx2.send(()).unwrap();
            });
            let ha = tokio::spawn(async move {
                tx1.send(()).unwrap();
                rx2.await.unwrap();
            });
            let _ = ha.await;
            let _ = hb.await;
        }
    });
    (PP_ROUNDS, start.elapsed().as_micros())
}

// ---------------------------------------------------------------------------
// Knob helper — reads SMARM_ALLOC_INTERVAL / SMARM_TIMESLICE_CYCLES env vars
// so the sweep script can override the preemption knobs without recompiling.
// Unset or unparsable values leave the `Config::exact` defaults in place.
// ---------------------------------------------------------------------------

fn bench_cfg(threads: usize) -> smarm::runtime::Config {
    let mut cfg = smarm::runtime::Config::exact(threads);
    if let Ok(v) = std::env::var("SMARM_ALLOC_INTERVAL") {
        if let Ok(n) = v.parse::<u32>() { cfg = cfg.alloc_interval(n); }
    }
    if let Ok(v) = std::env::var("SMARM_TIMESLICE_CYCLES") {
        if let Ok(n) = v.parse::<u64>() { cfg = cfg.timeslice_cycles(n); }
    }
    cfg
}

// ---------------------------------------------------------------------------
// main
// ---------------------------------------------------------------------------

fn main() {
    let n = available_threads();
    println!("smarm general benchmarks");
    println!("available parallelism: {n} threads");
    println!("ITERS={ITERS} (+1 warmup, discarded)");
    println!(
        "CHAIN_DEPTH={CHAIN_DEPTH}, YIELD_TASKS={YIELD_TASKS}×{YIELD_ROUNDS}, \
         PRIME_N={PRIME_N}/{PRIME_WORKERS} workers, PP_ROUNDS={PP_ROUNDS}"
    );

    // ---- 1. chained_spawn ----
    print_header(&format!("chained_spawn: depth {CHAIN_DEPTH}"));
    run_n("smarm 1-thread", ITERS, || bench_chained_smarm(1));
    run_n(&format!("smarm {n}-thread"), ITERS, || bench_chained_smarm(n));
    run_n("tokio current_thread", ITERS, bench_chained_tokio_current);
    run_n("tokio multi-thread", ITERS, bench_chained_tokio_multi);

    // ---- 2. yield_many ----
    print_header(&format!("yield_many: {YIELD_TASKS} tasks × {YIELD_ROUNDS} yields"));
    run_n("smarm 1-thread", ITERS, || bench_yield_smarm(1));
    run_n(&format!("smarm {n}-thread"), ITERS, || bench_yield_smarm(n));
    run_n("tokio current_thread", ITERS, bench_yield_tokio_current);
    run_n("tokio multi-thread", ITERS, bench_yield_tokio_multi);

    // ---- 3. fan_out_compute ----
    print_header(&format!(
        "fan_out_compute: primes in [2, {PRIME_N}) across {PRIME_WORKERS} workers"
    ));
    run_n("smarm 1-thread", ITERS, || bench_primes_smarm(1));
    run_n(&format!("smarm {n}-thread"), ITERS, || bench_primes_smarm(n));
    run_n("tokio current_thread", ITERS, bench_primes_tokio_current);
    run_n("tokio multi-thread", ITERS, bench_primes_tokio_multi);

    // ---- 4. ping_pong_oneshot ----
    print_header(&format!("ping_pong_oneshot: {PP_ROUNDS} rounds"));
    run_n("smarm 1-thread", ITERS, || bench_pp_smarm(1));
    run_n(&format!("smarm {n}-thread"), ITERS, || bench_pp_smarm(n));
    run_n("tokio current_thread", ITERS, bench_pp_tokio_current);
    run_n("tokio multi-thread", ITERS, bench_pp_tokio_multi);
}
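`bench_cfg` applies an override only when the environment variable parses as the exact integer type. Anything else is dropped without a warning. A sweep that sets `SMARM_TIMESLICE_CYCLES=1e6` or `4_000_000` therefore silently benchmarks the default.

Below is a minimal sketch of the same override rules, rewritten against an injected lookup function instead of `std::env::var` so it can be checked deterministically. `Knobs` and `knobs_from` are hypothetical names used only for illustration. They stand in for `smarm::runtime::Config` and are not part of smarm.

```rust
/// Hypothetical stand-in for the two preemption knobs that `bench_cfg`
/// forwards to `smarm::runtime::Config`; `None` means "keep the default".
#[derive(Debug, Default, PartialEq)]
struct Knobs {
    alloc_interval: Option<u32>,
    timeslice_cycles: Option<u64>,
}

/// Same override rules as `bench_cfg`, but reading from `lookup` instead of
/// `std::env::var`, so the result does not depend on the process environment.
fn knobs_from<F: Fn(&str) -> Option<String>>(lookup: F) -> Knobs {
    let mut k = Knobs::default();
    if let Some(v) = lookup("SMARM_ALLOC_INTERVAL") {
        // Unparsable values are ignored silently, as in `bench_cfg`.
        if let Ok(n) = v.parse::<u32>() {
            k.alloc_interval = Some(n);
        }
    }
    if let Some(v) = lookup("SMARM_TIMESLICE_CYCLES") {
        // `str::parse::<u64>` accepts plain decimal digits only:
        // "1e6" and "4_000_000" are both rejected here.
        if let Ok(n) = v.parse::<u64>() {
            k.timeslice_cycles = Some(n);
        }
    }
    k
}
```

To read the real environment, pass `|key| std::env::var(key).ok()` as the lookup. If silent fallback is undesirable during a sweep, print the rejected value in the `Err` branch so a typo shows up in the bench output.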
343
benches/multi_scheduler.rs
Normal file
@@ -0,0 +1,343 @@
//! Benchmarks for the multi-scheduler runtime.
//!
//! Three workloads, four runtime configurations (plus a serial baseline for
//! the primes workload):
//!   - smarm single-thread  (exact = 1)
//!   - smarm multi-thread   (exact = available_parallelism)
//!   - tokio current_thread (single-thread baseline)
//!   - tokio multi-thread   (the parallel comparison)
//!
//! Workloads:
//!   1. Fan-out / fan-in compute (primes) — CPU-bound, tests parallelism
//!   2. Ping-pong — message-passing overhead, park/unpark cost
//!   3. Spawn throughput — cost of spawn + join per actor

use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::time::Instant;

// ---------------------------------------------------------------------------
// Shared helpers
// ---------------------------------------------------------------------------

fn available_threads() -> usize {
    std::thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1)
}

fn print_header(title: &str) {
    println!("\n{}", "=".repeat(80));
    println!(" {title}");
    println!("{}", "=".repeat(80));
    println!(
        "{:>22} | {:>12} | {:>10} | {:>10} | {:>10}",
        "runtime", "result", "median µs", "min µs", "max µs"
    );
    println!("{}", "-".repeat(80));
}

fn run_n<F: FnMut() -> (u64, u128)>(name: &str, n: u32, mut f: F) {
    let mut times = Vec::new();
    let mut last = 0u64;
    for _ in 0..n {
        let (v, t) = f();
        times.push(t);
        last = v;
    }
    times.sort_unstable();
    let median = times[times.len() / 2];
    let min = *times.iter().min().unwrap();
    let max = *times.iter().max().unwrap();
    println!(
        "{:>22} | {:>12} | {:>10} | {:>10} | {:>10}",
        name, last, median, min, max
    );
}

const ITERS: u32 = 7;

// ---------------------------------------------------------------------------
// Workload 1: fan-out / fan-in primes
// ---------------------------------------------------------------------------

const PRIME_N: u64 = 400_000;
const WORKERS: u64 = 64;

fn is_prime(n: u64) -> bool {
    if n < 2 { return false; }
    if n < 4 { return true; }
    if n % 2 == 0 { return false; }
    let mut i = 3u64;
    while i * i <= n { if n % i == 0 { return false; } i += 2; }
    true
}

fn count_primes(lo: u64, hi: u64) -> u64 {
    (lo..hi).filter(|&n| is_prime(n)).count() as u64
}

fn primes_slice(w: u64) -> (u64, u64) {
    let per = PRIME_N / WORKERS;
    let lo = w * per;
    let hi = if w + 1 == WORKERS { PRIME_N } else { lo + per };
    (lo, hi)
}

fn bench_primes_smarm(threads: usize) -> (u64, u128) {
    let total = Arc::new(AtomicU64::new(0));
    let t2 = total.clone();
    let start = Instant::now();
    smarm::runtime::init(smarm::runtime::Config::exact(threads)).run(move || {
        let mut handles = Vec::new();
        for w in 0..WORKERS {
            let (lo, hi) = primes_slice(w);
            let tc = t2.clone();
            handles.push(smarm::spawn(move || {
                tc.fetch_add(count_primes(lo, hi), Ordering::Relaxed);
            }));
        }
        for h in handles { h.join().unwrap(); }
    });
    (total.load(Ordering::Relaxed), start.elapsed().as_micros())
}

fn bench_primes_tokio_current() -> (u64, u128) {
    let total = Arc::new(AtomicU64::new(0));
    let t2 = total.clone();
    let rt = tokio::runtime::Builder::new_current_thread().build().unwrap();
    let start = Instant::now();
    let local = tokio::task::LocalSet::new();
    local.block_on(&rt, async move {
        let mut handles = Vec::new();
        for w in 0..WORKERS {
            let (lo, hi) = primes_slice(w);
            let tc = t2.clone();
            handles.push(tokio::task::spawn_local(async move {
                tc.fetch_add(count_primes(lo, hi), Ordering::Relaxed);
            }));
        }
        for h in handles { let _ = h.await; }
    });
    (total.load(Ordering::Relaxed), start.elapsed().as_micros())
}

fn bench_primes_tokio_multi() -> (u64, u128) {
    let total = Arc::new(AtomicU64::new(0));
    let t2 = total.clone();
    let rt = tokio::runtime::Builder::new_multi_thread()
        .worker_threads(available_threads())
        .build()
        .unwrap();
    let start = Instant::now();
    rt.block_on(async move {
        let mut handles = Vec::new();
        for w in 0..WORKERS {
            let (lo, hi) = primes_slice(w);
            let tc = t2.clone();
            handles.push(tokio::spawn(async move {
                tc.fetch_add(count_primes(lo, hi), Ordering::Relaxed);
            }));
        }
        for h in handles { let _ = h.await; }
    });
    (total.load(Ordering::Relaxed), start.elapsed().as_micros())
}

fn bench_primes_baseline() -> (u64, u128) {
    let start = Instant::now();
    let total: u64 = (0..WORKERS).map(|w| {
        let (lo, hi) = primes_slice(w);
        count_primes(lo, hi)
    }).sum();
    (total, start.elapsed().as_micros())
}

// ---------------------------------------------------------------------------
// Workload 2: channel ping-pong
// ---------------------------------------------------------------------------

const PING_ROUNDS: u64 = 10_000;

fn bench_pingpong_smarm(threads: usize) -> (u64, u128) {
    let start = Instant::now();
    smarm::runtime::init(smarm::runtime::Config::exact(threads)).run(|| {
        let (tx_a, rx_a) = smarm::channel::<u64>();
        let (tx_b, rx_b) = smarm::channel::<u64>();
        let ha = smarm::spawn(move || {
            tx_a.send(0).unwrap();
            loop {
                let v = rx_b.recv().unwrap();
                if v >= PING_ROUNDS { break; }
                tx_a.send(v + 1).unwrap();
            }
        });
        let hb = smarm::spawn(move || {
            loop {
                let v = rx_a.recv().unwrap();
                tx_b.send(v + 1).unwrap();
                if v + 1 >= PING_ROUNDS { break; }
            }
        });
        ha.join().unwrap();
        hb.join().unwrap();
    });
    (PING_ROUNDS, start.elapsed().as_micros())
}

fn bench_pingpong_tokio_current() -> (u64, u128) {
    let rt = tokio::runtime::Builder::new_current_thread()
        .enable_all()
        .build()
        .unwrap();
    let start = Instant::now();
    let local = tokio::task::LocalSet::new();
    local.block_on(&rt, async move {
        let (tx_a, mut rx_a) = tokio::sync::mpsc::unbounded_channel::<u64>();
        let (tx_b, mut rx_b) = tokio::sync::mpsc::unbounded_channel::<u64>();
        let ha = tokio::task::spawn_local(async move {
            tx_a.send(0).unwrap();
            loop {
                let v = rx_b.recv().await.unwrap();
                if v >= PING_ROUNDS { break; }
                tx_a.send(v + 1).unwrap();
            }
        });
        let hb = tokio::task::spawn_local(async move {
            loop {
                let v = rx_a.recv().await.unwrap();
                tx_b.send(v + 1).unwrap();
                if v + 1 >= PING_ROUNDS { break; }
            }
        });
        let _ = ha.await;
        let _ = hb.await;
    });
    (PING_ROUNDS, start.elapsed().as_micros())
}

fn bench_pingpong_tokio_multi() -> (u64, u128) {
    let rt = tokio::runtime::Builder::new_multi_thread()
        .worker_threads(2) // ping-pong only needs 2 threads
        .enable_all()
        .build()
        .unwrap();
    let start = Instant::now();
    rt.block_on(async move {
        let (tx_a, mut rx_a) = tokio::sync::mpsc::unbounded_channel::<u64>();
        let (tx_b, mut rx_b) = tokio::sync::mpsc::unbounded_channel::<u64>();
        let ha = tokio::spawn(async move {
            tx_a.send(0).unwrap();
            loop {
                let v = rx_b.recv().await.unwrap();
                if v >= PING_ROUNDS { break; }
                tx_a.send(v + 1).unwrap();
            }
        });
        let hb = tokio::spawn(async move {
            loop {
                let v = rx_a.recv().await.unwrap();
                tx_b.send(v + 1).unwrap();
                if v + 1 >= PING_ROUNDS { break; }
            }
        });
        let _ = ha.await;
        let _ = hb.await;
    });
    (PING_ROUNDS, start.elapsed().as_micros())
}

// ---------------------------------------------------------------------------
// Workload 3: spawn throughput
// ---------------------------------------------------------------------------

const SPAWN_COUNT: u64 = 1_000;

fn bench_spawn_smarm(threads: usize) -> (u64, u128) {
    let counter = Arc::new(AtomicU64::new(0));
    let c = counter.clone();
    let start = Instant::now();
    smarm::runtime::init(smarm::runtime::Config::exact(threads)).run(move || {
        let mut handles = Vec::new();
        for _ in 0..SPAWN_COUNT {
            let cc = c.clone();
            handles.push(smarm::spawn(move || {
                cc.fetch_add(1, Ordering::Relaxed);
            }));
        }
        for h in handles { h.join().unwrap(); }
    });
    (counter.load(Ordering::Relaxed), start.elapsed().as_micros())
}

fn bench_spawn_tokio_current() -> (u64, u128) {
    let counter = Arc::new(AtomicU64::new(0));
    let c = counter.clone();
    let rt = tokio::runtime::Builder::new_current_thread().build().unwrap();
    let start = Instant::now();
    let local = tokio::task::LocalSet::new();
    local.block_on(&rt, async move {
        let mut handles = Vec::new();
        for _ in 0..SPAWN_COUNT {
            let cc = c.clone();
            handles.push(tokio::task::spawn_local(async move {
                cc.fetch_add(1, Ordering::Relaxed);
            }));
        }
        for h in handles { let _ = h.await; }
    });
    (counter.load(Ordering::Relaxed), start.elapsed().as_micros())
}

fn bench_spawn_tokio_multi() -> (u64, u128) {
    let counter = Arc::new(AtomicU64::new(0));
    let c = counter.clone();
    let rt = tokio::runtime::Builder::new_multi_thread()
        .worker_threads(available_threads())
        .build()
        .unwrap();
    let start = Instant::now();
    rt.block_on(async move {
        let mut handles = Vec::new();
        for _ in 0..SPAWN_COUNT {
            let cc = c.clone();
            handles.push(tokio::spawn(async move {
                cc.fetch_add(1, Ordering::Relaxed);
            }));
        }
        for h in handles { let _ = h.await; }
    });
    (counter.load(Ordering::Relaxed), start.elapsed().as_micros())
}

// ---------------------------------------------------------------------------
// main
// ---------------------------------------------------------------------------

fn main() {
    let n = available_threads();
    println!("smarm multi-scheduler benchmarks");
    println!("available parallelism: {n} threads");
    println!("PRIME_N={PRIME_N}, WORKERS={WORKERS}, PING_ROUNDS={PING_ROUNDS}, SPAWN_COUNT={SPAWN_COUNT}");

    // ---- Primes ----
    print_header(&format!("Fan-out/fan-in: count primes in [2, {PRIME_N}) across {WORKERS} workers"));
    run_n("baseline (serial)", ITERS, bench_primes_baseline);
    run_n("smarm single-thread", ITERS, || bench_primes_smarm(1));
    run_n(&format!("smarm {n}-thread"), ITERS, || bench_primes_smarm(n));
    run_n("tokio current_thread", ITERS, bench_primes_tokio_current);
    run_n("tokio multi-thread", ITERS, bench_primes_tokio_multi);

    // ---- Ping-pong ----
    // Each side sends about PING_ROUNDS / 2 messages, so this measures about
    // PING_ROUNDS one-way hops (about PING_ROUNDS / 2 round-trips).
    print_header(&format!("Ping-pong: ~{PING_ROUNDS} messages between two actors"));
    run_n("smarm single-thread", ITERS, || bench_pingpong_smarm(1));
    run_n(&format!("smarm {n}-thread"), ITERS, || bench_pingpong_smarm(n));
    run_n("tokio current_thread", ITERS, bench_pingpong_tokio_current);
    run_n("tokio multi-thread", ITERS, bench_pingpong_tokio_multi);

    // ---- Spawn throughput ----
    print_header(&format!("Spawn throughput: {SPAWN_COUNT} actors spawned and joined"));
    run_n("smarm single-thread", ITERS, || bench_spawn_smarm(1));
    run_n(&format!("smarm {n}-thread"), ITERS, || bench_spawn_smarm(n));
    run_n("tokio current_thread", ITERS, bench_spawn_tokio_current);
    run_n("tokio multi-thread", ITERS, bench_spawn_tokio_multi);
}
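The two ping-pong loops stop on different conditions: `v >= PING_ROUNDS` on A's side and `v + 1 >= PING_ROUNDS` on B's. That makes the amount of work easy to misread.

Below is a minimal sketch that replays the same protocol on OS threads with `std::sync::mpsc` and counts how many messages each side sends. The threads are only a stand-in for actors; they do not model how smarm or tokio schedule tasks. `ping_pong_message_counts` is a hypothetical helper, not part of the bench.

```rust
use std::sync::mpsc;
use std::thread;

/// Replays the `bench_pingpong_*` protocol on OS threads and returns
/// (messages sent by A, messages sent by B).
fn ping_pong_message_counts(rounds: u64) -> (u64, u64) {
    let (tx_a, rx_a) = mpsc::channel::<u64>();
    let (tx_b, rx_b) = mpsc::channel::<u64>();

    // Side A: sends the initial 0, then echoes v + 1 until it sees v >= rounds.
    let a = thread::spawn(move || {
        let mut sent = 0u64;
        tx_a.send(0).unwrap();
        sent += 1;
        loop {
            let v = rx_b.recv().unwrap();
            if v >= rounds {
                break;
            }
            tx_a.send(v + 1).unwrap();
            sent += 1;
        }
        sent
    });

    // Side B: always replies with v + 1, and stops after the reply that
    // reaches `rounds`, so A is guaranteed to see that final value.
    let b = thread::spawn(move || {
        let mut sent = 0u64;
        loop {
            let v = rx_a.recv().unwrap();
            tx_b.send(v + 1).unwrap();
            sent += 1;
            if v + 1 >= rounds {
                break;
            }
        }
        sent
    });

    (a.join().unwrap(), b.join().unwrap())
}
```

With `PING_ROUNDS = 10_000`, A sends 0, 2, …, 10 000 and B sends 1, 3, …, 10 001. That is 5 001 messages each: about 10 000 one-way hops, or roughly 5 000 round-trips.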
408
benches/smarm_favored.rs
Normal file
@@ -0,0 +1,408 @@
//! Benchmarks where smarm's design has a structural advantage.
//!
//! These exist to show what the green-thread + stackful model buys you. The
//! single-thread numbers are the most interesting ones — they isolate the
//! per-switch / per-task cost from any contention story.
//!
//! Workloads:
//!   9.  deep_recursion      — actor recurses RECURSE_DEPTH (500) levels deep,
//!                             then returns. In smarm this is plain stack
//!                             recursion on the actor's mmap'd stack. In tokio,
//!                             async fn can't recurse without boxing — each
//!                             level must `Box::pin` its future. We measure
//!                             both.
//!   10. yield_in_hot_loop   — 2 actors each call yield_now 500k times, so
//!                             control bounces between them. Pure
//!                             context-switch cost; no channels, no
//!                             allocation, no contention. Smarm's switch is
//!                             ~6 GPRs + xmm save and a `ret`; tokio's is
//!                             poll → state-machine → schedule.
//!   11. uncontended_channel — single producer, single consumer, 1M msgs,
//!                             single-threaded runtime. With no cross-thread
//!                             contention, smarm's Arc<Mutex<>> channel only
//!                             ever takes an uncontended lock, and the
//!                             green-thread switch should beat tokio's
//!                             future-polling overhead.
//!   12. catch_unwind_panics — spawn 10k tasks; half panic, half succeed.
//!                             The spawning actor joins each handle and
//!                             tallies Ok/Err. Exploratory — if there's no
//!                             real gap, drop this one.

use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::time::Instant;

// ---------------------------------------------------------------------------
// Shared harness
// ---------------------------------------------------------------------------

const ITERS: u32 = 15;

fn available_threads() -> usize {
    std::thread::available_parallelism().map(|n| n.get()).unwrap_or(1)
}

fn print_header(title: &str) {
    println!("\n{}", "=".repeat(80));
    println!(" {title}");
    println!("{}", "=".repeat(80));
    println!(
        "{:>26} | {:>12} | {:>10} | {:>10} | {:>10}",
        "runtime", "result", "median µs", "min µs", "max µs"
    );
    println!("{}", "-".repeat(80));
}

fn run_n<F: FnMut() -> (u64, u128)>(name: &str, n: u32, mut f: F) {
    let mut times = Vec::new();
    let mut last = 0u64;
    let _ = f(); // warmup
    for _ in 0..n {
        let (v, t) = f();
        times.push(t);
        last = v;
    }
    times.sort_unstable();
    let median = times[times.len() / 2];
    let min = *times.iter().min().unwrap();
    let max = *times.iter().max().unwrap();
    println!(
        "{:>26} | {:>12} | {:>10} | {:>10} | {:>10}",
        name, last, median, min, max
    );
}

// ---------------------------------------------------------------------------
// 9. deep_recursion — 500 levels deep (RECURSE_DEPTH)
// ---------------------------------------------------------------------------

// Each recursive frame holds an `&AtomicU64`, a `u64`, plus prologue/spill —
// conservatively ~64 B/frame on release. Smarm actor stacks are a fixed 64 KiB,
// so 500 levels (~31 KiB) leaves comfortable headroom while still being deep
// enough to contrast plain stack frames with one Box::pin allocation per level.
const RECURSE_DEPTH: u64 = 500;

fn bench_recurse_smarm(threads: usize) -> (u64, u128) {
    let total = Arc::new(AtomicU64::new(0));
    let t2 = total.clone();
    let start = Instant::now();
    smarm::runtime::init(bench_cfg(threads)).run(move || {
        // Plain Rust recursion on the actor's own stack.
        fn recurse(c: &AtomicU64, n: u64) -> u64 {
            if n == 0 {
                c.fetch_add(1, Ordering::Relaxed);
                0
            } else {
                1 + recurse(c, n - 1)
            }
        }
        let h = smarm::spawn(move || {
            let _ = recurse(&t2, RECURSE_DEPTH);
        });
        h.join().unwrap();
    });
    (total.load(Ordering::Relaxed), start.elapsed().as_micros())
}

fn bench_recurse_tokio_current() -> (u64, u128) {
    let counter = Arc::new(AtomicU64::new(0));
    let c2 = counter.clone();
    let rt = tokio::runtime::Builder::new_current_thread().build().unwrap();
    let start = Instant::now();
    let local = tokio::task::LocalSet::new();
    local.block_on(&rt, async move {
        // async fn can't recurse without boxing; each level returns a
        // Box::pin'd future. This is the canonical workaround a real user
        // would write.
        fn recurse(
            c: Arc<AtomicU64>,
            n: u64,
        ) -> std::pin::Pin<Box<dyn std::future::Future<Output = u64>>> {
            Box::pin(async move {
                if n == 0 {
                    c.fetch_add(1, Ordering::Relaxed);
                    0
                } else {
                    1 + recurse(c, n - 1).await
                }
            })
        }
        let h = tokio::task::spawn_local(async move {
            let _ = recurse(c2, RECURSE_DEPTH).await;
        });
        let _ = h.await;
    });
    (counter.load(Ordering::Relaxed), start.elapsed().as_micros())
}

fn bench_recurse_tokio_multi() -> (u64, u128) {
    let counter = Arc::new(AtomicU64::new(0));
    let c2 = counter.clone();
    let rt = tokio::runtime::Builder::new_multi_thread()
        .worker_threads(available_threads())
        .build()
        .unwrap();
    let start = Instant::now();
    rt.block_on(async move {
        fn recurse(
            c: Arc<AtomicU64>,
            n: u64,
        ) -> std::pin::Pin<Box<dyn std::future::Future<Output = u64> + Send>> {
            Box::pin(async move {
                if n == 0 {
                    c.fetch_add(1, Ordering::Relaxed);
                    0
                } else {
                    1 + recurse(c, n - 1).await
                }
            })
        }
        let h = tokio::spawn(async move {
            let _ = recurse(c2, RECURSE_DEPTH).await;
        });
        let _ = h.await;
    });
    (counter.load(Ordering::Relaxed), start.elapsed().as_micros())
}

// ---------------------------------------------------------------------------
// 10. yield_in_hot_loop — 2 actors, 500k yields each, single thread
// ---------------------------------------------------------------------------

const HOT_YIELDS: u64 = 500_000;

fn bench_hot_smarm() -> (u64, u128) {
    let start = Instant::now();
    smarm::runtime::init(bench_cfg(1)).run(|| {
        let ha = smarm::spawn(|| {
            for _ in 0..HOT_YIELDS {
                smarm::yield_now();
            }
        });
        let hb = smarm::spawn(|| {
            for _ in 0..HOT_YIELDS {
                smarm::yield_now();
            }
        });
        ha.join().unwrap();
        hb.join().unwrap();
    });
    (HOT_YIELDS * 2, start.elapsed().as_micros())
}

fn bench_hot_tokio_current() -> (u64, u128) {
    let rt = tokio::runtime::Builder::new_current_thread().build().unwrap();
    let start = Instant::now();
    let local = tokio::task::LocalSet::new();
    local.block_on(&rt, async move {
        let ha = tokio::task::spawn_local(async move {
            for _ in 0..HOT_YIELDS {
                tokio::task::yield_now().await;
            }
        });
        let hb = tokio::task::spawn_local(async move {
            for _ in 0..HOT_YIELDS {
                tokio::task::yield_now().await;
            }
        });
        let _ = ha.await;
        let _ = hb.await;
    });
    (HOT_YIELDS * 2, start.elapsed().as_micros())
}

// ---------------------------------------------------------------------------
// 11. uncontended_channel — 1 producer, 1 consumer, 1M msgs, single-threaded
// ---------------------------------------------------------------------------

const UNCONT_MSGS: u64 = 1_000_000;

fn bench_unc_smarm() -> (u64, u128) {
    let start = Instant::now();
    smarm::runtime::init(bench_cfg(1)).run(|| {
        let (tx, rx) = smarm::channel::<u64>();
        let consumer = smarm::spawn(move || {
            let mut count = 0u64;
            while rx.recv().is_ok() {
                count += 1;
            }
            let _ = count; // discard; the spawned closure must return ()
        });
        let producer = smarm::spawn(move || {
            for i in 0..UNCONT_MSGS {
                tx.send(i).unwrap();
            }
            // tx drops here, closing the channel.
        });
        producer.join().unwrap();
        consumer.join().unwrap();
    });
    (UNCONT_MSGS, start.elapsed().as_micros())
}

fn bench_unc_tokio_current() -> (u64, u128) {
    let rt = tokio::runtime::Builder::new_current_thread().build().unwrap();
    let start = Instant::now();
    let local = tokio::task::LocalSet::new();
    local.block_on(&rt, async move {
        let (tx, mut rx) = tokio::sync::mpsc::unbounded_channel::<u64>();
        let consumer = tokio::task::spawn_local(async move {
            let mut count = 0u64;
            while rx.recv().await.is_some() {
                count += 1;
            }
            count
        });
        let producer = tokio::task::spawn_local(async move {
            for i in 0..UNCONT_MSGS {
                tx.send(i).unwrap();
            }
        });
        let _ = producer.await;
        let _ = consumer.await;
    });
    (UNCONT_MSGS, start.elapsed().as_micros())
}
|
||||
// ---------------------------------------------------------------------------
|
||||
// 12. catch_unwind_panics — 10k tasks, half panic
// ---------------------------------------------------------------------------

const PANIC_TASKS: u64 = 10_000;

fn bench_panic_smarm(threads: usize) -> (u64, u128) {
    let ok = Arc::new(AtomicU64::new(0));
    let err = Arc::new(AtomicU64::new(0));
    let ok2 = ok.clone();
    let err2 = err.clone();
    let start = Instant::now();
    smarm::runtime::init(bench_cfg(threads)).run(move || {
        let mut handles = Vec::new();
        for i in 0..PANIC_TASKS {
            handles.push(smarm::spawn(move || {
                if i % 2 == 0 {
                    panic!("planned");
                }
            }));
        }
        for h in handles {
            match h.join() {
                Ok(()) => { ok2.fetch_add(1, Ordering::Relaxed); }
                Err(_) => { err2.fetch_add(1, Ordering::Relaxed); }
            }
        }
    });
    let total = ok.load(Ordering::Relaxed) + err.load(Ordering::Relaxed);
    (total, start.elapsed().as_micros())
}

fn bench_panic_tokio_current() -> (u64, u128) {
    let ok = Arc::new(AtomicU64::new(0));
    let err = Arc::new(AtomicU64::new(0));
    let ok2 = ok.clone();
    let err2 = err.clone();
    let rt = tokio::runtime::Builder::new_current_thread().build().unwrap();
    let start = Instant::now();
    let local = tokio::task::LocalSet::new();
    local.block_on(&rt, async move {
        let mut handles = Vec::new();
        for i in 0..PANIC_TASKS {
            handles.push(tokio::task::spawn_local(async move {
                if i % 2 == 0 {
                    panic!("planned");
                }
            }));
        }
        for h in handles {
            match h.await {
                Ok(()) => { ok2.fetch_add(1, Ordering::Relaxed); }
                Err(_) => { err2.fetch_add(1, Ordering::Relaxed); }
            }
        }
    });
    let total = ok.load(Ordering::Relaxed) + err.load(Ordering::Relaxed);
    (total, start.elapsed().as_micros())
}

fn bench_panic_tokio_multi() -> (u64, u128) {
    let ok = Arc::new(AtomicU64::new(0));
    let err = Arc::new(AtomicU64::new(0));
    let ok2 = ok.clone();
    let err2 = err.clone();
    let rt = tokio::runtime::Builder::new_multi_thread()
        .worker_threads(available_threads())
        .build()
        .unwrap();
    let start = Instant::now();
    rt.block_on(async move {
        let mut handles = Vec::new();
        for i in 0..PANIC_TASKS {
            handles.push(tokio::spawn(async move {
                if i % 2 == 0 {
                    panic!("planned");
                }
            }));
        }
        for h in handles {
            match h.await {
                Ok(()) => { ok2.fetch_add(1, Ordering::Relaxed); }
                Err(_) => { err2.fetch_add(1, Ordering::Relaxed); }
            }
        }
    });
    let total = ok.load(Ordering::Relaxed) + err.load(Ordering::Relaxed);
    (total, start.elapsed().as_micros())
}

// ---------------------------------------------------------------------------
// main
// ---------------------------------------------------------------------------


// ---------------------------------------------------------------------------
// Knob helper — reads SMARM_ALLOC_INTERVAL / SMARM_TIMESLICE_CYCLES env vars
// so the sweep script can override the preemption knobs without recompiling.
// ---------------------------------------------------------------------------

fn bench_cfg(threads: usize) -> smarm::runtime::Config {
    let mut cfg = smarm::runtime::Config::exact(threads);
    if let Ok(v) = std::env::var("SMARM_ALLOC_INTERVAL") {
        if let Ok(n) = v.parse::<u32>() { cfg = cfg.alloc_interval(n); }
    }
    if let Ok(v) = std::env::var("SMARM_TIMESLICE_CYCLES") {
        if let Ok(n) = v.parse::<u64>() { cfg = cfg.timeslice_cycles(n); }
    }
    cfg
}

fn main() {
    let n = available_threads();
    println!("smarm smarm-favored benchmarks");
    println!("available parallelism: {n} threads");
    println!("ITERS={ITERS} (+1 warmup, discarded)");
    println!(
        "RECURSE_DEPTH={RECURSE_DEPTH}, HOT_YIELDS={HOT_YIELDS}×2, \
         UNCONT_MSGS={UNCONT_MSGS}, PANIC_TASKS={PANIC_TASKS}"
    );

    // ---- 9. deep_recursion ----
    print_header(&format!("deep_recursion: depth {RECURSE_DEPTH}"));
    run_n("smarm 1-thread", ITERS, || bench_recurse_smarm(1));
    run_n(&format!("smarm {n}-thread"), ITERS, || bench_recurse_smarm(n));
    run_n("tokio current_thread", ITERS, bench_recurse_tokio_current);
    run_n("tokio multi-thread", ITERS, bench_recurse_tokio_multi);

    // ---- 10. yield_in_hot_loop ----
    print_header(&format!("yield_in_hot_loop: 2 actors × {HOT_YIELDS} yields (single thread)"));
    run_n("smarm 1-thread", ITERS, bench_hot_smarm);
    run_n("tokio current_thread", ITERS, bench_hot_tokio_current);

    // ---- 11. uncontended_channel ----
    print_header(&format!("uncontended_channel: 1→1, {UNCONT_MSGS} msgs (single thread)"));
    run_n("smarm 1-thread", ITERS, bench_unc_smarm);
    run_n("tokio current_thread", ITERS, bench_unc_tokio_current);

    // ---- 12. catch_unwind_panics ----
    print_header(&format!("catch_unwind_panics: {PANIC_TASKS} tasks, 50% panic"));
    run_n("smarm 1-thread", ITERS, || bench_panic_smarm(1));
    run_n(&format!("smarm {n}-thread"), ITERS, || bench_panic_smarm(n));
    run_n("tokio current_thread", ITERS, bench_panic_tokio_current);
    run_n("tokio multi-thread", ITERS, bench_panic_tokio_multi);
}
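Each row these benches print goes through a `run_n` harness like the one defined in `benches/tokio_favored.rs` (assuming this file's copy matches): one warmup call is discarded, the remaining timings are sorted, and `times[times.len() / 2]` is reported as the median beside min and max, using the format string `"{:>26} | {:>12} | {:>10} | {:>10} | {:>10}"`. As a minimal sketch, the Python below mirrors that summary and row layout, then checks that the `ROW_RE` pattern from `benches/sweep.py` reads the same numbers back. `summarize` and `format_row` are hypothetical helpers, not part of the repository, and the timings are synthetic.

```python
import re

# Same pattern as ROW_RE in benches/sweep.py, split across two raw strings.
ROW_RE = re.compile(
    r"^\s*(?P<name>[^|]+?)\s*\|\s*(?P<result>\d+)\s*\|\s*(?P<median>\d+)"
    r"\s*\|\s*(?P<min>\d+)\s*\|\s*(?P<max>\d+)\s*$"
)


def summarize(timings_us: list[int]) -> tuple[int, int, int]:
    """Return (median, min, max) the way run_n does; warmup already excluded.

    Like `times[times.len() / 2]` after sorting, an even-length list yields
    the upper of the two middle values.
    """
    ordered = sorted(timings_us)
    return ordered[len(ordered) // 2], ordered[0], ordered[-1]


def format_row(name: str, result: int, timings_us: list[int]) -> str:
    """Python equivalent of the Rust "{:>26} | {:>12} | {:>10} | {:>10} | {:>10}"."""
    median, lo, hi = summarize(timings_us)
    return f"{name:>26} | {result:>12} | {median:>10} | {lo:>10} | {hi:>10}"


# Synthetic timings in µs, purely for illustration.
row = format_row("smarm 1-thread", 1_000_000, [120, 98, 105, 101, 99])
m = ROW_RE.match(row)
print(row)
print(m.group("name").strip(), m.group("median"))
```

Because the median is an index into the sorted samples rather than an average, a single slow outlier among the `ITERS` runs shows up only in the max column.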
347
benches/sweep.py
Executable file
@@ -0,0 +1,347 @@
#!/usr/bin/env python3
"""
smarm bench sweep + regression checker.

Usage:
    # Run a full knob sweep and print a comparison table:
    python3 benches/sweep.py sweep

    # Check the current build against the committed baseline:
    python3 benches/sweep.py regress

    # Run all benches once (default knobs) and print results:
    python3 benches/sweep.py run

The sweep grid is defined in SWEEP_GRID below.
The regression baseline is loaded from benches/baseline.json.
"""

import argparse
import json
import os
import re
import subprocess
import sys
from pathlib import Path

# ---------------------------------------------------------------------------
# Configuration
# ---------------------------------------------------------------------------

REPO = Path(__file__).resolve().parent.parent

# Bench files to run (primes + multi_scheduler omitted — legacy harness,
# not part of the 12-bench suite, and insensitive to the preemption knobs).
BENCHES = ["general", "tokio_favored", "smarm_favored"]

# Knob sweep grid: (alloc_interval, timeslice_cycles)
# alloc_interval: lower = check RDTSC more often = finer preemption
# timeslice_cycles: lower = shorter timeslice = more cooperative
SWEEP_GRID = [
    (32, 150_000),
    (64, 150_000),
    (128, 150_000),  # default interval, shorter slice
    (32, 300_000),
    (64, 300_000),
    (128, 300_000),  # <<< baseline (defaults)
    (256, 300_000),
    (512, 300_000),
    (128, 600_000),
    (128, 1_200_000),
]

# Regression threshold: warn if median is more than this % worse than baseline.
REGRESSION_THRESHOLD_PCT = 10

# ---------------------------------------------------------------------------
# Parsing
# ---------------------------------------------------------------------------

# Match lines like:
# "            smarm 1-thread |      1000000 |      31473 |      28719 |      33113"
ROW_RE = re.compile(
    r"^\s*(?P<name>[^|]+?)\s*\|\s*(?P<result>\d+)\s*\|\s*(?P<median>\d+)\s*\|\s*(?P<min>\d+)\s*\|\s*(?P<max>\d+)\s*$"
)

# Match section headers like:
# "  chained_spawn: depth 1000"
HEADER_RE = re.compile(r"^\s{2}(?P<bench>[a-z_]+)[:—]")


def parse_output(text: str) -> dict[str, dict[str, dict]]:
    """
    Returns {bench_name: {runtime_label: {median, min, max, result}}}.
    bench_name is the snake_case name extracted from the section header.
    """
    results: dict[str, dict[str, dict]] = {}
    current_bench = None

    for line in text.splitlines():
        hm = HEADER_RE.match(line)
        if hm:
            current_bench = hm.group("bench")
            results.setdefault(current_bench, {})
            continue

        if current_bench is None:
            continue

        rm = ROW_RE.match(line)
        if rm:
            label = rm.group("name").strip()
            results[current_bench][label] = {
                "result": int(rm.group("result")),
                "median": int(rm.group("median")),
                "min": int(rm.group("min")),
                "max": int(rm.group("max")),
            }

    return results


# ---------------------------------------------------------------------------
# Running
# ---------------------------------------------------------------------------

def run_benches(env_extra: dict[str, str] | None = None) -> dict[str, dict[str, dict]]:
    """Run all BENCHES and return merged parsed results."""
    env = os.environ.copy()
    if env_extra:
        env.update(env_extra)

    all_results: dict[str, dict[str, dict]] = {}

    for bench in BENCHES:
        cmd = ["cargo", "bench", "--bench", bench]
        proc = subprocess.run(
            cmd,
            cwd=REPO,
            env=env,
            capture_output=True,
            text=True,
        )
        if proc.returncode != 0:
            print(f" ERROR running {bench}:\n{proc.stderr[-800:]}", file=sys.stderr)
            continue
        parsed = parse_output(proc.stdout)
        all_results.update(parsed)

    return all_results


# ---------------------------------------------------------------------------
# Baseline JSON
# ---------------------------------------------------------------------------

BASELINE_PATH = REPO / "benches" / "baseline.json"


def load_baseline() -> dict:
    if not BASELINE_PATH.exists():
        sys.exit(
            f"No baseline found at {BASELINE_PATH}.\n"
            "Create one with: python3 benches/sweep.py run --save-baseline"
        )
    return json.loads(BASELINE_PATH.read_text())


def save_baseline(results: dict) -> None:
    BASELINE_PATH.write_text(json.dumps(results, indent=2))
    print(f"Baseline saved to {BASELINE_PATH}")


# ---------------------------------------------------------------------------
# Regression check
# ---------------------------------------------------------------------------

def check_regressions(current: dict, baseline: dict) -> bool:
    """
    Compare current results to baseline, printing regressions, improvements,
    and entries missing from the current run.
    Returns True if any regression or missing entry was found.
    """
    any_regression = False

    for bench, runtimes in baseline.items():
        cur_bench = current.get(bench, {})
        for label, base_data in runtimes.items():
            cur_data = cur_bench.get(label)
            if cur_data is None:
                print(f" MISSING {bench}/{label} — not present in current run")
                any_regression = True
                continue

            base_med = base_data["median"]
            cur_med = cur_data["median"]
            if base_med == 0:
                continue

            pct = (cur_med - base_med) / base_med * 100
            if pct > REGRESSION_THRESHOLD_PCT:
                print(
                    f" REGRESSION {bench}/{label}: "
                    f"{base_med} → {cur_med} µs ({pct:+.1f}%)"
                )
                any_regression = True
            elif pct < -REGRESSION_THRESHOLD_PCT:
                print(
                    f" IMPROVEMENT {bench}/{label}: "
                    f"{base_med} → {cur_med} µs ({pct:+.1f}%)"
                )

    return any_regression


# ---------------------------------------------------------------------------
# Pretty print
# ---------------------------------------------------------------------------

def print_results(results: dict, label: str = "") -> None:
    if label:
        print(f"\n{'='*70}")
        print(f" {label}")
        print(f"{'='*70}")
    for bench, runtimes in sorted(results.items()):
        print(f"\n [{bench}]")
        print(f" {'runtime':>28} | {'result':>10} | {'median µs':>10} | {'min':>8} | {'max':>8}")
        print(f" {'-'*75}")
        for rt_label, data in runtimes.items():
            print(
                f" {rt_label:>28} | {data['result']:>10} | "
                f"{data['median']:>10} | {data['min']:>8} | {data['max']:>8}"
            )


def print_sweep_table(sweep_results: list[tuple[int, int, dict]]) -> None:
    """Print a compact comparison across sweep points for each bench/runtime."""
    # Collect all bench/label pairs
    all_keys: list[tuple[str, str]] = []
    for _, _, results in sweep_results:
        for bench, runtimes in results.items():
            for label in runtimes:
                key = (bench, label)
                if key not in all_keys:
                    all_keys.append(key)

    # Header: size columns to the widest tag (e.g. "ai=128/tc=1200k") so the
    # header and value cells stay aligned.
    tags = [f"ai={interval}/tc={cycles//1000}k" for interval, cycles, _ in sweep_results]
    col_w = max([12] + [len(tag) for tag in tags])
    print(f"\n{'bench/runtime':<45}", end="")
    for tag in tags:
        print(f"  {tag:>{col_w}}", end="")
    print()
    print("-" * (45 + (col_w + 2) * len(sweep_results)))

    for bench, label in all_keys:
        key_str = f"{bench}/{label}"
        print(f"  {key_str:<43}", end="")
        for _, _, results in sweep_results:
            val = results.get(bench, {}).get(label, {}).get("median")
            cell = str(val) if val is not None else "—"
            print(f"  {cell:>{col_w}}", end="")
        print()


# ---------------------------------------------------------------------------
# Subcommands
# ---------------------------------------------------------------------------

def cmd_run(args) -> None:
    print("Building release binaries…")
    subprocess.run(
        ["cargo", "build", "--release", "--benches"],
        cwd=REPO, check=True, capture_output=True,
    )
    print("Running benches…")
    results = run_benches()
    print_results(results, "Results (default knobs)")
    if args.save_baseline:
        save_baseline(results)


def cmd_regress(args) -> None:
    baseline = load_baseline()
    print("Building release binaries…")
    subprocess.run(
        ["cargo", "build", "--release", "--benches"],
        cwd=REPO, check=True, capture_output=True,
    )
    print("Running benches…")
    current = run_benches()
    print_results(current, "Current results")
    print(f"\nRegression check (threshold: >{REGRESSION_THRESHOLD_PCT}% slower than baseline)")
    print("-" * 60)
    found = check_regressions(current, baseline)
    if not found:
        print(" No regressions detected.")
    sys.exit(1 if found else 0)


def cmd_sweep(args) -> None:
    print("Building release binaries (once)…")
    subprocess.run(
        ["cargo", "build", "--release", "--benches"],
        cwd=REPO, check=True, capture_output=True,
    )
    # Benches are pre-built; env vars change runtime behaviour, no recompile needed.
    sweep_results: list[tuple[int, int, dict]] = []

    for interval, cycles in SWEEP_GRID:
        tag = f"alloc_interval={interval}, timeslice_cycles={cycles}"
        print(f" Running: {tag} …", flush=True)
        env_extra = {
            "SMARM_ALLOC_INTERVAL": str(interval),
            "SMARM_TIMESLICE_CYCLES": str(cycles),
        }
        results = run_benches(env_extra)
        sweep_results.append((interval, cycles, results))

    print_sweep_table(sweep_results)

    if args.save_csv:
        import csv
        rows = []
        for interval, cycles, results in sweep_results:
            for bench, runtimes in results.items():
                for label, data in runtimes.items():
                    rows.append({
                        "alloc_interval": interval,
                        "timeslice_cycles": cycles,
                        "bench": bench,
                        "runtime": label,
                        **data,
                    })
        if not rows:
            # Every bench failed; DictWriter needs at least one row for its field names.
            print("\nNo sweep results to write; CSV not saved.", file=sys.stderr)
            return
        with open(args.save_csv, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=rows[0].keys())
            writer.writeheader()
            writer.writerows(rows)
        print(f"\nCSV saved to {args.save_csv}")


# ---------------------------------------------------------------------------
# Entry point
# ---------------------------------------------------------------------------

def main() -> None:
    parser = argparse.ArgumentParser(description=__doc__, formatter_class=argparse.RawDescriptionHelpFormatter)
    sub = parser.add_subparsers(dest="cmd", required=True)

    p_run = sub.add_parser("run", help="Run benches once with default knobs")
    p_run.add_argument("--save-baseline", action="store_true",
                       help="Save results as the regression baseline")
    p_run.set_defaults(func=cmd_run)

    p_reg = sub.add_parser("regress", help="Check current results against baseline")
    p_reg.set_defaults(func=cmd_regress)

    p_sw = sub.add_parser("sweep", help="Sweep preemption knobs and compare")
    p_sw.add_argument("--save-csv", metavar="FILE",
                      help="Write full sweep results to a CSV file")
    p_sw.set_defaults(func=cmd_sweep)

    args = parser.parse_args()
    args.func(args)


if __name__ == "__main__":
    main()
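`check_regressions` looks only at medians: the change is `(current - baseline) / baseline * 100`, anything above `REGRESSION_THRESHOLD_PCT` (10) is a regression, anything below -10 is reported as an improvement, a baseline median of 0 is skipped, and a label absent from the current run also fails the check. As a minimal sketch, the Python below restates that decision rule as a pure function so it can be exercised without invoking cargo. `classify` and `regress_exit_code` are hypothetical names, and the medians are illustrative, not measurements.

```python
from typing import Optional

REGRESSION_THRESHOLD_PCT = 10  # same value as in benches/sweep.py


def classify(base_median: int, cur_median: Optional[int],
             threshold_pct: float = REGRESSION_THRESHOLD_PCT) -> str:
    """Label one bench/runtime pair following the rules in check_regressions()."""
    if cur_median is None:
        return "missing"      # absent from the current run: fails the check
    if base_median == 0:
        return "skipped"      # a zero baseline would divide by zero; ignored
    pct = (cur_median - base_median) / base_median * 100
    if pct > threshold_pct:
        return "regression"   # strictly more than threshold_pct slower
    if pct < -threshold_pct:
        return "improvement"  # reported, but never fails the check
    return "ok"


def regress_exit_code(labels: list[str]) -> int:
    """Mirror cmd_regress: exit 1 if anything regressed or went missing, else 0."""
    return 1 if any(label in ("regression", "missing") for label in labels) else 0


# Illustrative (baseline, current) medians in µs; not real measurements.
pairs = [(1000, 1105), (1000, 1100), (1000, 890), (0, 50), (1000, None)]
labels = [classify(base, cur) for base, cur in pairs]
print(labels, regress_exit_code(labels))
```

Because the comparison is strict, a median exactly 10% slower than the baseline is not flagged; only changes beyond the threshold in either direction are printed.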
487
benches/tokio_favored.rs
Normal file
@@ -0,0 +1,487 @@
//! Benchmarks where tokio's design has a structural advantage.
//!
//! These exist to *measure* the cost of smarm's design choices, not to flatter
//! either runtime. Expect tokio to win these; the value is in knowing by how
//! much, and in catching regressions where the gap widens.
//!
//! Workloads:
//! 5. spawn_storm_busy    — keep N workers busy with yielding tasks, then
//!                          spawn 10k zero-work tasks and join. Adapted from
//!                          tokio's `spawn_many_remote_busy1`. Tokio's
//!                          work-stealing deques + per-worker LIFO slot
//!                          should beat smarm's single global Mutex<>
//!                          run queue.
//! 6. mpsc_contention     — 32 producer actors, 1 consumer, 10k messages
//!                          each. Tokio's mpsc is lock-free on the hot path;
//!                          smarm's channel is Arc<Mutex<Inner>> per channel
//!                          *and* takes the runtime mutex on each unpark.
//! 7. many_timers         — 10k actors each sleep for a short, deterministic
//!                          pseudo-random duration (1–10 ms), all waking within
//!                          a tight window. Tokio's per-worker sharded timer
//!                          wheel vs smarm's single shared min-heap (and single
//!                          drain-lock winner).
//! 8. multi_thread_scaling— primes again, but sweep thread count 1, 2, 4,
//!                          available_parallelism(). Smarm's mutex ceiling
//!                          should show up as soon as scheduling overhead
//!                          is non-trivial relative to per-actor work.

use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::Arc;
use std::time::{Duration, Instant};

// ---------------------------------------------------------------------------
// Shared harness
// ---------------------------------------------------------------------------

const ITERS: u32 = 15;

fn available_threads() -> usize {
    std::thread::available_parallelism().map(|n| n.get()).unwrap_or(1)
}

fn print_header(title: &str) {
    println!("\n{}", "=".repeat(80));
    // Two-space indent: HEADER_RE in benches/sweep.py (`^\s{2}`) keys on it.
    println!("  {title}");
    println!("{}", "=".repeat(80));
    println!(
        "{:>26} | {:>12} | {:>10} | {:>10} | {:>10}",
        "runtime", "result", "median µs", "min µs", "max µs"
    );
    println!("{}", "-".repeat(80));
}

fn run_n<F: FnMut() -> (u64, u128)>(name: &str, n: u32, mut f: F) {
    let mut times = Vec::new();
    let mut last = 0u64;
    let _ = f(); // warmup
    for _ in 0..n {
        let (v, t) = f();
        times.push(t);
        last = v;
    }
    times.sort_unstable();
    let median = times[times.len() / 2];
    let min = *times.iter().min().unwrap();
    let max = *times.iter().max().unwrap();
    println!(
        "{:>26} | {:>12} | {:>10} | {:>10} | {:>10}",
        name, last, median, min, max
    );
}

// ---------------------------------------------------------------------------
// 5. spawn_storm_busy — workers loaded, then storm of zero-work spawns
// ---------------------------------------------------------------------------

const STORM_BACKGROUND: u64 = 8; // number of background "busy" actors
const STORM_SPAWN: u64 = 10_000; // zero-work spawns to time

fn bench_storm_smarm(threads: usize) -> (u64, u128) {
    let counter = Arc::new(AtomicU64::new(0));
    let stop = Arc::new(AtomicBool::new(false));
    let c2 = counter.clone();
    let s2 = stop.clone();

    let start = Instant::now();
    smarm::runtime::init(bench_cfg(threads)).run(move || {
        // Background actors: yield in a tight loop until told to stop.
        let mut bg_handles = Vec::new();
        for _ in 0..STORM_BACKGROUND {
            let s = s2.clone();
            bg_handles.push(smarm::spawn(move || {
                while !s.load(Ordering::Relaxed) {
                    smarm::yield_now();
                }
            }));
        }

        // Storm: spawn 10k zero-work actors and join them all.
        let mut handles = Vec::new();
        for _ in 0..STORM_SPAWN {
            let cc = c2.clone();
            handles.push(smarm::spawn(move || {
                cc.fetch_add(1, Ordering::Relaxed);
            }));
        }
        for h in handles { h.join().unwrap(); }

        // Tear down background.
        s2.store(true, Ordering::Relaxed);
        for h in bg_handles { h.join().unwrap(); }
    });
    (counter.load(Ordering::Relaxed), start.elapsed().as_micros())
}

fn bench_storm_tokio_current() -> (u64, u128) {
    let counter = Arc::new(AtomicU64::new(0));
    let stop = Arc::new(AtomicBool::new(false));
    let c2 = counter.clone();
    let s2 = stop.clone();

    let rt = tokio::runtime::Builder::new_current_thread().build().unwrap();
    let start = Instant::now();
    let local = tokio::task::LocalSet::new();
    local.block_on(&rt, async move {
        let mut bg_handles = Vec::new();
        for _ in 0..STORM_BACKGROUND {
            let s = s2.clone();
            bg_handles.push(tokio::task::spawn_local(async move {
                while !s.load(Ordering::Relaxed) {
                    tokio::task::yield_now().await;
                }
            }));
        }
        let mut handles = Vec::new();
        for _ in 0..STORM_SPAWN {
            let cc = c2.clone();
            handles.push(tokio::task::spawn_local(async move {
                cc.fetch_add(1, Ordering::Relaxed);
            }));
        }
        for h in handles { let _ = h.await; }
        s2.store(true, Ordering::Relaxed);
        for h in bg_handles { let _ = h.await; }
    });
    (counter.load(Ordering::Relaxed), start.elapsed().as_micros())
}

fn bench_storm_tokio_multi() -> (u64, u128) {
    let counter = Arc::new(AtomicU64::new(0));
    let stop = Arc::new(AtomicBool::new(false));
    let c2 = counter.clone();
    let s2 = stop.clone();

    let rt = tokio::runtime::Builder::new_multi_thread()
        .worker_threads(available_threads())
        .build()
        .unwrap();
    let start = Instant::now();
    rt.block_on(async move {
        let mut bg_handles = Vec::new();
        for _ in 0..STORM_BACKGROUND {
            let s = s2.clone();
            bg_handles.push(tokio::spawn(async move {
                while !s.load(Ordering::Relaxed) {
                    tokio::task::yield_now().await;
                }
            }));
        }
        let mut handles = Vec::new();
        for _ in 0..STORM_SPAWN {
            let cc = c2.clone();
            handles.push(tokio::spawn(async move {
                cc.fetch_add(1, Ordering::Relaxed);
            }));
        }
        for h in handles { let _ = h.await; }
        s2.store(true, Ordering::Relaxed);
        for h in bg_handles { let _ = h.await; }
    });
    (counter.load(Ordering::Relaxed), start.elapsed().as_micros())
}

// ---------------------------------------------------------------------------
// 6. mpsc_contention — 32 producers × 10k msgs into 1 consumer
// ---------------------------------------------------------------------------

const MPSC_PRODUCERS: u64 = 32;
const MPSC_PER_PRODUCER: u64 = 10_000;

fn bench_mpsc_smarm(threads: usize) -> (u64, u128) {
    let start = Instant::now();
    smarm::runtime::init(bench_cfg(threads)).run(|| {
        let (tx, rx) = smarm::channel::<u64>();
        let mut prod_handles = Vec::new();
        for p in 0..MPSC_PRODUCERS {
            let tx = tx.clone();
            prod_handles.push(smarm::spawn(move || {
                for i in 0..MPSC_PER_PRODUCER {
                    tx.send(p * MPSC_PER_PRODUCER + i).unwrap();
                }
            }));
        }
        drop(tx); // close once producers drop
        let consumer = smarm::spawn(move || {
            let mut count = 0u64;
            while let Ok(_) = rx.recv() {
                count += 1;
            }
            let _ = count; // discard; run() closure must return ()
        });
        for h in prod_handles { h.join().unwrap(); }
        let _ = consumer.join().unwrap();
    });
    (MPSC_PRODUCERS * MPSC_PER_PRODUCER, start.elapsed().as_micros())
}

fn bench_mpsc_tokio_current() -> (u64, u128) {
    let rt = tokio::runtime::Builder::new_current_thread().build().unwrap();
    let start = Instant::now();
    let local = tokio::task::LocalSet::new();
    local.block_on(&rt, async move {
        let (tx, mut rx) = tokio::sync::mpsc::unbounded_channel::<u64>();
        let mut prod_handles = Vec::new();
        for p in 0..MPSC_PRODUCERS {
            let tx = tx.clone();
            prod_handles.push(tokio::task::spawn_local(async move {
                for i in 0..MPSC_PER_PRODUCER {
                    tx.send(p * MPSC_PER_PRODUCER + i).unwrap();
                }
            }));
        }
        drop(tx);
        let consumer = tokio::task::spawn_local(async move {
            let mut count = 0u64;
            while let Some(_) = rx.recv().await {
                count += 1;
            }
            count
        });
        for h in prod_handles { let _ = h.await; }
        let _ = consumer.await;
    });
    (MPSC_PRODUCERS * MPSC_PER_PRODUCER, start.elapsed().as_micros())
}

fn bench_mpsc_tokio_multi() -> (u64, u128) {
    let rt = tokio::runtime::Builder::new_multi_thread()
        .worker_threads(available_threads())
        .build()
        .unwrap();
    let start = Instant::now();
    rt.block_on(async move {
        let (tx, mut rx) = tokio::sync::mpsc::unbounded_channel::<u64>();
        let mut prod_handles = Vec::new();
        for p in 0..MPSC_PRODUCERS {
            let tx = tx.clone();
            prod_handles.push(tokio::spawn(async move {
                for i in 0..MPSC_PER_PRODUCER {
                    tx.send(p * MPSC_PER_PRODUCER + i).unwrap();
                }
            }));
        }
        drop(tx);
        let consumer = tokio::spawn(async move {
            let mut count = 0u64;
            while let Some(_) = rx.recv().await {
                count += 1;
            }
            count
        });
        for h in prod_handles { let _ = h.await; }
        let _ = consumer.await;
    });
    (MPSC_PRODUCERS * MPSC_PER_PRODUCER, start.elapsed().as_micros())
}

// ---------------------------------------------------------------------------
// 7. many_timers — 10k sleeping actors waking in a tight window
// ---------------------------------------------------------------------------

const TIMER_ACTORS: u64 = 10_000;
const TIMER_MIN_MS: u64 = 1;
const TIMER_MAX_MS: u64 = 10;

// Deterministic per-actor delay so iterations are comparable.
// (i * 2654435761) >> 32 is roughly floor(i * 0.618), since 2654435761 is
// Knuth's multiplicative-hash constant (about 2^32 / φ); taken mod 10, the
// delays spread roughly evenly over TIMER_MIN_MS..=TIMER_MAX_MS.
fn timer_delay_ms(i: u64) -> u64 {
    TIMER_MIN_MS + ((i * 2654435761u64) >> 32) % (TIMER_MAX_MS - TIMER_MIN_MS + 1)
}

fn bench_timers_smarm(threads: usize) -> (u64, u128) {
    let start = Instant::now();
    smarm::runtime::init(bench_cfg(threads)).run(|| {
        let mut handles = Vec::new();
        for i in 0..TIMER_ACTORS {
            let ms = timer_delay_ms(i);
            handles.push(smarm::spawn(move || {
                smarm::sleep(Duration::from_millis(ms));
            }));
        }
        for h in handles { h.join().unwrap(); }
    });
    (TIMER_ACTORS, start.elapsed().as_micros())
}

fn bench_timers_tokio_current() -> (u64, u128) {
    let rt = tokio::runtime::Builder::new_current_thread()
        .enable_time()
        .build()
        .unwrap();
    let start = Instant::now();
    let local = tokio::task::LocalSet::new();
    local.block_on(&rt, async move {
        let mut handles = Vec::new();
        for i in 0..TIMER_ACTORS {
            let ms = timer_delay_ms(i);
            handles.push(tokio::task::spawn_local(async move {
                tokio::time::sleep(Duration::from_millis(ms)).await;
            }));
        }
        for h in handles { let _ = h.await; }
    });
    (TIMER_ACTORS, start.elapsed().as_micros())
}

fn bench_timers_tokio_multi() -> (u64, u128) {
    let rt = tokio::runtime::Builder::new_multi_thread()
        .worker_threads(available_threads())
        .enable_time()
        .build()
        .unwrap();
    let start = Instant::now();
    rt.block_on(async move {
        let mut handles = Vec::new();
        for i in 0..TIMER_ACTORS {
            let ms = timer_delay_ms(i);
            handles.push(tokio::spawn(async move {
                tokio::time::sleep(Duration::from_millis(ms)).await;
            }));
        }
        for h in handles { let _ = h.await; }
    });
    (TIMER_ACTORS, start.elapsed().as_micros())
}

// ---------------------------------------------------------------------------
// 8. multi_thread_scaling — primes, sweep thread count
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
const SCALING_N: u64 = 400_000;
|
||||
const SCALING_WORKERS: u64 = 64;
|
||||
|
||||
fn is_prime(n: u64) -> bool {
|
||||
if n < 2 { return false; }
|
||||
if n < 4 { return true; }
|
||||
if n % 2 == 0 { return false; }
|
||||
let mut i = 3u64;
|
||||
while i * i <= n { if n % i == 0 { return false; } i += 2; }
|
||||
true
|
||||
}
|
||||
|
||||
fn count_primes(lo: u64, hi: u64) -> u64 {
|
||||
(lo..hi).filter(|&n| is_prime(n)).count() as u64
|
||||
}
|
||||
|
||||
fn scaling_slice(w: u64) -> (u64, u64) {
|
||||
let per = SCALING_N / SCALING_WORKERS;
|
||||
let lo = w * per;
|
||||
let hi = if w + 1 == SCALING_WORKERS { SCALING_N } else { lo + per };
|
||||
(lo, hi)
|
||||
}
|
||||
|
||||
fn bench_scaling_smarm(threads: usize) -> (u64, u128) {
    let total = Arc::new(AtomicU64::new(0));
    let t2 = total.clone();
    let start = Instant::now();
    smarm::runtime::init(bench_cfg(threads)).run(move || {
        let mut handles = Vec::new();
        for w in 0..SCALING_WORKERS {
            let (lo, hi) = scaling_slice(w);
            let tc = t2.clone();
            handles.push(smarm::spawn(move || {
                tc.fetch_add(count_primes(lo, hi), Ordering::Relaxed);
            }));
        }
        for h in handles { h.join().unwrap(); }
    });
    (total.load(Ordering::Relaxed), start.elapsed().as_micros())
}

fn bench_scaling_tokio_multi(threads: usize) -> (u64, u128) {
    let total = Arc::new(AtomicU64::new(0));
    let t2 = total.clone();
    let rt = tokio::runtime::Builder::new_multi_thread()
        .worker_threads(threads)
        .build()
        .unwrap();
    let start = Instant::now();
    rt.block_on(async move {
        let mut handles = Vec::new();
        for w in 0..SCALING_WORKERS {
            let (lo, hi) = scaling_slice(w);
            let tc = t2.clone();
            handles.push(tokio::spawn(async move {
                tc.fetch_add(count_primes(lo, hi), Ordering::Relaxed);
            }));
        }
        for h in handles { let _ = h.await; }
    });
    (total.load(Ordering::Relaxed), start.elapsed().as_micros())
}

// ---------------------------------------------------------------------------
// Knob helper — reads SMARM_ALLOC_INTERVAL / SMARM_TIMESLICE_CYCLES env vars
// so the sweep script can override the preemption knobs without recompiling.
// ---------------------------------------------------------------------------

fn bench_cfg(threads: usize) -> smarm::runtime::Config {
    let mut cfg = smarm::runtime::Config::exact(threads);
    if let Ok(v) = std::env::var("SMARM_ALLOC_INTERVAL") {
        if let Ok(n) = v.parse::<u32>() { cfg = cfg.alloc_interval(n); }
    }
    if let Ok(v) = std::env::var("SMARM_TIMESLICE_CYCLES") {
        if let Ok(n) = v.parse::<u64>() { cfg = cfg.timeslice_cycles(n); }
    }
    cfg
}

// ---------------------------------------------------------------------------
// main
// ---------------------------------------------------------------------------

fn main() {
    let n = available_threads();
    println!("smarm tokio-favored benchmarks");
    println!("available parallelism: {n} threads");
    println!("ITERS={ITERS} (+1 warmup, discarded)");
    println!(
        "STORM_BACKGROUND={STORM_BACKGROUND}, STORM_SPAWN={STORM_SPAWN}, \
         MPSC={MPSC_PRODUCERS}×{MPSC_PER_PRODUCER}, \
         TIMER_ACTORS={TIMER_ACTORS} ({TIMER_MIN_MS}–{TIMER_MAX_MS} ms), \
         SCALING_N={SCALING_N}/{SCALING_WORKERS}"
    );

    // ---- 5. spawn_storm_busy ----
    print_header(&format!(
        "spawn_storm_busy: {STORM_BACKGROUND} bg yielders + {STORM_SPAWN} zero-work spawns"
    ));
    run_n("smarm 1-thread", ITERS, || bench_storm_smarm(1));
    run_n(&format!("smarm {n}-thread"), ITERS, || bench_storm_smarm(n));
    run_n("tokio current_thread", ITERS, bench_storm_tokio_current);
    run_n("tokio multi-thread", ITERS, bench_storm_tokio_multi);

    // ---- 6. mpsc_contention ----
    print_header(&format!(
        "mpsc_contention: {MPSC_PRODUCERS} producers × {MPSC_PER_PRODUCER} msgs → 1 consumer"
    ));
    run_n("smarm 1-thread", ITERS, || bench_mpsc_smarm(1));
    run_n(&format!("smarm {n}-thread"), ITERS, || bench_mpsc_smarm(n));
    run_n("tokio current_thread", ITERS, bench_mpsc_tokio_current);
    run_n("tokio multi-thread", ITERS, bench_mpsc_tokio_multi);

    // ---- 7. many_timers ----
    print_header(&format!(
        "many_timers: {TIMER_ACTORS} actors sleeping {TIMER_MIN_MS}–{TIMER_MAX_MS} ms"
    ));
    run_n("smarm 1-thread", ITERS, || bench_timers_smarm(1));
    run_n(&format!("smarm {n}-thread"), ITERS, || bench_timers_smarm(n));
    run_n("tokio current_thread", ITERS, bench_timers_tokio_current);
    run_n("tokio multi-thread", ITERS, bench_timers_tokio_multi);

    // ---- 8. multi_thread_scaling ----
    print_header(&format!(
        "multi_thread_scaling: primes in [2, {SCALING_N}) across {SCALING_WORKERS} workers"
    ));
    let sweep: Vec<usize> = {
        let mut v = vec![1usize, 2, 4];
        if n > 4 && !v.contains(&n) { v.push(n); }
        v.into_iter().filter(|t| *t <= n).collect()
    };
    for t in &sweep {
        run_n(&format!("smarm {t}-thread"), ITERS, || bench_scaling_smarm(*t));
    }
    for t in &sweep {
        run_n(&format!("tokio multi {t}-thread"), ITERS, || bench_scaling_tokio_multi(*t));
    }
}

177
benchmarks.md
Normal file
@@ -0,0 +1,177 @@
# Benchmarks

Regression-test and tuning reference for smarm vs tokio.

## Running

```sh
cargo bench --bench primes           # original compute bench
cargo bench --bench multi_scheduler  # original 3-workload bench
cargo bench --bench general          # benches 1–4
cargo bench --bench tokio_favored    # benches 5–8
cargo bench --bench smarm_favored    # benches 9–12
```

Each bench runs one warmup iteration (discarded) and 15 measured iterations.
Results are reported as median / min / max in microseconds. The median is the
headline number; the spread between min and max indicates measurement
stability.
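
Below is a minimal sketch of how a median / min / max summary with a discarded warmup could be computed. The names `Summary`, `summarize`, and `spread_pct` are hypothetical, because the body of the harness's `run_n` is not shown here. The spread formula, (max - min) relative to the median, is likewise an assumption: it is one plausible reading of the "~10%" figures in the noise notes.

```rust
/// Median / min / max of one bench's measured runs, in microseconds.
#[derive(Debug, PartialEq)]
struct Summary {
    median: u128,
    min: u128,
    max: u128,
}

/// Drops the first sample (the warmup) and summarizes the rest.
/// With an even number of measured runs this picks the upper of the two
/// middle values instead of averaging them.
fn summarize(samples_us: &[u128]) -> Option<Summary> {
    let mut measured: Vec<u128> = samples_us.iter().skip(1).copied().collect();
    if measured.is_empty() {
        return None; // only a warmup (or nothing) was recorded
    }
    measured.sort_unstable();
    Some(Summary {
        median: measured[measured.len() / 2],
        min: measured[0],
        max: measured[measured.len() - 1],
    })
}

/// Spread relative to the median, as a percentage: (max - min) / median * 100.
fn spread_pct(s: &Summary) -> f64 {
    (s.max - s.min) as f64 / s.median as f64 * 100.0
}

fn main() {
    // The first value is the warmup; it is discarded even though it is the outlier.
    let samples_us: [u128; 6] = [500, 104, 100, 98, 102, 110];
    let s = summarize(&samples_us).unwrap();
    println!(
        "median={} min={} max={} spread={:.1}%",
        s.median, s.min, s.max, spread_pct(&s)
    );
}
```

Using the median rather than the mean as the headline keeps one slow iteration (a page-fault burst, a scheduler hiccup) from moving the reported number.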

## Methodology notes

- The harness measures wall-clock time for the full workload, including
  runtime startup and shutdown. For multi-thread runtimes this means worker
  thread spawn cost is included; on short-lived benches this can dominate.
  Where startup matters, the bench is structured so the workload is much
  longer than typical startup.
- `tokio` uses `new_current_thread` + `LocalSet` for the single-threaded
  comparison and `new_multi_thread().worker_threads(N)` for parallel.
  `smarm::runtime::Config::exact(N)` is the equivalent knob.
- mpsc choice: tokio's `unbounded_channel` is used to match smarm's unbounded
  channel semantics. Bounded comparisons would need a separate suite.
- Pseudo-random delays in `many_timers` come from a deterministic mixing
  function of the actor index, so iterations are reproducible.
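
The deterministic-delay idea can be sketched as follows. The bench's actual mixing function is not shown here, so this sketch assumes a SplitMix64-style mixer. `mix64` and `mixed_delay_ms` are hypothetical names, and the inclusive 1–10 ms range is taken from the `many_timers` catalog entry.

```rust
/// Range from the `many_timers` description (1–10 ms); inclusive by assumption.
const TIMER_MIN_MS: u64 = 1;
const TIMER_MAX_MS: u64 = 10;

/// SplitMix64 finalizer: turns consecutive indices into well-scattered bits.
fn mix64(x: u64) -> u64 {
    let mut z = x.wrapping_add(0x9E37_79B9_7F4A_7C15);
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Hypothetical stand-in for the bench's delay function: a pure function of
/// the actor index, so every iteration schedules the same set of sleeps.
fn mixed_delay_ms(actor: u64) -> u64 {
    let span = TIMER_MAX_MS - TIMER_MIN_MS + 1;
    TIMER_MIN_MS + mix64(actor) % span
}

fn main() {
    let run_a: Vec<u64> = (0..8).map(mixed_delay_ms).collect();
    let run_b: Vec<u64> = (0..8).map(mixed_delay_ms).collect();
    assert_eq!(run_a, run_b); // reproducible across iterations
    println!("first 8 delays (ms): {run_a:?}");
}
```

A hash of the index, rather than a seeded RNG shared across actors, keeps the delay independent of spawn order, so single- and multi-thread runs schedule identical wake times.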

## Bench catalog

### General — neither runtime structurally favored

| # | Bench | Stresses | Prediction |
|---|---------------------|-------------------------------------------------|--------------------|
| 1 | `chained_spawn` | Spawn + exit overhead in a serial chain | Roughly even |
| 2 | `yield_many` | Pure scheduling throughput, explicit yields | Roughly even |
| 3 | `fan_out_compute` | CPU-bound parallel work, minimal coordination | Even (compute-bound) |
| 4 | `ping_pong_oneshot` | Spawn + oneshot round-trip latency | Roughly even |

A regression here means a real change in per-task or per-yield cost, which
should be investigated regardless of which runtime got slower.

### Tokio-favored — measures cost of smarm's design choices

| # | Bench | Stresses | Why tokio should win |
|---|-------------------------|-------------------------------------------------------|-----------------------------------------------------------------------------------|
| 5 | `spawn_storm_busy` | 8 background yielders + 10k zero-work spawns | Tokio's per-worker deque + LIFO slot vs smarm's global `Mutex<SharedState>` queue |
| 6 | `mpsc_contention` | 32 producers × 10k msgs → 1 consumer | Tokio's mpsc is lock-free on the hot path; smarm channel is `Arc<Mutex<Inner>>` + runtime mutex on each unpark |
| 7 | `many_timers` | 10k actors sleeping 1–10 ms, dense wake window | Tokio's per-worker sharded timer wheel vs smarm's single shared min-heap |
| 8 | `multi_thread_scaling` | Primes, sweep thread count 1, 2, 4, available | Tokio scales near-linearly; smarm hits its mutex ceiling |

A regression here means a smarm design choice got more expensive. A widening
gap signals something to investigate; a gap that narrows after a tuning change
is the desired direction.

### Smarm-favored — measures payoff of green-thread + stackful design

| # | Bench | Stresses | Why smarm should win |
|----|------------------------|-----------------------------------------------------------|---------------------------------------------------------------------------------|
| 9 | `deep_recursion` | Actor recurses 1000 deep, returns | Native stack growth vs tokio's per-level `Box::pin` |
| 10 | `yield_in_hot_loop` | 2 actors, 500k yields each, single thread | Naked context switch (~6 GPRs + xmm save + ret) vs poll → state machine → schedule |
| 11 | `uncontended_channel` | 1→1, 1M msgs, single thread | Mutex is essentially free uncontended; green-thread switch is cheaper than poll |
| 12 | `catch_unwind_panics` | 10k spawns, 50% panic | Smarm has `catch_unwind` at the actor entry; both runtimes do this but the boundaries differ — exploratory |

A regression here means we lost some of smarm's structural advantage. #12 is
exploratory: if the baseline shows no real gap, drop it.

## Baseline (v0.3.0, Intel Xeon @ 2.80GHz, 1 core, kernel 6.18.5, rustc 1.95.0, RUSTFLAGS: none)

> The sandbox has only 1 logical CPU, so every multi-thread row (smarm Nt,
> tokio mt) is equivalent to 1-thread, and the scaling sweep is limited to 1 thread.
> The label duplication in bench output ("smarm 1-thread" appearing twice) is
> because `available_parallelism()` == 1, so the N-thread variant is identical.

Times are medians in µs.

| Bench | smarm 1t | smarm Nt | tokio ct | tokio mt | Notes |
|---------------------|----------|----------|----------|----------|-------|
| chained_spawn | 7136 | 6979 | 113 | 176 | smarm ~60x slower; spawn+stack alloc dominates on 1 CPU |
| yield_many | 40079 | 40073 | 14571 | 14044 | smarm ~2.8x slower; scheduling overhead real |
| fan_out_compute | 19347 | 19461 | 18616 | 18905 | roughly even; compute-bound as expected |
| ping_pong_oneshot | 13731 | 14176 | 828 | 3342 | smarm ~17x slower; per-round spawn+join cost high |
| spawn_storm_busy | 105512 | 107113 | 2222 | 4546 | smarm ~47x slower; global mutex under 8 bg yielders |
| mpsc_contention | 10456 | 10395 | 17348 | 18628 | smarm wins; uncontended mutex essentially free on 1-thread |
| many_timers | 120242 | 121023 | 13581 | 14266 | smarm ~9x slower; single min-heap vs sharded wheel |
| multi_thread_scaling | — | — | — | — | see thread-count sweep below |
| deep_recursion | 62 | 71 | 22 | 44 | tokio wins unexpectedly; see sanity-check notes |
| yield_in_hot_loop | 182177 | — | 138335 | — | tokio wins; smarm prediction wrong; see notes |
| uncontended_channel | 31473 | — | 51925 | — | smarm wins as predicted; ~1.65x |
| catch_unwind_panics | 112306 | 114305 | 151443 | 161344 | smarm wins as predicted; ~1.35x |

### `multi_thread_scaling` thread-count sweep (median µs)

> The sandbox has 1 logical CPU; only the 1-thread row is available.

| Threads | smarm | tokio mt |
|---------|-------|----------|
| 1 | 19852 | 19638 |
| 2 | — | — |
| 4 | — | — |
| N (avail=1) | 19852 | 19638 |

## Tuning experiments

### Reduction-budget sweep

`smarm` uses an allocator-driven preemption mechanism: every Nth allocation,
the actor checks RDTSC against its timeslice start and yields if over budget.
The two knobs are the Nth-allocation threshold (the "reduction budget",
`Config::alloc_interval()`) and the timeslice duration
(`Config::timeslice_cycles()`). The benches' `bench_cfg` helper reads the
`SMARM_ALLOC_INTERVAL` and `SMARM_TIMESLICE_CYCLES` environment variables, so
a sweep can override either knob without recompiling.
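
Below is a minimal sketch of the two-knob check. It assumes per-actor counters and models RDTSC as a caller-supplied clock closure so the code runs on any platform. `Preempt`, `on_alloc`, and `first_yield` are hypothetical names, not smarm internals; only the knob names follow `Config::alloc_interval()` / `Config::timeslice_cycles()`.

```rust
/// Hypothetical per-actor preemption bookkeeping.
struct Preempt {
    alloc_interval: u32,
    timeslice_cycles: u64,
    allocs_since_check: u32,
    slice_start: u64,
}

impl Preempt {
    fn new(alloc_interval: u32, timeslice_cycles: u64, now: u64) -> Self {
        Preempt { alloc_interval, timeslice_cycles, allocs_since_check: 0, slice_start: now }
    }

    /// Called on every allocation. Only every `alloc_interval`-th call reads
    /// the clock (RDTSC in smarm; a closure here). Returns true if the actor
    /// has used up its timeslice and should yield.
    fn on_alloc(&mut self, read_clock: impl FnOnce() -> u64) -> bool {
        self.allocs_since_check += 1;
        if self.allocs_since_check < self.alloc_interval {
            return false; // fast path: no clock read
        }
        self.allocs_since_check = 0;
        read_clock().wrapping_sub(self.slice_start) >= self.timeslice_cycles
    }
}

/// Simulates allocations that each cost `cycles_per_alloc` cycles and returns
/// the 1-based allocation index at which the actor first yields.
fn first_yield(alloc_interval: u32, timeslice_cycles: u64, cycles_per_alloc: u64) -> Option<u32> {
    let mut clock = 0u64;
    let mut p = Preempt::new(alloc_interval, timeslice_cycles, clock);
    for n in 1..=10_000u32 {
        clock += cycles_per_alloc;
        if p.on_alloc(|| clock) {
            return Some(n);
        }
    }
    None
}

fn main() {
    // Budget of 1000 cycles at 100 cycles per allocation: the budget runs out
    // at allocation 10, but it is only noticed at the next interval boundary.
    for interval in [1, 4, 16] {
        println!("alloc_interval={interval}: yields at {:?}", first_yield(interval, 1_000, 100));
    }
}
```

The simulation shows the trade-off the sweep explores. A larger interval makes the common allocation path cheaper, but an actor can then overrun its timeslice by up to `alloc_interval - 1` allocations before the check fires.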

Record each experiment as a row below, referencing the commit or the
parameter values explicitly.

| Date | Configuration | Bench (or "all") | Result vs baseline | Notes |
|------|----------------------------|----------------------|------------------------------|-------|
| | baseline | all | — | |
| | budget=…, timeslice=… | | | |
| | | | | |

A change is a keeper when the gap on tokio-favored benches narrows without
regressing smarm-favored benches. If a budget change improves one workload but
regresses another by more, prefer the configuration with the broader impact
unless we have a clear use case for the trade-off.

## Sanity-check notes (baseline run)

### Compile fixes applied

Two bench files had a type error: `smarm::Runtime::run()` takes
`impl FnOnce() + Send + 'static` (returning `()`), but the consumer closures
in `bench_mpsc_smarm` (tokio_favored.rs) and `bench_unc_smarm`
(smarm_favored.rs) returned `u64` via a bare `count` tail expression. The fix
changes the tail to `let _ = count;` in both closures, and the corresponding
`consumer.join().unwrap()` calls to `let _ = consumer.join()...`.
No workload semantics changed.
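
The type error is easiest to see against a stand-in with the same bound. In this sketch `run` is a hypothetical substitute for `smarm::Runtime::run` that executes the closure on an OS thread, and `demo` is illustrative. The `AtomicU64` exists only so the sketch can report the count; it is not part of the bench fix.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{mpsc, Arc};
use std::thread;

/// Hypothetical stand-in for `smarm::Runtime::run`: same closure bound, so
/// the closure must return `()`.
fn run(f: impl FnOnce() + Send + 'static) {
    thread::spawn(f).join().unwrap();
}

/// Runs a producer, then a consumer, and returns how many messages the consumer saw.
fn demo(messages: u64) -> u64 {
    let (tx, rx) = mpsc::channel::<u64>();
    run(move || {
        for i in 0..messages {
            tx.send(i).unwrap();
        }
        // `tx` is dropped here, which closes the channel.
    });

    let seen = Arc::new(AtomicU64::new(0));
    let seen_in_consumer = seen.clone();
    run(move || {
        let mut count = 0u64;
        while rx.recv().is_ok() {
            count += 1;
        }
        seen_in_consumer.store(count, Ordering::Relaxed);
        // Ending with a bare `count` would make this closure return `u64`,
        // which does not satisfy `impl FnOnce()`. Discarding it fixes that:
        let _ = count;
    });
    seen.load(Ordering::Relaxed)
}

fn main() {
    println!("consumer saw {} messages", demo(10));
}
```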

### Single-CPU sandbox caveat

`available_parallelism()` returns 1, so every "N-thread" variant is identical
to "1-thread". Multi-thread results should not be used to draw scaling
conclusions; re-run on a multi-core machine before committing to the tuning
sweep.

### Predicted-winner mismatches

**`deep_recursion` — tokio wins (22 µs) over smarm (62 µs).**
At depth 500, smarm spawns a fresh actor, which requires mmap'ing a 64 KiB
stack; that allocation cost dominates the actual recursion. Tokio's
`Box::pin` recursion allocates 500 small heap objects but avoids the mmap.
The prediction assumed stack allocation was amortised across many uses; here
the actor is single-use. Not a bug, but the bench may not exercise the
intended advantage.
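
The std-only sketch below makes the per-level cost concrete by contrasting native recursion with the `Box::pin` pattern that async recursion requires. `depth_sync`, `depth_async`, and the toy `block_on` poll loop are hypothetical and are not the bench code. The toy executor can stand in for tokio here only because nothing in it ever returns `Pending`.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Native recursion: each level is a stack frame, no heap allocation.
fn depth_sync(n: u32) -> u32 {
    if n == 0 { 0 } else { 1 + depth_sync(n - 1) }
}

/// Async recursion: a future cannot contain itself by value, so each level
/// is boxed with `Box::pin`, i.e. one heap allocation per level.
fn depth_async(n: u32) -> Pin<Box<dyn Future<Output = u32> + Send>> {
    Box::pin(async move {
        if n == 0 { 0 } else { 1 + depth_async(n - 1).await }
    })
}

/// Waker that does nothing; enough for futures that never return `Pending`.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Minimal poll loop standing in for a real executor.
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(v) = fut.as_mut().poll(&mut cx) {
            return v;
        }
    }
}

fn main() {
    assert_eq!(depth_sync(500), 500);
    assert_eq!(block_on(depth_async(500)), 500);
    println!("both reached depth 500");
}
```

Polling still recurses on the native stack, because each level's `poll` calls the next. The boxed version therefore pays its 500 allocations on top of the stack frames, not instead of them.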

**`yield_in_hot_loop` — tokio wins (138 ms) over smarm (182 ms).**
The prediction was that smarm's ~6-GPR naked context switch would beat
tokio's poll/state-machine cycle. In practice, on a single-thread sandbox,
tokio's current_thread scheduler has very low overhead per `yield_now`, while
smarm's `yield_now` still goes through the runtime mutex and run queue even on
a single thread. This is a meaningful data point: smarm's scheduling overhead
is not as low as the assembly switch cost alone suggests.

### Noise / spread

- `catch_unwind_panics` smarm spread is reasonable (~10% min/max).
- `spawn_storm_busy` tokio multi-thread has notable spread (3833–7305 µs),
  consistent with tokio issue #3829 noted in the task spec.
- `many_timers` smarm spread is acceptable (~10%).

### Result-column equivalence

All result columns match between runtimes for every bench (same prime counts,
same message totals, same task counts), so the workloads are equivalent.
@@ -1,12 +1,8 @@
//! Unbounded MPSC channels.
//!
//! Single-threaded scheduler: the inner state is `Rc<RefCell<Inner<T>>>`,
//! not `Arc<Mutex>`. We hand-implement `Send` for `Sender<T>` and
//! `Receiver<T>` when `T: Send`, on the basis that the only way two actor
//! contexts touch the same channel is by being scheduled on the *same* OS
//! thread (v0.1 has exactly one). When we add a second scheduler thread,
//! this lie must be retired: replace `Rc<RefCell>` with `Arc<Mutex>` (or a
//! lock-free queue) and remove the unsafe Send impls.
//! Inner state is `Arc<Mutex<Inner<T>>>` so channels can be sent across OS
//! threads (required for the multi-scheduler runtime where a sender and
//! receiver may run on different scheduler threads simultaneously).
//!
//! Semantics:
//! - Senders are clonable; the last sender drop closes the channel.
@@ -19,12 +15,11 @@
//! parked, the receiver is unparked.

use crate::pid::Pid;
use std::cell::RefCell;
use std::collections::VecDeque;
use std::rc::Rc;
use std::sync::{Arc, Mutex};

pub fn channel<T>() -> (Sender<T>, Receiver<T>) {
    let inner = Rc::new(RefCell::new(Inner {
    let inner = Arc::new(Mutex::new(Inner {
        queue: VecDeque::new(),
        parked_receiver: None,
        senders: 1,
@@ -41,20 +36,13 @@ struct Inner<T> {
}

pub struct Sender<T> {
    inner: Rc<RefCell<Inner<T>>>,
    inner: Arc<Mutex<Inner<T>>>,
}

pub struct Receiver<T> {
    inner: Rc<RefCell<Inner<T>>>,
    inner: Arc<Mutex<Inner<T>>>,
}

// SAFETY (v0.1 only): the scheduler is single-threaded. Sender/Receiver can
// be captured into actor closures (which require Send), but they will only
// ever be touched from one OS thread. When multi-threading lands, swap the
// `Rc<RefCell>` for `Arc<Mutex>` and remove these.
unsafe impl<T: Send> Send for Sender<T> {}
unsafe impl<T: Send> Send for Receiver<T> {}

#[derive(Debug, PartialEq, Eq)]
pub struct SendError<T>(pub T);

@@ -71,7 +59,7 @@ impl std::error::Error for RecvError {}

impl<T> Clone for Sender<T> {
    fn clone(&self) -> Self {
        self.inner.borrow_mut().senders += 1;
        self.inner.lock().unwrap().senders += 1;
        Sender { inner: self.inner.clone() }
    }
}
@@ -79,11 +67,9 @@ impl<T> Clone for Sender<T> {
impl<T> Drop for Sender<T> {
    fn drop(&mut self) {
        let unpark = {
            let mut g = self.inner.borrow_mut();
            let mut g = self.inner.lock().unwrap();
            g.senders -= 1;
            if g.senders == 0 && g.queue.is_empty() {
                // Channel closed and drained. Wake the receiver so it can
                // see RecvError.
                g.parked_receiver.take()
            } else {
                None
@@ -97,23 +83,27 @@ impl<T> Drop for Sender<T> {

impl<T> Drop for Receiver<T> {
    fn drop(&mut self) {
        self.inner.borrow_mut().receiver_alive = false;
        self.inner.lock().unwrap().receiver_alive = false;
    }
}

impl<T> Sender<T> {
    pub fn send(&self, value: T) -> Result<(), SendError<T>> {
        let unpark = {
            let mut g = self.inner.borrow_mut();
            let mut g = self.inner.lock().unwrap();
            if !g.receiver_alive {
                return Err(SendError(value));
            }
            g.queue.push_back(value);
            // If the receiver is parked, unpark it.
            g.parked_receiver.take()
        };
        if let Some(pid) = unpark {
            let me = crate::actor::current_pid();
            crate::te!(crate::trace::Event::Send { sender: me.unwrap_or(crate::pid::Pid::new(u32::MAX, u32::MAX)), receiver: Some(pid) });
            crate::scheduler::unpark(pid);
        } else {
            let me = crate::actor::current_pid();
            crate::te!(crate::trace::Event::Send { sender: me.unwrap_or(crate::pid::Pid::new(u32::MAX, u32::MAX)), receiver: None });
        }
        Ok(())
    }
@@ -122,16 +112,14 @@ impl<T> Sender<T> {
impl<T> Receiver<T> {
    pub fn recv(&self) -> Result<T, RecvError> {
        loop {
            // Try to take a message.
            {
                let mut g = self.inner.borrow_mut();
                let mut g = self.inner.lock().unwrap();
                if let Some(v) = g.queue.pop_front() {
                    return Ok(v);
                }
                if g.senders == 0 {
                    return Err(RecvError);
                }
                // Empty + open: register and park.
                let me = crate::actor::current_pid()
                    .expect("recv() called outside an actor");
                debug_assert!(
@@ -139,19 +127,21 @@ impl<T> Receiver<T> {
                    "channel has more than one receiver"
                );
                g.parked_receiver = Some(me);
                crate::te!(crate::trace::Event::RecvPark(me));
            }
            // Release the borrow before parking — the unparker will need it.
            // Release the lock before parking — the unparker will need it.
            crate::scheduler::park_current();
            // Loop: the message that woke us might already have been taken
            // (it can't, with one receiver, but the senders=0 path can fire
            // here too).
            // Woken up — record it before looping to check the queue.
            if let Some(me) = crate::actor::current_pid() {
                crate::te!(crate::trace::Event::RecvWake(me));
            }
        }
    }

    /// Non-blocking. `Ok(Some(v))` if a message was available, `Ok(None)` if
    /// the channel is empty but open, `Err(RecvError)` if closed and drained.
    pub fn try_recv(&self) -> Result<Option<T>, RecvError> {
        let mut g = self.inner.borrow_mut();
        let mut g = self.inner.lock().unwrap();
        if let Some(v) = g.queue.pop_front() {
            return Ok(Some(v));
        }

521
src/io.rs
Normal file
@@ -0,0 +1,521 @@
//! Off-scheduler IO: blocking-work offload and epoll-based fd readiness.
//!
//! `block_on_io(closure)` runs `closure` on a dedicated worker OS thread,
//! parks the calling actor in the meantime, and returns the closure's
//! value when it completes. This lets actors call into blocking C libraries,
//! synchronous file IO, or anything else that doesn't fit the readiness
//! model.
//!
//! `wait_readable(fd)` / `wait_writable(fd)` register interest in an fd
//! with epoll and park the calling actor. When the fd becomes ready, the
//! epoll thread reports it and the scheduler unparks the actor. The actual
//! `read(2)`/`write(2)` syscall runs back on the scheduler thread, *inside*
//! the actor: the buffer never leaves the actor, and nothing is copied
//! through an intermediary thread. Built on these are the conveniences
//! `read(fd, &mut buf)` and `write(fd, &buf)`.
//!
//! Architecture
//! ============
//! Per `run()`, two OS threads:
//! - **epoll thread**: owns the epollfd. Loops in `epoll_wait`. On a
//!   ready fd, pushes `Completion::FdReady { fd, events }` to the
//!   shared completion queue and writes the scheduler-wake pipe. On the
//!   shutdown pipe (also registered in epollfd), exits.
//! - **pool thread**: blocks on the request mpsc. Runs the closure
//!   inside `catch_unwind`, pushes `Completion::Blocking { pid, result }`,
//!   writes the scheduler-wake pipe.
//!
//! Both threads share a single `completions: Arc<Mutex<VecDeque<Completion>>>`
//! and the same scheduler-wake pipe.
//!
//! `epoll_ctl` (register/unregister fd interest) is called by the
//! scheduler thread *directly* on the epollfd. That's well-defined per
//! `epoll_wait(2)`: while one thread is blocked in `epoll_wait`, another
//! thread may add fds to the same epoll instance, and a newly ready fd
//! unblocks the wait. This avoids needing a second mpsc and a second wake
//! mechanism.
//!
//! Epoll mode
//! ==========
//! Level-triggered with EPOLLONESHOT. After a wakeup the kernel
//! auto-disarms the fd, so we never get two wakeups for one
//! `wait_readable` call. The scheduler explicitly `EPOLL_CTL_DEL`s the fd
//! on completion to free the slot for re-registration. Net effect: each
//! `wait_readable(fd)` is one ADD, one wakeup, one DEL — symmetric and
//! stateless between calls.
//!
//! Fd hygiene
//! ==========
//! If an actor dies while waiting on an fd, the registration is leaked
//! (the fd stays in the epollfd, armed). EPOLLONESHOT bounds the damage:
//! at most one stale wakeup, after which the kernel disarms. The stale
//! wakeup hits a dead pid in `waiters` and is dropped. Acceptable for v0.2;
//! a future pass should DEL on actor death.
//!
//! Fds used with `read`/`write` should be opened with `O_NONBLOCK`. If
//! they aren't, the syscall may block the scheduler thread despite the
//! readiness notification: an fd reported readable doesn't guarantee that
//! the syscall completes without blocking (e.g. on Linux, data can arrive
//! and then be discarded on a bad checksum before the read). Documented;
//! not enforced.
//!
//! Panic handling
//! ==============
//! The pool worker runs the closure inside `catch_unwind` and ships either
//! the return value or the panic payload back to the scheduler.
//! `block_on_io` resumes the panic on the calling actor's stack, so the
//! actor's supervisor sees a real `Signal::Panic` as if the work had run
//! inline. Fd-wait primitives don't run user code on the IO thread, so
//! they have no equivalent panic-propagation path.

use crate::pid::Pid;
use std::any::Any;
use std::collections::{HashMap, VecDeque};
use std::io;
use std::os::fd::RawFd;
use std::panic;
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread::JoinHandle as OsJoinHandle;

// ---------------------------------------------------------------------------
// Wire types
// ---------------------------------------------------------------------------

/// What the pool stores while computing a result. `Ok` is the closure's
/// return value (boxed as `Any`); `Err` is the panic payload.
pub type IoResult = Result<Box<dyn Any + Send>, Box<dyn Any + Send>>;

struct Request {
    pid: Pid,
    /// The work to perform. Returns the wire-form result directly.
    work: Box<dyn FnOnce() -> IoResult + Send>,
}

/// Completion message from either IO thread back to the scheduler.
pub enum Completion {
    /// A `block_on_io` closure has finished (Ok = return value, Err = panic
    /// payload).
    Blocking { pid: Pid, result: IoResult },
    /// An fd registered via `wait_readable`/`wait_writable` is ready. The
    /// scheduler looks up the parked pid in `waiters`, unparks it, and
    /// removes the entry. `pid` isn't in this variant because the epoll
    /// thread doesn't have access to the `waiters` map; the scheduler
    /// thread owns that.
    FdReady { fd: RawFd, events: u32 },
}

// ---------------------------------------------------------------------------
// IoThread — created per `run()`, owned by `SchedulerState`.
// ---------------------------------------------------------------------------

pub struct IoThread {
    // ----- Channels & queues -----

    /// Submission queue into the blocking-work pool.
    tx: mpsc::Sender<Request>,
    /// Shared completion queue, fed by both the pool and the epoll thread.
    completions: Arc<Mutex<VecDeque<Completion>>>,
    /// Pipe the scheduler polls in its idle path. Both IO threads write to
    /// `wake_write` after pushing a completion.
    wake_read: RawFd,
    wake_write: RawFd,

    // ----- Epoll machinery -----

    /// The epollfd, owned by `IoThread`. Callable cross-thread via
    /// `epoll_ctl` per the man page.
    epollfd: RawFd,
    /// Pipe used to signal the epoll thread to exit. Registered inside the
    /// epollfd so a single `epoll_wait` covers both fd readiness and
    /// shutdown.
    shutdown_read: RawFd,
    shutdown_write: RawFd,
    /// One parked actor per registered fd. Populated by `wait_readable` /
    /// `wait_writable` and drained by the scheduler when an `FdReady`
    /// completion is processed.
    pub waiters: HashMap<RawFd, Pid>,

    // ----- Threads -----

    pool_thread: Option<OsJoinHandle<()>>,
    epoll_thread: Option<OsJoinHandle<()>>,

    /// Number of `block_on_io` requests in flight. Used by the scheduler's
    /// idle path to decide whether to wait on the pipe or exit. Fd waits
    /// are not counted here; they're counted by `waiters.len()`.
    pub outstanding: u32,
}

impl IoThread {
    pub fn start() -> io::Result<Self> {
        // Scheduler-facing wake pipe.
        let (wake_read, wake_write) = make_pipe()?;
        // Pool submission channel + shared completion queue.
        let (tx, rx) = mpsc::channel::<Request>();
        let completions: Arc<Mutex<VecDeque<Completion>>> =
            Arc::new(Mutex::new(VecDeque::new()));

        // Epoll machinery.
        let epollfd = unsafe { libc::epoll_create1(libc::EPOLL_CLOEXEC) };
        if epollfd < 0 {
            // Capture errno before the cleanup `close` calls can overwrite it.
            let e = io::Error::last_os_error();
            // Best-effort fd cleanup before bailing.
            unsafe {
                libc::close(wake_read);
                libc::close(wake_write);
            }
            return Err(e);
        }

        let (shutdown_read, shutdown_write) = match make_pipe() {
            Ok(p) => p,
            Err(e) => {
                unsafe {
                    libc::close(epollfd);
                    libc::close(wake_read);
                    libc::close(wake_write);
                }
                return Err(e);
            }
        };

        // Register the shutdown pipe in epollfd. We use a sentinel `data`
        // value to recognise shutdown events. RawFd values are non-negative,
        // so u64::MAX is unambiguously not a real fd-data encoding.
        let mut shutdown_ev = libc::epoll_event {
            events: libc::EPOLLIN as u32,
            u64: SHUTDOWN_EPOLL_TOKEN,
        };
        if unsafe {
            libc::epoll_ctl(
                epollfd,
                libc::EPOLL_CTL_ADD,
                shutdown_read,
                &mut shutdown_ev as *mut _,
            )
        } < 0
        {
            let e = io::Error::last_os_error();
            unsafe {
                libc::close(epollfd);
                libc::close(shutdown_read);
                libc::close(shutdown_write);
                libc::close(wake_read);
                libc::close(wake_write);
            }
            return Err(e);
        }

        // Spawn pool thread.
        let pool_comps = completions.clone();
        let pool_thread = std::thread::Builder::new()
            .name("smarm-io-pool".into())
            .spawn(move || pool_loop(rx, pool_comps, wake_write))?;

        // Spawn epoll thread.
        let epoll_comps = completions.clone();
        let epoll_thread = std::thread::Builder::new()
            .name("smarm-io-epoll".into())
            .spawn(move || epoll_loop(epollfd, epoll_comps, wake_write))?;

        Ok(Self {
            tx,
            completions,
            wake_read,
            wake_write,
            epollfd,
            shutdown_read,
            shutdown_write,
            waiters: HashMap::new(),
            pool_thread: Some(pool_thread),
            epoll_thread: Some(epoll_thread),
            outstanding: 0,
        })
    }
|
||||
/// Hand a request to the pool. Increments `outstanding`.
|
||||
pub fn submit(&mut self, pid: Pid, work: Box<dyn FnOnce() -> IoResult + Send>) {
|
||||
self.outstanding += 1;
|
||||
// Send can only fail if the pool has hung up, which only happens
|
||||
// on shutdown. submit during shutdown is a bug.
|
||||
self.tx
|
||||
.send(Request { pid, work })
|
||||
.expect("io pool hung up unexpectedly");
|
||||
}
|
||||
|
||||
/// Drain every available completion. Caller (the scheduler) routes the
|
||||
/// results and updates `outstanding` / `waiters` accordingly.
|
||||
pub fn drain_completions(&mut self) -> Vec<Completion> {
|
||||
let mut q = self.completions.lock().unwrap();
|
||||
let mut out = Vec::with_capacity(q.len());
|
||||
while let Some(c) = q.pop_front() {
|
||||
out.push(c);
|
||||
}
|
||||
out
|
||||
}
|
||||
|
||||
pub fn wake_fd(&self) -> RawFd {
|
||||
self.wake_read
|
||||
}
|
||||
|
||||
/// Register interest in `fd` becoming readable/writable; record `pid`
|
||||
/// as the parked waiter. The epoll thread will push a `FdReady`
|
||||
/// completion when the kernel signals.
|
||||
///
|
||||
/// EPOLLONESHOT: one wakeup per registration. The scheduler must
|
||||
/// call `epoll_deregister` on completion so the fd can be registered again.
|
||||
pub fn epoll_register(
|
||||
&mut self,
|
||||
fd: RawFd,
|
||||
pid: Pid,
|
||||
readable: bool,
|
||||
writable: bool,
|
||||
) -> io::Result<()> {
|
||||
// Two actors waiting on the same fd would be a misuse: the kernel
|
||||
// delivers exactly one EPOLLONESHOT wakeup, so the second waiter
|
||||
// would hang. Reject up front.
|
||||
if self.waiters.contains_key(&fd) {
|
||||
return Err(io::Error::new(
|
||||
io::ErrorKind::AlreadyExists,
|
||||
"fd already has a parked waiter",
|
||||
));
|
||||
}
|
||||
|
||||
// Defensive cleanup: if a previous actor died while waiting on this
|
||||
// fd, the kernel-side registration was leaked (we don't walk all
|
||||
// waiters on actor death). A bare DEL is harmless if the fd isn't
|
||||
// registered (ENOENT), and removes any leak.
|
||||
unsafe {
|
||||
libc::epoll_ctl(self.epollfd, libc::EPOLL_CTL_DEL, fd, std::ptr::null_mut());
|
||||
}
|
||||
|
||||
let mut events: u32 = libc::EPOLLONESHOT as u32;
|
||||
if readable {
|
||||
events |= libc::EPOLLIN as u32;
|
||||
}
|
||||
if writable {
|
||||
events |= libc::EPOLLOUT as u32;
|
||||
}
|
||||
let mut ev = libc::epoll_event {
|
||||
events,
|
||||
u64: fd as u64,
|
||||
};
|
||||
let r = unsafe {
|
||||
libc::epoll_ctl(self.epollfd, libc::EPOLL_CTL_ADD, fd, &mut ev as *mut _)
|
||||
};
|
||||
if r < 0 {
|
||||
return Err(io::Error::last_os_error());
|
||||
}
|
||||
self.waiters.insert(fd, pid);
|
||||
Ok(())
|
||||
}
|
||||
|
||||
/// Remove `fd` from the epollfd. Called by the scheduler after a
|
||||
/// `FdReady` completion, so the next `wait_readable(fd)` can ADD again.
|
||||
///
|
||||
/// Does NOT touch `waiters` — that's the scheduler's bookkeeping; this
|
||||
/// is purely the kernel-side cleanup.
|
||||
pub fn epoll_deregister(&mut self, fd: RawFd) {
|
||||
// EPOLL_CTL_DEL of an already-removed fd returns ENOENT; ignore.
|
||||
unsafe {
|
||||
libc::epoll_ctl(self.epollfd, libc::EPOLL_CTL_DEL, fd, std::ptr::null_mut());
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
impl Drop for IoThread {
|
||||
fn drop(&mut self) {
|
||||
// 1. Signal the epoll thread to exit by writing the shutdown pipe.
// Retry on EINTR via `wake_scheduler`: if this byte were lost, the epoll
// thread would never wake and the join in step 3 would block forever
// (the fds are only closed after the join, in step 4).
wake_scheduler(self.shutdown_write);
|
||||
|
||||
// 2. Hang up the pool's request channel so the pool thread exits.
|
||||
let (dead_tx, _) = mpsc::channel::<Request>();
|
||||
let real_tx = std::mem::replace(&mut self.tx, dead_tx);
|
||||
drop(real_tx);
|
||||
|
||||
// 3. Join both threads.
|
||||
if let Some(h) = self.epoll_thread.take() {
|
||||
let _ = h.join();
|
||||
}
|
||||
if let Some(h) = self.pool_thread.take() {
|
||||
let _ = h.join();
|
||||
}
|
||||
|
||||
// 4. Close fds.
|
||||
unsafe {
|
||||
libc::close(self.epollfd);
|
||||
libc::close(self.shutdown_read);
|
||||
libc::close(self.shutdown_write);
|
||||
libc::close(self.wake_read);
|
||||
libc::close(self.wake_write);
|
||||
}
|
||||
}
|
||||
}
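The `Drop` above depends on ordering: hang up, join, and only then release what the threads use. Below is a minimal std-only sketch of the same order, assuming a hypothetical `Worker` type that owns one thread. It keeps the sender in an `Option` so `Drop` can hang up with `take()` instead of swapping in a dead channel; this is an alternative idiom, not smarm's code.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc::{self, Sender};
use std::sync::Arc;
use std::thread::{self, JoinHandle};

/// Hypothetical owner of one worker thread.
struct Worker {
    tx: Option<Sender<u32>>,
    handle: Option<JoinHandle<()>>,
}

impl Worker {
    fn new(processed: Arc<AtomicUsize>) -> Self {
        let (tx, rx) = mpsc::channel::<u32>();
        let handle = thread::spawn(move || {
            // Iteration ends once every Sender is gone and the queue is empty,
            // so jobs submitted before shutdown still run.
            for _job in rx {
                processed.fetch_add(1, Ordering::SeqCst);
            }
        });
        Worker { tx: Some(tx), handle: Some(handle) }
    }

    fn submit(&self, job: u32) {
        self.tx
            .as_ref()
            .expect("submit after shutdown")
            .send(job)
            .expect("worker hung up unexpectedly");
    }
}

impl Drop for Worker {
    fn drop(&mut self) {
        // 1. Hang up the request channel.
        drop(self.tx.take());
        // 2. Join, so nothing the thread still uses is released under it.
        if let Some(h) = self.handle.take() {
            let _ = h.join();
        }
        // 3. Raw resources (epollfd, pipes) would be closed only here.
    }
}
```

Both idioms drop the last `Sender`; the `Option` version avoids allocating a throwaway channel at the cost of an `expect` in `submit`.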
|
||||
|
||||
/// Sentinel `epoll_event.u64` distinguishing the shutdown pipe from
|
||||
/// registered actor fds. `RawFd` is an `i32` and valid fds are non-negative,
/// so `fd as u64` never sets the high bits; `u64::MAX` can't be a valid fd.
|
||||
const SHUTDOWN_EPOLL_TOKEN: u64 = u64::MAX;
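A small sketch of how the epoll user data could be decoded on the way out of `epoll_wait`. `Wakeup`, `encode_fd`, and `decode` are hypothetical names. The check at the end shows why the sentinel is safe: an `i32` to `u64` cast sign-extends, so only a negative fd could reach `u64::MAX`, and valid fds are never negative.

```rust
/// Same value as `SHUTDOWN_EPOLL_TOKEN`.
const SHUTDOWN_TOKEN: u64 = u64::MAX;

#[derive(Debug, PartialEq)]
enum Wakeup {
    Shutdown,
    Fd(i32),
}

/// Store a (non-negative) fd in the epoll user data.
fn encode_fd(fd: i32) -> u64 {
    assert!(fd >= 0, "valid fds are non-negative");
    fd as u64
}

/// Interpret the user data `epoll_wait` handed back.
fn decode(token: u64) -> Wakeup {
    if token == SHUTDOWN_TOKEN {
        Wakeup::Shutdown
    } else {
        Wakeup::Fd(token as i32)
    }
}
```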
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Pool loop
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
fn pool_loop(
|
||||
rx: mpsc::Receiver<Request>,
|
||||
completions: Arc<Mutex<VecDeque<Completion>>>,
|
||||
wake_write: RawFd,
|
||||
) {
|
||||
while let Ok(Request { pid, work }) = rx.recv() {
|
||||
let result: IoResult = match panic::catch_unwind(panic::AssertUnwindSafe(work)) {
|
||||
Ok(r) => r,
|
||||
Err(payload) => Err(payload),
|
||||
};
|
||||
completions
|
||||
.lock()
|
||||
.unwrap()
|
||||
.push_back(Completion::Blocking { pid, result });
|
||||
wake_scheduler(wake_write);
|
||||
}
|
||||
}
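The pool loop's contract can be isolated as a runnable std-only sketch: jobs arrive over an `mpsc` channel, each runs under `catch_unwind`, and the outcome (including a panic payload) is queued for the scheduler. `Pid`, `IoResult`, `Work`, and `spawn_pool` are stand-ins assumed for illustration, and an `mpsc` notification replaces the wake pipe.

```rust
use std::any::Any;
use std::collections::VecDeque;
use std::panic;
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical stand-ins: a panic payload travels back as the `Err` side.
type Pid = u32;
type IoResult = Result<Box<dyn Any + Send>, Box<dyn Any + Send>>;
type Work = Box<dyn FnOnce() -> IoResult + Send>;

/// Runs each job on one dedicated thread and queues `(pid, result)` pairs.
/// `notify` plays the role of the wake pipe: one message per completion.
fn spawn_pool(
    completions: Arc<Mutex<VecDeque<(Pid, IoResult)>>>,
    notify: mpsc::Sender<()>,
) -> (mpsc::Sender<(Pid, Work)>, thread::JoinHandle<()>) {
    let (tx, rx) = mpsc::channel::<(Pid, Work)>();
    let handle = thread::spawn(move || {
        // `recv` fails once every Sender is dropped: the shutdown signal.
        while let Ok((pid, work)) = rx.recv() {
            // A panicking job must not take the pool thread down with it.
            let result = panic::catch_unwind(panic::AssertUnwindSafe(work))
                .unwrap_or_else(Err);
            completions.lock().unwrap().push_back((pid, result));
            let _ = notify.send(());
        }
    });
    (tx, handle)
}
```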
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Epoll loop
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
fn epoll_loop(
|
||||
epollfd: RawFd,
|
||||
completions: Arc<Mutex<VecDeque<Completion>>>,
|
||||
wake_write: RawFd,
|
||||
) {
|
||||
// Buffer for epoll_wait. Ready events beyond MAX_EVENTS are returned by
// the next call, so 64 only caps the batch size; raising it is a
// one-line change if a real load needs bigger batches.
|
||||
const MAX_EVENTS: usize = 64;
|
||||
let mut events: [libc::epoll_event; MAX_EVENTS] = unsafe { std::mem::zeroed() };
|
||||
|
||||
loop {
|
||||
let n = unsafe {
|
||||
libc::epoll_wait(
|
||||
epollfd,
|
||||
events.as_mut_ptr(),
|
||||
MAX_EVENTS as libc::c_int,
|
||||
-1,
|
||||
)
|
||||
};
|
||||
|
||||
if n < 0 {
|
||||
let e = unsafe { *libc::__errno_location() };
|
||||
if e == libc::EINTR {
|
||||
continue;
|
||||
}
|
||||
// Anything else (e.g. EBADF or EINVAL on epollfd) is a programming
// error: Drop joins this thread before closing the fd, so no close
// can race with this call. Treat it as shutdown rather than spin.
|
||||
return;
|
||||
}
|
||||
|
||||
let mut shutdown_requested = false;
|
||||
let mut pushed_any = false;
|
||||
{
|
||||
let mut q = completions.lock().unwrap();
|
||||
for ev in events.iter().take(n as usize) {
|
||||
if ev.u64 == SHUTDOWN_EPOLL_TOKEN {
|
||||
shutdown_requested = true;
|
||||
continue;
|
||||
}
|
||||
let fd = ev.u64 as RawFd;
|
||||
let evs = ev.events;
|
||||
q.push_back(Completion::FdReady {
|
||||
fd,
|
||||
events: evs,
|
||||
});
|
||||
pushed_any = true;
|
||||
}
|
||||
}
|
||||
|
||||
if pushed_any {
|
||||
wake_scheduler(wake_write);
|
||||
}
|
||||
if shutdown_requested {
|
||||
return;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// Write one byte to the scheduler's wake pipe. Retries on EINTR; ignores
|
||||
/// EAGAIN (pipe full means there's already an outstanding wake we haven't
|
||||
/// consumed yet, which is sufficient).
|
||||
fn wake_scheduler(wake_write: RawFd) {
|
||||
let buf: [u8; 1] = [0];
|
||||
unsafe {
|
||||
loop {
|
||||
let n = libc::write(wake_write, buf.as_ptr() as *const _, 1);
|
||||
if n < 0 {
|
||||
let e = *libc::__errno_location();
|
||||
if e == libc::EINTR {
|
||||
continue;
|
||||
}
|
||||
}
|
||||
break;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Pipe helpers (unchanged from v0.2)
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
fn make_pipe() -> io::Result<(RawFd, RawFd)> {
|
||||
let mut fds: [libc::c_int; 2] = [0; 2];
|
||||
let r = unsafe { libc::pipe2(fds.as_mut_ptr(), libc::O_CLOEXEC | libc::O_NONBLOCK) };
|
||||
if r != 0 {
|
||||
return Err(io::Error::last_os_error());
|
||||
}
|
||||
Ok((fds[0], fds[1]))
|
||||
}
|
||||
|
||||
/// Drain pending bytes from the wake pipe. The scheduler calls this after
|
||||
/// a `poll` wakeup so the next idle call sees an empty pipe.
|
||||
pub fn drain_wake_pipe(fd: RawFd) {
|
||||
let mut buf = [0u8; 64];
|
||||
loop {
|
||||
let n = unsafe { libc::read(fd, buf.as_mut_ptr() as *mut _, buf.len()) };
|
||||
if n <= 0 {
|
||||
break;
|
||||
}
|
||||
}
|
||||
}
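The wake-pipe protocol (write one byte and ignore EAGAIN; drain until empty) can be modeled with a capacity-1 `sync_channel`, where `TrySendError::Full` plays the role of EAGAIN. This is an illustrative model, not smarm code: the real pipe buffers many bytes before EAGAIN, which is why `drain_wake_pipe` loops. `wake` and `drain` are hypothetical names.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Model of `wake_scheduler`.
fn wake(tx: &SyncSender<()>) {
    match tx.try_send(()) {
        // Sent, or a wake is already pending (the EAGAIN case): either way
        // the consumer will see at least one token.
        Ok(()) | Err(TrySendError::Full(())) => {}
        // The receiver is gone (shutdown); there is nobody left to wake.
        Err(TrySendError::Disconnected(())) => {}
    }
}

/// Model of `drain_wake_pipe`: consume every pending token, return the count.
fn drain(rx: &Receiver<()>) -> usize {
    let mut n = 0;
    while rx.try_recv().is_ok() {
        n += 1;
    }
    n
}
```

Many wakes collapse into one pending token, and a drain leaves the channel empty for the next idle wait.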
|
||||
|
||||
/// Block on `fd` for up to `timeout`, returning when either there's data
|
||||
/// to read or the timeout elapses. `None` for `timeout` means wait forever.
|
||||
pub fn poll_wake(fd: RawFd, timeout: Option<std::time::Duration>) {
|
||||
let timeout_ms: libc::c_int = match timeout {
|
||||
None => -1,
|
||||
Some(d) => {
|
||||
let ms = d.as_millis();
|
||||
if ms > i32::MAX as u128 {
|
||||
i32::MAX
|
||||
} else {
|
||||
ms as i32
|
||||
}
|
||||
}
|
||||
};
|
||||
let mut pfd = libc::pollfd {
|
||||
fd,
|
||||
events: libc::POLLIN,
|
||||
revents: 0,
|
||||
};
|
||||
loop {
|
||||
let r = unsafe { libc::poll(&mut pfd as *mut _, 1, timeout_ms) };
|
||||
if r < 0 {
|
||||
let e = unsafe { *libc::__errno_location() };
|
||||
if e == libc::EINTR {
|
||||
continue;
|
||||
}
|
||||
}
|
||||
break;
|
||||
}
|
||||
}
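A sketch of the `Duration` to `poll(2)` milliseconds conversion in `poll_wake`, written with `min` instead of the if/else. The `round_up` flag is a hypothetical variant, not in the source: `as_millis()` truncates, so a deadline 0.5 ms away becomes a 0 ms poll that returns immediately, and the caller may spin until the deadline actually passes.

```rust
use std::time::Duration;

/// `None` blocks forever (-1); long durations saturate at `i32::MAX`.
/// `round_up` (hypothetical) turns any sub-millisecond remainder into 1 ms.
fn timeout_ms(timeout: Option<Duration>, round_up: bool) -> i32 {
    match timeout {
        None => -1,
        Some(d) => {
            let mut ms = d.as_millis();
            if round_up && d.as_nanos() % 1_000_000 != 0 {
                ms += 1;
            }
            ms.min(i32::MAX as u128) as i32
        }
    }
}
```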
|
||||
39
src/lib.rs
@@ -2,11 +2,12 @@
|
||||
//!
|
||||
//! Erlang-style green-thread actor concurrency for Rust.
|
||||
//!
|
||||
//! v0.1 is single-threaded. One scheduler, one OS thread. The scheduler
|
||||
//! cooperatively interleaves green-thread actors with hand-rolled context
|
||||
//! switches. Actors communicate by sending `Send` messages over channels;
|
||||
//! every actor has a supervisor, which is itself just an actor with a
|
||||
//! `Receiver<Signal>`.
|
||||
//! Multi-threaded: N scheduler OS threads (default: one per CPU) share a
|
||||
//! single global run queue behind a `Mutex`. Actors communicate by sending
|
||||
//! `Send` messages over channels; every actor has a supervisor. Synchronisation
|
||||
//! primitives — `Mutex<T>` with mandatory lock timeouts, channel `recv`,
|
||||
//! `sleep`, and epoll-backed `wait_readable`/`wait_writable` — all park the
|
||||
//! green thread, never the OS thread.
|
||||
//!
|
||||
//! See `LOOM.md` for the design intent and the deferred-for-later list.
|
||||
|
||||
@@ -19,13 +20,13 @@ pub mod channel;
|
||||
pub mod scheduler;
|
||||
pub mod supervisor;
|
||||
pub mod timer;
|
||||
pub mod io;
|
||||
pub mod mutex;
|
||||
pub mod runtime;
|
||||
pub mod trace;
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Global allocator
|
||||
//
|
||||
// The preempting allocator wraps `System`. While `PREEMPTION_ENABLED` is
|
||||
// false (the default outside an actor) it adds one branch per allocation
|
||||
// and no syscalls. The scheduler flips it on per-resume.
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
#[global_allocator]
|
||||
@@ -36,6 +37,24 @@ static ALLOCATOR: preempt::PreemptingAllocator = preempt::PreemptingAllocator;
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
pub use channel::{channel, Receiver, RecvError, Sender};
|
||||
pub use mutex::{LockTimeout, Mutex, MutexGuard};
|
||||
pub use pid::Pid;
|
||||
pub use scheduler::{run, self_pid, sleep, spawn, spawn_under, yield_now, JoinError, JoinHandle};
|
||||
pub use runtime::{init, Config, Runtime};
|
||||
pub use scheduler::{
|
||||
block_on_io, run, self_pid, sleep, spawn, spawn_under, wait_readable, wait_writable,
|
||||
yield_now, JoinError, JoinHandle,
|
||||
};
|
||||
pub use supervisor::Signal;
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// check!()
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
/// Voluntarily check whether this actor's timeslice has expired, yielding
|
||||
/// if so.
|
||||
#[macro_export]
|
||||
macro_rules! check {
|
||||
() => {
|
||||
$crate::preempt::maybe_preempt()
|
||||
};
|
||||
}
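A sketch of the counter scheme that `check!()` and the allocator share: every event decrements a thread-local count, and only every `ALLOC_INTERVAL`-th event reads the clock. The names are hypothetical stand-ins, the interval is illustrative, `Instant` substitutes for RDTSC, and a real scheduler would switch contexts (and restart the slice) instead of returning `true`.

```rust
use std::cell::Cell;
use std::time::{Duration, Instant};

// Hypothetical stand-ins for smarm's preemption thread-locals.
const ALLOC_INTERVAL: u32 = 4;

thread_local! {
    static ALLOC_COUNT: Cell<u32> = Cell::new(ALLOC_INTERVAL);
    static SLICE_START: Cell<Instant> = Cell::new(Instant::now());
    static CLOCK_READS: Cell<u32> = Cell::new(0);
}

/// Begin a new timeslice (the scheduler would do this on every resume).
fn start_slice() {
    SLICE_START.with(|s| s.set(Instant::now()));
    ALLOC_COUNT.with(|c| c.set(ALLOC_INTERVAL));
}

/// One preemption event. Returns `true` when the slice has expired.
fn maybe_preempt(timeslice: Duration) -> bool {
    let left = ALLOC_COUNT.with(|c| {
        let left = c.get() - 1;
        c.set(if left == 0 { ALLOC_INTERVAL } else { left });
        left
    });
    if left != 0 {
        return false; // cheap path: a decrement and a branch, no clock read
    }
    CLOCK_READS.with(|r| r.set(r.get() + 1));
    SLICE_START.with(|s| s.get().elapsed() >= timeslice)
}
```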
|
||||
|
||||
248
src/mutex.rs
Normal file
@@ -0,0 +1,248 @@
|
||||
//! Actor-aware mutex with mandatory timeout.
|
||||
//!
|
||||
//! `Mutex<T>` parks the calling *green* thread on contention rather than
|
||||
//! blocking the OS thread. Every lock attempt is bounded by a timeout.
|
||||
//!
|
||||
//! Internals use `Arc<std::sync::Mutex<...>>` so the type is genuinely
|
||||
//! `Send + Sync` and can be shared across scheduler threads.
|
||||
//!
|
||||
//! Fairness: FIFO. Poisoning: none. Reentrance: deadlock (caller bug).
|
||||
|
||||
use crate::pid::Pid;
|
||||
use crate::scheduler;
|
||||
use crate::timer::{self, TimerTarget};
|
||||
use std::collections::VecDeque;
|
||||
use std::sync::{Arc, Mutex as StdMutex};
|
||||
use std::time::Duration;
|
||||
|
||||
pub const DEFAULT_TIMEOUT: Duration = Duration::from_secs(30);
|
||||
|
||||
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
|
||||
pub struct LockTimeout;
|
||||
|
||||
impl std::fmt::Display for LockTimeout {
|
||||
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
|
||||
write!(f, "mutex lock timed out")
|
||||
}
|
||||
}
|
||||
impl std::error::Error for LockTimeout {}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Internals
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
struct Wait {
|
||||
pid: Pid,
|
||||
seq: u64,
|
||||
}
|
||||
|
||||
struct MutexState {
|
||||
holder: Option<Pid>,
|
||||
waiters: VecDeque<Wait>,
|
||||
next_seq: u64,
|
||||
default_timeout: Duration,
|
||||
}
|
||||
|
||||
struct MutexCore {
|
||||
state: StdMutex<MutexState>,
|
||||
}
|
||||
|
||||
impl MutexCore {
|
||||
fn new(default_timeout: Duration) -> Self {
|
||||
Self {
|
||||
state: StdMutex::new(MutexState {
|
||||
holder: None,
|
||||
waiters: VecDeque::new(),
|
||||
next_seq: 0,
|
||||
default_timeout,
|
||||
}),
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
impl TimerTarget for MutexCore {
|
||||
fn on_timeout(&self, pid: Pid, wait_seq: u64) {
|
||||
let unpark = {
|
||||
let mut st = self.state.lock().unwrap();
|
||||
// Remove from waiters only if still there with matching seq.
|
||||
// If the lock was already granted (holder == Some(pid)), the
|
||||
// timer fired after the grant — treat as no-op; the actor
|
||||
// will see `is_holder == true` and return Ok.
|
||||
if st.holder == Some(pid) {
|
||||
return;
|
||||
}
|
||||
if let Some(i) = st.waiters.iter().position(|w| w.pid == pid && w.seq == wait_seq) {
st.waiters.remove(i);
true
} else {
false
}
|
||||
};
|
||||
if unpark {
|
||||
scheduler::unpark(pid);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Public API
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
pub struct Mutex<T> {
|
||||
core: Arc<MutexCore>,
|
||||
/// Protected value. `None` while a guard is live; `Some` while free.
|
||||
value: Arc<StdMutex<Option<T>>>,
|
||||
}
|
||||
|
||||
impl<T> Mutex<T> {
|
||||
pub fn new(value: T) -> Self {
|
||||
Self {
|
||||
core: Arc::new(MutexCore::new(DEFAULT_TIMEOUT)),
|
||||
value: Arc::new(StdMutex::new(Some(value))),
|
||||
}
|
||||
}
|
||||
|
||||
pub fn set_default_timeout(&self, timeout: Duration) {
|
||||
self.core.state.lock().unwrap().default_timeout = timeout;
|
||||
}
|
||||
|
||||
pub fn lock(&self) -> Result<MutexGuard<'_, T>, LockTimeout> {
|
||||
let timeout = self.core.state.lock().unwrap().default_timeout;
|
||||
self.lock_timeout(timeout)
|
||||
}
|
||||
|
||||
pub fn lock_timeout(&self, timeout: Duration) -> Result<MutexGuard<'_, T>, LockTimeout> {
|
||||
// Outside the runtime (e.g. in tests, after run() returns) there is no
|
||||
// current actor PID. Fall back to a blocking std::sync::Mutex acquire.
|
||||
let Some(me) = crate::actor::current_pid() else {
|
||||
return self.lock_blocking();
|
||||
};
|
||||
|
||||
// Fast path: nobody holds it.
|
||||
{
|
||||
let mut st = self.core.state.lock().unwrap();
|
||||
if st.holder.is_none() {
|
||||
st.holder = Some(me);
|
||||
drop(st);
|
||||
let value = self.value.lock().unwrap().take()
|
||||
.expect("Mutex: value missing on free fast path");
|
||||
return Ok(MutexGuard { mutex: self, value: Some(value) });
|
||||
}
|
||||
}
|
||||
|
||||
// Slow path: register as a waiter, set timeout, park.
|
||||
let _np = scheduler::NoPreempt::enter();
|
||||
let seq = {
|
||||
let mut st = self.core.state.lock().unwrap();
|
||||
let seq = st.next_seq;
|
||||
st.next_seq = st.next_seq.wrapping_add(1);
|
||||
st.waiters.push_back(Wait { pid: me, seq });
|
||||
seq
|
||||
};
|
||||
|
||||
let target: Arc<dyn TimerTarget> = self.core.clone();
|
||||
let deadline = timer::deadline_from_now(timeout);
|
||||
scheduler::insert_wait_timer(deadline, me, target, seq);
|
||||
scheduler::park_current();
|
||||
|
||||
// Resumed. Are we the holder?
|
||||
let is_holder = self.core.state.lock().unwrap().holder == Some(me);
|
||||
if is_holder {
|
||||
let value = self.value.lock().unwrap().take()
|
||||
.expect("Mutex: value missing after grant");
|
||||
Ok(MutexGuard { mutex: self, value: Some(value) })
|
||||
} else {
|
||||
Err(LockTimeout)
|
||||
}
|
||||
}
|
||||
|
||||
pub fn try_lock(&self) -> Option<MutexGuard<'_, T>> {
|
||||
let me = crate::actor::current_pid()?;
|
||||
let mut st = self.core.state.lock().unwrap();
|
||||
if st.holder.is_some() {
|
||||
return None;
|
||||
}
|
||||
st.holder = Some(me);
|
||||
drop(st);
|
||||
let value = self.value.lock().unwrap().take()
|
||||
.expect("Mutex: value missing on try_lock free path");
|
||||
Some(MutexGuard { mutex: self, value: Some(value) })
|
||||
}
|
||||
|
||||
/// Blocking fallback used when called outside the smarm runtime.
|
||||
/// Spins on the internal std mutex; no actor parking, no timeout.
|
||||
fn lock_blocking(&self) -> Result<MutexGuard<'_, T>, LockTimeout> {
|
||||
// We have no PID to register as holder, so we bypass the holder/waiter
|
||||
// tracking and just grab the value mutex directly. This is safe because
|
||||
// outside the runtime there are no green threads competing.
|
||||
let value = loop {
|
||||
let v = self.value.lock().unwrap().take();
|
||||
if let Some(v) = v { break v; }
|
||||
std::thread::yield_now();
|
||||
};
|
||||
Ok(MutexGuard { mutex: self, value: Some(value) })
|
||||
}
|
||||
}
|
||||
|
||||
impl<T> Clone for Mutex<T> {
|
||||
fn clone(&self) -> Self {
|
||||
Self { core: self.core.clone(), value: self.value.clone() }
|
||||
}
|
||||
}
|
||||
|
||||
// Genuinely Send + Sync now that internals are Arc<std::sync::Mutex<...>>.
|
||||
unsafe impl<T: Send> Send for Mutex<T> {}
|
||||
unsafe impl<T: Send> Sync for Mutex<T> {}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Guard
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
pub struct MutexGuard<'a, T> {
|
||||
mutex: &'a Mutex<T>,
|
||||
value: Option<T>,
|
||||
}
|
||||
|
||||
impl<T> std::ops::Deref for MutexGuard<'_, T> {
|
||||
type Target = T;
|
||||
fn deref(&self) -> &T { self.value.as_ref().expect("MutexGuard: value missing") }
|
||||
}
|
||||
|
||||
impl<T> std::ops::DerefMut for MutexGuard<'_, T> {
|
||||
fn deref_mut(&mut self) -> &mut T {
|
||||
self.value.as_mut().expect("MutexGuard: value missing")
|
||||
}
|
||||
}
|
||||
|
||||
impl<T: std::fmt::Debug> std::fmt::Debug for MutexGuard<'_, T> {
|
||||
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
|
||||
f.debug_tuple("MutexGuard")
|
||||
.field(self.value.as_ref().expect("MutexGuard: value missing"))
|
||||
.finish()
|
||||
}
|
||||
}
|
||||
|
||||
impl<T> Drop for MutexGuard<'_, T> {
|
||||
fn drop(&mut self) {
|
||||
let v = self.value.take().expect("MutexGuard: double drop");
|
||||
*self.mutex.value.lock().unwrap() = Some(v);
|
||||
|
||||
let next_pid = {
|
||||
let mut st = self.mutex.core.state.lock().unwrap();
|
||||
match st.waiters.pop_front() {
|
||||
Some(w) => {
|
||||
st.holder = Some(w.pid);
|
||||
Some(w.pid)
|
||||
}
|
||||
None => {
|
||||
st.holder = None;
|
||||
None
|
||||
}
|
||||
}
|
||||
};
|
||||
if let Some(pid) = next_pid {
|
||||
scheduler::unpark(pid);
|
||||
}
|
||||
}
|
||||
}
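The waiter bookkeeping in `lock_timeout`, `on_timeout`, and `MutexGuard::drop` can be checked in isolation. This single-threaded model (a hypothetical `State` type, `u32` standing in for `Pid`, no parking or timers) shows why each wait carries a `seq`: a timer from an earlier wait, or one that fires after the grant, must not remove the current waiter.

```rust
use std::collections::VecDeque;

type Pid = u32; // hypothetical stand-in

struct Wait {
    pid: Pid,
    seq: u64,
}

#[derive(Default)]
struct State {
    holder: Option<Pid>,
    waiters: VecDeque<Wait>,
    next_seq: u64,
}

impl State {
    /// Fast path, or enqueue. `Ok(())` = acquired, `Err(seq)` = would park.
    fn lock(&mut self, me: Pid) -> Result<(), u64> {
        if self.holder.is_none() {
            self.holder = Some(me);
            return Ok(());
        }
        let seq = self.next_seq;
        self.next_seq = self.next_seq.wrapping_add(1);
        self.waiters.push_back(Wait { pid: me, seq });
        Err(seq)
    }

    /// Timer callback. `true` = remove the waiter and wake it with a timeout.
    fn on_timeout(&mut self, pid: Pid, seq: u64) -> bool {
        if self.holder == Some(pid) {
            return false; // granted before the timer fired
        }
        match self.waiters.iter().position(|w| w.pid == pid && w.seq == seq) {
            Some(i) => {
                self.waiters.remove(i);
                true
            }
            None => false, // stale timer from an earlier wait
        }
    }

    /// Guard drop: FIFO hand-off. Returns the pid to unpark, if any.
    fn unlock(&mut self) -> Option<Pid> {
        self.holder = self.waiters.pop_front().map(|w| w.pid);
        self.holder
    }
}
```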
|
||||
@@ -6,10 +6,16 @@
|
||||
//! `switch_to_scheduler` to yield. Resetting the counter to `ALLOC_INTERVAL`
|
||||
//! amortises the RDTSC across many cheap events.
|
||||
//!
|
||||
//! Events today are heap allocations (via `PreemptingAllocator`). v0.2 will
|
||||
//! add stack-frame entries as a second event source — frames are stack
|
||||
//! allocations, the counter naming still fits — sharing this same counter
|
||||
//! so both routes behave consistently.
|
||||
//! Two event sources today:
|
||||
//! - `PreemptingAllocator` — heap allocations.
|
||||
//! - `smarm::check!()` — explicit preemption point for tight no-alloc
|
||||
//! loops, since stable Rust gives us no transparent way to preempt
|
||||
//! such loops (`__rust_probestack` is emitted inline by LLVM and not
|
||||
//! called at runtime).
|
||||
//!
|
||||
//! Both sources share `ALLOC_COUNT`, so the timeslice check fires at the
|
||||
//! same rate regardless of whether the actor is alloc-heavy, check-heavy,
|
||||
//! or mixed.
|
||||
//!
|
||||
//! All state is thread-local. The scheduler enables preemption on resume
|
||||
//! and disables it on the return path, so the scheduler can never preempt
|
||||
@@ -80,9 +86,17 @@ unsafe impl GlobalAlloc for PreemptingAllocator {
|
||||
}
|
||||
|
||||
/// Shared preemption check. Called by every preemption event source — the
|
||||
/// heap allocator today, the stack-frame entry hook in v0.2. Decrements
|
||||
/// `ALLOC_COUNT`; every `ALLOC_INTERVAL` calls reads the timeslice clock
|
||||
/// and yields if expired.
|
||||
/// heap allocator today, `smarm::check!()` for tight no-alloc loops.
|
||||
/// Decrements `ALLOC_COUNT`; every `ALLOC_INTERVAL` calls reads the
|
||||
/// timeslice clock and yields if expired.
|
||||
///
|
||||
/// **Invariant**: must not be called inside a "prep-to-park" region —
|
||||
/// e.g. between registering as a channel's parked receiver and calling
|
||||
/// `park_current()`. A preemption-driven yield in that window would
|
||||
/// reach the scheduler with state=Runnable, the unparker would no-op,
|
||||
/// the actor would then park, and the wakeup would be lost. Library
|
||||
/// code that touches the parking primitives must keep its prep-to-park
|
||||
/// regions allocation-free and check!()-free.
|
||||
#[inline(always)]
|
||||
pub fn maybe_preempt() {
|
||||
ALLOC_COUNT.with(|c| {
|
||||
|
||||
762
src/runtime.rs
Normal file
@@ -0,0 +1,762 @@
|
||||
//! Multi-scheduler runtime: configuration, initialisation, and the shared
|
||||
//! state that all scheduler OS threads operate against.
|
||||
//!
|
||||
//! # Architecture
|
||||
//!
|
||||
//! ```text
|
||||
//! init(Config) → Runtime (Arc<RuntimeInner>)
|
||||
//!
|
||||
//! RuntimeInner {
|
||||
//! shared: Mutex<SharedState> ← slot table, run queue, timers, IO
|
||||
//! stats: Vec<SchedulerStats> ← one per thread, lockless atomics (RFC 000)
|
||||
//! io_parked: AtomicU32 ← actors parked on IO
|
||||
//! sleeping: AtomicU32 ← actors parked on timer
|
||||
//! }
|
||||
//! ```
|
||||
//!
|
||||
//! `Runtime::run(f)` spawns `Config::resolved_thread_count()` OS threads,
|
||||
//! each running `schedule_loop`. It blocks until all scheduler threads exit,
|
||||
//! i.e. until the run queue is empty and nothing is pending.
|
||||
//!
|
||||
//! Each scheduler thread holds an `Arc<RuntimeInner>` clone. Per-thread
|
||||
//! identity is a small integer index, stored in a thread-local, used to index
|
||||
//! into `stats`.
|
||||
//!
|
||||
//! # Timer / IO drain (try-lock, one-winner)
|
||||
//!
|
||||
//! On each loop iteration every scheduler thread tries `try_lock()` on a
|
||||
//! separate `drain_lock: Mutex<()>`. The winner drains due timers and IO
|
||||
//! completions; losers skip and move straight to popping an actor from the
|
||||
//! run queue. This is the simplest correct approach; revisit if the drain
|
||||
//! becomes a measured bottleneck.
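A sketch of the one-winner drain described above, using `std::sync::Mutex::try_lock`. `try_drain` and `contend` are hypothetical helpers, and the drain work is passed in as a closure. Treating a poisoned `drain_lock` as a win is an assumed policy, not something the source specifies.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex, TryLockError};
use std::thread;

/// One loop iteration's drain step. Returns `true` if this caller drained.
fn try_drain(drain_lock: &Mutex<()>, drain: impl FnOnce()) -> bool {
    match drain_lock.try_lock() {
        Ok(_guard) => {
            drain(); // timers and IO completions would be processed here
            true
        }
        // Another scheduler thread is draining: skip to the run queue.
        Err(TryLockError::WouldBlock) => false,
        // Assumed policy: a panic in an earlier drain doesn't stop draining.
        Err(TryLockError::Poisoned(p)) => {
            let _guard = p.into_inner();
            drain();
            true
        }
    }
}

/// Spawns `n` threads that each attempt one drain; returns how many won.
fn contend(n: usize, lock: &Arc<Mutex<()>>, drains: &Arc<AtomicUsize>) -> usize {
    let handles: Vec<_> = (0..n)
        .map(|_| {
            let (lock, drains) = (Arc::clone(lock), Arc::clone(drains));
            thread::spawn(move || {
                try_drain(&lock, || {
                    drains.fetch_add(1, Ordering::SeqCst);
                })
            })
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().unwrap())
        .filter(|&won| won)
        .count()
}
```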
|
||||
|
||||
use crate::actor::{
|
||||
clear_current_pid, current_pid, is_actor_done, reset_actor_done,
|
||||
set_current_actor_box, set_current_pid, take_last_outcome, Actor, Outcome,
|
||||
};
|
||||
use crate::channel::Sender;
|
||||
use crate::context::{get_actor_sp, set_actor_sp, switch_to_actor};
|
||||
use crate::io::IoThread;
|
||||
use crate::pid::Pid;
|
||||
use crate::preempt::PREEMPTION_ENABLED;
|
||||
use crate::supervisor::Signal;
|
||||
use crate::timer::Timers;
|
||||
|
||||
use std::collections::VecDeque;
|
||||
use std::sync::atomic::{AtomicU32, AtomicU64, Ordering};
|
||||
use std::sync::{Arc, Mutex};
|
||||
use std::thread;
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Config
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
/// Runtime configuration.
|
||||
///
|
||||
/// ```
|
||||
/// use smarm::runtime::Config;
|
||||
///
|
||||
/// // Use all available CPUs (default):
|
||||
/// let c = Config::default();
|
||||
///
|
||||
/// // Exactly 4 scheduler threads:
|
||||
/// let c = Config::exact(4);
|
||||
///
|
||||
/// // Available parallelism, clamped to the range 2..=8:
|
||||
/// let c = Config::new(2, 8, None);
|
||||
/// ```
|
||||
#[derive(Clone, Debug)]
|
||||
pub struct Config {
|
||||
min: usize,
|
||||
max: usize,
|
||||
exact: Option<usize>,
|
||||
}
|
||||
|
||||
impl Config {
|
||||
/// Exact thread count; takes precedence over min/max.
|
||||
pub fn exact(n: usize) -> Self {
|
||||
assert!(n >= 1, "scheduler thread count must be ≥ 1");
|
||||
Self { min: n, max: n, exact: Some(n) }
|
||||
}
|
||||
|
||||
/// Bounded range: thread count = clamp(available_parallelism, min, max), unless `exact` is `Some`, which takes precedence.
|
||||
pub fn new(min: usize, max: usize, exact: Option<usize>) -> Self {
|
||||
assert!(min >= 1, "min must be ≥ 1");
|
||||
assert!(max >= min, "max must be ≥ min");
|
||||
if let Some(e) = exact {
|
||||
assert!(e >= 1, "exact must be ≥ 1");
|
||||
}
|
||||
Self { min, max, exact }
|
||||
}
|
||||
|
||||
/// The number of scheduler threads this config resolves to.
|
||||
pub fn resolved_thread_count(&self) -> usize {
|
||||
if let Some(e) = self.exact {
|
||||
return e;
|
||||
}
|
||||
let avail = thread::available_parallelism()
|
||||
.map(|n| n.get())
|
||||
.unwrap_or(1);
|
||||
avail.clamp(self.min, self.max)
|
||||
}
|
||||
}
|
||||
|
||||
impl Default for Config {
|
||||
fn default() -> Self {
|
||||
let avail = thread::available_parallelism()
|
||||
.map(|n| n.get())
|
||||
.unwrap_or(1);
|
||||
Self { min: 1, max: avail, exact: None }
|
||||
}
|
||||
}
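`resolved_thread_count` restated as a pure function with the parallelism passed in, so the clamp can be checked without depending on the host. `resolve_threads` and `default_threads` are hypothetical helpers. Note that `usize::clamp` panics if `min > max`, which `Config::new` already rules out with its assertions.

```rust
/// Pure restatement of `Config::resolved_thread_count`.
fn resolve_threads(min: usize, max: usize, exact: Option<usize>, avail: usize) -> usize {
    // Mirrors Config::new's checks; usize::clamp would panic if min > max.
    assert!(min >= 1 && max >= min, "invalid bounds");
    match exact {
        Some(n) => n, // exact wins over min/max
        None => avail.clamp(min, max),
    }
}

/// What `Config::default()` resolves to on this machine: min = 1, max = avail.
fn default_threads() -> usize {
    let avail = std::thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);
    resolve_threads(1, avail, None, avail)
}
```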
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Per-thread stats (RFC 000 Layer 1 primitives)
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
/// Lockless per-scheduler-thread counters. Written only by the owning thread;
|
||||
/// readable from any thread (introspection actor, tests).
|
||||
pub struct SchedulerStats {
|
||||
/// PID index of the actor currently on-CPU, or `u32::MAX` when idle.
|
||||
pub current_pid_index: AtomicU32,
|
||||
/// Snapshot of run queue length maintained on every push/pop.
|
||||
pub run_queue_len: AtomicU64,
|
||||
}
|
||||
|
||||
impl SchedulerStats {
|
||||
fn new() -> Self {
|
||||
Self {
|
||||
current_pid_index: AtomicU32::new(u32::MAX),
|
||||
run_queue_len: AtomicU64::new(0),
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Runtime stats snapshot (for tests / introspection)
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
pub struct RuntimeStats {
|
||||
pub(crate) inner: Arc<RuntimeInner>,
|
||||
}
|
||||
|
||||
impl RuntimeStats {
|
||||
/// Sum of run queue lengths across all scheduler threads.
|
||||
pub fn total_run_queue_len(&self) -> u64 {
|
||||
self.inner.stats.iter()
|
||||
.map(|s| s.run_queue_len.load(Ordering::Relaxed))
|
||||
.sum()
|
||||
}
|
||||
|
||||
/// Number of scheduler threads.
|
||||
pub fn scheduler_count(&self) -> usize {
|
||||
self.inner.stats.len()
|
||||
}
|
||||
|
||||
/// Actors currently parked on IO.
|
||||
pub fn io_parked_count(&self) -> u32 {
|
||||
self.inner.io_parked.load(Ordering::Relaxed)
|
||||
}
|
||||
|
||||
/// Actors currently sleeping on a timer.
|
||||
pub fn sleeping_count(&self) -> u32 {
|
||||
self.inner.sleeping.load(Ordering::Relaxed)
|
||||
}
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Shared state (behind Mutex<>)
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
pub(crate) const ACTOR_STACK_SIZE: usize = 64 * 1024;
|
||||
|
||||
#[derive(Debug)]
|
||||
pub(crate) enum State { Runnable, Parked, Done }
|
||||
|
||||
pub(crate) struct Slot {
|
||||
pub(crate) generation: u32,
|
||||
pub(crate) actor: Option<Actor>,
|
||||
pub(crate) state: State,
|
||||
pub(crate) waiters: Vec<Pid>,
|
||||
pub(crate) outcome: Option<Outcome>,
|
||||
pub(crate) supervisor_channel: Option<Sender<Signal>>,
|
||||
pub(crate) outstanding_handles: u32,
|
||||
pub(crate) pending_io_result: Option<crate::io::IoResult>,
|
||||
/// Set by `unpark()` when the actor is still running (not yet Parked).
|
||||
/// The scheduler checks this after a Park yield and re-queues instead
|
||||
/// of sleeping, closing the lost-wakeup window.
|
||||
pub(crate) pending_unpark: bool,
|
||||
}
|
||||
|
||||
impl Slot {
|
||||
fn vacant() -> Self {
|
||||
Self {
|
||||
generation: 0,
|
||||
actor: None,
|
||||
state: State::Done,
|
||||
waiters: Vec::new(),
|
||||
outcome: None,
|
||||
supervisor_channel: None,
|
||||
outstanding_handles: 0,
|
||||
pending_io_result: None,
|
||||
pending_unpark: false,
|
||||
}
|
||||
}
|
||||
}
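How `pending_unpark` closes the lost-wakeup window, as a small state-machine sketch. The `Running` state is an assumption added for the model (the `State` enum above has only `Runnable`, `Parked`, and `Done`); it stands for "on-CPU, about to park". The `Slot` here is a hypothetical fragment, not smarm's struct.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum State {
    Running,
    Runnable,
    Parked,
}

/// Hypothetical slot fragment: the state and the sticky-wakeup flag.
struct Slot {
    state: State,
    pending_unpark: bool,
}

impl Slot {
    /// Called by another actor or thread (e.g. a channel send).
    /// Returns `true` if the caller must push this actor onto the run queue.
    fn unpark(&mut self) -> bool {
        match self.state {
            State::Parked => {
                self.state = State::Runnable;
                true
            }
            // Still on-CPU, heading for park: remember the wakeup.
            State::Running => {
                self.pending_unpark = true;
                false
            }
            State::Runnable => false, // already queued
        }
    }

    /// Called by the scheduler after the actor switched out asking to park.
    /// Returns `true` if it should be re-queued immediately.
    fn on_park_yield(&mut self) -> bool {
        if std::mem::take(&mut self.pending_unpark) {
            self.state = State::Runnable;
            true
        } else {
            self.state = State::Parked;
            false
        }
    }
}
```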
|
||||
|
||||
pub(crate) type Closure = Box<dyn FnOnce() + Send>;
|
||||
|
||||
pub(crate) struct SharedState {
|
||||
pub(crate) slots: Vec<Slot>,
|
||||
pub(crate) free_list: Vec<u32>,
|
||||
pub(crate) run_queue: VecDeque<Pid>,
|
||||
pub(crate) root_pid: Option<Pid>,
|
||||
pub(crate) timers: Timers,
|
||||
pub(crate) io: Option<IoThread>,
|
||||
/// Closures awaiting their first resume, keyed by Pid.
|
||||
pub(crate) pending_closures: Vec<(Pid, Closure)>,
|
||||
}
|
||||
|
||||
impl SharedState {
|
||||
fn new() -> Self {
|
||||
Self {
|
||||
slots: Vec::new(),
|
||||
free_list: Vec::new(),
|
||||
run_queue: VecDeque::new(),
|
||||
root_pid: None,
|
||||
timers: Timers::new(),
|
||||
io: None,
|
||||
pending_closures: Vec::new(),
|
||||
}
|
||||
}
|
||||
|
||||
pub(crate) fn allocate_slot(&mut self) -> (u32, u32) {
|
||||
if let Some(idx) = self.free_list.pop() {
|
||||
let gen = self.slots[idx as usize].generation;
|
||||
(idx, gen)
|
||||
} else {
|
||||
let idx = self.slots.len() as u32;
|
||||
self.slots.push(Slot::vacant());
|
||||
(idx, 0)
|
||||
}
|
||||
}
|
||||
|
||||
pub(crate) fn slot(&self, pid: Pid) -> Option<&Slot> {
|
||||
let s = self.slots.get(pid.index() as usize)?;
|
||||
if s.generation == pid.generation() { Some(s) } else { None }
|
||||
}
|
||||
|
||||
pub(crate) fn slot_mut(&mut self, pid: Pid) -> Option<&mut Slot> {
|
||||
let s = self.slots.get_mut(pid.index() as usize)?;
|
||||
if s.generation == pid.generation() { Some(s) } else { None }
|
||||
}
|
||||
|
||||
pub(crate) fn pop_pending_closure(&mut self, pid: Pid) -> Option<Closure> {
|
||||
let pos = self.pending_closures.iter().position(|(p, _)| *p == pid)?;
|
||||
Some(self.pending_closures.swap_remove(pos).1)
|
||||
}
|
||||
}
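The slot table is a generational slab: a `Pid` names an index plus a generation, and `slot`/`slot_mut` reject a stale generation. `allocate_slot` reuses the stored generation, so the bump has to happen when a slot is freed; that code isn't shown here, so the bump-on-remove below is an assumption. `Slab` and `Handle` are hypothetical names.

```rust
/// Hypothetical generational handle, mirroring `Pid { index, generation }`.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Handle {
    index: u32,
    generation: u32,
}

struct Slot<T> {
    generation: u32,
    value: Option<T>,
}

struct Slab<T> {
    slots: Vec<Slot<T>>,
    free_list: Vec<u32>,
}

impl<T> Slab<T> {
    fn new() -> Self {
        Slab { slots: Vec::new(), free_list: Vec::new() }
    }

    fn insert(&mut self, value: T) -> Handle {
        let index = match self.free_list.pop() {
            Some(i) => i,
            None => {
                self.slots.push(Slot { generation: 0, value: None });
                (self.slots.len() - 1) as u32
            }
        };
        let slot = &mut self.slots[index as usize];
        slot.value = Some(value);
        Handle { index, generation: slot.generation }
    }

    /// Freeing bumps the generation (assumed), which makes old handles stale.
    fn remove(&mut self, h: Handle) -> Option<T> {
        let slot = self.slots.get_mut(h.index as usize)?;
        if slot.generation != h.generation {
            return None;
        }
        let v = slot.value.take()?;
        slot.generation = slot.generation.wrapping_add(1);
        self.free_list.push(h.index);
        Some(v)
    }

    fn get(&self, h: Handle) -> Option<&T> {
        let slot = self.slots.get(h.index as usize)?;
        if slot.generation == h.generation { slot.value.as_ref() } else { None }
    }
}
```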
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// RuntimeInner — the shared core behind an Arc
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
pub(crate) struct RuntimeInner {
|
||||
pub(crate) shared: Mutex<SharedState>,
|
||||
/// Try-lock: at most one scheduler thread drains timers/IO at a time; threads that lose the `try_lock` skip the drain.
|
||||
drain_lock: Mutex<()>,
|
||||
/// Per-thread stats, indexed by scheduler thread slot (0..N).
|
||||
pub(crate) stats: Vec<SchedulerStats>,
|
||||
/// Global counters for RFC 000 primitives.
|
||||
pub(crate) io_parked: AtomicU32,
|
||||
pub(crate) sleeping: AtomicU32,
|
||||
}

impl RuntimeInner {
    fn new(thread_count: usize) -> Arc<Self> {
        let stats = (0..thread_count).map(|_| SchedulerStats::new()).collect();
        Arc::new(Self {
            shared: Mutex::new(SharedState::new()),
            drain_lock: Mutex::new(()),
            stats,
            io_parked: AtomicU32::new(0),
            sleeping: AtomicU32::new(0),
        })
    }

    pub(crate) fn with_shared<R>(&self, f: impl FnOnce(&mut SharedState) -> R) -> R {
        // Preemption must be off while we hold the shared mutex. If an actor
        // called with_shared (e.g. from spawn, join, sleep) and the allocator
        // fired maybe_preempt() while the lock was held, switch_to_scheduler()
        // would context-switch to the scheduler loop, which would immediately
        // deadlock trying to acquire the same mutex.
        let prev = crate::preempt::PREEMPTION_ENABLED.with(|c| c.replace(false));
        let result = f(&mut self.shared.lock().unwrap());
        crate::preempt::PREEMPTION_ENABLED.with(|c| c.set(prev));
        result
    }

    /// Like `with_shared`, but recovers the guard from a poisoned mutex
    /// instead of panicking, so in practice this always returns `Some`.
    /// Used in `unpark` / channel Drop, which can fire after teardown.
    pub(crate) fn try_with_shared<R>(&self, f: impl FnOnce(&mut SharedState) -> R) -> Option<R> {
        let prev = crate::preempt::PREEMPTION_ENABLED.with(|c| c.replace(false));
        let result = match self.shared.lock() {
            Ok(mut g) => Some(f(&mut g)),
            Err(p) => Some(f(&mut p.into_inner())),
        };
        crate::preempt::PREEMPTION_ENABLED.with(|c| c.set(prev));
        result
    }
}
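Both `with_shared` and `try_with_shared` save and restore `PREEMPTION_ENABLED` by hand, so the restore line is skipped if `f` panics. A drop guard is one common way to make the restore unconditional. The sketch below is a standalone model under stated assumptions: the thread-local, `NoPreemptGuard` and the generic `with_shared` are hypothetical stand-ins, not smarm's own types (the diff's `join()` calls a `NoPreempt::enter()` whose definition is not shown here).

```rust
use std::cell::Cell;
use std::sync::Mutex;

thread_local! {
    // Hypothetical stand-in for crate::preempt::PREEMPTION_ENABLED.
    pub static PREEMPTION_ENABLED: Cell<bool> = const { Cell::new(true) };
}

/// Turns preemption off on creation and restores the previous value on
/// drop, which also runs while unwinding from a panic.
pub struct NoPreemptGuard {
    prev: bool,
}

impl NoPreemptGuard {
    pub fn enter() -> Self {
        let prev = PREEMPTION_ENABLED.with(|c| c.replace(false));
        NoPreemptGuard { prev }
    }
}

impl Drop for NoPreemptGuard {
    fn drop(&mut self) {
        PREEMPTION_ENABLED.with(|c| c.set(self.prev));
    }
}

/// Generic version of the with_shared pattern, built on the guard.
pub fn with_shared<S, R>(m: &Mutex<S>, f: impl FnOnce(&mut S) -> R) -> R {
    // Declared before the mutex guard, so it is dropped after it: the lock
    // is released before preemption can be re-enabled.
    let _np = NoPreemptGuard::enter();
    let mut guard = m.lock().unwrap();
    f(&mut *guard)
}
```

Locals drop in reverse declaration order, which is what keeps the "unlock first, then re-enable preemption" ordering of the hand-written version.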

// ---------------------------------------------------------------------------
// Runtime — the public handle
// ---------------------------------------------------------------------------

pub struct Runtime {
    inner: Arc<RuntimeInner>,
    thread_count: usize,
}

/// Initialise the runtime with the given config. Returns a reusable handle.
pub fn init(config: Config) -> Runtime {
    let n = config.resolved_thread_count();
    Runtime {
        inner: RuntimeInner::new(n),
        thread_count: n,
    }
}

impl Runtime {
    /// Run `f` as the initial actor, block until all actors finish.
    /// Can be called multiple times sequentially on the same `Runtime`.
    pub fn run(&self, f: impl FnOnce() + Send + 'static) {
        // Install smarm's panic hook on first call. The default Rust hook is
        // not reentrant — concurrent actor panics can trigger a double-panic
        // abort when the backtrace printer takes an internal lock that is
        // already held. smarm catches every actor panic via `catch_unwind` in
        // the trampoline, so panics never need to reach the hook for runtime
        // correctness; the hook still fires because Rust runs it at the panic
        // site, before unwinding reaches `catch_unwind`.
        //
        // We install once and leave it installed: the previous hook is chained
        // so that panics outside actor context (e.g. in the test harness
        // itself) are still reported normally.
        static HOOK_INSTALLED: std::sync::OnceLock<()> = std::sync::OnceLock::new();
        HOOK_INSTALLED.get_or_init(|| {
            let prev = std::panic::take_hook();
            std::panic::set_hook(Box::new(move |info| {
                // If we are currently executing inside an actor trampoline the
                // panic will be caught by `catch_unwind` momentarily. Suppress
                // the hook output to avoid interleaved noise and reentrancy.
                // Outside actor context, delegate to the previous hook so that
                // genuine runtime panics are still reported.
                if crate::actor::current_pid().is_some() {
                    // Inside an actor — catch_unwind handles it; stay silent.
                } else {
                    prev(info);
                }
            }));
        });

        // Open the trace store for this run (no-op without smarm-trace).
        #[cfg(feature = "smarm-trace")]
        crate::trace::open();

        // Re-initialise shared state for this run.
        {
            let mut s = self.inner.shared.lock().unwrap();
            assert!(s.run_queue.is_empty(), "run() called while previous run still active");
            s.root_pid = Some(ROOT_PID);
            s.io = Some(IoThread::start().expect("failed to start IO thread"));
        }

        // Spawn the initial actor through the public spawn path (which
        // requires a running runtime in the thread-local).
        RUNTIME.with(|r| *r.borrow_mut() = Some(self.inner.clone()));
        let initial_handle = crate::scheduler::spawn(f);

        // Launch N-1 extra scheduler threads. The calling thread is thread 0.
        let mut os_threads = Vec::new();
        for slot in 1..self.thread_count {
            let inner = self.inner.clone();
            let t = thread::spawn(move || {
                RUNTIME.with(|r| *r.borrow_mut() = Some(inner.clone()));
                SCHED_SLOT.with(|s| s.set(slot));
                schedule_loop(&inner, slot);
                RUNTIME.with(|r| *r.borrow_mut() = None);
            });
            os_threads.push(t);
        }

        // Thread 0 runs the loop on the calling thread.
        SCHED_SLOT.with(|s| s.set(0));
        schedule_loop(&self.inner, 0);

        // Wait for all other scheduler threads.
        for t in os_threads {
            let _ = t.join();
        }

        // Drop initial handle (decrements outstanding_handles count).
        drop(initial_handle);

        // Tear down IO and clean up shared state for the next run() call.
        let mut s = self.inner.shared.lock().unwrap();
        drop(s.io.take()); // joins IO threads
        s.pending_closures.clear();
        // Reset per-thread stats.
        for stat in &self.inner.stats {
            stat.current_pid_index.store(u32::MAX, Ordering::Relaxed);
            stat.run_queue_len.store(0, Ordering::Relaxed);
        }
        self.inner.io_parked.store(0, Ordering::Relaxed);
        self.inner.sleeping.store(0, Ordering::Relaxed);

        RUNTIME.with(|r| *r.borrow_mut() = None);

        // Flush trace to disk (no-op without smarm-trace).
        #[cfg(feature = "smarm-trace")]
        crate::trace::flush();
    }

    /// Snapshot of runtime statistics for introspection / tests.
    pub fn stats(&self) -> RuntimeStats {
        RuntimeStats { inner: self.inner.clone() }
    }
}
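The hook installed in `run()` follows a common pattern: install at most once via `OnceLock`, capture the previous hook with `std::panic::take_hook`, and forward to it only for panics raised outside actor context. A minimal standalone model, assuming a thread-local `IN_ACTOR` flag in place of `crate::actor::current_pid().is_some()`; `install_hook_once`, `run_as_actor` and the `FORWARDED` counter are illustrative names only.

```rust
use std::cell::Cell;
use std::panic;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

thread_local! {
    // Hypothetical stand-in for `crate::actor::current_pid().is_some()`.
    static IN_ACTOR: Cell<bool> = const { Cell::new(false) };
}

/// Demo-only counter: panics the hook forwarded to the previous hook.
pub static FORWARDED: AtomicUsize = AtomicUsize::new(0);

/// Install a chaining panic hook at most once per process.
pub fn install_hook_once() {
    static INSTALLED: OnceLock<()> = OnceLock::new();
    INSTALLED.get_or_init(|| {
        let prev = panic::take_hook();
        panic::set_hook(Box::new(move |info| {
            if IN_ACTOR.with(|c| c.get()) {
                // Actor code: the surrounding catch_unwind handles it; stay silent.
                return;
            }
            FORWARDED.fetch_add(1, Ordering::SeqCst);
            prev(info); // not an actor panic: keep the usual report
        }));
    });
}

/// Run `f` the way a trampoline might: flag set, panic caught, flag cleared.
/// Returns `true` if `f` completed without panicking.
pub fn run_as_actor(f: impl FnOnce() + panic::UnwindSafe) -> bool {
    install_hook_once();
    IN_ACTOR.with(|c| c.set(true));
    let ok = panic::catch_unwind(f).is_ok();
    IN_ACTOR.with(|c| c.set(false));
    ok
}
```

Usage: `run_as_actor(|| panic!("x"))` returns `false` and prints nothing, while a panic on the same thread outside `run_as_actor` still goes through the default report.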

// ---------------------------------------------------------------------------
// Thread-locals
// ---------------------------------------------------------------------------

use std::cell::{Cell, RefCell};

thread_local! {
    /// The RuntimeInner for the current run(). Set by run() on the calling
    /// thread and by each spawned scheduler thread.
    pub(crate) static RUNTIME: RefCell<Option<Arc<RuntimeInner>>> =
        const { RefCell::new(None) };

    /// This scheduler thread's index into RuntimeInner::stats.
    static SCHED_SLOT: Cell<usize> = const { Cell::new(0) };

    /// What the actor wants when it yields back to the scheduler.
    static YIELD_INTENT: Cell<YieldIntent> = const { Cell::new(YieldIntent::Yield) };
}

#[derive(Copy, Clone)]
pub(crate) enum YieldIntent { Yield, Park }

pub(crate) fn set_yield_intent(i: YieldIntent) {
    YIELD_INTENT.with(|c| c.set(i));
}

// ---------------------------------------------------------------------------
// Sentinel root PID
// ---------------------------------------------------------------------------

pub const ROOT_PID: Pid = Pid::new(u32::MAX, u32::MAX);

// ---------------------------------------------------------------------------
// Slot reclamation
// ---------------------------------------------------------------------------

pub(crate) fn reclaim_slot(s: &mut SharedState, pid: Pid) {
    let idx = pid.index();
    let slot = &mut s.slots[idx as usize];
    slot.generation = slot.generation.wrapping_add(1);
    slot.actor = None;
    slot.outcome = None;
    slot.waiters.clear();
    slot.supervisor_channel = None;
    slot.state = State::Done;
    slot.outstanding_handles = 0;
    slot.pending_unpark = false;
    slot.pending_io_result = None;
    s.free_list.push(idx);
}
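`allocate_slot`, the generation-checked `slot` / `slot_mut` lookups, and `reclaim_slot` together form a generational arena: freeing a slot bumps its generation, so any `Pid` minted earlier for that index stops matching. The sketch below condenses that idea into one hypothetical `SlotTable<T>` with a local `Pid` stand-in. Unlike smarm, it fills the slot inside `insert` rather than splitting allocation from initialisation, and `remove` refuses to free an already-free slot so an index is never pushed onto the free list twice.

```rust
/// Hypothetical stand-in for smarm's `Pid`: a slot index plus the
/// generation that slot had when the id was minted.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct Pid {
    index: u32,
    generation: u32,
}

struct Slot<T> {
    generation: u32,
    value: Option<T>,
}

/// Generational arena: a Vec of slots plus a free list of reusable indices.
pub struct SlotTable<T> {
    slots: Vec<Slot<T>>,
    free_list: Vec<u32>,
}

impl<T> SlotTable<T> {
    pub fn new() -> Self {
        Self { slots: Vec::new(), free_list: Vec::new() }
    }

    /// Reuse a freed index if one exists (its generation was already bumped
    /// when it was freed); otherwise grow the table.
    pub fn insert(&mut self, value: T) -> Pid {
        let index = match self.free_list.pop() {
            Some(i) => i,
            None => {
                self.slots.push(Slot { generation: 0, value: None });
                (self.slots.len() - 1) as u32
            }
        };
        let slot = &mut self.slots[index as usize];
        slot.value = Some(value);
        Pid { index, generation: slot.generation }
    }

    /// A lookup succeeds only if the generations match, so stale ids miss.
    pub fn get(&self, pid: Pid) -> Option<&T> {
        let slot = self.slots.get(pid.index as usize)?;
        if slot.generation == pid.generation { slot.value.as_ref() } else { None }
    }

    /// Free the slot: take the value, bump the generation so every
    /// outstanding `Pid` for this index goes stale, and recycle the index.
    pub fn remove(&mut self, pid: Pid) -> Option<T> {
        let slot = self.slots.get_mut(pid.index as usize)?;
        if slot.generation != pid.generation || slot.value.is_none() {
            return None; // stale id, or already free: never push twice
        }
        let value = slot.value.take();
        slot.generation = slot.generation.wrapping_add(1);
        self.free_list.push(pid.index);
        value
    }
}
```

As in `reclaim_slot`, the generation uses `wrapping_add`, so a single index would have to be recycled 2^32 times before an old id could match again.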

// ---------------------------------------------------------------------------
// finalize_actor
// ---------------------------------------------------------------------------

fn finalize_actor(inner: &Arc<RuntimeInner>, pid: Pid, outcome: Outcome) {
    let (joiner_outcome, sup_signal) = match outcome {
        Outcome::Exit => (Outcome::Exit, Signal::Exit(pid)),
        Outcome::Panic(payload) => (
            Outcome::Panic(payload),
            Signal::Panic(pid, Box::new(()) as Box<dyn std::any::Any + Send>),
        ),
    };

    let (waiters, supervisor_pid) = inner.with_shared(|s| {
        let slot = s.slot_mut(pid).expect("finalize_actor: slot vanished");
        let sup = slot.actor.as_ref().map(|a| a.supervisor);
        slot.outcome = Some(joiner_outcome);
        slot.state = State::Done;
        slot.actor = None;
        (std::mem::take(&mut slot.waiters), sup)
    });

    // Deliver to supervisor.
    if let Some(sup) = supervisor_pid {
        let sender = inner.with_shared(|s| {
            s.slot(sup).and_then(|slot| slot.supervisor_channel.clone())
        });
        if let Some(sender) = sender {
            let _ = sender.send(sup_signal);
        }
    }

    // Unpark joiners.
    for joiner in waiters {
        crate::scheduler::unpark(joiner);
    }

    // Reclaim if no outstanding handles.
    inner.with_shared(|s| {
        let reclaim = s.slot(pid).map(|slot| slot.outstanding_handles == 0).unwrap_or(false);
        if reclaim { reclaim_slot(s, pid); }
    });
}

// ---------------------------------------------------------------------------
// schedule_loop — runs on each scheduler OS thread
// ---------------------------------------------------------------------------

fn schedule_loop(inner: &Arc<RuntimeInner>, slot: usize) {
    let stats = &inner.stats[slot];

    loop {
        // ----------------------------------------------------------------
        // 1. Try to win the drain lock (timers + IO). One winner per round;
        //    losers skip immediately and proceed to step 2.
        // ----------------------------------------------------------------
        if let Ok(_drain_guard) = inner.drain_lock.try_lock() {
            let now = std::time::Instant::now();

            // Drain due timers.
            let due = inner.with_shared(|s| s.timers.pop_due(now));
            for entry in due {
                match entry.reason {
                    crate::timer::Reason::Sleep => {
                        inner.with_shared(|s| {
                            if let Some(slot) = s.slot_mut(entry.pid) {
                                if matches!(slot.state, State::Parked) {
                                    slot.state = State::Runnable;
                                    s.run_queue.push_back(entry.pid);
                                    crate::te!(crate::trace::Event::Enqueue(entry.pid));
                                }
                            }
                        });
                    }
                    crate::timer::Reason::WaitTimeout { target, wait_seq } => {
                        // Runs outside with_shared — the callback may call unpark.
                        target.on_timeout(entry.pid, wait_seq);
                    }
                }
            }

            // Drain IO completions.
            let completions = inner.with_shared(|s| {
                s.io.as_mut().map(|io| io.drain_completions()).unwrap_or_default()
            });
            for completion in completions {
                match completion {
                    crate::io::Completion::Blocking { pid, result } => {
                        inner.with_shared(|s| {
                            if let Some(io) = s.io.as_mut() {
                                io.outstanding = io.outstanding.saturating_sub(1);
                            }
                            if let Some(slot) = s.slot_mut(pid) {
                                slot.pending_io_result = Some(result);
                                if matches!(slot.state, State::Parked) {
                                    slot.state = State::Runnable;
                                    s.run_queue.push_back(pid);
                                    crate::te!(crate::trace::Event::Enqueue(pid));
                                }
                            }
                        });
                    }
                    crate::io::Completion::FdReady { fd, events: _ } => {
                        inner.with_shared(|s| {
                            let parked_pid = s.io.as_mut().and_then(|io| {
                                let pid = io.waiters.remove(&fd);
                                io.epoll_deregister(fd);
                                pid
                            });
                            if let Some(pid) = parked_pid {
                                if let Some(slot) = s.slot_mut(pid) {
                                    match slot.state {
                                        State::Parked => {
                                            slot.state = State::Runnable;
                                            s.run_queue.push_back(pid);
                                            crate::te!(crate::trace::Event::UnparkDirect(pid));
                                            crate::te!(crate::trace::Event::Enqueue(pid));
                                        }
                                        // Actor is between epoll_register
                                        // and park_current. Set the flag so
                                        // the upcoming Park yield re-queues
                                        // instead of suspending. Mirrors
                                        // scheduler::unpark().
                                        State::Runnable => {
                                            slot.pending_unpark = true;
                                            crate::te!(crate::trace::Event::UnparkDeferred(pid));
                                        }
                                        State::Done => {}
                                    }
                                }
                            }
                        });
                    }
                }
            }
        } // drain_guard drops here

        // ----------------------------------------------------------------
        // 2. Pop a runnable actor from the shared queue.
        // ----------------------------------------------------------------
        let pid = match inner.with_shared(|s| {
            let len = s.run_queue.len() as u64;
            stats.run_queue_len.store(len, Ordering::Relaxed);
            s.run_queue.pop_front()
        }) {
            Some(p) => {
                crate::te!(crate::trace::Event::Dequeue(p));
                p
            }
            None => {
                // Queue was empty when we popped. Re-examine under the lock to
                // decide whether to exit or wait. All four conditions must hold
                // simultaneously before we exit:
                //   1. run queue is still empty
                //   2. no live actors (nothing parked, nothing mid-finalize)
                //   3. no pending timers
                //   4. no outstanding IO
                // If any is non-zero we keep spinning — "check the fridge is
                // empty before you leave for the airport".
                let (next_deadline, io_outstanding, wake_fd, all_clear) =
                    inner.with_shared(|s| {
                        let next = s.timers.peek_deadline();
                        let (out, fd) = match s.io.as_ref() {
                            Some(io) => (
                                io.outstanding + io.waiters.len() as u32,
                                Some(io.wake_fd()),
                            ),
                            None => (0, None),
                        };
                        let live = s.slots.iter().filter(|slot| slot.actor.is_some()).count();
                        let queue_empty = s.run_queue.is_empty();
                        let all_clear = queue_empty && live == 0 && next.is_none() && out == 0;
                        (next, out, fd, all_clear)
                    });

                if all_clear {
                    return;
                }

                // Something is still in flight. Sleep on the appropriate source
                // to avoid hammering the mutex; the loop will retry on wake.
                match (next_deadline, wake_fd) {
                    (Some(deadline), fd_opt) => {
                        let now = std::time::Instant::now();
                        if deadline > now {
                            let timeout = deadline - now;
                            match fd_opt {
                                Some(fd) => {
                                    crate::io::poll_wake(fd, Some(timeout));
                                    crate::io::drain_wake_pipe(fd);
                                }
                                None => thread::sleep(timeout),
                            }
                        }
                    }
                    (None, Some(fd)) if io_outstanding > 0 => {
                        crate::io::poll_wake(fd, None);
                        crate::io::drain_wake_pipe(fd);
                    }
                    _ => {
                        thread::sleep(std::time::Duration::from_micros(100));
                    }
                }
                continue;
            }
        };

        // ----------------------------------------------------------------
        // 3. Resume the actor.
        // ----------------------------------------------------------------
        let sp = match inner.with_shared(|s| {
            s.slot(pid).and_then(|slot| slot.actor.as_ref().map(|a| a.sp))
        }) {
            Some(sp) => sp,
            None => {
                continue; // stale pid
            }
        };

        // First resume: move the closure into the trampoline's thread-local.
        if let Some(b) = inner.with_shared(|s| s.pop_pending_closure(pid)) {
            set_current_actor_box(b);
        }

        // Update per-thread stats: record who's on-CPU.
        stats.current_pid_index.store(pid.index(), Ordering::Relaxed);

        set_actor_sp(sp);
        set_current_pid(pid);
        reset_actor_done();
        YIELD_INTENT.with(|c| c.set(YieldIntent::Yield));
        crate::preempt::reset_timeslice();
        PREEMPTION_ENABLED.with(|c| c.set(true));

        crate::te!(crate::trace::Event::Resume(pid));
        unsafe { switch_to_actor() };

        PREEMPTION_ENABLED.with(|c| c.set(false));
        stats.current_pid_index.store(u32::MAX, Ordering::Relaxed);
        clear_current_pid();

        let intent = YIELD_INTENT.with(|c| c.get());
        let new_sp = get_actor_sp();

        if is_actor_done() {
            crate::te!(crate::trace::Event::Done(pid));
            let outcome = take_last_outcome().unwrap_or(Outcome::Exit);
            finalize_actor(inner, pid, outcome);
        } else {
            inner.with_shared(|s| {
                if let Some(slot) = s.slot_mut(pid) {
                    if let Some(actor) = slot.actor.as_mut() {
                        actor.sp = new_sp;
                    }
                    match intent {
                        YieldIntent::Yield => {
                            crate::te!(crate::trace::Event::Yield(pid));
                            slot.state = State::Runnable;
                            s.run_queue.push_back(pid);
                            crate::te!(crate::trace::Event::Enqueue(pid));
                        }
                        YieldIntent::Park => {
                            // Check if unpark() fired while the actor was
                            // still running (between registering in the
                            // channel and calling park_current). If so,
                            // re-queue immediately instead of parking.
                            if slot.pending_unpark {
                                slot.pending_unpark = false;
                                slot.state = State::Runnable;
                                s.run_queue.push_back(pid);
                                crate::te!(crate::trace::Event::UnparkFlagConsumed(pid));
                                crate::te!(crate::trace::Event::Enqueue(pid));
                            } else {
                                crate::te!(crate::trace::Event::Park(pid));
                                slot.state = State::Parked;
                            }
                        }
                    }
                }
            });
        }
    }
}
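The `pending_unpark` flag closes a lost-wakeup window: an actor registers itself as a waiter, but `unpark` (or the IO drain's `FdReady` branch) can fire before the actor has actually switched back with a `Park` intent. Below is a pure, single-threaded model of the two sides, using hypothetical `State`, `Slot`, `unpark` and `on_park_intent` stand-ins and a `Vec<u32>` in place of the run queue. In smarm both sides run under the shared mutex, which this sketch omits.

```rust
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub enum State {
    Runnable,
    Parked,
    Done,
}

/// Hypothetical, simplified slot: just the two fields this handshake needs.
#[derive(Debug)]
pub struct Slot {
    pub state: State,
    pub pending_unpark: bool,
}

/// Waker side. `queue` stands in for the shared run queue.
pub fn unpark(slot: &mut Slot, id: u32, queue: &mut Vec<u32>) {
    match slot.state {
        State::Parked => {
            slot.state = State::Runnable;
            queue.push(id);
        }
        // Still running: it registered as a waiter but has not parked yet.
        // Record the wakeup instead of dropping it.
        State::Runnable => slot.pending_unpark = true,
        State::Done => {}
    }
}

/// Scheduler side, after the actor switched back with a `Park` intent.
pub fn on_park_intent(slot: &mut Slot, id: u32, queue: &mut Vec<u32>) {
    if slot.pending_unpark {
        // The wakeup already happened: consume it and requeue at once.
        slot.pending_unpark = false;
        slot.state = State::Runnable;
        queue.push(id);
    } else {
        slot.state = State::Parked;
    }
}
```

Either ordering of the two calls ends with the actor back in the queue exactly once; without the flag, the "wakeup first" ordering would leave it `Parked` with nobody left to wake it.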
621
src/scheduler.rs
@@ -1,200 +1,75 @@
//! The single-threaded scheduler.
//! Scheduler public API — thin façade over the multi-scheduler runtime.
//!
//! There is one global scheduler per OS thread, stored in a thread-local.
//! `run(initial)` initialises it, spawns the initial actor, drives the loop
//! until the run queue is empty, then tears it down.
//! All heavy lifting lives in `runtime.rs`. This module exposes the same
//! surface that the rest of the codebase (channel, mutex, io, timer, actor)
//! calls into, plus the public API re-exported from `lib.rs`.
//!
//! Slot table: a `Vec<Slot>` indexed by `Pid::index()`, with a free list of
//! reusable indices. Each slot has a `generation` counter that increments
//! every time the slot is freed; `Pid` carries the generation it was minted
//! with, so a stale PID has a mismatching generation and is detected on
//! lookup.
//!
//! Run queue: a `VecDeque<Pid>` of runnable actors. The state of an actor
//! is implicit in slot.state: `Runnable` means it's either in the queue or
//! currently executing; `Parked` means it's waiting for something to unpark
//! it (channel send, join completion, …); `Done` means it has finished and
//! is awaiting reaping.
//!
//! Joining: `JoinHandle::join()` parks the calling actor and registers it
//! on the target slot's `waiters` list. When the target actor finishes,
//! the scheduler reaps the slot and unparks every waiter, passing them the
//! outcome via a side channel (the target's `outcome` field, drained on
//! the joiner side).
//! The single-threaded `run()` entry point is kept as a convenience wrapper
//! around `runtime::init(Config::exact(1)).run(f)`.

use crate::actor::{
clear_current_pid, current_pid, is_actor_done, reset_actor_done,
set_current_actor_box, set_current_pid, take_last_outcome, trampoline, Actor, Outcome,
};
use crate::actor::current_pid;
use crate::channel::Sender;
use crate::context::{get_actor_sp, init_actor_stack, set_actor_sp, switch_to_actor};
use crate::pid::Pid;
use crate::preempt::PREEMPTION_ENABLED;
use crate::stack::Stack;
use crate::runtime::{
self, RuntimeInner, YieldIntent, ROOT_PID, RUNTIME,
};
use crate::supervisor::Signal;
use std::cell::RefCell;
use std::collections::VecDeque;
use std::sync::Arc;

// ---------------------------------------------------------------------------
// Configuration
// with_runtime / try_with_runtime
// ---------------------------------------------------------------------------

const ACTOR_STACK_SIZE: usize = 64 * 1024;

// ---------------------------------------------------------------------------
// Per-actor slot
// ---------------------------------------------------------------------------

enum State {
/// Either in the run queue or currently executing.
Runnable,
/// Removed from the queue, waiting for `unpark()`.
Parked,
/// The actor has finished. Slot persists until the last `JoinHandle`
/// has been joined (or dropped). Then the slot is freed.
Done,
}

struct Slot {
/// Bumped every time this slot is freed and re-used. A `Pid` with a
/// non-matching generation is stale.
generation: u32,
/// `None` when the slot is free. `Some` otherwise.
actor: Option<Actor>,
state: State,
/// PIDs waiting in `JoinHandle::join`.
waiters: Vec<Pid>,
/// The outcome the actor produced, captured when it finished.
/// Drained by `JoinHandle::join`.
outcome: Option<Outcome>,
/// If this slot is a supervisor, the sender into its `Signal` mailbox.
/// Cloned out and used when one of its children dies.
supervisor_channel: Option<Sender<Signal>>,
/// Number of `JoinHandle`s still outstanding for this actor. The slot
/// is reclaimed only when the actor is done AND outstanding_handles == 0.
outstanding_handles: u32,
}

impl Slot {
fn vacant() -> Self {
Self {
generation: 0,
actor: None,
state: State::Done,
waiters: Vec::new(),
outcome: None,
supervisor_channel: None,
outstanding_handles: 0,
}
}
}

// ---------------------------------------------------------------------------
// Scheduler state
// ---------------------------------------------------------------------------

struct SchedulerState {
slots: Vec<Slot>,
free_list: Vec<u32>,
run_queue: VecDeque<Pid>,
/// The root supervisor's PID. Children spawned at the top level are
/// supervised by this. Set by `run()`.
root_pid: Option<Pid>,
/// Pending sleep timers. Min-heap keyed by deadline.
timers: crate::timer::Timers,
}

impl SchedulerState {
fn new() -> Self {
Self {
slots: Vec::new(),
free_list: Vec::new(),
run_queue: VecDeque::new(),
root_pid: None,
timers: crate::timer::Timers::new(),
}
}

/// Allocate a slot; return its (index, generation).
fn allocate_slot(&mut self) -> (u32, u32) {
if let Some(idx) = self.free_list.pop() {
let s = &mut self.slots[idx as usize];
(idx, s.generation)
} else {
let idx = self.slots.len() as u32;
self.slots.push(Slot::vacant());
(idx, 0)
}
}

fn slot(&self, pid: Pid) -> Option<&Slot> {
let s = self.slots.get(pid.index() as usize)?;
if s.generation == pid.generation() { Some(s) } else { None }
}

fn slot_mut(&mut self, pid: Pid) -> Option<&mut Slot> {
let s = self.slots.get_mut(pid.index() as usize)?;
if s.generation == pid.generation() { Some(s) } else { None }
}
}

thread_local! {
static SCHED: RefCell<Option<SchedulerState>> = const { RefCell::new(None) };
}

fn with_sched<R>(f: impl FnOnce(&mut SchedulerState) -> R) -> R {
SCHED.with(|c| {
let mut g = c.borrow_mut();
let s = g.as_mut().expect("scheduler not running");
f(s)
/// Borrow the current runtime. Panics if called outside `Runtime::run()`.
pub(crate) fn with_runtime<R>(f: impl FnOnce(&Arc<RuntimeInner>) -> R) -> R {
RUNTIME.with(|r| {
let b = r.borrow();
let inner = b.as_ref().expect("smarm: not inside Runtime::run()");
f(inner)
})
}

/// Same as `with_sched` but returns `None` when there's no scheduler instead
/// of panicking. Used on cleanup paths (channel sender drop during shutdown,
/// for example).
fn try_with_sched<R>(f: impl FnOnce(&mut SchedulerState) -> R) -> Option<R> {
SCHED.with(|c| {
let mut g = c.borrow_mut();
g.as_mut().map(f)
})
/// Borrow the runtime if present; returns `None` otherwise.
/// Used on cleanup paths (channel Drop during teardown).
pub(crate) fn try_with_runtime<R>(f: impl FnOnce(&Arc<RuntimeInner>) -> R) -> Option<R> {
RUNTIME.with(|r| r.borrow().as_ref().map(|inner| f(inner)))
}

// ---------------------------------------------------------------------------
// JoinHandle
// JoinHandle / JoinError
// ---------------------------------------------------------------------------

#[derive(Debug)]
pub struct JoinError {
/// Whatever `panic!` was called with.
pub payload: Box<dyn std::any::Any + Send>,
}

pub struct JoinHandle {
pid: Pid,
/// `true` once `join()` has been called and the handle has consumed
/// its outcome. Prevents the Drop impl from double-decrementing.
consumed: bool,
}

impl JoinHandle {
pub fn pid(&self) -> Pid { self.pid }

/// Block the calling actor until the target completes. Returns
/// `Ok(())` on normal exit, `Err(JoinError)` if the target panicked.
pub fn join(mut self) -> Result<(), JoinError> {
use crate::actor::Outcome;
use crate::runtime::State; // need State visibility

let me = current_pid().expect("join() called outside an actor");

loop {
let outcome = with_sched(|s| {
let outcome = with_runtime(|inner| {
inner.with_shared(|s| {
let slot = s.slot_mut(self.pid)
.expect("join: target slot has been reused");
if matches!(slot.state, State::Done) {
Some(slot.outcome.take().expect("Done slot must have an outcome"))
Some(slot.outcome.take().expect("Done slot must have outcome"))
} else {
slot.waiters.push(me);
None
}
})
});

match outcome {
@@ -206,23 +81,30 @@ impl JoinHandle {
Outcome::Panic(p) => Err(JoinError { payload: p }),
};
}
None => park_current(),
None => {
let _np = NoPreempt::enter();
park_current();
}
}
}
}

fn decrement_handle_count(&mut self) {
with_sched(|s| {
with_runtime(|inner| {
inner.with_shared(|s| {
let should_reclaim = match s.slot_mut(self.pid) {
Some(slot) => {
slot.outstanding_handles = slot.outstanding_handles.saturating_sub(1);
matches!(slot.state, State::Done) && slot.outstanding_handles == 0
slot.outstanding_handles =
slot.outstanding_handles.saturating_sub(1);
matches!(slot.state, crate::runtime::State::Done)
&& slot.outstanding_handles == 0
}
None => false,
};
if should_reclaim {
reclaim_slot(s, self.pid);
crate::runtime::reclaim_slot(s, self.pid);
}
})
});
}
}
@@ -230,345 +112,238 @@ impl JoinHandle {
|
||||
impl Drop for JoinHandle {
|
||||
fn drop(&mut self) {
|
||||
if !self.consumed {
|
||||
// May be called outside run() if handle is dropped after teardown.
|
||||
if try_with_runtime(|_| ()).is_some() {
|
||||
self.decrement_handle_count();
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Slot reclamation
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
fn reclaim_slot(s: &mut SchedulerState, pid: Pid) {
|
||||
let idx = pid.index();
|
||||
let slot = &mut s.slots[idx as usize];
|
||||
// Bump generation so any stale PIDs from now on miss.
|
||||
slot.generation = slot.generation.wrapping_add(1);
|
||||
// Drop the actor (its stack with it).
|
||||
slot.actor = None;
|
||||
slot.outcome = None;
|
||||
slot.waiters.clear();
|
||||
slot.supervisor_channel = None;
|
||||
slot.state = State::Done; // semantically vacant; allocator checks free_list
|
||||
slot.outstanding_handles = 0;
|
||||
s.free_list.push(idx);
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// spawn / spawn_under / self_pid
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
/// Spawn `f` as a child of the currently-executing actor.
|
||||
/// Outside an actor (only legal from `run()`'s initial setup), the child's
|
||||
/// supervisor is the root supervisor.
|
||||
pub fn spawn(f: impl FnOnce() + Send + 'static) -> JoinHandle {
|
||||
let parent = current_pid()
|
||||
.or_else(|| with_sched(|s| s.root_pid))
|
||||
.or_else(|| with_runtime(|inner| inner.with_shared(|s| s.root_pid)))
|
||||
.expect("spawn() before run()");
|
||||
spawn_under(parent, f)
|
||||
}
|
||||
|
||||
/// Spawn `f` with `supervisor` as its parent. The supervisor will receive
|
||||
/// a `Signal` on its registered channel when the child terminates.
|
||||
pub fn spawn_under(supervisor: Pid, f: impl FnOnce() + Send + 'static) -> JoinHandle {
|
||||
let pid = with_sched(|s| {
|
||||
let pid = with_runtime(|inner| {
|
||||
inner.with_shared(|s| {
|
||||
let (idx, gen) = s.allocate_slot();
|
||||
let pid = Pid::new(idx, gen);
|
||||
let stack = Stack::new(ACTOR_STACK_SIZE)
|
||||
let stack = crate::stack::Stack::new(crate::runtime::ACTOR_STACK_SIZE)
|
||||
.expect("stack allocation failed");
|
||||
let sp = init_actor_stack(stack.top(), trampoline);
|
||||
let sp = init_actor_stack(stack.top(), crate::actor::trampoline);
|
||||
let slot = &mut s.slots[idx as usize];
|
||||
slot.actor = Some(Actor { pid, stack, sp, supervisor });
|
||||
slot.state = State::Runnable;
|
||||
slot.actor = Some(crate::actor::Actor { pid, stack, sp, supervisor });
|
||||
slot.state = crate::runtime::State::Runnable;
|
||||
slot.outstanding_handles = 1;
|
||||
slot.outcome = None;
|
||||
slot.waiters.clear();
|
||||
slot.supervisor_channel = None;
|
||||
slot.pending_unpark = false;
|
||||
slot.pending_io_result = None;
|
||||
s.run_queue.push_back(pid);
|
||||
s.pending_closures.push((pid, Box::new(f) as crate::runtime::Closure));
|
||||
crate::te!(crate::trace::Event::Spawn { parent: supervisor, child: pid });
|
||||
crate::te!(crate::trace::Event::Enqueue(pid));
|
||||
pid
|
||||
});

// Stash the closure where `schedule_loop` will find it before the first
// resume.
PENDING_CLOSURES.with(|c| {
c.borrow_mut().push((pid, Box::new(f) as Closure));
})
});

JoinHandle { pid, consumed: false }
}

type Closure = Box<dyn FnOnce() + Send>;

thread_local! {
/// Closures awaiting their first resume. Keyed by the PID the scheduler
/// allocated for them in `spawn_under`. The scheduler pops from here in
/// `pop_pending_closure` right before each first resume.
static PENDING_CLOSURES: RefCell<Vec<(Pid, Closure)>> = const { RefCell::new(Vec::new()) };
}

fn pop_pending_closure(pid: Pid) -> Option<Closure> {
PENDING_CLOSURES.with(|c| {
let mut v = c.borrow_mut();
v.iter().position(|(p, _)| *p == pid).map(|i| v.swap_remove(i).1)
})
}
use crate::context::init_actor_stack;

pub fn self_pid() -> Pid {
current_pid().expect("self_pid() called outside an actor")
}

// ---------------------------------------------------------------------------
// yield_now / park / unpark
// yield_now / park_current / unpark
// ---------------------------------------------------------------------------

/// Cooperative yield. The current actor goes to the back of the run queue.
pub fn yield_now() {
// Mark ourselves as needing to be re-queued, then yield.
YIELD_INTENT.with(|c| c.set(YieldIntent::Yield));
runtime::set_yield_intent(YieldIntent::Yield);
unsafe { crate::context::switch_to_scheduler() };
}

/// Park the current actor (remove it from the run queue until `unpark`).
pub fn park_current() {
YIELD_INTENT.with(|c| c.set(YieldIntent::Park));
runtime::set_yield_intent(YieldIntent::Park);
unsafe { crate::context::switch_to_scheduler() };
}

/// Park the current actor for at least `duration`. A zero duration behaves
/// like `yield_now` (the deadline is immediately in the past, so the timer
/// pops on the next scheduler iteration).
pub fn unpark(pid: Pid) {
let result = try_with_runtime(|inner| {
inner.with_shared(|s| {
if let Some(slot) = s.slot_mut(pid) {
match slot.state {
crate::runtime::State::Parked => {
// Actor is suspended — safe to re-queue immediately.
slot.state = crate::runtime::State::Runnable;
s.run_queue.push_back(pid);
crate::te!(crate::trace::Event::UnparkDirect(pid));
crate::te!(crate::trace::Event::Enqueue(pid));
}
crate::runtime::State::Runnable => {
// Actor is still running (between registering its
// parked_receiver and calling park_current). Set the
// flag; the scheduler will re-queue after the Park
// yield instead of sleeping.
slot.pending_unpark = true;
crate::te!(crate::trace::Event::UnparkDeferred(pid));
}
crate::runtime::State::Done => {}
}
}
})
});
let _ = result;
}

// ---------------------------------------------------------------------------
// NoPreempt
// ---------------------------------------------------------------------------

pub struct NoPreempt(bool);

impl NoPreempt {
pub fn enter() -> Self {
let prev = crate::preempt::PREEMPTION_ENABLED.with(|c| c.replace(false));
NoPreempt(prev)
}
}

impl Drop for NoPreempt {
fn drop(&mut self) {
crate::preempt::PREEMPTION_ENABLED.with(|c| c.set(self.0));
}
}

// ---------------------------------------------------------------------------
// sleep / insert_wait_timer
// ---------------------------------------------------------------------------

pub fn sleep(duration: std::time::Duration) {
let me = current_pid().expect("sleep() called outside an actor");
let _np = NoPreempt::enter();
let deadline = crate::timer::deadline_from_now(duration);
with_sched(|s| s.timers.insert(deadline, me));
with_runtime(|inner| inner.with_shared(|s| s.timers.insert_sleep(deadline, me)));
park_current();
}

/// Wake a parked actor. If the actor isn't parked (already runnable or done)
/// this is a no-op — that's important; channel and join can both fire
/// spurious unparks under some orderings and we want them to be cheap.
/// Also a no-op if the scheduler isn't running (covers channel-sender drop
/// during runtime teardown).
pub fn unpark(pid: Pid) {
try_with_sched(|s| {
if let Some(slot) = s.slot_mut(pid) {
if matches!(slot.state, State::Parked) {
slot.state = State::Runnable;
s.run_queue.push_back(pid);
}
}
pub fn insert_wait_timer(
deadline: std::time::Instant,
pid: Pid,
target: std::sync::Arc<dyn crate::timer::TimerTarget>,
wait_seq: u64,
) {
with_runtime(|inner| {
inner.with_shared(|s| {
s.timers.insert(
deadline,
pid,
crate::timer::Reason::WaitTimeout { target, wait_seq },
);
})
});
}

/// What an actor wants the scheduler to do when control returns from it.
#[derive(Copy, Clone)]
enum YieldIntent {
/// Re-queue (yield_now or preemption).
Yield,
/// Remove from the run queue (waiting for unpark).
Park,
// ---------------------------------------------------------------------------
// block_on_io / wait_readable / wait_writable / read / write
// ---------------------------------------------------------------------------

pub fn block_on_io<F, T>(f: F) -> T
where
F: FnOnce() -> T + Send + 'static,
T: Send + 'static,
{
let me = current_pid().expect("block_on_io() called outside an actor");
let work: Box<dyn FnOnce() -> crate::io::IoResult + Send> = Box::new(move || {
let v: T = f();
Ok(Box::new(v) as Box<dyn std::any::Any + Send>)
});
{
let _np = NoPreempt::enter();
with_runtime(|inner| inner.with_shared(|s| {
let io = s.io.as_mut().expect("io thread not started");
io.submit(me, work);
}));
park_current();
}
let result = with_runtime(|inner| inner.with_shared(|s| {
s.slot_mut(me)
.expect("block_on_io: own slot vanished")
.pending_io_result
.take()
.expect("block_on_io: resumed without a result")
}));
match result {
Ok(any) => *any.downcast::<T>().expect("block_on_io: type mismatch"),
Err(payload) => std::panic::resume_unwind(payload),
}
}

thread_local! {
static YIELD_INTENT: std::cell::Cell<YieldIntent> = const { std::cell::Cell::new(YieldIntent::Yield) };
pub fn wait_readable(fd: std::os::fd::RawFd) -> std::io::Result<()> {
wait_fd(fd, true, false)
}

pub fn wait_writable(fd: std::os::fd::RawFd) -> std::io::Result<()> {
wait_fd(fd, false, true)
}

fn wait_fd(fd: std::os::fd::RawFd, readable: bool, writable: bool) -> std::io::Result<()> {
let me = current_pid().expect("wait_*() called outside an actor");
let _np = NoPreempt::enter();
with_runtime(|inner| inner.with_shared(|s| {
let io = s.io.as_mut().expect("io thread not started");
io.epoll_register(fd, me, readable, writable)
}))?;
park_current();
Ok(())
}

pub fn read(fd: std::os::fd::RawFd, buf: &mut [u8]) -> std::io::Result<usize> {
wait_readable(fd)?;
let n = unsafe { libc::read(fd, buf.as_mut_ptr() as *mut _, buf.len()) };
if n < 0 { Err(std::io::Error::last_os_error()) } else { Ok(n as usize) }
}

pub fn write(fd: std::os::fd::RawFd, buf: &[u8]) -> std::io::Result<usize> {
wait_writable(fd)?;
let n = unsafe { libc::write(fd, buf.as_ptr() as *const _, buf.len()) };
if n < 0 { Err(std::io::Error::last_os_error()) } else { Ok(n as usize) }
}

// ---------------------------------------------------------------------------
// Supervisor channel registration
// register_supervisor_channel
// ---------------------------------------------------------------------------

/// Register `sender` as the mailbox for signals about children supervised
/// by `pid`. Idempotent; later calls overwrite.
pub fn register_supervisor_channel(pid: Pid, sender: Sender<Signal>) {
with_sched(|s| {
with_runtime(|inner| inner.with_shared(|s| {
if let Some(slot) = s.slot_mut(pid) {
slot.supervisor_channel = Some(sender);
} else {
panic!("register_supervisor_channel: pid {:?} not found", pid);
}
});
}));
}

// ---------------------------------------------------------------------------
// run() — the runtime entry point
// Legacy run() — convenience wrapper
// ---------------------------------------------------------------------------

/// Boot the runtime, spawn `initial` as a child of the root supervisor,
/// drive the scheduler until the run queue is empty, tear down.
///
/// The root supervisor is a *sentinel* PID, not a real actor. Signals
/// addressed to it are dropped on the floor — that's what "process exits"
/// means in the spec when nothing escalates further. User code that wants
/// real supervision spawns its own supervisor actor and uses `spawn_under`.
pub fn run<F: FnOnce() + Send + 'static>(initial: F) {
SCHED.with(|c| {
assert!(c.borrow().is_none(), "smarm::run() called recursively");
let mut state = SchedulerState::new();
state.root_pid = Some(ROOT_PID);
*c.borrow_mut() = Some(state);
});

let initial_handle = spawn(initial);

schedule_loop();

// Drop the handle BEFORE the scheduler is torn down — its Drop impl
// calls `with_sched` to decrement the outstanding-handle count.
drop(initial_handle);

// Take the SchedulerState out of the thread-local BEFORE dropping it.
// Dropping it while still inside SCHED.with's RefCell borrow would
// re-enter (via channel senders' Drop → unpark → try_with_sched).
let state = SCHED.with(|c| c.borrow_mut().take());
drop(state);
PENDING_CLOSURES.with(|c| c.borrow_mut().clear());
/// Single-threaded runtime entry point (backwards-compatible wrapper).
/// Equivalent to `runtime::init(Config::exact(1)).run(f)`.
pub fn run<F: FnOnce() + Send + 'static>(f: F) {
crate::runtime::init(crate::runtime::Config::exact(1)).run(f);
}

/// Reserved sentinel pid for the root supervisor. Never allocated to a
/// real actor; lookups return `None`; signals are dropped.
pub const ROOT_PID: Pid = Pid::new(u32::MAX, u32::MAX);

fn schedule_loop() {
loop {
// 1. Drain due timers into the run queue.
let now = std::time::Instant::now();
let due = with_sched(|s| s.timers.pop_due(now));
for pid in due {
// Same idempotency as `unpark`: only re-queue if still parked.
with_sched(|s| {
if let Some(slot) = s.slot_mut(pid) {
if matches!(slot.state, State::Parked) {
slot.state = State::Runnable;
s.run_queue.push_back(pid);
}
}
});
}

// 2. Pop a runnable actor. If none, sleep on the soonest timer or
// exit if there isn't one.
let pid = match with_sched(|s| s.run_queue.pop_front()) {
Some(p) => p,
None => {
let next = with_sched(|s| s.timers.peek_deadline());
match next {
Some(deadline) => {
let now = std::time::Instant::now();
if deadline > now {
// No other thread can wake us; plain sleep is
// correct. When the IO thread lands in v0.2
// this becomes a Condvar / pipe wakeup.
std::thread::sleep(deadline - now);
}
continue;
}
None => return, // no runnables, no timers — done.
}
}
};

// Look up sp; skip stale or already-reaped pids.
let sp = match with_sched(|s| {
s.slot(pid).and_then(|slot| slot.actor.as_ref().map(|a| a.sp))
}) {
Some(sp) => sp,
None => continue,
};

// If this is a first resume, move the pending closure to the
// thread-local the trampoline reads.
if let Some(b) = pop_pending_closure(pid) {
set_current_actor_box(b);
}

set_actor_sp(sp);
set_current_pid(pid);
reset_actor_done();
YIELD_INTENT.with(|c| c.set(YieldIntent::Yield));

crate::preempt::reset_timeslice();
PREEMPTION_ENABLED.with(|c| c.set(true));

unsafe { switch_to_actor() };

PREEMPTION_ENABLED.with(|c| c.set(false));
clear_current_pid();

let intent = YIELD_INTENT.with(|c| c.get());
let new_sp = get_actor_sp();

if is_actor_done() {
let outcome = take_last_outcome().unwrap_or(Outcome::Exit);
finalize_actor(pid, outcome);
} else {
with_sched(|s| {
if let Some(slot) = s.slot_mut(pid) {
if let Some(actor) = slot.actor.as_mut() {
actor.sp = new_sp;
}
match intent {
YieldIntent::Yield => {
slot.state = State::Runnable;
s.run_queue.push_back(pid);
}
YieldIntent::Park => {
slot.state = State::Parked;
}
}
}
});
}
}
}

fn finalize_actor(pid: Pid, outcome: Outcome) {
// Joiners get the typed Result with the panic payload. The supervisor
// gets an informational `Signal::Panic` with an empty payload — its job
// is policy (restart/escalate), not forensics. Users who need the
// payload in supervision can plumb their own channel.

let (joiner_outcome, sup_signal) = match outcome {
Outcome::Exit => (Outcome::Exit, Signal::Exit(pid)),
Outcome::Panic(payload) => (
Outcome::Panic(payload),
Signal::Panic(pid, Box::new(()) as Box<dyn std::any::Any + Send>),
),
};

// Stash outcome, mark Done, collect waiters, drop the actor stack.
let (waiters, supervisor_pid) = with_sched(|s| {
let slot = s.slot_mut(pid).expect("finalize_actor: slot vanished");
let sup = slot.actor.as_ref().map(|a| a.supervisor);
slot.outcome = Some(joiner_outcome);
slot.state = State::Done;
slot.actor = None;
let w = std::mem::take(&mut slot.waiters);
(w, sup)
});

// Deliver to supervisor (best-effort; ignore SendError).
if let Some(sup) = supervisor_pid {
let sender = with_sched(|s| {
s.slot(sup).and_then(|slot| slot.supervisor_channel.clone())
});
if let Some(sender) = sender {
let _ = sender.send(sup_signal);
}
}

// Unpark joiners.
for joiner in waiters {
unpark(joiner);
}

// Reclaim if no outstanding handles.
with_sched(|s| {
let should_reclaim = match s.slot(pid) {
Some(slot) => slot.outstanding_handles == 0,
None => false,
};
if should_reclaim {
reclaim_slot(s, pid);
}
});
}
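The new `unpark` distinguishes two cases: an actor that is already `Parked` is re-queued at once, while one that is still `Runnable` (it has registered itself as a waiter but has not yet reached `park_current`) only gets `pending_unpark` set, so the wakeup is not lost. The scheduler side that consumes the flag (traced as `UnparkFlagConsumed`) is not part of this hunk, so the sketch below models it under an assumption: when an actor yields with Park intent and the flag is set, the scheduler clears it and re-queues instead of parking. `Sched`, `Slot`, and `on_park_yield` are hypothetical simplifications, with a `usize` index standing in for `Pid`.

```rust
use std::collections::VecDeque;

// Hypothetical, simplified slot states; `usize` indices stand in for `Pid`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum State {
    Runnable,
    Parked,
    Done,
}

struct Slot {
    state: State,
    pending_unpark: bool,
}

struct Sched {
    slots: Vec<Slot>,
    run_queue: VecDeque<usize>,
}

impl Sched {
    /// Same decision table as the new `unpark`.
    fn unpark(&mut self, id: usize) {
        let slot = &mut self.slots[id];
        match slot.state {
            // Suspended: safe to re-queue immediately.
            State::Parked => {
                slot.state = State::Runnable;
                self.run_queue.push_back(id);
            }
            // Still running on its way to park: leave a note instead.
            State::Runnable => slot.pending_unpark = true,
            State::Done => {}
        }
    }

    /// Assumed scheduler handling of a Park yield: a pending unpark turns
    /// the park into a re-queue.
    fn on_park_yield(&mut self, id: usize) {
        let slot = &mut self.slots[id];
        if std::mem::take(&mut slot.pending_unpark) {
            slot.state = State::Runnable;
            self.run_queue.push_back(id);
        } else {
            slot.state = State::Parked;
        }
    }
}

fn main() {
    let mut s = Sched {
        slots: vec![Slot { state: State::Runnable, pending_unpark: false }],
        run_queue: VecDeque::new(),
    };
    // The wakeup arrives while actor 0 is still running.
    s.unpark(0);
    assert!(s.slots[0].pending_unpark);
    assert!(s.run_queue.is_empty());

    // Actor 0 now yields with Park intent; the flag converts it to a re-queue.
    s.on_park_yield(0);
    assert_eq!(s.slots[0].state, State::Runnable);
    assert_eq!(s.run_queue.pop_front(), Some(0));
    assert!(!s.slots[0].pending_unpark);
}
```

Without the flag, an unpark that lands in the window between registering as a waiter and parking would find the actor `Runnable`, do nothing, and leave it parked forever.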
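`block_on_io` erases the closure's return type before handing the job to the IO thread: the value travels back as `Box<dyn Any + Send>` and is recovered with `downcast::<T>()`, while an `Err` payload is re-raised on the actor with `resume_unwind`. The sketch below reproduces that round trip with a plain `std::thread` and an `mpsc` channel in place of smarm's IO thread and actor parking. The `IoResult` alias is inferred from how `block_on_io` builds and consumes it, and the worker wrapping each job in `catch_unwind` is an assumption; `run_on_worker` and `block_on` are hypothetical names, not smarm functions.

```rust
use std::any::Any;
use std::panic::{self, AssertUnwindSafe};
use std::sync::mpsc;
use std::thread;

// Assumed shape of `crate::io::IoResult`: Ok carries the type-erased value,
// Err carries a panic payload.
type IoResult = Result<Box<dyn Any + Send>, Box<dyn Any + Send>>;
type Work = Box<dyn FnOnce() -> IoResult + Send>;

/// Stand-in for the IO thread: run the job on another OS thread and turn a
/// panic into `Err(payload)` (assumed behaviour, not shown in the diff).
fn run_on_worker(work: Work) -> IoResult {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let result = panic::catch_unwind(AssertUnwindSafe(work)).and_then(|r| r);
        let _ = tx.send(result);
    });
    // The real runtime parks the actor here instead of blocking the thread.
    rx.recv().expect("worker thread dropped its sender")
}

/// Mirrors the type-erasure and recovery steps of `block_on_io`.
fn block_on<T: Send + 'static>(f: impl FnOnce() -> T + Send + 'static) -> T {
    let work: Work = Box::new(move || -> IoResult { Ok(Box::new(f()) as Box<dyn Any + Send>) });
    match run_on_worker(work) {
        Ok(any) => *any.downcast::<T>().expect("type mismatch"),
        Err(payload) => panic::resume_unwind(payload),
    }
}

fn main() {
    let v: u64 = block_on(|| 40 + 2);
    assert_eq!(v, 42);

    // A panic inside the job resurfaces on the calling side.
    let caught = panic::catch_unwind(|| block_on(|| -> u8 { panic!("boom") }));
    assert!(caught.is_err());
}
```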
112
src/timer.rs
@@ -1,38 +1,86 @@
//! Sleep timers.
//! Sleep + wait-with-timeout timers.
//!
//! A min-heap of `(deadline, Pid)` entries lives on `SchedulerState`. When
//! an actor calls `sleep`, the runtime inserts the entry, marks the actor
//! parked, and yields. On every scheduler loop iteration the runtime pops
//! all entries whose deadline has passed and unparks them. When the run
//! queue is empty but the heap is not, the runtime sleeps the OS thread
//! until the soonest deadline, then re-checks.
//! A min-heap of `(deadline, seq, reason)` entries lives on `SchedulerState`.
//! When an actor sleeps or starts a bounded wait (e.g. `mutex.lock()` with a
//! timeout), the runtime inserts an entry, marks the actor parked, and yields.
//! On every scheduler loop iteration the runtime pops all entries whose
//! deadline has passed and dispatches each according to its `Reason`:
//!
//! `BinaryHeap` is a max-heap, so entries are stored with their deadline
//! wrapped in `Reverse` to get min-heap behaviour.
//! - `Sleep`: unpark the actor.
//! - `WaitTimeout`: call `on_timeout` on the registered target. The target
//! (e.g. a `Mutex`) decides whether the actor was actually still waiting
//! (timer fires first → unpark with error) or had already been granted
//! what it was waiting for (lock granted first → no-op).
//!
//! Stale pids (slot reused since the timer was inserted) are detected on
//! `due_pids` pop and silently dropped — same convention as the run queue.
//! `BinaryHeap` is a max-heap; entries are wrapped in `Reverse` to get
//! min-heap behaviour.
//!
//! No cancellation. When a non-timer wakeup happens (e.g. lock granted
//! before timeout), the timer entry is left in the heap. It will be popped
//! eventually and the dispatch will observe "actor is no longer parked /
//! wait_seq is stale" and no-op. Cost is ~32 bytes per stale entry plus a
//! few cycles on pop; acceptable given the upper bound is "one entry per
//! parked actor".
//!
//! Stale pids (slot reused since the timer was inserted) are filtered on
//! pop by the scheduler — same convention as the run queue.

use crate::pid::Pid;
use std::cmp::Reverse;
use std::collections::BinaryHeap;
use std::sync::Arc;
use std::time::{Duration, Instant};

#[derive(PartialEq, Eq)]
/// What to do when a timer entry's deadline arrives.
///
/// Held inside `Entry`, dispatched by the scheduler in `pop_due`.
pub enum Reason {
/// `loom::sleep(d)`. Unpark `pid` unconditionally (modulo the usual
/// "still parked?" check the scheduler applies).
Sleep,
/// A bounded wait — currently only `Mutex::lock_timeout`. On expiry the
/// scheduler calls `target.on_timeout(pid, wait_seq)`. The target then
/// decides whether `pid` was actually still waiting, and if so unparks
/// it with whatever error the wait was bounded for. `wait_seq` lets the
/// target tell apart "this wait" from "a later wait by the same actor
/// on the same target".
WaitTimeout {
target: Arc<dyn TimerTarget>,
wait_seq: u64,
},
}

/// Callback the scheduler invokes when a `WaitTimeout` entry pops.
///
/// Implementors: do not touch `SchedulerState` other than via the public
/// `unpark` / channel APIs. The scheduler is mid-iteration when this fires.
pub trait TimerTarget: Send + Sync {
fn on_timeout(&self, pid: Pid, wait_seq: u64);
}

pub struct Entry {
pub deadline: Instant,
/// Insertion order, used purely as a tiebreaker so `Entry: Ord` works
/// without having to compare the `Reason` payload (which contains an
/// `Arc<dyn TimerTarget>` and isn't `Ord`).
seq: u64,
pub pid: Pid,
pub reason: Reason,
}

impl PartialEq for Entry {
fn eq(&self, other: &Self) -> bool {
self.deadline == other.deadline && self.seq == other.seq
}
}
impl Eq for Entry {}

impl Ord for Entry {
fn cmp(&self, other: &Self) -> std::cmp::Ordering {
// Only `deadline` matters for ordering; pid is a tiebreaker so the
// type is Ord, but the order among same-deadline entries is
// irrelevant.
self.deadline
.cmp(&other.deadline)
.then_with(|| self.pid.index().cmp(&other.pid.index()))
.then_with(|| self.pid.generation().cmp(&other.pid.generation()))
// Earlier deadline first; ties broken by insertion order so the
// ordering is total. `Reason` and `Pid` deliberately don't
// participate.
self.deadline.cmp(&other.deadline).then_with(|| self.seq.cmp(&other.seq))
}
}

@@ -46,15 +94,25 @@ impl PartialOrd for Entry {
pub struct Timers {
/// Reverse-wrapped so the smallest deadline is at the top.
heap: BinaryHeap<Reverse<Entry>>,
/// Monotonic counter for the tiebreaker `seq` field.
next_seq: u64,
}

impl Timers {
pub fn new() -> Self {
Self { heap: BinaryHeap::new() }
Self { heap: BinaryHeap::new(), next_seq: 0 }
}

pub fn insert(&mut self, deadline: Instant, pid: Pid) {
self.heap.push(Reverse(Entry { deadline, pid }));
/// Insert a `Sleep` timer. Convenience for the common case.
pub fn insert_sleep(&mut self, deadline: Instant, pid: Pid) {
self.insert(deadline, pid, Reason::Sleep);
}

/// Insert an arbitrary timer entry.
pub fn insert(&mut self, deadline: Instant, pid: Pid, reason: Reason) {
let seq = self.next_seq;
self.next_seq = self.next_seq.wrapping_add(1);
self.heap.push(Reverse(Entry { deadline, seq, pid, reason }));
}

pub fn is_empty(&self) -> bool {
@@ -66,13 +124,13 @@ impl Timers {
self.heap.peek().map(|r| r.0.deadline)
}

/// Pop and return every pid whose deadline is ≤ `now`.
pub fn pop_due(&mut self, now: Instant) -> Vec<Pid> {
/// Pop every entry whose deadline is ≤ `now`, in deadline order.
/// The scheduler dispatches each entry by inspecting `entry.reason`.
pub fn pop_due(&mut self, now: Instant) -> Vec<Entry> {
let mut out = Vec::new();
while let Some(r) = self.heap.peek() {
if r.0.deadline <= now {
let e = self.heap.pop().unwrap().0;
out.push(e.pid);
out.push(self.heap.pop().unwrap().0);
} else {
break;
}
@@ -81,7 +139,7 @@ impl Timers {
}
}

/// Wall-clock duration helper exposed for `sleep`.
/// Wall-clock duration helper exposed for `sleep` and `lock_timeout`.
pub fn deadline_from_now(duration: Duration) -> Instant {
Instant::now()
.checked_add(duration)
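The new `Timers` orders entries by deadline and breaks ties with a monotonically increasing `seq`, so `Entry` can implement `Ord` without comparing the `Reason` payload; wrapping entries in `Reverse` turns `BinaryHeap`'s max-heap into a min-heap. Below is a self-contained sketch of the same structure, with a `u32` standing in for `Pid` and a string tag standing in for `Reason` (both simplifications). The `PartialOrd` impl, which the `impl PartialOrd for Entry` hunk header points at but does not show, is written here as the usual delegation to `cmp`.

```rust
use std::cmp::{Ordering, Reverse};
use std::collections::BinaryHeap;
use std::time::{Duration, Instant};

// Simplified entry: `u32` stands in for `Pid`, `&'static str` for `Reason`.
struct Entry {
    deadline: Instant,
    seq: u64,
    pid: u32,
    tag: &'static str,
}

// Equality and ordering look only at (deadline, seq), never at the payload.
impl PartialEq for Entry {
    fn eq(&self, other: &Self) -> bool {
        self.deadline == other.deadline && self.seq == other.seq
    }
}
impl Eq for Entry {}

impl PartialOrd for Entry {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl Ord for Entry {
    fn cmp(&self, other: &Self) -> Ordering {
        // Earlier deadline first; insertion order breaks ties.
        self.deadline.cmp(&other.deadline).then_with(|| self.seq.cmp(&other.seq))
    }
}

#[derive(Default)]
struct Timers {
    // `Reverse` turns the max-heap into a min-heap on (deadline, seq).
    heap: BinaryHeap<Reverse<Entry>>,
    next_seq: u64,
}

impl Timers {
    fn insert(&mut self, deadline: Instant, pid: u32, tag: &'static str) {
        let seq = self.next_seq;
        self.next_seq = self.next_seq.wrapping_add(1);
        self.heap.push(Reverse(Entry { deadline, seq, pid, tag }));
    }

    /// Pop every entry whose deadline is <= `now`, soonest first.
    fn pop_due(&mut self, now: Instant) -> Vec<Entry> {
        let mut out = Vec::new();
        while self.heap.peek().map_or(false, |r| r.0.deadline <= now) {
            out.push(self.heap.pop().unwrap().0);
        }
        out
    }
}

fn main() {
    let t0 = Instant::now();
    let mut timers = Timers::default();
    timers.insert(t0 + Duration::from_millis(30), 1, "sleep");
    timers.insert(t0 + Duration::from_millis(10), 2, "wait_timeout");
    timers.insert(t0 + Duration::from_millis(10), 3, "sleep"); // same deadline, later seq

    let due = timers.pop_due(t0 + Duration::from_millis(20));
    let pids: Vec<u32> = due.iter().map(|e| e.pid).collect();
    assert_eq!(pids, vec![2, 3]);
    assert_eq!(due[0].tag, "wait_timeout");
    assert_eq!(timers.heap.len(), 1);
}
```

Because entries are never cancelled, a wait that ends early (for example a lock granted before its timeout) leaves its entry in the heap; `pop_due` returns it later like any other, and the dispatcher is expected to recognise it as stale.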
246
src/trace.rs
Normal file
@@ -0,0 +1,246 @@
//! Structured per-event tracing for smarm.
//!
//! Enabled by `--features smarm-trace`. Zero cost without the feature.
//!
//! Architecture: MPSC. Every scheduler thread holds a thread-local Sender
//! clone (one mutex acquire per thread, on first use). A dedicated drain
//! thread owns the Receiver, batches records, and writes to a BufWriter.
//! The hot path (record()) is a single channel send — no mutex, no disk I/O.
//!
//! Usage:
//! cargo test --test runtime <test_name> --features smarm-trace
//!
//! Output: smarm_trace.json in cwd, or $SMARM_TRACE_FILE.
//! View: https://ui.perfetto.dev or chrome://tracing

#[cfg(feature = "smarm-trace")]
#[macro_export]
macro_rules! te {
($kind:expr) => { $crate::trace::record($kind) };
}

#[cfg(not(feature = "smarm-trace"))]
#[macro_export]
macro_rules! te {
($kind:expr) => { () };
}

#[cfg(feature = "smarm-trace")]
pub use inner::*;

#[cfg(feature = "smarm-trace")]
mod inner {
use crate::pid::Pid;
use std::io::Write;
use std::sync::{mpsc, Mutex};
use std::time::Instant;

// -----------------------------------------------------------------------
// Event kinds
// -----------------------------------------------------------------------

#[derive(Clone, Debug)]
pub enum Event {
// Actor lifecycle
Spawn { parent: Pid, child: Pid },
Resume(Pid),
Yield(Pid),
Park(Pid),
Done(Pid),
// Wakeup paths
UnparkDirect(Pid), // unpark() saw Parked -> re-queued immediately
UnparkDeferred(Pid), // unpark() saw Runnable -> set pending_unpark flag
UnparkFlagConsumed(Pid), // scheduler saw flag on Park -> re-queued instead
// Channel
Send { sender: Pid, receiver: Option<Pid> },
RecvPark(Pid),
RecvWake(Pid),
// Queue
Enqueue(Pid),
Dequeue(Pid),
}

// -----------------------------------------------------------------------
// Wire format sent through the channel
// -----------------------------------------------------------------------

struct Record {
nanos: u64, // ns since open()
tid: u64, // OS thread id
event: Event,
}

// Sentinel: drain thread flushes and exits when it receives this.
enum Msg {
Event(Record),
Flush,
}

// -----------------------------------------------------------------------
// Global sender + start time
// -----------------------------------------------------------------------

struct Global {
sender: mpsc::Sender<Msg>,
start: Instant,
}

static GLOBAL: Mutex<Option<Global>> = Mutex::new(None);

// Per-thread state: cached Sender clone + cached copy of start Instant.
// The Sender clone is taken once per thread (one mutex hit).
// The start Instant is copied alongside it — also one mutex hit per thread.
// record() never touches GLOBAL after that.
struct LocalState {
tx: mpsc::Sender<Msg>,
start: Instant,
}

thread_local! {
static LOCAL_STATE: std::cell::RefCell<Option<LocalState>> =
std::cell::RefCell::new(None);
}

// -----------------------------------------------------------------------
// Lifecycle
// -----------------------------------------------------------------------

pub fn open() {
let path = std::env::var("SMARM_TRACE_FILE")
.unwrap_or_else(|_| "smarm_trace.json".to_owned());

let (tx, rx) = mpsc::channel::<Msg>();
let start = Instant::now();

*GLOBAL.lock().unwrap() = Some(Global { sender: tx, start });

// Drain thread: owns the Receiver, writes to disk.
let path_for_thread = path.clone();
std::thread::Builder::new()
.name("smarm-trace-drain".into())
.spawn(move || drain_thread(rx, &path_for_thread))
.expect("failed to spawn trace drain thread");

eprintln!("[smarm-trace] writing to {}", path);
}

/// Send a Flush sentinel and block until the drain thread finishes writing.
/// Called by Runtime::run after all scheduler threads have exited.
pub fn flush() {
// Drop the global sender so the drain thread's recv() returns Err
// after the Flush sentinel, signalling clean shutdown.
let sender = {
let mut g = GLOBAL.lock().unwrap();
g.take().map(|g| g.sender)
};
if let Some(tx) = sender {
let _ = tx.send(Msg::Flush);
// tx drops here — drain thread will see disconnected after Flush.
}
// Clear thread-local state.
LOCAL_STATE.with(|c| *c.borrow_mut() = None);
}

// -----------------------------------------------------------------------
// Hot path
// -----------------------------------------------------------------------

pub fn record(event: Event) {
// Disable preemption for the entire duration of record(). Any
// allocation here (mutex internals, channel send, lazy init) would
// trigger PreemptingAllocator -> maybe_preempt -> switch_to_scheduler,
// which would try to re-acquire inner.shared (already held at many
// te!() call sites) -> deadlock. Guard at the very top, before any
// allocation-capable call.
let was_enabled = crate::preempt::PREEMPTION_ENABLED
.with(|e| { let v = e.get(); e.set(false); v });

LOCAL_STATE.with(|cell| {
let mut opt = cell.borrow_mut();
// Lazily initialise: one mutex hit per thread, ever.
if opt.is_none() {
if let Some(g) = GLOBAL.lock().unwrap().as_ref() {
let tx = g.sender.clone();
*opt = Some(LocalState { tx, start: g.start });
}
}
if let Some(ls) = opt.as_ref() {
|
||||
let nanos = ls.start.elapsed().as_nanos() as u64;
|
||||
let tid = os_tid();
|
||||
let _ = ls.tx.send(Msg::Event(Record { nanos, tid, event }));
|
||||
}
|
||||
});
|
||||
|
||||
crate::preempt::PREEMPTION_ENABLED.with(|e| e.set(was_enabled));
|
||||
}
|
||||
|
||||
// -----------------------------------------------------------------------
|
||||
// Drain thread
|
||||
// -----------------------------------------------------------------------
|
||||
|
||||
fn drain_thread(rx: mpsc::Receiver<Msg>, path: &str) {
|
||||
let f = match std::fs::File::create(path) {
|
||||
Ok(f) => f,
|
||||
Err(e) => { eprintln!("[smarm-trace] create failed: {}", e); return; }
|
||||
};
|
||||
let mut w = std::io::BufWriter::new(f);
|
||||
let _ = writeln!(w, "{{\"traceEvents\":[");
|
||||
|
||||
        let mut count: u64 = 0;
        let mut first = true;

        loop {
            match rx.recv() {
                Ok(Msg::Event(r)) => {
                    let (name, actor_idx) = chrome_fields(&r.event);
                    let ts_us = r.nanos as f64 / 1000.0;
                    if !first { let _ = w.write_all(b",\n"); }
                    first = false;
                    let _ = write!(w,
                        "{{\"ph\":\"i\",\"ts\":{:.3},\"pid\":{},\"tid\":{},\"name\":{:?},\"s\":\"g\"}}",
                        ts_us, actor_idx, r.tid, name);
                    count += 1;
                }
                Ok(Msg::Flush) | Err(_) => {
                    // Clean close.
                    let _ = writeln!(w, "\n]}}");
                    let _ = w.flush();
                    eprintln!("[smarm-trace] {} events written", count);
                    return;
                }
            }
        }
    }

    // -----------------------------------------------------------------------
    // Chrome trace helpers
    // -----------------------------------------------------------------------

    fn chrome_fields(ev: &Event) -> (String, u32) {
        match ev {
            Event::Spawn { parent, child } =>
                (format!("spawn c={}", child.index()), parent.index()),
            Event::Resume(p) => ("resume".into(), p.index()),
            Event::Yield(p) => ("yield".into(), p.index()),
            Event::Park(p) => ("park".into(), p.index()),
            Event::Done(p) => ("done".into(), p.index()),
            Event::UnparkDirect(p) => ("unpark_direct".into(), p.index()),
            Event::UnparkDeferred(p) => ("unpark_deferred".into(), p.index()),
            Event::UnparkFlagConsumed(p) => ("unpark_flag_consumed".into(), p.index()),
            Event::Send { sender, receiver } => (
                format!("send rx={}", receiver
                    .map(|p| p.index().to_string())
                    .unwrap_or_else(|| "none".into())),
                sender.index(),
            ),
            Event::RecvPark(p) => ("recv_park".into(), p.index()),
            Event::RecvWake(p) => ("recv_wake".into(), p.index()),
            Event::Enqueue(p) => ("enqueue".into(), p.index()),
            Event::Dequeue(p) => ("dequeue".into(), p.index()),
        }
    }

    fn os_tid() -> u64 {
        unsafe { libc::syscall(libc::SYS_gettid) as u64 }
    }
}
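The writer loop above emits Chrome Trace Event Format "instant" events (`"ph":"i"`, scope `"s":"g"`). It converts timestamps from nanoseconds to microseconds. The sketch below isolates that formatting in two hypothetical helpers, `instant_event` and `trace_document`, so the output can be checked without running the tracer.

The sketch assumes the file opens with `{"traceEvents":[`. The closing `]}` suggests this, but the excerpt does not show the opening.

```rust
use std::fmt::Write as _;

/// Hypothetical helper mirroring the writer loop: format one instant event.
/// `nanos` is converted to microseconds, the unit Chrome traces expect.
fn instant_event(name: &str, pid: u32, tid: u64, nanos: u64) -> String {
    let ts_us = nanos as f64 / 1000.0;
    let mut out = String::new();
    // `{:?}` on a &str emits a double-quoted, escaped string; for ordinary
    // printable names such as "resume" or "spawn c=3" that is valid JSON.
    write!(
        out,
        "{{\"ph\":\"i\",\"ts\":{:.3},\"pid\":{},\"tid\":{},\"name\":{:?},\"s\":\"g\"}}",
        ts_us, pid, tid, name
    )
    .unwrap();
    out
}

/// Hypothetical wrapper: join events with ",\n" (as the loop's `first` flag
/// does) inside an assumed `{"traceEvents":[ ... ]}` object.
fn trace_document(events: &[String]) -> String {
    let mut out = String::from("{\"traceEvents\":[\n");
    for (i, ev) in events.iter().enumerate() {
        if i > 0 {
            out.push_str(",\n");
        }
        out.push_str(ev);
    }
    // Same closing sequence as `writeln!(w, "\n]}}")` in the loop.
    out.push_str("\n]}\n");
    out
}
```

Note that `"s":"g"` gives each event global scope, which trace viewers typically draw across every track. `"s":"t"` would pin each event to its own thread row instead.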
99
tests/io.rs
Normal file
@@ -0,0 +1,99 @@
//! Tests for `block_on_io`: running a blocking closure on a worker OS
//! thread while the calling actor is parked.

use smarm::{block_on_io, run, spawn, yield_now};
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::{Arc, Mutex};
use std::time::Duration;

#[test]
fn block_on_io_returns_the_closures_value() {
    let captured: Arc<Mutex<Option<u64>>> = Arc::new(Mutex::new(None));
    let c = captured.clone();
    run(move || {
        let v: u64 = block_on_io(|| {
            // Sleep briefly so the actor genuinely parks while the worker thread runs.
            std::thread::sleep(Duration::from_millis(5));
            42
        });
        *c.lock().unwrap() = Some(v);
    });
    assert_eq!(*captured.lock().unwrap(), Some(42));
}

#[test]
fn other_actors_run_while_block_on_io_is_in_flight() {
    // While actor A is parked in block_on_io, actor B should be able to
    // make progress.
    let order: Arc<Mutex<Vec<u8>>> = Arc::new(Mutex::new(Vec::new()));
    let oa = order.clone();
    let ob = order.clone();

    run(move || {
        let a = spawn(move || {
            oa.lock().unwrap().push(1); // A starts first.
            block_on_io(|| {
                std::thread::sleep(Duration::from_millis(50));
            });
            oa.lock().unwrap().push(4); // A resumes last.
        });
        let b = spawn(move || {
            // Make sure A enters block_on_io first.
            yield_now();
            ob.lock().unwrap().push(2);
            yield_now();
            ob.lock().unwrap().push(3);
        });
        a.join().unwrap();
        b.join().unwrap();
    });

    // Required interleaving: 1 (A starts) before 2,3 (B runs while A
    // is parked), and 4 (A resumes) after 2,3.
    let v = order.lock().unwrap();
    assert_eq!(v[0], 1, "log: {:?}", *v);
    assert_eq!(v[v.len() - 1], 4, "log: {:?}", *v);
    let pos_2 = v.iter().position(|&x| x == 2).unwrap();
    let pos_3 = v.iter().position(|&x| x == 3).unwrap();
    let pos_4 = v.iter().position(|&x| x == 4).unwrap();
    assert!(pos_2 < pos_4, "B's first step ran after A resumed: {:?}", *v);
    assert!(pos_3 < pos_4, "B's second step ran after A resumed: {:?}", *v);
}

#[test]
fn many_concurrent_block_on_io_calls_all_complete() {
    let counter = Arc::new(AtomicU32::new(0));
    let c = counter.clone();
    run(move || {
        let mut handles = Vec::new();
        for _ in 0..10 {
            let cc = c.clone();
            handles.push(spawn(move || {
                let n: u32 = block_on_io(|| {
                    std::thread::sleep(Duration::from_millis(10));
                    1
                });
                cc.fetch_add(n, Ordering::SeqCst);
            }));
        }
        for h in handles { h.join().unwrap(); }
    });
    assert_eq!(counter.load(Ordering::SeqCst), 10);
}

#[test]
fn block_on_io_panic_propagates_to_caller() {
    let saw_err = Arc::new(std::sync::atomic::AtomicBool::new(false));
    let s = saw_err.clone();
    run(move || {
        let h = spawn(move || {
            // The closure panics on the worker thread; that should
            // resurface as a panic in this actor.
            let _: () = block_on_io(|| panic!("boom on io thread"));
        });
        if h.join().is_err() {
            s.store(true, Ordering::SeqCst);
        }
    });
    assert!(saw_err.load(Ordering::SeqCst));
}
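The four tests above pin down `block_on_io`'s contract:

- the closure's value comes back to the caller;
- other actors keep running meanwhile;
- many calls can be in flight at once;
- a panic on the worker thread resurfaces in the caller.

The sketch below is a std-only analogue of that contract, not smarm's implementation. The hypothetical `run_on_worker` blocks the calling OS thread on an `mpsc` channel, where smarm would park only the calling actor. It also spawns a fresh thread per call rather than assuming any worker pool.

```rust
use std::panic;
use std::sync::mpsc;
use std::thread;

/// Hypothetical std-only analogue of `block_on_io` (not smarm's code):
/// run `f` on another OS thread, wait for it, and re-raise its panic here.
fn run_on_worker<T, F>(f: F) -> T
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // catch_unwind turns a panic into Err(payload) so it can be sent back.
        let result = panic::catch_unwind(panic::AssertUnwindSafe(f));
        let _ = tx.send(result);
    });
    // smarm would park the calling actor here; this sketch blocks the thread.
    match rx.recv().expect("worker thread exited without reporting") {
        Ok(value) => value,
        // Re-raise with the original payload, so the caller sees the same message.
        Err(payload) => panic::resume_unwind(payload),
    }
}
```

Using `resume_unwind` rather than a fresh `panic!` preserves the worker's payload. That is why the caller can still downcast it to the original message.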
324
tests/io_epoll.rs
Normal file
@@ -0,0 +1,324 @@
//! Tests for epoll-based fd readiness primitives: `wait_readable`,
//! `wait_writable`, and the `read`/`write` sugar on top of them.
//!
//! Pipes are the convenient test target: cheap to create, easy to drive,
//! and we already use `libc::pipe2` internally. Each pipe is one direction
//! and respects `O_NONBLOCK` if we ask for it.

use smarm::{run, spawn, wait_readable, wait_writable, yield_now};
use std::os::fd::RawFd;
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;
use std::sync::Mutex as StdMutex;
use std::time::Duration;

// ---------------------------------------------------------------------------
// Pipe helper
// ---------------------------------------------------------------------------

struct Pipe {
    read: RawFd,
    write: RawFd,
}

impl Pipe {
    fn new() -> Self {
        let mut fds: [libc::c_int; 2] = [0; 2];
        let r = unsafe { libc::pipe2(fds.as_mut_ptr(), libc::O_CLOEXEC | libc::O_NONBLOCK) };
        assert_eq!(r, 0, "pipe2 failed");
        Pipe {
            read: fds[0],
            write: fds[1],
        }
    }
}

impl Drop for Pipe {
    fn drop(&mut self) {
        unsafe {
            libc::close(self.read);
            libc::close(self.write);
        }
    }
}

fn raw_write(fd: RawFd, buf: &[u8]) -> isize {
    unsafe { libc::write(fd, buf.as_ptr() as *const _, buf.len()) }
}

fn raw_read(fd: RawFd, buf: &mut [u8]) -> isize {
    unsafe { libc::read(fd, buf.as_mut_ptr() as *mut _, buf.len()) }
}

// ---------------------------------------------------------------------------
// wait_readable parks until data arrives, then libc::read succeeds.
// ---------------------------------------------------------------------------

#[test]
fn wait_readable_blocks_until_data_arrives_then_read_succeeds() {
    let captured: Arc<StdMutex<Vec<u8>>> = Arc::new(StdMutex::new(Vec::new()));
    let cap = captured.clone();

    let p = Arc::new(Pipe::new());
    let p_reader = p.clone();
    let p_writer = p.clone();

    run(move || {
        let reader = spawn(move || {
            // Initially the pipe is empty; this parks.
            wait_readable(p_reader.read).expect("wait_readable failed");
            // Now data should be readable.
            let mut buf = [0u8; 16];
            let n = raw_read(p_reader.read, &mut buf);
            assert!(n > 0, "read returned {}", n);
            cap.lock().unwrap().extend_from_slice(&buf[..n as usize]);
        });

        let writer = spawn(move || {
            // Yield so the reader gets to park first.
            yield_now();
            yield_now();
            // Sleep a touch so the reader is definitely waiting in epoll.
            smarm::sleep(Duration::from_millis(5));
            let n = raw_write(p_writer.write, b"hello");
            assert_eq!(n, 5);
        });

        reader.join().unwrap();
        writer.join().unwrap();
    });

    assert_eq!(*captured.lock().unwrap(), b"hello");
}

// ---------------------------------------------------------------------------
// The smarm::scheduler::read sugar: wait_readable + libc::read in one call.
// ---------------------------------------------------------------------------

#[test]
fn read_sugar_returns_bytes_from_pipe() {
    let captured: Arc<StdMutex<Vec<u8>>> = Arc::new(StdMutex::new(Vec::new()));
    let cap = captured.clone();

    let p = Arc::new(Pipe::new());
    let p_reader = p.clone();
    let p_writer = p.clone();

    run(move || {
        let reader = spawn(move || {
            let mut buf = [0u8; 16];
            let n = smarm::scheduler::read(p_reader.read, &mut buf)
                .expect("smarm::scheduler::read failed");
            cap.lock().unwrap().extend_from_slice(&buf[..n]);
        });

        let writer = spawn(move || {
            yield_now();
            smarm::sleep(Duration::from_millis(5));
            let _ = raw_write(p_writer.write, b"world");
        });

        reader.join().unwrap();
        writer.join().unwrap();
    });

    assert_eq!(*captured.lock().unwrap(), b"world");
}

// ---------------------------------------------------------------------------
// wait_writable + write. Pipes are almost always writable, so the useful
// check here is that the call doesn't hang on a writable fd.
// ---------------------------------------------------------------------------

#[test]
fn write_sugar_sends_bytes_to_pipe() {
    let counter = Arc::new(AtomicU32::new(0));
    let c = counter.clone();

    let p = Arc::new(Pipe::new());
    let p_writer = p.clone();
    let p_reader = p.clone();

    run(move || {
        let writer = spawn(move || {
            // The pipe is empty and has buffer space, and the kernel reports
            // an empty pipe as writable, so wait_writable wakes at once and
            // the write completes immediately.
            let n = smarm::scheduler::write(p_writer.write, b"smarm")
                .expect("write failed");
            assert_eq!(n, 5);
            c.fetch_add(1, Ordering::SeqCst);
        });

        let reader = spawn(move || {
            // Give the writer time.
            smarm::sleep(Duration::from_millis(10));
            let mut buf = [0u8; 16];
            let n = raw_read(p_reader.read, &mut buf);
            assert_eq!(n, 5);
            assert_eq!(&buf[..5], b"smarm");
        });

        writer.join().unwrap();
        reader.join().unwrap();
    });

    assert_eq!(counter.load(Ordering::SeqCst), 1);
}

// ---------------------------------------------------------------------------
// While an actor is parked on wait_readable, other actors keep running.
// ---------------------------------------------------------------------------

#[test]
fn other_actors_run_while_one_is_parked_on_wait_readable() {
    let log: Arc<StdMutex<Vec<u8>>> = Arc::new(StdMutex::new(Vec::new()));
    let la = log.clone();
    let lb = log.clone();

    let p = Arc::new(Pipe::new());
    let p_a = p.clone();
    let p_b = p.clone();

    run(move || {
        let a = spawn(move || {
            la.lock().unwrap().push(b'A');
            wait_readable(p_a.read).unwrap();
            la.lock().unwrap().push(b'a');
        });

        let b = spawn(move || {
            // A starts parking on the empty pipe; B should be free to do
            // its work in the meantime.
            for _ in 0..3 {
                yield_now();
                lb.lock().unwrap().push(b'B');
            }
            // Now wake A.
            let _ = raw_write(p_b.write, b"x");
        });

        a.join().unwrap();
        b.join().unwrap();
    });

    let v = log.lock().unwrap();
    // A goes first ('A'), then B makes progress (multiple 'B's) while A is
    // parked, then A wakes and finishes ('a').
    let pos_big_a = v.iter().position(|&c| c == b'A').unwrap();
    let pos_lit_a = v.iter().position(|&c| c == b'a').unwrap();
    let big_b_count = v.iter().filter(|&&c| c == b'B').count();
    assert_eq!(big_b_count, 3, "B should have made 3 steps: {:?}", *v);
    assert!(pos_big_a < pos_lit_a, "A pre-park before A post-park: {:?}", *v);
    // B writes the byte that wakes A only after all three steps, so every
    // 'B', including the last, must precede 'a'.
    let last_big_b = v.iter().rposition(|&c| c == b'B').unwrap();
    assert!(last_big_b < pos_lit_a, "B should finish before A resumes: {:?}", *v);
}

// ---------------------------------------------------------------------------
// Two-way pipe ping-pong via wait_readable.
// ---------------------------------------------------------------------------

#[test]
fn ping_pong_between_two_pipes_completes() {
    // a_to_b: actor A writes, actor B reads.
    // b_to_a: actor B writes, actor A reads.
    let a_to_b = Arc::new(Pipe::new());
    let b_to_a = Arc::new(Pipe::new());

    let counter = Arc::new(AtomicU32::new(0));
    let ca = counter.clone();
    let cb = counter.clone();

    let a_to_b_a = a_to_b.clone();
    let a_to_b_b = a_to_b.clone();
    let b_to_a_a = b_to_a.clone();
    let b_to_a_b = b_to_a.clone();

    run(move || {
        let a = spawn(move || {
            for _ in 0..5 {
                let _ = raw_write(a_to_b_a.write, b"x");
                wait_readable(b_to_a_a.read).unwrap();
                let mut buf = [0u8; 4];
                let _ = raw_read(b_to_a_a.read, &mut buf);
                ca.fetch_add(1, Ordering::SeqCst);
            }
        });

        let b = spawn(move || {
            for _ in 0..5 {
                wait_readable(a_to_b_b.read).unwrap();
                let mut buf = [0u8; 4];
                let _ = raw_read(a_to_b_b.read, &mut buf);
                let _ = raw_write(b_to_a_b.write, b"y");
                cb.fetch_add(1, Ordering::SeqCst);
            }
        });

        a.join().unwrap();
        b.join().unwrap();
    });

    // Both sides did 5 rounds; counter is incremented by both, so total = 10.
    assert_eq!(counter.load(Ordering::SeqCst), 10);
}

// ---------------------------------------------------------------------------
// Same fd reused across calls: the DEL+ADD cycle works.
// ---------------------------------------------------------------------------

#[test]
fn same_fd_can_be_waited_on_repeatedly() {
    let p = Arc::new(Pipe::new());
    let p_r = p.clone();
    let p_w = p.clone();
    let counter = Arc::new(AtomicU32::new(0));
    let c = counter.clone();

    run(move || {
        let reader = spawn(move || {
            for _ in 0..4 {
                wait_readable(p_r.read).unwrap();
                let mut buf = [0u8; 4];
                let n = raw_read(p_r.read, &mut buf);
                assert!(n > 0);
                c.fetch_add(1, Ordering::SeqCst);
            }
        });

        let writer = spawn(move || {
            for _ in 0..4 {
                yield_now();
                smarm::sleep(Duration::from_millis(2));
                let _ = raw_write(p_w.write, b"z");
            }
        });

        reader.join().unwrap();
        writer.join().unwrap();
    });

    assert_eq!(counter.load(Ordering::SeqCst), 4);
}

// ---------------------------------------------------------------------------
// Sanity check: wait_writable on an already-writable pipe returns promptly.
// ---------------------------------------------------------------------------

#[test]
fn wait_writable_on_empty_pipe_returns_quickly() {
    let p = Arc::new(Pipe::new());
    let p_w = p.clone();

    let start = std::time::Instant::now();
    run(move || {
        wait_writable(p_w.write).unwrap();
    });
    let elapsed = start.elapsed();
    assert!(
        elapsed < Duration::from_millis(200),
        "wait_writable should be fast on a writable fd, took {:?}",
        elapsed
    );
}
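The `read`/`write` sugar tested above combines a readiness wait with a raw syscall. The sketch below shows one way to structure such a helper using only std.

- The hypothetical `read_when_ready` tries the read first.
- It calls the supplied `wait_readable` closure only when the source reports `WouldBlock`, which is what a non-blocking fd's `EAGAIN` maps to.
- It retries immediately on `Interrupted`.

The closure stands in for smarm's `wait_readable`. The `Flaky` reader is a made-up mock. Whether smarm's own `read` tries first or waits first is not shown here; the section comment above ("wait_readable + libc::read") suggests it waits first.

```rust
use std::io::{self, ErrorKind, Read};

/// Hypothetical helper (not smarm's API): read from a non-blocking source,
/// calling `wait_readable` whenever the source has no data yet.
fn read_when_ready<R, W>(src: &mut R, buf: &mut [u8], mut wait_readable: W) -> io::Result<usize>
where
    R: Read,
    W: FnMut() -> io::Result<()>,
{
    loop {
        match src.read(buf) {
            // Data, or EOF reported as Ok(0), goes straight back to the caller.
            Ok(n) => return Ok(n),
            // EAGAIN: wait until readiness is reported, then retry the read.
            Err(e) if e.kind() == ErrorKind::WouldBlock => wait_readable()?,
            // EINTR: the syscall was interrupted; retry immediately.
            Err(e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
}

/// Mock non-blocking source: reports `WouldBlock` `blocks_left` times,
/// then hands out `data`.
struct Flaky {
    blocks_left: u32,
    data: &'static [u8],
}

impl Read for Flaky {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if self.blocks_left > 0 {
            self.blocks_left -= 1;
            return Err(io::Error::from(ErrorKind::WouldBlock));
        }
        let n = self.data.len().min(buf.len());
        buf[..n].copy_from_slice(&self.data[..n]);
        Ok(n)
    }
}
```

Trying the read first saves a park when data is already buffered. Waiting first saves one failed syscall when the fd is usually empty.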
314
tests/mutex.rs
Normal file
@@ -0,0 +1,314 @@
//! `smarm::Mutex<T>` tests. All run under the scheduler because a contended
//! lock parks the calling actor.

use smarm::{run, spawn, yield_now, LockTimeout, Mutex};
use std::sync::Arc;
use std::sync::Mutex as StdMutex;
use std::sync::atomic::{AtomicU32, Ordering};
use std::time::{Duration, Instant};

// ---------------------------------------------------------------------------
// Uncontended fast path
// ---------------------------------------------------------------------------

#[test]
fn lock_free_mutex_succeeds() {
    let captured = Arc::new(AtomicU32::new(0));
    let c = captured.clone();
    run(move || {
        let m = Mutex::new(42u32);
        {
            let g = m.lock_timeout(Duration::from_millis(500)).unwrap();
            c.store(*g, Ordering::SeqCst);
        }
        // After drop we can lock again.
        let g2 = m.lock_timeout(Duration::from_millis(500)).unwrap();
        assert_eq!(*g2, 42);
    });
    assert_eq!(captured.load(Ordering::SeqCst), 42);
}

#[test]
fn try_lock_returns_some_when_free_none_when_held() {
    let success_flag = Arc::new(AtomicU32::new(0));
    let s = success_flag.clone();
    run(move || {
        let m = Mutex::new(0u32);
        let g = m.try_lock().expect("free");
        // Holding the guard; a second try_lock from the same actor should fail.
        assert!(m.try_lock().is_none());
        drop(g);
        // Now free again.
        let g2 = m.try_lock().expect("free again");
        drop(g2);
        s.store(1, Ordering::SeqCst);
    });
    assert_eq!(success_flag.load(Ordering::SeqCst), 1);
}

#[test]
fn guard_mutates_value_visible_through_next_lock() {
    let final_value = Arc::new(AtomicU32::new(0));
    let f = final_value.clone();
    run(move || {
        let m = Mutex::new(0u32);
        {
            let mut g = m.lock_timeout(Duration::from_millis(500)).unwrap();
            *g = 7;
        }
        let g2 = m.lock_timeout(Duration::from_millis(500)).unwrap();
        f.store(*g2, Ordering::SeqCst);
    });
    assert_eq!(final_value.load(Ordering::SeqCst), 7);
}

// ---------------------------------------------------------------------------
// Contention: a second actor parks until the first releases.
// ---------------------------------------------------------------------------

#[test]
fn contended_lock_parks_until_holder_releases() {
    // Actor A locks, yields (still holding), then releases. Actor B tries
    // to lock in between; B should park, then succeed after A drops.
    let log: Arc<StdMutex<Vec<&'static str>>> = Arc::new(StdMutex::new(Vec::new()));
    let la = log.clone();
    let lb = log.clone();

    run(move || {
        let m = Mutex::new(0u32);
        let m_a = m.clone();
        let m_b = m.clone();

        let a = spawn(move || {
            let g = m_a.lock_timeout(Duration::from_millis(500)).unwrap();
            la.lock().unwrap().push("A_locked");
            // First yield: lets B run past its first yield_now.
            yield_now();
            // Second yield: lets B reach B_try and attempt lock() while we
            // still hold it, so B parks on the mutex.
            yield_now();
            la.lock().unwrap().push("A_dropping");
            drop(g);
            la.lock().unwrap().push("A_dropped");
        });
        let b = spawn(move || {
            // One yield: lets A run and acquire the lock first.
            yield_now();
            lb.lock().unwrap().push("B_try");
            let _g = m_b.lock_timeout(Duration::from_millis(500)).unwrap();
            lb.lock().unwrap().push("B_locked");
        });
        a.join().unwrap();
        b.join().unwrap();
    });

    let v = log.lock().unwrap();
    // A locks, B tries (parks), A drops, B gets the lock.
    let pos_a_locked = v.iter().position(|s| *s == "A_locked").unwrap();
    let pos_b_try = v.iter().position(|s| *s == "B_try").unwrap();
    let pos_a_dropped = v.iter().position(|s| *s == "A_dropped").unwrap();
    let pos_b_locked = v.iter().position(|s| *s == "B_locked").unwrap();

    assert!(pos_a_locked < pos_b_try, "log: {:?}", *v);
    assert!(pos_b_try < pos_a_dropped, "B should attempt before A drops: {:?}", *v);
    assert!(pos_a_dropped < pos_b_locked, "B should lock only after A drops: {:?}", *v);
}

// ---------------------------------------------------------------------------
// Timeout: B times out while A holds the lock for longer than B's timeout.
// ---------------------------------------------------------------------------

#[test]
fn lock_timeout_returns_err_when_holder_never_releases() {
    let saw_err = Arc::new(std::sync::atomic::AtomicBool::new(false));
    let s = saw_err.clone();

    run(move || {
        let m: Mutex<u32> = Mutex::new(0);
        let m_a = m.clone();
        let m_b = m.clone();

        let a = spawn(move || {
            // Hold the lock for 100ms, blocking B's attempt with a 20ms timeout.
            let _g = m_a.lock_timeout(Duration::from_millis(500)).unwrap();
            smarm::sleep(Duration::from_millis(100));
            // _g drops here.
        });
        let b = spawn(move || {
            // Let A acquire first.
            yield_now();
            let t0 = Instant::now();
            let res = m_b.lock_timeout(Duration::from_millis(20));
            let elapsed = t0.elapsed();
            assert!(matches!(res, Err(LockTimeout)), "got {:?}", res);
            // Sanity: actually waited approximately the timeout.
            assert!(
                elapsed >= Duration::from_millis(15),
                "timed out too fast: {:?}",
                elapsed
            );
            assert!(
                elapsed < Duration::from_millis(80),
                "timed out far too slow: {:?}",
                elapsed
            );
            s.store(true, Ordering::SeqCst);
        });
        a.join().unwrap();
        b.join().unwrap();
    });

    assert!(saw_err.load(Ordering::SeqCst));
}

// ---------------------------------------------------------------------------
// FIFO fairness: when many actors queue, they get the lock in arrival order.
// ---------------------------------------------------------------------------

#[test]
fn waiters_are_granted_the_lock_in_fifo_order() {
    let order: Arc<StdMutex<Vec<u32>>> = Arc::new(StdMutex::new(Vec::new()));

    run({
        let order = order.clone();
        move || {
            let m: Mutex<()> = Mutex::new(());

            // Holder: takes the lock, yields to let others queue up, then
            // releases. Each waiter records its id once it acquires the lock.
            let m_holder = m.clone();
            let holder = spawn(move || {
                let g = m_holder.lock_timeout(Duration::from_millis(500)).unwrap();
                // Let waiters pile up.
                for _ in 0..5 {
                    yield_now();
                }
                drop(g);
            });

            // Spawn 4 waiters with ids 1, 2, 3, 4. Waiter `id` yields `id`
            // times before calling lock_timeout(), so the holder locks first
            // and the waiters queue up in id order.
            let mut handles = vec![holder];
            for id in 1u32..=4 {
                let m_w = m.clone();
                let o = order.clone();
                handles.push(spawn(move || {
                    // Stagger the lock attempts so they arrive in order.
                    for _ in 0..id {
                        yield_now();
                    }
                    let _g = m_w.lock_timeout(Duration::from_millis(500)).unwrap();
                    o.lock().unwrap().push(id);
                }));
            }
            for h in handles {
                h.join().unwrap();
            }
        }
    });

    let v = order.lock().unwrap().clone();
    assert_eq!(v, vec![1, 2, 3, 4], "waiters should acquire in arrival order");
}

// ---------------------------------------------------------------------------
// Grant-vs-timeout race: the holder releases before the waiter's timer fires,
// so the waiter should get the lock, not LockTimeout.
// ---------------------------------------------------------------------------

#[test]
fn grant_wins_when_holder_releases_before_timeout() {
    let got_lock = Arc::new(std::sync::atomic::AtomicBool::new(false));
    let g = got_lock.clone();

    run(move || {
        let m: Mutex<u32> = Mutex::new(0);
        let m_a = m.clone();
        let m_b = m.clone();

        let a = spawn(move || {
            let _g = m_a.lock_timeout(Duration::from_millis(500)).unwrap();
            // Hold for 10ms, well under B's 100ms timeout.
            smarm::sleep(Duration::from_millis(10));
        });
        let b = spawn(move || {
            yield_now();
            let res = m_b.lock_timeout(Duration::from_millis(100));
            if res.is_ok() {
                g.store(true, Ordering::SeqCst);
            }
        });
        a.join().unwrap();
        b.join().unwrap();
    });

    assert!(got_lock.load(Ordering::SeqCst));
}

// ---------------------------------------------------------------------------
// Panic in critical section: next waiter still gets the lock (no poisoning).
// ---------------------------------------------------------------------------

#[test]
fn next_waiter_gets_lock_after_holder_panics() {
    let next_got_it = Arc::new(std::sync::atomic::AtomicBool::new(false));
    let n = next_got_it.clone();

    run(move || {
        let m: Mutex<u32> = Mutex::new(7);
        let m_a = m.clone();
        let m_b = m.clone();

        let a = spawn(move || {
            let _g = m_a.lock_timeout(Duration::from_millis(500)).unwrap();
            yield_now();
            panic!("holder dies mid-critical-section");
        });
        let b = spawn(move || {
            yield_now();
            // A is dead but its guard's Drop ran during unwind. We get the lock.
            let g = m_b.lock_timeout(Duration::from_millis(100)).unwrap();
            assert_eq!(*g, 7);
            n.store(true, Ordering::SeqCst);
        });
        let _ = a.join(); // Err expected: A panicked.
        b.join().unwrap();
    });

    assert!(next_got_it.load(Ordering::SeqCst));
}

// ---------------------------------------------------------------------------
// Many short critical sections under contention all complete (no lost
// wakeups, no deadlock): ACTORS actors each add PER_ACTOR to one counter.
// ---------------------------------------------------------------------------

#[test]
fn many_actors_increment_shared_counter_via_mutex() {
    const ACTORS: u32 = 8;
    const PER_ACTOR: u32 = 50;

    let final_value = Arc::new(AtomicU32::new(0));
    let fv = final_value.clone();

    run(move || {
        let m: Mutex<u32> = Mutex::new(0);
        let mut handles = Vec::new();
        for _ in 0..ACTORS {
            let m_i = m.clone();
            handles.push(spawn(move || {
                for _ in 0..PER_ACTOR {
                    let mut g = m_i.lock_timeout(Duration::from_millis(500)).unwrap();
                    *g += 1;
                }
            }));
        }
        for h in handles {
            h.join().unwrap();
        }
        let g = m.lock_timeout(Duration::from_millis(500)).unwrap();
        fv.store(*g, Ordering::SeqCst);
    });

    assert_eq!(final_value.load(Ordering::SeqCst), ACTORS * PER_ACTOR);
}
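Two of the tests above constrain how waiters are released:

- `waiters_are_granted_the_lock_in_fifo_order` requires grants in arrival order.
- `grant_wins_when_holder_releases_before_timeout` requires that a waiter whose timer has not fired takes the lock rather than timing out.

The single-threaded model below shows one design that satisfies both. It is a sketch, not smarm's implementation; `FifoLock`, `Waiter`, and the state constants are hypothetical.

On unlock, ownership passes directly to the oldest queued waiter. Each waiter's outcome is settled by a single compare-exchange, so exactly one of "granted" and "timed out" can win.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicU8, Ordering};
use std::sync::Arc;

// Hypothetical waiter states; a waiter leaves WAITING exactly once.
const WAITING: u8 = 0;
const GRANTED: u8 = 1;
const TIMED_OUT: u8 = 2;

/// One queued lock attempt. The releasing holder calls `grant`, the timer
/// calls `time_out`; whichever compare-exchange succeeds first decides.
struct Waiter {
    id: u32,
    state: AtomicU8,
}

impl Waiter {
    fn resolve(&self, to: u8) -> bool {
        self.state
            .compare_exchange(WAITING, to, Ordering::AcqRel, Ordering::Acquire)
            .is_ok()
    }
    fn grant(&self) -> bool {
        self.resolve(GRANTED)
    }
    fn time_out(&self) -> bool {
        self.resolve(TIMED_OUT)
    }
}

/// Single-threaded model of a FIFO hand-off lock. The lock never becomes
/// free while a live waiter is queued, so a newcomer cannot barge ahead.
struct FifoLock {
    holder: Option<u32>,
    queue: VecDeque<Arc<Waiter>>,
}

impl FifoLock {
    fn new() -> Self {
        FifoLock { holder: None, queue: VecDeque::new() }
    }

    /// Ok(()) if acquired at once; otherwise the waiter now at the back of the queue.
    fn lock(&mut self, id: u32) -> Result<(), Arc<Waiter>> {
        if self.holder.is_none() {
            self.holder = Some(id);
            return Ok(());
        }
        let w = Arc::new(Waiter { id, state: AtomicU8::new(WAITING) });
        self.queue.push_back(Arc::clone(&w));
        Err(w)
    }

    /// Releases the lock, handing it to the oldest waiter that has not timed
    /// out. Returns the new holder's id, if any.
    fn unlock(&mut self) -> Option<u32> {
        self.holder = None;
        while let Some(w) = self.queue.pop_front() {
            if w.grant() {
                self.holder = Some(w.id);
                return self.holder;
            }
            // grant() lost the race: this waiter already timed out; skip it.
        }
        None
    }
}
```

Skipping timed-out entries at release time keeps the timer path cheap: it only flips one atomic and never touches the queue.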
66
tests/preempt.rs
Normal file
@@ -0,0 +1,66 @@
//! Tests for explicit preemption via `smarm::check!()`.

use smarm::{run, spawn};
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

#[test]
fn check_yields_when_timeslice_expired() {
    // With a single actor a yield is unobservable: the scheduler has
    // nothing else to run and simply re-queues it. Instead, spawn two
    // actors that each force their timeslice to expire and call check!(),
    // then verify from the log that they interleaved.
    let order: Arc<std::sync::Mutex<Vec<u8>>> = Arc::new(std::sync::Mutex::new(Vec::new()));
    let o1 = order.clone();
    let o2 = order.clone();

    run(move || {
        let a = spawn(move || {
            o1.lock().unwrap().push(b'A');
            // Force the timeslice to be considered expired.
            smarm::preempt::expire_timeslice_for_test();
            smarm::check!();
            o1.lock().unwrap().push(b'a');
        });
        let b = spawn(move || {
            o2.lock().unwrap().push(b'B');
            smarm::preempt::expire_timeslice_for_test();
            smarm::check!();
            o2.lock().unwrap().push(b'b');
        });
        a.join().unwrap();
        b.join().unwrap();
    });

    // FIFO scheduling + forced preemption: A starts, expires, yields to B;
    // B starts, expires, yields to A; A finishes, B finishes.
    // Required: both uppercase letters appear before either lowercase.
    let v = order.lock().unwrap();
    let pos_big_a = v.iter().position(|&c| c == b'A').unwrap();
    let pos_big_b = v.iter().position(|&c| c == b'B').unwrap();
    let pos_lit_a = v.iter().position(|&c| c == b'a').unwrap();
    let pos_lit_b = v.iter().position(|&c| c == b'b').unwrap();
    assert!(pos_big_a < pos_lit_a, "A's tail ran before A's head: {:?}", *v);
    assert!(pos_big_b < pos_lit_b, "B's tail ran before B's head: {:?}", *v);
    assert!(pos_big_a.max(pos_big_b) < pos_lit_a.min(pos_lit_b),
        "preemption didn't interleave: {:?}", *v);
}

#[test]
fn check_is_a_noop_when_timeslice_not_expired() {
    // After a fresh resume, check!() should be cheap and not yield. Run
    // a single actor that calls check!() many times; it should complete
    // promptly.
    let count = Arc::new(AtomicU64::new(0));
    let c = count.clone();
    run(move || {
        for _ in 0..1_000 {
            smarm::check!();
            c.fetch_add(1, Ordering::Relaxed);
        }
    });
    assert_eq!(count.load(Ordering::Relaxed), 1_000);
}
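Both tests rely on `check!()` being a cheap budget test that yields only once the current timeslice has run out. The model below illustrates that contract with a per-thread budget. It is a hypothetical sketch, not smarm's macro: `on_resume`, `expire_timeslice`, `check`, and `SLICE` are invented here. The budget counts `check` calls rather than elapsed time, so the behaviour is deterministic.

```rust
use std::cell::Cell;

/// Hypothetical budget granted on each resume, counted in `check` calls.
const SLICE: u32 = 3;

thread_local! {
    // Remaining budget of the actor currently running on this thread.
    static BUDGET: Cell<u32> = Cell::new(0);
}

/// Called by a (hypothetical) scheduler each time it resumes an actor.
fn on_resume() {
    BUDGET.with(|b| b.set(SLICE));
}

/// Analogue of `smarm::preempt::expire_timeslice_for_test`.
fn expire_timeslice() {
    BUDGET.with(|b| b.set(0));
}

/// Returns true when the caller should yield. While budget remains, this
/// is one thread-local read and one write.
fn check() -> bool {
    BUDGET.with(|b| match b.get() {
        0 => true,
        left => {
            b.set(left - 1);
            false
        }
    })
}
```

In a real scheduler the `true` branch would yield, and the next `on_resume` would refill the budget. `check_yields_when_timeslice_expired` forces the yield path by expiring the slice before calling `check!()`.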
426
tests/runtime.rs
Normal file
@@ -0,0 +1,426 @@
//! Tests for the multi-scheduler runtime: Config, Runtime::run, and
//! correctness under genuine parallelism.
//!
//! The single-threaded correctness properties (channel ordering, mutex
//! fairness, timer accuracy, etc.) are already covered by the per-module
//! tests. This file focuses on what changes when N > 1 scheduler threads
//! are involved:
//!
//! - Config construction and validation
//! - Runtime::run blocks until all actors finish
//! - All existing cooperative behaviours hold under multi-threading
//! - Actors genuinely run on different OS threads
//! - No lost wakeups under concurrent park/unpark
//! - No slot leaks under high spawn/join churn
//! - Panic on one scheduler thread doesn't kill others

use smarm::{channel, runtime::{Config, Runtime}, spawn, yield_now, JoinHandle};
use std::sync::{
    atomic::{AtomicBool, AtomicU64, AtomicUsize, Ordering},
    Arc, Barrier,
};
use std::time::Duration;
use std::collections::HashSet;

// ---------------------------------------------------------------------------
// Helpers
// ---------------------------------------------------------------------------

/// Build a runtime with exactly `n` scheduler threads.
fn rt(n: usize) -> Runtime {
    smarm::runtime::init(Config::exact(n))
}

/// Convenient single-threaded runtime (regression guard).
fn rt1() -> Runtime { rt(1) }

/// Multi-threaded runtime using all available parallelism.
fn rt_par() -> Runtime {
    smarm::runtime::init(Config::default())
}

// ---------------------------------------------------------------------------
// Config
// ---------------------------------------------------------------------------

#[test]
fn config_exact_overrides_bounds() {
    let c = Config::exact(3);
    assert_eq!(c.resolved_thread_count(), 3);
}

#[test]
fn config_default_clamps_to_available_parallelism() {
    let c = Config::default();
    let n = c.resolved_thread_count();
    let avail = std::thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);
    // Default min is 1, default max is available_parallelism.
    assert!(n >= 1 && n <= avail);
}

#[test]
fn config_min_max_clamps() {
    // No exact count: with min=2 and max=4, the result must land in 2..=4
    // whether available parallelism is below, inside, or above that range.
    let c = Config::new(2, 4, None);
    let n = c.resolved_thread_count();
    assert!(n >= 2 && n <= 4, "expected 2..=4, got {n}");
}

#[test]
fn config_min_1_max_1_is_single_threaded() {
    let c = Config::new(1, 1, None);
    assert_eq!(c.resolved_thread_count(), 1);
}
|
||||
|
||||
// ---------------------------------------------------------------------------
// Runtime::run — basic lifecycle
// ---------------------------------------------------------------------------

#[test]
fn runtime_run_executes_closure() {
    let flag = Arc::new(AtomicBool::new(false));
    let f = flag.clone();
    rt(1).run(move || { f.store(true, Ordering::SeqCst); });
    assert!(flag.load(Ordering::SeqCst));
}

#[test]
fn runtime_run_blocks_until_all_actors_done() {
    // Spawn 20 independent actors and join them; the counter must be
    // exactly 20 when run() returns.
    let counter = Arc::new(AtomicU64::new(0));
    let c = counter.clone();
    rt(2).run(move || {
        let mut handles = Vec::new();
        for _ in 0..20 {
            let cc = c.clone();
            handles.push(spawn(move || {
                cc.fetch_add(1, Ordering::SeqCst);
            }));
        }
        for h in handles {
            h.join().unwrap();
        }
    });
    assert_eq!(counter.load(Ordering::SeqCst), 20);
}

#[test]
fn runtime_can_be_used_multiple_times_sequentially() {
    // Each call to run() is independent.
    let r = rt(2);
    let a = Arc::new(AtomicU64::new(0));
    let b = Arc::new(AtomicU64::new(0));
    let ac = a.clone();
    let bc = b.clone();
    r.run(move || { ac.fetch_add(1, Ordering::SeqCst); });
    r.run(move || { bc.fetch_add(1, Ordering::SeqCst); });
    assert_eq!(a.load(Ordering::SeqCst), 1);
    assert_eq!(b.load(Ordering::SeqCst), 1);
}

// ---------------------------------------------------------------------------
// Single-threaded regression: exact(1) must behave identically to old run()
// ---------------------------------------------------------------------------

#[test]
fn exact_1_spawn_join_works() {
    let v = Arc::new(AtomicU64::new(0));
    let vc = v.clone();
    rt1().run(move || {
        let h = spawn(move || { vc.store(42, Ordering::SeqCst); });
        h.join().unwrap();
    });
    assert_eq!(v.load(Ordering::SeqCst), 42);
}

#[test]
fn exact_1_channel_recv_parks_and_wakes() {
    let v = Arc::new(AtomicU64::new(0));
    let vc = v.clone();
    rt1().run(move || {
        let (tx, rx) = channel::<u64>();
        let h = spawn(move || {
            let val = rx.recv().unwrap();
            vc.store(val, Ordering::SeqCst);
        });
        yield_now();
        tx.send(99).unwrap();
        h.join().unwrap();
    });
    assert_eq!(v.load(Ordering::SeqCst), 99);
}

#[test]
fn exact_1_panic_captured() {
    let saw_err = Arc::new(AtomicBool::new(false));
    let s = saw_err.clone();
    rt1().run(move || {
        let h = spawn(|| panic!("oops"));
        if h.join().is_err() { s.store(true, Ordering::SeqCst); }
    });
    assert!(saw_err.load(Ordering::SeqCst));
}

// ---------------------------------------------------------------------------
// Multi-threaded correctness
// ---------------------------------------------------------------------------

#[test]
fn multi_thread_all_actors_complete() {
    let counter = Arc::new(AtomicU64::new(0));
    let c = counter.clone();
    rt_par().run(move || {
        let mut handles = Vec::new();
        for _ in 0..100 {
            let cc = c.clone();
            handles.push(spawn(move || {
                cc.fetch_add(1, Ordering::SeqCst);
            }));
        }
        for h in handles { h.join().unwrap(); }
    });
    assert_eq!(counter.load(Ordering::SeqCst), 100);
}

#[test]
fn multi_thread_channel_wakeup_across_threads() {
    // Receiver parks; sender runs (potentially on a different OS thread).
    // Verifies no lost wakeup.
    let received = Arc::new(AtomicU64::new(0));
    let rc = received.clone();
    rt_par().run(move || {
        let (tx, rx) = channel::<u64>();
        let h = spawn(move || {
            let v = rx.recv().unwrap();
            rc.store(v, Ordering::SeqCst);
        });
        // Let receiver park.
        yield_now();
        tx.send(7).unwrap();
        h.join().unwrap();
    });
    assert_eq!(received.load(Ordering::SeqCst), 7);
}

#[test]
fn multi_thread_many_channels_no_lost_wakeups() {
    // N pairs of (sender actor, receiver actor). Each pair exchanges one
    // message. All must complete — any lost wakeup causes a deadlock/timeout.
    const PAIRS: usize = 50;
    let count = Arc::new(AtomicU64::new(0));
    let c = count.clone();
    rt_par().run(move || {
        let mut handles: Vec<JoinHandle> = Vec::new();
        for _ in 0..PAIRS {
            let (tx, rx) = channel::<u64>();
            let cc = c.clone();
            handles.push(spawn(move || {
                let v = rx.recv().unwrap();
                cc.fetch_add(v, Ordering::SeqCst);
            }));
            handles.push(spawn(move || {
                tx.send(1).unwrap();
            }));
        }
        for h in handles { h.join().unwrap(); }
    });
    assert_eq!(count.load(Ordering::SeqCst), PAIRS as u64);
}

#[test]
fn multi_thread_mutex_contention_no_deadlock() {
    use smarm::Mutex;
    const ACTORS: usize = 20;
    const PER: u64 = 100;
    let total = Arc::new(AtomicU64::new(0));
    let t = total.clone();
    rt_par().run(move || {
        let m: Mutex<u64> = Mutex::new(0);
        let mut handles = Vec::new();
        for _ in 0..ACTORS {
            let mc = m.clone();
            let tc = t.clone();
            handles.push(spawn(move || {
                for _ in 0..PER {
                    let mut g = mc.lock_timeout(Duration::from_secs(5)).unwrap();
                    *g += 1;
                    tc.fetch_add(0, Ordering::SeqCst); // just a memory barrier
                }
            }));
        }
        for h in handles { h.join().unwrap(); }
        let g = m.lock_timeout(Duration::from_secs(1)).unwrap();
        t.store(*g, Ordering::SeqCst);
    });
    assert_eq!(total.load(Ordering::SeqCst), ACTORS as u64 * PER);
}

#[test]
fn multi_thread_join_across_threads() {
    // Parent joins a child that may run on a different scheduler thread.
    let v = Arc::new(AtomicU64::new(0));
    let vc = v.clone();
    rt_par().run(move || {
        let h = spawn(move || {
            // Do some work to make scheduling interesting.
            for _ in 0..10 { yield_now(); }
            vc.store(1, Ordering::SeqCst);
        });
        h.join().unwrap();
    });
    assert_eq!(v.load(Ordering::SeqCst), 1);
}

// ---------------------------------------------------------------------------
// Actors run on distinct OS threads
//
// We collect the OS thread IDs that actors execute on. With N schedulers
// and enough actors, we expect to see more than one thread ID.
// ---------------------------------------------------------------------------

#[test]
fn actors_run_on_multiple_os_threads() {
    let thread_ids: Arc<smarm::Mutex<HashSet<u64>>> =
        Arc::new(smarm::Mutex::new(HashSet::new()));

    rt_par().run({
        let ids = thread_ids.clone();
        move || {
            let mut handles = Vec::new();
            for _ in 0..64 {
                let idc = ids.clone();
                handles.push(spawn(move || {
                    let tid = unsafe { libc::syscall(libc::SYS_gettid) as u64 };
                    let mut g = idc.lock_timeout(Duration::from_secs(1)).unwrap();
                    g.insert(tid);
                }));
            }
            for h in handles { h.join().unwrap(); }
        }
    });

    let n = std::thread::available_parallelism().map(|n| n.get()).unwrap_or(1);

    let ids = thread_ids.lock_timeout(Duration::from_secs(1)).unwrap();
    // If we have >1 scheduler threads, we expect >1 OS thread IDs.
    // On a single-CPU machine this may be 1; we just assert ≥ 1.
    assert!(!ids.is_empty());
    if n > 1 {
        // Strongly expect parallelism — not a hard assert since scheduling
        // is non-deterministic, but 64 actors should spread.
        // We log rather than assert to avoid flakiness on loaded CI.
        if ids.len() == 1 {
            eprintln!("WARNING: 64 actors all ran on the same OS thread (flaky on loaded system)");
        }
    }
}

// ---------------------------------------------------------------------------
// Scheduler stats (RFC 000 Layer 1 primitives)
// ---------------------------------------------------------------------------

#[test]
fn scheduler_stats_run_queue_len_is_observable() {
    // After spawning actors but before they run, the queue should be non-empty.
    // We can't observe this from inside run() without a snapshot API, but we
    // can verify the stats struct is accessible and returns sane values after
    // run() completes (queue len == 0 at quiescence).
    let r = rt_par();
    r.run(|| {
        for _ in 0..10 { spawn(|| {}); }
        // Don't join — let them drain naturally.
    });
    let stats = r.stats();
    assert_eq!(stats.total_run_queue_len(), 0, "queue should be empty after run()");
}

#[test]
fn scheduler_stats_thread_count_matches_config() {
    let r = rt(3);
    r.run(|| {});
    assert_eq!(r.stats().scheduler_count(), 3);
}

// ---------------------------------------------------------------------------
// Panic isolation: a panicking actor doesn't kill the scheduler thread
// ---------------------------------------------------------------------------

#[test]
fn panic_in_actor_does_not_kill_runtime() {
    let completed = Arc::new(AtomicU64::new(0));
    let c = completed.clone();
    rt_par().run(move || {
        // Spawn a panicker alongside well-behaved actors.
        let bad = spawn(|| panic!("deliberate"));
        let mut good_handles = Vec::new();
        for _ in 0..10 {
            let cc = c.clone();
            good_handles.push(spawn(move || {
                cc.fetch_add(1, Ordering::SeqCst);
            }));
        }
        assert!(bad.join().is_err(), "panicking actor should join as Err");
        for h in good_handles { h.join().unwrap(); }
    });
    assert_eq!(completed.load(Ordering::SeqCst), 10);
}

// ---------------------------------------------------------------------------
// No slot leaks: rapid spawn/join churn
// ---------------------------------------------------------------------------

#[test]
fn no_slot_leak_under_churn() {
    // Spawn and join many short actors in a loop. If slots leak, the slot
    // table grows unboundedly. We can't directly measure it without an
    // introspection API, but the test at least checks correctness under
    // churn and will OOM if there's a severe leak.
    let counter = Arc::new(AtomicU64::new(0));
    let c = counter.clone();
    rt_par().run(move || {
        for _ in 0..500 {
            let cc = c.clone();
            spawn(move || { cc.fetch_add(1, Ordering::SeqCst); })
                .join()
                .unwrap();
        }
    });
    assert_eq!(counter.load(Ordering::SeqCst), 500);
}

// ---------------------------------------------------------------------------
// Ping-pong: channel round-trips between two actors
// ---------------------------------------------------------------------------

#[test]
fn ping_pong_completes() {
    const ROUNDS: u64 = 1_000;
    let final_val = Arc::new(AtomicU64::new(0));
    let fv = final_val.clone();
    rt_par().run(move || {
        let (tx_a, rx_a) = channel::<u64>();
        let (tx_b, rx_b) = channel::<u64>();
        let h_a = spawn(move || {
            tx_a.send(0).unwrap();
            for _ in 0..ROUNDS {
                let v = rx_b.recv().unwrap();
                tx_a.send(v + 1).unwrap();
            }
        });
        let h_b = spawn(move || {
            for _ in 0..=ROUNDS {
                let v = rx_a.recv().unwrap();
                if v < ROUNDS {
                    tx_b.send(v).unwrap();
                } else {
                    fv.store(v, Ordering::SeqCst);
                }
            }
        });
        h_a.join().unwrap();
        h_b.join().unwrap();
    });
    assert_eq!(final_val.load(Ordering::SeqCst), ROUNDS);
}
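The config tests above pin down the thread-count rule: an exact count wins outright; otherwise available parallelism is clamped into `min..=max`, with defaults of 1 and `available_parallelism`. Below is a minimal sketch of that rule as a pure function, assuming `Config::resolved_thread_count` behaves this way; `resolve_thread_count` and `host_parallelism` are hypothetical helpers, not smarm API.

```rust
use std::thread;

/// Hypothetical stand-in for `Config::resolved_thread_count` (not smarm API).
/// `exact` wins outright; otherwise `available` is clamped into `min..=max`.
fn resolve_thread_count(min: usize, max: usize, exact: Option<usize>, available: usize) -> usize {
    if let Some(n) = exact {
        // An explicit count overrides the bounds, as in `config_exact_overrides_bounds`.
        return n;
    }
    // Manual clamp: `Ord::clamp` would panic if a caller passed min > max.
    available.max(min).min(max)
}

/// Hypothetical helper: host parallelism, falling back to 1 if it is unknown.
fn host_parallelism() -> usize {
    thread::available_parallelism().map(|n| n.get()).unwrap_or(1)
}

fn main() {
    let avail = host_parallelism();
    // Defaults as described in the tests: min = 1, max = available parallelism.
    let n = resolve_thread_count(1, avail, None, avail);
    assert!(n >= 1 && n <= avail);

    assert_eq!(resolve_thread_count(2, 4, None, 1), 2); // too few cores: raised to min
    assert_eq!(resolve_thread_count(2, 4, None, 16), 4); // too many cores: capped at max
    assert_eq!(resolve_thread_count(1, 1, None, 8), 1); // min == max == 1 is single-threaded
    assert_eq!(resolve_thread_count(2, 4, Some(3), 16), 3); // exact overrides everything
    println!("resolved {n} scheduler threads on this host");
}
```

Passing the host value in, rather than reading it inside the function, is what lets the `2..=4` case be checked deterministically on a 1-core sandbox as well as on a many-core machine.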
448
tests/stress.rs
Normal file
@@ -0,0 +1,448 @@
//! Stress tests targeting lost wakeups, PID table pressure, thundering herds,
//! and panic isolation under concurrency.
//!
//! These tests are designed to find bugs that functional happy-path tests
//! cannot: races in the park/unpark protocol, slot leaks under concurrent
//! churn, and scheduler corruption from concurrent panics.
//!
//! Every test that could hang is bounded by a join on a known-finite set of
//! handles. A deadlock from a lost wakeup makes the test binary hang rather
//! than produce a false pass, so run the suite under an external time limit,
//! such as a CI job timeout or `timeout 300 cargo test --test stress`.
//! Stable libtest has no `--timeout` option.

use smarm::{channel, runtime::{Config, Runtime}, spawn, yield_now, JoinHandle};
use std::sync::{
    atomic::{AtomicU64, AtomicUsize, Ordering},
    Arc,
};

fn rt(n: usize) -> Runtime {
    smarm::runtime::init(Config::exact(n))
}

fn rt_par() -> Runtime {
    smarm::runtime::init(Config::default())
}

// ---------------------------------------------------------------------------
// P0: Lost-wakeup — many concurrent sender/receiver pairs
//
// 500 independent (tx, rx) pairs. Each sender and each receiver is a separate
// actor. No ordering is imposed between pairs. Any lost wakeup causes one
// receiver to park forever, deadlocking the join at the end.
// ---------------------------------------------------------------------------

#[test]
fn lost_wakeup_many_pairs() {
    const PAIRS: usize = 500;
    let count = Arc::new(AtomicU64::new(0));

    for threads in [1, 2, 4] {
        count.store(0, Ordering::SeqCst);
        let c = count.clone();

        rt(threads).run(move || {
            let mut handles: Vec<JoinHandle> = Vec::with_capacity(PAIRS * 2);

            for _ in 0..PAIRS {
                let (tx, rx) = channel::<u64>();
                let cc = c.clone();

                // Receiver parks immediately unless the message has already arrived.
                handles.push(spawn(move || {
                    let v = rx.recv().unwrap();
                    cc.fetch_add(v, Ordering::SeqCst);
                }));

                // Sender fires without any yield — races with receiver parking.
                handles.push(spawn(move || {
                    tx.send(1).unwrap();
                }));
            }

            for h in handles {
                h.join().unwrap();
            }
        });

        assert_eq!(
            count.load(Ordering::SeqCst),
            PAIRS as u64,
            "lost wakeup on {threads}-thread runtime"
        );
    }
}

// ---------------------------------------------------------------------------
// P0: Lost-wakeup — rapid-fire single receiver
//
// One receiver, SENDERS senders, all spawned at once. The receiver loops
// receiving SENDERS messages. Race: a sender may fire before the receiver
// has parked, or exactly as it is transitioning to parked.
// ---------------------------------------------------------------------------

#[test]
fn lost_wakeup_rapid_fire_single_receiver() {
    const SENDERS: u64 = 200;

    for threads in [1, 2, 4] {
        let received = Arc::new(AtomicU64::new(0));
        let rc = received.clone();

        rt(threads).run(move || {
            let (tx, rx) = channel::<u64>();
            let mut handles: Vec<JoinHandle> = Vec::with_capacity(SENDERS as usize + 1);

            // Receiver loops until it has seen all messages.
            handles.push(spawn(move || {
                let mut n = 0u64;
                while n < SENDERS {
                    rx.recv().unwrap();
                    n += 1;
                }
                rc.store(n, Ordering::SeqCst);
            }));

            // All senders fire with no deliberate delay.
            for _ in 0..SENDERS {
                let txc = tx.clone();
                handles.push(spawn(move || {
                    txc.send(1).unwrap();
                }));
            }

            for h in handles {
                h.join().unwrap();
            }
        });

        assert_eq!(
            received.load(Ordering::SeqCst),
            SENDERS,
            "missed messages on {threads}-thread runtime"
        );
    }
}

// ---------------------------------------------------------------------------
// P0: Lost-wakeup — wakeup during yield chain
//
// Receiver yields N times before it would naturally park. Sender fires
// during that window. Tests the race between "actor is on the run queue
// yielding" and "actor transitions to parked."
// ---------------------------------------------------------------------------

#[test]
fn lost_wakeup_during_yield_chain() {
    const YIELDS: usize = 20;
    const PAIRS: usize = 100;
    let count = Arc::new(AtomicU64::new(0));

    let c = count.clone();
    rt_par().run(move || {
        let mut handles: Vec<JoinHandle> = Vec::with_capacity(PAIRS * 2);

        for _ in 0..PAIRS {
            let (tx, rx) = channel::<u64>();
            let cc = c.clone();

            handles.push(spawn(move || {
                // Yield several times, then block.
                for _ in 0..YIELDS {
                    yield_now();
                }
                let v = rx.recv().unwrap();
                cc.fetch_add(v, Ordering::SeqCst);
            }));

            handles.push(spawn(move || {
                // Fire immediately — may arrive while receiver is still yielding.
                tx.send(1).unwrap();
            }));
        }

        for h in handles {
            h.join().unwrap();
        }
    });

    assert_eq!(count.load(Ordering::SeqCst), PAIRS as u64);
}

// ---------------------------------------------------------------------------
// P2: Thundering herd
//
// N actors all block on recv from their own channel. A coordinator sends
// to all channels in rapid succession. All N actors must wake and complete.
// Common bug: wakeup list walked destructively while lock is dropped
// mid-walk, causing some actors to never be re-queued.
// ---------------------------------------------------------------------------

#[test]
fn thundering_herd_all_wake() {
    const HERD: usize = 200;
    let woke = Arc::new(AtomicUsize::new(0));

    let w = woke.clone();
    rt_par().run(move || {
        let mut senders: Vec<smarm::Sender<u8>> = Vec::with_capacity(HERD);
        let mut handles: Vec<JoinHandle> = Vec::with_capacity(HERD + 1);

        for _ in 0..HERD {
            let (tx, rx) = channel::<u8>();
            senders.push(tx);
            let wc = w.clone();
            handles.push(spawn(move || {
                rx.recv().unwrap();
                wc.fetch_add(1, Ordering::SeqCst);
            }));
        }

        // Give the receivers a chance to park before sending. This is
        // best-effort on a multi-threaded runtime; the test must pass
        // whichever way the race goes.
        for _ in 0..4 { yield_now(); }

        // Coordinator blasts all channels.
        handles.push(spawn(move || {
            for tx in senders {
                tx.send(1).unwrap();
            }
        }));

        for h in handles {
            h.join().unwrap();
        }
    });

    assert_eq!(woke.load(Ordering::SeqCst), HERD);
}

// ---------------------------------------------------------------------------
// P1: Concurrent spawn/join churn — PID table pressure
//
// K parent actors each spawn M children and join them, all concurrently.
// Exercises PID allocation/deallocation racing across scheduler threads.
// A generation-counter bug or slot leak will either corrupt a join result
// or accumulate memory without bound.
// ---------------------------------------------------------------------------

#[test]
fn concurrent_spawn_join_churn() {
    const PARENTS: usize = 20;
    const CHILDREN_PER_PARENT: usize = 50;
    const EXPECTED: u64 = (PARENTS * CHILDREN_PER_PARENT) as u64;

    let total = Arc::new(AtomicU64::new(0));
    let t = total.clone();

    rt_par().run(move || {
        let mut parent_handles: Vec<JoinHandle> = Vec::with_capacity(PARENTS);

        for _ in 0..PARENTS {
            let tc = t.clone();
            parent_handles.push(spawn(move || {
                let mut child_handles: Vec<JoinHandle> =
                    Vec::with_capacity(CHILDREN_PER_PARENT);

                for _ in 0..CHILDREN_PER_PARENT {
                    let tcc = tc.clone();
                    child_handles.push(spawn(move || {
                        tcc.fetch_add(1, Ordering::SeqCst);
                    }));
                }

                for h in child_handles {
                    h.join().unwrap();
                }
            }));
        }

        for h in parent_handles {
            h.join().unwrap();
        }
    });

    assert_eq!(total.load(Ordering::SeqCst), EXPECTED);
}

// ---------------------------------------------------------------------------
// P0: Join race — join called after child has already finished
//
// The parent yields before joining so that, in most runs, the child has
// already completed when join is called. This exercises a different code
// path than "join before child finishes": the wakeup has already fired and
// the result must be stored in the slot. A bug here leaves the parent
// hanging or returns a corrupted result.
// ---------------------------------------------------------------------------

#[test]
fn join_race_child_finishes_first() {
    const REPS: usize = 300;
    let ok = Arc::new(AtomicUsize::new(0));

    let o = ok.clone();
    rt_par().run(move || {
        let mut handles: Vec<JoinHandle> = Vec::with_capacity(REPS);

        for _ in 0..REPS {
            let oc = o.clone();
            let h = spawn(move || {
                // Child does a tiny bit of work and exits quickly.
                oc.fetch_add(1, Ordering::SeqCst);
            });
            handles.push(h);
        }

        // Yield a few times so most children finish before we join
        // (best-effort; not guaranteed on a multi-threaded runtime).
        for _ in 0..8 { yield_now(); }

        for h in handles {
            // If child already finished, join must return immediately with Ok.
            h.join().unwrap();
        }
    });

    assert_eq!(ok.load(Ordering::SeqCst), REPS);
}

// ---------------------------------------------------------------------------
// P3: Panic storm — concurrent panics don't corrupt the scheduler
//
// Many actors panic at the same time while a separate cohort of well-behaved
// actors makes progress. If a panic corrupts the run queue or the slot table,
// the well-behaved actors will deadlock or produce wrong counts.
// ---------------------------------------------------------------------------

#[test]
fn panic_storm_does_not_corrupt_scheduler() {
    const PANICKERS: usize = 50;
    const WORKERS: usize = 50;
    const WORK_PER_ACTOR: u64 = 10;

    let total = Arc::new(AtomicU64::new(0));
    let t = total.clone();

    rt_par().run(move || {
        let mut handles: Vec<JoinHandle> = Vec::with_capacity(PANICKERS + WORKERS);

        // Spawn all panickers.
        for _ in 0..PANICKERS {
            handles.push(spawn(|| panic!("deliberate panic storm")));
        }

        // Then spawn the well-behaved workers; their yields interleave
        // with the panics on the run queue.
        for _ in 0..WORKERS {
            let tc = t.clone();
            handles.push(spawn(move || {
                for _ in 0..WORK_PER_ACTOR {
                    yield_now();
                    tc.fetch_add(1, Ordering::SeqCst);
                }
            }));
        }

        // Collect results — panickers return Err, workers return Ok.
        let mut panic_count = 0usize;
        let mut ok_count = 0usize;
        for h in handles {
            match h.join() {
                Ok(()) => ok_count += 1,
                Err(_) => panic_count += 1,
            }
        }

        assert_eq!(panic_count, PANICKERS, "wrong number of panics captured");
        assert_eq!(ok_count, WORKERS, "some workers lost");
    });

    assert_eq!(
        total.load(Ordering::SeqCst),
        WORKERS as u64 * WORK_PER_ACTOR,
        "workers produced wrong count — scheduler corruption suspected"
    );
}

// ---------------------------------------------------------------------------
// P1: Sequential slot reuse — generation counter correctness
//
// Spawn an actor, join it, then spawn a new actor. The new actor will likely
// reuse the same slot index. A stale handle to the first actor must not
// accidentally refer to the second. We can't hold a stale handle across a
// join (join consumes the handle), but we can verify that PID generations
// are distinct across reuse.
// ---------------------------------------------------------------------------

#[test]
fn pid_generation_increments_on_reuse() {
    use smarm::self_pid;

    let pids: Arc<smarm::Mutex<Vec<smarm::Pid>>> =
        Arc::new(smarm::Mutex::new(Vec::new()));

    let p = pids.clone();
    rt(1).run(move || {
        // Single-threaded to maximise slot reuse.
        for _ in 0..100 {
            let pc = p.clone();
            spawn(move || {
                let pid = self_pid();
                let mut g = pc.lock_timeout(std::time::Duration::from_secs(5)).unwrap();
                g.push(pid);
            })
            .join()
            .unwrap();
        }
    });

    let g = pids.lock_timeout(std::time::Duration::from_secs(1)).unwrap();
    // Any two PIDs that share an index must have different generations.
    for i in 0..g.len() {
        for j in (i + 1)..g.len() {
            if g[i].index() == g[j].index() {
                assert_ne!(
                    g[i].generation(),
                    g[j].generation(),
                    "slot {} reused without incrementing generation",
                    g[i].index()
                );
            }
        }
    }
}

// ---------------------------------------------------------------------------
// P0: Channel backpressure — slow receiver, fast sender
//
// Sender produces messages faster than the receiver consumes them. The
// channel must not lose messages or deadlock regardless of how deep the
// queue grows. Tests unbounded channel growth and correct message ordering.
// ---------------------------------------------------------------------------

#[test]
fn channel_backpressure_no_loss() {
    const MESSAGES: u64 = 10_000;

    let received = Arc::new(AtomicU64::new(0));
    let rc = received.clone();

    rt_par().run(move || {
        let (tx, rx) = channel::<u64>();

        let receiver = spawn(move || {
            let mut sum = 0u64;
            for i in 0..MESSAGES {
                let v = rx.recv().unwrap();
                // A single sender must see FIFO delivery.
                assert_eq!(v, i, "message {i} arrived out of order");
                sum += v;
            }
            rc.store(sum, Ordering::SeqCst);
        });

        // Send all messages from the parent without waiting.
        for i in 0..MESSAGES {
            tx.send(i).unwrap();
        }

        receiver.join().unwrap();
    });

    // Sum of 0..MESSAGES
    let expected: u64 = (0..MESSAGES).sum();
    assert_eq!(received.load(Ordering::SeqCst), expected);
}
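The P0 tests above all probe one race: a sender enqueues a message while the receiver has already seen an empty queue but has not yet left the run queue. One common way to close it is a per-actor atomic state in which a sender that arrives too early leaves a `NOTIFIED` token behind instead of re-queuing the receiver. The sketch below assumes smarm does something equivalent (its real park/unpark code is not part of this diff); `WakeState`, `try_park`, `notify`, and `on_resume` are hypothetical names, and `main` walks both orderings of the race deterministically.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

// Hypothetical per-actor wakeup states (not smarm's actual protocol).
const IDLE: u8 = 0; // running, no pending notification
const PARKED: u8 = 1; // off the run queue, waiting for a message
const NOTIFIED: u8 = 2; // a sender signalled while the receiver was not parked

struct WakeState {
    state: AtomicU8,
}

impl WakeState {
    fn new() -> Self {
        WakeState { state: AtomicU8::new(IDLE) }
    }

    /// Receiver side, called after it saw an empty queue.
    /// Returns true if the actor may actually leave the run queue.
    fn try_park(&self) -> bool {
        match self.state.compare_exchange(IDLE, PARKED, Ordering::AcqRel, Ordering::Acquire) {
            Ok(_) => true,
            Err(NOTIFIED) => {
                // A send raced ahead of us: consume the token and re-check the queue.
                self.state.store(IDLE, Ordering::Release);
                false
            }
            Err(other) => unreachable!("running receiver saw state {other}"),
        }
    }

    /// Sender side, called after the message is enqueued.
    /// Returns true if this sender must push the receiver onto a run queue.
    fn notify(&self) -> bool {
        self.state.swap(NOTIFIED, Ordering::AcqRel) == PARKED
    }

    /// Scheduler side, called when a re-queued receiver starts running again.
    /// The receiver re-checks its queue afterwards, so clearing the token is safe.
    fn on_resume(&self) {
        self.state.store(IDLE, Ordering::Release);
    }
}

fn main() {
    // Ordering 1: the send lands before the receiver tries to park.
    let w = WakeState::new();
    assert!(!w.notify()); // receiver not parked: nothing to re-queue
    assert!(!w.try_park()); // receiver sees NOTIFIED and stays runnable

    // Ordering 2: the receiver parks first, then sends arrive.
    let w = WakeState::new();
    assert!(w.try_park());
    assert!(w.notify()); // exactly one sender re-queues the receiver
    assert!(!w.notify()); // a second sender must not queue it twice
    w.on_resume();
    assert!(w.try_park());
}
```

Because `notify` uses `swap`, only one of several concurrent senders can observe `PARKED`, so a burst of senders (as in `lost_wakeup_rapid_fire_single_receiver`) cannot push the same receiver onto a run queue twice.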
@@ -114,3 +114,94 @@ fn many_concurrent_sleepers_all_wake() {
    });
    assert_eq!(counter.load(std::sync::atomic::Ordering::SeqCst), 20);
}

// ---------------------------------------------------------------------------
// Direct tests on the Timers data structure. No scheduler involved — these
// cover the new Reason machinery without needing a Mutex implementation.
// ---------------------------------------------------------------------------

use smarm::pid::Pid;
use smarm::timer::{Reason, TimerTarget, Timers};

struct RecordingTarget {
    calls: Mutex<Vec<(Pid, u64)>>,
}

impl TimerTarget for RecordingTarget {
    fn on_timeout(&self, pid: Pid, seq: u64) {
        self.calls.lock().unwrap().push((pid, seq));
    }
}

#[test]
fn timers_pop_due_returns_entries_in_deadline_order() {
    let mut t = Timers::new();
    let now = Instant::now();
    // Insert out of order; pop_due should hand them back sorted by deadline.
    t.insert_sleep(now + Duration::from_millis(30), Pid::new(0, 0));
    t.insert_sleep(now + Duration::from_millis(10), Pid::new(1, 0));
    t.insert_sleep(now + Duration::from_millis(20), Pid::new(2, 0));

    // Advance past all of them.
    let due = t.pop_due(now + Duration::from_millis(50));
    let pids: Vec<u32> = due.iter().map(|e| e.pid.index()).collect();
    assert_eq!(pids, vec![1, 2, 0]);
    assert!(t.is_empty());
}

#[test]
fn timers_only_pop_entries_whose_deadline_has_passed() {
    let mut t = Timers::new();
    let now = Instant::now();
    t.insert_sleep(now + Duration::from_millis(5), Pid::new(0, 0));
    t.insert_sleep(now + Duration::from_millis(100), Pid::new(1, 0));

    let due = t.pop_due(now + Duration::from_millis(20));
    assert_eq!(due.len(), 1);
    assert_eq!(due[0].pid.index(), 0);
    assert!(!t.is_empty());
    // The unpopped entry's deadline is still visible.
    assert!(t.peek_deadline().is_some());
}

#[test]
fn timers_mix_sleep_and_wait_timeout_reasons() {
    let mut t = Timers::new();
    let target = Arc::new(RecordingTarget { calls: Mutex::new(Vec::new()) });
    let now = Instant::now();

    t.insert_sleep(now + Duration::from_millis(5), Pid::new(0, 0));
    t.insert(
        now + Duration::from_millis(10),
        Pid::new(1, 0),
        Reason::WaitTimeout { target: target.clone(), wait_seq: 42 },
    );

    let due = t.pop_due(now + Duration::from_millis(20));
    assert_eq!(due.len(), 2);

    // Order: Sleep (5ms) first, WaitTimeout (10ms) second.
    match &due[0].reason {
        Reason::Sleep => {}
        _ => panic!("first entry should be a Sleep"),
    }
    match &due[1].reason {
        Reason::WaitTimeout { wait_seq, .. } => assert_eq!(*wait_seq, 42),
        _ => panic!("second entry should be a WaitTimeout"),
    }
}

#[test]
fn same_deadline_entries_pop_in_insertion_order() {
    // The `seq` tiebreaker means entries inserted with the same deadline
    // pop in the order they were inserted.
    let mut t = Timers::new();
    let now = Instant::now();
    let d = now + Duration::from_millis(10);
    t.insert_sleep(d, Pid::new(0, 0));
    t.insert_sleep(d, Pid::new(1, 0));
    t.insert_sleep(d, Pid::new(2, 0));

    let due = t.pop_due(now + Duration::from_millis(20));
    let pids: Vec<u32> = due.iter().map(|e| e.pid.index()).collect();
    assert_eq!(pids, vec![0, 1, 2]);
}
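The `Timers` tests above fix three behaviours: due entries come back in deadline order, entries still in the future stay put, and equal deadlines break ties by insertion order through a `seq` counter. A binary min-heap keyed on `(deadline, seq)` gives all three; the sketch below uses `std::collections::BinaryHeap` with `std::cmp::Reverse`. It assumes nothing about smarm's actual `Timers` internals: `TimerHeap`, `Entry`, and the `u32` stand-in for `Pid` are hypothetical.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;
use std::time::{Duration, Instant};

/// Hypothetical timer entry. The derived `Ord` compares fields in declaration
/// order: `deadline` first, then `seq`, which gives FIFO order on ties.
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Entry {
    deadline: Instant,
    seq: u64,
    pid: u32, // hypothetical stand-in for smarm's Pid
}

/// Hypothetical min-heap of timers (a sketch, not smarm's `Timers`).
struct TimerHeap {
    // `BinaryHeap` is a max-heap; `Reverse` turns it into a min-heap.
    heap: BinaryHeap<Reverse<Entry>>,
    next_seq: u64,
}

impl TimerHeap {
    fn new() -> Self {
        TimerHeap { heap: BinaryHeap::new(), next_seq: 0 }
    }

    fn insert(&mut self, deadline: Instant, pid: u32) {
        let seq = self.next_seq;
        self.next_seq += 1;
        self.heap.push(Reverse(Entry { deadline, seq, pid }));
    }

    fn peek_deadline(&self) -> Option<Instant> {
        self.heap.peek().map(|Reverse(e)| e.deadline)
    }

    /// Pops every entry whose deadline is at or before `now`, earliest first.
    fn pop_due(&mut self, now: Instant) -> Vec<Entry> {
        let mut due = Vec::new();
        while self.peek_deadline().map_or(false, |d| d <= now) {
            if let Some(Reverse(e)) = self.heap.pop() {
                due.push(e);
            }
        }
        due
    }
}

fn main() {
    let now = Instant::now();
    let mut t = TimerHeap::new();
    t.insert(now + Duration::from_millis(30), 0);
    t.insert(now + Duration::from_millis(10), 1);
    t.insert(now + Duration::from_millis(20), 2);
    t.insert(now + Duration::from_millis(100), 3);

    let due: Vec<u32> = t.pop_due(now + Duration::from_millis(50)).iter().map(|e| e.pid).collect();
    assert_eq!(due, vec![1, 2, 0]);
    assert!(t.peek_deadline().is_some()); // the 100 ms entry is still pending

    // Equal deadlines pop in insertion order thanks to `seq`.
    let d = now + Duration::from_millis(200);
    for pid in [7, 8, 9] {
        t.insert(d, pid);
    }
    let due: Vec<u32> = t.pop_due(d).iter().map(|e| e.pid).collect();
    assert_eq!(due, vec![3, 7, 8, 9]);
}
```

Because the derived ordering depends on field declaration order, moving `seq` above `deadline` would silently change the pop order; keeping the tie-breaker as an explicit second field makes that dependency visible.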