Markk116/smarm


smarm d432349f99 Update the documentation

2026-05-25 22:14:07 +02:00


Benchmarks

Regression-test and tuning reference for smarm vs tokio.

Running

cargo bench --bench primes              # original compute bench
cargo bench --bench multi_scheduler     # original 3-workload bench
cargo bench --bench general             # benches 1–4
cargo bench --bench tokio_favored       # benches 5–8
cargo bench --bench smarm_favored       # benches 9–12

Each bench runs one warmup iteration (discarded) and 15 measured iterations. Results are reported as median / min / max in microseconds. Median is the headline number; the spread between min and max indicates measurement stability.
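The sketch below shows that measurement loop, assuming a closure-based harness. The `bench` function and `Summary` struct are hypothetical names, not the actual harness code. With 15 samples the median is simply the 8th value after sorting.

```rust
use std::time::{Duration, Instant};

/// Summary statistics for one bench, in microseconds (hypothetical type).
#[derive(Debug)]
struct Summary {
    median_us: u128,
    min_us: u128,
    max_us: u128,
}

/// Run `workload` once as a discarded warmup, then `iters` timed runs,
/// and return median / min / max of the measured wall-clock times.
/// For an even `iters` this picks the upper-middle sample as the median.
fn bench<F: FnMut()>(mut workload: F, iters: usize) -> Summary {
    assert!(iters > 0, "need at least one measured iteration");
    workload(); // warmup, result discarded

    let mut samples: Vec<Duration> = (0..iters)
        .map(|_| {
            let start = Instant::now();
            workload();
            start.elapsed()
        })
        .collect();
    samples.sort();

    Summary {
        median_us: samples[iters / 2].as_micros(),
        min_us: samples[0].as_micros(),
        max_us: samples[iters - 1].as_micros(),
    }
}

fn main() {
    // Placeholder workload; a real bench would build and tear down the
    // runtime inside the closure so startup/shutdown cost is included.
    let s = bench(|| { std::hint::black_box((0..10_000u64).sum::<u64>()); }, 15);
    println!("median {} us (min {}, max {})", s.median_us, s.min_us, s.max_us);
}
```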

Methodology notes

The harness measures wall-clock time for the full workload, including runtime startup and shutdown. For multi-thread runtimes this includes worker-thread spawn cost, which can dominate on short-lived benches. Where startup matters, the bench is structured so the workload runs much longer than typical runtime startup.
The tokio benches use new_current_thread + LocalSet for the single-threaded comparison and new_multi_thread().worker_threads(N) for the parallel one; smarm::runtime::Config::exact(N) is the equivalent smarm knob.
mpsc choice: tokio's unbounded_channel to match smarm's unbounded channel semantics. Bounded comparisons would need a separate suite.
Random delays in many_timers use a deterministic mixing function of the actor index so iterations are reproducible.
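One way to get reproducible "random" delays is to hash the actor index with a fixed bit mixer and map the result into the 1–10 ms window used by many_timers. This sketch assumes the SplitMix64 finalizer as the mixer; the function the bench actually uses may differ, and `mix64` and `delay_for` are hypothetical names.

```rust
use std::time::Duration;

/// SplitMix64 finalizer: a cheap, well-known bit mixer.
/// Assumed here for illustration only.
fn mix64(mut x: u64) -> u64 {
    x = x.wrapping_add(0x9E37_79B9_7F4A_7C15);
    x = (x ^ (x >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    x = (x ^ (x >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    x ^ (x >> 31)
}

/// Map an actor index to a sleep in the 1..=10 ms range.
/// The same index yields the same delay on every iteration.
fn delay_for(actor_index: u64) -> Duration {
    Duration::from_millis(1 + mix64(actor_index) % 10)
}

fn main() {
    let delays: Vec<Duration> = (0..10_000u64).map(delay_for).collect();
    let total_ms: u128 = delays.iter().map(|d| d.as_millis()).sum();
    println!("first delay: {:?}, total: {} ms", delays[0], total_ms);
}
```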

Bench catalog

General — neither runtime structurally favored

#	Bench	Stresses	Prediction
1	`chained_spawn`	Spawn + exit overhead in a serial chain	Roughly even
2	`yield_many`	Pure scheduling throughput, explicit yields	Roughly even
3	`fan_out_compute`	CPU-bound parallel work, minimal coordination	Even (compute-bound)
4	`ping_pong_oneshot`	Spawn + oneshot round-trip latency	Roughly even

A regression here means a real change in per-task or per-yield cost — those should be investigated regardless of which runtime got slower.

Tokio-favored — measures cost of smarm's design choices

#	Bench	Stresses	Why tokio should win
5	`spawn_storm_busy`	8 background yielders + 10k zero-work spawns	Tokio's per-worker deque + LIFO slot vs smarm's global `Mutex<SharedState>` queue
6	`mpsc_contention`	32 producers × 10k msgs → 1 consumer	Tokio's mpsc is lock-free on the hot path; smarm channel is `Arc<Mutex<Inner>>` + runtime mutex on each unpark
7	`many_timers`	10k actors sleeping 1–10 ms, dense wake window	Tokio's per-worker sharded timer wheel vs smarm's single shared min-heap
8	`multi_thread_scaling`	Primes, sweep thread count 1, 2, 4, available	Tokio scales near-linearly; smarm hits its mutex ceiling
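Benches 5, 7 and 8 lean on the same structural point. The following hypothetical model shows the shape the table attributes to smarm: one `Mutex` guarding both the global run queue and the single timer min-heap, so spawns, unparks, sleeps and timer expiry from every thread contend on the same lock. Type and method names are illustrative; smarm's real `SharedState` is not reproduced here.

```rust
use std::cmp::Reverse;
use std::collections::{BinaryHeap, VecDeque};
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::{Duration, Instant};

type TaskId = u64;

/// Hypothetical shared scheduler state: everything behind one lock.
#[derive(Default)]
struct SharedState {
    run_queue: VecDeque<TaskId>,                    // global FIFO run queue
    timers: BinaryHeap<Reverse<(Instant, TaskId)>>, // single min-heap of deadlines
}

#[derive(Clone, Default)]
struct Scheduler {
    shared: Arc<Mutex<SharedState>>,
}

impl Scheduler {
    /// Spawn and unpark both go through the global lock.
    fn make_runnable(&self, id: TaskId) {
        self.shared.lock().unwrap().run_queue.push_back(id);
    }

    fn next_runnable(&self) -> Option<TaskId> {
        self.shared.lock().unwrap().run_queue.pop_front()
    }

    /// Registering a sleep takes the same lock.
    fn sleep_until(&self, id: TaskId, deadline: Instant) {
        self.shared.lock().unwrap().timers.push(Reverse((deadline, id)));
    }

    /// Move every expired timer onto the run queue; returns how many fired.
    fn fire_expired(&self, now: Instant) -> usize {
        let mut guard = self.shared.lock().unwrap();
        let st = &mut *guard;
        let mut fired = 0;
        while let Some(&Reverse((deadline, id))) = st.timers.peek() {
            if deadline > now {
                break;
            }
            st.timers.pop();
            st.run_queue.push_back(id);
            fired += 1;
        }
        fired
    }
}

fn main() {
    let sched = Scheduler::default();

    // Several threads spawning concurrently all serialize on the one lock.
    let handles: Vec<_> = (0..4u64)
        .map(|t| {
            let s = sched.clone();
            thread::spawn(move || {
                for i in 0..1_000 {
                    s.make_runnable(t * 1_000 + i);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }

    let start = Instant::now();
    sched.sleep_until(99_999, start + Duration::from_millis(5)); // not yet due
    sched.sleep_until(99_998, start); // already due

    let fired = sched.fire_expired(start);
    let mut drained = 0;
    while sched.next_runnable().is_some() {
        drained += 1;
    }
    println!("fired {fired}, drained {drained}");
}
```

Per the table, tokio instead gives each worker its own deque and uses a sharded timer wheel, so most of these operations touch only worker-local state.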

A regression here means a smarm design choice got more expensive. A widening gap signals something to investigate; a gap that narrows after a tuning change is the desired direction.

Smarm-favored — measures payoff of green-thread + stackful design

#	Bench	Stresses	Why smarm should win
9	`deep_recursion`	Actor recurses 1000 deep, returns	Native stack growth vs tokio's per-level `Box::pin`
10	`yield_in_hot_loop`	2 actors, 500k yields each, single thread	Naked context switch (~6 GPRs + xmm save + ret) vs poll → state machine → schedule
11	`uncontended_channel`	1→1, 1M msgs, single thread	Mutex is essentially free uncontended; green-thread switch is cheaper than poll
12	`catch_unwind_panics`	10k spawns, 50% panic	Both runtimes catch panics (smarm with `catch_unwind` at the actor entry), but the boundaries differ; exploratory

A regression here means we lost some of smarm's structural advantage. #12 is exploratory — if the baseline shows no real gap, drop it.
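The per-level cost on the tokio side of #9 comes from how async recursion works: an `async fn` cannot await itself directly because its future would have to contain itself, so each recursion level is boxed with `Box::pin`, one heap allocation per level. This std-only sketch stands in for the tokio bench; `recurse`, `block_on` and the no-op waker are hypothetical helpers, and `block_on` is only adequate because this future never returns `Pending`.

```rust
use std::future::Future;
use std::pin::Pin;
use std::ptr;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

/// Recursive async computation. Each level returns a heap-allocated,
/// pinned future; without the Box the future type would be infinitely sized.
fn recurse(depth: u32) -> Pin<Box<dyn Future<Output = u32>>> {
    Box::pin(async move {
        if depth == 0 {
            0
        } else {
            recurse(depth - 1).await + 1
        }
    })
}

// Minimal no-op waker so the sketch runs without tokio.
fn noop_raw_waker() -> RawWaker {
    fn clone(_: *const ()) -> RawWaker {
        noop_raw_waker()
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    RawWaker::new(ptr::null(), &VTABLE)
}

fn block_on<F: Future>(fut: F) -> F::Output {
    // Safety: the vtable functions ignore the data pointer and do nothing.
    let waker = unsafe { Waker::from_raw(noop_raw_waker()) };
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
        // A real executor would park here until woken.
    }
}

fn main() {
    // 1000 levels => 1001 boxed futures (depths 1000 down to 0).
    println!("depth reached: {}", block_on(recurse(1000)));
}
```

A stackful actor runs the equivalent plain recursive function on its own stack with no allocation per level, which is the advantage #9 is meant to measure.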

Baseline (v0.3.0, Intel Xeon @ 2.80GHz, 1 core, kernel 6.18.5, rustc 1.95.0, RUSTFLAGS: none)

The sandbox has only 1 logical CPU, so every multi-thread row (smarm Nt, tokio mt) runs on a single CPU and is effectively a 1-thread configuration; the scaling sweep is limited to 1 thread. The label duplication in the bench output ("smarm 1-thread" appearing twice) occurs because available_parallelism() == 1, which makes the N-thread variant identical to the 1-thread one.

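Median wall-clock time in µs. 1t/Nt = smarm 1-thread/N-thread, ct = tokio current_thread, mt = tokio multi_thread.

Bench	smarm 1t	smarm Nt	tokio ct	tokio mt	Notes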
chained_spawn	7136	6979	113	176	smarm ~60x slower; spawn+stack alloc dominates on 1 CPU
yield_many	40079	40073	14571	14044	smarm ~2.8x slower; scheduling overhead real
fan_out_compute	19347	19461	18616	18905	roughly even; compute-bound as expected
ping_pong_oneshot	13731	14176	828	3342	smarm ~17x slower; per-round spawn+join cost high
spawn_storm_busy	105512	107113	2222	4546	smarm ~47x slower; global mutex under 8 bg yielders
mpsc_contention	10456	10395	17348	18628	smarm wins; uncontended mutex essentially free on 1-thread
many_timers	120242	121023	13581	14266	smarm ~9x slower; single min-heap vs sharded wheel
multi_thread_scaling — see thread-count sweep below
deep_recursion	62	71	22	44	tokio wins unexpectedly; see sanity-check notes
yield_in_hot_loop	182177	—	138335	—	tokio wins; smarm prediction wrong; see notes
uncontended_channel	31473	—	51925	—	smarm wins as predicted; ~1.65x
catch_unwind_panics	112306	114305	151443	161344	smarm wins as predicted; ~1.35x

`multi_thread_scaling` thread-count sweep (median µs)

The sandbox has 1 logical CPU, so only the 1-thread row could be measured.

Threads	smarm	tokio mt
1	19852	19638
2	—	—
4	—	—
N (avail=1)	19852	19638
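On a 1-CPU host the 1, 2, 4, N sweep collapses to a single distinct thread count, which is why the table has placeholder rows and a duplicated row. A hypothetical helper that filters the list against `std::thread::available_parallelism()` and removes duplicates would let the sweep adapt to the host; `sweep_counts` is an illustrative name, not part of the current bench code.

```rust
use std::num::NonZeroUsize;
use std::thread;

/// Thread counts for the scaling sweep: 1, 2, 4 and the available
/// parallelism, dropping counts the machine cannot provide and duplicates.
fn sweep_counts(available: usize) -> Vec<usize> {
    let mut counts: Vec<usize> = [1, 2, 4, available]
        .iter()
        .copied()
        .filter(|&n| n >= 1 && n <= available)
        .collect();
    counts.sort_unstable();
    counts.dedup();
    counts
}

fn main() {
    // available_parallelism() can fail; fall back to 1 in that case.
    let available = thread::available_parallelism()
        .map(NonZeroUsize::get)
        .unwrap_or(1);
    println!("available = {available}, sweep = {:?}", sweep_counts(available));
}
```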

Tuning experiments

Reduction-budget sweep

smarm uses an allocator-driven preemption mechanism: every Nth allocation, the actor checks RDTSC against its timeslice start and yields if over budget. The Nth-allocation threshold (the "reduction budget") and the timeslice duration are the two knobs.
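The check described above can be modeled as a countdown that reads the clock only once per budget. This is a minimal sketch under stated assumptions: `Preemption`, `on_alloc` and `start_slice` are hypothetical names, `std::time::Instant` stands in for RDTSC, and in smarm the hook lives inside the allocator rather than being called by hand.

```rust
use std::time::{Duration, Instant};

/// Hypothetical per-actor preemption state.
struct Preemption {
    reduction_budget: u32, // check the clock every N allocations
    countdown: u32,        // allocations left until the next clock check
    timeslice: Duration,   // how long an actor may run before yielding
    slice_start: Instant,  // when the current timeslice began (RDTSC stand-in)
}

impl Preemption {
    fn new(reduction_budget: u32, timeslice: Duration) -> Self {
        assert!(reduction_budget > 0);
        Self {
            reduction_budget,
            countdown: reduction_budget,
            timeslice,
            slice_start: Instant::now(),
        }
    }

    /// Called on every allocation; returns true when the actor should yield.
    fn on_alloc(&mut self) -> bool {
        self.countdown -= 1;
        if self.countdown > 0 {
            return false; // fast path: no clock read
        }
        self.countdown = self.reduction_budget;
        self.slice_start.elapsed() >= self.timeslice
    }

    /// Called by the scheduler when the actor is resumed.
    fn start_slice(&mut self) {
        self.slice_start = Instant::now();
        self.countdown = self.reduction_budget;
    }
}

fn main() {
    // A zero timeslice is always exhausted, so every 4th allocation yields.
    let mut p = Preemption::new(4, Duration::ZERO);
    let decisions: Vec<bool> = (0..8).map(|_| p.on_alloc()).collect();
    println!("{decisions:?}");
}
```

A larger budget makes the fast path cheaper but lets an actor overrun its timeslice by up to a budget's worth of allocations, and an actor that never allocates never reaches the check at all.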

Record each experiment as a row below. Reference the commit or the parameter values explicitly.

Configuration	Bench (or "all")	Result vs baseline
baseline	all	—
budget=…, timeslice=…

When the gap on tokio-favored benches narrows without regressing smarm-favored benches, the change is a keeper. If a budget change improves one workload but regresses another by a larger margin, keep the configuration with the better overall impact unless there is a clear use case for the trade-off.

Sanity-check notes (baseline run)

Compile fixes applied

Two bench files had a type error. smarm::Runtime::run() takes an impl FnOnce() + Send + 'static, i.e. a closure returning (), but the consumer closures in bench_mpsc_smarm (tokio_favored.rs) and bench_unc_smarm (smarm_favored.rs) returned u64 via a bare count tail expression. The fix changes that tail to let _ = count; in both closures, and the corresponding consumer.join().unwrap() calls to let _ = consumer.join().... No workload semantics changed.

Single-CPU sandbox caveat

available_parallelism() returns 1, so every "N-thread" variant is identical to "1-thread". Multi-thread results should not be used to draw scaling conclusions; re-run on a multi-core machine before committing to the tuning sweep.

Predicted-winner mismatches

deep_recursion — tokio wins (22 µs) over smarm (62 µs). At depth 500, smarm spawns a fresh actor which requires mmap'ing a 64 KiB stack; that allocation cost dominates the actual recursion. Tokio's Box::pin recursion allocates 500 small heap objects but avoids the mmap. The prediction assumed stack allocation was amortised across many uses; here the actor is single-use. Not a bug, but the bench may not exercise the intended advantage.

yield_in_hot_loop — tokio wins (138 ms) over smarm (182 ms). The prediction was that smarm's ~6-GPR naked context switch would beat tokio's poll/state-machine cycle. In practice, on a single-thread sandbox, tokio's current_thread scheduler has very low overhead per yield_now, while smarm's yield_now still goes through the runtime mutex and run-queue even on a single thread. This is a meaningful data point: smarm's scheduling overhead is not as low as the assembly switch cost alone suggests.

Noise / spread

catch_unwind_panics smarm spread is reasonable (~10% min/max).
spawn_storm_busy tokio multi-thread has notable spread (3833–7305 µs); consistent with tokio issue #3829 noted in task spec.
many_timers smarm spread acceptable (~10%).
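The notes do not define "spread"; one plausible reading is (max - min) / median. The sketch below assumes that definition (`spread_pct` is a hypothetical helper) and applies it to the spawn_storm_busy tokio multi-thread row, taking min/max from the note above and the median from the baseline table.

```rust
/// Relative spread of a bench result, as a percentage of the median.
/// Assumes spread = (max - min) / median.
fn spread_pct(min_us: f64, median_us: f64, max_us: f64) -> f64 {
    assert!(median_us > 0.0 && min_us <= median_us && median_us <= max_us);
    (max_us - min_us) / median_us * 100.0
}

fn main() {
    // spawn_storm_busy, tokio multi-thread: min 3833, median 4546, max 7305 µs.
    let pct = spread_pct(3833.0, 4546.0, 7305.0);
    println!("spawn_storm_busy tokio mt spread: {pct:.1}%"); // prints 76.4%
}
```

Under this definition the row spreads by roughly 76%, far above the ~10% quoted for the smarm rows; dividing by min or max instead gives about 91% or 47%, so the conclusion does not depend on the exact formula.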

Result-column equivalence

All result columns match between runtimes for every bench (same prime counts, same message totals, same task counts). Workloads are equivalent.
