smarm

Markk116/smarm

Commit Graph

Author	SHA1	Message	Date

Bench

3da6ffaa77

benches: expose preemption knobs + sweep runner

Config API changes (src/preempt.rs, src/runtime.rs):
- preempt: promote ALLOC_INTERVAL and TIMESLICE_CYCLES from bare consts to
  DEFAULT_ALLOC_INTERVAL / DEFAULT_TIMESLICE_CYCLES; store active values in
  thread-locals set on each actor resume so multiple runtimes can use
  different settings concurrently.
- runtime: add alloc_interval / timeslice_cycles fields to Config; add
  Config::alloc_interval(n) and Config::timeslice_cycles(c) builder methods;
  thread the values through RuntimeInner to the reset_timeslice() call in
  schedule_loop.
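
The pattern described above (builder methods on Config, with the active values copied into thread-locals on each resume) might look roughly like the sketch below. This is not smarm's actual code: the default values, the field types, `Config::new`, and the `active_knobs` helper are all assumptions made for illustration, and `reset_timeslice` here only borrows the name from the commit message.

```rust
use std::cell::Cell;

// Placeholder defaults: the commit message does not state the real values.
pub const DEFAULT_ALLOC_INTERVAL: u32 = 128;
pub const DEFAULT_TIMESLICE_CYCLES: u64 = 300_000;

thread_local! {
    // Per-thread "active" knobs, so two runtimes on different threads
    // can run with different settings at the same time.
    static ALLOC_INTERVAL: Cell<u32> = Cell::new(DEFAULT_ALLOC_INTERVAL);
    static TIMESLICE_CYCLES: Cell<u64> = Cell::new(DEFAULT_TIMESLICE_CYCLES);
}

#[derive(Clone, Copy, Debug)]
pub struct Config {
    alloc_interval: u32,
    timeslice_cycles: u64,
}

impl Config {
    // Hypothetical constructor; the commit only mentions Config::exact.
    pub fn new() -> Self {
        Config {
            alloc_interval: DEFAULT_ALLOC_INTERVAL,
            timeslice_cycles: DEFAULT_TIMESLICE_CYCLES,
        }
    }

    // Builder-style setters, mirroring the names in the commit message.
    pub fn alloc_interval(mut self, n: u32) -> Self {
        self.alloc_interval = n;
        self
    }

    pub fn timeslice_cycles(mut self, c: u64) -> Self {
        self.timeslice_cycles = c;
        self
    }
}

// Would be called by a scheduler loop just before resuming an actor:
// copies the runtime's configured knobs into this thread's thread-locals.
pub fn reset_timeslice(cfg: &Config) {
    ALLOC_INTERVAL.with(|v| v.set(cfg.alloc_interval));
    TIMESLICE_CYCLES.with(|v| v.set(cfg.timeslice_cycles));
}

// Hypothetical accessor: returns (alloc_interval, timeslice_cycles)
// as currently active on the calling thread.
pub fn active_knobs() -> (u32, u64) {
    (
        ALLOC_INTERVAL.with(|v| v.get()),
        TIMESLICE_CYCLES.with(|v| v.get()),
    )
}
```

Under these assumptions, a worker thread that calls `reset_timeslice(&Config::new().timeslice_cycles(150_000))` sees its own value, while every other thread keeps whatever it last set.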

Bench changes:
- Add bench_cfg(threads) helper to general/tokio_favored/smarm_favored that
  wraps Config::exact and reads SMARM_ALLOC_INTERVAL / SMARM_TIMESLICE_CYCLES
  env vars, so the sweep script can vary knobs without recompiling.
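
Reading a knob from the environment with a compiled-in fallback could be done along these lines. This is only a sketch: `parse_knob` and `knob_from_env` are hypothetical helper names, the `u64` type is an assumption, and the real bench_cfg may handle missing or malformed values differently.

```rust
use std::env;

// Parse an optional raw value; fall back to `default` when the value
// is absent or is not a valid unsigned integer.
pub fn parse_knob(raw: Option<&str>, default: u64) -> u64 {
    raw.and_then(|s| s.trim().parse().ok()).unwrap_or(default)
}

// Look up an environment variable such as SMARM_TIMESLICE_CYCLES and
// parse it, so a driver script can vary the knob without a rebuild.
pub fn knob_from_env(name: &str, default: u64) -> u64 {
    parse_knob(env::var(name).ok().as_deref(), default)
}
```

A bench helper could then call, for example, `knob_from_env("SMARM_TIMESLICE_CYCLES", 300_000)` and pass the result to the Config builder. The fallback of 300_000 is a placeholder, not a value taken from the commit.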

Sweep tooling (benches/sweep.py):
- 'run':     run the 3-file bench suite once; --save-baseline persists JSON
- 'regress': compare current run against baseline.json, exit 1 on any bench
             that regresses >10% vs stored medians
- 'sweep':   run the full SWEEP_GRID (10 points), print comparison table,
             optional --save-csv; binaries pre-built so no recompile per point
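
The 'regress' rule (fail when any bench is more than 10% slower than its stored median) reduces to a small comparison. The sketch below is written in Rust for consistency with the crate, although the tool itself is a Python script. The function name, the map-based input, and the choice to skip benches missing from the current run are assumptions.

```rust
use std::collections::BTreeMap;

// Return every bench whose current median exceeds the baseline median
// by more than `threshold` (0.10 means 10%), with its relative change.
pub fn regressions(
    baseline: &BTreeMap<String, f64>,
    current: &BTreeMap<String, f64>,
    threshold: f64,
) -> Vec<(String, f64)> {
    let mut out = Vec::new();
    for (name, &base) in baseline {
        // Benches absent from the current run are skipped (an assumption).
        if let Some(&cur) = current.get(name) {
            if base > 0.0 {
                let change = (cur - base) / base;
                if change > threshold {
                    out.push((name.clone(), change));
                }
            }
        }
    }
    out
}
```

A caller would exit with status 1 when the returned list is non-empty, matching the behaviour described for 'regress'.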

Sweep results (10-point grid, 1-CPU sandbox):
- The preemption knobs have very little effect on this single-CPU machine.
  Most benches move <5% across the entire grid.
- Longer timeslices (tc=600k, tc=1200k) reliably hurt spawn_storm_busy
  (11-15% slower) and catch_unwind_panics (10-12% slower) because actors
  hold the scheduler mutex longer per timeslice, stalling the storm of
  joinable tasks.
- Shorter timeslices (tc=150k) give a small improvement on many_timers
  (-3-4%) and a wash everywhere else.
- yield_in_hot_loop and uncontended_channel are essentially flat across all
  knobs: both are scheduling-dominated and call yield_now explicitly, so
  the RDTSC-driven preemption path is irrelevant.
- Conclusion: the knobs matter primarily under contention (multi-core).
  Re-run sweep on a multi-core machine before drawing tuning conclusions.

2026-05-25 13:04:58 +00:00

Bench

6d1c59fb99

benches: baseline results

Two compile fixes:
- tokio_favored.rs bench_mpsc_smarm: consumer spawn closure returned u64 via
  bare 'count' tail expression; smarm::Runtime::run() requires FnOnce()->().
  Fixed to 'let _ = count;'. Same fix on the consumer.join() call site.
- smarm_favored.rs bench_unc_smarm: same pattern, same fix.
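
The compile error and its fix can be reproduced without smarm. In the sketch below, `run_unit` is a hypothetical stand-in (built on std::thread) for any entry point bounded by FnOnce() -> (), and `count_messages` is an illustrative consumer. Neither is taken from the bench files.

```rust
use std::thread;

// Stand-in for an API that only accepts closures returning `()`.
pub fn run_unit<F>(f: F)
where
    F: FnOnce() + Send + 'static,
{
    thread::spawn(f).join().expect("task panicked");
}

pub fn count_messages(n: u64) {
    run_unit(move || {
        let mut count = 0u64;
        for _ in 0..n {
            count += 1;
        }
        // Ending the closure with a bare `count` would make it return
        // u64, which does not satisfy the FnOnce() -> () bound.
        // Binding it to `_` discards the value so the closure returns ().
        let _ = count;
    });
}
```

Ending the tail expression with a semicolon (`count;`) would also return `()`, though it may trigger an unused-value lint. `let _ = count;` makes the discard explicit.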

Baseline run: Intel Xeon @ 2.80GHz, 1 core, kernel 6.18.5, rustc 1.95.0,
smarm 0.3.0, no RUSTFLAGS. Single-CPU sandbox: N-thread rows are identical
to the 1-thread rows, so the scaling sweep is limited to 1 thread.

Notable findings:
- deep_recursion: tokio wins (22 vs 62 us); mmap stack alloc cost dominates
  for single-use actors at depth 500.
- yield_in_hot_loop: tokio wins (138 vs 182 ms); smarm mutex overhead on
  yield_now exceeds expected naked-switch advantage on 1 CPU.
- mpsc_contention/uncontended_channel/catch_unwind_panics: smarm wins as
  predicted.
- spawn_storm_busy: smarm 47x slower; global mutex saturated by background
  yielders.

2026-05-25 13:04:54 +00:00

2 Commits