Add `Config::alloc_interval()` and `Config::timeslice_cycles()` so
callers can tune preemption sensitivity at runtime. The values flow
through `RuntimeInner` and are written into per-scheduler-thread locals
via a new `configure_preempt()` call at thread startup, keeping the hot
path free of cross-thread coherency traffic.
Fix unused-variable warnings in channel.rs by inlining `current_pid()`
directly into `te!` macro arguments — since the no-op macro arm never
evaluates its argument, no binding is needed at the call site.
Clean up a handful of dead imports exposed by the refactor.
Config API changes (src/preempt.rs, src/runtime.rs):
- preempt: promote ALLOC_INTERVAL and TIMESLICE_CYCLES from bare consts to
DEFAULT_ALLOC_INTERVAL / DEFAULT_TIMESLICE_CYCLES; store active values in
thread-locals set on each actor resume so multiple runtimes can use
different settings concurrently.
- runtime: add alloc_interval / timeslice_cycles fields to Config; add
Config::alloc_interval(n) and Config::timeslice_cycles(c) builder methods;
thread the values through RuntimeInner to the reset_timeslice() call in
schedule_loop.
Bench changes:
- Add bench_cfg(threads) helper to general/tokio_favored/smarm_favored that
wraps Config::exact and reads SMARM_ALLOC_INTERVAL / SMARM_TIMESLICE_CYCLES
env vars, so the sweep script can vary knobs without recompiling.
Sweep tooling (benches/sweep.py):
- 'run': run the 3-file bench suite once; --save-baseline persists JSON
- 'regress': compare current run against baseline.json, exit 1 on any bench
that regresses >10% vs stored medians
- 'sweep': run the full SWEEP_GRID (10 points), print comparison table,
optional --save-csv; binaries pre-built so no recompile per point
Sweep results (10-point grid, 1-CPU sandbox):
- The preemption knobs have very little effect on this single-CPU machine.
Most benches move <5% across the entire grid.
- Longer timeslices (tc=600k, tc=1200k) reliably hurt spawn_storm_busy
(+11-15%) and catch_unwind_panics (+10-12%) because actors hold the
scheduler mutex longer per timeslice, stalling the storm of joinable tasks.
- Shorter timeslices (tc=150k) give a small improvement on many_timers
(-3-4%) and a wash everywhere else.
- yield_in_hot_loop and uncontended_channel are essentially flat across all
knobs — both are scheduling-dominated and call yield_now explicitly, so
the RDTSC-driven preemption path is irrelevant.
- Conclusion: the knobs matter primarily under contention (multi-core).
Re-run sweep on a multi-core machine before drawing tuning conclusions.
Stable Rust emits stack probes inline (subq/movq/jne loop) rather than
calling __rust_probestack, so there's no transparent hook for stack-
frame preemption. Override of __rust_probestack links cleanly but never
runs. Falling back to an explicit check!() that users drop into hot
compute loops.
check!() decrements the same ALLOC_COUNT counter as the heap path, so
both event sources fire timeslice checks at the same rate. Documents
the prep-to-park invariant on maybe_preempt — library code that
registers a wakeup and then parks must keep that window alloc-free and
check-free, or a preemption-driven yield in the middle would lose the
wakeup.
Adds a BinaryHeap of timer entries on SchedulerState. sleep() inserts
an entry and parks; schedule_loop pops due entries each iteration and
unparks them. When the run queue is empty but timers are pending, the
OS thread sleeps until the soonest deadline.
Single-threaded only; thread::sleep is fine because no other thread
can wake us. The IO thread coming next will need a Condvar or pipe
wakeup to break this OS-sleep early.
v0.2 will add stack-frame entries as a second preemption event source.
Both routes share ALLOC_COUNT so the timeslice check rate is the same
whether the actor is alloc-heavy, frame-heavy, or mixed.
Hand-rolled context switching on mmap'd stacks with guard pages,
allocator-driven RDTSC preemption, unbounded MPSC channels, supervision
via per-slot Signal mailboxes, root supervisor as sentinel PID.
Lib + tests + benches clean check/clippy. All 29 tests pass.
Bench: smarm 3.4% over serial baseline, within 160us of tokio
current-thread on prime-counting fan-out.