smarm-agent 793941693f benches: salvage generic tooling from the excised spin work

Carries forward the non-spin parts of the dropped spin-tooling commit:
- .gitignore: __pycache__/ and *.pyc
- sweep.py: generic vs-tokio comparison summary (no spin_sweep coupling)
The spin_sweep README docs from that commit are intentionally left behind.

2026-06-15 11:21:43 +00:00

baseline-output

benches: expose preemption knobs + sweep runner

2026-05-25 13:04:58 +00:00

baseline.json

perf: check in cleaner baseline

2026-06-11 23:07:23 +02:00

bench_rq.sh

benches: add SMARM_BENCH_SETS (default 5) for stable medians

2026-06-11 20:50:29 +00:00

general.rs

benches: add SMARM_BENCH_SETS (default 5) for stable medians

2026-06-11 20:50:29 +00:00

multi_scheduler.rs

feat: full runtime redesign (v0.6)

2026-05-23 16:09:35 +00:00

primes.rs

feat: full runtime redesign (v0.6)

2026-05-23 16:09:35 +00:00

README.md

docs(bench,test): READMEs for benches/ and tests/

2026-06-09 21:35:08 +00:00

rq_micro.rs

feat(bench): phase 4 — run-queue bench harness + shootout driver

2026-06-09 20:44:10 +00:00

rq_runtime.rs

feat(scheduler): RFC 005 wake slot — per-scheduler capacity-one wake cache

2026-06-11 20:20:07 +02:00

smarm_favored.rs

benches: add SMARM_BENCH_SETS (default 5) for stable medians

2026-06-11 20:50:29 +00:00

sweep.py

benches: salvage generic tooling from the excised spin work

2026-06-15 11:21:43 +00:00

switch_cost.rs

bench: add switch_cost local-mode per-switch microbench (rdtsc+wall, RFC per-switch spike)

2026-06-15 11:19:05 +00:00

tokio_favored.rs

benches: add SMARM_BENCH_SETS (default 5) for stable medians

2026-06-11 20:50:29 +00:00

README.md

Benches

Two families live here: comparison benches (smarm vs tokio, predating v0.5) and the run-queue shootout (v0.5 phase 4). All are plain binaries (harness = false in Cargo.toml), so cargo bench just builds in release and runs main() — no criterion, no magic.

cargo bench --bench <name>           # one bench
cargo bench                          # all of them (slow; rarely what you want)

Catalog

| file | what it measures |
| --- | --- |
| `primes.rs` | Compute fan-out/fan-in: counts primes across W workers. Pure compute throughput + spawn/join/channel cost. |
| `multi_scheduler.rs` | The original cross-runtime matrix: smarm (1 thread / N threads) vs tokio (current_thread / multi_thread) on compute, ping-pong, and spawn throughput. |
| `general.rs` | Workloads where neither runtime has a structural edge. Large gaps here mean real per-task/per-yield overhead differences; watch these for regressions. |
| `smarm_favored.rs` | Workloads the stackful green-thread model is built for. Single-thread numbers isolate per-switch cost from contention. |
| `tokio_favored.rs` | Workloads tokio's model is built for. Expect to lose; the value is knowing by how much and catching the gap widening. |
| `rq_micro.rs` | Run-queue structures in isolation (no runtime, no actors): push/pop throughput sweeping thread count × producer:consumer ratio. Covers all three queue types in one binary, since the types compile in every build and only the runtime's alias is feature-selected. |
| `rq_runtime.rs` | The whole scheduler with the compile-time-selected queue: yield-storm (pure queue churn), ping-pong-pairs (park/unpark latency), spawn-storm (slab + free list + queue churn), sweeping scheduler count. Comparing variants requires rebuilding per `rq-*` feature. |
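To make the `rq_micro.rs` description concrete, here is a rough sketch of a push/pop throughput measurement swept over a producer:consumer ratio. It is not the bench's actual code: the queue is a plain `std` `Mutex<VecDeque>` stand-in rather than any of smarm's queue types, and `push_pop` is a hypothetical helper name.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::{Duration, Instant};

/// Push `items` values through a mutex-guarded queue using the given
/// producer and consumer thread counts; returns (items popped, elapsed).
/// Stand-in queue only: this is NOT one of smarm's run-queue types.
fn push_pop(producers: usize, consumers: usize, items: usize) -> (usize, Duration) {
    assert!(producers > 0 && consumers > 0, "need at least one of each");
    let queue = Arc::new(Mutex::new(VecDeque::<usize>::new()));
    let popped = Arc::new(AtomicUsize::new(0));
    let start = Instant::now();
    let mut handles = Vec::new();

    for p in 0..producers {
        let queue = Arc::clone(&queue);
        // Split the items as evenly as possible across producers.
        let share = items / producers + usize::from(p < items % producers);
        handles.push(thread::spawn(move || {
            for i in 0..share {
                queue.lock().unwrap().push_back(i);
            }
        }));
    }
    for _ in 0..consumers {
        let queue = Arc::clone(&queue);
        let popped = Arc::clone(&popped);
        handles.push(thread::spawn(move || {
            // Stop once every item has been accounted for.
            while popped.load(Ordering::Acquire) < items {
                let item = queue.lock().unwrap().pop_front();
                match item {
                    Some(_) => {
                        popped.fetch_add(1, Ordering::AcqRel);
                    }
                    None => thread::yield_now(),
                }
            }
        }));
    }
    for handle in handles {
        handle.join().unwrap();
    }
    (popped.load(Ordering::Acquire), start.elapsed())
}

fn main() {
    // Sweep a few producer:consumer ratios, as the catalog entry describes.
    for (producers, consumers) in [(1, 1), (2, 1), (1, 2)] {
        let (count, elapsed) = push_pop(producers, consumers, 200_000);
        println!("{producers}:{consumers}  {count} items in {elapsed:?}");
    }
}
```

The timed region here includes thread spawn and join, which a real micro-bench would likely exclude; the sketch only illustrates the shape of the sweep.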

The run-queue shootout

One command; it rebuilds rq_runtime once per queue variant, runs rq_micro once, and aggregates:

./scripts/bench_rq.sh
# on a big box:
SMARM_BENCH_THREADS="1 2 4 8 16 20" ./scripts/bench_rq.sh

Outputs land in bench_results/ (gitignored): one full log per run, plus summary.csv assembled from the machine-readable RQCSV,... lines every config prints alongside the human table.
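As an illustration of how prefixed lines can be separated from the human-readable table, the sketch below filters a log for lines starting with `RQCSV,` and keeps the remainder of each line. It is an assumption about the general approach, not the script's actual logic; `extract_csv_rows` is a hypothetical helper and the sample rows are made up, since the real column layout is not described here.

```rust
/// Collect the payload of every line that starts with `prefix`,
/// leaving the human-readable table lines behind.
fn extract_csv_rows<'a>(log: &'a str, prefix: &str) -> Vec<&'a str> {
    log.lines()
        .filter_map(|line| line.trim().strip_prefix(prefix))
        .collect()
}

fn main() {
    // Made-up log excerpt for illustration only.
    let log = "variant  threads  ops/s\nRQCSV,a,1,100\nsome table row\nRQCSV,b,2,250\n";
    for row in extract_csv_rows(log, "RQCSV,") {
        println!("{row}");
    }
}
```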

Manual single-variant runs need the feature dance (features are additive, so the default rq-mutex must be switched off):

cargo bench --bench rq_runtime --no-default-features --features rq-striped

Knobs (env vars, all optional)

| var | default | used by |
| --- | --- | --- |
| `SMARM_BENCH_THREADS` | `"1 2 4"` | both (space-separated sweep) |
| `SMARM_BENCH_RUNS` | `5` | both (repetitions; the median is reported) |
| `SMARM_BENCH_ITEMS` | `200000` | `rq_micro` (items per measurement) |
| `SMARM_BENCH_YIELD_ACTORS` / `_YIELDS` | `200` / `500` | `rq_runtime` yield-storm |
| `SMARM_BENCH_PAIRS` / `_ROUNDTRIPS` | `32` / `1000` | `rq_runtime` ping-pong |
| `SMARM_BENCH_SPAWNS` | `5000` | `rq_runtime` spawn-storm |
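A minimal sketch of how a bench might read these knobs, assuming nothing about the real benches beyond the variable names and defaults listed above. `parse_threads` and `env_usize` are hypothetical helper names, and skipping malformed tokens is a choice made for this example, not documented behavior.

```rust
use std::env;

/// Parse a space-separated list of thread counts, e.g. "1 2 4".
/// Tokens that are not positive integers are skipped.
fn parse_threads(raw: &str) -> Vec<usize> {
    raw.split_whitespace()
        .filter_map(|tok| tok.parse::<usize>().ok())
        .filter(|&n| n > 0)
        .collect()
}

/// Read a numeric knob from the environment, falling back to `default`
/// when the variable is unset or does not parse.
fn env_usize(name: &str, default: usize) -> usize {
    env::var(name)
        .ok()
        .and_then(|v| v.trim().parse().ok())
        .unwrap_or(default)
}

fn main() {
    let raw = env::var("SMARM_BENCH_THREADS").unwrap_or_else(|_| "1 2 4".to_string());
    let threads = parse_threads(&raw);
    let runs = env_usize("SMARM_BENCH_RUNS", 5);
    let items = env_usize("SMARM_BENCH_ITEMS", 200_000);
    println!("threads={threads:?} runs={runs} items={items}");
}
```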

Reading the numbers honestly

Core count is the experiment. On a 1-core machine (CI, sandboxes) the sweep only validates the harness and catches gross pathologies — oversubscribed schedulers measure context-switch noise, not contention. Variant decisions come from a many-core box.
The striped queue should lose at low thread counts (ticket overhead with no contention to amortize) — that's expected, not a bug.
Medians over SMARM_BENCH_RUNS absorb scheduling noise but not thermal / turbo drift; for publishable numbers, pin the CPU governor and run a warmup pass first.
spawn-storm batches joins (1024 at a time) to stay well under the slab cap; if you raise SMARM_BENCH_SPAWNS massively, that batching is why it still works.

Adding a bench

Create benches/<name>.rs with a plain main(); print the house table (see any existing bench) and, if it belongs to a sweep, a greppable CSV line with a distinctive prefix (RQCSV, for the shootout family). Register it in Cargo.toml:

[[bench]]
name = "<name>"
harness = false

Take parameters from SMARM_BENCH_* env vars with modest defaults: the defaults must finish in seconds on one core, and the env vars scale them up on real hardware.
Report medians, and give each measurement its own fresh runtime: call init(Config::exact(t)) in the constructor of the measured closure and run() inside the timed region, so runs don't contaminate each other.
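The "fresh state per measurement, median over runs" pattern can be sketched generically as follows. This does not use smarm's API: `setup` stands in for whatever builds the fresh runtime (untimed), `work` stands in for the timed `run()`, and both `measure` and `median` are hypothetical helper names.

```rust
use std::time::{Duration, Instant};

/// Median of the samples; for an even count, the mean of the two middle values.
fn median(samples: &mut [Duration]) -> Duration {
    assert!(!samples.is_empty(), "need at least one sample");
    samples.sort();
    let mid = samples.len() / 2;
    if samples.len() % 2 == 1 {
        samples[mid]
    } else {
        (samples[mid - 1] + samples[mid]) / 2
    }
}

/// Run `work` `runs` times, building fresh state with `setup` before each
/// run. Only `work` is inside the timed region.
fn measure<S, T, F, G>(runs: usize, mut setup: F, mut work: G) -> Duration
where
    F: FnMut() -> S,
    G: FnMut(S) -> T,
{
    let mut samples = Vec::with_capacity(runs);
    for _ in 0..runs {
        let state = setup(); // fresh per run, not timed
        let start = Instant::now();
        let out = work(state);
        samples.push(start.elapsed());
        std::hint::black_box(out); // keep the result from being optimized away
    }
    median(&mut samples)
}

fn main() {
    let med = measure(
        5,
        || (0..10_000u64).collect::<Vec<u64>>(),
        |v: Vec<u64>| v.iter().sum::<u64>(),
    );
    println!("median: {med:?}");
}
```

Building the state outside the timed region keeps setup cost out of the reported number, and a new state per run means a slow or polluted run cannot leak into the next one.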
