benches: baseline results
Two compile fixes:
- tokio_favored.rs bench_mpsc_smarm: the consumer spawn closure returned
  u64 via a bare 'count' tail expression; smarm::Runtime::run() requires
  FnOnce()->(). Fixed with 'let _ = count;'. Same fix at the
  consumer.join() call site.
- smarm_favored.rs bench_unc_smarm: same pattern, same fix.

Baseline run: Intel Xeon @ 2.80GHz, 1 core, kernel 6.18.5, rustc 1.95.0,
smarm 0.3.0, no RUSTFLAGS. Single-CPU sandbox: the N-thread rows run the
same configuration as the 1-thread rows, and the scaling sweep is limited
to 1 thread.

Notable findings:
- deep_recursion: tokio wins (22 vs 62 us); mmap stack alloc cost
  dominates for single-use actors at depth 500.
- yield_in_hot_loop: tokio wins (138 vs 182 ms); smarm mutex overhead on
  yield_now exceeds the expected naked-switch advantage on 1 CPU.
- mpsc_contention, uncontended_channel, catch_unwind_panics: smarm wins
  as predicted.
- spawn_storm_busy: smarm 47x slower; global mutex saturated by bg
  yielders.
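A minimal sketch of the compile fix described above. It does not use smarm: `run_unit` is a hypothetical stand-in for any entry point whose bound only admits closures returning `()`, and `count_with_unit_closure` is an illustrative name. A bare `count` as the closure's last expression would make it return `u64` and fail that bound; ending with a statement keeps the return type `()`.

```rust
use std::cell::Cell;

/// Hypothetical stand-in (not smarm's API): accepts only closures
/// that return `()`, i.e. `FnOnce()`.
fn run_unit<F: FnOnce()>(f: F) {
    f()
}

/// Counts items inside a unit-returning closure and reports the result
/// through a `Cell`, since the closure itself cannot return it.
fn count_with_unit_closure(items: &[u64]) -> u64 {
    let observed = Cell::new(0u64);
    run_unit(|| {
        let mut count = 0u64;
        for _ in items {
            count += 1;
        }
        observed.set(count);
        // A bare `count` on the last line would turn this closure into
        // `FnOnce() -> u64`, which `run_unit` rejects at compile time.
        let _ = count; // explicit discard, mirroring the fix in this commit
    });
    observed.get()
}
```

Calling `count_with_unit_closure(&[10, 20, 30])` counts three items. When the value is actually needed, it has to leave the closure through shared state, as the `Cell` does here.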
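The ratios quoted in the findings can be rechecked from the median columns of the baseline output in this commit. The helper below is only an illustration: `slowdown`, `MEDIANS` and `report` are hypothetical names, and the numbers are the medians of the 1-thread smarm and tokio current_thread rows.

```rust
/// Ratio of the slower median to the faster one (hypothetical helper).
fn slowdown(slower_us: u64, faster_us: u64) -> f64 {
    slower_us as f64 / faster_us as f64
}

/// (workload, slower median in µs, faster median in µs), copied from the
/// baseline tables in this commit.
const MEDIANS: [(&str, u64, u64); 4] = [
    ("spawn_storm_busy (smarm / tokio)", 105_512, 2_222),
    ("deep_recursion (smarm / tokio)", 62, 22),
    ("yield_in_hot_loop (smarm / tokio)", 182_177, 138_335),
    ("uncontended_channel (tokio / smarm)", 51_925, 31_473),
];

/// One formatted line per workload, with the ratio to two decimals.
fn report() -> Vec<String> {
    MEDIANS
        .iter()
        .map(|(name, slow, fast)| format!("{name}: {:.2}x", slowdown(*slow, *fast)))
        .collect()
}
```

The first entry comes out at roughly 47x, matching the spawn_storm_busy finding. The other three are under 3x.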
44
benches/baseline-output/general.txt
Normal file
@@ -0,0 +1,44 @@
smarm general benchmarks
available parallelism: 1 threads
ITERS=15 (+1 warmup, discarded)
CHAIN_DEPTH=1000, YIELD_TASKS=200×1000, PRIME_N=400000/64 workers, PP_ROUNDS=1000

================================================================================
 chained_spawn: depth 1000
================================================================================
                   runtime |       result |  median µs |     min µs |     max µs
--------------------------------------------------------------------------------
            smarm 1-thread |         1000 |       7136 |       6929 |       8347
            smarm 1-thread |         1000 |       6979 |       6790 |       7364
      tokio current_thread |         1000 |        113 |        112 |        322
        tokio multi-thread |         1000 |        176 |        170 |        355

================================================================================
 yield_many: 200 tasks × 1000 yields
================================================================================
                   runtime |       result |  median µs |     min µs |     max µs
--------------------------------------------------------------------------------
            smarm 1-thread |       200000 |      40079 |      39606 |      41913
            smarm 1-thread |       200000 |      40073 |      39298 |      43173
      tokio current_thread |       200000 |      14571 |      14430 |      14670
        tokio multi-thread |       200000 |      14044 |      13306 |      14432

================================================================================
 fan_out_compute: primes in [2, 400000) across 64
================================================================================
                   runtime |       result |  median µs |     min µs |     max µs
--------------------------------------------------------------------------------
            smarm 1-thread |        33860 |      19347 |      19185 |      19703
            smarm 1-thread |        33860 |      19461 |      19202 |      21172
      tokio current_thread |        33860 |      18616 |      18553 |      18987
        tokio multi-thread |        33860 |      18905 |      18755 |      19035

================================================================================
 ping_pong_oneshot: 1000 rounds
================================================================================
                   runtime |       result |  median µs |     min µs |     max µs
--------------------------------------------------------------------------------
            smarm 1-thread |         1000 |      13731 |      13555 |      15545
            smarm 1-thread |         1000 |      14176 |      13870 |      14892
      tokio current_thread |         1000 |        828 |        788 |        939
        tokio multi-thread |         1000 |       3342 |       3233 |       3624
34
benches/baseline-output/multi_scheduler.txt
Normal file
@@ -0,0 +1,34 @@
smarm multi-scheduler benchmarks
available parallelism: 1 threads
PRIME_N=400000, WORKERS=64, PING_ROUNDS=10000, SPAWN_COUNT=1000

================================================================================
 Fan-out/fan-in: count primes in [2, 400000) across 64 workers
================================================================================
                   runtime |       result |  median µs |     min µs |     max µs
--------------------------------------------------------------------------------
         baseline (serial) |        33860 |      18581 |      18519 |      18905
       smarm single-thread |        33860 |      19467 |      19354 |      22082
            smarm 1-thread |        33860 |      19345 |      19287 |      19653
      tokio current_thread |        33860 |      18681 |      18591 |      18982
        tokio multi-thread |        33860 |      18948 |      18726 |      19212

================================================================================
 Ping-pong: 10000 round-trips between two actors
================================================================================
                   runtime |       result |  median µs |     min µs |     max µs
--------------------------------------------------------------------------------
       smarm single-thread |        10000 |       2547 |       2473 |       2841
            smarm 1-thread |        10000 |       2546 |       2518 |       2702
      tokio current_thread |        10000 |       1221 |       1168 |       1366
        tokio multi-thread |        10000 |       1487 |       1316 |       2331

================================================================================
 Spawn throughput: 1000 actors spawned and joined
================================================================================
                   runtime |       result |  median µs |     min µs |     max µs
--------------------------------------------------------------------------------
       smarm single-thread |         1000 |       8934 |       8066 |      12204
            smarm 1-thread |         1000 |       8102 |       8041 |      10849
      tokio current_thread |         1000 |        212 |        210 |        331
        tokio multi-thread |         1000 |        330 |        301 |        604
7
benches/baseline-output/primes.txt
Normal file
@@ -0,0 +1,7 @@
Counting primes in [2, 200000) across 16 workers, 5 iterations each

runtime | primes found | median | min | max
--------------------------------------------------------------------------------
baseline | primes: 17984 | median: 7244 µs | min: 7231 µs | max: 7509 µs
smarm | primes: 17984 | median: 7592 µs | min: 7505 µs | max: 8130 µs
tokio | primes: 17984 | median: 7263 µs | min: 7225 µs | max: 9067 µs
40
benches/baseline-output/smarm_favored.txt
Normal file
@@ -0,0 +1,40 @@
smarm smarm-favored benchmarks
available parallelism: 1 threads
ITERS=15 (+1 warmup, discarded)
RECURSE_DEPTH=500, HOT_YIELDS=500000×2, UNCONT_MSGS=1000000, PANIC_TASKS=10000

================================================================================
 deep_recursion: depth 500
================================================================================
                   runtime |       result |  median µs |     min µs |     max µs
--------------------------------------------------------------------------------
            smarm 1-thread |            1 |         62 |         59 |        682
            smarm 1-thread |            1 |         71 |         61 |        210
      tokio current_thread |            1 |         22 |         22 |         23
        tokio multi-thread |            1 |         44 |         38 |         79

================================================================================
 yield_in_hot_loop: 2 actors × 500000 yields (single thread)
================================================================================
                   runtime |       result |  median µs |     min µs |     max µs
--------------------------------------------------------------------------------
            smarm 1-thread |      1000000 |     182177 |     180380 |     184410
      tokio current_thread |      1000000 |     138335 |     136097 |     141196

================================================================================
 uncontended_channel: 1→1, 1000000 msgs (single thread)
================================================================================
                   runtime |       result |  median µs |     min µs |     max µs
--------------------------------------------------------------------------------
            smarm 1-thread |      1000000 |      31473 |      28719 |      33113
      tokio current_thread |      1000000 |      51925 |      51205 |      53043

================================================================================
 catch_unwind_panics: 10000 tasks, 50% panic
================================================================================
                   runtime |       result |  median µs |     min µs |     max µs
--------------------------------------------------------------------------------
            smarm 1-thread |        10000 |     112306 |     109702 |     119859
            smarm 1-thread |        10000 |     114305 |     112030 |     121326
      tokio current_thread |        10000 |     151443 |     150949 |     153800
        tokio multi-thread |        10000 |     161344 |     160385 |     167573
42
benches/baseline-output/tokio_favored.txt
Normal file
@@ -0,0 +1,42 @@
smarm tokio-favored benchmarks
available parallelism: 1 threads
ITERS=15 (+1 warmup, discarded)
STORM_BACKGROUND=8, STORM_SPAWN=10000, MPSC=32×10000, TIMER_ACTORS=10000 (1–10 ms), SCALING_N=400000/64

================================================================================
 spawn_storm_busy: 8 bg yielders + 10000 zero-work spawns
================================================================================
                   runtime |       result |  median µs |     min µs |     max µs
--------------------------------------------------------------------------------
            smarm 1-thread |        10000 |     105512 |     102322 |     120552
            smarm 1-thread |        10000 |     107113 |     104048 |     112377
      tokio current_thread |        10000 |       2222 |       2124 |       2506
        tokio multi-thread |        10000 |       4546 |       3833 |       7305

================================================================================
 mpsc_contention: 32 producers × 10000 msgs → 1 consumer
================================================================================
                   runtime |       result |  median µs |     min µs |     max µs
--------------------------------------------------------------------------------
            smarm 1-thread |       320000 |      10456 |      10331 |      10639
            smarm 1-thread |       320000 |      10395 |       9201 |      10549
      tokio current_thread |       320000 |      17348 |      16639 |      19061
        tokio multi-thread |       320000 |      18628 |      17499 |      19298

================================================================================
 many_timers: 10000 actors sleeping 1–10 ms
================================================================================
                   runtime |       result |  median µs |     min µs |     max µs
--------------------------------------------------------------------------------
            smarm 1-thread |        10000 |     120242 |     116239 |     127200
            smarm 1-thread |        10000 |     121023 |     113997 |     127826
      tokio current_thread |        10000 |      13581 |      13182 |      14415
        tokio multi-thread |        10000 |      14266 |      14084 |      14843

================================================================================
 multi_thread_scaling: primes in [2, 400000) across 64 workers
================================================================================
                   runtime |       result |  median µs |     min µs |     max µs
--------------------------------------------------------------------------------
            smarm 1-thread |        33860 |      19852 |      19601 |      22679
      tokio multi 1-thread |        33860 |      19638 |      18994 |      20102
391
benches/smarm_favored.rs
Normal file
@@ -0,0 +1,391 @@
//! Benchmarks where smarm's design has a structural advantage.
//!
//! These exist to show what the green-thread + stackful model buys you. The
//! single-thread numbers are the most interesting ones — they isolate the
//! per-switch / per-task cost from any contention story.
//!
//! Workloads:
//!    9. deep_recursion       — actor recurses 500 deep then returns. In
//!                              smarm this is plain stack recursion on the
//!                              growable mmap'd stack. In tokio, async fn
//!                              can't directly recurse — each level must
//!                              `Box::pin` its future. We measure both.
//!   10. yield_in_hot_loop    — 2 actors ping yield_now back and forth 500k
//!                              times. Pure context-switch cost; no
//!                              channels, no allocation, no contention.
//!                              Smarm's switch is ~6 GPRs + xmm save and a
//!                              `ret`; tokio's is poll → state-machine →
//!                              schedule.
//!   11. uncontended_channel  — single producer, single consumer, 1M msgs,
//!                              single-threaded runtime. With no
//!                              cross-thread contention, smarm's
//!                              Arc<Mutex<>> channel is essentially free,
//!                              and the green-thread switch should beat
//!                              tokio's future polling overhead.
//!   12. catch_unwind_panics  — spawn 10k tasks; half panic, half succeed.
//!                              Supervisor handles each. Exploratory — if
//!                              there's no real gap, drop this one.

use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::time::Instant;

// ---------------------------------------------------------------------------
// Shared harness
// ---------------------------------------------------------------------------

const ITERS: u32 = 15;

fn available_threads() -> usize {
    std::thread::available_parallelism().map(|n| n.get()).unwrap_or(1)
}

fn print_header(title: &str) {
    println!("\n{}", "=".repeat(80));
    println!(" {title}");
    println!("{}", "=".repeat(80));
    println!(
        "{:>26} | {:>12} | {:>10} | {:>10} | {:>10}",
        "runtime", "result", "median µs", "min µs", "max µs"
    );
    println!("{}", "-".repeat(80));
}

fn run_n<F: FnMut() -> (u64, u128)>(name: &str, n: u32, mut f: F) {
    let mut times = Vec::new();
    let mut last = 0u64;
    let _ = f(); // warmup
    for _ in 0..n {
        let (v, t) = f();
        times.push(t);
        last = v;
    }
    times.sort_unstable();
    let median = times[times.len() / 2];
    let min = *times.iter().min().unwrap();
    let max = *times.iter().max().unwrap();
    println!(
        "{:>26} | {:>12} | {:>10} | {:>10} | {:>10}",
        name, last, median, min, max
    );
}

// ---------------------------------------------------------------------------
// 9. deep_recursion — 500 levels deep
// ---------------------------------------------------------------------------

// Each recursive frame holds an `&AtomicU64`, a `u64`, plus prologue/spill —
// conservatively ~64 B/frame on release. Smarm actor stacks are a fixed 64 KiB,
// so 500 levels (~32 KiB) leaves comfortable headroom while still being deep
// enough to exercise the stack-growth advantage over Box::pin recursion.
const RECURSE_DEPTH: u64 = 500;

fn bench_recurse_smarm(threads: usize) -> (u64, u128) {
    let total = Arc::new(AtomicU64::new(0));
    let t2 = total.clone();
    let start = Instant::now();
    smarm::runtime::init(smarm::runtime::Config::exact(threads)).run(move || {
        // Plain Rust recursion on the actor's own (growable) stack.
        fn recurse(c: &AtomicU64, n: u64) -> u64 {
            if n == 0 {
                c.fetch_add(1, Ordering::Relaxed);
                0
            } else {
                1 + recurse(c, n - 1)
            }
        }
        let h = smarm::spawn(move || {
            let _ = recurse(&t2, RECURSE_DEPTH);
        });
        h.join().unwrap();
    });
    (total.load(Ordering::Relaxed), start.elapsed().as_micros())
}

fn bench_recurse_tokio_current() -> (u64, u128) {
    let counter = Arc::new(AtomicU64::new(0));
    let c2 = counter.clone();
    let rt = tokio::runtime::Builder::new_current_thread().build().unwrap();
    let start = Instant::now();
    let local = tokio::task::LocalSet::new();
    local.block_on(&rt, async move {
        // async fn can't self-recurse; each level returns a Box::pin'd future.
        // This is the canonical workaround a real user would write.
        fn recurse(
            c: Arc<AtomicU64>,
            n: u64,
        ) -> std::pin::Pin<Box<dyn std::future::Future<Output = u64>>> {
            Box::pin(async move {
                if n == 0 {
                    c.fetch_add(1, Ordering::Relaxed);
                    0
                } else {
                    1 + recurse(c, n - 1).await
                }
            })
        }
        let h = tokio::task::spawn_local(async move {
            let _ = recurse(c2, RECURSE_DEPTH).await;
        });
        let _ = h.await;
    });
    (counter.load(Ordering::Relaxed), start.elapsed().as_micros())
}

fn bench_recurse_tokio_multi() -> (u64, u128) {
    let counter = Arc::new(AtomicU64::new(0));
    let c2 = counter.clone();
    let rt = tokio::runtime::Builder::new_multi_thread()
        .worker_threads(available_threads())
        .build()
        .unwrap();
    let start = Instant::now();
    rt.block_on(async move {
        fn recurse(
            c: Arc<AtomicU64>,
            n: u64,
        ) -> std::pin::Pin<Box<dyn std::future::Future<Output = u64> + Send>> {
            Box::pin(async move {
                if n == 0 {
                    c.fetch_add(1, Ordering::Relaxed);
                    0
                } else {
                    1 + recurse(c, n - 1).await
                }
            })
        }
        let h = tokio::spawn(async move {
            let _ = recurse(c2, RECURSE_DEPTH).await;
        });
        let _ = h.await;
    });
    (counter.load(Ordering::Relaxed), start.elapsed().as_micros())
}

// ---------------------------------------------------------------------------
// 10. yield_in_hot_loop — 2 actors, 500k yields each, single thread
// ---------------------------------------------------------------------------

const HOT_YIELDS: u64 = 500_000;

fn bench_hot_smarm() -> (u64, u128) {
    let start = Instant::now();
    smarm::runtime::init(smarm::runtime::Config::exact(1)).run(|| {
        let ha = smarm::spawn(|| {
            for _ in 0..HOT_YIELDS {
                smarm::yield_now();
            }
        });
        let hb = smarm::spawn(|| {
            for _ in 0..HOT_YIELDS {
                smarm::yield_now();
            }
        });
        ha.join().unwrap();
        hb.join().unwrap();
    });
    (HOT_YIELDS * 2, start.elapsed().as_micros())
}

fn bench_hot_tokio_current() -> (u64, u128) {
    let rt = tokio::runtime::Builder::new_current_thread().build().unwrap();
    let start = Instant::now();
    let local = tokio::task::LocalSet::new();
    local.block_on(&rt, async move {
        let ha = tokio::task::spawn_local(async move {
            for _ in 0..HOT_YIELDS {
                tokio::task::yield_now().await;
            }
        });
        let hb = tokio::task::spawn_local(async move {
            for _ in 0..HOT_YIELDS {
                tokio::task::yield_now().await;
            }
        });
        let _ = ha.await;
        let _ = hb.await;
    });
    (HOT_YIELDS * 2, start.elapsed().as_micros())
}

// ---------------------------------------------------------------------------
// 11. uncontended_channel — 1 producer, 1 consumer, 1M msgs, single-threaded
// ---------------------------------------------------------------------------

const UNCONT_MSGS: u64 = 1_000_000;

fn bench_unc_smarm() -> (u64, u128) {
    let start = Instant::now();
    smarm::runtime::init(smarm::runtime::Config::exact(1)).run(|| {
        let (tx, rx) = smarm::channel::<u64>();
        let consumer = smarm::spawn(move || {
            let mut count = 0u64;
            while let Ok(_) = rx.recv() {
                count += 1;
            }
            let _ = count; // discard; run() closure must return ()
        });
        let producer = smarm::spawn(move || {
            for i in 0..UNCONT_MSGS {
                tx.send(i).unwrap();
            }
            // tx drops here, closing the channel.
        });
        producer.join().unwrap();
        let _ = consumer.join().unwrap();
    });
    (UNCONT_MSGS, start.elapsed().as_micros())
}

fn bench_unc_tokio_current() -> (u64, u128) {
    let rt = tokio::runtime::Builder::new_current_thread().build().unwrap();
    let start = Instant::now();
    let local = tokio::task::LocalSet::new();
    local.block_on(&rt, async move {
        let (tx, mut rx) = tokio::sync::mpsc::unbounded_channel::<u64>();
        let consumer = tokio::task::spawn_local(async move {
            let mut count = 0u64;
            while let Some(_) = rx.recv().await {
                count += 1;
            }
            count
        });
        let producer = tokio::task::spawn_local(async move {
            for i in 0..UNCONT_MSGS {
                tx.send(i).unwrap();
            }
        });
        let _ = producer.await;
        let _ = consumer.await;
    });
    (UNCONT_MSGS, start.elapsed().as_micros())
}

// ---------------------------------------------------------------------------
// 12. catch_unwind_panics — 10k tasks, half panic
// ---------------------------------------------------------------------------

const PANIC_TASKS: u64 = 10_000;

fn bench_panic_smarm(threads: usize) -> (u64, u128) {
    let ok = Arc::new(AtomicU64::new(0));
    let err = Arc::new(AtomicU64::new(0));
    let ok2 = ok.clone();
    let err2 = err.clone();
    let start = Instant::now();
    smarm::runtime::init(smarm::runtime::Config::exact(threads)).run(move || {
        let mut handles = Vec::new();
        for i in 0..PANIC_TASKS {
            handles.push(smarm::spawn(move || {
                if i % 2 == 0 {
                    panic!("planned");
                }
            }));
        }
        for h in handles {
            match h.join() {
                Ok(()) => { ok2.fetch_add(1, Ordering::Relaxed); }
                Err(_) => { err2.fetch_add(1, Ordering::Relaxed); }
            }
        }
    });
    let total = ok.load(Ordering::Relaxed) + err.load(Ordering::Relaxed);
    (total, start.elapsed().as_micros())
}

fn bench_panic_tokio_current() -> (u64, u128) {
    let ok = Arc::new(AtomicU64::new(0));
    let err = Arc::new(AtomicU64::new(0));
    let ok2 = ok.clone();
    let err2 = err.clone();
    let rt = tokio::runtime::Builder::new_current_thread().build().unwrap();
    let start = Instant::now();
    let local = tokio::task::LocalSet::new();
    local.block_on(&rt, async move {
        let mut handles = Vec::new();
        for i in 0..PANIC_TASKS {
            handles.push(tokio::task::spawn_local(async move {
                if i % 2 == 0 {
                    panic!("planned");
                }
            }));
        }
        for h in handles {
            match h.await {
                Ok(()) => { ok2.fetch_add(1, Ordering::Relaxed); }
                Err(_) => { err2.fetch_add(1, Ordering::Relaxed); }
            }
        }
    });
    let total = ok.load(Ordering::Relaxed) + err.load(Ordering::Relaxed);
    (total, start.elapsed().as_micros())
}

fn bench_panic_tokio_multi() -> (u64, u128) {
    let ok = Arc::new(AtomicU64::new(0));
    let err = Arc::new(AtomicU64::new(0));
    let ok2 = ok.clone();
    let err2 = err.clone();
    let rt = tokio::runtime::Builder::new_multi_thread()
        .worker_threads(available_threads())
        .build()
        .unwrap();
    let start = Instant::now();
    rt.block_on(async move {
        let mut handles = Vec::new();
        for i in 0..PANIC_TASKS {
            handles.push(tokio::spawn(async move {
                if i % 2 == 0 {
                    panic!("planned");
                }
            }));
        }
        for h in handles {
            match h.await {
                Ok(()) => { ok2.fetch_add(1, Ordering::Relaxed); }
                Err(_) => { err2.fetch_add(1, Ordering::Relaxed); }
            }
        }
    });
    let total = ok.load(Ordering::Relaxed) + err.load(Ordering::Relaxed);
    (total, start.elapsed().as_micros())
}

// ---------------------------------------------------------------------------
// main
// ---------------------------------------------------------------------------

fn main() {
    let n = available_threads();
    println!("smarm smarm-favored benchmarks");
    println!("available parallelism: {n} threads");
    println!("ITERS={ITERS} (+1 warmup, discarded)");
    println!(
        "RECURSE_DEPTH={RECURSE_DEPTH}, HOT_YIELDS={HOT_YIELDS}×2, \
         UNCONT_MSGS={UNCONT_MSGS}, PANIC_TASKS={PANIC_TASKS}"
    );

    // ---- 9. deep_recursion ----
    print_header(&format!("deep_recursion: depth {RECURSE_DEPTH}"));
    run_n("smarm 1-thread", ITERS, || bench_recurse_smarm(1));
    run_n(&format!("smarm {n}-thread"), ITERS, || bench_recurse_smarm(n));
    run_n("tokio current_thread", ITERS, bench_recurse_tokio_current);
    run_n("tokio multi-thread", ITERS, bench_recurse_tokio_multi);

    // ---- 10. yield_in_hot_loop ----
    print_header(&format!("yield_in_hot_loop: 2 actors × {HOT_YIELDS} yields (single thread)"));
    run_n("smarm 1-thread", ITERS, bench_hot_smarm);
    run_n("tokio current_thread", ITERS, bench_hot_tokio_current);

    // ---- 11. uncontended_channel ----
    print_header(&format!("uncontended_channel: 1→1, {UNCONT_MSGS} msgs (single thread)"));
    run_n("smarm 1-thread", ITERS, bench_unc_smarm);
    run_n("tokio current_thread", ITERS, bench_unc_tokio_current);

    // ---- 12. catch_unwind_panics ----
    print_header(&format!("catch_unwind_panics: {PANIC_TASKS} tasks, 50% panic"));
    run_n("smarm 1-thread", ITERS, || bench_panic_smarm(1));
    run_n(&format!("smarm {n}-thread"), ITERS, || bench_panic_smarm(n));
    run_n("tokio current_thread", ITERS, bench_panic_tokio_current);
    run_n("tokio multi-thread", ITERS, bench_panic_tokio_multi);
}
470
benches/tokio_favored.rs
Normal file
@@ -0,0 +1,470 @@
//! Benchmarks where tokio's design has a structural advantage.
//!
//! These exist to *measure* the cost of smarm's design choices, not to flatter
//! either runtime. Expect tokio to win these; the value is in knowing by how
//! much, and in catching regressions where the gap widens.
//!
//! Workloads:
//!   5. spawn_storm_busy     — keep N workers busy with yielding tasks, then
//!                             spawn 10k zero-work tasks and join. Adapted from
//!                             tokio's `spawn_many_remote_busy1`. Tokio's
//!                             work-stealing deques + per-worker LIFO slot
//!                             should beat smarm's single global Mutex<>
//!                             run queue.
//!   6. mpsc_contention      — 32 producer actors, 1 consumer, 10k messages
//!                             each. Tokio's mpsc is lock-free on the hot path;
//!                             smarm's channel is Arc<Mutex<Inner>> per channel
//!                             *and* takes the runtime mutex on each unpark.
//!   7. many_timers          — 10k actors each sleep for a random short
//!                             duration (1–10 ms), all wake within a tight
//!                             window. Tokio's per-worker sharded timer wheel
//!                             vs smarm's single shared min-heap (and single
//!                             drain-lock winner).
//!   8. multi_thread_scaling — primes again, but sweep thread count 1, 2, 4,
//!                             available_parallelism(). Smarm's mutex ceiling
//!                             should show up as soon as scheduling overhead
//!                             is non-trivial relative to per-actor work.

use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::Arc;
use std::time::{Duration, Instant};

// ---------------------------------------------------------------------------
// Shared harness
// ---------------------------------------------------------------------------

const ITERS: u32 = 15;

fn available_threads() -> usize {
    std::thread::available_parallelism().map(|n| n.get()).unwrap_or(1)
}

fn print_header(title: &str) {
    println!("\n{}", "=".repeat(80));
    println!(" {title}");
    println!("{}", "=".repeat(80));
    println!(
        "{:>26} | {:>12} | {:>10} | {:>10} | {:>10}",
        "runtime", "result", "median µs", "min µs", "max µs"
    );
    println!("{}", "-".repeat(80));
}

fn run_n<F: FnMut() -> (u64, u128)>(name: &str, n: u32, mut f: F) {
    let mut times = Vec::new();
    let mut last = 0u64;
    let _ = f(); // warmup
    for _ in 0..n {
        let (v, t) = f();
        times.push(t);
        last = v;
    }
    times.sort_unstable();
    let median = times[times.len() / 2];
    let min = *times.iter().min().unwrap();
    let max = *times.iter().max().unwrap();
    println!(
        "{:>26} | {:>12} | {:>10} | {:>10} | {:>10}",
        name, last, median, min, max
    );
}

// ---------------------------------------------------------------------------
// 5. spawn_storm_busy — workers loaded, then storm of zero-work spawns
// ---------------------------------------------------------------------------

const STORM_BACKGROUND: u64 = 8; // number of background "busy" actors
const STORM_SPAWN: u64 = 10_000; // zero-work spawns to time

fn bench_storm_smarm(threads: usize) -> (u64, u128) {
    let counter = Arc::new(AtomicU64::new(0));
    let stop = Arc::new(AtomicBool::new(false));
    let c2 = counter.clone();
    let s2 = stop.clone();

    let start = Instant::now();
    smarm::runtime::init(smarm::runtime::Config::exact(threads)).run(move || {
        // Background actors: yield in a tight loop until told to stop.
        let mut bg_handles = Vec::new();
        for _ in 0..STORM_BACKGROUND {
            let s = s2.clone();
            bg_handles.push(smarm::spawn(move || {
                while !s.load(Ordering::Relaxed) {
                    smarm::yield_now();
                }
            }));
        }

        // Storm: spawn 10k zero-work actors and join them all.
        let mut handles = Vec::new();
        for _ in 0..STORM_SPAWN {
            let cc = c2.clone();
            handles.push(smarm::spawn(move || {
                cc.fetch_add(1, Ordering::Relaxed);
            }));
        }
        for h in handles { h.join().unwrap(); }

        // Tear down background.
        s2.store(true, Ordering::Relaxed);
        for h in bg_handles { h.join().unwrap(); }
    });
    (counter.load(Ordering::Relaxed), start.elapsed().as_micros())
}

fn bench_storm_tokio_current() -> (u64, u128) {
    let counter = Arc::new(AtomicU64::new(0));
    let stop = Arc::new(AtomicBool::new(false));
    let c2 = counter.clone();
    let s2 = stop.clone();

    let rt = tokio::runtime::Builder::new_current_thread().build().unwrap();
    let start = Instant::now();
    let local = tokio::task::LocalSet::new();
    local.block_on(&rt, async move {
        let mut bg_handles = Vec::new();
        for _ in 0..STORM_BACKGROUND {
            let s = s2.clone();
            bg_handles.push(tokio::task::spawn_local(async move {
                while !s.load(Ordering::Relaxed) {
                    tokio::task::yield_now().await;
                }
            }));
        }
        let mut handles = Vec::new();
        for _ in 0..STORM_SPAWN {
            let cc = c2.clone();
            handles.push(tokio::task::spawn_local(async move {
                cc.fetch_add(1, Ordering::Relaxed);
            }));
        }
        for h in handles { let _ = h.await; }
        s2.store(true, Ordering::Relaxed);
        for h in bg_handles { let _ = h.await; }
    });
    (counter.load(Ordering::Relaxed), start.elapsed().as_micros())
}

fn bench_storm_tokio_multi() -> (u64, u128) {
    let counter = Arc::new(AtomicU64::new(0));
    let stop = Arc::new(AtomicBool::new(false));
    let c2 = counter.clone();
    let s2 = stop.clone();

    let rt = tokio::runtime::Builder::new_multi_thread()
        .worker_threads(available_threads())
        .build()
        .unwrap();
    let start = Instant::now();
    rt.block_on(async move {
        let mut bg_handles = Vec::new();
        for _ in 0..STORM_BACKGROUND {
            let s = s2.clone();
            bg_handles.push(tokio::spawn(async move {
                while !s.load(Ordering::Relaxed) {
                    tokio::task::yield_now().await;
                }
            }));
        }
        let mut handles = Vec::new();
        for _ in 0..STORM_SPAWN {
            let cc = c2.clone();
            handles.push(tokio::spawn(async move {
                cc.fetch_add(1, Ordering::Relaxed);
            }));
        }
        for h in handles { let _ = h.await; }
        s2.store(true, Ordering::Relaxed);
        for h in bg_handles { let _ = h.await; }
    });
    (counter.load(Ordering::Relaxed), start.elapsed().as_micros())
}
|
||||
|
||||
// ---------------------------------------------------------------------------
// 6. mpsc_contention — 32 producers × 10k msgs into 1 consumer
// ---------------------------------------------------------------------------

const MPSC_PRODUCERS: u64 = 32;
const MPSC_PER_PRODUCER: u64 = 10_000;

fn bench_mpsc_smarm(threads: usize) -> (u64, u128) {
    let start = Instant::now();
    smarm::runtime::init(smarm::runtime::Config::exact(threads)).run(|| {
        let (tx, rx) = smarm::channel::<u64>();
        let mut prod_handles = Vec::new();
        for p in 0..MPSC_PRODUCERS {
            let tx = tx.clone();
            prod_handles.push(smarm::spawn(move || {
                for i in 0..MPSC_PER_PRODUCER {
                    tx.send(p * MPSC_PER_PRODUCER + i).unwrap();
                }
            }));
        }
        drop(tx); // close once producers drop
        let consumer = smarm::spawn(move || {
            let mut count = 0u64;
            while let Ok(_) = rx.recv() {
                count += 1;
            }
            let _ = count; // discard; run() closure must return ()
        });
        for h in prod_handles { h.join().unwrap(); }
        let _ = consumer.join().unwrap();
    });
    (MPSC_PRODUCERS * MPSC_PER_PRODUCER, start.elapsed().as_micros())
}

fn bench_mpsc_tokio_current() -> (u64, u128) {
    let rt = tokio::runtime::Builder::new_current_thread().build().unwrap();
    let start = Instant::now();
    let local = tokio::task::LocalSet::new();
    local.block_on(&rt, async move {
        let (tx, mut rx) = tokio::sync::mpsc::unbounded_channel::<u64>();
        let mut prod_handles = Vec::new();
        for p in 0..MPSC_PRODUCERS {
            let tx = tx.clone();
            prod_handles.push(tokio::task::spawn_local(async move {
                for i in 0..MPSC_PER_PRODUCER {
                    tx.send(p * MPSC_PER_PRODUCER + i).unwrap();
                }
            }));
        }
        drop(tx);
        let consumer = tokio::task::spawn_local(async move {
            let mut count = 0u64;
            while let Some(_) = rx.recv().await {
                count += 1;
            }
            count
        });
        for h in prod_handles { let _ = h.await; }
        let _ = consumer.await;
    });
    (MPSC_PRODUCERS * MPSC_PER_PRODUCER, start.elapsed().as_micros())
}

fn bench_mpsc_tokio_multi() -> (u64, u128) {
    let rt = tokio::runtime::Builder::new_multi_thread()
        .worker_threads(available_threads())
        .build()
        .unwrap();
    let start = Instant::now();
    rt.block_on(async move {
        let (tx, mut rx) = tokio::sync::mpsc::unbounded_channel::<u64>();
        let mut prod_handles = Vec::new();
        for p in 0..MPSC_PRODUCERS {
            let tx = tx.clone();
            prod_handles.push(tokio::spawn(async move {
                for i in 0..MPSC_PER_PRODUCER {
                    tx.send(p * MPSC_PER_PRODUCER + i).unwrap();
                }
            }));
        }
        drop(tx);
        let consumer = tokio::spawn(async move {
            let mut count = 0u64;
            while let Some(_) = rx.recv().await {
                count += 1;
            }
            count
        });
        for h in prod_handles { let _ = h.await; }
        let _ = consumer.await;
    });
    (MPSC_PRODUCERS * MPSC_PER_PRODUCER, start.elapsed().as_micros())
}

// ---------------------------------------------------------------------------
// 7. many_timers — 10k sleeping actors waking in a tight window
// ---------------------------------------------------------------------------

const TIMER_ACTORS: u64 = 10_000;
const TIMER_MIN_MS: u64 = 1;
const TIMER_MAX_MS: u64 = 10;

// Deterministic per-actor delay so iterations are comparable.
// Multiplicative mix of the actor index, folded into [TIMER_MIN_MS, TIMER_MAX_MS].
fn timer_delay_ms(i: u64) -> u64 {
    TIMER_MIN_MS + ((i * 2654435761u64) >> 32) % (TIMER_MAX_MS - TIMER_MIN_MS + 1)
}

fn bench_timers_smarm(threads: usize) -> (u64, u128) {
    let start = Instant::now();
    smarm::runtime::init(smarm::runtime::Config::exact(threads)).run(|| {
        let mut handles = Vec::new();
        for i in 0..TIMER_ACTORS {
            let ms = timer_delay_ms(i);
            handles.push(smarm::spawn(move || {
                smarm::sleep(Duration::from_millis(ms));
            }));
        }
        for h in handles { h.join().unwrap(); }
    });
    (TIMER_ACTORS, start.elapsed().as_micros())
}

fn bench_timers_tokio_current() -> (u64, u128) {
    let rt = tokio::runtime::Builder::new_current_thread()
        .enable_time()
        .build()
        .unwrap();
    let start = Instant::now();
    let local = tokio::task::LocalSet::new();
    local.block_on(&rt, async move {
        let mut handles = Vec::new();
        for i in 0..TIMER_ACTORS {
            let ms = timer_delay_ms(i);
            handles.push(tokio::task::spawn_local(async move {
                tokio::time::sleep(Duration::from_millis(ms)).await;
            }));
        }
        for h in handles { let _ = h.await; }
    });
    (TIMER_ACTORS, start.elapsed().as_micros())
}

fn bench_timers_tokio_multi() -> (u64, u128) {
    let rt = tokio::runtime::Builder::new_multi_thread()
        .worker_threads(available_threads())
        .enable_time()
        .build()
        .unwrap();
    let start = Instant::now();
    rt.block_on(async move {
        let mut handles = Vec::new();
        for i in 0..TIMER_ACTORS {
            let ms = timer_delay_ms(i);
            handles.push(tokio::spawn(async move {
                tokio::time::sleep(Duration::from_millis(ms)).await;
            }));
        }
        for h in handles { let _ = h.await; }
    });
    (TIMER_ACTORS, start.elapsed().as_micros())
}

// ---------------------------------------------------------------------------
// 8. multi_thread_scaling — primes, sweep thread count
// ---------------------------------------------------------------------------

const SCALING_N: u64 = 400_000;
const SCALING_WORKERS: u64 = 64;

fn is_prime(n: u64) -> bool {
    if n < 2 { return false; }
    if n < 4 { return true; }
    if n % 2 == 0 { return false; }
    let mut i = 3u64;
    while i * i <= n { if n % i == 0 { return false; } i += 2; }
    true
}

fn count_primes(lo: u64, hi: u64) -> u64 {
    (lo..hi).filter(|&n| is_prime(n)).count() as u64
}

fn scaling_slice(w: u64) -> (u64, u64) {
    let per = SCALING_N / SCALING_WORKERS;
    let lo = w * per;
    let hi = if w + 1 == SCALING_WORKERS { SCALING_N } else { lo + per };
    (lo, hi)
}

fn bench_scaling_smarm(threads: usize) -> (u64, u128) {
    let total = Arc::new(AtomicU64::new(0));
    let t2 = total.clone();
    let start = Instant::now();
    smarm::runtime::init(smarm::runtime::Config::exact(threads)).run(move || {
        let mut handles = Vec::new();
        for w in 0..SCALING_WORKERS {
            let (lo, hi) = scaling_slice(w);
            let tc = t2.clone();
            handles.push(smarm::spawn(move || {
                tc.fetch_add(count_primes(lo, hi), Ordering::Relaxed);
            }));
        }
        for h in handles { h.join().unwrap(); }
    });
    (total.load(Ordering::Relaxed), start.elapsed().as_micros())
}

fn bench_scaling_tokio_multi(threads: usize) -> (u64, u128) {
    let total = Arc::new(AtomicU64::new(0));
    let t2 = total.clone();
    let rt = tokio::runtime::Builder::new_multi_thread()
        .worker_threads(threads)
        .build()
        .unwrap();
    let start = Instant::now();
    rt.block_on(async move {
        let mut handles = Vec::new();
        for w in 0..SCALING_WORKERS {
            let (lo, hi) = scaling_slice(w);
            let tc = t2.clone();
            handles.push(tokio::spawn(async move {
                tc.fetch_add(count_primes(lo, hi), Ordering::Relaxed);
            }));
        }
        for h in handles { let _ = h.await; }
    });
    (total.load(Ordering::Relaxed), start.elapsed().as_micros())
}

// ---------------------------------------------------------------------------
// main
// ---------------------------------------------------------------------------

fn main() {
    let n = available_threads();
    println!("smarm tokio-favored benchmarks");
    println!("available parallelism: {n} threads");
    println!("ITERS={ITERS} (+1 warmup, discarded)");
    println!(
        "STORM_BACKGROUND={STORM_BACKGROUND}, STORM_SPAWN={STORM_SPAWN}, \
         MPSC={MPSC_PRODUCERS}×{MPSC_PER_PRODUCER}, \
         TIMER_ACTORS={TIMER_ACTORS} ({TIMER_MIN_MS}–{TIMER_MAX_MS} ms), \
         SCALING_N={SCALING_N}/{SCALING_WORKERS}"
    );

    // ---- 5. spawn_storm_busy ----
    print_header(&format!(
        "spawn_storm_busy: {STORM_BACKGROUND} bg yielders + {STORM_SPAWN} zero-work spawns"
    ));
    run_n("smarm 1-thread", ITERS, || bench_storm_smarm(1));
    run_n(&format!("smarm {n}-thread"), ITERS, || bench_storm_smarm(n));
    run_n("tokio current_thread", ITERS, bench_storm_tokio_current);
    run_n("tokio multi-thread", ITERS, bench_storm_tokio_multi);

    // ---- 6. mpsc_contention ----
    print_header(&format!(
        "mpsc_contention: {MPSC_PRODUCERS} producers × {MPSC_PER_PRODUCER} msgs → 1 consumer"
    ));
    run_n("smarm 1-thread", ITERS, || bench_mpsc_smarm(1));
    run_n(&format!("smarm {n}-thread"), ITERS, || bench_mpsc_smarm(n));
    run_n("tokio current_thread", ITERS, bench_mpsc_tokio_current);
    run_n("tokio multi-thread", ITERS, bench_mpsc_tokio_multi);

    // ---- 7. many_timers ----
    print_header(&format!(
        "many_timers: {TIMER_ACTORS} actors sleeping {TIMER_MIN_MS}–{TIMER_MAX_MS} ms"
    ));
    run_n("smarm 1-thread", ITERS, || bench_timers_smarm(1));
    run_n(&format!("smarm {n}-thread"), ITERS, || bench_timers_smarm(n));
    run_n("tokio current_thread", ITERS, bench_timers_tokio_current);
    run_n("tokio multi-thread", ITERS, bench_timers_tokio_multi);

    // ---- 8. multi_thread_scaling ----
    print_header(&format!(
        "multi_thread_scaling: primes in [2, {SCALING_N}) across {SCALING_WORKERS} workers"
    ));
    let sweep: Vec<usize> = {
        let mut v = vec![1usize, 2, 4];
        // Always include the machine's own thread count, even below 4 (e.g. 3).
        if !v.contains(&n) { v.push(n); }
        v.into_iter().filter(|t| *t <= n).collect()
    };
    for t in &sweep {
        run_n(&format!("smarm {t}-thread"), ITERS, || bench_scaling_smarm(*t));
    }
    for t in &sweep {
        run_n(&format!("tokio multi {t}-thread"), ITERS, || bench_scaling_tokio_multi(*t));
    }
}
177
benchmarks.md
Normal file
@@ -0,0 +1,177 @@
# Benchmarks

Regression-test and tuning reference for smarm vs tokio.

## Running

```sh
cargo bench --bench primes           # original compute bench
cargo bench --bench multi_scheduler  # original 3-workload bench
cargo bench --bench general          # benches 1–4
cargo bench --bench tokio_favored    # benches 5–8
cargo bench --bench smarm_favored    # benches 9–12
```

Each bench runs one warmup iteration (discarded) and 15 measured iterations.
Results are reported as median / min / max in microseconds. Median is the
headline number; the spread between min and max indicates measurement
stability.
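
The measurement loop described above can be sketched roughly as follows. This is an illustration under assumptions, not the harness's actual code: `Stats`, `summarize` and `measure` are hypothetical names, and the workload is a stand-in closure.

```rust
use std::time::Instant;

/// Summary of one bench row (hypothetical type, for illustration only).
#[derive(Debug, PartialEq)]
struct Stats {
    median: u128,
    min: u128,
    max: u128,
}

/// Reduce per-iteration timings (in µs) to median / min / max.
/// For an even number of samples this picks the upper median.
fn summarize(mut samples: Vec<u128>) -> Stats {
    assert!(!samples.is_empty(), "need at least one sample");
    samples.sort_unstable();
    Stats {
        median: samples[samples.len() / 2],
        min: samples[0],
        max: samples[samples.len() - 1],
    }
}

/// Run `f` once as a discarded warmup, then `iters` measured times.
fn measure<F: FnMut()>(iters: usize, mut f: F) -> Stats {
    f(); // warmup, not recorded
    let samples: Vec<u128> = (0..iters)
        .map(|_| {
            let start = Instant::now();
            f();
            start.elapsed().as_micros()
        })
        .collect();
    summarize(samples)
}

fn main() {
    let stats = measure(15, || {
        // Stand-in workload.
        let v: u64 = (0..100_000u64).sum();
        std::hint::black_box(v);
    });
    println!("median={} min={} max={}", stats.median, stats.min, stats.max);
}
```

With 15 samples the middle element of the sorted list is the exact median, which is one reason to prefer an odd iteration count.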

## Methodology notes

- The harness times wall-clock elapsed for the full workload, including
  runtime startup and shutdown. For multi-thread runtimes this means worker
  thread spawn cost is included; on short-lived benches this can dominate.
  Where startup matters, the bench is structured so the workload is much
  longer than typical startup.
- `tokio` uses `new_current_thread` + `LocalSet` for the single-threaded
  comparison and `new_multi_thread().worker_threads(N)` for parallel.
  `smarm::runtime::Config::exact(N)` is the equivalent knob.
- mpsc choice: tokio's `unbounded_channel` to match smarm's unbounded channel
  semantics. Bounded comparisons would need a separate suite.
- Per-actor delays in `many_timers` come from a deterministic mixing function
  of the actor index, not a random source, so iterations are reproducible.
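
To make the last point concrete, the snippet below reproduces `timer_delay_ms` and its constants from `tokio_favored.rs` and prints how the 10k actor indices spread over the 1–10 ms window. The histogram loop is an illustrative addition and is not part of the bench.

```rust
const TIMER_MIN_MS: u64 = 1;
const TIMER_MAX_MS: u64 = 10;

/// Deterministic per-actor delay: multiply the index by a fixed constant,
/// keep the upper bits of the product, then fold the result into
/// [TIMER_MIN_MS, TIMER_MAX_MS].
fn timer_delay_ms(i: u64) -> u64 {
    TIMER_MIN_MS + ((i * 2654435761u64) >> 32) % (TIMER_MAX_MS - TIMER_MIN_MS + 1)
}

fn main() {
    // Histogram of delays over the 10k actor indices used by the bench.
    let mut buckets = [0u32; (TIMER_MAX_MS + 1) as usize];
    for i in 0..10_000u64 {
        buckets[timer_delay_ms(i) as usize] += 1;
    }
    for ms in TIMER_MIN_MS..=TIMER_MAX_MS {
        println!("{ms:>2} ms: {} actors", buckets[ms as usize]);
    }
}
```

Because the delay depends only on the index, every iteration and every runtime sees the same wake schedule.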

## Bench catalog

### General — neither runtime structurally favored

| # | Bench               | Stresses                                      | Prediction           |
|---|---------------------|-----------------------------------------------|----------------------|
| 1 | `chained_spawn`     | Spawn + exit overhead in a serial chain       | Roughly even         |
| 2 | `yield_many`        | Pure scheduling throughput, explicit yields   | Roughly even         |
| 3 | `fan_out_compute`   | CPU-bound parallel work, minimal coordination | Even (compute-bound) |
| 4 | `ping_pong_oneshot` | Spawn + oneshot round-trip latency            | Roughly even         |

A regression here means a real change in per-task or per-yield cost — those
should be investigated regardless of which runtime got slower.

### Tokio-favored — measures cost of smarm's design choices

| # | Bench                  | Stresses                                       | Why tokio should win |
|---|------------------------|------------------------------------------------|----------------------|
| 5 | `spawn_storm_busy`     | 8 background yielders + 10k zero-work spawns   | Tokio's per-worker deque + LIFO slot vs smarm's global `Mutex<SharedState>` queue |
| 6 | `mpsc_contention`      | 32 producers × 10k msgs → 1 consumer           | Tokio's mpsc is lock-free on the hot path; smarm channel is `Arc<Mutex<Inner>>` + runtime mutex on each unpark |
| 7 | `many_timers`          | 10k actors sleeping 1–10 ms, dense wake window | Tokio's per-worker sharded timer wheel vs smarm's single shared min-heap |
| 8 | `multi_thread_scaling` | Primes, sweep thread count 1, 2, 4, available  | Tokio scales near-linearly; smarm hits its mutex ceiling |

A regression here means a smarm design choice got more expensive. Widening
gaps signal something to investigate; narrowing gaps after a tuning change is
the desired direction.

### Smarm-favored — measures payoff of green-thread + stackful design

| #  | Bench                 | Stresses                                  | Why smarm should win |
|----|-----------------------|-------------------------------------------|----------------------|
| 9  | `deep_recursion`      | Actor recurses 1000 deep, returns         | Native stack growth vs tokio's per-level `Box::pin` |
| 10 | `yield_in_hot_loop`   | 2 actors, 500k yields each, single thread | Naked context switch (~6 GPRs + xmm save + ret) vs poll → state machine → schedule |
| 11 | `uncontended_channel` | 1→1, 1M msgs, single thread               | Mutex is essentially free uncontended; green-thread switch is cheaper than poll |
| 12 | `catch_unwind_panics` | 10k spawns, 50% panic                     | Smarm has `catch_unwind` at the actor entry; both runtimes do this but the boundaries differ — exploratory |

A regression here means we lost some of smarm's structural advantage. #12 is
exploratory — if the baseline shows no real gap, drop it.

## Baseline (v0.3.0, Intel Xeon @ 2.80GHz, 1 core, kernel 6.18.5, rustc 1.95.0, RUSTFLAGS: none)

> The sandbox has only 1 logical CPU, so all multi-thread rows (smarm Nt,
> tokio mt) run with a single worker thread and the scaling sweep is limited
> to 1 thread. The bench output shows "smarm 1-thread" twice because
> `available_parallelism()` == 1 makes the N-thread variant identical to the
> 1-thread one.

All values are median µs.

| Bench                | smarm 1t | smarm Nt | tokio ct | tokio mt | Notes |
|----------------------|----------|----------|----------|----------|-------|
| chained_spawn        | 7136     | 6979     | 113      | 176      | smarm ~63x slower; spawn+stack alloc dominates on 1 CPU |
| yield_many           | 40079    | 40073    | 14571    | 14044    | smarm ~2.8x slower; scheduling overhead real |
| fan_out_compute      | 19347    | 19461    | 18616    | 18905    | roughly even; compute-bound as expected |
| ping_pong_oneshot    | 13731    | 14176    | 828      | 3342     | smarm ~17x slower; per-round spawn+join cost high |
| spawn_storm_busy     | 105512   | 107113   | 2222     | 4546     | smarm ~47x slower; global mutex under 8 bg yielders |
| mpsc_contention      | 10456    | 10395    | 17348    | 18628    | smarm wins (~1.7x), against the catalog prediction; uncontended mutex essentially free on 1-thread |
| many_timers          | 120242   | 121023   | 13581    | 14266    | smarm ~9x slower; single min-heap vs sharded wheel |
| multi_thread_scaling | 19852    | 19852    | —        | 19638    | 1-thread only; see thread-count sweep below |
| deep_recursion       | 62       | 71       | 22       | 44       | tokio wins unexpectedly (~2.8x); see sanity-check notes |
| yield_in_hot_loop    | 182177   | —        | 138335   | —        | tokio wins (~1.3x); prediction wrong; see sanity-check notes |
| uncontended_channel  | 31473    | —        | 51925    | —        | smarm wins as predicted; ~1.65x |
| catch_unwind_panics  | 112306   | 114305   | 151443   | 161344   | smarm wins as predicted; ~1.35x |
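
The ratios quoted in the Notes column can be re-derived from the medians. The helper below is a hypothetical convenience (`slowdown` is not part of the bench suite) that compares the smarm 1t and tokio ct columns for a few rows copied from the table above.

```rust
/// Ratio of two medians, rounded to two decimals.
/// A result above 1.0 means `slower_us` took longer than `faster_us`.
fn slowdown(slower_us: u64, faster_us: u64) -> f64 {
    let r = slower_us as f64 / faster_us as f64;
    (r * 100.0).round() / 100.0
}

fn main() {
    // (bench, smarm 1t median µs, tokio ct median µs)
    let rows = [
        ("chained_spawn", 7136u64, 113u64),
        ("yield_many", 40079, 14571),
        ("spawn_storm_busy", 105512, 2222),
        ("uncontended_channel", 31473, 51925),
    ];
    for (name, smarm, tokio) in rows {
        if smarm >= tokio {
            println!("{name}: smarm {:.2}x slower", slowdown(smarm, tokio));
        } else {
            println!("{name}: smarm {:.2}x faster", slowdown(tokio, smarm));
        }
    }
}
```

Keeping the ratio computation mechanical avoids rounding drift when the table is refreshed after a tuning run.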

### `multi_thread_scaling` thread-count sweep (median µs)

> Sandbox has 1 logical CPU; only 1-thread row is available.

| Threads     | smarm | tokio mt |
|-------------|-------|----------|
| 1           | 19852 | 19638    |
| 2           | —     | —        |
| 4           | —     | —        |
| N (avail=1) | 19852 | 19638    |

## Tuning experiments

### Reduction-budget sweep

`smarm` uses an allocator-driven preemption mechanism: every Nth allocation,
the actor checks RDTSC against its timeslice start and yields if over budget.
The Nth-allocation threshold (the "reduction budget") and the timeslice
duration are the two knobs.
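
The interaction between the two knobs can be sketched as below. This is a simplified model under stated assumptions, not smarm's implementation: `Preempt` and `on_alloc` are hypothetical names, and `std::time::Instant` stands in for the RDTSC read so the example runs on any target.

```rust
use std::time::{Duration, Instant};

/// Hypothetical per-actor preemption state.
struct Preempt {
    budget: u32,         // check the clock every `budget` allocations
    since_check: u32,    // allocations seen since the last clock check
    timeslice: Duration, // how long an actor may run before yielding
    slice_start: Instant,
}

impl Preempt {
    fn new(budget: u32, timeslice: Duration) -> Self {
        Preempt { budget, since_check: 0, timeslice, slice_start: Instant::now() }
    }

    /// Called on every allocation. Returns true when the actor should yield.
    fn on_alloc(&mut self) -> bool {
        self.since_check += 1;
        if self.since_check < self.budget {
            return false; // cheap path: counter bump only
        }
        self.since_check = 0;
        if self.slice_start.elapsed() >= self.timeslice {
            self.slice_start = Instant::now(); // next timeslice starts now
            return true;
        }
        false
    }
}

fn main() {
    // A zero timeslice makes every clock check trip, so the yield cadence
    // equals the budget: allocations 4 and 8 ask for a yield.
    let mut p = Preempt::new(4, Duration::ZERO);
    let yields: Vec<bool> = (0..8).map(|_| p.on_alloc()).collect();
    println!("{yields:?}");
}
```

In this model a smaller budget means more frequent clock reads (higher overhead, tighter preemption latency), while a longer timeslice means fewer forced yields; the sweep is about finding where those two costs balance.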

Record each experiment as a row below. Reference the commit or the parameter
values explicitly.

| Date | Configuration         | Bench (or "all") | Result vs baseline | Notes |
|------|-----------------------|------------------|--------------------|-------|
|      | baseline              | all              | —                  |       |
|      | budget=…, timeslice=… |                  |                    |       |
|      |                       |                  |                    |       |

When the gap on tokio-favored benches narrows without regressing
smarm-favored benches, the change is a keeper. If a budget change improves
one workload but regresses another by more, prefer keeping the broader-impact
configuration unless we have a clear use case for the trade-off.

## Sanity-check notes (baseline run)

### Compile fixes applied

Two bench files had a type error: `smarm::Runtime::run()` takes
`impl FnOnce() + Send + 'static` (returns `()`), but the consumer closures
in `bench_mpsc_smarm` (tokio_favored.rs) and `bench_unc_smarm`
(smarm_favored.rs) returned `u64` via a bare `count` tail expression. Fixed
by changing the tail to `let _ = count;` in both closures, and the
corresponding `consumer.join().unwrap()` calls to `let _ = consumer.join()...`.
No workload semantics changed.
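
The shape of the error and of the fix can be reproduced without smarm. In the sketch below, `run_unit` is a hypothetical stand-in that carries the same `FnOnce() + Send + 'static` bound quoted above, and `drain` is an illustrative counting loop; neither is taken from the bench sources.

```rust
use std::thread;

/// Stand-in for an entry point that only accepts closures returning `()`.
/// (Hypothetical; it mirrors the quoted bound, not smarm's source.)
fn run_unit<F: FnOnce() + Send + 'static>(f: F) {
    thread::spawn(f).join().unwrap();
}

/// Counts the items it is given, like the consumer loops in the benches.
fn drain(values: Vec<u64>) -> u64 {
    let mut count = 0u64;
    for _ in values {
        count += 1;
    }
    count
}

fn main() {
    // Does not compile: the tail expression makes the closure return u64.
    // run_unit(|| drain(vec![1, 2, 3]));

    // Compiles: binding the value to `_` discards it, so the closure returns ().
    run_unit(|| {
        let _ = drain(vec![1, 2, 3]);
    });
    println!("ok");
}
```

The discarded value is never observed by the timing code, which is why the fix leaves the measured workload unchanged.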

### Single-CPU sandbox caveat

`available_parallelism()` returns 1, so every "N-thread" variant is identical
to "1-thread". Multi-thread results should not be used to draw scaling
conclusions; re-run on a multi-core machine before committing to the tuning
sweep.
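
A quick way to check a machine before re-running is to print the reported parallelism and the thread counts a sweep would cover. The `sweep` helper here is a sketch of the idea (1, 2, 4 and the machine's own count, capped at what is available); the bench's own sweep code may differ in detail.

```rust
use std::thread;

/// Thread counts to sweep: 1, 2, 4 and `n` itself, keeping only values <= n.
fn sweep(n: usize) -> Vec<usize> {
    let mut v = vec![1usize, 2, 4];
    if !v.contains(&n) {
        v.push(n);
    }
    v.retain(|t| *t <= n);
    v.sort_unstable();
    v
}

fn main() {
    // Falls back to 1 if the platform cannot report a count.
    let n = thread::available_parallelism().map(|p| p.get()).unwrap_or(1);
    println!("available parallelism: {n}");
    println!("sweep: {:?}", sweep(n));
    if n == 1 {
        println!("warning: single CPU, N-thread rows duplicate the 1-thread rows");
    }
}
```

On the sandbox used for this baseline the sweep collapses to a single entry, which matches the one populated row in the sweep table.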

### Predicted-winner mismatches

**`deep_recursion` — tokio wins (22 µs) over smarm (62 µs).**
At depth 500, smarm spawns a fresh actor, which requires mmap'ing a 64 KiB
stack; that allocation cost dominates the actual recursion. Tokio's
`Box::pin` recursion allocates 500 small heap objects but avoids the mmap.
The prediction assumed stack allocation was amortised across many uses; here
the actor is single-use. Not a bug, but the bench may not exercise the
intended advantage.

`mpsc_contention` also went against its catalog prediction (smarm 10.5 ms vs
tokio 17.3 ms): with one CPU the channel mutex is never contended, so the
predicted locking cost does not show up. Re-check on a multi-core machine.

**`yield_in_hot_loop` — tokio wins (138 ms) over smarm (182 ms).**
The prediction was that smarm's ~6-GPR naked context switch would beat
tokio's poll/state-machine cycle. In practice, on a single-thread sandbox,
tokio's `current_thread` scheduler has very low overhead per `yield_now`,
while smarm's `yield_now` still goes through the runtime mutex and run queue
even on a single thread. This is a meaningful data point: smarm's scheduling
overhead is not as low as the assembly switch cost alone suggests.
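
Dividing the medians by the yield count puts the gap in per-operation terms. The sketch assumes the workload size from the bench catalog (2 actors × 500k yields each); `ns_per_op` is a hypothetical helper, and because the timings include runtime startup and shutdown the figures are upper bounds on the per-yield cost.

```rust
/// Average cost per operation in nanoseconds, given a total in microseconds.
fn ns_per_op(total_us: u64, ops: u64) -> f64 {
    total_us as f64 * 1_000.0 / ops as f64
}

fn main() {
    // 2 actors × 500k yields each, per the bench catalog.
    let yields = 2 * 500_000u64;
    let smarm = ns_per_op(182_177, yields);
    let tokio = ns_per_op(138_335, yields);
    println!("smarm: {smarm:.1} ns/yield");
    println!("tokio: {tokio:.1} ns/yield");
    println!("gap:   {:.1} ns/yield", smarm - tokio);
}
```

A gap of a few tens of nanoseconds per yield is in the range where a lock acquire and a queue operation could plausibly account for it, which is consistent with the explanation above but does not prove it.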

### Noise / spread

- `catch_unwind_panics` smarm spread is reasonable (~10% min/max).
- `spawn_storm_busy` tokio multi-thread has notable spread (3833–7305 µs);
  consistent with tokio issue #3829 noted in task spec.
- `many_timers` smarm spread acceptable (~10%).

### Result-column equivalence

All result columns match between runtimes for every bench (same prime counts,
same message totals, same task counts). Workloads are equivalent.