Update the documentation

Make preemption knobs configurable; fix unused-variable warnings
Add `Config::alloc_interval()` and `Config::timeslice_cycles()` so callers can tune preemption sensitivity at runtime. The values flow through `RuntimeInner` and are written into per-scheduler-thread locals via a new `configure_preempt()` call at thread startup, keeping the hot path free of cross-thread coherency traffic. Fix unused-variable warnings in channel.rs by inlining `current_pid()` directly into `te!` macro arguments — since the no-op macro arm never evaluates its argument, no binding is needed at the call site. Clean up a handful of dead imports exposed by the refactor.
2026-05-25 22:14:07 +02:00 · 2026-05-25 21:52:16 +02:00
12 changed files with 1423 additions and 63 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -1,2 +1,3 @@
 target
 Cargo.lock
 smarm_trace.json
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -12,7 +12,7 @@ libc = "0.2"
 [dev-dependencies]
 libc = "0.2"
-tokio = { version = "1", features = ["rt", "rt-multi-thread", "macros", "sync"] }
+tokio = { version = "1", features = ["rt", "rt-multi-thread", "macros", "sync", "time"] }
 [profile.dev]
 panic = "unwind"
--- a/README.md
+++ b/README.md
@@ -1,8 +1,8 @@
 # smarm
-> Silly Marks Abstract Rust Machine. A prototype green-thread actor runtime for Rust.
+> SMARM — Smarm, Marks Actor Runtime Machinery. A proof-of-concept green-thread actor runtime for Rust.
-Implements the core ideas in [`LOOM.md`](./LOOM.md): green-thread actors on a
+Implements the core ideas in [`Achitecture.md`](.docs/Architecture.md): green-thread actors on a
 shared heap, scheduled cooperatively, communicating only by `Send` messages.
 Erlang's isolation model without Erlang's copying GC, Rust's zero-copy
 ownership transfers without async's function colouring.
@@ -58,7 +58,6 @@ tests/
  per-module integration tests
 benches/
  primes.rs    fan-out/fan-in compute, vs tokio current_thread
 LOOM.md        design intent
 ```
 ## Building and running
@@ -76,7 +75,26 @@ cargo bench               # primes benchmark vs tokio
 ## What's not here
-See the **Defer** section of `LOOM.md`. Notable absences: supervisor
+See the **Defer** section of `Architecture.md`. 
 restart-intensity caps, `join!` for handle groups, stack growth via remap,
 hierarchical timer wheel, fd-wait timeouts, `Signal::Timeout`. Each is
 mechanism we know how to add; none belongs in this iteration.
 ## Docs
 | Document | What it covers |
 |---|---|
 | [`Architecture.md`](./docs/Architecture.md) | Design intent, runtime model, and deferred work |
 | [`smarm - Deep Dive.html`](./docs/smarm%20-%20Deep%20Dive.html) | Generated walkthrough of the system; good starting point |
 | [`BENCHMARKS_AND_TUNING.md`](./docs/BENCHMARKS_AND_TUNING.md) | Where smarm wins and loses vs tokio, preemption knob recommendations |
 | [`benchmarks.md`](./docs/benchmarks.md) | Raw benchmark results, methodology, and tuning experiment log |
 ## Contributing
 This is a personal proof-of-concept. There's no PR workflow — if you fork it
 and do something interesting, just send me an email. I'd genuinely like to
 hear about it.
 ---
 <sub>The name is a recursive acronym. The M is for Marks, as in the BEAM — Bogdan/Björn's Erlang Abstract Machine, the virtual machine that runs Erlang and Elixir. smarm is not the BEAM. It just admires it from a safe distance.</sub>
--- a/docs/Architecture.md
+++ b/docs/Architecture.md
@@ -1,4 +1,4 @@
-# Loom
+# SMARM Architecture
 > Erlang-style actor concurrency for Rust, without the copies, the colors, or the GC pauses.
@@ -11,7 +11,7 @@ draws the boundary, the borrow checker already enforces it. What it lacks is an
 async/await is IO-centric, colors your functions, and trades stack simplicity for state-machine complexity;
 OS threads are too heavy to spawn per actor.
-Loom adds a third option: **green-thread actors on a shared heap**, scheduled cooperatively, with
+SMARM adds a third option: **green-thread actors on a shared heap**, scheduled cooperatively, with
 message-passing as the only cross-actor communication primitive. You get Erlang's isolation model without
 Erlang's copying GC, and you get Rust's zero-copy ownership transfers without async's cognitive overhead.
 No function coloring. No `Box<dyn Future>`. Just actors, messages, and the borrow checker doing what it
@@ -24,14 +24,14 @@ already does.
 ### Actors and scheduling
 Each actor is a lightweight green thread with its own heap-allocated, growable stack. Stacks are
-allocated via `mmap` with a guard page below the region; overflow is detected by the OS without Loom
+allocated via `mmap` with a guard page below the region; overflow is detected by the OS without SMARM
 polling for it. Initial stacks are small and grow by remapping on demand.
 The scheduler runs one OS thread per CPU. Each scheduler thread loops against a single global
 `Mutex<HashMap>` queue shared across all schedulers. If queue contention becomes a measured bottleneck
 this can be revisited; the interface will not change.
-Loom requires `panic = unwind`. Users who set `panic = abort` accept that supervision and actor
+SMARM requires `panic = unwind`. Users who set `panic = abort` accept that supervision and actor
 isolation are silently degraded to process death.
 ### Process descriptor
@@ -84,11 +84,11 @@ threshold is exceeded the actor yields. The workloads that starve a scheduler
 data transformation — are precisely the ones doing frequent allocations, so this approximation is
 correct by construction.
-`RDTSC` is not monotonic across core migration; a slightly wrong timeslice is acceptable. Loom is
+`RDTSC` is not monotonic across core migration; a slightly wrong timeslice is acceptable. SMARM is
 not a real-time scheduler.
 Known failure mode: tight no-alloc loops are invisible to this mechanism. Actors doing sustained
-allocation-free compute must call `loom::yield_now()` explicitly, or offload to a thread pool
+allocation-free compute must call `smarm::yield_now()` explicitly, or offload to a thread pool
 outside the actor scheduler (e.g. rayon). This is documented and acceptable — such loops are rare
 in message-passing workloads.
@@ -99,12 +99,12 @@ An actor yields at:
 - **Channel send/recv** — the primary communication primitive
 - **Mutex contention** — attempting to lock a held `Arc<Mutex<>>` parks the actor
 - **IO** — blocking on a socket or file descriptor parks the actor until the IO thread signals readiness
- **`loom::sleep(duration)`** — parks the actor; the timer wheel re-queues it on expiry
+- **`smarm::sleep(duration)`** — parks the actor; the timer wheel re-queues it on expiry
- **`loom::yield_now()`** — explicit cooperative yield
+- **`smarm::yield_now()`** — explicit cooperative yield
 - **Allocator preemption** — as above
 - **Spawn** — does not yield by default; the new actor is queued and the spawner continues
-`std::thread::sleep` inside an actor blocks the entire OS thread and should never be used. Loom
+`std::thread::sleep` inside an actor blocks the entire OS thread and should never be used. SMARM
 may emit a warning if it can detect this.
 ### IO thread
@@ -112,7 +112,7 @@ may emit a warning if it can detect this.
 A single dedicated IO thread runs an `epoll`/`kqueue` loop. Actors blocking on IO register their
 file descriptor and PID; the IO thread moves them back into the global queue when the fd is ready.
 A `HashMap<RawFd, Pid>` maps fds to parked actors. Cancellation (actor dies while waiting on IO)
-deregisters the fd. This is intentionally simple and not pluggable; Loom is not a general async
+deregisters the fd. This is intentionally simple and not pluggable; SMARM is not a general async
 executor.
 ### Communication
@@ -155,7 +155,7 @@ sensible global default.
 ### Mutex timeout
-Every `loom::mutex` lock attempt is mediated by the scheduler. If the lock is not acquired within
+Every `smarm::mutex` lock attempt is mediated by the scheduler. If the lock is not acquired within
 a configurable timeout, the actor receives a `LockTimeout` error rather than parking forever. This
 is a hard runtime guarantee, not a convention. Default timeout is global and configurable;
 individual locks and individual call sites can override it.
@@ -165,9 +165,9 @@ individual locks and individual call sites can override it.
 Actors can spawn children and wait on a group of handles:
 ```rust
-let h1 = loom::spawn(|| compute_a());
+let h1 = smarm::spawn(|| compute_a());
-let h2 = loom::spawn(|| compute_b());
+let h2 = smarm::spawn(|| compute_b());
-let (a, b) = loom::join!(h1, h2);
+let (a, b) = smarm::join!(h1, h2);
 ```
 `join!` parks the calling actor until all handles complete. The last child to finish re-queues the
@@ -176,7 +176,7 @@ parent. This is a countdown in the parent's descriptor; no polling, no waker reg
 ### Timer wheel
-`loom::sleep` and supervision timeouts are driven by a timer wheel in the scheduler. Sleeping
+`smarm::sleep` and supervision timeouts are driven by a timer wheel in the scheduler. Sleeping
 actors are parked and re-queued by the timer thread on expiry. The timer wheel is internal
 infrastructure; its design is an implementation detail.
@@ -189,22 +189,29 @@ infrastructure; its design is an implementation detail.
 - **Queue contention** — if `Mutex<HashMap>` proves to be a bottleneck under profiling, evaluate
  `DashMap` or a lock-free work-stealing deque (e.g. `crossbeam-deque`). Not before.
 - **AVX-512 context save** — extend `ContextSaveArea` when there is a concrete use case.
- **`loom::sleep` vs raw sleep semantics** — further control knobs deferred until the basic sleep
+- **`smarm::sleep` vs raw sleep semantics** — further control knobs deferred until the basic sleep
  is working and real use cases are understood.
 - **Supervision tree API** — the contract is defined; the recursive hierarchy, restart strategies,
  and introspection API are implementation work.
 - **no_std support** — the assembly shim is no_std friendly but the IO thread and allocator require
  OS primitives. Target is no_std + `alloc` on hosted platforms; bare metal is out of scope.
- **Distribution** — Loom is a single-process runtime. No distribution protocol, no BEAM-style
+- **Distribution** — SMARM is a single-process runtime. No distribution protocol, no BEAM-style
  clustering.
 ---
-## What Loom is Not
+## What SMARM is Not
- Not a drop-in replacement for Tokio. Loom does not implement `Future` or the async executor interface.
+- Not a drop-in replacement for Tokio. SMARM does not implement `Future` or the async executor interface.
- Not a general allocator. Loom manages actor stacks; heap allocation for actor data goes through
+- Not a general allocator. SMARM manages actor stacks; heap allocation for actor data goes through
  the system allocator.
- Not Erlang. No hot code reloading, no distribution protocol, no BEAM bytecode. Loom is a
+- Not Erlang. No hot code reloading, no distribution protocol, no BEAM bytecode. SMARM is a
  concurrency runtime, not a platform.
 - Not a real-time scheduler. Timeslice accuracy is best-effort.
 ---
 ## On names
 <sub>The name is a recursive acronym. The M is for Marks, as in the BEAM — Bogdan/Björn's Erlang Abstract Machine, the virtual machine that runs Erlang and Elixir. smarm is not the BEAM. It just admires it from a safe distance.</sub>
--- a/docs/BENCHMARKS_AND_TUNING.md
+++ b/docs/BENCHMARKS_AND_TUNING.md
--- a/docs/benchmarks.md
+++ b/docs/benchmarks.md
--- a/docs/smarm
+++ b/docs/smarm
--- a/src/channel.rs
+++ b/src/channel.rs
@@ -98,12 +98,10 @@ impl<T> Sender<T> {
            g.parked_receiver.take()
        };
        if let Some(pid) = unpark {
-            let me = crate::actor::current_pid();
+            crate::te!(crate::trace::Event::Send { sender: crate::actor::current_pid().unwrap_or(crate::pid::Pid::new(u32::MAX, u32::MAX)), receiver: Some(pid) });
            crate::te!(crate::trace::Event::Send { sender: me.unwrap_or(crate::pid::Pid::new(u32::MAX, u32::MAX)), receiver: Some(pid) });
            crate::scheduler::unpark(pid);
        } else {
-            let me = crate::actor::current_pid();
+            crate::te!(crate::trace::Event::Send { sender: crate::actor::current_pid().unwrap_or(crate::pid::Pid::new(u32::MAX, u32::MAX)), receiver: None });
            crate::te!(crate::trace::Event::Send { sender: me.unwrap_or(crate::pid::Pid::new(u32::MAX, u32::MAX)), receiver: None });
        }
        Ok(())
    }
@@ -132,9 +130,7 @@ impl<T> Receiver<T> {
            // Release the lock before parking — the unparker will need it.
            crate::scheduler::park_current();
            // Woken up — record it before looping to check the queue.
-            if let Some(me) = crate::actor::current_pid() {
+            crate::te!(crate::trace::Event::RecvWake(crate::actor::current_pid().unwrap()));
                crate::te!(crate::trace::Event::RecvWake(me));
            }
        }
    }
--- a/src/preempt.rs
+++ b/src/preempt.rs
@@ -28,23 +28,42 @@
 use std::alloc::{GlobalAlloc, Layout, System};
 use std::cell::Cell;
-const ALLOC_INTERVAL: u32 = 128;
+pub const DEFAULT_ALLOC_INTERVAL: u32 = 128;
-const TIMESLICE_CYCLES: u64 = 300_000; // ≈ 100µs on a 3 GHz CPU
+pub const DEFAULT_TIMESLICE_CYCLES: u64 = 300_000; // ≈ 100µs on a 3 GHz CPU
 thread_local! {
    /// While `false`, the allocator hook is a no-op.
    pub static PREEMPTION_ENABLED: Cell<bool> = const { Cell::new(false) };
    /// Countdown to next RDTSC check. Reset to `ALLOC_INTERVAL` on resume.
-    static ALLOC_COUNT: Cell<u32> = const { Cell::new(ALLOC_INTERVAL) };
+    static ALLOC_COUNT: Cell<u32> = const { Cell::new(DEFAULT_ALLOC_INTERVAL) };
    /// RDTSC value written by the scheduler on every actor resume.
    static TIMESLICE_START: Cell<u64> = const { Cell::new(0) };
    /// Per-thread copy of the configured alloc interval, written once at
    /// scheduler-thread startup. Kept in a thread-local so the hot path
    /// (`maybe_preempt`) pays only a TLS load, with no cache-coherency traffic.
    static CONFIGURED_ALLOC_INTERVAL: Cell<u32> = const { Cell::new(DEFAULT_ALLOC_INTERVAL) };
    /// Per-thread copy of the configured timeslice, written once at
    /// scheduler-thread startup.
    static CONFIGURED_TIMESLICE_CYCLES: Cell<u64> = const { Cell::new(DEFAULT_TIMESLICE_CYCLES) };
 }
 /// Called once per scheduler thread at startup (before any actor runs).
 /// Writes the runtime-configured preemption knobs into thread-locals so the
 /// hot path reads them without any cross-thread coherency cost.
 pub fn configure_preempt(alloc_interval: u32, timeslice_cycles: u64) {
    CONFIGURED_ALLOC_INTERVAL.with(|c| c.set(alloc_interval));
    CONFIGURED_TIMESLICE_CYCLES.with(|c| c.set(timeslice_cycles));
    // Also prime the countdown so the first resume uses the right interval.
    ALLOC_COUNT.with(|c| c.set(alloc_interval));
 }
 /// Arm the timeslice. Called by the scheduler on every resume.
 pub fn reset_timeslice() {
-    ALLOC_COUNT.with(|c| c.set(ALLOC_INTERVAL));
+    ALLOC_COUNT.with(|c| c.set(CONFIGURED_ALLOC_INTERVAL.with(|i| i.get())));
    TIMESLICE_START.with(|c| c.set(rdtsc()));
 }
@@ -102,10 +121,10 @@ pub fn maybe_preempt() {
    ALLOC_COUNT.with(|c| {
        let n = c.get();
        if n == 0 {
-            c.set(ALLOC_INTERVAL);
+            c.set(CONFIGURED_ALLOC_INTERVAL.with(|i| i.get()));
            if PREEMPTION_ENABLED.with(|e| e.get()) {
                let start = TIMESLICE_START.with(|s| s.get());
-                if rdtsc().saturating_sub(start) > TIMESLICE_CYCLES {
+                if rdtsc().saturating_sub(start) > CONFIGURED_TIMESLICE_CYCLES.with(|t| t.get()) {
                    // SAFETY: reachable only inside an actor (the scheduler
                    // sets PREEMPTION_ENABLED on resume and clears it on
                    // return). The scheduler stack is therefore valid.
--- a/src/runtime.rs
+++ b/src/runtime.rs
@@ -31,8 +31,8 @@
 //! becomes a measured bottleneck.
 use crate::actor::{
-    clear_current_pid, current_pid, is_actor_done, reset_actor_done,
+    clear_current_pid, is_actor_done, reset_actor_done, set_current_actor_box, 
-    set_current_actor_box, set_current_pid, take_last_outcome, Actor, Outcome,
+    set_current_pid, take_last_outcome, Actor, Outcome,
 };
 use crate::channel::Sender;
 use crate::context::{get_actor_sp, set_actor_sp, switch_to_actor};
@@ -70,13 +70,19 @@ pub struct Config {
    min: usize,
    max: usize,
    exact: Option<usize>,
    alloc_interval: u32,
    timeslice_cycles: u64,
 }
 impl Config {
    /// Exact thread count; takes precedence over min/max.
    pub fn exact(n: usize) -> Self {
        assert!(n >= 1, "scheduler thread count must be ≥ 1");
-        Self { min: n, max: n, exact: Some(n) }
+        Self {
            min: n, max: n, exact: Some(n),
            alloc_interval: crate::preempt::DEFAULT_ALLOC_INTERVAL,
            timeslice_cycles: crate::preempt::DEFAULT_TIMESLICE_CYCLES,
        }
    }
    /// Bounded range. Thread count = clamp(available_parallelism, min, max).
@@ -86,7 +92,28 @@ impl Config {
        if let Some(e) = exact {
            assert!(e >= 1, "exact must be ≥ 1");
        }
-        Self { min, max, exact }
+        Self {
            min, max, exact,
            alloc_interval: crate::preempt::DEFAULT_ALLOC_INTERVAL,
            timeslice_cycles: crate::preempt::DEFAULT_TIMESLICE_CYCLES,
        }
    }
    /// How many allocations (or `smarm::check!()` calls) between RDTSC checks.
    /// Lower = more responsive preemption, higher = less overhead.
    /// Default: 128.
    pub fn alloc_interval(mut self, n: u32) -> Self {
        assert!(n >= 1, "alloc_interval must be ≥ 1");
        self.alloc_interval = n;
        self
    }
    /// How many TSC cycles constitute one timeslice.
    /// Default: 300_000 (≈ 100µs on a 3 GHz CPU).
    pub fn timeslice_cycles(mut self, n: u64) -> Self {
        assert!(n >= 1, "timeslice_cycles must be ≥ 1");
        self.timeslice_cycles = n;
        self
    }
    /// The number of scheduler threads this config resolves to.
@@ -106,7 +133,11 @@ impl Default for Config {
        let avail = thread::available_parallelism()
            .map(|n| n.get())
            .unwrap_or(1);
-        Self { min: 1, max: avail, exact: None }
+        Self {
            min: 1, max: avail, exact: None,
            alloc_interval: crate::preempt::DEFAULT_ALLOC_INTERVAL,
            timeslice_cycles: crate::preempt::DEFAULT_TIMESLICE_CYCLES,
        }
    }
 }
@@ -270,10 +301,13 @@ pub(crate) struct RuntimeInner {
    /// Global counters for RFC 000 primitives.
    pub(crate) io_parked: AtomicU32,
    pub(crate) sleeping: AtomicU32,
    /// Preemption knobs, written into each scheduler thread's locals on startup.
    pub(crate) alloc_interval: u32,
    pub(crate) timeslice_cycles: u64,
 }
 impl RuntimeInner {
-    fn new(thread_count: usize) -> Arc<Self> {
+    fn new(thread_count: usize, alloc_interval: u32, timeslice_cycles: u64) -> Arc<Self> {
        let stats = (0..thread_count).map(|_| SchedulerStats::new()).collect();
        Arc::new(Self {
            shared: Mutex::new(SharedState::new()),
@@ -281,6 +315,8 @@ impl RuntimeInner {
            stats,
            io_parked: AtomicU32::new(0),
            sleeping: AtomicU32::new(0),
            alloc_interval,
            timeslice_cycles,
        })
    }
@@ -295,18 +331,6 @@ impl RuntimeInner {
        crate::preempt::PREEMPTION_ENABLED.with(|c| c.set(prev));
        result
    }
    /// Returns `None` when the mutex is poisoned.
    /// Used in `unpark` / channel Drop which can fire after teardown.
    pub(crate) fn try_with_shared<R>(&self, f: impl FnOnce(&mut SharedState) -> R) -> Option<R> {
        let prev = crate::preempt::PREEMPTION_ENABLED.with(|c| c.replace(false));
        let result = match self.shared.lock() {
            Ok(mut g) => Some(f(&mut g)),
            Err(p) => Some(f(&mut p.into_inner())),
        };
        crate::preempt::PREEMPTION_ENABLED.with(|c| c.set(prev));
        result
    }
 }
 // ---------------------------------------------------------------------------
@@ -322,7 +346,7 @@ pub struct Runtime {
 pub fn init(config: Config) -> Runtime {
    let n = config.resolved_thread_count();
    Runtime {
-        inner: RuntimeInner::new(n),
+        inner: RuntimeInner::new(n, config.alloc_interval, config.timeslice_cycles),
        thread_count: n,
    }
 }
@@ -526,6 +550,7 @@ fn finalize_actor(inner: &Arc<RuntimeInner>, pid: Pid, outcome: Outcome) {
 // ---------------------------------------------------------------------------
 fn schedule_loop(inner: &Arc<RuntimeInner>, slot: usize) {
    crate::preempt::configure_preempt(inner.alloc_interval, inner.timeslice_cycles);
    let stats = &inner.stats[slot];
    loop {
--- a/src/scheduler.rs
+++ b/src/scheduler.rs
@@ -11,7 +11,7 @@ use crate::actor::current_pid;
 use crate::channel::Sender;
 use crate::pid::Pid;
 use crate::runtime::{
-    self, RuntimeInner, YieldIntent, ROOT_PID, RUNTIME,
+    self, RuntimeInner, YieldIntent, RUNTIME,
 };
 use crate::supervisor::Signal;
 use std::sync::Arc;
--- a/tests/runtime.rs
+++ b/tests/runtime.rs
@@ -15,10 +15,7 @@
 //!   - Panic on one scheduler thread doesn't kill others
 use smarm::{channel, runtime::{Config, Runtime}, spawn, yield_now, JoinHandle};
-use std::sync::{
+use std::sync::{atomic::{AtomicBool, AtomicU64, Ordering}, Arc};
    atomic::{AtomicBool, AtomicU64, AtomicUsize, Ordering},
    Arc, Barrier,
 };
 use std::time::Duration;
 use std::collections::HashSet;