From d432349f99ef69644fe7a85a707af975493f441c Mon Sep 17 00:00:00 2001
From: smarm
Date: Mon, 25 May 2026 22:14:07 +0200
Subject: [PATCH] Update the documentation

---
 .gitignore                          |    1 +
 README.md                           |   26 +-
 LOOM.md => docs/Architecture.md     |   49 +-
 .../BENCHMARKS_AND_TUNING.md        |    0
 benchmarks.md => docs/benchmarks.md |    0
 docs/smarm - Deep Dive.html         | 1297 +++++++++++++++++
 6 files changed, 1348 insertions(+), 25 deletions(-)
 rename LOOM.md => docs/Architecture.md (84%)
 rename BENCHMARKS_AND_TUNING.md => docs/BENCHMARKS_AND_TUNING.md (100%)
 rename benchmarks.md => docs/benchmarks.md (100%)
 create mode 100644 docs/smarm - Deep Dive.html

diff --git a/.gitignore b/.gitignore
index a9d37c5..01eb4b6 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,2 +1,3 @@
 target
 Cargo.lock
+smarm_trace.json
diff --git a/README.md b/README.md
index 1b4a5c7..8d7dd45 100644
--- a/README.md
+++ b/README.md
@@ -1,8 +1,8 @@
 # smarm

-> Silly Marks Abstract Rust Machine. A prototype green-thread actor runtime for Rust.
+> SMARM — Smarm, Marks Actor Runtime Machinery. A proof-of-concept green-thread actor runtime for Rust.

-Implements the core ideas in [`LOOM.md`](./LOOM.md): green-thread actors on a
+Implements the core ideas in [`Architecture.md`](./docs/Architecture.md): green-thread actors on a
 shared heap, scheduled cooperatively, communicating only by `Send` messages.
 Erlang's isolation model without Erlang's copying GC, Rust's zero-copy
 ownership transfers without async's function colouring.
@@ -58,7 +58,6 @@ tests/ per-module integration tests
 benches/
   primes.rs fan-out/fan-in compute, vs tokio current_thread

-LOOM.md design intent
 ```

 ## Building and running
@@ -76,7 +75,26 @@ cargo bench # primes benchmark vs tokio

 ## What's not here

-See the **Defer** section of `LOOM.md`. Notable absences: supervisor
+See the **Defer** section of `Architecture.md`. Notable absences: supervisor
 restart-intensity caps, `join!` for handle groups, stack growth via remap,
 hierarchical timer wheel, fd-wait timeouts, `Signal::Timeout`.
 Each is mechanism we know how to add; none belongs in this iteration.
+
+## Docs
+
+| Document | What it covers |
+|---|---|
+| [`Architecture.md`](./docs/Architecture.md) | Design intent, runtime model, and deferred work |
+| [`smarm - Deep Dive.html`](./docs/smarm%20-%20Deep%20Dive.html) | Generated walkthrough of the system; good starting point |
+| [`BENCHMARKS_AND_TUNING.md`](./docs/BENCHMARKS_AND_TUNING.md) | Where smarm wins and loses vs tokio, preemption knob recommendations |
+| [`benchmarks.md`](./docs/benchmarks.md) | Raw benchmark results, methodology, and tuning experiment log |
+
+## Contributing
+
+This is a personal proof-of-concept. There's no PR workflow — if you fork it
+and do something interesting, just send me an email. I'd genuinely like to
+hear about it.
+
+---
+
+The name is a recursive acronym. The M is for Marks, as in the BEAM — Bogdan/Björn's Erlang Abstract Machine, the virtual machine that runs Erlang and Elixir. smarm is not the BEAM. It just admires it from a safe distance.
diff --git a/LOOM.md b/docs/Architecture.md
similarity index 84%
rename from LOOM.md
rename to docs/Architecture.md
index 179143c..19f4663 100644
--- a/LOOM.md
+++ b/docs/Architecture.md
@@ -1,4 +1,4 @@
-# Loom
+# SMARM Architecture

 > Erlang-style actor concurrency for Rust, without the copies, the colors, or the GC pauses.

@@ -11,7 +11,7 @@ draws the boundary, the borrow checker already enforces it. What it lacks is an
 async/await is IO-centric, colors your functions, and trades stack simplicity for state-machine
 complexity; OS threads are too heavy to spawn per actor.

-Loom adds a third option: **green-thread actors on a shared heap**, scheduled cooperatively, with
+SMARM adds a third option: **green-thread actors on a shared heap**, scheduled cooperatively, with
 message-passing as the only cross-actor communication primitive. You get Erlang's isolation model
 without Erlang's copying GC, and you get Rust's zero-copy ownership transfers without async's
 cognitive overhead. No function coloring. No `Box<dyn Future>`. Just actors, messages, and the
 borrow checker doing what it
@@ -24,14 +24,14 @@ already does.
 ### Actors and scheduling

 Each actor is a lightweight green thread with its own heap-allocated, growable stack. Stacks are
-allocated via `mmap` with a guard page below the region; overflow is detected by the OS without Loom
+allocated via `mmap` with a guard page below the region; overflow is detected by the OS without SMARM
 polling for it. Initial stacks are small and grow by remapping on demand.

 The scheduler runs one OS thread per CPU. Each scheduler thread loops against a single global
 `Mutex` queue shared across all schedulers. If queue contention becomes a measured
 bottleneck this can be revisited; the interface will not change.

-Loom requires `panic = unwind`. Users who set `panic = abort` accept that supervision and actor
+SMARM requires `panic = unwind`. Users who set `panic = abort` accept that supervision and actor
 isolation are silently degraded to process death.

 ### Process descriptor
@@ -84,11 +84,11 @@ threshold is exceeded the actor yields. The workloads that starve a scheduler
 data transformation — are precisely the ones doing frequent allocations, so this approximation
 is correct by construction.

-`RDTSC` is not monotonic across core migration; a slightly wrong timeslice is acceptable. Loom is
+`RDTSC` is not monotonic across core migration; a slightly wrong timeslice is acceptable. SMARM is
 not a real-time scheduler.

 Known failure mode: tight no-alloc loops are invisible to this mechanism. Actors doing sustained
-allocation-free compute must call `loom::yield_now()` explicitly, or offload to a thread pool
+allocation-free compute must call `smarm::yield_now()` explicitly, or offload to a thread pool
 outside the actor scheduler (e.g. rayon). This is documented and acceptable — such loops are rare in
 message-passing workloads.

@@ -99,12 +99,12 @@ An actor yields at:
 - **Channel send/recv** — the primary communication primitive
 - **Mutex contention** — attempting to lock a held `Arc<Mutex<T>>` parks the actor
 - **IO** — blocking on a socket or file descriptor parks the actor until the IO thread signals readiness
-- **`loom::sleep(duration)`** — parks the actor; the timer wheel re-queues it on expiry
-- **`loom::yield_now()`** — explicit cooperative yield
+- **`smarm::sleep(duration)`** — parks the actor; the timer wheel re-queues it on expiry
+- **`smarm::yield_now()`** — explicit cooperative yield
 - **Allocator preemption** — as above
 - **Spawn** — does not yield by default; the new actor is queued and the spawner continues

-`std::thread::sleep` inside an actor blocks the entire OS thread and should never be used. Loom
+`std::thread::sleep` inside an actor blocks the entire OS thread and should never be used. SMARM
 may emit a warning if it can detect this.

 ### IO thread
@@ -112,7 +112,7 @@ may emit a warning if it can detect this.
 A single dedicated IO thread runs an `epoll`/`kqueue` loop. Actors blocking on IO register their
 file descriptor and PID; the IO thread moves them back into the global queue when the fd is ready.
 A `HashMap` maps fds to parked actors. Cancellation (actor dies while waiting on IO)
-deregisters the fd. This is intentionally simple and not pluggable; Loom is not a general async
+deregisters the fd. This is intentionally simple and not pluggable; SMARM is not a general async
 executor.

 ### Communication
@@ -155,7 +155,7 @@ sensible global default.

 ### Mutex timeout

-Every `loom::mutex` lock attempt is mediated by the scheduler. If the lock is not acquired within
+Every `smarm::mutex` lock attempt is mediated by the scheduler. If the lock is not acquired within
 a configurable timeout, the actor receives a `LockTimeout` error rather than parking forever.
 This is a hard runtime guarantee, not a convention. Default timeout is global and configurable;
 individual locks and individual call sites can override it.
@@ -165,9 +165,9 @@ individual locks and individual call sites can override it.
 Actors can spawn children and wait on a group of handles:

 ```rust
-let h1 = loom::spawn(|| compute_a());
-let h2 = loom::spawn(|| compute_b());
-let (a, b) = loom::join!(h1, h2);
+let h1 = smarm::spawn(|| compute_a());
+let h2 = smarm::spawn(|| compute_b());
+let (a, b) = smarm::join!(h1, h2);
 ```

 `join!` parks the calling actor until all handles complete. The last child to finish re-queues the
@@ -176,7 +176,7 @@ parent. This is a countdown in the parent's descriptor; no polling, no waker reg

 ### Timer wheel

-`loom::sleep` and supervision timeouts are driven by a timer wheel in the scheduler. Sleeping
+`smarm::sleep` and supervision timeouts are driven by a timer wheel in the scheduler. Sleeping
 actors are parked and re-queued by the timer thread on expiry. The timer wheel is internal
 infrastructure; its design is an implementation detail.

@@ -189,22 +189,29 @@ infrastructure; its design is an implementation detail.
 - **Queue contention** — if `Mutex` proves to be a bottleneck under profiling, evaluate
   `DashMap` or a lock-free work-stealing deque (e.g. `crossbeam-deque`). Not before.
 - **AVX-512 context save** — extend `ContextSaveArea` when there is a concrete use case.
-- **`loom::sleep` vs raw sleep semantics** — further control knobs deferred until the basic sleep
+- **`smarm::sleep` vs raw sleep semantics** — further control knobs deferred until the basic sleep
   is working and real use cases are understood.
 - **Supervision tree API** — the contract is defined; the recursive hierarchy, restart strategies,
   and introspection API are implementation work.
 - **no_std support** — the assembly shim is no_std friendly but the IO thread and allocator
   require OS primitives. Target is no_std + `alloc` on hosted platforms; bare metal is out of scope.
-- **Distribution** — Loom is a single-process runtime. No distribution protocol, no BEAM-style
+- **Distribution** — SMARM is a single-process runtime. No distribution protocol, no BEAM-style
   clustering.

 ---

-## What Loom is Not
+## What SMARM is Not

-- Not a drop-in replacement for Tokio. Loom does not implement `Future` or the async executor interface.
-- Not a general allocator. Loom manages actor stacks; heap allocation for actor data goes through
+- Not a drop-in replacement for Tokio. SMARM does not implement `Future` or the async executor interface.
+- Not a general allocator. SMARM manages actor stacks; heap allocation for actor data goes through
   the system allocator.
-- Not Erlang. No hot code reloading, no distribution protocol, no BEAM bytecode. Loom is a
+- Not Erlang. No hot code reloading, no distribution protocol, no BEAM bytecode. SMARM is a
   concurrency runtime, not a platform.
 - Not a real-time scheduler. Timeslice accuracy is best-effort.
+
+
+---
+
+## On names
+
+The name is a recursive acronym. The M is for Marks, as in the BEAM — Bogdan/Björn's Erlang Abstract Machine, the virtual machine that runs Erlang and Elixir. smarm is not the BEAM. It just admires it from a safe distance.
diff --git a/BENCHMARKS_AND_TUNING.md b/docs/BENCHMARKS_AND_TUNING.md
similarity index 100%
rename from BENCHMARKS_AND_TUNING.md
rename to docs/BENCHMARKS_AND_TUNING.md
diff --git a/benchmarks.md b/docs/benchmarks.md
similarity index 100%
rename from benchmarks.md
rename to docs/benchmarks.md
diff --git a/docs/smarm - Deep Dive.html b/docs/smarm - Deep Dive.html
new file mode 100644
index 0000000..4f2b25e
--- /dev/null
+++ b/docs/smarm - Deep Dive.html
@@ -0,0 +1,1297 @@
+smarm — Deep Dive
+ + +
+ +

Green-Thread Actor Runtime

+

Erlang's isolation model. Rust's zero-copy ownership. No function colouring.

+

+ smarm is a prototype concurrent runtime for Rust. Each actor is a green thread with its own mmap'd stack. N OS threads share a single global run queue. Actors communicate exclusively via message passing (owned values over channels); no shared mutable state without an explicit Arc<Mutex<T>>.

+

+ Preemption is allocator-driven: every Nth heap allocation, smarm reads RDTSC and yields the actor if its timeslice has expired. No OS signals, no separate timer thread for scheduling.

+ +
+
+
vs async/await
+

No function colouring. No Box<dyn Future>. No poll state machines. Just plain Rust functions that block.

+
+
+
vs OS threads
+

64 KB stacks instead of 8 MB. A context switch takes ~10–20 ns (six GPR saves plus a ret) and never enters the kernel.

+
+
+
vs Erlang BEAM
+

Zero-copy ownership via Rust's type system. No GC pause. No copying GC. Message passing is a move, not a clone.

+
+
+
+ +
+ + +
+ +

Module Map

+

13 source modules, three rough layers. The bottom layer has zero smarm dependencies; the middle layer builds the runtime machinery; the top layer is the public API.

+ +
+[Figure: module map in three layers. Layer 0 (primitives): stack (mmap + guard), context (naked asm CSW), preempt (alloc hook + RDTSC), pid ((index, gen) pair), timer (min-heap), supervisor (Signal enum only), trace (Chrome JSON opt). Layer 1 (runtime machinery): actor (trampoline + TLs), io (epoll + pool thread), channel (MPSC, park/unpark), mutex (timeout + FIFO), runtime (SharedState + loop). Layer 2 (public API / facade): scheduler (public API facade), lib.rs (re-exports + GlobalAlloc). Edge legend: uses directly; uses via type (Pid etc); public API edge.]
+ +
+
+
stack
+
Layer 0 · primitive
+

Calls mmap for a contiguous region, then mprotect's the bottom page to PROT_NONE. Stack grows downward; overflow hits the guard page → SIGSEGV. Implements Drop via munmap. Zero smarm dependencies.

+
+
+
context
+
Layer 0 · primitive
+

Two #[naked] assembly functions (switch_to_actor, switch_to_scheduler). Save 6 callee-saved GPRs, swap rsp, restore, ret. Thread-locals hold each side's saved stack pointer. XMM registers are not saved here: the compiler guarantees a spill at Rust call sites.

+
+
+
preempt
+
Layer 0 · primitive
+

Implements GlobalAlloc — wraps System allocator. On every Nth alloc, reads RDTSC. If elapsed > timeslice_cycles and preemption is enabled, calls switch_to_scheduler(). Thread-locals hold the countdown, start timestamp, and an enabled flag (scheduler disables it to prevent self-preemption).

+
+
+
pid
+
Layer 0 · primitive
+

struct Pid(u32 index, u32 generation). Index = slot in the actor table. Generation increments on actor death. Stale handles are detectable: a Pid with wrong generation fails slot lookup rather than silently addressing a new actor. Solves ABA without exhausting PID space.
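The stale-handle check described for pid can be sketched as a small generational slot table. This is a minimal illustration, not smarm's actual code: `SlotTable`, `Slot`, and the method names are hypothetical, and only the (index, generation) idea comes from the text above.

```rust
/// Hypothetical handle mirroring the (index, generation) pair described above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Pid {
    index: u32,
    generation: u32,
}

/// One table slot; `generation` is bumped every time the occupant dies.
struct Slot<T> {
    generation: u32,
    value: Option<T>,
}

/// Illustrative slot table with a free list (names are assumptions).
pub struct SlotTable<T> {
    slots: Vec<Slot<T>>,
    free: Vec<u32>,
}

impl<T> SlotTable<T> {
    pub fn new() -> Self {
        SlotTable { slots: Vec::new(), free: Vec::new() }
    }

    /// Reuses a free slot if one exists, otherwise grows the table.
    pub fn insert(&mut self, value: T) -> Pid {
        if let Some(index) = self.free.pop() {
            let slot = &mut self.slots[index as usize];
            slot.value = Some(value);
            Pid { index, generation: slot.generation }
        } else {
            let index = self.slots.len() as u32;
            self.slots.push(Slot { generation: 0, value: Some(value) });
            Pid { index, generation: 0 }
        }
    }

    /// Lookup fails for a handle whose generation no longer matches the slot.
    pub fn get(&self, pid: Pid) -> Option<&T> {
        let slot = self.slots.get(pid.index as usize)?;
        if slot.generation == pid.generation {
            slot.value.as_ref()
        } else {
            None
        }
    }

    /// Removes the occupant and bumps the generation so old handles go stale.
    pub fn remove(&mut self, pid: Pid) -> Option<T> {
        let slot = self.slots.get_mut(pid.index as usize)?;
        if slot.generation != pid.generation {
            return None;
        }
        let value = slot.value.take()?;
        slot.generation = slot.generation.wrapping_add(1);
        self.free.push(pid.index);
        Some(value)
    }
}
```

A handle kept after `remove` points at the same index as the slot's next occupant but carries the old generation, so `get` returns `None` instead of silently addressing the new actor.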

+
+
+
actor
+
Layer 1 · machinery
+

Owns the Stack. Defines the trampoline: every actor's first ret lands here. Trampoline reads the closure from a thread-local, calls it inside catch_unwind, writes the Outcome to another thread-local, then yields back to the scheduler. Thread-locals: current PID, pending closure, last outcome, done flag.

+
+
+
runtime
+
Layer 1 · core
+

The heaviest module. Contains SharedState (slot table, run queue, timers, IO), RuntimeInner (shared state behind a mutex, per-thread stats, drain lock), and schedule_loop, the main scheduler loop that drains timers, drains IO completions, pops actors, resumes them, and handles the post-yield intent (re-queue vs park vs finalize).

+
+
+
channel
+
Layer 1 · primitive
+

Unbounded MPSC. Inner state is Arc<Mutex<Inner<T>>> — senders are clonable, last drop closes channel. recv(): checks queue; if empty, registers self as parked_receiver, releases the lock, calls park_current(). send(): pushes, takes the parked PID, calls unpark(pid).
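The recv/send handshake above can be modelled without a scheduler by returning "what the caller should do next" instead of actually parking. A minimal sketch under stated assumptions: `ActorId`, `Recv`, and the method signatures are hypothetical, and the real Inner sits behind an Arc<Mutex<...>> and calls park_current()/unpark() rather than returning values.

```rust
use std::collections::VecDeque;

/// Hypothetical actor id; the real runtime uses `Pid`.
pub type ActorId = u32;

/// Outcome of a receive attempt in this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum Recv<T> {
    /// A message was queued; the caller keeps running.
    Ready(T),
    /// The queue was empty; the caller is registered and should now park.
    MustPark,
}

/// Channel state only; locking and scheduling are left out on purpose.
pub struct Inner<T> {
    queue: VecDeque<T>,
    parked_receiver: Option<ActorId>,
}

impl<T> Inner<T> {
    pub fn new() -> Self {
        Inner { queue: VecDeque::new(), parked_receiver: None }
    }

    /// Pops a message if one is queued, otherwise records `me` as parked.
    pub fn recv(&mut self, me: ActorId) -> Recv<T> {
        match self.queue.pop_front() {
            Some(msg) => Recv::Ready(msg),
            None => {
                self.parked_receiver = Some(me);
                Recv::MustPark
            }
        }
    }

    /// Pushes the message and returns the receiver to unpark, if any.
    pub fn send(&mut self, msg: T) -> Option<ActorId> {
        self.queue.push_back(msg);
        self.parked_receiver.take()
    }
}
```

Because `send` takes the parked id with `take()`, a burst of sends wakes the receiver once rather than once per message.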

+
+
+
mutex
+
Layer 1 · primitive
+

Actor-aware mutex with mandatory timeout (default 30s). Fast path: no holder → grant immediately. Slow path: join FIFO waiter queue, insert a WaitTimeout timer, park. On timer expiry: if actor is still in waiters, unpark it with LockTimeout. On guard drop: pop next waiter, grant, unpark.
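The fast path, slow path, timeout, and hand-off rules above amount to a small state machine. The sketch below models only that bookkeeping; `LockState`, `Acquire`, and `ActorId` are hypothetical names, and the real mutex also inserts timer entries and parks or unparks actors through the scheduler.

```rust
use std::collections::VecDeque;

/// Hypothetical actor id; the real runtime uses `Pid`.
pub type ActorId = u32;

/// What the caller of `lock` should do next.
#[derive(Debug, PartialEq, Eq)]
pub enum Acquire {
    Granted,
    MustPark,
}

/// Holder plus FIFO waiter queue, as described in the mutex card.
pub struct LockState {
    holder: Option<ActorId>,
    waiters: VecDeque<ActorId>,
}

impl LockState {
    pub fn new() -> Self {
        LockState { holder: None, waiters: VecDeque::new() }
    }

    /// Fast path: no holder, grant at once. Slow path: join the FIFO queue.
    pub fn lock(&mut self, me: ActorId) -> Acquire {
        if self.holder.is_none() {
            self.holder = Some(me);
            Acquire::Granted
        } else {
            self.waiters.push_back(me);
            Acquire::MustPark
        }
    }

    /// Timer expiry: returns true if `who` was still waiting and must be
    /// woken with a timeout error; a stale timer returns false (a no-op).
    pub fn on_timeout(&mut self, who: ActorId) -> bool {
        let position = self.waiters.iter().position(|&w| w == who);
        match position {
            Some(i) => {
                self.waiters.remove(i);
                true
            }
            None => false,
        }
    }

    /// Guard drop: ownership passes to the next waiter, which is returned
    /// so the caller can unpark it.
    pub fn unlock(&mut self) -> Option<ActorId> {
        self.holder = self.waiters.pop_front();
        self.holder
    }
}
```

Handing the lock straight to the popped waiter, rather than clearing the holder and letting actors race, is what keeps the queue FIFO in this model.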

+
+
+
io
+
Layer 1 · machinery
+

Two background OS threads: an epoll thread (waits on fds with EPOLLONESHOT; on ready, pushes FdReady completion) and a pool thread (runs blocking closures inside catch_unwind; pushes Blocking completion). Both write a wake pipe byte to stir the scheduler. Completions are drained inside schedule_loop.

+
+
+
timer
+
Layer 0 · primitive
+

BinaryHeap<Reverse<Entry>> = min-heap by deadline. Two Reason variants: Sleep (unpark unconditionally) and WaitTimeout (call target.on_timeout()). No cancellation — stale entries are no-ops on pop. Entries inserted by sleep() and mutex::lock_timeout().
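A min-heap by deadline built from BinaryHeap<Reverse<Entry>> can be sketched as follows. The field names, the `u64` deadline, and the `pop_expired` helper are assumptions for illustration; only the Reverse wrapper and the two Reason variants come from the description above.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// The two wake-up reasons named in the timer description.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Reason {
    Sleep,
    WaitTimeout,
}

/// Field order matters: the derived `Ord` compares `deadline` first.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Entry {
    pub deadline: u64, // hypothetical unit, e.g. nanoseconds since start
    pub actor: u32,    // hypothetical actor id
    pub reason: Reason,
}

/// `BinaryHeap` is a max-heap; wrapping entries in `Reverse` makes the
/// earliest deadline surface first.
pub struct Timers {
    heap: BinaryHeap<Reverse<Entry>>,
}

impl Timers {
    pub fn new() -> Self {
        Timers { heap: BinaryHeap::new() }
    }

    pub fn insert(&mut self, deadline: u64, actor: u32, reason: Reason) {
        self.heap.push(Reverse(Entry { deadline, actor, reason }));
    }

    /// Pops every entry whose deadline is at or before `now`, earliest first.
    pub fn pop_expired(&mut self, now: u64) -> Vec<Entry> {
        let mut due = Vec::new();
        while let Some(&Reverse(entry)) = self.heap.peek() {
            if entry.deadline > now {
                break;
            }
            self.heap.pop();
            due.push(entry);
        }
        due
    }
}
```

With no cancellation, the caller is the one that decides whether a popped WaitTimeout entry still applies; an entry for an actor that already got its lock is simply ignored.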

+
+
+
scheduler
+
Layer 2 · public facade
+

Thin facade. Exposes spawn, yield_now, park_current, unpark, sleep, block_on_io, wait_readable, wait_writable, run. All delegate to runtime. Also owns JoinHandle and the NoPreempt RAII guard.

+
+
+
supervisor
+
Layer 0 · primitive
+

Just the Signal enum: Exit(Pid) or Panic(Pid, Box<dyn Any+Send>). No restart logic — that's user-space policy. Signals are delivered via the supervisor actor's own channel (Sender<Signal> stored in the child's slot).

+
+
+
+ +
+ + +
+ +

Who Imports What

+

The critical insight: runtime.rs is the hub. Every substantive module either feeds into it or is orchestrated by it. scheduler.rs is purely a facade — it imports runtime and re-exports it through the public API.

+ +
+[Figure: dependency hub. Center: runtime.rs (SharedState · schedule_loop). Surrounding modules: stack (Stack::new()), context (switch fns), preempt (RDTSC + hook), actor (trampoline), timer (min-heap), io (epoll + pool), supervisor (Signal enum), channel (calls unpark()), mutex (calls unpark()). Top: scheduler.rs / lib.rs (public API re-exports · GlobalAlloc). Legend: runtime calls unpark() via scheduler; channel/mutex call unpark() directly.]
+ +
+ +

Circular dependency: channel and mutex call scheduler::unpark(), which calls into runtime. And runtime's schedule_loop resumes actors that run channel/mutex code. This is intentional — it's the cooperative unpark mechanism. It works because unpark() never blocks and preemption is disabled while holding any smarm internal lock.

+
+
+ +
+ + +
+ +

What Happens When You Call run(f)

+

Starting from user code calling smarm::run(|| { ... }). The single-threaded run() is a wrapper around runtime::init(Config::exact(1)).run(f).

+ +
+
+
1
+
+

Install panic hook (once)

+

A OnceLock guard installs a custom panic hook that suppresses output inside actor context. Without this, concurrent actor panics can deadlock Rust's default backtrace printer (non-reentrant internal lock). The previous hook is chained for panics outside actors.

+
+
+
+
2
+
+

Start IoThread io.rs

+

Creates a non-blocking wake pipe (O_NONBLOCK). Creates an epollfd. Creates a shutdown pipe and registers it in the epollfd. Spawns the epoll thread (epoll_wait loop) and the pool thread (blocking-work mpsc receiver). Both share a completion VecDeque behind a mutex.

+
+
+
+
3
+
+

Install RUNTIME thread-local runtime.rs

+

Arc<RuntimeInner> is cloned into the calling thread's RUNTIME thread-local. This makes with_runtime() work on the calling thread immediately — needed for the next step.

+
+
+
+
4
+
+

Spawn initial actor scheduler.rs

+

Calls scheduler::spawn(f). This locks SharedState, allocates a slot, creates a Stack via mmap, calls init_actor_stack() to write the initial register frame (trampoline address + 6 zero GPR slots), stores the closure in pending_closures, pushes the PID to the run queue, returns a JoinHandle.

+
+
+
+
5
+
+

Spawn N-1 OS scheduler threads

+

For each extra thread: clone Arc<RuntimeInner>, spawn OS thread, set RUNTIME and SCHED_SLOT thread-locals, enter schedule_loop. Thread 0 is the calling thread.

+
+
+
+
6
+
+

Enter schedule_loop on thread 0 runtime.rs

+

This is a loop { drain → pop → resume → handle-intent }. Thread 0 blocks here until the run queue is empty and no timers or IO are pending. All actors run inside this loop. This call does not return until the program is done.

+
+
+
+
7
+
+

Shutdown sequence

+

All scheduler threads return from schedule_loop. OS threads are joined. IoThread::drop() is called: writes shutdown pipe → epoll thread exits; drops the mpsc sender → pool thread exits; closes all fds. SharedState is cleared for potential next run() call.

+
+
+
+
+ +
+ + +
+ +

The Yield → Schedule → Resume Cycle

+

This is the heartbeat of the entire runtime. Every context switch follows exactly this path, whether triggered by a cooperative yield, preemption, channel recv, mutex contention, or IO wait.

+ +
+[Figure: yield cycle across three lanes (actor stack, scheduler OS thread, shared state). Actor code running (PREEMPTION_ENABLED = true) → yield triggered (set YieldIntent, call switch_to_sched()) → x86-64 naked asm (push rbx, rbp, r12-r15; save actor rsp → ACTOR_SP TL) → rsp swap → scheduler resumes (pop rbx, rbp, r12-r15; ret → back in schedule_loop()) → post-yield handling (PREEMPTION_ENABLED = false; check is_actor_done(); read YieldIntent) → lock shared state (save actor.sp; if Yield: push run_queue; if Park: state=Parked; if Done: finalize_actor) → pop next actor (drain timers+IO first; run_queue.pop_front()) → resume actor (set TLs → switch_to_actor()) → rsp swap → actor resumes exactly where it yielded.]
+ +

The 6 Yield Sources

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| Source | Intent set | Who re-queues | Notes |
|---|---|---|---|
| yield_now() | Yield | Scheduler immediately | Actor stays Runnable; pushed back to queue tail |
| Allocator preemption | Yield | Scheduler immediately | RDTSC check in maybe_preempt() triggers switch_to_scheduler() |
| channel::recv() (empty) | Park | channel::send() → unpark() | Receiver PID stored in channel's parked_receiver |
| mutex::lock() (contended) | Park | MutexGuard::drop() or timer timeout | FIFO waiter queue; timeout via WaitTimeout timer entry |
| sleep(d) | Park | Timer heap → schedule_loop drain | Inserts Reason::Sleep entry; scheduler unparks on pop |
| wait_readable/writable(fd) | Park | epoll thread → completion queue → scheduler | EPOLLONESHOT; one ADD → one wakeup → one DEL per call |
+
+ +
+ + +
+ +

New Actor From First Resume

+

Spawning is the trickiest part of the runtime. An actor's first resume is fundamentally different from subsequent ones because we can't "call" into a new stack: we have to ret into it.

+ +
+
+
1
+
+

scheduler::spawn(f) called

+

Allocates a slot from free list or grows the slots vec. Assigns Pid(index, generation). Creates a Stack (64 KB mmap + guard page).

+
+
+
+
2
+
+

Initial stack frame written context::init_actor_stack()

+

Starting from (top & ~15) - 8 (aligned), pushes downward: the trampoline function pointer as the ret address, then 6 zero words for the callee-saved registers. The resulting rsp is stored as actor.sp. No actual function call has happened yet.

+
high addr ← top
+  top-8:  &trampoline   ← will be popped by 'ret'
+  top-16: 0             ← rbx
+  top-24: 0             ← rbp
+  top-32: 0             ← r12
+  top-40: 0             ← r13
+  top-48: 0             ← r14
+  top-56: 0             ← r15  ← initial rsp stored here
+
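The layout above can be checked with a toy model that treats the stack as a slice of 64-bit words and simulates the six pops plus ret of the first resume. This is an illustrative sketch only: `init_frame` and `first_resume` are hypothetical helpers, the real init_actor_stack() writes through raw pointers into the mmap'd region, and alignment is left out.

```rust
/// Callee-saved GPRs restored by the switch: rbx, rbp, r12, r13, r14, r15.
pub const SAVED_GPRS: usize = 6;

/// Writes the initial frame into `stack` (index 0 is the lowest address) and
/// returns the word index that plays the role of the initial rsp.
pub fn init_frame(stack: &mut [u64], trampoline: u64) -> usize {
    assert!(stack.len() > SAVED_GPRS, "stack too small for the initial frame");
    let mut sp = stack.len(); // one past the highest word, i.e. "top"
    sp -= 1;
    stack[sp] = trampoline; // top-8: consumed by `ret`
    for _ in 0..SAVED_GPRS {
        sp -= 1;
        stack[sp] = 0; // zeroed slot for one callee-saved register
    }
    sp // top-56 in the layout above
}

/// Simulates the first switch: pop six registers, then read the `ret` target.
pub fn first_resume(stack: &[u64], mut sp: usize) -> ([u64; SAVED_GPRS], u64) {
    let mut regs = [0u64; SAVED_GPRS];
    for reg in regs.iter_mut() {
        *reg = stack[sp];
        sp += 1;
    }
    (regs, stack[sp])
}
```

Seven words are written in total (one return address and six register slots), which matches the top-8 through top-56 offsets in the diagram.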
+
+
+
3
+
+

Closure stored separately

+

The closure Box<dyn FnOnce() + Send> goes into SharedState::pending_closures keyed by PID, not on the actor's stack. This is because we can't pass it via a register during first resume. The PID is pushed to the run queue; slot state is Runnable.

+
+
+
+
4
+
+

Scheduler picks up the PID, prepares first resume

+

Before calling switch_to_actor(), the scheduler pops the closure from pending_closures and writes it to the CURRENT_ACTOR_BOX thread-local. Then sets ACTOR_SP, sets CURRENT_PID, arms the timeslice, enables preemption.

+
+
+
+
5
+
+

First context switch lands in trampoline()

+

switch_to_actor() saves the scheduler's GPRs, loads actor.sp as the new rsp, pops the 6 zero words (restoring the "saved" registers to zero), then rets — which pops the trampoline address from the stack and jumps to it. We're now executing on the actor's stack.

+
+
+
+
6
+
+

trampoline() reads the closure and runs it

+

Takes the closure from CURRENT_ACTOR_BOX thread-local (consuming it — subsequent resumes skip this). Calls it inside panic::catch_unwind(AssertUnwindSafe(f)). The actor's code runs normally from here. Any yield (channel, mutex, preemption) calls switch_to_scheduler(); the scheduler saves actor state, processes intent, loops.

+
+
+
+
7
+
+

Actor returns → trampoline handles completion

+

If catch_unwind returns Ok(()), outcome is Exit. If it returns Err(payload), outcome is Panic(payload). Either way, outcome is written to LAST_OUTCOME thread-local, ACTOR_DONE is set to true, then switch_to_scheduler() is called for the last time. Scheduler sees is_actor_done() == true, calls finalize_actor(): delivers Signal to supervisor, unparks joiners, reclaims slot.

+
+
+
+
+ +
+ + +
+ +

Allocator-Driven Timeslicing

+ +
+
+

How it works

+

The PreemptingAllocator is installed as the process's #[global_allocator]. Its alloc(), alloc_zeroed(), and realloc() all call maybe_preempt() before delegating to the system allocator.

+

maybe_preempt() decrements a thread-local counter. Every 128 allocations (default), it reads RDTSC. If rdtsc() - timeslice_start > 300_000 cycles (~100µs at 3 GHz) and PREEMPTION_ENABLED == true, it calls switch_to_scheduler().

+

The check!() macro calls the same maybe_preempt() function — for tight loops that make no allocations.
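The "300_000 cycles (~100µs at 3 GHz)" figure is plain unit conversion, and the same arithmetic shows how the timeslice stretches on a slower clock. The helper names below are hypothetical and the TSC frequency is passed in as an assumption; nothing here measures the real counter.

```rust
/// Converts a TSC cycle budget into microseconds for a given TSC frequency.
pub fn cycles_to_micros(cycles: u64, tsc_hz: u64) -> f64 {
    cycles as f64 * 1_000_000.0 / tsc_hz as f64
}

/// Converts a target timeslice in microseconds into a cycle budget.
/// Integer division truncates, so this is exact only for whole-MHz clocks.
pub fn micros_to_cycles(micros: u64, tsc_hz: u64) -> u64 {
    micros * (tsc_hz / 1_000_000)
}
```

At 3 GHz the default budget works out to 100 µs; the same 300_000 cycles on a 2 GHz TSC is 150 µs, which is why the timeslice is only approximate across machines.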

+
+
+

Invariant: preemption must be off when holding smarm locks

+

If preemption fired while the scheduler held SharedState, the context switch would try to re-acquire the same mutex → deadlock. smarm prevents this with:

+
    +
  • PREEMPTION_ENABLED = false in the scheduler loop before/after switch_to_actor()
  • with_shared() saves and disables preemption while the mutex is held
  • NoPreempt RAII guard used in channel/mutex slow paths
  • trace::record() also disables preemption (it can allocate)
+
+ +

Known gap: tight no-alloc loops are invisible without explicit check!() calls. This is documented and by design — such loops are uncommon in message-passing workloads.

+
+
+
+ +
// preempt.rs — simplified
+pub fn maybe_preempt() {
+    ALLOC_COUNT.with(|c| {
+        let n = c.get();
+        if n == 0 {
+            c.set(ACTIVE_ALLOC_INTERVAL.with(|i| i.get()));  // reset counter
+            if PREEMPTION_ENABLED.with(|e| e.get()) {
+                let elapsed = rdtsc() - TIMESLICE_START.with(|s| s.get());
+                if elapsed > ACTIVE_TIMESLICE_CYCLES.with(|i| i.get()) {
+                    unsafe { switch_to_scheduler() };  // YieldIntent::Yield
+                }
+            }
+        } else {
+            c.set(n - 1);
+        }
+    });
+}
+
+ +
+ + +
+ +

Two Background Threads, One Wake Pipe

+ +
+[Figure: IO data flow. Actor: calls wait_readable(fd) or block_on_io(f) → park_current() → state = Parked. epoll thread: epoll_wait(-1) loop; EPOLLONESHOT per fd; on ready: push FdReady, write wake_pipe; on shutdown pipe: exit. pool thread: mpsc::recv() loop; catch_unwind(closure); push Blocking result; write wake_pipe; tx drop → exit. completions: Arc<Mutex<VecDeque>> holding FdReady { fd, events } and Blocking { pid, result }; drained by schedule_loop. scheduler: poll(wake_fd); drain completions; FdReady → lookup waiters[fd], unpark(pid); Blocking → store in slot, unpark. Edges: epoll_ctl ADD; submit(closure); drain; wake pipe write.]
+ +
+ 📎 +

epoll_ctl ADD/DEL is called by the scheduler thread directly on the epollfd. This is legal even while the epoll thread is inside epoll_wait: the epoll_wait(2) man page notes that another thread may add a file descriptor to an epoll instance that is being waited on. Avoids needing a second command channel.

+
+
+ +
+ + +
+ +

Things That Would Bite You

+ +
+
+
Lost-wakeup window
+

Between registering as a channel's parked_receiver and calling park_current(), a sender could call unpark(). At that moment the actor is still Runnable, so unpark() sets pending_unpark = true instead of re-queuing. The scheduler checks this flag after the Park yield and re-queues immediately rather than parking. This flag also protects epoll and mutex paths.
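The pending_unpark handshake can be written down as two small transitions. A minimal sketch, assuming a two-state slot; `State`, `Slot`, `unpark`, and `handle_park` are hypothetical names, and the real scheduler performs these steps while holding SharedState.

```rust
/// Simplified slot states for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Runnable,
    Parked,
}

/// The two fields that matter for the lost-wakeup window.
pub struct Slot {
    pub state: State,
    pub pending_unpark: bool,
}

/// Called by a sender, a guard drop, or a timer. Returns true if the actor
/// should be pushed onto the run queue right now.
pub fn unpark(slot: &mut Slot) -> bool {
    match slot.state {
        State::Parked => {
            slot.state = State::Runnable;
            true
        }
        State::Runnable => {
            // The actor has not finished parking yet: remember the wakeup.
            slot.pending_unpark = true;
            false
        }
    }
}

/// Called by the scheduler after the actor yields with a Park intent.
/// Returns true if the actor must be re-queued instead of parked.
pub fn handle_park(slot: &mut Slot) -> bool {
    if slot.pending_unpark {
        slot.pending_unpark = false;
        true
    } else {
        slot.state = State::Parked;
        false
    }
}
```

The flag turns an early unpark into "do not actually park", so the wakeup is never lost regardless of which side runs first.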

+
+
+
std::thread::sleep inside actor
+

Blocks the entire OS scheduler thread, starving every actor assigned to that thread. There's no detection. Use smarm::sleep(d) instead.

+
+
+
Allocations while holding SharedState
+

The with_shared() helper disables preemption while the mutex is held. But any code path that allocates inside with_shared and then tries to acquire SharedState again will deadlock. All internal smarm code is carefully structured to avoid this.

+
+
+
Global run queue mutex
+

All N scheduler threads contend on a single Mutex<SharedState>. This is the primary scalability ceiling, visible in the benchmark suite as "tokio-favored" scenarios. Identified, documented, deferred. The fix would be per-thread deques with work stealing.

+
+
+
No timer cancellation
+

When a mutex lock is granted before its timeout, the timer entry stays in the heap. It fires eventually, the callback sees "actor is no longer waiting" and no-ops. Cost is ~32 bytes and a few cycles per stale entry. Bounded by one entry per parked actor.

+
+
+
fd leak on actor death during IO wait
+

If an actor dies while waiting on an fd, the epoll registration is leaked. EPOLLONESHOT bounds damage to one stale wakeup, which the scheduler drops when it can't find the PID in waiters. Noted in io.rs as a known gap for a future pass.

+
+
+
XMM registers not saved in context switch
+

This is intentional and correct. XMM0–15 are caller-saved in the SysV AMD64 ABI. Every yield passes through a Rust call site, so the compiler has already spilled live XMM values to the actor's stack before we get to the naked asm. They're restored when the actor resumes because they're on its own stack.

+
+
+
panic = unwind is required
+

The trampoline uses catch_unwind to intercept actor panics before they reach the naked assembly shim. If a user sets panic = abort, panics kill the process instead of being caught, and the supervision tree collapses to process death. This is documented and the profile is set in Cargo.toml.

+
+
+
+ +
+ + + \ No newline at end of file