diff --git a/.gitignore b/.gitignore
index a9d37c5..01eb4b6 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,2 +1,3 @@
 target
 Cargo.lock
+smarm_trace.json
diff --git a/README.md b/README.md
index 1b4a5c7..8d7dd45 100644
--- a/README.md
+++ b/README.md
@@ -1,8 +1,8 @@
 # smarm
 
-> Silly Marks Abstract Rust Machine. A prototype green-thread actor runtime for Rust.
+> SMARM — Smarm, Marks Actor Runtime Machinery. A proof-of-concept green-thread actor runtime for Rust.
 
-Implements the core ideas in [`LOOM.md`](./LOOM.md): green-thread actors on a
+Implements the core ideas in [`Architecture.md`](./docs/Architecture.md): green-thread actors on a
 shared heap, scheduled cooperatively, communicating only by `Send` messages.
 Erlang's isolation model without Erlang's copying GC, Rust's zero-copy
 ownership transfers without async's function colouring.
@@ -58,7 +58,6 @@ tests/
   per-module integration tests
 benches/
   primes.rs    fan-out/fan-in compute, vs tokio current_thread
-LOOM.md        design intent
 ```
 
 ## Building and running
@@ -76,7 +75,26 @@ cargo bench               # primes benchmark vs tokio
 
 ## What's not here
 
-See the **Defer** section of `LOOM.md`. Notable absences: supervisor
+See the **Defer** section of `Architecture.md`. Notable absences: supervisor
 restart-intensity caps, `join!` for handle groups, stack growth via remap,
 hierarchical timer wheel, fd-wait timeouts, `Signal::Timeout`. Each is
 mechanism we know how to add; none belongs in this iteration.
+
+## Docs
+
+| Document | What it covers |
+|---|---|
+| [`Architecture.md`](./docs/Architecture.md) | Design intent, runtime model, and deferred work |
+| [`smarm - Deep Dive.html`](./docs/smarm%20-%20Deep%20Dive.html) | Generated walkthrough of the system; good starting point |
+| [`BENCHMARKS_AND_TUNING.md`](./docs/BENCHMARKS_AND_TUNING.md) | Where smarm wins and loses vs tokio, preemption knob recommendations |
+| [`benchmarks.md`](./docs/benchmarks.md) | Raw benchmark results, methodology, and tuning experiment log |
+
+## Contributing
+
+This is a personal proof-of-concept. There's no PR workflow — if you fork it
+and do something interesting, just send me an email. I'd genuinely like to
+hear about it.
+
+---
+
+<sub>The name is a recursive acronym. The M is for Marks, as in the BEAM — Bogdan/Björn's Erlang Abstract Machine, the virtual machine that runs Erlang and Elixir. smarm is not the BEAM. It just admires it from a safe distance.</sub>
diff --git a/LOOM.md b/docs/Architecture.md
similarity index 84%
rename from LOOM.md
rename to docs/Architecture.md
index 179143c..19f4663 100644
--- a/LOOM.md
+++ b/docs/Architecture.md
@@ -1,4 +1,4 @@
-# Loom
+# SMARM Architecture
 
 > Erlang-style actor concurrency for Rust, without the copies, the colors, or the GC pauses.
 
@@ -11,7 +11,7 @@ draws the boundary, the borrow checker already enforces it. What it lacks is an
 async/await is IO-centric, colors your functions, and trades stack simplicity for state-machine complexity;
 OS threads are too heavy to spawn per actor.
 
-Loom adds a third option: **green-thread actors on a shared heap**, scheduled cooperatively, with
+SMARM adds a third option: **green-thread actors on a shared heap**, scheduled cooperatively, with
 message-passing as the only cross-actor communication primitive. You get Erlang's isolation model without
 Erlang's copying GC, and you get Rust's zero-copy ownership transfers without async's cognitive overhead.
 No function coloring. No `Box<dyn Future>`. Just actors, messages, and the borrow checker doing what it
@@ -24,14 +24,14 @@ already does.
 ### Actors and scheduling
 
 Each actor is a lightweight green thread with its own heap-allocated, growable stack. Stacks are
-allocated via `mmap` with a guard page below the region; overflow is detected by the OS without Loom
+allocated via `mmap` with a guard page below the region; overflow is detected by the OS without SMARM
 polling for it. Initial stacks are small and grow by remapping on demand.
 
 The scheduler runs one OS thread per CPU. Each scheduler thread loops against a single global
 `Mutex<HashMap>` queue shared across all schedulers. If queue contention becomes a measured bottleneck
 this can be revisited; the interface will not change.
 
-Loom requires `panic = unwind`. Users who set `panic = abort` accept that supervision and actor
+SMARM requires `panic = unwind`. Users who set `panic = abort` accept that supervision and actor
 isolation are silently degraded to process death.
 
 ### Process descriptor
@@ -84,11 +84,11 @@ threshold is exceeded the actor yields. The workloads that starve a scheduler 
 data transformation — are precisely the ones doing frequent allocations, so this approximation is
 correct by construction.
 
-`RDTSC` is not monotonic across core migration; a slightly wrong timeslice is acceptable. Loom is
+`RDTSC` is not monotonic across core migration; a slightly wrong timeslice is acceptable. SMARM is
 not a real-time scheduler.
 
 Known failure mode: tight no-alloc loops are invisible to this mechanism. Actors doing sustained
-allocation-free compute must call `loom::yield_now()` explicitly, or offload to a thread pool
+allocation-free compute must call `smarm::yield_now()` explicitly, or offload to a thread pool
 outside the actor scheduler (e.g. rayon). This is documented and acceptable — such loops are rare
 in message-passing workloads.
 
@@ -99,12 +99,12 @@ An actor yields at:
 - **Channel send/recv** — the primary communication primitive
 - **Mutex contention** — attempting to lock a held `Arc<Mutex<>>` parks the actor
 - **IO** — blocking on a socket or file descriptor parks the actor until the IO thread signals readiness
-- **`loom::sleep(duration)`** — parks the actor; the timer wheel re-queues it on expiry
-- **`loom::yield_now()`** — explicit cooperative yield
+- **`smarm::sleep(duration)`** — parks the actor; the timer wheel re-queues it on expiry
+- **`smarm::yield_now()`** — explicit cooperative yield
 - **Allocator preemption** — as above
 - **Spawn** — does not yield by default; the new actor is queued and the spawner continues
 
-`std::thread::sleep` inside an actor blocks the entire OS thread and should never be used. Loom
+`std::thread::sleep` inside an actor blocks the entire OS thread and should never be used. SMARM
 may emit a warning if it can detect this.
 
 ### IO thread
@@ -112,7 +112,7 @@ may emit a warning if it can detect this.
 A single dedicated IO thread runs an `epoll`/`kqueue` loop. Actors blocking on IO register their
 file descriptor and PID; the IO thread moves them back into the global queue when the fd is ready.
 A `HashMap<RawFd, Pid>` maps fds to parked actors. Cancellation (actor dies while waiting on IO)
-deregisters the fd. This is intentionally simple and not pluggable; Loom is not a general async
+deregisters the fd. This is intentionally simple and not pluggable; SMARM is not a general async
 executor.
 
 ### Communication
@@ -155,7 +155,7 @@ sensible global default.
 
 ### Mutex timeout
 
-Every `loom::mutex` lock attempt is mediated by the scheduler. If the lock is not acquired within
+Every `smarm::mutex` lock attempt is mediated by the scheduler. If the lock is not acquired within
 a configurable timeout, the actor receives a `LockTimeout` error rather than parking forever. This
 is a hard runtime guarantee, not a convention. Default timeout is global and configurable;
 individual locks and individual call sites can override it.
@@ -165,9 +165,9 @@ individual locks and individual call sites can override it.
 Actors can spawn children and wait on a group of handles:
 
 ```rust
-let h1 = loom::spawn(|| compute_a());
-let h2 = loom::spawn(|| compute_b());
-let (a, b) = loom::join!(h1, h2);
+let h1 = smarm::spawn(|| compute_a());
+let h2 = smarm::spawn(|| compute_b());
+let (a, b) = smarm::join!(h1, h2);
 ```
 
 `join!` parks the calling actor until all handles complete. The last child to finish re-queues the
@@ -176,7 +176,7 @@ parent. This is a countdown in the parent's descriptor; no polling, no waker reg
 
 ### Timer wheel
 
-`loom::sleep` and supervision timeouts are driven by a timer wheel in the scheduler. Sleeping
+`smarm::sleep` and supervision timeouts are driven by a timer wheel in the scheduler. Sleeping
 actors are parked and re-queued by the timer thread on expiry. The timer wheel is internal
 infrastructure; its design is an implementation detail.
 
@@ -189,22 +189,29 @@ infrastructure; its design is an implementation detail.
 - **Queue contention** — if `Mutex<HashMap>` proves to be a bottleneck under profiling, evaluate
   `DashMap` or a lock-free work-stealing deque (e.g. `crossbeam-deque`). Not before.
 - **AVX-512 context save** — extend `ContextSaveArea` when there is a concrete use case.
-- **`loom::sleep` vs raw sleep semantics** — further control knobs deferred until the basic sleep
+- **`smarm::sleep` vs raw sleep semantics** — further control knobs deferred until the basic sleep
   is working and real use cases are understood.
 - **Supervision tree API** — the contract is defined; the recursive hierarchy, restart strategies,
   and introspection API are implementation work.
 - **no_std support** — the assembly shim is no_std friendly but the IO thread and allocator require
   OS primitives. Target is no_std + `alloc` on hosted platforms; bare metal is out of scope.
-- **Distribution** — Loom is a single-process runtime. No distribution protocol, no BEAM-style
+- **Distribution** — SMARM is a single-process runtime. No distribution protocol, no BEAM-style
   clustering.
 
 ---
 
-## What Loom is Not
+## What SMARM is Not
 
-- Not a drop-in replacement for Tokio. Loom does not implement `Future` or the async executor interface.
-- Not a general allocator. Loom manages actor stacks; heap allocation for actor data goes through
+- Not a drop-in replacement for Tokio. SMARM does not implement `Future` or the async executor interface.
+- Not a general allocator. SMARM manages actor stacks; heap allocation for actor data goes through
   the system allocator.
-- Not Erlang. No hot code reloading, no distribution protocol, no BEAM bytecode. Loom is a
+- Not Erlang. No hot code reloading, no distribution protocol, no BEAM bytecode. SMARM is a
   concurrency runtime, not a platform.
 - Not a real-time scheduler. Timeslice accuracy is best-effort.
+
+
+---
+
+## On names
+
+<sub>The name is a recursive acronym. The M is for Marks, as in the BEAM — Bogdan/Björn's Erlang Abstract Machine, the virtual machine that runs Erlang and Elixir. smarm is not the BEAM. It just admires it from a safe distance.</sub>
diff --git a/BENCHMARKS_AND_TUNING.md b/docs/BENCHMARKS_AND_TUNING.md
similarity index 100%
rename from BENCHMARKS_AND_TUNING.md
rename to docs/BENCHMARKS_AND_TUNING.md
diff --git a/benchmarks.md b/docs/benchmarks.md
similarity index 100%
rename from benchmarks.md
rename to docs/benchmarks.md
diff --git a/docs/smarm - Deep Dive.html b/docs/smarm - Deep Dive.html
new file mode 100644
index 0000000..4f2b25e
--- /dev/null
+++ b/docs/smarm - Deep Dive.html	
@@ -0,0 +1,1297 @@
+<!DOCTYPE html>
+<html lang="en"><head>
+<meta http-equiv="content-type" content="text/html; charset=UTF-8">
+<meta charset="UTF-8">
+<meta name="viewport" content="width=device-width, initial-scale=1.0">
+<title>smarm — Deep Dive</title>
+<link rel="preconnect" href="https://fonts.googleapis.com/">
+<link href="smarm%20-%20Deep%20Dive_files/css2.css" rel="stylesheet">
+<style>
+  :root {
+    --bg: #0d0f14;
+    --surface: #13161e;
+    --surface2: #1a1e2a;
+    --border: #252a38;
+    --accent: #5b8af5;
+    --accent2: #f5a623;
+    --accent3: #4ecdc4;
+    --accent4: #ff6b6b;
+    --accent5: #a8e6cf;
+    --text: #c8d0e0;
+    --text-dim: #606880;
+    --text-bright: #e8eaf6;
+    --code-bg: #0a0c12;
+    --green: #56d364;
+    --yellow: #f0b429;
+    --red: #f85149;
+    --purple: #bc8cff;
+  }
+
+  * { box-sizing: border-box; margin: 0; padding: 0; }
+
+  html { scroll-behavior: smooth; }
+
+  body {
+    background: var(--bg);
+    color: var(--text);
+    font-family: 'DM Sans', sans-serif;
+    font-size: 15px;
+    line-height: 1.7;
+  }
+
+  /* NAV */
+  nav {
+    position: fixed;
+    top: 0; left: 0; right: 0;
+    z-index: 100;
+    background: rgba(13,15,20,0.92);
+    backdrop-filter: blur(12px);
+    border-bottom: 1px solid var(--border);
+    padding: 0 2rem;
+    display: flex;
+    align-items: center;
+    gap: 2rem;
+    height: 52px;
+  }
+
+  .nav-brand {
+    font-family: 'JetBrains Mono', monospace;
+    font-weight: 700;
+    font-size: 1rem;
+    color: var(--accent);
+    letter-spacing: -0.02em;
+  }
+
+  nav a {
+    text-decoration: none;
+    color: var(--text-dim);
+    font-size: 0.8rem;
+    font-weight: 500;
+    letter-spacing: 0.04em;
+    text-transform: uppercase;
+    transition: color 0.2s;
+  }
+
+  nav a:hover { color: var(--text-bright); }
+
+  /* LAYOUT */
+  main {
+    max-width: 1100px;
+    margin: 0 auto;
+    padding: 80px 2rem 6rem;
+  }
+
+  section {
+    margin-bottom: 5rem;
+  }
+
+  /* TYPOGRAPHY */
+  .section-label {
+    font-family: 'JetBrains Mono', monospace;
+    font-size: 0.65rem;
+    letter-spacing: 0.2em;
+    text-transform: uppercase;
+    color: var(--accent);
+    margin-bottom: 0.5rem;
+  }
+
+  h1 {
+    font-family: 'DM Serif Display', serif;
+    font-size: clamp(2.5rem, 5vw, 4rem);
+    color: var(--text-bright);
+    line-height: 1.1;
+    margin-bottom: 1rem;
+  }
+
+  h2 {
+    font-family: 'DM Serif Display', serif;
+    font-size: 2rem;
+    color: var(--text-bright);
+    line-height: 1.2;
+    margin-bottom: 0.4rem;
+  }
+
+  h3 {
+    font-family: 'JetBrains Mono', monospace;
+    font-size: 0.9rem;
+    font-weight: 500;
+    color: var(--accent2);
+    margin-bottom: 0.8rem;
+    letter-spacing: 0.02em;
+  }
+
+  p {
+    color: var(--text);
+    margin-bottom: 1rem;
+    max-width: 72ch;
+  }
+
+  p strong { color: var(--text-bright); font-weight: 500; }
+
+  code {
+    font-family: 'JetBrains Mono', monospace;
+    font-size: 0.82em;
+    background: var(--code-bg);
+    color: var(--accent3);
+    padding: 0.1em 0.35em;
+    border-radius: 3px;
+    border: 1px solid var(--border);
+  }
+
+  pre {
+    font-family: 'JetBrains Mono', monospace;
+    font-size: 0.8rem;
+    background: var(--code-bg);
+    border: 1px solid var(--border);
+    border-radius: 8px;
+    padding: 1.2rem 1.4rem;
+    overflow-x: auto;
+    line-height: 1.6;
+    color: var(--text);
+    margin-bottom: 1.5rem;
+  }
+
+  pre .kw  { color: var(--purple); }
+  pre .fn  { color: var(--accent); }
+  pre .ty  { color: var(--accent3); }
+  pre .st  { color: var(--accent5); }
+  pre .cm  { color: var(--text-dim); font-style: italic; }
+  pre .nu  { color: var(--accent2); }
+  pre .mc  { color: var(--accent4); }
+
+  /* HERO */
+  .hero {
+    padding: 4rem 0 2rem;
+  }
+
+  .hero-tagline {
+    font-family: 'DM Serif Display', serif;
+    font-style: italic;
+    font-size: 1.2rem;
+    color: var(--text-dim);
+    margin-bottom: 2rem;
+  }
+
+  .pitch-row {
+    display: flex;
+    gap: 1.5rem;
+    margin-top: 2rem;
+    flex-wrap: wrap;
+  }
+
+  .pitch-card {
+    flex: 1;
+    min-width: 200px;
+    background: var(--surface);
+    border: 1px solid var(--border);
+    border-radius: 10px;
+    padding: 1.2rem 1.4rem;
+  }
+
+  .pitch-card .label {
+    font-family: 'JetBrains Mono', monospace;
+    font-size: 0.65rem;
+    letter-spacing: 0.15em;
+    text-transform: uppercase;
+    color: var(--text-dim);
+    margin-bottom: 0.4rem;
+  }
+
+  .pitch-card p {
+    font-size: 0.9rem;
+    color: var(--text);
+    margin: 0;
+  }
+
+  /* SVG DIAGRAMS */
+  .diagram-wrap {
+    background: var(--surface);
+    border: 1px solid var(--border);
+    border-radius: 12px;
+    padding: 2rem;
+    overflow-x: auto;
+    margin-bottom: 1.5rem;
+  }
+
+  .diagram-wrap svg {
+    display: block;
+    margin: 0 auto;
+  }
+
+  /* SEQUENCE / FLOW */
+  .flow-diagram {
+    display: flex;
+    flex-direction: column;
+    gap: 0;
+    max-width: 800px;
+  }
+
+  .flow-step {
+    display: flex;
+    gap: 1rem;
+    position: relative;
+  }
+
+  .flow-step::before {
+    content: '';
+    position: absolute;
+    left: 19px;
+    top: 38px;
+    bottom: -2px;
+    width: 2px;
+    background: var(--border);
+  }
+
+  .flow-step:last-child::before { display: none; }
+
+  .flow-num {
+    width: 40px;
+    height: 40px;
+    flex-shrink: 0;
+    background: var(--surface2);
+    border: 1.5px solid var(--accent);
+    border-radius: 50%;
+    display: flex;
+    align-items: center;
+    justify-content: center;
+    font-family: 'JetBrains Mono', monospace;
+    font-size: 0.75rem;
+    font-weight: 700;
+    color: var(--accent);
+    margin-top: 0.5rem;
+    position: relative;
+    z-index: 1;
+  }
+
+  .flow-body {
+    padding: 0.5rem 0 1.6rem;
+    flex: 1;
+  }
+
+  .flow-body h4 {
+    font-family: 'JetBrains Mono', monospace;
+    font-size: 0.82rem;
+    font-weight: 500;
+    color: var(--text-bright);
+    margin-bottom: 0.25rem;
+  }
+
+  .flow-body p {
+    font-size: 0.88rem;
+    color: var(--text-dim);
+    margin: 0;
+  }
+
+  .flow-body .tag {
+    display: inline-block;
+    font-family: 'JetBrains Mono', monospace;
+    font-size: 0.65rem;
+    background: var(--code-bg);
+    border: 1px solid var(--border);
+    border-radius: 4px;
+    padding: 0.1em 0.4em;
+    color: var(--accent3);
+    margin-right: 0.3rem;
+    vertical-align: middle;
+  }
+
+  /* MODULE TABLE */
+  .module-grid {
+    display: grid;
+    grid-template-columns: 1fr 1fr;
+    gap: 1rem;
+    margin-bottom: 1.5rem;
+  }
+
+  @media (max-width: 700px) {
+    .module-grid { grid-template-columns: 1fr; }
+  }
+
+  .module-card {
+    background: var(--surface);
+    border: 1px solid var(--border);
+    border-radius: 10px;
+    padding: 1rem 1.2rem;
+    transition: border-color 0.2s;
+  }
+
+  .module-card:hover {
+    border-color: var(--accent);
+  }
+
+  .module-name {
+    font-family: 'JetBrains Mono', monospace;
+    font-size: 0.85rem;
+    font-weight: 700;
+    color: var(--accent);
+    margin-bottom: 0.2rem;
+  }
+
+  .module-layer {
+    font-family: 'JetBrains Mono', monospace;
+    font-size: 0.6rem;
+    text-transform: uppercase;
+    letter-spacing: 0.1em;
+    color: var(--text-dim);
+    margin-bottom: 0.5rem;
+  }
+
+  .module-card p {
+    font-size: 0.82rem;
+    color: var(--text-dim);
+    margin: 0;
+  }
+
+  /* DIVIDER */
+  .divider {
+    height: 1px;
+    background: linear-gradient(to right, transparent, var(--border), transparent);
+    margin: 4rem 0;
+  }
+
+  /* THREAD STATE TABLE */
+  .state-table {
+    width: 100%;
+    border-collapse: collapse;
+    font-family: 'JetBrains Mono', monospace;
+    font-size: 0.78rem;
+    margin-bottom: 1.5rem;
+  }
+
+  .state-table th {
+    background: var(--surface2);
+    color: var(--text-dim);
+    padding: 0.6rem 1rem;
+    text-align: left;
+    letter-spacing: 0.06em;
+    text-transform: uppercase;
+    font-size: 0.65rem;
+    border-bottom: 1px solid var(--border);
+  }
+
+  .state-table td {
+    padding: 0.6rem 1rem;
+    border-bottom: 1px solid var(--border);
+    color: var(--text);
+  }
+
+  .state-table tr:last-child td { border-bottom: none; }
+  .state-table tr:hover td { background: var(--surface2); }
+
+  .pill {
+    display: inline-block;
+    padding: 0.15em 0.5em;
+    border-radius: 20px;
+    font-size: 0.7em;
+    font-weight: 700;
+    letter-spacing: 0.04em;
+  }
+
+  .pill-green { background: rgba(86,211,100,0.12); color: var(--green); }
+  .pill-yellow { background: rgba(240,180,41,0.12); color: var(--yellow); }
+  .pill-red { background: rgba(248,81,73,0.12); color: var(--red); }
+  .pill-blue { background: rgba(91,138,245,0.12); color: var(--accent); }
+
+  /* CALLOUT */
+  .callout {
+    display: flex;
+    gap: 1rem;
+    background: var(--surface);
+    border: 1px solid var(--border);
+    border-left: 3px solid var(--accent2);
+    border-radius: 0 8px 8px 0;
+    padding: 1rem 1.2rem;
+    margin-bottom: 1.5rem;
+    max-width: 72ch;
+  }
+
+  .callout-icon {
+    font-size: 1.1rem;
+    flex-shrink: 0;
+    margin-top: 0.1rem;
+  }
+
+  .callout p {
+    font-size: 0.88rem;
+    margin: 0;
+    color: var(--text);
+  }
+
+  /* COLUMNS */
+  .two-col {
+    display: grid;
+    grid-template-columns: 1fr 1fr;
+    gap: 2rem;
+  }
+
+  @media (max-width: 700px) {
+    .two-col { grid-template-columns: 1fr; }
+  }
+
+  /* TICK */
+  .warn { color: var(--yellow); }
+  .good { color: var(--green); }
+
+</style>
+</head>
+<body>
+
+<nav>
+  <span class="nav-brand">smarm v0.3</span>
+  <a href="#overview">Overview</a>
+  <a href="#modules">Modules</a>
+  <a href="#deps">Dep Graph</a>
+  <a href="#init">Init</a>
+  <a href="#yield-cycle">Yield Cycle</a>
+  <a href="#spawn">Spawn</a>
+  <a href="#preempt">Preemption</a>
+  <a href="#io">IO</a>
+  <a href="#gotchas">Gotchas</a>
+</nav>
+
+<main>
+
+<!-- HERO -->
+<section class="hero" id="overview">
+  <div class="section-label">smarm: Smarm, Marks Actor Runtime Machinery</div>
+  <h1>Green-Thread Actor Runtime</h1>
+  <p class="hero-tagline">Erlang's isolation model. Rust's zero-copy ownership. No function colouring.</p>
+  <p>
+    smarm is a prototype concurrent runtime for Rust. Each <strong>actor</strong> is a green thread with its own
+    <code>mmap</code>'d stack. N OS threads share a single global run queue. Actors communicate
+    exclusively via <strong>message passing</strong> (owned values over channels); no shared mutable state
+    without an explicit <code>Arc&lt;Mutex&lt;T&gt;&gt;</code>.
+  </p>
+  <p>
+    Preemption is <strong>allocator-driven</strong>: every Nth heap allocation, smarm reads RDTSC and yields
+    the actor if its timeslice has expired. No OS signals, no separate timer thread for scheduling.
+  </p>
+
+  <div class="pitch-row">
+    <div class="pitch-card">
+      <div class="label">vs async/await</div>
+      <p>No function colouring. No <code>Box&lt;dyn Future&gt;</code>. No poll state machines. Just plain Rust functions that block.</p>
+    </div>
+    <div class="pitch-card">
+      <div class="label">vs OS threads</div>
+      <p>64 KB stacks instead of 8 MB. Context switch in ~10–20 ns (6 GPR saves + ret) instead of a round trip through the kernel.</p>
+    </div>
+    <div class="pitch-card">
+      <div class="label">vs Erlang BEAM</div>
+      <p>Zero-copy ownership via Rust's type system. No GC pause. No copying GC. Message passing is a <code>move</code>, not a clone.</p>
+    </div>
+  </div>
+</section>
+
+<div class="divider"></div>
+
+<!-- MODULE MAP -->
+<section id="modules">
+  <div class="section-label">Architecture</div>
+  <h2>Module Map</h2>
+  <p>13 source modules, three rough layers. The bottom layer has zero smarm dependencies; middle layer builds the runtime machinery; top layer is public API.</p>
+
+  <div class="diagram-wrap">
+    <svg width="900" height="420" viewBox="0 0 900 420" xmlns="http://www.w3.org/2000/svg" style="max-width:100%;font-family:'JetBrains Mono',monospace">
+      <defs>
+        <marker id="arr" markerWidth="8" markerHeight="8" refX="6" refY="3" orient="auto">
+          <path d="M0,0 L0,6 L8,3 z" fill="#252a38"></path>
+        </marker>
+        <marker id="arr-acc" markerWidth="8" markerHeight="8" refX="6" refY="3" orient="auto">
+          <path d="M0,0 L0,6 L8,3 z" fill="#5b8af5"></path>
+        </marker>
+      </defs>
+
+      <!-- Layer labels -->
+      <text x="18" y="62" fill="#606880" font-size="9" letter-spacing="2">LAYER 0 — PRIMITIVES</text>
+      <text x="18" y="192" fill="#606880" font-size="9" letter-spacing="2">LAYER 1 — RUNTIME MACHINERY</text>
+      <text x="18" y="322" fill="#606880" font-size="9" letter-spacing="2">LAYER 2 — PUBLIC API / FACADE</text>
+
+      <!-- Layer 0 boxes -->
+      <!-- stack -->
+      <rect x="40" y="70" width="100" height="48" rx="6" fill="#13161e" stroke="#252a38" stroke-width="1.5"></rect>
+      <text x="90" y="89" fill="#5b8af5" font-size="11" font-weight="700" text-anchor="middle">stack</text>
+      <text x="90" y="105" fill="#606880" font-size="9" text-anchor="middle">mmap + guard</text>
+
+      <!-- context -->
+      <rect x="160" y="70" width="100" height="48" rx="6" fill="#13161e" stroke="#252a38" stroke-width="1.5"></rect>
+      <text x="210" y="89" fill="#5b8af5" font-size="11" font-weight="700" text-anchor="middle">context</text>
+      <text x="210" y="105" fill="#606880" font-size="9" text-anchor="middle">naked asm CSW</text>
+
+      <!-- preempt -->
+      <rect x="280" y="70" width="100" height="48" rx="6" fill="#13161e" stroke="#252a38" stroke-width="1.5"></rect>
+      <text x="330" y="89" fill="#5b8af5" font-size="11" font-weight="700" text-anchor="middle">preempt</text>
+      <text x="330" y="105" fill="#606880" font-size="9" text-anchor="middle">alloc hook + RDTSC</text>
+
+      <!-- pid -->
+      <rect x="400" y="70" width="100" height="48" rx="6" fill="#13161e" stroke="#252a38" stroke-width="1.5"></rect>
+      <text x="450" y="89" fill="#5b8af5" font-size="11" font-weight="700" text-anchor="middle">pid</text>
+      <text x="450" y="105" fill="#606880" font-size="9" text-anchor="middle">(index, gen) pair</text>
+
+      <!-- timer -->
+      <rect x="520" y="70" width="100" height="48" rx="6" fill="#13161e" stroke="#252a38" stroke-width="1.5"></rect>
+      <text x="570" y="89" fill="#5b8af5" font-size="11" font-weight="700" text-anchor="middle">timer</text>
+      <text x="570" y="105" fill="#606880" font-size="9" text-anchor="middle">min-heap</text>
+
+      <!-- supervisor -->
+      <rect x="640" y="70" width="110" height="48" rx="6" fill="#13161e" stroke="#252a38" stroke-width="1.5"></rect>
+      <text x="695" y="89" fill="#5b8af5" font-size="11" font-weight="700" text-anchor="middle">supervisor</text>
+      <text x="695" y="105" fill="#606880" font-size="9" text-anchor="middle">Signal enum only</text>
+
+      <!-- trace -->
+      <rect x="770" y="70" width="100" height="48" rx="6" fill="#13161e" stroke="#252a38" stroke-width="1.5"></rect>
+      <text x="820" y="89" fill="#5b8af5" font-size="11" font-weight="700" text-anchor="middle">trace</text>
+      <text x="820" y="105" fill="#606880" font-size="9" text-anchor="middle">Chrome JSON opt</text>
+
+      <!-- Layer 1 boxes -->
+      <!-- actor -->
+      <rect x="100" y="200" width="110" height="48" rx="6" fill="#1a1e2a" stroke="#3a4060" stroke-width="1.5"></rect>
+      <text x="155" y="219" fill="#4ecdc4" font-size="11" font-weight="700" text-anchor="middle">actor</text>
+      <text x="155" y="235" fill="#606880" font-size="9" text-anchor="middle">trampoline + TLs</text>
+
+      <!-- io -->
+      <rect x="230" y="200" width="110" height="48" rx="6" fill="#1a1e2a" stroke="#3a4060" stroke-width="1.5"></rect>
+      <text x="285" y="219" fill="#4ecdc4" font-size="11" font-weight="700" text-anchor="middle">io</text>
+      <text x="285" y="235" fill="#606880" font-size="9" text-anchor="middle">epoll + pool thread</text>
+
+      <!-- channel -->
+      <rect x="360" y="200" width="110" height="48" rx="6" fill="#1a1e2a" stroke="#3a4060" stroke-width="1.5"></rect>
+      <text x="415" y="219" fill="#4ecdc4" font-size="11" font-weight="700" text-anchor="middle">channel</text>
+      <text x="415" y="235" fill="#606880" font-size="9" text-anchor="middle">MPSC, park/unpark</text>
+
+      <!-- mutex -->
+      <rect x="490" y="200" width="110" height="48" rx="6" fill="#1a1e2a" stroke="#3a4060" stroke-width="1.5"></rect>
+      <text x="545" y="219" fill="#4ecdc4" font-size="11" font-weight="700" text-anchor="middle">mutex</text>
+      <text x="545" y="235" fill="#606880" font-size="9" text-anchor="middle">timeout + FIFO</text>
+
+      <!-- runtime -->
+      <rect x="620" y="200" width="110" height="48" rx="6" fill="#1a1e2a" stroke="#3a4060" stroke-width="1.5"></rect>
+      <text x="675" y="219" fill="#4ecdc4" font-size="11" font-weight="700" text-anchor="middle">runtime</text>
+      <text x="675" y="235" fill="#606880" font-size="9" text-anchor="middle">SharedState + loop</text>
+
+      <!-- Layer 2 -->
+      <!-- scheduler -->
+      <rect x="270" y="330" width="130" height="48" rx="6" fill="#0a0c12" stroke="#5b8af5" stroke-width="1.5"></rect>
+      <text x="335" y="349" fill="#f5a623" font-size="11" font-weight="700" text-anchor="middle">scheduler</text>
+      <text x="335" y="365" fill="#606880" font-size="9" text-anchor="middle">public API facade</text>
+
+      <!-- lib -->
+      <rect x="430" y="330" width="130" height="48" rx="6" fill="#0a0c12" stroke="#5b8af5" stroke-width="1.5"></rect>
+      <text x="495" y="349" fill="#f5a623" font-size="11" font-weight="700" text-anchor="middle">lib.rs</text>
+      <text x="495" y="365" fill="#606880" font-size="9" text-anchor="middle">re-exports + GlobalAlloc</text>
+
+      <!-- ARROWS: L0 → L1 (faint) -->
+      <!-- stack → actor -->
+      <line x1="115" y1="118" x2="130" y2="198" stroke="#252a38" stroke-width="1" marker-end="url(#arr)"></line>
+      <!-- context → actor -->
+      <line x1="210" y1="118" x2="170" y2="198" stroke="#252a38" stroke-width="1" marker-end="url(#arr)"></line>
+      <!-- context → runtime -->
+      <line x1="240" y1="118" x2="640" y2="198" stroke="#252a38" stroke-width="1" stroke-dasharray="3,3"></line>
+      <!-- preempt → runtime -->
+      <line x1="350" y1="118" x2="650" y2="198" stroke="#252a38" stroke-width="1" marker-end="url(#arr)"></line>
+      <!-- pid → actor -->
+      <line x1="430" y1="118" x2="200" y2="198" stroke="#252a38" stroke-width="1" stroke-dasharray="3,3"></line>
+      <!-- pid → runtime -->
+      <line x1="470" y1="118" x2="655" y2="198" stroke="#252a38" stroke-width="1" stroke-dasharray="3,3"></line>
+      <!-- timer → runtime -->
+      <line x1="570" y1="118" x2="685" y2="198" stroke="#252a38" stroke-width="1" marker-end="url(#arr)"></line>
+      <!-- timer → mutex -->
+      <line x1="545" y1="118" x2="530" y2="198" stroke="#252a38" stroke-width="1" marker-end="url(#arr)"></line>
+      <!-- supervisor → runtime (via channel) -->
+      <line x1="695" y1="118" x2="720" y2="198" stroke="#252a38" stroke-width="1" stroke-dasharray="3,3"></line>
+
+      <!-- ARROWS: L1 → L2 -->
+      <!-- runtime → scheduler -->
+      <line x1="645" y1="248" x2="390" y2="328" stroke="#5b8af5" stroke-width="1.5" marker-end="url(#arr-acc)" stroke-dasharray="4,2"></line>
+      <!-- actor → scheduler -->
+      <line x1="185" y1="248" x2="295" y2="328" stroke="#5b8af5" stroke-width="1.5" marker-end="url(#arr-acc)"></line>
+      <!-- channel → scheduler -->
+      <line x1="415" y1="248" x2="380" y2="328" stroke="#5b8af5" stroke-width="1.5" marker-end="url(#arr-acc)"></line>
+      <!-- mutex → scheduler -->
+      <line x1="525" y1="248" x2="420" y2="328" stroke="#5b8af5" stroke-width="1.5" marker-end="url(#arr-acc)" stroke-dasharray="4,2"></line>
+      <!-- runtime → lib -->
+      <line x1="700" y1="248" x2="540" y2="328" stroke="#5b8af5" stroke-width="1.5" marker-end="url(#arr-acc)" stroke-dasharray="4,2"></line>
+
+      <!-- Legend -->
+      <line x1="40" y1="400" x2="70" y2="400" stroke="#252a38" stroke-width="1.5"></line>
+      <text x="76" y="404" fill="#606880" font-size="9">uses directly</text>
+      <line x1="160" y1="400" x2="190" y2="400" stroke="#252a38" stroke-width="1.5" stroke-dasharray="3,3"></line>
+      <text x="196" y="404" fill="#606880" font-size="9">uses via type (Pid etc)</text>
+      <line x1="340" y1="400" x2="370" y2="400" stroke="#5b8af5" stroke-width="1.5"></line>
+      <text x="376" y="404" fill="#606880" font-size="9">public API edge</text>
+    </svg>
+  </div>
+
+  <div class="module-grid">
+    <div class="module-card">
+      <div class="module-name">stack</div>
+      <div class="module-layer">Layer 0 · primitive</div>
+      <p>Calls <code>mmap</code> for a contiguous region, then <code>mprotect</code>'s the bottom page to <code>PROT_NONE</code>. Stack grows downward; overflow hits the guard page → SIGSEGV. Implements <code>Drop</code> via <code>munmap</code>. Zero smarm dependencies.</p>
+    </div>
+    <div class="module-card">
+      <div class="module-name">context</div>
+      <div class="module-layer">Layer 0 · primitive</div>
+      <p>Two <code>#[naked]</code> assembly functions (<code>switch_to_actor</code>, <code>switch_to_scheduler</code>). Each saves the 6 callee-saved GPRs, swaps <code>rsp</code>, restores, and <code>ret</code>s. Thread-locals hold each side's saved stack pointer. XMM registers are not saved here: they are caller-saved under the System V ABI, so the compiler spills any live values at Rust call sites.</p>
+    </div>
+    <div class="module-card">
+      <div class="module-name">preempt</div>
+      <div class="module-layer">Layer 0 · primitive</div>
+      <p>Implements <code>GlobalAlloc</code> — wraps <code>System</code> allocator. On every Nth alloc, reads RDTSC. If elapsed &gt; <code>timeslice_cycles</code> and preemption is enabled, calls <code>switch_to_scheduler()</code>. Thread-locals hold the countdown, start timestamp, and an enabled flag (scheduler disables it to prevent self-preemption).</p>
+    </div>
+    <div class="module-card">
+      <div class="module-name">pid</div>
+      <div class="module-layer">Layer 0 · primitive</div>
+      <p><code>struct Pid(u32, u32)</code>, holding an index and a generation. Index = slot in the actor table. Generation increments on actor death. Stale handles are detectable: a <code>Pid</code> with the wrong generation fails slot lookup rather than silently addressing a new actor. This solves ABA without exhausting PID space.</p>
+    </div>
+    <div class="module-card">
+      <div class="module-name">actor</div>
+      <div class="module-layer">Layer 1 · machinery</div>
+      <p>Owns the <code>Stack</code>. Defines the <code>trampoline</code>: every actor's first <code>ret</code> lands here. Trampoline reads the closure from a thread-local, calls it inside <code>catch_unwind</code>, writes the <code>Outcome</code> to another thread-local, then yields back to the scheduler. Thread-locals: current PID, pending closure, last outcome, done flag.</p>
+    </div>
+    <div class="module-card">
+      <div class="module-name">runtime</div>
+      <div class="module-layer">Layer 1 · core</div>
+      <p>The heaviest module. Contains <code>SharedState</code> (slot table, run queue, timers, IO), <code>RuntimeInner</code> (shared state behind a mutex, per-thread stats, drain lock), and <code>schedule_loop</code>, the main scheduler loop that drains timers, drains IO completions, pops actors, resumes them, and handles the post-yield intent (re-queue vs park vs finalize).</p>
+    </div>
+    <div class="module-card">
+      <div class="module-name">channel</div>
+      <div class="module-layer">Layer 1 · primitive</div>
+      <p>Unbounded MPSC. Inner state is <code>Arc&lt;Mutex&lt;Inner&lt;T&gt;&gt;&gt;</code> — senders are clonable, last drop closes channel. <code>recv()</code>: checks queue; if empty, registers self as <code>parked_receiver</code>, releases the lock, calls <code>park_current()</code>. <code>send()</code>: pushes, takes the parked PID, calls <code>unpark(pid)</code>.</p>
+    </div>
+    <div class="module-card">
+      <div class="module-name">mutex</div>
+      <div class="module-layer">Layer 1 · primitive</div>
+      <p>Actor-aware mutex with mandatory timeout (default 30s). Fast path: no holder → grant immediately. Slow path: join FIFO waiter queue, insert a <code>WaitTimeout</code> timer, park. On timer expiry: if actor is still in waiters, unpark it with <code>LockTimeout</code>. On guard drop: pop next waiter, grant, unpark.</p>
+    </div>
+    <div class="module-card">
+      <div class="module-name">io</div>
+      <div class="module-layer">Layer 1 · machinery</div>
+      <p>Two background OS threads: an <strong>epoll thread</strong> (waits on fds with EPOLLONESHOT; on ready, pushes <code>FdReady</code> completion) and a <strong>pool thread</strong> (runs blocking closures inside <code>catch_unwind</code>; pushes <code>Blocking</code> completion). Both write a byte to the wake pipe to wake the scheduler. Completions are drained inside <code>schedule_loop</code>.</p>
+    </div>
+    <div class="module-card">
+      <div class="module-name">timer</div>
+      <div class="module-layer">Layer 0 · primitive</div>
+      <p><code>BinaryHeap&lt;Reverse&lt;Entry&gt;&gt;</code> = min-heap by deadline. Two <code>Reason</code> variants: <code>Sleep</code> (unpark unconditionally) and <code>WaitTimeout</code> (call <code>target.on_timeout()</code>). No cancellation — stale entries are no-ops on pop. Entries inserted by <code>sleep()</code> and <code>mutex::lock_timeout()</code>.</p>
+    </div>
+    <div class="module-card">
+      <div class="module-name">scheduler</div>
+      <div class="module-layer">Layer 2 · public facade</div>
+      <p>Thin facade. Exposes <code>spawn</code>, <code>yield_now</code>, <code>park_current</code>, <code>unpark</code>, <code>sleep</code>, <code>block_on_io</code>, <code>wait_readable</code>, <code>wait_writable</code>, <code>run</code>. All delegate to <code>runtime</code>. Also owns <code>JoinHandle</code> and the <code>NoPreempt</code> RAII guard.</p>
+    </div>
+    <div class="module-card">
+      <div class="module-name">supervisor</div>
+      <div class="module-layer">Layer 0 · primitive</div>
+      <p>Just the <code>Signal</code> enum: <code>Exit(Pid)</code> or <code>Panic(Pid, Box&lt;dyn Any+Send&gt;)</code>. No restart logic — that's user-space policy. Signals are delivered via the supervisor actor's own channel (<code>Sender&lt;Signal&gt;</code> stored in the child's slot).</p>
+    </div>
+  </div>
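+  <p>To make the <code>pid</code> card's generation check concrete, here is a minimal sketch of a generational slot table. The names <code>SlotTable</code>, <code>Slot</code>, <code>insert</code>, <code>get</code> and <code>remove</code> are assumptions made for illustration, and the string occupant stands in for real actor state; smarm's actual table may be organised differently.</p>

```rust
/// Hypothetical stand-in for smarm's `Pid`: (slot index, generation).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Pid(u32, u32);

/// One entry in the (assumed) actor table.
struct Slot {
    generation: u32,
    occupant: Option<&'static str>, // placeholder for real actor state
}

#[derive(Default)]
struct SlotTable {
    slots: Vec<Slot>,
    free: Vec<u32>, // indices of vacated slots, reused before growing
}

impl SlotTable {
    /// Reuse a free slot if one exists, otherwise grow the table.
    fn insert(&mut self, name: &'static str) -> Pid {
        if let Some(index) = self.free.pop() {
            let slot = &mut self.slots[index as usize];
            slot.occupant = Some(name);
            Pid(index, slot.generation)
        } else {
            self.slots.push(Slot { generation: 0, occupant: Some(name) });
            Pid(self.slots.len() as u32 - 1, 0)
        }
    }

    /// Lookup succeeds only when the handle's generation matches the slot's.
    fn get(&self, pid: Pid) -> Option<&'static str> {
        let slot = self.slots.get(pid.0 as usize)?;
        if slot.generation == pid.1 { slot.occupant } else { None }
    }

    /// Actor death: vacate the slot and bump the generation so that every
    /// outstanding `Pid` for it becomes stale.
    fn remove(&mut self, pid: Pid) -> bool {
        match self.slots.get_mut(pid.0 as usize) {
            Some(slot) if slot.generation == pid.1 && slot.occupant.is_some() => {
                slot.occupant = None;
                slot.generation = slot.generation.wrapping_add(1);
                self.free.push(pid.0);
                true
            }
            _ => false,
        }
    }
}

fn main() {
    let mut table = SlotTable::default();
    let old = table.insert("worker-a");
    table.remove(old);
    let new = table.insert("worker-b"); // reuses index 0 with a new generation
    assert_eq!(old.0, new.0);
    assert_eq!(table.get(old), None); // stale handle is rejected
    assert_eq!(table.get(new), Some("worker-b"));
}
```

+  <p>The index is recycled immediately, yet the old handle can never reach the new occupant because its generation no longer matches.</p>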
+</section>
+
+<div class="divider"></div>
+
+<!-- DEPENDENCY GRAPH -->
+<section id="deps">
+  <div class="section-label">Dependency Graph</div>
+  <h2>Who Imports What</h2>
+  <p>The critical insight: <code>runtime.rs</code> is the hub. Every substantive module either feeds into it or is orchestrated by it. <code>scheduler.rs</code> is purely a facade — it imports <code>runtime</code> and re-exports it through the public API.</p>
+
+  <div class="diagram-wrap">
+    <svg width="860" height="320" viewBox="0 0 860 320" xmlns="http://www.w3.org/2000/svg" style="max-width:100%;font-family:'JetBrains Mono',monospace">
+      <defs>
+        <marker id="a2" markerWidth="7" markerHeight="7" refX="5" refY="3" orient="auto">
+          <path d="M0,0 L0,6 L7,3 z" fill="#5b8af5"></path>
+        </marker>
+      </defs>
+
+      <!-- runtime center -->
+      <rect x="340" y="120" width="140" height="52" rx="8" fill="#0d1220" stroke="#5b8af5" stroke-width="2"></rect>
+      <text x="410" y="141" fill="#5b8af5" font-size="12" font-weight="700" text-anchor="middle">runtime.rs</text>
+      <text x="410" y="157" fill="#606880" font-size="9" text-anchor="middle">SharedState · schedule_loop</text>
+
+      <!-- Feeding into runtime: stack -->
+      <rect x="30" y="20" width="90" height="38" rx="5" fill="#13161e" stroke="#252a38" stroke-width="1.5"></rect>
+      <text x="75" y="38" fill="#4ecdc4" font-size="10" font-weight="700" text-anchor="middle">stack</text>
+      <text x="75" y="51" fill="#606880" font-size="8" text-anchor="middle">Stack::new()</text>
+      <line x1="120" y1="39" x2="340" y2="135" stroke="#5b8af5" stroke-width="1" marker-end="url(#a2)"></line>
+
+      <!-- context -->
+      <rect x="140" y="20" width="90" height="38" rx="5" fill="#13161e" stroke="#252a38" stroke-width="1.5"></rect>
+      <text x="185" y="38" fill="#4ecdc4" font-size="10" font-weight="700" text-anchor="middle">context</text>
+      <text x="185" y="51" fill="#606880" font-size="8" text-anchor="middle">switch fns</text>
+      <line x1="215" y1="39" x2="355" y2="120" stroke="#5b8af5" stroke-width="1" marker-end="url(#a2)"></line>
+
+      <!-- preempt -->
+      <rect x="250" y="20" width="90" height="38" rx="5" fill="#13161e" stroke="#252a38" stroke-width="1.5"></rect>
+      <text x="295" y="38" fill="#4ecdc4" font-size="10" font-weight="700" text-anchor="middle">preempt</text>
+      <text x="295" y="51" fill="#606880" font-size="8" text-anchor="middle">RDTSC + hook</text>
+      <line x1="310" y1="58" x2="376" y2="120" stroke="#5b8af5" stroke-width="1" marker-end="url(#a2)"></line>
+
+      <!-- actor -->
+      <rect x="360" y="20" width="90" height="38" rx="5" fill="#13161e" stroke="#252a38" stroke-width="1.5"></rect>
+      <text x="405" y="38" fill="#4ecdc4" font-size="10" font-weight="700" text-anchor="middle">actor</text>
+      <text x="405" y="51" fill="#606880" font-size="8" text-anchor="middle">trampoline</text>
+      <line x1="405" y1="58" x2="405" y2="120" stroke="#5b8af5" stroke-width="1" marker-end="url(#a2)"></line>
+
+      <!-- timer -->
+      <rect x="470" y="20" width="90" height="38" rx="5" fill="#13161e" stroke="#252a38" stroke-width="1.5"></rect>
+      <text x="515" y="38" fill="#4ecdc4" font-size="10" font-weight="700" text-anchor="middle">timer</text>
+      <text x="515" y="51" fill="#606880" font-size="8" text-anchor="middle">min-heap</text>
+      <line x1="510" y1="58" x2="450" y2="120" stroke="#5b8af5" stroke-width="1" marker-end="url(#a2)"></line>
+
+      <!-- io -->
+      <rect x="575" y="20" width="90" height="38" rx="5" fill="#13161e" stroke="#252a38" stroke-width="1.5"></rect>
+      <text x="620" y="38" fill="#4ecdc4" font-size="10" font-weight="700" text-anchor="middle">io</text>
+      <text x="620" y="51" fill="#606880" font-size="8" text-anchor="middle">epoll + pool</text>
+      <line x1="600" y1="58" x2="480" y2="120" stroke="#5b8af5" stroke-width="1" marker-end="url(#a2)"></line>
+
+      <!-- supervisor -->
+      <rect x="680" y="20" width="100" height="38" rx="5" fill="#13161e" stroke="#252a38" stroke-width="1.5"></rect>
+      <text x="730" y="38" fill="#4ecdc4" font-size="10" font-weight="700" text-anchor="middle">supervisor</text>
+      <text x="730" y="51" fill="#606880" font-size="8" text-anchor="middle">Signal enum</text>
+      <line x1="710" y1="58" x2="480" y2="135" stroke="#5b8af5" stroke-width="1" marker-end="url(#a2)"></line>
+
+      <!-- channel + mutex → runtime (through scheduler) -->
+      <rect x="140" y="220" width="100" height="38" rx="5" fill="#1a1e2a" stroke="#3a4060" stroke-width="1.5"></rect>
+      <text x="190" y="238" fill="#4ecdc4" font-size="10" font-weight="700" text-anchor="middle">channel</text>
+      <text x="190" y="251" fill="#606880" font-size="8" text-anchor="middle">calls unpark()</text>
+      <!-- channel → runtime -->
+      <line x1="240" y1="240" x2="340" y2="172" stroke="#5b8af5" stroke-width="1" marker-end="url(#a2)" stroke-dasharray="4,2"></line>
+
+      <rect x="560" y="220" width="100" height="38" rx="5" fill="#1a1e2a" stroke="#3a4060" stroke-width="1.5"></rect>
+      <text x="610" y="238" fill="#4ecdc4" font-size="10" font-weight="700" text-anchor="middle">mutex</text>
+      <text x="610" y="251" fill="#606880" font-size="8" text-anchor="middle">calls unpark()</text>
+      <!-- mutex → runtime -->
+      <line x1="565" y1="240" x2="480" y2="172" stroke="#5b8af5" stroke-width="1" marker-end="url(#a2)" stroke-dasharray="4,2"></line>
+
+      <!-- scheduler facade at bottom -->
+      <rect x="300" y="260" width="260" height="48" rx="8" fill="#0a0c12" stroke="#f5a623" stroke-width="2"></rect>
+      <text x="430" y="279" fill="#f5a623" font-size="12" font-weight="700" text-anchor="middle">scheduler.rs  /  lib.rs</text>
+      <text x="430" y="295" fill="#606880" font-size="9" text-anchor="middle">public API re-exports · GlobalAlloc</text>
+      <!-- scheduler ← runtime -->
+      <line x1="410" y1="172" x2="390" y2="260" stroke="#f5a623" stroke-width="1.5" marker-end="url(#a2)" stroke-dasharray="5,2"></line>
+
+      <!-- Mutual: channel/mutex call unpark() via scheduler; runtime resumes actors that run their code -->
+      <text x="430" y="210" fill="#606880" font-size="8" text-anchor="middle">channel/mutex call scheduler::unpark()</text>
+      <text x="430" y="220" fill="#606880" font-size="8" text-anchor="middle">which delegates to runtime</text>
+    </svg>
+  </div>
+
+  <div class="callout">
+    <span class="callout-icon">⚠</span>
+    <p><strong>Circular dependency:</strong> <code>channel</code> and <code>mutex</code> call <code>scheduler::unpark()</code>, which calls into <code>runtime</code>. And <code>runtime</code>'s <code>schedule_loop</code> resumes actors that run channel/mutex code. This is intentional — it's the cooperative unpark mechanism. It works because <code>unpark()</code> never blocks and preemption is disabled while holding any smarm internal lock.</p>
+  </div>
+</section>
+
+<div class="divider"></div>
+
+<!-- INIT SEQUENCE -->
+<section id="init">
+  <div class="section-label">Initialisation</div>
+  <h2>What Happens When You Call <code>run(f)</code></h2>
+  <p>This walkthrough starts from user code calling <code>smarm::run(|| { ... })</code>. The single-threaded <code>run()</code> is a wrapper around <code>runtime::init(Config::exact(1)).run(f)</code>.</p>
+
+  <div class="flow-diagram">
+    <div class="flow-step">
+      <div class="flow-num">1</div>
+      <div class="flow-body">
+        <h4>Install panic hook (once)</h4>
+        <p>A <code>OnceLock</code> guard installs a custom panic hook that suppresses output inside actor context. Without this, concurrent actor panics can deadlock Rust's default backtrace printer (non-reentrant internal lock). The previous hook is chained for panics outside actors.</p>
+      </div>
+    </div>
+    <div class="flow-step">
+      <div class="flow-num">2</div>
+      <div class="flow-body">
+        <h4>Start <code>IoThread</code> <span class="tag">io.rs</span></h4>
+        <p>Creates a wake pipe (opened with <code>O_NONBLOCK</code>). Creates an <code>epollfd</code>. Creates a shutdown pipe and registers it in the epollfd. Spawns the <strong>epoll thread</strong> (<code>epoll_wait</code> loop) and the <strong>pool thread</strong> (blocking-work mpsc receiver). Both share a completion <code>VecDeque</code> behind a mutex.</p>
+      </div>
+    </div>
+    <div class="flow-step">
+      <div class="flow-num">3</div>
+      <div class="flow-body">
+        <h4>Install <code>RUNTIME</code> thread-local <span class="tag">runtime.rs</span></h4>
+        <p><code>Arc&lt;RuntimeInner&gt;</code> is cloned into the calling thread's <code>RUNTIME</code> thread-local. This makes <code>with_runtime()</code> work on the calling thread immediately — needed for the next step.</p>
+      </div>
+    </div>
+    <div class="flow-step">
+      <div class="flow-num">4</div>
+      <div class="flow-body">
+        <h4>Spawn initial actor <span class="tag">scheduler.rs</span></h4>
+        <p>Calls <code>scheduler::spawn(f)</code>. This locks <code>SharedState</code>, allocates a slot, creates a <code>Stack</code> via <code>mmap</code>, calls <code>init_actor_stack()</code> to write the initial register frame (trampoline address + 6 zero GPR slots), stores the closure in <code>pending_closures</code>, pushes the PID to the run queue, returns a <code>JoinHandle</code>.</p>
+      </div>
+    </div>
+    <div class="flow-step">
+      <div class="flow-num">5</div>
+      <div class="flow-body">
+        <h4>Spawn N-1 OS scheduler threads</h4>
+        <p>For each extra thread: clone <code>Arc&lt;RuntimeInner&gt;</code>, spawn OS thread, set <code>RUNTIME</code> and <code>SCHED_SLOT</code> thread-locals, enter <code>schedule_loop</code>. Thread 0 is the calling thread.</p>
+      </div>
+    </div>
+    <div class="flow-step">
+      <div class="flow-num">6</div>
+      <div class="flow-body">
+        <h4>Enter <code>schedule_loop</code> on thread 0 <span class="tag">runtime.rs</span></h4>
+        <p>This is a <code>loop { drain → pop → resume → handle-intent }</code>. Thread 0 stays here until the run queue is empty and no timers or IO are pending. All actors run inside this loop, so the call does not return until the program is done.</p>
+      </div>
+    </div>
+    <div class="flow-step">
+      <div class="flow-num">7</div>
+      <div class="flow-body">
+        <h4>Shutdown sequence</h4>
+        <p>All scheduler threads return from <code>schedule_loop</code>. OS threads are joined. <code>IoThread::drop()</code> is called: writes shutdown pipe → epoll thread exits; drops the mpsc sender → pool thread exits; closes all fds. <code>SharedState</code> is cleared for a potential next <code>run()</code> call.</p>
+      </div>
+    </div>
+  </div>
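+  <p>Step 1 can be sketched with the standard library alone. This is an illustrative version under stated assumptions, not smarm's code: the <code>IN_ACTOR</code> flag, <code>chain_filtering_hook</code> and <code>install_panic_hook</code> are hypothetical names, and the real runtime may track actor context differently.</p>

```rust
use std::cell::Cell;
use std::panic;
use std::sync::OnceLock;

thread_local! {
    // Hypothetical flag: true while this OS thread is running actor code.
    static IN_ACTOR: Cell<bool> = Cell::new(false);
}

static HOOK_INSTALLED: OnceLock<()> = OnceLock::new();

/// Wrap whatever hook is currently installed in one that stays silent
/// inside actor context and forwards every other panic unchanged.
fn chain_filtering_hook() {
    let previous = panic::take_hook();
    panic::set_hook(Box::new(move |info| {
        if !IN_ACTOR.with(|flag| flag.get()) {
            previous(info);
        }
    }));
}

/// Safe to call from every `run()`: the hook is chained only once.
fn install_panic_hook() {
    HOOK_INSTALLED.get_or_init(chain_filtering_hook);
}

fn main() {
    install_panic_hook();
    install_panic_hook(); // no-op: already installed

    IN_ACTOR.with(|flag| flag.set(true));
    let result = panic::catch_unwind(|| {
        panic!("actor failure");
    });
    IN_ACTOR.with(|flag| flag.set(false));

    // The panic was caught and the hook did not forward it.
    assert!(result.is_err());
}
```

+  <p>Guarding the install with <code>OnceLock</code> matters because chaining twice would wrap the filter around itself on every <code>run()</code> call.</p>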
+</section>
+
+<div class="divider"></div>
+
+<!-- YIELD CYCLE -->
+<section id="yield-cycle">
+  <div class="section-label">Core Mechanism</div>
+  <h2>The Yield → Schedule → Resume Cycle</h2>
+  <p>This is the heartbeat of the entire runtime. Every context switch follows exactly this path, whether triggered by a cooperative yield, preemption, channel recv, mutex contention, sleep, or IO wait.</p>
+
+  <div class="diagram-wrap">
+    <svg width="820" height="480" viewBox="0 0 820 480" xmlns="http://www.w3.org/2000/svg" style="max-width:100%;font-family:'JetBrains Mono',monospace">
+      <defs>
+        <marker id="ab" markerWidth="8" markerHeight="8" refX="6" refY="3" orient="auto">
+          <path d="M0,0 L0,6 L8,3 z" fill="#f5a623"></path>
+        </marker>
+        <marker id="ab2" markerWidth="8" markerHeight="8" refX="6" refY="3" orient="auto">
+          <path d="M0,0 L0,6 L8,3 z" fill="#4ecdc4"></path>
+        </marker>
+      </defs>
+
+      <!-- ACTOR column -->
+      <text x="130" y="28" fill="#606880" font-size="9" text-anchor="middle" letter-spacing="2">ACTOR STACK</text>
+      <line x1="130" y1="35" x2="130" y2="450" stroke="#252a38" stroke-width="1" stroke-dasharray="4,3"></line>
+
+      <!-- SCHEDULER column -->
+      <text x="430" y="28" fill="#606880" font-size="9" text-anchor="middle" letter-spacing="2">SCHEDULER OS THREAD</text>
+      <line x1="430" y1="35" x2="430" y2="450" stroke="#252a38" stroke-width="1" stroke-dasharray="4,3"></line>
+
+      <!-- RUNTIME column -->
+      <text x="700" y="28" fill="#606880" font-size="9" text-anchor="middle" letter-spacing="2">SHARED STATE</text>
+      <line x1="700" y1="35" x2="700" y2="450" stroke="#252a38" stroke-width="1" stroke-dasharray="4,3"></line>
+
+      <!-- Step A: Actor running -->
+      <rect x="50" y="45" width="160" height="38" rx="5" fill="#13161e" stroke="#3a4060" stroke-width="1.5"></rect>
+      <text x="130" y="62" fill="#a8e6cf" font-size="10" font-weight="500" text-anchor="middle">actor code running</text>
+      <text x="130" y="75" fill="#606880" font-size="8" text-anchor="middle">PREEMPTION_ENABLED = true</text>
+
+      <!-- Step B: yield triggered -->
+      <rect x="50" y="110" width="160" height="38" rx="5" fill="#1a1e2a" stroke="#f5a623" stroke-width="1.5"></rect>
+      <text x="130" y="127" fill="#f5a623" font-size="10" font-weight="500" text-anchor="middle">yield triggered</text>
+      <text x="130" y="140" fill="#606880" font-size="8" text-anchor="middle">set YieldIntent, call switch_to_sched()</text>
+
+      <line x1="130" y1="83" x2="130" y2="110" stroke="#f5a623" stroke-width="1" marker-end="url(#ab)"></line>
+
+      <!-- Step C: assembly runs -->
+      <rect x="50" y="175" width="160" height="50" rx="5" fill="#0a0c12" stroke="#252a38" stroke-width="1"></rect>
+      <text x="130" y="192" fill="#bc8cff" font-size="9" font-weight="700" text-anchor="middle">x86-64 naked asm</text>
+      <text x="130" y="205" fill="#606880" font-size="8" text-anchor="middle">push rbx,rbp,r12-r15</text>
+      <text x="130" y="216" fill="#606880" font-size="8" text-anchor="middle">save actor rsp → ACTOR_SP TL</text>
+
+      <line x1="130" y1="148" x2="130" y2="175" stroke="#606880" stroke-width="1" marker-end="url(#ab)"></line>
+
+      <!-- Arrow: actor rsp saved, load sched rsp -->
+      <line x1="210" y1="200" x2="350" y2="200" stroke="#f5a623" stroke-width="1.5" marker-end="url(#ab)"></line>
+      <text x="280" y="194" fill="#606880" font-size="8" text-anchor="middle">rsp swap</text>
+
+      <!-- Scheduler resumes from switch_to_actor() call -->
+      <rect x="350" y="175" width="160" height="50" rx="5" fill="#0a0c12" stroke="#252a38" stroke-width="1"></rect>
+      <text x="430" y="192" fill="#bc8cff" font-size="9" font-weight="700" text-anchor="middle">scheduler resumes</text>
+      <text x="430" y="205" fill="#606880" font-size="8" text-anchor="middle">pop rbx,rbp,r12-r15</text>
+      <text x="430" y="216" fill="#606880" font-size="8" text-anchor="middle">ret → back in schedule_loop()</text>
+
+      <!-- Post-yield handling -->
+      <rect x="350" y="250" width="160" height="60" rx="5" fill="#13161e" stroke="#3a4060" stroke-width="1.5"></rect>
+      <text x="430" y="267" fill="#4ecdc4" font-size="10" font-weight="500" text-anchor="middle">post-yield handling</text>
+      <text x="430" y="280" fill="#606880" font-size="8" text-anchor="middle">PREEMPTION_ENABLED = false</text>
+      <text x="430" y="292" fill="#606880" font-size="8" text-anchor="middle">check is_actor_done()</text>
+      <text x="430" y="303" fill="#606880" font-size="8" text-anchor="middle">read YieldIntent</text>
+
+      <line x1="430" y1="225" x2="430" y2="250" stroke="#4ecdc4" stroke-width="1" marker-end="url(#ab2)"></line>
+
+      <!-- update SharedState -->
+      <line x1="510" y1="280" x2="640" y2="280" stroke="#4ecdc4" stroke-width="1" marker-end="url(#ab2)"></line>
+      <text x="575" y="274" fill="#606880" font-size="8" text-anchor="middle">lock shared</text>
+
+      <rect x="640" y="255" width="130" height="60" rx="5" fill="#13161e" stroke="#252a38" stroke-width="1.5"></rect>
+      <text x="705" y="272" fill="#a8e6cf" font-size="9" text-anchor="middle">save actor.sp</text>
+      <text x="705" y="285" fill="#a8e6cf" font-size="9" text-anchor="middle">if Yield: push run_queue</text>
+      <text x="705" y="298" fill="#a8e6cf" font-size="9" text-anchor="middle">if Park: state=Parked</text>
+      <text x="705" y="311" fill="#a8e6cf" font-size="9" text-anchor="middle">if Done: finalize_actor</text>
+
+      <!-- Next loop iteration: pop actor -->
+      <rect x="350" y="360" width="160" height="48" rx="5" fill="#13161e" stroke="#3a4060" stroke-width="1.5"></rect>
+      <text x="430" y="377" fill="#4ecdc4" font-size="10" font-weight="500" text-anchor="middle">pop next actor</text>
+      <text x="430" y="390" fill="#606880" font-size="8" text-anchor="middle">drain timers+IO first</text>
+      <text x="430" y="400" fill="#606880" font-size="8" text-anchor="middle">run_queue.pop_front()</text>
+
+      <line x1="430" y1="315" x2="430" y2="360" stroke="#4ecdc4" stroke-width="1" marker-end="url(#ab2)"></line>
+
+      <!-- Resume: set TLs, call switch_to_actor -->
+      <rect x="350" y="430" width="160" height="38" rx="5" fill="#1a1e2a" stroke="#56d364" stroke-width="1.5"></rect>
+      <text x="430" y="447" fill="#56d364" font-size="10" font-weight="500" text-anchor="middle">resume actor</text>
+      <text x="430" y="460" fill="#606880" font-size="8" text-anchor="middle">set TLs → switch_to_actor()</text>
+
+      <line x1="430" y1="408" x2="430" y2="430" stroke="#56d364" stroke-width="1" marker-end="url(#ab2)"></line>
+
+      <!-- actor back -->
+      <line x1="350" y1="449" x2="210" y2="449" stroke="#56d364" stroke-width="1.5" marker-end="url(#ab2)"></line>
+      <text x="280" y="443" fill="#606880" font-size="8" text-anchor="middle">rsp swap</text>
+      <rect x="50" y="430" width="160" height="38" rx="5" fill="#13161e" stroke="#56d364" stroke-width="1.5"></rect>
+      <text x="130" y="447" fill="#56d364" font-size="10" font-weight="500" text-anchor="middle">actor resumes</text>
+      <text x="130" y="460" fill="#606880" font-size="8" text-anchor="middle">exactly where it yielded</text>
+    </svg>
+  </div>
+
+  <h3>The 6 Yield Sources</h3>
+  <table class="state-table">
+    <thead>
+      <tr>
+        <th>Source</th>
+        <th>Intent set</th>
+        <th>Who re-queues</th>
+        <th>Notes</th>
+      </tr>
+    </thead>
+    <tbody>
+      <tr>
+        <td><code>yield_now()</code></td>
+        <td><span class="pill pill-green">Yield</span></td>
+        <td>Scheduler immediately</td>
+        <td>Actor stays Runnable; pushed back to queue tail</td>
+      </tr>
+      <tr>
+        <td>Allocator preemption</td>
+        <td><span class="pill pill-green">Yield</span></td>
+        <td>Scheduler immediately</td>
+        <td>RDTSC check in <code>maybe_preempt()</code> triggers <code>switch_to_scheduler()</code></td>
+      </tr>
+      <tr>
+        <td><code>channel::recv()</code> (empty)</td>
+        <td><span class="pill pill-yellow">Park</span></td>
+        <td><code>channel::send()</code> → <code>unpark()</code></td>
+        <td>Receiver PID stored in channel's <code>parked_receiver</code></td>
+      </tr>
+      <tr>
+        <td><code>mutex::lock()</code> (contended)</td>
+        <td><span class="pill pill-yellow">Park</span></td>
+        <td><code>MutexGuard::drop()</code> or timer expiry</td>
+        <td>FIFO waiter queue; timeout via <code>WaitTimeout</code> timer entry</td>
+      </tr>
+      <tr>
+        <td><code>sleep(d)</code></td>
+        <td><span class="pill pill-yellow">Park</span></td>
+        <td>Timer heap → <code>schedule_loop</code> drain</td>
+        <td>Inserts <code>Reason::Sleep</code> entry; scheduler unparks on pop</td>
+      </tr>
+      <tr>
+        <td><code>wait_readable/writable(fd)</code></td>
+        <td><span class="pill pill-yellow">Park</span></td>
+        <td>epoll thread → completion queue → scheduler</td>
+        <td>EPOLLONESHOT; one ADD → one wakeup → one DEL per call</td>
+      </tr>
+    </tbody>
+  </table>
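+  <p>The table boils down to one decision after every switch back to the scheduler. The sketch below models only that decision. <code>Sched</code>, <code>State</code> and <code>handle_post_yield</code> are illustrative names, <code>Pid</code> is reduced to a plain index, and races such as an <code>unpark</code> arriving before the park is recorded are deliberately ignored.</p>

```rust
use std::collections::VecDeque;

type Pid = u32; // simplified: the real Pid also carries a generation

#[derive(Clone, Copy, Debug, PartialEq)]
enum YieldIntent {
    Yield,
    Park,
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum State {
    Runnable,
    Parked,
    Dead,
}

struct Sched {
    states: Vec<State>, // indexed by Pid
    run_queue: VecDeque<Pid>,
}

impl Sched {
    /// Called after an actor switches back to the scheduler.
    fn handle_post_yield(&mut self, pid: Pid, done: bool, intent: YieldIntent) {
        if done {
            self.states[pid as usize] = State::Dead; // stands in for finalize_actor
            return;
        }
        match intent {
            // Still runnable: goes to the tail of the queue.
            YieldIntent::Yield => self.run_queue.push_back(pid),
            // Not queued: some waker must call `unpark` later.
            YieldIntent::Park => self.states[pid as usize] = State::Parked,
        }
    }

    /// Called by whoever owns the wake-up: sender, guard drop, timer, epoll.
    fn unpark(&mut self, pid: Pid) {
        if self.states[pid as usize] == State::Parked {
            self.states[pid as usize] = State::Runnable;
            self.run_queue.push_back(pid);
        }
    }
}

fn main() {
    let mut sched = Sched {
        states: vec![State::Runnable; 2],
        run_queue: VecDeque::new(),
    };
    sched.handle_post_yield(0, false, YieldIntent::Yield); // back to the tail
    sched.handle_post_yield(1, false, YieldIntent::Park); // waits for a waker
    assert_eq!(sched.run_queue, VecDeque::from([0]));
    sched.unpark(1);
    assert_eq!(sched.run_queue, VecDeque::from([0, 1]));
}
```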
+</section>
+
+<div class="divider"></div>
+
+<!-- SPAWN WALKTHROUGH -->
+<section id="spawn">
+  <div class="section-label">Spawn Mechanics</div>
+  <h2>New Actor: From Spawn to First Resume</h2>
+  <p>Spawning is the trickiest part of the runtime. An actor's first resume is fundamentally different from subsequent ones: a new stack has no saved frame to return to and we can't "call" into it, so we fabricate a frame and <code>ret</code> into it.</p>
+
+  <div class="flow-diagram">
+    <div class="flow-step">
+      <div class="flow-num">1</div>
+      <div class="flow-body">
+        <h4><code>scheduler::spawn(f)</code> called</h4>
+        <p>Allocates a slot from free list or grows the slots vec. Assigns <code>Pid(index, generation)</code>. Creates a <code>Stack</code> (64 KB <code>mmap</code> + guard page).</p>
+      </div>
+    </div>
+    <div class="flow-step">
+      <div class="flow-num">2</div>
+      <div class="flow-body">
+        <h4>Initial stack frame written <span class="tag">context::init_actor_stack()</span></h4>
+        <p>Starting from <code>(top &amp; ~15) - 8</code> (aligned), pushes downward: the <code>trampoline</code> function pointer as the <code>ret</code> address, then 6 zero words for the callee-saved registers. The resulting <code>rsp</code> is stored as <code>actor.sp</code>. No actual function call has happened yet.</p>
+        <pre><code>high addr ← top
+  top-8:  &amp;trampoline   ← will be popped by 'ret'
+  top-16: 0             ← rbx
+  top-24: 0             ← rbp
+  top-32: 0             ← r12
+  top-40: 0             ← r13
+  top-48: 0             ← r14
+  top-56: 0             ← r15  ← initial rsp stored here</code></pre>
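+        <p>The layout above can be modelled in safe Rust by treating the stack as a slice of 8-byte words. This is a simulation for illustration only: <code>init_frame</code> and <code>first_resume</code> are hypothetical names, alignment is left out, and the real <code>init_actor_stack()</code> writes through raw pointers into the mapped region.</p>

```rust
/// Number of callee-saved GPRs the switch routine pops: rbx, rbp, r12-r15.
const SAVED_REGS: usize = 6;

/// Writes the initial frame at the top of `stack` (one u64 per 8-byte word)
/// and returns the word index that plays the role of the initial rsp.
fn init_frame(stack: &mut [u64], trampoline: u64) -> usize {
    assert!(stack.len() > SAVED_REGS);
    let mut sp = stack.len(); // one past the highest word
    sp -= 1;
    stack[sp] = trampoline; // consumed by the final `ret`
    for _ in 0..SAVED_REGS {
        sp -= 1;
        stack[sp] = 0; // zeroed slot for one callee-saved register
    }
    sp
}

/// Mimics the restore half of the switch: pop six registers, then `ret`.
fn first_resume(stack: &[u64], mut sp: usize) -> ([u64; SAVED_REGS], u64) {
    let mut regs = [0u64; SAVED_REGS];
    for reg in regs.iter_mut() {
        *reg = stack[sp];
        sp += 1;
    }
    (regs, stack[sp]) // the word `ret` would jump to
}

fn main() {
    let mut stack = vec![0xDEAD_BEEF_u64; 32];
    let trampoline_addr = 0x4000_1000; // made-up address for illustration
    let sp = init_frame(&mut stack, trampoline_addr);
    let (regs, target) = first_resume(&stack, sp);
    assert_eq!(regs, [0; SAVED_REGS]);
    assert_eq!(target, trampoline_addr);
}
```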
+      </div>
+    </div>
+    <div class="flow-step">
+      <div class="flow-num">3</div>
+      <div class="flow-body">
+        <h4>Closure stored separately</h4>
+        <p>The closure <code>Box&lt;dyn FnOnce() + Send&gt;</code> goes into <code>SharedState::pending_closures</code> keyed by PID, <em>not</em> on the actor's stack, because we can't pass it via a register during first resume. The PID is pushed to the run queue; slot state is <code>Runnable</code>.</p>
+      </div>
+    </div>
+    <div class="flow-step">
+      <div class="flow-num">4</div>
+      <div class="flow-body">
+        <h4>Scheduler picks up the PID, prepares first resume</h4>
+        <p>Before calling <code>switch_to_actor()</code>, the scheduler pops the closure from <code>pending_closures</code> and writes it to the <code>CURRENT_ACTOR_BOX</code> thread-local. Then sets <code>ACTOR_SP</code>, sets <code>CURRENT_PID</code>, arms the timeslice, enables preemption.</p>
+      </div>
+    </div>
+    <div class="flow-step">
+      <div class="flow-num">5</div>
+      <div class="flow-body">
+        <h4>First context switch lands in <code>trampoline()</code></h4>
+        <p><code>switch_to_actor()</code> saves the scheduler's GPRs, loads <code>actor.sp</code> as the new <code>rsp</code>, pops the 6 zero words (restoring the "saved" registers to zero), then <code>ret</code>s — which pops the trampoline address from the stack and jumps to it. We're now executing on the actor's stack.</p>
+      </div>
+    </div>
+    <div class="flow-step">
+      <div class="flow-num">6</div>
+      <div class="flow-body">
+        <h4><code>trampoline()</code> reads the closure and runs it</h4>
+        <p>Takes the closure from <code>CURRENT_ACTOR_BOX</code> thread-local (consuming it — subsequent resumes skip this). Calls it inside <code>panic::catch_unwind(AssertUnwindSafe(f))</code>. The actor's code runs normally from here. Any yield (channel, mutex, preemption) calls <code>switch_to_scheduler()</code>; the scheduler saves actor state, processes intent, loops.</p>
+      </div>
+    </div>
+    <div class="flow-step">
+      <div class="flow-num">7</div>
+      <div class="flow-body">
+        <h4>Actor returns → trampoline handles completion</h4>
+        <p>If <code>catch_unwind</code> returns <code>Ok(())</code>, outcome is <code>Exit</code>. If it returns <code>Err(payload)</code>, outcome is <code>Panic(payload)</code>. Either way, outcome is written to <code>LAST_OUTCOME</code> thread-local, <code>ACTOR_DONE</code> is set to true, then <code>switch_to_scheduler()</code> is called for the last time. Scheduler sees <code>is_actor_done() == true</code>, calls <code>finalize_actor()</code>: delivers <code>Signal</code> to supervisor, unparks joiners, reclaims slot.</p>
+      </div>
+    </div>
+  </div>
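+  <p>Steps 6 and 7 hinge on <code>catch_unwind</code> turning a return or a panic into a value. A minimal sketch of just that part follows. <code>run_actor_body</code> and <code>panic_message</code> are hypothetical helpers, and this <code>Outcome</code> only mirrors the two cases described above; the thread-local bookkeeping and the final switch are omitted.</p>

```rust
use std::any::Any;
use std::panic::{self, AssertUnwindSafe};

/// Mirrors the two completion cases described above.
enum Outcome {
    Exit,
    Panic(Box<dyn Any + Send>),
}

/// Runs the actor closure and converts its fate into an `Outcome`.
fn run_actor_body(f: Box<dyn FnOnce() + Send>) -> Outcome {
    match panic::catch_unwind(AssertUnwindSafe(f)) {
        Ok(()) => Outcome::Exit,
        Err(payload) => Outcome::Panic(payload),
    }
}

/// Best-effort extraction of a text message from a panic payload.
fn panic_message(payload: &(dyn Any + Send)) -> Option<&str> {
    if let Some(s) = payload.downcast_ref::<&'static str>() {
        Some(*s)
    } else if let Some(s) = payload.downcast_ref::<String>() {
        Some(s.as_str())
    } else {
        None
    }
}

fn main() {
    let clean = run_actor_body(Box::new(|| {}));
    assert!(matches!(clean, Outcome::Exit));

    // Without a filtering hook, the default hook still prints this panic.
    let crashed = run_actor_body(Box::new(|| {
        panic!("boom");
    }));
    if let Outcome::Panic(payload) = crashed {
        assert_eq!(panic_message(&*payload), Some("boom"));
    }
}
```

+  <p>Because the payload is an ordinary boxed value, it can be moved into a <code>Signal</code> and sent to the supervisor like any other message.</p>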
+</section>
+
+<div class="divider"></div>
+
+<!-- PREEMPTION -->
+<section id="preempt">
+  <div class="section-label">Preemption</div>
+  <h2>Allocator-Driven Timeslicing</h2>
+
+  <div class="two-col">
+    <div>
+      <h3>How it works</h3>
+      <p>The <code>PreemptingAllocator</code> is installed as the process's <code>#[global_allocator]</code>. Its <code>alloc()</code>, <code>alloc_zeroed()</code>, and <code>realloc()</code> all call <code>maybe_preempt()</code> before delegating to the system allocator.</p>
+      <p><code>maybe_preempt()</code> decrements a thread-local counter. Every <strong>128 allocations</strong> (default), it reads RDTSC. If <code>rdtsc() - timeslice_start &gt; 300_000 cycles</code> (~100µs at 3 GHz) and <code>PREEMPTION_ENABLED == true</code>, it calls <code>switch_to_scheduler()</code>.</p>
+      <p>The <code>check!()</code> macro calls the same <code>maybe_preempt()</code> function — for tight loops that make no allocations.</p>
+    </div>
+    <div>
+      <h3>Invariant: preemption must be off when holding smarm locks</h3>
+      <p>If preemption fired while the scheduler held <code>SharedState</code>, the context switch would try to re-acquire the same mutex → deadlock. smarm prevents this with:</p>
+      <ul style="color:var(--text);font-size:0.88rem;padding-left:1.2rem;margin-bottom:1rem;">
+        <li style="margin-bottom:0.4rem;"><code>PREEMPTION_ENABLED = false</code> in the scheduler loop before/after <code>switch_to_actor()</code></li>
+        <li style="margin-bottom:0.4rem;"><code>with_shared()</code> saves and disables preemption while the mutex is held</li>
+        <li style="margin-bottom:0.4rem;"><code>NoPreempt</code> RAII guard used in channel/mutex slow paths</li>
+        <li><code>trace::record()</code> also disables preemption (it can allocate)</li>
+      </ul>
+      <div class="callout" style="margin-top:0">
+        <span class="callout-icon">⚠</span>
+        <p class="warn">Known gap: tight no-alloc loops are invisible to the preemption check unless they call <code>check!()</code> explicitly. This is documented and by design: such loops are uncommon in message-passing workloads.</p>
+      </div>
+    </div>
+  </div>
+
+  <pre><code><span class="cm">// preempt.rs — simplified</span>
+<span class="kw">pub</span> <span class="kw">fn</span> <span class="fn">maybe_preempt</span>() {
+    ALLOC_COUNT.<span class="fn">with</span>(|c| {
+        <span class="kw">let</span> n = c.<span class="fn">get</span>();
+        <span class="kw">if</span> n == <span class="nu">0</span> {
+            c.<span class="fn">set</span>(ACTIVE_ALLOC_INTERVAL.<span class="fn">with</span>(|i| i.<span class="fn">get</span>()));  <span class="cm">// reset counter</span>
+            <span class="kw">if</span> PREEMPTION_ENABLED.<span class="fn">with</span>(|e| e.<span class="fn">get</span>()) {
+                <span class="kw">let</span> elapsed = <span class="fn">rdtsc</span>() - TIMESLICE_START.<span class="fn">with</span>(|s| s.<span class="fn">get</span>());
+                <span class="kw">if</span> elapsed &gt; ACTIVE_TIMESLICE_CYCLES.<span class="fn">with</span>(|i| i.<span class="fn">get</span>()) {
+                    <span class="kw">unsafe</span> { <span class="fn">switch_to_scheduler</span>() };  <span class="cm">// YieldIntent::Yield</span>
+                }
+            }
+        } <span class="kw">else</span> {
+            c.<span class="fn">set</span>(n - <span class="nu">1</span>);
+        }
+    });
+}</code></pre>
+</section>
+
+<div class="divider"></div>
+
+<!-- IO -->
+<section id="io">
+  <div class="section-label">IO Architecture</div>
+  <h2>Two Background Threads, One Wake Pipe</h2>
+
+  <div class="diagram-wrap">
+    <svg width="820" height="300" viewBox="0 0 820 300" xmlns="http://www.w3.org/2000/svg" style="max-width:100%;font-family:'JetBrains Mono',monospace">
+      <defs>
+        <marker id="ai" markerWidth="7" markerHeight="7" refX="5" refY="3" orient="auto">
+          <path d="M0,0 L0,6 L7,3 z" fill="#4ecdc4"></path>
+        </marker>
+        <marker id="ai2" markerWidth="7" markerHeight="7" refX="5" refY="3" orient="auto">
+          <path d="M0,0 L0,6 L7,3 z" fill="#f5a623"></path>
+        </marker>
+      </defs>
+
+      <!-- Actor -->
+      <rect x="20" y="100" width="120" height="110" rx="8" fill="#13161e" stroke="#3a4060" stroke-width="1.5"></rect>
+      <text x="80" y="122" fill="#a8e6cf" font-size="11" font-weight="700" text-anchor="middle">Actor</text>
+      <text x="80" y="140" fill="#606880" font-size="8" text-anchor="middle">calls wait_readable(fd)</text>
+      <text x="80" y="155" fill="#606880" font-size="8" text-anchor="middle">or block_on_io(f)</text>
+      <text x="80" y="175" fill="#ff6b6b" font-size="8" text-anchor="middle">→ park_current()</text>
+      <text x="80" y="190" fill="#ff6b6b" font-size="8" text-anchor="middle">→ state = Parked</text>
+
+      <!-- Epoll thread -->
+      <rect x="220" y="20" width="160" height="120" rx="8" fill="#13161e" stroke="#252a38" stroke-width="1.5"></rect>
+      <text x="300" y="42" fill="#5b8af5" font-size="11" font-weight="700" text-anchor="middle">epoll thread</text>
+      <text x="300" y="60" fill="#606880" font-size="8" text-anchor="middle">epoll_wait(-1) loop</text>
+      <text x="300" y="75" fill="#606880" font-size="8" text-anchor="middle">EPOLLONESHOT per fd</text>
+      <text x="300" y="90" fill="#606880" font-size="8" text-anchor="middle">on ready: push FdReady</text>
+      <text x="300" y="105" fill="#606880" font-size="8" text-anchor="middle">write wake_pipe</text>
+      <text x="300" y="120" fill="#606880" font-size="8" text-anchor="middle">on shutdown pipe: exit</text>
+
+      <!-- Pool thread -->
+      <rect x="220" y="170" width="160" height="110" rx="8" fill="#13161e" stroke="#252a38" stroke-width="1.5"></rect>
+      <text x="300" y="192" fill="#5b8af5" font-size="11" font-weight="700" text-anchor="middle">pool thread</text>
+      <text x="300" y="210" fill="#606880" font-size="8" text-anchor="middle">mpsc::recv() loop</text>
+      <text x="300" y="225" fill="#606880" font-size="8" text-anchor="middle">catch_unwind(closure)</text>
+      <text x="300" y="240" fill="#606880" font-size="8" text-anchor="middle">push Blocking result</text>
+      <text x="300" y="255" fill="#606880" font-size="8" text-anchor="middle">write wake_pipe</text>
+      <text x="300" y="268" fill="#606880" font-size="8" text-anchor="middle">tx drop → exit</text>
+
+      <!-- Completions queue -->
+      <rect x="460" y="95" width="150" height="110" rx="8" fill="#1a1e2a" stroke="#3a4060" stroke-width="1.5"></rect>
+      <text x="535" y="117" fill="#4ecdc4" font-size="11" font-weight="700" text-anchor="middle">completions</text>
+      <text x="535" y="135" fill="#606880" font-size="8" text-anchor="middle">Arc&lt;Mutex&lt;VecDeque&gt;&gt;</text>
+      <text x="535" y="153" fill="#606880" font-size="8" text-anchor="middle">FdReady { fd, events }</text>
+      <text x="535" y="168" fill="#606880" font-size="8" text-anchor="middle">Blocking { pid, result }</text>
+      <text x="535" y="185" fill="#a8e6cf" font-size="8" text-anchor="middle">drained by schedule_loop</text>
+
+      <!-- Scheduler -->
+      <rect x="680" y="80" width="120" height="140" rx="8" fill="#0d1220" stroke="#5b8af5" stroke-width="1.5"></rect>
+      <text x="740" y="102" fill="#5b8af5" font-size="11" font-weight="700" text-anchor="middle">scheduler</text>
+      <text x="740" y="120" fill="#606880" font-size="8" text-anchor="middle">poll(wake_fd)</text>
+      <text x="740" y="135" fill="#606880" font-size="8" text-anchor="middle">drain completions</text>
+      <text x="740" y="150" fill="#606880" font-size="8" text-anchor="middle">FdReady →</text>
+      <text x="740" y="162" fill="#606880" font-size="8" text-anchor="middle">lookup waiters[fd]</text>
+      <text x="740" y="177" fill="#606880" font-size="8" text-anchor="middle">unpark(pid)</text>
+      <text x="740" y="192" fill="#606880" font-size="8" text-anchor="middle">Blocking →</text>
+      <text x="740" y="204" fill="#606880" font-size="8" text-anchor="middle">store in slot, unpark</text>
+
+      <!-- Arrows -->
+      <!-- actor → epoll (epoll_ctl via scheduler) -->
+      <line x1="140" y1="120" x2="220" y2="75" stroke="#4ecdc4" stroke-width="1" marker-end="url(#ai)"></line>
+      <text x="185" y="90" fill="#606880" font-size="7" text-anchor="middle">epoll_ctl ADD</text>
+
+      <!-- actor → pool (via submit) -->
+      <line x1="140" y1="190" x2="220" y2="220" stroke="#4ecdc4" stroke-width="1" marker-end="url(#ai)"></line>
+      <text x="182" y="215" fill="#606880" font-size="7" text-anchor="middle">submit(closure)</text>
+
+      <!-- epoll → completions -->
+      <line x1="380" y1="80" x2="460" y2="130" stroke="#f5a623" stroke-width="1" marker-end="url(#ai2)"></line>
+      <!-- pool → completions -->
+      <line x1="380" y1="220" x2="460" y2="170" stroke="#f5a623" stroke-width="1" marker-end="url(#ai2)"></line>
+
+      <!-- completions → scheduler -->
+      <line x1="610" y1="150" x2="680" y2="150" stroke="#4ecdc4" stroke-width="1.5" marker-end="url(#ai)"></line>
+      <text x="645" y="144" fill="#606880" font-size="7" text-anchor="middle">drain</text>
+
+      <!-- wake pipe (both threads → scheduler) -->
+      <line x1="380" y1="100" x2="680" y2="120" stroke="#f5a623" stroke-width="1" stroke-dasharray="3,2" marker-end="url(#ai2)"></line>
+      <line x1="380" y1="240" x2="680" y2="175" stroke="#f5a623" stroke-width="1" stroke-dasharray="3,2" marker-end="url(#ai2)"></line>
+      <text x="560" y="105" fill="#606880" font-size="7" text-anchor="middle">wake pipe write</text>
+    </svg>
+  </div>
+
+  <div class="callout">
+    <span class="callout-icon">📎</span>
+    <p><code>epoll_ctl</code> ADD/DEL is called by the <strong>scheduler thread</strong> directly on the epoll fd. The <code>epoll_wait(2)</code> man page notes that one thread may add a file descriptor to an epoll instance while another thread is blocked in <code>epoll_wait</code> on it, so no second command channel is needed.</p>
+  </div>
+</section>
+
+<div class="divider"></div>
+
+<!-- GOTCHAS -->
+<section id="gotchas">
+  <div class="section-label">Key Gotchas</div>
+  <h2>Things That Would Bite You</h2>
+
+  <div class="module-grid">
+    <div class="module-card">
+      <div class="module-name" style="color:var(--red)">Lost-wakeup window</div>
+      <p>Between registering as a channel's <code>parked_receiver</code> and calling <code>park_current()</code>, a sender could call <code>unpark()</code>. At that moment the actor is still <code>Runnable</code>, so <code>unpark()</code> sets <code>pending_unpark = true</code> instead of re-queuing. The scheduler checks this flag after the <code>Park</code> yield and re-queues immediately rather than parking. This flag also protects epoll and mutex paths.</p>
+    </div>
+    <div class="module-card">
+      <div class="module-name" style="color:var(--red)"><code>std::thread::sleep</code> inside actor</div>
+      <p>Blocks the entire OS scheduler thread, starving every actor assigned to that thread. There's no detection. Use <code>smarm::sleep(d)</code> instead.</p>
+    </div>
+    <div class="module-card">
+      <div class="module-name" style="color:var(--red)">Allocations while holding <code>SharedState</code></div>
+      <p>The <code>with_shared()</code> helper disables preemption while the mutex is held. But any code path that allocates inside <code>with_shared</code> <em>and</em> then tries to acquire <code>SharedState</code> again will deadlock. All internal smarm code is carefully structured to avoid this.</p>
+    </div>
+    <div class="module-card">
+      <div class="module-name" style="color:var(--yellow)">Global run queue mutex</div>
+      <p>All N scheduler threads contend on a single <code>Mutex&lt;SharedState&gt;</code>. This is the primary scalability ceiling, visible in the benchmark suite as "tokio-favored" scenarios. Identified, documented, deferred. The fix would be per-thread deques with work stealing.</p>
+    </div>
+    <div class="module-card">
+      <div class="module-name" style="color:var(--yellow)">No timer cancellation</div>
+      <p>When a mutex lock is granted before its timeout, the timer entry stays in the heap. It fires eventually, the callback sees "actor is no longer waiting" and no-ops. Cost is ~32 bytes and a few cycles per stale entry. Bounded by one entry per parked actor.</p>
+    </div>
+    <div class="module-card">
+      <div class="module-name" style="color:var(--yellow)">fd leak on actor death during IO wait</div>
+      <p>If an actor dies while waiting on an fd, the epoll registration is leaked. EPOLLONESHOT bounds damage to one stale wakeup, which the scheduler drops when it can't find the PID in <code>waiters</code>. Noted in <code>io.rs</code> as a known gap for a future pass.</p>
+    </div>
+    <div class="module-card">
+      <div class="module-name" style="color:var(--green)">XMM registers not saved in context switch</div>
+      <p class="good">This is intentional and correct. XMM0–15 are caller-saved in the SysV AMD64 ABI. Every yield passes through a Rust call site, so the compiler has already spilled live XMM values to the actor's stack before we get to the naked asm. They're restored when the actor resumes because they're on its own stack.</p>
+    </div>
+    <div class="module-card">
+      <div class="module-name" style="color:var(--green)"><code>panic = unwind</code> is required</div>
+      <p class="good">The trampoline uses <code>catch_unwind</code> to intercept actor panics before they reach the naked assembly shim. If a user sets <code>panic = abort</code>, panics kill the process instead of being caught, and the supervision tree collapses to process death. This is documented and the profile is set in <code>Cargo.toml</code>.</p>
+    </div>
+  </div>
+</section>
+
+</main>
+
+
+</body></html>
\ No newline at end of file

| Source | Intent set | Who re-queues | Notes |
|---|---|---|---|
| `yield_now()` | Yield | Scheduler immediately | Actor stays Runnable; pushed back to queue tail |
| Allocator preemption | Yield | Scheduler immediately | RDTSC check in `maybe_preempt()` triggers `switch_to_scheduler()` |
| `channel::recv()` (empty) | Park | `channel::send()` → `unpark()` | Receiver PID stored in channel's `parked_receiver` |
| `mutex::lock()` (contended) | Park | `MutexGuard::drop()` or timer timeout | FIFO waiter queue; timeout via `WaitTimeout` timer entry |
| `sleep(d)` | Park | Timer heap → `schedule_loop` drain | Inserts `Reason::Sleep` entry; scheduler unparks on pop |
| `wait_readable/writable(fd)` | Park | epoll thread → completion queue → scheduler | EPOLLONESHOT; one ADD → one wakeup → one DEL per call |
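
The two intents in the table, and the `pending_unpark` flag that closes the lost-wakeup window, can be modelled in a few lines. This is a minimal sketch, not smarm's actual scheduler: `Sched`, `Slot`, `State`, and `on_switch_back` are hypothetical names invented for illustration, and the model has no real context switching. Only `YieldIntent`, `unpark`, and `pending_unpark` are names taken from the text above.

```rust
use std::collections::VecDeque;

/// What the actor asked for when it switched back to the scheduler.
#[derive(Debug, Clone, Copy, PartialEq)]
enum YieldIntent {
    Yield, // yield_now() or allocator preemption
    Park,  // recv on empty channel, contended lock, sleep, fd wait
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum State {
    Runnable,
    Parked,
}

// Hypothetical per-actor bookkeeping.
struct Slot {
    state: State,
    pending_unpark: bool,
}

// Hypothetical toy scheduler: a slot table plus a FIFO run queue.
struct Sched {
    slots: Vec<Slot>,
    run_queue: VecDeque<usize>,
}

impl Sched {
    fn new(n: usize) -> Self {
        let slots = (0..n)
            .map(|_| Slot { state: State::Runnable, pending_unpark: false })
            .collect();
        Sched { slots, run_queue: VecDeque::new() }
    }

    /// Called by a waker (sender, guard drop, timer, IO completion).
    fn unpark(&mut self, pid: usize) {
        let slot = &mut self.slots[pid];
        match slot.state {
            State::Parked => {
                slot.state = State::Runnable;
                self.run_queue.push_back(pid);
            }
            // Target has not parked yet: remember the wakeup instead.
            State::Runnable => slot.pending_unpark = true,
        }
    }

    /// Called by the scheduler once the actor has switched back.
    fn on_switch_back(&mut self, pid: usize, intent: YieldIntent) {
        let slot = &mut self.slots[pid];
        match intent {
            YieldIntent::Yield => self.run_queue.push_back(pid),
            YieldIntent::Park if slot.pending_unpark => {
                // A wakeup raced ahead of the park: re-queue immediately.
                slot.pending_unpark = false;
                self.run_queue.push_back(pid);
            }
            YieldIntent::Park => slot.state = State::Parked,
        }
    }
}

fn main() {
    let mut s = Sched::new(2);

    // Actor 0: ordinary park, later woken by a sender.
    s.on_switch_back(0, YieldIntent::Park);
    s.unpark(0);

    // Actor 1: the sender's unpark() arrives before the park completes.
    s.unpark(1);
    s.on_switch_back(1, YieldIntent::Park);

    println!("{:?}", s.run_queue); // prints [0, 1]
}
```

In this model a `Park` never strands an actor whose wakeup already arrived, which is the property the lost-wakeup note relies on. A real implementation would also have to make the flag check and the state change atomic with respect to other scheduler threads, which the sketch leaves out.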