smarm/LOOM.md
Claude 8cbef1dfc1 feat: I/O and mutex support (v0.3)
Add epoll-based non-blocking I/O and kernel-like mutexes:
- src/io.rs: Complete epoll backend with timeout & error handling
- src/mutex.rs: Fair mutex with waiter queues & parking integration
- Enhanced scheduler to support synchronous I/O blocking
- Comprehensive test suites for I/O (epoll) and mutex behavior
- Documentation: LOOM.md concurrency model & README
2026-05-23 16:09:29 +00:00


# Loom
> Erlang-style actor concurrency for Rust, without the copies, the colors, or the GC pauses.
---
## Vision
Rust gives you the right ownership discipline for safe actor concurrency almost for free — `Send` already
draws the boundary, the borrow checker already enforces it. What it lacks is an execution model to match:
async/await is IO-centric, colors your functions, and trades stack simplicity for state-machine complexity;
OS threads are too heavy to spawn per actor.
Loom adds a third option: **green-thread actors on a shared heap**, scheduled cooperatively, with
message-passing as the only cross-actor communication primitive. You get Erlang's isolation model without
Erlang's copying GC, and you get Rust's zero-copy ownership transfers without async's cognitive overhead.
No function coloring. No `Box<dyn Future>`. Just actors, messages, and the borrow checker doing what it
already does.
---
## Do: Core Runtime
### Actors and scheduling
Each actor is a lightweight green thread with its own heap-allocated, growable stack. Stacks are
allocated via `mmap` with a guard page below the region; overflow is detected by the OS without Loom
polling for it. Initial stacks are small and grow by remapping on demand.
The scheduler runs one OS thread per CPU. Each scheduler thread loops against a single global
`Mutex<HashMap>` queue shared across all schedulers. If queue contention becomes a measured bottleneck
this can be revisited; the interface will not change.
Loom requires `panic = "unwind"` in the Cargo profile. With `panic = "abort"`, a panicking actor
kills the whole process, so supervision and actor isolation silently stop applying.
### Process descriptor
Each actor has a descriptor that is hot while the actor runs and will typically live in L1 cache.
It holds:
- `stack_base: *mut u8` — bottom of the allocated stack region
- `stack_cap: usize` — total allocated size
- `stack_ptr: *mut u8` — current stack pointer (`rsp`), saved on yield
- `pid: (u32, u32)` — index and generation counter (see PIDs below)
- `alloc_count: u32` — countdown for preemption sampling
- `timeslice_start: u64` — `RDTSC` value written on every resume
- `resize_count: u16` — diagnostic counter for stack growth events
- `context: *mut ContextSaveArea` — pointer to the register save area (cold, touched only on switch)
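
The field list above can be sketched as a struct. This is a minimal illustration, not the actual definition: the `#[repr(C)]` layout, the `Pid` struct standing in for the `(u32, u32)` tuple, and the opaque `ContextSaveArea` placeholder are all assumptions made so that the size can be checked at compile time.

```rust
/// Index plus generation, standing in for the `pid: (u32, u32)` field.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Pid {
    pub index: u32,
    pub generation: u32,
}

/// Opaque placeholder for the per-architecture register save area.
#[repr(C)]
pub struct ContextSaveArea {
    _opaque: [u8; 0],
}

/// Assumed layout: `repr(C)` keeps the fields in declaration order so that
/// an assembly shim could rely on fixed offsets.
#[repr(C)]
pub struct ProcessDescriptor {
    pub stack_base: *mut u8,           // bottom of the allocated stack region
    pub stack_cap: usize,              // total allocated size
    pub stack_ptr: *mut u8,            // saved stack pointer
    pub pid: Pid,                      // index and generation
    pub alloc_count: u32,              // countdown for preemption sampling
    pub timeslice_start: u64,          // tick value written on every resume
    pub resize_count: u16,             // diagnostic counter for stack growth
    pub context: *mut ContextSaveArea, // cold: touched only on switch
}

pub const DESCRIPTOR_SIZE: usize = std::mem::size_of::<ProcessDescriptor>();

// Compile-time check that the descriptor fits in one 64-byte cache line.
const _: () = assert!(DESCRIPTOR_SIZE <= 64);
```

With this field order the descriptor fits in a single 64-byte cache line on 64-bit targets, which is consistent with the expectation that it stays hot while the actor runs.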
### Context switching
Context switching is implemented in a `#[naked]` assembly shim, one per supported architecture.
The compiler cannot be asked to switch stacks.
**Suspend** (yield, preemption, or blocking):
1. Save callee-saved integer registers and SIMD registers into `ContextSaveArea`.
2. Save `rsp`/`sp` into the process descriptor.
3. Load the scheduler's stack pointer from a thread-local and jump back into the scheduler loop.
**Resume**:
1. Load `rsp`/`sp` from the process descriptor.
2. Restore registers from `ContextSaveArea`.
3. `ret` — the return address is already on the restored stack, execution resumes exactly where the
actor yielded.
**x86-64**: saves `rbx`, `rbp`, and `r12` through `r15` (6 × 8 = 48 bytes) plus `xmm0` through
`xmm15` (16 × 16 = 256 bytes), 304 bytes total. Full SSE baseline is required; the compiler may
autovectorise freely. AVX-512 is deferred.
**ARM64**: saves `x19` through `x30` (12 × 8 = 96 bytes) and `d8` through `d15` (8 × 8 = 64 bytes),
160 bytes total. The link register `x30` must be saved explicitly because it holds the return
address, whereas on x86 `call` pushes the return address onto the stack.
Each actor owns one `ContextSaveArea`, heap-allocated as a `Box<ContextSaveArea>`. Its lifetime
equals the actor's lifetime, so there is no allocation churn and no bulk deallocation to optimise
for; a plain `Box` is sufficient.
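
The byte counts for the two save areas can be checked with a pair of structs. This is a sketch under assumptions: the type names `X64SaveArea` and `Arm64SaveArea`, the field names, and the 16-byte alignment on x86-64 (chosen so that aligned SSE stores would be valid) are illustrative and not taken from the Loom source.

```rust
/// x86-64 save area: six callee-saved integer registers plus xmm0..xmm15.
#[repr(C, align(16))]
pub struct X64SaveArea {
    pub gpr: [u64; 6],       // rbx, rbp, r12, r13, r14, r15
    pub xmm: [[u8; 16]; 16], // xmm0 through xmm15, 16 bytes each
}

/// ARM64 save area: x19..x30 plus the low 64 bits of v8..v15 (d8..d15).
#[repr(C)]
pub struct Arm64SaveArea {
    pub x: [u64; 12], // x19 through x30; x30 is the link register
    pub d: [u64; 8],  // d8 through d15
}

// Compile-time checks against the totals quoted in the text.
const _: () = assert!(std::mem::size_of::<X64SaveArea>() == 304);
const _: () = assert!(std::mem::size_of::<Arm64SaveArea>() == 160);

/// One zeroed save area per actor, owned through a `Box`.
pub fn new_x64_save_area() -> Box<X64SaveArea> {
    Box::new(X64SaveArea {
        gpr: [0; 6],
        xmm: [[0; 16]; 16],
    })
}
```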
Initial platform target is x86-64 Linux. ARM64 and macOS are natural follow-ons.
### Allocator-driven preemption
Every Nth allocation, the allocator reads `RDTSC` and compares it against `timeslice_start`. If the
elapsed time exceeds the timeslice threshold, the actor yields. The workloads that starve a
scheduler (sustained compute, data transformation) usually allocate frequently, so allocation count
is a reasonable proxy for elapsed time in practice.
`RDTSC` is not monotonic across core migration; a slightly wrong timeslice is acceptable. Loom is
not a real-time scheduler.
Known failure mode: tight no-alloc loops are invisible to this mechanism. Actors doing sustained
allocation-free compute must call `loom::yield_now()` explicitly, or offload to a thread pool
outside the actor scheduler (e.g. rayon). This is documented and acceptable — such loops are rare
in message-passing workloads.
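
The sampling logic can be sketched independently of the allocator hook. In this illustration the constants `SAMPLE_EVERY` and `TIMESLICE_TICKS` are hypothetical (the document gives no values), and the clock is passed in as a closure so that it is only read on the Nth allocation; a real implementation would presumably read `RDTSC` there.

```rust
/// Hypothetical tuning values; the design document does not specify them.
const SAMPLE_EVERY: u32 = 64;
const TIMESLICE_TICKS: u64 = 1_000_000;

pub struct PreemptState {
    alloc_count: u32,     // countdown until the next clock sample
    timeslice_start: u64, // tick value recorded on the last resume
}

impl PreemptState {
    pub fn new(now: u64) -> Self {
        Self { alloc_count: SAMPLE_EVERY, timeslice_start: now }
    }

    /// Called by the scheduler each time the actor is resumed.
    pub fn on_resume(&mut self, now: u64) {
        self.alloc_count = SAMPLE_EVERY;
        self.timeslice_start = now;
    }

    /// Called on every allocation. Returns `true` when the actor should yield.
    /// `read_clock` is invoked only on every `SAMPLE_EVERY`-th call.
    pub fn on_alloc(&mut self, read_clock: impl FnOnce() -> u64) -> bool {
        self.alloc_count -= 1;
        if self.alloc_count > 0 {
            return false;
        }
        self.alloc_count = SAMPLE_EVERY;
        // saturating_sub: a clock that appears to run backwards after a core
        // migration reads as zero elapsed time instead of a huge value.
        read_clock().saturating_sub(self.timeslice_start) > TIMESLICE_TICKS
    }
}
```

The common path is a decrement and a branch; the clock read happens once per `SAMPLE_EVERY` allocations. A loop that never allocates never reaches `on_alloc`, which is the known failure mode described above.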
### Yield points
An actor yields at:
- **Channel send/recv** — the primary communication primitive
- **Mutex contention** — attempting to lock a held `Arc<Mutex<T>>` parks the actor
- **IO** — blocking on a socket or file descriptor parks the actor until the IO thread signals readiness
- **`loom::sleep(duration)`** — parks the actor; the timer wheel re-queues it on expiry
- **`loom::yield_now()`** — explicit cooperative yield
- **Allocator preemption** — as above
- **Spawn** — does not yield by default; the new actor is queued and the spawner continues
`std::thread::sleep` inside an actor blocks the entire OS thread and should never be used. Loom
may emit a warning if it can detect this.
### IO thread
A single dedicated IO thread runs an `epoll`/`kqueue` loop. Actors blocking on IO register their
file descriptor and PID; the IO thread moves them back into the global queue when the fd is ready.
A `HashMap<RawFd, Pid>` maps fds to parked actors. Cancellation (actor dies while waiting on IO)
deregisters the fd. This is intentionally simple and not pluggable; Loom is not a general async
executor.
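
The bookkeeping described above (register, readiness, cancellation) can be modelled without any real `epoll` calls. This is an in-memory sketch only: `IoRegistry`, its method names, and the `VecDeque` standing in for the global queue are assumptions for illustration, and `RawFd` is the Unix definition from the standard library.

```rust
use std::collections::{HashMap, VecDeque};
use std::os::unix::io::RawFd;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Pid {
    pub index: u32,
    pub generation: u32,
}

#[derive(Default)]
pub struct IoRegistry {
    parked: HashMap<RawFd, Pid>, // fd -> actor waiting on it
    run_queue: VecDeque<Pid>,    // stand-in for the global queue
}

impl IoRegistry {
    /// An actor blocking on IO registers its fd and PID.
    pub fn register(&mut self, fd: RawFd, pid: Pid) {
        self.parked.insert(fd, pid);
    }

    /// Called when the poller reports `fd` as ready: the parked actor, if
    /// any, moves back to the run queue.
    pub fn on_ready(&mut self, fd: RawFd) -> Option<Pid> {
        let pid = self.parked.remove(&fd)?;
        self.run_queue.push_back(pid);
        Some(pid)
    }

    /// Cancellation: the actor died while waiting, so drop its registrations.
    pub fn cancel(&mut self, pid: Pid) {
        self.parked.retain(|_, parked| *parked != pid);
    }

    pub fn next_runnable(&mut self) -> Option<Pid> {
        self.run_queue.pop_front()
    }
}
```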
### Communication
Messages must be `Send`. `Copy` alone is not enough: raw pointers are `Copy` but not `Send`.
Non-`Send` types cannot cross an actor boundary; this is enforced by the type system with no
runtime overhead.
Two primitives only:
- **Move** — transfer owned data across a channel. Zero copy. The sender relinquishes ownership
at the type level. This is the default.
- **`Arc<Mutex<T>>`** — for genuinely shared long-lived state. Explicit and visible.
Cross-actor `Rc` or bare pointers are banned. There is no cycle detector. Cross-actor cycles are
banned by construction: either transfer ownership or use `Arc`.
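
The move primitive can be illustrated with the standard library alone. The sketch below uses `std::sync::mpsc` and an OS thread as stand-ins for Loom channels and actors, which is an assumption made for the sake of a runnable example; it shows that moving a `Vec` across a channel transfers ownership without copying the heap buffer.

```rust
use std::sync::mpsc;
use std::thread;

/// Sends a `Vec<u8>` to another thread and returns the address of its heap
/// buffer before the send and after the receive.
pub fn send_by_move() -> (usize, usize) {
    let payload = vec![1u8, 2, 3, 4];
    let before = payload.as_ptr() as usize;

    let (tx, rx) = mpsc::channel::<Vec<u8>>();
    let worker = thread::spawn(move || {
        let received = rx.recv().expect("sender dropped");
        received.as_ptr() as usize
    });

    tx.send(payload).expect("receiver dropped");
    // `payload` has been moved: using it here would be a compile error.
    // Sending an `Rc<Vec<u8>>` instead would also fail to compile, because
    // `Rc` is not `Send`.

    let after = worker.join().expect("worker panicked");
    (before, after)
}
```

Both addresses are equal: only the `Vec` header (pointer, length, capacity) crosses the channel, and the buffer stays where it was allocated.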
### PIDs
A PID is a `(index, generation)` pair. The index may be reused after an actor dies; the generation
counter increments on every death. A stale handle holding the wrong generation is a detectable
error, not a silent misdirection. This avoids the ABA problem without reserving PID space forever.
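
A generational index table is one way to realise this. The following is a minimal sketch, assuming a `PidTable` type with `insert`, `remove`, and `get` methods; these names are illustrative and not Loom's API.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Pid {
    pub index: u32,
    pub generation: u32,
}

struct Slot<T> {
    generation: u32,
    value: Option<T>,
}

pub struct PidTable<T> {
    slots: Vec<Slot<T>>,
    free: Vec<u32>, // indices available for reuse
}

impl<T> PidTable<T> {
    pub fn new() -> Self {
        Self { slots: Vec::new(), free: Vec::new() }
    }

    /// Registers a new actor, reusing a free index when one exists.
    pub fn insert(&mut self, value: T) -> Pid {
        if let Some(index) = self.free.pop() {
            let slot = &mut self.slots[index as usize];
            slot.value = Some(value);
            Pid { index, generation: slot.generation }
        } else {
            self.slots.push(Slot { generation: 0, value: Some(value) });
            Pid { index: (self.slots.len() - 1) as u32, generation: 0 }
        }
    }

    /// Actor death: bumps the generation so old handles become stale.
    /// The counter wraps after 2^32 deaths of the same slot.
    pub fn remove(&mut self, pid: Pid) -> Option<T> {
        let slot = self.slots.get_mut(pid.index as usize)?;
        if slot.generation != pid.generation {
            return None;
        }
        let value = slot.value.take()?;
        slot.generation = slot.generation.wrapping_add(1);
        self.free.push(pid.index);
        Some(value)
    }

    /// Lookup fails for a stale handle instead of reaching the new occupant.
    pub fn get(&self, pid: Pid) -> Option<&T> {
        let slot = self.slots.get(pid.index as usize)?;
        if slot.generation != pid.generation {
            return None;
        }
        slot.value.as_ref()
    }
}
```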
### Supervision
Every actor has a supervisor, assigned at spawn. This is not optional. The root supervisor is
provided by the runtime; its death is a process exit.
A supervisor receives one of three signals when a child actor terminates:
- `Signal::Exit(pid)` — normal completion
- `Signal::Panic(pid, payload)` — caught via `catch_unwind` at the actor entry point boundary,
before unwinding can reach the assembly shim
- `Signal::Timeout(pid)` — actor exceeded a budget (see below)
The supervisor decides: restart the actor, escalate to its own supervisor, or ignore. Restart
intensity is capped: if an actor panics more than N times within a time window, the supervisor
stops restarting and escalates. This prevents a bad prelude or corrupted input from spinning the
supervisor in a restart loop indefinitely. N and the window are configurable per supervisor with a
sensible global default.
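
The restart intensity cap can be sketched as a sliding window over recent panic times. The names `RestartIntensity` and `Decision` are hypothetical, and the current time is passed in explicitly so the behaviour can be tested deterministically.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    Restart,
    Escalate,
}

pub struct RestartIntensity {
    max_restarts: usize,      // N
    window: Duration,         // the time window
    recent: VecDeque<Instant>, // panic times still inside the window
}

impl RestartIntensity {
    pub fn new(max_restarts: usize, window: Duration) -> Self {
        Self { max_restarts, window, recent: VecDeque::new() }
    }

    /// Records a panic at `now` and decides what the supervisor should do.
    pub fn on_panic(&mut self, now: Instant) -> Decision {
        // Forget panics that have fallen out of the window.
        while let Some(&oldest) = self.recent.front() {
            if now.saturating_duration_since(oldest) > self.window {
                self.recent.pop_front();
            } else {
                break;
            }
        }
        self.recent.push_back(now);
        // More than N panics inside the window: stop restarting.
        if self.recent.len() > self.max_restarts {
            Decision::Escalate
        } else {
            Decision::Restart
        }
    }
}
```

With `max_restarts = 2` and a 10-second window, the third panic within 10 seconds escalates, while a panic arriving a minute later is treated as a fresh first failure.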
### Mutex timeout
Every `loom::mutex` lock attempt is mediated by the scheduler. If the lock is not acquired within
a configurable timeout, the actor receives a `LockTimeout` error rather than parking forever. This
is a hard runtime guarantee, not a convention. Default timeout is global and configurable;
individual locks and individual call sites can override it.
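
The shape of a timed lock can be shown with `std::sync::Mutex`. This is only an approximation of the behaviour described above: the function name `lock_timeout` is hypothetical, and the sketch polls with `try_lock` and `yield_now` where a scheduler-mediated lock would park the actor instead.

```rust
use std::sync::{Mutex, MutexGuard, TryLockError};
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq, Eq)]
pub struct LockTimeout;

/// Tries to acquire `m` until `timeout` elapses, then gives up with
/// `LockTimeout` rather than waiting forever.
pub fn lock_timeout<T>(
    m: &Mutex<T>,
    timeout: Duration,
) -> Result<MutexGuard<'_, T>, LockTimeout> {
    let deadline = Instant::now() + timeout;
    loop {
        match m.try_lock() {
            Ok(guard) => return Ok(guard),
            // A poisoned lock is still acquired; hand the guard back.
            Err(TryLockError::Poisoned(poisoned)) => return Ok(poisoned.into_inner()),
            Err(TryLockError::WouldBlock) => {
                if Instant::now() >= deadline {
                    return Err(LockTimeout);
                }
                // Stand-in for parking the actor in the scheduler.
                std::thread::yield_now();
            }
        }
    }
}
```

A call site handles the error explicitly, for example `match lock_timeout(&state, Duration::from_millis(50)) { Ok(guard) => ..., Err(LockTimeout) => ... }`.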
### Task joining
Actors can spawn children and wait on a group of handles:
```rust
let h1 = loom::spawn(|| compute_a());
let h2 = loom::spawn(|| compute_b());
let (a, b) = loom::join!(h1, h2);
```
`join!` parks the calling actor until all handles complete. The last child to finish re-queues the
parent. This is a countdown in the parent's descriptor; no polling, no waker registration. A
`join_timeout!` variant is a natural extension.
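
The countdown mechanism can be approximated with OS threads. In this sketch `join_all` is a hypothetical helper, `thread::park` and `unpark` stand in for parking and re-queueing an actor, and an `AtomicUsize` plays the role of the countdown in the parent's descriptor. A panicking child is not handled here; in Loom that case would go through supervision.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

/// Runs every task on its own thread and returns the results in task order.
pub fn join_all<T, F>(tasks: Vec<F>) -> Vec<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let remaining = Arc::new(AtomicUsize::new(tasks.len()));
    let results: Arc<Mutex<Vec<Option<T>>>> =
        Arc::new(Mutex::new((0..tasks.len()).map(|_| None).collect()));
    let parent = thread::current();

    for (i, task) in tasks.into_iter().enumerate() {
        let remaining = Arc::clone(&remaining);
        let results = Arc::clone(&results);
        let parent = parent.clone();
        thread::spawn(move || {
            let value = task();
            results.lock().unwrap()[i] = Some(value);
            // fetch_sub returns the previous value, so 1 means this child
            // was the last to finish and must wake the parent.
            if remaining.fetch_sub(1, Ordering::AcqRel) == 1 {
                parent.unpark();
            }
        });
    }

    // park() can return spuriously, so re-check the countdown each time.
    while remaining.load(Ordering::Acquire) != 0 {
        thread::park();
    }

    let mut slots = results.lock().unwrap();
    let out: Vec<T> = slots
        .iter_mut()
        .map(|slot| slot.take().expect("every child stored a result"))
        .collect();
    out
}
```

Only the last child performs a wake-up; the others decrement the counter and exit, so the parent is woken once per join rather than once per child.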
### Timer wheel
`loom::sleep` and supervision timeouts are driven by a timer wheel in the scheduler. Sleeping
actors are parked and re-queued by the timer wheel on expiry. The timer wheel is internal
infrastructure; its design is an implementation detail.
---
## Defer: Later Work
- **Stack sizing policy** — initial size, growth factor, and whether stacks ever shrink are
implementation decisions to be made with profiling data, not up front.
- **Queue contention** — if `Mutex<HashMap>` proves to be a bottleneck under profiling, evaluate
`DashMap` or a lock-free work-stealing deque (e.g. `crossbeam-deque`). Not before.
- **AVX-512 context save** — extend `ContextSaveArea` when there is a concrete use case.
- **`loom::sleep` vs raw sleep semantics** — further control knobs deferred until the basic sleep
is working and real use cases are understood.
- **Supervision tree API** — the contract is defined; the recursive hierarchy, restart strategies,
and introspection API are implementation work.
- **no_std support** — the assembly shim is no_std friendly but the IO thread and allocator require
OS primitives. Target is no_std + `alloc` on hosted platforms; bare metal is out of scope.
- **Distribution** — Loom is a single-process runtime. No distribution protocol, no BEAM-style
clustering.
---
## What Loom is Not
- Not a drop-in replacement for Tokio. Loom does not implement `Future` or the async executor interface.
- Not a general allocator. Loom manages actor stacks; heap allocation for actor data goes through
the system allocator.
- Not Erlang. No hot code reloading, no distribution protocol, no BEAM bytecode. Loom is a
concurrency runtime, not a platform.
- Not a real-time scheduler. Timeslice accuracy is best-effort.