Wine-NSPA – Architecture & Design Reference

Wine 11.6 + NSPA RT patchset | Kernel 6.19.x-rt with NTSync PI | 2026-04-15 | Author: Jordan Johnston

Table of Contents

  1. Overview
  2. Vanilla Wine vs Wine-NSPA
  3. Wine Process Model
  4. RT Priority Architecture
  5. Synchronization Architecture
  6. NTSync Kernel Patches
  7. io_uring I/O Architecture
  8. Audio Stack Architecture
  9. QPC & Timing
  10. Memory & Large Pages
  11. msvcrt SIMD Optimizations
  12. Version History

1. Overview

Wine-NSPA is a real-time optimized fork of Wine designed to run on PREEMPT_RT Linux kernels. It bridges the gap between Wine’s SCHED_OTHER threading model and the deterministic scheduling guarantees that RT workloads require – priority inheritance, bounded lock hold times, priority-ordered wakeups, and correct NT-to-Linux priority mapping.

While professional audio (DAWs, VST plugins, ASIO applications) is the primary motivation, Wine-NSPA’s RT infrastructure benefits any latency-sensitive Win32 application: real-time simulation, industrial control, low-latency trading, or any scenario where a Windows application must coexist with Linux’s RT scheduler without priority inversion or unbounded blocking.

Design Philosophy

What PREEMPT_RT Changes

On a standard kernel, spinlock_t disables preemption and mutex is a sleeping lock. Under PREEMPT_RT, spinlock_t becomes a sleeping rt_mutex (fully preemptible), and only raw_spinlock_t disables preemption. Code that assumed spinlock critical sections were non-preemptible must therefore be audited, and truly atomic paths converted to raw_spinlock_t (see Patch 1 in Section 6).

Component Summary

Component Layer Status
RT priority mapping (v1/v1.2) Wine ntdll + wineserver SHIPPED
Wineserver self-promotion (v1.1) Wine wineserver SHIPPED
Shmem wineserver IPC (v1.5) Wine ntdll + wineserver SHIPPED
Wineserver global_lock PI Wine wineserver SHIPPED
CS-PI / FUTEX_LOCK_PI (v2.3) Wine ntdll (PE + Unix) SHIPPED
librtpi vendoring (v2.0) Wine libs/ SHIPPED
NTSync PI kernel patches Linux kernel driver SHIPPED
Client-side NTSync creation Wine ntdll Unix SHIPPED
winejack.drv (MIDI + audio) Wine driver SHIPPED
nspaASIO bridge Wine DLL SHIPPED
io_uring I/O bypass (Phase 1-3) Wine ntdll Unix + server SHIPPED
ntsync uring_fd kernel extension Linux kernel driver SHIPPED
msvcrt SIMD (AVX/SSE2) Wine msvcrt SHIPPED
SRW lock spin phase Wine ntdll SHIPPED
pi_cond requeue-PI Wine libs/librtpi SHIPPED
Win32 condvar PI (requeue-PI) Wine ntdll PE + Unix SHIPPED
CoWaitForMultipleHandles rewrite Wine combase SHIPPED
QPC rdTSC bypass Wine ntdll Unix SHIPPED
Large/huge pages Wine ntdll Unix SHIPPED

2. Vanilla Wine vs Wine-NSPA

Side-by-side comparison of how vanilla Wine and Wine-NSPA handle the same architectural components. Every NSPA change is additive – the vanilla behavior is preserved when NSPA_RT_PRIO is unset.

  wineserver: vanilla SCHED_OTHER (nice 0), single-threaded, handles all IPC; NSPA SCHED_FIFO 64 (self-promoted, below all client RT)
  Thread scheduling: vanilla all threads SCHED_OTHER, TIME_CRITICAL = nice -11, no RT, no priority inheritance; NSPA RT priority mapping (v1/v1.2): REALTIME class → SCHED_FIFO 65-80, TC ceiling clamp (NT 31 = NSPA_RT_PRIO), cross-thread map + lenient tier 1 path
  Synchronization: vanilla CS → futex (no PI), Mutex → wineserver round-trip, FIFO wakeup order; NSPA four PI paths: CS → FUTEX_LOCK_PI (priority inheritance), Mutex → NTSync PI (5 kernel patches), condvar → requeue-PI (unix + Win32), SRW spin (256 iters, RT skip)
  NTSync: vanilla upstream driver with spinlock_t, FIFO waiters, no PI; NSPA + 5 kernel patches: raw_spinlock, prio-ordered waiters, PI boost v2, uring_fd, kmalloc fix
  QPC timing: vanilla clock_gettime (~1μs); NSPA rdtsc direct read (~10ns, 100x faster)
  Memory: vanilla 4KB pages only; NSPA 2MB large pages + 1GB hugepages, TLB reduction
  Linux kernel: vanilla PREEMPT_VOLUNTARY / PREEMPT; NSPA PREEMPT_RT: fully preemptible, rt_mutex PI chains, NTSync PI

Key differences: Vanilla Wine runs entirely under SCHED_OTHER – wineserver, all threads, all sync primitives. There is no priority mapping, no PI, no RT scheduling. Wine-NSPA adds 6 layers of RT infrastructure: wineserver self-promotion, NT→FIFO priority mapping, CS-PI via FUTEX_LOCK_PI, client-side NTSync with PI kernel patches, rdTSC QPC bypass, and large page support. All layers are opt-in via NSPA_RT_PRIO.


3. Wine Process Model

Wine implements the Windows NT process model on top of Linux. Understanding this architecture is essential for knowing where NSPA hooks in and why certain design choices were made.

Architecture Diagram

(Diagram: a Wine process split into a PE side – app code, ntdll.dll, kernelbase.dll, the CS-PI fast path (TID CAS), SRW spin (256 iters), condvar PI mapping table – and a Unix side – ntdll.so, sched_setscheduler(), futex() syscalls, ioctl(/dev/ntsync), FUTEX_LOCK_PI slow path, RT self-promotion, per-thread io_uring for file + socket I/O bypass – sharing one address space. The process talks to the single event-driven wineserver (v1.1: SCHED_FIFO @ NSPA_RT_PRIO-16) over Unix-socket IPC or the v1.5 shmem path, and to the kernel's ntsync, futex, io_uring, and PREEMPT_RT scheduler subsystems directly. The handle table caches fds: server handles grow from index 0 upward, NSPA client handles from ~524K downward; client handles bypass wineserver entirely for anonymous objects.)

Key Concepts

Wineserver

A single long-lived process per Wine prefix that acts as the NT kernel analog. It owns the handle table, manages process/thread lifetime, and arbitrates named synchronization objects. All cross-process operations (handle inheritance, named mutexes, process creation) go through wineserver via Unix domain sockets.

Bottleneck: Wineserver is single-threaded and event-driven. An RT thread that makes a wineserver round-trip blocks behind the server’s serialization. v1.5’s shmem path and client-side NTSync creation reduce the number of operations that require this round-trip. The v1.5 shmem dispatchers serialize on a global_lock (now PI-aware via pi_mutex_t / FUTEX_LOCK_PI) so that high-priority dispatcher contention propagates priority through the kernel’s rt_mutex PI chain.

PE / Unix Split

Every Wine process has two halves sharing the same address space. The PE side runs Win32 code (app binaries, PE DLLs like ntdll.dll, kernelbase.dll). The Unix side runs native Linux code (ntdll.so, driver Unixlibs). The PE side cannot make raw Linux syscalls – it must cross into the Unix side via the Unixlib dispatcher. This split is architecturally important for NSPA: CS-PI’s fast path (CAS on LockSemaphore) lives on the PE side, while the FUTEX_LOCK_PI slow path lives on the Unix side.

NTSync

Wine 11.x’s primary synchronization backend. The /dev/ntsync kernel device (Elizabeth Figura, mainlined) implements Win32 semaphores, mutexes, and events with correct NT semantics (abandoned mutexes, WaitForMultipleObjects atomicity, etc.) directly in the kernel. NSPA adds five patches to this driver for PI support, io_uring integration, and PREEMPT_RT safety.

Client-Side Object Creation

NSPA adds a client-side fast path for creating anonymous sync objects (unnamed mutexes and semaphores). Instead of a wineserver round-trip, the client calls ioctl(NTSYNC_IOC_CREATE_*) directly and allocates a handle from the client range (index ~524K downward). Named objects still go through wineserver for namespace resolution. This eliminates a round-trip per object creation – significant for apps that create thousands of mutexes at startup.
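
The two-ended handle-index scheme can be sketched as a pair of bump allocators: server handles grow from index 0 upward, client handles from a high watermark (~524K) downward. All names and limits here are illustrative, not Wine-NSPA's actual implementation:

```c
#include <stdbool.h>

#define CLIENT_START_INDEX (512 * 1024)    /* ~524K, grows downward */

static int next_server_index = 0;                   /* wineserver-assigned */
static int next_client_index = CLIENT_START_INDEX;  /* NSPA client-side    */

/* Allocate a server-side handle index (named objects, anonymous events). */
static int alloc_server_index(void)
{
    return next_server_index++;
}

/* Allocate a client-side handle index (anonymous mutexes/semaphores
 * created via ioctl(NTSYNC_IOC_CREATE_*), no wineserver round-trip). */
static int alloc_client_index(void)
{
    return next_client_index--;
}

/* The two ranges grow toward each other; a real implementation must fail
 * or fall back once they would collide. */
static bool ranges_collided(void)
{
    return next_client_index < next_server_index;
}
```

Keeping the ranges disjoint lets any handle be classified as server- or client-owned from its index alone, with no extra bookkeeping.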

Events are excluded from client-side creation (a7b00453978). Client-created event handles destabilized Ableton Live, causing crashes in the handle lifecycle. Anonymous events remain on wineserver, where the server-side lifecycle management is proven stable. Mutexes and semaphores are stable client-side.

(Diagram: NtCreateMutant and NtCreateSemaphore take the client path – ioctl(NTSYNC_IOC_CREATE_*) direct to /dev/ntsync, zero server round-trips. Anonymous NtCreateEvent takes the server path – wine_server_call(), the server creates the object on /dev/ntsync and sends the fd back to the client. Both converge on the same wait path: inproc_wait() / linux_wait_objs() → ioctl(NTSYNC_IOC_WAIT_ANY). Events are excluded because client-created event handles crashed Ableton Live (a7b00453978); server-side lifecycle management is stable for events, while mutexes and semaphores are stable client-side.)

Shared Memory Wineserver IPC (v1.5)

Forward-port of Torge Matthies’s 2022 shmem wineserver IPC patch. Replaces socket-based request/reply with per-thread shared memory + futex signaling for small requests. The server side spawns a pthread per client thread that sits in FUTEX_WAIT and dispatches via the existing req_handlers[] array, serialized by a PI-aware global_lock around the main poll loop. v2.4 client-side boost raises the dispatcher’s priority to match the client’s RT priority for the duration of each request; the PI lock ensures this boost propagates through contention with other dispatchers or the main thread.

(Diagram, vanilla IPC vs shmem IPC: vanilla – ntdll's send_request() over a socket, the wineserver main loop's epoll → read → dispatch, reply over the socket, wait_reply() – costs ~10-50μs per round-trip (2 context switches, 2 socket ops, epoll wake). Shmem – the client writes the request into its per-thread 1MB shmem region and FUTEX_WAKEs its dedicated dispatcher pthread, which dispatches under the global_lock and wakes the client's FUTEX_WAIT on the reply – costs ~2-5μs (no socket, no epoll). Oversized requests (>1MB) and shmem init failure fall back to the socket path transparently.)
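
The shmem handshake can be sketched in miniature as a single shared slot and one dispatcher thread. The 3-state protocol, field names, and the uppercase "dispatch" below are illustrative, not Wine-NSPA's actual layout (which uses per-thread 1MB regions and the server's req_handlers[] table); only the FUTEX_WAIT/FUTEX_WAKE signaling pattern is the point:

```c
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <stdatomic.h>
#include <pthread.h>
#include <ctype.h>
#include <string.h>

struct shm_slot {
    atomic_int state;   /* 0 = idle, 1 = request posted, 2 = reply ready */
    char buf[64];       /* request and reply share one buffer */
};

static struct shm_slot slot;

static void futex_wait(atomic_int *addr, int expected)
{
    syscall(SYS_futex, addr, FUTEX_WAIT_PRIVATE, expected, NULL, NULL, 0);
}

static void futex_wake(atomic_int *addr)
{
    syscall(SYS_futex, addr, FUTEX_WAKE_PRIVATE, 1, NULL, NULL, 0);
}

/* Dispatcher thread: sleeps in FUTEX_WAIT until a request is posted,
 * "dispatches" it (here: uppercases the payload), then posts the reply. */
static void *dispatcher(void *arg)
{
    (void)arg;
    while (atomic_load(&slot.state) != 1)
        futex_wait(&slot.state, 0);
    for (char *p = slot.buf; *p; p++)
        *p = (char)toupper((unsigned char)*p);
    atomic_store(&slot.state, 2);
    futex_wake(&slot.state);
    return NULL;
}

/* Client side: write request into shmem, wake dispatcher, wait for reply. */
static const char *shmem_call(const char *request)
{
    pthread_t t;
    pthread_create(&t, NULL, dispatcher, NULL);

    strcpy(slot.buf, request);
    atomic_store(&slot.state, 1);
    futex_wake(&slot.state);

    while (atomic_load(&slot.state) != 2)
        futex_wait(&slot.state, 1);
    pthread_join(t, NULL);
    return slot.buf;
}
```

The reason this beats a socket round-trip is visible in the code: the only kernel involvement is the futex wake/wait pair, with the payload already sitting in shared memory.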

Why shmem matters for RT: NTSync captures the sync-wait hot path, but non-sync wineserver traffic (handle operations, registry, file metadata, loader calls) still uses IPC. During VST plugin loading, the loader holds Wine’s loader lock while dispatching dozens of wineserver calls. Shorter round-trips = shorter lock-hold times = less contention bleeding into RT threads sharing those locks.

Limitations: The global_lock serializes all shmem dispatchers, so this is a latency improvement, not a throughput one. Dispatcher threads run at SCHED_OTHER by default (v1.1's wineserver RT promotion handles the scheduling side independently).

Shmem PI v2.5: Cached Scheduling State

v2.4’s manual PI boost called sched_getscheduler() + sched_getparam() + sched_setscheduler() on every request (6 syscalls per RT request including unboost). v2.5 caches the RT thread’s scheduling policy and priority in TLS (nspa_rt_cached_policy, nspa_rt_cached_prio), eliminating the read syscalls. Only sched_setscheduler() fires — 2 syscalls per request (boost + unboost). Committed as 17621ba494c.

(Diagram: v2.4 per RT request – getscheduler(), getparam(), setscheduler(boost), dispatch, setscheduler(restore): 6 syscalls. v2.5 – setscheduler(boost), dispatch (unchanged), setscheduler(restore): 2 syscalls. TLS holds nspa_rt_cached_policy + nspa_rt_cached_prio, set once at thread RT init and read on every boost; sched_getscheduler + sched_getparam are eliminated because policy/prio don't change between requests.)
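
The caching idea can be shown with stubbed sched_* calls that count "syscalls", making the 6-vs-2 difference checkable. The stub names and the lazy cache fill are illustrative (the real cache is populated at thread RT init, per the text); only the TLS variable names come from the document:

```c
#include <stdbool.h>

static int syscall_count;   /* stub: counts sched_* "syscalls" issued */

static int  stub_sched_getscheduler(void) { syscall_count++; return 1;  /* SCHED_FIFO */ }
static int  stub_sched_getparam(void)     { syscall_count++; return 75; }
static void stub_sched_setscheduler(int policy, int prio)
{ syscall_count++; (void)policy; (void)prio; }

static __thread int  nspa_rt_cached_policy;
static __thread int  nspa_rt_cached_prio;
static __thread bool nspa_rt_cache_valid;

/* v2.4 shape: read policy/prio around both boost and restore – 6 syscalls. */
static void dispatch_v24(int boost_policy, int boost_prio)
{
    int policy = stub_sched_getscheduler();
    int prio   = stub_sched_getparam();
    (void)policy; (void)prio;
    stub_sched_setscheduler(boost_policy, boost_prio);  /* boost dispatcher */
    /* ... dispatch request ... */
    policy = stub_sched_getscheduler();
    prio   = stub_sched_getparam();
    stub_sched_setscheduler(policy, prio);              /* restore */
}

/* v2.5 shape: policy/prio cached in TLS – only the 2 setscheduler calls fire. */
static void dispatch_v25(int boost_policy, int boost_prio)
{
    if (!nspa_rt_cache_valid) {    /* real code fills this at thread RT init */
        nspa_rt_cached_policy = stub_sched_getscheduler();
        nspa_rt_cached_prio   = stub_sched_getparam();
        nspa_rt_cache_valid   = true;
    }
    stub_sched_setscheduler(boost_policy, boost_prio);  /* boost */
    /* ... dispatch request ... */
    stub_sched_setscheduler(nspa_rt_cached_policy, nspa_rt_cached_prio);
}
```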

4. RT Priority Architecture

NSPA maps Win32 thread priorities to Linux SCHED_FIFO priorities, giving audio threads deterministic scheduling. The mapping is controlled by two environment variables and implemented in a two-tier architecture.

Priority Mapping Diagram

Win32 NT band         Linux SCHED_FIFO     Role
(reserved)            99                   Kernel threads only
                      88-89                JACK / PipeWire callbacks
TIME_CRITICAL (31)    80 = NSPA_RT_PRIO    Audio callback ceiling, always SCHED_FIFO (realtime +6)
30                    79
HIGHEST, RT class (26) 75
NORMAL, RT class (24)  73                  Policy from NSPA_RT_POLICY
IDLE, RT class (16)    65                  RT band boundary (NT 16)
(wineserver)          64                   v1.1: NSPA_RT_PRIO - 16, just below the entire RT band
NT 1..15              SCHED_OTHER (FIFO 1..63 unused)  nice-based

Ceiling mapping: fifo = NSPA_RT_PRIO - (31 - nt_band). NSPA_RT_PRIO is the ceiling: NT 31 maps there, and lower RT bands scale linearly below it. Example with NSPA_RT_PRIO=80: NT 31 = 80, NT 24 = 73, NT 16 = 65.
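
The ceiling mapping is a one-liner; as a function (sketch only – the real code lives in dlls/ntdll/unix/thread.c, and the function name here is made up):

```c
/* Map an NT priority in the RT band [16..31] to a SCHED_FIFO priority.
 * `ceiling` is NSPA_RT_PRIO; NT 31 maps to the ceiling itself and lower
 * bands scale linearly below it. */
static int nt_to_fifo(int nt_band, int ceiling)
{
    return ceiling - (31 - nt_band);
}
```

With NSPA_RT_PRIO=80 this reproduces the table above: NT 31 → 80, NT 24 → 73, NT 16 → 65.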

Two-Tier Promotion

Tier 1: Client-side self-promotion (zero round-trip)

When a thread calls SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_TIME_CRITICAL), ntdll’s Unix side detects this in NtSetInformationThread(ThreadBasePriority) and calls sched_setscheduler(0, SCHED_FIFO | SCHED_RESET_ON_FORK, ...) directly. No wineserver round-trip needed. The priority change still forwards to wineserver for bookkeeping so that GetThreadPriority from other processes returns the correct value.

Implementation: dlls/ntdll/unix/thread.c, function nspa_rt_apply_tid().

Tier 2: Server-side cross-process promotion

When a thread calls SetThreadPriority(hOtherThread, ...) targeting a thread in another process, the request goes through wineserver. The server’s apply_thread_priority() in server/thread.c calls sched_setscheduler(thread->unix_tid, ...) on the target thread. This covers bulk updates from SetPriorityClass and any cross-process priority manipulation.

v1.2: Cross-thread map

Tier 1 extended with a HANDLE-to-unix_tid map (256-slot open-addressing hash table protected by a PI mutex). When a thread calls SetThreadPriority(hThread, ...) where hThread is another thread in the same process, the client looks up the target’s unix_tid in the map and applies the scheduling change locally, avoiding a wineserver round-trip. The map is populated at NtCreateThreadEx time.

Environment Variables

Variable Values Default Effect
NSPA_RT_PRIO Integer in [min..max-1] unset = RT dormant Master switch + ceiling FIFO priority. NT 31 maps here.
NSPA_RT_POLICY FF, RR, TS FF Scheduler policy for NT [16..30]. TC (NT 31) always FIFO.
NSPA_SRV_RT_PRIO Integer NSPA_RT_PRIO - 16 Override wineserver’s FIFO priority.
NSPA_SRV_RT_POLICY FF, RR FF Wineserver scheduler policy.

Lenient path (v1 feature): For client-side self/same-process promotion, THREAD_PRIORITY_TIME_CRITICAL is treated as a special-case ceiling promotion even when the process is not in REALTIME priority class. This covers the common audio pattern where apps call SetThreadPriority(..., 15) without first calling SetPriorityClass(REALTIME). The lower realtime band still follows the normal process-class rules; the lenient exception is specifically for TIME_CRITICAL.


5. Synchronization Architecture

Priority inversion is the primary failure mode for RT audio under Wine. An RT thread blocked on a lock held by a normal-priority thread can be delayed indefinitely while CFS time-slices the holder against dozens of other threads. NSPA addresses this with PI (priority inheritance) on two independent sync paths.

4-Path Synchronization Diagram

Path A – CS-PI: entry EnterCriticalSection → RtlEnterCriticalSection, TID CAS fast path / LOCK_PI slow path → kernel FUTEX_LOCK_PI (rt_mutex, transitive PI, deadlock detection; +1 CAS ≈ 5ns). PI effect: holder boosted to waiter priority. Release: LeaveCriticalSection – CAS(tid, 0) or FUTEX_UNLOCK_PI. Scope: all Win32 CriticalSections.
Path B – NTSync PI: entry WaitForSingleObject → NtWaitForSingleObject, inproc_wait → ioctl(/dev/ntsync) (5 patches, prio-ordered queues). PI effect: owner boosted via sched_setattr, priority-ordered wakeup, boost_count. Release: ReleaseMutex → ntsync_pi_drop, restore scheduling. Scope: Win32 Mutex/Sem/Event.
Path C – pi_cond PI: entry pi_cond_wait() (unix) → librtpi (header-only), unlock mutex → sleep on condvar → FUTEX_WAIT_REQUEUE_PI, atomic requeue onto the PI mutex. PI effect: zero gap – wake → own mutex, kernel requeues atomically. Signal: pi_cond_signal → FUTEX_CMP_REQUEUE_PI (wake 1). Scope: audio, gstreamer, winebus.
Path D – Win32 condvar PI: entry SleepConditionVariableCS → NtNspaCondWaitPI (condvar→mutex map + 3 syscalls) → FUTEX_WAIT_REQUEUE_PI, atomic requeue onto the CS PI mutex. PI effect: zero gap – wake → own CS. Signal: WakeConditionVariable → NtNspaCondSignalPI (map lookup). Scope: Win32 CondVar + CS.

Not covered: SleepConditionVariableSRW – SRW locks have no ownership, so there is no PI target (unsolved even in the Linux kernel). All 4 paths are gated on NSPA_RT_PRIO – when unset, every code path is byte-identical to upstream Wine.

Path A: CRITICAL_SECTION PI (CS-PI)

Win32 CRITICAL_SECTION is the most contended lock in typical Wine workloads – used by heap operations, loader, DllMain serialization, GDI, and most app/plugin code. NSPA’s CS-PI repurposes the LockSemaphore field of RTL_CRITICAL_SECTION as a FUTEX_LOCK_PI word.

Protocol

  1. Acquire (uncontended): PE side does InterlockedIncrement(&LockCount). If we won (-1 → 0), CAS our Linux TID into LockSemaphore. Done – never leaves user space.
  2. Acquire (contended): PE side calls NtNspaLockCriticalSectionPI(address), crossing to Unix side. Unix side calls futex(&LockSemaphore, FUTEX_LOCK_PI_PRIVATE, ...). The kernel sees the owner TID in the futex word, boosts the owner to the waiter’s scheduling priority, and blocks the waiter on an rt_mutex.
  3. Release (no waiters): PE side does CAS(LockSemaphore, my_tid, 0). Pure user-space.
  4. Release (waiters present): If FUTEX_WAITERS bit is set, PE side calls NtNspaUnlockCriticalSectionPI(address). Kernel transfers ownership to the highest-priority waiter and drops the boost.
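
The uncontended halves of this protocol are pure user-space atomics and can be sketched directly; the contended halves (steps 2 and 4) would cross into the Unix side for FUTEX_LOCK_PI / FUTEX_UNLOCK_PI and are only marked in comments here. Names and the standalone lock word are illustrative, standing in for RTL_CRITICAL_SECTION.LockSemaphore:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define FUTEX_WAITERS 0x80000000u   /* kernel sets this on contention */

static atomic_uint lock_semaphore;  /* stands in for LockSemaphore */

static bool cs_try_acquire(uint32_t my_tid)
{
    unsigned int expected = 0;
    /* step 1: CAS our TID into the futex word – never leaves user space.
     * On failure the real code calls NtNspaLockCriticalSectionPI, and the
     * kernel boosts the owner (FUTEX_LOCK_PI). */
    return atomic_compare_exchange_strong(&lock_semaphore, &expected, my_tid);
}

static bool cs_release(uint32_t my_tid)
{
    unsigned int expected = my_tid;
    /* step 3: no waiters – CAS(tid, 0), pure user space */
    if (atomic_compare_exchange_strong(&lock_semaphore, &expected, 0))
        return true;
    /* step 4: FUTEX_WAITERS bit is set – the real code calls
     * NtNspaUnlockCriticalSectionPI and the kernel hands ownership to the
     * highest-priority waiter and drops the boost */
    return false;
}
```

The key property is that the same 32-bit word serves both the user-space CAS fast path and the kernel's PI futex protocol, so no extra state is needed to switch between them.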

TID source: PE code cannot call syscall(SYS_gettid) directly. NSPA adds NtNspaGetUnixTid() which returns the Linux kernel TID from ntdll_thread_data->nspa_unix_tid. PE caches this in a __thread variable – the syscall fires at most once per thread.

Path B: NTSync Mutex PI

Win32 CreateMutex + WaitForSingleObject goes through the /dev/ntsync kernel driver. NSPA’s five kernel patches add PI, uring_fd, and PREEMPT_RT fixes to this path (see Section 6). The flow:

  1. NtWaitForSingleObject –> inproc_wait() resolves the mutex’s fd from the handle cache.
  2. linux_wait_objs() calls ioctl(device, NTSYNC_IOC_WAIT_ANY, &args).
  3. Inside the kernel, ntsync_insert_waiter() inserts the waiter into a priority-ordered queue (patch 2).
  4. ntsync_pi_recalc() scans both any_waiters and all_waiters for the highest-priority waiter. Compares against the owner’s saved original priority (not normal_prio, which changes after boost). If boost needed, looks up or creates a per-task ntsync_pi_owner tracking entry, increments boost_count, and boosts via sched_setattr_nocheck() (patch 3 v2).
  5. On release, try_wake_any_mutex() wakes the highest-priority waiter. ntsync_pi_drop() decrements boost_count; only restores original scheduling when the last boost on that task is removed (multi-object safe).

Fallback Behavior

Both PI paths degrade gracefully. CS-PI falls back to the legacy keyed-event wait if FUTEX_LOCK_PI returns ENOSYS. NTSync PI requires the patched kernel driver – without it, waiters are FIFO-ordered (upstream default) with no owner boost.

SRW Lock Spin Phase

Windows SRW locks (Slim Reader/Writer) spin briefly before entering the kernel wait path. NSPA implements this behavior: a bounded spin of 256 iterations on the lock word before falling back to the futex wait, skipped for RT threads (a SCHED_FIFO spinner would starve a lower-priority holder).

This matches Windows NT behavior where SRWLock uses a brief spin phase before blocking. The 256-iteration count is empirically chosen to cover the common case of short critical sections (sub-microsecond) without burning excessive cycles on longer holds.
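
A spin-then-block acquire in this spirit looks like the following. The 256-iteration count matches the text; the lock-word encoding and the stubbed blocking path are illustrative, not Wine's dlls/ntdll/sync.c code:

```c
#include <stdatomic.h>

#define SRW_SPIN_COUNT 256

static atomic_int srw_word;   /* 0 = free, 1 = held exclusively */
static int blocked_waits;     /* counts falls into the kernel wait path */

static void srw_acquire_exclusive(void)
{
    /* spin phase: covers sub-microsecond critical sections entirely in
     * user space (skipped for RT threads in the real implementation) */
    for (int i = 0; i < SRW_SPIN_COUNT; i++) {
        int expected = 0;
        if (atomic_compare_exchange_weak(&srw_word, &expected, 1))
            return;             /* acquired during the spin phase */
        /* real code would pause/cpu_relax() here */
    }
    /* spin exhausted: enter the futex wait path (stub: just record it) */
    blocked_waits++;
    int expected = 0;
    while (!atomic_compare_exchange_weak(&srw_word, &expected, 1))
        expected = 0;
}

static void srw_release_exclusive(void)
{
    atomic_store(&srw_word, 0);
}
```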

Impact: Reduces kernel transitions for uncontended or briefly-contended SRW locks. The v5 NTSync d4 rapid throughput improved from 232K to 259K ops/s (+11.6%), consistent with fewer futex syscalls in the lock path.

Implementation: dlls/ntdll/sync.c (SRW acquire path).

pi_cond Requeue-PI (Unix-Side Condvars)

Wine-NSPA’s pi_cond_t (condition variable with PI support, from librtpi) uses FUTEX_WAIT_REQUEUE_PI and FUTEX_CMP_REQUEUE_PI to close the PI gap in condition variable wakeup. The waiter transitions directly from “blocked on condvar” to “blocked on PI mutex with priority inheritance” – no gap.

Implementation: libs/librtpi/rtpi.h (header-only inline, pi_cond_wait, pi_cond_signal, pi_cond_broadcast).

Win32 Condvar PI (RtlSleepConditionVariableCS)

The same requeue-PI mechanism applied to PE-side Win32 condvars. When CS-PI is active and the CS is held non-recursively, RtlSleepConditionVariableCS takes the PI path: capture value, register condvar→mutex mapping, NtNspaCondWaitPI (unix: FUTEX_UNLOCK_PI + FUTEX_WAIT_REQUEUE_PI). On signal, kernel atomically requeues waiter onto PI mutex – zero gap.

Scope: CS-backed condvars only. SRW-backed condvars remain without PI (unsolved problem even in the Linux kernel).

PI Coverage Path Mechanism Scope
CS-PI FUTEX_LOCK_PI on LockSemaphore Win32 CriticalSection enter/leave
NTSync PI Kernel ntsync driver (priority-ordered wakeup) Win32 Mutex/Semaphore/Event
pi_cond requeue-PI FUTEX_WAIT_REQUEUE_PI in librtpi Unix-side condvars (audio, gstreamer)
Win32 condvar PI FUTEX_WAIT_REQUEUE_PI for RtlSleepConditionVariableCS Win32 SleepConditionVariableCS

Condvar Requeue-PI Diagram

Both pi_cond (unix-side) and Win32 condvar PI (PE-side) use FUTEX_WAIT_REQUEUE_PI / FUTEX_CMP_REQUEUE_PI to atomically move waiters from the condvar futex onto the PI mutex chain — eliminating the priority inversion gap between wake and mutex reacquire.

Old – plain FUTEX_WAIT (PI gap): the RT thread's pi_cond_wait(cond, mutex) does pi_mutex_unlock(mutex) (dropping its PI boost), then FUTEX_WAIT(&cond, seq) to sleep on the condvar. The signaler's FUTEX_WAKE makes it runnable – but here is the PI GAP: the thread is runnable with no boost and can be preempted by lower-priority threads before its manual pi_mutex_lock(mutex) reacquires the mutex and restores PI via the kernel rt_mutex. Measured worst case under RT load: 53.8μs.

New – FUTEX_WAIT_REQUEUE_PI (no gap): pi_cond_wait(cond, mutex) does pi_mutex_unlock(mutex), then FUTEX_WAIT_REQUEUE_PI(&cond, &mutex). The signaler's FUTEX_CMP_REQUEUE_PI requeues the waiter ATOMICALLY onto the PI mutex chain – the kernel transfers the waiter directly, with no userspace gap – so the mutex is acquired with PI already in effect: unbroken PI through the entire path. Measured worst case under RT load: 31.6μs (-41% vs. old).

Full details: Win32 Condvar PI documentation (architecture, syscall interface, mapping table design, 2 SVG diagrams).


6. NTSync Kernel Patches

Five patches applied to drivers/misc/ntsync.c in the NSPA kernel tree. Patches 1-3 make the NTSync driver safe on PREEMPT_RT and add Windows-faithful priority semantics with PI boost. Patches 4-5 integrate ntsync with io_uring for CQE wakeup and fix a PREEMPT_RT allocation bug.

Patch 1: raw_spinlock + rt_mutex hardening

Problem: Upstream ntsync uses spinlock_t, which on PREEMPT_RT becomes a sleeping rt_mutex. This is correct for general use but changes the timing characteristics and makes some code paths that assume non-preemptibility incorrect.

Fix: Convert obj->lock to raw_spinlock_t with raw_spin_lock() / raw_spin_unlock(). This preserves true spin semantics even on PREEMPT_RT kernels, matching the driver’s design assumption of short critical sections around object state updates.

Patch 2: Priority-ordered waiter queues

Problem: Upstream ntsync uses list_add_tail() for all waiters – FIFO order. Windows NT uses strict priority ordering: the highest-priority waiter is always woken first on object release.

Fix: Replace list_add_tail() with ntsync_insert_waiter(), which walks the waiter list and inserts the new entry before the first entry with lower scheduling priority (task->prio). Same-priority waiters are FIFO within their level. This matches Windows NT scheduling semantics exactly.

static void ntsync_insert_waiter(struct ntsync_q_entry *new_entry,
                                 struct list_head *head)
{
    struct ntsync_q_entry *entry;
    list_for_each_entry(entry, head, node) {
        if (new_entry->q->task->prio < entry->q->task->prio) {
            list_add_tail(&new_entry->node, &entry->node);
            return;
        }
    }
    list_add_tail(&new_entry->node, head);
}
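
A user-space analog of this insert (using an explicit circular doubly linked list instead of the kernel's list.h, with an `id` field added so FIFO order among equals is observable) behaves the same way – lower `prio` means higher scheduling priority, as with task->prio:

```c
#include <stddef.h>

struct waiter {
    int prio;                 /* lower value = higher priority */
    int id;                   /* illustrative: to observe FIFO order */
    struct waiter *prev, *next;
};

static struct waiter head = { 0, 0, &head, &head };   /* circular sentinel */

static void insert_waiter(struct waiter *new_entry)
{
    struct waiter *entry;
    for (entry = head.next; entry != &head; entry = entry->next) {
        if (new_entry->prio < entry->prio) {
            /* insert before the first lower-priority entry */
            new_entry->prev = entry->prev;
            new_entry->next = entry;
            entry->prev->next = new_entry;
            entry->prev = new_entry;
            return;
        }
    }
    /* lowest priority so far (or empty list): append at the tail,
     * which also keeps same-priority waiters FIFO within their level */
    new_entry->prev = head.prev;
    new_entry->next = &head;
    head.prev->next = new_entry;
    head.prev = new_entry;
}
```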

Patch 3: Mutex owner PI boost (v2, 2026-04-15)

Problem: When a SCHED_FIFO thread waits on a mutex held by a SCHED_OTHER thread, the holder may not get CPU time promptly (CFS time-sharing), causing unbounded priority inversion.

Fix (v2): ntsync_pi_recalc() scans both any_waiters and all_waiters for the highest-priority waiter. Compares against the owner’s saved original priority (via per-device ntsync_pi_owner tracking), not normal_prio (which changes after boost). Per-task boost_count ensures original scheduling is saved once and restored only when ALL mutexes stop boosting that task. Conservative over-boosting between first and last removal is safe (never under-boosts).

v2 fixes three bugs from v1: (a) multi-object PI corruption when a task held multiple boosted mutexes, (b) zero PI for WaitForMultipleObjects(bWaitAll=TRUE), (c) stale normal_prio comparison causing boost/unboost thrashing. Test results: philosophers RT max wait 1620→865us (-46.6%), ntsync d8 PI contention 479→239ms (-50.1%).

/* Per-task tracking — saves orig_attr once, restores when boost_count reaches 0 */
po = find_pi_owner(dev, owner);
base_prio = po ? po->orig_normal_prio : owner->normal_prio;

if (highest_prio < base_prio && !was_boosted) {
    if (!po) {
        po = kzalloc(sizeof(*po), GFP_ATOMIC);
        po->orig_attr = capture_sched(owner);
        po->orig_normal_prio = owner->normal_prio;
        list_add(&po->node, &dev->boosted_owners);
    }
    po->boost_count++;
    obj->u.mutex.pi_boosted = true;
}
if (needs_boost && highest_prio < owner->prio)
    sched_setattr_nocheck(owner, &boost);
(Diagram: each /dev/ntsync fd (ntsync_device) holds wait_all_lock (rt_mutex), boost_lock (raw_spinlock), and a boosted_owners list of per-task ntsync_pi_owner entries – owner task_struct*, saved orig_attr (e.g. SCHED_OTHER), orig_normal_prio (e.g. 120, CFS), and boost_count (e.g. 2 for mutexes M1 + M2). ntsync_pi_recalc() flow: scan any_waiters, scan all_waiters, compare against orig_normal_prio, find or create the pi_owner, increment/decrement boost_count; boost via setattr, unboost only when boost_count reaches 0. Multi-object safe: the owner stays boosted until ALL mutexes stop contributing (conservative over-boost).)

Scaling Characteristics

Tested with transitive PI chains up to depth 12. RT wait time does not increase with chain depth beyond the tail holder’s work time (~235ms for a 100M-iter CPU loop). The per-hop increment (~50ms) is visible in individual holder elapsed times but does not accumulate in the RT thread’s total wait. ntsync_pi_recalc() scales to at least depth 12 without degradation.

Patch 4: uring_fd extension (io_uring CQE wakeup)

Problem: When a thread is blocked in NTSYNC_IOC_WAIT_ANY/ALL and an io_uring CQE arrives (e.g. socket data ready), there’s no mechanism to wake the thread — the CQE sits in the ring until the ntsync wait times out or another object is signaled.

Fix: Repurpose the pad field in ntsync_wait_args as uring_fd. When set to a valid eventfd (registered with IORING_REGISTER_EVENTFD), the ntsync wait ioctl monitors it via poll_initwait/vfs_poll alongside the ntsync objects. When the eventfd fires (io_uring posted a CQE), the ioctl returns NTSYNC_INDEX_URING_READY (0xFFFFFFFE). The client-side retry loop in sync.c drains CQEs and re-enters the wait.
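
The client-side retry loop's shape can be sketched with the wait ioctl and the CQE drain stubbed out. Only the constant 0xFFFFFFFE comes from the text; the function and variable names are illustrative, not the actual sync.c code:

```c
#include <stdint.h>

#define NTSYNC_INDEX_URING_READY 0xFFFFFFFEu   /* from the kernel extension */

static int pending_cqes;   /* stub: CQEs sitting in the ring */
static int drained_cqes;   /* how many the loop has processed */

/* Stub ntsync wait ioctl: reports URING_READY while CQEs are pending
 * (the registered eventfd fired), then pretends object 0 was signaled. */
static uint32_t stub_wait_ioctl(void)
{
    return pending_cqes ? NTSYNC_INDEX_URING_READY : 0;
}

static void stub_drain_cqes(void)
{
    drained_cqes += pending_cqes;
    pending_cqes = 0;
}

/* The retry loop: drain io_uring completions, then re-enter the wait. */
static uint32_t wait_with_uring_retry(void)
{
    for (;;) {
        uint32_t index = stub_wait_ioctl();
        if (index == NTSYNC_INDEX_URING_READY) {
            stub_drain_cqes();   /* process completions while not blocked */
            continue;            /* re-enter the ntsync wait */
        }
        return index;            /* a real ntsync object was signaled */
    }
}
```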

Patch 5: PI kmalloc pre-allocation fix

Problem: ntsync_pi_recalc() called kzalloc(GFP_ATOMIC) while holding a raw_spinlock. On PREEMPT_RT, the slab allocator’s internal rt_spin_lock can sleep, triggering __schedule_bug (BUG: scheduling while atomic).

Fix: Pre-allocate the ntsync_pi_owner struct before acquiring the raw spinlock. Pass the pre-allocated pointer into ntsync_pi_recalc() and consume it only if a new owner entry is needed; otherwise free it after releasing the lock.
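
The allocate-before-lock pattern generalizes beyond this driver; in user-space miniature (a statically initialized pthread mutex stands in for the raw_spinlock, and all names are illustrative) it looks like this:

```c
#include <pthread.h>
#include <stdlib.h>
#include <stdbool.h>

struct pi_owner { int boost_count; };

static pthread_mutex_t obj_lock = PTHREAD_MUTEX_INITIALIZER;  /* "raw_spinlock" */
static struct pi_owner *current_owner;   /* entry created on first boost */

/* Analog of ntsync_pi_recalc(): never allocates under the lock – it either
 * consumes the caller's pre-allocation or hands it back for freeing. */
static void pi_recalc(bool need_new_owner, struct pi_owner *prealloc,
                      struct pi_owner **unused_out)
{
    pthread_mutex_lock(&obj_lock);
    if (need_new_owner && !current_owner) {
        current_owner = prealloc;        /* consume the pre-allocation */
        *unused_out = NULL;
    } else {
        *unused_out = prealloc;          /* not needed: free after unlock */
    }
    pthread_mutex_unlock(&obj_lock);
}

static void boost_owner(bool need_new_owner)
{
    /* allocation happens here, before the lock, where sleeping is legal
     * (on PREEMPT_RT the slab allocator may sleep internally) */
    struct pi_owner *prealloc = calloc(1, sizeof(*prealloc));
    struct pi_owner *unused;
    pi_recalc(need_new_owner, prealloc, &unused);
    free(unused);                        /* free(NULL) is a no-op if consumed */
}
```

The occasional wasted allocation (pre-allocated but not needed) is the price for never calling the allocator inside an atomic section.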


7. io_uring I/O Architecture

Per-thread io_uring rings in ntdll’s Unix layer bypass the wineserver for file and socket I/O. See io_uring-architecture.html for the full design document with diagrams.

What It Replaces

Operation Before (server-mediated) After (io_uring)
Sync file poll poll() syscall in NtReadFile io_uring_prep_poll_add → 1 kernel transition
Async file read/write 2 server round-trips + epoll monitoring io_uring_prep_read/write → 0 server involvement
Socket recv/send (EAGAIN) Server epoll monitors fd, alerts client io_uring POLL_ADD → CQE → try_recv/try_send

Key Design Decisions

Test Results

Phase Test Result
Phase 1 Sync poll replacement All tests PASS
Phase 2 Async file I/O bypass All tests PASS
Phase 3 Socket I/O (sync + overlapped) 4000/4000, avg 95-113us

Files

File Purpose
dlls/ntdll/unix/io_uring.c (~760 LOC) Ring management, pool allocator, Phase 1-3 submit/complete
dlls/ntdll/unix/socket.c ALERTED interception, CQE handler, bitmap set/clear
dlls/ntdll/unix/sync.c ntsync uring_fd retry loop
server/sock.c E2 bitmap check in sock_get_poll_events

8. Audio Stack Architecture

Wine’s Linux audio stack has two layers: Windows-facing APIs such as WASAPI / WinMM / ASIO on top, and a Unix-side driver backend underneath. In upstream Wine, normal application audio typically flows through mmdevapi into winealsa.drv, then out through ALSA or PipeWire. In Wine-NSPA, winejack.drv replaces that backend with a direct JACK path for standard shared/exclusive audio, while nspaASIO is a separate bridge for hosts that specifically need ASIO semantics.

Audio Path Diagram

(Diagram, three panels.)

winealsa.drv (vanilla Wine): App → WASAPI → mmdevapi → winealsa.drv (NtDelayExecution timer loop) → ALSA PCM device → PipeWire (extra hop to JACK). pthread_mutex in the timer loop, no exclusive mode, no ASIO, scalar conversion loops, drift-prone timer.

winejack.drv WASAPI path (Wine-NSPA): App → WASAPI → mmdevapi → winejack.drv → JACK directly, no ALSA hop. Lock-free RT side, SSE2 conversion, PI mutex on the app side, hardware-synced callback, exclusive mode, per-channel fast path.

winejack.drv ASIO path – zero latency (Wine-NSPA), inside the JACK process_callback (RT thread, SCHED_FIFO):
  1. MIDI I/O (lock-free ringbuffer)
  2. JACK capture → ASIO input (memcpy)
  3. bufferSwitch() – host fills output, synchronized via futex for a same-period round-trip
  4. ASIO output → JACK ports (memcpy)
  5. WASAPI streams (games, media)
Output latency is exactly 1 JACK period (the theoretical minimum): data written in bufferSwitch is output in the SAME period, and MIDI is synchronized.

winejack.drv

A unified Wine audio/MIDI driver that connects directly to JACK, replacing winealsa.drv + winepulse.drv for systems using JACK or PipeWire.

Implementation: dlls/winejack.drv/jack.c (~2700 lines, Unix side) + dlls/winejack.drv/jackmidi.c (~700 lines, MIDI).

nspaASIO

A Wine PE DLL implementing the ASIO COM interface. Windows ASIO apps see a standard ASIO driver named “nspaASIO.” When JACK is active and float32 format is available, nspaASIO registers directly with winejack – bypassing WASAPI entirely. The play_thread uses futex synchronization with the JACK RT callback for same-period output.

Falls back to WASAPI exclusive mode when direct registration is unavailable (non-float32 format, JACK not running).

Comparison: winejack.drv vs winealsa.drv (vanilla Wine)

| Feature | winealsa.drv (vanilla) | winejack.drv (NSPA) |
|---|---|---|
| Backend | ALSA PCM | JACK (via PipeWire or native jackd) |
| RT callback | NtDelayExecution timer loop (drift-prone) | JACK process callback (hardware-synced) |
| Exclusive mode | Accepts flag, no real exclusive access | Dedicated JACK ports, proper period contract |
| ASIO support | None (needs separate wineasio) | Built-in via nspaASIO (Phase F) |
| ASIO latency | N/A (wineasio: 1 period, separate driver) | 1 period (same driver, same callback) |
| MIDI | ALSA sequencer | JACK MIDI (sub-period timestamps) |
| Format conversion | Scalar loops | SSE2 mono/stereo fast paths |
| Locking | pthread_mutex in timer loop | PI mutex (app side), lock-free (RT side) |
| Fast path | None | Per-channel double buffers (exclusive float32) |
| WASAPI+ASIO coexistence | Separate drivers, no coordination | Same JACK callback, same period |

9. QPC & Timing

Low-latency audio requires precise, low-overhead timing. NSPA makes several changes to Wine’s timing subsystem to reduce jitter and overhead.

NtQueryPerformanceCounter

Wine’s QPC calls monotonic_counter() which uses clock_gettime(CLOCK_BOOTTIME) (falling back to CLOCK_MONOTONIC). This is a vDSO-accelerated call on modern kernels (~26ns). Without vDSO it becomes a real syscall (~328ns, 12.5x slower). The preloader’s vDSO preservation port (Jinoh Kang series) ensures the vDSO is never deleted – if ASLR places it in a reserved address range, the preloader relocates it via mremap instead of removing it from the auxiliary vector. The TSC frequency is calibrated at init via /sys/devices/system/cpu/cpu*/cpufreq/base_frequency for accurate jiffies-to-TSC conversion.

static inline ULONGLONG monotonic_counter(void)
{
    struct timespec ts;
    if (!clock_gettime( CLOCK_BOOTTIME, &ts ))
        return ts.tv_sec * (ULONGLONG)TICKSPERSEC + ts.tv_nsec / 100;
    /* CLOCK_BOOTTIME unavailable: fall back to CLOCK_MONOTONIC */
    clock_gettime( CLOCK_MONOTONIC, &ts );
    return ts.tv_sec * (ULONGLONG)TICKSPERSEC + ts.tv_nsec / 100;
}

PR_SET_TIMERSLACK

When an application calls NtSetTimerResolution (the backend for timeBeginPeriod), NSPA translates the requested resolution to a Linux timer slack value via prctl(PR_SET_TIMERSLACK, slack_ns). The default Linux timer slack is 50us; setting it to 1ns gives sub-millisecond timer precision for Sleep(), poll(), nanosleep(), and futex waits.

if (set)
{
    unsigned long slack_ns = (unsigned long)res * 100;  /* 100ns units -> ns */
    if (slack_ns < 1) slack_ns = 1;
    prctl( PR_SET_TIMERSLACK, slack_ns );
}
else
    prctl( PR_SET_TIMERSLACK, 0 );  /* reset to default */

Why this matters: Many audio apps call timeBeginPeriod(1) expecting 1ms timer precision. Without timer slack adjustment, Linux’s default 50us slack on top of CFS scheduling jitter can turn a Sleep(1) into a 2-3ms delay. With PR_SET_TIMERSLACK set to match the requested resolution, timer expirations fire at their requested time.

NtDelayExecution

Wine’s Sleep() implementation uses clock_nanosleep(CLOCK_MONOTONIC, ...) for high-precision delays, which benefits from the PREEMPT_RT kernel’s deterministic scheduling and the per-thread timer slack set above.

ExSetTimerResolution Forwarding

The kernel-mode ExSetTimerResolution API is forwarded from ntoskrnl.exe to the same NtSetTimerResolution path, ensuring that drivers setting timer resolution also benefit from the timer slack adjustment.


10. Memory & Large Pages

NSPA implements Windows VirtualAlloc(MEM_LARGE_PAGES) using Linux hugetlbfs, reducing TLB misses for large allocations common in audio sample buffers and plugin memory pools.

Implementation

| Windows API | Linux Implementation | Page Size |
|---|---|---|
| VirtualAlloc(MEM_LARGE_PAGES) | mmap(MAP_HUGETLB \| MAP_LOCKED, ...) | 2 MB (default hugepage) |
| VirtualAlloc(MEM_LARGE_PAGES) with 1 GB hint | mmap(MAP_HUGETLB \| MAP_HUGE_1GB \| MAP_LOCKED, ...) | 1 GB |
| QueryWorkingSetEx | PAGEMAP_SCAN ioctl / /proc/pid/pagemap | Reports LargePage flag |

Audio benefit: A typical VST plugin loads 50-500 MB of sample data. With 4KB pages, this requires 12,800-128,000 TLB entries. With 2MB pages, only 25-250 entries are needed – a 512x reduction in TLB pressure. For systems with 1GB hugepages pre-allocated, a single TLB entry covers the entire sample bank.

Key Implementation Details

Implementation: dlls/ntdll/unix/virtual.c, anon_mmap_alloc() / NtAllocateVirtualMemory().


11. msvcrt SIMD Optimizations

SIMD Memory/String Operations

Wine’s msvcrt provides the C runtime for Win32 applications, including memcpy, memmove, memchr, strlen, and memcmp. Upstream Wine uses scalar C implementations or minimal hand-written assembly. NSPA replaces these with SIMD-optimized implementations:

| Function | Implementation | Width | Fallback |
|---|---|---|---|
| memcpy | AVX _mm256_loadu_si256 / _mm256_storeu_si256 | 256-bit | SSE2 128-bit |
| memmove | AVX with overlap detection + reverse copy | 256-bit | SSE2 128-bit |
| memchr | SSE2 _mm_cmpeq_epi8 + _mm_movemask_epi8 | 128-bit | scalar |
| strlen | SSE2 _mm_cmpeq_epi8 null scan | 128-bit | scalar |
| memcmp | SSE2 _mm_cmpeq_epi8 + early exit | 128-bit | scalar |

Runtime CPU dispatch: At DLL init, memcpy/memmove check CPUID for AVX support and select the AVX path when available. SSE2 is the minimum baseline (guaranteed on x86_64). After init, each call costs only one indirect jump through a cached function pointer – no per-call feature checks.

Why it matters for RT: These functions are called thousands of times per audio buffer cycle – buffer copies in WASAPI/ASIO, socket I/O data movement, string parsing in API calls. The v5 test results show measurable impact: +4.7% CS throughput, +27% baseline socket-io throughput, -7% process startup time.

Implementation: dlls/msvcrt/string.c (all five functions), dlls/msvcrt/math.c (AVX detection), dlls/msvcrt/msvcrt.h (avx_supported declaration).


12. Version History

| Version | Scope | Key Files |
|---|---|---|
| v1 | RT priority mapping: two-tier promotion with NSPA_RT_PRIO as the client RT ceiling and linear scaling below it | dlls/ntdll/unix/thread.c, server/thread.c |
| v1.1 | Wineserver self-promotion to SCHED_FIFO at NSPA_RT_PRIO-16 (below the entire RT band) | server/thread.c |
| v1.2 | Cross-thread promotion via HANDLE-to-unix_tid map; NSPA_RT_POLICY=TS conservative mode | dlls/ntdll/unix/thread.c |
| v1.5 | Shmem wineserver IPC (Torge Matthies forward-port); reduces wineserver round-trip latency | dlls/ntdll/unix/server.c, server/thread.c |
| v2.0 | librtpi vendoring into libs/librtpi/; PI-aware mutexes/condvars for Unix-side Wine internals | libs/librtpi/*, include/rtpi.h |
| v2.3 | CS-PI: FUTEX_LOCK_PI on CRITICAL_SECTION via repurposed LockSemaphore field | dlls/ntdll/sync.c, dlls/ntdll/unix/sync.c |
| kernel | NTSync PI: 5 patches (raw_spinlock, priority-ordered queues, PI boost v2, uring_fd, kmalloc fix) | drivers/misc/ntsync.c |
| kernel | Client-side NTSync: anonymous sync object creation without wineserver round-trip | dlls/ntdll/unix/sync.c |
| io_uring | Phase 1+2: sync poll replacement + async file I/O server bypass + TLS pool allocator | dlls/ntdll/unix/io_uring.c, dlls/ntdll/unix/file.c |
| io_uring | Phase 3: socket I/O bypass via E2 bitmap + ALERTED-state interception (sync+overlapped) | dlls/ntdll/unix/socket.c, server/sock.c |
| kernel | ntsync uring_fd extension: CQE wakeup in ntsync waits + PI kmalloc pre-alloc fix | drivers/misc/ntsync.c |
| msvcrt | AVX memcpy/memmove, SSE2 memchr/strlen/memcmp with runtime CPU dispatch | dlls/msvcrt/string.c, dlls/msvcrt/math.c |
| sync | SRW lock spin phase (256 iters, skipped for RT threads, matching Windows behavior) | dlls/ntdll/sync.c |
| sync | pi_cond requeue-PI upgrade (FUTEX_WAIT_REQUEUE_PI / FUTEX_CMP_REQUEUE_PI) | libs/librtpi/rtpi.h |
| sync | Win32 condvar PI: FUTEX_WAIT_REQUEUE_PI for RtlSleepConditionVariableCS (condvar→mutex mapping table + 3 new syscalls) | dlls/ntdll/sync.c, dlls/ntdll/unix/sync.c, dlls/ntdll/ntsyscalls.h |

Environment Variable Quick Reference

| Variable | Default | Description |
|---|---|---|
| NSPA_RT_PRIO | unset (dormant) | Master RT switch + ceiling FIFO priority |
| NSPA_RT_POLICY | FF | FF/RR/TS for lower RT band [16..30] |
| NSPA_SRV_RT_PRIO | NSPA_RT_PRIO-16 | Wineserver FIFO priority override |
| NSPA_SRV_RT_POLICY | FF | Wineserver scheduler policy |

File Map

| File | Role |
|---|---|
| dlls/ntdll/unix/thread.c | Tier 1 RT promotion, v1.2 cross-thread map, priority mapping |
| server/thread.c | Tier 2 RT promotion, wineserver self-promotion, env var parsing |
| dlls/ntdll/sync.c | CS-PI fast path (PE side): TID CAS, recursion handling |
| dlls/ntdll/unix/sync.c | CS-PI slow path, NTSync wait/create, client-side handles, QPC, timer slack |
| dlls/ntdll/unix/virtual.c | Large/huge pages, MAP_HUGETLB allocation |
| dlls/ntdll/unix/server.c | Shmem wineserver IPC (v1.5) |
| dlls/winejack.drv/jack.c | JACK audio driver (WASAPI backend) |
| dlls/winejack.drv/jackmidi.c | JACK MIDI driver |
| libs/librtpi/ | Vendored PI-aware mutex/condvar library |
| dlls/ntdll/unix/io_uring.c | io_uring per-thread ring, pool allocator, Phase 1-3 submit/complete |
| dlls/msvcrt/string.c | SSE2 memchr, strlen, memcmp |
| dlls/msvcrt/mem.c | AVX/SSE2 memcpy, memmove with runtime CPU dispatch |
| libs/librtpi/pi_cond.c | Requeue-PI condition variable for unix-side consumers (FUTEX_WAIT_REQUEUE_PI) |
| dlls/ntdll/sync.c + unix/sync.c | Win32 condvar PI: requeue-PI for RtlSleepConditionVariableCS (3 new syscalls + mapping table) |
| dlls/combase/apartment.c | CoWaitForMultipleHandles correctness rewrite |
| drivers/misc/ntsync.c | NTSync kernel driver (in kernel tree) |

Wine-NSPA Architecture Reference | Generated 2026-04-16 | Wine 11.6 + NSPA RT patchset