Wine 11.6 + NSPA RT patchset | Kernel 6.19.x-rt with NTSync PI | 2026-04-30 | Author: Jordan Johnston | Status: architectural pattern reference for shipped client-local NT bypasses and their fallback discipline.
This page explains the NT-local stub pattern itself: what gets moved into the client process, how cross-process arbitration is still preserved, and how the shipped stubs fit into the broader decomposition story.
Wineserver is the historical bottleneck of Wine on PREEMPT_RT. Every NT-API
call that touches kernel-mediated state – a file handle, a synchronization
object, a timer, a message queue, a window – traditionally crosses a
request-shmem RPC into wineserver, where one global pi_mutex_t global_lock
is held while a handler runs the request. Throughput is fine on idle systems;
latency is not. Under contention the global lock serialises every handler,
priority-inverts low-priority handlers against the RT audio thread, and
turns every NT-API call into a queue-depth-bounded wait. This is the lock
that the 2026-04-26 perf captures keep showing in every wineserver profile: channel_dispatcher
6-11%, get_ptid_entry 1-10%, main_loop_epoll 2-7%, all under one lock.
NSPA’s response is not “rewrite wineserver from scratch”. It is an architectural pattern we call NT-local stubs: client-process-resident handlers that satisfy a class of NT-API calls without crossing into wineserver, and fall back to the server only when an honest cross-process arbitration is required. Each stub picks an NT surface, owns its own data structures (a private handle range, a per-process table, a shmem region, a dispatcher thread), and short-circuits the server when it can.
The pattern is already shipping. As of 2026-04-27 there are three live NT-local stubs in tree:
| Stub | NT surface | Lives in |
|---|---|---|
| nspa_local_file | NtCreateFile (read-only regulars) + downstream NtReadFile, NtClose, NtQueryInformationFile, NtCreateSection, NtDuplicateObject routing | dlls/ntdll/unix/nspa/local_file.c |
| nspa_local_timer | NtCreateTimer / NtSetTimer / NtCancelTimer / NtQueryTimer (anonymous) | dlls/ntdll/unix/nspa/local_timer.c |
| nspa_local_wm_timer | NtUserSetTimer / NtUserSetSystemTimer / NtUserKillTimer / WM_TIMER posting | dlls/win32u/nspa/local_wm_timer.c |
Each stub is independent – they do not share state, do not coordinate,
and can be enabled or disabled independently via env vars. Together they
form a strategy: shrink wineserver request-by-request, until the
handlers that remain are honest and small. The end state – which is
the long arc of wine/nspa/docs/wineserver-decomposition-plan.md – is
a metadata service for cross-process arbitration only, not an application
server.
Every stub follows the same skeleton. Strip away the API-specific details and the structure is:
```c
NTSTATUS NtSomething( ..., IO_STATUS_BLOCK *io )
{
    if (NSPA_LOCAL_X_active())
    {
        NTSTATUS bypass = nspa_local_X_try_bypass( ... );
        if (bypass == STATUS_SUCCESS) return STATUS_SUCCESS;
        if (bypass == STATUS_<real_error>) return bypass;
        /* otherwise STATUS_NOT_SUPPORTED -- fall through */
    }
    /* original server path -- unmodified */
    ...
}
```
The invariants every stub honours:

1. Honest refusal via STATUS_NOT_SUPPORTED / STATUS_NOT_IMPLEMENTED. Anything outside the stub's correctness envelope falls back to the unmodified server path. No silent correctness drift – the stub either handles the call exactly as the server would, or refuses it.
2. Short lock holds. The stub takes its own pi_mutex_t (PI-priority-boosting under PREEMPT_RT), mutates its in-memory tables, and releases. No blocking syscall, no RPC, no inter-stub call is made under the stub's lock. (See §6.)
3. A per-stub feature gate (NSPA_DISABLE_LOCAL_TIMERS, NSPA_DISABLE_LOCAL_WM_TIMERS, NSPA_LOCAL_FILES, etc.). When the gate is off the stub returns STATUS_NOT_SUPPORTED from its first entry point and the original server path is used unchanged. This is the bisection mechanism when something regresses.
4. Unchanged observable semantics – no drift in STATUS_* returns or IO_STATUS_BLOCK content.

The shape is mechanical enough that a new stub for a new NT surface –
say NtQueryDirectoryFile – would be drop-in: define a private handle
range or reuse one, build a per-process table, decide an eligibility
predicate, write the bypass entry point, gate it behind an env var.
The pattern itself is the reusable element; the per-stub specifics are
surface-dependent.
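The recipe above can be sketched in miniature. Everything below is hypothetical – the gate name NSPA_LOCAL_QUERYDIR, the predicate fields, and the helper names are illustrative stand-ins, not the shipped API – but it shows the three-way contract (succeed / real error / honest refusal) and the env-var gate:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t NTSTATUS;
#define STATUS_SUCCESS       0x00000000u
#define STATUS_NOT_SUPPORTED 0xC00000BBu

/* Env-var gate: when off, the stub refuses everything -- the bisection
 * mechanism when something regresses. (Gate name is hypothetical.) */
static bool stub_active( void )
{
    return getenv( "NSPA_LOCAL_QUERYDIR" ) != NULL;
}

/* Eligibility predicate: the stub's correctness envelope. */
static bool eligible( bool read_only, bool named, bool has_root_dir )
{
    return read_only && !named && !has_root_dir;
}

/* Bypass entry point: success, or honest refusal back to the server path. */
static NTSTATUS try_bypass( bool read_only, bool named, bool has_root_dir,
                            uintptr_t *out_handle )
{
    if (!stub_active() || !eligible( read_only, named, has_root_dir ))
        return STATUS_NOT_SUPPORTED;   /* caller falls through to wineserver */
    *out_handle = 0x7FFFC000u;         /* allocate from the private range/table */
    return STATUS_SUCCESS;
}
```

The only contract the caller relies on is that STATUS_NOT_SUPPORTED means "take the original server path unchanged".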
A reasonable counter-proposal to “ship many NT-local stubs” is “rewrite
wineserver as a multi-threaded handler with per-subsystem locks”. This
is the §3.4 of wineserver-decomposition-plan.md. It is a valid
long-horizon target but the wrong starting move, for three reasons:
Audit surface. Wineserver assumes “nothing else changes during my handler” pervasively. Every handler reads + writes shared state, often mutually entangled (a file open might create a kernel object that is named in the object directory that is held in the handle table that is referenced from the parent process’s handle table). Partitioning the lock means proving every cross-subsystem invariant explicitly. That’s months of audit work for one architectural change, and the change does not deliver any user-visible improvement until the whole partitioning is done.
Incremental migration vs full rewrite. Each NT-local stub
intercepts one specific NT surface, services it from outside the
server, and routes anything it cannot model back to the server. The
interception point is the stub’s eligibility check; the routing path is
the STATUS_NOT_SUPPORTED fallback. Each stub independently delivers a
measurable win on its NT surface and is independently revertible. After
enough surfaces move out, what remains in wineserver is the small set
of handlers that genuinely need cross-process arbitration – the list
in §4 of the decomposition plan: cross-process object naming, process /
thread lifecycle, handle inheritance, and NT-specific path resolution.
Practical wins happen before the lock dissolves. Consider Ableton’s
startup profile: ~28,500 file opens in the first session boot, of which
the vast majority are read-only regular-file opens of DLL manifests,
.pyc files, and theme resources. Every one of those round-trips into
the server today, holds the global lock during sharing arbitration, and
returns. The local-file bypass eliminates the round-trip for the
eligible subset; lock contention drops because the server is not
running the handler at all. Lock partitioning would help the handlers
that are still running. Eliminating the handler entirely is a
strictly bigger win.
The progressive-stubs strategy lets us sequence the work: ship one stub at a time, default-off, and validate each against perf captures.
This is the inverted ordering relative to “monolithic rewrite” and it
matches the de-risking discipline the rest of the project uses
(feedback_validate_before_default_on.md, feedback_ship_default_off.md).
Some NT semantics genuinely cross processes. A FILE_SHARE_NONE open
in process A must reject a subsequent open in process B. A named
synchronization object created in A must be openable by name from B.
A handle inherited from a parent at process create must be reachable
from the child. These cannot be serviced from within one client’s
address space alone – somebody has to arbitrate.
Stubs handle cross-process state in one of two ways:
(a) Process-shared shmem with a PSHARED PI mutex. When the
arbitration data is small, well-bounded, and update-rate-limited, the
stub publishes it into a shared memory region. Wineserver creates the
region (memfd-backed) at first request from a client, and clients
mmap it into their own address space. A pi_mutex_t in the shmem,
initialised with RTPI_MUTEX_PSHARED, serialises writers across
processes; readers use a seqlock and never block. The local-file
bypass uses this exclusively for its inode-aggregation table:
```c
/* server/nspa/local_file.c:113 */
for (b = 0; b < NSPA_INODE_BUCKETS; b++)
    pi_mutex_init( nspa_lock_of( (nspa_inode_bucket_t *)&t->buckets[b] ),
                   RTPI_MUTEX_PSHARED );
```
Writers (server + clients) take the per-bucket PSHARED PI mutex, bump the bucket’s seq odd, mutate the slot, bump the seq even. Readers take no lock – they ACQUIRE-load the seq, copy the slot, ACQUIRE-load the seq again, and retry on mismatch. Bounded retry (8 attempts in the local-file table) followed by silent fall-back to “treat as not found, ask the server” keeps reads RT-safe. The lock holders are the processes that own opens; they boost each other under PI when the server thread holding the same bucket is at low priority.
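The publish/read protocol can be sketched as a minimal single-slot seqlock. This is illustrative only: the shipped table has per-bucket PSHARED PI mutexes serialising writers, multiple subentries per bucket, and different field names. Here a single writer is assumed so the mutex is elided:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct slot { unsigned long dev, ino, share_bits; };

struct bucket {
    atomic_uint seq;    /* odd while a write is in progress */
    struct slot data;
};

static void bucket_publish( struct bucket *b, const struct slot *s )
{
    /* writer: bump seq odd, mutate, bump seq even (writers serialised by
     * the PSHARED PI mutex in the real table) */
    atomic_store_explicit( &b->seq, atomic_load( &b->seq ) + 1, memory_order_release );
    b->data = *s;
    atomic_store_explicit( &b->seq, atomic_load( &b->seq ) + 1, memory_order_release );
}

static bool bucket_read( struct bucket *b, struct slot *out )
{
    for (int retry = 0; retry < 8; retry++)   /* bounded retry: RT-safe */
    {
        unsigned s1 = atomic_load_explicit( &b->seq, memory_order_acquire );
        if (s1 & 1) continue;                 /* write in progress */
        *out = b->data;
        unsigned s2 = atomic_load_explicit( &b->seq, memory_order_acquire );
        if (s1 == s2) return true;            /* consistent snapshot */
    }
    return false;   /* caller treats as "not found, ask the server" */
}
```

The bounded retry plus silent fall-back is what keeps the read path free of any blocking primitive.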
(b) Bypass disabled for the cross-process case. When the
arbitration is too entangled to map onto shmem-with-seqlock, the stub
simply refuses the cross-process path. Anonymous timers are eligible
for the local-timer bypass; named timers (OBJECT_ATTRIBUTES->ObjectName
non-empty) fall through to the server, because their cross-process
visibility lives in the NT object directory which is only accessible
from the server. The local-WM_TIMER stub is even stricter: hwnds owned
by other processes are server-only:
```c
/* dlls/win32u/nspa/local_wm_timer.c:474 */
if (owner_pid != GetCurrentProcessId()) return STATUS_NOT_IMPLEMENTED;
```
The cost of refusal is the lost optimisation on that one call. The benefit is that the stub never has to model cross-process semantics that are honestly server-side. Cross-process correctness stays where it belongs.
Rule of thumb: if the arbitration data is <= 256 KB, idempotent
under retry, and read-mostly, push it through shmem (option a). If
it’s bigger, mutable, or tied to NT object naming, refuse the bypass
(option b). The local-file bypass is currently the only stub using
option (a); the timer stubs both use (b). Future stubs (named pipes,
section objects) will likely require (a) for their refcount tables.
Stubs intercept the creation call. They do not (necessarily) intercept
every downstream API on the resulting handle. NT has dozens of
NtSomethingFile syscalls; instead of stubbing all of them, NSPA’s
stubs cover the high-volume hot path (open + read + close) and lazily
mint a server-recognised handle on demand for the rare downstream
APIs that need it.
Mechanism, using the local-file bypass as the canonical example:
1. NtCreateFile returns a local-range handle (0x7FFFCxxx – 0x7FFFFFFE). The handle is recognised as local by a constant-time range check nspa_local_file_is_local_handle(h).
2. NtReadFile, NtClose, and server_get_unix_fd all check the range and route to the local table – zero RPC.
3. Some less-common API (NtCreateSection, NtDuplicateObject, NtQueryInformationFile for an unsupported info class) genuinely needs a server-side handle. Those entry points call:

```c
/* dlls/ntdll/unix/nspa/local_file.c:1414 */
HANDLE nspa_local_file_get_or_promote_server_handle( HANDLE local_handle );
```
On first call, it issues nspa_create_file_from_unix_fd – an
NSPA-specific RPC that hands the server the unix fd, a path string,
and access bits, and gets back a fresh server handle. The server
handle is cached in the per-process opens table; subsequent calls
on the same local handle return the cached promoted handle without
another RPC.
The principle: stubs accelerate the dominant API set; promotion preserves correctness for the long tail. The long-tail APIs cost exactly one extra RPC (the promotion), once per lifetime of the handle. Compare to the no-bypass case where the entire lifetime is RPCs.
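The promote-once-then-cache shape reduces to a few lines. This is a sketch, not the shipped code: fake_rpc_promote stands in for the nspa_create_file_from_unix_fd round-trip, and the table locking is elided:

```c
#include <assert.h>
#include <stdint.h>

struct local_open {
    int       unix_fd;
    uintptr_t server_handle;   /* 0 until first promotion */
};

static int rpc_calls;   /* counts the simulated round-trips (for the test) */

/* Stand-in for the one-time nspa_create_file_from_unix_fd RPC. */
static uintptr_t fake_rpc_promote( int unix_fd )
{
    rpc_calls++;
    return 0x100 + (uintptr_t)unix_fd;   /* pretend server handle */
}

static uintptr_t get_or_promote( struct local_open *o )
{
    if (!o->server_handle)                         /* first downstream API call */
        o->server_handle = fake_rpc_promote( o->unix_fd );
    return o->server_handle;                       /* cached thereafter */
}
```

Every downstream call after the first reuses the cached handle, so the long tail costs exactly one RPC per handle lifetime.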
The same lazy-mint pattern shows up elsewhere. NTSync direct-sync
(client-side sync object creation, feedback_test_with_pe_binaries.md)
mints client-range NTSync handles that bypass wineserver, then
promotes to a server handle on first cross-process duplication.
Section objects that back local-file handles get promoted via
nspa_create_mapping_from_unix_fd (the same shape as the file
promotion). Each promotion path is a small RPC that wineserver still
serves – but only at the moment of promotion, not on every API
afterwards.
References:
| Stub | Promotion entry point | RPC |
|---|---|---|
| nspa_local_file | nspa_local_file_get_or_promote_server_handle (dlls/ntdll/unix/nspa/local_file.c:1414) | nspa_create_file_from_unix_fd |
| nspa_local_file (sections) | NtCreateSection intercept | nspa_create_mapping_from_unix_fd |
| NTSync direct-sync | client-range handle promote on dup | existing wineserver dup_handle |
| nspa_local_timer | (none – backing event is server-allocated up front) | NtCreateEvent |
| nspa_local_wm_timer | (none – pure shmem dispatch) | n/a |
Notice that the nspa_local_timer stub takes a different design:
rather than returning a private handle and lazily promoting, it
returns a server-allocated NtCreateEvent handle from the start.
The bypass is in the expiry path (zero-RTT NtSetEvent instead of
a server timer-fire), not in the creation path. Same goal – avoid
RPCs on the hot path – different mechanics. Per-stub design choice.
Every NT-local stub takes a lock during table mutation. None of them hold that lock across an RPC, a blocking syscall, or a callback into unrelated code. This is not optional – it is the property that lets the stubs be safe to call from RT-priority client threads and from audio-callback contexts.
The discipline, as written in feedback_never_fifo_busyloops.md and
the wine-nspa-lockup-audit-20260427.md audit:
1. Take the stub's pi_mutex_t, mutate in-memory tables, release. Worst-case hold time is dozens of nanoseconds.
2. Blocking syscalls stay outside: open(), stat(), read(), mmap() all happen outside the per-stub lock. The stub sequences them: stat first (no lock), check + publish under lock, open second (no lock), insert under lock.
3. No RPC under the lock: a wine_server_call would defeat the entire point of the stub – it's the call we're trying to eliminate. RPCs happen before the lock is taken (e.g. table mmap-fetch via nspa_get_inode_table at first-use) or after it's released (e.g. lazy server-handle promotion).
4. No wakeups under the lock: the fire paths call NtSetEvent and NtQueueApcThread – both of which can wake threads that try to re-enter the stub. The fire path drops the lock first, fires, re-acquires.

The local-timer dispatcher's fire loop is a clean illustration:
```c
/* dlls/ntdll/unix/nspa/local_timer.c:381-432 */
if (!list_empty( &fire_batch ))
{
    pi_mutex_unlock( &timer_lock );
    LIST_FOR_EACH_ENTRY_SAFE( t, next, &fire_batch, ... )
    {
        list_remove( &t->queue_entry );
        if (!t->cancelled) fire_timer( t );  /* NtSetEvent + APC */
        pi_mutex_lock( &timer_lock );
        /* re-arm under lock, then drop */
        ...
        pi_mutex_unlock( &timer_lock );
    }
    pi_mutex_lock( &timer_lock );
    continue;
}
```
The lock is dropped before any NtSetEvent – which can wake a higher-
priority thread that immediately calls NtSetTimer or NtCancelTimer,
both of which take timer_lock. If the dispatcher held timer_lock
through the fire, the woken thread would block on the dispatcher’s
release – a textbook unbounded priority inversion mediated by PI on
the lock, which is survivable but expensive. Dropping the lock
makes the inversion impossible at the cost of a refcount on the
in-flight entry (t->refcount++ before drop, --t->refcount after
re-acquire) so a concurrent close can’t free the entry mid-fire.
This pattern – “drop lock, do the dangerous thing, re-acquire” –
shows up in every stub. It is the same discipline feedback_no_blocking_under_lock.md
calls out for the wineserver itself; the stubs apply it locally.
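Stripped to its essentials, the pattern looks like this. A pthread mutex stands in for pi_mutex_t, the batch collection and re-arm logic are elided, and fired is a stand-in for the real NtSetEvent + APC side effects:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct timer_entry { int refcount; bool cancelled; int fired; };

static pthread_mutex_t timer_lock = PTHREAD_MUTEX_INITIALIZER;

static void fire_one( struct timer_entry *t )
{
    pthread_mutex_lock( &timer_lock );
    t->refcount++;                    /* pin: a concurrent close can't free us */
    pthread_mutex_unlock( &timer_lock );

    /* "the dangerous thing", done unlocked -- NtSetEvent + NtQueueApcThread
     * in the real stub; anything it wakes may take timer_lock freely */
    if (!t->cancelled) t->fired++;

    pthread_mutex_lock( &timer_lock );
    t->refcount--;                    /* unpin; re-arm would happen here */
    pthread_mutex_unlock( &timer_lock );
}
```

The refcount is the price of the dropped lock: it is what keeps the entry alive across the unlocked window.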
The 2026-04-27 audit (see wine-nspa-lockup-audit-20260427.md)
explicitly verified this property for every NT-local stub:
Out of scope: ntsync.c (validated), client-side bucket_lock discipline (other-process behaviour), Ableton-internal bugs, performance. […] F1-F12: every NT-local stub holds its lock for in-memory mutation only. No syscalls, no RPCs, no callbacks under any stub’s lock. Refcount patterns ensure entries survive lock drops in fire / promote paths.
The audit was a precondition for re-arming the lockup repro. With the discipline confirmed, the path forward is more stubs – not different locking.
nspa_local_file – NtCreateFile bypass

Surface: NtCreateFile for read-only regular-file opens, plus
downstream NtReadFile / NtClose / NtQueryInformationFile /
NtCreateSection / NtDuplicateObject routing through promotion.
Files:
| Path | Role |
|---|---|
| dlls/ntdll/unix/nspa/local_file.c (1630 lines) | client-side stub: handle range, table, lookup, bypass entry, promotion |
| server/nspa/local_file.c (300 lines) | server-side cross-process inode-aggregation shmem |
| include/wine/server_protocol.h | nspa_get_inode_table, nspa_create_file_from_unix_fd requests |
Entry point: nspa_local_file_try_bypass at local_file.c:1238.
Called from dlls/ntdll/unix/file.c:4717 inside NtCreateFile,
right after path resolution.
Eligibility predicate: FILE_OPEN or FILE_OPEN_IF-on-existing,
access is read-only after GENERIC_* expansion, options does
not include FILE_DIRECTORY_FILE / FILE_DELETE_ON_CLOSE / open-by-id,
no attr->RootDirectory, no security descriptor, target is S_ISREG.
Encoded in the categoriser at local_file.c:281-345. Anything else
returns STATUS_NOT_SUPPORTED and falls through.
Private handle range: [0x7FFFC000, 0x80000000) (16 KiB window,
4096 handles), excluding 0x7FFFFFFF for CURRENT_PROCESS. Range
check via nspa_local_file_is_local_handle(h) is constant-time.
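A sketch of that range check, using the window and exclusion stated above (the shipped nspa_local_file_is_local_handle may differ in detail):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local-handle window [0x7FFFC000, 0x80000000), excluding the
 * 0x7FFFFFFF CURRENT_PROCESS pseudo-handle. Constant time, no table walk. */
static bool local_file_is_local_handle( uintptr_t h )
{
    return h >= 0x7FFFC000u && h < 0x80000000u && h != 0x7FFFFFFFu;
}
```

Because the check is a pair of compares, every downstream entry point (NtReadFile, NtClose, server_get_unix_fd) can afford to run it unconditionally.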
Per-process table: linked list of struct nspa_local_open under
a single pi_mutex_t nspa_lf_opens_mutex. Each entry caches:
- the local handle returned to the app
- the unix fd
- (device, inode) for shmem cross-process arbitration
- access / sharing bits
- the original NT path string for GetFinalPathNameByHandle
- the lazy-promoted server handle (0 until first promote)
Cross-process arbitration: server-allocated memfd-backed shmem
region of nspa_inode_bucket_t buckets. Each bucket has a PSHARED
PI mutex, a seqlock, and N subentries (one per process holding an
open of any inode that hashes to this bucket). Server publishes its
own subentry under index 0; clients publish into 1..N-1. Hash:
device * 0x9E3779B97F4A7C15 + inode, mixed twice – byte-identical
between client and server (local_file.c:180 mirrors
server/nspa/local_file.c:135).
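The hash has the shape below. The golden-ratio multiplier and the "mixed twice" step come from the description above; the mix function itself is an illustrative finaliser (splitmix/murmur-style), not the shipped local_file.c code, and the bucket count is assumed. The real requirement is only that client and server compile a byte-identical function:

```c
#include <assert.h>
#include <stdint.h>

#define NSPA_INODE_BUCKETS 256   /* illustrative bucket count */

/* Illustrative 64-bit finaliser (murmur3-style), standing in for the
 * shipped mix step. */
static uint64_t mix64( uint64_t x )
{
    x ^= x >> 33;
    x *= 0xFF51AFD7ED558CCDull;
    x ^= x >> 33;
    return x;
}

static unsigned inode_bucket( uint64_t device, uint64_t inode )
{
    uint64_t h = device * 0x9E3779B97F4A7C15ull + inode;
    h = mix64( mix64( h ) );               /* "mixed twice" */
    return (unsigned)(h % NSPA_INODE_BUCKETS);
}
```

Any drift between the two copies would silently scatter the same inode across different buckets, which is why the doc calls out the byte-identical mirroring.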
Lock discipline: nspa_lf_opens_mutex covers in-memory list
operations only; stat, open, mmap happen outside. The PSHARED
bucket mutex is taken cross-process for the seqlock-write, dropped
before any further work. Reads are seqlock-bounded with 8-retry cap.
Lazy promotion: nspa_local_file_get_or_promote_server_handle
mints a server handle via nspa_create_file_from_unix_fd RPC and
caches it. Long tail of NT-API surface (info classes, section creation,
duplication) routes through the promoted handle.
Feature gate: NSPA_LOCAL_FILES=1 (default-off as of 2026-04-27,
pending production validation; the table machinery is always built).
Dedicated reference: nspa-local-file-architecture.gen.html –
the per-stub deep dive. Read it for the full design rationale,
sharing-arbitration semantics, eligibility matrix, and Phase history.
nspa_local_timer – NtSetTimer fast path

Surface: NtCreateTimer (anonymous only), NtSetTimer,
NtCancelTimer, NtQueryTimer.
Files:
| Path | Role |
|---|---|
| dlls/ntdll/unix/nspa/local_timer.c (713 lines) | the entire stub: dispatcher, table, fire path |
Entry points (all in dlls/ntdll/unix/sync.c, all check
nspa_local_timer_* first and fall through on STATUS_NOT_IMPLEMENTED):
| sync.c line | Function | Stub call |
|---|---|---|
| 2655 | NtCreateTimer | nspa_promote_if_local (no-op for timers) |
| 2709 | NtCreateTimer | nspa_local_timer_create |
| 2768 | NtSetTimer | nspa_local_timer_set |
| 2803 | NtCancelTimer | nspa_local_timer_cancel |
| 2837 | NtQueryTimer | nspa_local_timer_query |
Design twist: unlike the local-file bypass, the timer stub does
not return a private handle. The handle returned to the app is a
server-allocated NtCreateEvent handle (local_timer.c:485-489):
```c
event_type = (timer_type == NotificationTimer) ? NotificationEvent : SynchronizationEvent;
if ((ret = NtCreateEvent( &event, access, NULL, event_type, FALSE )))
    return ret;
```
The bypass is in the firing path, not the creation path. A
dispatcher pthread (SCHED_FIFO at NSPA_RT_PRIO - 1) sleeps on a
pi_cond_t with CLOCK_MONOTONIC absolute deadlines. On expiry it
issues NtSetEvent against the backing event handle. Because the
backing handle is an NTSync event with direct-sync enabled (the
client-range fast-path), NtSetEvent is a futex-only operation –
no wineserver RPC. Net result: timer expiry costs a pi_cond_timedwait
wake-up + an NtSetEvent futex op + optional NtQueueApcThread RPC.
Server’s old timer_callback path is bypassed entirely.
Eligibility: anonymous only (!attr->ObjectName). Named timers
fall through to the server because their cross-process visibility
lives in the NT object directory.
Clock semantics: internal deadlines are CLOCK_MONOTONIC
absolute nanoseconds. NT relative when (negative LARGE_INTEGER)
maps cleanly – elapsed time is what monotonic measures. NT absolute
when (positive FILETIME) is converted at insert time via the
current CLOCK_REALTIME / CLOCK_MONOTONIC offset; an NTP step
between insert and fire is absorbed at the cost of the step size.
For audio/RT workloads using only relative timers, this is the
correct trade. (See local_timer.c:34-48 for the design comment.)
Lock discipline: pi_mutex_t timer_lock covers the table and
the deadline queue. fire_timer (which calls NtSetEvent and
NtQueueApcThread) runs outside the lock with a refcount on the
entry. The dispatcher loop at local_timer.c:346-437 is the
canonical drop-fire-reacquire pattern.
Feature gate: NSPA_DISABLE_LOCAL_TIMERS (the bypass is enabled by default; setting the variable disables it).
nspa_local_wm_timer – WM_TIMER dispatcher

Surface: NtUserSetTimer, NtUserSetSystemTimer, NtUserKillTimer,
NtUserKillSystemTimer, plus WM_TIMER / WM_SYSTIMER posting into
the message queue.
Files:
| Path | Role |
|---|---|
| dlls/win32u/nspa/local_wm_timer.c (638 lines) | the entire stub: dispatcher, table, ring publish |
| dlls/win32u/message.c:4694, 4736, 4770, 4791 | call-sites in NtUserSetTimer etc. |
Entry points (all check nspa_local_wm_timer_* first and fall
through on STATUS_NOT_IMPLEMENTED):
| message.c line | Function | Stub call |
|---|---|---|
| 4694 | NtUserSetTimer | nspa_local_wm_timer_set(WM_TIMER) |
| 4736 | NtUserSetSystemTimer | nspa_local_wm_timer_set(WM_SYSTIMER) |
| 4770 | NtUserKillTimer | nspa_local_wm_timer_kill(WM_TIMER) |
| 4791 | NtUserKillSystemTimer | nspa_local_wm_timer_kill(WM_SYSTIMER) |
Design twist: WM_TIMERs are posted into the owner thread’s
message queue, not the calling thread’s. The stub resolves
hwnd -> owner_tid at SetTimer time (which is when the caller has
a wineserver session and can do the resolution), caches a pointer
into the peer’s nspa_queue_bypass_shm_t, and the dispatcher
thread later writes WM_TIMER ring slots into that peer shmem
directly:
```c
/* dlls/win32u/nspa/local_wm_timer.c:478-484 */
if (owner_tid == GetCurrentThreadId())
    peer_shm = nspa_get_own_bypass_shm_public();
else
    peer_shm = nspa_get_peer_bypass_shm_public( owner_tid );
if (!peer_shm) return STATUS_NOT_IMPLEMENTED;
```
The dispatcher pthread is a pure shmem-writer – it never enters
wineserver. peek_message on the consumer side drains the ring
client-side via nspa_try_pop_own_timer_ring. End-to-end, a
WM_TIMER expiry is a pi_cond_timedwait wake + a ring slot store +
a consumer ring-pop. Server’s old pending_timers / expired_timers
dispatch is bypassed entirely.
Cross-process refusal: hwnds owned by other processes are
explicitly rejected (local_wm_timer.c:474). Cross-process WM_TIMERs
go through the server, where the server can do the cross-process
post correctly. The bypass declines cases it can’t handle as
correctly as the server – option (b) of §4.
Coalescing semantics: NT’s WM_TIMER has implicit coalescing –
if the message pump stalls across N periods, the app sees one
WM_TIMER. The stub replicates this by tagging entries with in_ring
on publish; the dispatcher only re-arms once the consumer drains
the slot (local_wm_timer.c:281-309). Server’s restart_timer
discipline mapped to a slot-state check.
Lock discipline: pi_mutex_t wm_timer_lock covers the table
and wheel. Ring publishes use atomic state machine on the slot
without taking the lock – the slot’s state byte is the
synchronisation point with the consumer. The dispatcher does not
issue any wineserver RPC.
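The slot handoff can be sketched as a two-state atomic byte. This is illustrative – the shipped ring has more states and payload fields – but the acquire/release pairing on the state byte is the point, and the "publish fails while the slot is still occupied" case is exactly the WM_TIMER coalescing behaviour described above:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

enum slot_state { SLOT_FREE, SLOT_PUBLISHED };

struct ring_slot {
    _Atomic uint8_t state;   /* the only synchronisation point */
    uintptr_t hwnd, id;      /* WM_TIMER payload */
};

/* Dispatcher side: publish without taking any lock. */
static bool slot_publish( struct ring_slot *s, uintptr_t hwnd, uintptr_t id )
{
    if (atomic_load_explicit( &s->state, memory_order_acquire ) != SLOT_FREE)
        return false;        /* still in ring: coalesce, don't re-arm yet */
    s->hwnd = hwnd;
    s->id   = id;
    atomic_store_explicit( &s->state, SLOT_PUBLISHED, memory_order_release );
    return true;
}

/* Consumer side (peek_message draining its own ring). */
static bool slot_pop( struct ring_slot *s, uintptr_t *hwnd, uintptr_t *id )
{
    if (atomic_load_explicit( &s->state, memory_order_acquire ) != SLOT_PUBLISHED)
        return false;
    *hwnd = s->hwnd;
    *id   = s->id;
    atomic_store_explicit( &s->state, SLOT_FREE, memory_order_release );
    return true;
}
```

The release-store after writing the payload is what makes the payload visible to the acquire-load on the consumer side without any mutex.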
Feature gate: NSPA_DISABLE_LOCAL_WM_TIMERS (the bypass is enabled by default; setting the variable disables it).
2026-04-30 follow-up: 78947c1
tightened the eligibility predicate so TIMERPROC and cross-thread
SetTimer cases now refuse the stub and defer to the server. That is
the correct NT-local-stub shape: keep the cheap owner-thread path local,
and route anything with ambiguous ownership or callback semantics back
to the authoritative server path.
The pattern is generalisable. A handful of high-volume NT surfaces remain in the server today; each is a candidate for a future stub.
NtQueryDirectoryFile. Directory enumerations dominate certain
workloads (file dialogs, plugin scanners, installers). A stub backed
by a per-process directory-handle table and getdents64 syscalls,
returning a private handle range, is a natural extension of the
local-file bypass. Eligibility predicate: read-only directory open,
no notify, single-process consumer. Cross-process visibility is not
a concern – directory contents are filesystem state, the kernel is
the source of truth, no wineserver mediation is needed once the
handle is in our table.
NtFsControlFile for named-pipe and registry ops. Many FSCTLs
(query mountpoint, query partition, registry queries) are read-only
and don’t need server arbitration. Per-FSCTL stubbing is feasible.
Section objects. Already partially refcounted (see
project_kernel_refobj_future_uses.md). A future stub could absorb
read-only section creation backed by a local-file unix fd, in the
same shape as nspa_local_file_get_or_promote_server_handle does
today for sections opened from a local file. The mapping table
would live alongside the file table or in its own region.
More message-queue carve-outs. The nspa_msg_ring v2 work
(see MEMORY.md indices) is a stub-shaped bypass for message
posting and peeking. Phase C “get_message bypass” continues this
direction: the bucketing diag shipped (project_msg_ring_v2_phase_c_stage1_validated)
identified that server-generated WM_PAINT / hardware / winevent
messages dominate the Ableton workload’s get_message traffic. A
stub for those classes would push the consumer side fully off the
server’s find_msg_to_get path.
Timer further work. The NtSetTimerEx variant with completion
ports is currently always server-routed. A completion-port-aware
local-timer stub would extend nspa_local_timer to handle that
case, at the cost of building a per-process completion-port-mux.
The roadmap section of wineserver-decomposition-plan.md has the
broader timeline; this list is the NT-local-stub-shaped subset.
Each NT-local stub shrinks the wineserver footprint along one NT surface. As the surfaces add up, the decomposition plan changes:
Today (2026-04-27): wineserver runs channel_dispatcher 6-11%
of CPU under load, get_ptid_entry 1-10%, main_loop_epoll 2-7%.
The hot path is the channel-RECV / handler / channel-REPLY loop
running under global_lock. Stubs are absorbing the
highest-frequency NT calls (file opens, anonymous timers, owner-
process WM_TIMERs); the channel traffic is shrinking in those
classes.
Mid-term: as more stubs ship – directory enumeration, named-pipe FSCTLs, section objects – the channel traffic that remains in the server is exactly the cross-process arbitration set of §4 of the decomposition plan. At that point the §3.2 “router/handler split” of the decomposition plan becomes natural: the router’s “fast-path” is just “pass to a stub if a stub claims the call”. Most of the codebase is already structured this way – the stub call is the first thing the NT entry point does, so the “router” is the current control flow. Decomposition becomes folding existing handlers into nt-local stubs, not rewriting wineserver.
End-state: wineserver is a metadata service. Cross-process object naming, process / thread lifecycle, handle inheritance, NT-specific path resolution – the §4 list. At that point the §3.4 “lock partitioning” question is not “should we” but “is it worth the audit”. If the metadata service is doing < 0.5% CPU end-to-end under audio + UI load, the answer is probably no – and the lock stays a single global pi_mutex_t because there’s nothing left to contend on.
The migration is complete when wineserver is small enough that it is no longer a meaningful hot-path component.
| Reference | Purpose |
|---|---|
| nspa-local-file-architecture.gen.html | Per-stub deep dive on the local-file bypass |
| dlls/ntdll/unix/nspa/local_file.c:1238 | nspa_local_file_try_bypass entry |
| dlls/ntdll/unix/nspa/local_file.c:1414 | nspa_local_file_get_or_promote_server_handle (lazy promotion) |
| dlls/ntdll/unix/nspa/local_file.c:1207 | nspa_local_file_is_local_handle (range check) |
| server/nspa/local_file.c:113 | PSHARED PI mutex init for inode-table buckets |
| server/nspa/local_file.c:181 | nspa_inode_publish_slot seqlock-write protocol |
| dlls/ntdll/unix/nspa/local_timer.c:346 | dispatcher_main (drop-fire-reacquire) |
| dlls/ntdll/unix/nspa/local_timer.c:465 | nspa_local_timer_create (anonymous-only check) |
| dlls/ntdll/unix/sync.c:2709 | NtCreateTimer call-site for local-timer stub |
| dlls/win32u/nspa/local_wm_timer.c:227 | publish_timer_slot (peer-shmem ring write) |
| dlls/win32u/nspa/local_wm_timer.c:457 | nspa_local_wm_timer_set (cross-process refusal) |
| dlls/win32u/message.c:4694 | NtUserSetTimer call-site for WM_TIMER stub |
| wine/nspa/docs/wineserver-decomposition-plan.md | Long-arc decomposition plan (NT-local stubs are §3.x) |
| wine/nspa/docs/wine-nspa-lockup-audit-20260427.md | Lock-discipline audit confirming every stub holds locks briefly |
| wine/nspa/docs/local-file-bypass-design.md | Original design doc for the local-file bypass |
| MEMORY.md: project_msg_ring_v2_phase_c_stage1_validated | Phase C msg-ring bucketing diag (basis for future message-queue stubs) |