Wine 11.6 + NSPA RT patchset | Kernel 6.19.x-rt with NTSync PI | 2026-04-23 Author: jordan Johnston
Wine-NSPA’s local-file bypass (NSPA_LOCAL_FILES=1) services read-only regular-file NtCreateFile calls entirely within the client process. Every eligible open would otherwise cost a full wineserver round-trip: the client builds a create_file request, the server allocates a struct file + inode tracking + handle entry and returns a server-visible handle, and then every subsequent NtReadFile / NtQueryInformationFile / etc. fires another round-trip. For an app like Ableton Live 12 Lite that does roughly 28,500 file opens in a single startup session – DLL manifests, .pyc files, theme resources, Live Library indexes – those round-trips dominate the startup profile and show up as real latency on the main thread.
The bypass routes eligible opens to a client-private handle range, maintains a per-process table that owns the unix fd, and exposes the unix fd to every Wine I/O path via a thin fast-path check inside server_get_unix_fd. When an API needs server-side state (section mapping, query-by-handle, inheritance), the bypass lazily promotes the local handle to a server-recognised handle on demand.
The feature is invisible to Win32 applications: same CreateFile semantics, same sharing arbitration, same io->Information = FILE_OPENED return value, same behaviour on every downstream API. Apps see identical functional behaviour whether the bypass is enabled or not – the difference is measurable only in profiler output and perceived startup latency.
Ableton’s startup profile exposed a large population of short-lived file opens:
| Pattern | Example |
|---|---|
| DLL manifest lookups | C:\windows\winsxs\manifests\amd64_microsoft.windows.common-controls_*.manifest |
| Python bytecode loads | .../Resources/Python/abl.live/**/*.pyc |
| Theme resources | C:\windows\resources\themes\aero\aero.msstyles |
| Clock source probes | /sys/bus/clocksource/devices/clocksource0/current_clocksource |
| Ableton library indexes | C:\users\ninez\AppData\Local\Ableton\Live Database\Live-files-*.db |
| Live Packs | C:\ProgramData\Ableton\Live 12 Lite\Resources\Graphics.alp |
Each open is cheap on its own (a few µs), but in aggregate they add up to hundreds of milliseconds of server traffic during startup – and startup runs on the main thread, which is where paint and UI dispatch live. Eliminating the server round-trip on these opens directly reduces time-to-first-paint and reduces steady-state priority-inversion risk on the RT audio path (the server’s single-threaded main loop services all requests).
Other candidate workloads with similar profiles: plugin scanners (hundreds of VST probe opens), .NET apps (thousands of assembly-manifest reads at JIT time), installers (cache-file probes), and any Windows application using Python or Lua as an embedded runtime.
- Local handles come from a client-private range [0x7FFFC000, 0x80000000) that is disjoint from the server’s normal handle allocation (low-to-mid) and from the NTSync client-handle range. Any caller that does nspa_local_file_is_local_handle(h) can cheaply tell whether a handle is ours.
- Sharing arbitration stays cross-process: if another open of the same file used FILE_SHARE_NONE, we must honour that. A server-published shmem region carries (dev, inode) -> (aggregate-access, aggregate-sharing, refcount) so client-side arbitration matches what server-side check_sharing would enforce.
- When an API needs server-side state (NtQueryInformationFile, NtDuplicateObject, NtCreateSection, …), the bypass does a single nspa_create_file_from_unix_fd RPC that hands the unix fd to the server and gets back a real server handle. Subsequent calls on the same local handle reuse the cached promoted handle.
- The hot path is stat() + linked-list-walk-under-lock + open() + list insert – no syscall other than the two that are inherent to the work. No lazy-init on the hot path (see Phase 1A.9 init-fix).
- Anything the bypass cannot service returns STATUS_NOT_SUPPORTED and the caller falls through to the normal server_create_file path. Anything the bypass doesn’t handle is handled by vanilla Wine unchanged.
- The envelope is narrow: only FILE_OPEN / FILE_OPEN_IF-on-existing-file dispositions, only read-only access masks, only regular files, only synchronous (FILE_SYNCHRONOUS_IO_*) opens. Anything outside the envelope goes to the server.

Local handles are allocated from the fixed range [NSPA_LF_HANDLE_BASE, 0x80000000) where NSPA_LF_HANDLE_BASE = 0x80000000 - NSPA_LF_HANDLE_CAP*4 with NSPA_LF_HANDLE_CAP = 4096. That gives an exact 16 KiB (0x4000-value) handle window, i.e. [0x7FFFC000, 0x80000000), disjoint from:

- the server’s normal handle allocation (starting at 0x4, growing up)
- the pseudo-handles (~0..~5)
- the NTSync client-handle range (see INPROC_SYNC_CACHE_TOTAL)

nspa_local_file_is_local_handle(h) is a constant-time range check: base <= h < 0x80000000 && h != 0x7FFFFFFF (the last exclusion is for the CURRENT_PROCESS pseudo-handle which would otherwise land inside the range). The check is called from every NT-API intercept site to decide whether to take the bypass path or fall through.
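A minimal sketch of the range arithmetic and the membership test, assuming handles are minted 4 apart from the base (slot i maps to base + 4*i). The helpers lf_slot_to_handle, lf_handle_to_slot and lf_is_local_handle are hypothetical names, and uintptr_t stands in for Wine’s HANDLE; only the constants and the range rule come from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Constants as stated in the text. */
#define NSPA_LF_HANDLE_CAP  4096u
#define NSPA_LF_HANDLE_END  0x80000000u
#define NSPA_LF_HANDLE_BASE (NSPA_LF_HANDLE_END - NSPA_LF_HANDLE_CAP * 4u)

/* Hypothetical: table slot -> handle value, 4-aligned like server handles. */
static uintptr_t lf_slot_to_handle(unsigned int slot)
{
    assert(slot < NSPA_LF_HANDLE_CAP);
    return (uintptr_t)NSPA_LF_HANDLE_BASE + (uintptr_t)slot * 4u;
}

/* Hypothetical inverse, used when an intercept needs the table slot. */
static unsigned int lf_handle_to_slot(uintptr_t h)
{
    assert(h >= NSPA_LF_HANDLE_BASE && h < NSPA_LF_HANDLE_END);
    return (unsigned int)((h - NSPA_LF_HANDLE_BASE) / 4u);
}

/* Constant-time range check following the rule in the text. */
static bool lf_is_local_handle(uintptr_t h)
{
    return h >= NSPA_LF_HANDLE_BASE && h < NSPA_LF_HANDLE_END && h != 0x7FFFFFFFu;
}
```

Because minted handles are always 4-aligned, the allocator itself can never produce 0x7FFFFFFF; the explicit exclusion only guards values that arrive from elsewhere.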
```c
/* struct list comes from Wine's include/wine/list.h */
struct nspa_local_open {
    struct list         entry;          /* link in the process-wide LF table */
    HANDLE              handle;         /* local-range handle returned to app */
    HANDLE              server_handle;  /* lazy-promoted; 0 until first promote */
    int                 unix_fd;        /* owned fd; closed on NtClose */
    unsigned long long  device;         /* (device, inode) keys the shared inode table */
    unsigned long long  inode;
    unsigned int        access;
    unsigned int        sharing;
    unsigned int        options;        /* FILE_OPEN options: SYNC_IO_NONALERT, etc */
    unsigned int        attributes;     /* OBJ_INHERIT forwarded on promote */
    WCHAR              *nt_name;        /* original NT path for GetFinalPathNameByHandle */
    USHORT              nt_name_len;
};
```
Protected by a single process-wide PI mutex (nspa_lf_opens_mutex). Linear list – walk is O(N) per lookup. For Ableton’s typical workload the list reaches a few hundred entries at peak; the walk is in the noise next to a server RTT it avoids.
Table add (on mint) and remove (on close) are the only writers. Every other operation (lookup, promote lookup) is a read under the same lock. The lock is a PI mutex because RT-priority threads occasionally open files at init and we cannot have a low-priority thread holding the lock against the audio callback.
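The table discipline described above (writers on mint and close, reads for lookup, everything under one priority-inheriting lock) can be sketched with the POSIX PTHREAD_PRIO_INHERIT protocol. This is an illustrative assumption: NSPA’s own lock may be its pi_mutex_t primitive rather than a pthread mutex, and the sketch uses a singly linked list and a trimmed, hypothetical entry type (lf_open, lf_table_*) instead of Wine’s struct list and struct nspa_local_open.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

struct lf_open {                      /* trimmed stand-in for nspa_local_open */
    struct lf_open *next;
    uintptr_t handle;
    int unix_fd;
};

static pthread_mutex_t lf_opens_mutex;
static struct lf_open *lf_opens_head;

/* Initialise the table lock with the priority-inheritance protocol. */
static int lf_table_init(void)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc) return rc;
    rc = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    if (!rc) rc = pthread_mutex_init(&lf_opens_mutex, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Writer (mint path): push the new entry at the head. */
static void lf_table_add(struct lf_open *e)
{
    assert(e != NULL);
    pthread_mutex_lock(&lf_opens_mutex);
    e->next = lf_opens_head;
    lf_opens_head = e;
    pthread_mutex_unlock(&lf_opens_mutex);
}

/* Reader: O(N) walk under the same lock; returns the fd or -1. A real table
 * would also have to keep the fd alive against a concurrent close. */
static int lf_table_lookup_fd(uintptr_t handle)
{
    int fd = -1;
    pthread_mutex_lock(&lf_opens_mutex);
    for (struct lf_open *e = lf_opens_head; e; e = e->next)
        if (e->handle == handle) { fd = e->unix_fd; break; }
    pthread_mutex_unlock(&lf_opens_mutex);
    return fd;
}

/* Writer (close path): unlink the entry; returns 1 if it was found. */
static int lf_table_remove(uintptr_t handle)
{
    int found = 0;
    pthread_mutex_lock(&lf_opens_mutex);
    for (struct lf_open **pp = &lf_opens_head; *pp; pp = &(*pp)->next) {
        if ((*pp)->handle == handle) {
            *pp = (*pp)->next;
            found = 1;
            break;
        }
    }
    pthread_mutex_unlock(&lf_opens_mutex);
    return found;
}
```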
Windows file sharing (FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE) has cross-process semantics: if Process A opens foo with sharing=0, Process B’s open of foo must fail with STATUS_SHARING_VIOLATION. Any pure-local bypass has to see what other processes have done on the same (device, inode).
The wineserver publishes a hash table of NSPA_INODE_BUCKETS = 1024 buckets as a memfd-backed shmem region. Each bucket has 4 slots of (dev, inode, agg_access, agg_sharing, refcount) plus a per-bucket PI mutex. Clients map the region read-only for arbitration lookups and read-write on the mutex word for publishing their own opens.
```
┌───────────────────────────────────────────────┐
│ nspa_inode_table_shm_t (~160 KB)              │
├───────────────────────────────────────────────┤
│ buckets[0..1023]                              │
│   ├─ lock_storage (pi_mutex_t, 64 B)          │
│   ├─ slot[0] (dev, ino, access, share, ref)   │
│   ├─ slot[1]                                  │
│   ├─ slot[2]                                  │
│   └─ slot[3]                                  │
└───────────────────────────────────────────────┘
```
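A minimal sketch of how a memfd-backed table of this shape can be created and mapped on Linux with memfd_create, ftruncate and mmap. The struct layout and field widths are guesses, so sizeof will not necessarily match the ~160 KB quoted above; the lock is an opaque 64-byte placeholder, and how the fd reaches clients (presumably over the wineserver socket) is not shown.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#define NSPA_INODE_BUCKETS 1024
#define NSPA_INODE_SLOTS   4

/* Guessed layout: field widths are assumptions, not the real NSPA definition. */
struct inode_slot {
    uint64_t dev, ino;
    uint32_t agg_access, agg_sharing, refcount;
};
struct inode_bucket {
    unsigned char lock_storage[64];          /* placeholder for the pi_mutex_t */
    struct inode_slot slot[NSPA_INODE_SLOTS];
};
struct inode_table_shm {
    struct inode_bucket buckets[NSPA_INODE_BUCKETS];
};

/* Server side: anonymous memfd sized to hold the table (contents start zeroed). */
static int inode_table_create(void)
{
    int fd = memfd_create("nspa-inode-table", MFD_CLOEXEC);
    if (fd < 0) return -1;
    if (ftruncate(fd, (off_t)sizeof(struct inode_table_shm)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Map the table; MAP_SHARED so every mapping sees the same pages. */
static void *inode_table_map(int fd, int prot)
{
    void *p = mmap(NULL, sizeof(struct inode_table_shm), prot, MAP_SHARED, fd, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

A read-only mapping (PROT_READ) is enough for arbitration lookups; writes through a PROT_READ | PROT_WRITE mapping of the same fd become visible to it immediately.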
Bucket index = hash(dev, inode) mod 1024. Slot selection is linear within the bucket (first free or matching). If all 4 slots are full and none match, the bypass returns STATUS_NOT_SUPPORTED and the open falls back to the server – this is an overflow-safety valve, not a correctness path.
nspa_local_file_check_and_publish_open atomically checks the existing aggregate against the new open’s access/sharing mask, returning STATUS_SHARING_VIOLATION if they conflict. Matching the server’s algorithm exactly:
1. Aggregate the existing opens: OR their access into agg_access and intersect their sharing with agg_sharing.
2. The new open is compatible iff (agg_access & ~my_sharing) == 0 AND (my_access & ~agg_sharing) == 0.

This lives in nspa_local_file_check_sharing_algorithm(). The server-side publish hooks (nspa_inode_publish_slot) mirror the same rule from the server side whenever a non-bypass open creates or clears an inode entry. Arbitration therefore sees the union of bypass and non-bypass opens.
The LF table returns a local-range handle to the application. Most NT-API intercepts can service the call from the local unix fd directly (NtReadFile, NtWriteFile, NtQueryInformationFile for FileBasicInformation / FilePositionInformation / etc.). Some APIs, however, require a server-visible handle:
- NtCreateSection – section object lives on the server
- NtDuplicateObject – dup goes through SERVER_START_REQ(dup_handle)
- NtQueryInformationFile for classes the server handles (e.g. FileNameInformation)
- NtQueryObject – ObjectName / ObjectBasic / ObjectType all server-side
- NtQuerySecurityObject, NtSetSecurityObject
- NtMakePermanentObject, NtMakeTemporaryObject
- NtCompareObjects

For these, the bypass lazily promotes the local handle: on first call needing server state, it issues a single nspa_create_file_from_unix_fd RPC:
1. wine_server_send_fd(unix_fd) – SCM_RIGHTS transfers a dup of the fd to the server
2. The server creates a struct fd + struct file_obj and stores the NT path
3. The server calls alloc_handle and returns a normal server-range handle
4. The client caches that handle in the entry’s server_handle field

Subsequent calls on the same local handle reuse the cached server handle – no second RPC. nspa_promote_if_local(h) is the one-line helper that every intercept site calls:
```c
HANDLE nspa_promote_if_local( HANDLE h );
// Returns h unchanged if not local-range.
// Returns the promoted server handle (cached if already promoted) if local-range.
// Returns h unchanged if promotion failed (caller falls back to server path).
```
This is Phase 1A.4.a lazy-promotion. The alternative – eagerly promoting at mint time – was rejected because most file opens in Ableton’s workload never touch a server-requiring API; they read, maybe query a position, and close. Eager promotion would cost an RPC per open; lazy promotion costs an RPC per distinct file that escapes the read-only happy path.
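The lazy-promotion contract (pass-through for non-local handles, one RPC on first server-needing use, cached handle afterwards, original handle on failure) can be sketched like this. fake_create_file_from_unix_fd is a hypothetical stand-in for the nspa_create_file_from_unix_fd round-trip, the table is a flat array indexed by slot, and locking is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uintptr_t handle_t;                 /* stand-in for Wine's HANDLE */

#define LF_BASE 0x7FFFC000u
#define LF_END  0x80000000u
#define LF_CAP  4096u

struct lf_entry {
    bool     in_use;
    int      unix_fd;
    handle_t server_handle;                 /* 0 until first promotion */
};

static struct lf_entry lf_table[LF_CAP];    /* slot i <-> handle LF_BASE + 4*i */
static unsigned int promote_rpcs;           /* number of simulated round-trips */
static bool fail_next_promote;              /* lets a caller simulate RPC failure */

/* Hypothetical stand-in for the nspa_create_file_from_unix_fd RPC. */
static handle_t fake_create_file_from_unix_fd(int unix_fd)
{
    promote_rpcs++;
    if (fail_next_promote) {
        fail_next_promote = false;
        return 0;
    }
    return (handle_t)(0x100 + 4 * unix_fd); /* pretend server-range handle */
}

/* Locking omitted: the real table is read under its PI mutex. */
static handle_t promote_if_local(handle_t h)
{
    if (h < LF_BASE || h >= LF_END || h == 0x7FFFFFFFu)
        return h;                                    /* not local-range */
    struct lf_entry *e = &lf_table[(h - LF_BASE) / 4];
    assert(e >= lf_table && e < lf_table + LF_CAP);
    if (!e->in_use)
        return h;                                    /* unknown: let the server reject it */
    if (!e->server_handle)                           /* first server-needing call */
        e->server_handle = fake_create_file_from_unix_fd(e->unix_fd);
    return e->server_handle ? e->server_handle : h;  /* failure: caller falls back */
}
```

In this sketch a failed promotion leaves server_handle at 0, so the next server-needing call simply retries; whether NSPA retries the same way is an assumption.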
attributes plumbing

The promote RPC forwards ObjectAttributes->Attributes (typically OBJ_CASE_INSENSITIVE, plus OBJ_INHERIT when the handle was opened as inheritable, e.g. CreateFile with SECURITY_ATTRIBUTES.bInheritHandle = TRUE). The server’s alloc_handle_entry translates OBJ_INHERIT to RESERVED_INHERIT on the handle’s access mask, which is how Wine tracks inheritable handles for copy_handle_table when CreateProcess runs with bInheritHandles=TRUE. Without the forwarding, inheritable local-range handles would be silently dropped by the inheritance walk.
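A sketch of what an inheritance pass like nspa_local_file_promote_inheritable might do before the new_process request, assuming the forwarded attributes decide inheritability and that already-promoted entries are skipped. fake_promote and promote_inheritable are hypothetical names; OBJ_INHERIT uses its standard value 0x2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OBJ_INHERIT          0x00000002u   /* standard winternl.h value */
#define OBJ_CASE_INSENSITIVE 0x00000040u

struct lf_entry {
    uint32_t  attributes;      /* ObjectAttributes->Attributes from the open */
    uintptr_t server_handle;   /* 0 until promoted */
    int       unix_fd;
};

static unsigned int rpcs;

/* Hypothetical promote stand-in. The real RPC forwards `attributes`, and the
 * server turns OBJ_INHERIT into RESERVED_INHERIT on the new handle. */
static uintptr_t fake_promote(const struct lf_entry *e)
{
    rpcs++;
    return (uintptr_t)(0x1000 + 4 * e->unix_fd);
}

/* Give every inheritable local-range handle a server twin that the server's
 * inheritance walk can see. Returns how many entries were promoted. */
static size_t promote_inheritable(struct lf_entry *t, size_t n)
{
    size_t promoted = 0;
    assert(t != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!(t[i].attributes & OBJ_INHERIT)) continue;  /* not inheritable */
        if (t[i].server_handle) continue;                 /* already promoted */
        t[i].server_handle = fake_promote(&t[i]);
        if (t[i].server_handle) promoted++;
    }
    return promoted;
}
```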
The bypass accepts only a tightly-scoped subset. The eligibility gate in file.c’s NtCreateFile:
```c
if (!loader_open &&                                           /* loader owns .dll/.exe opens */
    !attr->RootDirectory && !attr->SecurityDescriptor &&      /* absolute path, default SD */
    (disposition == FILE_OPEN || disposition == FILE_OPEN_IF) &&
    !(options & (FILE_OPEN_BY_FILE_ID | FILE_DIRECTORY_FILE | FILE_DELETE_ON_CLOSE)) &&
    (options & (FILE_SYNCHRONOUS_IO_ALERT | FILE_SYNCHRONOUS_IO_NONALERT)) && /* no OVERLAPPED */
    !(access & ~(FILE_READ_DATA | FILE_READ_ATTRIBUTES | FILE_READ_EA |
                 READ_CONTROL | SYNCHRONIZE | GENERIC_READ)))     /* read-only mask */
{
    NTSTATUS bypass = nspa_local_file_try_bypass( ... );
    if (bypass == STATUS_SUCCESS) return STATUS_SUCCESS;          /* served locally */
    if (bypass == STATUS_SHARING_VIOLATION) { status = bypass; goto done; } /* real conflict */
    /* STATUS_NOT_SUPPORTED -> fall through to the normal server path */
}
```
Disqualifiers and their reasons:
| Condition | Why rejected |
|---|---|
| loader_open (.dll / .drv / .sys / .exe) | Wine’s loader owns its own open path for these; we don’t want to race with it. |
| attr->RootDirectory != 0 | Relative opens would need openat() against a server-handle root – not worth the complexity. |
| attr->SecurityDescriptor != 0 | A custom SD means the caller wants server-enforced access control. |
| disposition != FILE_OPEN && != FILE_OPEN_IF | Create / overwrite / supersede need server-side atomicity on existence checks. |
| options & FILE_OPEN_BY_FILE_ID | Open-by-ID walks the server’s inode -> name mapping. |
| options & FILE_DIRECTORY_FILE | Directories use NtQueryDirectoryFile streaming – a different bypass target, not in scope. |
| options & FILE_DELETE_ON_CLOSE | Atomic-delete semantics need server ordering. |
| options lacks any FILE_SYNCHRONOUS_IO_* flag | OVERLAPPED opens route through register_async_file_read, which passes the handle to the server; a local handle would fail with STATUS_INVALID_HANDLE. |
| Any access bit outside the read-only mask | Write access has sharing-arbitration corners we don’t cover in the MVP. |
FILE_OPEN_FOR_BACKUP_INTENT, FILE_NO_INTERMEDIATE_BUFFERING, FILE_WRITE_THROUGH, FILE_OPEN_REPARSE_POINT, FILE_RANDOM_ACCESS, FILE_SEQUENTIAL_ONLY are all accepted – they either have no semantic we need to enforce client-side or map cleanly to open() flags.
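For reference, the gate can be restated as a self-contained predicate. The constant values are the standard NT definitions (Wine takes them from its headers); struct open_request and lf_eligible are hypothetical names, and the booleans stand in for loader_open and the two attr-> checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Standard NT values for the constants the gate uses. */
#define FILE_SUPERSEDE                0u
#define FILE_OPEN                     1u
#define FILE_OPEN_IF                  3u
#define FILE_OVERWRITE_IF             5u
#define FILE_DIRECTORY_FILE           0x00000001u
#define FILE_SEQUENTIAL_ONLY          0x00000004u
#define FILE_SYNCHRONOUS_IO_ALERT     0x00000010u
#define FILE_SYNCHRONOUS_IO_NONALERT  0x00000020u
#define FILE_DELETE_ON_CLOSE          0x00001000u
#define FILE_OPEN_BY_FILE_ID          0x00002000u
#define FILE_READ_DATA                0x00000001u
#define FILE_WRITE_DATA               0x00000002u
#define FILE_READ_EA                  0x00000008u
#define FILE_READ_ATTRIBUTES          0x00000080u
#define READ_CONTROL                  0x00020000u
#define SYNCHRONIZE                   0x00100000u
#define GENERIC_READ                  0x80000000u

#define LF_READ_ONLY_MASK (FILE_READ_DATA | FILE_READ_ATTRIBUTES | FILE_READ_EA | \
                           READ_CONTROL | SYNCHRONIZE | GENERIC_READ)

struct open_request {
    bool loader_open, has_root_dir, has_sd;
    uint32_t disposition, options, access;
};

/* True when the open may be tried locally; false sends it straight to the server. */
static bool lf_eligible(const struct open_request *r)
{
    if (r->loader_open || r->has_root_dir || r->has_sd) return false;
    if (r->disposition != FILE_OPEN && r->disposition != FILE_OPEN_IF) return false;
    if (r->options & (FILE_OPEN_BY_FILE_ID | FILE_DIRECTORY_FILE | FILE_DELETE_ON_CLOSE))
        return false;
    if (!(r->options & (FILE_SYNCHRONOUS_IO_ALERT | FILE_SYNCHRONOUS_IO_NONALERT)))
        return false;                                /* OVERLAPPED open */
    return (r->access & ~LF_READ_ONLY_MASK) == 0;    /* read-only access only */
}
```

Options the gate never mentions, such as FILE_SEQUENTIAL_ONLY, pass through untouched, which is how the accepted flags listed above get through.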
Every handle-consuming NT API in ntdll/unix and server/ is covered by one of the following strategies:
| NT API | Strategy | File |
|---|---|---|
| NtCreateFile | bypass dispatch | dlls/ntdll/unix/file.c |
| NtReadFile, NtWriteFile | fast path via server_get_unix_fd | dlls/ntdll/unix/file.c |
| NtQueryInformationFile | intercept + promote | dlls/ntdll/unix/file.c |
| NtSetInformationFile | intercept + promote | dlls/ntdll/unix/file.c |
| NtFsControlFile | intercept + promote | dlls/ntdll/unix/file.c |
| NtDeviceIoControlFile | intercept + promote | dlls/ntdll/unix/file.c |
| NtFlushBuffersFileEx | intercept + promote | dlls/ntdll/unix/file.c |
| NtCancelIoFile, NtCancelSynchronousIoFile | intercept + promote | dlls/ntdll/unix/file.c |
| NtLockFile | intercept + promote | dlls/ntdll/unix/file.c |
| NtQueryVolumeInformationFile | intercept + promote | dlls/ntdll/unix/file.c |
| NtQueryObject | intercept + traced promote | dlls/ntdll/unix/file.c |
| NtSetInformationObject | intercept + promote | dlls/ntdll/unix/file.c |
| NtCreateSection | dedicated nspa_create_mapping_from_unix_fd RPC | dlls/ntdll/unix/sync.c |
| NtDuplicateObject (same-process) | intercept + promote + DUPLICATE_CLOSE_SOURCE LF cleanup | dlls/ntdll/unix/server.c |
| NtCompareObjects | intercept + promote (both args) | dlls/ntdll/unix/server.c |
| NtQuerySecurityObject | intercept + promote | dlls/ntdll/unix/security.c |
| NtSetSecurityObject | intercept + promote | dlls/ntdll/unix/security.c |
| NtMakePermanentObject | intercept + promote | dlls/ntdll/unix/sync.c |
| NtMakeTemporaryObject | intercept + promote | dlls/ntdll/unix/sync.c |
| NtClose | LF close path (close fd + remove entry + server-close promoted) | dlls/ntdll/unix/server.c |
| CreateProcess inheritance (legacy bInheritHandles=TRUE) | nspa_local_file_promote_inheritable before new_process RPC | dlls/ntdll/unix/process.c |
| CreateProcess inheritance (STARTUPINFOEX PS_ATTRIBUTE_HANDLE_LIST) | deferred – synchronous promote-per-handle introduced a one-frame menu-paint delay; proper fix is batched promote RPC | dlls/ntdll/unix/process.c |
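A sketch of the NtClose LF path listed in the table (close fd, remove entry, server-close the promoted twin). fake_server_close is a hypothetical stand-in for the server close_handle request, and the table lock is omitted; a real implementation would likely unlink under the lock and issue the server close after dropping it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

struct lf_entry {
    struct lf_entry *next;
    uintptr_t handle;          /* local-range handle */
    uintptr_t server_handle;   /* 0 unless promoted */
    int unix_fd;               /* fd owned by the entry */
};

static unsigned int server_closes;

/* Hypothetical stand-in for closing the promoted server handle. */
static void fake_server_close(uintptr_t server_handle)
{
    assert(server_handle != 0);
    server_closes++;
}

/* Returns true if h was one of ours and has been fully released; false means
 * the caller continues with the normal server NtClose path. */
static bool lf_close(struct lf_entry **head, uintptr_t h)
{
    for (struct lf_entry **pp = head; *pp; pp = &(*pp)->next) {
        struct lf_entry *e = *pp;
        if (e->handle != h)
            continue;
        *pp = e->next;                            /* remove entry */
        close(e->unix_fd);                        /* close the owned fd */
        if (e->server_handle)
            fake_server_close(e->server_handle);  /* release the promoted twin */
        free(e);
        return true;
    }
    return false;
}
```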
All NSPA-specific source lives under a nspa/ subdirectory in each module. Upstream Wine files carry only single-line intercept hook calls, keeping rebase-against-upstream conflicts minimal.
```
dlls/ntdll/unix/nspa/
├── local_file.c      – LF table, bypass dispatch, promote helpers
├── local_timer.c     – NT timer local dispatcher
└── debug.h           – NSPA_TRACE macro, compile + runtime gated

dlls/win32u/nspa/
├── msg_ring.c        – Message bypass (POST/SEND rings)
└── local_wm_timer.c  – WM_TIMER local dispatcher

server/nspa/
├── local_file.c      – inode-aggregation shmem + promote handler
├── local_file.h      – server-side declarations
├── profile.c         – wineserver per-request-type profiler
└── debug.h           – server-side NSPA_TRACE macro
```
Upstream diffs against vanilla Wine are narrow:
- dlls/ntdll/unix/file.c: eligibility gate (8 lines) + 10 one-line intercept calls
- dlls/ntdll/unix/server.c: one call to nspa_local_file_try_get_unix_fd(), one promote in NtDuplicateObject, one LF-close entry in NtClose
- dlls/ntdll/unix/sync.c: LF-handle branch in NtCreateSection, two promote lines in NtMake{Permanent,Temporary}Object
- dlls/ntdll/unix/security.c: two promote lines
- dlls/ntdll/unix/process.c: alloc_handle_list was extended (prong A, currently deferred – see §14) + nspa_local_file_promote_inheritable() call before new_process RPC
- server/file.c: one call to nspa_lf_trace_promote() inside the existing nspa_create_file_from_unix_fd handler

Trace emission is both compile-time gated (NSPA_DEBUG, default on; pass -DNSPA_DEBUG=0 for a release build) and runtime gated via cached env checks.
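```c
#include <stdio.h>
#include <stdlib.h>
/* Expands to nspa_trace_<name>_enabled(): one getenv() per gate, then a cached flag. */
#define NSPA_TRACE_ENABLED_FN(name) \
static inline int nspa_trace_##name##_enabled(void) { \
    static int cache = -1; \
    int v = __atomic_load_n( &cache, __ATOMIC_RELAXED ); \
    if (v < 0) { \
        v = getenv( "NSPA_" #name ) ? 1 : 0; \
        __atomic_store_n( &cache, v, __ATOMIC_RELAXED ); \
    } \
    return v; \
}

NSPA_TRACE_ENABLED_FN(LF_TRACE)
NSPA_TRACE_ENABLED_FN(LF_TRACE_SRV)
/* … */

#if !defined(NSPA_DEBUG) || NSPA_DEBUG   /* default on; -DNSPA_DEBUG=0 compiles traces out */
#define NSPA_TRACE(name, ...) \
    do { if (nspa_trace_##name##_enabled()) fprintf( stderr, __VA_ARGS__ ); } while (0)
#else
#define NSPA_TRACE(name, ...) do { } while (0)
#endif
```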
- The first call per gate pays one getenv(); subsequent calls are a relaxed atomic load + not-taken branch when the env is unset (production default).
- NSPA_TRACE calls live only in nspa/*.c – upstream Wine files have zero NSPA_TRACE calls. Trace-worthy hooks in upstream code (e.g. the LF fast path in server_get_unix_fd) have been extracted into helpers (nspa_local_file_try_get_unix_fd, nspa_promote_if_local_traced) that the upstream file calls, and all trace logic lives inside those helpers.

Measurement setup: Ableton Live 12 Lite, 95-second playback window, NSPA_PROFILE=1 with all prod gates. Baseline is the pre-LF fullprod run (2026-04-21); “post-LF” is the 2026-04-23 run after the complete stack landed.
| Request | Pre-LF (baseline) | Post-LF | Delta |
|---|---|---|---|
| send_message | 32,342 | 325 | -99% |
| get_message_reply | 7,557 | 0 | -100% |
| send_hardware_message | 1,249 | 0 | -100% |
| accept_hardware_message | 1,205 | 0 | -100% |
| set_cursor | 1,766 | 0 | -100% |
| get_key_state | 1,166 | 0 | -100% |
| get_window_children_from_point | 1,705 | 0 | -100% |
| create_file | 60 | 0 | -100% |
| close_handle | 62 | 0 | -100% |
| AudioCalc thread server requests | 27 mentions | 0 | complete audio-path offload |
| Server handler total CPU | 686.8 ms | 571.6 ms | -16.8% |
The 99% drop on send_message comes from msg-ring (documented separately) rather than LF – they compose, and the full NSPA bypass stack is what produces the aggregate numbers. LF’s direct contribution shows as the zero rows on create_file / close_handle / get_handle_fd: those counts cover the steady-state playback window, and during startup the LF bypass additionally absorbs roughly 28,500 file opens that would otherwise each cost a server RTT plus a get_handle_fd round-trip.
The bottom-line metric is server handler CPU: 16.8% less server work across the board despite a 10x higher raw request count. The replacement traffic (ring wakeups, hook chain) is ~0.05 µs per request where the replaced traffic was 8+ µs per request.
DuplicateHandle of a local-range source

The same-process path is covered. Cross-process dup where the source lives in another Wine-NSPA process’s local-range is not – the server has no access to the remote’s LF table. Fix would require a cross-process LF promotion RPC. Rare in DAW workloads; parked.
STARTUPINFOEX PROC_THREAD_ATTRIBUTE_HANDLE_LIST local-range inheritance

Phase 1A.9 prong A (synchronous get_or_promote per handle in the explicit inheritance list) was deferred because the per-handle promote RPC on the CreateProcess-calling thread surfaced as a visible menu-content-paint delay (“black menu flash”). Legacy bInheritHandles=TRUE via prong B (nspa_local_file_promote_inheritable) is unaffected and covers the common case.
Proper fix options (ranked):
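1. Batched promote RPC: promote every local-range handle in the explicit handle list in one round-trip instead of one RPC per handle.
2. Pre-promote inheritable handles: for local opens that carry OBJ_INHERIT, fire the promote RPC off the critical path so the server handle is already cached when alloc_handle_list runs. Lower CreateProcess latency but higher complexity.

The current envelope captures the hot path. Anything outside it falls back cleanly. Worth widening only when a real workload demands: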
- FILE_OVERWRITE_IF / FILE_SUPERSEDE (cache writes, plugin DB updates)
- FILE_DIRECTORY_FILE (directory handles – different primitive, NtQueryDirectoryFile streaming)
- FILE_DELETE_ON_CLOSE (temp-file semantics)

None of these has a profile-visible cost today; keep them on the server path.
| Phase | Commit | Scope |
|---|---|---|
| 1A.0 | bbea50591a4 | Diagnostic scaffolding |
| 1A.1.a-c | 5fe0bff087c .. fc79ed3 | Shared inode-table shmem + publish hooks + client reader |
| 1A.2.a-e | 8c43fcbfb1f .. 99254f1 | Per-bucket PI lock + slot subentries + client publish API + NtCreateFile bypass dispatch + read/write routing |
| 1A.3 | 836cfa2 .. c71e8fc | Section-handle promotion infrastructure + audit conclusions |
| 1A.4.a | 35f8897 | Lazy server-handle promotion + PI mutex on table |
| 1A.4 partial | eb9c6d8454d | Nt*File hooks (b-e) |
| 1A.5 | 43f68f1 | Final ship-stable + audit findings |
| 1A.5+ | 7a03f51 | Wider Nt*File coverage (audit-driven) |
| 1A.6 | 73426aa72c4 | Promoted-fd correctness (nt_name plumb, GENERIC_* access map) |
| 1A.6 follow-up | 69bde5a825e | NtQueryObject + NtSetInformationObject promote |
| 1A.7 | 2b193aa0590 | NtDuplicateObject same-process promote (fixes Ableton .als load) |
| 1A.8 | 86e17b75986 | Object-generic API audit sweep (NtCompareObjects, security, permanence) |
| 1A.9 | 18c209da804 | OVERLAPPED reject + FILE_OPEN_IF widen + CreateProcess inheritance (prong B) + attributes plumbing |
| Menu-flash fix | 641dd63a313 + 72c59b04337 + 6edea95126f | Init nspa_lf_handle_base at declaration + defer prong A + gate QS_TIMER synth on caller’s filter |
| Reorg A-D | e81f4a3817f .. cc491efe052 | File moves into nspa/ subdirs + intercept-site collapse + debug gating |