Wine-NSPA – Local-File Bypass Architecture

Wine 11.6 + NSPA RT patchset | Kernel 6.19.x-rt with NTSync PI | 2026-04-23 | Author: Jordan Johnston

Table of Contents

  1. Overview
  2. Motivation
  3. Design Principles
  4. Vanilla Wine vs Wine-NSPA File Open
  5. Handle Range & Per-Process Table
  6. Shared Inode Table & Sharing Arbitration
  7. Lazy Server-Handle Promotion
  8. Dispatch Flow
  9. Eligibility Criteria
  10. NT API Coverage Matrix
  11. File Manifest (post-reorg)
  12. Debug Gating
  13. Results & Profiler Numbers
  14. Known Gaps & Roadmap
  15. Phase History

1. Overview

Wine-NSPA’s local-file bypass (NSPA_LOCAL_FILES=1) services read-only regular-file NtCreateFile calls entirely within the client process. Every eligible open would otherwise cost a full wineserver round-trip: the client builds a create_file request, the server allocates a struct file + inode tracking + handle entry and returns a server-visible handle, then every subsequent NtReadFile / NtQueryInformationFile / etc. fires another round-trip. For an app like Ableton Live 12 Lite, which performs roughly 28,500 file opens in a single startup session – DLL manifests, .pyc files, theme resources, Live Library indexes – those round-trips dominate the startup profile and show up as real latency on the main thread.

The bypass routes eligible opens to a client-private handle range, maintains a per-process table that owns the unix fd, and exposes the unix fd to every Wine I/O path via a thin fast-path check inside server_get_unix_fd. When an API needs server-side state (section mapping, query-by-handle, inheritance), the bypass lazily promotes the local handle to a server-recognised handle on demand.

The feature is invisible to Win32 applications: same CreateFile semantics, same sharing arbitration, same io->Information = FILE_OPENED return value, same behaviour on every downstream API. Apps see identical functional behaviour whether the bypass is enabled or not – the difference is measurable only in profiler output and perceived startup latency.


2. Motivation

Ableton’s startup profile exposed a large population of short-lived file opens:

| Pattern | Example |
| --- | --- |
| DLL manifest lookups | C:\windows\winsxs\manifests\amd64_microsoft.windows.common-controls_*.manifest |
| Python bytecode loads | .../Resources/Python/abl.live/**/*.pyc |
| Theme resources | C:\windows\resources\themes\aero\aero.msstyles |
| Clock source probes | /sys/bus/clocksource/devices/clocksource0/current_clocksource |
| Ableton library indexes | C:\users\ninez\AppData\Local\Ableton\Live Database\Live-files-*.db |
| Live Packs | C:\ProgramData\Ableton\Live 12 Lite\Resources\Graphics.alp |

Each open is cheap on its own (a few µs), but the aggregate is hundreds of milliseconds of server traffic during startup – and startup happens on the main thread, where paint and UI dispatch live. Eliminating the server round-trip on these opens directly reduces time-to-first-paint and reduces steady-state priority-inversion risk on the RT audio path (the server’s single-threaded main loop services all requests).

Other candidate workloads with similar profiles: plugin scanners (hundreds of VST probe opens), .NET apps (thousands of assembly-manifest reads at JIT time), installers (cache-file probes), and any Windows application using Python or Lua as an embedded runtime.


3. Design Principles


4. Vanilla Wine vs Wine-NSPA File Open

Vanilla Wine: every open = server RTT

```
NtCreateFile (ntdll unix)
  -> SERVER: create_file request
       open(), stat(), check_sharing
       alloc struct fd + struct file
       global_lock held during sharing arbitration
       alloc_handle (server range)
  <- reply: server handle 0x14
NtReadFile: another server RTT
  get_handle_fd -> SCM_RIGHTS
  client mmaps + pread; close on needs_close

Cost per open-read-close: 3+ server RTTs
  ~5-10µs each, ~15-30µs wall on an otherwise idle server
  under RT contention: unbounded
```

Wine-NSPA: local bypass for eligible opens

```
NtCreateFile (ntdll unix)
  nspa_local_file_try_bypass
    stat() -> (dev, inode)
    check_and_publish via shmem table
      (per-bucket PI mutex, no server call)
    open() O_RDONLY
    alloc local handle (0x7FFFxxxx)
  return local handle
NtReadFile(local_handle)
  server_get_unix_fd fast path
  table lookup -> pread(fd)

Cost per open-read-close: 0 server RTTs
  stat + open + pread; everything local
  promotion happens only on an API that needs server state
```

5. Handle Range & Per-Process Table

5.1 Handle range

Local handles are allocated from the fixed range [NSPA_LF_HANDLE_BASE, 0x80000000), where NSPA_LF_HANDLE_BASE = 0x80000000 - NSPA_LF_HANDLE_CAP*4 and NSPA_LF_HANDLE_CAP = 4096. That gives an exact 16 KiB handle window ([0x7FFFC000, 0x80000000)) disjoint from the server’s own handle allocation range.

nspa_local_file_is_local_handle(h) is a constant-time range check: base <= h < 0x80000000 && h != 0x7FFFFFFF (the last exclusion is for the CURRENT_PROCESS pseudo-handle which would otherwise land inside the range). The check is called from every NT-API intercept site to decide whether to take the bypass path or fall through.
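As a sketch, the range test reduces to two compares plus the pseudo-handle exclusion. The constants follow 5.1; the macro and function names here are illustrative, not necessarily the real ones:

```c
/* Sketch of the local-handle range check from 5.1.
 * Constant values match the document; names are illustrative. */
#define NSPA_LF_HANDLE_CAP  4096u
#define NSPA_LF_HANDLE_BASE (0x80000000u - NSPA_LF_HANDLE_CAP * 4u)

static inline int nspa_local_file_is_local_handle( unsigned int h )
{
    /* 0x7FFFFFFF is excluded: the pseudo-handle would otherwise land in-range */
    return h >= NSPA_LF_HANDLE_BASE && h < 0x80000000u && h != 0x7FFFFFFFu;
}
```

Because the range is a compile-time constant window, the check is branch-cheap enough to sit on every NT-API entry point.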

5.2 Per-process table

```c
struct nspa_local_open
{
    struct list        entry;
    HANDLE             handle;        /* local-range handle returned to app */
    HANDLE             server_handle; /* lazy-promoted; 0 until first promote */
    int                unix_fd;
    unsigned long long device;
    unsigned long long inode;
    unsigned int       access;
    unsigned int       sharing;
    unsigned int       options;       /* FILE_OPEN options: SYNC_IO_NONALERT, etc */
    unsigned int       attributes;    /* OBJ_INHERIT forwarded on promote */
    WCHAR             *nt_name;       /* original NT path for GetFinalPathNameByHandle */
    USHORT             nt_name_len;
};
```

Protected by a single process-wide PI mutex (nspa_lf_opens_mutex). Linear list – walk is O(N) per lookup. For Ableton’s typical workload the list reaches a few hundred entries at peak; the walk is in the noise next to a server RTT it avoids.

Table add (on mint) and remove (on close) are the only writers. Every other operation (lookup, promote lookup) is a read under the same lock. The lock is a PI mutex because RT-priority threads occasionally open files at init and we cannot have a low-priority thread holding the lock against the audio callback.
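A minimal model of the add-on-mint / lookup pattern described above. Names are illustrative; the real table uses Wine's list.h and a PI mutex (PTHREAD_PRIO_INHERIT), whereas this sketch uses a plain pthread mutex for portability:

```c
#include <pthread.h>
#include <stdlib.h>

/* Illustrative model of the per-process LF table from 5.2 (not the real code). */
struct lf_open
{
    struct lf_open *next;
    unsigned int    handle;        /* local-range handle returned to app */
    unsigned int    server_handle; /* 0 until promoted */
    int             unix_fd;
};

static struct lf_open *lf_head;
static pthread_mutex_t lf_mutex = PTHREAD_MUTEX_INITIALIZER; /* PI mutex in real code */

/* O(N) walk under the lock – mirrors the document's linear-list lookup */
static struct lf_open *lf_find( unsigned int h )
{
    struct lf_open *e;
    pthread_mutex_lock( &lf_mutex );
    for (e = lf_head; e; e = e->next) if (e->handle == h) break;
    pthread_mutex_unlock( &lf_mutex );
    return e;
}

/* add-on-mint: one of only two writers (the other is remove-on-close) */
static struct lf_open *lf_add( unsigned int h, int fd )
{
    struct lf_open *e = calloc( 1, sizeof(*e) );
    if (!e) return NULL;
    e->handle  = h;
    e->unix_fd = fd;
    pthread_mutex_lock( &lf_mutex );
    e->next = lf_head;
    lf_head = e;
    pthread_mutex_unlock( &lf_mutex );
    return e;
}
```

At a few hundred entries peak, the O(N) walk stays far below the cost of the server round-trip it replaces, which is why no hash structure is needed here.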


6. Shared Inode Table & Sharing Arbitration

Windows file sharing (FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE) has cross-process semantics: if Process A opens foo with sharing=0, Process B’s open of foo must fail with STATUS_SHARING_VIOLATION. Any pure-local bypass has to see what other processes have done on the same (device, inode).

6.1 Shmem layout

The wineserver publishes a NSPA_INODE_BUCKETS = 1024 bucket hash table as a memfd-backed shmem region. Each bucket has 4 slots of (dev, inode, agg_access, agg_sharing, refcount) + a per-bucket PI mutex. Clients map the region read-only for arbitration lookups, read-write on the mutex word for publishing their own opens.

```
┌───────────────────────────────────────────────┐
│ nspa_inode_table_shm_t (~160 KB)              │
├───────────────────────────────────────────────┤
│ buckets[0..1023]                              │
│   ├─ lock_storage (pi_mutex_t, 64 B)          │
│   ├─ slot[0] (dev, ino, access, share, ref)   │
│   ├─ slot[1]                                  │
│   ├─ slot[2]                                  │
│   └─ slot[3]                                  │
└───────────────────────────────────────────────┘
```

Bucket index = hash(dev, inode) mod 1024. Slot selection is linear within the bucket (first free or matching). If all 4 slots are full and none match, the bypass returns STATUS_NOT_SUPPORTED and the open falls back to the server – this is an overflow-safety valve, not a correctness path.
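The bucket/slot selection can be sketched as follows. The hash function here is an assumption (the document does not specify one), and the per-bucket PI lock is elided:

```c
#include <stddef.h>

/* Illustrative model of the 1024-bucket / 4-slot inode table from 6.1. */
#define NSPA_INODE_BUCKETS 1024
#define SLOTS_PER_BUCKET   4

struct inode_slot
{
    unsigned long long dev, ino;
    unsigned int agg_access, agg_sharing, refcount;
};

static struct inode_slot buckets[NSPA_INODE_BUCKETS][SLOTS_PER_BUCKET];

/* Hypothetical hash; the real one is unspecified in this document. */
static unsigned int bucket_index( unsigned long long dev, unsigned long long ino )
{
    return (unsigned int)((dev * 31u + ino) % NSPA_INODE_BUCKETS);
}

/* Linear slot scan: matching slot first, else first free slot.
 * NULL means all 4 slots hold other inodes – the caller returns
 * STATUS_NOT_SUPPORTED and the open falls back to the server. */
static struct inode_slot *slot_lookup( unsigned long long dev, unsigned long long ino )
{
    struct inode_slot *b = buckets[bucket_index( dev, ino )];
    struct inode_slot *free_slot = NULL;
    for (int i = 0; i < SLOTS_PER_BUCKET; i++)
    {
        if (b[i].refcount && b[i].dev == dev && b[i].ino == ino) return &b[i];
        if (!b[i].refcount && !free_slot) free_slot = &b[i];
    }
    return free_slot;
}
```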

6.2 Arbitration logic

nspa_local_file_check_and_publish_open atomically checks the existing aggregate against the new open’s access/sharing mask, returning STATUS_SHARING_VIOLATION if they conflict. The check matches the server’s algorithm exactly.

This lives in nspa_local_file_check_sharing_algorithm(). The server-side publish hooks (nspa_inode_publish_slot) apply the same rule whenever a non-bypass open creates or clears an inode entry, so arbitration sees the union of bypass and non-bypass opens.
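A simplified model of the arbitration rule, collapsing access and sharing onto the same three read/write/delete bits. The real nspa_local_file_check_sharing_algorithm() first maps FILE_READ_DATA / FILE_WRITE_DATA / DELETE onto the FILE_SHARE_* bits before comparing; this sketch assumes both masks already use one encoding:

```c
/* Hypothetical 3-bit read/write/delete model of NT sharing arbitration. */
#define SH_READ   1u
#define SH_WRITE  2u
#define SH_DELETE 4u

static int sharing_conflict( unsigned int old_access, unsigned int old_sharing,
                             unsigned int new_access, unsigned int new_sharing )
{
    /* The new opener must share everything existing openers access... */
    if (old_access & ~new_sharing) return 1;
    /* ...and existing openers must share everything the new opener accesses. */
    if (new_access & ~old_sharing) return 1;
    return 0;
}
```

The symmetry of the two tests is what gives the Windows cross-process behaviour: Process A opening with sharing=0 blocks B's read, and B opening without FILE_SHARE_READ fails if A already holds read access.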


7. Lazy Server-Handle Promotion

The LF table returns a local-range handle to the application. Most Nt-API intercepts can service the call from the local unix fd directly (NtReadFile, NtWriteFile, NtQueryInformationFile for FileBasicInformation / FilePositionInformation / etc). But some APIs require a server-visible handle – section mapping, query-by-handle state, duplication, and inheritance.

For these, the bypass lazily promotes the local handle: on first call needing server state, it issues a single nspa_create_file_from_unix_fd RPC:

  1. wine_server_send_fd(unix_fd) – SCM_RIGHTS transfers a dup of the fd to the server
  2. Server’s handler wraps the fd in a struct fd + struct file_obj + stores the NT path
  3. Server calls alloc_handle and returns a normal server-range handle
  4. Client stores the server handle in the LF entry’s server_handle field

Subsequent calls on the same local handle reuse the cached server handle – no second RPC. nspa_promote_if_local(h) is the one-line helper that every intercept site calls:

```c
HANDLE nspa_promote_if_local( HANDLE h );

/* Returns h unchanged if not local-range.
 * Returns the promoted server handle (cached if already promoted) if local-range.
 * Returns h unchanged if promotion failed (caller falls back to server path). */
```

This is Phase 1A.4.a lazy-promotion. The alternative – eagerly promoting at mint time – was rejected because most file opens in Ableton’s workload never touch a server-requiring API; they read, maybe query a position, and close. Eager promotion would cost an RPC per open; lazy promotion costs an RPC per distinct file that escapes the read-only happy path.
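The caching behaviour can be sketched like this, with the server RPC stubbed out. nspa_rpc_create_file_from_unix_fd(), the entry layout, and the rpc_calls counter are all illustrative stand-ins, not the real implementation:

```c
/* Sketch of lazy promotion with a stubbed server round-trip. */
struct lf_entry
{
    unsigned int handle;        /* local-range handle */
    unsigned int server_handle; /* 0 until first promote */
    int          unix_fd;
};

static int rpc_calls; /* counts simulated server round-trips */

static unsigned int nspa_rpc_create_file_from_unix_fd( int fd )
{
    (void)fd;
    rpc_calls++;
    return 0x20; /* pretend the server handed back handle 0x20 */
}

/* entry == NULL models "h is not in the local range" */
static unsigned int promote_if_local( struct lf_entry *entry, unsigned int h )
{
    if (!entry) return h;                  /* non-local: pass through unchanged */
    if (!entry->server_handle)             /* first server-needing call only */
        entry->server_handle = nspa_rpc_create_file_from_unix_fd( entry->unix_fd );
    return entry->server_handle ? entry->server_handle : h; /* failed: fall back */
}
```

Two calls on the same entry produce exactly one simulated RPC, which is the whole point of caching server_handle in the LF entry.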

7.1 attributes plumbing

The promote RPC forwards ObjectAttributes->Attributes (typically OBJ_CASE_INSENSITIVE, plus OBJ_INHERIT when bInheritHandles=TRUE is set on CreateProcess). The server’s alloc_handle_entry translates OBJ_INHERIT to RESERVED_INHERIT on the handle’s access mask, which is how Wine tracks inheritable handles for copy_handle_table during CreateProcess. Without the forwarding, inheritable local-range handles would be silently dropped by the inheritance walk.


8. Dispatch Flow

NtCreateFile bypass dispatch + downstream intercepts

```
app: CreateFileA(...)
  eligibility gate (file.c:4706)
    disposition FILE_OPEN|FILE_OPEN_IF, sync, read-only
    fail gate -> server create_file RTT
  nspa_local_file_try_bypass
    stat() + S_ISREG check
    check_and_publish via inode shmem
      -> SHARING_VIOLATION or NOT_SUPPORTED
    open() + table_add
  -> local handle 0x7FFFC4xx

app uses the handle: NtReadFile / NtQuery* / NtSet* / NtFsCtl / NtDeviceIoCtl / ...
  (every NT-API entry point checks nspa_local_file_is_local_handle)
  NtReadFile / NtWriteFile     -> server_get_unix_fd fast path -> pread(fd)
  NtQuery*InformationFile etc  -> nspa_promote_if_local -> server RPC
                                  nspa_create_file_from_unix_fd
                                  (one-time per local handle; cached)
  NtClose (local path)         -> close(fd) + remove entry + server close if promoted
```

9. Eligibility Criteria

The bypass accepts only a tightly-scoped subset. The eligibility gate in file.c’s NtCreateFile:

```c
if (!loader_open && !attr->RootDirectory && !attr->SecurityDescriptor &&
    (disposition == FILE_OPEN || disposition == FILE_OPEN_IF) &&
    !(options & (FILE_OPEN_BY_FILE_ID | FILE_DIRECTORY_FILE | FILE_DELETE_ON_CLOSE)) &&
    (options & (FILE_SYNCHRONOUS_IO_ALERT | FILE_SYNCHRONOUS_IO_NONALERT)) &&
    !(access & ~(FILE_READ_DATA | FILE_READ_ATTRIBUTES | FILE_READ_EA |
                 READ_CONTROL | SYNCHRONIZE | GENERIC_READ)))
{
    NTSTATUS bypass = nspa_local_file_try_bypass( ... );
    if (bypass == STATUS_SUCCESS) return STATUS_SUCCESS;
    if (bypass == STATUS_SHARING_VIOLATION) { status = bypass; goto done; }
    /* STATUS_NOT_SUPPORTED -> fall through to the server path */
}
```

Disqualifiers and their reasons:

| Condition | Why rejected |
| --- | --- |
| loader_open (.dll / .drv / .sys / .exe) | Wine’s loader owns its own open path for these; we don’t want to race with it. |
| attr->RootDirectory != 0 | Relative opens would need openat() against a server-handle root – not worth the complexity. |
| attr->SecurityDescriptor != 0 | Custom SD means the caller wants server-enforced access control. |
| disposition != FILE_OPEN && != FILE_OPEN_IF | Create / overwrite / supersede need server-side atomicity on existence checks. |
| options & FILE_OPEN_BY_FILE_ID | Open-by-ID walks the server’s inode -> name mapping. |
| options & FILE_DIRECTORY_FILE | Directories use NtQueryDirectoryFile streaming – a different bypass target, not in scope. |
| options & FILE_DELETE_ON_CLOSE | Atomic-delete semantics need server ordering. |
| options lacks any FILE_SYNCHRONOUS_IO_* flag | OVERLAPPED opens route through register_async_file_read, which takes the handle to the server – a local handle would fail with STATUS_INVALID_HANDLE. |
| Any access bit outside the read-only mask | Write access has sharing-arbitration corners we don’t cover in the MVP. |

FILE_OPEN_FOR_BACKUP_INTENT, FILE_NO_INTERMEDIATE_BUFFERING, FILE_WRITE_THROUGH, FILE_OPEN_REPARSE_POINT, FILE_RANDOM_ACCESS, FILE_SEQUENTIAL_ONLY are all accepted – they either have no semantic we need to enforce client-side or map cleanly to open() flags.
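The gate can be restated as a single predicate. Flag values below are the standard NT ones; the function and parameter names are illustrative, not the actual gate in file.c:

```c
/* Hypothetical consolidation of the section-9 eligibility gate. */
#define FILE_OPEN                    1
#define FILE_OPEN_IF                 3
#define FILE_DIRECTORY_FILE          0x00000001
#define FILE_SYNCHRONOUS_IO_ALERT    0x00000010
#define FILE_SYNCHRONOUS_IO_NONALERT 0x00000020
#define FILE_DELETE_ON_CLOSE         0x00001000
#define FILE_OPEN_BY_FILE_ID         0x00002000
#define FILE_READ_DATA               0x00000001
#define FILE_READ_EA                 0x00000008
#define FILE_READ_ATTRIBUTES         0x00000080
#define READ_CONTROL                 0x00020000
#define SYNCHRONIZE                  0x00100000
#define GENERIC_READ                 0x80000000u

static int lf_eligible( int loader_open, int has_root_dir, int has_sd,
                        unsigned int disposition, unsigned int options,
                        unsigned int access )
{
    const unsigned int ro_mask = FILE_READ_DATA | FILE_READ_ATTRIBUTES |
                                 FILE_READ_EA | READ_CONTROL |
                                 SYNCHRONIZE | GENERIC_READ;

    if (loader_open || has_root_dir || has_sd) return 0;
    if (disposition != FILE_OPEN && disposition != FILE_OPEN_IF) return 0;
    if (options & (FILE_OPEN_BY_FILE_ID | FILE_DIRECTORY_FILE | FILE_DELETE_ON_CLOSE))
        return 0;
    /* OVERLAPPED opens (no sync flag) must stay on the server path */
    if (!(options & (FILE_SYNCHRONOUS_IO_ALERT | FILE_SYNCHRONOUS_IO_NONALERT)))
        return 0;
    if (access & ~ro_mask) return 0; /* any write-capable bit disqualifies */
    return 1;
}
```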


10. NT API Coverage Matrix

Every handle-consuming NT API in ntdll/unix and server/ is covered by one of the strategies below:

| NT API | Strategy | File / Line |
| --- | --- | --- |
| NtCreateFile | bypass dispatch | dlls/ntdll/unix/file.c |
| NtReadFile, NtWriteFile | fast path via server_get_unix_fd | dlls/ntdll/unix/file.c |
| NtQueryInformationFile | intercept + promote | dlls/ntdll/unix/file.c |
| NtSetInformationFile | intercept + promote | dlls/ntdll/unix/file.c |
| NtFsControlFile | intercept + promote | dlls/ntdll/unix/file.c |
| NtDeviceIoControlFile | intercept + promote | dlls/ntdll/unix/file.c |
| NtFlushBuffersFileEx | intercept + promote | dlls/ntdll/unix/file.c |
| NtCancelIoFile, NtCancelSynchronousIoFile | intercept + promote | dlls/ntdll/unix/file.c |
| NtLockFile | intercept + promote | dlls/ntdll/unix/file.c |
| NtQueryVolumeInformationFile | intercept + promote | dlls/ntdll/unix/file.c |
| NtQueryObject | intercept + traced promote | dlls/ntdll/unix/file.c |
| NtSetInformationObject | intercept + promote | dlls/ntdll/unix/file.c |
| NtCreateSection | dedicated nspa_create_mapping_from_unix_fd RPC | dlls/ntdll/unix/sync.c |
| NtDuplicateObject (same-process) | intercept + promote + DUPLICATE_CLOSE_SOURCE LF cleanup | dlls/ntdll/unix/server.c |
| NtCompareObjects | intercept + promote (both args) | dlls/ntdll/unix/server.c |
| NtQuerySecurityObject | intercept + promote | dlls/ntdll/unix/security.c |
| NtSetSecurityObject | intercept + promote | dlls/ntdll/unix/security.c |
| NtMakePermanentObject | intercept + promote | dlls/ntdll/unix/sync.c |
| NtMakeTemporaryObject | intercept + promote | dlls/ntdll/unix/sync.c |
| NtClose | LF close path (close fd + remove entry + server-close promoted) | dlls/ntdll/unix/server.c |
| CreateProcess inheritance (legacy bInheritHandles=TRUE) | nspa_local_file_promote_inheritable before new_process RPC | dlls/ntdll/unix/process.c |
| CreateProcess inheritance (STARTUPINFOEX PS_ATTRIBUTE_HANDLE_LIST) | deferred – synchronous promote-per-handle introduced a one-frame menu-paint delay; proper fix is a batched promote RPC | dlls/ntdll/unix/process.c |

11. File Manifest (post-reorg)

All NSPA-specific source lives under a nspa/ subdirectory in each module. Upstream Wine files carry only single-line intercept hook calls, keeping rebase-against-upstream conflicts minimal.

```
dlls/ntdll/unix/nspa/
├── local_file.c    – LF table, bypass dispatch, promote helpers
├── local_timer.c   – NT timer local dispatcher
└── debug.h         – NSPA_TRACE macro, compile + runtime gated

dlls/win32u/nspa/
├── msg_ring.c        – Message bypass (POST/SEND rings)
└── local_wm_timer.c  – WM_TIMER local dispatcher

server/nspa/
├── local_file.c  – inode-aggregation shmem + promote handler
├── local_file.h  – server-side declarations
├── profile.c     – wineserver per-request-type profiler
└── debug.h       – server-side NSPA_TRACE macro
```

Upstream diffs against vanilla Wine stay narrow: each touched upstream file carries only its single-line hook call.


12. Debug Gating

Trace emission is both compile-time gated (NSPA_DEBUG, default on; pass -DNSPA_DEBUG=0 for a release build) and runtime gated via cached env checks.

```c
#if NSPA_DEBUG

#define NSPA_TRACE_ENABLED_FN(name) \
static inline int nspa_trace_##name##_enabled(void) { \
    static int cache = -1; \
    int v = __atomic_load_n( &cache, __ATOMIC_RELAXED ); \
    if (v < 0) { \
        v = getenv( "NSPA_" #name ) ? 1 : 0; \
        __atomic_store_n( &cache, v, __ATOMIC_RELAXED ); \
    } \
    return v; \
}

NSPA_TRACE_ENABLED_FN(LF_TRACE)
NSPA_TRACE_ENABLED_FN(LF_TRACE_SRV)

#define NSPA_TRACE(name, ...) \
    do { if (nspa_trace_##name##_enabled()) fprintf( stderr, __VA_ARGS__ ); } while (0)

#else

#define NSPA_TRACE(name, ...) ((void)0)

#endif
```


13. Results & Profiler Numbers

Ableton Live 12 Lite, 95-second playback window, NSPA_PROFILE=1 with all prod gates. Baseline is the pre-LF fullprod run (2026-04-21); “post-LF” is the 2026-04-23 run after the complete stack landed.

| Request | Pre-LF (baseline) | Post-LF | Delta |
| --- | --- | --- | --- |
| send_message | 32,342 | 325 | -99% |
| get_message_reply | 7,557 | 0 | -100% |
| send_hardware_message | 1,249 | 0 | -100% |
| accept_hardware_message | 1,205 | 0 | -100% |
| set_cursor | 1,766 | 0 | -100% |
| get_key_state | 1,166 | 0 | -100% |
| get_window_children_from_point | 1,705 | 0 | -100% |
| create_file | 60 | 0 | -100% |
| close_handle | 62 | 0 | -100% |
| AudioCalc thread server requests | 27 mentions | 0 | complete audio-path offload |
| Server handler total CPU | 686.8 ms | 571.6 ms | -16.8% |

The 99% drop on send_message is msg-ring (documented separately) rather than LF – they compose, and the full NSPA bypass stack is what produces the aggregate numbers. LF’s direct contribution shows as the zero rows on create_file / close_handle / get_handle_fd: those are steady-state during playback, but during startup the LF bypass eats roughly 28,500 file opens that would otherwise each cost a server RTT plus a get_handle_fd return-trip.

The bottom-line metric is server handler CPU: 16.8% less server work across the board despite a 10x higher raw request count. The replacement traffic (ring wakeups, hook chain) is ~0.05 µs per request where the replaced traffic was 8+ µs per request.


14. Known Gaps & Roadmap

14.1 Cross-process DuplicateHandle of a local-range source

The same-process path is covered. Cross-process dup where the source lives in another Wine-NSPA process’s local-range is not – the server has no access to the remote’s LF table. Fix would require a cross-process LF promotion RPC. Rare in DAW workloads; parked.

14.2 STARTUPINFOEX PROC_THREAD_ATTRIBUTE_HANDLE_LIST local-range inheritance

Phase 1A.9 prong A (synchronous get_or_promote per handle in the explicit inheritance list) was deferred because the per-handle promote RPC on the CreateProcess-calling thread surfaced as a visible menu-content-paint delay (“black menu flash”). Legacy bInheritHandles=TRUE via prong B (nspa_local_file_promote_inheritable) is unaffected and covers the common case.

Proper fix options (ranked):

  1. Batched promote RPC – single server round-trip that promotes an array of local handles. Caps the CreateProcess cost at one RTT regardless of list length.
  2. Async pre-promotion at mint time – if the open carried OBJ_INHERIT, fire the promote RPC off the critical path so the server handle is already cached when alloc_handle_list runs. Lower CreateProcess latency but higher complexity.

14.3 Eligibility widening

The current envelope captures the hot path, and anything outside it falls back cleanly; widening is worth doing only when a real workload demands it.

Nothing currently outside the envelope has a profile-visible cost; keep it on the server path.


15. Phase History

| Phase | Commit | Scope |
| --- | --- | --- |
| 1A.0 | bbea50591a4 | Diagnostic scaffolding |
| 1A.1.a-c | 5fe0bff087c .. fc79ed3 | Shared inode-table shmem + publish hooks + client reader |
| 1A.2.a-e | 8c43fcbfb1f .. 99254f1 | Per-bucket PI lock + slot subentries + client publish API + NtCreateFile bypass dispatch + read/write routing |
| 1A.3 | 836cfa2 .. c71e8fc | Section-handle promotion infrastructure + audit conclusions |
| 1A.4.a | 35f8897 | Lazy server-handle promotion + PI mutex on table |
| 1A.4 partial | eb9c6d8454d | Nt*File hooks (b-e) |
| 1A.5 | 43f68f1 | Final ship-stable + audit findings |
| 1A.5+ | 7a03f51 | Wider Nt*File coverage (audit-driven) |
| 1A.6 | 73426aa72c4 | Promoted-fd correctness (nt_name plumb, GENERIC_* access map) |
| 1A.6 follow-up | 69bde5a825e | NtQueryObject + NtSetInformationObject promote |
| 1A.7 | 2b193aa0590 | NtDuplicateObject same-process promote (fixes Ableton .als load) |
| 1A.8 | 86e17b75986 | Object-generic API audit sweep (NtCompareObjects, security, permanence) |
| 1A.9 | 18c209da804 | OVERLAPPED reject + FILE_OPEN_IF widen + CreateProcess inheritance (prong B) + attributes plumbing |
| Menu-flash fix | 641dd63a313 + 72c59b04337 + 6edea95126f | Init nspa_lf_handle_base at declaration + defer prong A + gate QS_TIMER synth on caller’s filter |
| Reorg A-D | e81f4a3817f .. cc491efe052 | File moves into nspa/ subdirs + intercept-site collapse + debug gating |