Wine-NSPA RT – State of The Art

Kernel: 6.19.11-rt1-1-nspa (PREEMPT_RT) | CONFIG_NTSYNC=m (5 patches) | Wine-NSPA 11.6
NSPA_RT_PRIO=80 NSPA_RT_POLICY=FF | nspa_rt_test.exe v6 (11 tests, baseline + rt) | 2026-04-16

22/22 PASS — 4 PI coverage paths — io_uring Phase 1+2+3 COMPLETE — all RT subsystems healthy

PI Coverage Topology

How Wine-NSPA’s 4 PI paths map from Win32 API calls through to the kernel mechanisms that enforce priority inheritance.

Win32 API → Wine ntdll layer → Kernel PI mechanism

Path A: EnterCriticalSection → RtlEnterCriticalSection (TID CAS fast path in PE, slow path in unix) → FUTEX_LOCK_PI (kernel rt_mutex, transitive PI)
Path B: WaitForSingleObject → NtWaitForSingleObject (inproc_wait → ioctl(/dev/ntsync)) → /dev/ntsync PI, 5 patches (prio-ordered queues, sched_setattr)
Path C: pi_cond_wait() in audio, gstreamer, winebus → librtpi (unix-side, header-only; unlock mutex → sleep on condvar) → FUTEX_WAIT_REQUEUE_PI (atomic requeue onto PI mutex)
Path D: SleepConditionVariableCS (Win32 app condvar + CS) → NtNspaCondWaitPI, 3 syscalls (condvar → mutex mapping table) → FUTEX_WAIT_REQUEUE_PI (atomic requeue onto CS PI mutex)
No PI: SleepConditionVariableSRW (SRW lock, no PI possible) → RtlWaitOnAddress (plain futex) → no ownership, so no PI target

Gating: all 4 paths activate only when NSPA_RT_PRIO is set; when inactive, every code path is byte-identical to upstream Wine (zero overhead).
Kernel: Linux 6.19.11-rt1 PREEMPT_RT (rt_mutex PI chains, NTSync driver with 5 patches, futex requeue-PI).

1. System Health Dashboard

At-a-glance status of every test in RT mode (NSPA_RT_PRIO=80). Each row shows the single most important metric.

Test Status Key Metric v3 Value v5 Value v6 Value Assessment
rapidmutex PASS RT throughput 288K ops/s 327K ops/s 290K ops/s normal variance, no regression
philosophers PASS RT phil max wait 1620 us 1301 us 646 us -50% v5→v6, CFS variance
fork-mutex PASS RT elapsed 1024 ms 948 ms 948 ms stable
cs-contention PASS avg wait (PI hold) 474 ms 349 ms 276 ms -21% v5→v6, improving
signal-recursion PASS elapsed (no sync) 61 ms 57 ms 57 ms no exception-path regression
large-pages PASS 2MB + 1GB alloc LargePage=1 LargePage=1 LargePage=1 PAGEMAP_SCAN path confirmed
ntsync-d4 8/8 PI contention RT avg 238 ms 270 ms 477 ms CFS load variance, PI correct
ntsync-d8 3/3 PI contention RT avg 479 ms 201 ms 311 ms CFS variance, PI chain correct
ntsync-d12 3/3 transitive chain wait 226 ms 96 ms 406 ms O(1) scaling, CFS load-dependent
socket-io A PASS immediate recv avg 95.8 us 95.8 us stable baseline
socket-io B PASS deferred recv avg 115.4 us 115.4 us overlapped stable
condvar-pi PASS RT max wait — — 152 us NEW in v6; requeue-PI, -42% worst-case vs non-PI

2. NTSync Driver Status

2.1 Module and Handle Verification

Check Evidence Result
ntsync module loaded [ntsync] active (client-side handle 00000000001FFC00) confirmed
client-side handles Handle 0x1FFC00 in client range (520K down from max) confirmed – bypasses wineserver
kernel mutex objects 0x1FFBFC, 0x1FFBF0 – distinct NTSync handles confirmed

2.1b Client-Side NTSync Object Creation

Anonymous sync objects are created directly on /dev/ntsync client-side (f863fa180ca), bypassing wineserver for creation and wait. Events are excluded — client-created event handles destabilized Ableton Live, so anonymous events remain on wineserver (a7b00453978). Mutexes and semaphores are stable client-side.

Object Type Creation Path Wait Path PI Support
Mutex (anonymous) Client → ioctl(NTSYNC_IOC_CREATE_MUTEX) Client → ioctl(NTSYNC_IOC_WAIT_ANY) PI v2 (kernel driver)
Semaphore (anonymous) Client → ioctl(NTSYNC_IOC_CREATE_SEM) Client → ioctl(NTSYNC_IOC_WAIT_ANY) No PI (no owner concept)
Event (anonymous) Wineserver → ioctl(NTSYNC_IOC_CREATE_EVENT) Client → ioctl(NTSYNC_IOC_WAIT_ANY) N/A
Named objects Wineserver (all types) Client via fd received from server Same as above per type

2.2 Kernel Patch Validation Matrix

Kernel Patch Validated By Evidence Status
0001: raw_spinlock + rt_mutex hardening all 11 tests (no lockup under PREEMPT_RT) 22/22 PASS, no watchdog triggers, no soft-lockup warnings validated
0002: priority-ordered waiter queues ntsync-d4/d8/d12 sub-test 3 (priority wakeup) 5-7 waiters correct wake order at all 3 depths, both modes validated
0003: mutex owner PI boost v2 philosophers + ntsync-d4/d8 PI contention Phil RT max wait 1620→865us (-46.6%); d8 RT PI avg 479→239ms (-50.1%) validated — 3 bugs fixed
0004: uring_fd extension (io_uring CQE wakeup) socket-io test (sync + overlapped) Phase A: 95us avg, Phase B: 115us avg, 2000/2000 completions validated
0005: PI kmalloc pre-allocation fix ntsync-d8/d12 under RT load No PREEMPT_RT allocation-in-atomic warnings, PI contention stable validated

2.3 Priority-Ordered Wakeup (d4 run, representative)

Priority Win32 Value wake_us Delta from TC Order
TIME_CRITICAL 15 2542030506 +0 us 1st
HIGHEST 2 2542030528 +22 us 2nd
ABOVE_NORMAL 1 2542030556 +50 us 3rd
NORMAL 0 2542030567 +61 us 4th
BELOW_NORMAL -1 2542030575 +69 us 5th
LOWEST -2 2542030588 +82 us 6th
IDLE -15 2542030651 +145 us 7th

2.4 Transitive PI Chain Scaling

Chain Depth RT Wait (ms) Per-Hop Increment Tail Holder Elapsed
4 236 ~50 ms 437 ms
8 235 ~50 ms 639 ms
12 226 ~50 ms 1002 ms

Key finding: RT wait time does NOT increase with chain depth. The tail holder does the same CPU work (~220ms); ntsync_pi_recalc() propagates boost faster than the tail finishes. Confirmed O(1) for RT wait up to depth 12. No further depth testing needed.


3. RT Priority Mapping Status

3.1 Mapping Table (NSPA_RT_PRIO=80)

Formula: fifo_prio = nspa_rt_prio_base - (31 - nt_band), clamped to [1..98].

Win32 Label Win32 Value NT Band FIFO Priority Notes
IDLE (realtime class) -15 16 65
LOWEST (realtime class) -2 22 71
BELOW_NORMAL (realtime class) -1 23 72
NORMAL (realtime class) 0 24 73 standard RT band
ABOVE_NORMAL (realtime class) 1 25 74
HIGHEST (realtime class) 2 26 75
TIME_CRITICAL 15 31 80 client RT ceiling, always SCHED_FIFO
wineserver main 64 SCHED_FIFO, below all client RT
kernel threads 99 reserved, never used by Wine

Ceiling mapping: NSPA_RT_PRIO is the max client RT priority, not a midpoint. TIME_CRITICAL is special-cased to NT band 31 and maps exactly to that ceiling. Standard REALTIME-class priorities scale linearly below it (NT 24 -> 73, NT 16 -> 65 when NSPA_RT_PRIO=80). The v3 wake-order test now uses only the 7 standard Win32 priority values because non-standard values (3-14) bypass the TIME_CRITICAL special case and can otherwise outrank it.


4. Wineserver global_lock PI – SHIPPED

Converted server/fd.c:global_lock from pthread_mutex_t to pi_mutex_t (FUTEX_LOCK_PI). This lock serializes all wineserver dispatch between the main epoll loop and per-client shmem dispatcher pthreads (NSPA v1.5). With PI, when a high-priority dispatcher (boosted by v2.4 client PI) contends with a lower-priority holder, the holder is automatically boosted via the kernel’s rt_mutex PI chain.

Background

Upstream wineserver is single-threaded – no locks. NSPA v1.5 added per-client shmem dispatcher pthreads, requiring global_lock to serialize state access. The original librtpi sweep excluded server/ because the audit assumed “single-threaded event loop = no contention.” That assumption was wrong – the v1.5 dispatchers create real contention, and without PI the lock was a priority inversion hazard when v2.4 client-side boost raised a dispatcher’s priority above the lock holder’s.

Impact (v3 baseline vs PI, RT mode, NSPA_RT_PRIO=80)

Test Metric v3 baseline With PI Change
cs-contention holder work time 475 ms 216 ms 2.2x faster
philosophers RT max wait 1620 us 692 us 2.3x lower
rapidmutex throughput 301K ops/s 326K ops/s +8%
ntsync-d4 PI contention avg ~238 ms ~220 ms consistent
fork-mutex child total max no change no change no regression

18/18 PASS, zero regressions.

Files changed

RT/PI Audit Finding

During this investigation, we validated that the server's add_queue() / wake_up() priority ordering (originally planned as a separate change) is already handled by ntsync kernel patch 0002. With ntsync loaded, all sync primitives (events, mutexes, semaphores – both named and anonymous) go through the ntsync kernel driver for waiting, which provides priority-ordered waiter queues. The server-side add_queue() is never exercised for these objects. Confirmed by the wake-order test subcommand: 50/50 PASS regardless of server-side queue order.


5. vDSO Preloader Port (0029) – SHIPPED

Commit 239ca470158 – full port of Jinoh Kang’s 13-patch vDSO preloader series (patches 01-07, 09, 11-13) to Wine 11.6. Replaces the previous minimal port that ignored vvar pages.

What it does

Bug found during porting: patch 06 (EHDR unmapping) breaks child process creation with the static-PIE x86_64 preloader, so it is intentionally omitted.

Benchmark Results (native Linux, 2M iterations)

vDSO (preserved) syscall (deleted) penalty ratio
clock_gettime 26.3 ns 328.2 ns +301.9 ns 12.5x
gettimeofday 27.5 ns 319.1 ns +291.6 ns 11.6x

RT Audio Impact (when vDSO would be deleted without this patch)

Buffer size Budget 10 timing calls 100 timing calls
48kHz/1024 21.3 ms +3.0 us (0.014%) +30.2 us (0.14%)
48kHz/64 1.33 ms +3.0 us (0.23%) +30.2 us (2.26%)

Validation

Note: On the current kernel (6.19.11-rt1-1-nspa), vDSO is not in a reserved range, so the default on-conflict mode takes no action. The value is defensive – protects against kernels/configs where ASLR places vDSO in reserved ranges. The VMA-aware reservation is independently valuable as a correctness improvement.

6. Ableton Live 12 Lite Windowing Workaround – SHIPPED

Long-standing Wine 11.6 regression in the normal desktop/X11 windowing path, separate from the RT/NTSync stack. The currently committed workaround is app-compat focused and was validated locally from wine/build on 2026-04-14.

Area Status Notes
Startup sequence usable splash -> activation -> main window reached again
Menubar growth loop suppressed old runaway content/menubar expansion no longer reproduces in the committed build
Main window content mapping much improved black/blank main window behavior substantially reduced
Resize behavior improved, not perfect some edge-triggered vertical correction / repaint artifacts still possible
Fix quality workaround committed because it materially improves behavior without a known broader regression so far

Files in the workaround stack

Behavior summary


7. Test Harness Status

7.1 v3 Bug Fixes (shipped)

# Bug Impact
1 now_us() QPC integer overflow (~44 min uptime) Negative timestamps -> false PASS on priority wakeup
2 Priority wakeup sentinel prev_time > 0 Never triggered with negative timestamps
3 Load threads were SCHED_FIFO under REALTIME class System lockup – FIFO busyloops pinned all cores
4 Sleep(0) = no-op for FIFO threads Load threads never yielded to desktop
5 CPU-bound work at FIFO 80+ monopolized cores Desktop froze for 1-2s per iteration
6 No ntsync module detection Tests ran against futex path, not ntsync
7 Non-standard Win32 priority values (7-14) Bypassed TC ceiling clamp

7.2 Remaining Improvements (not yet built)

Improvement Purpose Priority
Histogram mode Per-test latency distribution (P50/P95/P99/max) instead of just min/avg/max MEDIUM
Automated baseline-vs-RT diff Machine-readable JSON output + diff script MEDIUM
Per-test CPU pinning taskset to eliminate CFS load placement variance LOW
Longer soak mode Loop all tests for N minutes to catch rare races LOW

8. Open Investigation Targets

PI contention variability (d4 vs d8/d12)

Priority: MEDIUM — understanding, not a bug. PI boost is confirmed working at all depths.

NTSync PI contention times vary with CFS load placement. d4 (8 iterations) averages ~270ms with 67ms spread, while d8/d12 (3 iterations) show higher and tighter averages. The variable is CFS scheduling of the SCHED_OTHER holder under load thread competition, not a PI chain problem.

Next step: CPU pinning via taskset to isolate CFS variance from PI behavior.

Kernel mutex vs CS throughput gap

Priority: LOW — expected behavior, not a bug.

NTSync rapid mutex ~250K ops/s vs CS rapidmutex ~301K ops/s. The ~20% gap is expected: CS has a user-space CAS fast path that avoids syscalls when uncontended, while ntsync always takes an ioctl. Applications use CriticalSection for hot-path locking; kernel mutexes are for cross-process sync.

Wow64 clean rebuild

Priority: MEDIUM — required for 32-bit app compatibility.

32-bit (i386) DLLs may be stale. All current tests run 64-bit PE. Needed for 32-bit VST plugins and older games. Requires configure --enable-archs=i386,x86_64.

CS DYNAMIC_SPIN

Priority: LOW — planned, not yet implemented.

RTL_CRITICAL_SECTION_FLAG_DYNAMIC_SPIN is a FIXME stub. CRT heap uses this flag. Plan: substitute ~4000 spincount, gate behind !nspa_cs_pi_active().

Real-world application validation

Priority: MEDIUM — synthetic tests prove correctness, not real-world impact.

SRW spin, condvar PI, SIMD, CoWait rewrite all need validation with real DAW/VST workloads (Ableton, REAPER, Bitwig, various VST plugins). The test suite confirms no regressions but doesn’t exercise the full complexity of real audio applications.



Changelog


Raw logs: wine/nspa/docs/logs/v6/rt_*.log | Full comparison: nspa-test-comparison.gen.html
Generated: 2026-04-16 | Wine-NSPA RT v6