Kernel: 6.19.11-rt1-1-nspa (PREEMPT_RT) | CONFIG_NTSYNC=m (5 patches) | Wine-NSPA 11.6
NSPA_RT_PRIO=80 NSPA_RT_POLICY=FF | nspa_rt_test.exe v6 (11 tests, baseline + rt) | 2026-04-16
22/22 PASS — 4 PI coverage paths — io_uring Phase 1+2+3 COMPLETE — all RT subsystems healthy
How Wine-NSPA’s 4 PI paths map from Win32 API calls through to the kernel mechanisms that enforce priority inheritance.
At-a-glance status of every test in RT mode (NSPA_RT_PRIO=80). Each row shows the single most important metric.
| Test | Status | Key Metric | v3 Value | v5 Value | v6 Value | Assessment |
|---|---|---|---|---|---|---|
| rapidmutex | PASS | RT throughput | 288K ops/s | 327K ops/s | 290K ops/s | normal variance, no regression |
| philosophers | PASS | RT phil max wait | 1620 us | 1301 us | 646 us | -50% v5→v6, CFS variance |
| fork-mutex | PASS | RT elapsed | 1024 ms | 948 ms | 948 ms | stable |
| cs-contention | PASS | avg wait (PI hold) | 474 ms | 349 ms | 276 ms | -21% v5→v6, improving |
| signal-recursion | PASS | elapsed (no sync) | 61 ms | 57 ms | 57 ms | no exception-path regression |
| large-pages | PASS | 2MB + 1GB alloc | LargePage=1 | LargePage=1 | LargePage=1 | PAGEMAP_SCAN path confirmed |
| ntsync-d4 | 8/8 | PI contention RT avg | 238 ms | 270 ms | 477 ms | CFS load variance, PI correct |
| ntsync-d8 | 3/3 | PI contention RT avg | 479 ms | 201 ms | 311 ms | CFS variance, PI chain correct |
| ntsync-d12 | 3/3 | transitive chain wait | 226 ms | 96 ms | 406 ms | O(1) scaling, CFS load-dependent |
| socket-io A | PASS | immediate recv avg | – | 95.8 us | 95.8 us | stable baseline |
| socket-io B | PASS | deferred recv avg | – | 115.4 us | 115.4 us | overlapped stable |
| condvar-pi | PASS | RT max wait | – | – | 152 us | NEW — requeue-PI, -42% worst-case vs non-PI |
| Check | Evidence | Result |
|---|---|---|
| ntsync module loaded | [ntsync] active (client-side handle 00000000001FFC00) | confirmed |
| client-side handles | Handle 0x1FFC00 in client range (520K down from max) | confirmed – bypasses wineserver |
| kernel mutex objects | 0x1FFBFC, 0x1FFBF0 – distinct NTSync handles | confirmed |
Anonymous sync objects are created directly on /dev/ntsync client-side (f863fa180ca), bypassing wineserver for creation and wait. Events are excluded — client-created event handles destabilized Ableton Live, so anonymous events remain on wineserver (a7b00453978). Mutexes and semaphores are stable client-side.
| Object Type | Creation Path | Wait Path | PI Support |
|---|---|---|---|
| Mutex (anonymous) | Client → ioctl(NTSYNC_IOC_CREATE_MUTEX) | Client → ioctl(NTSYNC_IOC_WAIT_ANY) | PI v2 (kernel driver) |
| Semaphore (anonymous) | Client → ioctl(NTSYNC_IOC_CREATE_SEM) | Client → ioctl(NTSYNC_IOC_WAIT_ANY) | No PI (no owner concept) |
| Event (anonymous) | Wineserver → ioctl(NTSYNC_IOC_CREATE_EVENT) | Client → ioctl(NTSYNC_IOC_WAIT_ANY) | N/A |
| Named objects | Wineserver (all types) | Client via fd received from server | Same as above per type |
| Kernel Patch | Validated By | Evidence | Status |
|---|---|---|---|
| 0001: raw_spinlock + rt_mutex hardening | all 11 tests (no lockup under PREEMPT_RT) | 22/22 PASS, no watchdog triggers, no soft-lockup warnings | validated |
| 0002: priority-ordered waiter queues | ntsync-d4/d8/d12 sub-test 3 (priority wakeup) | 5-7 waiters correct wake order at all 3 depths, both modes | validated |
| 0003: mutex owner PI boost v2 | philosophers + ntsync-d4/d8 PI contention | Phil RT max wait 1620→865us (-46.6%); d8 RT PI avg 479→239ms (-50.1%) | validated — 3 bugs fixed |
| 0004: uring_fd extension (io_uring CQE wakeup) | socket-io test (sync + overlapped) | Phase A: 95us avg, Phase B: 115us avg, 2000/2000 completions | validated |
| 0005: PI kmalloc pre-allocation fix | ntsync-d8/d12 under RT load | No PREEMPT_RT allocation-in-atomic warnings, PI contention stable | validated |
| Priority | Win32 Value | wake_us | Delta from TC | Order |
|---|---|---|---|---|
| TIME_CRITICAL | 15 | 2542030506 | +0 us | 1st |
| HIGHEST | 2 | 2542030528 | +22 us | 2nd |
| ABOVE_NORMAL | 1 | 2542030556 | +50 us | 3rd |
| NORMAL | 0 | 2542030567 | +61 us | 4th |
| BELOW_NORMAL | -1 | 2542030575 | +69 us | 5th |
| LOWEST | -2 | 2542030588 | +82 us | 6th |
| IDLE | -15 | 2542030651 | +145 us | 7th |
| Chain Depth | RT Wait (ms) | Per-Hop Increment | Tail Holder Elapsed |
|---|---|---|---|
| 4 | 236 | ~50 ms | 437 ms |
| 8 | 235 | ~50 ms | 639 ms |
| 12 | 226 | ~50 ms | 1002 ms |
Key finding: RT wait time does NOT increase with chain depth. The tail holder does the same CPU work (~220 ms) at every depth; ntsync_pi_recalc() propagates the boost down the chain faster than the tail finishes. RT wait is confirmed O(1) up to depth 12; no further depth testing needed.
Mapping formula (NSPA_RT_PRIO=80): fifo_prio = nspa_rt_prio_base - (31 - nt_band), clamped to [1..98].
| Win32 Label | Win32 Value | NT Band | FIFO Priority | Notes |
|---|---|---|---|---|
| IDLE (realtime class) | -15 | 16 | 65 | |
| LOWEST (realtime class) | -2 | 22 | 71 | |
| BELOW_NORMAL (realtime class) | -1 | 23 | 72 | |
| NORMAL (realtime class) | 0 | 24 | 73 | standard RT band |
| ABOVE_NORMAL (realtime class) | 1 | 25 | 74 | |
| HIGHEST (realtime class) | 2 | 26 | 75 | |
| TIME_CRITICAL | 15 | 31 | 80 | client RT ceiling, always SCHED_FIFO |
| wineserver main | – | – | 64 | SCHED_FIFO, below all client RT |
| kernel threads | – | – | 99 | reserved, never used by Wine |
Ceiling mapping: NSPA_RT_PRIO is the max client RT priority, not a midpoint. TIME_CRITICAL is special-cased to NT band 31 and maps exactly to that ceiling. Standard REALTIME-class priorities scale linearly below it (NT 24 -> 73, NT 16 -> 65 when NSPA_RT_PRIO=80). The v3 wake-order test now uses only the 7 standard Win32 priority values because non-standard values (3-14) bypass the TIME_CRITICAL special case and can otherwise outrank it.
global_lock PI – SHIPPED. Converted server/fd.c:global_lock from pthread_mutex_t to pi_mutex_t (FUTEX_LOCK_PI). This lock serializes all wineserver dispatch between the main epoll loop and per-client shmem dispatcher pthreads (NSPA v1.5). With PI, when a high-priority dispatcher (boosted by v2.4 client PI) contends with a lower-priority holder, the holder is automatically boosted via the kernel's rt_mutex PI chain.
Upstream wineserver is single-threaded – no locks. NSPA v1.5 added per-client shmem dispatcher pthreads, requiring global_lock to serialize state access. The original librtpi sweep excluded server/ because the audit assumed “single-threaded event loop = no contention.” That assumption was wrong – the v1.5 dispatchers create real contention, and without PI the lock was a priority inversion hazard when v2.4 client-side boost raised a dispatcher’s priority above the lock holder’s.
| Test | Metric | v3 baseline | With PI | Change |
|---|---|---|---|---|
| cs-contention | holder work time | 475 ms | 216 ms | 2.2x faster |
| philosophers | RT max wait | 1620 us | 692 us | 2.3x lower |
| rapidmutex | throughput | 301K ops/s | 326K ops/s | +8% |
| ntsync-d4 | PI contention avg | ~238 ms | ~220 ms | consistent |
| fork-mutex | child total max | no change | no change | no regression |
18/18 PASS, zero regressions.
- server/fd.c – pthread_mutex_t global_lock -> pi_mutex_t global_lock = PI_MUTEX_INIT(0); all pthread_mutex_lock/unlock -> pi_mutex_lock/unlock
- server/file.h – declaration updated, #include <rtpi.h> added
- server/thread.c – all pthread_mutex_lock/unlock(&global_lock) -> pi_mutex_lock/unlock

During this investigation, we validated that the server's add_queue() / wake_up() priority ordering (originally planned as a separate change) is already handled by ntsync kernel patch 0002. With ntsync loaded, all sync primitives (events, mutexes, semaphores – both named and anonymous) go through the ntsync kernel driver for waiting, which provides priority-ordered waiter queues. The server-side add_queue() is never exercised for these objects. Confirmed by the wake-order test subcommand: 50/50 PASS regardless of server-side queue order.
Commit 239ca470158 – full port of Jinoh Kang’s 13-patch vDSO preloader series (patches 01-07, 09, 11-13) to Wine 11.6. Replaces the previous minimal port that ignored vvar pages.
- Parses /proc/self/maps and reserves around existing mappings instead of MAP_FIXED clobbering
- Moves vDSO/vvar pages via mremap when they conflict with reserved ranges
- New env var: WINEPRELOADREMAPVDSO=force|skip|on-conflict (default: on-conflict)

Bug found during porting: Patch 06 (EHDR unmapping) breaks child process creation on static-pie x86_64 preloader. Intentionally omitted.
| Call | vDSO (preserved) | syscall (deleted) | penalty | ratio |
|---|---|---|---|---|
| clock_gettime | 26.3 ns | 328.2 ns | +301.9 ns | 12.5x |
| gettimeofday | 27.5 ns | 319.1 ns | +291.6 ns | 11.6x |
| Buffer size | Budget | 10 timing calls | 100 timing calls |
|---|---|---|---|
| 48kHz/1024 | 21.3 ms | +3.0 us (0.014%) | +30.2 us (0.14%) |
| 48kHz/64 | 1.33 ms | +3.0 us (0.23%) | +30.2 us (2.26%) |
Validated with WINEPRELOADREMAPVDSO=force on both 64-bit and 32-bit Chromaphone.

Note: On the current kernel (6.19.11-rt1-1-nspa), vDSO is not in a reserved range, so the default on-conflict mode takes no action. The value is defensive – it protects against kernels/configs where ASLR places vDSO in reserved ranges. The VMA-aware reservation is independently valuable as a correctness improvement.
Long-running Wine 11.6 regression in the normal desktop/X11 windowing path, separate from the RT/NTSync stack. The currently committed workaround is app-compat focused and was validated locally from wine/build on 2026-04-14.
| Area | Status | Notes |
|---|---|---|
| Startup sequence | usable | splash -> activation -> main window reached again |
| Menubar growth loop | suppressed | old runaway content/menubar expansion no longer reproduces in the committed build |
| Main window content mapping | much improved | black/blank main window behavior substantially reduced |
| Resize behavior | improved, not perfect | some edge-triggered vertical correction / repaint artifacts still possible |
| Fix quality | workaround | committed because it materially improves behavior without a known broader regression so far |
Files touched:
- dlls/win32u/window.c
- dlls/win32u/defwnd.c
- dlls/win32u/driver.c
- dlls/winex11.drv/window.c
- dlls/winex11.drv/event.c
- dlls/winex11.drv/init.c
- dlls/winex11.drv/x11drv.h
- dlls/winex11.drv/x11drv_main.c
- include/wine/gdi_driver.h

Key behavior changes: _NET_FRAME_EXTENTS values are now used in the visible/window mapping path; win32u preserves the default non-client layout when a decorated menu window briefly reports client == window; WM_WINDOWPOSCHANGED recursion is suppressed.

| # | Bug | Impact |
|---|---|---|
| 1 | now_us() QPC integer overflow (~44 min uptime) | Negative timestamps -> false PASS on priority wakeup |
| 2 | Priority wakeup sentinel prev_time > 0 | Never triggered with negative timestamps |
| 3 | Load threads were SCHED_FIFO under REALTIME class | System lockup – FIFO busyloops pinned all cores |
| 4 | Sleep(0) = no-op for FIFO threads | Load threads never yielded to desktop |
| 5 | CPU-bound work at FIFO 80+ monopolized cores | Desktop froze for 1-2s per iteration |
| 6 | No ntsync module detection | Tests ran against futex path, not ntsync |
| 7 | Non-standard Win32 priority values (7-14) | Bypassed TC ceiling clamp |
| Improvement | Purpose | Priority |
|---|---|---|
| Histogram mode | Per-test latency distribution (P50/P95/P99/max) instead of just min/avg/max | MEDIUM |
| Automated baseline-vs-RT diff | Machine-readable JSON output + diff script | MEDIUM |
| Per-test CPU pinning | taskset to eliminate CFS load placement variance | LOW |
| Longer soak mode | Loop all tests for N minutes to catch rare races | LOW |
Priority: MEDIUM — understanding, not a bug. PI boost is confirmed working at all depths.
NTSync PI contention times vary with CFS load placement. d4 (8 iterations) averages ~270ms with 67ms spread, while d8/d12 (3 iterations) show higher and tighter averages. The variable is CFS scheduling of the SCHED_OTHER holder under load thread competition, not a PI chain problem.
Next step: CPU pinning via taskset to isolate CFS variance from PI behavior.
Priority: LOW — expected behavior, not a bug.
NTSync rapid mutex ~250K ops/s vs CS rapidmutex ~301K ops/s. The ~20% gap is expected: CS has a user-space CAS fast path that avoids syscalls when uncontended, while ntsync always takes an ioctl. Applications use CriticalSection for hot-path locking; kernel mutexes are for cross-process sync.
Priority: MEDIUM — required for 32-bit app compatibility.
32-bit (i386) DLLs may be stale. All current tests run 64-bit PE. Needed for 32-bit VST plugins and older games. Requires configure --enable-archs=i386,x86_64.
Priority: LOW — planned, not yet implemented.
RTL_CRITICAL_SECTION_FLAG_DYNAMIC_SPIN is a FIXME stub. CRT heap uses this flag. Plan: substitute ~4000 spincount, gate behind !nspa_cs_pi_active().
Priority: MEDIUM — synthetic tests prove correctness, not real-world impact.
SRW spin, condvar PI, SIMD, CoWait rewrite all need validation with real DAW/VST workloads (Ableton, REAPER, Bitwig, various VST plugins). The test suite confirms no regressions but doesn’t exercise the full complexity of real audio applications.
f795984ed19). global_lock PI. Raw logs: wine/nspa/docs/logs/v6/rt_*.log | Full comparison: nspa-test-comparison.gen.html
Generated: 2026-04-16 | Wine-NSPA RT v6