Wine 11.6 + NSPA RT patchset | Kernel 6.19.x-rt with NTSync PI | 2026-04-16 Author: jordan Johnston
Win32 condvar PI bridges RtlSleepConditionVariableCS to the Linux kernel’s requeue-PI mechanism. When a SCHED_FIFO (RT) thread waits on a condition variable protected by a PI-enabled critical section, the kernel atomically requeues the waiter from the condvar futex onto the CS’s PI mutex on signal. This eliminates the priority inversion window between condvar wake and CS reacquire that exists in the standard Win32 condvar path.
The implementation uses two Linux futex operations that form a matched pair:
- FUTEX_WAIT_REQUEUE_PI – the waiter sleeps on the condvar futex but declares a PI mutex it expects to be requeued onto
- FUTEX_CMP_REQUEUE_PI – the signaler atomically wakes/requeues waiters from the condvar futex onto that PI mutex

The entire condvar PI path is gated behind nspa_cs_pi_active() – when inactive (no NSPA_RT_PRIO set), the code is byte-identical to upstream Wine. The gate also requires RecursionCount == 1 (non-recursive lock hold) because the kernel’s PI mutex has no recursion concept.
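The gate condition can be sketched as a small predicate. This is an illustration only: `struct cs_view`, `nspa_cs_pi_active_sketch()` and `condvar_pi_eligible()` are hypothetical names, and reading NSPA_RT_PRIO straight from the environment is an assumption about how nspa_cs_pi_active() decides (the real function may cache or validate the value).

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the one RTL_CRITICAL_SECTION field the gate
 * inspects; the real structure has more members. */
struct cs_view {
    long RecursionCount;   /* 1 means a single, non-recursive hold */
};

/* Assumed shape of nspa_cs_pi_active(): PI is active only when
 * NSPA_RT_PRIO is set to a non-empty value. */
static bool nspa_cs_pi_active_sketch(void)
{
    const char *prio = getenv("NSPA_RT_PRIO");
    return prio != NULL && prio[0] != '\0';
}

/* Returns true when the requeue-PI path may be taken for this wait.
 * Any other case should use the unmodified upstream path. */
static bool condvar_pi_eligible(const struct cs_view *cs)
{
    if (!nspa_cs_pi_active_sketch())
        return false;               /* no RT priority configured */
    return cs->RecursionCount == 1; /* kernel PI mutex cannot recurse */
}
```

A recursive hold (RecursionCount of 2 or more) fails the check, so such waits would take the standard condvar path even when NSPA_RT_PRIO is set.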
The standard Win32 SleepConditionVariableCS implementation has a structural priority inversion gap. Between the moment a waiter is woken from the condvar and the moment it reacquires the critical section, there is no PI protection – a low-priority thread holding the CS will not be boosted, and the RT waiter can be preempted by medium-priority threads (classic priority inversion).
Key insight: The standard path has a window between wake and CS reacquire where no PI protection exists. The requeue-PI path eliminates this entirely – the kernel atomically moves the waiter from the condvar futex to the PI mutex chain, so the waiter either owns the CS immediately on wake or is on the PI chain (triggering priority boost) with zero gap.
The condvar PI implementation spans the PE–unix boundary. The PE side (dlls/ntdll/sync.c) manages the condvar-to-mutex mapping table and CS bookkeeping. The unix side (dlls/ntdll/unix/sync.c) issues the actual futex syscalls. Three new Nt-level syscalls bridge the two.
The Win32 WakeConditionVariable API only takes the condvar address – unlike POSIX pthread_cond_signal which has access to the mutex through the pthread_cond_wait call. But FUTEX_CMP_REQUEUE_PI requires both the condvar futex address and the PI mutex address. The signal side needs a way to find the PI mutex from only the condvar address.
The solution is a 64-entry open-addressed hash table with tombstone deletion and refcounting, shared by all threads in the process. The table maps condvar addresses to PI mutex addresses.
A deleted slot is marked CONDVAR_PI_TOMBSTONE (preserves probe chains for other entries).

Open addressing with linear probing cannot simply clear a slot on deletion – it would break probe chains for entries that were inserted past the deleted slot. The standard solution is tombstone deletion: a deleted slot is marked with a sentinel value (CONDVAR_PI_TOMBSTONE) that lookup skips over but insertion can reuse.
```c
struct condvar_pi_entry {
    const volatile void *condvar_addr;  /* key (or TOMBSTONE) */
    LONG *pi_mutex_addr;                /* value */
    LONG refcount;                      /* waiters using this entry */
};
```
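The table operations described above can be sketched as follows. This is a minimal single-threaded illustration, not the Wine-NSPA code: the hash function, the sentinel value `(void *)1`, and the use of `long` in place of `LONG` are assumptions, and the real table needs synchronization because all threads in the process share it.

```c
#include <stddef.h>
#include <stdint.h>

#define CONDVAR_PI_SLOTS 64
/* Assumed sentinel: any non-NULL value that is never a real condvar address. */
#define CONDVAR_PI_TOMBSTONE ((const volatile void *)(uintptr_t)1)

struct condvar_pi_entry {
    const volatile void *condvar_addr; /* key, NULL (empty) or TOMBSTONE */
    long *pi_mutex_addr;               /* value */
    long refcount;                     /* waiters using this entry */
};

static struct condvar_pi_entry table[CONDVAR_PI_SLOTS];

/* Hypothetical hash: drop the low alignment bits, then wrap. */
static unsigned slot_for(const volatile void *addr)
{
    return (unsigned)(((uintptr_t)addr >> 3) % CONDVAR_PI_SLOTS);
}

/* Linear probe. Tombstones are skipped; only a NULL slot ends the chain. */
static struct condvar_pi_entry *find(const volatile void *cv)
{
    unsigned start = slot_for(cv);
    for (unsigned i = 0; i < CONDVAR_PI_SLOTS; i++) {
        struct condvar_pi_entry *e = &table[(start + i) % CONDVAR_PI_SLOTS];
        if (e->condvar_addr == NULL) return NULL;
        if (e->condvar_addr == cv) return e;
    }
    return NULL;
}

long *condvar_pi_lookup(const volatile void *cv)
{
    struct condvar_pi_entry *e = find(cv);
    return e ? e->pi_mutex_addr : NULL;
}

/* Returns 1 on success, 0 if the table is full. */
int condvar_pi_register(const volatile void *cv, long *pi_mutex)
{
    struct condvar_pi_entry *e = find(cv);
    if (e) { e->refcount++; return 1; }   /* another waiter, same condvar */

    unsigned start = slot_for(cv);
    for (unsigned i = 0; i < CONDVAR_PI_SLOTS; i++) {
        e = &table[(start + i) % CONDVAR_PI_SLOTS];
        if (e->condvar_addr == NULL || e->condvar_addr == CONDVAR_PI_TOMBSTONE) {
            e->condvar_addr = cv;          /* empty or reusable slot */
            e->pi_mutex_addr = pi_mutex;
            e->refcount = 1;
            return 1;
        }
    }
    return 0;
}

void condvar_pi_deregister(const volatile void *cv)
{
    struct condvar_pi_entry *e = find(cv);
    if (e && --e->refcount == 0) {
        e->condvar_addr = CONDVAR_PI_TOMBSTONE; /* keep probe chains intact */
        e->pi_mutex_addr = NULL;
    }
}
```

With this hash, two addresses 512 bytes apart land in the same slot, which makes it easy to exercise the tombstone case: register both, deregister the first, and the second must still be found by probing past the tombstone.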
Three new Nt-level syscalls cross the PE–unix boundary. These are NSPA-specific extensions to the NT syscall table, numbered in the 0x00b1–0x00b3 range.

| Syscall | Number | Parameters | Description |
|---|---|---|---|
| NtNspaCondWaitPI | 0x00b1 | condvar_futex, condvar_val, pi_mutex, timeout | Wait on condvar with requeue-PI. Unlocks PI mutex, sleeps on condvar, gets requeued onto PI mutex on signal. |
| NtNspaCondSignalPI | 0x00b2 | condvar_futex, pi_mutex | Signal one waiter. Increments condvar, then FUTEX_CMP_REQUEUE_PI to wake 1, requeue 0. |
| NtNspaCondBroadcastPI | 0x00b3 | condvar_futex, pi_mutex | Broadcast to all waiters. Same as signal but wake 1, requeue INT_MAX. |

NtNspaCondWaitPI ALWAYS returns with the PI mutex owned by the caller, regardless of how it returns:
- EAGAIN (a signal raced with the wait): FUTEX_LOCK_PI to acquire the mutex explicitly
- Timeout: FUTEX_LOCK_PI to reacquire, then return STATUS_TIMEOUT

This “always own on return” contract matches what the PE side expects: it clears CS bookkeeping before the syscall and restores it after, so the unix side must guarantee the PI mutex is held on every return path.
If NtNspaCondWaitPI returns STATUS_NOT_SUPPORTED (kernel too old, or futex ops unavailable), the PE side falls through to the standard Win32 condvar path with normal RtlLeaveCriticalSection / RtlWaitOnAddress / RtlEnterCriticalSection. The CS-PI leave/enter still provides PI protection during those calls – the gap just isn’t eliminated.
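The PE-side bracket around the syscall, including the fallback decision, might look roughly like the sketch below. All names here are hypothetical: `struct cs_sketch` stands in for the RTL_CRITICAL_SECTION ownership fields, the syscall is passed in as a function pointer so the control flow can be followed without a patched kernel, and the two `fake_wait_*` functions exist only to exercise it.

```c
#include <stddef.h>

/* NTSTATUS values as defined by Windows. */
#define STATUS_SUCCESS        0x00000000u
#define STATUS_NOT_SUPPORTED  0xC00000BBu

/* Hypothetical stand-in for the ownership bookkeeping of a critical section. */
struct cs_sketch {
    long  RecursionCount;
    void *OwningThread;
};

/* Stand-in for NtNspaCondWaitPI; by contract the PI mutex is held on return. */
typedef unsigned int (*wait_pi_fn)(void *ctx);

/* Returns 1 if the PI path handled the wait, 0 if the caller should run
 * the standard leave / wait-on-address / enter sequence instead. */
static int sleep_cs_pi_sketch(struct cs_sketch *cs, void *self,
                              wait_pi_fn wait, void *ctx,
                              unsigned int *status)
{
    /* The unix side will unlock the PI mutex, so drop ownership records. */
    cs->RecursionCount = 0;
    cs->OwningThread = NULL;

    *status = wait(ctx);

    /* Mutex is owned again on every return path: restore bookkeeping. */
    cs->OwningThread = self;
    cs->RecursionCount = 1;

    return *status != STATUS_NOT_SUPPORTED;
}

/* Test stand-ins for the syscall. */
static unsigned int fake_wait_ok(void *ctx)          { (void)ctx; return STATUS_SUCCESS; }
static unsigned int fake_wait_unsupported(void *ctx) { (void)ctx; return STATUS_NOT_SUPPORTED; }
```

Because bookkeeping is restored before the return value is examined, the fallback path starts from a consistent "CS held once by this thread" state and can call RtlLeaveCriticalSection normally.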
EAGAIN from FUTEX_WAIT_REQUEUE_PI means the condvar value changed between our read and the futex call – a signal raced with us. This is treated as “we were signaled” rather than an error. The waiter falls through to FUTEX_LOCK_PI to acquire the PI mutex, then returns STATUS_SUCCESS. No wakeup is lost.
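One way to picture the return-path handling on the unix side is as a mapping from the futex result to an NT status plus a flag saying whether an explicit FUTEX_LOCK_PI is still needed. The sketch below is an assumption-laden illustration: `classify_wait()` and `struct wait_outcome` are hypothetical, the ETIMEDOUT and ENOSYS rows follow the behaviour described in this document, and the default branch (treat anything else as a spurious wake) is a guess rather than documented behaviour.

```c
#include <errno.h>
#include <stdbool.h>

/* NTSTATUS values as defined by Windows. */
#define STATUS_SUCCESS        0x00000000u
#define STATUS_TIMEOUT        0x00000102u
#define STATUS_NOT_SUPPORTED  0xC00000BBu

struct wait_outcome {
    unsigned int status;   /* value handed back to the PE side */
    bool need_lock_pi;     /* true: call FUTEX_LOCK_PI before returning */
};

/* ret/err are the return value and errno of FUTEX_WAIT_REQUEUE_PI. */
static struct wait_outcome classify_wait(long ret, int err)
{
    struct wait_outcome out = { STATUS_SUCCESS, false };

    if (ret == 0)
        return out;                 /* requeued: kernel already gave us the mutex */

    out.need_lock_pi = true;        /* every other path must reacquire explicitly */
    switch (err) {
    case EAGAIN:                    /* condvar value changed: treat as signaled */
        break;
    case ETIMEDOUT:
        out.status = STATUS_TIMEOUT;
        break;
    case ENOSYS:                    /* requeue-PI not available in this kernel */
        out.status = STATUS_NOT_SUPPORTED;
        break;
    default:                        /* assumption: report as a spurious wake */
        break;
    }
    return out;
}
```

Only the successful requeue skips the explicit lock; on every other path the waiter reacquires the mutex itself, which is what keeps the "always own on return" contract intact.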
The signal path increments the condvar counter exactly once, then issues FUTEX_CMP_REQUEUE_PI with the post-increment value. On EAGAIN (another signal raced), it re-reads the current value and retries the CMP_REQUEUE_PI without incrementing again. This avoids the counter drifting upward and causing spurious wakeups.
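The increment-once, retry-on-EAGAIN loop can be sketched with the raw futex syscall. This is a Linux-only illustration under stated assumptions: `cond_wake_pi()` is a hypothetical helper, not the body of NtNspaCondSignalPI, and it returns a negative errno rather than an NT status. The argument order follows futex(2): nr_wake must be 1 for FUTEX_CMP_REQUEUE_PI, and the requeue count travels in the timeout slot.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <limits.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/futex.h>

/* nr_requeue: 0 for signal, INT_MAX for broadcast.
 * Returns the number of waiters woken or requeued, or -errno on failure. */
int cond_wake_pi(uint32_t *condvar, uint32_t *pi_mutex, int nr_requeue)
{
    /* Bump the counter exactly once; waiters compare against this value. */
    uint32_t expected = __atomic_add_fetch(condvar, 1, __ATOMIC_SEQ_CST);

    for (;;) {
        long r = syscall(SYS_futex, condvar, FUTEX_CMP_REQUEUE_PI,
                         1,                          /* nr_wake: must be 1 */
                         (unsigned long)nr_requeue,  /* val2: requeue count */
                         pi_mutex,
                         expected);                  /* val3: expected value */
        if (r >= 0)
            return (int)r;
        if (errno != EAGAIN)
            return -errno;          /* e.g. ENOSYS on kernels without PI */

        /* Another signaler moved the counter: re-read it, do not increment. */
        expected = __atomic_load_n(condvar, __ATOMIC_SEQ_CST);
    }
}
```

Calling `cond_wake_pi(&cv, &mutex, 0)` corresponds to the signal case and `cond_wake_pi(&cv, &mutex, INT_MAX)` to broadcast. With no waiters queued the call leaves the counter incremented by one.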
The mapping table uses refcounting. Every condvar_pi_register increments the refcount, every condvar_pi_deregister decrements it. The entry is only cleared (tombstoned) when the refcount reaches zero. This ensures the signal path can always find the PI mutex address for active waiters, even if some waiters have already returned.
Open-addressing deletion uses CONDVAR_PI_TOMBSTONE sentinel values. Lookup probes skip tombstones (they are not the entry we want, but entries beyond them might be). Probe chains terminate only at a true NULL slot. Insertion can reuse tombstone slots, keeping table density manageable.
If the unix side detects that the kernel does not support FUTEX_WAIT_REQUEUE_PI (returns ENOSYS), it returns STATUS_NOT_SUPPORTED. The PE side catches this and falls through to the standard Win32 condvar path. This means Wine-NSPA can run on kernels without requeue-PI support – the RT guarantees just degrade gracefully to the CS-PI-only path (PI on enter/leave, but gap between wake and enter).
Win32 condvar PI is the fourth PI mechanism in Wine-NSPA. Together, these four paths provide priority inheritance coverage across most of the Wine synchronization surface (SRW locks are the exception):

| Path | Mechanism | Scope |
|---|---|---|
| CS-PI | FUTEX_LOCK_PI on LockSemaphore | Win32 CriticalSection enter/leave |
| NTSync PI | Kernel ntsync driver with priority-ordered wakeup | Win32 Mutex / Semaphore / Event |
| pi_cond requeue-PI | FUTEX_WAIT_REQUEUE_PI in librtpi | Unix-side condvars (audio, gstreamer) |
| Win32 condvar PI | FUTEX_WAIT_REQUEUE_PI for RtlSleepConditionVariableCS | Win32 SleepConditionVariableCS |

SRW gap: SRW-backed condvars (RtlSleepConditionVariableSRW) are not covered by this work. SRW locks have no PI mechanism – this is an unsolved problem even in the Linux kernel (reader-writer locks with priority inheritance require tracking all readers, which is prohibitively expensive). Applications using SleepConditionVariableSRW in RT paths should switch to SleepConditionVariableCS for PI coverage.
- CS-PI: EnterCriticalSection / LeaveCriticalSection gets PI protection via FUTEX_LOCK_PI on the LockSemaphore field.
- Condvar PI: uses the same LockSemaphore PI mutex as the requeue target. The CS-PI mutex IS the condvar-PI mutex.

The condvar-pi test validates the requeue-PI path under contention: an RT waiter (THREAD_PRIORITY_TIME_CRITICAL, mapped to SCHED_FIFO) waits on a condvar while a normal-priority signaler sends signals and 4 CPU-bound load threads create scheduling pressure.

- RT waiter: THREAD_PRIORITY_TIME_CRITICAL (SCHED_FIFO at NSPA_RT_PRIO)
- Signaler: normal priority (SCHED_OTHER)

| Mode | avg wait | max wait | min wait |
|---|---|---|---|
| With PI (NSPA_RT_PRIO=80) | 129 us | 152 us | 124 us |
| Without PI | 100 us | 263 us | 29 us |

Key finding: PI tightens the distribution – max wait drops from 263 to 152 us (42% lower worst-case). The average is slightly higher with PI due to requeue overhead (extra kernel work for the atomic requeue), but the tail latency is dramatically better. For RT audio, worst-case matters more than average: a 263 us spike at the wrong moment causes a buffer underrun, while a consistent 129 us does not.
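The figures quoted above follow directly from the measured values. Worked through, with the min-to-max spread added as a second way of reading the same numbers:

```latex
\[
\text{worst-case reduction} = \frac{263 - 152}{263} = \frac{111}{263} \approx 0.42
\]
\[
\text{spread with PI} = 152 - 124 = 28\ \mu\text{s}, \qquad
\text{spread without PI} = 263 - 29 = 234\ \mu\text{s}
\]
```

The min-to-max range is therefore roughly eight times narrower with PI (234 / 28 ≈ 8.4), which is the sense in which the distribution tightens even though the average rises from 100 to 129 us.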
22/22 PASS (11 tests x 2 modes: with and without PI), no regressions vs the v5 test suite baseline.

| File | Role |
|---|---|
| dlls/ntdll/sync.c | PE-side condvar PI implementation: mapping table (condvar_pi_register / condvar_pi_deregister / condvar_pi_lookup), modified RtlSleepConditionVariableCS, RtlWakeConditionVariable, RtlWakeAllConditionVariable |
| dlls/ntdll/unix/sync.c | Unix-side futex operations: NtNspaCondWaitPI (FUTEX_UNLOCK_PI + FUTEX_WAIT_REQUEUE_PI + EAGAIN fallback), NtNspaCondSignalPI (FUTEX_CMP_REQUEUE_PI), NtNspaCondBroadcastPI |
| dlls/ntdll/ntsyscalls.h | Syscall table entries: 0x00b1 (NtNspaCondWaitPI), 0x00b2 (NtNspaCondSignalPI), 0x00b3 (NtNspaCondBroadcastPI) for both i386 and x86_64 |
| include/winternl.h | Function declarations for the three new NtNspaCond*PI syscalls |
| programs/nspa_rt_test/main.c | Validation test: cmd_condvar_pi – RT waiter + normal signaler + 4 load threads, 500 iterations, latency measurement |

Wine-NSPA Win32 Condvar PI Reference | Generated 2026-04-16 | Wine 11.6 + NSPA RT patchset