Wine-NSPA – Win32 Condvar PI (Requeue-PI)

Wine 11.6 + NSPA RT patchset | Kernel 6.19.x-rt with NTSync PI | 2026-04-16 | Author: Jordan Johnston

Table of Contents

  1. Overview
  2. The Problem
  3. Architecture
  4. Condvar-to-Mutex Mapping Table
  5. Syscall Interface
  6. Correctness Properties
  7. Relationship to Existing PI Infrastructure
  8. Test Results
  9. Files Changed

1. Overview

Win32 condvar PI bridges RtlSleepConditionVariableCS to the Linux kernel’s requeue-PI mechanism. When a SCHED_FIFO (RT) thread waits on a condition variable protected by a PI-enabled critical section, the kernel atomically requeues the waiter from the condvar futex onto the CS’s PI mutex on signal. This eliminates the priority inversion window between condvar wake and CS reacquire that exists in the standard Win32 condvar path.

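The implementation uses two Linux futex operations that form a matched pair:

  1. FUTEX_WAIT_REQUEUE_PI (waiter side): sleep on the condvar futex while naming the CS's PI mutex as the requeue target.
  2. FUTEX_CMP_REQUEUE_PI (signal side): wake one waiter onto that PI mutex and, for a broadcast, requeue all remaining waiters onto it as well.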

The entire condvar PI path is gated behind nspa_cs_pi_active(): when it is inactive (NSPA_RT_PRIO not set), the code takes exactly the same path as upstream Wine. The gate also requires RecursionCount == 1 (a non-recursive lock hold), because the kernel’s PI mutex has no concept of recursion.
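glibc does not wrap either operation, so both are issued through syscall(2). The sketch below shows the argument layout of the two raw calls, assuming a 32-bit futex word; the wrapper names futex_wait_requeue_pi and futex_cmp_requeue_pi are hypothetical, not taken from the Wine-NSPA sources.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <linux/futex.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Waiter side (hypothetical wrapper). Sleeps on cond_word only if it still
 * holds `expected`; a later signal moves the waiter onto pi_word. abstime is
 * an absolute CLOCK_MONOTONIC deadline, or NULL to wait forever.
 * Returns 0, or -1 with errno set (EAGAIN if the value already changed). */
static long futex_wait_requeue_pi(uint32_t *cond_word, uint32_t expected,
                                  const struct timespec *abstime,
                                  uint32_t *pi_word)
{
    return syscall(SYS_futex, cond_word, FUTEX_WAIT_REQUEUE_PI, expected,
                   abstime, pi_word, 0);
}

/* Signal side (hypothetical wrapper). If *cond_word still equals `expected`,
 * wakes one waiter onto pi_word and requeues up to nr_requeue more:
 * 0 for a signal, INT_MAX for a broadcast. The kernel requires nr_wake == 1,
 * and nr_requeue travels in the timeout argument slot. */
static long futex_cmp_requeue_pi(uint32_t *cond_word, int nr_requeue,
                                 uint32_t *pi_word, uint32_t expected)
{
    assert(nr_requeue >= 0);   /* the kernel rejects negative counts with EINVAL */
    return syscall(SYS_futex, cond_word, FUTEX_CMP_REQUEUE_PI, 1,
                   (void *)(intptr_t)nr_requeue, pi_word, expected);
}
```

Called with a stale expected value, both calls fail immediately with EAGAIN instead of sleeping or waking anyone, which is the "signal raced" case; a kernel built without futex PI support reports ENOSYS instead.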


2. The Problem

The standard Win32 SleepConditionVariableCS implementation has a structural priority inversion gap. Between the moment a waiter is woken from the condvar and the moment it reacquires the critical section, there is no PI protection – a low-priority thread holding the CS will not be boosted, and the RT waiter can be preempted by medium-priority threads (classic priority inversion).

Priority Inversion Gap Diagram

Before: standard Win32 path
  1. Capture the condvar value: val = *(LONG*)&condvar->Ptr
  2. RtlLeaveCriticalSection: CS released (FUTEX_UNLOCK_PI)
  3. RtlWaitOnAddress on the condvar futex: sleeping, no PI protection
  4. WAKE: the signaler increments the condvar
  5. PRIORITY INVERSION GAP: the RT waiter is runnable but does NOT own the CS; a low-priority thread may hold the CS with no PI boost; medium-priority threads can preempt the RT waiter; unbounded delay is possible
  6. RtlEnterCriticalSection: FUTEX_LOCK_PI (may block again)
  7. The waiter finally owns the CS again

After: requeue-PI path (NSPA)
  1. condvar_pi_register(condvar, pi_mutex): map the condvar to the CS LockSemaphore
  2. Clear CS bookkeeping: RecursionCount = 0, OwningThread = 0
  3. NtNspaCondWaitPI (unix side): FUTEX_UNLOCK_PI(pi_mutex), then FUTEX_WAIT_REQUEUE_PI(condvar, val, pi_mutex)
  4. Sleeping on the condvar futex; the kernel already knows the requeue target is the PI mutex
  5. KERNEL ATOMIC REQUEUE: FUTEX_CMP_REQUEUE_PI moves the waiter onto the PI mutex. The waiter owns the PI mutex immediately on wake or, if it is contended, sits on the PI chain (boosting the owner). Zero gap.
  6. Restore CS bookkeeping: RecursionCount = 1, OwningThread = GetCurrentThreadId()
  7. condvar_pi_deregister, return STATUS_SUCCESS

Key insight: The standard path has a window between wake and CS reacquire where no PI protection exists. The requeue-PI path eliminates this entirely – the kernel atomically moves the waiter from the condvar futex to the PI mutex chain, so the waiter either owns the CS immediately on wake or is on the PI chain (triggering priority boost) with zero gap.


3. Architecture

The condvar PI implementation spans the PE–unix boundary. The PE side (dlls/ntdll/sync.c) manages the condvar-to-mutex mapping table and CS bookkeeping. The unix side (dlls/ntdll/unix/sync.c) issues the actual futex syscalls. Three new Nt-level syscalls bridge the two.

Call Flow Diagram

Win32 Condvar PI: Wait / Signal / Broadcast Flow

WAIT PATH
  PE ntdll (sync.c)
    nspa_cs_pi_active() && RecursionCount == 1
    condvar_pi_register(condvar, &crit->LockSemaphore): insert into the hash table (condvar addr -> pi_mutex addr)
    Clear CS bookkeeping: RecursionCount = 0, OwningThread = 0
  --- PE / unix boundary (syscall 0x00b1) ---
  Unix ntdll (unix/sync.c)
    NtNspaCondWaitPI(condvar, val, pi_mutex, timeout)
    futex(FUTEX_UNLOCK_PI, pi_mutex)
    futex(FUTEX_WAIT_REQUEUE_PI, condvar, val, abstime, pi_mutex)
    ... sleeping (the kernel holds the requeue target) ...
    The kernel requeues the waiter onto the PI mutex: it owns the mutex on wake
    EAGAIN (value changed, a signal raced): FUTEX_LOCK_PI(pi_mutex), so the waiter still owns it
  --- return to PE ---
    Restore CS bookkeeping: RecursionCount = 1, OwningThread = tid
    condvar_pi_deregister(condvar)
    return STATUS_SUCCESS

SIGNAL PATH
  PE ntdll (sync.c)
    condvar_pi_lookup(condvar): hash table lookup -> pi_mutex (NULL = no PI waiters)
  --- PE / unix boundary (syscall 0x00b2) ---
  Unix ntdll (unix/sync.c)
    NtNspaCondSignalPI(condvar, pi_mutex)
    InterlockedIncrement(condvar): one increment per signal
    futex(FUTEX_CMP_REQUEUE_PI, condvar, wake=1, requeue=0, pi_mutex, val)
    EAGAIN: re-read val, retry CMP_REQUEUE_PI
    The kernel wakes 1 waiter onto the PI mutex

BROADCAST PATH
  PE ntdll (sync.c)
    condvar_pi_lookup(condvar) -> pi_mutex
  --- PE / unix boundary (syscall 0x00b3) ---
  Unix ntdll (unix/sync.c)
    NtNspaCondBroadcastPI(condvar, pi_mutex)
    FUTEX_CMP_REQUEUE_PI(condvar, 1, INT_MAX, pi_mutex): wake 1, requeue all remaining waiters onto the PI mutex

4. Condvar-to-Mutex Mapping Table

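The Win32 WakeConditionVariable API takes only the condvar address, and a Win32 CONDITION_VARIABLE is a single pointer-sized word (condvar->Ptr) that Wine already uses as the futex counter, so there is no room in it to record the associated lock. A POSIX implementation, by contrast, can store the mutex inside the larger pthread_cond_t when pthread_cond_wait is called. FUTEX_CMP_REQUEUE_PI needs both the condvar futex address and the PI mutex address, so the signal side needs a way to find the PI mutex from the condvar address alone.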

Solution: Open-Addressed Hash Table

The PE side keeps a 64-entry open-addressed hash table, with tombstone deletion and refcounting, shared by all threads in the process. It maps condvar addresses to PI mutex addresses.

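Operations

  condvar_pi_register(condvar, pi_mutex): called by a waiter before it sleeps; inserts the condvar -> PI mutex mapping, or increments the refcount if the condvar is already present.
  condvar_pi_lookup(condvar): called by the signal and broadcast paths; returns the PI mutex address, or NULL when no PI waiter is registered.
  condvar_pi_deregister(condvar): called by a waiter after it wakes; decrements the refcount and tombstones the slot when the count reaches zero.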

Why Tombstones

Open addressing with linear probing cannot simply clear a slot on deletion: a lookup stops at the first empty slot, so clearing a slot would hide every entry whose probe sequence passed through it at insertion time. The standard solution is tombstone deletion: a deleted slot is marked with a sentinel value (CONDVAR_PI_TOMBSTONE) that lookup skips over but insertion can reuse.

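Design Choices

Each slot stores the condvar address as the key, the PI mutex address as the value, and a refcount of the waiters currently using the entry: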

struct condvar_pi_entry {
    const volatile void *condvar_addr;   /* key (or TOMBSTONE) */
    LONG                *pi_mutex_addr;  /* value */
    LONG                 refcount;       /* waiters using this entry */
};
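A minimal sketch of the three table operations on top of this struct, assuming the caller serializes access (the locking a process-wide table needs is omitted). The tombstone value, the hash function, the helper condvar_pi_find and the return conventions are illustrative assumptions, not the code in dlls/ntdll/sync.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t LONG;                     /* stand-in for the Win32 LONG */

#define CONDVAR_PI_TABLE_SIZE 64
/* Hypothetical sentinel: a non-NULL address no real condvar can have. */
#define CONDVAR_PI_TOMBSTONE ((const volatile void *)(uintptr_t)1)

struct condvar_pi_entry {
    const volatile void *condvar_addr;   /* key (or TOMBSTONE) */
    LONG                *pi_mutex_addr;  /* value */
    LONG                 refcount;       /* waiters using this entry */
};

static struct condvar_pi_entry condvar_pi_table[CONDVAR_PI_TABLE_SIZE];

/* Hypothetical hash: condvars are pointer-aligned, so drop the low bits. */
static size_t condvar_pi_hash(const volatile void *cv)
{
    return ((uintptr_t)cv >> 3) % CONDVAR_PI_TABLE_SIZE;
}

/* Probe until the key or a true NULL slot; tombstones never end the chain. */
static struct condvar_pi_entry *condvar_pi_find(const volatile void *cv)
{
    size_t start = condvar_pi_hash(cv);
    for (size_t i = 0; i < CONDVAR_PI_TABLE_SIZE; i++) {
        struct condvar_pi_entry *e =
            &condvar_pi_table[(start + i) % CONDVAR_PI_TABLE_SIZE];
        if (e->condvar_addr == NULL) return NULL;
        if (e->condvar_addr == cv) return e;
    }
    return NULL;
}

/* Returns 0 on success, -1 if the table is full (caller keeps the non-PI path). */
static int condvar_pi_register(const volatile void *cv, LONG *pi_mutex)
{
    struct condvar_pi_entry *e = condvar_pi_find(cv), *free_slot = NULL;
    size_t start = condvar_pi_hash(cv);

    if (e) {                              /* another waiter on the same condvar */
        e->refcount++;
        return 0;
    }
    /* Not present: take the first tombstone or NULL slot on the probe path. */
    for (size_t i = 0; i < CONDVAR_PI_TABLE_SIZE && !free_slot; i++) {
        e = &condvar_pi_table[(start + i) % CONDVAR_PI_TABLE_SIZE];
        if (e->condvar_addr == NULL || e->condvar_addr == CONDVAR_PI_TOMBSTONE)
            free_slot = e;
    }
    if (!free_slot) return -1;
    free_slot->condvar_addr = cv;
    free_slot->pi_mutex_addr = pi_mutex;
    free_slot->refcount = 1;
    return 0;
}

/* Signal/broadcast side: NULL means no PI waiter is registered. */
static LONG *condvar_pi_lookup(const volatile void *cv)
{
    struct condvar_pi_entry *e = condvar_pi_find(cv);
    return e ? e->pi_mutex_addr : NULL;
}

static void condvar_pi_deregister(const volatile void *cv)
{
    struct condvar_pi_entry *e = condvar_pi_find(cv);
    assert(e && e->refcount > 0);
    if (--e->refcount == 0) {                   /* last waiter gone */
        e->condvar_addr = CONDVAR_PI_TOMBSTONE;  /* keeps later entries reachable */
        e->pi_mutex_addr = NULL;
    }
}
```

Lookups stop only at a true NULL slot, so an entry that collided with a since-deleted condvar stays reachable, and a later registration can reuse the tombstoned slot.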

5. Syscall Interface

Three new Nt-level syscalls cross the PE–unix boundary. These are NSPA-specific extensions to the NT syscall table, numbered in the 0x00b1–0x00b3 range.

Syscall | Number | Parameters | Description
NtNspaCondWaitPI | 0x00b1 | condvar_futex, condvar_val, pi_mutex, timeout | Wait on the condvar with requeue-PI: unlocks the PI mutex, sleeps on the condvar, and is requeued onto the PI mutex on signal.
NtNspaCondSignalPI | 0x00b2 | condvar_futex, pi_mutex | Signal one waiter: increments the condvar, then FUTEX_CMP_REQUEUE_PI with wake 1, requeue 0.
NtNspaCondBroadcastPI | 0x00b3 | condvar_futex, pi_mutex | Broadcast to all waiters: same as signal, but wake 1, requeue INT_MAX.

Contract

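NtNspaCondWaitPI ALWAYS returns with the PI mutex owned by the caller, regardless of how it returns. After a normal requeue wake, the kernel has already made the caller the owner; on the EAGAIN race path, the unix side takes the mutex with FUTEX_LOCK_PI before returning.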

This “always own on return” contract matches what the PE side expects: it clears CS bookkeeping before the syscall and restores it after, so the unix side must guarantee the PI mutex is held on every return path.
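A sketch of how the unix side can honor this contract. The futex calls are injected through a hypothetical pi_futex_ops table so the control flow can run against a scripted stand-in for the kernel; the mapping of ETIMEDOUT and other errors to NTSTATUS codes is an assumption, not taken from unix/sync.c.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

typedef int32_t NTSTATUS;
#define STATUS_SUCCESS       ((NTSTATUS)0x00000000)
#define STATUS_TIMEOUT       ((NTSTATUS)0x00000102)
#define STATUS_UNSUCCESSFUL  ((NTSTATUS)0xC0000001)
#define STATUS_NOT_SUPPORTED ((NTSTATUS)0xC00000BB)

/* Hypothetical futex backend: each op returns 0 or a positive errno.
 * In real code these would be raw futex(2) calls on the same words. */
struct pi_futex_ops {
    int (*unlock_pi)(uint32_t *pi);
    int (*wait_requeue_pi)(uint32_t *cv, uint32_t val, uint32_t *pi);
    int (*lock_pi)(uint32_t *pi);
};

/* The caller owns *pi on entry and on EVERY return. */
static NTSTATUS cond_wait_pi(const struct pi_futex_ops *ops, uint32_t *cv,
                             uint32_t val, uint32_t *pi)
{
    int err, lock_err;

    (void)ops->unlock_pi(pi);                /* release the CS's PI mutex */
    err = ops->wait_requeue_pi(cv, val, pi);
    if (err == 0)
        return STATUS_SUCCESS;               /* requeued: the kernel made us owner */

    /* Not requeued, so we do not own the mutex: take it back first. */
    lock_err = ops->lock_pi(pi);
    assert(lock_err == 0);
    (void)lock_err;

    switch (err) {
    case EAGAIN:    return STATUS_SUCCESS;        /* value changed: a signal raced us */
    case ETIMEDOUT: return STATUS_TIMEOUT;
    case ENOSYS:    return STATUS_NOT_SUPPORTED;  /* PE side falls back */
    default:        return STATUS_UNSUCCESSFUL;   /* assumption */
    }
}

/* ---- Scripted stand-in for the kernel, only to exercise the control flow ---- */
static int fake_wait_errno;      /* what the fake FUTEX_WAIT_REQUEUE_PI returns */
static int fake_lock_calls;

static int fake_unlock_pi(uint32_t *pi) { *pi = 0; return 0; }
static int fake_lock_pi(uint32_t *pi) { fake_lock_calls++; *pi = 1; return 0; }
static int fake_wait_requeue_pi(uint32_t *cv, uint32_t val, uint32_t *pi)
{
    (void)cv; (void)val;
    if (fake_wait_errno == 0)
        *pi = 1;                 /* requeue-PI hands ownership to the waiter (tid 1) */
    return fake_wait_errno;
}
static const struct pi_futex_ops fake_ops = {
    fake_unlock_pi, fake_wait_requeue_pi, fake_lock_pi
};
```

Only the successful requeue path skips FUTEX_LOCK_PI; every other outcome pays one extra lock call so that the PE side can restore its bookkeeping unconditionally.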

Fallback

If NtNspaCondWaitPI returns STATUS_NOT_SUPPORTED (kernel too old, or futex ops unavailable), the PE side falls through to the standard Win32 condvar path with normal RtlLeaveCriticalSection / RtlWaitOnAddress / RtlEnterCriticalSection. The CS-PI leave/enter still provides PI protection during those calls – the gap just isn’t eliminated.
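The PE side of the same flow, including the gate and the fallback, can be sketched as follows. struct crit is a trimmed stand-in for RTL_CRITICAL_SECTION (LockSemaphore is modeled directly as the PI futex word), the fake_* functions are hypothetical stand-ins for nspa_cs_pi_active(), the mapping table, NtNspaCondWaitPI and the upstream path, and timeout handling is omitted.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t LONG;
typedef int32_t NTSTATUS;
#define STATUS_SUCCESS       ((NTSTATUS)0x00000000)
#define STATUS_NOT_SUPPORTED ((NTSTATUS)0xC00000BB)

/* Trimmed stand-in for RTL_CRITICAL_SECTION: only the fields this path touches. */
struct crit {
    LONG      RecursionCount;
    uintptr_t OwningThread;
    LONG      LockSemaphore;            /* the PI futex word under CS-PI */
};

/* ---- Hypothetical stand-ins so the sketch runs outside Wine ---- */
static int          g_pi_active = 1;                  /* nspa_cs_pi_active() */
static NTSTATUS     g_kernel_status = STATUS_SUCCESS; /* NtNspaCondWaitPI result */
static int          g_registered, g_fallbacks;
static struct crit *g_cs;                             /* CS observed during the wait */
static LONG         g_recursion_during_wait = -1;

static int fake_cs_pi_active(void) { return g_pi_active; }
static uintptr_t fake_current_tid(void) { return 42; }
static void fake_register(const volatile void *cv, LONG *pi)
{
    (void)cv; (void)pi;
    g_registered++;
}
static void fake_deregister(const volatile void *cv) { (void)cv; g_registered--; }
static NTSTATUS fake_cond_wait_pi(volatile LONG *cv, LONG val, LONG *pi)
{
    (void)cv; (void)val; (void)pi;
    g_recursion_during_wait = g_cs->RecursionCount;  /* must be 0 while asleep */
    return g_kernel_status;
}
static NTSTATUS fake_standard_sleep(volatile LONG *cv, struct crit *cs)
{
    (void)cv; (void)cs;
    g_fallbacks++;          /* upstream leave / wait-on-address / enter path */
    return STATUS_SUCCESS;
}

/* Sketch of the PE-side flow in RtlSleepConditionVariableCS. */
static NTSTATUS sleep_cv_cs_pi(volatile LONG *cv, struct crit *cs)
{
    if (fake_cs_pi_active() && cs->RecursionCount == 1) {
        LONG val = *cv;                     /* capture while still holding the CS */
        NTSTATUS status;

        fake_register(cv, &cs->LockSemaphore);
        cs->RecursionCount = 0;             /* the kernel PI mutex has no recursion */
        cs->OwningThread = 0;
        status = fake_cond_wait_pi(cv, val, &cs->LockSemaphore);
        /* Contract: the PI mutex is owned again on every return path. */
        cs->RecursionCount = 1;
        cs->OwningThread = fake_current_tid();
        fake_deregister(cv);
        if (status != STATUS_NOT_SUPPORTED)
            return status;
        /* No requeue-PI in this kernel: continue with the CS still held. */
    }
    return fake_standard_sleep(cv, cs);
}
```

Because the bookkeeping is restored before the status is checked, the fallback starts with the CS held exactly as the upstream path expects.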


6. Correctness Properties

No Lost Wakeups

EAGAIN from FUTEX_WAIT_REQUEUE_PI means the condvar value changed between our read and the futex call – a signal raced with us. This is treated as “we were signaled” rather than an error. The waiter falls through to FUTEX_LOCK_PI to acquire the PI mutex, then returns STATUS_SUCCESS. No wakeup is lost.

No Over-Increment

The signal path increments the condvar counter exactly once, then issues FUTEX_CMP_REQUEUE_PI with the post-increment value. On EAGAIN (another signal raced), it re-reads the current value and retries the CMP_REQUEUE_PI without incrementing again. This avoids the counter drifting upward and causing spurious wakeups.
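A sketch of the wake side, assuming C11 atomics in place of InterlockedIncrement and a hypothetical cmp_requeue_pi hook in place of the raw futex call. The scripted stand-in simulates other signalers bumping the counter between our increment and the kernel's value check.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical hook for futex(FUTEX_CMP_REQUEUE_PI, cv, 1, nr_requeue, pi,
 * expected): returns 0 on success or a positive errno. */
typedef int (*cmp_requeue_pi_fn)(_Atomic uint32_t *cv, int nr_requeue,
                                 uint32_t *pi, uint32_t expected);

/* nr_requeue is 0 for a signal, INT_MAX for a broadcast. */
static void cond_wake_pi(_Atomic uint32_t *cv, uint32_t *pi, int nr_requeue,
                         cmp_requeue_pi_fn cmp_requeue_pi)
{
    assert(nr_requeue >= 0);
    /* Exactly one increment per call: this is the change waiters observe. */
    uint32_t val = atomic_fetch_add(cv, 1) + 1;

    /* EAGAIN: another signaler moved the counter after our increment.
     * Retry with the value now in memory; do NOT increment again. */
    while (cmp_requeue_pi(cv, nr_requeue, pi, val) == EAGAIN)
        val = atomic_load(cv);
}

static void cond_signal_pi(_Atomic uint32_t *cv, uint32_t *pi, cmp_requeue_pi_fn fn)
{
    cond_wake_pi(cv, pi, 0, fn);            /* wake 1, requeue 0 */
}

static void cond_broadcast_pi(_Atomic uint32_t *cv, uint32_t *pi, cmp_requeue_pi_fn fn)
{
    cond_wake_pi(cv, pi, INT_MAX, fn);      /* wake 1, requeue the rest */
}

/* ---- Scripted stand-in for the kernel ---- */
static int fake_races;                      /* racing signalers to simulate */
static uint32_t fake_last_expected;
static int fake_last_nr_requeue;

static int fake_cmp_requeue_pi(_Atomic uint32_t *cv, int nr_requeue,
                               uint32_t *pi, uint32_t expected)
{
    (void)pi;
    if (fake_races > 0) {                   /* another thread signals first */
        fake_races--;
        atomic_fetch_add(cv, 1);
    }
    if (atomic_load(cv) != expected)
        return EAGAIN;                      /* same value check the kernel makes */
    fake_last_expected = expected;
    fake_last_nr_requeue = nr_requeue;
    return 0;
}
```

With two simulated races the counter ends three higher: one increment from this signal and one from each racing signaler, with no extra increments from the retries.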

No Orphaned Waiters

The mapping table uses refcounting. Every condvar_pi_register increments the refcount, every condvar_pi_deregister decrements it. The entry is only cleared (tombstoned) when the refcount reaches zero. This ensures the signal path can always find the PI mutex address for active waiters, even if some waiters have already returned.

Tombstone Probing

Open-addressing deletion uses CONDVAR_PI_TOMBSTONE sentinel values. Lookup probes skip tombstones (they are not the entry we want, but entries beyond them might be). Probe chains terminate only at a true NULL slot. Insertion can reuse tombstone slots, keeping table density manageable.

Graceful Fallback

If the unix side detects that the kernel does not support FUTEX_WAIT_REQUEUE_PI (returns ENOSYS), it returns STATUS_NOT_SUPPORTED. The PE side catches this and falls through to the standard Win32 condvar path. This means Wine-NSPA can run on kernels without requeue-PI support – the RT guarantees just degrade gracefully to the CS-PI-only path (PI on enter/leave, but gap between wake and enter).


7. Relationship to Existing PI Infrastructure

Win32 condvar PI is the fourth PI mechanism in Wine-NSPA. Together, these four paths provide priority inheritance coverage across the entire Wine synchronization surface:

Path | Mechanism | Scope
CS-PI | FUTEX_LOCK_PI on LockSemaphore | Win32 CriticalSection enter/leave
NTSync PI | Kernel ntsync driver with priority-ordered wakeup | Win32 Mutex / Semaphore / Event
pi_cond requeue-PI | FUTEX_WAIT_REQUEUE_PI in librtpi | Unix-side condvars (audio, GStreamer)
Win32 condvar PI | FUTEX_WAIT_REQUEUE_PI for RtlSleepConditionVariableCS | Win32 SleepConditionVariableCS

SRW gap: SRW-backed condvars (RtlSleepConditionVariableSRW) are not covered by this work. SRW locks have no PI mechanism – this is an unsolved problem even in the Linux kernel (reader-writer locks with priority inheritance require tracking all readers, which is prohibitively expensive). Applications using SleepConditionVariableSRW in RT paths should switch to SleepConditionVariableCS for PI coverage.

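How the Paths Layer

Win32 condvar PI builds on CS-PI rather than replacing it: its requeue target is the same LockSemaphore PI futex that CS-PI locks and unlocks, and when requeue-PI is unavailable it degrades to plain CS-PI. NTSync PI and pi_cond requeue-PI cover the remaining objects: NT mutexes, semaphores and events, and unix-side condvars.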


8. Test Results

The condvar-pi test validates the requeue-PI path under contention: an RT waiter (THREAD_PRIORITY_TIME_CRITICAL, mapped to SCHED_FIFO) waits on a condvar while a normal-priority signaler sends signals and 4 CPU-bound load threads create scheduling pressure.

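Test Configuration

  Waiter: 1 RT thread, THREAD_PRIORITY_TIME_CRITICAL (SCHED_FIFO)
  Signaler: 1 normal-priority thread
  Load: 4 CPU-bound threads
  Iterations: 500
  Modes: with PI (NSPA_RT_PRIO=80) and without PI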

Latency Results

Mode | Avg wait | Max wait | Min wait
With PI (NSPA_RT_PRIO=80) | 129 us | 152 us | 124 us
Without PI | 100 us | 263 us | 29 us

Key finding: PI tightens the distribution. Max wait drops from 263 to 152 us (42% lower worst case). The average is higher with PI (129 vs 100 us) due to requeue overhead (extra kernel work for the atomic requeue), but the tail is much tighter. For RT audio, worst case matters more than average: a 263 us spike at the wrong moment can cause a buffer underrun, while a consistent 129 us does not.
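The two percentages follow directly from the latency table (all values in us):

```latex
\[
\frac{t_{\max}^{\text{no PI}} - t_{\max}^{\text{PI}}}{t_{\max}^{\text{no PI}}}
  = \frac{263 - 152}{263} = \frac{111}{263} \approx 0.42,
\qquad
\frac{\bar{t}^{\,\text{PI}} - \bar{t}^{\,\text{no PI}}}{\bar{t}^{\,\text{no PI}}}
  = \frac{129 - 100}{100} = 0.29
\]
```

PI therefore trades a 29% higher average for a 42% lower worst case.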

Distribution Characteristics

  With PI: 124 to 152 us, a spread of 28 us
  Without PI: 29 to 263 us, a spread of 234 us

Full Suite Results

22/22 PASS (11 tests x 2 modes: with and without PI), no regressions vs the v5 test suite baseline.


9. Files Changed

File | Role
dlls/ntdll/sync.c | PE-side condvar PI implementation: mapping table (condvar_pi_register / condvar_pi_deregister / condvar_pi_lookup), modified RtlSleepConditionVariableCS, RtlWakeConditionVariable, RtlWakeAllConditionVariable
dlls/ntdll/unix/sync.c | Unix-side futex operations: NtNspaCondWaitPI (FUTEX_UNLOCK_PI + FUTEX_WAIT_REQUEUE_PI + EAGAIN fallback), NtNspaCondSignalPI (FUTEX_CMP_REQUEUE_PI), NtNspaCondBroadcastPI
dlls/ntdll/ntsyscalls.h | Syscall table entries 0x00b1 (NtNspaCondWaitPI), 0x00b2 (NtNspaCondSignalPI) and 0x00b3 (NtNspaCondBroadcastPI) for both i386 and x86_64
include/winternl.h | Function declarations for the three new NtNspaCond*PI syscalls
programs/nspa_rt_test/main.c | Validation test cmd_condvar_pi: RT waiter + normal signaler + 4 load threads, 500 iterations, latency measurement

Wine-NSPA Win32 Condvar PI Reference | Generated 2026-04-16 | Wine 11.6 + NSPA RT patchset