Wine 11.6 + NSPA RT patchset | Kernel 6.19.x-rt with NTSync PI | 2026-04-15 | Author: Jordan Johnston
Wine-NSPA is a real-time optimized fork of Wine designed to run on PREEMPT_RT Linux kernels. It bridges the gap between Wine’s SCHED_OTHER threading model and the deterministic scheduling guarantees that RT workloads require – priority inheritance, bounded lock hold times, priority-ordered wakeups, and correct NT-to-Linux priority mapping.
While professional audio (DAWs, VST plugins, ASIO applications) is the primary motivation, Wine-NSPA’s RT infrastructure benefits any latency-sensitive Win32 application: real-time simulation, industrial control, low-latency trading, or any scenario where a Windows application must coexist with Linux’s RT scheduler without priority inversion or unbounded blocking.
When NSPA_RT_PRIO is unset, Wine behaves identically to upstream: every RT code path is gated on environment variables. Those paths rely on raw_spinlock_t, rt_mutex, and sched_setattr_nocheck() – primitives that only exist (or behave differently) under PREEMPT_RT. On a standard kernel, spinlock_t disables preemption and mutex is a sleeping lock. Under PREEMPT_RT, spinlock_t becomes a sleeping rt_mutex (fully preemptible), and only raw_spinlock_t disables preemption. This means:
- Kernel preemption is disabled only inside (short) raw_spinlock critical sections
- Lock contention resolves through rt_mutex PI chains, so priority propagates to lock holders

| Component | Layer | Status |
|---|---|---|
| RT priority mapping (v1/v1.2) | Wine ntdll + wineserver | SHIPPED |
| Wineserver self-promotion (v1.1) | Wine wineserver | SHIPPED |
| Shmem wineserver IPC (v1.5) | Wine ntdll + wineserver | SHIPPED |
| Wineserver global_lock PI | Wine wineserver | SHIPPED |
| CS-PI / FUTEX_LOCK_PI (v2.3) | Wine ntdll (PE + Unix) | SHIPPED |
| librtpi vendoring (v2.0) | Wine libs/ | SHIPPED |
| NTSync PI kernel patches | Linux kernel driver | SHIPPED |
| Client-side NTSync creation | Wine ntdll Unix | SHIPPED |
| winejack.drv (MIDI + audio) | Wine driver | SHIPPED |
| nspaASIO bridge | Wine DLL | SHIPPED |
| io_uring I/O bypass (Phase 1-3) | Wine ntdll Unix + server | SHIPPED |
| ntsync uring_fd kernel extension | Linux kernel driver | SHIPPED |
| msvcrt SIMD (AVX/SSE2) | Wine msvcrt | SHIPPED |
| SRW lock spin phase | Wine ntdll | SHIPPED |
| pi_cond requeue-PI | Wine libs/librtpi | SHIPPED |
| Win32 condvar PI (requeue-PI) | Wine ntdll PE + Unix | SHIPPED |
| CoWaitForMultipleHandles rewrite | Wine combase | SHIPPED |
| QPC rdTSC bypass | Wine ntdll Unix | SHIPPED |
| Large/huge pages | Wine ntdll Unix | SHIPPED |
Side-by-side comparison of how vanilla Wine and Wine-NSPA handle the same architectural components. Every NSPA change is additive – the vanilla behavior is preserved when NSPA_RT_PRIO is unset.
Key differences: Vanilla Wine runs entirely under SCHED_OTHER – wineserver, all threads, all sync primitives. There is no priority mapping, no PI, no RT scheduling. Wine-NSPA adds six layers of RT infrastructure: wineserver self-promotion, NT→FIFO priority mapping, CS-PI via FUTEX_LOCK_PI, client-side NTSync with PI kernel patches, rdTSC QPC bypass, and large page support. All layers are opt-in via NSPA_RT_PRIO.
Wine implements the Windows NT process model on top of Linux. Understanding this architecture is essential for knowing where NSPA hooks in and why certain design choices were made.
A single long-lived process per Wine prefix that acts as the NT kernel analog. It owns the handle table, manages process/thread lifetime, and arbitrates named synchronization objects. All cross-process operations (handle inheritance, named mutexes, process creation) go through wineserver via Unix domain sockets.
Bottleneck: Wineserver is single-threaded and event-driven. An RT thread that makes a wineserver round-trip blocks behind the server’s serialization. v1.5’s shmem path and client-side NTSync creation reduce the number of operations that require this round-trip. The v1.5 shmem dispatchers serialize on a global_lock (now PI-aware via pi_mutex_t / FUTEX_LOCK_PI) so that high-priority dispatcher contention propagates priority through the kernel’s rt_mutex PI chain.
Every Wine process has two halves sharing the same address space. The PE side runs Win32 code (app binaries, PE DLLs like ntdll.dll, kernelbase.dll). The Unix side runs native Linux code (ntdll.so, driver Unixlibs). The PE side cannot make raw Linux syscalls – it must cross into the Unix side via the Unixlib dispatcher. This split is architecturally important for NSPA: CS-PI’s fast path (CAS on LockSemaphore) lives on the PE side, while the FUTEX_LOCK_PI slow path lives on the Unix side.
Wine 11.x’s primary synchronization backend. The /dev/ntsync kernel device (Elizabeth Figura, mainlined) implements Win32 semaphores, mutexes, and events with correct NT semantics (abandoned mutexes, WaitForMultipleObjects atomicity, etc.) directly in the kernel. NSPA adds five patches to this driver for PI support, io_uring integration, and PREEMPT_RT safety.
NSPA adds a client-side fast path for creating anonymous sync objects (unnamed mutexes and semaphores). Instead of a wineserver round-trip, the client calls ioctl(NTSYNC_IOC_CREATE_*) directly and allocates a handle from the client range (index ~524K downward). Named objects still go through wineserver for namespace resolution. This eliminates a round-trip per object creation – significant for apps that create thousands of mutexes at startup.
Events are excluded from client-side creation (a7b00453978). Client-created event handles destabilized Ableton Live, causing crashes in the handle lifecycle. Anonymous events remain on wineserver, where the server-side lifecycle management is proven stable. Mutexes and semaphores are stable client-side.
Forward-port of Torge Matthies’s 2022 shmem wineserver IPC patch. Replaces socket-based request/reply with per-thread shared memory + futex signaling for small requests. The server side spawns a pthread per client thread that sits in FUTEX_WAIT and dispatches via the existing req_handlers[] array, serialized by a PI-aware global_lock around the main poll loop. v2.4 client-side boost raises the dispatcher’s priority to match the client’s RT priority for the duration of each request; the PI lock ensures this boost propagates through contention with other dispatchers or the main thread.
Why shmem matters for RT: NTSync captures the sync-wait hot path, but non-sync wineserver traffic (handle operations, registry, file metadata, loader calls) still uses IPC. During VST plugin loading, the loader holds Wine’s loader lock while dispatching dozens of wineserver calls. Shorter round-trips = shorter lock-hold times = less contention bleeding into RT threads sharing those locks.
Limitations: The global_lock serializes all shmem dispatchers, so this is a latency improvement, not a throughput one. Dispatcher threads run at SCHED_OTHER by default (v1.1’s wineserver RT promotion handles the scheduling side independently).
v2.4’s manual PI boost called sched_getscheduler() + sched_getparam() + sched_setscheduler() on every request (6 syscalls per RT request including unboost). v2.5 caches the RT thread’s scheduling policy and priority in TLS (nspa_rt_cached_policy, nspa_rt_cached_prio), eliminating the read syscalls. Only sched_setscheduler() fires — 2 syscalls per request (boost + unboost). Committed as 17621ba494c.
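The cached-read pattern can be sketched in user space. This is an illustrative rendering, not Wine's actual server-request code – the TLS variable names echo the commit message, the function around them is an assumption:

```c
#define _GNU_SOURCE
#include <sched.h>

/* Sketch of the v2.5 TLS cache: read this thread's policy and priority once,
 * then serve every later boost/unboost from the cache. */
static __thread int nspa_rt_cached_policy = -1;
static __thread int nspa_rt_cached_prio;

static int nspa_cached_policy(void)
{
    if (nspa_rt_cached_policy < 0)          /* first request: two read syscalls */
    {
        struct sched_param sp;
        nspa_rt_cached_policy = sched_getscheduler(0);
        sched_getparam(0, &sp);
        nspa_rt_cached_prio = sp.sched_priority;
    }
    return nspa_rt_cached_policy;           /* later requests: no syscalls */
}
```

Once the cache is warm, each boosted request pays only the two sched_setscheduler() calls (boost + unboost).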
NSPA maps Win32 thread priorities to Linux SCHED_FIFO priorities, giving audio threads deterministic scheduling. The mapping is controlled by two environment variables and implemented in a two-tier architecture.
When a thread calls SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_TIME_CRITICAL), ntdll’s Unix side detects this in NtSetInformationThread(ThreadBasePriority) and calls sched_setscheduler(0, SCHED_FIFO | SCHED_RESET_ON_FORK, ...) directly. No wineserver round-trip needed. The priority change still forwards to wineserver for bookkeeping so that GetThreadPriority from other processes returns the correct value.
Implementation: dlls/ntdll/unix/thread.c, function nspa_rt_apply_tid().
When a thread calls SetThreadPriority(hOtherThread, ...) targeting a thread in another process, the request goes through wineserver. The server’s apply_thread_priority() in server/thread.c calls sched_setscheduler(thread->unix_tid, ...) on the target thread. This covers bulk updates from SetPriorityClass and any cross-process priority manipulation.
Tier 1 extended with a HANDLE-to-unix_tid map (256-slot open-addressing hash table protected by a PI mutex). When a thread calls SetThreadPriority(hThread, ...) where hThread is another thread in the same process, the client looks up the target’s unix_tid in the map and applies the scheduling change locally, avoiding a wineserver round-trip. The map is populated at NtCreateThreadEx time.
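A minimal sketch of the mapping itself. The exact linear-scaling formula below is an assumption for illustration; the shipped logic lives in dlls/ntdll/unix/thread.c and server/thread.c:

```c
/* Illustrative NT -> SCHED_FIFO mapping: NT 31 (TIME_CRITICAL under the
 * REALTIME class) lands on the NSPA_RT_PRIO ceiling; the NT [16..30] band
 * scales linearly below it; everything else stays SCHED_OTHER (returns 0). */
static int nt_to_fifo(int nt_prio, int rt_ceiling)
{
    if (nt_prio >= 31) return rt_ceiling;                   /* ceiling */
    if (nt_prio >= 16)                                      /* linear band */
        return 1 + (nt_prio - 16) * (rt_ceiling - 1) / 15;
    return 0;                                               /* SCHED_OTHER */
}
```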
| Variable | Values | Default | Effect |
|---|---|---|---|
| NSPA_RT_PRIO | Integer in [min..max-1] | unset = RT dormant | Master switch + ceiling FIFO priority. NT 31 maps here. |
| NSPA_RT_POLICY | FF, RR, TS | FF | Scheduler policy for NT [16..30]. TC (NT 31) always FIFO. |
| NSPA_SRV_RT_PRIO | Integer | NSPA_RT_PRIO - 16 | Override wineserver’s FIFO priority. |
| NSPA_SRV_RT_POLICY | FF, RR | FF | Wineserver scheduler policy. |
Lenient path (v1 feature): For client-side self/same-process promotion, THREAD_PRIORITY_TIME_CRITICAL is treated as a special-case ceiling promotion even when the process is not in REALTIME priority class. This covers the common audio pattern where apps call SetThreadPriority(..., 15) without first calling SetPriorityClass(REALTIME). The lower realtime band still follows the normal process-class rules; the lenient exception is specifically for TIME_CRITICAL.
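Putting the variables together, a typical opt-in launch might look like this (priority values and the application name are purely illustrative):

```shell
# Enable NSPA RT: FIFO priority 88 is the ceiling (NT 31 maps here);
# wineserver self-promotes to 88 - 16 = 72 unless NSPA_SRV_RT_PRIO overrides it.
export NSPA_RT_PRIO=88
export NSPA_RT_POLICY=FF      # SCHED_FIFO for the NT [16..30] band
wine daw.exe                  # hypothetical RT-sensitive application
```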
Priority inversion is the primary failure mode for RT audio under Wine. An RT thread blocked on a lock held by a normal-priority thread can be delayed indefinitely while CFS time-slices the holder against dozens of other threads. NSPA addresses this with PI (priority inheritance) on two independent sync paths.
Win32 CRITICAL_SECTION is the most contended lock in typical Wine workloads – used by heap operations, loader, DllMain serialization, GDI, and most app/plugin code. NSPA’s CS-PI repurposes the LockSemaphore field of RTL_CRITICAL_SECTION as a FUTEX_LOCK_PI word.
1. Fast path, uncontended acquire: InterlockedIncrement(&LockCount). If we won (-1 → 0), CAS our Linux TID into LockSemaphore. Done – never leaves user space.
2. Slow path, contended acquire: NtNspaLockCriticalSectionPI(address), crossing to the Unix side, which calls futex(&LockSemaphore, FUTEX_LOCK_PI_PRIVATE, ...). The kernel sees the owner TID in the futex word, boosts the owner to the waiter’s scheduling priority, and blocks the waiter on an rt_mutex.
3. Fast path, uncontended release: CAS(LockSemaphore, my_tid, 0). Pure user-space.
4. Slow path, contended release: NtNspaUnlockCriticalSectionPI(address). The kernel transfers ownership to the highest-priority waiter and drops the boost.

TID source: PE code cannot call syscall(SYS_gettid) directly. NSPA adds NtNspaGetUnixTid(), which returns the Linux kernel TID from ntdll_thread_data->nspa_unix_tid. PE caches this in a __thread variable – the syscall fires at most once per thread.
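The two user-space fast paths reduce to compare-and-swap operations on the futex word. A self-contained sketch, with illustrative names – the contended cases, which fall through to FUTEX_LOCK_PI / FUTEX_UNLOCK_PI, are omitted:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Sketch of the CS-PI fast paths: an uncontended acquire CASes the thread's
 * Linux TID into the futex word (LockSemaphore); an uncontended release CASes
 * it back to 0. Neither path leaves user space. */
typedef struct { _Atomic uint32_t lock_word; } cs_pi_sketch;

static int cs_try_acquire(cs_pi_sketch *cs, uint32_t my_tid)
{
    uint32_t expected = 0;
    /* Lock free -> we own it, kernel never involved. */
    return atomic_compare_exchange_strong(&cs->lock_word, &expected, my_tid);
}

static int cs_try_release(cs_pi_sketch *cs, uint32_t my_tid)
{
    uint32_t expected = my_tid;
    /* With real FUTEX_LOCK_PI, the kernel sets FUTEX_WAITERS in the word when
     * a waiter blocks; that makes this CAS fail and forces the slow path. */
    return atomic_compare_exchange_strong(&cs->lock_word, &expected, 0);
}
```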
Win32 CreateMutex + WaitForSingleObject goes through the /dev/ntsync kernel driver. NSPA’s five kernel patches add PI, uring_fd, and PREEMPT_RT fixes to this path (see Section 6). The flow:
1. NtWaitForSingleObject → inproc_wait() resolves the mutex’s fd from the handle cache.
2. linux_wait_objs() calls ioctl(device, NTSYNC_IOC_WAIT_ANY, &args).
3. ntsync_insert_waiter() inserts the waiter into a priority-ordered queue (patch 2).
4. ntsync_pi_recalc() scans both any_waiters and all_waiters for the highest-priority waiter. It compares against the owner’s saved original priority (not normal_prio, which changes after boost). If a boost is needed, it looks up or creates a per-task ntsync_pi_owner tracking entry, increments boost_count, and boosts via sched_setattr_nocheck() (patch 3 v2).
5. On release, try_wake_any_mutex() wakes the highest-priority waiter. ntsync_pi_drop() decrements boost_count; original scheduling is restored only when the last boost on that task is removed (multi-object safe).

Both PI paths degrade gracefully. CS-PI falls back to the legacy keyed-event wait if FUTEX_LOCK_PI returns ENOSYS. NTSync PI requires the patched kernel driver – without it, waiters are FIFO-ordered (upstream default) with no owner boost.
Windows SRW locks (Slim Reader/Writer) spin briefly before entering the kernel wait path. NSPA implements this behavior:
- Spin briefly (256 iterations) on the lock word before falling back to FUTEX_WAIT / FUTEX_LOCK_PI
- RT threads skip the spin phase: nspa_is_rt_thread() checks the cached scheduling policy (set during RT promotion)

This matches Windows NT behavior where SRWLock uses a brief spin phase before blocking. The 256-iteration count is empirically chosen to cover the common case of short critical sections (sub-microsecond) without burning excessive cycles on longer holds.
Impact: Reduces kernel transitions for uncontended or briefly-contended SRW locks. The v5 NTSync d4 rapid throughput improved from 232K to 259K ops/s (+11.6%), consistent with fewer futex syscalls in the lock path.
Implementation: dlls/ntdll/sync.c (SRW acquire path).
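The spin-then-block shape can be sketched as follows. This is a simplified model under stated assumptions: the atomic int stands in for the SRW lock word, the initial acquire attempt is folded into the loop, and RT threads simply fall through to the futex path:

```c
#include <stdatomic.h>
#include <immintrin.h>   /* _mm_pause (x86) */

#define NSPA_SRW_SPIN 256

/* Bounded spin phase: try the lock up to 256 times with a pause hint before
 * giving up and (in the real code) entering FUTEX_WAIT. RT threads skip the
 * spin entirely - busy-waiting at SCHED_FIFO would starve the holder. */
static int srw_spin_try(_Atomic int *lock, int is_rt_thread)
{
    int spins = is_rt_thread ? 0 : NSPA_SRW_SPIN;
    for (int i = 0; i < spins; i++)
    {
        int expected = 0;
        if (atomic_compare_exchange_weak(lock, &expected, 1))
            return 1;                /* acquired, no kernel transition */
        _mm_pause();                 /* be polite to the SMT sibling */
    }
    return 0;                        /* fall through to FUTEX_WAIT */
}
```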
Wine-NSPA’s pi_cond_t (condition variable with PI support, from librtpi) uses FUTEX_WAIT_REQUEUE_PI and FUTEX_CMP_REQUEUE_PI to close the PI gap in condition variable wakeup. The waiter transitions directly from “blocked on condvar” to “blocked on PI mutex with priority inheritance” – no gap.
Implementation: libs/librtpi/rtpi.h (header-only inline, pi_cond_wait, pi_cond_signal, pi_cond_broadcast).
The same requeue-PI mechanism applied to PE-side Win32 condvars. When CS-PI is active and the CS is held non-recursively, RtlSleepConditionVariableCS takes the PI path: capture value, register condvar→mutex mapping, NtNspaCondWaitPI (unix: FUTEX_UNLOCK_PI + FUTEX_WAIT_REQUEUE_PI). On signal, kernel atomically requeues waiter onto PI mutex – zero gap.
Scope: CS-backed condvars only. SRW-backed condvars remain without PI (unsolved problem even in the Linux kernel).
| PI Coverage Path | Mechanism | Scope |
|---|---|---|
| CS-PI | FUTEX_LOCK_PI on LockSemaphore | Win32 CriticalSection enter/leave |
| NTSync PI | Kernel ntsync driver (priority-ordered wakeup) | Win32 Mutex/Semaphore/Event |
| pi_cond requeue-PI | FUTEX_WAIT_REQUEUE_PI in librtpi | Unix-side condvars (audio, gstreamer) |
| Win32 condvar PI | FUTEX_WAIT_REQUEUE_PI for RtlSleepConditionVariableCS | Win32 SleepConditionVariableCS |
Both pi_cond (unix-side) and Win32 condvar PI (PE-side) use FUTEX_WAIT_REQUEUE_PI / FUTEX_CMP_REQUEUE_PI to atomically move waiters from the condvar futex onto the PI mutex chain — eliminating the priority inversion gap between wake and mutex reacquire.
Full details: Win32 Condvar PI documentation (architecture, syscall interface, mapping table design, 2 SVG diagrams).
Five patches applied to drivers/misc/ntsync.c in the NSPA kernel tree. Patches 1-3 make the NTSync driver safe on PREEMPT_RT and add Windows-faithful priority semantics with PI boost. Patches 4-5 integrate ntsync with io_uring for CQE wakeup and fix a PREEMPT_RT allocation bug.
Problem: Upstream ntsync uses spinlock_t, which on PREEMPT_RT becomes a sleeping rt_mutex. This is correct for general use but changes the timing characteristics and makes some code paths that assume non-preemptibility incorrect.

Fix: Convert obj->lock to raw_spinlock_t with raw_spin_lock() / raw_spin_unlock(). This preserves true spin semantics even on PREEMPT_RT kernels, matching the driver’s design assumption of short critical sections around object state updates.
Problem: Upstream ntsync uses list_add_tail() for all waiters – FIFO order. Windows NT uses strict priority ordering: the highest-priority waiter is always woken first on object release.

Fix: Replace list_add_tail() with ntsync_insert_waiter(), which walks the waiter list and inserts the new entry before the first entry with lower scheduling priority (task->prio). Same-priority waiters are FIFO within their level. This matches Windows NT scheduling semantics exactly.
static void ntsync_insert_waiter(struct ntsync_q_entry *new_entry,
struct list_head *head)
{
struct ntsync_q_entry *entry;
list_for_each_entry(entry, head, node) {
if (new_entry->q->task->prio < entry->q->task->prio) {
list_add_tail(&new_entry->node, &entry->node);
return;
}
}
list_add_tail(&new_entry->node, head);
}
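The same ordering rule can be exercised in user space. In this analog, a plain singly-linked list stands in for list_head, and lower numeric prio means higher scheduling priority (as with the kernel’s task->prio):

```c
#include <stddef.h>

/* User-space analog of ntsync_insert_waiter(): insert before the first entry
 * with strictly lower priority (higher numeric prio), so equal-priority
 * waiters remain FIFO within their level. */
struct waiter { int prio; int id; struct waiter *next; };

static void insert_waiter(struct waiter **head, struct waiter *w)
{
    while (*head && (*head)->prio <= w->prio)   /* skip better-or-equal prio */
        head = &(*head)->next;
    w->next = *head;
    *head = w;
}
```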
Problem: When a SCHED_FIFO thread waits on a mutex held by a SCHED_OTHER thread, the holder may not get CPU time promptly (CFS time-sharing), causing unbounded priority inversion.
Fix (v2): ntsync_pi_recalc() scans both any_waiters and all_waiters for the highest-priority waiter. It compares against the owner’s saved original priority (via per-device ntsync_pi_owner tracking), not normal_prio (which changes after boost). A per-task boost_count ensures original scheduling is saved once and restored only when ALL mutexes stop boosting that task. Conservative over-boosting between first and last removal is safe (it never under-boosts).

v2 fixes three bugs from v1: (a) multi-object PI corruption when a task held multiple boosted mutexes, (b) zero PI for WaitForMultipleObjects(bWaitAll=TRUE), (c) stale normal_prio comparison causing boost/unboost thrashing. Test results: philosophers RT max wait 1620→865us (-46.6%), ntsync d8 PI contention 479→239ms (-50.1%).
/* Per-task tracking — saves orig_attr once, restores when boost_count reaches 0 */
po = find_pi_owner(dev, owner);
base_prio = po ? po->orig_normal_prio : owner->normal_prio;
if (highest_prio < base_prio && !was_boosted) {
if (!po) {
po = kzalloc(sizeof(*po), GFP_ATOMIC);
po->orig_attr = capture_sched(owner);
po->orig_normal_prio = owner->normal_prio;
list_add(&po->node, &dev->boosted_owners);
}
po->boost_count++;
obj->u.mutex.pi_boosted = true;
}
if (needs_boost && highest_prio < owner->prio)
sched_setattr_nocheck(owner, &boost);
Tested with transitive PI chains up to depth 12. RT wait time does not increase with chain depth beyond the tail holder’s work time (~235ms for a 100M-iter CPU loop). The per-hop increment (~50ms) is visible in individual holder elapsed times but does not accumulate in the RT thread’s total wait. ntsync_pi_recalc() scales to at least depth 12 without degradation.
Problem: When a thread is blocked in NTSYNC_IOC_WAIT_ANY/ALL and an io_uring CQE arrives (e.g. socket data ready), there’s no mechanism to wake the thread — the CQE sits in the ring until the ntsync wait times out or another object is signaled.

Fix: Repurpose the pad field in ntsync_wait_args as uring_fd. When set to a valid eventfd (registered with IORING_REGISTER_EVENTFD), the ntsync wait ioctl monitors it via poll_initwait / vfs_poll alongside the ntsync objects. When the eventfd fires (io_uring posted a CQE), the ioctl returns NTSYNC_INDEX_URING_READY (0xFFFFFFFE). The client-side retry loop in sync.c drains CQEs and re-enters the wait.
Problem: ntsync_pi_recalc() called kzalloc(GFP_ATOMIC) while holding a raw_spinlock. On PREEMPT_RT, the slab allocator’s internal rt_spin_lock can sleep, triggering __schedule_bug (BUG: scheduling while atomic).

Fix: Pre-allocate the ntsync_pi_owner struct before acquiring the raw spinlock. Pass the pre-allocated pointer into ntsync_pi_recalc() and consume it only if a new owner entry is needed; otherwise free it after releasing the lock.
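The pattern generalizes: allocate outside the non-sleepable critical section, consume inside only if needed, release outside. A user-space rendering of that shape (the atomic flag stands in for the raw spinlock; types and names are illustrative):

```c
#include <stdatomic.h>
#include <stdlib.h>
#include <stddef.h>

struct pi_owner { int boost_count; };

static struct pi_owner *g_owner;   /* existing entry, if any */
static _Atomic int g_lock;         /* stand-in for the raw spinlock */

static struct pi_owner *get_or_create_owner(struct pi_owner *prealloc)
{
    struct pi_owner *po;
    while (atomic_exchange(&g_lock, 1))   /* "raw spinlock": no sleeping here */
        ;
    if (!g_owner)
    {
        g_owner = prealloc;               /* consume the pre-allocated entry */
        prealloc = NULL;
    }
    po = g_owner;
    atomic_store(&g_lock, 0);
    free(prealloc);                       /* unused pre-alloc: free after unlock */
    return po;
}
```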
Per-thread io_uring rings in ntdll’s Unix layer bypass the wineserver for file and socket I/O. See io_uring-architecture.html for the full design document with diagrams.
| Operation | Before (server-mediated) | After (io_uring) |
|---|---|---|
| Sync file poll | poll() syscall in NtReadFile | io_uring_prep_poll_add → 1 kernel transition |
| Async file read/write | 2 server round-trips + epoll monitoring | io_uring_prep_read/write → 0 server involvement |
| Socket recv/send (EAGAIN) | Server epoll monitors fd, alerts client | io_uring POLL_ADD → CQE → try_recv/try_send |
- Per-thread rings (IORING_SETUP_SINGLE_ISSUER | COOP_TASKRUN) — no cross-thread submission, RT priority preserved
- set_async_direct_result keeps the server async frozen (no epoll re-queue)
- sock_get_poll_events returns -1 → no epoll monitoring
- Pool of uring_async_op structs, O(1) freelist alloc/free, zero malloc in submit path

| Phase | Test | Result |
|---|---|---|
| Phase 1 | Sync poll replacement | All tests PASS |
| Phase 2 | Async file I/O bypass | All tests PASS |
| Phase 3 | Socket I/O (sync + overlapped) | 4000/4000, avg 95-113us |
| File | Purpose |
|---|---|
| dlls/ntdll/unix/io_uring.c (~760 LOC) | Ring management, pool allocator, Phase 1-3 submit/complete |
| dlls/ntdll/unix/socket.c | ALERTED interception, CQE handler, bitmap set/clear |
| dlls/ntdll/unix/sync.c | ntsync uring_fd retry loop |
| server/sock.c | E2 bitmap check in sock_get_poll_events |
Wine’s Linux audio stack has two layers: Windows-facing APIs such as WASAPI / WinMM / ASIO on top, and a Unix-side driver backend underneath. In upstream Wine, normal application audio typically flows through mmdevapi into winealsa.drv, then out through ALSA or PipeWire. In Wine-NSPA, winejack.drv replaces that backend with a direct JACK path for standard shared/exclusive audio, while nspaASIO is a separate bridge for hosts that specifically need ASIO semantics.
A unified Wine audio/MIDI driver that connects directly to JACK, replacing winealsa.drv + winepulse.drv for systems using JACK or PipeWire.
Implementation: dlls/winejack.drv/jack.c (~2700 lines, Unix side) + dlls/winejack.drv/jackmidi.c (~700 lines, MIDI).
A Wine PE DLL implementing the ASIO COM interface. Windows ASIO apps see a standard ASIO driver named “nspaASIO.” When JACK is active and float32 format is available, nspaASIO registers directly with winejack – bypassing WASAPI entirely. The play_thread uses futex synchronization with the JACK RT callback for same-period output.
Falls back to WASAPI exclusive mode when direct registration is unavailable (non-float32 format, JACK not running).
| Feature | winealsa.drv (vanilla) | winejack.drv (NSPA) |
|---|---|---|
| Backend | ALSA PCM | JACK (via PipeWire or native jackd) |
| RT callback | NtDelayExecution timer loop (drift-prone) | JACK process callback (hardware-synced) |
| Exclusive mode | Accepts flag, no real exclusive access | Dedicated JACK ports, proper period contract |
| ASIO support | None (needs separate wineasio) | Built-in via nspaASIO Phase F |
| ASIO latency | N/A (wineasio: 1 period, separate driver) | 1 period (same driver, same callback) |
| MIDI | ALSA sequencer | JACK MIDI (sub-period timestamps) |
| Format conversion | Scalar loops | SSE2 mono/stereo fast paths |
| Locking | pthread_mutex in timer loop | PI mutex (app side), lock-free (RT side) |
| Fast path | None | Per-channel double buffers (exclusive float32) |
| WASAPI+ASIO coexistence | Separate drivers, no coordination | Same JACK callback, same period |
Low-latency audio requires precise, low-overhead timing. NSPA makes several changes to Wine’s timing subsystem to reduce jitter and overhead.
Wine’s QPC calls monotonic_counter() which uses clock_gettime(CLOCK_BOOTTIME) (falling back to CLOCK_MONOTONIC). This is a vDSO-accelerated call on modern kernels (~26ns). Without vDSO it becomes a real syscall (~328ns, 12.5x slower). The preloader’s vDSO preservation port (Jinoh Kang series) ensures the vDSO is never deleted – if ASLR places it in a reserved address range, the preloader relocates it via mremap instead of removing it from the auxiliary vector. The TSC frequency is calibrated at init via /sys/devices/system/cpu/cpu*/cpufreq/base_frequency for accurate jiffies-to-TSC conversion.
static inline ULONGLONG monotonic_counter(void)
{
struct timespec ts;
if (!clock_gettime( CLOCK_BOOTTIME, &ts ))
return ts.tv_sec * (ULONGLONG)TICKSPERSEC + ts.tv_nsec / 100;
...
}
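The conversion from calibrated TSC ticks to QPC’s 100 ns units is pure arithmetic. A sketch, where tsc_freq_hz stands for the frequency NSPA calibrates at init (the split avoids 64-bit overflow):

```c
#include <stdint.h>

/* Convert raw TSC ticks to 100 ns units: ticks = tsc * 10^7 / freq.
 * Computed as quotient and remainder separately so the intermediate
 * multiplication cannot overflow 64 bits for realistic frequencies. */
static uint64_t tsc_to_ticks(uint64_t tsc, uint64_t tsc_freq_hz)
{
    return (tsc / tsc_freq_hz) * 10000000ull
         + (tsc % tsc_freq_hz) * 10000000ull / tsc_freq_hz;
}
```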
When an application calls NtSetTimerResolution (the backend for timeBeginPeriod), NSPA translates the requested resolution to a Linux timer slack value via prctl(PR_SET_TIMERSLACK, slack_ns). The default Linux timer slack is 50us; setting it to 1ns gives sub-millisecond timer precision for Sleep(), poll(), nanosleep(), and futex waits.
if (set)
{
unsigned long slack_ns = (unsigned long)res * 100; /* 100ns units -> ns */
if (slack_ns < 1) slack_ns = 1;
prctl( PR_SET_TIMERSLACK, slack_ns );
}
else
prctl( PR_SET_TIMERSLACK, 0 ); /* reset to default */
Why this matters: Many audio apps call timeBeginPeriod(1) expecting 1ms timer precision. Without timer slack adjustment, Linux’s default 50us slack on top of CFS scheduling jitter can turn a Sleep(1) into a 2-3ms delay. With PR_SET_TIMERSLACK set to match the requested resolution, timer expirations fire at their requested time.
Wine’s Sleep() implementation uses clock_nanosleep(CLOCK_MONOTONIC, ...) for high-precision delays, which benefits from the PREEMPT_RT kernel’s deterministic scheduling and the per-thread timer slack set above.
The kernel-mode ExSetTimerResolution API is forwarded from ntoskrnl.exe to the same NtSetTimerResolution path, ensuring that drivers setting timer resolution also benefit from the timer slack adjustment.
NSPA implements Windows VirtualAlloc(MEM_LARGE_PAGES) using Linux hugetlbfs, reducing TLB misses for large allocations common in audio sample buffers and plugin memory pools.
| Windows API | Linux Implementation | Page Size |
|---|---|---|
| VirtualAlloc(MEM_LARGE_PAGES) | mmap(MAP_HUGETLB \| MAP_LOCKED, ...) | 2 MB (default hugepage) |
| VirtualAlloc(MEM_LARGE_PAGES) with 1GB hint | mmap(MAP_HUGETLB \| MAP_HUGE_1GB \| MAP_LOCKED, ...) | 1 GB |
| QueryWorkingSetEx | PAGEMAP_SCAN ioctl / /proc/pid/pagemap | Reports LargePage flag |
Audio benefit: A typical VST plugin loads 50-500 MB of sample data. With 4KB pages, this requires 12,800-128,000 TLB entries. With 2MB pages, only 25-250 entries are needed – an order-of-magnitude reduction in TLB pressure. For systems with 1GB hugepages pre-allocated, a single TLB entry covers the entire sample bank.
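The arithmetic behind those counts:

```c
/* Pages (and hence TLB entries, ignoring TLB capacity) needed to map a
 * buffer: ceil(bytes / page_size). */
static unsigned long long pages_needed(unsigned long long bytes,
                                       unsigned long long page_size)
{
    return (bytes + page_size - 1) / page_size;
}
```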
NSPA also ORs MAP_LOCKED into the mmap flags, so large-page allocations are pinned in RAM.

Implementation: dlls/ntdll/unix/virtual.c, anon_mmap_alloc() / NtAllocateVirtualMemory().
Wine’s msvcrt provides the C runtime for Win32 applications, including memcpy, memmove, memchr, strlen, and memcmp. Upstream Wine uses scalar C implementations or minimal hand-written assembly. NSPA replaces these with SIMD-optimized implementations:
| Function | Implementation | Width | Fallback |
|---|---|---|---|
| memcpy | AVX _mm256_loadu_si256 / _mm256_storeu_si256 | 256-bit | SSE2 128-bit |
| memmove | AVX with overlap detection + reverse copy | 256-bit | SSE2 128-bit |
| memchr | SSE2 _mm_cmpeq_epi8 + _mm_movemask_epi8 | 128-bit | scalar |
| strlen | SSE2 _mm_cmpeq_epi8 null scan | 128-bit | scalar |
| memcmp | SSE2 _mm_cmpeq_epi8 + early exit | 128-bit | scalar |
Runtime CPU dispatch: At DLL init, memcpy/memmove check CPUID for AVX support and select the AVX path when available. SSE2 is the minimum baseline (guaranteed on x86_64). The dispatch cost is a single branch on a cached function pointer – zero overhead after init.
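As an illustration of the SSE2 pattern from the table, here is a minimal memchr sketch. It is not Wine’s actual string.c code – the real implementation additionally handles alignment and uses wider unrolling:

```c
#include <emmintrin.h>   /* SSE2 intrinsics */
#include <stddef.h>

/* Compare 16 bytes at a time with _mm_cmpeq_epi8, locate a hit with
 * _mm_movemask_epi8, and finish the remainder with a scalar loop. */
static const void *memchr_sse2(const void *s, int c, size_t n)
{
    const unsigned char *p = s;
    const __m128i needle = _mm_set1_epi8((char)c);
    while (n >= 16)
    {
        __m128i chunk = _mm_loadu_si128((const __m128i *)p);
        int mask = _mm_movemask_epi8(_mm_cmpeq_epi8(chunk, needle));
        if (mask) return p + __builtin_ctz(mask);  /* lowest set bit = first hit */
        p += 16; n -= 16;
    }
    for (; n; n--, p++)                            /* scalar tail, < 16 bytes */
        if (*p == (unsigned char)c) return p;
    return NULL;
}
```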
Why it matters for RT: These functions are called thousands of times per audio buffer cycle – buffer copies in WASAPI/ASIO, socket I/O data movement, string parsing in API calls. The v5 test results show measurable impact: +4.7% CS throughput, +27% baseline socket-io throughput, -7% process startup time.
Implementation: dlls/msvcrt/string.c (all five functions), dlls/msvcrt/math.c (AVX detection), dlls/msvcrt/msvcrt.h (avx_supported declaration).
| Version | Scope | Key Files |
|---|---|---|
| v1 | RT priority mapping: two-tier promotion with NSPA_RT_PRIO as the client RT ceiling and linear scaling below it | dlls/ntdll/unix/thread.c, server/thread.c |
| v1.1 | Wineserver self-promotion to SCHED_FIFO at NSPA_RT_PRIO-16 (below entire RT band) | server/thread.c |
| v1.2 | Cross-thread promotion via HANDLE-to-unix_tid map; NSPA_RT_POLICY=TS conservative mode | dlls/ntdll/unix/thread.c |
| v1.5 | Shmem wineserver IPC (Torge Matthies forward-port); reduces wineserver round-trip latency | dlls/ntdll/unix/server.c, server/thread.c |
| v2.0 | librtpi vendoring into libs/librtpi/; PI-aware mutexes/condvars for Unix-side Wine internals | libs/librtpi/*, include/rtpi.h |
| v2.3 | CS-PI: FUTEX_LOCK_PI on CRITICAL_SECTION via repurposed LockSemaphore field | dlls/ntdll/sync.c, dlls/ntdll/unix/sync.c |
| kernel | NTSync PI: 5 patches (raw_spinlock, priority-ordered queues, PI boost v2, uring_fd, kmalloc fix) | drivers/misc/ntsync.c |
| kernel | Client-side NTSync: anonymous sync object creation without wineserver round-trip | dlls/ntdll/unix/sync.c |
| io_uring | Phase 1+2: sync poll replacement + async file I/O server bypass + TLS pool allocator | dlls/ntdll/unix/io_uring.c, dlls/ntdll/unix/file.c |
| io_uring | Phase 3: socket I/O bypass via E2 bitmap + ALERTED-state interception (sync+overlapped) | dlls/ntdll/unix/socket.c, server/sock.c |
| kernel | ntsync uring_fd extension: CQE wakeup in ntsync waits + PI kmalloc pre-alloc fix | drivers/misc/ntsync.c |
| msvcrt | AVX memcpy/memmove, SSE2 memchr/strlen/memcmp with runtime CPU dispatch | dlls/msvcrt/string.c, dlls/msvcrt/math.c |
| sync | SRW lock spin phase (256 iters, skipped for RT threads matching Windows behavior) | dlls/ntdll/sync.c |
| sync | pi_cond requeue-PI upgrade (FUTEX_WAIT_REQUEUE_PI / FUTEX_CMP_REQUEUE_PI) | libs/librtpi/rtpi.h |
| sync | Win32 condvar PI: FUTEX_WAIT_REQUEUE_PI for RtlSleepConditionVariableCS (condvar→mutex mapping table + 3 new syscalls) | dlls/ntdll/sync.c, dlls/ntdll/unix/sync.c, dlls/ntdll/ntsyscalls.h |
| Variable | Default | Description |
|---|---|---|
| NSPA_RT_PRIO | unset (dormant) | Master RT switch + ceiling FIFO priority |
| NSPA_RT_POLICY | FF | FF/RR/TS for lower RT band [16..30] |
| NSPA_SRV_RT_PRIO | NSPA_RT_PRIO-16 | Wineserver FIFO priority override |
| NSPA_SRV_RT_POLICY | FF | Wineserver scheduler policy |
| File | Role |
|---|---|
| dlls/ntdll/unix/thread.c | Tier 1 RT promotion, v1.2 cross-thread map, priority mapping |
| server/thread.c | Tier 2 RT promotion, wineserver self-promotion, env var parsing |
| dlls/ntdll/sync.c | CS-PI fast path (PE side): TID CAS, recursion handling |
| dlls/ntdll/unix/sync.c | CS-PI slow path, NTSync wait/create, client-side handles, QPC, timer slack |
| dlls/ntdll/unix/virtual.c | Large/huge pages, MAP_HUGETLB allocation |
| dlls/ntdll/unix/server.c | Shmem wineserver IPC (v1.5) |
| dlls/winejack.drv/jack.c | JACK audio driver (WASAPI backend) |
| dlls/winejack.drv/jackmidi.c | JACK MIDI driver |
| libs/librtpi/ | Vendored PI-aware mutex/condvar library |
| dlls/ntdll/unix/io_uring.c | io_uring per-thread ring, pool allocator, Phase 1-3 submit/complete |
| dlls/msvcrt/string.c | SSE2 memchr, strlen, memcmp |
| dlls/msvcrt/mem.c | AVX/SSE2 memcpy, memmove with runtime CPU dispatch |
| libs/librtpi/pi_cond.c | Requeue-PI condition variable for unix-side consumers (FUTEX_WAIT_REQUEUE_PI) |
| dlls/ntdll/sync.c + unix/sync.c | Win32 condvar PI: requeue-PI for RtlSleepConditionVariableCS (3 new syscalls + mapping table) |
| dlls/combase/apartment.c | CoWaitForMultipleHandles correctness rewrite |
| drivers/misc/ntsync.c | NTSync kernel driver (in kernel tree) |
Wine-NSPA Architecture Reference | Generated 2026-04-16 | Wine 11.6 + NSPA RT patchset