Wine-NSPA 11.6 | Kernel 6.19.11-rt1-1-nspa (PREEMPT_RT) | 2026-04-15 | Author: Jordan Johnston
Upstream Wine uses a single-threaded wineserver that communicates with client processes over Unix domain sockets. Every SERVER_START_REQ / SERVER_END_REQ pair requires a full round-trip: client writes request to socket, wineserver’s epoll loop wakes, dispatches, writes reply, client reads reply.
Wine-NSPA v1.5 (Torge Matthies forward-port) adds per-thread shared memory between each client thread and the wineserver. Instead of socket I/O, requests and replies are written to a shared page, and futexes signal readiness. The wineserver spawns a per-client dispatcher pthread that watches each thread’s futex and dispatches requests under global_lock.
This eliminates the socket round-trip but introduces two new challenges:
- The wineserver is now multi-threaded (dispatchers + main epoll loop), requiring global_lock serialization
- RT client threads can be blocked waiting for a reply from a normal-priority dispatcher, creating priority inversion
| Aspect | Upstream Wine | Wine-NSPA Shmem |
|---|---|---|
| IPC mechanism | Unix socket write/read | Shared memory page + futex |
| Server threading | Single-threaded epoll loop | Multi-threaded: epoll + per-client dispatchers |
| Serialization | None (single thread) | global_lock (PI-aware pi_mutex_t) |
| Syscalls per request | 2 socket I/O + epoll wake | 1 futex wake + 2 sched_setscheduler |
| Priority inversion | Not applicable | Mitigated by PI boost (v2.5) |
| Context switches | Client -> wineserver -> client | Client -> dispatcher (same process) |
Each client thread that connects to the wineserver gets a dedicated dispatcher pthread on the server side. The dispatcher watches the thread’s shmem futex and processes requests under global_lock.
Request flow:
1. Client: wine_server_call() writes the request to shmem
2. Client: futex_wake() – wake the dispatcher
3. Client: futex_wait(futex, 1) – sleeps until reply
4. Dispatcher: takes global_lock, dispatches the request
5. Dispatcher: futex_wake() – wake the client

When an RT client thread (SCHED_FIFO) sends a request, it must boost the dispatcher pthread so the dispatcher runs at sufficient priority to process the request promptly. Without boosting, CFS could delay the dispatcher behind dozens of other normal-priority threads.
Client (SCHED_FIFO:80):
1. Write request to shmem
2. CAS futex 0->1, futex_wake (wake dispatcher)
3. Read dispatcher TID from shmem (atomic load, cached by dispatcher)
4. sched_getscheduler(TID) + sched_getparam(TID) – save original
5. sched_setscheduler(TID, SCHED_FIFO, client_prio) – BOOST
6. futex_wait(futex, 1) – sleep
Dispatcher (now boosted):
7. Wakes at boosted priority
8. global_lock.lock() (PI mutex – if contended, holder also boosted)
9. Dispatch request, write reply
10. CAS futex 1->0, futex_wake (wake client)
11. global_lock.unlock()
Client (wakes):
12. Read reply
13. sched_setscheduler(TID, original_policy, original_prio) – UNBOOST
Cost: 2 syscalls per RT request – one sched_setscheduler to boost, one to unboost. Down from 4 in v2.4 (v2.5 caches the dispatcher's scheduler state, eliminating the per-request sched_getscheduler + sched_getparam).
Between steps 3 and 5, another client’s unboost could lower the dispatcher’s priority. The window is small (~100ns on modern hardware) and the consequence is a one-request delay (the next request re-boosts). Accepted as a practical trade-off vs kernel-managed PI (see appendix).
server/fd.c:global_lock serializes all wineserver state access between the main epoll loop and the per-client dispatcher pthreads. Converted from pthread_mutex_t to pi_mutex_t (FUTEX_LOCK_PI), providing kernel-managed priority inheritance.
When a boosted dispatcher (SCHED_FIFO:80) contends with a normal-priority thread holding global_lock, the kernel’s rt_mutex PI chain automatically boosts the holder. This is transitive: if the holder is itself blocked on another PI mutex, the boost propagates through the chain.
| Files Changed | What |
|---|---|
| server/fd.c | pthread_mutex_t global_lock -> pi_mutex_t global_lock |
| server/file.h | Declaration + #include <rtpi.h> |
| server/thread.c | All lock/unlock calls updated |
Status: Implemented and tested 2026-04-15. REJECTED – deadlocks on SMP.
Replace the manual sched_setscheduler PI boost with FUTEX_LOCK_PI on a shared pi_lock. The dispatcher would hold pi_lock while idle; the client’s futex_lock_pi would atomically boost the dispatcher through the kernel’s rt_mutex. Zero race window, zero sched_* syscalls.
The dispatcher must unlock pi_lock (to wake the client) then re-acquire it (for the next request). On SMP, if the dispatcher is faster than the client:
1. Dispatcher: UNLOCK_PI – no waiters (client hasn’t blocked yet), futex cleared to 0
2. Dispatcher: LOCK_PI – re-acquires immediately (futex was 0)
3. Dispatcher: WAIT(notify) – sleeps, holding pi_lock
4. Client: LOCK_PI – blocks (dispatcher holds it) → deadlock

Root cause: FUTEX_LOCK_PI can’t serve as both the reply notification and the PI mechanism. The unlock/re-acquire sequence has a window in which ownership transfer to the client isn’t guaranteed.
The v2.5 manual boost (2 syscalls per RT request) remains correct. A kernel-managed solution would require a combined notify+PI atomic operation that doesn’t exist in the Linux futex API.