Wine-NSPA – Shmem IPC Architecture

Wine-NSPA 11.6 | Kernel 6.19.11-rt1-1-nspa (PREEMPT_RT) | 2026-04-15 | Author: Jordan Johnston

Table of Contents

  1. Overview
  2. Upstream vs NSPA Comparison
  3. Dispatcher Architecture
  4. PI Boost Protocol (v2.5)
  5. Global Lock PI
  6. Appendix: Rejected FUTEX_LOCK_PI Redesign

1. Overview

Upstream Wine uses a single-threaded wineserver that communicates with client processes over Unix domain sockets. Every SERVER_START_REQ / SERVER_END_REQ pair requires a full round-trip: client writes request to socket, wineserver’s epoll loop wakes, dispatches, writes reply, client reads reply.

Wine-NSPA v1.5 (Torge Matthies forward-port) adds per-thread shared memory between each client thread and the wineserver. Instead of socket I/O, requests and replies are written to a shared page, and futexes signal readiness. The wineserver spawns a per-client dispatcher pthread that watches each thread’s futex and dispatches requests under global_lock.

This eliminates the socket round-trip but introduces two new challenges:

  1. The wineserver is now multi-threaded (dispatchers + main epoll loop), requiring global_lock serialization
  2. RT client threads can be blocked waiting for a reply from a normal-priority dispatcher, creating priority inversion


2. Upstream vs NSPA Comparison

Upstream Wine (socket IPC):

  Client: SERVER_START_REQ
    write() to Unix socket
  Wineserver (single-threaded):
    epoll_wait() -> fd ready
    dispatch request (no lock needed)
  Client:
    read() reply from socket
    SERVER_END_REQ

  Cost per server request:
    2 socket I/O syscalls (write + read)
    1 epoll wakeup + context switch to wineserver
  No multi-threading, no PI needed.
  But: every request pays the full socket round-trip.

Wine-NSPA (shmem IPC + PI):

  Client:
    write to shmem page
    CAS futex 0->1, wake
    PI boost dispatcher (v2.5)
  Dispatcher pthread (boosted to client's prio):
    global_lock.lock() (PI)
    dispatch + write reply
  Client:
    read from shmem
    PI unboost dispatcher

  Cost per server request:
    0 socket syscalls (shmem is mapped, no I/O)
    1 futex wake + 2 sched_setscheduler (PI boost/unboost)
  Aspect               | Upstream Wine                  | Wine-NSPA Shmem
  ---------------------|--------------------------------|-----------------------------------------------
  IPC mechanism        | Unix socket write/read         | Shared memory page + futex
  Server threading     | Single-threaded epoll loop     | Multi-threaded: epoll + per-client dispatchers
  Serialization        | None (single thread)           | global_lock (PI-aware pi_mutex_t)
  Syscalls per request | 2 socket I/O + epoll wake      | 1 futex wake + 2 sched_setscheduler
  Priority inversion   | Not applicable                 | Mitigated by PI boost (v2.5)
  Context switches     | Client -> wineserver -> client | Client -> dispatcher (same process)

3. Dispatcher Architecture

Each client thread that connects to the wineserver gets a dedicated dispatcher pthread on the server side. The dispatcher watches the thread’s shmem futex and processes requests under global_lock.

Wineserver Process – Per-Client Dispatcher Model

  Main epoll loop:
    epoll_pwait2() – fd events (file, socket), async lifecycle mgmt
    global_lock.lock() for each event

  Dispatcher pthread (thread 1):
    futex_wait(shmem->futex, 0)
    wakes -> global_lock.lock()
    dispatch(req) -> write reply
    CAS futex 1->0, wake client
  Dispatcher pthread (thread 2): same pattern, different shmem page
  Dispatcher pthread (thread N): one dispatcher per client thread

  global_lock (pi_mutex_t):
    Serializes all server state access
    FUTEX_LOCK_PI -> kernel rt_mutex
    PI: highest-prio dispatcher wins; holder boosted if contended

  Client processes:
    Client threads 1..N, each with its own shmem page + futex
    RT thread (SCHED_FIFO) PI boosts its dispatcher

Dispatcher Lifecycle

  1. Client thread calls wine_server_call() with a request
  2. Request data written to the thread’s shared memory page
  3. Client CAS’s the shmem futex from 0 -> 1, then futex_wake()
  4. Client PI-boosts the dispatcher (v2.5 protocol)
  5. Client futex_wait(futex, 1) – sleeps until reply
  6. Dispatcher wakes, acquires global_lock, dispatches the request
  7. Dispatcher writes reply to shmem, CAS futex 1 -> 0, futex_wake()
  8. Client wakes, reads reply, PI-unboosts the dispatcher

4. PI Boost Protocol (v2.5)

When an RT client thread (SCHED_FIFO) sends a request, it must boost the dispatcher pthread so the dispatcher runs at sufficient priority to process the request promptly. Without boosting, CFS could delay the dispatcher behind dozens of other normal-priority threads.

Protocol

Client (SCHED_FIFO:80):
  1. Write request to shmem
  2. CAS futex 0->1, futex_wake (wake dispatcher)
  3. Read dispatcher TID from shmem (atomic load, cached by dispatcher)
  4. Obtain original policy/prio – from TLS cache in v2.5 (sched_getscheduler + sched_getparam in v2.4)
  5. sched_setscheduler(TID, SCHED_FIFO, client_prio) – BOOST
  6. futex_wait(futex, 1) – sleep
Dispatcher (now boosted):
  7. Wakes at boosted priority
  8. global_lock.lock() (PI mutex – if contended, holder also boosted)
  9. Dispatch request, write reply
  10. CAS futex 1->0, futex_wake (wake client)
  11. global_lock.unlock()
Client (wakes):
  12. Read reply
  13. sched_setscheduler(TID, original_policy, original_prio) – UNBOOST

Syscall Cost: v2.4 vs v2.5

v2.4: 4 syscalls per RT request

  sched_getscheduler()        <- eliminated by v2.5
  sched_getparam()            <- eliminated by v2.5
  sched_setscheduler(BOOST)
    ... dispatch ...
  sched_setscheduler(UNBOOST)

  ~2-4us overhead (4 sched syscalls)

v2.5: 2 syscalls per RT request

  sched_setscheduler(BOOST)
    ... dispatch ...
  sched_setscheduler(UNBOOST)

  TLS cache: nspa_rt_cached_policy + nspa_rt_cached_prio, set once at
  thread RT init and read on every boost – eliminates the get* calls.

  ~1-2us overhead (2 sched syscalls)

Why not FUTEX_LOCK_PI? (attempted and REJECTED)

  The dispatcher sleeps on the notify futex, not a PI futex, so FUTEX_LOCK_PI
  on a separate word requires an unlock + re-acquire between dispatches. Under
  SMP contention this deadlocks: client A holds the PI lock, client B boosts
  the dispatcher, and the dispatcher blocks on the PI lock held by A. Manual
  boost avoids this by never creating lock dependencies between clients and
  the dispatcher.

2 syscalls per RT request: sched_setscheduler (boost) + sched_setscheduler (unboost). Down from 4 in v2.4 (v2.5 caches the scheduler state, eliminating sched_getscheduler + sched_getparam).

Race Window

Between steps 3 and 5, another client’s unboost could lower the dispatcher’s priority. The window is small (~100ns on modern hardware) and the consequence is a one-request delay (the next request re-boosts). Accepted as a practical trade-off vs kernel-managed PI (see appendix).


5. Global Lock PI

server/fd.c:global_lock serializes all wineserver state access between the main epoll loop and the per-client dispatcher pthreads. Converted from pthread_mutex_t to pi_mutex_t (FUTEX_LOCK_PI), providing kernel-managed priority inheritance.

When a boosted dispatcher (SCHED_FIFO:80) contends with a normal-priority thread holding global_lock, the kernel’s rt_mutex PI chain automatically boosts the holder. This is transitive: if the holder is itself blocked on another PI mutex, the boost propagates through the chain.

  Files Changed   | What
  ----------------|------------------------------------------------------
  server/fd.c     | pthread_mutex_t global_lock -> pi_mutex_t global_lock
  server/file.h   | Declaration + #include <rtpi.h>
  server/thread.c | All lock/unlock calls updated

6. Appendix: Rejected FUTEX_LOCK_PI Redesign

Status: Implemented and tested 2026-04-15. REJECTED – deadlocks on SMP.

Concept

Replace the manual sched_setscheduler PI boost with FUTEX_LOCK_PI on a shared pi_lock. The dispatcher would hold pi_lock while idle; the client’s futex_lock_pi would atomically boost the dispatcher through the kernel’s rt_mutex. Zero race window, zero sched_* syscalls.

Why It Failed

The dispatcher must unlock pi_lock (to wake the client) then re-acquire it (for the next request). On SMP, if the dispatcher is faster than the client:

  1. Dispatcher UNLOCK_PI – no waiters (client hasn’t blocked yet), futex cleared to 0
  2. Dispatcher LOCK_PI – re-acquires immediately (futex was 0)
  3. Dispatcher WAIT(notify) – sleeps, holding pi_lock
  4. Client LOCK_PI – blocks (dispatcher holds it)
  5. Deadlock: client waits for pi_lock, dispatcher waits for notify

Root cause: FUTEX_LOCK_PI can’t serve as both reply notification and PI mechanism. The unlock/re-acquire has a window where ownership transfer to the client isn’t guaranteed.

Conclusion

The v2.5 manual boost (2 syscalls per RT request) remains correct. A kernel-managed solution would require a combined notify+PI atomic operation that doesn’t exist in the Linux futex API.