Wine 11.6 + NSPA RT patchset | Kernel patch 1010 + Gamma dispatcher Phase 2/3 | 2026-04-30 | Author: Jordan Johnston
This page documents the landed aggregate-wait slice in Wine-NSPA: kernel patch 1010 (NTSYNC_IOC_AGGREGATE_WAIT) plus the first userspace consumer that uses it, namely dispatcher Phase 2 (the per-process, dispatcher-owned io_uring) and Phase 3 (the gamma dispatcher waits on the channel, the uring eventfd, and the shutdown eventfd, and drains CQEs inline on the same RT thread).
Status: shipped and validated. NSPA_AGG_WAIT is default-on as of 2026-04-29. The 2026-04-30 follow-ons (NSPA_ENABLE_ASYNC_CREATE_FILE=1 and, on 1011 kernels, NSPA_TRY_RECV2=1) now ship on top of this same foundation.
Aggregate-wait is the kernel-side wait primitive that lets the gamma dispatcher block on request traffic, deferred-completion wakeups, and teardown wakeups in one place while keeping receive, CQE drain, and reply signaling on the same RT thread.
That is the architectural role of patch 1010 plus the dispatcher Phase
2/3 userspace work. Gamma already gave Wine-NSPA the correct
request-side priority inheritance story: client threads do
CHANNEL_SEND_PI, the kernel enqueues by priority, and the wineserver
dispatcher runs the handler at the right effective priority. What gamma
lacked was the matching async completion-side wait primitive.
The first async-completion prototype used the wineserver main thread as the CQE drain site. That proved the basic mechanism but broke the more important invariant: the thread that received the request was no longer the thread that completed and replied to it.
Patch 1010 and the accompanying dispatcher restructure fix that. The dispatcher now owns all three parts of the async path:
- the gamma channel receive
- the io_uring ring and its CQE drain
- the CHANNEL_REPLY back to the client

The same RT thread handles the full lifecycle.
| Layer | Landed change | Why it matters |
|---|---|---|
| Kernel | NTSYNC_IOC_AGGREGATE_WAIT | One wait covers NTSync objects plus pollable fds |
| Kernel | Channel notify-only support inside aggregate-wait | Lets the dispatcher block on the channel without consuming the entry in the aggregate ioctl itself |
| Kernel | Follow-up PI fixes (072bfee) | Stable boost propagation for aggregate-waiting dispatchers |
| Userspace | struct nspa_uring_instance per process | Dispatcher-local ring + eventfd + fixed pending pool |
| Userspace | struct nspa_dispatcher_ctx | Single owner for channel fd, shutdown eventfd, and ring lifetime |
| Userspace | Aggregate-wait dispatcher loop | Same-thread request receive, CQE drain, and reply |
This page stays focused on the 1010 / Phase 2 / Phase 3 slice itself: the kernel wait primitive, the dispatcher-owned ring, and the same-thread completion/reply invariant that those pieces established.
The later follow-ons are intentionally not expanded here. Phase 4 async
create_file is a later consumer of the same dispatcher-owned ring, and
1011 TRY_RECV2 is a later queue-drain optimization on top of the
already-landed dispatcher shape. Those are part of the current shipped
system, but they belong in the pages that track the dispatcher hot path
and current production state:
gamma-channel-dispatcher,
io_uring-architecture, and
current-state.
The rejected shape was the earlier prototype's design: the wineserver main thread as the CQE drain site. The problem was not that the code path was impossible. The problem was that it was the wrong ownership model for an RT request path: the thread that received a request was not the thread that completed and replied to it. That mismatch showed up exactly where expected: real workloads tolerated it structurally, but timing-sensitive application behavior did not.
Patch 1010 adds NTSYNC_IOC_AGGREGATE_WAIT: a heterogeneous wait that combines
NTSync object sources, pollable fd sources, and an optional absolute deadline.
The dispatcher is the first consumer, but the primitive is intentionally general.
```c
struct ntsync_aggregate_source {
	__u32 type;           /* NTSYNC_AGG_OBJECT | NTSYNC_AGG_FD */
	__u32 events;         /* FD source: POLLIN / POLLOUT / POLLERR / POLLHUP */
	__u64 handle_or_fd;   /* ntsync object handle, or unix fd */
};

struct ntsync_aggregate_wait_args {
	__u32 nb_sources;
	__u32 reserved;
	__u64 sources;        /* user pointer to struct ntsync_aggregate_source[] */
	struct __kernel_timespec deadline; /* CLOCK_MONOTONIC ABSTIME or {0,0} */
	__u32 fired_index;
	__u32 fired_events;
	__u32 flags;
	__u32 owner;
};

#define NTSYNC_AGG_OBJECT        0x1
#define NTSYNC_AGG_FD            0x2
#define NTSYNC_AGG_MAX           64
#define NTSYNC_AGG_FLAG_REALTIME 0x1
#define NTSYNC_AGG_TIMEOUT       0xFFFFFFFFu

#define NTSYNC_IOC_AGGREGATE_WAIT _IOWR('N', 0x95, struct ntsync_aggregate_wait_args)
```
The channel source is notify-only inside the aggregate wait: the ioctl reports readiness but does not consume the entry, so the dispatcher then issues CHANNEL_RECV2 to consume the actual entry. On kernels without patch 1010, the ioctl returns -ENOTTY on the first aggregate-wait attempt and the dispatcher permanently falls back to the legacy direct CHANNEL_RECV2 loop for that dispatcher. That last point is operationally important: public docs can describe the new default without pretending the code lost its rollback path.
Phase 2 (the per-process, dispatcher-owned io_uring) did not make handlers async by itself. It put the ring and its state in the correct ownership domain first.
The old global-ring direction was abandoned. The landed design keeps one
nspa_uring_instance per gamma channel / per Wine process, stored alongside the
dispatcher context.
```c
struct nspa_dispatcher_ctx {
	int channel_fd;
	int shutdown_efd;
	struct nspa_uring_instance uring;
};
```
Key properties:
- the ring, its eventfd, and the fixed pending pool are dispatcher-local, not global state
- shutdown_efd gives the aggregate-wait path an explicit teardown wakeup

The dispatcher now waits on three sources: the channel, the uring eventfd (while the ring is active), and the shutdown eventfd. The loop is now:

1. Build the source set {channel, uring eventfd if active, shutdown eventfd}.
2. Block in NTSYNC_IOC_AGGREGATE_WAIT.
3. Channel fired: CHANNEL_RECV2 the entry and run the handler.
4. Uring eventfd fired: nspa_uring_drain(), then CHANNEL_REPLY for each completed request.
5. shutdown_efd fired: exit the loop and tear down.
Userspace still handles two older-kernel shapes:
- Aggregate-wait ioctl returns -ENOTTY: the dispatcher permanently falls back to the direct CHANNEL_RECV2 loop.
- CHANNEL_RECV2 returns -ENOTTY: the dispatcher falls back to legacy CHANNEL_RECV.

That logic is runtime feature detection, not a release ladder.
| Item | Value |
|---|---|
| Kernel module srcversion | 10124FB81FDC76797EF1F91 |
| Wine userspace state | Phase 2 + Phase 3 landed; Phase 4 create_file now uses the same ring |
| Default gate | NSPA_AGG_WAIT=1 |
| Opt-out | NSPA_AGG_WAIT=0 |
| Follow-on gates on top of this base | NSPA_ENABLE_ASYNC_CREATE_FILE=1; NSPA_TRY_RECV2=1 on 1011 kernels |
| Test | Result |
|---|---|
| test-aggregate-wait | 9/9 PASS |
| channel-PI propagation sub-test | PASS |
| 1k mixed-concurrency stress | PASS |
| 30k stress + full native ntsync suite | PASS, dmesg clean |
| PE matrix | 24 PASS / 0 FAIL / 0 TIMEOUT, including dispatcher-burst |
| Ableton level 2/3 with NSPA_AGG_WAIT=1 | PASS |
| Phase 3 default-on under Ableton | PASS |
The follow-up kernel fixes in 072bfee matter here. The first 1010 cut exposed exactly
the kind of PI edge that the dispatcher cannot tolerate: an aggregate-waiting dispatcher
must still be visible to SEND_PI wake/boost logic and must not be woken before the new
boost state is established. The production module includes those corrections.
The public decomposition plan still has queued work in front of it, but the aggregate-wait story is no longer purely hypothetical.
Already shipped: the 1010 aggregate-wait primitive, the Phase 2 per-process ring, the Phase 3 same-thread dispatcher loop (default-on since 2026-04-29), and the follow-ons built on them (Phase 4 async create_file; 1011 TRY_RECV2).

Still queued: the remaining items of the public decomposition plan, tracked on their own pages.

So the right interpretation is that the wait primitive and its ownership model are production-proven, and the queued work consumes them rather than re-deriving them.
That is a better architectural state than the earlier plan assumed. Future work no longer needs to prove the syscall shape from scratch; it can build on a production consumer.
- wine/server/nspa/shmem_channel.c — dispatcher context, aggregate-wait loop, shutdown path
- wine/server/nspa/uring.h — per-process nspa_uring_instance public surface
- ntsync-patches/1010-ntsync-aggregate-wait.patch — aggregate-wait kernel patch
- 1879e2c — ntsync 1010 first cut
- 072bfee — SEND_PI any_waiters fallback + wake-after-boost reorder
- 8cc157c — userspace Phase 2 per-process uring infrastructure
- f21c6e1 — userspace Phase 3 aggregate-wait dispatcher
- b36e36d — Phase 3 default-on
- wine/nspa/docs/session-handoff-20260429-phase-4.md