This page documents the Wine-NSPA ntsync kernel overlay that backs PI waits, gamma channels, and aggregate-wait. The companion Wine-side half lives on NTSync Userspace Sync.
NTSync is a Linux kernel driver (drivers/misc/ntsync.c, /dev/ntsync) that implements Windows NT synchronization primitives – mutexes, semaphores, and events – directly in the kernel. Upstream Wine uses it to replace the wineserver-mediated sync path for these objects, eliminating cross-process round-trips for wait/wake operations.
For Wine-NSPA, upstream ntsync is necessary but insufficient. The upstream driver uses FIFO waiter queues, has no priority inheritance, and uses spinlock_t for the per-object lock – which becomes a sleeping rt_mutex on PREEMPT_RT. None of those characteristics is acceptable for an RT audio workload where the audio callback must wait deterministically on Wine’s primitives without inheriting unbounded inversion latency.
Wine-NSPA carries a kernel overlay that extends upstream ntsync.c in three broad layers: RT/PI foundations (priority-ordered waiter queues, mutex-owner priority inheritance, RT-safe allocation discipline), the NSPA-private channel plus aggregate-wait IPC surface, and the hardening and cache-isolation work layered on top.
The current overlay on kernel 6.19.11-rt1-1-nspa includes the
dedicated wait-queue cache plus SLAB_NO_MERGE across all four ntsync
caches (see Section 14). The feature-by-feature
detail below keeps the patch numbers for traceability, but the public
reading order is by capability rather than by patch label.
This doc is the design and implementation reference for that kernel half:
what each carried feature adds, what bug it closes, how it preserves NT
semantics, and how it interacts with obj_lock and PREEMPT_RT.
Wine-NSPA does not fork ntsync. The patches are diffs against upstream
drivers/misc/ntsync.c and apply cleanly in series
1003 -> 1004 -> 1005 -> 1006 -> 1007 -> 1008 -> 1009 -> 1010 -> 1011 ->
1012 -> 1013 -> 1014 -> 1015. They live in wine-rt-claude/ntsync-patches/
as standalone unified diffs. The kernel build (linux-nspa) applies the
stack at PKGBUILD time; the resulting .ko ships as part of the
kernel package.
The patch numbering (1003- through 1015-) is local to NSPA. It bears no relationship to upstream NTSync revisions or any LKML series.
| # | Patch | Purpose | LOC |
|---|---|---|---|
| 1003 | PI primitives | raw_spinlock obj_lock, priority-ordered waiter queues, mutex owner PI boost, per-task tracking | ~600 |
| 1004 | Channel object | New NTSYNC_TYPE_CHANNEL with CREATE, SEND_PI, RECV, REPLY ioctls | ~530 |
| 1005 | Thread-token | Per-channel (tid -> token) registry + RECV2 ioctl, eliminates dispatcher userspace lookup | ~340 |
| 1006 | RT alloc-hoist | Hoists 6 sites of kmalloc/kfree out of raw_spinlock_t (RT-illegal); pi_work pool | ~750 |
| 1007 | Channel exclusive recv | wake_up_all priority-inversion fix: 3-LOC wait_event_interruptible_exclusive swap | ~3 |
| 1008 | EVENT_SET_PI deferred boost | Closes fast-path race where consumer takes obj_lock first, sees signaled, returns unboosted | ~80 |
| 1009 | channel_entry refcount UAF | KASAN-caught REPLY-vs-SEND_PI cleanup race; refcount_t on ntsync_channel_entry | ~15 |
| 1010 | Aggregate-wait | NTSYNC_IOC_AGGREGATE_WAIT: heterogeneous object+fd wait, channel notify-only support | ~400 |
| 1011 | Channel TRY_RECV2 | NTSYNC_IOC_CHANNEL_TRY_RECV2: non-blocking RECV2 for post-dispatch burst drain | ~30 |
| 1012 | Channel recv field-snapshot UAF fix | Snapshot popped-entry fields under obj_lock before unlock, closes RECV/RECV2 vs sender-cleanup slab UAF | ~15 |
| 1013 | Dedicated kmem_caches | ntsync_event_pi / ntsync_channel_entry / ntsync_pi_owner -> own kmem_caches with SLAB_HWCACHE_ALIGN | ~120 |
| 1014 | SEND_PI lockless target scan | list_empty_careful fast-path skips wq->lock round-trip on empty waiter queues | ~10 |
| 1014a | kmem_cache_free NULL guard | Site-2089 pending_pi.new_ep free is NULL-guarded; closes cache_from_obj deref under SLAB_FREELIST_HARDENED | ~3 |
| 1015 | Wait-queue dedicated cache | struct ntsync_q -> own kmem_cache (≤16 entries + kmalloc fallback); SLAB_NO_MERGE retro-correction across all 4 ntsync caches | ~120 |
Patches 1003-1006, 1010, 1011, 1013, and 1015 are feature/infrastructure work; 1007-1009, 1012, 1014, and 1014a are minimal surgical fixes for specific KASAN- or trace-confirmed bugs (1014 is also a measurable IRQ-off window reduction on the audio hot path). The distinction matters: Section 16 discusses why.
Wine-NSPA’s ntsync exposes four object types via /dev/ntsync (one character device opened once per Wine process; object creation returns FDs).
| Type | Win32 primitive | Created via | Wait via | Wake / signal via |
|---|---|---|---|---|
| Mutex | CreateMutex, WaitForSingleObject | NTSYNC_IOC_CREATE_MUTEX | NTSYNC_IOC_WAIT_ANY / WAIT_ALL | NTSYNC_IOC_MUTEX_UNLOCK |
| Semaphore | CreateSemaphore, ReleaseSemaphore | NTSYNC_IOC_CREATE_SEM | NTSYNC_IOC_WAIT_ANY / WAIT_ALL | NTSYNC_IOC_SEM_RELEASE |
| Event | CreateEvent, SetEvent, ResetEvent | NTSYNC_IOC_CREATE_EVENT | NTSYNC_IOC_WAIT_ANY / WAIT_ALL | NTSYNC_IOC_EVENT_SET / _RESET / _PULSE / _SET_PI |
| Channel | (no Win32 equivalent – NSPA-private IPC) | NTSYNC_IOC_CREATE_CHANNEL | NTSYNC_IOC_CHANNEL_RECV / _RECV2 / _TRY_RECV2 | NTSYNC_IOC_CHANNEL_SEND_PI / _REPLY |
Mutex / semaphore / event are upstream concepts; their semantics map 1:1 to Win32. The mutex tracks an owner TID for WAIT_ABANDONED semantics and abandoned-recovery; the semaphore is a counted resource pool; the event has both manual-reset and auto-reset variants plus the NSPA-private EVENT_SET_PI for cross-thread priority intent.
The channel is wholly NSPA-private. It does not map to any Win32
primitive. It is a transport for Wine-NSPA’s wineserver request-reply
fast path – a kernel-mediated alternative to the legacy
futex+manual-sched_setscheduler shm IPC. Channels do not participate
in generic WAIT_ANY / WAIT_ALL; they are accessed through their own
ioctls, and patch 1010 adds a separate aggregate-wait registration
path that can observe channel readiness without consuming the entry.
On post-1011 kernels the current consumer shape is aggregate-wait, then
CHANNEL_RECV2, then TRY_RECV2 until the ready queue is empty.
The driver’s central is_signaled() predicate (called from try_wake_any / try_wake_all) returns differently per type:
| Type | Signaled when |
|---|---|
| Mutex | count == 0 (unowned) or owner matches current TID |
| Semaphore | count > 0 |
| Event | signaled == true |
| Channel | always false (channels never wake WAIT_ANY/ALL) |
The channel case in is_signaled() is a deliberate hard-false: any
caller that arrives via WAIT_ANY/ALL with a channel FD is misusing
the API and the wait will time out. That remains true after 1010. The
aggregate-wait path is different: it registers the channel as a
notify-only source and returns “channel fired” to userspace, after
which userspace follows with CHANNEL_RECV2 to consume the actual
entry.
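As a sketch (not the shipped code – upstream expresses this as a switch inside is_signaled(), and the channel arm is the NSPA addition), the predicate looks roughly like:

static bool is_signaled(struct ntsync_obj *obj, __u32 owner)
{
	switch (obj->type) {
	case NTSYNC_TYPE_SEM:
		return obj->u.sem.count > 0;
	case NTSYNC_TYPE_MUTEX:
		/* unowned, or a recursive re-acquire by the owning TID */
		if (obj->u.mutex.owner && obj->u.mutex.owner != owner)
			return false;
		return obj->u.mutex.count < UINT_MAX;
	case NTSYNC_TYPE_EVENT:
		return obj->u.event.signaled;
	case NTSYNC_TYPE_CHANNEL:
		return false;	/* hard-false: channels never wake WAIT_ANY/ALL */
	}
	return false;
}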
The 1003 patch (originally three logical patches 1001/1002/1003, collapsed in this section for clarity) established the RT baseline that all subsequent patches build on.
The driver has three locks. NSPA classifies them explicitly for PREEMPT_RT:
- obj->lock – raw_spinlock_t, per-object; protects object state + waiter lists
- dev->wait_all_lock – rt_mutex, device-wide; serializes wait-all setup
- dev->boost_lock – raw_spinlock_t, device-wide; protects the boosted_owners list
raw_spinlock_t keeps true spin semantics on PREEMPT_RT (does not become an rt_mutex). obj->lock is held only across short pointer-only state updates: rb-tree manipulation, list manipulation, signaled-flag flip, owner-TID write. dev->boost_lock is held only across boosted_owners list updates plus a single sched_setattr_nocheck() call. Both critical sections are short, bounded, and never sleep – the PREEMPT_RT contract.
dev->wait_all_lock is rt_mutex, not raw_spinlock_t, because wait-all setup is long: it walks all named objects to be waited on, may copy_from_user the FD array, and may need to take per-object locks. A raw spinlock is the wrong primitive for that. The rt_mutex carries PI – a high-priority thread blocked on wait_all_lock boosts whoever holds it.
The obj_lock() fast path acquires only obj->lock. When obj->dev_locked is set (another thread is doing a wait-all on this object), obj_lock() falls back to acquiring wait_all_lock first. This avoids ABBA deadlocks between per-object and device-wide locks.
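A minimal sketch of that fallback, modeled on upstream's ntsync_lock_obj() (the raw_spinlock_t classification is the NSPA change; exact field names may differ):

static bool ntsync_lock_obj(struct ntsync_device *dev, struct ntsync_obj *obj)
{
	bool all;

	raw_spin_lock(&obj->lock);		/* fast path: per-object lock only */
	all = obj->dev_locked;
	if (unlikely(all)) {
		/* wait-all in progress: back off and retake in canonical
		 * order – device-wide wait_all_lock first, then obj->lock */
		raw_spin_unlock(&obj->lock);
		rt_mutex_lock(&dev->wait_all_lock);
		raw_spin_lock(&obj->lock);
	}
	return all;	/* tells the unlock side whether to drop wait_all_lock */
}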
Upstream ntsync uses list_add_tail() to append waiters: FIFO order. NSPA replaces this with ntsync_insert_waiter(), which performs a sorted insertion based on the kernel-internal task->prio (lower numeric value = higher scheduling priority).
static void ntsync_insert_waiter(struct ntsync_q_entry *new_entry,
				 struct list_head *head)
{
	struct ntsync_q_entry *entry;

	/* Walk from the head (highest priority first).  Insert before the
	 * first waiter whose numeric prio is larger (= lower scheduling
	 * priority); equal-prio waiters are never displaced, which keeps
	 * FIFO order within a priority level. */
	list_for_each_entry(entry, head, node) {
		if (new_entry->q->task->prio < entry->q->task->prio) {
			/* list_add_tail() relative to entry->node links
			 * new_entry immediately before entry */
			list_add_tail(&new_entry->node, &entry->node);
			return;
		}
	}
	list_add_tail(&new_entry->node, head);	/* lowest prio: append */
}
Same-priority waiters maintain FIFO order within their priority level. try_wake_any_*() walks from the head, so the highest-priority satisfiable waiter wakes first. This restores NT semantics (highest-priority waiter wins) and is strictly stronger than upstream’s FIFO.
When an RT thread (e.g. SCHED_FIFO prio 80) waits on a mutex held by a SCHED_OTHER thread (prio 120 in kernel terms), the holder is preempted by every running RT thread and time-sliced by CFS against every other normal thread. The RT waiter’s bounded-latency guarantee is violated.
ntsync_pi_recalc(obj, pi_work) (line 424 of the production source) handles this. Whenever a mutex’s wait list changes (insert, wake, unlock) it scans both any_waiters and all_waiters for the highest-priority waiter, then boosts the owner’s scheduling attributes via sched_setattr_nocheck() to match. Per-task tracking (struct ntsync_pi_owner, anchored in dev->boosted_owners) saves the original attributes once and counts how many of the task’s owned mutexes are contributing boosts. Restore happens only when the count drops to zero.
The PI boost design has three v2 lessons baked in:
| Bug | v1 behaviour | v2 fix |
|---|---|---|
| Multi-object PI corruption | Single global orig_attr overwritten when 2nd mutex boosted | Per-task ntsync_pi_owner with boost_count |
| Zero PI for WaitAll | all_waiters not scanned | Scan both any_waiters and all_waiters |
| Stale normal_prio thrash | owner->normal_prio mutates after boost -> oscillation | Compare against saved orig_normal_prio from tracker |
The ntsync_pi_owner struct is the unit of bookkeeping. The pool/cleanup pattern that 1006 introduces (Section 6) is the unit of RT-safe allocation for that struct.
EVENT_SET_PI was originally introduced in 1003 as the cross-thread priority-intent primitive: an RT thread sets an event, and along with the signal it carries a (policy, prio) boost that the kernel applies to the event’s first waiter. Wine-NSPA uses this for the audio-thread -> dispatcher SendMessage bypass: the audio callback sets a queue event with its own RT priority, and the dispatcher pthread is woken at that priority.
The original design walked event->any_waiters under obj_lock at EVENT_SET_PI time and applied the boost to the head waiter. This had a fast-path race that 1008 closes – see Section 8.
ntsync_pi_owner is allocated lazily on first boost and freed only when the last contributing object releases. Between the first removal and the last, the owner is conservatively over-boosted: it runs at too-high priority briefly, never too-low. That is the safe direction; under-boost would leak inversion. The lazy lifetime also means owner_task is resolved lazily on the first unlock (where current is the actual Win32-owning thread), since at create time current is the wineserver, not the eventual owner.
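A sketch of the tracker this paragraph describes; only the struct name, the boost_count role, and the orig_normal_prio comparison are stated in the text – the remaining field names are assumptions:

struct ntsync_pi_owner {
	struct list_head node;		/* on dev->boosted_owners, under boost_lock */
	struct task_struct *task;	/* resolved lazily at first unlock */
	struct sched_attr orig_attr;	/* saved once, restored when count hits 0 */
	int orig_normal_prio;		/* compared instead of live normal_prio */
	int boost_count;		/* owned mutexes currently contributing */
};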
1004-ntsync-channel.patch adds a new object type, NTSYNC_TYPE_CHANNEL. A channel is a bounded, kernel-side priority-ordered request/reply mailbox. It exists to replace Wine-NSPA’s user-space futex + manual sched_setscheduler shm-IPC fast path between client processes and the wineserver.
Wine’s wineserver protocol is fundamentally a request/reply RPC. Each client thread sends a request, blocks for the reply, and resumes. The legacy fast path used a process-shared futex on a request slot plus a sched_setscheduler call from the sending audio thread to lift the dispatcher pthread’s priority. That worked but had three problems:
the legacy path had no kernel-visible priority ordering; the sending thread had to lift the dispatcher's priority by calling sched_setscheduler on it explicitly; and the token handoff was stale-racy on thread death. A kernel-mediated channel solves all three: the kernel priority-orders pending requests, applies the receiver boost itself (reusing EVENT_SET_PI's drain-on-wait pattern), and owns the per-thread token lifetime.
The channel is purely a transport, not a protocol. The wineserver still drives the request/reply contract; the kernel multiplexes and priority-orders, and never reorders within a single sender (each sender blocks for reply, so per-thread ordering is preserved).
Four ioctls, all on a channel FD obtained via NTSYNC_IOC_CREATE_CHANNEL:
| ioctl | Caller | Effect |
|---|---|---|
| NTSYNC_IOC_CREATE_CHANNEL | wineserver | Create channel with max_depth. Returns FD. |
| NTSYNC_IOC_CHANNEL_SEND_PI | client thread | Enqueue (prio, payload_off, reply_off); boost recv'er; sleep for reply. |
| NTSYNC_IOC_CHANNEL_RECV | dispatcher pthread | Pop highest-prio entry; auto-boost current to that priority. |
| NTSYNC_IOC_CHANNEL_REPLY | dispatcher pthread | Wake the sender of entry_id; drain receiver boost. |
The payload_off and reply_off fields are opaque to the kernel; conventionally they are indices into a per-process shared-memory region the client and wineserver both map. The kernel transports the cookies; user space interprets them.
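A hedged userspace sketch of one round-trip under that convention; the arg-struct layout below is an assumption (only the ioctl names and the (prio, payload_off, reply_off) tuple appear in the text):

/* client thread: one request/reply round-trip over the channel */
struct ntsync_channel_send_pi_args args = {
	.policy      = SCHED_FIFO,	/* carried to the receiver as a boost */
	.prio        = 80,
	.payload_off = req_slot,	/* opaque cookie: shm index of the request */
	.reply_off   = rep_slot,	/* opaque cookie: shm index for the reply */
};

/* blocks until the dispatcher issues NTSYNC_IOC_CHANNEL_REPLY for us */
if (ioctl(channel_fd, NTSYNC_IOC_CHANNEL_SEND_PI, &args) < 0)
	return -errno;
/* the reply is now visible at shm[rep_slot]; the ioctl return is the barrier */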
That is the 1004 base interface. The current production surface layers
1005’s CHANNEL_RECV2 on top for thread-token return, then 1011’s
CHANNEL_TRY_RECV2 for non-blocking post-dispatch drain.
The channel object’s per-instance state lives in obj->u.channel:
struct {
struct rb_root pending; /* PENDING entries (prio DESC, seq ASC) */
struct list_head dispatched; /* DISPATCHED entries (REPLY can find by id) */
atomic64_t next_id;
atomic64_t next_seq;
__u32 depth; /* current PENDING count */
__u32 max_depth;
wait_queue_head_t recv_wq; /* blocked receivers */
struct hlist_head thread_regs[64]; /* added by 1005 */
} channel;
Each entry is a struct ntsync_channel_entry:
struct ntsync_channel_entry {
struct rb_node rb; /* in pending rb-tree */
struct list_head list; /* in dispatched list */
__u64 id, seq;
__u32 prio, policy;
__u64 payload_off, reply_off;
__u32 sender_tid;
enum ntsync_channel_state state; /* PENDING | DISPATCHED */
bool replied;
wait_queue_head_t wq; /* sender sleeps on this */
__u64 thread_token; /* added by 1005 */
refcount_t refcnt; /* added by 1009 */
};
The rb-tree key is (prio DESC, seq ASC): higher priority sorts first; ties break by enqueue order. channel_pending_insert() returns true iff the entry became the new tree minimum – i.e. it would be popped next. That return value drives the speculative-boost decision in SEND_PI.
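A sketch of that insertion under the stated key, using the kernel rbtree API; the helper name matches the text, the body is reconstructed:

/* key: (prio DESC, seq ASC); returns true iff the new entry became the
 * leftmost node, i.e. the one RECV would pop next */
static bool channel_pending_insert(struct rb_root *root,
				   struct ntsync_channel_entry *e)
{
	struct rb_node **link = &root->rb_node, *parent = NULL;
	bool leftmost = true;

	while (*link) {
		struct ntsync_channel_entry *cur =
			rb_entry(*link, struct ntsync_channel_entry, rb);

		parent = *link;
		if (e->prio > cur->prio ||
		    (e->prio == cur->prio && e->seq < cur->seq)) {
			link = &(*link)->rb_left;
		} else {
			link = &(*link)->rb_right;
			leftmost = false;
		}
	}
	rb_link_node(&e->rb, parent, link);
	rb_insert_color(&e->rb, root);
	return leftmost;
}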
SEND_PI, step by step:
1. Capture the sender's (policy, prio). Pre-allocate e and new_ep (the boost tracking entry) with GFP_KERNEL outside any lock – slab on RT cannot be called under raw_spinlock_t.
2. obj_lock(ch). Reject with -EAGAIN if depth >= max_depth. Insert into the pending rb-tree; bump depth. Note whether this entry is the new minimum.
3. obj_unlock(ch).
4. If prio is set, peek the recv_wq head. Take a get_task_struct reference under wq->lock, then call apply_event_pi_boost() to boost that receiver to (policy, prio).
5. wake_up(&ch->recv_wq) – wakes exactly the head receiver (1007 made this exclusive).
6. Sleep on e->wq until e->replied is true or a signal is pending.
7. Cleanup: obj_lock(ch), detach e from whichever list/tree it's on, obj_unlock(ch). refcount_dec_and_test(&e->refcnt); kfree if last ref (1009).
The cleanup path covers the case where the sender was interrupted (signal). The entry might still be PENDING (rb-tree) or DISPATCHED (list); e->state dispatches the removal correctly. depth is decremented only in the PENDING branch – DISPATCHED entries no longer count against max_depth.
RECV / RECV2, step by step:
1. drain_event_pi_boosts(dev, current) – release any boost left over from a prior RECV cycle.
2. Pre-allocate new_ep outside any lock.
3. obj_lock(ch). While pending is empty: obj_unlock, wait_event_interruptible_exclusive(recv_wq, !empty) (1007 made this exclusive), obj_lock again.
4. Pop the rb-tree minimum; move it to the dispatched list; decrement depth.
5. (RECV2 only:) e->thread_token = channel_lookup_token(ch, e->sender_tid). See Section 5.
6. obj_unlock(ch).
7. If e->prio is set, auto-boost current to (e->policy, e->prio) for the handler duration via apply_event_pi_boost(dev, current, ...). The boost releases at the next RECV's drain, or at REPLY's drain.
8. Copy (entry_id, payload_off, reply_off, sender_tid, prio[, thread_token]) to user space.
In the post-1011 dispatcher path, userspace follows the first successful RECV2 with TRY_RECV2 after each reply until the channel returns empty.
REPLY, step by step:
1. obj_lock(ch). Walk the dispatched list for entry_id. If not found or already replied: -ENOENT.
2. e->replied = true.
3. refcount_inc(&e->refcnt) (1009 – keep the entry alive across wake_up_all).
4. obj_unlock(ch).
5. wake_up_all(&e->wq) – wakes the blocked sender. This runs outside obj_lock because wq's internal lock is spinlock_t (becomes rt_mutex on PREEMPT_RT) and cannot nest under our raw_spinlock_t.
6. drain_event_pi_boosts(dev, current) – the handler is done; drop the receiver's auto-boost.
7. refcount_dec_and_test(&e->refcnt); kfree if last ref (1009).
Kernel ioctl syscall entry/exit is a full memory barrier, so payload visibility from sender -> receiver and reply visibility from receiver -> sender is naturally serialised: the sender's copy_from_user of the payload completed before SEND_PI returns from the syscall handler; the receiver's copy_to_user happens-before RECV returns; the receiver's writes to the reply region happen-before REPLY returns; and the sender's copy_from_user of the reply happens-after SEND_PI's wake.
The kernel does not promise ordering across senders – it priority-orders, but a SCHED_OTHER sender behind a SCHED_FIFO sender will wait. Cross-thread ordering was never guaranteed under the prior per-thread dispatcher pthread shape, so this is strictly stronger semantically (no thread can starve while a higher-prio thread is waiting). Within a single sender, ordering is preserved: each SEND_PI blocks for reply, so back-to-back sends from the same TID are serialised.
obj_lock sections in SEND_PI / RECV / REPLY are bounded by tree height. With max_depth = 1024, that is 10 rb-tree comparisons. Zero allocation under lock. No memory copies under lock (the copy_to_user happens after obj_unlock).
A channel can only be freed when both pending and dispatched are empty; otherwise senders or dispatchers still hold the file open via the syscall ref. ntsync_free_obj() WARN_ONs either non-empty list at free time – a useful canary if user space ever leaks a channel FD with active entries.
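A sketch of that canary as it might sit in ntsync_free_obj() (the shape is assumed; only the WARN_ON-on-non-empty behaviour is stated above):

case NTSYNC_TYPE_CHANNEL:
	/* both must be empty: live senders/dispatchers hold the fd open */
	WARN_ON(!RB_EMPTY_ROOT(&obj->u.channel.pending));
	WARN_ON(!list_empty(&obj->u.channel.dispatched));
	channel_drain_thread_regs(obj);		/* 1005, see Section 5 */
	break;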
Once the channel was in production, a perf capture (2026-04-26) showed ~10% of dispatcher CPU sitting in a userspace get_thread_from_id() lookup inside the gamma dispatcher's hot loop. Every received request needed to map sender_tid -> struct thread * to dispatch. This patch eliminates that lookup by stamping a wineserver-supplied opaque token onto each entry at RECV time.
The wineserver registers (tid, token) per channel via a new ioctl. The kernel stores the mapping in a 64-bucket hash on the channel (hlist_head thread_regs[64], keyed by tid & 63, protected by the existing obj_lock). At RECV2 time the kernel looks up the token for e->sender_tid and returns it in extended args.
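A sketch of the lookup side; channel_lookup_token and ntsync_thread_reg are named elsewhere on this page, the member names are assumptions:

static __u64 channel_lookup_token(struct ntsync_obj *ch, __u32 tid)
{
	struct ntsync_thread_reg *reg;

	/* caller holds obj_lock; 64 buckets keyed by tid & 63 */
	hlist_for_each_entry(reg, &ch->u.channel.thread_regs[tid & 63], node) {
		if (reg->tid == tid)
			return reg->token;
	}
	return 0;	/* unregistered: userspace falls back */
}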
struct ntsync_channel_recv2_args {
__u64 entry_id;
__u64 payload_off;
__u64 reply_off;
__u32 sender_tid;
__u32 prio;
__u64 thread_token; /* OUT: registered token (0 if unregistered) */
};
Two new ioctls:
| ioctl | Effect |
|---|---|
| NTSYNC_IOC_CHANNEL_REGISTER_THREAD | Install or replace (tid, token) |
| NTSYNC_IOC_CHANNEL_DEREGISTER_THREAD | Evict entry for tid (idempotent) |
Plus NTSYNC_IOC_CHANNEL_RECV2 – same as RECV but returns an extra thread_token field. The older RECV ioctl still exists in the UAPI, but current Wine-NSPA userspace requires RECV2 and no longer ships the old fallback ladder.
The first version of this patch did the hash lookup in SEND_PI and stamped thread_token onto the entry there. v2 moved the lookup to RECV2. Two reasons:
the lookup cost moves off the sender's SEND_PI hot path onto the dispatcher; and a registration that disappears before dispatch degrades gracefully – the RECV2-time lookup returns token = 0, and userspace falls back to get_thread_from_id (which will fail on a dead TID, and the request gets dropped by the existing logic).
The hash bucket count is fixed at 64 (no resize, no rhashtable). For a typical Wine process with dozens to a few hundred threads, that gives single-digit average chain lengths – well under the rb-tree key comparison cost in SEND_PI/RECV.
The wineserver enforces:
registration of each client thread's (tid, token) before that thread can send (before the init_first_thread reply that signals the client may issue requests), and deregistration when the thread dies. Together these ensure RECV2 always sees a non-zero token for a still-live thread. A momentarily-zero token (if registration races a fast first send) yields a userspace fallback that completes correctly – it is only a perf regression, not a correctness one.
channel_drain_thread_regs() on free

When a channel is freed, any leftover (tid, token) registrations are dropped. By construction the channel is unreachable at ntsync_free_obj() time (no senders, no dispatchers can have an FD), so no concurrent access is possible – a single pass through the buckets, kfreeing each ntsync_thread_reg.
Old RECV entries still carry thread_token = 0 (initialized in kzalloc), so older consumers can continue using the legacy shape if they exist. Current Wine-NSPA userspace, however, assumes RECV2/TRY_RECV2 and resolves sender threads from the returned token on the normal path.
This is a safety patch, not a feature: it fixes six sites in the driver where slab kzalloc/kfree was being called under raw_spinlock_t on PREEMPT_RT – which is illegal. The bug was latent until 2026-04-26, when an Ableton workload hard-froze the host with a clean kernel oops.
After installing the first thread-token ntsync.ko build, Ableton hard-froze the host 13 minutes into a session:
BUG: kernel NULL pointer dereference, address: 0x9a
RIP: ___slab_alloc+0x316 (xor (%rbx,%rdx,1),%rax RBX=0x3a)
Call: __kmalloc_cache_noprof <- ntsync_obj_ioctl+0x427 [ntsync]
Comm: Ableton Web Con PREEMPT_{RT,(lazy)}
Classic SLUB freelist corruption.
obj->lock and dev->boost_lock are both raw_spinlock_t. On PREEMPT_RT, SLUB’s per-CPU fast path uses local_lock_t, which is spinlock_t – a sleeping lock under PREEMPT_RT (confirmed in include/linux/local_lock_internal.h). So kzalloc / kfree under any raw_spinlock_t is unsafe on RT, including GFP_ATOMIC (the GFP flag gates reclaim, not the local_lock).
This is a mechanically verifiable rule: CONFIG_DEBUG_ATOMIC_SLEEP will splat any sleeping function called from a non-sleepable context. The bug was not caught by that infrastructure only because the production kernel ships without it for performance reasons; the rule itself is unambiguous.
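The rule in miniature (illustrative only, not driver code):

/* BROKEN on PREEMPT_RT: SLUB's fast path takes a local_lock_t
 * (a sleeping spinlock_t on RT) – even with GFP_ATOMIC: */
raw_spin_lock(&obj->lock);
p = kzalloc(sizeof(*p), GFP_ATOMIC);	/* may sleep: illegal here */
raw_spin_unlock(&obj->lock);

/* RT-safe shape: allocate first, publish under the lock with
 * pointer-only operations, free after the unlock: */
p = kzalloc(sizeof(*p), GFP_KERNEL);	/* outside any raw lock */
raw_spin_lock(&obj->lock);
/* ... list_add()/rb_link_node() only ... */
raw_spin_unlock(&obj->lock);
kfree(victim);				/* deferred free, outside the lock */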
Six sites in ntsync.c violated this rule:
| # | Function | Line | Issue |
|---|---|---|---|
| 1 | ntsync_pi_recalc | 345 | kzalloc(GFP_ATOMIC) under raw |
| 2 | ntsync_pi_recalc | 409 | kfree under boost_lock |
| 3 | ntsync_pi_recalc | 417 | kfree under caller's obj->lock |
| 4 | ntsync_pi_drop | 441 | kfree under boost_lock |
| 5 | ntsync_channel_register_thread | 1614 | kfree under obj_lock |
| 6 | ntsync_channel_deregister_thread | 1639 | kfree under obj_lock |
Sites 1-4 had been latent since the 1003 PI patch landed; 5-6 were new in 1005 (thread-token registration). The Ableton lockup was almost certainly triggered by 5 or 6: T2 thread-token registration is always-on when channel + kernel support are present, and Ableton boot creates dozens of threads -> dozens of register/deregister calls -> poisoned freelist 13 minutes in. Sites 1-4 had likely also caused several previous unexplained host lockups in the earlier msg-ring, paint-cache, and instrumentation-related lockup series.
The fix introduces a stack-resident struct ntsync_pi_work that the caller pre-allocates and finishes outside any raw lock:
struct ntsync_pi_work {
struct list_head new_po_pool; /* pre-allocated; consumed on demand */
struct list_head to_free_list; /* removed entries to free post-unlock */
};
Four helpers:
void ntsync_pi_work_init(w); /* INIT_LIST_HEAD x2 */
void ntsync_pi_work_prealloc(w); /* kzalloc + list_add to pool, OUTSIDE locks */
struct ntsync_pi_owner *ntsync_pi_work_take_new(w); /* pointer-only list_del under raw */
void ntsync_pi_work_finish(w); /* kfree pool leftovers + to_free_list */
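Hedged sketches of the helper bodies implied by those declarations (assuming ntsync_pi_owner carries a list_head named node):

void ntsync_pi_work_prealloc(struct ntsync_pi_work *w)
{
	struct ntsync_pi_owner *po = kzalloc(sizeof(*po), GFP_KERNEL);

	if (po)	/* alloc failure is tolerated: the boost is skipped */
		list_add(&po->node, &w->new_po_pool);	/* OUTSIDE raw locks */
}

struct ntsync_pi_owner *ntsync_pi_work_take_new(struct ntsync_pi_work *w)
{
	struct ntsync_pi_owner *po;

	if (list_empty(&w->new_po_pool))
		return NULL;		/* caller skips the boost this cycle */
	po = list_first_entry(&w->new_po_pool, struct ntsync_pi_owner, node);
	list_del(&po->node);		/* pointer-only: legal under raw lock */
	return po;
}

void ntsync_pi_work_finish(struct ntsync_pi_work *w)
{
	struct ntsync_pi_owner *po, *tmp;

	/* all raw locks dropped by now: slab calls are legal again */
	list_for_each_entry_safe(po, tmp, &w->new_po_pool, node)
		kfree(po);
	list_for_each_entry_safe(po, tmp, &w->to_free_list, node)
		kfree(po);
}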
Lifecycle of a pi_owner via this struct:
kzalloc -> list_add to new_po_pool (caller, no lock)
consumed: list_del from pool, list_add to dev list (pi_recalc, raw)
removed: list_move from dev list to to_free_list (pi_recalc/_drop, raw)
kfree from new_po_pool + to_free_list (caller, no lock)
Empty pool is a non-fatal fallback: pi_recalc skips the boost (transient priority inversion until next op), matching the prior GFP_ATOMIC behaviour. The hot path stays one slab op per ioctl – just hoisted past the lock, so no extra latency.
Every ioctl entry that may invoke pi_recalc / pi_drop declares one of these on stack:
struct ntsync_pi_work pi_work;
ntsync_pi_work_init(&pi_work);
ntsync_pi_work_prealloc(&pi_work);
/* ... acquire raw locks, possibly call pi_recalc/pi_drop ... */
/* ... release all raw locks ... */
ntsync_pi_work_finish(&pi_work);
This pattern shows up in try_wake_any, try_wake_all_obj, release_mutex, wait_any, wait_all, event_set_pi, and several other entry points. Sites 5-6 (channel register/deregister) use a simpler local victim pointer pattern – a single removal per call doesn’t justify the pool.
Only observable difference: ntsync_pi_owner cleanup deferred by tens of nanoseconds past raw_spin_unlock. Mutex ownership transfers atomically with wake (cmpxchg unchanged). PI boost levels and stacking semantics unchanged. Channel priority ordering (DESC, seq ASC) unchanged. Token registration replace-or-insert unchanged. Wait-any/all wakeup ordering unchanged.
1006 is a prerequisite for honest stress-testing of the channel path. Without it, every register/deregister churn in a stress test was rolling SLUB freelist dice. With it, KASAN under PREEMPT_RT became a useful tool: any splat is a real bug, not slab dust. That is what made 1009 (the channel_entry refcount UAF) catchable.
One latent issue remains open: obj_lock() between prepare_to_wait and schedule in ntsync_channel_send_pi – rt_mutex_lock inside obj_lock would clobber TASK_INTERRUPTIBLE state if obj->dev_locked were set. Latent only: channels never participate in wait_all, so dev_locked is never set on channels. Safe today; tighten when convenient.
Bug: ntsync_channel_send_pi speculatively boosts recv_wq.head to the sender’s priority before wake_up(), but wake_up() was waking all non-exclusive waiters because wait_event_interruptible adds non-exclusive waiters by default. Non-head receivers could win the entry-pop race -> the boosted head was stranded with high priority and no work; the winner had low priority and the actual work. A real production priority inversion.
This was the plausible root cause of unexplained gamma-dispatcher lockups previously (and incorrectly) blamed on userspace patches.
- ret = wait_event_interruptible(ch->u.channel.recv_wq,
+ /* Exclusive wait: wake_up() in SEND_PI walks the recv_wq and
+ * stops at the first exclusive waiter. This makes the head
+ * (which SEND_PI speculatively boosted) the unique winner of
+ * the entry-pop race -- closes the priority-inversion window
+ * where a non-head receiver could pop the entry while the
+ * boosted head got stranded with high prio and no work. */
+ ret = wait_event_interruptible_exclusive(ch->u.channel.recv_wq,
!RB_EMPTY_ROOT(&ch->u.channel.pending));
Applied in both ntsync_channel_recv and ntsync_channel_recv2.
wake_up() is already exclusive-aware: it walks the wait queue and stops at the first exclusive waiter. So once both RECV and RECV2 register exclusive waiters, SEND_PI’s wake_up() wakes exactly the head – the boost target. The boost target becomes the unique race winner.
wait_event_interruptible_exclusive is a kernel primitive; it takes the wait queue lock, sets the waiter’s WQ_FLAG_EXCLUSIVE flag, and otherwise behaves identically to the non-exclusive variant. No new behaviour introduced; we just opted into the existing semantics.
Validation: test-channel-recv-exclusive 100/100 PASS (previously a deterministic hang, because the test was stale-coded around pre-1007 wake-all behaviour).

The rolled-back "Codex 1007-1011" patch series (Section 10) had attempted a much larger redesign of the channel path, including channel-rejection in setup_wait, cross-snapshot PI cleanup, and a pool/cleanup refactor of the channel allocations themselves. None of that was needed. Three lines sufficed.
Bug: the original EVENT_SET_PI design (Section 3) walked event->any_waiters under obj_lock at signal time and applied the boost to the head waiter. This missed any consumer that took obj_lock first, saw signaled=true and returned without queueing – the standard wait fast-path. Result: ~4% of EVENT_SET_PI calls under PREEMPT_RT debug-kernel scheduling silently failed to apply the boost. A real RT-correctness hole.
Thread A (consumer, fast path)          Thread B (signaler, EVENT_SET_PI)
obj_lock(event)
if (signaled) {                         kzalloc(new_ep)
    /* signaled=false set later */
    fast-path return (NO QUEUE)
}
obj_unlock(event)
                                        obj_lock(event)
                                        walk any_waiters: EMPTY
                                        target = NULL
                                        signaled = true
                                        obj_unlock(event)
                                        kfree(new_ep)  /* dropped! */
The signaler sets the event but has no target to boost; the consumer returns from wait_any having seen the signal but unboosted. The boost was lost.
This was hard to spot because most EVENT_SET_PI calls under PREEMPT_RT scheduling do find a queued waiter (the consumer hadn’t reached obj_lock yet). Only the fast-path race – consumer arrives just before signaler – silently dropped the boost. KASAN debug-kernel testing showed it as a ~4% flake rate on the test-event-set-pi test.
The fix flips ownership of the boost target. Instead of the signaler finding the target at EVENT_SET_PI time, the consumer applies the boost to itself at wait-return.
New per-event state in the event union:
struct {
u32 policy;
u32 prio;
struct ntsync_event_pi *new_ep; /* pre-allocated; consumer takes ownership */
} pending_pi;
Mechanism, in outline:
- The signaler stages (policy, prio, new_ep) on the event under obj_lock; it ALSO sets signaled=true and wakes any queued waiter.
- The consumer applies the boost to itself via consume_event_pi_boost() at wait-return. This is race-free: the consumer is by definition the task whose wait_any/wait_all returned with this event as the signaled obj.
- Staging is last-writer-wins: if EVENT_SET_PI is called twice without an intervening consumption, the earlier staged new_ep is freed (outside obj_lock – slab on RT).
- Plus a cleanup rule: ntsync_free_obj frees any leaked staging entry on object death (no leak if the event dies unconsumed).
Called from wait_any unqueue loop on the signaled obj if it is an event:
static void consume_event_pi_boost(struct ntsync_obj *event)
{
struct ntsync_event_pi *new_ep = NULL;
u32 policy = 0, prio = 0;
bool valid = false, all;
if (event->type != NTSYNC_TYPE_EVENT)
return;
all = ntsync_lock_obj(event->dev, event);
if (event->u.event.pending_pi.new_ep) {
new_ep = event->u.event.pending_pi.new_ep;
policy = event->u.event.pending_pi.policy;
prio = event->u.event.pending_pi.prio;
event->u.event.pending_pi.new_ep = NULL;
valid = true;
}
ntsync_unlock_obj(event->dev, event, all);
if (valid) {
if (!apply_event_pi_boost(event->dev, current,
policy, prio, new_ep))
kfree(new_ep);
}
}
The atomic capture-and-clear under obj_lock is the one-shot guarantee: the first consumer wins, subsequent consumers see new_ep == NULL and no-op. If EVENT_SET_PI is called again before consumption, the prior new_ep is freed under the same lock and replaced.
The new ntsync_event_set_pi:
new_ep = kzalloc(sizeof(*new_ep), GFP_KERNEL);
if (!new_ep) return -ENOMEM;
ntsync_pi_work_init(&pi_work);
ntsync_pi_work_prealloc(&pi_work);
all = ntsync_lock_obj(dev, event);
/* Stage the boost. Last-writer-wins. */
prior_new_ep = event->u.event.pending_pi.new_ep;
event->u.event.pending_pi.policy = args.policy;
event->u.event.pending_pi.prio = args.prio;
event->u.event.pending_pi.new_ep = new_ep;
/* Signal: identical to EVENT_SET. */
event->u.event.signaled = true;
if (all)
try_wake_all_obj(dev, event, &pi_work);
try_wake_any_event(event);
ntsync_unlock_obj(dev, event, all);
ntsync_pi_work_finish(&pi_work);
/* Free overwritten prior staging outside lock (slab on RT). */
kfree(prior_new_ep);
No more target = list_first_entry(...) walk under obj_lock. No more get_task_struct(target) ref management. The signaler just sets the event; whoever consumes it boosts themselves.
Resetting the event cancels the signal, so it must cancel any pending boost too:
prior_new_ep = event->u.event.pending_pi.new_ep;
event->u.event.pending_pi.new_ep = NULL;
ntsync_unlock_obj(dev, event, all);
kfree(prior_new_ep);
If the event dies unconsumed, free the staging entry:
if (obj->type == NTSYNC_TYPE_EVENT)
kfree(obj->u.event.pending_pi.new_ep);
Inside the wait_any unqueue loop, after the obj is unlocked but before put_obj:
if ((int)i == signaled && obj->type == NTSYNC_TYPE_EVENT)
consume_event_pi_boost(obj);
The signaled index identifies which obj actually woke this wait. We consume only on that obj – non-signaled objs in a multi-object wait have nothing to apply.
ntsync_wait_all cannot call consume_event_pi_boost because that helper takes the obj’s wait-all lock path (via ntsync_lock_obj), and the unqueue loop already holds wait_all_lock. The audio-callback path uses wait_any so this gap is rare in practice; revisit if cross-event boost across wait_all becomes a workload concern. Comment in source:
/* NSPA: TODO -- wait_all consumer hook for EVENT_SET_PI deferred
* boost. Cannot call consume_event_pi_boost here because it
* takes obj's wait-all lock path and we already hold
* wait_all_lock. Audio-callback path uses wait_any (handled in
* the wait_any unqueue), so this is rare in practice; revisit
* if cross-event boost becomes a workload concern. */
Validation:
- test-event-set-pi: 100/100 PASS (was a 4% flake rate).
- test-event-set-pi-stress 60s/8x8: 2.8M signaler ops + 3.4M waiter consumes, 596K boosts cleanly applied, zero KASAN/KCSAN splats, zero leaks (refcnt=0 post-stress), drain restores cleanly.

Cost: one extra atomic exchange under obj_lock per EVENT_SET_PI (the pending_pi store + signal flip), and one extra obj_lock/obj_unlock per consume. The latter is the only new path; it runs only if the event has staged PI, so on workloads that don't use EVENT_SET_PI it is a no-op (the pending_pi.new_ep == NULL check is one load).
Bug: KASAN-caught slab-use-after-free on ntsync_channel_entry under test-channel-stress 4x4 with thread-registration churn. REPLY’s wake_up_all on e->wq runs outside obj_lock (it must – wq’s internal lock is spinlock_t, becomes rt_mutex on PREEMPT_RT, can’t nest under our raw_spinlock_t). That creates a window where SEND_PI’s cleanup could kfree(e) between REPLY’s obj_unlock and REPLY’s wake_up_all reaching the freed wait queue.
BUG: KASAN: slab-use-after-free in do_raw_spin_lock+0x23c/0x270
Read of size 4 at addr ffff8882e30b2564 by task test-channel-st/51072
Call: __wake_up -> ntsync_obj_ioctl+0x8d5 [ntsync]
Allocated by task 51069: __kasan_kmalloc -> ntsync_obj_ioctl+0x941
Freed by task 51069: kfree -> ntsync_obj_ioctl+0x3e3c
Cache: kmalloc-256 (256-byte object), 248 bytes used.
Address is 100 bytes inside freed region.
Disassembly maps:
- +0x941 = kzalloc(sizeof(*e), GFP_KERNEL) in ntsync_channel_send_pi (size 0xf8 = 248 bytes).
- +0x8d5 = wake_up_all(&e->wq) in ntsync_channel_reply (a call to __wake_up(wq=rbx+0x60, mode=3=TASK_NORMAL, 0, 0)); offset 0x60 matches the wait_queue_head_t wq field in ntsync_channel_entry.
- +0x3e3c = kfree(e) at the tail of ntsync_channel_send_pi cleanup.

Thread A (SEND_PI sleeper)              Thread B (REPLY)
                                        obj_lock(ch)
                                        find e in dispatched
                                        e->replied = true
                                        obj_unlock(ch)
loop iter: prepare_to_wait
loop iter: obj_lock(ch)
loop iter: e->replied is true, break
finish_wait
obj_lock(ch); list_del(&e->list);
obj_unlock(ch)
kfree(e)                                wake_up_all(&e->wq)   <-- UAF
The wake_up_all outside obj_lock is necessary on PREEMPT_RT (wq’s internal lock cannot be taken under raw_spinlock obj_lock). But that creates the window where SEND_PI’s cleanup can free e between REPLY’s obj_unlock and REPLY’s wake_up_all.
Add refcount_t refcnt to struct ntsync_channel_entry. SEND_PI initializes it to 1 after queue insertion (the sleeping sender holds one ref). REPLY does refcount_inc under obj_lock before unlock, then wake_up_all, then refcount_dec_and_test+kfree-if-last. SEND_PI cleanup does refcount_dec_and_test+kfree-if-last. Whichever decrement reaches 0 frees.
Code addition is ~15 LOC:
struct ntsync_channel_entry {
...
refcount_t refcnt;
};
/* In SEND_PI, after successful queue insertion: */
refcount_set(&e->refcnt, 1); /* sleeper holds 1; REPLY will inc */
/* In SEND_PI cleanup, replacing kfree(e): */
if (refcount_dec_and_test(&e->refcnt))
kfree(e);
/* In REPLY, between obj_unlock and wake_up_all: */
e->replied = true;
refcount_inc(&e->refcnt);
obj_unlock(ch);
wake_up_all(&e->wq);
drain_event_pi_boosts(ch->dev, current);
if (refcount_dec_and_test(&e->refcnt))
kfree(e);
There was a previous “Codex 1007-1011” patch series (rolled back; see Section 10) that targeted this same bug class but bundled it with a number of unrelated audit-derived changes (REPLY-fake-on-copy-fail, channel-reject in setup_wait, cross-boost cleanup refactor). The core fix – refcount on the entry – was correct in that series. Everything else was speculative noise that introduced its own bugs.
This patch is just the refcount.
Validation:
- test-channel-stress 30s/4x4: 819,803 SEND_PI = 819,803 REPLY (perfect match), 974K register ops, 0 syscall errors, 0 KASAN/KCSAN splats, refcnt=0 post-stress.
- test-event-set-pi 20/20 PASS, test-channel-recv-exclusive 20/20 PASS (no regression on the Bugs 2/3 fixes).
- test-event-set-pi-stress 60s/8x8: 2.7M signaler + 3.5M waiter, drain OK, 0 splats.

A common alternative for this class of bug is to take a sleepable lock around the wake. We can't: the obj_lock that protects entry membership is raw_spinlock_t, and we cannot promote it to rt_mutex without losing the bounded-critical-section guarantee the rest of the driver depends on. A refcount on the entry is the textbook fix for "object outlives its containing-collection lifetime due to async finishers" – no lock-order changes, no protocol changes, just two incs and three dec_and_tests in the right places.
Patch 1010 adds the heterogeneous wait primitive that the rest of the
NSPA stack had been designing around: NTSYNC_IOC_AGGREGATE_WAIT.
The immediate consumer is the post-1010 gamma dispatcher. Instead of
blocking in direct CHANNEL_RECV2 forever, the dispatcher can wait
on NT sync objects, external fds, and the channel (as a notify-only
source) in one syscall, while still keeping channel PI visible.
struct ntsync_aggregate_source {
__u32 type;
__u32 events;
__u64 handle_or_fd;
};
struct ntsync_aggregate_wait_args {
__u32 nb_sources;
__u32 reserved;
__u64 sources;
struct __kernel_timespec deadline;
__u32 fired_index;
__u32 fired_events;
__u32 flags;
__u32 owner;
};
- WAIT_ANY and WAIT_ALL remain NT-object waits.
- Channels still never become WAIT_ANY participants; 1010 adds a separate notify-only path for them.
- Consuming the entry remains the job of CHANNEL_RECV2, which preserves the existing channel ownership and PI semantics.

1010 was not treated as a paper design or a future placeholder. It was validated with a dedicated native aggregate-wait suite:
The first result was the post-1009 base plus aggregate-wait and its PI-ordering follow-ups. The next overlay added burst drain on top, and the current overlay keeps both surfaces while adding the later hardening and cache-isolation work.
CHANNEL_TRY_RECV2

1011 adds NTSYNC_IOC_CHANNEL_TRY_RECV2, a non-blocking companion to
CHANNEL_RECV2. It does not replace aggregate-wait; it is the follow-on
that lets a woken dispatcher keep draining the ready list without paying
one more AGG_WAIT round-trip per queued entry.
That is a small kernel change, but it is exactly the shape the gamma dispatcher needs under bursty server-bound RPC load:
- one aggregate-wait wake
- one CHANNEL_RECV2
- TRY_RECV2 until the channel is empty

The ioctl is additive at the kernel interface level, but current Wine-NSPA userspace assumes it is present. The old sticky fallback ladder was retired once aggregate-wait became the project baseline.
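A hedged sketch of that dispatcher loop; the aggregate-wait plumbing is elided, and agg, channel_source_index, dev_fd, channel_fd, and dispatch_and_reply are hypothetical names standing in for the real userspace state:

for (;;) {
	struct ntsync_channel_recv2_args r;

	if (ioctl(dev_fd, NTSYNC_IOC_AGGREGATE_WAIT, &agg) < 0)
		continue;			/* EINTR, timeout, ... */
	if (agg.fired_index != channel_source_index)
		continue;			/* an fd/object source fired */

	/* one RECV2 consumes the entry that woke us */
	if (ioctl(channel_fd, NTSYNC_IOC_CHANNEL_RECV2, &r) == 0)
		dispatch_and_reply(&r);

	/* burst drain: keep popping without another AGG_WAIT round-trip;
	 * -EAGAIN means the ready queue is empty */
	while (ioctl(channel_fd, NTSYNC_IOC_CHANNEL_TRY_RECV2, &r) == 0)
		dispatch_and_reply(&r);
}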
Bug: in ntsync_obj_ioctl paths for NTSYNC_IOC_CHANNEL_RECV and
NTSYNC_IOC_CHANNEL_RECV2, the receiver popped a channel_entry *e
under obj_lock, then unlocked, then read e->fields (payload,
sender_pid, etc). Between the unlock and the field reads the sender
thread – parked in wait_event_interruptible – could be
signal-interrupted, run its cleanup path, and kfree(e) in that
window. The receiver then read freed memory. KASAN reproducibly caught
this on test-channel-stress and on real Ableton workloads.
The lock-drop is mandatory for the rest of the RECV/RECV2 path:
copy_to_user, the apply_event_pi_boost call, and the receiver
auto-boost cannot run under the raw_spinlock_t obj_lock. The fix
therefore narrows what the post-unlock path needs to read off e.
The fix: snapshot the needed fields under obj_lock before unlocking. Schematically (field and list names simplified relative to the shipped code):

/* Pre-1012 (broken): */
spin_lock(&obj->obj_lock);
e = list_first_entry_or_null(&channel->pending, ...);
if (!e) { spin_unlock(...); return -EAGAIN; }
list_del(&e->link);
spin_unlock(&obj->obj_lock);
/* RACE: sender can free e here */
copy_to_user(buf, e->payload, e->len);
/* Post-1012 (fixed): */
spin_lock(&obj->obj_lock);
e = list_first_entry_or_null(&channel->pending, ...);
if (!e) { spin_unlock(...); return -EAGAIN; }
list_del(&e->link);
/* SNAPSHOT under the lock */
local_payload = e->payload;
local_len = e->len;
local_pid = e->sender_pid;
spin_unlock(&obj->obj_lock);
copy_to_user(buf, &local_payload, local_len);
The 1009 fix used a refcount_t on the entry to keep it alive across
REPLY’s wake_up_all. 1012 does not. Refcount would have worked here
too, but it would have added two atomic ops (atomic_inc on entry,
atomic_dec_and_test on exit) to every channel RECV / RECV2
on the audio dispatcher’s critical chain. Snapshotting collapses the
lock-drop window to zero rather than extending the entry’s lifetime,
costs zero atomics, and keeps the recv hot path one cacheline
narrower. The snapshotted fields are small and well-bounded (a few
words).
A new test-channel-try-recv2-stress.c was added in the same change
as a gap-filler for patch 1011: TRY_RECV2 had no dedicated stress
test before this session.
Pre-1013 three ntsync allocation classes lived in the system kmalloc pool:
- struct ntsync_event_pi (120 bytes) -> kmalloc-128
- struct ntsync_channel_entry (192 bytes) -> kmalloc-192
- struct ntsync_pi_owner (120 bytes) -> kmalloc-128

That is functionally correct but architecturally weak for an RT-class
hot path: two 120B objects in kmalloc-128 sit back-to-back so the
tail of one and the head of the next can share a cacheline; an
ntsync object can neighbour a network struct or fs metadata in the
same kmalloc bucket; /proc/slabinfo lumps everything into
kmalloc-128; kmem_cache_shrink, SLAB_FREELIST_HARDENED, and
SLAB_HWCACHE_ALIGN cannot be applied to a subset of kmalloc-128.
Three dedicated caches, each sized exactly to the struct, each with
SLAB_HWCACHE_ALIGN:
ntsync_event_pi_cache = kmem_cache_create("ntsync_event_pi",
sizeof(struct ntsync_event_pi),
0, SLAB_HWCACHE_ALIGN, NULL);
ntsync_channel_entry_cache = kmem_cache_create("ntsync_channel_entry",
sizeof(struct ntsync_channel_entry),
0, SLAB_HWCACHE_ALIGN, NULL);
ntsync_pi_owner_cache = kmem_cache_create("ntsync_pi_owner",
sizeof(struct ntsync_pi_owner),
0, SLAB_HWCACHE_ALIGN, NULL);
All kzalloc / kfree callsites for the three structs are converted
to kmem_cache_alloc / kmem_cache_free. The conversion is
mechanical except for one subtle gotcha (see Section 13’s 1014a
follow-up).
Caches are constructed in ntsync_init in dependency order with
mirrored unwind labels, and destroyed in ntsync_exit in reverse
order:
- misc_register runs after all three caches are constructed – an ioctl can never run against a half-initialised module.
- misc_deregister runs before kmem_cache_destroy – all .release callbacks complete before any cache is torn down.
- .owner = THIS_MODULE, so the module refcount blocks unload while any fd is open.

What the dedicated caches buy:

- Hot fields (e.g. pending_pi.new_ep) land predictably in cacheline 0 of the object. No false sharing between ntsync objects.
- Isolation from kmalloc-128. All ntsync state lives in dedicated pools; coherence traffic stays inside the ntsync hot path.
- /proc/slabinfo and /sys/kernel/slab/ntsync_* expose per-cache objects, partial, and cpu_slabs directly.
- SLAB_FREELIST_HARDENED covers the dedicated caches as a unit on kernels built with that flag. Catches double-free and bad-pointer-free at the slab layer.
- No size penalty versus the old kmalloc-128 route on the 120B structs.

| cache | idle | drum-load | delta | size | pre-1013 home |
|---|---|---|---|---|---|
| ntsync_event_pi | 637 | 795 | +158 | 120B | kmalloc-128 |
| ntsync_pi_owner | 637 | 795 | +158 | 120B | kmalloc-128 |
| ntsync_channel_entry | 168 | 168 | 0 | 192B | kmalloc-192 |
| kmalloc-128 (system) | 2240 | 2240 | 0 | 128B | n/a |
158 new event-PI staging pairs (one event_pi + one paired
pi_owner) absorbed cleanly in the dedicated caches; kmalloc-128
stayed flat – isolation under real load. SLUB internal state moved
in the expected direction: partial slabs filled (8 -> 2), per-CPU
slabs went up (18 -> 24), matching “hot path picks up CPU-local
allocations”.
1013 has no dependency on 1014 and vice versa; the patches are separately revertable.
SEND_PI target scan

In ntsync_channel_send_pi, before staging the boost on a target
waiter, the code scans the channel’s wait_queue_head_t to pick a
target. Pre-1014 that scan acquired wq->lock
(spin_lock_irqsave – still raw on PREEMPT_RT here because it is the
wait-queue’s own lock, not obj_lock) even when the queue was empty.
The empty case is the common one for an audio dispatcher under
steady load: most SEND_PI fires hit a channel with no parked
waiters. That is a wasted IRQ-disable plus spinlock round-trip on the
audio thread’s hot path.
Replace the unconditional lock+scan with a list_empty_careful peek
first:
/* Pre-1014: */
spin_lock_irqsave(&wq->lock, flags);
list_for_each_entry(...) { ... }
spin_unlock_irqrestore(&wq->lock, flags);
/* Post-1014: */
if (list_empty_careful(&wq->head)) {
/* fall through to any_waiters fallback path; no lock taken */
goto no_target;
}
spin_lock_irqsave(&wq->lock, flags);
/* same as before */
spin_unlock_irqrestore(&wq->lock, flags);
Why this is safe:

- list_empty_careful uses smp_load_acquire and is documented as appropriate for "lockless check, then maybe lock" patterns.
- All wq->head mutators (wait_event_*, prepare_to_wait, finish_wait) take wq->lock, so the lockless reader sees a consistent list state.
- A false-positive "empty" is benign: the next SEND_PI picks up the missed waiter and the unconditional wake_up fires either way – no waiter is lost, no boost is misdirected. The fall-through to the any_waiters fallback path is the documented escape valve on the existing path.

Removes a spin_lock_irqsave from the audio thread's SEND_PI hot
path in the common (empty-queue) case – a measurable IRQ-off window
reduction in the path that matters most for audio jitter.
kmem_cache_free is not NULL-safe

The 1013 conversion left one kfree-style site for
obj->u.event.pending_pi.new_ep un-NULL-guarded in ntsync_free_obj.
The diff comment claimed kmem_cache_free is NULL-safe like kfree.
The kernel source disagrees:
mm/slub.c:6900 (Linux 6.19.11):
void kmem_cache_free(struct kmem_cache *s, void *x)
{
s = cache_from_obj(s, x);
...
}
static inline struct kmem_cache *cache_from_obj(struct kmem_cache *s, void *x)
{
if (!IS_ENABLED(CONFIG_SLAB_FREELIST_HARDENED) &&
!kmem_cache_debug_flags(s, SLAB_CONSISTENCY_CHECKS))
return s;
cachep = virt_to_cache(x); /* DEREFS x */
...
}
Under SLAB_FREELIST_HARDENED (debug kernel: enabled) the
short-circuit fails and virt_to_cache(NULL) runs, dereferencing
offset 0x8 of NULL. kfree() early-outs on ZERO_OR_NULL_PTR(x);
kmem_cache_free does not – the asymmetry the diff comment got
wrong.
The crash signature on the debug kernel was a page fault at
kmem_cache_free+0x5c with RAX = vmemmap[0] (the struct page for
NULL) and CR2 = 0x...0008 (the slab->slab_cache deref at offset
8) – exact match.
if (obj->type == NTSYNC_TYPE_EVENT && obj->u.event.pending_pi.new_ep)
kmem_cache_free(ntsync_event_pi_cache,
obj->u.event.pending_pi.new_ep);
The pattern matches the existing explicit-guard sites the same conversion already used at lines 1306, 1336, 1604, 1740, 1829, 1916. Site 2089 was simply missed in the original 1013 audit.
A four-dimension audit covering the entire post-1014 file:
- Every kmem_cache_free site: 17 sites total, 16 provably safe (loop-vars, refcnt-protected, or explicit upstream if (...) {), 1 bug at site 2089.
- Alloc/free pairing: no kfree on a cache-allocated struct, no kmem_cache_free on a generic kmalloc-allocated struct.
- Init/exit ordering: caches constructed before misc_register, mirrored unwind labels, misc_deregister before kmem_cache_destroy, module-owner refcount blocks unload while any fd is open.
- Lockless scan: list_empty_careful RCU-safe, mutators all take wq->lock, false-positive "empty" benign by design.

A post-1014a audit (2026-05-05) of the live driver enumerated all
remaining kmalloc / kzalloc sites in ntsync.c. Six sites total;
only two on the audio dispatcher hot path, both for the per-ioctl
wait queue (struct ntsync_q) allocated by setup_wait and
ntsync_aggregate_setup. Every WAIT_ANY / WAIT_ALL / AGGREGATE_WAIT
ioctl pays one kmalloc(struct_size(...)) then one kfree.
| Site (post-1014a line) | Path | Status |
|---|---|---|
| 1974 ntsync_thread_reg | per channel thread-register | COLD – skip |
| 2160 ntsync_obj | per CreateEvent/Mutex/Sem | COOL – marginal |
| 2383 wait queue q | per WAIT_ANY/WAIT_ALL ioctl | HOT – target |
| 2829 wait queue q (agg) | per AGGREGATE_WAIT ioctl | HOT – target |
| 2840 fds array | per wait-with-FDs (var-count) | not eligible |
| 3092 ntsync_device | per chardev open | COLD – skip |
struct ntsync_q is the only HOT kmalloc class that survived 1013.
struct ntsync_q has a flexible-array member entries[] whose count
is total_count – 1 (typical audio worker) up to
NTSYNC_MAX_WAIT_COUNT+1 = 65 (NtWaitForMultipleObjects cap) or
NTSYNC_AGG_MAX = 64 (aggregate). Three options were considered:

- (a) one dedicated cache sized for ≤16 entries, with a kmalloc fallback when total_count > 16. Cleanest.
- (b) two size-tiered caches, with kmalloc above. Halves typical-case per-slot waste but doubles the cache count.
- (c) one cache sized for the 65-entry maximum. No fallback, but wastes most of each slot on the typical small wait.

Shipped (a). 16 entries comfortably covers the typical 1-8 audio wait
depth; larger waits keep the kmalloc path with no regression. Slot
size with SLAB_HWCACHE_ALIGN is 704B on x86_64 (header 32 +
16×entry(40) = 672B, rounded to the next 64-byte cacheline).
A bool from_cache field is added to struct ntsync_q, placed in the
existing 2-byte trailing pad after bool ownerdead so
sizeof(struct ntsync_q) is unchanged. Set by ntsync_alloc_q, read
by ntsync_free_q:
static struct ntsync_q *ntsync_alloc_q(__u32 total_count)
{
struct ntsync_q *q;
if (total_count <= NTSYNC_Q_CACHE_MAX_ENTRIES) {
q = kmem_cache_alloc(ntsync_wait_q_cache, GFP_KERNEL);
if (q)
q->from_cache = true;
} else {
q = kmalloc(struct_size(q, entries, total_count), GFP_KERNEL);
if (q)
q->from_cache = false;
}
return q;
}
static void ntsync_free_q(struct ntsync_q *q)
{
if (!q)
return;
if (q->from_cache)
kmem_cache_free(ntsync_wait_q_cache, q);
else
kfree(q);
}
ntsync_free_q is NULL-safe by design (early return). kmem_cache_free
is not NULL-safe under SLAB_FREELIST_HARDENED (the 1014a lesson);
centralising the guard in the wrapper makes per-site audit trivial.
Two alloc-site conversions plus six free-site conversions complete the
WAIT_* / AGGREGATE_WAIT path.
Both kmem_cache_alloc(..., GFP_KERNEL) and kmalloc(..., GFP_KERNEL)
are sleep-prone (they may direct-reclaim). R1 from
ntsync-rt-audit.md forbids sleeping operations under
raw_spinlock_t. Verified at every call site:
| Site | Context when ntsync_*_q runs |
|---|---|
| setup_wait 2383 | Top of function, no locks held |
| ntsync_aggregate_setup 2829 | Top of function, no locks held |
| setup_wait err 2421 | Error cleanup, no locks held |
| ntsync_wait_any 2573 | After unqueue and ntsync_pi_work_finish |
| ntsync_wait_all 2746 | After wait_all_lock unlock and ntsync_pi_work_finish |
| ntsync_aggregate_setup err 2842 | fds-alloc fail, no locks held |
| ntsync_aggregate_setup err 2891 | Partial-init fail, no locks held |
| ntsync_aggregate_wait 3083 | After unqueue and ntsync_pi_work_finish |
The 1006 alloc-hoist invariant is preserved end-to-end.
struct ntsync_q has a task-private lifecycle: allocated by the
syscalling task, populated by the same task, published into wait
queues under obj_lock, list_del’d under obj_lock during unqueue
(mutually exclusive with try_wake_any_*), then freed. No
cross-thread free path exists. The 1012 snapshot-vs-refcount lesson
does not apply – there is no lock-drop window between mutator and
freer.
SLAB_NO_MERGE retro-correction

The original 1015 patch only added SLAB_HWCACHE_ALIGN (mirroring
1013). First boot showed the new cache absent from /proc/slabinfo.
/sys/kernel/slab/ revealed why:
ntsync_channel_entry -> :0000192 # merged
ntsync_event_pi -> :0000128 # merged
ntsync_pi_owner -> :0000128 # merged
ntsync_wait_q -> :0000704 # merged (1015 alone)
All four ntsync caches had been merged by SLUB into generic
kmalloc-N classes. The 1013 architectural promise of “isolation from
kmalloc-128” had not been holding on the prod kernel since 1013
landed. It held on the debug kernel because
SLAB_FREELIST_HARDENED makes caches incompatible for merging –
different debug-vs-prod config. Section 12’s drum-load slabinfo
absorption table was therefore debug-kernel evidence; on prod, those
allocations were going into kmalloc-128 the whole time.
Fix: add SLAB_NO_MERGE (available since kernel 6.4; prod runs 6.19)
to all four kmem_cache_create calls, bundled into the 1015 patch as
a retroactive correction:
ntsync_event_pi_cache = kmem_cache_create("ntsync_event_pi", ..., SLAB_HWCACHE_ALIGN | SLAB_NO_MERGE, NULL);
ntsync_channel_entry_cache = kmem_cache_create("ntsync_channel_entry", ..., SLAB_HWCACHE_ALIGN | SLAB_NO_MERGE, NULL);
ntsync_pi_owner_cache = kmem_cache_create("ntsync_pi_owner", ..., SLAB_HWCACHE_ALIGN | SLAB_NO_MERGE, NULL);
ntsync_wait_q_cache = kmem_cache_create("ntsync_wait_q", ..., SLAB_HWCACHE_ALIGN | SLAB_NO_MERGE, NULL);
After the fix, /sys/kernel/slab/ntsync_*/ are all real directories;
no symlinks, no merging. The 1013 isolation promise holds
on prod.
| cache | active_objs (steady-state) | kmalloc-N delta during same window |
|---|---|---|
| ntsync_wait_q | 184 | kmalloc-1k delta = 0 |
| ntsync_event_pi | 256 | covered in dedicated cache |
| ntsync_channel_entry | 168 | covered in dedicated cache |
| ntsync_pi_owner | 256 | covered in dedicated cache |
184 active ntsync_wait_q objects is the steady-state concurrency of
Ableton’s worker pool parked in NtWaitForMultipleObjects. Pre-1015
those 184 would have lived in kmalloc-1k; post-1015 they sit in the
dedicated cache, with kmalloc-1k flat across the load window –
isolation proven on the prod kernel for the first time since 1013.
The active_objs metric is concurrency, not throughput; a
per-second alloc-rate proof would need
/sys/kernel/slab/ntsync_wait_q/alloc_calls cumulative deltas or a
slabtop snapshot pair. Not a gate, just a refinement for future
evidence-gathering.
1015 has no dependency on 1012 / 1013 / 1014; the patches remain
separately revertable. The SLAB_NO_MERGE retro-correction is bundled
because both edits live in the same kmem_cache_create chain –
landing it as a separate 1013a would have meant two patch
applications for one logical change.
| Stage | What landed | Notes |
|---|---|---|
| PI baseline | 1003 + 1004 + 1005 + 1006 | priority inheritance, channel transport, thread-token return, and RT-safe alloc/free discipline |
| Channel wake correctness | 1007 + 1008 + 1009 | exclusive receive wakeup, deferred event boost, and channel-entry lifetime fix |
| Aggregate-wait | 1010 | heterogeneous wait over objects plus fds, with channel notify-only support |
| Burst drain | 1011 | non-blocking TRY_RECV2 after one aggregate-wait wake |
| Snapshot + cache hardening | 1012 + 1013 + 1014 + 1014a | receive snapshot fix, dedicated caches, lockless SEND_PI fast path, and the free-site NULL guard |
| Wait-queue cache isolation | 1015 | dedicated wait-queue cache plus SLAB_NO_MERGE across all four ntsync caches |
The current module at /lib/modules/6.19.11-rt1-1-nspa/kernel/drivers/misc/ntsync.ko carries the full overlay above.
| Test | Build stage | Ops | KASAN | Result |
|---|---|---|---|---|
| test-event-set-pi-stress 30s/4x4 | deferred-boost fix build | 1.5M signaler | 0 | PASS |
| test-event-set-pi-stress 60s/8x8 | deferred-boost fix build | 2.8M sig + 3.4M waiter | 0 | PASS |
| test-mutex-pi-stress 30s/8+4mtx | deferred-boost fix build | 726K acq+rel matched, 632K PI events | 0 | PASS |
| test-channel-stress 30s/4x4 | deferred-boost fix build | KASAN UAF caught at ~30s | 1 | EXPECTED FAIL (Bug 4 found) |
| test-channel-stress 30s/4x4 | post-channel-entry fix build | 819K SEND_PI = 819K REPLY | 0 | PASS |
| test-event-set-pi-stress 60s/8x8 | post-channel-entry fix build | 2.7M sig + 3.5M waiter | 0 | PASS |
| test-event-set-pi 20x sanity | post-channel-entry fix build | 20/20 PASS | 0 | PASS |
| test-channel-recv-exclusive 20x | post-channel-entry fix build | 20/20 PASS | 0 | PASS |
| test-mixed-load-stress 5min/13W | post-channel-entry fix build | ~10.3M ops, all paths | 0 | PASS |
| test-aggregate-wait 9/9 | aggregate-wait build | functional + PI sub-tests | n/a | PASS |
| aggregate-wait 1k mixed stress | aggregate-wait build | 1k iterations | 0 | PASS |
| aggregate-wait 30k + native suite | aggregate-wait build | long stress + full suite | 0 | PASS |
| test-channel-stress (post-1012) | snapshot + cache-hardening build | 1.34M ops (post-1012 KASAN re-soak) | 0 | PASS |
| test-channel-try-recv2-stress | snapshot + cache-hardening build | 2.6M TRY_RECV2 ops | 0 | PASS |
| test-mixed-load-stress 300s/13W | snapshot + cache-hardening build | 5.28M chan SEND/REPLY, 1.99M audio waits, 12.6M REG/DEREG | 0 | PASS |
| test-channel-stress 60s/4x4 (1014a) | snapshot + cache-hardening build | 1.40M SEND/REPLY, 1.40M RECV+RECV2 | 0 | PASS |
| test-channel-try-recv2-stress 30s | snapshot + cache-hardening build | 62k SEND, 2.68M attempts, 97.68% EAGAIN | 0 | PASS |
Cumulative debug-kernel: ~30 million operations through post-1009; post-1014a adds another ~14 million ntsync ops (channel SEND_PI hit ~21x more than the post-1012 validation window), zero KASAN splats, zero dmesg matches for BUG/KASAN/Oops/use-after-free/lockdep/warn.
The aggregate-wait consumer path was validated on the production kernel/userspace pair rather than only in isolation:
- test-aggregate-wait 9/9 PASS
- dispatcher-burst in the PE matrix gives a reproducible A/B for the dispatcher path itself: burst ops/sec is 841,765 with TRY_RECV2 on vs 555,567 with it off (1.5x; equivalently, disabling TRY_RECV2 costs ~34%)
- create_file plus TRY_RECV2 under Ableton PASS

This matters because 1010 is load-bearing only when the userspace dispatcher is actually blocked inside it. The build result therefore includes both the syscall itself and the post-1010 wake/boost ordering fixes.
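The burst-drain pattern that A/B measures is easy to sketch: after one blocking wake, keep issuing the non-blocking receive until the kernel reports the queue empty. A minimal sketch – only the ioctl name and the EAGAIN-on-empty contract come from this doc; the arg struct name and its fields are assumptions:

```c
/* Hypothetical dispatcher burst-drain loop. NTSYNC_IOC_CHANNEL_TRY_RECV2
 * and EAGAIN-on-empty are the 1011 surface documented here;
 * struct ntsync_channel_recv_args is an illustrative stand-in for the
 * real uapi arg struct. */
#include <errno.h>
#include <sys/ioctl.h>
#include <linux/ntsync.h>	/* patched uapi header */

static void drain_after_wake(int channel_fd)
{
	struct ntsync_channel_recv_args args;	/* illustrative name */

	for (;;) {
		if (ioctl(channel_fd, NTSYNC_IOC_CHANNEL_TRY_RECV2, &args) < 0) {
			if (errno == EAGAIN)
				break;	/* queue drained: back to aggregate-wait */
			break;		/* real code: handle EINTR and hard errors */
		}
		/* dispatch one queued SEND_PI message described by args */
	}
}
```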
13-thread/300s soak across every ntsync path concurrently against a single dev_fd:
Operation totals:
| Path | Ops | Notes |
|---|---|---|
| audio multi-obj waits | 8,757,969 | 100% wake rate |
| ui EVENT_SET_PI | 139,513 | |
| ui EVENT_SET / RESET / PULSE | 46,506 / 23,181 / 23,324 | |
| ui mutex acq=rel | 137,297 / 137,297 | perfect |
| chan SEND_PI / REPLY | 308,546 / 308,548 | perfect after 30 benign races |
| chan REGISTER / DEREGISTER | 730,985 / 365,492 | |
| sem release/acquire/read | 136,683 / 180,063 / 180,064 | |
| wait_all 3-obj acq=rel | 71,855 / 71,855 | perfect |
| syscall errors | 0 | |
| KASAN/KCSAN splats | 0 | |
| module refcnt post-soak | 0 | |
After cross-build to the production kernel 6.19.11-rt1-1-nspa (no debug instrumentation, throughput 5x-149x higher than debug):
| Layer | Run | Result | Ops | Errors |
|---|---|---|---|---|
| 1 native sanity | run-rt-suite.sh native | 3/3 PASS | small | 0 |
| 1 stress | event-set-pi 60s 8x8 | PASS | ~158M | 0 |
| 1 stress | mutex-pi 30s 8h+4mtx | PASS | ~12M | 0 |
| 1 stress | channel 30s 4x4 | PASS | ~52M | 0 |
| 1 stress | mixed-load 300s 13 workers | PASS | ~145M | 0 |
| 2 PE matrix | nspa_rt_test.exe baseline+rt | 32 PASS / 0 FAIL / 0 TIMEOUT | n/a | 0 |
Cumulative on the production kernel: ~370M ops at the post-channel-entry baseline, followed by the aggregate-wait, burst-drain, receive-snapshot, and dedicated-cache hardening carries, then the wait-queue cache plus full cache isolation; 0 syscall errors, 0 dmesg splats, refcnt=0 post-soak.
The post-1014a build was also re-validated with the full RT-suite v7
on prod kernel 6.19.11-rt1-1-nspa: 16/16 RT pass + 3/3 native ioctl
pass; channel snapshot UAF and kmem_cache_free NULL-deref both
closed; dedicated kmem_cache slabinfo evidence captured under real
Ableton drum-load (158 new event-PI staging pairs absorbed in the
dedicated caches with kmalloc-128 flat). Note: that drum-load
slabinfo capture was taken on the debug kernel; on the prod kernel the
1013 caches had been SLUB-merged into kmalloc-128 the entire time –
exactly the issue the 1015 SLAB_NO_MERGE retro-correction fixes
(Section 14).
The 1015 build was validated against the same prod kernel
6.19.11-rt1-1-nspa. The native ioctl soak (validate-1015.sh,
which exercises both setup_wait and ntsync_aggregate_setup alloc
paths through test-mixed-load-stress, test-channel-stress,
test-channel-try-recv2-stress, and test-aggregate-wait) was not
invoked this round: the prod kernel has no
SLAB_FREELIST_HARDENED/KASAN tooling, so the soak’s signal value
collapses to functional-only – which Ableton already provides at
much higher rate. The actual correctness gate was the
four-dimension audit plus the NULL-safe ntsync_free_q wrapper.
Empirical safety: Ableton booted clean both pre- and
post-SLAB_NO_MERGE rebuild; audio-path WAIT_ANY ioctls drove the
new alloc/free pair constantly with no GP-fault, so from_cache
routing is correct in both directions.
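The alloc/free pair behind that from_cache routing is small enough to sketch. This is illustrative, not the patch text: from_cache, ntsync_free_q, struct ntsync_q, and ntsync_wait_q_cache are names this doc already uses, while the 16-entry constant name and the struct layout shown are assumptions:

```c
/* Sketch of the 1015 routing (fragment against ntsync.c). Cache objects
 * are sized for the 16-entry worst case, so every small wait fits;
 * oversized waits keep the old kmalloc path. */
#define NTSYNC_Q_CACHE_ENTRIES 16	/* illustrative name for the <=16 bound */

struct ntsync_q {
	/* ... upstream fields elided ... */
	bool from_cache;			/* which allocator owns this object */
	__u32 count;
	struct ntsync_q_entry entries[];	/* flexible tail, sized per wait */
};

static struct ntsync_q *ntsync_alloc_q(__u32 count)
{
	struct ntsync_q *q;

	if (count <= NTSYNC_Q_CACHE_ENTRIES) {
		q = kmem_cache_alloc(ntsync_wait_q_cache, GFP_KERNEL);
		if (q)
			q->from_cache = true;
	} else {
		q = kmalloc(struct_size(q, entries, count), GFP_KERNEL);
		if (q)
			q->from_cache = false;
	}
	return q;
}

static void ntsync_free_q(struct ntsync_q *q)
{
	if (!q)		/* kmem_cache_free() derefs its object under
			 * SLAB_FREELIST_HARDENED (the 1014a lesson), so
			 * the NULL guard lives here, once */
		return;
	if (q->from_cache)
		kmem_cache_free(ntsync_wait_q_cache, q);
	else
		kfree(q);
}
```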
Slabinfo absorption (validate-1015-slabinfo-watch.sh, Ableton 30s
windows at 1Hz, project loaded, mixed transport activity):
- ntsync_wait_q steady-state 184 active objects (worker pool parked in NtWaitForMultipleObjects).
- ntsync_event_pi 256, ntsync_channel_entry 168, ntsync_pi_owner 256 – matching the table above.
- kmalloc-1k delta = 0 across the load window; kmalloc-128/192/256/512 deltas are unrelated system traffic, same shape pre-1015.
- The 184 active ntsync_wait_q objects – which would have been in kmalloc-1k on every prior build – combined with the flat kmalloc-1k row are the isolation proof on prod. active_objs is concurrency, not throughput; per-second alloc rate would need /sys/kernel/slab/ntsync_wait_q/alloc_calls deltas (not a gate, just a refinement).
Only PASS/FAIL is authoritative across debug vs production kernels; throughput numbers aren’t directly comparable because the debug-kernel slub_debug=FZPU + kfence + KASAN tax dominates.
The PI contention / priority wakeup ordering / rapid mutex throughput / philosophers tests from the original single-page ntsync doc remain valid. None of the later channel or aggregate-wait carries changed the mutex PI path; the metrics are unchanged:
| Metric / Test | v4 RT | v5 RT | Delta |
|---|---|---|---|
| ntsync-d4 RT PI avg | 387 ms | 270 ms | -30.2% |
| ntsync-d8 RT PI avg | 419 ms | 201 ms | -52.0% |
| Rapid mutex throughput | 232K ops/s | 259K ops/s | +11.6% |
| Rapid mutex RT max_wait | 54 us | 47 us | -13.0% |
| Philosophers RT max_wait | 1620 us | 865 us | -46.6% |
Priority wakeup ordering is exact (5 waiters at distinct priorities wake in priority order, both baseline and RT modes, all test runs). PI chain propagation is correct up to depth 12.
The patches in this stack divide cleanly into two categories. The boundary matters because it dictates which patches were safe to ship in a flurry and which weren’t.
Patches in the mechanically verifiable category enforce a rule that has an oracle. If the rule is violated, kernel debug infra (CONFIG_DEBUG_ATOMIC_SLEEP, LOCKDEP, KASAN) will splat. The patch either makes the splat go away or it doesn’t; there is no ambiguity.
- 1006 (RT alloc-hoist) enforces the rule “no kmalloc/kfree under raw_spinlock_t on PREEMPT_RT”. CONFIG_DEBUG_ATOMIC_SLEEP will splat on a violation. Mechanical.
- 1007 (exclusive receive) is a three-line swap to a kernel primitive (wait_event_interruptible_exclusive) whose semantics are documented and obvious. Mechanical.
- 1014a (kmem_cache_free NULL guard) closes a cache_from_obj deref at site 2089 surfaced by SLAB_FREELIST_HARDENED. The kernel source disagrees with the diff comment; mm/slub.c is the oracle. Mechanical.
- 1014 (list_empty_careful fast-path, sketched just below) is a smp_load_acquire-based pre-check before an existing wq->lock section. Best-effort target selection means a false-positive “empty” reading is benign by design (the next SEND_PI picks up the missed waiter, and the unconditional wake fires either way). The semantics are documented. Mechanical.

1013 (dedicated kmem_caches) is structural infrastructure, not a
correctness fix. It is always-on (cacheline alignment, isolation,
visibility) and does not change observable semantics; the cost was a
single missed NULL guard, caught and fixed in 1014a.
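A minimal sketch of the 1014 shape – a fragment against drivers/misc/ntsync.c, where struct ntsync_channel and its field names are illustrative stand-ins for the patch’s actual layout:

```c
/* 1014 fast path: a lockless emptiness pre-check (the real patch pairs
 * it with smp_load_acquire ordering) skips the wq->lock round-trip when
 * no receiver is parked. */
static void ntsync_channel_send_pi_wake(struct ntsync_channel *ch)
{
	if (!list_empty_careful(&ch->waiters)) {
		spin_lock(&ch->wq.lock);
		/* ... scan for the highest-priority parked receiver and
		 * apply the PI boost to it ... */
		spin_unlock(&ch->wq.lock);
	}

	/* The wake is unconditional, so a stale "empty" reading above can
	 * never strand a receiver: either this wake reaches it, or the
	 * next SEND_PI's scan picks it up. */
	wake_up_interruptible(&ch->wq);
}
```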
Patches in the code-review hypothesis category encode a reviewer’s argument that some code is buggy. There is no oracle. If the reviewer’s argument is wrong (or the bug is somewhere else), the patch ships new bugs without fixing the original one.
On 2026-04-26 there was an unfound EVENT_SET_PI slab UAF (___slab_alloc+0x316 GP-fault, ntsync_obj_ioctl+0x44e). KASAN was queued but not yet run. Codex’s review surfaced three “other issues” (cross-snapshot PI, non-exclusive RECV, channel-accept-in-setup_wait), and patches 1007-1011 (5 patches in 6 hours, including a 34KB rewrite) landed under the rationale that “(1) ∧ (2) explains the hang.”
That rationale was theory, not a measured trace. The actual unfound slab UAF was the bug 1006 fixed – a kfree under raw_spinlock_t in channel_register/deregister_thread. None of 1007-1011’s hypotheses were correct about the original symptom. Worse, the 1007-1011 series introduced a new UAF (the CHANNEL_REPLY UAF that 1009 ultimately fixed) that only existed because channels had been added at all.
All of 1007-1011 were rolled back. The proper sequence, adopted from then on:
- When chasing an unidentified bug, narrow on the actual symptom (trace / KASAN / ftrace / repro) – do not pile speculative fixes from adjacent code review under the cover of “while I was in there, I noticed…”. Even when the audit is internally well-reasoned, the issues it surfaces are almost certainly unrelated to the observed symptom – and landing them piles new failure modes on top of the original one.
- Independent CRIT findings can still be filed as separate tickets/patches, but they should not ship until the original symptom is understood. At minimum: do not ship them on the same day, on top of an unfound bug, in the same module.
- A small surface area that is clearly correct in isolation (e.g. a refcount-discipline patch with a real KASAN trace) can ship – but only after asking: “is this fixing damage I caused with adjacent work, or real upstream-relevant correctness?” 1009 was the latter.
This is also why 1006 was safe to ship in-flurry while the rolled-back 1007-1011 wasn’t: 1006 has an oracle (CONFIG_DEBUG_ATOMIC_SLEEP), the rolled-back series had only Codex’s argument.
All in wine-rt-claude/ntsync-patches/:
- 1003-ntsync-mutex-owner-pi-boost.patch – PI baseline (combined with 1001+1002 in the live module)
- 1004-ntsync-channel.patch – channel object
- 1005-ntsync-channel-thread-token.patch – thread-token + RECV2
- 1006-ntsync-rt-alloc-hoist.patch – pi_work pool, alloc/free hoist
- 1007-ntsync-channel-exclusive-recv.patch – exclusive wait_event
- 1008-ntsync-event-set-pi-deferred-boost.patch – deferred-boost machinery
- 1009-ntsync-channel-entry-refcount.patch – refcount_t on channel_entry
- 1010-ntsync-aggregate-wait.patch – NTSYNC_IOC_AGGREGATE_WAIT
- 1011-ntsync-channel-try-recv2.patch – NTSYNC_IOC_CHANNEL_TRY_RECV2
- 1012-ntsync-channel-recv-snapshot-pop-fields-uaf-fix.patch – snapshot popped fields under obj_lock
- 1013-ntsync-dedicated-slab-caches.patch – dedicated kmem_caches for the three hot allocation classes
- 1014-ntsync-channel-send-pi-lockless-target-scan.patch – list_empty_careful fast-path on SEND_PI, with the 1014a kmem_cache_free NULL guard at site 2089 folded in
- 1015-ntsync-wait-q-kmem-cache.patch – dedicated kmem_cache for struct ntsync_q (≤16 entries + kmalloc fallback) and SLAB_NO_MERGE retro-correction across all four ntsync caches

Source pointers:
- drivers/misc/ntsync.c in linux-nspa-6.19.11-1.src/linux-nspa/src/linux-6.19.11/ – 2182 lines.
- ntsync_channel_send_pi line 1489, ntsync_channel_recv line 1620, ntsync_channel_recv2 line 1690, ntsync_channel_reply line 1757, consume_event_pi_boost line 1131, apply_event_pi_boost line 596, channel_lookup_token line 1420.
- pi_work infrastructure: struct ntsync_pi_work line 196, ntsync_pi_work_*() helpers lines 201-244.
- include/uapi/linux/ntsync.h – ioctl numbers, ntsync_wait_args, NTSYNC_INDEX_URING_READY, channel and thread-token ioctl arg structs.
- dlls/ntdll/unix/sync.c (Wine submodule) – linux_wait_objs() lines 482-549, linux_set_event_obj_pi() lines 411-417, semaphore/mutex/event helpers lines 380-475.

In wine/nspa/tests/:
- test-event-set-pi.c – 1008 EVENT_SET_PI deferred-boost validation
- test-event-set-pi-stress.c – 8x8 EVENT_SET_PI hammer
- test-channel-recv-exclusive.c – 1007 exclusive-recv validation (with symmetric cleanup)
- test-mutex-pi-stress.c – mutex contention + Tier B FIFO
- test-channel-stress.c – channel SEND_PI/RECV/REPLY + register churn (caught the 1009 + 1012 UAFs)
- test-channel-try-recv2-stress.c – TRY_RECV2 stress (1011/1012 gap-filler)
- test-aggregate-wait.c – 1010 aggregate-wait functional + PI sub-tests
- test-mixed-load-stress.c – 13-thread cross-path soak
- run-rt-suite.sh – sanity runner with SKIPPED_BY_DESIGN list
- wine-rt-claude/ntsync-patches/validate-1015.sh – 1015 native ioctl soak driver (covers both setup_wait and ntsync_aggregate_setup alloc paths)
- wine-rt-claude/ntsync-patches/validate-1015-slabinfo-watch.sh – 1015 Ableton-side /proc/slabinfo 1Hz capture for absorption / isolation evidence

Companion pages:
- ntsync-userspace.gen.html – the Wine ntdll handle-to-fd cache, server-owned vs client-created anonymous sync handles, dispatcher ioctl wrappers, and PE-side wait coverage that consume the kernel surface documented here.
- cs-pi.gen.html – the userspace CS-PI counterpart; together with this driver they close all priority inversion vectors in Wine’s synchronization stack.
- gamma-channel-dispatcher.gen.html – how the gamma dispatcher userspace code uses the channel object and consumes the thread-token.