This page documents the Wine-side ntsync integration: handle-to-fd caching, client-created sync objects, direct wait / signal helpers, and the current zero-time wait fast paths that sit above them. The kernel half lives on NTSync PI Kernel.
Wine-NSPA’s ntsync userspace integration lives primarily in
dlls/ntdll/unix/sync.c, with the server-owned bridge in
server/inproc_sync.c. This is the half of the story the kernel-side
patches do not show by themselves: the kernel provides
/dev/ntsync and the ioctl set; Wine has to decide which Win32
handles can resolve to an ntsync fd, where that fd comes from, and how
to keep the resulting (handle -> fd) mapping coherent with handle
reuse and process exit.
Steady state for a supported sync object on Wine-NSPA is:
NtWaitForSingleObject / NtSetEvent / NtReleaseSemaphore /
NtReleaseMutant go straight to /dev/ntsync with no wineserver
round-trip. The first wait or signal on a handle resolves the handle to
an fd; subsequent operations hit the local cache.
That steady state has two important fast-path refinements above it:
The refinements are the zero-time process and thread waits answered from process_shm and thread_shm, described below.

There are two distinct userspace shapes:

- Server-owned objects: the first reference issues a get_inproc_sync_fd request; after that, ntdll caches the fd and goes direct.
- Client-created objects: ntdll allocates a client-range handle and issues the kernel create ioctl itself, so wineserver is never involved.

The two shapes share the same inproc_sync cache layout and the same wait / signal helpers downstream. They differ only in where the fd comes from on the first reference.
Server-owned sync objects still exist because some Win32 handles are not purely local: named objects, inherited handles, and cross-process objects all need wineserver as the authoritative creator and bookkeeper. That does not mean every wait on those objects keeps round-tripping through wineserver.
server/inproc_sync.c attaches an ntsync-backed
struct inproc_sync object to the server object and keeps the fd
alive there. On the client side, get_inproc_sync() first tries a
lock-free cache lookup. On a miss, ntdll enters the protected
fd_cache_mutex section, asks wineserver for get_inproc_sync_fd,
receives the fd once, and then caches (handle -> fd, type, access)
locally.
The important steady-state property is: server-owned does not mean
server-waited. Once the fd is cached, NtWaitForSingleObject,
NtWaitForMultipleObjects, NtSetEvent, NtReleaseSemaphore, and
the other supported paths all go straight to /dev/ntsync.
dlls/ntdll/unix/sync.c keeps a flat array indexed by handle of
struct inproc_sync:
```c
struct __attribute__((aligned(64))) inproc_sync
{
    int fd;
    unsigned int refcount;
    unsigned char closed;
    unsigned char type;      /* enum inproc_sync_type */
    ACCESS_MASK access;
    ...
};

#define INPROC_SYNC_CACHE_BLOCK_BYTES (256 * 1024)
#define INPROC_SYNC_CACHE_BLOCK_SIZE  (INPROC_SYNC_CACHE_BLOCK_BYTES / sizeof(struct inproc_sync))

static struct inproc_sync *inproc_sync_cache[INPROC_SYNC_CACHE_ENTRIES];
static struct inproc_sync  inproc_sync_cache_initial_block[INPROC_SYNC_CACHE_BLOCK_SIZE];
```
The cache is laid out as an array of blocks; block 0 is statically allocated, and later blocks are mmap'ed on demand as handle values climb. Each entry carries a refcount and a closed bit.
The current layout is deliberately cacheline-shaped:

- LOCK traffic on one handle does not false-share with unrelated handles.
- Total capacity remains 524288 handles after the padding change.

get_inproc_sync() then resolves a handle as follows:

- Fast path: get_cached_inproc_sync() – a single relaxed atomic load on the entry plus an acquire fence to pair with the cache writer.
- Miss path: server_enter_uninterrupted_section(&fd_cache_mutex, ...), re-check the cache (another thread may have populated it), then SERVER_REQ get_inproc_sync_fd to receive the fd. Cache it via cache_inproc_sync().
- The closed bit prevents handing the same fd back after close.

The miss path is protected by fd_cache_mutex plus the uninterrupted section so fd receipt cannot race with handle close or concurrent fd caching by another thread.
For anonymous objects, Wine-NSPA can skip wineserver even at creation time.
alloc_client_handle() hands out values from a client-private handle
range that is disjoint from server handles. ntdll then issues the
kernel create ioctl itself:
- NTSYNC_IOC_CREATE_MUTEX
- NTSYNC_IOC_CREATE_SEM
- NTSYNC_IOC_CREATE_EVENT

The returned fd is stored directly in the same inproc_sync cache that the server-owned path uses.
The rule is: only anonymous (unnamed) mutexes, semaphores, and events qualify; named, inherited, and cross-process objects still go through wineserver.
Win32 mutexes have abandoned semantics: a thread that holds a mutex
and exits without releasing it leaves the mutex in an abandoned state
that the next acquirer observes as WAIT_ABANDONED. The kernel
ntsync driver implements this via NTSYNC_IOC_MUTEX_KILL, which marks
the mutex as abandoned by a TID.
Wineserver normally tracks ownership by walking a thread’s owned objects on death. Client-created mutexes are not visible to wineserver, so ntdll has to track them itself:
```c
static struct list client_mutex_list = LIST_INIT( client_mutex_list );
```
Each NTSYNC_IOC_CREATE_MUTEX from alloc_client_handle registers a
client_mutex_entry in this list. On thread exit, ntdll walks the
list and issues NTSYNC_IOC_MUTEX_KILL for any mutex still owned by
the dying TID. That preserves Win32 abandoned-mutex semantics for
client-created mutexes.
A server-side async completion (file I/O, RPC, etc.) needs to signal the consumer’s event. Server code holds a Win32 handle, not an fd. When the consumer’s event is client-created the server cannot resolve the handle – the handle is in the client-private range.
Client-created events register their fd with wineserver after creation
so server-side async completion can signal them directly. The
registration carries (handle, fd); the server stashes the fd against
the handle’s existing async completion machinery.
DuplicateHandle on client-range sync handles

Client-created anonymous sync handles no longer have to fail ordinary same-process NtDuplicateObject() just because wineserver never minted the source handle.
For anonymous mutexes, semaphores, and events created on the client-range path, a same-process, non-inheritable duplicate is handled entirely in ntdll:

- look up the source handle in the inproc_sync cache
- dup() the cached ntsync fd
- install a new (handle, fd, type, access) entry for the duplicate

That keeps both handles as independent client-range handles routing to the same kernel object, with no wineserver round-trip.
Two object-specific follow-ons mirror the create paths: a duplicated mutex is registered in client_mutex_list so abandoned-state tracking still covers it, and a duplicated event re-registers its fd with wineserver so async completion can still signal it.
Cross-process duplicates and inheritable duplicates still require a server-visible handle and therefore still fall through to the ordinary wineserver path.
| Duplicate shape | Current behavior |
|---|---|
| same-process, non-inheritable duplicate of client-range mutex/semaphore/event | handled fully client-side |
| cross-process duplicate | still requires a server-visible handle |
| inheritable duplicate (OBJ_INHERIT) | still requires a server-visible handle |
The steady-state wait helper is inproc_wait(). It resolves each
handle to an fd with get_inproc_sync(), collects an optional alert
fd, adds the optional io_uring eventfd, and then calls
linux_wait_objs().
Two special cases sit above that common helper:
- WaitForSingleObject(process, 0) can answer from process_shm.exit_code before any ntsync wait path runs.
- WaitForSingleObject(thread, 0) can answer from THREAD_SHM_FLAG_TERMINATED before any ntsync wait path runs.

The ordinary blocking and multi-object wait shapes still go through the ntsync path described here.
The full wait path is therefore a userspace + kernel design:
- linux_wait_objs() issues NTSYNC_IOC_WAIT_ANY / NTSYNC_IOC_WAIT_ALL.
- If the io_uring eventfd fired, ntdll drains CQEs and re-enters the wait.

Signal-side helpers follow the same shape. inproc_signal_and_wait()
releases or signals the source object directly with the matching
ioctl, then waits on the destination object with the same in-process
wait path.
For cross-thread wakeups inside Wine,
wine_server_signal_internal_sync() is the high-level entry point.
If the current thread is running with an RT policy and priority, it
calls linux_set_event_obj_pi() (which issues
NTSYNC_IOC_EVENT_SET_PI); otherwise it falls back to plain
linux_set_event() (which issues NTSYNC_IOC_EVENT_SET). That is
the userspace half of the kernel’s deferred-boost behaviour from
patch 1008.
The current zero-time wait fast paths are a small but important extension of the in-process sync model. By the time a process or thread handle has resolved to an ntsync-backed wait object, Wine also already has the published shared object that can answer the liveness question directly.
| Handle type | Shared-state predicate | Local result |
|---|---|---|
| Process | process_shm.exit_code == STILL_ACTIVE | alive -> STATUS_TIMEOUT, dead -> STATUS_WAIT_0 |
| Thread | THREAD_SHM_FLAG_TERMINATED | clear -> STATUS_TIMEOUT, set -> STATUS_WAIT_0 |
The thread case uses the termination flag instead of exit_code != 0 because a
thread exit code begins at 0, which is a valid user result.
The inproc_sync cache itself is also part of the current userspace sync
story. Hot waits and signals increment entry refcounts constantly, so false
sharing across unrelated handles showed up as distributed coherence cost.
The layout uses one cacheline per entry and keeps the original
524288-handle capacity by widening each cache block instead of shrinking the
cache.
linux_wait_objs

The wait wrapper is largely unchanged from upstream. NSPA’s only
addition is the uring_fd parameter (passed via the repurposed pad
field of struct ntsync_wait_args) that lets a single WAIT_ANY call
wake on either an ntsync object signal or an io_uring CQE.
```c
static NTSTATUS linux_wait_objs( int device, DWORD count, const int *objs,
                                 WAIT_TYPE type, int alert_fd, int uring_fd,
                                 const LARGE_INTEGER *timeout )
{
    struct ntsync_wait_args args = {0};
    ...
    args.objs  = (uintptr_t)objs;
    args.count = count;
    args.owner = GetCurrentThreadId();
    args.alert = alert_fd;
    args.pad   = uring_fd > 0 ? uring_fd : 0;

    request = (type != WaitAll || count == 1) ? NTSYNC_IOC_WAIT_ANY
                                              : NTSYNC_IOC_WAIT_ALL;
    do { ret = ioctl( device, request, &args ); }
    while (ret < 0 && errno == EINTR);
    ...
}
```
The user-space code is deliberately oblivious to the kernel-side
EVENT_SET_PI staging machinery (patch 1008) and the field-snapshot
fix (patch 1012). Wine just calls WAIT_ANY / WAIT_ALL; the kernel
handles boost consumption and entry lifetime transparently. No
Wine-side change was needed for those carries.
linux_set_event_obj_pi

The cross-thread priority-intent setter is a thin ioctl wrapper:
```c
static NTSTATUS linux_set_event_obj_pi( int obj, unsigned int policy,
                                        unsigned int prio )
{
    struct ntsync_event_set_pi_args args =
    {
        .flags  = 0,
        .policy = policy,
        .prio   = prio,
        .__pad  = 0
    };

    if (ioctl( obj, NTSYNC_IOC_EVENT_SET_PI, &args ) < 0)
        return errno_to_status( errno );
    return STATUS_SUCCESS;
}
```
This is called from the gamma dispatcher path when an RT audio thread
signals a queue event to the dispatcher pthread. The audio thread
passes its own (SCHED_FIFO, prio); the kernel stages the boost on
the event; the dispatcher consumes the signal in its WAIT_ANY and
gets boosted at wait-return.
After patch 1008, the path is bulletproof against the fast-path race:
even if the dispatcher pthread takes obj_lock first and sees
signaled=true, it consumes the staged boost in the unqueue loop on
its way out.
The wineserver dispatcher uses the channel ioctls directly via
ioctl() calls; there is no portable linux_channel_* helper at the
Wine ntdll layer because channels are wineserver-process-private (they
do not cross the wineserver / client boundary as Win32 handles).
The dispatcher loop calls:
```c
ioctl( channel_fd, NTSYNC_IOC_CHANNEL_RECV2, &args );
/* dispatch using args.thread_token */
ioctl( channel_fd, NTSYNC_IOC_CHANNEL_REPLY, &args.entry_id );
```
On the current kernel/userspace pair, the dispatcher uses
RECV2 for dequeue, follows each reply with TRY_RECV2 until the
channel returns empty, and uses NTSYNC_IOC_AGGREGATE_WAIT to block on
the channel + uring eventfd + shutdown eventfd in one syscall.
The client-side SEND_PI is invoked from the wineserver
request-marshalling fast path; the client’s RT thread blocks in the
kernel until reply.
alloc_client_handle

Client-side ntsync object creation uses
InterlockedDecrement(&client_handle_next) to allocate client-range
handles that do not collide with server-allocated handles. The
client-private range starts at a large constant
(INPROC_SYNC_CACHE_TOTAL) and counts down, while server handles
count up from low values; the two ranges never meet for typical Wine
processes.
Wait operations (NtWaitForSingleObject) resolve the handle to a
cached fd via inproc_wait(), then call linux_wait_objs() which
issues the kernel ioctl directly.
Currently enabled for anonymous mutexes, semaphores, and events.
The userspace ntsync surface is exercised by both Layer 1 native ntsync tests and the Layer 2 PE matrix. The split is:

- The nspa_rt_test.exe ntsync harness creates Win32 mutexes, semaphores, and events, resolves them through inproc_wait(), and then hits linux_wait_objs() / the direct signal helpers.
- The internal-signal paths are also covered (wine_server_signal_internal_sync() and the registered event-fd path used by async completion).
- The native tests cover EVENT_SET_PI, channel races, aggregate-wait source ordering, and the hardening bugs from 1007-1009 / 1012 / 1014a.

Layer 2’s current archived full-suite boundary is 32 PASS / 0 FAIL / 0 TIMEOUT on the PE matrix; Layer 1 native sanity is 3 PASS / 0 FAIL / 0 SKIP. The cross-build production-kernel runs advanced from the earlier post-1009 baseline through aggregate-wait, burst drain, the later receive-snapshot and dedicated-cache hardening, and the current cache-isolated overlay – with zero syscall errors and zero dmesg splats at every step.
- dlls/ntdll/unix/sync.c – linux_wait_objs(), linux_set_event_obj_pi(), linux_set_event(), linux_release_semaphore(), linux_unlock_mutex(), the inproc_sync cache (get_cached_inproc_sync(), cache_inproc_sync(), release_inproc_sync(), get_server_inproc_sync()), alloc_client_handle(), the client_mutex_list thread-exit walker, and the client-created anonymous-event path in NtCreateEvent.
- server/inproc_sync.c – the server-side struct inproc_sync that attaches an ntsync fd to a wineserver object, plus the get_inproc_sync_fd request handler.
- include/uapi/linux/ntsync.h – ioctl numbers, ntsync_wait_args, NTSYNC_INDEX_URING_READY, channel and thread-token ioctl arg structs.
- drivers/misc/ntsync.c – the kernel implementation; see the NTSync PI Kernel page for the patch-by-patch walkthrough.