Wine-NSPA – Thread and Process Shared-State Bypass

This page documents the shared-state bypass for read-mostly thread and process queries, plus the zero-time process and thread wait fast paths built on the same published snapshots.

Table of Contents

  1. Overview
  2. Coverage
  3. Architecture
  4. Thread query coverage
  5. Process query coverage and zero-time waits
  6. Correctness boundaries
  7. Related docs

1. Overview

Wine already had an upstream shared-object publication mechanism for queue, window, class, input, and desktop state. Wine-NSPA extends that same seqlock-published shape to thread and process state, so a set of NtQueryInformationThread() and NtQueryInformationProcess() classes can be answered from shared memory instead of a wineserver RPC.

The same published state also powers zero-time WaitForSingleObject() polls for process and thread handles. For those single-handle, non-alertable, timeout-0 waits, ntdll can answer from the shared snapshot instead of paying an ntsync wait ioctl.


2. Coverage

  - Thread shared-state publication: wineserver publishes a per-thread shared object with seqlock update discipline and a per-handle locator RPC (`get_thread_shm`) for the first resolve.
  - Process shared-state publication: wineserver publishes a per-process shared object with the same seqlock shape and a matching first-resolve RPC (`get_process_shm`).
  - Thread query bypass: 7 NtQueryInformationThread() classes are served shmem-first with RPC fallback.
  - Process query bypass: 6 NtQueryInformationProcess() classes are served shmem-first with RPC fallback.
  - Zero-time thread wait: WaitForSingleObject(thread, 0) can answer from thread_shm and skip the ntsync ioctl on a hit.
  - Zero-time process wait: WaitForSingleObject(process, 0) can answer from process_shm and skip the ntsync ioctl on a hit.
  - Cache discipline: first use resolves the locator once; later reads are local; stale-slot detection and negative-cache entries force a safe fallback instead of silent drift.
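The cache discipline row can be sketched as a small per-handle table. This is a minimal illustration under assumed names (`lookup_shared_object`, `resolve_locator_rpc`, the `NEGATIVE` sentinel) with a toy resolve stub; Wine-NSPA's real cache and locator RPCs differ in layout.

```c
#include <stddef.h>

/* Illustrative per-handle locator cache; names and layout are stand-ins,
 * not Wine-NSPA's actual structures. A NEGATIVE entry pins a handle to
 * the RPC fallback instead of retrying the resolve on every query. */
#define CACHE_SIZE 64
#define NEGATIVE   ((void *)(size_t)-1)

struct cache_entry { unsigned int handle; void *shm; };
static struct cache_entry locator_cache[CACHE_SIZE];

/* Stand-in for the first-resolve RPC (get_thread_shm / get_process_shm):
 * here it fails for odd handles, purely so the demo has a miss case. */
static int fake_objects[CACHE_SIZE];
static void *resolve_locator_rpc( unsigned int handle )
{
    return (handle & 1) ? NULL : &fake_objects[handle % CACHE_SIZE];
}

/* First use pays one RPC; later lookups are local. NULL tells the
 * caller to use the original RPC query path. (An empty slot looks like
 * handle 0; real handles are nonzero, so the demo ignores that edge.) */
static void *lookup_shared_object( unsigned int handle )
{
    struct cache_entry *e = &locator_cache[handle % CACHE_SIZE];
    if (e->handle != handle)
    {
        void *shm = resolve_locator_rpc( handle );
        e->handle = handle;
        e->shm = shm ? shm : NEGATIVE;   /* negative-cache the miss */
    }
    return e->shm == NEGATIVE ? NULL : e->shm;
}
```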

The per-class query coverage is detailed in sections 4 and 5. One deliberate exception applies throughout: ThreadBasicInformation is intentionally left on the server path. The existing reply applies server-side transforms that are not mirrored in the published snapshot, so the design keeps that one class authoritative instead of adding a special-case partial mirror.


3. Architecture

The bypass has two layers:

  1. wineserver publishes thread and process snapshots inside the existing shared object union, using the normal seqlock write protocol
  2. ntdll resolves a handle to its published object once, caches the locator, then serves later queries from a single seqlock snapshot read
Shared-state query bypass: one resolve RPC, then local seqlock reads.

  - Win32 query call site: NtQueryInformationThread / NtQueryInformationProcess; the same handle may be queried repeatedly.
  - First use only: resolve the shared object via `get_thread_shm` / `get_process_shm`; the returned locator id is cached with the object pointer.
  - Steady state: a single seqlock snapshot read; thread/process fields are copied locally and the class-specific reply is built without wineserver.
  - Server-published shared object: wineserver updates fields under `SHARED_WRITE_BEGIN` / `SHARED_WRITE_END`; the client retries until the seqlock cycle is stable, and the object id is rechecked after the read to catch slot recycling.
  - Safe miss path: no query access, a stale slot, or a map failure produces a negative-cache entry or stale-id eviction, and the caller falls back to the original RPC.
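The steady-state read follows the classic seqlock pattern. Below is a minimal C11 sketch with an assumed snapshot layout; the `seq`, `object_id`, and data fields are illustrative, not the exact Wine-NSPA struct.

```c
#include <stdatomic.h>
#include <string.h>

/* Illustrative snapshot layout; field names are stand-ins. The seq word
 * is odd while the server is writing. */
struct thread_shm
{
    atomic_uint        seq;
    unsigned int       object_id;    /* rechecked to catch slot recycling */
    unsigned int       suspend_count;
    unsigned long long affinity;
};

/* One seqlock snapshot read: retry until a stable, even cycle is seen,
 * then confirm the slot still belongs to the object resolved earlier.
 * Returns 0 on a stale slot, telling the caller to fall back to RPC. */
static int read_thread_snapshot( struct thread_shm *shm,
                                 unsigned int expected_id,
                                 struct thread_shm *out )
{
    for (;;)
    {
        unsigned int seq = atomic_load_explicit( &shm->seq, memory_order_acquire );
        if (seq & 1) continue;                     /* writer in progress */
        memcpy( out, shm, sizeof(*out) );
        atomic_thread_fence( memory_order_acquire );
        if (atomic_load_explicit( &shm->seq, memory_order_relaxed ) == seq)
            break;                                 /* stable cycle */
    }
    return out->object_id == expected_id;
}
```

The trailing id check is what turns slot recycling into a safe miss rather than silently serving another object's state.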

The public point of the design is simple: a query is answered from shared memory only when the published snapshot can reproduce the server reply exactly, and anything transformed, variable-length, or uncertain falls back to the authoritative RPC path.

That is why this feature can land safely without changing Win32-visible semantics.


4. Thread query coverage

The thread snapshot carries the fields needed by the current read-mostly thread classes:

Thread query coverage: snapshot classes vs. retained RPC classes.

  Served from `thread_shm`:
  - `ThreadAffinityMask`
  - `ThreadQuerySetWin32StartAddress`
  - `ThreadGroupInformation`
  - `ThreadIsTerminated`
  - `ThreadSuspendCount`
  - `ThreadHideFromDebugger`
  - `ThreadPriorityBoost`

  Retained on RPC:
  - `ThreadBasicInformation`: the server reply still applies effective-priority and exit-status transforms.
  - `ThreadAmILastThread`: depends on a process-scoped last-thread computation.
  - `ThreadNameInformation`: the variable-length payload stays on the original reply path.

This boundary is deliberate. The point is not to force every thread query onto shared memory. The point is to retire the cheap, high-frequency, fixed-shape queries and leave the odd or transformed replies on the authoritative path.
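The boundary above amounts to a plain class dispatch. A sketch follows, with the THREADINFOCLASS subset redeclared locally so the example is self-contained; the enum's numeric values are illustrative, not ntdll's.

```c
/* Subset of THREADINFOCLASS, redeclared for a standalone sketch;
 * the numeric values here are illustrative, not ntdll's. */
enum thread_info_class
{
    ThreadBasicInformation,
    ThreadAffinityMask,
    ThreadQuerySetWin32StartAddress,
    ThreadGroupInformation,
    ThreadIsTerminated,
    ThreadSuspendCount,
    ThreadHideFromDebugger,
    ThreadPriorityBoost,
    ThreadAmILastThread,
    ThreadNameInformation,
};

/* Shmem-first dispatch: fixed-shape, read-mostly classes come from the
 * snapshot; transformed or variable-length replies stay on the server. */
static int served_from_shm( enum thread_info_class info_class )
{
    switch (info_class)
    {
    case ThreadAffinityMask:
    case ThreadQuerySetWin32StartAddress:
    case ThreadGroupInformation:
    case ThreadIsTerminated:
    case ThreadSuspendCount:
    case ThreadHideFromDebugger:
    case ThreadPriorityBoost:
        return 1;    /* shmem-first, RPC only on a safe miss */
    default:
        return 0;    /* authoritative server path */
    }
}
```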


5. Process query coverage and zero-time waits

The process snapshot carries enough state to answer the six current NtQueryInformationProcess() classes and to answer one additional hot liveness question: “has this process already exited?”

That second use matters because Wine's in-process sync path already resolves a process handle to an ntsync-backed wait object. For WaitForSingleObject(proc, 0), ntdll can short-circuit before the wait ioctl: if the published snapshot already records the process exit, the poll returns STATUS_WAIT_0, and if the process is still alive it returns STATUS_TIMEOUT.

This is both faster and slightly more correct for Wine’s own layering, because it removes the small gap between the process info snapshot and the separate wait path.

Zero-time process wait: shared-state answer before the wait ioctl.

  - Entry condition: `WaitForSingleObject(process, 0)`, i.e. single handle, zero timeout, non-alertable only; ordinary waits still use the normal ntsync path.
  - Shmem fast path: read `process_shm.exit_code`; alive -> `STATUS_TIMEOUT`, dead -> `STATUS_WAIT_0`.
  - Fallback: resolve the fd and issue the wait ioctl; used on cache miss, access miss, multi-handle, alertable wait, or non-zero timeout.
  - Measured synthetic poll cost: ntsync ioctl path ~10000 ns/poll (`9916`, `10030`, `10141`); shmem fast path ~144 ns/poll (`130`, `130`, `171`), about `70x` faster per poll.
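A minimal sketch of that short-circuit follows, under two stated assumptions: the guard conditions shown, and a published `exit_code` that reads as the Win32 STILL_ACTIVE value (0x103) while the process runs. The second assumption is this sketch's, not a quote of the actual field encoding.

```c
#define STATUS_WAIT_0  0x00000000
#define STATUS_TIMEOUT 0x00000102
#define STILL_ACTIVE   0x00000103   /* Win32 "still running" exit code */

struct process_shm { unsigned int exit_code; };

/* The fast path only applies to a single-handle, non-alertable,
 * timeout-0 wait; everything else keeps the normal ntsync path. */
static int fast_poll_applies( int handle_count, long long timeout_100ns,
                              int alertable )
{
    return handle_count == 1 && timeout_100ns == 0 && !alertable;
}

/* Answer the poll from the snapshot: no wait ioctl on a hit.
 * Assumes exit_code == STILL_ACTIVE means the process is alive. */
static unsigned int poll_process_shm( const struct process_shm *shm )
{
    return shm->exit_code == STILL_ACTIVE ? STATUS_TIMEOUT : STATUS_WAIT_0;
}
```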

The public process-query coverage is:

The fixed-shape, read-mostly part of that surface is local. Process image name queries, debug-object queries, variable-length payloads, and other server authority cases still use the original RPC path.

5.1 Zero-time thread wait

Thread handles get the same zero-time short-circuit shape, but the predicate is different. A thread exit code starts life at 0, which is a valid user exit code, so the thread fast path cannot use exit_code != 0 as a liveness test. It instead reads THREAD_SHM_FLAG_TERMINATED from the published thread snapshot: the flag set means the wait is signaled, the flag clear means the zero-timeout poll reports STATUS_TIMEOUT.

That keeps the thread wait path honest while still removing the wait ioctl from the common zero-time poll case.

Zero-time thread wait: use the published termination flag before the wait ioctl.

  - Entry condition: `WaitForSingleObject(thread, 0)`, i.e. single handle, timeout 0, non-alertable only; ordinary waits still use the normal ntsync path.
  - Shmem fast path: read `THREAD_SHM_FLAG_TERMINATED`; clear -> `STATUS_TIMEOUT`, set -> `STATUS_WAIT_0`.
  - Fallback: resolve the fd and issue the wait ioctl; used on cache miss, access miss, multi-handle, alertable wait, or non-zero timeout.
  - Measured synthetic poll cost: ntsync ioctl path ~11940 ns/poll; shmem fast path ~164 ns/poll, about `73x` faster per poll.
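The flag-based predicate can be sketched as follows. The flag's name comes from the text; its numeric value here, and the struct layout, are assumptions of this sketch.

```c
/* The flag name comes from the published snapshot; the value 0x1 is an
 * assumption for illustration only. */
#define THREAD_SHM_FLAG_TERMINATED 0x00000001

#define STATUS_WAIT_0  0x00000000
#define STATUS_TIMEOUT 0x00000102

struct thread_shm { unsigned int flags; unsigned int exit_code; };

/* exit_code begins at 0, which is a legal user exit code, so the
 * liveness test must use the termination flag, never exit_code != 0. */
static unsigned int poll_thread_shm( const struct thread_shm *shm )
{
    return (shm->flags & THREAD_SHM_FLAG_TERMINATED) ? STATUS_WAIT_0
                                                     : STATUS_TIMEOUT;
}
```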

6. Correctness boundaries

Three parts make this safe enough to ship as the default behavior:

  1. every snapshot is taken with the existing seqlock read protocol, retried until a stable cycle is observed, so a torn update is never consumed
  2. the object id is rechecked after every read, so a recycled shared-object slot is detected instead of silently serving another object's state
  3. anything uncertain, such as missing query access, a stale slot, or a map failure, is negative-cached or evicted, and the caller falls back to the original RPC

That is the important discipline for this feature family. It is not trying to be clever about every thread or process query. It is publishing the read-mostly state that Wine can mirror honestly, reading it with the existing seqlock pattern, and refusing the rest.
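For completeness, the write side of that discipline, the `SHARED_WRITE_BEGIN` / `SHARED_WRITE_END` protocol mentioned in section 3, can be sketched in C11. The struct and the macro bodies are illustrative, not Wine's definitions; a production seqlock writer also needs care about which fences the target architecture requires.

```c
#include <stdatomic.h>

/* Sketch of the writer-side seqlock discipline: the sequence word is
 * bumped to an odd value before any field is touched and back to even
 * afterwards, so a concurrent reader retries instead of consuming a
 * torn update. seq_cst RMWs keep the field writes inside the window. */
struct shared_obj
{
    atomic_uint  seq;
    unsigned int exit_code;
    unsigned int flags;
};

#define SHARED_WRITE_BEGIN( obj ) \
    atomic_fetch_add_explicit( &(obj)->seq, 1, memory_order_seq_cst ) /* odd: write open */
#define SHARED_WRITE_END( obj ) \
    atomic_fetch_add_explicit( &(obj)->seq, 1, memory_order_seq_cst ) /* even: closed */

static void publish_exit( struct shared_obj *obj, unsigned int code )
{
    SHARED_WRITE_BEGIN( obj );
    obj->exit_code = code;
    obj->flags |= 1;           /* e.g. a terminated flag */
    SHARED_WRITE_END( obj );
}
```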