Wine-NSPA Audio Stack

Wine 11.6 + NSPA | JACK 1.x / PipeWire-JACK | PREEMPT_RT kernel | 2026-04-27
Author: Jordan Johnston
Status: current audio-stack reference for the shipped JACK/WASAPI/nspaASIO path, including the low-latency Phase F route.

This page explains how Wine-NSPA moves Windows audio through winejack, how nspaASIO fits into that stack, and which timing-critical work stays inside the JACK callback.

Table of Contents

  1. Overview
  2. Backend constraints and JACK selection
  3. Stack layering
  4. winejack.drv: the JACK backend for Wine
  5. WASAPI surface and shared mode
  6. Exclusive mode and the fast path
  7. MIDI
  8. nspaASIO: the ASIO bridge
  9. Phase F: zero-latency bufferSwitch in the JACK callback
  10. Intentionally unimplemented surfaces
  11. Deferred work
  12. Other audio drivers
  13. Validation
  14. References

1. Overview

The Wine-NSPA audio stack provides deterministic low-latency WASAPI and ASIO transport on PREEMPT_RT systems. Validation focuses on DAW workloads, but the same backend serves any Windows audio application that opens WASAPI or ASIO.

The audio stack consists of three components that work together:

  1. winejack.drv is a Wine audio driver that exposes a JACK backend to Wine’s WASAPI surface and to WinMM MIDI. It replaces the role that winealsa.drv and winepulse.drv play in upstream Wine. One driver, two transports: WASAPI audio over JACK audio ports, and WinMM MIDI over JACK MIDI ports.
  2. nspaASIO is a vendored ASIO driver shipped as dlls/nspaasio. It implements the COM IASIO interface that DAWs probe for, and routes the ASIO callback model into a path that ends at winejack.drv and JACK. It does not ship its own JACK client; it delegates to winejack so that ASIO and WASAPI applications share a single transport.
  3. Phase F is the design that closes the loop on latency. Instead of bouncing audio data through an intermediate ring buffer, Phase F dispatches the ASIO bufferSwitch callback from inside the JACK process callback, using a small futex-based handshake to wake the application’s process thread and wait for it. The data written by the host is emitted in the same JACK period in which it was produced.

This document describes how those pieces fit together, what each one is responsible for, and which design decisions were forced by the constraint of running on a PREEMPT_RT kernel under JACK.

2. Backend constraints and JACK selection

Vanilla Wine ships three audio drivers: winealsa.drv (ALSA PCM), winepulse.drv (PulseAudio), and wineoss.drv (OSS). Each satisfies the WASAPI surface in its own way, and each runs into the same set of problems on a PREEMPT_RT kernel hosting a real-time audio workload.

ALSA PCM is not RT-friendly when driven from a Wine timer thread. The vanilla winealsa.drv audio path uses NtDelayExecution (a Sleep-equivalent) inside a timer loop to pace WASAPI period events. Sleeps under PREEMPT_RT are honored, but their wakeups are scheduled against the rest of the system, which means Wine’s audio service thread wakes whenever the scheduler gets to it. Sleep granularity is not the same as JACK period granularity. The ALSA driver also accepts AUDCLNT_SHAREMODE_EXCLUSIVE but does only a token amount of work for it: buffer-size rounding, but no exclusive device claim, no format enforcement, and no exclusive-mode timing. In a typical Ableton session this manifests as occasional missed deadlines that turn into xruns.

PulseAudio routes audio through a userspace daemon that is not on the RT path. PipeWire’s PulseAudio compatibility layer is closer to RT-correct, but winepulse.drv still speaks the PulseAudio protocol to that compatibility layer rather than talking directly to the underlying RT engine. There is an extra hop, and that hop costs both latency and predictability.

OSS is a legacy compatibility path. It remains in upstream Wine for older systems and is not a target backend for low-latency PREEMPT_RT workloads.

The deeper problem is that each of these drivers tries to manufacture a clock from the host system’s general-purpose timing primitives – a CLOCK_MONOTONIC sleep, an ALSA wakeup timed against PCM availability, a PulseAudio buffer-fill notification. None of those clocks were designed to be authoritative for a hard-real-time audio callback running at SCHED_FIFO 80+. On a PREEMPT_RT kernel they can be made better, but they cannot be made deterministic.

JACK is built around a different premise. The JACK process callback runs on a SCHED_FIFO thread inside the JACK server (or, with PipeWire-JACK, inside the PipeWire RT loop, which provides the same contract). The callback fires once per period at a frame boundary that the rest of the system has already committed to. Every JACK client on the box is woken by JACK and produces or consumes one period’s worth of audio inside that callback. There is no separate clock; the JACK callback is the clock. That callback is the authoritative timing source for an RT-correct Wine audio driver.

Accordingly, the implementation uses a JACK-native Wine audio driver. That driver is winejack.drv.

3. Stack layering

The transport has three modes, selected by API surface.

WASAPI shared mode (Windows media players, browsers, generic apps):

Win32 app -> mmdevapi -> WASAPI client interface
          -> winejack.drv (Unix side)
          -> JACK audio ports
          -> JACK / PipeWire RT engine
          -> hardware

WASAPI exclusive mode (DAWs that want a guaranteed buffer contract, or apps using AUDCLNT_STREAMFLAGS_EVENTCALLBACK):

Win32 app -> mmdevapi -> WASAPI client (EXCLUSIVE + EVENTCALLBACK)
          -> winejack.drv exclusive event-driven path
          -> JACK audio ports
          -> JACK RT engine

ASIO (DAWs and plugin hosts that prefer the ASIO callback model: Reaper, Ableton, Cubase, FL Studio):

Win32 app -> COM IASIO -> nspaASIO
          -> Phase F registration with winejack.drv
          -> JACK process callback dispatches bufferSwitch in-band
          -> JACK audio ports
          -> JACK RT engine

The same JACK transport carries all three modes. Multiple ASIO and WASAPI clients can coexist, and JACK handles graph-level mixing and routing. The stack does not implement Windows-style exclusive-device lockout; that behavior is discussed in Section 10.

MIDI takes a parallel path through the same driver:

Win32 app -> WinMM MIDI -> winejack.drv (jackmidi.c)
          -> JACK MIDI ports
          -> external synths / soft synths / DAW MIDI tracks

WinMM MIDI is a separate JACK client (wine-midi) from the audio one (wine-audio). They have separate process callbacks, separate lifecycles, and separate port sets. Sharing a single client for audio and MIDI is possible but offers no real benefit – JACK callbacks are cheap, and decoupling lets MIDI come up before audio is initialized and stay up after audio shuts down.

The three flavors above resolve into a single layered data path. Every Win32 audio API ultimately funnels through mmdevapi into winejack.drv’s Unix side, which holds the JACK client and the per-period process callback. Phase F shortcuts ASIO data past the WASAPI ring while still re-using the same JACK client, the same port set, and the same process callback. The diagram below shows the layering and which boundary each API surface enters at.

Figure: Wine-NSPA audio data path -- three API surfaces, one JACK transport

  Win32 PE:            WASAPI shared (media player) | WASAPI exclusive, event-driven | ASIO host (DAW + plugins)
  Win32 ABI:           mmdevapi -- IAudioClient / IAudioRenderClient (WASAPI surface) | nspaasio.dll -- IASIO COM
  ------------------------------- PE / Unix boundary -------------------------------
  Wine driver (Unix):  winejack.drv (jack.c) -- WASAPI client implementation, stream state machine
                       general path: interleaved ring | fast path: per-channel double buffer | Phase F: register_asio + futex pair
                       a single JACK process callback services every active stream (shared / exclusive / Phase F)
                       pi_mutex_trylock against WASAPI threads -- no blocking inside the RT callback
  JACK client:         wine-audio -- output_1..N + input_1..N ports (deinterleaved float32)
                       SCHED_FIFO process callback owns the period clock; woken via libjack
  JACK server:         jackd / pipewire-jack RT engine -- graph mix, port routing, period scheduling
  Hardware:            ALSA hw: device -- USB / PCI audio interface, sample-rate clock owner

  Phase F shortcut (ASIO column): same period, no intermediate ring; data lands at the hardware in 1 JACK period

The Phase F path (rightmost column) removes an extra JACK-period staging step. Instead of filling an intermediate ring and waiting for the next callback to consume it, the host’s bufferSwitch data is emitted in the same JACK period in which it was produced.

4. winejack.drv: the JACK backend for Wine

winejack.drv lives at dlls/winejack.drv/ in the Wine tree. It is a standard Wine audio driver in the sense that it presents the same Unix-side function table that winealsa.drv and winepulse.drv present to mmdevapi. The function table – the set of enum unix_funcs entries declared in unixlib.h – is what mmdevapi’s WASAPI client implementation calls into when it needs to enumerate endpoints, create a stream, push or pull a buffer, query the position, or report latency.

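There are two source files: jack.c, which holds the WASAPI audio side (endpoints, the stream lifecycle, and the JACK audio process callback), and jackmidi.c, which holds the WinMM MIDI side over JACK MIDI ports.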

The driver is registered in configure.ac and links against libjack. It builds as winejack.so and ships alongside the other Wine DLLs.

Phase 1 vs Phase 2

The driver was implemented in two phases. Phase 1 delivered MIDI – jackmidi.c and the MIDI half of unixlib.h. During Phase 1, audio still went through winealsa.drv (with a small delegation that let winealsa.drv ask winejack.drv for its MIDI driver via NSPA_JACK_MIDI=1), so applications could get JACK MIDI without depending on the audio side. Phase 2 delivered WASAPI audio – the function-table entries in jack.c, the stream lifecycle, and the JACK audio process callback. After Phase 2, MIDI and audio share winejack.drv as a single Wine driver, and winealsa.drv’s MIDI delegation is no longer the recommended path.

Internal layering

Inside winejack.drv the code is organized into eight loosely coupled pieces, six for audio and two for MIDI:

  1. Endpoint and device management – enumerates physical JACK ports, groups them by client prefix, presents them to mmdevapi as audio endpoints.
  2. WASAPI stream state machine – Initialize / Start / Stop / Reset / Release transitions, error reporting on contract violation, lifecycle of one IAudioClient instance.
  3. Event-driven scheduler bridge – pull mode. The application sets up AUDCLNT_STREAMFLAGS_EVENTCALLBACK, calls SetEventHandle, and waits on the handle in a loop. winejack signals the event each JACK period.
  4. Timer-driven scheduler bridge – push mode. The application polls GetCurrentPadding on its own cadence and calls GetBuffer / ReleaseBuffer when it wants to. winejack maintains the buffer-state contract against a JACK-backed stream.
  5. JACK audio transport layer – one JACK client (wine-audio), one process callback that services every active stream, port registration on stream creation, port destruction on stream release.
  6. Timing, clock, and latency reporting – IAudioClock::GetPosition, IAudioClient::GetStreamLatency, GetDevicePeriod. Position is monotonic and synchronized with actual JACK frame progress.
  7. WinMM MIDI layer (in jackmidi.c) – midiOutShortMsg, midiOutLongMsg, midiInStart, callback dispatch, MIM/MOM notifications, ringbuffer plumbing.
  8. JACK MIDI transport layer (in jackmidi.c) – one JACK client (wine-midi), MIDI process callback, frame-aligned event timestamps, port registration per opened MIDI device.

The audio side runs to roughly 3000 lines in jack.c. MIDI is roughly 700 lines in jackmidi.c.

Timing model

winejack.drv treats the JACK callback as the authoritative clock. WASAPI-facing events, padding, periods, position, and latency are synthesized on top of that callback cadence.

WASAPI gives the application a contract: the device has a period, you’ll be woken at period boundaries (or you can poll), padding is accurate, position monotonic. winejack honors that contract. But the contract is synthesized – there is no Windows audio engine underneath. The JACK process callback fires, winejack updates internal state, and the next time the application reads padding or waits on its event, it sees a state consistent with one more JACK period having elapsed.

ASIO2WASAPI, a native-Windows project, is the mirror image of this arrangement: an ASIO driver that calls into a WASAPI exclusive client. Both bridge a callback-driven audio model and the WASAPI contract, but in opposite directions: winejack presents WASAPI on top of JACK’s callback, while ASIO2WASAPI presents ASIO’s callback on top of WASAPI.

5. WASAPI surface and shared mode

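mmdevapi calls into the audio driver through the enum unix_funcs function table. The headline entries on the audio side cover endpoint enumeration, stream creation and teardown, buffer exchange, format and period queries, position, and latency.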

Each is satisfied by winejack.drv in terms of JACK state.

Endpoint enumeration

JACK exposes a flat list of physical and virtual ports. winejack groups them into endpoints by client prefix (every system:capture_* becomes one capture endpoint, every system:playback_* one playback endpoint, and similarly for any other JACK clients with stable port-naming patterns – a USB interface, a virtual cable, a soundcard exposed by PipeWire). The result is a small handful of endpoints that look enough like Windows audio devices for mmdevapi to enumerate.
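A minimal sketch of the grouping rule, assuming the endpoint key is simply the client prefix before ':' plus a direction inferred from the capture_/playback_ port-name pattern. The struct and function names are hypothetical, and a real driver would more likely take the direction from the JACK port flags than from the name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical endpoint key: JACK client prefix plus direction. */
enum direction { DIR_CAPTURE, DIR_PLAYBACK, DIR_UNKNOWN };

struct endpoint_key {
    char client[64];
    enum direction dir;
};

/* Split "system:playback_1" into client "system" and DIR_PLAYBACK.
 * Returns 0 for names without a usable "client:" prefix. */
static int endpoint_key_from_port(const char *port, struct endpoint_key *key)
{
    const char *colon = strchr(port, ':');
    size_t len;

    if (!colon)
        return 0;
    len = (size_t)(colon - port);
    if (len == 0 || len >= sizeof(key->client))
        return 0;
    memcpy(key->client, port, len);
    key->client[len] = '\0';

    if (strncmp(colon + 1, "capture_", 8) == 0)
        key->dir = DIR_CAPTURE;
    else if (strncmp(colon + 1, "playback_", 9) == 0)
        key->dir = DIR_PLAYBACK;
    else
        key->dir = DIR_UNKNOWN;
    return 1;
}

/* Count distinct (client, direction) pairs, i.e. endpoints, in a port list.
 * Ports whose direction cannot be inferred are skipped. */
static size_t count_endpoints(const char *const *ports, size_t n)
{
    struct endpoint_key seen[32];
    size_t count = 0;

    assert(ports != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        struct endpoint_key k;
        size_t j;

        if (!endpoint_key_from_port(ports[i], &k) || k.dir == DIR_UNKNOWN)
            continue;
        for (j = 0; j < count; j++)
            if (seen[j].dir == k.dir && strcmp(seen[j].client, k.client) == 0)
                break;
        if (j == count && count < 32)
            seen[count++] = k;
    }
    return count;
}
```

With ports system:capture_1, system:capture_2, system:playback_1, system:playback_2, and usb:playback_1, the sketch reports three endpoints: system capture, system playback, and usb playback.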

Endpoint information is built once at first query. There is a known gap here – device hotplug events from JACK do not refresh the endpoint list, so plugging in a USB interface after Wine is up requires a Wine restart to see the new endpoint. This is filed under deferred work; in practice DAW workflows tend to set up the audio environment before launching the DAW.

Format negotiation

JACK speaks one format: deinterleaved 32-bit IEEE float, at one sample rate (whatever JACK was started with), at one buffer size (whatever JACK was configured for). Everything else is the driver’s problem.

get_mix_format reports float32 at JACK’s native rate. is_format_supported answers honestly: it reports only what the driver can actually serve on top of JACK’s single native format.

get_device_period reports JACK’s buffer size as both the minimum and default period. This is the only period winejack can support without introducing additional buffering.

Format conversion

JACK is deinterleaved float32 per port. WASAPI is interleaved multi-channel in whatever format the application chose. Converting between them is the driver’s job and happens in two places:

  1. Render path – the application’s audio sits in a per-stream interleaved ring buffer in its native format. The JACK process callback reads from the ring, converts to float32 if needed, deinterleaves into per-channel JACK port buffers, and applies per-channel volume.
  2. Capture path – JACK port buffers are read in the process callback, interleaved into the application’s format, and stored in the per-stream ring. The application reads the ring on its own cadence.

The conversions cover the standard WASAPI integer formats (int16, int24-in-32, int32) and the float formats (float32 passthrough, float64). For float32 at JACK rate, the only work in the render path is the deinterleave – the format is already correct.
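The render half of the general path can be sketched for the int16 case. This assumes interleaved int16 input, one float32 array per channel standing in for the JACK port buffers, and scaling by 1/32768; the function name is hypothetical, and the other formats would follow the same loop with a different per-sample conversion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One period of the general render path for int16 input: convert each
 * sample to float32, deinterleave into per-channel buffers, and apply
 * per-channel volume. */
static void render_int16_to_ports(const int16_t *interleaved, size_t frames,
                                  size_t channels, float *const *ports,
                                  const float *volumes)
{
    assert(channels > 0);
    for (size_t f = 0; f < frames; f++) {
        for (size_t c = 0; c < channels; c++) {
            /* Scale by 1/32768 so -32768 maps to exactly -1.0f. */
            float s = (float)interleaved[f * channels + c] / 32768.0f;
            ports[c][f] = s * volumes[c];
        }
    }
}
```

For float32 input at JACK rate the conversion line disappears and the loop reduces to the deinterleave, which matches the text’s point that only the deinterleave remains in that case.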

Shared-mode behavior

Shared mode is straightforward. Multiple shared-mode streams open into a single wine-audio JACK client. JACK handles graph-level mixing the way it always does – if two shared-mode streams write to the same endpoint, both end up at the JACK output port and JACK mixes them downstream. There is no per-stream mixer in winejack itself; the driver writes its converted sample data into JACK port buffers and lets JACK do the rest.

The locking strategy

The JACK process callback runs at SCHED_FIFO and must not block. The application’s WASAPI threads run at SCHED_OTHER (or, in the DAW case, SCHED_FIFO with a priority below the audio callback). They share state – the per-stream ring buffer, the held-frames counter, the position counter.

The chosen scheme is a pi_mutex_trylock in the process callback. The Wine WASAPI threads take the mutex normally. The JACK callback only trylocks; if that fails, the period outputs silence (or skips a capture) and a counter ticks. The callback never waits on an application thread under any circumstance. Because a failed trylock does not make the callback a waiter on the mutex, the kernel’s PI machinery does not boost the lock holder on the callback’s behalf; priority inheritance comes into play only among threads that take the lock with a blocking call, where it is a safety net against inversion rather than a planned interaction.
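The callback-side rule (try the lock, never wait, output silence on contention) looks roughly like this. A POSIX mutex created with PTHREAD_PRIO_INHERIT stands in for Wine-NSPA’s pi_mutex; struct stream, stream_init, and process_period are hypothetical names, and the fixed 256-sample array is a placeholder for the real per-stream ring.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

struct stream {
    pthread_mutex_t lock;     /* taken normally by WASAPI threads */
    float ring[256];          /* placeholder for the per-stream ring buffer */
    unsigned long missed;     /* periods replaced by silence due to contention */
};

static int stream_init(struct stream *s)
{
    pthread_mutexattr_t attr;
    int rc;

    memset(s, 0, sizeof(*s));
    pthread_mutexattr_init(&attr);
    (void)pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    rc = pthread_mutex_init(&s->lock, &attr);
    if (rc != 0)              /* e.g. PI futexes unavailable: plain mutex */
        rc = pthread_mutex_init(&s->lock, NULL);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Called from the RT process callback for one period. Never blocks. */
static void process_period(struct stream *s, float *out, size_t frames)
{
    assert(frames <= 256);
    if (pthread_mutex_trylock(&s->lock) != 0) {
        /* Lock held by a WASAPI thread: emit silence, count it, move on. */
        memset(out, 0, frames * sizeof(*out));
        s->missed++;
        return;
    }
    memcpy(out, s->ring, frames * sizeof(*out));
    pthread_mutex_unlock(&s->lock);
}
```

The design choice is that the cost of contention is bounded and visible: one silent period and one counter increment, never an unbounded wait inside the RT thread.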

This is the same PI-mutex pattern Wine-NSPA uses elsewhere (see CS-PI), so the mechanism is reused rather than reinvented.

The two JACK clients

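winejack.drv opens two JACK clients per Wine process: wine-audio, owned by jack.c, which carries every WASAPI stream, and wine-midi, owned by jackmidi.c, which carries WinMM MIDI.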

JACK clients are cheap on the JACK server side – registering a new client is a few hundred microseconds, port registration is comparable – so the two-client design adds no observable cost. The benefit is decoupling: a MIDI device that is opened, used, and closed during a DAW session does not perturb the audio client’s port set, and a stream that is created and torn down on the audio side does not affect MIDI.

When PipeWire-JACK is the JACK server, the same two-client design applies. PipeWire’s JACK compatibility layer is functionally complete for client-side semantics, including process callbacks at correct period boundaries and port registration; everything in this document applies to PipeWire-JACK as well as a native jackd server.

6. Exclusive mode and the fast path

WASAPI exclusive mode with AUDCLNT_STREAMFLAGS_EVENTCALLBACK is the path serious audio applications take. It is the path Ableton takes when configured against a WASAPI device. It is the path that an ASIO host’s WASAPI fallback uses, and it is the path that nspaASIO uses when Phase F is unavailable.

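The exclusive contract is tight: for an event-driven exclusive stream, the application passes the same value for the buffer duration and the periodicity to IAudioClient::Initialize, is woken once per period, and must have that period’s data ready before the next wakeup.

Mapping that to JACK: the device period is the JACK buffer size (the only period get_device_period reports), and winejack signals the stream’s event once per JACK process callback.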

The fast path

For the common DAW case, winejack uses a fast path that strips out everything the float32-at-JACK-rate case does not need.

The criteria, all required: exclusive mode, event-driven (AUDCLNT_STREAMFLAGS_EVENTCALLBACK), float32 sample format, JACK’s native sample rate, and a channel count within the JACK port budget.

When all of those hold, create_stream allocates per-channel double buffers (set A and set B, each one JACK period long) instead of an interleaved ring buffer. The application’s GetBuffer returns an interleaved buffer for the application to fill, and ReleaseBuffer deinterleaves it into the per-channel write set. The JACK callback flips an atomic rt_buf_idx and memcpys the read set straight into the JACK port buffers. No format conversion (float32 to float32), no volume application unless volume is non-unity, no ringbuffer head/tail bookkeeping: just a buffer-index flip and a per-channel memcpy.
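A sketch of the double-buffer flip, assuming a fixed 64-frame period and at most eight channels. The ready flag is an addition for this sketch so the callback can tell that ReleaseBuffer has published a period; all names other than rt_buf_idx are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define PERIOD 64   /* frames per JACK period (assumed fixed for the sketch) */
#define MAX_CH 8    /* channel budget (assumed) */

struct fast_stream {
    float set[2][MAX_CH][PERIOD];   /* buffer set A = 0, set B = 1 */
    atomic_uint rt_buf_idx;         /* set the callback read most recently */
    atomic_uint ready;              /* 1 once ReleaseBuffer has published a period */
    size_t channels;
};

static void fast_init(struct fast_stream *s, size_t channels)
{
    assert(channels > 0 && channels <= MAX_CH);
    memset(s->set, 0, sizeof(s->set));
    atomic_init(&s->rt_buf_idx, 0u);
    atomic_init(&s->ready, 0u);
    s->channels = channels;
}

/* WASAPI side (ReleaseBuffer): deinterleave one period into the write set,
 * i.e. the set the callback did not read last, then publish it. */
static void fast_release(struct fast_stream *s, const float *interleaved)
{
    unsigned w = atomic_load_explicit(&s->rt_buf_idx, memory_order_acquire) ^ 1u;
    for (size_t f = 0; f < PERIOD; f++)
        for (size_t c = 0; c < s->channels; c++)
            s->set[w][c][f] = interleaved[f * s->channels + c];
    atomic_store_explicit(&s->ready, 1u, memory_order_release);
}

/* JACK side: flip rt_buf_idx and memcpy the newly written set into the port
 * buffers. Writes silence and returns 0 if nothing was published (underrun). */
static int fast_process(struct fast_stream *s, float *const *ports)
{
    if (!atomic_exchange_explicit(&s->ready, 0u, memory_order_acquire)) {
        for (size_t c = 0; c < s->channels; c++)
            memset(ports[c], 0, PERIOD * sizeof(float));
        return 0;
    }
    unsigned r = atomic_fetch_xor_explicit(&s->rt_buf_idx, 1u, memory_order_acq_rel) ^ 1u;
    for (size_t c = 0; c < s->channels; c++)
        memcpy(ports[c], s->set[r][c], PERIOD * sizeof(float));
    return 1;
}
```

Each period then costs one atomic exchange, one atomic XOR, and a memcpy per channel on the RT side, which is the "buffer-index flip and a per-channel memcpy" the text describes.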

This is the same pattern wineasio uses internally to bridge ASIO double-buffer semantics to JACK. It is the right shape for the exclusive event-driven path because both ends agree on the period and the format – the only thing that has to happen is moving the bytes.

When the fast-path criteria are not met – shared mode, push mode, format mismatch, rate mismatch – winejack falls back to the general path: interleaved ring buffer, format conversion, volume application, the works. The fast path is a per-stream optimization and adds no overhead when it isn’t engaged.

Padding and position

Padding (the number of frames queued but not yet consumed) is read by the application to decide how much to write. In exclusive event-driven mode it swings between two values: it is full immediately after ReleaseBuffer, and it drops to approximately zero when the next JACK callback consumes the whole period in one go. The driver tracks held_frames atomically; the JACK callback subtracts what it consumed, and the WASAPI thread adds what was released.

Position – IAudioClock::GetPosition – is read by DAWs for transport timing and drift compensation. The driver maintains a 64-bit frame counter that the JACK callback advances by the period size each time it runs, plus a QPC timestamp captured at the same point. The application gets a position that is monotonic, synchronized with actual JACK frame progress, and correlatable with QPC. Latency is reported via jack_port_get_latency_range – the max of the range, conservative – so DAWs can apply input-monitoring compensation correctly.
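The held_frames and position bookkeeping can be sketched with C11 atomics. The struct layout and function names are hypothetical, the QPC value is passed in rather than read from the OS, and a real driver would publish position and timestamp together (for example under a sequence counter) so that a reader never sees a mismatched pair.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

struct stream_clock {
    _Atomic uint32_t held_frames;      /* padding: released, not yet consumed */
    _Atomic uint64_t position;         /* frames advanced since Start */
    _Atomic int64_t qpc_at_position;   /* timestamp captured with position */
};

static void clock_init(struct stream_clock *c)
{
    atomic_init(&c->held_frames, 0);
    atomic_init(&c->position, 0);
    atomic_init(&c->qpc_at_position, 0);
}

/* WASAPI thread, from ReleaseBuffer. */
static void on_release(struct stream_clock *c, uint32_t frames)
{
    atomic_fetch_add_explicit(&c->held_frames, frames, memory_order_release);
}

/* JACK callback, once per period. Only this function subtracts, so the
 * value it loaded can only have grown and the subtraction cannot underflow. */
static void on_period(struct stream_clock *c, uint32_t period, int64_t now_qpc)
{
    uint32_t held = atomic_load_explicit(&c->held_frames, memory_order_acquire);
    uint32_t consumed = held < period ? held : period;

    atomic_fetch_sub_explicit(&c->held_frames, consumed, memory_order_acq_rel);
    atomic_fetch_add_explicit(&c->position, period, memory_order_relaxed);
    atomic_store_explicit(&c->qpc_at_position, now_qpc, memory_order_release);
}

static uint32_t get_padding(struct stream_clock *c)
{
    return atomic_load_explicit(&c->held_frames, memory_order_acquire);
}

static uint64_t get_position(struct stream_clock *c)
{
    return atomic_load_explicit(&c->position, memory_order_acquire);
}
```

After one 64-frame ReleaseBuffer the padding reads 64; after the next callback it reads 0 and the position has advanced by 64.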

Latency budget

For a session at 48 kHz with a 64-frame JACK period, the period is 64 / 48000 s, about 1333 microseconds. Approximate per-period costs of each stage, on the pre-Phase-F WASAPI exclusive path, after the fast path, and with Phase F:

Stage                           Pre-Phase-F            After fast path     With Phase F
nspaASIO interleave             ~50 µs                 ~50 µs              0 (per-channel direct)
WASAPI GetBuffer overhead       ~5 µs                  ~2 µs               n/a (no GetBuffer)
Ring buffer write               ~20 µs                 ~10 µs (memcpy)     n/a
RT-side deinterleave + volume   ~30 µs                 ~10 µs (memcpy)     ~10 µs (memcpy)
Event signaling                 ~100 µs (NtSetEvent)   ~5 µs (futex)       ~2 µs (futex)
Timer drift                     ±1 ms                  0 (JACK-synced)     0
Pre-fill latency                +1 period              0                   0

Phase F’s full additive overhead per period, on top of the JACK period itself, is a couple of memcpys plus the futex round trip – on the order of 30 microseconds for typical channel counts. That is well below the variance of the kernel scheduler and not measurable end-to-end against a clean JACK reference.

7. MIDI

jackmidi.c is the WinMM MIDI implementation. It opens a separate JACK client (wine-midi), registers JACK MIDI input and output ports per opened device, and bridges WinMM’s MOM_* and MIM_* notification model to JACK’s per-period event lists.

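The shape: WinMM output calls (midiOutShortMsg, midiOutLongMsg) feed an SPSC ringbuffer that the JACK MIDI process callback drains into the port’s JACK output buffer, and the same callback moves incoming JACK MIDI events, timestamped, into a per-port input ringbuffer that WinMM threads drain and dispatch as MIM_DATA / MIM_LONGDATA.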

The lock-free ringbuffers are the standard SPSC variety with atomic head and tail. The JACK process callback never blocks; the WinMM threads never block on the JACK callback.

Figure: WinMM MIDI flow + lifecycle (jackmidi.c, wine-midi JACK client)

  OUTPUT (host -> external)
    Win32 host: midiOutShortMsg / midiOutLongMsg
      -> midi_out_data / midi_out_long_data (WinMM thread)
      -> SPSC ringbuffer (8 KB), atomic head/tail; producer: WinMM, consumer: JACK callback
      -> jack_midi_process_cb drains the ring: jack_midi_event_write(port_buf, ev.time, data)
      -> JACK output port -> external MIDI device

  INPUT (external -> host)
    External MIDI source (keyboard / DAW track)
      -> JACK input port (jack_midi_event_get)
      -> jack_midi_process_cb pushes each event with timestamp = base_time + ev.time * 1000 / jack_rate
      -> SPSC ringbuffer (8 KB) per port; producer: JACK callback, consumer: WinMM
      -> WinMM dispatch -> MIM_DATA / MIM_LONGDATA

  jack_midi_process_cb runs on the RT thread of the single wine-midi JACK client:
  no allocation, no Wine call, no locks held inside the callback.

  Per-port lifecycle (DRVM_*)
    DRVM_INIT               JACK client connect
    MOM_OPEN / MIM_OPEN     port register, ringbuffer alloc
    RUNNING                 events flow, MIM_ERROR on overflow
    MOM_CLOSE / MIM_CLOSE   port unregister, drain queue
    DRVM_EXIT               walk arrays, close leaks

The two ringbuffers are the synchronization surface between the WinMM threads and the JACK process callback. Output’s producer side and input’s consumer side are owned by WinMM threads at SCHED_OTHER (or SCHED_FIFO under MMCSS naming when applicable); the other side of each ring is owned by the JACK RT callback. The lifecycle row at the bottom is the per-port progression: DRVM_INIT opens the JACK client lazily on first use, MOM_OPEN / MIM_OPEN registers a port and allocates its ring, the RUNNING state is where five of the audit’s six fixes apply, MOM_CLOSE / MIM_CLOSE unregisters cleanly, and DRVM_EXIT carries the sixth, the leak fix that walks the destination and source arrays to close anything still open at process exit.

Bugs and fixes (the MIDI audit)

A six-issue audit of jackmidi.c produced the following fixes. Each shipped as a separate commit.

Input timestamp jitter. The original code stamped MIDI input events with get_time_msec() at dequeue time, that is, when the WinMM thread drained the ringbuffer, not when JACK saw the event. JACK provides a per-event frame offset (ev.time) within the period, but the dequeue-time approach ignored it entirely. As a result, all events in the same period received the same timestamp, and each event could be off by up to one full JACK period. For DAWs that record MIDI (a keyboard playing into a piano roll), that jitter is audible as smeared timing.

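The fix is to compute the timestamp at enqueue time, in the JACK callback, as base_time + (ev.time * 1000 / jack_rate). Each timestamp now reflects the event’s frame position within the period, down to the millisecond granularity of WinMM’s MIM_DATA timestamps, so events are no longer smeared. This was the largest single contributor to the “clunky MIDI” feel that motivated the audit.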
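The enqueue-time formula translates directly, assuming base_time is the WinMM millisecond time corresponding to frame 0 of the current period (how the driver obtains it is not shown). The 64-bit intermediate keeps ev.time * 1000 from overflowing; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Millisecond timestamp for a JACK MIDI input event at frame offset
 * frame_offset (ev.time) within the current period. */
static uint32_t midi_in_timestamp(uint32_t base_time_ms, uint32_t frame_offset,
                                  uint32_t jack_rate)
{
    assert(jack_rate > 0);
    return base_time_ms + (uint32_t)((uint64_t)frame_offset * 1000u / jack_rate);
}
```

With a 1024-frame period at 48 kHz, events at offsets 0 and 1023 now get timestamps 21 ms apart instead of an identical stamp.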

Silent message drops on overflow. midi_out_data (the short-message path) silently dropped messages when the ringbuffer was full and returned NOERROR. midi_out_long_data (the SysEx path) was worse: it not only dropped the message but set MHDR_DONE and fired MOM_DONE, lying to the application about completion. The fix is to report the failure honestly: MIDIERR_NOTREADY for short messages, and no MOM_DONE for SysEx that didn’t actually go out. Large SysEx dumps (patch banks, firmware uploads to hardware synths) are the visible failure mode here; they were being silently truncated, which is the worst possible class of bug.

MODM_RESET only sent CC 123. The WinMM MODM_RESET reset behavior is documented as “All Notes Off” – which on Windows means CC 123 (All Notes Off) and CC 120 (All Sound Off). Without CC 120, sustained notes and reverb tails on external synths keep ringing after the reset. The fix is to send both CCs on each MIDI channel during reset.
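The reset sequence itself is small. This sketch builds the 32 Control Change messages (CC 123 and CC 120, value 0, on each of the 16 channels) into a caller-supplied array; the function name and the send order within a channel are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CC_ALL_SOUND_OFF 120
#define CC_ALL_NOTES_OFF 123

/* Fill out with 3-byte Control Change messages for a MODM_RESET:
 * All Notes Off, then All Sound Off, on each of the 16 MIDI channels.
 * Returns the number of messages written (32). */
static size_t build_reset_messages(uint8_t out[32][3])
{
    size_t n = 0;

    for (uint8_t ch = 0; ch < 16; ch++) {
        uint8_t status = (uint8_t)(0xB0 | ch);   /* Control Change on channel ch */

        out[n][0] = status; out[n][1] = CC_ALL_NOTES_OFF; out[n][2] = 0; n++;
        out[n][0] = status; out[n][1] = CC_ALL_SOUND_OFF; out[n][2] = 0; n++;
    }
    return n;
}
```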

No MIM_ERROR on dropped input. When the input ringbuffer overflowed, events were silently swallowed. Windows expects MIM_ERROR for malformed or dropped short messages and MIM_LONGERROR for SysEx that couldn’t be delivered. The fix wires up the appropriate notifications when the JACK callback can’t enqueue.

Output event timestamps were always frame 0. Every output event was written with jack_midi_event_write(..., 0, ...), putting all output at the start of the period regardless of when WinMM received the message. This piles up rapid messages at the same instant within the period. WinMM’s API doesn’t carry sub-period timing on the output side, so the impact is small, but the fix spreads events across the period based on arrival time.
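One plausible way to spread events, sketched under assumptions the text does not spell out: messages that arrived during the previous period are placed at the same relative offset in the current one, clamped to the period, and never earlier than the previous event, because jack_midi_event_write refuses events whose offsets go backwards. The microsecond clock inputs and the function name are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Map an output message's arrival time to a frame offset in the current
 * period. prev_period_start_us is when the previous period began;
 * *last_offset carries the previous event's offset so that offsets within
 * one period never decrease. */
static uint32_t output_frame_offset(uint64_t arrival_us, uint64_t prev_period_start_us,
                                    uint32_t nframes, uint32_t rate,
                                    uint32_t *last_offset)
{
    uint64_t off = 0;

    assert(nframes > 0);
    if (arrival_us > prev_period_start_us)
        off = (arrival_us - prev_period_start_us) * rate / 1000000u;
    if (off >= nframes)
        off = nframes - 1;          /* late arrivals go at the end */
    if (off < *last_offset)
        off = *last_offset;         /* keep offsets non-decreasing */
    *last_offset = (uint32_t)off;
    return (uint32_t)off;
}
```

The caller would reset *last_offset to 0 at the start of each period.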

DRVM_EXIT was a no-op. The driver’s exit handler did nothing, so when an application exited without properly closing its MIDI ports, the JACK MIDI ports leaked. The fix walks the destination and source arrays and closes anything that’s still open.

The MIDI audit deliberately kept its commits separate from the audio-side work in jack.c. MIDI bugs and audio bugs have different reproduction paths, different test surfaces, and different blast radii, and shipping them in one commit makes bisection harder when one of the changes regresses.

MIDI process callback shape

The JACK MIDI process callback is a small, focused loop:

jack_midi_process_cb(nframes, arg):
    for each registered output port:
        port_buf = jack_port_get_buffer(port, nframes)
        jack_midi_clear_buffer(port_buf)
        drain SPSC ringbuffer:
            read short or long event
            jack_midi_event_write(port_buf, frame_offset, data, len)   -- offsets must not decrease
    for each registered input port:
        port_buf = jack_port_get_buffer(port, nframes)
        count = jack_midi_get_event_count(port_buf)
        for i from 0 to count - 1:
            jack_midi_event_get(&ev, port_buf, i)
            push (timestamp = base_time + ev.time * 1000 / jack_rate, ev.buffer, ev.size) into per-port SPSC ringbuffer
    return 0

There is no allocation, no Wine call, no lock taken in the callback. SPSC ringbuffers are the standard atomic-head, atomic-tail variety with one producer (the WinMM thread on output, the JACK callback on input) and one consumer (the JACK callback on output, the WinMM dispatch thread on input). Capacity is sized for typical SysEx burst patterns – 8 KB per direction – which absorbs ordinary patch-bank transfers without overflow.
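A minimal SPSC byte ring in the spirit described above, assuming free-running head and tail counters and a power-of-two capacity of 8 KB. Message framing (how a short message or a SysEx is delimited in the byte stream) is left out, and the names are hypothetical. ring_write returns 0 when the data does not fit, which is where a caller would report MIDIERR_NOTREADY or MIM_ERROR rather than dropping silently.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8192u   /* 8 KB; power of two so masking wraps the index */

struct spsc_ring {
    uint8_t data[RING_SIZE];
    _Atomic size_t head;   /* advanced only by the producer */
    _Atomic size_t tail;   /* advanced only by the consumer */
};

static void ring_init(struct spsc_ring *r)
{
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
}

/* Producer: writes all len bytes or nothing. Returns 0 if full. */
static int ring_write(struct spsc_ring *r, const uint8_t *src, size_t len)
{
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (RING_SIZE - (head - tail) < len)
        return 0;
    for (size_t i = 0; i < len; i++)
        r->data[(head + i) & (RING_SIZE - 1)] = src[i];
    atomic_store_explicit(&r->head, head + len, memory_order_release);
    return 1;
}

/* Consumer: copies up to max bytes and returns the number read. */
static size_t ring_read(struct spsc_ring *r, uint8_t *dst, size_t max)
{
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    size_t avail = head - tail;
    size_t n = avail < max ? avail : max;

    for (size_t i = 0; i < n; i++)
        dst[i] = r->data[(tail + i) & (RING_SIZE - 1)];
    atomic_store_explicit(&r->tail, tail + n, memory_order_release);
    return n;
}
```

Because each index has a single writer, neither side ever takes a lock or waits, which is the property the JACK callback needs.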

8. nspaASIO: the ASIO bridge

dlls/nspaasio/ is a Wine-side COM DLL that implements the IASIO interface. It is the audio driver name a DAW sees when it asks Windows for a list of installed ASIO drivers, and it is what gets loaded when “nspaASIO” is selected from the DAW’s audio device menu.

ASIO is Steinberg’s audio driver model and is the de facto standard for low-latency audio on Windows. DAWs prefer it over WASAPI for two reasons: ASIO predates WASAPI and has a longer track record on professional audio hardware, and ASIO’s callback model exposes a cleaner notion of “fill this output buffer right now” than WASAPI’s pull-from-event loop. From a DAW author’s perspective, ASIO is the easy path.

The job of nspaASIO is to be the ASIO driver Windows audio applications expect, while routing the audio data into a path that ends at JACK. It does not talk to JACK directly. There is already a Wine project that does that – wineasio, which implements IASIO and opens a JACK client of its own. nspaASIO deliberately takes a different shape.

The layered model

Conceptually, nspaASIO is an ASIO-to-WASAPI-exclusive bridge. When a DAW asks nspaASIO to start, nspaASIO (in the slow path) opens an IAudioClient on the default endpoint in exclusive mode with EVENTCALLBACK, sets the buffer duration to the ASIO buffer size, and runs an event-loop thread that does WaitForSingleObject on the WASAPI event, then calls the host’s bufferSwitch callback, then writes the buffer through GetBuffer / ReleaseBuffer. That IAudioClient is backed by winejack.drv, so the audio ends up at JACK, but the layering is clean: ASIO talks to WASAPI, WASAPI talks to JACK.

The mapping table looks like:

ASIO concept                    WASAPI exclusive equivalent
ASIOCreateBuffers(bufferSize)   IAudioClient::Initialize(EXCLUSIVE, EVENTCALLBACK, hnsBufferDuration = hnsPeriodicity = bufferSize in 100-ns units)
ASIOStart() -> bufferSwitch     SetEventHandle(), then a wait loop that calls GetBuffer/ReleaseBuffer
ASIOGetLatencies()              IAudioClient::GetStreamLatency() plus per-port JACK latency
ASIOGetSampleRate()             mix-format sample rate
ASIOGetBufferSize()             IAudioClient::GetBufferSize()
Double-buffer swap              per-period GetBuffer/ReleaseBuffer
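The first row hides a unit conversion: hnsBufferDuration is a REFERENCE_TIME in 100-nanosecond units, while the ASIO buffer size is in frames. A sketch of the conversion, rounding to the nearest tick; REFERENCE_TIME is redefined locally only to keep the example self-contained, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t REFERENCE_TIME;   /* WASAPI durations: 100-ns units */

/* ASIO buffer size in frames -> hnsBufferDuration / hnsPeriodicity. */
static REFERENCE_TIME frames_to_hns(uint32_t frames, uint32_t rate)
{
    assert(rate > 0);
    return (REFERENCE_TIME)(((uint64_t)frames * 10000000u + rate / 2) / rate);
}

/* Inverse, to check which frame count a duration maps back to. */
static uint32_t hns_to_frames(REFERENCE_TIME hns, uint32_t rate)
{
    return (uint32_t)(((uint64_t)hns * rate + 5000000u) / 10000000u);
}
```

At 48 kHz a 64-frame ASIO buffer becomes 13333 (about 1.333 ms), which converts back to 64 frames.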

Why the layered model

The alternative, having nspaASIO open its own JACK client, is what wineasio does, and it is a simpler architecture for the ASIO use case alone. But it forks the audio code. The same Wine prefix running an ASIO DAW, a WASAPI media player, and a WinMM game would then have two audio JACK clients, two sets of latency-reporting decisions, and two sets of bugs to fix. By going through WASAPI exclusive, nspaASIO and any WASAPI exclusive application share the same winejack.drv code path, the same JACK client, the same format conversion logic, and the same locking strategy.

This is Phase 3 of the original winejack roadmap: Phase 1 was MIDI, Phase 2 was WASAPI audio, and Phase 3 was the ASIO bridge that sits on top.

What’s in nspaasio.c

The file (~1200 lines) implements the IASIO COM vtable, including init, start, stop, getChannels, getSampleRate, setSampleRate, getBufferSize, createBuffers, disposeBuffers, controlPanel, future, and outputReady, plus the standard COM QueryInterface / AddRef / Release. Most entries are thin: they translate the ASIO call into a sequence of WASAPI calls or look up a value cached at init time.

The interesting entries are createBuffers and start. createBuffers allocates the ASIO buffer pool (two float32 buffers per channel, the standard ASIO double buffer), sets up the WASAPI exclusive client, and attempts to register with winejack for Phase F. If Phase F registration succeeds, start becomes a thin pass-through; if it fails, start spins up the play_thread that runs the WASAPI fallback loop.

9. Phase F: zero-latency bufferSwitch in the JACK callback

Phase F is the design that gives ASIO applications the same single-period latency as a native JACK client. The idea, in one sentence: instead of running the ASIO bufferSwitch on a separate Wine thread that reads from a buffer the JACK callback wrote in an earlier period, dispatch bufferSwitch from inside the JACK callback, handing control over a futex handshake to a Wine thread (which supplies the Win32 thread context) and waiting for it to finish within the same period.

The problem Phase F solves

The pre-Phase-F (slow-path) ASIO chain looks like this:

JACK process callback (thread T_jack):
    write capture data into the WASAPI ring buffer (at time t)
    signal the WASAPI event

Wine play_thread (thread T_play, SCHED_FIFO):
    wake from WaitForSingleObject(WASAPI event)
    call bufferSwitch(buf_idx, ASIOTrue)  -- host fills output (at time t+epsilon)
    write the output via GetBuffer/ReleaseBuffer into the WASAPI ring

Next JACK process callback (at time t+period):
    read output from the WASAPI ring (at time t+period)
    memcpy into JACK port buffers

The data the host wrote at t+epsilon doesn’t come out of JACK until t+period. That’s an entire JACK period of added output latency, on top of whatever the JACK period itself is. For a 64-frame period at 48 kHz that’s an extra 1.3 ms; for 256 frames it’s 5.3 ms. ASIO drivers with their own JACK clients (wineasio) don’t have this added period because they run bufferSwitch inside the JACK callback; the WASAPI ring buffer is what costs the period.
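The arithmetic behind those figures fits in two small functions. The sketch below counts only the JACK-period contribution and ignores driver and converter latency. It assumes, as this section describes, that the slow path costs two periods and Phase F costs one. The function names are illustrative.

```c
/* Length of one JACK period in milliseconds. */
static double period_ms(unsigned frames, unsigned rate_hz)
{
    return 1000.0 * (double)frames / (double)rate_hz;
}

/* Output latency from JACK periods alone: the slow path adds one extra
 * period through the WASAPI ring; Phase F leaves in the same period. */
static double output_latency_ms(unsigned frames, unsigned rate_hz, int phase_f)
{
    int periods = phase_f ? 1 : 2;
    return periods * period_ms(frames, rate_hz);
}
```

At 64 frames and 48 kHz this gives 1.33 ms for Phase F against 2.67 ms for the slow path. At 256 frames it gives 5.33 ms against 10.67 ms.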

Phase F removes the period.

The Phase F architecture

Phase F adds a small registration interface between nspaASIO and winejack.drv. When nspaASIO::createBuffers runs and the conditions are met (float32, JACK rate, channel count fits), nspaASIO calls a winejack-private Unix-side function that registers the ASIO callback’s buffer pointers and a handshake state. From that point on, the JACK process callback knows about the ASIO stream and dispatches it in-band.
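The eligibility decision in createBuffers can be sketched as a pure predicate. This minimal sketch assumes that the three conditions listed above (float32, JACK rate, channel count fits) are the whole test. The struct and function names are hypothetical and are not the winejack-private interface.

```c
#include <stdbool.h>

typedef struct {            /* hypothetical: what the host asked createBuffers for */
    bool     float32;        /* sample type resolved to 32-bit float */
    unsigned rate_hz;
    unsigned in_channels;
    unsigned out_channels;
} asio_request;

typedef struct {            /* hypothetical: what the winejack client owns */
    unsigned rate_hz;        /* JACK server rate */
    unsigned capture_ports;
    unsigned playback_ports;
} jack_client_info;

/* true: register for in-band dispatch (Phase F); false: stay on the WASAPI slow path. */
static bool phase_f_eligible(const asio_request *r, const jack_client_info *j)
{
    if (!r->float32) return false;                        /* no conversion in the RT callback */
    if (r->rate_hz != j->rate_hz) return false;            /* JACK owns the rate */
    if (r->in_channels > j->capture_ports) return false;   /* channel counts must fit */
    if (r->out_channels > j->playback_ports) return false;
    return true;
}
```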

Inside one JACK period:

JACK process callback (thread T_jack):
    1. Copy JACK capture ports -> ASIO input buffers (memcpy per channel)
    2. CAS handshake state: IDLE -> CAPTURE_READY
    3. futex_wake the play_thread
    4. futex_wait for handshake state == OUTPUT_READY (timeout = 2 * period)
    5. Copy ASIO output buffers -> JACK port buffers (memcpy per channel)
    6. Flip buf_index, reset handshake state to IDLE

play_thread (thread T_play, Wine, SCHED_FIFO):
    1. Unix call asio_wait_callback (futex_wait for CAPTURE_READY)
    2. bufferSwitch(buf_index, ASIOTrue) -- host fills output
    3. Unix call asio_signal_complete (CAS -> OUTPUT_READY, futex_wake T_jack)
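The two step lists can be exercised together in one process with raw Linux futex calls. This is a minimal sketch under several assumptions:

- Both roles run as ordinary pthreads.
- The "bufferSwitch" is a stand-in loop that halves the input.
- The timeout is passed in by the caller.

All names are hypothetical. The sketch also does not guard the buffers that a late bufferSwitch may still be reading after a timeout; a real implementation would need to.

```c
#define _GNU_SOURCE
#include <limits.h>
#include <linux/futex.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

enum { IDLE, CAPTURE_READY, OUTPUT_READY, SHUTDOWN };
#define FRAMES 64
#define NSEC 1000000000L

static uint32_t state = IDLE;            /* the single shared handshake word */
static float asio_in[2][FRAMES], asio_out[2][FRAMES];
static int buf_index;                     /* ASIO half used this period */
static unsigned completed, dropped;       /* only touched by T_jack */

static void futex_wake_all(uint32_t *w)
{
    syscall(SYS_futex, w, FUTEX_WAKE, INT_MAX, NULL, NULL, 0);
}

/* Sleeps only if *w still equals `seen`; rel == NULL means no timeout. */
static void futex_wait_on(uint32_t *w, uint32_t seen, const struct timespec *rel)
{
    syscall(SYS_futex, w, FUTEX_WAIT, seen, rel, NULL, 0);
}

static int cas(uint32_t from, uint32_t to)
{
    return __atomic_compare_exchange_n(&state, &from, to, 0,
                                       __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE);
}

/* T_play: wait for CAPTURE_READY, run the host callback, publish OUTPUT_READY. */
static void *play_thread(void *arg)
{
    (void)arg;
    for (;;) {
        uint32_t cur = __atomic_load_n(&state, __ATOMIC_ACQUIRE);
        if (cur == SHUTDOWN) return NULL;
        if (cur != CAPTURE_READY) { futex_wait_on(&state, cur, NULL); continue; }
        int idx = __atomic_load_n(&buf_index, __ATOMIC_RELAXED);
        for (int i = 0; i < FRAMES; i++)  /* stand-in for bufferSwitch(idx, ASIOTrue) */
            asio_out[idx][i] = 0.5f * asio_in[idx][i];
        if (cas(CAPTURE_READY, OUTPUT_READY))
            futex_wake_all(&state);
    }
}

static int drop(float *playback)          /* silence this period and count it */
{
    memset(playback, 0, FRAMES * sizeof *playback);
    dropped++;
    return 0;
}

/* T_jack: one process callback, steps 1-6. Returns 1 if host output was used. */
static int jack_process(const float *capture, float *playback, long timeout_ns)
{
    int idx = __atomic_load_n(&buf_index, __ATOMIC_RELAXED);
    memcpy(asio_in[idx], capture, sizeof asio_in[idx]);         /* 1 */
    if (!cas(IDLE, CAPTURE_READY)) return drop(playback);       /* 2: protocol error */
    futex_wake_all(&state);                                      /* 3 */

    struct timespec dl, now;
    clock_gettime(CLOCK_MONOTONIC, &dl);
    dl.tv_sec += timeout_ns / NSEC;
    dl.tv_nsec += timeout_ns % NSEC;
    if (dl.tv_nsec >= NSEC) { dl.tv_sec++; dl.tv_nsec -= NSEC; }
    for (;;) {                                                   /* 4 */
        uint32_t cur = __atomic_load_n(&state, __ATOMIC_ACQUIRE);
        if (cur == OUTPUT_READY) break;
        clock_gettime(CLOCK_MONOTONIC, &now);
        struct timespec rel = { .tv_sec = dl.tv_sec - now.tv_sec,
                                .tv_nsec = dl.tv_nsec - now.tv_nsec };
        if (rel.tv_nsec < 0) { rel.tv_sec--; rel.tv_nsec += NSEC; }
        if (rel.tv_sec < 0) {                                    /* timed out */
            if (cas(CAPTURE_READY, IDLE)) return drop(playback);
            continue;                 /* T_play published OUTPUT_READY just in time */
        }
        futex_wait_on(&state, cur, &rel);
    }
    memcpy(playback, asio_out[idx], sizeof asio_out[idx]);      /* 5 */
    __atomic_store_n(&buf_index, idx ^ 1, __ATOMIC_RELAXED);    /* 6 */
    __atomic_store_n(&state, IDLE, __ATOMIC_RELEASE);
    completed++;
    return 1;
}
```

Both threads wait on the same word. The value check inside FUTEX_WAIT is therefore what prevents a lost wakeup: if the state changed between the load and the wait, the call returns immediately instead of sleeping.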

Step 4 of the JACK callback (the futex wait) overlaps with the play_thread running bufferSwitch. While it is parked on the futex, the JACK callback consumes no CPU. When the host returns from bufferSwitch and the play_thread CASes the state to OUTPUT_READY, the JACK callback wakes, copies the output, and returns. That output goes out the JACK port in the same period, so the application’s data lands at the audio interface in one period, not two.

The futex round trip is on the order of 1 to 2 microseconds on PREEMPT_RT. The full period budget at 48 kHz / 64 frames is 1333 microseconds, of which a typical bufferSwitch consumes 300-800 microseconds in a moderate plugin chain. The handshake overhead is in the noise.

The PE/Unix boundary

There is one structural complication. The JACK callback runs on a Unix thread (pthread-managed by libjack). The play_thread is a Wine PE thread, and the bufferSwitch callback is Win32 code that requires a valid Wine thread context (TEB, TLS, exception handling). The futex handshake has to bridge those two worlds.

A PE thread cannot call syscall(SYS_futex) directly; the syscall path goes through the Wine NT layer. To work around this, Phase F adds four new entries to the audio function table in mmdevapi/unixlib.h: register_asio, unregister_asio, asio_wait_callback, and asio_signal_complete.

Each is exported from mmdevapi.spec and wrapped in mmdevapi/main.c. The wrappers are thin; they just dispatch to the active driver’s Unix function table. On the Unix side (winejack.drv/jack.c), the four functions manipulate the futex word directly. The play_thread crosses the PE/Unix boundary twice per period – once to wait, once to signal – which is cheap given the wrapping is a normal Wine unix-call.

Other audio drivers (winealsa.drv, winepulse.drv, wineoss.drv) needed stub entries for the four new function-table slots. Those stubs return STATUS_NOT_IMPLEMENTED; ASIO over those drivers falls back to the slow path. Once those drivers are dropped from the build (see Section 12), the stubs become irrelevant.

Same-period diagram

Figure: one JACK period under Phase F (from t = period start to t + period)

T_jack (SCHED_FIFO, JACK RT):
    capture -> ASIO memcpy
    CAS + futex_wake (1): CAPTURE_READY
    futex_wait for OUTPUT_READY (T_jack parked, no CPU consumed)
    output -> JACK memcpy, flip, IDLE

T_play (SCHED_FIFO, Wine PE):
    futex_wait for CAPTURE_READY (parked)
    bufferSwitch(buf_idx, true) -- host fills output (Win32 context)
    CAS + futex_wake (2): OUTPUT_READY

Data flow: JACK capture -> ASIO input -> bufferSwitch -> ASIO output -> JACK port
Handshake: IDLE -> CAPTURE_READY -> OUTPUT_READY -> IDLE (one JACK period)
Audio goes out the JACK port at the end of the same period the host filled.

Why the play_thread is needed at all

A reasonable question is why Phase F doesn’t just call bufferSwitch directly from the JACK process callback, with no play_thread. The answer is the Win32 thread context. ASIO host code (the DAW’s audio engine, the plugin chain, the VSTs) expects a valid Wine thread when it runs – it allocates from the heap, takes critical sections, calls Win32 APIs. The JACK process thread is a Unix pthread created by libjack and has no Wine context. Constructing one on the fly from a JACK callback is risky – signal masks, TLS, exception scopes all have to be set up correctly, and any mistake takes down the host.

wineasio does take this approach (it uses jack_set_thread_creator to construct Wine threads from JACK’s thread spawner), but it predates the modern Wine PE/Unix split and operates in a different threading model. The Phase F design preserves the cleaner split: Unix code stays Unix, PE code stays PE, the futex bridges them. The play_thread is a small, persistent Wine thread whose only job is to wake on a futex, run bufferSwitch, and signal another futex. It is cheap, predictable, and stays out of the way of the JACK callback.

Priority configuration

The play_thread runs at the priority assigned through AvSetMmThreadCharacteristics, which on Wine-NSPA maps to a SCHED_FIFO priority below the JACK callback’s priority. The JACK callback runs at JACK’s process-callback priority, typically RT 80 or higher depending on JACK configuration. The intent is that the JACK callback can always preempt the play_thread, while the bufferSwitch work still runs at a high enough priority not to be displaced by ordinary application threads. The exact priority comes from the NSPA priority-mapping table; see the CS-PI document for details on how Wine-NSPA derives RT priorities from the audio thread’s MMCSS task name.
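The ordering constraint (below JACK, but still SCHED_FIFO) can be illustrated with a small helper. This is a hypothetical stand-in for the NSPA priority-mapping table, not its actual logic: it simply takes one step below the JACK callback priority and clamps the result to the valid SCHED_FIFO range. Applying the result needs CAP_SYS_NICE or an RT rlimit.

```c
#include <pthread.h>
#include <sched.h>

/* Hypothetical mapping: one step below the JACK callback, clamped to the
 * SCHED_FIFO range reported by the kernel. */
static int play_thread_prio(int jack_prio)
{
    int lo = sched_get_priority_min(SCHED_FIFO);
    int hi = sched_get_priority_max(SCHED_FIFO);
    int p = jack_prio - 1;
    if (p < lo) p = lo;
    if (p > hi) p = hi;
    return p;
}

/* Apply it to the calling thread; returns 0 or an errno value such as EPERM. */
static int make_play_thread_fifo(int jack_prio)
{
    struct sched_param sp = { .sched_priority = play_thread_prio(jack_prio) };
    return pthread_setschedparam(pthread_self(), SCHED_FIFO, &sp);
}
```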

The handshake state is a single 32-bit int shared between the play_thread and the JACK callback. The CAS sequence is IDLE -> CAPTURE_READY -> OUTPUT_READY -> IDLE. A malformed transition (the state word observed holding an unexpected value) is treated as a protocol error: the JACK callback drops to silence for that period and a counter ticks. In practice the transitions are deterministic, and the only error path that fires is the timeout, which triggers if bufferSwitch takes more than two periods to return. In that case the audio is clearly broken at the host level, and dropping the period is the correct response.

Figure: Phase F handshake state machine for one period. A single int handshake_state is shared between T_jack and T_play, and every transition is an __atomic_compare_exchange.

IDLE (period boundary; T_jack ready to fire, T_play in futex_wait)
    -> CAPTURE_READY: T_jack copies capture, futex_wake T_play
CAPTURE_READY (capture buffers populated; T_jack in futex_wait for OUTPUT_READY, T_play in bufferSwitch())
    -> OUTPUT_READY: T_play returns from bufferSwitch, futex_wake T_jack
OUTPUT_READY (host wrote output; T_play back in futex_wait)
    -> IDLE: T_jack copies output -> JACK ports, flips buf_idx
SILENCE / DROP (timeout = 2 * period when bufferSwitch exceeds two periods, or a malformed transition / protocol error)
    T_jack zeroes output, increments the drop counter

Total for one JACK period: IDLE -> CAPTURE_READY -> OUTPUT_READY -> IDLE

The state field is a single 32-bit integer; every transition is a __atomic_compare_exchange on it. The two threads coordinate without a shared lock or condition variable – the futex pair (one per direction) plus the CAS state is the entire IPC surface inside the audio period. Errors are noisy but recoverable: a timeout drops one period to silence, the counter ticks, and the next period restarts the cycle from IDLE. There is no recovery state machine because there is no useful recovery – if bufferSwitch ran long, the data it produced is no longer fresh by the time it returns.
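The timeout value and the drop path can be sketched single-threaded. This sketch assumes that the timeout is derived from the period size and sample rate. handshake, try_transition, and drop_period are hypothetical names. The sketch shows the CAS-only transitions, a rejected malformed transition, and the silence-count-restart sequence described above.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum { IDLE, CAPTURE_READY, OUTPUT_READY };

typedef struct {
    uint32_t state;    /* the single 32-bit handshake word */
    uint64_t dropped;  /* periods replaced by silence */
} handshake;

/* Handshake timeout: two JACK periods, in nanoseconds. */
static uint64_t handshake_timeout_ns(uint32_t frames, uint32_t rate_hz)
{
    return 2ull * frames * 1000000000ull / rate_hz;
}

/* Every transition is a CAS; a false return means the word held an unexpected value. */
static bool try_transition(handshake *h, uint32_t from, uint32_t to)
{
    return __atomic_compare_exchange_n(&h->state, &from, to, false,
                                       __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE);
}

/* Timeout or protocol error: emit silence, count the drop, restart from IDLE. */
static void drop_period(handshake *h, float *const *ports, unsigned nports, unsigned frames)
{
    for (unsigned p = 0; p < nports; p++)
        memset(ports[p], 0, frames * sizeof(float));
    h->dropped++;
    __atomic_store_n(&h->state, IDLE, __ATOMIC_RELEASE);
}
```

At 64 frames and 48 kHz, handshake_timeout_ns returns 2666666 ns, or about 2.67 ms.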

Fallback

If Phase F registration fails – non-float32 format, channel mismatch, bug in the registration path – nspaASIO falls back to the WASAPI slow path described in Section 8. The application still works, just with one extra period of latency. There is no version of the code where the application sees an error because of Phase F unavailability; Phase F is a strict performance enhancement on top of a working WASAPI fallback.

The driver description seen by the DAW is just “nspaASIO” regardless of which path is active. There is no “Phase F” string in the DAW-visible UI; the distinction is internal only.

What Phase D became

The earlier per-channel direct-buffer plan – Phase D in the rework roadmap – was the design where nspaASIO and winejack agreed on per-channel float32 buffer pointers and exchanged data without any interleave step. nspaASIO’s play_thread would copy ASIO channels into winejack’s per-channel buffers; winejack’s RT callback would copy from those into JACK port buffers; total cost two memcpys per channel per period, no format conversion.

Phase F is strictly better than Phase D for the ASIO case because it removes the period of latency that Phase D could not. Phase D existed in a partial form (the fast-path per-channel double buffers in Section 6 are descended from it), but the nspaASIO-side direct-buffer access was superseded by Phase F before it was completed. The fast path on the WASAPI exclusive side remains – it serves any non-ASIO exclusive WASAPI stream that meets the criteria.

10. Intentionally unimplemented surfaces

A few things that look like gaps but are deliberate non-features.

Sample-rate switching. ASIO’s setSampleRate() returns ASE_NoClock if the requested rate doesn’t match JACK. JACK owns the sample rate – changing it requires restarting the JACK server, and it isn’t a Wine client’s place to do that. A real Windows ASIO driver might switch the hardware sample-rate clock on demand, but under JACK the rate is fixed by the server by design. This is correct JACK behavior, not a bug.

WASAPI exclusive at non-JACK rate. Same reasoning. The driver could in principle add SRC to support exclusive streams at arbitrary rates, but that adds latency and defeats the purpose of exclusive mode. The driver returns AUDCLNT_E_UNSUPPORTED_FORMAT for rate mismatches in exclusive mode and lets the application either resample on its side or accept JACK’s rate.
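Both non-features reduce to the same comparison against the JACK server rate. This is a minimal sketch that assumes the JACK rate has already been queried. The helper names are hypothetical. ASE_OK and ASE_NoClock use the values defined in the ASIO SDK’s asio.h, and AUDCLNT_E_UNSUPPORTED_FORMAT is the HRESULT 0x88890008.

```c
#include <stdint.h>

typedef long ASIOError;
typedef double ASIOSampleRate;
enum { ASE_OK = 0, ASE_NoClock = -995 };   /* as in asio.h */

typedef int32_t HRESULT;
#define S_OK                         ((HRESULT)0)
#define AUDCLNT_E_UNSUPPORTED_FORMAT ((HRESULT)0x88890008)

/* setSampleRate: accept only the rate JACK is already running at. */
static ASIOError asio_set_sample_rate(ASIOSampleRate requested, uint32_t jack_rate)
{
    return requested == (ASIOSampleRate)jack_rate ? ASE_OK : ASE_NoClock;
}

/* Exclusive-mode format check: no SRC, so any rate mismatch is rejected. */
static HRESULT exclusive_rate_check(uint32_t requested, uint32_t jack_rate)
{
    return requested == jack_rate ? S_OK : AUDCLNT_E_UNSUPPORTED_FORMAT;
}
```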

Exclusive-mode lockout. On real Windows hardware, exclusive mode locks every other application out of the audio device. The Wine-NSPA stack does not enforce this. Two ASIO applications can coexist; an ASIO application and a WASAPI application can coexist; a WASAPI exclusive stream does not preclude a WASAPI shared stream. JACK handles the mixing at the graph level.

This is intentional and follows JACK’s graph model; wineasio behaves the same way. Workloads that require Windows-style exclusive-device lockout should not expect it from this stack.

Spatial audio (ISpatialAudioClient). No Wine driver implements this and there is no near-term reason to.

Auxiliary device (legacy CD-audio volume control). Irrelevant in 2026.

11. Deferred work

These are real gaps that have not shipped yet, listed in priority order.

Loopback capture. get_loopback_capture_device is stubbed. JACK can do loopback via port routing, but the Wine driver doesn’t expose it as a WASAPI loopback endpoint. OBS, Discord, Audacity loopback recording, and similar use cases need this. Tracked, but not yet on the audio-stack roadmap.

Device hotplug. Endpoint enumeration is built once at first query and not refreshed. Hot-plugging a USB audio interface requires a Wine restart to see the new endpoint. JACK exposes graph-change callbacks that could drive a refresh; the wiring is straightforward, and the deferral is purely a matter of bandwidth.

IMMNotificationClient device-change notifications. Applications that respond to audio-device hotplug (changing output to a USB headset on connect) don’t get notified because the notifications aren’t fired. Same root cause as the hotplug deferral.
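One way the refresh could be wired is with a generation counter. This is a hypothetical sketch, not a committed design. on_graph_change would be registered as one of JACK’s non-RT notification callbacks. The next endpoint query compares generations and rebuilds if they differ, which is also the natural point to fire IMMNotificationClient callbacks. All names here are illustrative.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define NEVER_BUILT (~0u)

typedef struct {
    atomic_uint generation;  /* bumped on every JACK graph/port change */
    unsigned built_for;      /* generation the cached endpoint list reflects */
    unsigned rebuilds;       /* how many times enumeration ran */
} endpoint_cache;

static void endpoint_cache_init(endpoint_cache *c)
{
    atomic_init(&c->generation, 0);
    c->built_for = NEVER_BUILT;
    c->rebuilds = 0;
}

/* Would run from a JACK notification callback (not the RT process thread). */
static void on_graph_change(void *arg)
{
    endpoint_cache *c = arg;
    atomic_fetch_add_explicit(&c->generation, 1, memory_order_release);
}

/* Called on each endpoint query; true means the list was rebuilt and
 * device-change listeners should be notified. */
static bool refresh_if_stale(endpoint_cache *c)
{
    unsigned g = atomic_load_explicit(&c->generation, memory_order_acquire);
    if (g == c->built_for) return false;
    /* re-enumerate JACK ports into WASAPI endpoints here */
    c->built_for = g;
    c->rebuilds++;
    return true;
}
```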

Capture fast path. Only the render path has the per-channel double-buffer fast path; capture goes through the general interleaved path. This is low priority because ASIO capture already uses Phase F directly, and capture performance in typical DAW workloads is far less critical than render performance.

ASIO control panel. controlPanel() is a no-op. Some DAWs offer “Open Driver Panel” as a convenience for setting buffer sizes and channel counts. A simple Wine dialog could expose JACK buffer-size and channel-count selection. Nice UX improvement, not on the critical path.

ASIO future selectors. future() rejects every selector. Selectors such as kAsioCanReportOverload, kAsioSupportsTimeInfo, and kAsioCanTimeCode should be answered correctly where the stack supports them.

ASIO outputReady. Returns ASE_NotPresent. With Phase F driving timing from the JACK callback, this could return ASE_OK and let some hosts optimize. Marginal.

Multiple ASIO device entries. Some DAWs expect one ASIO driver per physical audio device, but nspaASIO appears as a single driver. Matching that model would mean exposing JACK port groupings as virtual devices in the DAW’s device selector. Doable, not done.

Raw-mode reporting. AUDCLNT_STREAMFLAGS_RAWMODE is ignored. Ignoring it is correct behavior for JACK (there are no APOs to bypass), but the WASAPI ABI lets applications query whether raw mode is supported, and that query should answer “yes” rather than the flag being silently ignored.

12. Other audio drivers

winealsa.drv, winepulse.drv, and wineoss.drv are all still present in the source tree because Wine’s build system expects them and because they share function-table definitions with winejack.drv via mmdevapi/unixlib.h. The four new function-table entries Phase F added (register_asio, unregister_asio, asio_wait_callback, asio_signal_complete) have stub implementations in each of those drivers that return STATUS_NOT_IMPLEMENTED. The MIDI delegation that winealsa.drv does (alsa_midi_get_driver returning the winejack MIDI driver when NSPA_JACK_MIDI=1 is set) is still wired up.

The plan, once winejack.drv is fully validated for shared, exclusive, and ASIO paths and the deferred items are no longer blocking, is to drop these other drivers from the Wine-NSPA build entirely. The target setup runs PipeWire with its JACK interface, so everything already routes through JACK; the other drivers add no value and can interfere with routing. Disabling them in configure.ac (or removing them from the build set) is mechanical. At that point the stub function-table entries become dead code, and the Phase F additions to mmdevapi/unixlib.h can be split into a winejack-specific header rather than the shared one.

This is filed as future work and not yet executed. The drivers stay in the build for now, as a safety net during the period when winejack still has deferred items.

13. Validation

The audio stack has been exercised against a handful of real-world DAW workloads during development. A non-exhaustive list:

The MIDI side has been exercised against external hardware synths over USB-MIDI (Korg, Roland, Novation) for input-timing validation, and against soft-synth plugins inside Ableton for output-timing and SysEx handling.

The validation surface is informal. There is no PE-side audio test harness comparable to nspa_rt_test for the sync primitives, because the failure modes are perceptual (audible glitches, MIDI smear, latency feel) rather than assertable. Regressions are caught by listening, not by exit code. This is a known limitation; building a deterministic audio reproducer that exercises bufferSwitch reentrancy and JACK callback timing without false positives is non-trivial.

The kernel side – the PI mutex behavior, the futex round-trip latency under PREEMPT_RT, the SCHED_FIFO priority chain – has been validated indirectly through the larger Wine-NSPA RT validation suite (run-rt-suite, the ntsync test harnesses). When those tests pass clean, the audio path’s RT assumptions hold.

14. References

Source

Memory entries