Plan 034 · Architecture · SpaceMusic · Draft

Headless engine + separate UI process

Spout-bridged textures locally. Streaming for remote. Same UI binary against either. Both halves auto-detect each other; a QR in the engine's render window onboards a tablet in seconds.

01 · Why this plan exists

Plan 033 addressed the in-process Avalonia composition issues and chose Option B (CPU-bitmap cache) as the right model. Several debugging sessions later we've reached the structural ceiling that plan named but didn't solve: the UI and the renderer share one render thread.

Independently, SpaceMusic has a product requirement to be controllable from other machines — tablets, remote control surfaces, eventually cloud-hosted instances streamed back to a thin client. We cannot meet that requirement with the current architecture because every UI is an Avalonia layer composited inside the running engine process.

Both problems have the same answer: split the engine and the UI into separate processes with a well-defined IPC contract.

Locally they live on the same machine and share GPU textures via Spout (Windows shared D3D11 resources). Remotely they communicate over WebSocket for state and WebRTC for streamed GPU surfaces. The local case becomes a degenerate case of the remote one: same protocol, same UI code, just a different host URL.

02 · The fundamental shift

Today: one vvvv process owns Stride rendering AND the UI. Avalonia's compositor runs on Stride's render thread. The UI window paints into the same swapchain Stride uses. They cannot run independently because they share hardware-level state.

Target: two processes, each with its own window.

Crucially: scrolling the UI at full screen 4K cannot slow Stride down. And the same UI binary works whether it runs on the same machine as the engine or on a tablet across the network.

03 · Architecture

Two reads of the same picture. The first shows the shape: SpaceMusic is a central core surrounded by four plugin categories (the circle) with the UI sitting below (the rectangle) — the same outline as the SpaceMusic logo. The plan is, in essence, drawing the dashed line that separates the two halves into independent processes. The second diagram drills into the wiring once that line exists.

[Diagram 1, shape. ENGINE PROCESS: Core (state · channels · mapping · plugins) surrounded by four plugin categories: Input (IO · 1D · 2D · 3D), Geometry (stages · lights), Output (render · post FX), Transforms (placement · …). Boundary label: CHANNELS · MODEL. Below it: EXE UI (parameter rows · interactive previews).]

Two processes, two windows, one channel hub mirrored across them. Texture transport adapts to locality: zero-copy shared D3D11 (Spout) on the same machine, encoded video (WebRTC) over the wire when remote. The UI never knows the difference at the API level.

[Diagram 2, wiring.]

ENGINE PROCESS (vvvv, no UI host)
  • RENDER: Stride render thread. Scene + plugins + audio · uncapped · writes shared textures · QR overlay when no client
  • HUB: Channel hub (source of truth). SmChannelBridge · IChannelHub · auto-discovery · all public channels · plugin pin propagation
  • SERVER: LocalWebSocketServer. /hello · /qr.png · /ws/ · channel sync · MemoryPack envelope
  • TEX OUT: Texture publisher. Spout · NVENC+WebRTC · shared D3D11 / video track
  • WINDOW: Stride render window. Engine's own face · scene + QR overlay until client connects

UI PROCESS (standalone .NET exe)
  • RENDER: UI render thread (Skia). 30 Hz idle · 60 Hz during input · independent of engine framerate
  • MIRROR: Local model mirror. Same SMCodeGen types · same CsvPageView · save / load scene locally · syncs via channels
  • CLIENT: Transport client. Auto: loopback / LAN / relay · LocalWS · CentrifugoSdk
  • TEX IN: Texture consumer. Spout → SKImage · WebRTC → <video>
  • WINDOW: UI exe window. Parameter rows · interactive preview textures

Links between the processes: WS · CH (channels), SPOUT · TEX (textures), WEBRTC · REMOTE (remote).

Legend: Focal / render · Backend / service · Mirror / store · Window / surface · Channels (WS) · Textures (Spout) · Remote (WebRTC)

Local case: both processes on one machine. Spout uses zero-copy shared D3D11 textures. WebSocket uses loopback. End-to-end texture latency ~1 frame.

Remote case: engine on host machine or cloud, UI on tablet/laptop. WebRTC encodes texture frames with hardware H.264/AV1. WebSocket targets a public WSS endpoint (or Centrifugo relay). Total latency 30–80 ms.

04 · Both processes share the same model

Both processes reference the same SMCodeGen output (ParameterIds.g.cs, ParameterHierarchy.g.cs, ChannelNodes.g.cs, the ProjectModel / EnvironmentModel / etc. record graph). The engine's channel hub is the source of truth; the UI maintains a mirror of the same hub kept in sync over the transport. Same generated types → same binding logic — the UI's parameter-row rendering code in CsvPageView is unchanged from today; it just talks to a remote hub instead of the local one via the existing IChannelTransport abstraction.

Practical consequences that fall out for free:

  1. Save / load scene works on either side. The model serializer runs against the local mirror. The engine saves to its own filesystem (current behavior). The UI exe can save to its own local disk (handy for "tablet operator snapshots preset" or "remote user captures state without filesystem access to the engine machine"). The WASM remote UI serializes the model in browser memory and offers it as a download. Drag-drop a scene file onto the browser, the UI loads it and syncs to the engine via channel writes.
  2. Multi-client consistency is the channel hub's existing problem, not something we need to invent.
  3. Type changes are caught at compile time for both processes — adding a parameter to the CSV regenerates the same types on both sides.
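The hub-and-mirror relationship described in this section can be sketched in a few lines. This is an illustration in Python, not the project's C#: `ChannelHub`, `HubMirror`, and the last-write-wins rule are assumptions made for the sketch, and the direct method call stands in for the WebSocket transport.

```python
from typing import Any, Callable, Dict, List

Listener = Callable[[str, Any], None]


class ChannelHub:
    """Engine-side source of truth: stores values and fans out every change."""

    def __init__(self) -> None:
        self.values: Dict[str, Any] = {}
        self._listeners: List[Listener] = []

    def subscribe(self, listener: Listener) -> None:
        self._listeners.append(listener)
        for path, value in self.values.items():
            listener(path, value)          # initial snapshot for a late joiner

    def write(self, path: str, value: Any) -> None:
        self.values[path] = value          # assumption: last write wins
        for listener in self._listeners:
            listener(path, value)


class HubMirror:
    """UI-side mirror: reads are local, writes are forwarded to the hub."""

    def __init__(self, hub: ChannelHub) -> None:
        self.values: Dict[str, Any] = {}
        self._hub = hub                    # stands in for the WS transport
        hub.subscribe(self._on_remote)

    def _on_remote(self, path: str, value: Any) -> None:
        self.values[path] = value

    def write(self, path: str, value: Any) -> None:
        self._hub.write(path, value)       # the echo updates every mirror


hub = ChannelHub()
hub.write("a", 1)
tablet, desktop = HubMirror(hub), HubMirror(hub)
tablet.write("b", 2)                       # desktop.values now also holds "b"
```

Because every write round-trips through the hub, a mirror never holds a value the engine has not accepted, which is what lets save / load run against either side.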

Serializer choice: MemoryPack (Cysharp). Source-generated, zero-allocation, roughly 4–10× faster than MessagePack-CSharp on .NET-to-.NET payloads. Works in Avalonia.Browser (AOT-compatible, requires <EmitCompilerGeneratedFiles>true</EmitCompilerGeneratedFiles> in the WASM project). Use end-to-end for channel-value serialization across the WS transport; the existing VL.Serialization.MessagePack stays for any cross-platform wire-format compatibility we may have already shipped (e.g. Centrifugo legacy clients), but new work standardises on MemoryPack.

05 · Existing infrastructure we lean on

Most of the pieces already exist or are mostly built. This plan is more about wiring them together than inventing new tech.

Piece | Where it lives today | Role in the new architecture
Channel hub auto-discovery | SpaceMusic.Centrifugo/SmChannelBridge.cs | Engine-side channel publisher. Iterates _hub.Channels, forwards all public channels. Production-grade.
Local WebSocket transport | SpaceMusic.LocalServer/LocalWebSocketServer.cs | Minimal HttpListener + WebSocket foundation. Will host /hello discovery + /qr.png + /ws/ channel endpoint on the same listener.
Centrifugo transport | SpaceMusic.Centrifugo/ | Production transport for remote UIs that traverse NAT via a relay.
Channel transport abstraction | IChannelTransport.cs | Lets us swap WS / Centrifugo without changing call sites. UI exe selects implementation at startup based on discovery result.
CSV → parameter spec generator | SMCodeGen/Program.cs | Already emits ParameterIds.g.cs / ChannelNodes.g.cs. Both processes reference the same generated types.
Universal Plugin Data | SpaceMusic.Out.Basic.vl + plugin slots | Already aggregates ~10–15 Stride.Graphics.Texture preview slots. The natural feed for Spout publishing and (Phase 4) WebRTC track production.
Stride shared textures | TextureOptions.Shared + standard D3D11 shared-handle path | Mark preview textures as Shared at creation and they're Spout-ready.
Skia ↔ shared D3D11 | VL.Skia/src/FromSharedHandle.cs | The Skia-side pattern for consuming a shared D3D11 handle as an SKImage. We mirror this in the UI process.
Existing Avalonia/Skia UI | SmAvaloniaLayerV2, AppShell, TexturePresenter | Becomes the body of the standalone UI exe instead of an embedded layer. Lifts out of vvvv with relatively little change.
TexturePresenter mouse back-channels | TexturePresenter.cs:33-141 | MouseX/Y/Buttons/Wheel/IsOver sub-channels with normalized [0..1] coords. Wire contract for interactive previews already exists; zero engine plugin changes needed.
Existing Avalonia WASM client | SpaceMusic.Pro.Browser, pro.spacemusic.tv | net8.0-browser, Avalonia.Browser 11.3.4, SkiaSharp WASM. Hosts the full AppShell. The QR scan target: already deployed, already working against Centrifugo.
Existing WASM channel transport | CentrifugoSdkTransport.cs | IChannelTransport over Centrifuge SDK, confirmed working in Avalonia.Browser.
QR code library | vvvv-binaries/nugets/ZXing.Net.0.16.8 | Already vendored, no new dependency. QRCodeWriter.encode(url, 256, 256) → byte matrix → Stride.Graphics.Texture sprite.
Plan 033 architecture lessons | docs/Plans/033 | All the "what works / what doesn't" research carries over verbatim.

What we don't have yet

06 · Process lifecycle

Three launch modes from one engine binary, no rebuild required:

  1. Double-click launcher mode (today's UX, multi-process under the hood) — engine starts, auto-spawns SpaceMusic.UI.Standalone.exe as a child process with --engine ws://localhost:<port>/ws/?token=<...> so it skips discovery entirely. Engine tracks the child's PID. Engine ignores clean exit (user closed the UI window → engine keeps running, falls back to QR display). Engine restarts the child on crash up to N times with backoff. Engine close → child receives a clean disconnect → child shows RemoteConnectDialog.
  2. Tablet/remote mode — engine boots, no UI child spawned. QR overlay visible in the Stride render window. User scans QR → browser opens hosted WASM UI at pro.spacemusic.tv?engine=<wss>&token=<...>&fp=<cert> → connects. QR disappears once any client is connected, reappears when all clients disconnect.
  3. Advanced split mode — power user starts engine on one machine and UI exe on another with --engine wss://<host>:<port>/.... UI's EngineDiscovery honours an explicit --engine arg above all probes.
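The child-supervision rules in launch mode 1 reduce to a small policy function. The sketch below is illustrative Python, not engine code; `MAX_RESTARTS`, `BASE_DELAY_S`, and the doubling backoff are assumptions, since the plan only says "up to N times with backoff".

```python
from typing import Optional

MAX_RESTARTS = 3      # assumption: the plan leaves N open
BASE_DELAY_S = 0.5    # assumption: first retry delay, doubled per attempt


def restart_delay(exit_code: int, restarts_so_far: int) -> Optional[float]:
    """Return seconds to wait before respawning the UI child, or None to stay down.

    exit_code == 0 is treated as "user closed the window": the engine keeps
    running and falls back to the QR overlay instead of respawning.
    """
    if exit_code == 0:
        return None                                  # clean exit: do not restart
    if restarts_so_far >= MAX_RESTARTS:
        return None                                  # crash budget exhausted
    return BASE_DELAY_S * (2 ** restarts_so_far)     # exponential backoff
```

Keeping the decision in a pure function makes the "ignore clean exit, retry on crash" rule easy to unit-test separately from process spawning.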

Single-instance enforcement: each engine writes %LOCALAPPDATA%/SpaceMusic/engine.<pid>.port on startup containing { port, pid, startedUtc } and removes it on shutdown. UIs scan for any existing port files first; if more than one is recent, the UI presents a picker. This also supports multiple engines on one box (different Spout sender prefixes, different ports, separate QR codes).
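A minimal sketch of the port-file handshake, in Python for illustration. The JSON field names follow the plan; storing `startedUtc` as epoch seconds, the `max_age_s` cutoff for "recent", and both function names are assumptions. In production the directory would be the SpaceMusic folder under %LOCALAPPDATA%.

```python
import json
import time
from pathlib import Path


def write_port_file(directory: Path, port: int, pid: int) -> Path:
    """Write engine.<pid>.port containing { port, pid, startedUtc }."""
    path = directory / f"engine.{pid}.port"
    payload = {"port": port, "pid": pid, "startedUtc": time.time()}
    path.write_text(json.dumps(payload), encoding="utf-8")
    return path


def scan_port_files(directory: Path, max_age_s: float) -> list[dict]:
    """Return descriptors from port files younger than max_age_s, newest first."""
    now = time.time()
    found = []
    for path in directory.glob("engine.*.port"):
        try:
            info = json.loads(path.read_text(encoding="utf-8"))
            age_s = now - float(info["startedUtc"])
        except (OSError, ValueError, KeyError, TypeError):
            continue  # half-written or malformed file: skip rather than fail
        if age_s <= max_age_s:
            found.append(info)
    return sorted(found, key=lambda d: d["startedUtc"], reverse=True)
```

A UI would connect directly when the scan returns one entry and show the picker when it returns several. An engine that crashed without removing its file still shows up here, so a real implementation would likely also check that the recorded pid is alive.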

Token model (v1): engine generates a per-launch GUID, scoped to the LAN session. Same token for all clients. QR encodes it. Acceptable risk because token rotates per launch and scope is single LAN session. Per-client token issuance and revocation lands in a later iteration.
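The v1 token model is small enough to sketch in full. This is illustrative Python with hypothetical function names; the constant-time comparison is an addition not mentioned in the plan, included because it costs nothing.

```python
import hmac
import uuid


def new_launch_token() -> str:
    """One random GUID per engine launch; every client shares it in v1."""
    return str(uuid.uuid4())


def token_is_valid(presented: str, expected: str) -> bool:
    """Compare in constant time so response timing does not leak the token."""
    return hmac.compare_digest(presented.encode("utf-8"), expected.encode("utf-8"))
```

The engine would call `new_launch_token()` once at startup, embed the result in the QR URL, and run `token_is_valid` on every incoming WebSocket upgrade.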

07 · Recommended approach

Eight phases, each independently demoable. The hardest pieces (Spout interop, WebRTC) sit deepest in the sequence — earlier phases each unlock visible product progress with minimal new technology risk.

Phase 1

Engine launchable without a UI host

Add a SpaceMusic startup mode that runs the engine without instantiating the Avalonia UI layer: no SmAvaloniaLayerV2, no SkiaRenderer window for hosting Avalonia. Stride keeps its render window — this is the engine's own face now, showing the scene plus (later) the QR overlay. The existing SmChannelBridge + LocalWebSocketServer runs from startup so any client can connect. A new SmEngineInfoOverlay component is wired but inert in this phase (just draws engine name + port) — full QR rendering arrives in Phase 2c.

Concretely: a new Start - Headless.bat script whose vvvv launch args load SpaceMusic.vl with the UI subtree disabled. No code changes required, just a patch-level toggle.

Effort: ½ day · Risk: trivial · Reversible: fully

Phase 2a

UI process scaffold + WebSocket channel transport

Create a new SpaceMusic.UI.Standalone .NET project. Avalonia-based, single window using the existing AppShell and CsvPageView directly (both already in SpaceMusic.UI.Core / SpaceMusic.UI.Pro with no Stride dependency).

The single load-bearing new component is the UI-side IChannelProvider implementation backed by LocalWebSocketTransport: it lets CsvPageView and TexturePresenter use channels exactly as today, but the underlying transport hops to the engine over a socket instead of touching an in-process hub. Adding this to Phase 2a unblocks both parameter writes AND the mouse round-trip (Phase 3.5) — same plumbing, different channel paths.

Save/load scene against the local mirror; UI caps its own framerate at 30 Hz idle, 60 Hz during pointer interaction.
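The adaptive frame cap can be expressed as a single function of "time since last pointer input". A minimal Python sketch; `ACTIVE_WINDOW_S` (how long the UI stays at 60 Hz after input stops) is an assumption, as the plan does not specify it.

```python
IDLE_HZ = 30
ACTIVE_HZ = 60
ACTIVE_WINDOW_S = 0.25   # assumption: grace period after the last pointer event


def frame_interval_s(now_s: float, last_input_s: float) -> float:
    """Seconds until the next UI frame, given when pointer input last arrived."""
    recently_active = (now_s - last_input_s) <= ACTIVE_WINDOW_S
    return 1.0 / (ACTIVE_HZ if recently_active else IDLE_HZ)
```

The same "recently active" predicate could drive the mouse flusher's rate in Phase 3.5, so both timers switch together.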

Effort: 3–5 days · Risk: low · Unlocks: UI cannot slow Stride down

Phase 2b

Auto-detect & bootstrap (/hello, port-file, tokens)

Replace Phase 2a's hardcoded connection URL with auto-discovery. Extend LocalWebSocketServer with GET /hello returning an EngineDescriptor JSON. EnginePortFile writes %LOCALAPPDATA%/SpaceMusic/engine.<pid>.port. EngineTokenService issues a per-launch GUID. EngineDiscovery runs three probes in parallel — port-file scan, loopback port scan, Spout SenderNamesMMF enumeration. First probe to answer wins; transport selection follows from the descriptor.
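"First probe to answer wins" is a race between concurrent tasks. The sketch below shows one way to structure it with Python's asyncio; the probe callables, the descriptor shape, and the 2-second timeout are placeholders, and the real EngineDiscovery would be C#.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional

Probe = Callable[[], Awaitable[Optional[dict]]]


async def discover(probes: List[Probe], timeout_s: float = 2.0) -> Optional[dict]:
    """Run all probes concurrently; return the first non-None descriptor."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout_s
    pending = {asyncio.ensure_future(probe()) for probe in probes}
    try:
        while pending:
            remaining = deadline - loop.time()
            if remaining <= 0:
                return None                      # nothing answered in time
            done, pending = await asyncio.wait(
                pending, timeout=remaining, return_when=asyncio.FIRST_COMPLETED)
            for task in done:
                # A failed or empty probe must not sink the others.
                if task.exception() is None and task.result() is not None:
                    return task.result()
        return None
    finally:
        for task in pending:
            task.cancel()                        # stop the losers
```

A caller would pass the three probes (port-file scan, loopback port scan, Spout sender enumeration) and pick the transport from whatever descriptor comes back.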

Effort: ~1 day · New deps: zero (loopback only) · Optional: Makaretu.Dns for mDNS

Phase 2b

QR bootstrap + engine-as-launcher (child UI process)

Build SmEngineInfoOverlay: renders a QR code sprite into the Stride render window using ZXing.Net (already vendored, no new dep). The overlay auto-hides when any client connects.

URL format encoded in the QR (frozen now so it doesn't change in Phase 4):

https://pro.spacemusic.tv/?engine=wss://<lan-ip>:<port>/ws/&token=<guid>&fp=<sha256(serverCert)[:8]>
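A sketch of building and parsing that URL, in Python for illustration. It assumes "[:8]" means the first 8 hex characters of the SHA-256 digest (not 8 bytes) and that the certificate is hashed in its raw encoded form; both are assumptions, as are the function names.

```python
import hashlib
from urllib.parse import parse_qs, urlsplit


def cert_fingerprint(cert_bytes: bytes) -> str:
    """First 8 hex characters of SHA-256 over the certificate bytes (assumed)."""
    return hashlib.sha256(cert_bytes).hexdigest()[:8]


def build_qr_url(lan_ip: str, port: int, token: str, cert_bytes: bytes) -> str:
    """Assemble the onboarding URL in the format shown above."""
    engine = f"wss://{lan_ip}:{port}/ws/"
    return (f"https://pro.spacemusic.tv/?engine={engine}"
            f"&token={token}&fp={cert_fingerprint(cert_bytes)}")


def parse_qr_url(url: str) -> dict:
    """Recover engine, token and fp on the client side."""
    query = parse_qs(urlsplit(url).query)
    return {key: values[0] for key, values in query.items()}
```

The engine value is left unescaped, as in the format above; ":" and "/" are legal inside a query string, so standard parsers recover it intact.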

Add the child-process launcher mode: when the engine is launched as a standalone .exe (double-click flow), it spawns SpaceMusic.UI.Standalone.exe --engine ws://localhost:<port>/ws/?token=<...> as a child process. --no-ui flag disables auto-spawn for true server-mode deployments.

Effort: ~1.5 days · Wins: tablet-via-QR control works without WebRTC yet

Phase 3

Spout texture publisher + consumer (local previews)

Modify the engine's Universal Plugin Data aggregation so the texture slots are created with TextureOptions.Shared. Add a new SmSpoutPublisher component (Stride-side node) that registers each shared texture with the Spout SDK under a stable name. Active sender names surface in /hello's transports.spoutSenders[].

On the UI side, add a SmSpoutConsumer that opens the named senders, gets a D3D11 shared handle, and exposes them as SKImage via the FromSharedHandle pattern. TexturePresenter now draws the Spout-sourced SKImage as part of its own visual tree — the texture is part of the cache, not an overlay drawn on top. The whole SmAvaloniaLayerV2 / SmSkiaRenderTarget gating problem evaporates.

Effort: 1–2 weeks · Risk: D3D11 shared-handle edges · Wins: local previews end-to-end

Phase 3.5

Interactive previews (mouse round-trip)

Essentially free once Phase 2a and Phase 3 are in place. Refactor TexturePresenter → RemoteTexturePresenter: subscribe to Avalonia pointer events and fill the PendingMouseX/Y/Buttons/Wheel/IsOver fields exactly as today. A MouseChannelFlusher DispatcherTimer coalesces moves (60 Hz while the pointer is over the control, 30 Hz idle); discrete events bypass the timer and publish immediately.
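The coalescing rule (moves are batched, discrete events go out at once) can be sketched as a latest-value-wins buffer. This is illustrative Python; the class name, the choice of which channels count as continuous, and flushing the pending position before a discrete event are assumptions.

```python
from typing import Callable, Dict

Publish = Callable[[str, float], None]


class MouseChannelCoalescer:
    """Latest-value-wins buffer for moves; discrete events bypass the timer."""

    CONTINUOUS = {"MouseX", "MouseY"}   # assumption: only moves are coalesced

    def __init__(self, publish: Publish) -> None:
        self._publish = publish
        self._pending: Dict[str, float] = {}

    def on_event(self, channel: str, value: float) -> None:
        if channel in self.CONTINUOUS:
            self._pending[channel] = value      # overwrite: only the newest matters
        else:
            self.flush()                        # assumption: send position before the click
            self._publish(channel, value)       # buttons / wheel / IsOver go out at once

    def flush(self) -> None:
        """Called on each timer tick; publishes at most one value per channel."""
        for channel, value in self._pending.items():
            self._publish(channel, value)
        self._pending.clear()
```

Flushing before a discrete event keeps a click from arriving at the engine with a stale pointer position, at the cost of one extra small frame.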

FitMode coordinate fix: normalize coords relative to the fitted texture rect, not the control bounds. Today's code feeds plugins coords for the whole control including letterbox bars; the new code feeds true texture-space UVs.
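The coordinate fix is a short piece of arithmetic. The Python sketch below assumes an aspect-preserving "fit inside" mode with centered letterboxing, and returns None over the bars; the plan does not say whether such positions should be dropped or clamped.

```python
from typing import Optional, Tuple


def fitted_rect(ctrl_w: float, ctrl_h: float,
                tex_w: float, tex_h: float) -> Tuple[float, float, float, float]:
    """Rect (x, y, w, h) of a texture letterboxed into a control, aspect preserved."""
    scale = min(ctrl_w / tex_w, ctrl_h / tex_h)
    w, h = tex_w * scale, tex_h * scale
    return ((ctrl_w - w) / 2, (ctrl_h - h) / 2, w, h)


def to_texture_uv(px: float, py: float, ctrl_w: float, ctrl_h: float,
                  tex_w: float, tex_h: float) -> Optional[Tuple[float, float]]:
    """Map a control-space pointer position to [0..1] texture UVs."""
    x, y, w, h = fitted_rect(ctrl_w, ctrl_h, tex_w, tex_h)
    u, v = (px - x) / w, (py - y) / h
    if not (0.0 <= u <= 1.0 and 0.0 <= v <= 1.0):
        return None                     # pointer is over a letterbox bar
    return (u, v)
```

For a 1600×900 texture in a 400×400 control the fitted rect is (0, 87.5, 400, 225). A pointer at (200, 50) normalizes to v = 0.125 against the control bounds, but it lies above the texture, so the texture-space mapping reports no hit.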

Measured latency budget: loopback ClientWebSocket round-trips small frames in <1 ms. Slider drag loop is dominated by the Stride frame boundary (4–16 ms). WebSocket is fast enough; named-pipes are not required.

Effort: 1–2 days · Settles: Open Decision #2

Phase 4

WebRTC + WASM browser client

The WASM client target already exists. SpaceMusic.Pro.Browser ships to pro.spacemusic.tv via .github/workflows/deploy-wasm.yml. Phase 4 work reduces to three pieces:

  1. Engine-side SmWebRtcPublisher: take Universal Plugin Data textures, encode via NVENC (FFmpeg.AutoGen — already in our dependencies), push as WebRTC tracks via SIPSorcery. Advertise track IDs on the channel hub under sm:<path>.WebRtcTrackId.
  2. Client-side WebRTC bridge (in SpaceMusic.Pro.Browser): wwwroot/webrtc-bridge.js owns one RTCPeerConnection per engine, exposes attachTrack(streamId) returning a <video> element. WebRtcPreviewBridge.cs uses [JSImport]/[JSExport] to bridge .NET ↔ JS. TexturePresenter branches on browser platform: if WebRtcTrackId present, absolute-position the <video> overlay with pointer-events: none so Avalonia still receives pointer events for the interactive-preview round trip.
  3. LocalWebSocketTransport for the WASM client: targets the engine's LocalWebSocketServer directly when the QR URL has ?engine=wss://.... Token passed as query parameter (avoids browser-WASM Authorization header quirks).

Bundle size budget: the current pro.spacemusic.tv build is ~14 MB on a cold load; warm loads take <1 s via the PWA cache.

Effort: 3–4 weeks · Risk: DOM-overlay sharp edges · Unlocks: QR-scan-and-go from any tablet

Phase 5

Decommission embedded UI

Once Phases 1–4 are stable in real use: remove SmAvaloniaLayerV2 and all the surrounding compositor gating from SpaceMusic.UI.Stride. Remove the SkiaRenderer window. Delete SmSkiaRenderTarget, SmSkiaGpuRenderSession, SmTopLevelImpl's in-process rendering plumbing. Update launch scripts so default = headless engine + UI process pair.

The four flash/perf fixes from 2026-06-01 (InCacheRender gate, _confirmedDestRects/Clips, HasAnyPresenterMovedSinceCache, no-placeholder-when-texture-set) all become obsolete and are deleted.

Effort: ~2 days · Wins: large code reduction

08 · Open decisions

  1. UI exe — Avalonia AppShell vs raw SkiaWindow + custom drawing of CsvPageView?
    • Full Avalonia (recommended for v1): ~14 MB exe + ~0.5–1 s cold startup; maximum reuse of CsvPageView, AppShell, UiFactory, focus/scroll/popup/text-input behaviour, theming, IME. 3–5 days to lift.
    • Raw SkiaWindow (lighter): ~3–5 MB exe; near-instant startup; but reimplements layout, pointer input + focus management, scrolling, popups/dropdowns, text entry. Weeks of work.
    Recommendation: Avalonia for v1. Revisit if measured startup time exceeds 2 s on target hardware.
  2. Local IPC: WebSocket only, or named pipes for very-low-latency channel sync? Settled: WebSocket only. Phase 3.5 research measured loopback at <1 ms; the Stride frame boundary dominates.
  3. Spout naming scheme — stable IDs across engine restarts, or regenerated each run? Recommend stable with an engine-instance prefix (e.g. SpaceMusic-instance1.PluginIO).
  4. Single UI window or N windows? The transport supports it for free — one .exe instance = one engine connection; multi-monitor = launch multiple .exe instances. Document as the first-class pattern in Phase 2a.
  5. UI framerate strategy — adaptive (30 Hz idle, 60 Hz during input). Mouse-flusher already needs the distinction.
  6. Engine-as-launcher default — when user double-clicks the engine, auto-spawn the UI child by default (matches today's UX), or boot headless and show only the QR? Recommend auto-spawn with --no-ui flag.
  7. Model serializer: MemoryPack end-to-end (recommended). MessagePack stays for any Centrifugo legacy wire compatibility.
  8. Engine token model — per-launch GUID (v1, simple) vs per-client tokens (production-grade, revocable). Default per-launch for v1.
  9. WebRTC NAT traversal — Cloudflare's free TURN tier vs coturn alongside Centrifugo on existing Hetzner host. Defer until Phase 4 starts.
  10. Decommission timing — Recommend immediate Phase 5 deletion after Phase 4 stabilises to avoid divergent code paths.

09 · Risks

  1. Spout shared-handle interop — D3D11 shared handles across processes work reliably but the API edges (DXGI keyed mutex synchronisation, format compatibility) have sharp corners. Phase 3's main risk; budget trial runs on NVIDIA, AMD, Intel.
  2. Avalonia standalone exe vs embedded — Avalonia is designed to be a standalone app; this direction is easier than today's embedded mode. Risk is the opposite direction: rebuilding state-restoration, docking, multi-window from scratch. Mitigation: AppShell already encapsulates most of this.
  3. Latency feel — measured, acceptable. WebSocket loopback <1 ms; Stride frame boundary 4–16 ms dominates. Slider drag is interactive without any binary-protocol optimisation.
  4. WASM client video integration — DOM-overlay <video> for WebRTC tracks works but has sharp edges: positioning under Avalonia transforms (forbid non-translate transforms on TexturePresenter ancestors — assert in DEBUG); z-order with Avalonia popups (hide overlays while popups are open); browser autoplay policies (first frame paint must happen after user gesture).
  5. QR token leakage — anyone photographing the screen gets the token. Acceptable for v1 (per-launch rotation, single-LAN-session scope). Per-client token issuance in a later iteration.
  6. FitMode coordinate semantics change: today's MouseX/Y is normalized to the full control bounds, letterbox bars included; Phase 3.5 switches to texture-space UVs. Audit existing plugin patches that read .MouseX before flipping.
  7. WAN reachability: beyond the LAN, WebRTC needs TURN and WebSocket needs a public WSS endpoint. To be resolved during Phase 4 planning (Cloudflare TURN free tier is the easiest start).
  8. Plugin model coexistence — the buzzing-bubbling-bubble plan envisions UI plugins as headless plugins inside the engine. With this architecture, every UI is its own process, not a plugin. The plugin concept becomes "transport route" instead. Worth aligning vocabulary.
  9. Existing CLAUDE.md / docs assume single-process — many memory notes reference "the UI" as singular and in-process. After Phase 5 those need updating. Track it.

10 · Success criteria

11 · Critical files

Engine-side wiring

UI-side reusable code

Existing WASM client (the QR target)

Code-gen

Reference patterns