---
title: "One source, three GPUs, and a browser: putting a native UI on WebGPU"
date: "2026-06-24"
description: "The same Vel widget tree renders on Metal, D3D12, and the browser because Dawn is the portability layer — and the WASM build needs zero COOP/COEP headers."
slug: "one-source-three-gpus-and-a-browser"
tldr: "Vel compiles one C++ engine to native (Metal/D3D12) and to WebGPU in the browser via Emscripten's emdawnwebgpu. Because the build is single-threaded with ASYNCIFY, it needs no SharedArrayBuffer and therefore no cross-origin-isolation headers — so it hosts on any static CDN and embeds in a plain iframe."
tags: ["webassembly", "webgpu", "cpp", "emscripten"]
cover: "/images/blog/one-source-three-gpus-and-a-browser.svg"
coverAlt: "One engine compiling to four targets — Metal, D3D12, Vulkan, and browser WebGPU — through a single Dawn API"
author: "Sai Chandan Kadarla"
devto_id: 3983576
---

The Vel playground at [vel.kadarla.com/play](https://vel.kadarla.com/play) is the same engine that draws the native macOS app, compiled to WebAssembly and pointed at the browser's GPU. Not a re-implementation, not a canvas2d fallback, not a screenshot service — the literal C++ widget tree, running in your tab, rendering through WebGPU.

The surprising part isn't that it works. It's how little code the browser target needed, and one deployment property that makes it genuinely cheap to host.

## Dawn is the portability layer, so the browser is just another backend

I wrote about [the platform seam](/blog/windows-one-surface-seam) being two functions. The web is the cleanest demonstration of why that design pays off. Native platforms hand the GPU a window handle; the browser binds the GPU to an HTML canvas by CSS selector. So `SurfaceWeb.cpp` barely does anything:

```cpp
// Web (Emscripten) surface glue. There is no window handle — the surface is
// bound to the "#canvas" element in Surface.cpp. So we only return a non-null
// sentinel so the validity check passes; resize is driven by the JS host.
struct GLFWwindow;  // opaque GLFW handle, forward-declared so the excerpt stands alone

void* attachNativeSurface(GLFWwindow* window) {
    if (!window) return nullptr;
    return reinterpret_cast<void*>(0x1);  // sentinel: "canvas-backed"
}
void resizeNativeSurface(void*, int, int) {}
```

The reason this is enough: I build Vel on **Dawn**, Google's WebGPU implementation. Natively, Dawn translates my `wgpu::` calls to Metal, D3D12, or Vulkan. On the web, Emscripten ships **emdawnwebgpu**, a port of the exact same `webgpu.h` API that forwards to the browser's *real* WebGPU device. So the engine code doesn't change. The WGSL shaders don't change. The instanced-rect pipeline that draws every shape doesn't change. They all compile to WASM and talk to a GPU that happens to live behind the browser instead of behind the kernel.

There's no `#ifdef __EMSCRIPTEN__` in the paint code. The web is a backend, not a rewrite — the same way Windows was.
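To make the seam concrete, here is a minimal sketch of how shared code might consume those two functions. The two seam functions are copied from the excerpt above; `SurfaceState` and `initSurface` are hypothetical names used only for illustration, not Vel's actual types.

```cpp
struct GLFWwindow;  // opaque handle type, as GLFW itself declares it

// Web flavor of the two seam functions (same bodies as the excerpt above).
void* attachNativeSurface(GLFWwindow* window) {
    if (!window) return nullptr;
    return reinterpret_cast<void*>(0x1);  // sentinel: "canvas-backed"
}
void resizeNativeSurface(void*, int, int) {}

// Hypothetical shared-side state; the real engine's layout is not shown
// in this post.
struct SurfaceState {
    void* native = nullptr;
    int width = 0;
    int height = 0;
};

// Hypothetical shared-side setup. It never inspects the handle, it only
// checks that it is non-null, which is why a sentinel is enough on the web.
bool initSurface(SurfaceState& state, GLFWwindow* window, int width, int height) {
    state.native = attachNativeSurface(window);
    if (!state.native) return false;
    state.width = width;
    state.height = height;
    resizeNativeSurface(state.native, width, height);  // no-op on the web
    return true;
}
```

Because the caller treats the handle as opaque, swapping a real window handle for `0x1` changes nothing above the seam.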

## The blocking loop problem, and the header you don't need

A native app loop is allowed to block. Vel's idle path literally parks the thread in `glfwWaitEventsTimeout` and burns ~0 CPU until an event arrives. You cannot do that on the web: blocking the main thread freezes the tab.

The usual answer is threads: run your loop on a Web Worker and use `SharedArrayBuffer` to talk to the main thread. But `SharedArrayBuffer` is the expensive choice, and not for the reason people expect. Since Spectre, browsers only expose it when the page is **cross-origin isolated**, which means the page itself must be served with these two headers:

```
Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp
```

Those headers are quietly hostile. `require-corp` means every cross-origin resource the page loads (fonts, images, analytics scripts) must opt in through CORS or a `Cross-Origin-Resource-Policy` header, and every embedded iframe must send COEP itself, or the browser blocks it. It breaks third-party embeds. It means you can't just drop the build on a static CDN and link it. And it makes the playground hard to embed in *another* page (like the docs), because the embedding page would have to be cross-origin isolated too.

So I went the other way: **single-threaded, with ASYNCIFY.** ASYNCIFY is an Emscripten transform that rewrites the WASM so a "blocking" call can actually unwind the stack, yield to the browser's event loop, and resume later. The engine keeps its natural blocking-loop shape in C++; ASYNCIFY makes that cooperate with the event loop instead of freezing it. No worker, no `SharedArrayBuffer`, and therefore **no COOP/COEP headers at all.**
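A minimal sketch of that loop shape, assuming the wait is funnelled through a single helper. `waitForEvents` and `runLoop` are hypothetical names, not Vel's actual functions. `emscripten_sleep` is the real Emscripten call that unwinds and later resumes the stack when the module is linked with `-sASYNCIFY`. The `#ifdef` here sits in the platform layer, not in paint code.

```cpp
#include <chrono>
#include <thread>

#ifdef __EMSCRIPTEN__
#include <emscripten.h>
#endif

// Hypothetical helper: the one place where the loop "blocks".
inline void waitForEvents(unsigned int timeoutMs) {
#ifdef __EMSCRIPTEN__
    // With -sASYNCIFY this unwinds the WASM stack, returns control to the
    // browser's event loop, and resumes here after roughly timeoutMs.
    emscripten_sleep(timeoutMs);
#else
    // Native stand-in for this sketch; a real native loop would park in
    // something like glfwWaitEventsTimeout instead.
    std::this_thread::sleep_for(std::chrono::milliseconds(timeoutMs));
#endif
}

// Hypothetical loop with the same blocking shape on every target.
// Calls drawFrame once per iteration and returns the number of frames drawn.
template <typename DrawFrame>
int runLoop(int maxFrames, unsigned int idleMs, DrawFrame drawFrame) {
    int frames = 0;
    while (frames < maxFrames) {
        drawFrame(frames);
        ++frames;
        waitForEvents(idleMs);  // looks blocking, yields on the web
    }
    return frames;
}
```

The loop body reads the same on both targets; only the helper knows whether "wait" means parking a thread or yielding to the browser.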

The payoff is operational: the playground is four static files (`index.html`, `index.js`, a ~3 MB `index.wasm`, and an `app.html`). It hosts on plain Vercel with no special headers, and it embeds in the docs as an ordinary `<iframe>`. The preview pane you see is a *real* nested iframe running its own WebGPU device, with source streamed in over `postMessage` — which is only possible because nothing requires cross-origin isolation.

## HiDPI falls out for free

One detail I like: the canvas backing size is set to `CSS size × devicePixelRatio` by the JS host, and Surface reconfigures the wgpu surface to match. That's the same physical-pixel rule the [native text rasterizer uses](/blog/hidpi-crisp-text) — so text on the web is snapped to device pixels and stays crisp on Retina, using the identical code path as the desktop app. Cross-platform consistency isn't a goal I chase; it's a consequence of there being one renderer.
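The sizing rule is simple arithmetic. In Vel the JS host does this step, so the C++ below is only a sketch of the calculation: `PixelSize` and `backingSize` are hypothetical names, and rounding to the nearest pixel with a 1-pixel minimum is an assumption, not something confirmed by the source.

```cpp
#include <cmath>

struct PixelSize {
    int width;
    int height;
};

// Hypothetical helper: converts a CSS size (logical pixels) into a canvas
// backing size (physical pixels) by multiplying by devicePixelRatio.
inline PixelSize backingSize(double cssWidth, double cssHeight, double devicePixelRatio) {
    // Assumption: round to the nearest physical pixel.
    int w = static_cast<int>(std::lround(cssWidth * devicePixelRatio));
    int h = static_cast<int>(std::lround(cssHeight * devicePixelRatio));
    // Assumption: clamp to at least 1x1 so a collapsed element never
    // produces a zero-sized surface.
    return PixelSize{w < 1 ? 1 : w, h < 1 ? 1 : h};
}
```

For example, an 800 by 600 CSS-pixel canvas on a display with a ratio of 2 gets a 1600 by 1200 backing store, which is the size the GPU surface would then be configured to.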

## What it costs

ASYNCIFY isn't free. It instruments the binary, which adds size and a small per-call overhead on the functions that can unwind — you don't want it everywhere, so you scope which calls it applies to. Single-threaded also means exactly that: no offloading layout or decode to a worker, so a genuinely heavy frame has nowhere to hide. For a UI that idles at ~0 CPU and lays out [10k rows in ~2 ms](/blog/65x-faster-relayout) that's fine; for a compute-heavy app it would be a real ceiling.

And the honest caveat: this is WebGPU, so it needs a recent browser. Chrome and Edge have had it on by default since 113; Safari shipped it; Firefox is partial. A blank canvas almost always means "this browser doesn't have WebGPU enabled," which is a worse failure mode than a 2D fallback would be — I chose fidelity over reach.

But the thing I set out to prove held up: porting a native GPU UI to the browser was a 20-line surface file and a build-flag decision, not a parallel web codebase. The hard part of "write once, run everywhere" was never the rendering. It was refusing to let the platforms leak into the parts that aren't platform-specific.