---
title: "From Skia to Lume: writing my own 2D rendering engine for Vel"
date: "2026-06-07"
description: "Vel started on Skia because it was the most complete 2D API I could grab. Once the DSL was working I replaced it with Lume, a from-scratch Dawn/WebGPU engine."
slug: "lume-rendering-engine"
tldr: "Vel's renderer is now Lume, a from-scratch 2D GPU engine on Dawn/WebGPU with analytic SDF shapes, a FreeType glyph atlas, and four instanced WGSL pipelines. ~3,000 lines of engine code, libvel.dylib at 11 MB, zero Sk* references in the tree."
tags: ["vel", "gpu", "rendering", "webgpu"]
cover: "/images/blog/lume-rendering-engine/cover.jpg"
coverAlt: "Lume, a 2D rendering engine. Gold-and-charcoal L mark with the LUME wordmark and 2D RENDERING ENGINE subtitle on a cream background."
author: "Sai Chandan Kadarla"
devto_id: 3842924
---

Vel is about a week old. I started it as a DSL plus framework experiment, and from day one the rendering substrate was Skia. That wasn't an accident. Skia is the most complete 2D rendering API you can drop into a C++ project today. Clean canvas surface, built-in text with system font fallback, image decode, a GPU backend that already works on every desktop OS. If you want a UI framework drawing pixels by the end of the week, Skia is what you reach for.

But the plan was always to replace it. Skia is a brilliant CPU-rasterization library bolted to a GPU backend, and as soon as you push it hard, the bolts show. Flutter publicly battled this same class of problems for years before they shipped [Impeller](https://docs.flutter.dev/perf/impeller) and finally got rid of the runtime shader-compilation jank that made early Flutter apps stutter. I'd rather not repeat their story. So once the DSL was working and the framework was responding to my changes the way I wanted, I started writing the renderer I actually needed.

The new engine is called **Lume**. It lives in [`engine/`](https://github.com/chan27-2/Vel/tree/main/engine) of the Vel repo. This post is about why I started with Skia, why I'm replacing it now, and what Lume does differently.

![Vel's showcase app, every pixel painted by Lume.](/images/blog/lume-rendering-engine/showcase-1.png)

## Why Skia first

Skia gave Vel three things I needed in the first few days of having a framework at all:

- A clean `SkCanvas` API the widget pipeline could draw into without me writing any GPU code.
- A working text renderer (CoreText on macOS, FreeType elsewhere, all behind the same API) with system font fallback.
- Image decode plus GPU upload as table-stakes, so Image widgets just worked.

That let me focus on the actual hard problem of the framework: the DSL and the reactive substrate. That meant layout, signals, hot-reload, event dispatch, and the widget registry. The rendering substrate didn't need to be mine yet. Skia was a load-bearing dependency for exactly the amount of time it took the rest of the system to stop being the bottleneck.

## Why I'm replacing it now

Once the DSL and framework were in shape, I had a clear view of what the renderer was actually doing for me, and what it was going to cost as the surface area grew. Three things, all well-known to anyone who's tried to ship a Skia-based UI runtime at scale.

**1. Shader compilation jank.** Skia compiles shaders the first time it sees a new primitive *during the frame that wants to draw it*. The first time you open a dialog with a blur, you pay 40 to 120 ms while Skia builds the right shader for the GPU. Flutter spent years trying to predict and pre-warm these (the infamous SkSL shader warm-up cache) and never fully won. The Impeller team's own postmortem describes this as the engine's defining flaw.

**2. Tessellation on the CPU.** Skia turns rounded rectangles, strokes, and curves into triangle meshes on the CPU, then ships them to the GPU. For one card it's free. For a table of 200 rows with rounded corners and hover highlights, the CPU is doing a lot of work that a fragment shader could do once and for all with [an analytic SDF](https://iquilezles.org/articles/distfunctions2d/).

**3. The framework didn't own its render path.** This was the real one. Every cross-cutting question I expected to hit later (popovers clipping inside scroll views, text positioning in tight cells, atlas eviction policy, draw order across overlays, HiDPI handling) was eventually going to bottom out in Skia's behavior, and the answer was always going to be "work around it." When you don't control the rendering substrate, every one of those concerns is something you negotiate with a library that doesn't know what your widgets are.

I'm not the first person to land here. Flutter, Servo, Bevy's UI work, Slint: every team building a rendering-heavy UI runtime has eventually concluded that owning the engine is the only way to make the rest of the system answer to one design. The cheaper time to do it is before you have a year of code depending on someone else's render path.

## What I borrowed from Flutter

Impeller's defining decision is *ahead-of-time shader compilation*. Every shader the engine could ever need is compiled at build time into Metal or Vulkan IR and bundled with the binary. The "first render is slow" problem goes away because there is no first render. Every shader has already been seen.

That insight was the foundation. The other thing I borrowed: keep the pipeline list small. Impeller has on the order of a dozen pipelines, not hundreds. The way you do that is by reducing every primitive you draw to a small set of canonical shapes (rounded rects with optional ring strokes, textured quads, line segments) and varying their behavior through *uniforms*, not new shaders.

Lume's pipeline count today is four:

1. **Shape**: analytic SDF rounded-rect. Fills, strokes, circles, lines, soft shadows all collapse to this.
2. **Line**: per-segment rotated quad with butt caps for polylines.
3. **Text**: textured quad sampling an R8 glyph atlas.
4. **Image**: textured quad sampling RGBA8 with corner-radius mask.

Every shape in the Vel showcase is one of those four primitives. A `roundedFill` is the shape pipeline with `strokeWidth=0, radius=R`. A `shadowRect` is the same pipeline with `blur>0`, which switches the fragment shader to a `smoothstep` falloff instead of the AA clamp. A `circleStroke` is a shape with `radius=w/2` and `strokeWidth>0`. The instance attributes do the heavy lifting; the GPU just rasterizes.
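To make the "one pipeline, many shapes" idea concrete, here is a minimal CPU-side sketch of what a rounded-rect SDF fragment stage computes per pixel. It is not Lume's WGSL: the `ShapeInstance` field names, the inset stroke convention, and the half-pixel AA ramp are all assumptions for illustration. The distance function itself is the standard rounded-box SDF from the article linked above.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical per-instance attributes; names are illustrative, not Lume's layout.
struct ShapeInstance {
    float cx, cy;        // rect center, logical px
    float halfW, halfH;  // half extents
    float radius;        // corner radius (halfW == halfH == radius gives a circle)
    float strokeWidth;   // 0 = fill, > 0 = ring stroke
    float blur;          // 0 = crisp AA edge, > 0 = soft shadow falloff
};

inline float saturate(float v) { return std::min(std::max(v, 0.0f), 1.0f); }

inline float smoothstep(float e0, float e1, float x) {
    float t = saturate((x - e0) / (e1 - e0));
    return t * t * (3.0f - 2.0f * t);
}

// Signed distance to a rounded box centered at the origin:
// negative inside, zero on the outline, positive outside.
inline float sdRoundedBox(float px, float py, float halfW, float halfH, float r) {
    float qx = std::fabs(px) - halfW + r;
    float qy = std::fabs(py) - halfH + r;
    float ox = std::max(qx, 0.0f);
    float oy = std::max(qy, 0.0f);
    return std::min(std::max(qx, qy), 0.0f) + std::sqrt(ox * ox + oy * oy) - r;
}

// Coverage in [0, 1] for the pixel at (x, y).
inline float coverage(const ShapeInstance& s, float x, float y) {
    float d = sdRoundedBox(x - s.cx, y - s.cy, s.halfW, s.halfH, s.radius);
    if (s.strokeWidth > 0.0f) {
        // Assumption: the stroke sits just inside the outline, so the
        // covered band is -strokeWidth <= d <= 0.
        d = std::fabs(d + 0.5f * s.strokeWidth) - 0.5f * s.strokeWidth;
    }
    if (s.blur > 0.0f) {
        // Shadow mode: smoothstep falloff outward from the edge.
        return 1.0f - smoothstep(0.0f, s.blur, d);
    }
    // Crisp mode: a one-pixel-wide linear AA ramp centered on the edge.
    return saturate(0.5f - d);
}
```

In this sketch a fill, a ring, a circle, and a shadow differ only in the values stored in `ShapeInstance`, which is the property that keeps the pipeline count flat.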

## What Lume actually is

![Lume, a 2D rendering engine for Vel.](/images/blog/lume-rendering-engine/icon.jpg)

The architecture is four layers:

```
L1  platform/   → CAMetalLayer attach (macOS). Future: ANativeWindow, HWND, canvas.
L2  gpu::Device → Dawn instance + adapter + device + queue (singleton).
L2  gpu::Surface → wgpu::Surface bound to the window's native layer.
L3  paint/      → DawnPainterImpl: four WGSL pipelines, glyph atlas,
                  per-instance state for shape/line/text/image, submission-
                  order draw segments.
L4  Painter API → public surface: fill, roundedFill, stroke, polyline,
                  arc, image, text, pushClip, pushTransform, and so on.
```

The whole stack is `engine/include/vel/` (public headers) plus `engine/src/` (about 3,000 lines of implementation). The framework calls into the Painter API and never sees a WebGPU type.

Three details that took real effort:

**The glyph atlas is keyed on physical pixel size.** When you ask for 14 px text on a 2× DPR display, FreeType rasterizes at 28 px. Lume's atlas cache key includes that physical size, so a window dragged to a 1× external monitor doesn't reuse the 28 px glyphs and render resampled, blurry text. It rasterizes a second entry at 14 physical px and uses that. The dst rect stays in logical pixels; the GPU samples the physical atlas 1:1.
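A rough sketch of that cache key, assuming a plain hash map in front of the rasterizer. `GlyphKey`, `GlyphCache`, and `AtlasSlot` are hypothetical names, and `rasterize` is a stand-in for the FreeType call plus atlas packing; only the keying on physical size comes from the description above.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>

// Hypothetical cache key: the size component is in physical pixels.
struct GlyphKey {
    std::uint32_t fontId;
    std::uint32_t glyphIndex;
    std::uint32_t physicalPx;  // round(logical size * DPR)
    bool operator==(const GlyphKey& o) const {
        return fontId == o.fontId && glyphIndex == o.glyphIndex &&
               physicalPx == o.physicalPx;
    }
};

struct GlyphKeyHash {
    std::size_t operator()(const GlyphKey& k) const {
        std::hash<std::uint32_t> h32;
        std::size_t h = h32(k.fontId);
        h ^= h32(k.glyphIndex) + 0x9e3779b9u + (h << 6) + (h >> 2);
        h ^= h32(k.physicalPx) + 0x9e3779b9u + (h << 6) + (h >> 2);
        return h;
    }
};

struct AtlasSlot { int x, y, w, h; };  // region of the atlas, physical px

class GlyphCache {
public:
    // Returns the atlas slot for a glyph, rasterizing on a cache miss.
    AtlasSlot lookup(std::uint32_t fontId, std::uint32_t glyphIndex,
                     float logicalPx, float dpr) {
        GlyphKey key{fontId, glyphIndex,
                     static_cast<std::uint32_t>(std::lround(logicalPx * dpr))};
        auto it = slots_.find(key);
        if (it == slots_.end()) {
            ++rasterCount_;
            it = slots_.emplace(key, rasterize(key)).first;
        }
        return it->second;
    }
    std::size_t rasterCount() const { return rasterCount_; }

private:
    // Stand-in for FreeType rasterization and atlas packing.
    static AtlasSlot rasterize(const GlyphKey& k) {
        int px = static_cast<int>(k.physicalPx);
        return AtlasSlot{0, 0, px, px};
    }
    std::unordered_map<GlyphKey, AtlasSlot, GlyphKeyHash> slots_;
    std::size_t rasterCount_ = 0;
};
```

Asking for the same glyph at 14 px with a DPR of 2 and then a DPR of 1 produces two entries (28 px and 14 px), while repeating either request is a cache hit.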

**Submission-order draw segments.** Originally Lume batched all-shapes, then all-lines, then all-text per frame. This broke the Table widget's sticky header: the header background was drawn before the row text, so row text overdrew the header bg, and rows became visible *through* the header during scroll. The fix was to track a small `DrawCmd` list (`{kind, firstInstance, count}`) in submission order and emit one Draw call per segment. Same-kind cmds fuse. The Table works, and any widget that depends on draw order ("this card needs to be on top of those cards") works for the same reason.
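The segment list can be sketched in a few lines. The `{kind, firstInstance, count}` shape is from the description above; the `DrawList` class, the `Kind` enum, and the assumption that each kind has its own instance buffer indexed in submission order are illustrative guesses rather than Lume's actual code.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

enum class Kind : std::uint8_t { Shape, Line, Text, Image };

struct DrawCmd {
    Kind kind;
    std::uint32_t firstInstance;  // offset into that kind's instance buffer
    std::uint32_t count;          // number of consecutive instances to draw
};

class DrawList {
public:
    // Record one instance of `kind` in submission order.
    void push(Kind kind) {
        std::uint32_t& next = nextInstance_[static_cast<std::size_t>(kind)];
        if (!cmds_.empty() && cmds_.back().kind == kind) {
            // Same kind as the previous segment: fuse by extending it.
            ++cmds_.back().count;
        } else {
            // Kind changed: start a new segment at this kind's next slot.
            cmds_.push_back(DrawCmd{kind, next, 1u});
        }
        ++next;
    }

    // One GPU draw call would be issued per entry, in this order.
    const std::vector<DrawCmd>& cmds() const { return cmds_; }

private:
    std::vector<DrawCmd> cmds_;
    std::uint32_t nextInstance_[4] = {0, 0, 0, 0};
};
```

Pushing row background, row text, header background, header text yields four segments in exactly that order, so the header background lands on top of the row text. A run of consecutive text instances still collapses into a single segment.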

**Drag capture survives reactive rebuilds.** Vel is signal-driven. When the user drags a Slider, the slider writes to a signal, which triggers a re-render, which replaces the Slider widget instance. The new instance has `dragging_=false`. The drag dies after one mouse-move event. The fix wasn't in Lume; it was in the framework's `EventDispatcher`. `captureDrag(handler)` registers a callable that closes over the slider's geometry plus its `onChange` (whose own closure captures the long-lived owning component's `this`). Mouse-move and mouse-up route to the captured handler directly, bypassing the widget tree. Drag continues across any number of rebuilds.
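A stripped-down sketch of the capture mechanism, under assumptions: only `EventDispatcher` and `captureDrag` are names from the framework, while `MouseEvent`, `dispatchMouse`, `hasCapture`, and `makeSliderDrag` are hypothetical. The point it illustrates is that the handler owns copies of everything it needs, so it does not care whether the widget that registered it still exists.

```cpp
#include <functional>
#include <utility>

// Hypothetical event type for illustration.
struct MouseEvent {
    float x, y;
    bool released;  // true for mouse-up, false for mouse-move
};

class EventDispatcher {
public:
    using DragHandler = std::function<void(const MouseEvent&)>;

    // Register a handler that receives all mouse events until release.
    void captureDrag(DragHandler handler) { captured_ = std::move(handler); }
    bool hasCapture() const { return static_cast<bool>(captured_); }

    // Returns true if a captured drag consumed the event. On false, the
    // caller would fall back to normal hit-testing through the widget tree.
    bool dispatchMouse(const MouseEvent& e) {
        if (!captured_) return false;
        DragHandler handler = captured_;      // copy: stays valid if capture changes
        if (e.released) captured_ = nullptr;  // mouse-up ends the capture
        handler(e);
        return true;
    }

private:
    DragHandler captured_;
};

// Builds a handler that closes over the slider's geometry by value, plus an
// onChange callback that is assumed to outlive any single widget instance.
inline EventDispatcher::DragHandler makeSliderDrag(
        float trackX, float trackWidth, std::function<void(float)> onChange) {
    return [trackX, trackWidth, onChange](const MouseEvent& e) {
        float t = (e.x - trackX) / trackWidth;
        t = t < 0.0f ? 0.0f : (t > 1.0f ? 1.0f : t);
        onChange(t);
    };
}
```

Nothing in the handler points back at the Slider instance, so a rebuild in the middle of the drag replaces the widget without touching the capture.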

![Cards, forms, and overlays from the showcase, all going through Lume's four pipelines.](/images/blog/lume-rendering-engine/showcase-2.png)

## The Skia / Impeller / Lume comparison

The dimensions that matter for a 2D UI runtime:

| | **Skia (Vel v1)** | **Impeller (Flutter)** | **Lume (Vel today)** |
|---|---|---|---|
| Shader compilation | JIT, at first-draw time | AOT, build-time | WGSL compiled once by Dawn at device init, before the first frame |
| Shape rendering | CPU tessellation → GPU triangles | Compute + tessellation hybrid | Analytic SDF in the fragment shader |
| Pipeline count | hundreds (one per primitive + state combo) | ~12 | 4 |
| Text | CoreText / FreeType per platform | Manual rasterizer → MTLTexture atlas | FreeType → R8 atlas, OS/2 typo metrics |
| Idle frame cost | Always paints | Always paints | ~0 (frame-dirty flag short-circuits the whole pipeline) |
| HiDPI | Surface scaled in canvas | Per-pass DPR awareness | Atlas keyed on physical px; dst rect in logical px |
| Cross-platform reach | GL/Vulkan/Metal/D3D11 | Metal + Vulkan (+ work-in-progress) | Dawn handles Metal/Vulkan/D3D12/WebGPU from one WGSL source |
| Library code in libvel | Skia + image codecs (~25 MB linked) | n/a | 0 |
| `libvel.dylib` size (macOS arm64) | ~30 MB | n/a | **11 MB** |
| Hot-reload safety | Crashes if plugin link drops Skia symbols | n/a | Plugin links the same `libvel.dylib`; nothing else to share |

The single most useful row in that table is the bottom one. With Skia gone, the hot-reload plugin no longer needs to think about which graphics symbols it shares with the host. `libvel.dylib` is the sole boundary. A hot reload re-emits a `.vel.cpp`, recompiles 200 lines, and `dlopen`s the new dylib in under a second.

![A virtualized Table with a sticky header, the test case that forced submission-order draw segments into existence.](/images/blog/lume-rendering-engine/showcase-3.png)

## What Lume doesn't do yet (the honest section)

This is the first usable version of the engine, and I'd be lying if I said it was at parity with Skia for every workload. Three real gaps:

**Compute-shader Gaussian blur.** Lume's shadow is currently a `smoothstep` outer falloff applied to the rounded-rect SDF. For small blur radii (4 to 16 px, which covers most UI shadows) it's perceptually identical to a Gaussian. For larger radii it reads as "the rect got bigger and softer at the edges" rather than a true Gaussian. A two-pass separable Gaussian in a compute pipeline is on the roadmap; for now the cheap approximation is honest about what it is.
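For reference, this is roughly what a two-pass separable Gaussian computes, written as a CPU sketch rather than a compute shader. Nothing here exists in Lume yet: the function names, the clamp-to-edge sampling, and the single-channel image layout are all assumptions.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Normalized 1D Gaussian weights for taps -radius..+radius.
inline std::vector<float> gaussianKernel(int radius, float sigma) {
    std::vector<float> weights(static_cast<std::size_t>(2 * radius + 1));
    float sum = 0.0f;
    for (int i = -radius; i <= radius; ++i) {
        float w = std::exp(-static_cast<float>(i * i) / (2.0f * sigma * sigma));
        weights[static_cast<std::size_t>(i + radius)] = w;
        sum += w;
    }
    for (float& w : weights) w /= sum;  // weights sum to 1, so brightness is kept
    return weights;
}

// One 1D pass over a row-major single-channel image. (dx, dy) is the tap
// direction: (1, 0) for horizontal, (0, 1) for vertical. Edges are clamped.
inline std::vector<float> blurPass(const std::vector<float>& src, int width,
                                   int height, const std::vector<float>& kernel,
                                   int dx, int dy) {
    const int radius = static_cast<int>(kernel.size() / 2);
    std::vector<float> dst(src.size());
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            float acc = 0.0f;
            for (int t = -radius; t <= radius; ++t) {
                int sx = std::min(std::max(x + t * dx, 0), width - 1);
                int sy = std::min(std::max(y + t * dy, 0), height - 1);
                acc += kernel[static_cast<std::size_t>(t + radius)] *
                       src[static_cast<std::size_t>(sy * width + sx)];
            }
            dst[static_cast<std::size_t>(y * width + x)] = acc;
        }
    }
    return dst;
}

// Separable blur: a horizontal pass followed by a vertical pass.
inline std::vector<float> gaussianBlur(const std::vector<float>& src, int width,
                                       int height, int radius, float sigma) {
    std::vector<float> kernel = gaussianKernel(radius, sigma);
    std::vector<float> horizontal = blurPass(src, width, height, kernel, 1, 0);
    return blurPass(horizontal, width, height, kernel, 0, 1);
}
```

The appeal of the separable form is cost: two passes of `2 * radius + 1` taps each instead of one pass of `(2 * radius + 1)^2` taps, which is what makes large blur radii affordable.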

**Complex-script text shaping.** I link HarfBuzz; I don't drive it yet. Latin, Cyrillic, and Greek render correctly. Arabic ligatures, Devanagari conjuncts, vertical text: those are next. The FreeType path is in; the HarfBuzz shaping pass on top of it isn't.

**The platform surface is macOS-only.** Dawn supports Vulkan and D3D12, so the underlying portability is real. The part missing is the window-to-surface glue. Lume has a `SurfaceMac.mm` that attaches a `CAMetalLayer` to a GLFW window's `NSView`; the Windows and Linux equivalents are file-shaped holes today. CI builds compile against the abstraction, but the surface code is the actual port.

The roadmap continues from here: native arcs and dashed strokes via additional pipelines, then HarfBuzz, then compute blur, then a Web target via Dawn plus Emscripten, then Windows and Linux surface layers, then partial-repaint damage rects. Owning the engine means the work is real, but at least it's bounded.

![Images, icons, and inputs in the showcase, all decoded through ImageIO and uploaded as `wgpu::Texture`.](/images/blog/lume-rendering-engine/showcase-4.png)

## The journey is in the git log

The proof that Lume isn't a paper exercise is the diff. The Skia removal was commit `0d7a8f4`. The reorganized four-tier repo (Lume in `engine/`, the framework in `framework/`, the component registry in `registry/`, and the .vel compiler in `velc/`) is `f5b86af`. `grep -rE 'Sk[A-Z]|sk_sp' engine framework registry velc` returns zero hits. The dependency list in `vcpkg.json` is six lines now, none of them Skia.

```
dawn          : GPU abstraction (Metal/Vulkan/D3D12/WebGPU)
freetype      : glyph rasterization
harfbuzz      : complex-script shaping (next)
glfw3         : windowing
spdlog        : logging
nlohmann-json : JSON for the framework
```

If you want to read it, the code is at [github.com/chan27-2/Vel](https://github.com/chan27-2/Vel). The README's [Lume section](https://github.com/chan27-2/Vel#lume--the-rendering-engine) walks the engine specifically; the `engine/` tree on `main` is the smallest version of "a 2D GPU rendering engine you can actually run" I know how to write.

The lesson I'd take from this, and I'm saying this because I want to remember it later, is that *the rendering substrate is not a library decision*. It's an architecture decision. The moment your framework needs to answer cross-cutting questions about hit-testing, atlas eviction, draw order, and HiDPI all at once, you can either keep negotiating with someone else's library or you can write your own. Flutter eventually came to the same conclusion. So did I, just earlier. The work is bigger than it looks. The result is that everything downstream of the renderer stops feeling like it's fighting the renderer.

![Lume v1, running the full Vel showcase on macOS.](/images/blog/lume-rendering-engine/showcase-5.png)
