Engineering notes,
occasional opinions,
and benchmarks we trust.

Posts about Metal compute, wavelet math, what Apple Silicon actually does well, and what we got wrong on the way. Written by the people who write the code.

Newest

Where the 2 ms went: profiling MetaWave on M4 Max

An annotated trace through Instruments. We walk through the wavelet pass, the entropy stage, and the one stupid memory copy we kept missing for three releases. With pictures.

Read →

M4 Max: where we landed, and what surprised us

7,856 FPS on Full HD decode. The headline number is fine. The interesting part is what didn't change between M3 Max and M4 Max, and why the bandwidth jump matters less than you'd think.

Read →

Why radiologists still wait for images, in 2024

A short rant about JPEG 2000 in PACS. The codec isn't slow. The implementation choices around it are. We spent a day watching a clinical workflow. Here's what we learned.

Read →

Unified memory, in plain language

No PCIe bus. No memcpy between CPU and GPU. We measured the real-world impact for a streaming codec workload — it's not the bandwidth that wins, it's the lack of round-trips.

Read →