Delv
Hacker News·INFRA·2mo ago

Zero-copy GPU inference from WebAssembly on Apple Silicon

The CPU and GPU read and write the same physical bytes. End-to-end, it works.

A developer has demonstrated that WebAssembly modules can share memory directly with the GPU on Apple Silicon, eliminating the usual copy and serialization overhead. The technique chains three components: mmap to obtain page-aligned memory, Metal's bytesNoCopy API to wrap that pointer without copying, and Wasmtime's custom memory allocator to back Wasm linear memory with the same region. As a result, the Wasm guest and the GPU read and write the same physical bytes, with no intermediate buffers. The developer measured zero RSS (resident set size) delta for this path, compared to 16.78 MB for the copy path. This matters because it makes Wasm a viable control plane for stateful AI inference on Apple hardware, with near-zero overhead between the sandbox and the accelerator.
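
The core idea, one page-aligned region viewed by two parties with no copy in between, can be illustrated without Metal or Wasmtime. The sketch below is only an analogy in Python, not the developer's code: it creates an anonymous mmap region, then builds two views over it. The names `guest_view` and `device_view` are hypothetical stand-ins for Wasm linear memory and the GPU buffer, and the 1 MiB size is an arbitrary assumption.

```python
import ctypes
import mmap

PAGE = mmap.PAGESIZE  # 16384 on Apple Silicon, 4096 on most x86-64 systems


def round_up_to_page(nbytes: int) -> int:
    """Round a byte count up to a whole number of pages."""
    return (nbytes + PAGE - 1) // PAGE * PAGE


# Anonymous mapping: the OS hands back page-aligned memory.
# The 1 MiB size is an arbitrary choice for illustration.
region = mmap.mmap(-1, round_up_to_page(1024 * 1024))

# Two views over the same pages; neither one copies the data.
# guest_view stands in for the Wasm guest's linear memory.
guest_view = memoryview(region)
# device_view stands in for a GPU buffer wrapping the same pointer.
device_view = (ctypes.c_ubyte * len(region)).from_buffer(region)

# The base address is page-aligned, which no-copy buffer wrapping relies on.
base_address = ctypes.addressof(device_view)
assert base_address % PAGE == 0

# A write through one view is immediately visible through the other.
guest_view[0:4] = b"\x01\x02\x03\x04"
assert bytes(device_view[0:4]) == b"\x01\x02\x03\x04"

device_view[0] = 255
assert guest_view[0] == 255
```

In the real chain described above, the same pointer would be handed to Metal and to Wasmtime instead of to two Python views; the sketch only shows why no intermediate buffer is needed once both sides alias the same mapping.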

#webassembly#apple-silicon#gpu-inference#wasmtime
