Pure Go WebGPU Implementation
No Rust, No CGO, Just Go.
Part of the GoGPU ecosystem
Status: v0.9.0-dev — Compute shaders in development. All 5 HAL backends ready.
A complete WebGPU implementation in pure Go:
- No wgpu-native dependency — Standalone Go library
- Direct GPU access — Vulkan, Metal, DX12 backends
- WebGPU compliant — Following the W3C specification
- WASM compatible — Run in browsers via WebAssembly
go get github.com/gogpu/wgpu

import (
"fmt"

"github.com/gogpu/wgpu/core"
"github.com/gogpu/wgpu/types"
)
// Create instance for GPU discovery
instance := core.NewInstance(&types.InstanceDescriptor{
Backends: types.BackendsVulkan | types.BackendsMetal,
})
// Request high-performance GPU
adapterID, _ := instance.RequestAdapter(&types.RequestAdapterOptions{
PowerPreference: types.PowerPreferenceHighPerformance,
})
// Get adapter info
info, _ := core.GetAdapterInfo(adapterID)
fmt.Printf("GPU: %s\n", info.Name)
// Create device
deviceID, _ := core.RequestDevice(adapterID, &types.DeviceDescriptor{
Label: "My Device",
})
// Get queue for command submission
queueID, _ := core.GetDeviceQueue(deviceID)

// Create compute pipeline
pipelineID, _ := core.DeviceCreateComputePipeline(deviceID, &core.ComputePipelineDescriptor{
Label: "My Compute Pipeline",
Layout: layoutID,
Compute: core.ProgrammableStage{
Module: shaderModuleID,
EntryPoint: "main",
},
})
// Begin compute pass
encoder, _ := core.DeviceCreateCommandEncoder(deviceID, nil)
computePass := encoder.BeginComputePass(nil)
// Dispatch workgroups
computePass.SetPipeline(pipelineID)
computePass.SetBindGroup(0, bindGroupID, nil)
computePass.Dispatch(64, 1, 1) // 64 workgroups
computePass.End()

wgpu/
├── types/ # WebGPU type definitions ✓
├── core/ # Validation, state tracking ✓
├── hal/ # Hardware abstraction layer ✓
│ ├── noop/ # No-op backend (testing) ✓
│ ├── software/ # Software backend ✓ (Full rasterizer, ~10K LOC)
│ ├── gles/ # OpenGL ES backend ✓ (Pure Go, ~7.5K LOC, Windows + Linux)
│ ├── vulkan/ # Vulkan backend ✓ (Pure Go, ~27K LOC)
│ │ ├── vk/ # Generated Vulkan bindings (~20K LOC)
│ │ └── memory/ # GPU memory allocator (~1.8K LOC)
│ ├── metal/ # Metal backend ✓ (Pure Go, ~3K LOC, macOS)
│ └── dx12/ # DirectX 12 backend ✓ (Pure Go, ~12K LOC, Windows)
└── cmd/
├── vk-gen/ # Vulkan bindings generator from vk.xml
└── vulkan-triangle/ # Vulkan integration test (red triangle) ✓
Phase 1: Types Package ✓
- Backend types (Vulkan, Metal, DX12, GL)
- Adapter and device types
- Feature flags
- GPU limits with presets
- Texture formats (100+)
- Buffer, sampler, shader types
- Bind group and render state types
- Vertex formats with size calculations
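To make the "size calculations" item concrete, here is a minimal sketch of how a vertex format can report its byte size and how a tightly packed stride follows from it. The `VertexFormat` constants, `Size`, and `Stride` below are hypothetical names chosen for illustration and are not taken from the `types` package; only the byte sizes themselves follow the WebGPU vertex format definitions.

```go
package main

import "fmt"

// VertexFormat is a hypothetical enum for this sketch; the real types
// package may use different names and values.
type VertexFormat int

const (
	Float32 VertexFormat = iota
	Float32x2
	Float32x3
	Float32x4
	Uint8x4
	Unorm8x4
)

// Size returns the number of bytes one attribute of this format occupies.
func (f VertexFormat) Size() uint64 {
	switch f {
	case Float32:
		return 4
	case Float32x2:
		return 8
	case Float32x3:
		return 12
	case Float32x4:
		return 16
	case Uint8x4, Unorm8x4:
		return 4
	}
	return 0 // unknown format
}

// Stride sums the sizes of attributes laid out back to back with no padding.
func Stride(formats ...VertexFormat) uint64 {
	var total uint64
	for _, f := range formats {
		total += f.Size()
	}
	return total
}

func main() {
	// position (float32x3) + uv (float32x2) + color (unorm8x4)
	fmt.Println(Stride(Float32x3, Float32x2, Unorm8x4)) // 24
}
```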
Phase 2: Core Validation ✓
- Type-safe ID system with generics
- Epoch-based use-after-free prevention
- Instance, Adapter, Device, Queue management
- Hub with 17 resource registries
- Comprehensive error handling
- 127 tests with 95% coverage
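The combination of generic IDs and epochs can be sketched as follows. This is an illustration of the general technique under assumed names (`ID`, `Registry`, `ErrStale`), not the actual layout of the `core` package: each slot carries an epoch that is bumped on removal, so an ID issued before the removal no longer matches even if the slot is reused.

```go
package main

import (
	"errors"
	"fmt"
)

// ID pairs a slot index with the epoch at which the slot was filled.
// T is a phantom marker: an ID[Buffer] cannot be passed where an
// ID[Texture] is expected. (Hypothetical type for this sketch.)
type ID[T any] struct {
	index uint32
	epoch uint32
}

// ErrStale is returned for IDs that refer to a removed or unknown resource.
var ErrStale = errors.New("stale or invalid ID")

type slot[T any] struct {
	value T
	epoch uint32
	live  bool
}

// Registry stores values in reusable slots. The zero value is ready to use.
type Registry[T any] struct {
	slots []slot[T]
	free  []uint32
}

// Insert stores v, reusing a freed slot when one is available.
func (r *Registry[T]) Insert(v T) ID[T] {
	if n := len(r.free); n > 0 {
		i := r.free[n-1]
		r.free = r.free[:n-1]
		s := &r.slots[i]
		s.value, s.live = v, true
		return ID[T]{index: i, epoch: s.epoch}
	}
	r.slots = append(r.slots, slot[T]{value: v, epoch: 1, live: true})
	return ID[T]{index: uint32(len(r.slots) - 1), epoch: 1}
}

// Get resolves id, rejecting IDs whose epoch no longer matches the slot.
func (r *Registry[T]) Get(id ID[T]) (T, error) {
	var zero T
	if int(id.index) >= len(r.slots) {
		return zero, ErrStale
	}
	s := r.slots[id.index]
	if !s.live || s.epoch != id.epoch {
		return zero, ErrStale
	}
	return s.value, nil
}

// Remove frees the slot and bumps its epoch, which invalidates every
// outstanding ID that still points at it.
func (r *Registry[T]) Remove(id ID[T]) error {
	if _, err := r.Get(id); err != nil {
		return err
	}
	s := &r.slots[id.index]
	var zero T
	s.value, s.live = zero, false
	s.epoch++
	r.free = append(r.free, id.index)
	return nil
}

func main() {
	var buffers Registry[string]
	old := buffers.Insert("vertex buffer")
	_ = buffers.Remove(old)
	fresh := buffers.Insert("index buffer") // reuses slot 0 with a new epoch

	_, err := buffers.Get(old)
	fmt.Println("old ID rejected:", err != nil)
	v, _ := buffers.Get(fresh)
	fmt.Println("fresh ID resolves to:", v)
}
```

The stale ID and the fresh ID share the same index; only the epoch distinguishes them, which is what turns a would-be use-after-free into an ordinary error.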
Phase 3: HAL Interface ✓
- Backend abstraction layer (Backend, Instance, Adapter, Device, Queue)
- Resource interfaces (Buffer, Texture, Surface, Sampler, etc.)
- Command encoding (CommandEncoder, RenderPassEncoder, ComputePassEncoder)
- Backend registration system
- Noop backend for testing
- 54 tests with 94% coverage
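A backend registration system is commonly built the way `database/sql` drivers are: each backend registers itself from an `init` function, and the core looks backends up by name. The sketch below assumes that pattern; the `Backend` interface, `Register`, `Lookup`, and `Registered` are hypothetical and much smaller than whatever the `hal` package actually exposes.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Backend is a deliberately tiny stand-in interface for this sketch.
type Backend interface {
	Name() string
}

var (
	mu       sync.RWMutex
	backends = map[string]Backend{}
)

// Register makes a backend available under its name.
// A later registration with the same name replaces the earlier one.
func Register(b Backend) {
	mu.Lock()
	defer mu.Unlock()
	backends[b.Name()] = b
}

// Lookup returns the backend registered under name, if any.
func Lookup(name string) (Backend, bool) {
	mu.RLock()
	defer mu.RUnlock()
	b, ok := backends[name]
	return b, ok
}

// Registered lists backend names in sorted order.
func Registered() []string {
	mu.RLock()
	defer mu.RUnlock()
	names := make([]string, 0, len(backends))
	for n := range backends {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

// noopBackend does nothing; it exists so tests can run without a GPU.
type noopBackend struct{}

func (noopBackend) Name() string { return "noop" }

// In a real layout this init would live in the backend's own package,
// so that a blank import is enough to make the backend available.
func init() { Register(noopBackend{}) }

func main() {
	fmt.Println(Registered()) // [noop]
}
```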
Phase 4: Pure Go Backends ✓
- OpenGL ES backend (hal/gles/): Pure Go via goffi, Windows (WGL) + Linux (EGL), ~7.5K LOC
- Vulkan backend (hal/vulkan/): Pure Go via goffi, cross-platform (Windows/Linux/macOS), ~27K LOC
- Software backend (hal/software/): Full rasterization pipeline, ~10K LOC, 100+ tests
- Metal backend (hal/metal/): Pure Go via goffi, macOS, ~3K LOC
- DX12 backend (hal/dx12/): Pure Go via syscall, Windows, ~12K LOC
Phase 5: Compute Shaders (In Progress)
- Core API: ComputePipelineDescriptor, ComputePassEncoder
- HAL infrastructure: glDispatchCompute, SetBindGroup, workgroup sizes
- Backend tests: Vulkan, DX12, Metal, GLES
- Examples: array sum, buffer copy, image filter
- Documentation and tutorials
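Workgroup sizing is the part of compute dispatch that most often goes wrong, so a small CPU-side sketch may help. It shows the usual ceiling division for the number of workgroups and the bounds guard a shader needs when the data length is not a multiple of the workgroup size. Both functions are illustrative helpers written for this README, not part of the wgpu API, and the loop only emulates what a GPU would run in parallel.

```go
package main

import "fmt"

// workgroupCount returns how many workgroups are needed so that
// count*workgroupSize covers n invocations (ceiling division).
func workgroupCount(n, workgroupSize uint32) uint32 {
	if workgroupSize == 0 {
		return 0
	}
	count := n / workgroupSize
	if n%workgroupSize != 0 {
		count++
	}
	return count
}

// sumOnCPU emulates a 1D dispatch sequentially: every (workgroup, local)
// pair maps to one global invocation index, as in a compute shader.
func sumOnCPU(data []uint32, workgroupSize uint32) uint32 {
	n := uint32(len(data))
	var total uint32
	for group := uint32(0); group < workgroupCount(n, workgroupSize); group++ {
		for local := uint32(0); local < workgroupSize; local++ {
			global := group*workgroupSize + local
			if global >= n {
				continue // the last workgroup may run past the end of the data
			}
			total += data[global]
		}
	}
	return total
}

func main() {
	data := make([]uint32, 1000)
	for i := range data {
		data[i] = uint32(i + 1)
	}
	fmt.Println(workgroupCount(uint32(len(data)), 64)) // 16
	fmt.Println(sumOnCPU(data, 64))                    // 500500
}
```

With a workgroup size of 64, a dispatch of 64 workgroups covers 4096 invocations; for 1000 elements, 16 workgroups are enough and the last one has 24 idle invocations.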
All backends are implemented without CGO:
| Backend | Status | Approach | Platforms |
|---|---|---|---|
| Software | Done | Pure Go CPU rendering | All (headless) |
| OpenGL ES | Done | goffi + WGL/EGL | Windows, Linux |
| Vulkan | Done | goffi + vk-gen from vk.xml | Windows, Linux, macOS |
| Metal | Done | goffi (Obj-C bridge) | macOS, iOS |
| DX12 | Done | syscall + COM | Windows |
Full-featured CPU rasterizer for headless rendering:
# Build with software backend
go build -tags software ./...

import _ "github.com/gogpu/wgpu/hal/software"
// Use cases:
// - CI/CD testing without GPU
// - Server-side image generation
// - Embedded systems without GPU
// - Fallback when no GPU available
// - Reference implementation for testing
// Key feature: read rendered pixels
surface.GetFramebuffer() // Returns []byte (RGBA8)

Rasterization Pipeline (hal/software/raster/):
- Edge function (Pineda) triangle rasterization with top-left fill rule
- Perspective-correct attribute interpolation
- Depth buffer with 8 compare functions
- Stencil buffer with 8 operations
- 13 blend factors, 5 blend operations (WebGPU spec compliant)
- 6-plane frustum clipping (Sutherland-Hodgman)
- Backface culling (CW/CCW)
- 8x8 tile-based rasterization for cache locality
- Parallel rasterization with worker pool
- Incremental edge evaluation (O(1) per pixel)
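The edge-function test with the top-left fill rule can be sketched in a few lines. This is a simplified illustration, not the code in hal/software/raster/: it assumes integer vertex coordinates, samples at integer positions, a y-down screen, and triangles wound so that the signed area is positive. It also re-evaluates the edge functions at every sample rather than stepping them incrementally.

```go
package main

import "fmt"

// Point is a sample or vertex position in y-down screen space.
type Point struct{ X, Y int }

// edge is the Pineda edge function: positive when p lies on the interior
// side of the directed edge a->b for a triangle with positive signed area.
func edge(a, b, p Point) int {
	return (b.X-a.X)*(p.Y-a.Y) - (b.Y-a.Y)*(p.X-a.X)
}

// isTopLeft reports whether a->b is a top edge (horizontal, pointing right)
// or a left edge (pointing up) under the winding assumed above.
func isTopLeft(a, b Point) bool {
	dx, dy := b.X-a.X, b.Y-a.Y
	return (dy == 0 && dx > 0) || dy < 0
}

// covers applies the fill rule: samples exactly on an edge count only
// when that edge is a top or left edge.
func covers(v0, v1, v2, p Point) bool {
	inside := func(a, b Point) bool {
		w := edge(a, b, p)
		return w > 0 || (w == 0 && isTopLeft(a, b))
	}
	return inside(v0, v1) && inside(v1, v2) && inside(v2, v0)
}

// coverage counts covered samples in a w x h grid.
func coverage(v0, v1, v2 Point, w, h int) int {
	n := 0
	for y := 0; y < h; y++ {
		for x := 0; x < w; x++ {
			if covers(v0, v1, v2, Point{x, y}) {
				n++
			}
		}
	}
	return n
}

func main() {
	// Two triangles that share the diagonal of a 4x4 square.
	a := coverage(Point{0, 0}, Point{4, 0}, Point{0, 4}, 5, 5)
	b := coverage(Point{4, 0}, Point{4, 4}, Point{0, 4}, 5, 5)
	fmt.Println(a, b, a+b) // 10 6 16
}
```

The two triangles together cover each of the 16 samples in the square exactly once: samples on the shared diagonal go to the triangle for which that edge is a left edge, which is the property the fill rule exists to guarantee.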
Shader System (hal/software/shader/):
- Callback-based vertex/fragment shaders
- Built-in shaders: SolidColor, VertexColor, Textured
- Custom shader support via VertexShaderFunc/FragmentShaderFunc
Metrics: ~10K LOC, 100+ tests, 94% coverage
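To illustrate what "callback-based" means for the shader system, here is a minimal sketch in which a fragment shader is an ordinary Go function invoked once per pixel. The names `FragmentShaderFunc` and `SolidColor` echo the ones listed above, but the signatures, the `Vec4` type, and the `fill` helper are assumptions made for this example; the real definitions in hal/software/shader/ may differ.

```go
package main

import "fmt"

// Vec4 holds RGBA components in the range [0, 1]. (Assumed type.)
type Vec4 [4]float32

// FragmentShaderFunc computes a color for the pixel at (x, y).
// The signature is an assumption for this sketch.
type FragmentShaderFunc func(x, y int) Vec4

// SolidColor returns a shader that ignores its inputs and yields c.
func SolidColor(c Vec4) FragmentShaderFunc {
	return func(x, y int) Vec4 { return c }
}

// toByte converts a [0, 1] component to an 8-bit value with clamping.
func toByte(v float32) byte {
	if v <= 0 {
		return 0
	}
	if v >= 1 {
		return 255
	}
	return byte(v*255 + 0.5)
}

// fill runs the shader for every pixel and packs the result as RGBA8,
// row by row, four bytes per pixel.
func fill(w, h int, shade FragmentShaderFunc) []byte {
	out := make([]byte, 0, w*h*4)
	for y := 0; y < h; y++ {
		for x := 0; x < w; x++ {
			c := shade(x, y)
			out = append(out, toByte(c[0]), toByte(c[1]), toByte(c[2]), toByte(c[3]))
		}
	}
	return out
}

func main() {
	red := SolidColor(Vec4{1, 0, 0, 1})
	pixels := fill(2, 2, red)
	fmt.Println(len(pixels), pixels[:4]) // 16 [255 0 0 255]
}
```

A custom shader in this model is any function with the same signature, for example a checkerboard that switches color on `(x+y)%2`.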
Vulkan backend (hal/vulkan/):
- Auto-generated bindings from official Vulkan XML specification
- Memory allocator with buddy allocation (O(log n), minimal fragmentation)
- Vulkan 1.3 dynamic rendering — No render pass objects needed
- Swapchain management with automatic recreation
- Semaphore synchronization for frame presentation
- Complete HAL implementation:
  - Buffer, Texture, TextureView, Sampler
  - ShaderModule, BindGroupLayout, BindGroup
  - PipelineLayout, RenderPipeline, ComputePipeline
  - CommandEncoder, RenderPassEncoder, ComputePassEncoder
  - Fence synchronization, WriteTexture immediate upload
- Comprehensive unit tests (93 tests, 2200+ LOC):
  - Conversion functions (formats, usage, blend modes)
  - Descriptor allocator logic
  - Resource structures
  - Memory allocator (buddy allocation)
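Buddy allocation, mentioned above for the GPU memory allocator, can be sketched as follows. This is a generic illustration of the technique with assumed names (`Buddy`, `NewBuddy`, `Alloc`, `Free`), not the code in hal/vulkan/memory/: requests are rounded up to a power-of-two block, larger blocks are split in half on demand, and a freed block merges with its "buddy" (the block at `offset XOR size`) whenever that buddy is also free.

```go
package main

import (
	"errors"
	"fmt"
)

// Buddy manages a region of minBlock<<maxOrder bytes.
type Buddy struct {
	minBlock uint64
	maxOrder int
	free     []map[uint64]struct{} // free[k]: offsets of free blocks of size minBlock<<k
	used     map[uint64]int        // offset -> order of each live allocation
}

// NewBuddy creates an allocator whose whole region starts as one free block.
func NewBuddy(minBlock uint64, maxOrder int) *Buddy {
	b := &Buddy{minBlock: minBlock, maxOrder: maxOrder, used: map[uint64]int{}}
	b.free = make([]map[uint64]struct{}, maxOrder+1)
	for i := range b.free {
		b.free[i] = map[uint64]struct{}{}
	}
	b.free[maxOrder][0] = struct{}{}
	return b
}

// Alloc returns the offset of a block of at least size bytes.
func (b *Buddy) Alloc(size uint64) (uint64, error) {
	order := 0
	for order <= b.maxOrder && b.minBlock<<order < size {
		order++
	}
	k := order
	for k <= b.maxOrder && len(b.free[k]) == 0 {
		k++ // look for the smallest free block that is large enough
	}
	if k > b.maxOrder {
		return 0, errors.New("out of memory")
	}
	var off uint64
	for o := range b.free[k] {
		off = o // any free block of this order will do
		break
	}
	delete(b.free[k], off)
	for k > order {
		k-- // split: keep the lower half, return the upper half to the free list
		b.free[k][off+(b.minBlock<<k)] = struct{}{}
	}
	b.used[off] = order
	return off, nil
}

// Free releases a block and merges it with its buddy while possible.
func (b *Buddy) Free(off uint64) error {
	order, ok := b.used[off]
	if !ok {
		return errors.New("offset was not allocated")
	}
	delete(b.used, off)
	for order < b.maxOrder {
		buddy := off ^ (b.minBlock << order)
		if _, isFree := b.free[order][buddy]; !isFree {
			break
		}
		delete(b.free[order], buddy)
		if buddy < off {
			off = buddy
		}
		order++
	}
	b.free[order][off] = struct{}{}
	return nil
}

func main() {
	heap := NewBuddy(64, 4) // 64 << 4 = 1024 bytes
	a, _ := heap.Alloc(100) // rounded up to 128
	c, _ := heap.Alloc(64)
	fmt.Println(a, c)
	_ = heap.Free(a)
	_ = heap.Free(c)
	whole, err := heap.Alloc(1024) // succeeds only if everything merged back
	fmt.Println(whole, err)
}
```

Both splitting and merging walk at most `maxOrder` levels, which is where the O(log n) bound comes from.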
Metal backend (hal/metal/):
- Pure Go Objective-C bridge via goffi
- Metal API access via Objective-C runtime
- Device and adapter enumeration
- Command buffer and render encoder
- Shader compilation (MSL via naga v0.6.0)
- Texture and buffer management
- Surface presentation (CAMetalLayer integration)
- ~3K lines of code
DX12 backend (hal/dx12/):
- Pure Go COM bindings via syscall (no CGO!)
- D3D12 API access via COM interface vtables
- DXGI integration for swapchain and adapter enumeration
- Descriptor heap management (CBV/SRV/UAV, Sampler, RTV, DSV)
- Flip model swapchain with tearing support (VRR)
- Command list recording with resource barriers
- Root signature and PSO creation
- ~12K lines of code
Structure:
hal/dx12/
├── d3d12/ # D3D12 COM bindings (~4K LOC)
├── dxgi/ # DXGI bindings (~2K LOC)
├── instance.go # Backend, Instance, Surface
├── adapter.go # Adapter enumeration
├── device.go # Device, descriptor heaps
├── queue.go # Command queue
├── surface.go # Swapchain management
├── resource.go # Buffer, Texture, TextureView
├── command.go # CommandEncoder, RenderPassEncoder
├── pipeline.go # RenderPipeline, ComputePipeline
└── convert.go # Format conversion helpers
- wgpu (Rust) — Reference implementation
- WebGPU Specification
- Dawn (C++) — Google's implementation
| Project | Description | Purpose |
|---|---|---|
| gogpu/gogpu | Graphics framework | GPU abstraction, windowing, input |
| gogpu/naga | Shader compiler | WGSL → SPIR-V, MSL, GLSL |
| gogpu/gg | 2D graphics | Canvas API, scene graph, GPU text |
| gogpu/ui | GUI toolkit | Widgets, layouts, themes (planned) |
| go-webgpu/webgpu | FFI bindings | wgpu-native integration |
Note: Always use the latest versions. Check each repository for current releases.
We welcome contributions! See CONTRIBUTING.md for guidelines.
MIT License — see LICENSE for details.
wgpu — WebGPU in Pure Go