Releases: go-webgpu/goffi

v0.5.0 — Windows ARM64 + FreeBSD support

29 Mar 11:44
ca3231c

What's New

Two new platforms -- goffi now supports 7 targets:

Windows ARM64 (Snapdragon X)

  • Extended AAPCS64 ARM64 implementation to Windows via build tag changes
  • Zero new assembly -- Windows ARM64 ABI is identical to Unix ARM64
  • runtime.cgocall works on Windows without fakecgo
  • Tested on Samsung Galaxy Book 4 Edge (Snapdragon X Elite) by @SideFx

FreeBSD amd64

  • Added libc.so.7 dynamic loading support
  • System V ABI identical to Linux -- same assembly code
  • Requires -gcflags="github.com/go-webgpu/goffi/internal/fakecgo=-std" for CGO_ENABLED=0 builds

CI improvements

  • New cross-compilation job validates all 7 platforms compile correctly

Platform Support (7 targets)

Platform | Arch  | ABI      | Status
Windows  | amd64 | Win64    | Production
Windows  | arm64 | AAPCS64  | NEW -- tested on Snapdragon X
Linux    | amd64 | System V | Production
Linux    | arm64 | AAPCS64  | Production
macOS    | amd64 | System V | Production
macOS    | arm64 | AAPCS64  | Production
FreeBSD  | amd64 | System V | NEW -- cross-compile verified

Full Changelog

See CHANGELOG.md

v0.4.2

03 Mar 19:49
c8ef100

purego Compatibility Fix

Fixed

  • Unix: duplicate symbol conflict with purego — added build tag nofakecgo to resolve _cgo_init linker collision when goffi and purego coexist with CGO_ENABLED=0 (#22)

Workaround

When using both goffi and purego in the same binary:

CGO_ENABLED=0 go build -tags nofakecgo ./...

This disables goffi's internal fakecgo, relying on purego's identical copy.

Also included

  • Unit tests for types and internal/arch/amd64 packages
  • Test coverage increased from 75% to 89% (-coverpkg=./...)
  • Dynamic Codecov badge in README

Closes #22

v0.4.1 — ABI Compliance Hotfix

02 Mar 00:27
7e87b11

ABI Compliance Hotfix

Full forward call path audit — 10 of 11 identified ABI gaps fixed.

Fixed

  • Float32 argument encoding bug: now uses math.Float32bits instead of float64 widening, which corrupted XMM bit patterns
  • AMD64 Unix: stack spill for arguments 7+ — args beyond 6 GP registers now correctly pushed to stack before CALL
  • ARM64 Unix: stack spill for arguments 9+ — args beyond 8 GP registers now correctly pushed to stack before BL
  • AMD64 struct return 9-16 bytes — RAX+RDX register pair correctly assembled into output buffer
  • AMD64 sret hidden pointer — structs >16B inject caller buffer as first arg (RDI), per System V ABI
  • ARM64 HFA stack spill — HFA overflow correctly spills entire aggregate to stack per AAPCS64
  • runtime.KeepAlive — added after each FFI call to prevent GC of argument pointers
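
The Float32 fix above can be illustrated with a minimal sketch. The helper names encodeFloat32Arg and encodeFloat32Widened are hypothetical and are not goffi's API; the sketch only assumes that a float32 argument must reach the callee as its single-precision bit pattern in the low 32 bits of the register image.

```go
package main

import (
	"fmt"
	"math"
)

// encodeFloat32Arg (hypothetical helper) builds the register image for a
// float32 argument: IEEE-754 single-precision bits in the low 32 bits,
// upper 32 bits zero.
func encodeFloat32Arg(f float32) uint64 {
	return uint64(math.Float32bits(f))
}

// encodeFloat32Widened (hypothetical helper) reproduces the kind of bug
// described above: widening to float64 first yields a double-precision
// bit pattern, so a callee reading a float from the low 32 bits sees
// unrelated bits.
func encodeFloat32Widened(f float32) uint64 {
	return math.Float64bits(float64(f))
}

func main() {
	f := float32(1.5)
	fmt.Printf("Float32bits: %#x\n", encodeFloat32Arg(f))
	fmt.Printf("widened:     %#x\n", encodeFloat32Widened(f))
}
```

For 1.5 the two encodings are 0x3fc00000 and 0x3ff8000000000000; the low 32 bits of the widened value are all zero, so a callee expecting a float would read 0.0.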

Added

  • Overflow detection: ErrTooManyArguments for >15 args
  • Regression tests: TestWindowsStackArguments, TestWindowsStackArgumentsFileIO, TestWindowsStackArguments10Args, TestFloat32ArgEncoding, TestOverflowDetection, TestUnixStackSpill7Args
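
As a rough illustration of the overflow check, the sketch below rejects calls with more than 15 arguments. Only the error name and the limit come from the notes above; the error is declared locally here, and checkArgCount and maxArgs are hypothetical names, not goffi's actual implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// maxArgs mirrors the 15-argument limit stated in the release notes.
const maxArgs = 15

// ErrTooManyArguments is declared locally for this sketch; the real
// sentinel lives in goffi and may carry a different message.
var ErrTooManyArguments = errors.New("too many arguments")

// checkArgCount (hypothetical helper) returns a wrapped
// ErrTooManyArguments when n exceeds the limit, nil otherwise.
func checkArgCount(n int) error {
	if n > maxArgs {
		return fmt.Errorf("%w: got %d, max %d", ErrTooManyArguments, n, maxArgs)
	}
	return nil
}

func main() {
	fmt.Println(checkArgCount(15))
	fmt.Println(errors.Is(checkArgCount(16), ErrTooManyArguments))
}
```

Wrapping with %w keeps the sentinel comparable through errors.Is while still reporting the offending count.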

Removed

  • Dead callUnix64 assembly experiment

Known Limitation (documented)

  • Windows: float return from XMM0: syscall.SyscallN only returns RAX, not XMM0

Verification

  • Build: all 5 platforms cross-compile OK
  • Tests: all PASS, coverage 89.6%
  • Linter: 0 issues

Closes #19

v0.4.0 — crosscall2 Integration

27 Feb 09:35
6fe9a0b

What's New

crosscall2 integration — callbacks now work from C-library-created threads (Metal, wgpu-native).

Added

  • crosscall2 integration for C-thread callback support (#16)
    • Dispatchers route through crosscall2 → runtime·load_g → runtime·cgocallback
    • Supports callbacks from arbitrary C threads
    • callbackWrap_call closure for ABIInternal fn ptr from assembly
    • go_asm.h constants for callbackArgs struct offsets

Fixed

  • fakecgo trampoline register bugs (synced with purego v0.10.0)
    • ARM64: R26→R9, R2→R9, threadentry callee-save/restore
    • AMD64: DX→R11, CX→R11, BX→R11, JMP tail calls, PUSH_REGS_HOST_TO_ABI0

Verification

  • All CI checks pass (Linux, Windows, macOS)
  • 89.6% test coverage
  • 0 linter issues
  • 5-platform cross-compile verified

Full Changelog: v0.3.9...v0.4.0

v0.3.9 — ARM64 Callback Fixes

18 Feb 10:46
aa78271

What's Changed

Fixed

  • ARM64 callback trampoline rewrite — replaced BL dispatcher with MOVD $index, R12 + B dispatcher pattern (matching Go runtime and purego conventions). Fixes LR corruption and entrySize mismatch for callbacks at index > 0.
  • Symbol rename — callback assembly symbols renamed to package-scoped (·callbackTrampoline/·callbackDispatcher) to avoid linker collision with purego (#15)

Known Limitations

  • crosscall2 bypass — callbacks invoked from C-library-created threads (e.g., Metal addCompletedHandler:) may fail because goffi calls Go directly without crosscall2 → runtime·cgocallback. Tracked in #16, planned for v0.4.0.

Upgrading

go get github.com/go-webgpu/goffi@v0.3.9

If you use goffi callbacks on ARM64 (macOS Apple Silicon / Linux ARM64), this update is strongly recommended.

Full Changelog: v0.3.8...v0.3.9

v0.3.8: Enterprise-grade CGO_ENABLED=1 Error Handling

24 Jan 20:07
84e93e8

What's Changed

This release fixes confusing linker errors that occurred when building on Linux/macOS with a C compiler (gcc/clang) installed.

Fixed

  • CGO_ENABLED=1 build error handling (gogpu/wgpu#43)
    • Users now see a clear compile-time error: undefined: GOFFI_REQUIRES_CGO_ENABLED_0
    • Opening the source file shows full instructions in godoc comment

Added

  • Compile-time CGO detection with descriptive error identifier
  • Requirements section in README.md with clear CGO_ENABLED=0 instructions
  • Runtime panic fallback with detailed fix instructions (defense in depth)

Changed

  • Added !cgo build constraint to:
    • ffi/dl_unix.go, ffi/dl_darwin.go
    • internal/dl/dl_stubs_unix.s, internal/dl/dl_wrappers_unix.s
    • internal/dl/dl_stubs_arm64.s, internal/dl/dl_wrappers_arm64.s

User Experience

Before (v0.3.7):

# github.com/go-webgpu/goffi/ffi
.../dl_unix.go:54:20: undefined: dl.Dlopen

Confusing - no indication of how to fix

After (v0.3.8):

# github.com/go-webgpu/goffi/ffi
ffi/cgo_unsupported.go:28:9: undefined: GOFFI_REQUIRES_CGO_ENABLED_0

Clear - identifier name tells user exactly what's needed

Quick Fix

CGO_ENABLED=0 go build ./...

Or set permanently:

go env -w CGO_ENABLED=0

Full Changelog: v0.3.7...v0.3.8

v0.3.7 - ARM64 Darwin Comprehensive Support

03 Jan 07:18

ARM64 Darwin Comprehensive Support

This release adds comprehensive ARM64 darwin (Apple Silicon) support, tested on M3 Pro.

Added

  • ARM64 Darwin comprehensive support (PR #9 by @ppoage)

    • Tested on Apple Silicon M3 Pro (64 ns/op benchmark)
    • Nested struct handling via placeStructRegisters()
    • Mixed int/float struct support via countStructRegUsage()
    • ensureStructLayout() for auto-computing size/alignment
    • Assembly shim (abi_capture_test.s) for ABI verification
    • Comprehensive darwin ObjC tests (747 lines)
    • Struct argument tests (537 lines)
  • r2 (X1) return for 9-16 byte struct returns

    • Call8Float now returns both X0 and X1
    • Fixes struct returns between 9-16 bytes on ARM64
  • uint64 bit patterns for float registers

    • Cleaner handling of mixed float32/float64 arguments
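
The 9-16 byte struct return described above amounts to copying two 64-bit register values into the caller's buffer. The following is a minimal sketch under stated assumptions: storeRegisterPair is a hypothetical helper, r1 and r2 stand for the two return registers (X0 and X1 on ARM64), and the target is little-endian.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// storeRegisterPair (hypothetical helper) copies a struct that was
// returned in two 64-bit registers into dst. size is the struct size in
// bytes and must be between 9 and 16; dst must hold at least size bytes.
func storeRegisterPair(dst []byte, r1, r2 uint64, size int) {
	var buf [16]byte
	binary.LittleEndian.PutUint64(buf[0:8], r1)  // first register: bytes 0-7
	binary.LittleEndian.PutUint64(buf[8:16], r2) // second register: bytes 8-15
	copy(dst[:size], buf[:size])                 // drop padding beyond size
}

func main() {
	// A 12-byte struct of three uint32 fields {1, 2, 3}: the first two
	// fields share the first register, the third sits in the second.
	dst := make([]byte, 12)
	storeRegisterPair(dst, 0x0000000200000001, 0x3, 12)
	fmt.Println(
		binary.LittleEndian.Uint32(dst[0:4]),
		binary.LittleEndian.Uint32(dst[4:8]),
		binary.LittleEndian.Uint32(dst[8:12]),
	)
}
```

Reading only the first register, as happened before this fix, would lose every byte past offset 8.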

Fixed

  • BenchmarkGoffiStringOutput segfault on darwin
    • Pointer argument now correctly passed as unsafe.Pointer(&strPtr)

Contributors

  • @ppoage - ARM64 Darwin fixes, ObjC tests, assembly shim

Full Changelog: v0.3.6...v0.3.7

v0.3.6 - ARM64 HFA/Large Struct Return Fix

29 Dec 09:38
da2f8f7

Critical Fixes for ARM64 (Apple Silicon M1/M2/M3/M4)

Fixed

  • ARM64 HFA (Homogeneous Floating-point Aggregate) returns

    • NSRect (4 × float64) returned zeros on Apple Silicon
    • Root cause: assembly only saved D0-D1, HFA needs D0-D3
    • Solution: save all 4 float registers for HFA returns
  • ARM64 large struct return via X8 (sret)

    • Non-HFA structs >16 bytes returned via implicit pointer in X8
    • Root cause: X8 register never loaded before function call
    • Solution: load rvalue pointer into X8 for sret calls

Added

  • ReturnHFA2, ReturnHFA3, ReturnHFA4 return flag constants
  • handleHFAReturn function for processing HFA struct returns
  • Unit tests for ARM64 HFA classification

Technical Details

  • AAPCS64: HFA structs with 1-4 same-type floats return in D0-D3
  • AAPCS64: Large non-HFA structs (>16 bytes) return via hidden pointer in X8
  • NSRect = CGRect = 4 × float64 = 32 bytes = HFA (returns in D0-D3)

Impact

  • Fixes blank window issue on macOS ARM64 (GPU window size was 0×0)
  • Fixes gogpu#24

Full Changelog: v0.3.5...v0.3.6

v0.3.5 - Windows Stack Arguments Fix

27 Dec 11:26
934decc

Fixed

  • Windows stack arguments not implemented (Critical)
    • Functions with >4 arguments caused panic: stack arguments not implemented
    • Win64 ABI: first 4 args in registers (RCX/RDX/R8/R9), args 5+ on stack
    • Solution: Use syscall.SyscallN with variadic args for unlimited argument support
    • Affected Vulkan functions: vkCreateGraphicsPipelines (6 args), vkCmdBindVertexBuffers (5 args), etc.

Changed

  • Simplified Windows FFI - removed intermediate syscall wrapper
    • Removed: internal/syscall/syscall_windows_amd64.go
    • call_windows.go now calls syscall.SyscallN directly with args...
    • Cleaner code, fewer indirections

Technical Details

  • syscall.SyscallN(fn, args...) accepts 15 or more arguments
  • Handles both register (1-4) and stack (5+) arguments automatically
  • Same approach used by purego for Windows FFI
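
The register/stack split that syscall.SyscallN handles automatically follows the Win64 rule quoted above. The sketch below only illustrates that rule for integer and pointer arguments; splitWin64Args is a hypothetical helper and does not reflect how goffi or the Go runtime implement the call.

```go
package main

import "fmt"

// win64RegArgs is the number of integer/pointer arguments the Win64 ABI
// passes in registers (RCX, RDX, R8, R9).
const win64RegArgs = 4

// splitWin64Args (hypothetical helper) partitions args into the part
// passed in registers and the part passed on the stack.
func splitWin64Args(args []uintptr) (regs, stack []uintptr) {
	if len(args) <= win64RegArgs {
		return args, nil
	}
	return args[:win64RegArgs], args[win64RegArgs:]
}

func main() {
	// Six arguments, as in vkCreateGraphicsPipelines.
	regs, stack := splitWin64Args([]uintptr{1, 2, 3, 4, 5, 6})
	fmt.Println("registers:", regs)
	fmt.Println("stack:    ", stack)
}
```

With six arguments, the first four land in registers and the last two on the stack, which is exactly the case that used to panic.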

Full Changelog

v0.3.4...v0.3.5

v0.3.4 - Windows Stack Overflow Fix

27 Dec 10:55
9f9c843

Fixed

  • Windows stack overflow on Vulkan API calls (Critical)
    • callWin64 assembly used NOSPLIT, $32 - prevented Go runtime stack growth
    • Solution: Replace with syscall.SyscallN (Go runtime's asmstdcall mechanism)
    • Matches purego's proven approach for Windows FFI

Changed

  • Windows FFI architecture refactored
    • Removed: internal/arch/amd64/call_windows.s
    • Added: internal/syscall/syscall_windows_amd64.go
    • Uses Go runtime's built-in stack management

Technical Details

The custom Windows assembly used the NOSPLIT directive, which prevents the Go runtime from growing the goroutine stack. When C functions (especially Vulkan/WebGPU APIs) needed more stack space than the fixed 32-byte frame, the call failed with STACK_OVERFLOW (Exception 0xc00000fd).

The fix uses syscall.SyscallN, which internally relies on runtime.cgocall + asmstdcall and leaves stack management to the Go runtime.

Full Changelog

v0.3.3...v0.3.4