Fleek (@fleek) / X

3,499 posts

Fleek

@fleek

Something new coming soon

Joined October 2018

Fleek
@fleek
Apr 3
please excuse the silence. we've been cooking up something cool and are excited to share more details soon
Fleek
@fleek
Jan 29
NVIDIA just dropped benchmarks showing 4-bit inference loses less than 1 point vs BF16 on most tasks. It's not accuracy per request that you should be measuring. It's tasks completed per dollar. And at that metric, 4-bit wins by a landslide. Read the full blog 👇
Fleek
@fleek
Jan 29
Article
NVIDIA Just Killed the "Quantization = Quality Loss" Myth
NVIDIA's new benchmarks show NVFP4 loses less than 1 point on most tasks while delivering 4x FLOPS. The quantization-kills-quality myth is officially dead. There's this take that floats around AI...
Fleek
@fleek
Jan 24
Replying to @fleek
8/ Check out the code: github.com/weyl-ai/mdspan…
Check out the proofs: github.com/weyl-ai/mdspan…
/end
Fleek
@fleek
Jan 24
Replying to @fleek
7/ Layout algebra is formalized in Lean 4. 26 theorems, 0 sorry. Properties extracted to RapidCheck tests. The art/ directory has 23 SVG visualizations - we drew pictures until we understood.
Fleek
@fleek
Jan 24
Replying to @fleek
6/ What can you actually do with this?
* Use cute's bank-conflict-free layouts with standard syntax
* Drop into existing mdspan codebases
* Keep cute's layout algebra for the hard parts
All from one header.
Fleek
@fleek
Jan 24
Replying to @fleek
5/ Here's a complete example:
    auto layout = composition(Swizzle<3,3,3>{}, base);
    std::mdspan<float, std::extents<int, 64, 64>, layout_cute<decltype(layout)>> tile(ptr, layout);
    tile[i, j] = tile[j, i];
That's it. Swizzle applied automatically.
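For readers unfamiliar with Swizzle<3,3,3>, here is a minimal sketch of the bit manipulation it is understood to perform. Two assumptions are baked in: that `base` is a row-major 64x64 layout (the post does not say), and that Swizzle<B, M, S> with a positive shift XORs the B bits starting at bit M+S into the B bits starting at bit M. The names `swizzle`, `tile_offset`, and `swizzle_is_involution` are invented for illustration and are not part of mdspan-cute or CUTLASS.

```cpp
#include <cassert>
#include <cstdint>

// Assumed semantics (positive shift only): copy B bits from position M+S
// and XOR them into the B bits at position M. The source bits are untouched.
template <int B, int M, int S>
constexpr std::uint32_t swizzle(std::uint32_t offset) {
    static_assert(B >= 0 && M >= 0 && S >= B,
                  "sketch covers positive, non-overlapping shifts only");
    constexpr std::uint32_t src_mask = ((1u << B) - 1u) << (M + S);
    return offset ^ ((offset & src_mask) >> S);
}

// Assumed base layout: row-major 64x64, so the linear offset is i * 64 + j.
constexpr std::uint32_t tile_offset(std::uint32_t i, std::uint32_t j) {
    return swizzle<3, 3, 3>(i * 64u + j);
}

// Applying the swizzle twice returns the original offset, and offsets stay
// inside the 4096-element tile, so the mapping is a permutation of the tile.
constexpr bool swizzle_is_involution() {
    for (std::uint32_t x = 0; x < 4096u; ++x) {
        if (swizzle<3, 3, 3>(x) >= 4096u) return false;
        if (swizzle<3, 3, 3>(swizzle<3, 3, 3>(x)) != x) return false;
    }
    return true;
}

static_assert(swizzle_is_involution(), "swizzle must permute the tile");
static_assert(tile_offset(0, 5) == 5, "row 0 has no source bits set");
static_assert(tile_offset(1, 0) == 72, "64 ^ (64 >> 3) == 72");
```

Under these assumptions the low three bits of the row index are XOR-ed into the 8-element column group, which is why consecutive rows land in different groups.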
Fleek
@fleek
Jan 24
Replying to @fleek
4/ The trick: mdspan_cute::layout_cute<> wraps any cute layout as an mdspan layout policy. Swizzles, compositions, hierarchical layouts - all work transparently through the standard subscript operator.
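The idea of a view whose indexing is delegated to a pluggable mapping can be sketched without either library. The following is a hand-rolled stand-in, not std::mdspan and not the actual layout_cute implementation; a real mdspan layout policy has to provide a nested mapping type with members such as `extents()`, `required_span_size()`, and `is_unique()`. `View2D`, `XorSwizzle8x8`, and `round_trip_ok` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Hypothetical mapping: row-major 8x8 with the column XOR-ed by the row.
struct XorSwizzle8x8 {
    constexpr int operator()(int i, int j) const { return i * 8 + (j ^ i); }
};

// Minimal view: stores a pointer and a mapping, and routes every access
// through the mapping. The caller never sees the physical offset.
template <typename T, typename Mapping>
class View2D {
public:
    View2D(T* data, Mapping map) : data_(data), map_(map) {}
    T& operator()(int i, int j) const { return data_[map_(i, j)]; }

private:
    T* data_;
    Mapping map_;
};

// Writes every element through the view, reads it back, and checks where
// one element physically landed.
inline bool round_trip_ok() {
    std::vector<int> storage(64, -1);
    View2D<int, XorSwizzle8x8> tile(storage.data(), XorSwizzle8x8{});
    for (int i = 0; i < 8; ++i)
        for (int j = 0; j < 8; ++j)
            tile(i, j) = i * 8 + j;
    for (int i = 0; i < 8; ++i)
        for (int j = 0; j < 8; ++j)
            if (tile(i, j) != i * 8 + j) return false;
    // Logical element (1, 0) lives at physical offset 1 * 8 + (0 ^ 1) = 9.
    return storage[9] == 8;
}
```

The calling code only ever uses logical coordinates, which is the property the post describes as working "transparently".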
Fleek
@fleek
Jan 24
Replying to @fleek
3/ cute's layout algebra and mdspan's layout policy are the same mathematical object - a function from coordinates to memory offsets. @nvidia designed both. We just connected the headers.
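The "function from coordinates to memory offsets" view can be made concrete with a small strided layout. This is a sketch only: `StridedLayout`, `row_major`, and `col_major` are illustrative names, not types from mdspan or cute. The same shape-and-stride idea appears in the standard as std::layout_right (row-major) and std::layout_left (column-major).

```cpp
#include <cassert>
#include <cstddef>

// A 2D layout reduced to its essence: a callable from (i, j) to an offset.
struct StridedLayout {
    std::size_t stride_i;
    std::size_t stride_j;
    constexpr std::size_t operator()(std::size_t i, std::size_t j) const {
        return i * stride_i + j * stride_j;
    }
};

// Row-major: consecutive j are adjacent in memory.
constexpr StridedLayout row_major(std::size_t /*rows*/, std::size_t cols) {
    return {cols, 1};
}

// Column-major: consecutive i are adjacent in memory.
constexpr StridedLayout col_major(std::size_t rows, std::size_t /*cols*/) {
    return {1, rows};
}

static_assert(row_major(4, 8)(2, 3) == 19, "2 * 8 + 3");
static_assert(col_major(4, 8)(2, 3) == 14, "2 + 3 * 4");
```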
Fleek
@fleek
Jan 24
Replying to @fleek
2/ If you write CUDA kernels, you know the pain: You want tile[i, j] syntax. You need swizzled layouts for bank conflict avoidance. Two different APIs. Manual index translation. It's tedious.
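To see why swizzling matters, consider a simplified shared-memory model: 32 banks, each one 4-byte word wide, with the bank chosen by word offset modulo 32. This model and the names `linear_offset`, `swizzled_offset`, and `max_bank_conflict` are assumptions for illustration. The sketch runs on the host and counts how many of 32 threads hit the same bank when each thread reads one element of a column in a 32x32 float tile.

```cpp
#include <cassert>

constexpr int kBanks = 32;  // assumed bank count
constexpr int kTile = 32;   // 32x32 tile of 4-byte elements

// Plain row-major offset in words.
constexpr int linear_offset(int i, int j) { return i * kTile + j; }

// XOR swizzle: permute the column index within each row by the row index.
constexpr int swizzled_offset(int i, int j) { return i * kTile + (j ^ i); }

// Thread t reads element (t, col). Returns the largest number of threads
// that land in the same bank (1 means conflict-free).
template <typename OffsetFn>
constexpr int max_bank_conflict(OffsetFn offset, int col) {
    int hits[kBanks] = {};
    int worst = 0;
    for (int t = 0; t < kTile; ++t) {
        int bank = offset(t, col) % kBanks;
        ++hits[bank];
        if (hits[bank] > worst) worst = hits[bank];
    }
    return worst;
}

static_assert(max_bank_conflict(linear_offset, 5) == 32,
              "row-major column read: all threads hit one bank");
static_assert(max_bank_conflict(swizzled_offset, 5) == 1,
              "swizzled column read: every thread gets its own bank");
```

The price is that the physical offset no longer equals `i * 32 + j`, which is the manual index translation the post complains about.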
Fleek
@fleek
Jan 24
1/ Yesterday we announced mdspan-cute: C++23 std::mdspan syntax with CUTLASS cute layouts. One header. Zero overhead. Here's how it works 🧵
Fleek
@fleek
Jan 23
Replying to @fleek
Read the blog: weyl.ai/plan/mdspan-cu…
Check out the repo: github.com/weyl-ai/mdspan…
mdspan-cute: Zero-Overhead Bridge to CUTLASS | Weyl
From weyl.ai
Fleek
@fleek
Jan 23
💿 Open Source Release 💿 mdspan-cute: a zero-overhead bridge between C++23 std::mdspan and CUTLASS cute layouts. One header. Swizzled memory. No bank conflicts. Read the blog and check out the repo (links in reply)