Fleek
- NVIDIA just dropped benchmarks showing 4-bit inference loses less than 1 point vs BF16 on most tasks. Accuracy per request isn't the metric you should be measuring. Tasks completed per dollar is. And on that metric, 4-bit wins by a landslide. Read the full blog 👇
- Replying to @fleek: 8/ Check out the code: github.com/weyl-ai/mdspan… Check out the proofs: github.com/weyl-ai/mdspan… /end


