HIP: fix RDNA3 FP16/BF16 matrix multiplication #17817
Conversation
Beinsezii left a comment:
Full test-backend-ops is green now on gfx1100, though interestingly, according to

Looking at the build command in #17576 (comment), they have
Yeah, using a realistic workload: all MMQ commits plus this one (https://github.com/Beinsezii/llama.cpp/tree/rdna3_perf_mmq) versus all recent MMQ commits reverted. I still think this should get merged first to stop the failures, but maybe I should open a new issue for perf? I assume it'll be fixed by #17495 eventually, since that will replace the rocWMMA path.
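For context, a realistic-workload comparison like the one described above is typically done with llama-bench; the model path and token counts below are placeholders, not the exact setup used in this thread:

```sh
# Hypothetical benchmark run: build each branch the same way, then compare
# the reported t/s (tokens per second) for prompt processing and generation.
./build/bin/llama-bench -m /path/to/model.gguf -p 512 -n 128
```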
Any differences you see should be from MMQ vs. rocBLAS. If you compile with
Rebuilt against this PR; confirmed perf is good with cuBLAS. Might be worth making it the default again until the other PR is ready, since people will really notice a third of their throughput gone.
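The flag name is truncated in the comment above; a plausible reading, though only an assumption here, is the GGML_CUDA_FORCE_CUBLAS build option, which routes matrix multiplication through the BLAS path instead of MMQ:

```sh
# Assumption: GGML_CUDA_FORCE_CUBLAS is the flag truncated above; on HIP
# builds it forces the hipBLAS/rocBLAS path instead of the MMQ kernels.
cmake -B build -DGGML_HIP=ON -DGGML_CUDA_FORCE_CUBLAS=ON
cmake --build build --config Release -j
```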
I don't know if it's within the scope of this PR, but building with GGML_HIP_ROCWMMA_FATTN=ON is still broken.
Fixes #17797 by adding an explicit RDNA4 requirement to MMF. @jiachengjason, as outlined in https://github.com/ggml-org/llama.cpp/blob/master/CONTRIBUTING.md#pull-requests-for-contributors--collaborators, please test changes to the CUDA/HIP backend for correctness using test-backend-ops.
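A typical correctness run on an RDNA3 card might look like the sketch below; the gfx1100 target matches the GPU discussed in this thread, and the exact CMake variables can vary between ROCm versions:

```sh
# Build the HIP backend for RDNA3 (gfx1100) and run the backend op tests.
cmake -B build -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1100 -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release -j

# Restrict the run to the matrix multiplication ops this PR touches.
./build/bin/test-backend-ops test -o MUL_MAT
```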