Conversation

@AidanBeltonS (Contributor)
This PR enables the batched mul_mat pathway when appropriate. Previously the single-gemm path was taken even when it was not suitable for the operation, causing segfaults. This PR changes the logic to more closely match the CUDA implementation and use the batched gemm path.

This change allows many more tests to pass on SYCL devices. There is one limitation with this approach: we cannot use non-default-precision operations, as oneMKL has not yet open sourced gemm_batch for the data types <half, half, float, float> (corresponding to <src0, src1, dst, scaling>). This is something I have raised with oneMKL.

@AidanBeltonS (Contributor, Author)
@NeoZhangJianyu, @abhilash1910, @Alcpz, feedback would be appreciated.

@abhilash1910 (Contributor) left a comment:


LGTM. I think we can use this until oneMKL adds the dtypes for batched gemm. Pinging @airMeng @ggerganov for a look when available.
@AidanBeltonS could you please rebase? That should fix the Android build issue. Thanks.

@abhilash1910 abhilash1910 merged commit 38d1521 into ggml-org:master Mar 1, 2024
hazelnutcloud pushed a commit to hazelnutcloud/llama.cpp that referenced this pull request Mar 10, 2024
* Use batched mul_mat pathway

* rm extra line

* Explicitly state scaled data type

---------

Co-authored-by: Abhilash Majumder <[email protected]>
jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request Mar 13, 2024
(same commit message as above)