Enable POWER9 fp32 and fp16 SIMD code #366

fitzsim · 2023-01-03T06:43:20Z

With the FP32 base model and this patch set, the jfk example takes about 3.2 seconds to transcribe. This is another data point for #300, and it is about one second faster than the current FP16 SIMD code.

ggerganov · 2023-01-03T20:05:58Z

ggml.c

+#define GGML_F32x4_REDUCE(sumf, sum)                            \
+  sum[0] = vec_add(sum[0], sum[1]);                             \
+  sum[2] = vec_add(sum[2], sum[3]);                             \
+  sum[4] = vec_add(sum[4], sum[5]);                             \
+  sum[6] = vec_add(sum[6], sum[7]);                             \
+  sum[0] = vec_add(sum[0], sum[2]);                             \
+  sum[4] = vec_add(sum[4], sum[6]);                             \
+  sum[0] = vec_add(sum[0], sum[4]);                             \
+  sumf = vec_extract(sum[0], 0) + vec_extract(sum[0], 1)        \
+    + vec_extract(sum[0], 2) + vec_extract(sum[0], 3);


Is there a reason to use this version instead of the for-based version?
The advantage of the latter is that it works for GGML_F32_ARR == 1, 2, 4, 8, 16, whereas this unrolled version only works for GGML_F32_ARR == 8.
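
For illustration, here is a minimal portable sketch of what a loop-based reduction could look like. It is not the code in ggml.c: `f32x4`, `f32x4_add`, and `f32x4_reduce` are hypothetical stand-ins so the example compiles without AltiVec/VSX. On POWER9 the struct would correspond to `vector float` and the add helper to `vec_add`. The stride-doubling loop performs the same pairwise additions as the unrolled macro when the array has 8 entries, but also handles other power-of-two sizes.

```c
#include <stddef.h>

/* Hypothetical stand-in for a 4-lane float vector register.
   On POWER9 this role would be played by `vector float`. */
typedef struct { float v[4]; } f32x4;

/* Lane-wise add; stands in for vec_add in this sketch. */
static f32x4 f32x4_add(f32x4 a, f32x4 b) {
    f32x4 r;
    for (int k = 0; k < 4; ++k) {
        r.v[k] = a.v[k] + b.v[k];
    }
    return r;
}

/* Pairwise tree reduction over n accumulators (n assumed to be a
   power of two). The contents of sum[] are overwritten. */
static float f32x4_reduce(f32x4 *sum, size_t n) {
    /* stride = 1: sum[0]+=sum[1], sum[2]+=sum[3], ...
       stride = 2: sum[0]+=sum[2], sum[4]+=sum[6], ...
       and so on until everything is folded into sum[0]. */
    for (size_t stride = 1; stride < n; stride *= 2) {
        for (size_t i = 0; i + stride < n; i += 2 * stride) {
            sum[i] = f32x4_add(sum[i], sum[i + stride]);
        }
    }
    /* Horizontal add of the four lanes, like the vec_extract chain. */
    return sum[0].v[0] + sum[0].v[1] + sum[0].v[2] + sum[0].v[3];
}
```

With n == 1 the outer loop does not run and only the horizontal add happens, which is the case the unrolled macro cannot handle because it always reads sum[1] through sum[7].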

fitzsim · 2023-01-03T22:34:46Z

I'll try reverting that and compare speed. Also, I think I can get rid of the load/store argument changes, but it'll take some more work. And the F32 implementation seems to still use some F16 operations, so I'll investigate that. I'll make a new pull request for all this later. For now I'll close this one.

fitzsim added 3 commits January 3, 2023 00:48

ggml : change GGML_F16_VEC_LOAD, GGML_F16_VEC_STORE arguments

63cf29c

ggml : macroize POWER9 ppc64le fp16 SIMD code

cdbe556

ggml : enable f32 SIMD for POWER9 ppc64le

eb87ee5

fitzsim mentioned this pull request Jan 3, 2023

Very slow - any way to speed up? #300

Closed

ggerganov reviewed Jan 3, 2023

fitzsim closed this Jan 3, 2023

fitzsim mentioned this pull request Jan 4, 2023

Reorganize POWER9 SIMD code #369

Merged
