Add group_norm_v2 #1887

yuantailing · 2025-03-12T07:37:37Z

GroupNorm v2 outperforms the original GroupNorm extension by using coalesced memory access. It was tested on B200 with a set of shapes.

alpha0422 · 2025-03-12T14:08:10Z

Hi @crcrpar, could you also help review this PR?

setup.py

apex/contrib/group_norm/group_norm.py

apex/contrib/csrc/group_norm_v2/gn_utils.hpp

apex/contrib/csrc/group_norm_v2/gn_cuda_kernel.cuh

apex/contrib/group_norm/group_norm.py

apex/contrib/csrc/group_norm_v2/generate_gn_cuda_inst.py

apex/contrib/csrc/group_norm_v2/gn_cuda.cu

crcrpar · 2025-03-12T15:42:52Z

apex/contrib/csrc/group_norm_v2/gn_cuda_host_template.cuh

+            constexpr auto params = compute_gn_params<T, false, false, HW, G, CPG, LB_N, RUNTIME_CUDA_ARCH, LB_SM_COUNT, EFFECTIVE_CUDA_ARCH, SM_MARGIN>();
+            constexpr int BLOCK_DIM_X =       std::get<0>(params);
+            constexpr int C_PER_BLOCK =       std::get<1>(params);
+            constexpr int ROWS_PER_BLOCK =    std::get<2>(params);
+            constexpr int VEC_ELEMS =         std::get<3>(params);
+            constexpr bool LOAD_TWICE =       std::get<4>(params);
+            constexpr int BLOCKS_PER_SM =     std::get<5>(params);
+            constexpr bool HARDWARE_CLUSTER = std::get<6>(params);


(ah, structured binding cannot be used with constexpr...)
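To illustrate the remark above: in C++17 (the standard this PR targets, per the "comply with c++17" commit) a structured binding declaration cannot be marked constexpr, so the diff unpacks the tuple with std::get instead. A minimal sketch follows. `compute_params_sketch` and `Block` are hypothetical stand-ins, not the PR's `compute_gn_params`, and the values are made up.

```cpp
#include <cassert>
#include <tuple>

// Hypothetical stand-in for compute_gn_params: returns compile-time tuning
// values as a tuple. The real function takes more template arguments and
// returns seven values.
template <int HW, int CPG>
constexpr std::tuple<int, int, bool> compute_params_sketch() {
    return std::make_tuple(HW >= 1024 ? 512 : 256, CPG, HW * CPG > 4096);
}

// Rejected in C++17: a structured binding declaration cannot be constexpr.
//   constexpr auto [block_dim_x, c_per_block, load_twice] =
//       compute_params_sketch<1024, 8>();

// Workaround, as in the diff: keep the tuple constexpr and unpack with std::get.
constexpr auto params = compute_params_sketch<1024, 8>();
constexpr int BLOCK_DIM_X = std::get<0>(params);
constexpr int C_PER_BLOCK = std::get<1>(params);
constexpr bool LOAD_TWICE = std::get<2>(params);

// The unpacked values remain constant expressions, so they can be used as
// template arguments and inside static_assert.
template <int N>
struct Block {
    static constexpr int size = N;
};
static_assert(Block<BLOCK_DIM_X>::size == 512);
static_assert(C_PER_BLOCK == 8);
static_assert(LOAD_TWICE);
```

The cost of the workaround is one line per tuple element, which is why the diff lists each std::get index explicitly.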

crcrpar

Could you add some test cases? For example:

- check numeric and functionality of v2
- an error is thrown, as expected, for invalid inputs

apex/contrib/csrc/group_norm_v2/gn.cpp

Aidyn-A · 2025-03-17T19:00:27Z

setup.py

+    if bare_metal_version >= Version("12.8"):
+        arch_flags = ["-gencode=arch=compute_100,code=sm_100"]
+    else:
+        arch_flags = ["-gencode=arch=compute_90,code=compute_90"]


QQ: Looking at these conditions, this is Blackwell-exclusive. Is it supposed to work on Hopper?

yuantailing · 2025-03-18

It works on Hopper if we add it to arch_flags and DISPATCH_CUDA_ARCH_AND_LOWER_BOUND_SM_COUNT. However, I think we need a smarter way to determine the template args to support more GPUs, instead of exhaustively enumerating SM_COUNT values.
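As a rough illustration of the scaling concern in this reply, the sketch below maps a runtime (arch, SM count) pair onto a fixed set of template instantiations by rounding the SM count down to a listed lower bound. It is not the PR's DISPATCH_CUDA_ARCH_AND_LOWER_BOUND_SM_COUNT macro. `launch_sketch`, `dispatch_sketch`, and the (arch, SM count) table are assumptions made for illustration.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical launcher, instantiated once per (arch, SM-count lower bound).
// A real launcher would configure and launch a kernel; this returns a tag
// so the selected instantiation can be observed.
template <int CUDA_ARCH, int LB_SM_COUNT>
int launch_sketch() {
    return CUDA_ARCH * 1000 + LB_SM_COUNT;
}

// Map runtime device properties to one of the compiled instantiations.
// The SM count is rounded down to the largest listed lower bound.
// The pairs below are illustrative, not the PR's actual table.
inline int dispatch_sketch(int cuda_arch, int sm_count) {
    if (cuda_arch == 1000) {
        if (sm_count >= 148) return launch_sketch<1000, 148>();
        if (sm_count >= 132) return launch_sketch<1000, 132>();
    } else if (cuda_arch == 900) {
        if (sm_count >= 132) return launch_sketch<900, 132>();
        if (sm_count >= 114) return launch_sketch<900, 114>();
    }
    // Every supported GPU needs its own branch, which is the part that
    // does not scale as more architectures and SM counts are added.
    throw std::invalid_argument("no instantiation for this arch / SM count");
}
```

Each new GPU adds both a branch here and a matching entry in the build's arch flags, so the number of instantiations grows with every supported (arch, SM count) pair.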

nWEIdia · 2025-03-26T06:51:49Z

Has "check numeric and functionality of v2" tests been performed?

yuantailing · 2025-03-26T08:44:26Z

> Has "check numeric and functionality of v2" tests been performed?

Yes, tests were added and passed.

crcrpar · 2025-03-26T08:48:57Z

@nWEIdia yes.

Add group_norm_v2

184f060

crcrpar reviewed Mar 12, 2025

yuantailing added 2 commits March 13, 2025 04:52

Refine the coding style

d9ba006

fix LB_SM_COUNT

1b25443

crcrpar reviewed Mar 13, 2025

apex/contrib/csrc/group_norm_v2/gn.cpp (outdated)

yuantailing added 3 commits March 14, 2025 02:56

coding style

9e689ed

add tests

4614f2f

add comments

389b723

Aidyn-A reviewed Mar 17, 2025

comply with c++17

802d7c6

yuantailing marked this pull request as ready for review March 21, 2025 02:15

fix data race

707e400

crcrpar approved these changes Mar 25, 2025

crcrpar added the contrib label Mar 25, 2025

crcrpar merged commit c02c6c8 into NVIDIA:master Mar 26, 2025

yuantailing mentioned this pull request Aug 5, 2025

[https://nvbugs/5433581][fix] DeepGEMM installation on SBSA NVIDIA/TensorRT-LLM#6588

Merged
