[None][chore] Fix disagg assert#10596
Conversation
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
|
/bot skip --comment "Not cover in CI pipelines" |
📝 WalkthroughWalkthroughThis PR updates benchmark configuration files across multiple models to adjust concurrency list values, standardizing them to power-of-2 or rounded values. Additionally, it introduces parsing and validation functionality for a new Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~15 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
📝 WalkthroughWalkthroughBenchmark configuration files across multiple test suites have concurrency parameter values updated to powers of two (1075→1024, 2150→2048, 36→32, 34→32, 18→16, 538→512). Simultaneously, utility modules add parsing and validation support for concurrency_list configuration field. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/integration/defs/perf/disagg/utils/common.py (1)
1-4: Missing NVIDIA copyright header.Per coding guidelines, all TensorRT-LLM source files (.py) should contain an NVIDIA copyright header with the year of latest meaningful modification. This file only has a docstring but no copyright notice.
Suggested fix
+# SPDX-FileCopyrightText: Copyright (c) 2022-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + """Disaggregated Benchmark Configuration."""Based on coding guidelines requiring NVIDIA copyright header for all source files.
🧹 Nitpick comments (3)
tests/integration/defs/perf/disagg/test_configs/disagg/perf/Qwen3-235B-A22B-FP4_1k1k_ctx2_gen1_dep16_bs128_eplb0_mtp1_ccb-NIXL.yaml (1)
24-24: LGTM with a minor note.The value is correctly updated from 2150 to 2048, aligning with power-of-two conventions.
Minor: The value is quoted as a string (
'2048') while other config files use unquoted space-separated values. Consider using the unquoted format for consistency, though both should parse correctly.tests/integration/defs/perf/disagg/utils/config_validator.py (1)
92-116: Move loop-invariant dictionary lookups outside the loop.
gen_batch_sizeandgen_tp_sizedon't change per iteration. Also, the docstring statesRaises: ValueErrorbut the method raisesAssertionError.♻️ Suggested refactor
@staticmethod def _validate_concurrency_list(extracted_config: dict) -> None: """Validate concurrency list. Args: extracted_config: Extracted configuration fields Raises: - ValueError: If concurrency list is invalid + AssertionError: If concurrency list is invalid """ concurrency_list = extracted_config["concurrency_list"] assert concurrency_list, "config error: concurrency_list is empty" gen_enable_dp = extracted_config["gen_enable_dp"] + gen_batch_size = extracted_config["gen_batch_size"] logger.info(f"gen_enable_dp: {gen_enable_dp}") + if gen_enable_dp: + gen_tp_size = extracted_config["gen_tp_size"] + max_concurrency = gen_batch_size * gen_tp_size + else: + max_concurrency = gen_batch_size for concurrency in concurrency_list: - gen_batch_size = extracted_config["gen_batch_size"] - if gen_enable_dp: - gen_tp_size = extracted_config["gen_tp_size"] - assert concurrency <= gen_batch_size * gen_tp_size, ( - "config error: concurrency exceeds gen_batch_size * gen_tp_size" - ) - else: - assert concurrency <= gen_batch_size, ( - "config error: concurrency exceeds gen_batch_size" - ) + assert concurrency <= max_concurrency, ( + f"config error: concurrency {concurrency} exceeds max allowed {max_concurrency}" + )tests/integration/defs/perf/disagg/test_configs/disagg/perf/Qwen3-235B-A22B-FP4_1k1k_ctx2_gen1_dep16_bs128_eplb0_mtp1_ccb-UCX.yaml (1)
24-24: Consider format consistency across config files.The value update to 2048 is correct. However, this file uses a quoted string format (
'2048') while other configs in this PR use unquoted space-separated format (e.g.,1 2 4 8 16 32). While the parsing utility should handle both, consistent formatting across config files improves maintainability.🔧 Suggested format for consistency
- concurrency_list: '2048' + concurrency_list: 2048
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (25)
tests/integration/defs/perf/disagg/test_configs/disagg/perf/Qwen3-235B-A22B-FP4_1k1k_ctx1_gen1_dep16_bs64_eplb0_mtp3_ccb-NIXL.yamltests/integration/defs/perf/disagg/test_configs/disagg/perf/Qwen3-235B-A22B-FP4_1k1k_ctx1_gen1_dep16_bs64_eplb0_mtp3_ccb-UCX.yamltests/integration/defs/perf/disagg/test_configs/disagg/perf/Qwen3-235B-A22B-FP4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp0_ccb-NIXL.yamltests/integration/defs/perf/disagg/test_configs/disagg/perf/Qwen3-235B-A22B-FP4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp0_ccb-UCX.yamltests/integration/defs/perf/disagg/test_configs/disagg/perf/Qwen3-235B-A22B-FP4_1k1k_ctx2_gen1_dep16_bs128_eplb0_mtp1_ccb-NIXL.yamltests/integration/defs/perf/disagg/test_configs/disagg/perf/Qwen3-235B-A22B-FP4_1k1k_ctx2_gen1_dep16_bs128_eplb0_mtp1_ccb-UCX.yamltests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx1_gen1_dep32_bs32_eplb0_mtp0_ccb-NIXL.yamltests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx1_gen1_dep32_bs32_eplb0_mtp0_ccb-UCX.yamltests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp0_ccb-NIXL.yamltests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp0_ccb-UCX.yamltests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp3_ccb-NIXL.yamltests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp3_ccb-UCX.yamltests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx2_gen1_dep16_bs128_eplb0_mtp3_ccb-NIXL.yamltests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx2_gen1_dep16_bs128_eplb0_mtp3_ccb-UCX.yamltests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx1_gen3_tep8_bs16_eplb0_mtp3_ccb-NIXL.yamltests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx1_gen3_tep8_bs16_eplb0_mtp3_ccb-UCX.yamltests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx1_gen3_tep8_bs32_eplb0_mtp0_ccb-NIXL.yamltests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx1_gen3_tep8_bs32_eplb0_mtp0_ccb-UCX.yamltests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx6_gen1_dep16_bs64_eplb0_mtp0_ccb-NIXL.yamltests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx6_gen1_dep16_bs64_eplb0_mtp0_ccb-UCX.yamltests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx8_gen1_dep32_bs16_eplb0_mtp3_ccb-NIXL.yamltests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx8_gen1_dep32_bs16_eplb0_mtp3_ccb-UCX.yamltests/integration/defs/perf/disagg/test_configs/wideep/accuracy/deepseek-r1-fp4_1k1k_ctx2_gen1_dep16_bs128_eplb288_mtp3_ccb-NIXL.yamltests/integration/defs/perf/disagg/utils/common.pytests/integration/defs/perf/disagg/utils/config_validator.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.py: The code developed for TensorRT-LLM should conform to Python 3.8+
Indent Python code with 4 spaces. Do not use tabs
Always maintain the namespace when importing Python modules, even if only one class or function from a module is used
Python filenames should use snake_case (e.g.,some_file.py)
Python classes should use PascalCase (e.g.,class SomeClass)
Python functions and methods should use snake_case (e.g.,def my_awesome_function():)
Python local variables should use snake_case, with prefixkfor variable names that start with a number (e.g.,k_99th_percentile)
Python global variables should use upper snake_case with prefixG(e.g.,G_MY_GLOBAL)
Python constants should use upper snake_case (e.g.,MY_CONSTANT)
Avoid shadowing variables declared in an outer scope in Python
Initialize all externally visible members of a Python class in the constructor
For Python interfaces that may be used outside a file, prefer docstrings over comments
Use comments in Python for code within a function, or interfaces that are local to a file
Use Google-style docstrings for Python classes and functions, which can be parsed by Sphinx
Python attributes and variables can be documented inline with the format"""<type>: Description"""
Avoid using reflection in Python when functionality can be easily achieved without reflection
When using try-except blocks in Python, limit the except clause to the smallest set of errors possible
When using try-except blocks in Python to handle multiple possible variable types (duck-typing), keep the body of the try as small as possible and use the else block for the main logic
Files:
tests/integration/defs/perf/disagg/utils/common.pytests/integration/defs/perf/disagg/utils/config_validator.py
**/*.{cpp,cc,cxx,h,hpp,hxx,cu,cuh,py}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
All TensorRT-LLM source files (.cpp, .h, .cu, .py, and other source files) should contain an NVIDIA copyright header with the year of latest meaningful modification
Files:
tests/integration/defs/perf/disagg/utils/common.pytests/integration/defs/perf/disagg/utils/config_validator.py
🧬 Code graph analysis (1)
tests/integration/defs/perf/disagg/utils/config_validator.py (1)
tests/integration/defs/perf/disagg/utils/logger.py (1)
info(81-83)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (25)
tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp3_ccb-UCX.yaml (1)
24-24: LGTM!The concurrency_list adjustment from 36 to 32 aligns the maximum concurrency with
max_batch_size(line 54) and uses a power-of-2 value, which is consistent with the validation fix across this PR.tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx1_gen1_dep32_bs32_eplb0_mtp0_ccb-UCX.yaml (1)
24-24: LGTM!The concurrency value 1024 (power of 2) correctly aligns with the product of
max_batch_size(32) ×tensor_parallel_size(32) whenenable_attention_dpis true.tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx1_gen3_tep8_bs32_eplb0_mtp0_ccb-UCX.yaml (1)
24-24: LGTM!The concurrency_list adjustment from 34 to 32 aligns the maximum concurrency with
max_batch_size(line 54) and ensures power-of-2 values throughout.tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx1_gen1_dep32_bs32_eplb0_mtp0_ccb-NIXL.yaml (1)
24-24: LGTM!Consistent with the UCX variant—concurrency value 1024 properly aligns with the data-parallel configuration (
max_batch_size×tensor_parallel_size).tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx2_gen1_dep16_bs128_eplb0_mtp3_ccb-UCX.yaml (1)
24-24: LGTM!The concurrency value 2048 correctly aligns with
max_batch_size(128) ×tensor_parallel_size(16) for the data-parallel configuration.tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx1_gen3_tep8_bs16_eplb0_mtp3_ccb-NIXL.yaml (1)
24-24: LGTM!The
concurrency_listis correctly updated from 18 to 16, aligning with powers of two. The maximum concurrency value (16) now matchesmax_batch_sizein the gen worker config (line 54), which is consistent.tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx1_gen3_tep8_bs16_eplb0_mtp3_ccb-UCX.yaml (1)
24-24: LGTM!Consistent with the NIXL variant—
concurrency_listis correctly updated to use powers of two, with the maximum matchingmax_batch_size.tests/integration/defs/perf/disagg/test_configs/disagg/perf/Qwen3-235B-A22B-FP4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp0_ccb-NIXL.yaml (1)
24-24: LGTM!The
concurrency_listis correctly updated from 36 to 32, aligning with powers of two. The maximum concurrency value (32) now matchesmax_batch_sizein the gen worker config (line 54).tests/integration/defs/perf/disagg/test_configs/disagg/perf/Qwen3-235B-A22B-FP4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp0_ccb-UCX.yaml (1)
24-24: LGTM!Consistent with the NIXL variant—
concurrency_listcorrectly updated to use powers of two, with the maximum matchingmax_batch_size.tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp3_ccb-NIXL.yaml (1)
24-24: LGTM!The concurrency_list adjustment from 36 to 32 aligns with the power-of-2 pattern and is consistent with
max_batch_size: 32at line 54.tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx6_gen1_dep16_bs64_eplb0_mtp0_ccb-UCX.yaml (1)
24-24: LGTM!The concurrency_list adjustment from 1075 to 1024 aligns with the power-of-2 constraint addressed by this PR.
tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx8_gen1_dep32_bs16_eplb0_mtp3_ccb-NIXL.yaml (1)
24-24: LGTM!The concurrency_list adjustment from 538 to 512 aligns with the power-of-2 constraint.
tests/integration/defs/perf/disagg/utils/common.py (2)
195-197: LGTM!The parsing logic correctly handles space-separated values with proper stripping and filtering of empty strings. The list comprehension is idiomatic Python.
232-232: LGTM!The
concurrency_listis properly included in the returned dictionary.tests/integration/defs/perf/disagg/test_configs/disagg/perf/Qwen3-235B-A22B-FP4_1k1k_ctx1_gen1_dep16_bs64_eplb0_mtp3_ccb-UCX.yaml (1)
24-24: LGTM!The concurrency_list adjustment from "512 1075" to "512 1024" ensures all values are powers of 2, consistent with the PR objective.
tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx2_gen1_dep16_bs128_eplb0_mtp3_ccb-NIXL.yaml (1)
24-24: LGTM!The concurrency value of 2048 correctly aligns with the maximum allowed when attention DP is enabled:
gen_batch_size (128) * gen_tp_size (16) = 2048.tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx8_gen1_dep32_bs16_eplb0_mtp3_ccb-UCX.yaml (1)
24-24: LGTM!The concurrency value of 512 correctly aligns with the maximum allowed when attention DP is enabled:
gen_batch_size (16) * gen_tp_size (32) = 512.tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx1_gen3_tep8_bs32_eplb0_mtp0_ccb-NIXL.yaml (1)
24-24: LGTM!The concurrency list values (1, 2, 4, 8, 16, 32) all correctly stay within the maximum allowed when attention DP is disabled:
gen_batch_size = 32.tests/integration/defs/perf/disagg/utils/config_validator.py (2)
40-41: LGTM!The new validation call integrates cleanly with the existing validation flow.
56-60: LGTM!The relaxed assertion (
>=instead of==) and updated error message correctly reflect the intended validation semantics.tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx6_gen1_dep16_bs64_eplb0_mtp0_ccb-NIXL.yaml (1)
24-24: LGTM!The concurrency value of 1024 correctly aligns with the maximum allowed when attention DP is enabled:
gen_batch_size (64) * gen_tp_size (16) = 1024.tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp0_ccb-UCX.yaml (1)
24-24: LGTM!The concurrency list is correctly updated to use power-of-two values, and the maximum concurrency (32) aligns with
gen.max_batch_size(32) at line 54. This standardization is appropriate for GPU workloads.tests/integration/defs/perf/disagg/test_configs/disagg/perf/Qwen3-235B-A22B-FP4_1k1k_ctx1_gen1_dep16_bs64_eplb0_mtp3_ccb-NIXL.yaml (1)
24-24: LGTM!The concurrency list update from 1075 to 1024 standardizes to a power of two. The high concurrency values (512, 1024) relative to
gen.max_batch_size(64) are valid givenenable_attention_dp: trueconfiguration at line 52.tests/integration/defs/perf/disagg/test_configs/wideep/accuracy/deepseek-r1-fp4_1k1k_ctx2_gen1_dep16_bs128_eplb288_mtp3_ccb-NIXL.yaml (1)
31-31: LGTM with minor format note.The value update from 2150 to 2048 correctly standardizes to a power of two. Same optional consistency note as the previous file regarding the quoted string format.
tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp0_ccb-NIXL.yaml (1)
24-24: LGTM!The concurrency list correctly standardizes to power-of-two values (1, 2, 4, 8, 16, 32), with maximum concurrency matching
gen.max_batch_size(32) at line 54. This mirrors the UCX variant config appropriately.
|
PR_Github #31491 [ skip ] triggered by Bot. Commit: |
|
/bot skip --comment "Not cover in CI pipelines" |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @tests/integration/defs/perf/disagg/utils/config_validator.py:
- Around line 99-100: The docstring in
tests/integration/defs/perf/disagg/utils/config_validator.py claims "Raises:
ValueError" but the implementation uses assert statements (which raise
AssertionError); update the function to be consistent by replacing those assert
checks with explicit raises of ValueError (preserving the same error messages),
or alternatively update the docstring to state "Raises: AssertionError" — locate
the function containing the "Raises: ValueError" docstring and modify the
assert-based checks to raise ValueError (e.g., raise ValueError("...")) so the
raised exception type matches the docstring.
🧹 Nitpick comments (5)
tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp3_ccb-UCX.yaml (1)
24-24: LGTM with minor formatting note.The max concurrency value of 32 correctly matches
max_batch_size.Note: This file uses unquoted space-separated format while other files in this PR use quoted strings (e.g.,
'1024'). Consider using consistent formatting across configuration files for maintainability, though the parser handles both.tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx1_gen3_tep8_bs32_eplb0_mtp0_ccb-UCX.yaml (1)
24-24: Consider quoting the concurrency_list value for consistency.The concurrency values themselves are correct (all power-of-2, max value 32 matches
gen.max_batch_size). However, other config files in this PR use quoted strings (e.g.,'2048','512'). For consistency across the configuration files, consider quoting this value:concurrency_list: '1 2 4 8 16 32'tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx1_gen3_tep8_bs16_eplb0_mtp3_ccb-NIXL.yaml (1)
24-24: LGTM on the values; consider quoting for consistency.The fix from 18 to 16 is correct—the max concurrency now matches
gen.max_batch_size(16), and all values are power-of-2. Sinceenable_attention_dpis false, concurrency is bounded by batch size alone.For consistency with other config files using quoted strings (e.g.,
'512','2048'), consider:concurrency_list: '1 2 4 8 16'tests/integration/defs/perf/disagg/utils/common.py (1)
195-197: Consider adding error handling for malformed concurrency values.If
concurrency_listcontains non-integer values, this will raise an unhandledValueError. Consider wrapping with a try-except to provide a more descriptive error message for invalid configuration.♻️ Proposed improvement
- concurrency_list = [ - int(x.strip()) for x in config_data["benchmark"]["concurrency_list"].split() if x.strip() - ] + try: + concurrency_list = [ + int(x.strip()) for x in config_data["benchmark"]["concurrency_list"].split() if x.strip() + ] + except ValueError as e: + raise ValueError(f"Invalid concurrency_list: must contain integers only") from etests/integration/defs/perf/disagg/utils/config_validator.py (1)
106-112: Move loop-invariant extractions outside the loop.
gen_batch_sizeandgen_tp_sizeare extracted inside the loop but their values are constant across iterations.♻️ Proposed improvement
concurrency_list = extracted_config["concurrency_list"] assert concurrency_list, "config error: concurrency_list is empty" gen_enable_dp = extracted_config["gen_enable_dp"] + gen_batch_size = extracted_config["gen_batch_size"] + if gen_enable_dp: + gen_tp_size = extracted_config["gen_tp_size"] + max_concurrency = gen_batch_size * gen_tp_size + else: + max_concurrency = gen_batch_size logger.info(f"gen_enable_dp: {gen_enable_dp}") for concurrency in concurrency_list: - gen_batch_size = extracted_config["gen_batch_size"] - if gen_enable_dp: - gen_tp_size = extracted_config["gen_tp_size"] - assert concurrency <= gen_batch_size * gen_tp_size, ( - "config error: concurrency exceeds gen_batch_size * gen_tp_size" - ) - else: - assert concurrency <= gen_batch_size, ( - "config error: concurrency exceeds gen_batch_size" - ) + assert concurrency <= max_concurrency, ( + f"config error: concurrency {concurrency} exceeds max {max_concurrency}" + )This also improves the error message by including the actual values for easier debugging.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (25)
tests/integration/defs/perf/disagg/test_configs/disagg/perf/Qwen3-235B-A22B-FP4_1k1k_ctx1_gen1_dep16_bs64_eplb0_mtp3_ccb-NIXL.yamltests/integration/defs/perf/disagg/test_configs/disagg/perf/Qwen3-235B-A22B-FP4_1k1k_ctx1_gen1_dep16_bs64_eplb0_mtp3_ccb-UCX.yamltests/integration/defs/perf/disagg/test_configs/disagg/perf/Qwen3-235B-A22B-FP4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp0_ccb-NIXL.yamltests/integration/defs/perf/disagg/test_configs/disagg/perf/Qwen3-235B-A22B-FP4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp0_ccb-UCX.yamltests/integration/defs/perf/disagg/test_configs/disagg/perf/Qwen3-235B-A22B-FP4_1k1k_ctx2_gen1_dep16_bs128_eplb0_mtp1_ccb-NIXL.yamltests/integration/defs/perf/disagg/test_configs/disagg/perf/Qwen3-235B-A22B-FP4_1k1k_ctx2_gen1_dep16_bs128_eplb0_mtp1_ccb-UCX.yamltests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx1_gen1_dep32_bs32_eplb0_mtp0_ccb-NIXL.yamltests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx1_gen1_dep32_bs32_eplb0_mtp0_ccb-UCX.yamltests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp0_ccb-NIXL.yamltests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp0_ccb-UCX.yamltests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp3_ccb-NIXL.yamltests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp3_ccb-UCX.yamltests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx2_gen1_dep16_bs128_eplb0_mtp3_ccb-NIXL.yamltests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx2_gen1_dep16_bs128_eplb0_mtp3_ccb-UCX.yamltests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx1_gen3_tep8_bs16_eplb0_mtp3_ccb-NIXL.yamltests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx1_gen3_tep8_bs16_eplb0_mtp3_ccb-UCX.yamltests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx1_gen3_tep8_bs32_eplb0_mtp0_ccb-NIXL.yamltests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx1_gen3_tep8_bs32_eplb0_mtp0_ccb-UCX.yamltests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx6_gen1_dep16_bs64_eplb0_mtp0_ccb-NIXL.yamltests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx6_gen1_dep16_bs64_eplb0_mtp0_ccb-UCX.yamltests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx8_gen1_dep32_bs16_eplb0_mtp3_ccb-NIXL.yamltests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx8_gen1_dep32_bs16_eplb0_mtp3_ccb-UCX.yamltests/integration/defs/perf/disagg/test_configs/wideep/accuracy/deepseek-r1-fp4_1k1k_ctx2_gen1_dep16_bs128_eplb288_mtp3_ccb-NIXL.yamltests/integration/defs/perf/disagg/utils/common.pytests/integration/defs/perf/disagg/utils/config_validator.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.py: The code developed for TensorRT-LLM should conform to Python 3.8+
Indent Python code with 4 spaces. Do not use tabs
Always maintain the namespace when importing Python modules, even if only one class or function from a module is used
Python filenames should use snake_case (e.g.,some_file.py)
Python classes should use PascalCase (e.g.,class SomeClass)
Python functions and methods should use snake_case (e.g.,def my_awesome_function():)
Python local variables should use snake_case, with prefixkfor variable names that start with a number (e.g.,k_99th_percentile)
Python global variables should use upper snake_case with prefixG(e.g.,G_MY_GLOBAL)
Python constants should use upper snake_case (e.g.,MY_CONSTANT)
Avoid shadowing variables declared in an outer scope in Python
Initialize all externally visible members of a Python class in the constructor
For Python interfaces that may be used outside a file, prefer docstrings over comments
Use comments in Python for code within a function, or interfaces that are local to a file
Use Google-style docstrings for Python classes and functions, which can be parsed by Sphinx
Python attributes and variables can be documented inline with the format"""<type>: Description"""
Avoid using reflection in Python when functionality can be easily achieved without reflection
When using try-except blocks in Python, limit the except clause to the smallest set of errors possible
When using try-except blocks in Python to handle multiple possible variable types (duck-typing), keep the body of the try as small as possible and use the else block for the main logic
Files:
tests/integration/defs/perf/disagg/utils/config_validator.pytests/integration/defs/perf/disagg/utils/common.py
**/*.{cpp,cc,cxx,h,hpp,hxx,cu,cuh,py}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
All TensorRT-LLM source files (.cpp, .h, .cu, .py, and other source files) should contain an NVIDIA copyright header with the year of latest meaningful modification
Files:
tests/integration/defs/perf/disagg/utils/config_validator.pytests/integration/defs/perf/disagg/utils/common.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (23)
tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx1_gen1_dep32_bs32_eplb0_mtp0_ccb-UCX.yaml (1)
24-24: LGTM!The updated
concurrency_listvalue of 1024 correctly aligns withgen_batch_size × gen_tp_size(32 × 32 = 1024), which should pass the new concurrency validation.tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx6_gen1_dep16_bs64_eplb0_mtp0_ccb-NIXL.yaml (1)
24-24: LGTM!The
concurrency_listvalue of 1024 aligns withgen_batch_size × gen_tp_size(64 × 16 = 1024).tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx8_gen1_dep32_bs16_eplb0_mtp3_ccb-NIXL.yaml (1)
24-24: LGTM!The
concurrency_listvalue of 512 aligns withgen_batch_size × gen_tp_size(16 × 32 = 512).tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx1_gen3_tep8_bs32_eplb0_mtp0_ccb-NIXL.yaml (1)
24-24: LGTM!The concurrency values are now all powers of 2, with the max value of 32 matching
max_batch_size. This aligns with the broader normalization effort in this PR.tests/integration/defs/perf/disagg/test_configs/disagg/perf/Qwen3-235B-A22B-FP4_1k1k_ctx2_gen1_dep16_bs128_eplb0_mtp1_ccb-NIXL.yaml (1)
24-24: LGTM!The concurrency value of 2048 is a power-of-2 that aligns with the CUDA graph batch sizes configuration (line 72) and is a sensible correction from the previous non-power-of-2 value.
tests/integration/defs/perf/disagg/test_configs/wideep/accuracy/deepseek-r1-fp4_1k1k_ctx2_gen1_dep16_bs128_eplb288_mtp3_ccb-NIXL.yaml (1)
31-31: LGTM!The concurrency value update to 2048 aligns with the CUDA graph batch sizes and maintains consistent formatting with other single-value configurations.
tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx8_gen1_dep32_bs16_eplb0_mtp3_ccb-UCX.yaml (1)
24-24: LGTM!The concurrency value of 512 correctly aligns with the effective capacity (
max_batch_size×tensor_parallel_size= 16 × 32 = 512) whenenable_attention_dpis true. This should pass the new concurrency validation.tests/integration/defs/perf/disagg/test_configs/disagg/perf/Qwen3-235B-A22B-FP4_1k1k_ctx1_gen1_dep16_bs64_eplb0_mtp3_ccb-UCX.yaml (1)
24-24: LGTM - Concurrency value now aligns with gen config constraints.The max concurrency is now 1024, which matches
gen.max_batch_size (64) × gen.tensor_parallel_size (16) = 1024. This aligns with the new validation logic referenced in the PR.tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx2_gen1_dep16_bs128_eplb0_mtp3_ccb-NIXL.yaml (1)
24-24: LGTM - Concurrency value correctly bounded.Max concurrency 2048 aligns with
gen.max_batch_size (128) × gen.tensor_parallel_size (16) = 2048.tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx6_gen1_dep16_bs64_eplb0_mtp0_ccb-UCX.yaml (1)
24-24: LGTM - Concurrency aligned with gen config.Max concurrency 1024 matches
gen.max_batch_size (64) × gen.tensor_parallel_size (16) = 1024.tests/integration/defs/perf/disagg/test_configs/disagg/perf/Qwen3-235B-A22B-FP4_1k1k_ctx2_gen1_dep16_bs128_eplb0_mtp1_ccb-UCX.yaml (1)
24-24: LGTM - Concurrency value correctly bounded.Max concurrency 2048 aligns with
gen.max_batch_size (128) × gen.tensor_parallel_size (16) = 2048.tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp3_ccb-NIXL.yaml (1)
24-24: LGTM - Max concurrency corrected to power-of-2.The max concurrency is now 32, matching
gen.max_batch_size. The previous value of 36 was not a power-of-2 and exceeded the batch size constraint.tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx1_gen1_dep32_bs32_eplb0_mtp0_ccb-NIXL.yaml (1)
24-24: LGTM!Changing concurrency from 1075 to 1024 aligns with the power-of-2 standardization pattern, which is appropriate for GPU workload scheduling and memory alignment.
tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx1_gen3_tep8_bs16_eplb0_mtp3_ccb-UCX.yaml (1)
24-24: LGTM!Replacing 18 with 16 ensures all concurrency values are powers of 2, matching the
max_batch_size: 16configuration for the gen worker.tests/integration/defs/perf/disagg/test_configs/disagg/perf/Qwen3-235B-A22B-FP4_1k1k_ctx1_gen1_dep16_bs64_eplb0_mtp3_ccb-NIXL.yaml (1)
24-24: LGTM!Adjusting the upper concurrency bound from 1075 to 1024 maintains power-of-2 consistency across the list.
tests/integration/defs/perf/disagg/test_configs/disagg/perf/Qwen3-235B-A22B-FP4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp0_ccb-UCX.yaml (1)
24-24: LGTM!Replacing 36 with 32 aligns the maximum concurrency value with
max_batch_size: 32and maintains power-of-2 consistency.tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp0_ccb-UCX.yaml (1)
24-24: LGTM!Replacing 36 with 32 aligns the maximum concurrency value with
max_batch_size: 32and ensures all concurrency levels are powers of 2.tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp0_ccb-NIXL.yaml (1)
24-24: LGTM!The concurrency_list upper bound is correctly reduced from 36 to 32 to align with the
gen_batch_sizeof 32 (line 54). Withenable_attention_dp: false, the new validation logic enforcesconcurrency <= gen_batch_size, which this change now satisfies.tests/integration/defs/perf/disagg/test_configs/disagg/perf/Qwen3-235B-A22B-FP4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp0_ccb-NIXL.yaml (1)
24-24: LGTM!Concurrency list properly adjusted to stay within the
gen_batch_sizelimit of 32 whenenable_attention_dpis false.tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx2_gen1_dep16_bs128_eplb0_mtp3_ccb-UCX.yaml (1)
24-24: LGTM!The concurrency value is correctly reduced from 2150 to 2048 to align with the maximum allowed when
enable_attention_dp: true(i.e.,gen_batch_size * gen_tp_size = 128 * 16 = 2048).tests/integration/defs/perf/disagg/utils/common.py (1)
232-232: LGTM!The parsed
concurrency_listis correctly exposed in the returned configuration dictionary for downstream validation.tests/integration/defs/perf/disagg/utils/config_validator.py (2)
40-41: LGTM!The new concurrency list validation is correctly integrated into the validation flow.
56-60: LGTM!The relaxed assertion from
==to>=is appropriate and well-documented with the developer confirmation comment.
|
PR_Github #31498 [ skip ] triggered by Bot. Commit: |
|
PR_Github #31491 [ skip ] completed with state |
|
/bot skip --comment "Not cover in CI pipelines" |
|
PR_Github #31505 [ skip ] triggered by Bot. Commit: |
|
PR_Github #31498 [ skip ] completed with state |
|
/bot run --skip-test |
|
PR_Github #31509 [ run ] triggered by Bot. Commit: |
|
PR_Github #31505 [ skip ] completed with state |
|
PR_Github #31509 [ run ] completed with state |
|
/bot reuse-pipeline |
|
PR_Github #31646 [ reuse-pipeline ] triggered by Bot. Commit: |
|
/bot reuse-pipeline |
|
PR_Github #31653 [ reuse-pipeline ] triggered by Bot. Commit: |
|
PR_Github #31646 [ reuse-pipeline ] completed with state |
|
PR_Github #31653 [ reuse-pipeline ] completed with state |
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com> Signed-off-by: Daniil Kulko <kulkodaniil@gmail.com>
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.