[None][chore] Fix disagg assert #10596

Merged
fredricz-20070104 merged 7 commits into NVIDIA:main from fredricz-20070104:feature/fix_disagg_asset
Jan 13, 2026

Conversation

fredricz-20070104 (Collaborator) commented Jan 12, 2026

Summary by CodeRabbit

  • Tests
    • Updated benchmark concurrency configurations across multiple test scenarios to use adjusted concurrency levels (e.g., 1024 instead of 1075, 2048 instead of 2150, and adjusted ranges).
    • Enhanced validation logic for benchmark configurations to ensure concurrency settings align with batch size and parallelism parameters.


Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
@fredricz-20070104 fredricz-20070104 enabled auto-merge (squash) January 12, 2026 06:09
fredricz-20070104 (Collaborator, Author) commented

/bot skip --comment "Not cover in CI pipelines"

coderabbitai bot (Contributor) commented Jan 12, 2026

📝 Walkthrough

This PR updates benchmark configuration files across multiple models to adjust concurrency list values, standardizing them to power-of-2 or rounded values. Additionally, it introduces parsing and validation functionality for a new concurrency_list configuration field in the benchmark utilities.

Changes

Cohort: Benchmark Configuration Updates - Qwen3 Models
Files:
  • tests/integration/defs/perf/disagg/test_configs/disagg/perf/Qwen3-235B-A22B-FP4_1k1k_ctx1_gen1_dep16_bs64_eplb0_mtp3_ccb-{NIXL,UCX}.yaml
  • tests/integration/defs/perf/disagg/test_configs/disagg/perf/Qwen3-235B-A22B-FP4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp0_ccb-{NIXL,UCX}.yaml
  • tests/integration/defs/perf/disagg/test_configs/disagg/perf/Qwen3-235B-A22B-FP4_1k1k_ctx2_gen1_dep16_bs128_eplb0_mtp1_ccb-{NIXL,UCX}.yaml
Summary: Updated concurrency_list values: "512 1075" → "512 1024", "1 2 4 8 16 36" → "1 2 4 8 16 32", 2150 → 2048

Cohort: Benchmark Configuration Updates - DeepSeek-R1 Models
Files:
  • tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx1_gen1_dep32_bs32_eplb0_mtp0_ccb-{NIXL,UCX}.yaml
  • tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp0_ccb-{NIXL,UCX}.yaml
  • tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp3_ccb-{NIXL,UCX}.yaml
  • tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx2_gen1_dep16_bs128_eplb0_mtp3_ccb-{NIXL,UCX}.yaml
  • tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx1_gen3_tep8_bs16_eplb0_mtp3_ccb-{NIXL,UCX}.yaml
  • tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx1_gen3_tep8_bs32_eplb0_mtp0_ccb-{NIXL,UCX}.yaml
  • tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx6_gen1_dep16_bs64_eplb0_mtp0_ccb-{NIXL,UCX}.yaml
  • tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx8_gen1_dep32_bs16_eplb0_mtp3_ccb-{NIXL,UCX}.yaml
Summary: Standardized concurrency_list values across multiple configurations: 1075→1024, 36→32, 2150→2048, 18→16, 34→32, 538→512

Cohort: Benchmark Configuration Updates - WidEEP Accuracy
Files:
  • tests/integration/defs/perf/disagg/test_configs/wideep/accuracy/deepseek-r1-fp4_1k1k_ctx2_gen1_dep16_bs128_eplb288_mtp3_ccb-NIXL.yaml
Summary: Updated concurrency_list from 2150 to 2048

Cohort: Utility Functions - Configuration Parsing and Validation
Files:
  • tests/integration/defs/perf/disagg/utils/common.py
Summary: Added parsing of concurrency_list from space-separated string to integer list in extract_config_fields() function

Cohort: Utility Functions - Configuration Parsing and Validation
Files:
  • tests/integration/defs/perf/disagg/utils/config_validator.py
Summary: Added new _validate_concurrency_list() method to validate concurrency list bounds based on gen_batch_size and gen_tp_size; relaxed _validate_gen_max_tokens() from equality to greater-than-or-equal check when mtp_size > 0; integrated concurrency_list validation into validate_test_config()

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~15 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
  • Description check: ⚠️ Warning
    Explanation: The PR description is missing entirely. The template requires a clear explanation of the issue and solution, test coverage details, and a checklist, but none of these are provided.
    Resolution: Add a detailed description explaining what assertion was fixed, why the concurrency validation was needed, and which tests cover these changes.
✅ Passed checks (2 passed)
  • Title check: ✅ Passed
    Explanation: The title '[None][chore] Fix disagg assert' directly relates to the main change: fixing a disaggregation assertion and adding validation logic, which is reflected in the code changes.
  • Docstring Coverage: ✅ Passed
    Explanation: Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

coderabbitai bot (Contributor) commented Jan 12, 2026

📝 Walkthrough

Benchmark configuration files across multiple test suites have concurrency parameter values updated to powers of two (1075→1024, 2150→2048, 36→32, 34→32, 18→16, 538→512). Simultaneously, utility modules add parsing and validation support for concurrency_list configuration field.

Changes

Cohort: Disaggregated Perf Benchmark Configs
Files:
  • tests/integration/defs/perf/disagg/test_configs/disagg/perf/*-NIXL.yaml
  • tests/integration/defs/perf/disagg/test_configs/disagg/perf/*-UCX.yaml
Summary: Updated concurrency_list values across 20 benchmark configuration files: single values adjusted (1075→1024, 2150→2048, 538→512) and space-separated lists modified (36→32, 34→32, 18→16). Each file contains a single-line value change.

Cohort: Wideep Accuracy Benchmark Config
Files:
  • tests/integration/defs/perf/disagg/test_configs/wideep/accuracy/deepseek-r1-fp4_1k1k_ctx2_gen1_dep16_bs128_eplb288_mtp3_ccb-NIXL.yaml
Summary: Updated concurrency_list from 2150 to 2048.

Cohort: Configuration Parsing Logic
Files:
  • tests/integration/defs/perf/disagg/utils/common.py
Summary: Added extraction and parsing of concurrency_list field from benchmark configuration, converting space-separated string values to integer list.

Cohort: Configuration Validation Logic
Files:
  • tests/integration/defs/perf/disagg/utils/config_validator.py
Summary: Introduced new _validate_concurrency_list() method to validate concurrency values against batch size and tensor parallelism parameters. Modified _validate_gen_max_tokens() assertion from equality to greater-than-or-equal when mtp_size > 0.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
  • Description check: ⚠️ Warning
    Explanation: The PR description only contains '@coderabbitai summary' with no actual content explaining the changes, issues, test coverage, or following the required template sections.
    Resolution: Provide a comprehensive description including what was fixed, why the change was necessary, relevant test coverage, and a checklist confirmation of compliance with coding guidelines.
✅ Passed checks (2 passed)
  • Title check: ✅ Passed
    Explanation: The title '[None][chore] Fix disagg assert' is specific and directly related to the main changes, which involve fixing a disaggregation assertion and validating concurrency configurations.
  • Docstring Coverage: ✅ Passed
    Explanation: Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/integration/defs/perf/disagg/utils/common.py (1)

1-4: Missing NVIDIA copyright header.

Per coding guidelines, all TensorRT-LLM source files (.py) should contain an NVIDIA copyright header with the year of latest meaningful modification. This file only has a docstring but no copyright notice.

Suggested fix
+# SPDX-FileCopyrightText: Copyright (c) 2022-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 """Disaggregated Benchmark Configuration."""

Based on coding guidelines requiring NVIDIA copyright header for all source files.
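
A check like this can be automated. The sketch below is a hypothetical helper, not the project's actual pre-commit hook; it only looks for an `SPDX-FileCopyrightText` comment in the first few lines of a file's text, and the five-line window is an arbitrary assumption.

```python
def has_spdx_header(source: str, max_lines: int = 5) -> bool:
    """Return True if an SPDX copyright comment appears near the top of the text."""
    head = source.splitlines()[:max_lines]
    return any(line.startswith("# SPDX-FileCopyrightText:") for line in head)


# Illustrative inputs: the same docstring with and without the suggested header.
with_header = (
    "# SPDX-FileCopyrightText: Copyright (c) 2022-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.\n"
    "# SPDX-License-Identifier: Apache-2.0\n"
    '"""Disaggregated Benchmark Configuration."""\n'
)
without_header = '"""Disaggregated Benchmark Configuration."""\n'

print(has_spdx_header(with_header), has_spdx_header(without_header))
```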

🧹 Nitpick comments (3)
tests/integration/defs/perf/disagg/test_configs/disagg/perf/Qwen3-235B-A22B-FP4_1k1k_ctx2_gen1_dep16_bs128_eplb0_mtp1_ccb-NIXL.yaml (1)

24-24: LGTM with a minor note.

The value is correctly updated from 2150 to 2048, aligning with power-of-two conventions.

Minor: The value is quoted as a string ('2048') while other config files use unquoted space-separated values. Consider using the unquoted format for consistency, though both should parse correctly.

tests/integration/defs/perf/disagg/utils/config_validator.py (1)

92-116: Move loop-invariant dictionary lookups outside the loop.

gen_batch_size and gen_tp_size don't change per iteration. Also, the docstring states Raises: ValueError but the method raises AssertionError.

♻️ Suggested refactor
     @staticmethod
     def _validate_concurrency_list(extracted_config: dict) -> None:
         """Validate concurrency list.
 
         Args:
             extracted_config: Extracted configuration fields
 
         Raises:
-            ValueError: If concurrency list is invalid
+            AssertionError: If concurrency list is invalid
         """
         concurrency_list = extracted_config["concurrency_list"]
         assert concurrency_list, "config error: concurrency_list is empty"
         gen_enable_dp = extracted_config["gen_enable_dp"]
+        gen_batch_size = extracted_config["gen_batch_size"]
         logger.info(f"gen_enable_dp: {gen_enable_dp}")
+        if gen_enable_dp:
+            gen_tp_size = extracted_config["gen_tp_size"]
+            max_concurrency = gen_batch_size * gen_tp_size
+        else:
+            max_concurrency = gen_batch_size
         for concurrency in concurrency_list:
-            gen_batch_size = extracted_config["gen_batch_size"]
-            if gen_enable_dp:
-                gen_tp_size = extracted_config["gen_tp_size"]
-                assert concurrency <= gen_batch_size * gen_tp_size, (
-                    "config error: concurrency exceeds gen_batch_size * gen_tp_size"
-                )
-            else:
-                assert concurrency <= gen_batch_size, (
-                    "config error: concurrency exceeds gen_batch_size"
-                )
+            assert concurrency <= max_concurrency, (
+                f"config error: concurrency {concurrency} exceeds max allowed {max_concurrency}"
+            )
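
As a standalone illustration of the refactor suggested above, the following sketch computes the bound once and then checks every entry. It assumes only the dictionary keys that appear in the diff (`concurrency_list`, `gen_enable_dp`, `gen_batch_size`, `gen_tp_size`); the free-function form and the sample values are illustrative, not the code merged in this PR.

```python
def validate_concurrency_list(extracted_config: dict) -> None:
    """Check that every concurrency value fits within the allowed maximum.

    Raises:
        AssertionError: If the list is empty or a value exceeds the bound.
    """
    concurrency_list = extracted_config["concurrency_list"]
    assert concurrency_list, "config error: concurrency_list is empty"

    gen_batch_size = extracted_config["gen_batch_size"]
    # Bound taken from the diff: batch size times TP size when attention DP is on.
    if extracted_config["gen_enable_dp"]:
        max_concurrency = gen_batch_size * extracted_config["gen_tp_size"]
    else:
        max_concurrency = gen_batch_size

    for concurrency in concurrency_list:
        assert concurrency <= max_concurrency, (
            f"config error: concurrency {concurrency} exceeds max allowed {max_concurrency}"
        )


# Hypothetical sample: batch size 64, TP size 16, attention DP enabled.
sample = {
    "concurrency_list": [512, 1024],
    "gen_enable_dp": True,
    "gen_batch_size": 64,
    "gen_tp_size": 16,
}
validate_concurrency_list(sample)  # passes silently
```

Replacing 1024 with 1075 in the sample would trip the assertion, and the message would name both the offending value and the bound.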
tests/integration/defs/perf/disagg/test_configs/disagg/perf/Qwen3-235B-A22B-FP4_1k1k_ctx2_gen1_dep16_bs128_eplb0_mtp1_ccb-UCX.yaml (1)

24-24: Consider format consistency across config files.

The value update to 2048 is correct. However, this file uses a quoted string format ('2048') while other configs in this PR use unquoted space-separated format (e.g., 1 2 4 8 16 32). While the parsing utility should handle both, consistent formatting across config files improves maintainability.

🔧 Suggested format for consistency
-  concurrency_list: '2048'
+  concurrency_list: 2048
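
One caveat on the quoted-versus-unquoted point: YAML loaders generally resolve an unquoted `2048` to an integer, while `'2048'` and an unquoted `1 2 4 8 16 32` both load as strings. A parser that calls `.split()` directly on the value would therefore fail on the unquoted single value. The sketch below shows one tolerant way to normalize either form; `normalize_concurrency_list` is a hypothetical helper, not a function from this PR.

```python
from typing import List, Union


def normalize_concurrency_list(raw: Union[int, str]) -> List[int]:
    """Return concurrency values as integers for either scalar form."""
    # str() makes an integer scalar (unquoted 2048) behave like the string forms.
    return [int(token) for token in str(raw).split()]


print(normalize_concurrency_list(2048))             # unquoted single value
print(normalize_concurrency_list("2048"))           # quoted single value
print(normalize_concurrency_list("1 2 4 8 16 32"))  # space-separated list
```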
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3a9a00b and aa9bfbd.

📒 Files selected for processing (25)
  • tests/integration/defs/perf/disagg/test_configs/disagg/perf/Qwen3-235B-A22B-FP4_1k1k_ctx1_gen1_dep16_bs64_eplb0_mtp3_ccb-NIXL.yaml
  • tests/integration/defs/perf/disagg/test_configs/disagg/perf/Qwen3-235B-A22B-FP4_1k1k_ctx1_gen1_dep16_bs64_eplb0_mtp3_ccb-UCX.yaml
  • tests/integration/defs/perf/disagg/test_configs/disagg/perf/Qwen3-235B-A22B-FP4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp0_ccb-NIXL.yaml
  • tests/integration/defs/perf/disagg/test_configs/disagg/perf/Qwen3-235B-A22B-FP4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp0_ccb-UCX.yaml
  • tests/integration/defs/perf/disagg/test_configs/disagg/perf/Qwen3-235B-A22B-FP4_1k1k_ctx2_gen1_dep16_bs128_eplb0_mtp1_ccb-NIXL.yaml
  • tests/integration/defs/perf/disagg/test_configs/disagg/perf/Qwen3-235B-A22B-FP4_1k1k_ctx2_gen1_dep16_bs128_eplb0_mtp1_ccb-UCX.yaml
  • tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx1_gen1_dep32_bs32_eplb0_mtp0_ccb-NIXL.yaml
  • tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx1_gen1_dep32_bs32_eplb0_mtp0_ccb-UCX.yaml
  • tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp0_ccb-NIXL.yaml
  • tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp0_ccb-UCX.yaml
  • tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp3_ccb-NIXL.yaml
  • tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp3_ccb-UCX.yaml
  • tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx2_gen1_dep16_bs128_eplb0_mtp3_ccb-NIXL.yaml
  • tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx2_gen1_dep16_bs128_eplb0_mtp3_ccb-UCX.yaml
  • tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx1_gen3_tep8_bs16_eplb0_mtp3_ccb-NIXL.yaml
  • tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx1_gen3_tep8_bs16_eplb0_mtp3_ccb-UCX.yaml
  • tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx1_gen3_tep8_bs32_eplb0_mtp0_ccb-NIXL.yaml
  • tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx1_gen3_tep8_bs32_eplb0_mtp0_ccb-UCX.yaml
  • tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx6_gen1_dep16_bs64_eplb0_mtp0_ccb-NIXL.yaml
  • tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx6_gen1_dep16_bs64_eplb0_mtp0_ccb-UCX.yaml
  • tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx8_gen1_dep32_bs16_eplb0_mtp3_ccb-NIXL.yaml
  • tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx8_gen1_dep32_bs16_eplb0_mtp3_ccb-UCX.yaml
  • tests/integration/defs/perf/disagg/test_configs/wideep/accuracy/deepseek-r1-fp4_1k1k_ctx2_gen1_dep16_bs128_eplb288_mtp3_ccb-NIXL.yaml
  • tests/integration/defs/perf/disagg/utils/common.py
  • tests/integration/defs/perf/disagg/utils/config_validator.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: The code developed for TensorRT-LLM should conform to Python 3.8+
Indent Python code with 4 spaces. Do not use tabs
Always maintain the namespace when importing Python modules, even if only one class or function from a module is used
Python filenames should use snake_case (e.g., some_file.py)
Python classes should use PascalCase (e.g., class SomeClass)
Python functions and methods should use snake_case (e.g., def my_awesome_function():)
Python local variables should use snake_case, with prefix k for variable names that start with a number (e.g., k_99th_percentile)
Python global variables should use upper snake_case with prefix G (e.g., G_MY_GLOBAL)
Python constants should use upper snake_case (e.g., MY_CONSTANT)
Avoid shadowing variables declared in an outer scope in Python
Initialize all externally visible members of a Python class in the constructor
For Python interfaces that may be used outside a file, prefer docstrings over comments
Use comments in Python for code within a function, or interfaces that are local to a file
Use Google-style docstrings for Python classes and functions, which can be parsed by Sphinx
Python attributes and variables can be documented inline with the format """<type>: Description"""
Avoid using reflection in Python when functionality can be easily achieved without reflection
When using try-except blocks in Python, limit the except clause to the smallest set of errors possible
When using try-except blocks in Python to handle multiple possible variable types (duck-typing), keep the body of the try as small as possible and use the else block for the main logic

Files:

  • tests/integration/defs/perf/disagg/utils/common.py
  • tests/integration/defs/perf/disagg/utils/config_validator.py
**/*.{cpp,cc,cxx,h,hpp,hxx,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

All TensorRT-LLM source files (.cpp, .h, .cu, .py, and other source files) should contain an NVIDIA copyright header with the year of latest meaningful modification

Files:

  • tests/integration/defs/perf/disagg/utils/common.py
  • tests/integration/defs/perf/disagg/utils/config_validator.py
🧬 Code graph analysis (1)
tests/integration/defs/perf/disagg/utils/config_validator.py (1)
tests/integration/defs/perf/disagg/utils/logger.py (1)
  • info (81-83)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (25)
tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp3_ccb-UCX.yaml (1)

24-24: LGTM!

The concurrency_list adjustment from 36 to 32 aligns the maximum concurrency with max_batch_size (line 54) and uses a power-of-2 value, which is consistent with the validation fix across this PR.

tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx1_gen1_dep32_bs32_eplb0_mtp0_ccb-UCX.yaml (1)

24-24: LGTM!

The concurrency value 1024 (power of 2) correctly aligns with the product of max_batch_size (32) × tensor_parallel_size (32) when enable_attention_dp is true.

tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx1_gen3_tep8_bs32_eplb0_mtp0_ccb-UCX.yaml (1)

24-24: LGTM!

The concurrency_list adjustment from 34 to 32 aligns the maximum concurrency with max_batch_size (line 54) and ensures power-of-2 values throughout.

tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx1_gen1_dep32_bs32_eplb0_mtp0_ccb-NIXL.yaml (1)

24-24: LGTM!

Consistent with the UCX variant—concurrency value 1024 properly aligns with the data-parallel configuration (max_batch_size × tensor_parallel_size).

tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx2_gen1_dep16_bs128_eplb0_mtp3_ccb-UCX.yaml (1)

24-24: LGTM!

The concurrency value 2048 correctly aligns with max_batch_size (128) × tensor_parallel_size (16) for the data-parallel configuration.

tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx1_gen3_tep8_bs16_eplb0_mtp3_ccb-NIXL.yaml (1)

24-24: LGTM!

The concurrency_list is correctly updated from 18 to 16, aligning with powers of two. The maximum concurrency value (16) now matches max_batch_size in the gen worker config (line 54), which is consistent.

tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx1_gen3_tep8_bs16_eplb0_mtp3_ccb-UCX.yaml (1)

24-24: LGTM!

Consistent with the NIXL variant—concurrency_list is correctly updated to use powers of two, with the maximum matching max_batch_size.

tests/integration/defs/perf/disagg/test_configs/disagg/perf/Qwen3-235B-A22B-FP4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp0_ccb-NIXL.yaml (1)

24-24: LGTM!

The concurrency_list is correctly updated from 36 to 32, aligning with powers of two. The maximum concurrency value (32) now matches max_batch_size in the gen worker config (line 54).

tests/integration/defs/perf/disagg/test_configs/disagg/perf/Qwen3-235B-A22B-FP4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp0_ccb-UCX.yaml (1)

24-24: LGTM!

Consistent with the NIXL variant—concurrency_list correctly updated to use powers of two, with the maximum matching max_batch_size.

tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp3_ccb-NIXL.yaml (1)

24-24: LGTM!

The concurrency_list adjustment from 36 to 32 aligns with the power-of-2 pattern and is consistent with max_batch_size: 32 at line 54.

tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx6_gen1_dep16_bs64_eplb0_mtp0_ccb-UCX.yaml (1)

24-24: LGTM!

The concurrency_list adjustment from 1075 to 1024 aligns with the power-of-2 constraint addressed by this PR.

tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx8_gen1_dep32_bs16_eplb0_mtp3_ccb-NIXL.yaml (1)

24-24: LGTM!

The concurrency_list adjustment from 538 to 512 aligns with the power-of-2 constraint.

tests/integration/defs/perf/disagg/utils/common.py (2)

195-197: LGTM!

The parsing logic correctly handles space-separated values with proper stripping and filtering of empty strings. The list comprehension is idiomatic Python.


232-232: LGTM!

The concurrency_list is properly included in the returned dictionary.

tests/integration/defs/perf/disagg/test_configs/disagg/perf/Qwen3-235B-A22B-FP4_1k1k_ctx1_gen1_dep16_bs64_eplb0_mtp3_ccb-UCX.yaml (1)

24-24: LGTM!

The concurrency_list adjustment from "512 1075" to "512 1024" ensures all values are powers of 2, consistent with the PR objective.
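
The power-of-two property noted in these comments is easy to check mechanically. The sketch below uses the standard bit trick `n & (n - 1) == 0`; `is_power_of_two` is an illustrative helper, and the two lists are the old and new values quoted in the comment above.

```python
def is_power_of_two(n: int) -> bool:
    """True when n is a positive integer with exactly one bit set."""
    return n > 0 and (n & (n - 1)) == 0


old_values = [512, 1075]
new_values = [512, 1024]

# 1075 fails the check, so the old list is not all powers of two.
print([is_power_of_two(v) for v in old_values])
print(all(is_power_of_two(v) for v in new_values))
```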

tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx2_gen1_dep16_bs128_eplb0_mtp3_ccb-NIXL.yaml (1)

24-24: LGTM!

The concurrency value of 2048 correctly aligns with the maximum allowed when attention DP is enabled: gen_batch_size (128) * gen_tp_size (16) = 2048.

tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx8_gen1_dep32_bs16_eplb0_mtp3_ccb-UCX.yaml (1)

24-24: LGTM!

The concurrency value of 512 correctly aligns with the maximum allowed when attention DP is enabled: gen_batch_size (16) * gen_tp_size (32) = 512.

tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx1_gen3_tep8_bs32_eplb0_mtp0_ccb-NIXL.yaml (1)

24-24: LGTM!

The concurrency list values (1, 2, 4, 8, 16, 32) all correctly stay within the maximum allowed when attention DP is disabled: gen_batch_size = 32.

tests/integration/defs/perf/disagg/utils/config_validator.py (2)

40-41: LGTM!

The new validation call integrates cleanly with the existing validation flow.


56-60: LGTM!

The relaxed assertion (>= instead of ==) and updated error message correctly reflect the intended validation semantics.

tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx6_gen1_dep16_bs64_eplb0_mtp0_ccb-NIXL.yaml (1)

24-24: LGTM!

The concurrency value of 1024 correctly aligns with the maximum allowed when attention DP is enabled: gen_batch_size (64) * gen_tp_size (16) = 1024.
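
The bounds quoted in these comments can be reproduced with a few lines of arithmetic. The sketch below takes the batch size and TP size stated for three attention-DP configurations and compares the old and new concurrency values against their product. The tuple layout is illustrative, and the pairing of old values with configurations is read from the review summaries.

```python
# (gen_batch_size, gen_tp_size, old_concurrency, new_concurrency), as quoted in the review.
cases = [
    (128, 16, 2150, 2048),
    (16, 32, 538, 512),
    (64, 16, 1075, 1024),
]

for batch_size, tp_size, old, new in cases:
    bound = batch_size * tp_size  # limit when attention DP is enabled
    print(f"bound={bound} old={old} fits={old <= bound} new={new} fits={new <= bound}")
```

In each case the old value exceeds the product and the new value equals it, which is consistent with the validation this PR adds.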

tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp0_ccb-UCX.yaml (1)

24-24: LGTM!

The concurrency list is correctly updated to use power-of-two values, and the maximum concurrency (32) aligns with gen.max_batch_size (32) at line 54. This standardization is appropriate for GPU workloads.

tests/integration/defs/perf/disagg/test_configs/disagg/perf/Qwen3-235B-A22B-FP4_1k1k_ctx1_gen1_dep16_bs64_eplb0_mtp3_ccb-NIXL.yaml (1)

24-24: LGTM!

The concurrency list update from 1075 to 1024 standardizes to a power of two. The high concurrency values (512, 1024) relative to gen.max_batch_size (64) are valid given enable_attention_dp: true configuration at line 52.

tests/integration/defs/perf/disagg/test_configs/wideep/accuracy/deepseek-r1-fp4_1k1k_ctx2_gen1_dep16_bs128_eplb288_mtp3_ccb-NIXL.yaml (1)

31-31: LGTM with minor format note.

The value update from 2150 to 2048 correctly standardizes to a power of two. Same optional consistency note as the previous file regarding the quoted string format.

tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp0_ccb-NIXL.yaml (1)

24-24: LGTM!

The concurrency list correctly standardizes to power-of-two values (1, 2, 4, 8, 16, 32), with maximum concurrency matching gen.max_batch_size (32) at line 54. This mirrors the UCX variant config appropriately.

tensorrt-cicd (Collaborator) commented

PR_Github #31491 [ skip ] triggered by Bot. Commit: aa9bfbd

fredricz-20070104 (Collaborator, Author) commented

/bot skip --comment "Not cover in CI pipelines"

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @tests/integration/defs/perf/disagg/utils/config_validator.py:
- Around line 99-100: The docstring in
tests/integration/defs/perf/disagg/utils/config_validator.py claims "Raises:
ValueError" but the implementation uses assert statements (which raise
AssertionError); update the function to be consistent by replacing those assert
checks with explicit raises of ValueError (preserving the same error messages),
or alternatively update the docstring to state "Raises: AssertionError" — locate
the function containing the "Raises: ValueError" docstring and modify the
assert-based checks to raise ValueError (e.g., raise ValueError("...")) so the
raised exception type matches the docstring.
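
For the first option in the comment above, a minimal sketch with explicit `ValueError` raises follows. Explicit raises have one practical advantage over `assert`: Python removes assert statements when run with `-O`, so assert-based validation can be silently skipped. The function name and sample values are illustrative; the dictionary keys and messages follow the snippets quoted in this review.

```python
def validate_concurrency_list(extracted_config: dict) -> None:
    """Validate concurrency list.

    Args:
        extracted_config: Extracted configuration fields

    Raises:
        ValueError: If concurrency list is invalid
    """
    concurrency_list = extracted_config["concurrency_list"]
    if not concurrency_list:
        raise ValueError("config error: concurrency_list is empty")

    max_concurrency = extracted_config["gen_batch_size"]
    if extracted_config["gen_enable_dp"]:
        max_concurrency *= extracted_config["gen_tp_size"]

    for concurrency in concurrency_list:
        if concurrency > max_concurrency:
            raise ValueError(
                f"config error: concurrency {concurrency} exceeds max {max_concurrency}"
            )


# Hypothetical sample: attention DP disabled, so the bound is the batch size alone.
try:
    validate_concurrency_list(
        {"concurrency_list": [1, 2, 4, 8, 16, 36], "gen_enable_dp": False, "gen_batch_size": 32}
    )
except ValueError as err:
    print(err)
```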
🧹 Nitpick comments (5)
tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp3_ccb-UCX.yaml (1)

24-24: LGTM with minor formatting note.

The max concurrency value of 32 correctly matches max_batch_size.

Note: This file uses unquoted space-separated format while other files in this PR use quoted strings (e.g., '1024'). Consider using consistent formatting across configuration files for maintainability, though the parser handles both.

tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx1_gen3_tep8_bs32_eplb0_mtp0_ccb-UCX.yaml (1)

24-24: Consider quoting the concurrency_list value for consistency.

The concurrency values themselves are correct (all power-of-2, max value 32 matches gen.max_batch_size). However, other config files in this PR use quoted strings (e.g., '2048', '512'). For consistency across the configuration files, consider quoting this value:

concurrency_list: '1 2 4 8 16 32'
tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx1_gen3_tep8_bs16_eplb0_mtp3_ccb-NIXL.yaml (1)

24-24: LGTM on the values; consider quoting for consistency.

The fix from 18 to 16 is correct—the max concurrency now matches gen.max_batch_size (16), and all values are power-of-2. Since enable_attention_dp is false, concurrency is bounded by batch size alone.

For consistency with other config files using quoted strings (e.g., '512', '2048'), consider:

concurrency_list: '1 2 4 8 16'
tests/integration/defs/perf/disagg/utils/common.py (1)

195-197: Consider adding error handling for malformed concurrency values.

If concurrency_list contains non-integer values, this will raise an unhandled ValueError. Consider wrapping with a try-except to provide a more descriptive error message for invalid configuration.

♻️ Proposed improvement
-    concurrency_list = [
-        int(x.strip()) for x in config_data["benchmark"]["concurrency_list"].split() if x.strip()
-    ]
+    try:
+        concurrency_list = [
+            int(x.strip()) for x in config_data["benchmark"]["concurrency_list"].split() if x.strip()
+        ]
+    except ValueError as e:
+        raise ValueError("Invalid concurrency_list: must contain integers only") from e
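
A runnable variant of the suggestion above is sketched below. It differs slightly in that it reports the offending token, which the proposed message does not. `parse_concurrency_list` is a hypothetical name, and the input is assumed to be the raw string read from `config_data["benchmark"]["concurrency_list"]`.

```python
from typing import List


def parse_concurrency_list(raw: str) -> List[int]:
    """Parse a space-separated string such as "1 2 4 8" into integers."""
    values = []
    for token in raw.split():  # split() with no argument already drops empty tokens
        try:
            value = int(token)
        except ValueError as e:
            raise ValueError(
                f"Invalid concurrency_list entry {token!r}: must contain integers only"
            ) from e
        values.append(value)
    return values


print(parse_concurrency_list("1 2 4 8 16 32"))
```

Keeping only the `int()` call inside the `try` body follows the guideline, listed in this review's path-based instructions, to keep the body of a `try` as small as possible.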
tests/integration/defs/perf/disagg/utils/config_validator.py (1)

106-112: Move loop-invariant extractions outside the loop.

gen_batch_size and gen_tp_size are extracted inside the loop but their values are constant across iterations.

♻️ Proposed improvement
     concurrency_list = extracted_config["concurrency_list"]
     assert concurrency_list, "config error: concurrency_list is empty"
     gen_enable_dp = extracted_config["gen_enable_dp"]
+    gen_batch_size = extracted_config["gen_batch_size"]
+    if gen_enable_dp:
+        gen_tp_size = extracted_config["gen_tp_size"]
+        max_concurrency = gen_batch_size * gen_tp_size
+    else:
+        max_concurrency = gen_batch_size
     logger.info(f"gen_enable_dp: {gen_enable_dp}")
     for concurrency in concurrency_list:
-        gen_batch_size = extracted_config["gen_batch_size"]
-        if gen_enable_dp:
-            gen_tp_size = extracted_config["gen_tp_size"]
-            assert concurrency <= gen_batch_size * gen_tp_size, (
-                "config error: concurrency exceeds gen_batch_size * gen_tp_size"
-            )
-        else:
-            assert concurrency <= gen_batch_size, (
-                "config error: concurrency exceeds gen_batch_size"
-            )
+        assert concurrency <= max_concurrency, (
+            f"config error: concurrency {concurrency} exceeds max {max_concurrency}"
+        )

This also improves the error message by including the actual values for easier debugging.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5445937 and 7d6695c.

📒 Files selected for processing (25)
  • tests/integration/defs/perf/disagg/test_configs/disagg/perf/Qwen3-235B-A22B-FP4_1k1k_ctx1_gen1_dep16_bs64_eplb0_mtp3_ccb-NIXL.yaml
  • tests/integration/defs/perf/disagg/test_configs/disagg/perf/Qwen3-235B-A22B-FP4_1k1k_ctx1_gen1_dep16_bs64_eplb0_mtp3_ccb-UCX.yaml
  • tests/integration/defs/perf/disagg/test_configs/disagg/perf/Qwen3-235B-A22B-FP4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp0_ccb-NIXL.yaml
  • tests/integration/defs/perf/disagg/test_configs/disagg/perf/Qwen3-235B-A22B-FP4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp0_ccb-UCX.yaml
  • tests/integration/defs/perf/disagg/test_configs/disagg/perf/Qwen3-235B-A22B-FP4_1k1k_ctx2_gen1_dep16_bs128_eplb0_mtp1_ccb-NIXL.yaml
  • tests/integration/defs/perf/disagg/test_configs/disagg/perf/Qwen3-235B-A22B-FP4_1k1k_ctx2_gen1_dep16_bs128_eplb0_mtp1_ccb-UCX.yaml
  • tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx1_gen1_dep32_bs32_eplb0_mtp0_ccb-NIXL.yaml
  • tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx1_gen1_dep32_bs32_eplb0_mtp0_ccb-UCX.yaml
  • tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp0_ccb-NIXL.yaml
  • tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp0_ccb-UCX.yaml
  • tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp3_ccb-NIXL.yaml
  • tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp3_ccb-UCX.yaml
  • tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx2_gen1_dep16_bs128_eplb0_mtp3_ccb-NIXL.yaml
  • tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx2_gen1_dep16_bs128_eplb0_mtp3_ccb-UCX.yaml
  • tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx1_gen3_tep8_bs16_eplb0_mtp3_ccb-NIXL.yaml
  • tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx1_gen3_tep8_bs16_eplb0_mtp3_ccb-UCX.yaml
  • tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx1_gen3_tep8_bs32_eplb0_mtp0_ccb-NIXL.yaml
  • tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx1_gen3_tep8_bs32_eplb0_mtp0_ccb-UCX.yaml
  • tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx6_gen1_dep16_bs64_eplb0_mtp0_ccb-NIXL.yaml
  • tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx6_gen1_dep16_bs64_eplb0_mtp0_ccb-UCX.yaml
  • tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx8_gen1_dep32_bs16_eplb0_mtp3_ccb-NIXL.yaml
  • tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx8_gen1_dep32_bs16_eplb0_mtp3_ccb-UCX.yaml
  • tests/integration/defs/perf/disagg/test_configs/wideep/accuracy/deepseek-r1-fp4_1k1k_ctx2_gen1_dep16_bs128_eplb288_mtp3_ccb-NIXL.yaml
  • tests/integration/defs/perf/disagg/utils/common.py
  • tests/integration/defs/perf/disagg/utils/config_validator.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: The code developed for TensorRT-LLM should conform to Python 3.8+
Indent Python code with 4 spaces. Do not use tabs
Always maintain the namespace when importing Python modules, even if only one class or function from a module is used
Python filenames should use snake_case (e.g., some_file.py)
Python classes should use PascalCase (e.g., class SomeClass)
Python functions and methods should use snake_case (e.g., def my_awesome_function():)
Python local variables should use snake_case, with prefix k for variable names that start with a number (e.g., k_99th_percentile)
Python global variables should use upper snake_case with prefix G (e.g., G_MY_GLOBAL)
Python constants should use upper snake_case (e.g., MY_CONSTANT)
Avoid shadowing variables declared in an outer scope in Python
Initialize all externally visible members of a Python class in the constructor
For Python interfaces that may be used outside a file, prefer docstrings over comments
Use comments in Python for code within a function, or interfaces that are local to a file
Use Google-style docstrings for Python classes and functions, which can be parsed by Sphinx
Python attributes and variables can be documented inline with the format """<type>: Description"""
Avoid using reflection in Python when functionality can be easily achieved without reflection
When using try-except blocks in Python, limit the except clause to the smallest set of errors possible
When using try-except blocks in Python to handle multiple possible variable types (duck-typing), keep the body of the try as small as possible and use the else block for the main logic

Files:

  • tests/integration/defs/perf/disagg/utils/config_validator.py
  • tests/integration/defs/perf/disagg/utils/common.py
**/*.{cpp,cc,cxx,h,hpp,hxx,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

All TensorRT-LLM source files (.cpp, .h, .cu, .py, and other source files) should contain an NVIDIA copyright header with the year of latest meaningful modification

Files:

  • tests/integration/defs/perf/disagg/utils/config_validator.py
  • tests/integration/defs/perf/disagg/utils/common.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (23)
tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx1_gen1_dep32_bs32_eplb0_mtp0_ccb-UCX.yaml (1)

24-24: LGTM!

The updated concurrency_list value of 1024 correctly aligns with gen_batch_size × gen_tp_size (32 × 32 = 1024), which should pass the new concurrency validation.

tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx6_gen1_dep16_bs64_eplb0_mtp0_ccb-NIXL.yaml (1)

24-24: LGTM!

The concurrency_list value of 1024 aligns with gen_batch_size × gen_tp_size (64 × 16 = 1024).

tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx8_gen1_dep32_bs16_eplb0_mtp3_ccb-NIXL.yaml (1)

24-24: LGTM!

The concurrency_list value of 512 aligns with gen_batch_size × gen_tp_size (16 × 32 = 512).

tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx1_gen3_tep8_bs32_eplb0_mtp0_ccb-NIXL.yaml (1)

24-24: LGTM!

The concurrency values are now all powers of 2, with the max value of 32 matching max_batch_size. This aligns with the broader normalization effort in this PR.
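The power-of-2 observation is a reviewer remark rather than something the new validation is stated to enforce. For anyone who wants to check a list by hand, a small sketch follows. The helper name `is_power_of_two` and the sample list are assumptions for illustration; only the values 32 and 36 come from the review comments.

```python
def is_power_of_two(value):
    """Return True when value is a positive integer power of two."""
    # A power of two has exactly one bit set, so clearing the lowest set bit
    # (value & (value - 1)) leaves zero.
    return value > 0 and (value & (value - 1)) == 0


# Illustrative list; the actual YAML values are not shown in this review.
sample = [1, 2, 4, 8, 16, 32]
all_powers = all(is_power_of_two(v) for v in sample)

# 36 is the kind of value the PR replaces with 32.
old_value_ok = is_power_of_two(36)
```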

tests/integration/defs/perf/disagg/test_configs/disagg/perf/Qwen3-235B-A22B-FP4_1k1k_ctx2_gen1_dep16_bs128_eplb0_mtp1_ccb-NIXL.yaml (1)

24-24: LGTM!

The concurrency value of 2048 is a power of 2 that aligns with the CUDA graph batch sizes configuration (line 72) and is a sensible correction from the previous non-power-of-2 value.

tests/integration/defs/perf/disagg/test_configs/wideep/accuracy/deepseek-r1-fp4_1k1k_ctx2_gen1_dep16_bs128_eplb288_mtp3_ccb-NIXL.yaml (1)

31-31: LGTM!

The concurrency value update to 2048 aligns with the CUDA graph batch sizes and maintains consistent formatting with other single-value configurations.

tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx8_gen1_dep32_bs16_eplb0_mtp3_ccb-UCX.yaml (1)

24-24: LGTM!

The concurrency value of 512 correctly aligns with the effective capacity (max_batch_size × tensor_parallel_size = 16 × 32 = 512) when enable_attention_dp is true. This should pass the new concurrency validation.

tests/integration/defs/perf/disagg/test_configs/disagg/perf/Qwen3-235B-A22B-FP4_1k1k_ctx1_gen1_dep16_bs64_eplb0_mtp3_ccb-UCX.yaml (1)

24-24: LGTM - Concurrency value now aligns with gen config constraints.

The max concurrency is now 1024, which matches gen.max_batch_size (64) × gen.tensor_parallel_size (16) = 1024. This aligns with the new validation logic referenced in the PR.

tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx2_gen1_dep16_bs128_eplb0_mtp3_ccb-NIXL.yaml (1)

24-24: LGTM - Concurrency value correctly bounded.

Max concurrency 2048 aligns with gen.max_batch_size (128) × gen.tensor_parallel_size (16) = 2048.

tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx6_gen1_dep16_bs64_eplb0_mtp0_ccb-UCX.yaml (1)

24-24: LGTM - Concurrency aligned with gen config.

Max concurrency 1024 matches gen.max_batch_size (64) × gen.tensor_parallel_size (16) = 1024.

tests/integration/defs/perf/disagg/test_configs/disagg/perf/Qwen3-235B-A22B-FP4_1k1k_ctx2_gen1_dep16_bs128_eplb0_mtp1_ccb-UCX.yaml (1)

24-24: LGTM - Concurrency value correctly bounded.

Max concurrency 2048 aligns with gen.max_batch_size (128) × gen.tensor_parallel_size (16) = 2048.

tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp3_ccb-NIXL.yaml (1)

24-24: LGTM - Max concurrency corrected to power-of-2.

The max concurrency is now 32, matching gen.max_batch_size. The previous value of 36 was not a power of 2 and exceeded the batch size constraint.

tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx1_gen1_dep32_bs32_eplb0_mtp0_ccb-NIXL.yaml (1)

24-24: LGTM!

Changing concurrency from 1075 to 1024 aligns with the power-of-2 standardization pattern, which is appropriate for GPU workload scheduling and memory alignment.

tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx1_gen3_tep8_bs16_eplb0_mtp3_ccb-UCX.yaml (1)

24-24: LGTM!

Replacing 18 with 16 ensures all concurrency values are powers of 2, matching the max_batch_size: 16 configuration for the gen worker.

tests/integration/defs/perf/disagg/test_configs/disagg/perf/Qwen3-235B-A22B-FP4_1k1k_ctx1_gen1_dep16_bs64_eplb0_mtp3_ccb-NIXL.yaml (1)

24-24: LGTM!

Adjusting the upper concurrency bound from 1075 to 1024 maintains power-of-2 consistency across the list.

tests/integration/defs/perf/disagg/test_configs/disagg/perf/Qwen3-235B-A22B-FP4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp0_ccb-UCX.yaml (1)

24-24: LGTM!

Replacing 36 with 32 aligns the maximum concurrency value with max_batch_size: 32 and maintains power-of-2 consistency.

tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp0_ccb-UCX.yaml (1)

24-24: LGTM!

Replacing 36 with 32 aligns the maximum concurrency value with max_batch_size: 32 and ensures all concurrency levels are powers of 2.

tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp0_ccb-NIXL.yaml (1)

24-24: LGTM!

The concurrency_list upper bound is correctly reduced from 36 to 32 to align with the gen_batch_size of 32 (line 54). With enable_attention_dp: false, the new validation logic enforces concurrency <= gen_batch_size, which this change now satisfies.

tests/integration/defs/perf/disagg/test_configs/disagg/perf/Qwen3-235B-A22B-FP4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp0_ccb-NIXL.yaml (1)

24-24: LGTM!

Concurrency list properly adjusted to stay within the gen_batch_size limit of 32 when enable_attention_dp is false.

tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx2_gen1_dep16_bs128_eplb0_mtp3_ccb-UCX.yaml (1)

24-24: LGTM!

The concurrency value is correctly reduced from 2150 to 2048 to align with the maximum allowed when enable_attention_dp: true (i.e., gen_batch_size * gen_tp_size = 128 * 16 = 2048).
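The arithmetic quoted across the attention-DP comments can be checked in one pass. The sketch below uses only the batch sizes, TP sizes, and old/new concurrency values mentioned in the review comments; the short labels and the helper `check_cases` are shorthand introduced for this example.

```python
# (label, gen_batch_size, gen_tp_size, old max concurrency, new max concurrency)
# Numbers are the ones quoted in the review comments; labels are shorthand.
CASES = [
    ("dep32_bs32", 32, 32, 1075, 1024),
    ("dep16_bs64", 64, 16, 1075, 1024),
    ("dep16_bs128", 128, 16, 2150, 2048),
]


def check_cases(cases):
    """Return {label: (capacity, old_fits, new_fits)} for each case."""
    report = {}
    for label, batch_size, tp_size, old, new in cases:
        # With enable_attention_dp: true the limit is batch size * TP size.
        capacity = batch_size * tp_size
        report[label] = (capacity, old <= capacity, new <= capacity)
    return report


REPORT = check_cases(CASES)
```

In each case the old value exceeds the computed limit and the new value equals it, which is why the updated configs are expected to pass the check.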

tests/integration/defs/perf/disagg/utils/common.py (1)

232-232: LGTM!

The parsed concurrency_list is correctly exposed in the returned configuration dictionary for downstream validation.
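The parsing code itself is not shown in this review, so the following is only a sketch of what normalizing such a field could look like. It assumes, without confirmation from the source, that `concurrency_list` may arrive as a whitespace- or comma-separated string, a single integer, or a list; `parse_concurrency_list` is a hypothetical name, not the function in `common.py`.

```python
def parse_concurrency_list(raw):
    """Normalize a concurrency_list field to a list of ints.

    Hypothetical helper; the accepted input shapes are assumptions.
    """
    if isinstance(raw, int):
        # A single value such as 1024.
        return [raw]
    if isinstance(raw, str):
        # Assumed string form, e.g. "1 2 4 8 16 32" or "1,2,4".
        return [int(token) for token in raw.replace(",", " ").split()]
    # Otherwise treat it as an iterable of int-like items.
    return [int(item) for item in raw]


parsed = parse_concurrency_list("1 2 4 8 16 32")
```

Returning a plain list of integers lets a downstream validator compare each entry directly against the batch-size limit.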

tests/integration/defs/perf/disagg/utils/config_validator.py (2)

40-41: LGTM!

The new concurrency list validation is correctly integrated into the validation flow.


56-60: LGTM!

The relaxed assertion from == to >= is appropriate and well-documented with the developer confirmation comment.

@tensorrt-cicd
Collaborator

PR_Github #31498 [ skip ] triggered by Bot. Commit: 7d6695c

@tensorrt-cicd
Collaborator

PR_Github #31491 [ skip ] completed with state ABORTED. Commit: aa9bfbd

@fredricz-20070104
Collaborator Author

/bot skip --comment "Not cover in CI pipelines"

@tensorrt-cicd
Collaborator

PR_Github #31505 [ skip ] triggered by Bot. Commit: 1c42664

@tensorrt-cicd
Collaborator

PR_Github #31498 [ skip ] completed with state ABORTED. Commit: 7d6695c

@fredricz-20070104
Collaborator Author

/bot run --skip-test

@tensorrt-cicd
Collaborator

PR_Github #31509 [ run ] triggered by Bot. Commit: 1c42664

@tensorrt-cicd
Collaborator

PR_Github #31505 [ skip ] completed with state ABORTED. Commit: 1c42664

@tensorrt-cicd
Collaborator

PR_Github #31509 [ run ] completed with state SUCCESS. Commit: 1c42664
/LLM/main/L0_MergeRequest_PR pipeline #24360 (Partly Tested) completed with status: 'SUCCESS'

@fredricz-20070104
Collaborator Author

/bot reuse-pipeline

@tensorrt-cicd
Collaborator

PR_Github #31646 [ reuse-pipeline ] triggered by Bot. Commit: 1c42664

@fredricz-20070104
Collaborator Author

/bot reuse-pipeline

@tensorrt-cicd
Collaborator

PR_Github #31653 [ reuse-pipeline ] triggered by Bot. Commit: d4e83aa

@tensorrt-cicd
Collaborator

PR_Github #31646 [ reuse-pipeline ] completed with state ABORTED. Commit: 1c42664
Can't reuse PR_Github #31509 (Partly Tested) with status: SUCCESS

@tensorrt-cicd
Collaborator

PR_Github #31653 [ reuse-pipeline ] completed with state SUCCESS. Commit: d4e83aa
Reusing PR_Github #31509 (Partly Tested) for commit d4e83aa

@fredricz-20070104 fredricz-20070104 merged commit bbe535f into NVIDIA:main Jan 13, 2026
5 checks passed
videodanchik pushed a commit to videodanchik/TensorRT-LLM that referenced this pull request Jan 14, 2026
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
Signed-off-by: Daniil Kulko <kulkodaniil@gmail.com>
4 participants