[TRTLLM-10845][feat] Add dynamic llmapi defaults system #11035

Merged
venkywonka merged 41 commits into NVIDIA:main from venkywonka:venky/model-specific-defaults
Feb 18, 2026

Conversation

@venkywonka
Collaborator

@venkywonka venkywonka commented Jan 27, 2026

Motivation and Design: #11083

Summary:
This PR introduces a dynamic model-specific defaults system that allows models to declare their required configurations, eliminating the need for users and tests to manually specify model-specific constraints.

Problem:
Previously, users had to manually configure model-specific settings like disabling KV cache block reuse for SSM models. This was error-prone and required deep knowledge of model architectures.

Solution:

  • Added get_model_defaults() classmethod to model classes
  • ModelLoader automatically applies these defaults before model initialization
  • User overrides are still respected
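The pattern can be sketched roughly as follows (class and field names follow the PR summary; the merge helper and its exact semantics are assumptions, not the actual TensorRT-LLM code):

```python
# Hypothetical sketch: an SSM-style model declares that KV cache block reuse
# must be off by default, and a loader-side helper deep-merges those defaults
# underneath the user's explicit settings. Not the actual implementation.

class Qwen3NextForCausalLM:
    @classmethod
    def get_model_defaults(cls, llm_args=None) -> dict:
        # SSM/linear-attention models cannot safely reuse KV cache blocks.
        return {"kv_cache_config": {"enable_block_reuse": False}}

def deep_merge_defaults(user_overrides: dict, defaults: dict) -> dict:
    """Return defaults overlaid with user_overrides; user values win."""
    merged = dict(defaults)
    for key, value in user_overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge_defaults(value, merged[key])
        else:
            merged[key] = value
    return merged
```

With no user input, the merged args disable block reuse; an explicit `{"kv_cache_config": {"enable_block_reuse": True}}` from the user still wins.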

Explicitly disabling enable_block_reuse is no longer necessary for the following models:

  • Qwen3Next
  • NemotronH

Benefits:

  • Models work correctly out of the box without manual tweaking
  • Self-documenting model requirements
  • User overrides still possible when needed

Other changes

  • Cleaned up how trtllm-serve applies CLI overrides so that pydantic defaults are not contaminated by the serve frontend's own defaults (which are redundant duplicates of the pydantic defaults)
  • Moved the fail_fast_on_attention_window_too_large attribute from BaseLlmArgs to TrtLlmArgs

Summary by CodeRabbit

Release Notes

  • New Features

    • Added support for LlavaLlamaForCausalLM model type alias.
    • Enhanced model architecture resolution with improved handling for specialized model variants.
    • Implemented automatic model-specific configuration defaults for optimized performance.
  • Improvements

    • Improved model initialization flow with better configuration management.
    • Extended test coverage for model defaults and argument handling.

@venkywonka venkywonka changed the title [None][feat] Add model-specific defaults and 2-phase ckpt loading [TRTC-119][feat] Add model-specific defaults and 2-phase ckpt loading Jan 29, 2026
@venkywonka venkywonka changed the title [TRTC-119][feat] Add model-specific defaults and 2-phase ckpt loading [TRTC-119][feat] Add dynamic llmapi defaults system Jan 29, 2026
@venkywonka
Collaborator Author

/bot run --disable-fail-fast

@venkywonka venkywonka self-assigned this Jan 30, 2026
@tensorrt-cicd
Collaborator

PR_Github #34155 [ run ] triggered by Bot. Commit: b768547

@tensorrt-cicd
Collaborator

PR_Github #34155 [ run ] completed with state SUCCESS. Commit: b768547
/LLM/main/L0_MergeRequest_PR pipeline #26352 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@venkywonka
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #34495 [ run ] triggered by Bot. Commit: 5b9275f

@tensorrt-cicd
Collaborator

PR_Github #34495 [ run ] completed with state FAILURE. Commit: 5b9275f
/LLM/main/L0_MergeRequest_PR pipeline #26615 completed with status: 'FAILURE'


@venkywonka
Collaborator Author

/bot run --disable-fail-fast

@venkywonka
Collaborator Author

/bot kill

@venkywonka
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #34520 [ run ] triggered by Bot. Commit: 0c563a1

@venkywonka venkywonka marked this pull request as ready for review February 3, 2026 02:15
@venkywonka venkywonka requested review from a team as code owners February 3, 2026 02:15
@venkywonka venkywonka force-pushed the venky/model-specific-defaults branch from 5c6123a to ed420ac February 3, 2026 02:16
@coderabbitai
Contributor

coderabbitai bot commented Feb 3, 2026

📝 Walkthrough

Walkthrough

This pull request implements a model-defaults framework that allows individual model architectures to define configuration defaults, which are resolved during model loading, validated against LLM arguments, and applied before runtime feature construction. The system integrates defaults resolution into the model initialization pipeline across loading, execution, and CLI layers.

Changes

  • Model Class Resolution (tensorrt_llm/_torch/models/modeling_auto.py):
    New _resolve_class() static method centralizes model architecture resolution logic, handling Eagle3 checkpoints, draft model remapping, and vision encoder selection. Replaces inline architecture resolution in from_config().
  • Model-Specific Defaults Registration (tensorrt_llm/_torch/models/modeling_hunyuan_dense.py, tensorrt_llm/_torch/models/modeling_nemotron_h.py, tensorrt_llm/_torch/models/modeling_qwen3_next.py, tensorrt_llm/_torch/models/modeling_vila.py):
    New get_model_defaults() classmethod added to the HunYuan, Nemotron, and Qwen3 models, returning KV cache configurations with enable_block_reuse: False. The Vila model gains an additional auto-model decorator registration for "LlavaLlamaForCausalLM".
  • Config Loading & Defaults Application (tensorrt_llm/_torch/pyexecutor/model_loader.py):
    New load_config_and_apply_defaults() static method loads the model config, resolves the model class, retrieves and validates defaults, then applies them to the LLM arguments. Removes the deprecated layer-override warning and memo cleanup.
  • Executor Initialization (tensorrt_llm/_torch/pyexecutor/py_executor_creator.py):
    Defers AttentionRuntimeFeatures construction until after model defaults are loaded and applied, ensuring runtime features reflect the updated LLM arguments (e.g., enable_chunked_prefill, kv_cache_config).
  • CLI Defaults Processing (tensorrt_llm/commands/serve.py):
    Introduces an is_non_default_or_required() helper function to conditionally include LLM parameters. Refactors get_llm_args() to build configuration from backend-specific overrides, filtering only non-default or required parameters into the final LLM args.
  • Defaults Utilities (tensorrt_llm/llmapi/llm_utils.py):
    New public utilities for defaults handling: extract_llm_args_overrides(), validate_model_defaults(), apply_model_defaults_to_llm_args(), merge_llm_configs_with_defaults(), compute_applied_llm_defaults(). Updates _load_model_from_hf() to fetch and apply model defaults with validation and logging.
  • Test Coverage (tests/unittest/_torch/modeling/test_modeling_nemotron_h.py, tests/unittest/llmapi/test_llm_args.py):
    Nemotron tests make the KV cache config conditional on the mamba SSM cache dtype. Extensive new test cases cover defaults merging, application, extraction, validation, and end-to-end integration with the PyTorch and TensorRT backends.
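The CLI filtering idea from serve.py can be sketched as follows (illustrated with dataclasses for self-containment; the real code uses pydantic models, and the field names below are invented for illustration):

```python
# Hypothetical sketch of is_non_default_or_required(): a CLI value is
# forwarded only when the field is required or the value differs from the
# declared default, so the serve frontend's redundant defaults never shadow
# model-specific defaults applied later. Not the actual implementation.
from dataclasses import MISSING, dataclass, fields

@dataclass
class LlmArgs:
    model: str                       # required: no default
    max_batch_size: int = 256
    enable_chunked_prefill: bool = False

def is_non_default_or_required(cls, name: str, value) -> bool:
    f = next(f for f in fields(cls) if f.name == name)
    if f.default is MISSING and f.default_factory is MISSING:
        return True                  # required fields always pass through
    return value != f.default

def filter_cli_overrides(cli_values: dict) -> dict:
    """Keep only values that would actually change the resulting config."""
    return {k: v for k, v in cli_values.items()
            if is_non_default_or_required(LlmArgs, k, v)}
```

Passing `max_batch_size=256` (equal to its default) is dropped, so a later model-specific default for that field would still take effect.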

Sequence Diagram(s)

sequenceDiagram
    participant App as Application
    participant PEC as PyExecutorCreator
    participant ML as ModelLoader
    participant AC as AutoModelForCausalLM
    participant ModelClass as Model Class
    participant Utils as LlmUtils
    participant Config as Config

    App->>PEC: create_py_executor(checkpoint_dir, llm_args)
    PEC->>ML: load_config_and_apply_defaults(checkpoint_dir, llm_args)
    ML->>Config: Load model config from checkpoint
    Config-->>ML: ModelConfig
    ML->>AC: _resolve_class(config)
    AC->>AC: Determine model architecture
    AC-->>ML: Model class
    ML->>ModelClass: get_model_defaults(llm_args)
    alt Model provides defaults
        ModelClass-->>ML: model_defaults dict
        ML->>Utils: validate_model_defaults(defaults, llm_args)
        Utils-->>ML: validated_defaults
        ML->>Utils: apply_model_defaults_to_llm_args(llm_args, defaults)
        Utils-->>ML: updated_llm_args
    else No defaults
        ML-->>ML: Use llm_args as-is
    end
    ML-->>PEC: updated_llm_args
    PEC->>PEC: Create AttentionRuntimeFeatures with updated llm_args
    PEC->>PEC: Construct PyExecutor
    PEC-->>App: PyExecutor
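The resolve/validate/apply path in the diagram, condensed into a plain-Python sketch (names follow the walkthrough, but signatures and semantics are assumptions, not the actual implementation):

```python
# Hypothetical sketch of the defaults pipeline from the sequence diagram.
# validate_model_defaults rejects keys that are not real LLM-arg fields;
# apply_model_defaults_to_llm_args merges defaults in without clobbering
# user-set values. The field names below are invented for illustration.

KNOWN_FIELDS = {"kv_cache_config", "enable_chunked_prefill"}

def validate_model_defaults(defaults: dict) -> dict:
    unknown = set(defaults) - KNOWN_FIELDS
    if unknown:
        raise ValueError(f"Model defaults reference unknown fields: {sorted(unknown)}")
    return defaults

def apply_model_defaults_to_llm_args(llm_args: dict, defaults: dict,
                                     user_set: set) -> dict:
    # Only fields the user did not explicitly set pick up the model default.
    updated = dict(llm_args)
    for key, value in validate_model_defaults(defaults).items():
        if key not in user_set:
            updated[key] = value
    return updated
```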

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 37.74%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
✅ Passed checks (2 passed)
  • Title check (✅ Passed): The title accurately describes the main change: introducing a dynamic, model-specific defaults system for LLM API configuration. It reflects the core objective of allowing models to declare required defaults that are applied automatically.
  • Description check (✅ Passed): The PR description clearly explains motivation, problem, solution, and benefits with specific examples of affected models.


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tensorrt_llm/_torch/models/modeling_auto.py (1)

12-63: ⚠️ Potential issue | 🟡 Minor

Guard empty architectures to avoid IndexError in error paths.

If pretrained_config.architectures is empty, _resolve_class returns None, but from_config still formats the error using architectures[0], which raises IndexError instead of the intended ValueError. A small guard keeps the error deterministic.

🛠️ Suggested fix
-        if not pretrained_config.architectures:
+        architectures = pretrained_config.architectures or []
+        if not architectures:
             return None
-
-        model_arch = pretrained_config.architectures[0]
+        model_arch = architectures[0]
@@
-        if config.mm_encoder_only:
-            model_arch = config.pretrained_config.architectures[0]
+        if config.mm_encoder_only:
+            architectures = config.pretrained_config.architectures or []
+            if not architectures:
+                raise ValueError(
+                    "Missing architectures in config for multimodal encoder-only mode."
+                )
+            model_arch = architectures[0]
@@
-            raise ValueError(
-                f"Unknown architecture for AutoModelForCausalLM: {config.pretrained_config.architectures[0]}"
-            )
+            model_arch = (config.pretrained_config.architectures[0]
+                          if config.pretrained_config.architectures else
+                          "<unknown>")
+            raise ValueError(
+                f"Unknown architecture for AutoModelForCausalLM: {model_arch}"
+            )
🤖 Fix all issues with AI agents
In `@tensorrt_llm/_torch/models/modeling_hunyuan_dense.py`:
- Around line 626-632: Import the LLM argument type using a namespaced module
import (e.g., import the module that defines BaseLlmArgs and reference it as
module.BaseLlmArgs) to avoid F821, and mark the unused parameter in
get_model_defaults by renaming or explicitly ignoring it (e.g., prefix with
underscore or add a no-op assignment like _ = llm_args) to silence ARG003;
ensure the function signature still reads get_model_defaults(cls, llm_args:
'module.BaseLlmArgs') and keep the return dict unchanged.

In `@tensorrt_llm/_torch/models/modeling_nemotron_h.py`:
- Around line 531-537: Import the module that defines BaseLlmArgs under a
namespace and update the annotation and parameter handling in
get_model_defaults: change the type hint from the bare 'BaseLlmArgs' to the
namespaced form (e.g., "base_llm_args.BaseLlmArgs" or base_llm_args.BaseLlmArgs)
by adding a namespace import (import base_llm_args), and mark the llm_args
parameter as intentionally unused by assigning it to a throwaway variable (e.g.,
_ = llm_args) or renaming to _llm_args to silence ARG003; keep the function name
get_model_defaults and the returned dict unchanged.

In `@tensorrt_llm/_torch/models/modeling_qwen3_next.py`:
- Around line 1254-1257: Import the module that defines BaseLlmArgs under a
namespace and update get_model_defaults to use that namespaced annotation and
mark the parameter as intentionally unused; specifically, add a namespace import
for the module that provides BaseLlmArgs (e.g., import as base), change the
signature of get_model_defaults to use base.BaseLlmArgs for the llm_args
annotation, and either rename the parameter to _llm_args or add a short
assignment _ = llm_args inside get_model_defaults to silence the unused-arg lint
(keep the method name get_model_defaults unchanged).

In `@tensorrt_llm/_torch/pyexecutor/py_executor_creator.py`:
- Around line 344-350: The logger.info call currently passes
attn_runtime_features as a separate formatting argument without a placeholder
which triggers a TypeError; update the call that logs AttentionRuntimeFeatures
(the variable attn_runtime_features and the AttentionRuntimeFeatures
instantiation) to include a proper format placeholder (e.g., use "%s") or pass a
single formatted string so logger.info receives matching arguments and no
formatting error occurs.

In `@tensorrt_llm/llmapi/llm_utils.py`:
- Around line 33-43: The import list is pulling in SchedulerConfig from
bindings.executor but you also import SchedulerConfig from .llm_args, causing
the bindings SchedulerConfig to be unused and shadowed; remove SchedulerConfig
from the bindings.executor import (the import on the line that imports
bindings.executor symbols) so only the wrapper SchedulerConfig from llm_args
remains, and if there are other unused symbols imported from bindings.executor
consolidate or remove them as well to avoid redundant bindings imports; ensure
references to SchedulerConfig in this module continue to resolve to the
.llm_args SchedulerConfig (no other code changes needed).
🧹 Nitpick comments (6)
tensorrt_llm/_torch/pyexecutor/py_executor_creator.py (1)

332-340: Keep module namespace on the model_loader import.

Line 332 uses from ... import ..., which drops the module namespace. Import the module and qualify uses to align with repo conventions.

♻️ Suggested refactor
-from tensorrt_llm._torch.pyexecutor.model_loader import ModelLoader, _construct_checkpoint_loader
+import tensorrt_llm._torch.pyexecutor.model_loader as model_loader
@@
-checkpoint_loader = _construct_checkpoint_loader(llm_args.backend,
-                                                 llm_args.checkpoint_loader,
-                                                 llm_args.checkpoint_format)
+checkpoint_loader = model_loader._construct_checkpoint_loader(
+    llm_args.backend, llm_args.checkpoint_loader, llm_args.checkpoint_format)
@@
-llm_args = ModelLoader.load_config_and_apply_defaults(
-    checkpoint_dir, llm_args, checkpoint_loader)
+llm_args = model_loader.ModelLoader.load_config_and_apply_defaults(
+    checkpoint_dir, llm_args, checkpoint_loader)

As per coding guidelines: Always maintain the namespace when importing Python modules, even if only one class or function from a module is used.

tests/unittest/llmapi/test_llm_args.py (2)

1102-1114: Consider using pytest's tmp_path fixture instead of hardcoded /tmp path.

The hardcoded /tmp/dummy_model_for_validation_test path raises a security concern (flagged by S108). Since this is for validation testing where the path doesn't need to exist, this is acceptable, but using tmp_path fixture would be more robust.

♻️ Optional: Use pytest's tmp_path fixture
-def test_model_defaults_validation(defaults_dict, should_raise, error_contains):
+def test_model_defaults_validation(defaults_dict, should_raise, error_contains, tmp_path):
     # Use a dummy model path for testing (doesn't need to exist for validation)
-    llm_args = TorchLlmArgs(model="/tmp/dummy_model_for_validation_test")
+    llm_args = TorchLlmArgs(model=str(tmp_path / "dummy_model_for_validation_test"))

1140-1170: Remove unused monkeypatch parameter from fixture.

The monkeypatch parameter in the setup fixture is unused (static analysis ARG002). It can be removed since class attribute manipulation is done directly via setattr.

♻️ Remove unused parameter
     @pytest.fixture(autouse=True)
-    def setup(self, monkeypatch, tmp_path):
+    def setup(self, tmp_path):
         self.get_model_defaults_called = False
         self.tmp_path = tmp_path
tensorrt_llm/_torch/pyexecutor/model_loader.py (1)

217-241: Well-designed static method for early defaults application.

The method correctly:

  1. Guards against None checkpoint_loader
  2. Uses _resolve_class to get the model class without instantiation
  3. Validates defaults before applying them
  4. Logs applied defaults at debug level

One consideration: The load_config call with trust_remote_code=True is hardcoded. This may not align with the user's preference if they explicitly set trust_remote_code=False in their llm_args.

♻️ Consider respecting user's trust_remote_code preference
     @staticmethod
     def load_config_and_apply_defaults(
             checkpoint_dir: str, llm_args: TorchLlmArgs,
             checkpoint_loader: BaseCheckpointLoader) -> TorchLlmArgs:
         """Load model config and apply model-specific defaults to llm_args."""
         if checkpoint_loader is None:
             return llm_args

         config = checkpoint_loader.load_config(checkpoint_dir,
-                                               trust_remote_code=True)
+                                               trust_remote_code=llm_args.trust_remote_code)
tensorrt_llm/commands/serve.py (1)

167-198: Consider the empty dict default for cp_config.

On line 185, cp_config defaults to an empty dict {} when not provided. This empty dict will always pass the is_non_default_or_required check since {} != field_default. If the intention is to exclude cp_config when not explicitly set, you may want to handle this similarly to kv_cache_config.

♻️ Consider using None instead of empty dict for unset cp_config
         "cp_config":
-        cp_config if cp_config else {},
+        cp_config if cp_config else None,
tensorrt_llm/llmapi/llm_utils.py (1)

1014-1033: Loop variable field shadows the field import from dataclasses.

The variable name field on line 1026 shadows the field import from dataclasses on line 7. While this doesn't cause a runtime issue here, it's a code smell that could lead to confusion.

♻️ Rename loop variable to avoid shadowing
     overrides = {}
-    for field in fields_set:
-        value = getattr(llm_args, field)
+    for field_name in fields_set:
+        value = getattr(llm_args, field_name)
         if isinstance(value, BaseModel):
             nested = extract_llm_args_overrides(value)
-            overrides[field] = nested
+            overrides[field_name] = nested
         else:
-            overrides[field] = value
+            overrides[field_name] = value
     return overrides
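For context on the function being discussed, a minimal sketch of what extract_llm_args_overrides() is described as doing (recursing into nested models and collecting only explicitly-set fields), using a stand-in class rather than real pydantic models:

```python
# Hypothetical sketch. FakeModel mimics just enough of a pydantic BaseModel
# (the model_fields_set attribute) to show the recursion; it is not the real
# BaseLlmArgs, and the helper below is not the actual implementation.
class FakeModel:
    """Minimal stand-in for a pydantic BaseModel."""
    def __init__(self, **kwargs):
        self.model_fields_set = set(kwargs)  # fields the "user" explicitly set
        for k, v in kwargs.items():
            setattr(self, k, v)

def extract_llm_args_overrides(args: FakeModel) -> dict:
    """Collect only explicitly-set fields, recursing into nested models."""
    overrides = {}
    for field_name in args.model_fields_set:
        value = getattr(args, field_name)
        if isinstance(value, FakeModel):
            overrides[field_name] = extract_llm_args_overrides(value)
        else:
            overrides[field_name] = value
    return overrides
```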

@tensorrt-cicd
Collaborator

PR_Github #34520 [ run ] completed with state SUCCESS. Commit: 0c563a1
/LLM/main/L0_MergeRequest_PR pipeline #26635 completed with status: 'FAILURE'


@venkywonka venkywonka force-pushed the venky/model-specific-defaults branch from ed420ac to c746d57 February 3, 2026 06:18
@venkywonka
Collaborator Author

/bot run --disable-fail-fast

Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
@venkywonka venkywonka force-pushed the venky/model-specific-defaults branch from e4930ac to 339017a February 12, 2026 16:06
@venkywonka
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #35798 [ run ] triggered by Bot. Commit: 339017a

@tensorrt-cicd
Collaborator

PR_Github #35798 [ run ] completed with state FAILURE. Commit: 339017a
/LLM/main/L0_MergeRequest_PR pipeline #27650 completed with status: 'FAILURE'


Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
@venkywonka
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #35818 [ run ] triggered by Bot. Commit: 99c8d03

@tensorrt-cicd
Collaborator

PR_Github #35818 [ run ] completed with state SUCCESS. Commit: 99c8d03
/LLM/main/L0_MergeRequest_PR pipeline #27666 completed with status: 'FAILURE'


@venkywonka
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #35833 [ run ] triggered by Bot. Commit: 99c8d03

@tensorrt-cicd
Collaborator

PR_Github #35833 [ run ] completed with state SUCCESS. Commit: 99c8d03
/LLM/main/L0_MergeRequest_PR pipeline #27677 completed with status: 'FAILURE'


@venkywonka
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #35850 [ run ] triggered by Bot. Commit: cc55564

@tensorrt-cicd
Collaborator

PR_Github #35850 [ run ] completed with state FAILURE. Commit: cc55564

@venkywonka
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #35901 [ run ] triggered by Bot. Commit: cc55564

@tensorrt-cicd
Collaborator

PR_Github #35901 [ run ] completed with state SUCCESS. Commit: cc55564
/LLM/main/L0_MergeRequest_PR pipeline #27731 completed with status: 'FAILURE'


@venkywonka
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #36094 [ run ] triggered by Bot. Commit: cc55564

@tensorrt-cicd
Collaborator

PR_Github #36094 [ run ] completed with state SUCCESS. Commit: cc55564
/LLM/main/L0_MergeRequest_PR pipeline #27890 completed with status: 'FAILURE'


@venkywonka
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #36100 [ run ] triggered by Bot. Commit: cc55564

@tensorrt-cicd
Collaborator

PR_Github #36100 [ run ] completed with state SUCCESS. Commit: cc55564
/LLM/main/L0_MergeRequest_PR pipeline #27895 completed with status: 'FAILURE'


@venkywonka
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #36120 [ run ] triggered by Bot. Commit: cc55564

@tensorrt-cicd
Collaborator

PR_Github #36120 [ run ] completed with state SUCCESS. Commit: cc55564
/LLM/main/L0_MergeRequest_PR pipeline #27910 completed with status: 'SUCCESS'

@venkywonka venkywonka merged commit 776132f into NVIDIA:main Feb 18, 2026
5 checks passed
