[TRTLLM-10845][feat] Add dynamic llmapi defaults system #11035
venkywonka merged 41 commits into NVIDIA:main from
Conversation
/bot run --disable-fail-fast
PR_Github #34155 [ run ] triggered by Bot. Commit:
PR_Github #34155 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #34495 [ run ] triggered by Bot. Commit:
PR_Github #34495 [ run ] completed with state
/bot run --disable-fail-fast
/bot kill
/bot run --disable-fail-fast
PR_Github #34520 [ run ] triggered by Bot. Commit:
5c6123a to ed420ac Compare
📝 Walkthrough

This pull request implements a model-defaults framework that allows individual model architectures to define configuration defaults, which are resolved during model loading, validated against LLM arguments, and applied before runtime feature construction. The system integrates defaults resolution into the model initialization pipeline across the loading, execution, and CLI layers.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant App as Application
    participant PEC as PyExecutorCreator
    participant ML as ModelLoader
    participant AC as AutoModelForCausalLM
    participant ModelClass as Model Class
    participant Utils as LlmUtils
    participant Config as Config

    App->>PEC: create_py_executor(checkpoint_dir, llm_args)
    PEC->>ML: load_config_and_apply_defaults(checkpoint_dir, llm_args)
    ML->>Config: Load model config from checkpoint
    Config-->>ML: ModelConfig
    ML->>AC: _resolve_class(config)
    AC->>AC: Determine model architecture
    AC-->>ML: Model class
    ML->>ModelClass: get_model_defaults(llm_args)
    alt Model provides defaults
        ModelClass-->>ML: model_defaults dict
        ML->>Utils: validate_model_defaults(defaults, llm_args)
        Utils-->>ML: validated_defaults
        ML->>Utils: apply_model_defaults_to_llm_args(llm_args, defaults)
        Utils-->>ML: updated_llm_args
    else No defaults
        ML-->>ML: Use llm_args as-is
    end
    ML-->>PEC: updated_llm_args
    PEC->>PEC: Create AttentionRuntimeFeatures with updated llm_args
    PEC->>PEC: Construct PyExecutor
    PEC-->>App: PyExecutor
```
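The flow in the diagram can be sketched in plain Python. Everything below (the `LlmArgs` stand-in, `apply_model_defaults`, the `SSMModel` example) is a simplified illustration of the mechanism, not the PR's actual code:

```python
# Hypothetical, simplified sketch of the defaults-resolution flow above.
# Names mirror the PR's concepts but the bodies are stand-ins.
from dataclasses import dataclass, field
from typing import Any, Dict, Set


@dataclass
class LlmArgs:
    # One flag stands in for the full TorchLlmArgs surface.
    enable_block_reuse: bool = True
    # Fields the user set explicitly; those must win over model defaults.
    explicitly_set: Set[str] = field(default_factory=set)


class SSMModel:
    @classmethod
    def get_model_defaults(cls, llm_args: LlmArgs) -> Dict[str, Any]:
        # SSM-style models cannot safely reuse KV cache blocks.
        return {"enable_block_reuse": False}


def apply_model_defaults(llm_args: LlmArgs, defaults: Dict[str, Any]) -> LlmArgs:
    for name, value in defaults.items():
        if name not in llm_args.explicitly_set:  # user overrides win
            setattr(llm_args, name, value)
    return llm_args


args = apply_model_defaults(LlmArgs(), SSMModel.get_model_defaults(LlmArgs()))
print(args.enable_block_reuse)  # → False
```

The key design point the diagram encodes: defaults come from the resolved model class before the executor is built, so runtime features see the already-adjusted arguments.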
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tensorrt_llm/_torch/models/modeling_auto.py (1)
12-63: ⚠️ Potential issue | 🟡 Minor — Guard empty `architectures` to avoid `IndexError` in error paths.

If `pretrained_config.architectures` is empty, `_resolve_class` returns `None`, but `from_config` still formats the error using `architectures[0]`, which raises `IndexError` instead of the intended `ValueError`. A small guard keeps the error deterministic.

🛠️ Suggested fix

```diff
-        if not pretrained_config.architectures:
+        architectures = pretrained_config.architectures or []
+        if not architectures:
             return None
-
-        model_arch = pretrained_config.architectures[0]
+        model_arch = architectures[0]
@@
-        if config.mm_encoder_only:
-            model_arch = config.pretrained_config.architectures[0]
+        if config.mm_encoder_only:
+            architectures = config.pretrained_config.architectures or []
+            if not architectures:
+                raise ValueError(
+                    "Missing architectures in config for multimodal encoder-only mode."
+                )
+            model_arch = architectures[0]
@@
-        raise ValueError(
-            f"Unknown architecture for AutoModelForCausalLM: {config.pretrained_config.architectures[0]}"
-        )
+        model_arch = (config.pretrained_config.architectures[0]
+                      if config.pretrained_config.architectures else
+                      "<unknown>")
+        raise ValueError(
+            f"Unknown architecture for AutoModelForCausalLM: {model_arch}"
+        )
```
🤖 Fix all issues with AI agents
In `@tensorrt_llm/_torch/models/modeling_hunyuan_dense.py`:
- Around line 626-632: Import the LLM argument type using a namespaced module
import (e.g., import the module that defines BaseLlmArgs and reference it as
module.BaseLlmArgs) to avoid F821, and mark the unused parameter in
get_model_defaults by renaming or explicitly ignoring it (e.g., prefix with
underscore or add a no-op assignment like _ = llm_args) to silence ARG003;
ensure the function signature still reads get_model_defaults(cls, llm_args:
'module.BaseLlmArgs') and keep the return dict unchanged.
In `@tensorrt_llm/_torch/models/modeling_nemotron_h.py`:
- Around line 531-537: Import the module that defines BaseLlmArgs under a
namespace and update the annotation and parameter handling in
get_model_defaults: change the type hint from the bare 'BaseLlmArgs' to the
namespaced form (e.g., "base_llm_args.BaseLlmArgs" or base_llm_args.BaseLlmArgs)
by adding a namespace import (import base_llm_args), and mark the llm_args
parameter as intentionally unused by assigning it to a throwaway variable (e.g.,
_ = llm_args) or renaming to _llm_args to silence ARG003; keep the function name
get_model_defaults and the returned dict unchanged.
In `@tensorrt_llm/_torch/models/modeling_qwen3_next.py`:
- Around line 1254-1257: Import the module that defines BaseLlmArgs under a
namespace and update get_model_defaults to use that namespaced annotation and
mark the parameter as intentionally unused; specifically, add a namespace import
for the module that provides BaseLlmArgs (e.g., import as base), change the
signature of get_model_defaults to use base.BaseLlmArgs for the llm_args
annotation, and either rename the parameter to _llm_args or add a short
assignment _ = llm_args inside get_model_defaults to silence the unused-arg lint
(keep the method name get_model_defaults unchanged).
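The three agent fixes above share one pattern: annotate via a namespaced module reference and mark the unused parameter. A minimal sketch of that pattern, using `argparse` purely as a stand-in for the module that defines `BaseLlmArgs`:

```python
# Sketch of the namespaced-annotation / unused-parameter pattern.
# `argparse` is a stand-in namespace here; the real code would import
# the tensorrt_llm module that defines BaseLlmArgs instead.
import argparse


class ExampleModel:
    @classmethod
    def get_model_defaults(cls, _llm_args: "argparse.Namespace") -> dict:
        # The leading underscore marks the parameter as intentionally
        # unused (silences ARG003); the string annotation resolves via
        # the imported namespace, avoiding an F821 undefined-name error.
        return {"enable_block_reuse": False}


print(ExampleModel.get_model_defaults(argparse.Namespace()))
# → {'enable_block_reuse': False}
```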
In `@tensorrt_llm/_torch/pyexecutor/py_executor_creator.py`:
- Around line 344-350: The logger.info call currently passes
attn_runtime_features as a separate formatting argument without a placeholder
which triggers a TypeError; update the call that logs AttentionRuntimeFeatures
(the variable attn_runtime_features and the AttentionRuntimeFeatures
instantiation) to include a proper format placeholder (e.g., use "%s") or pass a
single formatted string so logger.info receives matching arguments and no
formatting error occurs.
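The logging fix described above boils down to %-style lazy formatting. A small stand-alone sketch (the dict is a stand-in for the real AttentionRuntimeFeatures object):

```python
# Sketch of the logging fix: a positional argument to logger.info needs
# a matching %-style placeholder in the format string.
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

attn_runtime_features = {"cache_reuse": False}  # stand-in object

# Broken: "AttentionRuntimeFeatures: " has no placeholder, so the extra
# argument makes record formatting fail ("not all arguments converted").
# logger.info("AttentionRuntimeFeatures: ", attn_runtime_features)

# Fixed: %s placeholder; formatting happens lazily, only when emitted.
logger.info("AttentionRuntimeFeatures: %s", attn_runtime_features)

# Equivalent eager formatting, shown for clarity:
formatted = "AttentionRuntimeFeatures: %s" % (attn_runtime_features,)
```

Note that the standard library traps this mismatch inside the handler (printing a "Logging error" traceback to stderr) rather than raising to the caller, which makes the bug easy to miss in tests.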
In `@tensorrt_llm/llmapi/llm_utils.py`:
- Around line 33-43: The import list is pulling in SchedulerConfig from
bindings.executor but you also import SchedulerConfig from .llm_args, causing
the bindings SchedulerConfig to be unused and shadowed; remove SchedulerConfig
from the bindings.executor import (the import on the line that imports
bindings.executor symbols) so only the wrapper SchedulerConfig from llm_args
remains, and if there are other unused symbols imported from bindings.executor
consolidate or remove them as well to avoid redundant bindings imports; ensure
references to SchedulerConfig in this module continue to resolve to the
.llm_args SchedulerConfig (no other code changes needed).
🧹 Nitpick comments (6)
tensorrt_llm/_torch/pyexecutor/py_executor_creator.py (1)
332-340: Keep the module namespace on the model_loader import.

Line 332 uses `from ... import ...`, which drops the module namespace. Import the module and qualify uses to align with repo conventions.

♻️ Suggested refactor

```diff
-from tensorrt_llm._torch.pyexecutor.model_loader import ModelLoader, _construct_checkpoint_loader
+import tensorrt_llm._torch.pyexecutor.model_loader as model_loader
@@
-    checkpoint_loader = _construct_checkpoint_loader(llm_args.backend,
-                                                     llm_args.checkpoint_loader,
-                                                     llm_args.checkpoint_format)
+    checkpoint_loader = model_loader._construct_checkpoint_loader(
+        llm_args.backend, llm_args.checkpoint_loader, llm_args.checkpoint_format)
@@
-    llm_args = ModelLoader.load_config_and_apply_defaults(
-        checkpoint_dir, llm_args, checkpoint_loader)
+    llm_args = model_loader.ModelLoader.load_config_and_apply_defaults(
+        checkpoint_dir, llm_args, checkpoint_loader)
```

As per coding guidelines: always maintain the namespace when importing Python modules, even if only one class or function from a module is used.
tests/unittest/llmapi/test_llm_args.py (2)
1102-1114: Consider using pytest's tmp_path fixture instead of a hardcoded /tmp path.

The hardcoded `/tmp/dummy_model_for_validation_test` path raises a security concern (flagged by S108). Since this is for validation testing where the path doesn't need to exist, this is acceptable, but using the `tmp_path` fixture would be more robust.

♻️ Optional: Use pytest's tmp_path fixture

```diff
-def test_model_defaults_validation(defaults_dict, should_raise, error_contains):
+def test_model_defaults_validation(defaults_dict, should_raise, error_contains, tmp_path):
     # Use a dummy model path for testing (doesn't need to exist for validation)
-    llm_args = TorchLlmArgs(model="/tmp/dummy_model_for_validation_test")
+    llm_args = TorchLlmArgs(model=str(tmp_path / "dummy_model_for_validation_test"))
```
1140-1170: Remove the unused `monkeypatch` parameter from the fixture.

The `monkeypatch` parameter in the `setup` fixture is unused (static analysis ARG002). It can be removed since class attribute manipulation is done directly via `setattr`.

♻️ Remove unused parameter

```diff
     @pytest.fixture(autouse=True)
-    def setup(self, monkeypatch, tmp_path):
+    def setup(self, tmp_path):
         self.get_model_defaults_called = False
         self.tmp_path = tmp_path
```

tensorrt_llm/_torch/pyexecutor/model_loader.py (1)
217-241: Well-designed static method for early defaults application.

The method correctly:

- Guards against a `None` checkpoint_loader
- Uses `_resolve_class` to get the model class without instantiation
- Validates defaults before applying them
- Logs applied defaults at debug level

One consideration: the `load_config` call with `trust_remote_code=True` is hardcoded. This may not align with the user's preference if they explicitly set `trust_remote_code=False` in their llm_args.

♻️ Consider respecting the user's trust_remote_code preference

```diff
     @staticmethod
     def load_config_and_apply_defaults(
             checkpoint_dir: str, llm_args: TorchLlmArgs,
             checkpoint_loader: BaseCheckpointLoader) -> TorchLlmArgs:
         """Load model config and apply model-specific defaults to llm_args."""
         if checkpoint_loader is None:
             return llm_args
         config = checkpoint_loader.load_config(checkpoint_dir,
-                                               trust_remote_code=True)
+                                               trust_remote_code=llm_args.trust_remote_code)
```

tensorrt_llm/commands/serve.py (1)
167-198: Consider the empty dict default for cp_config.

On line 185, `cp_config` defaults to an empty dict `{}` when not provided. This empty dict will always pass the `is_non_default_or_required` check since `{} != field_default`. If the intention is to exclude `cp_config` when not explicitly set, you may want to handle this similarly to `kv_cache_config`.

♻️ Consider using None instead of an empty dict for unset cp_config

```diff
         "cp_config":
-        cp_config if cp_config else {},
+        cp_config if cp_config else None,
```

tensorrt_llm/llmapi/llm_utils.py (1)
1014-1033: Loop variable `field` shadows the `field` import from dataclasses.

The variable name `field` on line 1026 shadows the `field` import from `dataclasses` on line 7. While this doesn't cause a runtime issue here, it's a code smell that could lead to confusion.

♻️ Rename loop variable to avoid shadowing

```diff
     overrides = {}
-    for field in fields_set:
-        value = getattr(llm_args, field)
+    for field_name in fields_set:
+        value = getattr(llm_args, field_name)
         if isinstance(value, BaseModel):
             nested = extract_llm_args_overrides(value)
-            overrides[field] = nested
+            overrides[field_name] = nested
         else:
-            overrides[field] = value
+            overrides[field_name] = value
     return overrides
```
PR_Github #34520 [ run ] completed with state
ed420ac to c746d57 Compare
/bot run --disable-fail-fast
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
e4930ac to 339017a Compare
/bot run
PR_Github #35798 [ run ] triggered by Bot. Commit:
PR_Github #35798 [ run ] completed with state
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
/bot run
PR_Github #35818 [ run ] triggered by Bot. Commit:
PR_Github #35818 [ run ] completed with state
/bot run
PR_Github #35833 [ run ] triggered by Bot. Commit:
PR_Github #35833 [ run ] completed with state
/bot run
PR_Github #35850 [ run ] triggered by Bot. Commit:
PR_Github #35850 [ run ] completed with state
/bot run
PR_Github #35901 [ run ] triggered by Bot. Commit:
PR_Github #35901 [ run ] completed with state
/bot run
PR_Github #36094 [ run ] triggered by Bot. Commit:
PR_Github #36094 [ run ] completed with state
/bot run
PR_Github #36100 [ run ] triggered by Bot. Commit:
PR_Github #36100 [ run ] completed with state
/bot run
PR_Github #36120 [ run ] triggered by Bot. Commit:
PR_Github #36120 [ run ] completed with state
Motivation and Design: #11083
Summary:
This PR introduces a dynamic model-specific defaults system that allows models to declare their required configurations, eliminating the need for users and tests to manually specify model-specific constraints.
Problem:
Previously, users had to manually configure model-specific settings like disabling KV cache block reuse for SSM models. This was error-prone and required deep knowledge of model architectures.
Solution:
- Adds a `get_model_defaults()` classmethod to model classes

Explicitly disabling `enable_block_reuse` will no longer be necessary for the models below:

Benefits:
Other changes:

- `trtllm-serve` applies CLI overrides in a way that prevents pydantic defaults from getting contaminated by the serve frontend's own defaults (which are redundant and the same as the pydantic defaults)
- Moves the `fail_fast_on_attention_window_too_large` attribute to `TrtLlmArgs` from `BaseLlmArgs`
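The serve-side override change can be illustrated with a small sketch; `Args` and `extract_overrides` are illustrative stand-ins (the real code tracks pydantic's explicitly-set fields rather than comparing against declared defaults):

```python
# Sketch: forward only fields that differ from their declared defaults,
# so frontend defaults don't masquerade as user-provided overrides.
from dataclasses import dataclass, fields


@dataclass
class Args:
    enable_block_reuse: bool = True
    max_batch_size: int = 8


def extract_overrides(args: Args) -> dict:
    return {
        f.name: getattr(args, f.name)
        for f in fields(args)
        if getattr(args, f.name) != f.default
    }


print(extract_overrides(Args(max_batch_size=32)))  # → {'max_batch_size': 32}
```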
Release Notes
New Features
- `LlavaLlamaForCausalLM` model type alias.

Improvements