Skip to content

Comments

[TRTLLM-9527][feat] Add transferAgent binding (step 1)#10113

Merged
chuangz0 merged 6 commits intoNVIDIA:mainfrom
chuangz0:nixl_agent_py_binding
Jan 6, 2026
Merged

[TRTLLM-9527][feat] Add transferAgent binding (step 1)#10113
chuangz0 merged 6 commits intoNVIDIA:mainfrom
chuangz0:nixl_agent_py_binding

Conversation

@chuangz0
Copy link
Collaborator

@chuangz0 chuangz0 commented Dec 18, 2025

@coderabbitai summary

Description

For python cache transceiver, we need nixl binding . This PR add transferAgent binding , which is separated from other binding.

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

Details

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will be always ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purpose. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md
and the scripts/test_to_stage_mapping.py helper.

kill

kill

Kill all running builds associated with pull request.

skip

skip --comment COMMENT

Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

@chuangz0 chuangz0 requested a review from a team as a code owner December 18, 2025 08:44
@chuangz0
Copy link
Collaborator Author

/bot run

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 18, 2025

📝 Walkthrough

Walkthrough

This PR introduces Python bindings for the TensorRT-LLM Transfer Agent, enabling Python code to interact with cache transmission and memory management capabilities. It adds new enum types (TransferState), modifies the TransferStatus interface to support timed waits with state returns, extends configuration options (listen threads, worker count), and provides comprehensive pybind11 and nanobind binding implementations alongside supporting CMake, build script, and packaging updates.

Changes

Cohort / File(s) Summary
Build configuration
cpp/tensorrt_llm/CMakeLists.txt, cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/CMakeLists.txt, cpp/tensorrt_llm/nanobind/CMakeLists.txt, cpp/tensorrt_llm/pybind/CMakeLists.txt
Added TRANSFER_AGENT_BINDING_TARGET CMake variable; introduced new tensorrt_llm_transfer_agent_binding target with conditional pybind11/nanobind support, compile options, include directories, linking against NIXL components, and RPATH configuration; added build dependencies to ensure proper linking order.
Header interface updates
cpp/include/tensorrt_llm/executor/transferAgent.h, cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/transferAgent.h
Added TransferState enum (kIN_PROGRESS, kSUCCESS, kFAILURE); updated TransferStatus::wait() to return TransferState and accept optional timeout_ms parameter; extended BaseAgentConfig with useListenThread (bool) and numWorkers (unsigned int) fields.
Implementation updates
cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/transferAgent.cpp, cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp
Implemented timed wait with TransferState return and polling logic; added chrono/thread headers for timeout handling; extended NixlTransferAgent construction with separate branch for useListenThread=false configuration; updated BaseAgentConfig initialization with new fields; modified call sites to capture and assert TransferState::kSUCCESS.
Python bindings
cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/agentBindingsPybind.cpp, cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/agentBindingsNanobind.cpp
Created comprehensive pybind11 and nanobind binding modules exposing enums (MemoryType, TransferOp, TransferState), data classes (MemoryDesc, MemoryDescs, AgentDesc, TransferRequest), status/agent classes (TransferStatus, NixlTransferStatus, BaseAgentConfig, BaseTransferAgent, NixlTransferAgent) with appropriate constructors, properties, methods, ownership semantics, and GIL handling guards for blocking operations.
Build and packaging
scripts/build_wheel.py, setup.py, .gitignore
Added nixl parameter to generate_python_stubs_linux() function for transfer agent binding stub generation; updated file installation logic for tensorrt_llm_transfer_agent_binding*.so and libtensorrt_llm_nixl_wrapper.so with RPATH configuration; added gitignore patterns for binding artifacts; expanded setup.py package_data with new binding patterns.
Tests
tests/unittest/bindings/test_transfer_agent_bindings.py
Added unit tests validating enums, MemoryDesc/MemoryDescs construction and behavior, AgentDesc initialization, BaseAgentConfig properties, and TransferRequest instantiation with conditional module availability checks.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • Binding implementations (agentBindingsPybind.cpp, agentBindingsNanobind.cpp): Verify semantic equivalence between pybind11 and nanobind variants, ownership transfer semantics via rv_policy::take_ownership/move, and GIL management guards on potentially blocking operations.
  • Timeout logic in transferAgent.cpp: Validate polling implementation, timeout calculation accuracy, and state transition correctness in the new wait() method.
  • Configuration branching: Review the dual-path NixlTransferAgent construction logic (useListenThread true/false branches) for correctness and consistency.
  • CMake dependency chain: Ensure proper build ordering between transfer agent binding and main binding modules across pybind11/nanobind variants.
  • Build script integration: Verify stub generation paths for both binding types and artifact installation with correct RPATH settings.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 53.13% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
Description check ❓ Inconclusive The PR description is minimal with only a brief explanation ('For python cache transceiver, we need nixl binding. This PR add transferAgent binding, which is separated from other binding.') but lacks detailed information about changes, rationale, and test coverage. Expand the Description section to explain the design approach, integration points, and why the binding is separated. Complete or clarify the Test Coverage section to detail the tests that validate these changes.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly describes the main feature being added: transferAgent binding support, with step notation indicating this is part 1 of a multi-step implementation.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (9)
cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/CMakeLists.txt (2)

72-74: INSTALL_RPATH is not needed per project conventions.

Based on learnings, artifacts in TensorRT-LLM are manually copied rather than installed via cmake --install, so only BUILD_RPATH affects the final artifacts. The INSTALL_RPATH setting can be removed.

🔎 Suggested fix:
   # Set RPATH for the module to find libtensorrt_llm_nixl_wrapper.so
   set_target_properties(
-    ${AGENT_BINDING_TARGET} PROPERTIES BUILD_RPATH "$ORIGIN;$ORIGIN/libs"
-                                       INSTALL_RPATH "$ORIGIN;$ORIGIN/libs")
+    ${AGENT_BINDING_TARGET} PROPERTIES BUILD_RPATH "$ORIGIN;$ORIGIN/libs")

62-69: Consider consolidating CMake directives for readability.

Multiple target_include_directories and target_link_libraries calls can be combined into single calls for cleaner code.

🔎 Suggested consolidation:
   # Add include directories
-  target_include_directories(${AGENT_BINDING_TARGET} PRIVATE NIXL::nixl)
-  target_include_directories(${AGENT_BINDING_TARGET}
-                             PRIVATE ${PROJECT_SOURCE_DIR}/include)
+  target_include_directories(${AGENT_BINDING_TARGET} PRIVATE
+                             NIXL::nixl
+                             ${PROJECT_SOURCE_DIR}/include)

   # Link against NIXL wrapper and NIXL libraries
-  target_link_libraries(${AGENT_BINDING_TARGET} PRIVATE ${NIXL_WRAPPER_TARGET})
-  target_link_libraries(${AGENT_BINDING_TARGET} PRIVATE NIXL::nixl)
-  target_link_libraries(${AGENT_BINDING_TARGET} PRIVATE CUDA::cudart)
+  target_link_libraries(${AGENT_BINDING_TARGET} PRIVATE
+                        ${NIXL_WRAPPER_TARGET}
+                        NIXL::nixl
+                        CUDA::cudart)
cpp/include/tensorrt_llm/executor/transferAgent.h (1)

293-300: Consider adding 'm' prefix to member variables per coding guidelines.

The coding guidelines specify that class member variables should use camelCase prefixed with 'm' (e.g., mNumWorkers). However, the existing fields in this struct (mName, useProgThread, multiThread) are inconsistent - mName has the prefix while others don't. The new fields follow the same inconsistent pattern.

For consistency within this struct, the current approach matches the existing code. You may want to address this inconsistency in a follow-up refactor.

cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp (1)

250-250: Consider using designated initializers for clarity.

The BaseAgentConfig brace initialization with 5 positional arguments ({mAgentName, true, false, true, 1}) is correct but less readable. Consider using C++20 designated initializers or adding an inline comment to clarify each field:

BaseAgentConfig config{mAgentName, /* useProgThread = */ true, /* multiThread = */ false, /* useListenThread = */ true, /* numWorkers = */ 1};
cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/transferAgent.cpp (1)

323-357: LGTM with minor suggestion.

The wait() implementation correctly handles the three transfer states and timeout logic. The use of steady_clock is appropriate for elapsed time measurement.

Consider adding a brief sleep (e.g., std::this_thread::sleep_for(std::chrono::microseconds(100))) instead of pure yield() to reduce CPU spinning during long waits, especially when timeout_ms is large or indefinite.

scripts/build_wheel.py (1)

808-825: Verify agent binding glob pattern matches only one file.

The glob pattern tensorrt_llm_transfer_agent_binding*.so could match multiple files if there are leftover artifacts from previous builds with different Python versions (e.g., tensorrt_llm_transfer_agent_binding.cpython-310-x86_64-linux-gnu.so and tensorrt_llm_transfer_agent_binding.cpython-311-x86_64-linux-gnu.so). Consider adding a check or cleanup.

🔎 Suggested improvement:
         agent_binding_so = list(
             nixl_utils_dir.glob("tensorrt_llm_transfer_agent_binding*.so"))
         if agent_binding_so:
+            if len(agent_binding_so) > 1:
+                print(f"Warning: Multiple agent binding .so files found: {agent_binding_so}, using first", file=sys.stderr)
             trtllm_dir = project_dir / "tensorrt_llm"
             install_file(agent_binding_so[0],
                          trtllm_dir / agent_binding_so[0].name)
cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/agentBindingsPybind.cpp (2)

64-79: AgentDesc bytes handling is correct but uses raw Python C API.

The lambda uses PyBytes_AsString and PyBytes_Size directly. While functional, this bypasses pybind11's safer abstractions. Consider using pybind11's py::bytes methods for consistency.

🔎 Alternative using pybind11 accessors:
     py::class_<kvc::AgentDesc>(m, "AgentDesc")
         .def(py::init(
                  [](py::bytes data)
                  {
-                     std::string str(PyBytes_AsString(data.ptr()), PyBytes_Size(data.ptr()));
+                     std::string str = static_cast<std::string>(data);
                      return kvc::AgentDesc{std::move(str)};
                  }),
             py::arg("backend_agent_desc"))

145-168: NixlTransferAgent duplicates BaseTransferAgent method bindings.

The NixlTransferAgent class re-binds all methods already defined in BaseTransferAgent. Since NixlTransferAgent inherits from BaseTransferAgent in the binding definition (line 146), these methods should be inherited automatically. The only difference is submit_transfer_requests with py::call_guard<py::gil_scoped_release>().

Consider removing the duplicate method bindings and only override submit_transfer_requests:

🔎 Suggested simplification:
     // NixlTransferAgent class
     py::class_<kvc::NixlTransferAgent, kvc::BaseTransferAgent>(m, "NixlTransferAgent")
         .def(py::init<kvc::BaseAgentConfig const&>(), py::arg("config"))
-        .def("register_memory", &kvc::NixlTransferAgent::registerMemory, py::arg("descs"))
-        .def("deregister_memory", &kvc::NixlTransferAgent::deregisterMemory, py::arg("descs"))
-        .def("load_remote_agent",
-            py::overload_cast<std::string const&, kvc::AgentDesc const&>(&kvc::NixlTransferAgent::loadRemoteAgent),
-            py::arg("name"), py::arg("agent_desc"))
-        .def("load_remote_agent_by_connection",
-            py::overload_cast<std::string const&, kvc::ConnectionInfoType const&>(
-                &kvc::NixlTransferAgent::loadRemoteAgent),
-            py::arg("name"), py::arg("connection_info"))
-        .def("get_local_agent_desc", &kvc::NixlTransferAgent::getLocalAgentDesc)
-        .def("get_local_connection_info", &kvc::NixlTransferAgent::getLocalConnectionInfo)
-        .def("invalidate_remote_agent", &kvc::NixlTransferAgent::invalidateRemoteAgent, py::arg("name"))
         .def(
             "submit_transfer_requests",
             [](kvc::NixlTransferAgent& self, kvc::TransferRequest const& request)
             { return self.submitTransferRequests(request).release(); },
-            py::arg("request"), py::return_value_policy::take_ownership, py::call_guard<py::gil_scoped_release>())
-        .def(
-            "notify_sync_message", &kvc::NixlTransferAgent::notifySyncMessage, py::arg("name"), py::arg("sync_message"))
-        .def("get_notified_sync_messages", &kvc::NixlTransferAgent::getNotifiedSyncMessages)
-        .def("check_remote_descs", &kvc::NixlTransferAgent::checkRemoteDescs, py::arg("name"), py::arg("memory_descs"));
+            py::arg("request"), py::return_value_policy::take_ownership, py::call_guard<py::gil_scoped_release>());

However, if this explicit re-binding is intentional to ensure the GIL is released for all NixlTransferAgent calls (not just submit_transfer_requests), then consider adding py::call_guard<py::gil_scoped_release>() to the other methods as well for consistency.

cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/agentBindingsNanobind.cpp (1)

150-173: Same duplication concern as pybind11 version.

The NixlTransferAgent re-binds methods already inherited from BaseTransferAgent. The same refactoring suggestion from the pybind11 review applies here.

If the duplication is intentional to add nb::call_guard<nb::gil_scoped_release>() to blocking operations, consider being explicit about which methods need GIL release. Currently only submit_transfer_requests has the guard, while other potentially blocking methods like register_memory, load_remote_agent, etc. do not.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between df15be3 and d73af17.

📒 Files selected for processing (14)
  • .gitignore (1 hunks)
  • cpp/include/tensorrt_llm/executor/transferAgent.h (1 hunks)
  • cpp/tensorrt_llm/CMakeLists.txt (1 hunks)
  • cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp (2 hunks)
  • cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/CMakeLists.txt (1 hunks)
  • cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/agentBindingsNanobind.cpp (1 hunks)
  • cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/agentBindingsPybind.cpp (1 hunks)
  • cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/transferAgent.cpp (6 hunks)
  • cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/transferAgent.h (1 hunks)
  • cpp/tensorrt_llm/nanobind/CMakeLists.txt (1 hunks)
  • cpp/tensorrt_llm/pybind/CMakeLists.txt (1 hunks)
  • scripts/build_wheel.py (4 hunks)
  • setup.py (1 hunks)
  • tests/unittest/bindings/test_transfer_agent_bindings.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.{cpp,h,cu,cuh}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.{cpp,h,cu,cuh}: Closing braces of namespaces should have a comment saying the namespace it closes: } // namespace foo
Prefer const or constexpr variables over #define whenever possible, as the latter are not visible to the compiler
A variable that is not modified after its initialization should be declared as const
For naming of constants in C++, follow the naming section conventions
Except 0 (only used in comparison for checking signness/existence/emptiness) and nullptr, true, false, all other literals should only be used for variable initialization in C++
Use the Allman indentation style in C++
Put the semicolon for an empty for or while loop in a new line in C++
The statement forming the body of a switch, while, do .. while or for statement shall be a compound statement (use brace-delimited statements) in C++
If and else should always be followed by brace-delimited statements, even if empty or a single statement in C++
C++ filenames should use camel case with first letter lowercase: thisIsASubDir and thisIsAFilename.cpp
All files involved in the compilation of a compilation target (.exe/.so) must have filenames that are case-insensitive unique in C++
All types (including class names) in C++ should use camel case with uppercase first letter: FooBarClass
Local variables, methods and namespaces in C++ should use camel case with first letter lowercase: localFooBar
Non-magic-number global variables that are non-static and not defined in anonymous namespace in C++ should use camel case prefixed by a lower case 'g': gDontUseGlobalFoos
Non-magic-number global variables that are static or defined in an anonymous namespace in C++ should use camel case prefixed by a lower case 's': sMutableStaticGlobal
Locally visible static variables in C++ should use camel case with lowercase prefix 's' as the first letter: static std::once_flag sFlag;
Public, private and protected class member variables in C++ should use camel case prefi...

Files:

  • cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/transferAgent.h
  • cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp
  • cpp/include/tensorrt_llm/executor/transferAgent.h
  • cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/transferAgent.cpp
  • cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/agentBindingsPybind.cpp
  • cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/agentBindingsNanobind.cpp
**/*.h

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.h: Use a preprocessor guard in C++ header files with the format TRTLLM_<FILENAME>_H derived from the filename in all caps
The preprocessor guard name in C++ must have prefix TRTLLM_ followed by the filename, all in caps. Only use the file name, not directory names
Do not use prefix with underscore in C++ preprocessor guard symbols as such symbols are reserved in C++ standard for compilers or implementation
Do not use trailing underscore in C++ preprocessor guard symbols (unlike Google C++ guideline)

Files:

  • cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/transferAgent.h
  • cpp/include/tensorrt_llm/executor/transferAgent.h
**/*.{cpp,h,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the year of its latest meaningful modification

Files:

  • cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/transferAgent.h
  • tests/unittest/bindings/test_transfer_agent_bindings.py
  • cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp
  • cpp/include/tensorrt_llm/executor/transferAgent.h
  • cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/transferAgent.cpp
  • cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/agentBindingsPybind.cpp
  • setup.py
  • cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/agentBindingsNanobind.cpp
  • scripts/build_wheel.py
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Code developed for TensorRT-LLM should conform to Python 3.8+
Indent Python code with 4 spaces. Do not use tabs
Always maintain the namespace when importing in Python, even if only one class or function from a module is used
Python files should use snake_case naming: some_file.py
Python classes should use PascalCase naming: class SomeClass
Python functions and methods should use snake_case naming: def my_awesome_function():
Python local variables should use snake_case naming: my_variable = ...
Python variable names that start with a number should be prefixed with 'k': k_99th_percentile = ...
Python global variables should use upper snake_case with prefix 'G': G_MY_GLOBAL = ...
Python constants should use upper snake_case naming: MY_CONSTANT = ...
Avoid shadowing variables declared in an outer scope in Python
Initialize all externally visible members of a Python class in the constructor
For Python interfaces that may be used outside a file, prefer docstrings over comments
Python comments should be reserved for code within a function, or interfaces that are local to a file
Use Google style docstrings in Python for classes and functions, which can be parsed by Sphinx
Python attributes and variables can be documented inline with type and description
Avoid using reflection in Python when functionality can be easily achieved without reflection
When using try-except blocks in Python, limit the except to the smallest set of errors possible
When using try-except blocks in Python to handle multiple possible variable types (duck-typing), keep the body of the try as small as possible, using the else block for logic

Files:

  • tests/unittest/bindings/test_transfer_agent_bindings.py
  • setup.py
  • scripts/build_wheel.py
🧠 Learnings (20)
📓 Common learnings
Learnt from: ixlmar
Repo: NVIDIA/TensorRT-LLM PR: 7294
File: tensorrt_llm/_torch/modules/rms_norm.py:17-17
Timestamp: 2025-08-27T14:23:55.566Z
Learning: The TensorRT-LLM project requires Python 3.10+ as evidenced by the use of TypeAlias from typing module, match/case statements, and union type | syntax throughout the codebase, despite some documentation still mentioning Python 3.8+.
Learnt from: ixlmar
Repo: NVIDIA/TensorRT-LLM PR: 8398
File: tensorrt_llm/_torch/pyexecutor/sampling_utils.py:237-272
Timestamp: 2025-10-17T13:21:31.724Z
Learning: The setup.py file in TensorRT-LLM explicitly requires Python 3.10+ via `python_requires=">=3.10, <4"`, making match/case statements and other Python 3.10+ features appropriate throughout the codebase.
📚 Learning: 2025-08-25T08:47:24.758Z
Learnt from: tshmilnvidia
Repo: NVIDIA/TensorRT-LLM PR: 5488
File: cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/transferAgent.cpp:531-547
Timestamp: 2025-08-25T08:47:24.758Z
Learning: For NIXL API postXferReq function in NixlLoopbackAgent::submitLoopbackRequests, the expected return status is exactly NIXL_IN_PROG, not NIXL_SUCCESS. The strict check `TLLM_CHECK(status == NIXL_IN_PROG)` is correct and intentional for this specific API call.

Applied to files:

  • cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/transferAgent.h
  • cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/transferAgent.cpp
📚 Learning: 2025-09-09T18:31:44.336Z
Learnt from: venkywonka
Repo: NVIDIA/TensorRT-LLM PR: 7658
File: .github/CODEOWNERS:160-164
Timestamp: 2025-09-09T18:31:44.336Z
Learning: The ruleset for `release/**` branch patterns in the NVIDIA/TensorRT-LLM repository covers NIM-specific release branches like `release/1.0.1-NIM`, ensuring proper code ownership enforcement.

Applied to files:

  • .gitignore
📚 Learning: 2025-08-18T09:08:07.687Z
Learnt from: tongyuantongyu
Repo: NVIDIA/TensorRT-LLM PR: 6984
File: cpp/tensorrt_llm/CMakeLists.txt:297-299
Timestamp: 2025-08-18T09:08:07.687Z
Learning: In the TensorRT-LLM project, artifacts are manually copied rather than installed via `cmake --install`, so INSTALL_RPATH properties are not needed - only BUILD_RPATH affects the final artifacts.

Applied to files:

  • .gitignore
  • cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/CMakeLists.txt
  • cpp/tensorrt_llm/nanobind/CMakeLists.txt
  • cpp/tensorrt_llm/CMakeLists.txt
  • setup.py
  • cpp/tensorrt_llm/pybind/CMakeLists.txt
  • scripts/build_wheel.py
📚 Learning: 2025-08-06T13:58:07.506Z
Learnt from: galagam
Repo: NVIDIA/TensorRT-LLM PR: 6487
File: tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py:1-12
Timestamp: 2025-08-06T13:58:07.506Z
Learning: In TensorRT-LLM, test files (files under tests/ directories) do not require NVIDIA copyright headers, unlike production source code files. Test files typically start directly with imports, docstrings, or code.

Applied to files:

  • .gitignore
  • tests/unittest/bindings/test_transfer_agent_bindings.py
  • setup.py
📚 Learning: 2025-08-01T15:14:45.673Z
Learnt from: yibinl-nvidia
Repo: NVIDIA/TensorRT-LLM PR: 6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.

Applied to files:

  • .gitignore
  • setup.py
📚 Learning: 2025-09-09T09:40:45.658Z
Learnt from: fredricz-20070104
Repo: NVIDIA/TensorRT-LLM PR: 7645
File: tests/integration/test_lists/qa/llm_function_core.txt:648-648
Timestamp: 2025-09-09T09:40:45.658Z
Learning: In TensorRT-LLM test lists, it's common and intentional for the same test to appear in multiple test list files when they serve different purposes (e.g., llm_function_core.txt for comprehensive core functionality testing and llm_function_core_sanity.txt for quick sanity checks). This duplication allows tests to be run in different testing contexts.

Applied to files:

  • .gitignore
  • tests/unittest/bindings/test_transfer_agent_bindings.py
  • setup.py
📚 Learning: 2025-09-16T09:30:09.716Z
Learnt from: tongyuantongyu
Repo: NVIDIA/TensorRT-LLM PR: 7763
File: cpp/tensorrt_llm/CMakeLists.txt:297-301
Timestamp: 2025-09-16T09:30:09.716Z
Learning: In the TensorRT-LLM project, NCCL libraries are loaded earlier by PyTorch libraries or the bindings library, so the main shared library doesn't need NCCL paths in its RPATH - the libraries will already be available in the process address space when needed.

Applied to files:

  • .gitignore
  • cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/CMakeLists.txt
  • cpp/tensorrt_llm/nanobind/CMakeLists.txt
  • cpp/tensorrt_llm/CMakeLists.txt
  • setup.py
  • scripts/build_wheel.py
📚 Learning: 2025-07-28T17:06:08.621Z
Learnt from: moraxu
Repo: NVIDIA/TensorRT-LLM PR: 6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.

Applied to files:

  • .gitignore
  • tests/unittest/bindings/test_transfer_agent_bindings.py
📚 Learning: 2025-08-11T20:09:24.389Z
Learnt from: achartier
Repo: NVIDIA/TensorRT-LLM PR: 6763
File: tests/integration/defs/triton_server/conftest.py:16-22
Timestamp: 2025-08-11T20:09:24.389Z
Learning: In the TensorRT-LLM test infrastructure, the team prefers simple, direct solutions (like hard-coding directory traversal counts) over more complex but robust approaches when dealing with stable directory structures. They accept the maintenance cost of updating tests if the layout changes.

Applied to files:

  • .gitignore
📚 Learning: 2025-08-21T21:48:35.135Z
Learnt from: djns99
Repo: NVIDIA/TensorRT-LLM PR: 7104
File: cpp/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/epilogue/fusion/sm90_visitor_scatter.hpp:399-417
Timestamp: 2025-08-21T21:48:35.135Z
Learning: CUTLASS extensions in TensorRT-LLM (located under cpp/tensorrt_llm/cutlass_extensions/) are designed to integrate with and extend functionality in the external CUTLASS repository. When analyzing these extensions, their consumers and functionality wiring may exist in the CUTLASS codebase rather than within TensorRT-LLM itself.

Applied to files:

  • .gitignore
  • cpp/tensorrt_llm/CMakeLists.txt
📚 Learning: 2025-08-27T14:23:55.566Z
Learnt from: ixlmar
Repo: NVIDIA/TensorRT-LLM PR: 7294
File: tensorrt_llm/_torch/modules/rms_norm.py:17-17
Timestamp: 2025-08-27T14:23:55.566Z
Learning: The TensorRT-LLM project requires Python 3.10+ as evidenced by the use of TypeAlias from typing module, match/case statements, and union type | syntax throughout the codebase, despite some documentation still mentioning Python 3.8+.

Applied to files:

  • .gitignore
📚 Learning: 2025-08-14T15:38:01.771Z
Learnt from: MatthiasKohl
Repo: NVIDIA/TensorRT-LLM PR: 6904
File: cpp/tensorrt_llm/pybind/thop/bindings.cpp:55-57
Timestamp: 2025-08-14T15:38:01.771Z
Learning: In TensorRT-LLM Python bindings, tensor parameter collections like mla_tensor_params and spec_decoding_tensor_params are kept as required parameters without defaults to maintain API consistency, even when it might affect backward compatibility.

Applied to files:

  • .gitignore
📚 Learning: 2025-08-29T14:07:45.863Z
Learnt from: EmmaQiaoCh
Repo: NVIDIA/TensorRT-LLM PR: 7370
File: tests/unittest/trt/model_api/test_model_quantization.py:24-27
Timestamp: 2025-08-29T14:07:45.863Z
Learning: In TensorRT-LLM's CI infrastructure, pytest skip markers (pytest.mark.skip) are properly honored even when test files have __main__ blocks that call test functions directly. The testing system correctly skips tests without requiring modifications to the __main__ block execution pattern.

Applied to files:

  • tests/unittest/bindings/test_transfer_agent_bindings.py
📚 Learning: 2025-09-23T15:01:00.070Z
Learnt from: nv-lschneider
Repo: NVIDIA/TensorRT-LLM PR: 7910
File: cpp/tensorrt_llm/kernels/nccl_device/config.cu:15-17
Timestamp: 2025-09-23T15:01:00.070Z
Learning: In TensorRT-LLM NCCL device kernels, the <sstream> header is not needed as an explicit include in config.cu because it's provided transitively through other headers. Local compilation testing confirms this works without the explicit include.

Applied to files:

  • cpp/tensorrt_llm/nanobind/CMakeLists.txt
  • cpp/tensorrt_llm/CMakeLists.txt
📚 Learning: 2025-09-23T15:12:38.312Z
Learnt from: nv-lschneider
Repo: NVIDIA/TensorRT-LLM PR: 7910
File: cpp/tensorrt_llm/thop/allreduceOp.cpp:352-446
Timestamp: 2025-09-23T15:12:38.312Z
Learning: In TensorRT-LLM NCCL device implementation, NCCL version 2.28+ requirements are handled at runtime in the nccl_device/config layer rather than with compile-time guards. This allows the allreduceOp to remain version-agnostic and delegates version compatibility validation to the appropriate lower-level components that can gracefully handle unsupported configurations.

Applied to files:

  • cpp/tensorrt_llm/nanobind/CMakeLists.txt
📚 Learning: 2025-09-02T13:42:44.885Z
Learnt from: pcastonguay
Repo: NVIDIA/TensorRT-LLM PR: 7455
File: tensorrt_llm/_torch/pyexecutor/py_executor.py:1852-1860
Timestamp: 2025-09-02T13:42:44.885Z
Learning: In MPI communication within TensorRT-LLM pipeline parallelism, different communication types (tokens, logits, termination sync) must use disjoint tag namespaces to avoid message routing collisions when using the same source/destination patterns.

Applied to files:

  • cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/transferAgent.cpp
📚 Learning: 2025-09-23T15:01:00.070Z
Learnt from: nv-lschneider
Repo: NVIDIA/TensorRT-LLM PR: 7910
File: cpp/tensorrt_llm/kernels/nccl_device/config.cu:15-17
Timestamp: 2025-09-23T15:01:00.070Z
Learning: In TensorRT-LLM NCCL device kernels (cpp/tensorrt_llm/kernels/nccl_device/config.cu), std::ostringstream is used but <sstream> doesn't need to be explicitly included because it's provided transitively through other headers like tensorrt_llm/common/cudaUtils.h or config.h. Local compilation testing confirms this works without the explicit include.

Applied to files:

  • cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/transferAgent.cpp
📚 Learning: 2025-10-17T13:21:31.724Z
Learnt from: ixlmar
Repo: NVIDIA/TensorRT-LLM PR: 8398
File: tensorrt_llm/_torch/pyexecutor/sampling_utils.py:237-272
Timestamp: 2025-10-17T13:21:31.724Z
Learning: The setup.py file in TensorRT-LLM explicitly requires Python 3.10+ via `python_requires=">=3.10, <4"`, making match/case statements and other Python 3.10+ features appropriate throughout the codebase.

Applied to files:

  • setup.py
📚 Learning: 2025-08-25T08:48:39.694Z
Learnt from: tshmilnvidia
Repo: NVIDIA/TensorRT-LLM PR: 5488
File: cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/transferAgent.cpp:507-517
Timestamp: 2025-08-25T08:48:39.694Z
Learning: When reviewing potential compilation errors related to function parameter types, always check for type aliases using `using` declarations that might make seemingly incompatible types equivalent. RegisterDescs = MemoryDescs in TensorRT-LLM's transfer agent code.

Applied to files:

  • cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/agentBindingsNanobind.cpp
🧬 Code graph analysis (7)
cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/transferAgent.h (2)
cpp/include/tensorrt_llm/executor/transferAgent.h (7)
  • nodiscard (74-82)
  • nodiscard (103-111)
  • nodiscard (157-165)
  • nodiscard (244-252)
  • nodiscard (254-262)
  • nodiscard (289-354)
  • nodiscard (371-377)
cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/transferAgent.cpp (11)
  • nodiscard (201-212)
  • nodiscard (214-217)
  • nodiscard (219-227)
  • nodiscard (229-237)
  • nodiscard (239-247)
  • nodiscard (249-257)
  • nodiscard (259-267)
  • nodiscard (359-362)
  • nodiscard (461-494)
  • wait (323-357)
  • wait (323-323)
tests/unittest/bindings/test_transfer_agent_bindings.py (2)
cpp/include/tensorrt_llm/executor/types.h (1)
  • MemoryType (165-646)
cpp/include/tensorrt_llm/executor/transferAgent.h (5)
  • TransferOp (218-275)
  • TransferState (277-418)
  • MemoryDescs (94-99)
  • AgentDesc (199-420)
  • TransferRequest (225-240)
cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp (1)
cpp/tests/unit_tests/executor/transferAgentTest.cpp (2)
  • config (64-64)
  • config (378-378)
cpp/include/tensorrt_llm/executor/transferAgent.h (2)
cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/transferAgent.cpp (13)
  • nodiscard (201-212)
  • nodiscard (214-217)
  • nodiscard (219-227)
  • nodiscard (229-237)
  • nodiscard (239-247)
  • nodiscard (249-257)
  • nodiscard (259-267)
  • nodiscard (359-362)
  • nodiscard (461-494)
  • nodiscard (504-513)
  • isCompleted (359-359)
  • wait (323-357)
  • wait (323-323)
cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/transferAgent.h (1)
  • nodiscard (46-51)
cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/transferAgent.cpp (2)
cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/transferAgent.h (1)
  • mHandle (52-52)
cpp/tests/unit_tests/executor/transferAgentTest.cpp (2)
  • config (64-64)
  • config (378-378)
cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/agentBindingsNanobind.cpp (2)
cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/transferAgent.cpp (25)
  • wait (323-357)
  • wait (323-323)
  • registerMemory (420-429)
  • registerMemory (420-420)
  • registerMemory (596-603)
  • registerMemory (596-596)
  • deregisterMemory (431-436)
  • deregisterMemory (431-431)
  • deregisterMemory (605-612)
  • deregisterMemory (605-605)
  • loadRemoteAgent (438-446)
  • loadRemoteAgent (438-438)
  • loadRemoteAgent (520-554)
  • loadRemoteAgent (520-520)
  • getLocalAgentDesc (448-454)
  • getLocalAgentDesc (448-448)
  • invalidateRemoteAgent (456-459)
  • invalidateRemoteAgent (456-456)
  • notifySyncMessage (496-502)
  • notifySyncMessage (496-496)
  • getNotifiedSyncMessages (504-504)
  • getLocalConnectionInfo (515-518)
  • getLocalConnectionInfo (515-515)
  • checkRemoteDescs (556-562)
  • checkRemoteDescs (556-556)
cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/transferAgent.h (2)
  • getLocalAgentDesc (67-81)
  • getLocalConnectionInfo (87-95)
scripts/build_wheel.py (1)
tests/integration/defs/cpp/conftest.py (1)
  • build_dir (28-30)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (29)
cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/CMakeLists.txt (2)

42-57: LGTM!

The conditional binding module creation based on BINDING_TYPE is well-structured, and the status messages provide good build-time visibility.


78-80: LGTM!

Exporting AGENT_BINDING_TARGET to parent scope correctly enables the main library to establish build dependencies on the transfer agent binding module.

cpp/tensorrt_llm/nanobind/CMakeLists.txt (1)

69-73: LGTM!

The conditional dependency correctly ensures the transfer agent binding module is built before the main nanobind module when NIXL is enabled. The if(TARGET ...) guard appropriately handles cases where NIXL is not configured.

cpp/tensorrt_llm/CMakeLists.txt (1)

158-161: LGTM!

The TRANSFER_AGENT_BINDING_TARGET variable is correctly defined within the NIXL_ROOT block, ensuring it's only set when NIXL support is being built. The naming is consistent with the target defined in the nixl_utils subdirectory.

setup.py (1)

116-118: LGTM!

The new transfer agent binding artifacts are correctly added to the package data, ensuring they will be included in the wheel distribution. The glob patterns appropriately capture both the shared library (with platform-specific suffix) and the type stub file.

cpp/include/tensorrt_llm/executor/transferAgent.h (2)

277-282: LGTM!

The TransferState enum is well-designed with clear states for transfer operation outcomes. The naming follows the project's convention for constants (kUPPER_CASE prefix) as per coding guidelines.


289-291: LGTM!

The updated wait() signature with timeout support and TransferState return is a good improvement. The -1 default for indefinite wait follows common conventions. Existing derived classes will need to implement this new signature.

cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/transferAgent.h (1)

46-48: LGTM!

The updated wait signature with [[nodiscard]] TransferState return type and optional timeout_ms parameter correctly overrides the base interface defined in cpp/include/tensorrt_llm/executor/transferAgent.h. This enables callers to handle completion states explicitly and support timed waits.

cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp (1)

144-145: LGTM!

Correctly captures the TransferState return value from wait() and asserts success before proceeding with remote agent notification. This aligns with the updated API that now returns explicit completion states.

cpp/tensorrt_llm/pybind/CMakeLists.txt (1)

72-76: LGTM!

The conditional dependency correctly ensures the Python bindings module builds after TRANSFER_AGENT_BINDING_TARGET when NIXL is enabled. The if(TARGET ...) guard properly handles cases where the transfer agent binding target isn't defined.

tests/unittest/bindings/test_transfer_agent_bindings.py (4)

1-14: LGTM!

The conditional import pattern with try/except and pytest.mark.skipif is a clean approach for handling optional binding availability. This ensures the test suite runs gracefully when NIXL is not enabled.


17-47: LGTM!

The enum tests correctly verify the NIXL-specific MemoryType values (DRAM, VRAM, BLK, OBJ, FILE) which correspond to the nixl_mem_t types used by the transfer agent, distinct from executor::MemoryType. The distinctness assertions ensure enum values are properly defined.


49-95: LGTM!

Good coverage of MemoryDesc and MemoryDescs including edge cases (different values, empty list). The tests validate both construction and field accessors.


97-190: LGTM!

Comprehensive tests for AgentDesc, BaseAgentConfig, and TransferRequest covering construction patterns, field access, and read/write properties. The tests exercise both string and bytes inputs for AgentDesc which validates the binding's flexibility.

cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/transferAgent.cpp (2)

368-388: LGTM!

The branching logic correctly distinguishes between listen-thread and non-listen-thread configurations. The else branch appropriately clears mAddress and uses port 0, while both branches consistently use NIXL_THREAD_SYNC_DEFAULT and the new numWorkers parameter from BaseAgentConfig.


689-690: LGTM!

Correctly captures and asserts the TransferState return value from wait(). This aligns with the updated API semantics and provides explicit error handling for loopback transfer completion.

scripts/build_wheel.py (4)

371-373: LGTM: Parameter addition follows existing pattern.

The new nixl: bool parameter is properly positioned after flash_mla and before binding_lib_name, maintaining consistency with the function's parameter ordering convention (binding_type, venv_python, feature flags, then binding lib name).


414-423: LGTM: Stub generation logic is consistent with existing patterns.

The nixl stub generation correctly mirrors the existing deep_ep/flash_mla patterns:

  • Uses nanobind stubgen for nanobind binding type
  • Falls back to pybind11_stubgen for pybind binding type
  • Module name tensorrt_llm_transfer_agent_binding matches the CMake target

One observation: unlike the bindings module which uses -r flag for recursive stub generation with nanobind, this omits it—this is appropriate since the transfer agent binding is a flat module without submodules.


826-838: The nvda_nixl installation block depends on agent_binding_so presence.

The /opt/nvidia/nvda_nixl installation logic at lines 826-838 is nested inside the if agent_binding_so: block (line 819). This means the nixl libraries will only be installed if the agent binding .so exists. Verify this is intentional—it seems correct since both the nixl wrapper and agent binding depend on these libraries.


948-952: LGTM: Call site properly propagates nixl flag.

The updated call correctly passes nixl_root is not None as the nixl boolean argument, aligning with the function signature change.

cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/agentBindingsPybind.cpp (4)

1-26: LGTM: File header and includes are correct.

The copyright header correctly shows 2025, and the include ordering follows conventions (project headers first, then third-party). The namespace aliases are clear and consistent.


32-49: LGTM: Enum bindings are clean and complete.

All three enums (MemoryType, TransferOp, TransferState) are properly bound with consistent naming conventions mapping kXXX C++ enum values to Python-friendly uppercase names.


93-102: Verify TransferStatus::wait return type matches Python expectations.

The binding exposes wait but doesn't specify the return type. Based on the C++ implementation in transferAgent.cpp:322-356, wait returns TransferState. Ensure Python callers understand this returns an enum value, not a boolean.


134-138: Ownership transfer via .release() is correctly handled.

Using .release() on the unique_ptr combined with py::return_value_policy::take_ownership correctly transfers ownership to Python. This prevents double-free issues.

cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/agentBindingsNanobind.cpp (5)

1-29: LGTM: File header and nanobind setup are correct.

Proper 2025 copyright, appropriate nanobind includes for STL type support, and consistent namespace aliases matching the pybind11 version.


67-83: AgentDesc bytes constructor uses placement new correctly.

The nanobind custom __init__ pattern with placement new (new (self) kvc::AgentDesc{...}) is the correct idiom for nanobind. The data.c_str() and data.size() calls are appropriate for nb::bytes.


102-106: GIL release on NixlTransferStatus methods is appropriate.

The is_completed and wait methods correctly release the GIL since they may involve blocking I/O operations on the underlying NIXL agent. This matches the pybind11 implementation.


108-124: BaseAgentConfig binding looks complete.

Both the default constructor and parameterized constructor (via custom __init__) are properly bound. The read-write properties expose all config fields correctly.


126-148: Verify API consistency between pybind11 and nanobind implementations.

Both binding files expose the same API surface—all methods are synchronized across implementations. Verified: identical method definitions (check_remote_descs, deregister_memory, get_local_agent_desc, get_local_connection_info, get_notified_sync_messages, invalidate_remote_agent, is_completed, load_remote_agent, load_remote_agent_by_connection, register_memory, wait) with only expected library-specific differences (py:: vs nb:: prefixes). Ensure they remain synchronized when making future changes.

@tensorrt-cicd
Copy link
Collaborator

PR_Github #28959 [ run ] triggered by Bot. Commit: d73af17

@tensorrt-cicd
Copy link
Collaborator

PR_Github #28959 [ run ] completed with state SUCCESS. Commit: d73af17
/LLM/main/L0_MergeRequest_PR pipeline #22186 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@chuangz0 chuangz0 force-pushed the nixl_agent_py_binding branch from 2fcf3f9 to 92b8846 Compare December 18, 2025 12:52
@chuangz0
Copy link
Collaborator Author

/bot run --disable-fail-fast

@chuangz0 chuangz0 force-pushed the nixl_agent_py_binding branch from 92b8846 to 893e410 Compare December 19, 2025 01:43
@chuangz0
Copy link
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Copy link
Collaborator

PR_Github #29061 [ run ] triggered by Bot. Commit: 893e410

@chuangz0 chuangz0 force-pushed the nixl_agent_py_binding branch 2 times, most recently from b22bb74 to 2b31e6f Compare December 19, 2025 11:07
@chuangz0
Copy link
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Copy link
Collaborator

PR_Github #29138 [ run ] triggered by Bot. Commit: 2b31e6f

@tensorrt-cicd
Copy link
Collaborator

PR_Github #29138 [ run ] completed with state SUCCESS. Commit: 2b31e6f
/LLM/main/L0_MergeRequest_PR pipeline #22345 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@chuangz0 chuangz0 force-pushed the nixl_agent_py_binding branch from 2b31e6f to 9c96531 Compare December 20, 2025 01:46
@chuangz0
Copy link
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Copy link
Collaborator

PR_Github #29201 [ run ] triggered by Bot. Commit: 9c96531

@tensorrt-cicd
Copy link
Collaborator

PR_Github #29201 [ run ] completed with state SUCCESS. Commit: 9c96531
/LLM/main/L0_MergeRequest_PR pipeline #22405 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@chuangz0 chuangz0 force-pushed the nixl_agent_py_binding branch from 1a91d0a to f7c8e1d Compare December 22, 2025 02:57
@chuangz0
Copy link
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Copy link
Collaborator

PR_Github #29353 [ run ] triggered by Bot. Commit: f7c8e1d

@chuangz0 chuangz0 changed the title [TRTLLM-9527][feat] step.1 add transferAgent binding [TRTLLM-9527][feat] add transferAgent binding (step 1) Dec 22, 2025
@chuangz0 chuangz0 changed the title [TRTLLM-9527][feat] add transferAgent binding (step 1) [TRTLLM-9527][feat] Add transferAgent binding (step 1) Dec 22, 2025
@tensorrt-cicd
Copy link
Collaborator

PR_Github #29353 [ run ] completed with state SUCCESS. Commit: f7c8e1d
/LLM/main/L0_MergeRequest_PR pipeline #22543 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@chuangz0 chuangz0 force-pushed the nixl_agent_py_binding branch from f7c8e1d to 0c40c21 Compare December 23, 2025 00:55
@chuangz0
Copy link
Collaborator Author

/bot run --stage-list "DGX_B200-4_GPUs-PyTorch-Ray-1,DGX_B200-8_GPUs-PyTorch-1"

@tensorrt-cicd
Copy link
Collaborator

PR_Github #29477 [ run ] triggered by Bot. Commit: 0c40c21

@chuangz0 chuangz0 force-pushed the nixl_agent_py_binding branch from 0c40c21 to 02e1252 Compare December 23, 2025 06:38
@chuangz0
Copy link
Collaborator Author

/bot run --stage-list "DGX_B200-4_GPUs-PyTorch-Ray-1,DGX_B200-8_GPUs-PyTorch-1"

@tensorrt-cicd
Copy link
Collaborator

PR_Github #29537 [ run ] triggered by Bot. Commit: 02e1252

@tensorrt-cicd
Copy link
Collaborator

PR_Github #29537 [ run ] completed with state FAILURE. Commit: 02e1252
/LLM/main/L0_MergeRequest_PR pipeline #22712 (Partly Tested) completed with status: 'ABORTED'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@chuangz0 chuangz0 force-pushed the nixl_agent_py_binding branch from 02e1252 to c748c45 Compare December 23, 2025 09:12
@chuangz0
Copy link
Collaborator Author

/bot run --stage-list "DGX_B200-4_GPUs-PyTorch-Ray-1,DGX_B200-8_GPUs-PyTorch-1"

@tensorrt-cicd
Copy link
Collaborator

PR_Github #29584 [ run ] triggered by Bot. Commit: c748c45

@tensorrt-cicd
Copy link
Collaborator

PR_Github #29584 [ run ] completed with state SUCCESS. Commit: c748c45
/LLM/main/L0_MergeRequest_PR pipeline #22752 (Partly Tested) completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
@chuangz0 chuangz0 force-pushed the nixl_agent_py_binding branch from c748c45 to fc35ee7 Compare December 24, 2025 01:19
@chuangz0
Copy link
Collaborator Author

/bot run --stage-list "DGX_B200-4_GPUs-PyTorch-Ray-1,DGX_B200-8_GPUs-PyTorch-1"

@tensorrt-cicd
Copy link
Collaborator

PR_Github #29678 [ run ] triggered by Bot. Commit: fc35ee7

@tensorrt-cicd
Copy link
Collaborator

PR_Github #29678 [ run ] completed with state SUCCESS. Commit: fc35ee7
/LLM/main/L0_MergeRequest_PR pipeline #22796 (Partly Tested) completed with status: 'SUCCESS'

@chuangz0
Copy link
Collaborator Author

/bot skip --comment "all test has passed"

@chuangz0 chuangz0 requested a review from atrifex December 24, 2025 03:31
@tensorrt-cicd
Copy link
Collaborator

PR_Github #29713 [ skip ] triggered by Bot. Commit: fc35ee7

@tensorrt-cicd
Copy link
Collaborator

PR_Github #29713 [ skip ] completed with state SUCCESS. Commit: fc35ee7
Skipping testing for commit fc35ee7

Copy link
Collaborator

@atrifex atrifex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving on the agreement that no new dependencies have been added.

@chuangz0 chuangz0 merged commit 536a8f6 into NVIDIA:main Jan 6, 2026
5 checks passed
@Shixiaowei02 Shixiaowei02 added the Disaggregated serving <NV>Deploying with separated, distributed components (params, kv-cache, compute). Arch & perf. label Jan 7, 2026
videodanchik pushed a commit to videodanchik/TensorRT-LLM that referenced this pull request Jan 14, 2026
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
Signed-off-by: Daniil Kulko <kulkodaniil@gmail.com>
@coderabbitai coderabbitai bot mentioned this pull request Jan 21, 2026
1 task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Disaggregated serving <NV>Deploying with separated, distributed components (params, kv-cache, compute). Arch & perf.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants