[TRTLLM-9527][feat] Add transferAgent binding (step 1) #10113

chuangz0 merged 6 commits into NVIDIA:main
Conversation
/bot run
📝 Walkthrough

This PR introduces Python bindings for the TensorRT-LLM Transfer Agent, enabling Python code to interact with cache transmission and memory management capabilities. It adds a new enum type (TransferState), modifies the TransferStatus interface to support timed waits with state returns, extends configuration options (listen threads, worker count), and provides pybind11 and nanobind binding implementations alongside supporting CMake, build script, and packaging updates.

Changes
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
Actionable comments posted: 1
🧹 Nitpick comments (9)
cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/CMakeLists.txt (2)
72-74: INSTALL_RPATH is not needed per project conventions.

Based on learnings, artifacts in TensorRT-LLM are manually copied rather than installed via cmake --install, so only BUILD_RPATH affects the final artifacts. The INSTALL_RPATH setting can be removed.

🔎 Suggested fix:

```diff
 # Set RPATH for the module to find libtensorrt_llm_nixl_wrapper.so
 set_target_properties(
-  ${AGENT_BINDING_TARGET} PROPERTIES BUILD_RPATH "$ORIGIN;$ORIGIN/libs"
-                                     INSTALL_RPATH "$ORIGIN;$ORIGIN/libs")
+  ${AGENT_BINDING_TARGET} PROPERTIES BUILD_RPATH "$ORIGIN;$ORIGIN/libs")
```
62-69: Consider consolidating CMake directives for readability.

Multiple target_include_directories and target_link_libraries calls can be combined into single calls for cleaner code.

🔎 Suggested consolidation:

```diff
 # Add include directories
-target_include_directories(${AGENT_BINDING_TARGET} PRIVATE NIXL::nixl)
-target_include_directories(${AGENT_BINDING_TARGET}
-                           PRIVATE ${PROJECT_SOURCE_DIR}/include)
+target_include_directories(${AGENT_BINDING_TARGET} PRIVATE
+  NIXL::nixl
+  ${PROJECT_SOURCE_DIR}/include)

 # Link against NIXL wrapper and NIXL libraries
-target_link_libraries(${AGENT_BINDING_TARGET} PRIVATE ${NIXL_WRAPPER_TARGET})
-target_link_libraries(${AGENT_BINDING_TARGET} PRIVATE NIXL::nixl)
-target_link_libraries(${AGENT_BINDING_TARGET} PRIVATE CUDA::cudart)
+target_link_libraries(${AGENT_BINDING_TARGET} PRIVATE
+  ${NIXL_WRAPPER_TARGET}
+  NIXL::nixl
+  CUDA::cudart)
```

cpp/include/tensorrt_llm/executor/transferAgent.h (1)
293-300: Consider adding an 'm' prefix to member variables per coding guidelines.

The coding guidelines specify that class member variables should use camelCase prefixed with 'm' (e.g., mNumWorkers). However, the existing fields in this struct (mName, useProgThread, multiThread) are inconsistent: mName has the prefix while the others don't. The new fields follow the same inconsistent pattern.

For consistency within this struct, the current approach matches the existing code. You may want to address this inconsistency in a follow-up refactor.
cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp (1)
250-250: Consider using designated initializers for clarity.

The BaseAgentConfig brace initialization with 5 positional arguments ({mAgentName, true, false, true, 1}) is correct but less readable. Consider using C++20 designated initializers or adding an inline comment to clarify each field:

```cpp
BaseAgentConfig config{mAgentName,
    /* useProgThread = */ true,
    /* multiThread = */ false,
    /* useListenThread = */ true,
    /* numWorkers = */ 1};
```

cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/transferAgent.cpp (1)
323-357: LGTM with minor suggestion.

The wait() implementation correctly handles the three transfer states and timeout logic. The use of steady_clock is appropriate for elapsed time measurement.

Consider adding a brief sleep (e.g., std::this_thread::sleep_for(std::chrono::microseconds(100))) instead of a pure yield() to reduce CPU spinning during long waits, especially when timeout_ms is large or indefinite.

scripts/build_wheel.py (1)
808-825: Verify the agent binding glob pattern matches only one file.

The glob pattern tensorrt_llm_transfer_agent_binding*.so could match multiple files if there are leftover artifacts from previous builds with different Python versions (e.g., tensorrt_llm_transfer_agent_binding.cpython-310-x86_64-linux-gnu.so and tensorrt_llm_transfer_agent_binding.cpython-311-x86_64-linux-gnu.so). Consider adding a check or cleanup.

🔎 Suggested improvement:

```diff
 agent_binding_so = list(
     nixl_utils_dir.glob("tensorrt_llm_transfer_agent_binding*.so"))
 if agent_binding_so:
+    if len(agent_binding_so) > 1:
+        print(f"Warning: Multiple agent binding .so files found: {agent_binding_so}, using first",
+              file=sys.stderr)
     trtllm_dir = project_dir / "tensorrt_llm"
     install_file(agent_binding_so[0], trtllm_dir / agent_binding_so[0].name)
```

cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/agentBindingsPybind.cpp (2)
64-79: AgentDesc bytes handling is correct but uses the raw Python C API.

The lambda uses PyBytes_AsString and PyBytes_Size directly. While functional, this bypasses pybind11's safer abstractions. Consider using pybind11's py::bytes methods for consistency.

🔎 Alternative using pybind11 accessors:

```diff
 py::class_<kvc::AgentDesc>(m, "AgentDesc")
     .def(py::init(
              [](py::bytes data)
              {
-                 std::string str(PyBytes_AsString(data.ptr()), PyBytes_Size(data.ptr()));
+                 std::string str = static_cast<std::string>(data);
                  return kvc::AgentDesc{std::move(str)};
              }),
         py::arg("backend_agent_desc"))
```
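Whichever conversion is used on the C++ side, the contract visible from Python is that an arbitrary byte payload round-trips unchanged. The stand-in class below is a hypothetical pure-Python analog (the real AgentDesc lives in the compiled module; the property name mirrors the binding's constructor argument but is an assumption):

```python
class AgentDesc:
    """Hypothetical Python analog of the compiled AgentDesc binding."""

    def __init__(self, backend_agent_desc: bytes) -> None:
        # Equivalent of copying the raw bytes into a std::string: the payload
        # is kept verbatim, including embedded NUL bytes, with no text decoding.
        self._desc = bytes(backend_agent_desc)

    @property
    def backend_agent_desc(self) -> bytes:
        return self._desc


payload = b"\x00\xffnixl-metadata\x00"
desc = AgentDesc(payload)
assert desc.backend_agent_desc == payload  # embedded NULs survive the round-trip
```

This is the behavior the unit tests should pin down regardless of whether the binding uses the raw C API or py::bytes accessors.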
145-168: NixlTransferAgent duplicates BaseTransferAgent method bindings.

The NixlTransferAgent class re-binds all methods already defined in BaseTransferAgent. Since NixlTransferAgent inherits from BaseTransferAgent in the binding definition (line 146), these methods should be inherited automatically. The only difference is submit_transfer_requests with py::call_guard<py::gil_scoped_release>().

Consider removing the duplicate method bindings and only overriding submit_transfer_requests.

🔎 Suggested simplification:

```diff
 // NixlTransferAgent class
 py::class_<kvc::NixlTransferAgent, kvc::BaseTransferAgent>(m, "NixlTransferAgent")
     .def(py::init<kvc::BaseAgentConfig const&>(), py::arg("config"))
-    .def("register_memory", &kvc::NixlTransferAgent::registerMemory, py::arg("descs"))
-    .def("deregister_memory", &kvc::NixlTransferAgent::deregisterMemory, py::arg("descs"))
-    .def("load_remote_agent",
-        py::overload_cast<std::string const&, kvc::AgentDesc const&>(&kvc::NixlTransferAgent::loadRemoteAgent),
-        py::arg("name"), py::arg("agent_desc"))
-    .def("load_remote_agent_by_connection",
-        py::overload_cast<std::string const&, kvc::ConnectionInfoType const&>(
-            &kvc::NixlTransferAgent::loadRemoteAgent),
-        py::arg("name"), py::arg("connection_info"))
-    .def("get_local_agent_desc", &kvc::NixlTransferAgent::getLocalAgentDesc)
-    .def("get_local_connection_info", &kvc::NixlTransferAgent::getLocalConnectionInfo)
-    .def("invalidate_remote_agent", &kvc::NixlTransferAgent::invalidateRemoteAgent, py::arg("name"))
     .def(
         "submit_transfer_requests",
         [](kvc::NixlTransferAgent& self, kvc::TransferRequest const& request)
         { return self.submitTransferRequests(request).release(); },
-        py::arg("request"), py::return_value_policy::take_ownership, py::call_guard<py::gil_scoped_release>())
-    .def(
-        "notify_sync_message", &kvc::NixlTransferAgent::notifySyncMessage, py::arg("name"), py::arg("sync_message"))
-    .def("get_notified_sync_messages", &kvc::NixlTransferAgent::getNotifiedSyncMessages)
-    .def("check_remote_descs", &kvc::NixlTransferAgent::checkRemoteDescs, py::arg("name"), py::arg("memory_descs"));
+        py::arg("request"), py::return_value_policy::take_ownership, py::call_guard<py::gil_scoped_release>());
```

However, if this explicit re-binding is intentional to ensure the GIL is released for all NixlTransferAgent calls (not just submit_transfer_requests), consider adding py::call_guard<py::gil_scoped_release>() to the other methods as well for consistency.

cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/agentBindingsNanobind.cpp (1)
150-173: Same duplication concern as the pybind11 version.

The NixlTransferAgent re-binds methods already inherited from BaseTransferAgent. The same refactoring suggestion from the pybind11 review applies here.

If the duplication is intentional to add nb::call_guard<nb::gil_scoped_release>() to blocking operations, consider being explicit about which methods need GIL release. Currently only submit_transfer_requests has the guard, while other potentially blocking methods like register_memory, load_remote_agent, etc. do not.
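The effect the guard buys can be reproduced in pure Python with a call that itself releases the GIL (no TensorRT-LLM APIs involved). Because time.sleep drops the GIL, four "blocking calls" overlap instead of serializing, which is exactly what gil_scoped_release enables for a long-running C++ transfer submission:

```python
import threading
import time

def blocking_call() -> None:
    # time.sleep releases the GIL, so concurrent callers can overlap,
    # just as a binding method wrapped in gil_scoped_release can.
    time.sleep(0.2)

start = time.monotonic()
threads = [threading.Thread(target=blocking_call) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.monotonic() - start

# Four 0.2 s blocking calls overlap: total wall time is ~0.2 s, not ~0.8 s.
assert elapsed < 0.6
```

A method that holds the GIL for its full duration would instead serialize callers and block every other Python thread, which is why the choice of which methods get the guard matters.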
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (14)
.gitignore (1 hunks)
cpp/include/tensorrt_llm/executor/transferAgent.h (1 hunks)
cpp/tensorrt_llm/CMakeLists.txt (1 hunks)
cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp (2 hunks)
cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/CMakeLists.txt (1 hunks)
cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/agentBindingsNanobind.cpp (1 hunks)
cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/agentBindingsPybind.cpp (1 hunks)
cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/transferAgent.cpp (6 hunks)
cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/transferAgent.h (1 hunks)
cpp/tensorrt_llm/nanobind/CMakeLists.txt (1 hunks)
cpp/tensorrt_llm/pybind/CMakeLists.txt (1 hunks)
scripts/build_wheel.py (4 hunks)
setup.py (1 hunks)
tests/unittest/bindings/test_transfer_agent_bindings.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.{cpp,h,cu,cuh}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.{cpp,h,cu,cuh}: Closing braces of namespaces should have a comment saying the namespace it closes: } // namespace foo
Prefer const or constexpr variables over #define whenever possible, as the latter are not visible to the compiler
A variable that is not modified after its initialization should be declared as const
For naming of constants in C++, follow the naming section conventions
Except 0 (only used in comparison for checking signness/existence/emptiness) and nullptr, true, false, all other literals should only be used for variable initialization in C++
Use the Allman indentation style in C++
Put the semicolon for an empty for or while loop on a new line in C++
The statement forming the body of a switch, while, do .. while or for statement shall be a compound statement (use brace-delimited statements) in C++
If and else should always be followed by brace-delimited statements, even if empty or a single statement, in C++
C++ filenames should use camel case with first letter lowercase: thisIsASubDir and thisIsAFilename.cpp
All files involved in the compilation of a compilation target (.exe/.so) must have filenames that are case-insensitive unique in C++
All types (including class names) in C++ should use camel case with uppercase first letter: FooBarClass
Local variables, methods and namespaces in C++ should use camel case with first letter lowercase: localFooBar
Non-magic-number global variables that are non-static and not defined in an anonymous namespace in C++ should use camel case prefixed by a lowercase 'g': gDontUseGlobalFoos
Non-magic-number global variables that are static or defined in an anonymous namespace in C++ should use camel case prefixed by a lowercase 's': sMutableStaticGlobal
Locally visible static variables in C++ should use camel case with lowercase prefix 's' as the first letter: static std::once_flag sFlag;
Public, private and protected class member variables in C++ should use camel case prefi...
Files:
cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/transferAgent.h
cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp
cpp/include/tensorrt_llm/executor/transferAgent.h
cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/transferAgent.cpp
cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/agentBindingsPybind.cpp
cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/agentBindingsNanobind.cpp
**/*.h
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.h: Use a preprocessor guard in C++ header files with the format TRTLLM_<FILENAME>_H derived from the filename in all caps
The preprocessor guard name in C++ must have prefix TRTLLM_ followed by the filename, all in caps. Only use the file name, not directory names
Do not use a prefix with underscore in C++ preprocessor guard symbols, as such symbols are reserved in the C++ standard for compilers or implementations
Do not use a trailing underscore in C++ preprocessor guard symbols (unlike the Google C++ guideline)
Files:
cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/transferAgent.h
cpp/include/tensorrt_llm/executor/transferAgent.h
**/*.{cpp,h,cu,cuh,py}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the year of its latest meaningful modification
Files:
cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/transferAgent.h
tests/unittest/bindings/test_transfer_agent_bindings.py
cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp
cpp/include/tensorrt_llm/executor/transferAgent.h
cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/transferAgent.cpp
cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/agentBindingsPybind.cpp
setup.py
cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/agentBindingsNanobind.cpp
scripts/build_wheel.py
**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.py: Code developed for TensorRT-LLM should conform to Python 3.8+
Indent Python code with 4 spaces. Do not use tabs
Always maintain the namespace when importing in Python, even if only one class or function from a module is used
Python files should use snake_case naming: some_file.py
Python classes should use PascalCase naming: class SomeClass
Python functions and methods should use snake_case naming: def my_awesome_function():
Python local variables should use snake_case naming: my_variable = ...
Python variable names that start with a number should be prefixed with 'k': k_99th_percentile = ...
Python global variables should use upper snake_case with prefix 'G': G_MY_GLOBAL = ...
Python constants should use upper snake_case naming: MY_CONSTANT = ...
Avoid shadowing variables declared in an outer scope in Python
Initialize all externally visible members of a Python class in the constructor
For Python interfaces that may be used outside a file, prefer docstrings over comments
Python comments should be reserved for code within a function, or interfaces that are local to a file
Use Google style docstrings in Python for classes and functions, which can be parsed by Sphinx
Python attributes and variables can be documented inline with type and description
Avoid using reflection in Python when functionality can be easily achieved without reflection
When using try-except blocks in Python, limit the except to the smallest set of errors possible
When using try-except blocks in Python to handle multiple possible variable types (duck-typing), keep the body of the try as small as possible, using the else block for logic
Files:
tests/unittest/bindings/test_transfer_agent_bindings.py
setup.py
scripts/build_wheel.py
🧠 Learnings (20)
📓 Common learnings
Learnt from: ixlmar
Repo: NVIDIA/TensorRT-LLM PR: 7294
File: tensorrt_llm/_torch/modules/rms_norm.py:17-17
Timestamp: 2025-08-27T14:23:55.566Z
Learning: The TensorRT-LLM project requires Python 3.10+ as evidenced by the use of TypeAlias from typing module, match/case statements, and union type | syntax throughout the codebase, despite some documentation still mentioning Python 3.8+.
Learnt from: ixlmar
Repo: NVIDIA/TensorRT-LLM PR: 8398
File: tensorrt_llm/_torch/pyexecutor/sampling_utils.py:237-272
Timestamp: 2025-10-17T13:21:31.724Z
Learning: The setup.py file in TensorRT-LLM explicitly requires Python 3.10+ via `python_requires=">=3.10, <4"`, making match/case statements and other Python 3.10+ features appropriate throughout the codebase.
📚 Learning: 2025-08-25T08:47:24.758Z
Learnt from: tshmilnvidia
Repo: NVIDIA/TensorRT-LLM PR: 5488
File: cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/transferAgent.cpp:531-547
Timestamp: 2025-08-25T08:47:24.758Z
Learning: For NIXL API postXferReq function in NixlLoopbackAgent::submitLoopbackRequests, the expected return status is exactly NIXL_IN_PROG, not NIXL_SUCCESS. The strict check `TLLM_CHECK(status == NIXL_IN_PROG)` is correct and intentional for this specific API call.
Applied to files:
cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/transferAgent.h
cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/transferAgent.cpp
📚 Learning: 2025-09-09T18:31:44.336Z
Learnt from: venkywonka
Repo: NVIDIA/TensorRT-LLM PR: 7658
File: .github/CODEOWNERS:160-164
Timestamp: 2025-09-09T18:31:44.336Z
Learning: The ruleset for `release/**` branch patterns in the NVIDIA/TensorRT-LLM repository covers NIM-specific release branches like `release/1.0.1-NIM`, ensuring proper code ownership enforcement.
Applied to files:
.gitignore
📚 Learning: 2025-08-18T09:08:07.687Z
Learnt from: tongyuantongyu
Repo: NVIDIA/TensorRT-LLM PR: 6984
File: cpp/tensorrt_llm/CMakeLists.txt:297-299
Timestamp: 2025-08-18T09:08:07.687Z
Learning: In the TensorRT-LLM project, artifacts are manually copied rather than installed via `cmake --install`, so INSTALL_RPATH properties are not needed - only BUILD_RPATH affects the final artifacts.
Applied to files:
.gitignore
cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/CMakeLists.txt
cpp/tensorrt_llm/nanobind/CMakeLists.txt
cpp/tensorrt_llm/CMakeLists.txt
setup.py
cpp/tensorrt_llm/pybind/CMakeLists.txt
scripts/build_wheel.py
📚 Learning: 2025-08-06T13:58:07.506Z
Learnt from: galagam
Repo: NVIDIA/TensorRT-LLM PR: 6487
File: tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py:1-12
Timestamp: 2025-08-06T13:58:07.506Z
Learning: In TensorRT-LLM, test files (files under tests/ directories) do not require NVIDIA copyright headers, unlike production source code files. Test files typically start directly with imports, docstrings, or code.
Applied to files:
.gitignore
tests/unittest/bindings/test_transfer_agent_bindings.py
setup.py
📚 Learning: 2025-08-01T15:14:45.673Z
Learnt from: yibinl-nvidia
Repo: NVIDIA/TensorRT-LLM PR: 6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.
Applied to files:
.gitignore
setup.py
📚 Learning: 2025-09-09T09:40:45.658Z
Learnt from: fredricz-20070104
Repo: NVIDIA/TensorRT-LLM PR: 7645
File: tests/integration/test_lists/qa/llm_function_core.txt:648-648
Timestamp: 2025-09-09T09:40:45.658Z
Learning: In TensorRT-LLM test lists, it's common and intentional for the same test to appear in multiple test list files when they serve different purposes (e.g., llm_function_core.txt for comprehensive core functionality testing and llm_function_core_sanity.txt for quick sanity checks). This duplication allows tests to be run in different testing contexts.
Applied to files:
.gitignore
tests/unittest/bindings/test_transfer_agent_bindings.py
setup.py
📚 Learning: 2025-09-16T09:30:09.716Z
Learnt from: tongyuantongyu
Repo: NVIDIA/TensorRT-LLM PR: 7763
File: cpp/tensorrt_llm/CMakeLists.txt:297-301
Timestamp: 2025-09-16T09:30:09.716Z
Learning: In the TensorRT-LLM project, NCCL libraries are loaded earlier by PyTorch libraries or the bindings library, so the main shared library doesn't need NCCL paths in its RPATH - the libraries will already be available in the process address space when needed.
Applied to files:
.gitignore
cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/CMakeLists.txt
cpp/tensorrt_llm/nanobind/CMakeLists.txt
cpp/tensorrt_llm/CMakeLists.txt
setup.py
scripts/build_wheel.py
📚 Learning: 2025-07-28T17:06:08.621Z
Learnt from: moraxu
Repo: NVIDIA/TensorRT-LLM PR: 6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Applied to files:
.gitignore
tests/unittest/bindings/test_transfer_agent_bindings.py
📚 Learning: 2025-08-11T20:09:24.389Z
Learnt from: achartier
Repo: NVIDIA/TensorRT-LLM PR: 6763
File: tests/integration/defs/triton_server/conftest.py:16-22
Timestamp: 2025-08-11T20:09:24.389Z
Learning: In the TensorRT-LLM test infrastructure, the team prefers simple, direct solutions (like hard-coding directory traversal counts) over more complex but robust approaches when dealing with stable directory structures. They accept the maintenance cost of updating tests if the layout changes.
Applied to files:
.gitignore
📚 Learning: 2025-08-21T21:48:35.135Z
Learnt from: djns99
Repo: NVIDIA/TensorRT-LLM PR: 7104
File: cpp/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/epilogue/fusion/sm90_visitor_scatter.hpp:399-417
Timestamp: 2025-08-21T21:48:35.135Z
Learning: CUTLASS extensions in TensorRT-LLM (located under cpp/tensorrt_llm/cutlass_extensions/) are designed to integrate with and extend functionality in the external CUTLASS repository. When analyzing these extensions, their consumers and functionality wiring may exist in the CUTLASS codebase rather than within TensorRT-LLM itself.
Applied to files:
.gitignore
cpp/tensorrt_llm/CMakeLists.txt
📚 Learning: 2025-08-27T14:23:55.566Z
Learnt from: ixlmar
Repo: NVIDIA/TensorRT-LLM PR: 7294
File: tensorrt_llm/_torch/modules/rms_norm.py:17-17
Timestamp: 2025-08-27T14:23:55.566Z
Learning: The TensorRT-LLM project requires Python 3.10+ as evidenced by the use of TypeAlias from typing module, match/case statements, and union type | syntax throughout the codebase, despite some documentation still mentioning Python 3.8+.
Applied to files:
.gitignore
📚 Learning: 2025-08-14T15:38:01.771Z
Learnt from: MatthiasKohl
Repo: NVIDIA/TensorRT-LLM PR: 6904
File: cpp/tensorrt_llm/pybind/thop/bindings.cpp:55-57
Timestamp: 2025-08-14T15:38:01.771Z
Learning: In TensorRT-LLM Python bindings, tensor parameter collections like mla_tensor_params and spec_decoding_tensor_params are kept as required parameters without defaults to maintain API consistency, even when it might affect backward compatibility.
Applied to files:
.gitignore
📚 Learning: 2025-08-29T14:07:45.863Z
Learnt from: EmmaQiaoCh
Repo: NVIDIA/TensorRT-LLM PR: 7370
File: tests/unittest/trt/model_api/test_model_quantization.py:24-27
Timestamp: 2025-08-29T14:07:45.863Z
Learning: In TensorRT-LLM's CI infrastructure, pytest skip markers (pytest.mark.skip) are properly honored even when test files have __main__ blocks that call test functions directly. The testing system correctly skips tests without requiring modifications to the __main__ block execution pattern.
Applied to files:
tests/unittest/bindings/test_transfer_agent_bindings.py
📚 Learning: 2025-09-23T15:01:00.070Z
Learnt from: nv-lschneider
Repo: NVIDIA/TensorRT-LLM PR: 7910
File: cpp/tensorrt_llm/kernels/nccl_device/config.cu:15-17
Timestamp: 2025-09-23T15:01:00.070Z
Learning: In TensorRT-LLM NCCL device kernels, the <sstream> header is not needed as an explicit include in config.cu because it's provided transitively through other headers. Local compilation testing confirms this works without the explicit include.
Applied to files:
cpp/tensorrt_llm/nanobind/CMakeLists.txt
cpp/tensorrt_llm/CMakeLists.txt
📚 Learning: 2025-09-23T15:12:38.312Z
Learnt from: nv-lschneider
Repo: NVIDIA/TensorRT-LLM PR: 7910
File: cpp/tensorrt_llm/thop/allreduceOp.cpp:352-446
Timestamp: 2025-09-23T15:12:38.312Z
Learning: In TensorRT-LLM NCCL device implementation, NCCL version 2.28+ requirements are handled at runtime in the nccl_device/config layer rather than with compile-time guards. This allows the allreduceOp to remain version-agnostic and delegates version compatibility validation to the appropriate lower-level components that can gracefully handle unsupported configurations.
Applied to files:
cpp/tensorrt_llm/nanobind/CMakeLists.txt
📚 Learning: 2025-09-02T13:42:44.885Z
Learnt from: pcastonguay
Repo: NVIDIA/TensorRT-LLM PR: 7455
File: tensorrt_llm/_torch/pyexecutor/py_executor.py:1852-1860
Timestamp: 2025-09-02T13:42:44.885Z
Learning: In MPI communication within TensorRT-LLM pipeline parallelism, different communication types (tokens, logits, termination sync) must use disjoint tag namespaces to avoid message routing collisions when using the same source/destination patterns.
Applied to files:
cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/transferAgent.cpp
📚 Learning: 2025-09-23T15:01:00.070Z
Learnt from: nv-lschneider
Repo: NVIDIA/TensorRT-LLM PR: 7910
File: cpp/tensorrt_llm/kernels/nccl_device/config.cu:15-17
Timestamp: 2025-09-23T15:01:00.070Z
Learning: In TensorRT-LLM NCCL device kernels (cpp/tensorrt_llm/kernels/nccl_device/config.cu), std::ostringstream is used but <sstream> doesn't need to be explicitly included because it's provided transitively through other headers like tensorrt_llm/common/cudaUtils.h or config.h. Local compilation testing confirms this works without the explicit include.
Applied to files:
cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/transferAgent.cpp
📚 Learning: 2025-10-17T13:21:31.724Z
Learnt from: ixlmar
Repo: NVIDIA/TensorRT-LLM PR: 8398
File: tensorrt_llm/_torch/pyexecutor/sampling_utils.py:237-272
Timestamp: 2025-10-17T13:21:31.724Z
Learning: The setup.py file in TensorRT-LLM explicitly requires Python 3.10+ via `python_requires=">=3.10, <4"`, making match/case statements and other Python 3.10+ features appropriate throughout the codebase.
Applied to files:
setup.py
📚 Learning: 2025-08-25T08:48:39.694Z
Learnt from: tshmilnvidia
Repo: NVIDIA/TensorRT-LLM PR: 5488
File: cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/transferAgent.cpp:507-517
Timestamp: 2025-08-25T08:48:39.694Z
Learning: When reviewing potential compilation errors related to function parameter types, always check for type aliases using `using` declarations that might make seemingly incompatible types equivalent. RegisterDescs = MemoryDescs in TensorRT-LLM's transfer agent code.
Applied to files:
cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/agentBindingsNanobind.cpp
🧬 Code graph analysis (7)
cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/transferAgent.h (2)

cpp/include/tensorrt_llm/executor/transferAgent.h (7)
  nodiscard (74-82), nodiscard (103-111), nodiscard (157-165), nodiscard (244-252), nodiscard (254-262), nodiscard (289-354), nodiscard (371-377)

cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/transferAgent.cpp (11)
  nodiscard (201-212), nodiscard (214-217), nodiscard (219-227), nodiscard (229-237), nodiscard (239-247), nodiscard (249-257), nodiscard (259-267), nodiscard (359-362), nodiscard (461-494), wait (323-357), wait (323-323)
tests/unittest/bindings/test_transfer_agent_bindings.py (2)

cpp/include/tensorrt_llm/executor/types.h (1)
  MemoryType (165-646)

cpp/include/tensorrt_llm/executor/transferAgent.h (5)
  TransferOp (218-275), TransferState (277-418), MemoryDescs (94-99), AgentDesc (199-420), TransferRequest (225-240)
cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp (1)

cpp/tests/unit_tests/executor/transferAgentTest.cpp (2)
  config (64-64), config (378-378)
cpp/include/tensorrt_llm/executor/transferAgent.h (2)

cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/transferAgent.cpp (13)
  nodiscard (201-212), nodiscard (214-217), nodiscard (219-227), nodiscard (229-237), nodiscard (239-247), nodiscard (249-257), nodiscard (259-267), nodiscard (359-362), nodiscard (461-494), nodiscard (504-513), isCompleted (359-359), wait (323-357), wait (323-323)

cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/transferAgent.h (1)
  nodiscard (46-51)
cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/transferAgent.cpp (2)

cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/transferAgent.h (1)
  mHandle (52-52)

cpp/tests/unit_tests/executor/transferAgentTest.cpp (2)
  config (64-64), config (378-378)
cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/agentBindingsNanobind.cpp (2)

cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/transferAgent.cpp (25)
  wait (323-357), wait (323-323), registerMemory (420-429), registerMemory (420-420), registerMemory (596-603), registerMemory (596-596), deregisterMemory (431-436), deregisterMemory (431-431), deregisterMemory (605-612), deregisterMemory (605-605), loadRemoteAgent (438-446), loadRemoteAgent (438-438), loadRemoteAgent (520-554), loadRemoteAgent (520-520), getLocalAgentDesc (448-454), getLocalAgentDesc (448-448), invalidateRemoteAgent (456-459), invalidateRemoteAgent (456-456), notifySyncMessage (496-502), notifySyncMessage (496-496), getNotifiedSyncMessages (504-504), getLocalConnectionInfo (515-518), getLocalConnectionInfo (515-515), checkRemoteDescs (556-562), checkRemoteDescs (556-556)

cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/transferAgent.h (2)
  getLocalAgentDesc (67-81), getLocalConnectionInfo (87-95)
scripts/build_wheel.py (1)

tests/integration/defs/cpp/conftest.py (1)
  build_dir (28-30)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (29)
cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/CMakeLists.txt (2)
42-57: LGTM! The conditional binding module creation based on BINDING_TYPE is well-structured, and the status messages provide good build-time visibility.
78-80: LGTM! Exporting AGENT_BINDING_TARGET to parent scope correctly enables the main library to establish build dependencies on the transfer agent binding module.

cpp/tensorrt_llm/nanobind/CMakeLists.txt (1)
69-73: LGTM! The conditional dependency correctly ensures the transfer agent binding module is built before the main nanobind module when NIXL is enabled. The if(TARGET ...) guard appropriately handles cases where NIXL is not configured.

cpp/tensorrt_llm/CMakeLists.txt (1)
158-161: LGTM! The TRANSFER_AGENT_BINDING_TARGET variable is correctly defined within the NIXL_ROOT block, ensuring it's only set when NIXL support is being built. The naming is consistent with the target defined in the nixl_utils subdirectory.

setup.py (1)
116-118: LGTM! The new transfer agent binding artifacts are correctly added to the package data, ensuring they will be included in the wheel distribution. The glob patterns appropriately capture both the shared library (with platform-specific suffix) and the type stub file.
cpp/include/tensorrt_llm/executor/transferAgent.h (2)
277-282: LGTM! The TransferState enum is well-designed with clear states for transfer operation outcomes. The naming follows the project's convention for constants (kUPPER_CASE prefix) as per coding guidelines.
289-291: LGTM! The updated wait() signature with timeout support and TransferState return is a good improvement. The -1 default for indefinite wait follows common conventions. Existing derived classes will need to implement this new signature.

cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/transferAgent.h (1)
46-48: LGTM! The updated wait signature with a [[nodiscard]] TransferState return type and an optional timeout_ms parameter correctly overrides the base interface defined in cpp/include/tensorrt_llm/executor/transferAgent.h. This enables callers to handle completion states explicitly and support timed waits.

cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp (1)
144-145: LGTM! Correctly captures the TransferState return value from wait() and asserts success before proceeding with remote agent notification. This aligns with the updated API that now returns explicit completion states.

cpp/tensorrt_llm/pybind/CMakeLists.txt (1)
72-76: LGTM! The conditional dependency correctly ensures the Python bindings module builds after `TRANSFER_AGENT_BINDING_TARGET` when NIXL is enabled. The `if(TARGET ...)` guard properly handles cases where the transfer agent binding target isn't defined.

tests/unittest/bindings/test_transfer_agent_bindings.py (4)
1-14: LGTM! The conditional import pattern with `try`/`except` and `pytest.mark.skipif` is a clean approach for handling optional binding availability. This ensures the test suite runs gracefully when NIXL is not enabled.
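The guard pattern described above is roughly the following. The module name is taken from this PR; the variable name and the skip-marker wording are illustrative:

```python
# Optional-dependency import guard: the test module stays importable even
# when the NIXL-enabled binding was not built into the wheel.
try:
    import tensorrt_llm_transfer_agent_binding as transfer_agent  # built only with NIXL
    BINDING_AVAILABLE = True
except ImportError:
    transfer_agent = None
    BINDING_AVAILABLE = False

# With pytest, every test in the module can then be skipped in one place:
#   pytestmark = pytest.mark.skipif(
#       not BINDING_AVAILABLE, reason="transfer agent binding not built (NIXL disabled)")
print(BINDING_AVAILABLE)
```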
17-47: LGTM! The enum tests correctly verify the NIXL-specific `MemoryType` values (DRAM, VRAM, BLK, OBJ, FILE), which correspond to the `nixl_mem_t` types used by the transfer agent and are distinct from `executor::MemoryType`. The distinctness assertions ensure enum values are properly defined.
49-95: LGTM! Good coverage of `MemoryDesc` and `MemoryDescs`, including edge cases (different values, empty list). The tests validate both construction and field accessors.
97-190: LGTM! Comprehensive tests for `AgentDesc`, `BaseAgentConfig`, and `TransferRequest` covering construction patterns, field access, and read/write properties. The tests exercise both string and bytes inputs for `AgentDesc`, which validates the binding's flexibility.

cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/transferAgent.cpp (2)
368-388: LGTM! The branching logic correctly distinguishes between listen-thread and non-listen-thread configurations. The `else` branch appropriately clears `mAddress` and uses port 0, while both branches consistently use `NIXL_THREAD_SYNC_DEFAULT` and the new `numWorkers` parameter from `BaseAgentConfig`.
689-690: LGTM! Correctly captures and asserts the `TransferState` return value from `wait()`. This aligns with the updated API semantics and provides explicit error handling for loopback transfer completion.

scripts/build_wheel.py (4)
371-373: LGTM: Parameter addition follows the existing pattern. The new `nixl: bool` parameter is properly positioned after `flash_mla` and before `binding_lib_name`, maintaining consistency with the function's parameter ordering convention (binding_type, venv_python, feature flags, then binding lib name).
414-423: LGTM: Stub generation logic is consistent with existing patterns. The nixl stub generation correctly mirrors the existing deep_ep/flash_mla patterns:

- Uses nanobind stubgen for the nanobind binding type
- Falls back to pybind11_stubgen for the pybind binding type
- Module name `tensorrt_llm_transfer_agent_binding` matches the CMake target

One observation: unlike the `bindings` module, which uses the `-r` flag for recursive stub generation with nanobind, this omits it. That is appropriate, since the transfer agent binding is a flat module without submodules.
826-838: The nvda_nixl installation block depends on agent_binding_so presence. The `/opt/nvidia/nvda_nixl` installation logic at lines 826-838 is nested inside the `if agent_binding_so:` block (line 819). This means the nixl libraries will only be installed if the agent binding `.so` exists. Verify this is intentional; it seems correct, since both the nixl wrapper and the agent binding depend on these libraries.
948-952: LGTM: Call site properly propagates the nixl flag. The updated call correctly passes `nixl_root is not None` as the `nixl` boolean argument, aligning with the function signature change.

cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/agentBindingsPybind.cpp (4)
1-26: LGTM: File header and includes are correct. The copyright header correctly shows 2025, and the include ordering follows conventions (project headers first, then third-party). The namespace aliases are clear and consistent.
32-49: LGTM: Enum bindings are clean and complete. All three enums (MemoryType, TransferOp, TransferState) are properly bound, with consistent naming conventions mapping `kXXX` C++ enum values to Python-friendly uppercase names.
93-102: Verify TransferStatus::wait return type matches Python expectations. The binding exposes `wait` but doesn't specify the return type. Based on the C++ implementation in `transferAgent.cpp:322-356`, `wait` returns `TransferState`. Ensure Python callers understand this returns an enum value, not a boolean.
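The caveat matters because any non-trivial enum member is truthy in Python, so `if status.wait(...):` would not distinguish outcomes. A minimal illustration, with enum member names assumed rather than taken from the binding:

```python
from enum import Enum


class TransferState(Enum):  # illustrative member names, not the binding's authority
    IN_PROGRESS = 0
    SUCCESS = 1
    FAILURE = 2


# Any Enum member is truthy, including a failure state...
assert bool(TransferState.FAILURE) is True

# ...so callers must compare against the expected state explicitly:
def transfer_succeeded(state: TransferState) -> bool:
    return state is TransferState.SUCCESS


print(transfer_succeeded(TransferState.FAILURE))  # False
print(transfer_succeeded(TransferState.SUCCESS))  # True
```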
134-138: Ownership transfer via `.release()` is correctly handled. Using `.release()` on the `unique_ptr`, combined with `py::return_value_policy::take_ownership`, correctly transfers ownership to Python. This prevents double-free issues.

cpp/tensorrt_llm/executor/cache_transmission/nixl_utils/agentBindingsNanobind.cpp (5)
1-29: LGTM: File header and nanobind setup are correct. Proper 2025 copyright, appropriate nanobind includes for STL type support, and consistent namespace aliases matching the pybind11 version.
67-83: AgentDesc bytes constructor uses placement new correctly. The nanobind custom `__init__` pattern with placement new (`new (self) kvc::AgentDesc{...}`) is the correct idiom for nanobind. The `data.c_str()` and `data.size()` calls are appropriate for `nb::bytes`.
102-106: GIL release on NixlTransferStatus methods is appropriate. The `is_completed` and `wait` methods correctly release the GIL, since they may involve blocking I/O operations on the underlying NIXL agent. This matches the pybind11 implementation.
108-124: BaseAgentConfig binding looks complete. Both the default constructor and the parameterized constructor (via a custom `__init__`) are properly bound. The read-write properties expose all config fields correctly.
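From Python, a config bound this way behaves like a dataclass with defaults: default-constructible, constructible with keyword arguments, and mutable afterwards. A sketch; the field names follow the options discussed in this PR (listen thread, worker count), but the exact names and defaults are assumptions:

```python
from dataclasses import dataclass


@dataclass
class BaseAgentConfig:
    """Python-side sketch of the bound config; field names and defaults are assumptions."""
    name: str = ""
    use_listen_thread: bool = False
    num_workers: int = 1


# Default constructor...
cfg = BaseAgentConfig()
# ...and the parameterized form, with read-write fields mutable afterwards.
cfg2 = BaseAgentConfig(name="agent0", use_listen_thread=True, num_workers=4)
cfg2.num_workers = 8
print(cfg2.num_workers)  # 8
```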
126-148: Verify API consistency between pybind11 and nanobind implementations. Both binding files expose the same API surface; all methods are synchronized across implementations. Verified: identical method definitions (check_remote_descs, deregister_memory, get_local_agent_desc, get_local_connection_info, get_notified_sync_messages, invalidate_remote_agent, is_completed, load_remote_agent, load_remote_agent_by_connection, register_memory, wait) with only expected library-specific differences (`py::` vs `nb::` prefixes). Ensure they remain synchronized when making future changes.
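A drift check like the one described can be automated by diffing the public attribute sets of the two bound classes. A sketch using stand-in classes, since the real modules exist only in NIXL-enabled builds:

```python
def public_api(obj) -> set:
    """Names of the public attributes an object (class or module) exposes."""
    return {name for name in dir(obj) if not name.startswith("_")}


# Stand-ins for the pybind11 and nanobind TransferAgent classes.
class PybindAgent:
    def register_memory(self): ...
    def deregister_memory(self): ...
    def load_remote_agent(self): ...

class NanobindAgent:
    def register_memory(self): ...
    def deregister_memory(self): ...
    def load_remote_agent(self): ...


# An empty symmetric difference means the two surfaces are in sync.
drift = public_api(PybindAgent) ^ public_api(NanobindAgent)
print(sorted(drift))  # []
```

A one-line unit test of this shape in the bindings test suite would turn future divergence into an immediate failure rather than a review-time observation.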
PR_Github #28959 [ run ] triggered by Bot. Commit:
PR_Github #28959 [ run ] completed with state
2fcf3f9 to 92b8846 (Compare)
/bot run --disable-fail-fast
92b8846 to 893e410 (Compare)
/bot run --disable-fail-fast
PR_Github #29061 [ run ] triggered by Bot. Commit:
b22bb74 to 2b31e6f (Compare)
/bot run --disable-fail-fast
PR_Github #29138 [ run ] triggered by Bot. Commit:
PR_Github #29138 [ run ] completed with state
2b31e6f to 9c96531 (Compare)
/bot run --disable-fail-fast
PR_Github #29201 [ run ] triggered by Bot. Commit:
PR_Github #29201 [ run ] completed with state
1a91d0a to f7c8e1d (Compare)
/bot run --disable-fail-fast
PR_Github #29353 [ run ] triggered by Bot. Commit:
PR_Github #29353 [ run ] completed with state
f7c8e1d to 0c40c21 (Compare)
/bot run --stage-list "DGX_B200-4_GPUs-PyTorch-Ray-1,DGX_B200-8_GPUs-PyTorch-1"
PR_Github #29477 [ run ] triggered by Bot. Commit:
0c40c21 to 02e1252 (Compare)
/bot run --stage-list "DGX_B200-4_GPUs-PyTorch-Ray-1,DGX_B200-8_GPUs-PyTorch-1"
PR_Github #29537 [ run ] triggered by Bot. Commit:
PR_Github #29537 [ run ] completed with state
02e1252 to c748c45 (Compare)
/bot run --stage-list "DGX_B200-4_GPUs-PyTorch-Ray-1,DGX_B200-8_GPUs-PyTorch-1"
PR_Github #29584 [ run ] triggered by Bot. Commit:
PR_Github #29584 [ run ] completed with state
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
c748c45 to fc35ee7 (Compare)
/bot run --stage-list "DGX_B200-4_GPUs-PyTorch-Ray-1,DGX_B200-8_GPUs-PyTorch-1"
PR_Github #29678 [ run ] triggered by Bot. Commit:
PR_Github #29678 [ run ] completed with state
/bot skip --comment "all test has passed"
PR_Github #29713 [ skip ] triggered by Bot. Commit:
PR_Github #29713 [ skip ] completed with state
atrifex left a comment: Approving on the agreement that no new dependencies have been added.
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> Signed-off-by: Daniil Kulko <kulkodaniil@gmail.com>
@coderabbitai summary
Description
For the Python cache transceiver, we need a NIXL binding. This PR adds the transferAgent binding, which is kept separate from the other bindings.
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
`/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...`

Provide a user-friendly way for developers to interact with a Jenkins server.
Run `/bot [-h|--help]` to print this help message. See details below for each supported subcommand.
Details
`run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug (experimental)]`

Launch build/test pipelines. All previously running jobs will be killed.
- `--reuse-test (optional)pipeline-id` (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.
- `--disable-reuse-test` (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensures that all builds and tests are run regardless of previous successes.
- `--disable-fail-fast` (OPTIONAL) : Disable fail fast on build/tests/infra failures.
- `--skip-test` (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.
- `--stage-list "A10-PyTorch-1, xxx"` (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.
- `--gpu-type "A30, H100_PCIe"` (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.
- `--test-backend "pytorch, cpp"` (OPTIONAL) : Skip test stages which don't match the specified backends. Only supports [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with the tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.
- `--only-multi-gpu-test` (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.
- `--disable-multi-gpu-test` (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.
- `--add-multi-gpu-test` (OPTIONAL) : Force run the multi-GPU tests in addition to running the L0 pre-merge pipeline.
- `--post-merge` (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.
- `--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx"` (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and the specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".
- `--detailed-log` (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.
- `--debug` (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purposes. Note: Specify exactly one stage in the `stage-list` parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see `docs/source/reference/ci-overview.md` and the `scripts/test_to_stage_mapping.py` helper.

kill

`kill` : Kill all running builds associated with the pull request.

skip

`skip --comment COMMENT` : Skip testing for the latest commit on the pull request. `--comment "Reason for skipping build/test"` is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

`reuse-pipeline` : Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.