[TRTLLM-10827][feat] Add KV Cache metrics to MetricsCollector for more Prometheus metrics#11243

yijingl-nvidia · 2026-02-03T22:34:06Z

Description

TRTLLM currently exposes too few Prometheus metrics. This PR adds KV cache utilization and hit rate gauges to the Prometheus metrics, which will help Dynamo monitor TRTLLM.

Minor clean-up:

Rename a variable in kvCacheManager to conform to the naming convention. Add more comments.
Add more comments in MetricsCollector in tensorrt_llm/metrics/collector.py.
Refactor some functions in MetricsCollector since the added functionality requires some existing functions to be renamed or refactored.

Test Coverage

Modified tests/unittest/llmapi/apps/_test_openai_prometheus.py to cover the change.

PR Checklist

Please review the following before submitting your PR:

PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

Details

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purpose. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md
and the scripts/test_to_stage_mapping.py helper.

kill

kill

Kill all running builds associated with pull request.

skip

skip --comment COMMENT

Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

Summary by CodeRabbit

Release Notes

New Features
- Added KV cache performance metrics: hit rate and utilization tracking via Prometheus gauges.
- Enabled per-iteration performance statistics collection for more granular monitoring and diagnostics.
Improvements
- Enhanced metrics collection with better support for request-level and iteration-level performance data.

coderabbitai · 2026-02-03T22:45:12Z

📝 Walkthrough

Walkthrough

This PR adds Prometheus Gauge metrics for KV cache statistics and refactors Python metrics logging with new public methods. It also renames a C++ member variable for naming consistency and updates related integration points.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| C++ KVCacheManager Member Rename: `cpp/include/tensorrt_llm/batch_manager/kvCacheManager.h`, `cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp` | Renamed private member variable `reusedBlockIds` to `mReusedBlockIds` for naming convention consistency. Updated all references in the implementation file. |
| Python Metrics Collector Enhancement: `tensorrt_llm/metrics/collector.py` | Added two new Prometheus Gauge metrics (`kv_cache_hit_rate` and `kv_cache_utilization`). Introduced public methods `log_request_metrics_dict()` and `log_iteration_stats()` to replace legacy logging paths. Added internal helper `_log_gauge()` for gauge metric updates. |
| OpenAI Server Metrics Integration: `tensorrt_llm/serve/openai_server.py` | Updated `_extract_metrics()` to use new `log_request_metrics_dict()` method. Added async iteration of `get_stats_async()` with conditional logging via `log_iteration_stats()` when `enable_iter_perf_stats` is enabled. |
| Prometheus Metrics Test: `tests/unittest/llmapi/apps/_test_openai_prometheus.py` | Added `enable_iter_perf_stats` to test configuration. Updated test to trigger multiple completions and extended assertions to validate new `kv_cache_hit_rate` and `kv_cache_utilization` metrics. |
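As a rough illustration of what the extended test assertions could look like, the sketch below parses Prometheus text-format output and extracts the two new gauges. The sample payload, the `model_name` label, the unprefixed metric names, and the `parse_gauges` helper are all assumptions made for illustration; the real server may add a name prefix or different labels, and this sketch assumes sample lines carry no trailing timestamp.

```python
def parse_gauges(exposition: str, names: set[str]) -> dict[str, float]:
    """Return {metric_name: value} for the requested sample names."""
    found: dict[str, float] = {}
    for line in exposition.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # A sample line looks like "<name>{labels} <value>"; the value is the
        # last whitespace-separated token (assuming no timestamp follows it).
        series, _, value = line.rpartition(" ")
        name = series.split("{", 1)[0]
        if name in names:
            found[name] = float(value)
    return found


# Hypothetical scrape output, not captured from a real server.
SAMPLE = """\
# HELP kv_cache_hit_rate KV cache hit rate
# TYPE kv_cache_hit_rate gauge
kv_cache_hit_rate{model_name="demo"} 0.25
# TYPE kv_cache_utilization gauge
kv_cache_utilization{model_name="demo"} 0.5
"""

gauges = parse_gauges(SAMPLE, {"kv_cache_hit_rate", "kv_cache_utilization"})
```

A test could then check that both names are present in `gauges` and that each value lies between 0 and 1.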

Sequence Diagram

sequenceDiagram
    actor Client
    participant OpenAIServer
    participant MetricsCollector
    participant Prometheus

    Client->>OpenAIServer: request completion
    OpenAIServer->>OpenAIServer: process request
    OpenAIServer->>MetricsCollector: log_request_metrics_dict(metrics_dict)
    MetricsCollector->>Prometheus: emit histograms & counter
    
    alt enable_iter_perf_stats enabled
        OpenAIServer->>OpenAIServer: get_stats_async(timeout=2.0)
        loop for each iteration stat
            OpenAIServer->>MetricsCollector: log_iteration_stats(llm_stat)
            MetricsCollector->>Prometheus: update kv_cache_hit_rate gauge
            MetricsCollector->>Prometheus: update kv_cache_utilization gauge
        end
    end
    
    Client->>Client: receive response with metrics

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and specifically describes the main change: adding KV cache metrics to MetricsCollector for Prometheus. It matches the changeset which adds kv_cache_hit_rate and kv_cache_utilization gauge metrics. |
| Description check | ✅ Passed | The PR description clearly explains the objectives (adding KV cache metrics to Prometheus), provides test coverage details, and includes the completed checklist. |

coderabbitai

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)

cpp/include/tensorrt_llm/batch_manager/kvCacheManager.h (1)
1-2: ⚠️ Potential issue | 🟡 Minor

Update the copyright year to reflect the 2026 modification.
🔧 Suggested header update
- * Copyright (c) 2022-2024, NVIDIA CORPORATION.  All rights reserved.
+ * Copyright (c) 2022-2026, NVIDIA CORPORATION.  All rights reserved.
As per coding guidelines: All TensorRT-LLM source files (.cpp, .h, .cu, .py, and other source files) should contain an NVIDIA copyright header with the year of latest meaningful modification.
cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp (1)
1-3: ⚠️ Potential issue | 🟡 Minor

Update the copyright year to include 2026 changes.
🔧 Suggested header update
- * SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ * SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
As per coding guidelines: All TensorRT-LLM source files (.cpp, .h, .cu, .py, and other source files) should contain an NVIDIA copyright header with the year of latest meaningful modification.
tensorrt_llm/serve/openai_server.py (1)
1-2: ⚠️ Potential issue | 🟡 Minor

Add NVIDIA copyright header for 2026 modifications.
🔧 Suggested header addition
 #!/usr/bin/env python
+# Copyright (c) 2026, NVIDIA CORPORATION. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
As per coding guidelines: All TensorRT-LLM source files (.cpp, .h, .cu, .py, and other source files) should contain an NVIDIA copyright header with the year of latest meaningful modification.
tests/unittest/llmapi/apps/_test_openai_prometheus.py (2)
1-1: ⚠️ Potential issue | 🟠 Major

Add the NVIDIA copyright header.

This file is missing the required NVIDIA copyright header with the year of latest meaningful modification (2026). Please add the standard header used in this repo at the top of the file.
As per coding guidelines, "All TensorRT-LLM source files (.cpp, .h, .cu, .py, and other source files) should contain an NVIDIA copyright header with the year of latest meaningful modification."

53-68: ⚠️ Potential issue | 🟡 Minor

Use equality check for HTTP status.

Line 68 uses `is` for integer comparison, which is identity-based. Use `==` for value comparison instead.
Proposed fix
-    assert response.status is 200
+    assert response.status == 200
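
To illustrate why the identity check is fragile, here is a small standalone sketch. `FakeResponse` is a hypothetical stand-in for the HTTP response object used in the test, not the real class.

```python
class FakeResponse:
    """Hypothetical stand-in for an HTTP response object."""

    def __init__(self, status: int) -> None:
        self.status = status


# Build the status from a string so it is created as an int at runtime.
response = FakeResponse(int("200"))
expected = 200

# Value comparison: correct for any two equal integers.
values_match = response.status == expected

# Identity comparison: the result depends on whether the interpreter happens
# to reuse one int object for both values (CPython caches small ints as an
# implementation detail), so a test should not rely on it.
same_object = response.status is expected
```

Only `values_match` is guaranteed by the language to be `True` here; `same_object` may differ between interpreters and between integer values.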

🤖 Fix all issues with AI agents

In `@tensorrt_llm/metrics/collector.py`:
- Line 1: Add the required NVIDIA copyright header (with the latest modification
year) at the very top of this file, above the existing module docstring (the
string """Utilities for Prometheus Metrics Collection."""), so the header
precedes the docstring and contains the standard NVIDIA copyright notice and
year of last meaningful change.
- Around line 116-139: Update the docstring of log_request_metrics_dict to
Google-style: add an Args section describing metrics_dict and list expected keys
(MetricsCollector.labelname_finish_reason as finish reason string,
MetricNames.E2E, MetricNames.TTFT, MetricNames.TPOT,
MetricNames.REQUEST_QUEUE_TIME as floats in seconds), add a Returns section
stating None (metrics are logged to Prometheus), and include a short Note that
metrics are only recorded when MetricsCollector.labelname_finish_reason is
present; keep the existing high-level description and mention that
histograms/counters (histogram_e2e_time_request, histogram_time_to_first_token,
histogram_time_per_output_token, histogram_queue_time_request,
counter_request_success) are updated.
- Around line 89-96: Rename the misspelled metric attribute kv_cahce_hit_rate to
kv_cache_hit_rate wherever defined/used (e.g., in the Gauge creation and any
references), and change truthy checks on cacheHitRate to explicit existence
checks (e.g., cacheHitRate is not None) so zero values are allowed;
additionally, before computing utilization using maxNumBlocks, guard the
division by ensuring maxNumBlocks is truthy/non-zero (or skip/update metric when
maxNumBlocks == 0) when setting kv_cache_utilization. Ensure you update all
references to these symbols (kv_cahce_hit_rate, kv_cache_hit_rate, cacheHitRate,
maxNumBlocks, kv_cache_utilization) consistently.
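
The guards requested above (explicit `None` check for the hit rate, non-zero check before dividing) together with a Google-style docstring can be sketched as follows. This is a minimal sketch under stated assumptions, not the code from the PR: `FakeGauge` is a hypothetical stand-in for a Prometheus gauge, `log_kv_cache_stats` is an illustrative helper name, and the `usedNumBlocks` key is assumed; only `cacheHitRate` and `maxNumBlocks` are named in the review.

```python
from typing import Optional


class FakeGauge:
    """Hypothetical stand-in for a Prometheus gauge; keeps the last value."""

    def __init__(self) -> None:
        self.value: Optional[float] = None

    def set(self, value: float) -> None:
        self.value = value


def log_kv_cache_stats(kv_stats: dict, hit_rate_gauge: FakeGauge,
                       utilization_gauge: FakeGauge) -> None:
    """Update KV cache gauges from one iteration's KV cache statistics.

    Args:
        kv_stats: Mapping that may contain ``cacheHitRate`` (float),
            ``usedNumBlocks`` (int, assumed key name) and ``maxNumBlocks``
            (int).
        hit_rate_gauge: Gauge that receives the cache hit rate.
        utilization_gauge: Gauge that receives used blocks / max blocks.

    Returns:
        None. The gauges are updated in place.
    """
    hit_rate = kv_stats.get("cacheHitRate")
    if hit_rate is not None:  # explicit check, so 0.0 is still recorded
        hit_rate_gauge.set(hit_rate)

    used = kv_stats.get("usedNumBlocks")
    max_blocks = kv_stats.get("maxNumBlocks")
    if used is not None and max_blocks:  # skip when missing or zero
        utilization_gauge.set(used / max_blocks)


hit, util = FakeGauge(), FakeGauge()
# A hit rate of 0.0 is logged; utilization is skipped because maxNumBlocks is 0.
log_kv_cache_stats(
    {"cacheHitRate": 0.0, "usedNumBlocks": 3, "maxNumBlocks": 0}, hit, util)
```

With a truthiness check (`if hit_rate:`) the first call would silently drop the legitimate value `0.0`, which is the behavior the review comment warns about.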

🧹 Nitpick comments (2)

tensorrt_llm/metrics/collector.py (1)

4-4: Keep typing imports namespaced.

♻️ Suggested refactor

-import time
-from typing import Dict, Union
+import time
+import typing
@@
-    def __init__(self, labels: Dict[str, str]) -> None:
+    def __init__(self, labels: typing.Dict[str, str]) -> None:
@@
-    def _label_merge(self, labels: Dict[str, str]) -> Dict[str, str]:
+    def _label_merge(self, labels: typing.Dict[str, str]) -> typing.Dict[str, str]:
@@
-    def _log_counter(self, counter, labels: Dict[str, str],
-                     data: Union[int, float]) -> None:
+    def _log_counter(self, counter, labels: typing.Dict[str, str],
+                     data: typing.Union[int, float]) -> None:
@@
-    def _log_histogram(self, histogram, data: Union[int, float]) -> None:
+    def _log_histogram(self, histogram, data: typing.Union[int, float]) -> None:
@@
-    def _log_gauge(self, gauge, data: Union[int, float]) -> None:
+    def _log_gauge(self, gauge, data: typing.Union[int, float]) -> None:

As per coding guidelines: Always maintain the namespace when importing Python modules, even if only one class or function from a module is used.

tensorrt_llm/serve/openai_server.py (1)

468-475: Avoid blocking request finalization while draining iteration stats.

get_stats_async(timeout=2.0) can add up to the timeout in latency per request. Consider draining in a background task so response completion isn’t delayed.

♻️ Suggested refactor

         if self.metrics_collector:
             self.metrics_collector.log_request_metrics_dict(res.metrics_dict)
             if self.llm.args.enable_iter_perf_stats:
-                async for llm_stat in self.llm.get_stats_async(timeout=2.0):
-                    self.metrics_collector.log_iteration_stats(llm_stat)
+                async def _drain_iter_stats():
+                    async for llm_stat in self.llm.get_stats_async(timeout=2.0):
+                        self.metrics_collector.log_iteration_stats(llm_stat)
+                asyncio.create_task(_drain_iter_stats())
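
A self-contained sketch of the same background-drain pattern is shown below. `fake_get_stats_async`, `finalize_request`, `logged`, and `background_tasks` are hypothetical names used only for this illustration. One detail added beyond the suggestion above: the task is stored in a set, since the `asyncio` documentation advises keeping a reference to tasks created with `asyncio.create_task()` so they are not garbage-collected before finishing.

```python
import asyncio


async def fake_get_stats_async(timeout: float):
    """Hypothetical stand-in for llm.get_stats_async(); yields two stats."""
    for i in range(2):
        await asyncio.sleep(0)
        yield {"iter": i}


background_tasks: set = set()
logged: list = []


async def finalize_request() -> str:
    async def _drain_iter_stats() -> None:
        async for llm_stat in fake_get_stats_async(timeout=2.0):
            logged.append(llm_stat)  # stands in for log_iteration_stats()

    task = asyncio.create_task(_drain_iter_stats())
    # Keep a strong reference until the task completes.
    background_tasks.add(task)
    task.add_done_callback(background_tasks.discard)
    # Return immediately; the drain continues in the background.
    return "response sent"


async def main() -> str:
    result = await finalize_request()
    # Demo only: wait for pending drains so the script can exit cleanly.
    await asyncio.gather(*background_tasks)
    return result


result = asyncio.run(main())
```

In a long-running server the final `gather` would not be needed on the request path; it is here only so the example finishes deterministically.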

tensorrt_llm/metrics/collector.py

yijingl-nvidia · 2026-02-03T23:45:27Z

Addressing comments from coderabbit

tensorrt_llm/serve/openai_server.py

eopXD

KV cache manager-related changes look good to me.
Please wait for @kaiyux and other experts to respond regarding GIL/performance.

yijingl-nvidia · 2026-02-12T04:32:30Z

/bot run

tensorrt-cicd · 2026-02-12T04:39:17Z

PR_Github #35717 [ run ] triggered by Bot. Commit: 337d84f

tensorrt-cicd · 2026-02-12T07:27:44Z

PR_Github #35717 [ run ] completed with state SUCCESS. Commit: 337d84f
/LLM/main/L0_MergeRequest_PR pipeline #27585 completed with status: 'FAILURE'

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

yijingl-nvidia · 2026-02-12T22:29:27Z

/bot run

tensorrt-cicd · 2026-02-12T22:36:58Z

PR_Github #35832 [ run ] triggered by Bot. Commit: 337d84f

tensorrt-cicd · 2026-02-13T00:27:02Z

PR_Github #35832 [ run ] completed with state FAILURE. Commit: 337d84f
/LLM/main/L0_MergeRequest_PR pipeline #27675 completed with status: 'FAILURE'

yijingl-nvidia · 2026-02-13T00:58:41Z

/bot run

tensorrt-cicd · 2026-02-13T01:04:34Z

PR_Github #35846 [ run ] triggered by Bot. Commit: 0410f59

tensorrt-cicd · 2026-02-13T05:52:37Z

PR_Github #35846 [ run ] completed with state SUCCESS. Commit: 0410f59
/LLM/main/L0_MergeRequest_PR pipeline #27686 completed with status: 'FAILURE'

yijingl-nvidia · 2026-02-13T16:42:03Z

/bot run

tensorrt-cicd · 2026-02-13T16:48:36Z

PR_Github #35923 [ run ] triggered by Bot. Commit: 0410f59

tensorrt-cicd · 2026-02-13T21:27:40Z

PR_Github #35923 [ run ] completed with state SUCCESS. Commit: 0410f59
/LLM/main/L0_MergeRequest_PR pipeline #27742 completed with status: 'FAILURE'

yijingl-nvidia · 2026-02-14T00:04:31Z

/bot run

tensorrt-cicd · 2026-02-14T00:10:14Z

PR_Github #35952 [ run ] triggered by Bot. Commit: 5f3b274

tensorrt-cicd · 2026-02-14T04:03:17Z

PR_Github #35952 [ run ] completed with state SUCCESS. Commit: 5f3b274
/LLM/main/L0_MergeRequest_PR pipeline #27767 completed with status: 'FAILURE'

yijingl-nvidia · 2026-02-16T16:43:27Z

/bot run

tensorrt-cicd · 2026-02-16T16:49:47Z

PR_Github #36056 [ run ] triggered by Bot. Commit: d369966

tensorrt-cicd · 2026-02-16T20:29:11Z

PR_Github #36056 [ run ] completed with state SUCCESS. Commit: d369966
/LLM/main/L0_MergeRequest_PR pipeline #27862 completed with status: 'FAILURE'

brb-nv · 2026-02-17T17:29:32Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-02-17T17:35:22Z

PR_Github #36095 [ run ] triggered by Bot. Commit: d369966

tensorrt-cicd · 2026-02-17T20:15:51Z

PR_Github #36095 [ run ] completed with state SUCCESS. Commit: d369966
/LLM/main/L0_MergeRequest_PR pipeline #27891 completed with status: 'FAILURE'

Signed-off-by: Yijing Li <257409031+yijingl-nvidia@users.noreply.github.com>

Add kv cache stats in MetricsCollector. Add feature in OpenAIServer to collect iteration stats and upload them to Prometheus using MetricsCollector. Signed-off-by: Yijing Li <257409031+yijingl-nvidia@users.noreply.github.com>

Signed-off-by: Yijing Li <257409031+yijingl-nvidia@users.noreply.github.com>

yijingl-nvidia · 2026-02-18T01:08:02Z

/bot run

tensorrt-cicd · 2026-02-18T01:13:56Z

PR_Github #36122 [ run ] triggered by Bot. Commit: c08a603

tensorrt-cicd · 2026-02-18T08:43:21Z

PR_Github #36122 [ run ] completed with state SUCCESS. Commit: c08a603
/LLM/main/L0_MergeRequest_PR pipeline #27912 completed with status: 'FAILURE'

pcastonguay · 2026-02-18T14:59:07Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-02-18T15:05:53Z

PR_Github #36154 [ run ] triggered by Bot. Commit: c08a603

tensorrt-cicd · 2026-02-18T18:31:11Z

PR_Github #36154 [ run ] completed with state SUCCESS. Commit: c08a603
/LLM/main/L0_MergeRequest_PR pipeline #27942 completed with status: 'FAILURE'

pcastonguay · 2026-02-18T19:24:13Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-02-18T19:30:46Z

PR_Github #36175 [ run ] triggered by Bot. Commit: c08a603

tensorrt-cicd · 2026-02-18T22:15:51Z

PR_Github #36175 [ run ] completed with state SUCCESS. Commit: c08a603
/LLM/main/L0_MergeRequest_PR pipeline #27958 completed with status: 'SUCCESS'
Pipeline has performance regression cases. Check the performance regression report for details.

…e Prometheus metrics (NVIDIA#11243) Signed-off-by: Yijing Li <257409031+yijingl-nvidia@users.noreply.github.com> Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>

rmccorm4 · 2026-02-19T20:17:27Z

tensorrt_llm/metrics/collector.py

+                                    request_queue_time)
+            self.last_log_time = time.time()
+
+    def log_iteration_stats(self, iteration_stats: dict) -> None:


CC @indrajit96

yijingl-nvidia requested a review from a team as a code owner February 3, 2026 22:34

coderabbitai bot reviewed Feb 3, 2026

yijingl-nvidia force-pushed the more_metrics branch from 9850e86 to 29f07d5 Compare February 3, 2026 23:26

kaiyux reviewed Feb 6, 2026

eopXD approved these changes Feb 6, 2026

yijingl-nvidia force-pushed the more_metrics branch from 3f541e0 to 337d84f Compare February 11, 2026 23:29

yijingl-nvidia requested review from a team as code owners February 11, 2026 23:29

yijingl-nvidia requested review from QiJune, Shixiaowei02 and Superjomn February 11, 2026 23:29

yijingl-nvidia force-pushed the more_metrics branch from 337d84f to 0410f59 Compare February 13, 2026 00:57

nvpohanh approved these changes Feb 13, 2026

JunyiXu-nv approved these changes Feb 13, 2026

yijingl-nvidia force-pushed the more_metrics branch from 0410f59 to 5f3b274 Compare February 14, 2026 00:02

yijingl-nvidia force-pushed the more_metrics branch from 5f3b274 to d369966 Compare February 16, 2026 16:42

yijingl-nvidia added 3 commits February 17, 2026 16:40

Minor refactor for KVCacheManager

d70d6c6

Signed-off-by: Yijing Li <257409031+yijingl-nvidia@users.noreply.github.com>

Add example script to test Prometheus metrics

c08a603

Signed-off-by: Yijing Li <257409031+yijingl-nvidia@users.noreply.github.com>

yijingl-nvidia force-pushed the more_metrics branch from d369966 to c08a603 Compare February 18, 2026 01:04

pcastonguay approved these changes Feb 18, 2026

laikhtewari approved these changes Feb 18, 2026

taylor-yb-lee merged commit c87c800 into NVIDIA:main Feb 18, 2026
5 checks passed

rmccorm4 reviewed Feb 19, 2026

indrajit96 mentioned this pull request Feb 21, 2026

chore: Expose new kv_cache metrics from trtllm backend ai-dynamo/dynamo#6469

Merged
