
[Bug]: AutoDeploy Fix the NemotronMOE MMLU OOM issue #10580

@nvchenghaoz

Description

System Info

H100 CW

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

pytest tests/integration/defs/accuracy/test_llm_api_autodeploy.py::TestNemotronMOE -s -vv

Expected behavior

The test passes.

Actual behavior

FAILED tests/integration/defs/accuracy/test_llm_api_autodeploy.py::TestNemotronMOE::test_bf16 - tensorrt_llm.executor.utils.RequestError: CUDA out of memory. Tried to allocate 3.99 GiB. GPU 0 has a total capacity of 79.11 GiB of which 3.16 GiB is free. Process 328448 has 592.00 MiB memory in use. Including non-PyTorch memory, this process has 75.36 GiB memory in use. Of the allocated memory 67.16 GiB is allocated by PyTorch, with 121.83 MiB allocated in private pools (e.g., CUDA Graphs), and 6.90 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
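The numbers in the allocator message add up to a classic fragmentation failure: the 6.90 GiB of reserved-but-unallocated memory would cover the 3.99 GiB request in aggregate, but only 3.16 GiB is free on the device for a new contiguous segment. A minimal sketch of that arithmetic, with all figures copied from the log above (the `fits` helper is a simplification for illustration, not the real allocator logic):

```python
# Figures taken directly from the CUDA OOM message in this report (GiB).
free_on_device = 3.16        # free device memory at failure time
requested = 3.99             # size of the failing allocation
allocated_by_torch = 67.16   # memory actually allocated by PyTorch
reserved_unallocated = 6.90  # reserved by PyTorch but unallocated

def fits(free_gib: float, request_gib: float) -> bool:
    """Simplified check: a new allocator segment needs at least this
    much free device memory (hypothetical helper, for illustration)."""
    return request_gib <= free_gib

# The request exceeds free device memory, so a fresh segment cannot be
# carved out -- this is exactly the failure in the log.
assert not fits(free_on_device, requested)

# Yet the reserved-but-unallocated pool alone would cover the request;
# it is just fragmented across segments. This is the symptom that
# PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True targets.
assert reserved_unallocated >= requested

# PyTorch's total reserved pool = allocated + reserved-but-unallocated.
reserved_total = allocated_by_torch + reserved_unallocated
print(f"reserved by PyTorch: {reserved_total:.2f} GiB")  # 74.06 GiB
```

The gap between the 74.06 GiB reserved pool and the reported 75.36 GiB process footprint is non-PyTorch memory (CUDA context, cuBLAS workspaces, etc.), consistent with the log's "Including non-PyTorch memory" note.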

Additional notes

N/A
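The allocator's own message suggests a fragmentation mitigation. A sketch of re-running the repro with it, assuming the same working directory as the Reproduction section (this reduces fragmentation only; it does not shrink the model's true footprint, so the underlying fix may still require lowering memory use in the test itself):

```shell
# Enable expandable segments to reduce allocator fragmentation,
# as suggested by the CUDA OOM message, then re-run the failing test.
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
pytest tests/integration/defs/accuracy/test_llm_api_autodeploy.py::TestNemotronMOE -s -vv
```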

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.

Metadata

Assignees

Labels

  • AutoDeploy: <NV> AutoDeploy Backend
  • Memory: Memory utilization in TRTLLM: leak/OOM handling, footprint optimization, memory profiling.
  • bug: Something isn't working

Type

No type

Projects

Status

Done

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
