Description
System Info
H100 CW
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
pytest tests/integration/defs/accuracy/test_llm_api_autodeploy.py::TestNemotronMOE -s -vv
Expected behavior
The test passes.
Actual behavior
FAILED tests/integration/defs/accuracy/test_llm_api_autodeploy.py::TestNemotronMOE::test_bf16 - tensorrt_llm.executor.utils.RequestError: CUDA out of memory. Tried to allocate 3.99 GiB. GPU 0 has a total capacity of 79.11 GiB of which 3.16 GiB is free. Process 328448 has 592.00 MiB memory in use. Including non-PyTorch memory, this process has 75.36 GiB memory in use. Of the allocated memory 67.16 GiB is allocated by PyTorch, with 121.83 MiB allocated in private pools (e.g., CUDA Graphs), and 6.90 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
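The error message itself suggests a possible mitigation: 6.90 GiB is reserved by PyTorch but unallocated, which hints at allocator fragmentation. A minimal workaround sketch, assuming fragmentation is the cause (not verified as the root cause of this failure):

```shell
# Workaround suggested by the CUDA OOM message: enable expandable segments
# in the PyTorch caching allocator to reduce fragmentation, then rerun the
# failing test. This is a mitigation sketch, not a fix for the underlying issue.
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True

# Rerun the failing test (requires an H100 and the TensorRT-LLM test setup):
# pytest tests/integration/defs/accuracy/test_llm_api_autodeploy.py::TestNemotronMOE -s -vv
```

If the test still OOMs with expandable segments enabled, the model likely genuinely exceeds the available 79.11 GiB and memory usage would need to be reduced elsewhere (e.g. KV-cache or batch configuration).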
additional notes
N/A
Before submitting a new issue...
- Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.