
Commit 5d54876

[training_utils] feat: Add project and experiment name to tensorboard log path (#2080)
Adding the project name and experiment name to the log path keeps TensorBoard logs from different runs from being mixed in the same folder, which makes logs easier to manage.

### Checklist Before Starting

- [x] Searched for similar PR(s).
- [x] Checked PR Title format
  - In format of: [modules] type: Title
  - modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data`
  - type is in `feat, fix, refactor, chore, test`
  - can involve multiple modules, separated by `,` or space, like `[megatron, fsdp, doc] feat: xxx`

### What does this PR do?

> Add one-line overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs if that helps review.

### Test

> For changes that cannot be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this
```

### Checklist Before Submitting

- [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title `description` if it breaks any API.
- [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that cover the code path.
1 parent e484211 commit 5d54876

1 file changed: +3 -3 lines changed

‎verl/utils/tracking.py‎

Lines changed: 3 additions & 3 deletions
@@ -117,7 +117,7 @@ def __init__(self, project_name, experiment_name, default_backend: Union[str, Li
             self.logger["vemlp_wandb"] = vemlp_wandb
 
         if "tensorboard" in default_backend:
-            self.logger["tensorboard"] = _TensorboardAdapter()
+            self.logger["tensorboard"] = _TensorboardAdapter(project_name, experiment_name)
 
         if "console" in default_backend:
             from verl.utils.logger import LocalLogger
@@ -197,12 +197,12 @@ def finish(self):
 
 
 class _TensorboardAdapter:
-    def __init__(self):
+    def __init__(self, project_name, experiment_name):
         import os
 
         from torch.utils.tensorboard import SummaryWriter
 
-        tensorboard_dir = os.environ.get("TENSORBOARD_DIR", "tensorboard_log")
+        tensorboard_dir = os.environ.get("TENSORBOARD_DIR", f"tensorboard_log/{project_name}/{experiment_name}")
         os.makedirs(tensorboard_dir, exist_ok=True)
         print(f"Saving tensorboard log to {tensorboard_dir}.")
         self.writer = SummaryWriter(tensorboard_dir)
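
The path resolution introduced by this change can be sketched in isolation. The helper below is hypothetical (`resolve_tensorboard_dir` and its `environ` parameter are not part of verl); it only mirrors the `os.environ.get` call from the diff so the behavior can be checked without installing `torch`. Note that, as the diff is written, a set `TENSORBOARD_DIR` is used as-is, so the project and experiment names appear to be added only to the default path.

```python
import os


def resolve_tensorboard_dir(project_name, experiment_name, environ=None):
    """Hypothetical helper mirroring the path logic in _TensorboardAdapter.__init__."""
    # Allow a plain dict to stand in for os.environ so the sketch is easy to test.
    environ = os.environ if environ is None else environ
    # TENSORBOARD_DIR, when set, wins; otherwise fall back to a per-run default path.
    return environ.get("TENSORBOARD_DIR", f"tensorboard_log/{project_name}/{experiment_name}")


# No override: logs land in a per-project, per-experiment folder.
print(resolve_tensorboard_dir("my_project", "run_1", environ={}))
# → tensorboard_log/my_project/run_1

# Override set: the directory is taken verbatim from the environment variable.
print(resolve_tensorboard_dir("my_project", "run_1", environ={"TENSORBOARD_DIR": "/tmp/tb"}))
# → /tmp/tb
```

Under this reading, users who export a single `TENSORBOARD_DIR` for several runs would still share one folder, so the per-run separation relies on leaving the variable unset or setting it per run.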
