[Feature Request] support CPU parallel training with PT #4132

@njzjz

Description

Summary

Support CPU parallel training in the PyTorch backend.

Detailed Description

PyTorch does support the gloo backend for distributed training on CPU, but the following lines seem to restrict the backend to nccl:

assert dist.is_nccl_available()
dist.init_process_group(backend="nccl")
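
A minimal sketch of how the backend choice could be made CPU-aware, falling back to gloo when CUDA/nccl is unavailable. The function name and structure here are illustrative assumptions, not the project's actual code; only the torch.distributed calls themselves are standard PyTorch API.

import torch
import torch.distributed as dist


def init_process_group_auto() -> str:
    # Hypothetical helper: prefer nccl for GPU training, fall back to
    # gloo for CPU-only parallel training.
    if torch.cuda.is_available() and dist.is_nccl_available():
        backend = "nccl"
    else:
        # gloo ships with PyTorch and supports CPU tensors.
        assert dist.is_gloo_available()
        backend = "gloo"
    # torchrun / torch.distributed.launch provide RANK, WORLD_SIZE, etc.
    dist.init_process_group(backend=backend)
    return backend
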

Further Information, Files, and Links

No response

Status: Done