Loss Turns to NaN After Several Hundred Ticks When Using Mixed-Precision Training #12

Description

@aiihn

Hello,
I've encountered an issue where the loss becomes NaN after several hundred ticks when I enable mixed-precision training with the --fp16=True flag. I also noticed this line in the code:

loss_scaling = 1, # Loss scaling factor for reducing FP16 under/overflows.

I'm wondering if I should also adjust the --ls setting for loss scaling in conjunction with the --fp16=True flag. Could you advise what value the loss scaling should be set to under these conditions?
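
For context, here is a minimal sketch of the FP16 under/overflow that the comment on loss_scaling refers to, assuming NumPy. The gradient value and the scale of 1024 are illustrative assumptions, not values taken from this repository, and the snippet does not use the repository's training code.

```python
import numpy as np

# Illustrative values only; not taken from this repository.
tiny_grad = 1e-8      # magnitude below FP16's smallest subnormal (~6e-8)
loss_scale = 1024.0   # hypothetical loss scaling factor

# Without scaling, the gradient is flushed to zero when cast to FP16.
unscaled_fp16 = np.float16(tiny_grad)

# With scaling, the FP16 value survives; dividing in FP32 recovers it.
scaled_fp16 = np.float16(tiny_grad * loss_scale)
recovered = np.float32(scaled_fp16) / np.float32(loss_scale)

# Too large a scale has the opposite problem: values above 65504 (the FP16
# maximum) become inf, and inf gradients can turn the loss into NaN.
with np.errstate(over="ignore"):
    overflowed = np.float32(70000.0).astype(np.float16)

print(unscaled_fp16, recovered, overflowed)
```

This is why I suspect that a loss scale of 1 may be too small for FP16, while a very large value could overflow instead.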

Additionally, are there other specific settings that should be configured to optimize mixed-precision training? For example, should the "learning rate" be modified together with "loss scaling"?

If possible, could you share the commands or configuration that you typically use for mixed-precision training?

Thank you very much in advance!
