Conversation

@RissyRan RissyRan commented May 1, 2025

Description

Update FFN flops calculation for Llama4 (Scout + Maverick)
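
For context, the usual way MoE FFN FLOPs are counted is that only the experts activated per token contribute, not all experts. The sketch below illustrates that accounting; it is not MaxText's actual implementation, and the parameter names (`d_model`, `d_ff`, `experts_per_token`, `shared_experts`) are illustrative rather than real config keys:

```python
def moe_ffn_flops_per_token(d_model, d_ff, experts_per_token, shared_experts):
  """Rough per-token FFN FLOPs for one MoE layer (forward + backward).

  Assumes a gated (SwiGLU-style) FFN with three d_model x d_ff matmuls per
  expert, 2 FLOPs per multiply-add, and backward ~= 2x forward (factor 6).
  Only activated experts (routed + shared) count toward FLOPs.
  """
  matmul_macs = 3 * d_model * d_ff          # gate, up, and down projections
  active_experts = experts_per_token + shared_experts
  return 6 * active_experts * matmul_macs
```

With this accounting, Scout (16e) and Maverick (128e) land on nearly identical FLOPs despite very different total expert counts, which matches the before/after table below.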

Tests

Tested using a snippet like the one below:

```python
@pytest.mark.tpu_only
def test_flops(self):
  cfg = pyconfig.initialize(
      [None, os.path.join(PKG_DIR, "configs", "base.yml")],
      run_name="tflops_cal",
      enable_checkpointing=False,
      model_name="llama4-17b-16e",
      dtype="bfloat16",
      per_device_batch_size=4,
      max_target_length=256,
  )

  from MaxText import maxtext_utils

  total_tflops, learnable_weight_tflops, attention_tflops = maxtext_utils.calculate_tflops_training_per_device(cfg)
  print("after change....")
  print(f"total_tflops: {total_tflops}")
  print(f"learnable_weight_tflops: {learnable_weight_tflops}")
  print(f"attention_tflops: {attention_tflops}")
```

Results with per_device_batch_size=4, max_target_length=256:

| Model | Before | After |
| --- | --- | --- |
| llama2-7b | total_tflops: 41.00620025856, learnable_weight_tflops: 40.593883398144, attention_tflops: 0.412316860416 | total_tflops: 41.00620025856, learnable_weight_tflops: 40.593883398144, attention_tflops: 0.412316860416 |
| deepseek2-16b | total_tflops: 15.278272413696, learnable_weight_tflops: 15.060839694336, attention_tflops: 0.21743271936 | total_tflops: 15.278272413696, learnable_weight_tflops: 15.060839694336, attention_tflops: 0.21743271936 |
| llama4-17b-128e | NA | total_tflops: 99.99690498048, learnable_weight_tflops: 99.2238108672, attention_tflops: 0.77309411328 |
| llama4-17b-16e | NA | total_tflops: 99.92442740736, learnable_weight_tflops: 99.15133329408, attention_tflops: 0.77309411328 |

Comparing to the 6BP rule (~6 FLOPs per token per active parameter):

1) llama2-7b: reported total 41 TFLOPs; 6BP = 6 * 4 * 256 * 7 / 10^3 ≈ 43
2) DeepSeek v2-16b: reported total 15 TFLOPs; 6BP = 6 * 4 * 256 * 2.4 / 10^3 ≈ 14.7
3) llama4-17b-128e: reported total 100 TFLOPs; 6BP = 6 * 4 * 256 * 17 / 10^3 ≈ 104
4) llama4-17b-16e: reported total 100 TFLOPs; 6BP = 6 * 4 * 256 * 17 / 10^3 ≈ 104
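
The 6BP estimates above can be reproduced with a few lines (a sketch; with P in billions of active parameters, the result is directly in TFLOPs per device):

```python
def six_bp_tflops(per_device_batch_size, seq_len, active_params_billions):
  """Per-device training TFLOPs via the 6BP rule of thumb:
  ~6 FLOPs per token per (active) parameter."""
  tokens = per_device_batch_size * seq_len
  return 6 * tokens * active_params_billions / 1e3

print(six_bp_tflops(4, 256, 7))    # llama2-7b           -> ~43.0
print(six_bp_tflops(4, 256, 2.4))  # deepseek2-16b active -> ~14.7
print(six_bp_tflops(4, 256, 17))   # llama4-17b active    -> ~104.4
```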

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed.

@jrplatin (Collaborator) left a comment:

LGTM!

@gagika (Collaborator) left a comment:

Thanks

copybara-service bot merged commit 36dd7ee into main on May 1, 2025 at 20:46 (16 of 19 checks passed) and deleted the llama4_tflops branch.