Skip to content

Comments

arm64: Remove widening casts before truncating#123546

Open
jonathandavies-arm wants to merge 6 commits intodotnet:mainfrom
jonathandavies-arm:upstream/ce/fold-casts
Open

arm64: Remove widening casts before truncating#123546
jonathandavies-arm wants to merge 6 commits intodotnet:mainfrom
jonathandavies-arm:upstream/ce/fold-casts

Conversation

@jonathandavies-arm
Copy link
Contributor

This PR remove unnecessary casts instructions. I've added some examples below taken from the Casts.cs test file but I've seen the same optimisations when I run the asmdiffs.

Example 1

[MethodImpl(MethodImplOptions.NoInlining)]
static sbyte CastIntLongSbyte(int x)
{
    //ARM64-FULL-LINE: sxtb {{w[0-9]+}}, {{w[0-9]+}}
    return (sbyte)(long)x;
}

Arm64 Instruction Diff

-sxtw    x0, w0
 sxtb    w0, w0

x64 Instruction Diff

-movsxd  rax, edi
-movsx   rax, al
 movsx   rax, dil

Example 2

[MethodImpl(MethodImplOptions.NoInlining)]
static ushort CastULongUShortULongUShort(ulong x)
{
    //ARM64-FULL-LINE: uxth {{w[0-9]+}}, {{w[0-9]+}}
    return (ushort)(ulong)(ushort)x;
}

Arm64 Instruction Diff

 uxth    w0, w0
-mov     w0, w0
-uxth    w0, w0

x64 Instruction Diff

-movzx    rdi, di
-mov      eax, edi
-movzx    rax, ax
 movzx    rax, di

@github-actions github-actions bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Jan 23, 2026
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Jan 23, 2026
@jonathandavies-arm
Copy link
Contributor Author

@dotnet/arm64-contrib

@jkotas jkotas added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Jan 23, 2026
@dotnet-policy-service
Copy link
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @dotnet/jit-contrib
See info in area-owners.md if you want to be subscribed.

@adamperlin
Copy link
Contributor

@jonathandavies-arm it looks like there are some superpmi failures, but I believe these are related to an issue that was resolved. Can you update your branch with the latest changes from main?

@EgorBo
Copy link
Member

EgorBo commented Feb 10, 2026

/azp run Fuzzlyn

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@stephentoub
Copy link
Member

🤖 Copilot Code Review — PR #123546

Holistic Assessment

Motivation: This PR eliminates unnecessary intermediate widening casts when a small-int cast (byte/short) is fed by an int→long widening cast. For example, (sbyte)(long)x where x is int becomes (sbyte)x. This is a valid optimization because the lower bits relevant to the final truncation are identical regardless of the intermediate 32→64 bit extension.

Approach: The implementation identifies the pattern Small ← Long ← Int and short-circuits it by bypassing the widening cast. The placement is correct — after the existing early-return for checked casts (if (cast->gtOverflow()) return cast;), ensuring the outer cast is guaranteed unchecked at this point.

Summary: ✅ LGTM. The logic is sound and handles all signed/unsigned combinations correctly. The optimization is safe because truncation to small int types only preserves the lower 8 or 16 bits, which are invariant under 32-to-64 bit extension (whether sign-extended or zero-extended). The PR has already been approved by @EgorBo and passed Fuzzlyn testing.


Detailed Findings

✅ Correctness & Signedness

The optimization is mathematically correct for all combinations:

  • intlong (sign-extend) vs uintulong (zero-extend): The lower 32 bits are preserved in all widening cases. Since the final cast truncates to 8 or 16 bits, the upper bits (including the extended sign/zero bits) are discarded and do not affect the result.
  • genActualType(...) == TYP_INT: This correctly captures both TYP_INT and TYP_UINT (and smaller types like byte/short that normalize to TYP_INT), ensuring the optimization applies to both signed and unsigned source types.
  • varTypeIsSmall: Correctly identifies the target types (TYP_BYTE, TYP_UBYTE, TYP_SHORT, TYP_USHORT).

✅ Overflow Check Handling

  • !src->gtOverflow(): The check on the inner widening cast is correct. While int→long widening never overflows mathematically, this check ensures we don't remove a node that the JIT may have tagged with side effects.
  • Outer cast overflow: The placement of this optimization after line 8606-8609 (if (cast->gtOverflow()) return cast;) guarantees the outer cast has no overflow checking, so no additional check is needed.

✅ Implementation Details

  • DEBUG_DESTROY_NODE(widening): Correctly cleans up the removed intermediate cast node in debug builds.
  • src = cast->CastOp(): Correctly updates the local src variable to point to the new operand, ensuring subsequent optimizations in fgOptimizeCast see the updated tree.

✅ Test Coverage

The extensive test file Casts.cs covers a comprehensive set of cast combinations including:

  • All source types (int, long, uint, ulong)
  • All target small types (sbyte, byte, short, ushort)
  • Multiple chained casts (e.g., long → int → short → sbyte)
  • ARM64-specific assembly checks using the HasDisasmCheck mechanism

The tests validate both correctness (return values) and codegen (expected ARM64 instructions via FileCheck comments).


Review performed with assistance from multiple AI models (Gemini 3 Pro). GPT-5.1-Codex timed out after 7+ minutes.

@EgorBo
Copy link
Member

EgorBo commented Feb 11, 2026

/azp run Fuzzlyn

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants