Skip to content

Comments

[Wasm RyuJit] localloc#124250

Open
AndyAyersMS wants to merge 13 commits intodotnet:mainfrom
AndyAyersMS:WasmFixedLocalloc
Open

[Wasm RyuJit] localloc#124250
AndyAyersMS wants to merge 13 commits intodotnet:mainfrom
AndyAyersMS:WasmFixedLocalloc

Conversation

@AndyAyersMS
Copy link
Member

@AndyAyersMS AndyAyersMS commented Feb 11, 2026

Implement localloc.

  • Add the ability to have a simple block sig (an if/else that pushes a value).
  • Ensure there is a frame pointer reg
  • Save the frame pointer reg at the bottom of the stack after moving the SP (but not yet doing the analogous thing in the prolog to store the "method info")
  • Align both the localloc value and the SP to STACK_ALIGN (16 bytes).
  • Ignore possible stack overflow

Implement localloc when the size is known at jit time. The runtime
case requires internal registers so can't be implemented yet.

Remove the NYT around GS checks, we will rely on the TODO and
issue tracking to decide if we want to do something here later on.
Copilot AI review requested due to automatic review settings February 11, 2026 00:12
@github-actions github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Feb 11, 2026
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements fixed-sized localloc support for WASM RyuJIT by adding the memory.fill instruction and the genLclHeap function to generate code for GT_LCLHEAP nodes when the size is known at JIT time.

Changes:

  • Added memory.fill WASM instruction definition for zeroing allocated memory
  • Implemented genLclHeap() to handle compile-time constant localloc operations
  • Added GT_LCLHEAP case to the main codegen switch statement

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

File Description
src/coreclr/jit/instrswasm.h Adds memory.fill instruction definition for zeroing memory blocks
src/coreclr/jit/codegenwasm.cpp Implements GT_LCLHEAP codegen with stack adjustment, optional zeroing, and frame pointer saving; variable-sized case marked NYI

@AndyAyersMS
Copy link
Member Author

This PR could (should?) also set up the sp[0] invariant in the prolog for non-leaf methods. Offline we discussed that this could be either the fp or else the memory address of the "method info", For methods with localloc it seems simpler to always have it be fp -- otherwise the localloc expansion doesn't know if it can overwrite that slot or not.

For frames where sp doesn't move sp[0] can be the method info pointer. Seems like this could almost just be the gc info but we probably also need to be able to use it to get to the method desc. We could also store the PEP here since we have it as an arg.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings February 11, 2026 00:19
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

@dotnet-policy-service
Copy link
Contributor

Tagging subscribers to 'arch-wasm': @lewing, @pavelsavara
See info in area-owners.md if you want to be subscribed.

@AndyAyersMS
Copy link
Member Author

AndyAyersMS commented Feb 12, 2026

Per ecma,

System.StackOverflowException is thrown if there is insufficient memory to service the
request.

So we'll need to do an explicit overflow check too. On native platforms we tend to get this "for free" via stack probing.

Seems like we thus need a runtime helper to access to the stack limit? Or else maybe emscripten_stack_get_free (may not be accurate since we don't resync the global SP).

@SingleAccretion
Copy link
Contributor

So we'll need to do an explicit overflow check too. On native platforms we tend to get this "for free" via stack probing.

Do you mean that we should also insert such checks in the prolog? Otherwise it's not a complete solution.

@AndyAyersMS
Copy link
Member Author

So we'll need to do an explicit overflow check too. On native platforms we tend to get this "for free" via stack probing.

Do you mean that we should also insert such checks in the prolog? Otherwise it's not a complete solution.

I am not sure... how do other languages handle stack overflow on Wasm?

@SingleAccretion
Copy link
Contributor

SingleAccretion commented Feb 12, 2026

I am not sure... how do other languages handle stack overflow on Wasm?

The C/C++ approach is to "let things be".

The default memory layout for debug code (you are presumed to test things with debug code, like we do with checked Jits) is to place the linear memory stack just after __global_base, so that in the event of overflow, your pointers become close to uint.MaxValue and any subsequent memory access traps.

In optimized code, the stack is placed after static data (so that relocations to that static data can be made smaller), so an overflow would still eventually reach the bottom of the memory range, but only after potentially overwriting that static data. It is a somewhat frequent point of contention.

Finally, there is language in the spec itself that limits the number of WASM frames (not linear memory, proper function call frames) that can exist, effectively meaning that if you can choose a value for the linear memory stack that would be large enough for this engine-determined limit to run out (this is what we've observed to be empirically the case in NAOT-LLVM with 2 stacks totaling ~2 MB between them).

Copilot AI review requested due to automatic review settings February 20, 2026 22:00
@AndyAyersMS AndyAyersMS changed the title [Wasm RyuJit] fixed-sized locallocs [Wasm RyuJit] localloc Feb 20, 2026
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 6 comments.

@AndyAyersMS
Copy link
Member Author

AfterLowerLclHeap may be a bit too hacky. I wanted to simplify life by having the size always be TYP_I_IMPL... suggestions welcome.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings February 20, 2026 22:12
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated no new comments.

@AndyAyersMS AndyAyersMS marked this pull request as ready for review February 20, 2026 23:13
Copilot AI review requested due to automatic review settings February 20, 2026 23:13
@AndyAyersMS
Copy link
Member Author

PTAL @dotnet/jit-contrib

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings February 20, 2026 23:44
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated no new comments.

// * whether the size operand is a constant or not
// * whether the allocated memory needs to be zeroed or not (info.compInitMem)
//
// If the sp is changed, the value at sp[0] must be the frame pointer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should add the exact unwinding scheme for variable-sized frames we settled on here into the ABI document.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, will do.

bool const needsZeroing = m_compiler->info.compInitMem;
GenTree* const size = tree->AsOp()->gtOp1;

assert(genActualType(size->gtType) == TYP_I_IMPL);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert(genActualType(size->gtType) == TYP_I_IMPL);
assert(genActualType(size) == TYP_I_IMPL);


case GT_LCLHEAP:
LowerLclHeap(node);
AfterLowerLclHeap(node);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Call this in LowerLclHeap instead?


public:
void emitIns(instruction ins);
void emitIns_B(instruction ins, WasmValueType valType = WasmValueType::Invalid);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: emitIns_BlockTy would be more immediately obvious (we already use Ty to mean 'value type').


if (!size->isContainedIntOrIImmed())
{
size = m_compiler->gtNewCastNode(TYP_I_IMPL, size, false, TYP_I_IMPL);
Copy link
Contributor

@SingleAccretion SingleAccretion Feb 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For 32 bit WASM TYP_I_IMPL == TYP_INT. So a no-op.

Also, should this be an unsigned cast?

Also, this would deserve a comment as to why we're deciding to add a cast here (| why is it better than the alternatives).

Comment on lines +40 to +47
instrDesc* id = emitNewInstrSC(EA_4BYTE, (cnsval_ssize_t)valType);
insFormat fmt = emitInsFormat(ins);

id->idIns(ins);
id->idInsFmt(fmt);

dispIns(id);
appendToCurIG(id);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just:

Suggested change
instrDesc* id = emitNewInstrSC(EA_4BYTE, (cnsval_ssize_t)valType);
insFormat fmt = emitInsFormat(ins);
id->idIns(ins);
id->idInsFmt(fmt);
dispIns(id);
appendToCurIG(id);
emitIns_I(ins, EA_4BYTE, (cnsval_ssize_t)valType);

?

}

if (anyFrameLocals)
if (anyFrameLocals || m_codeGen->isFramePointerRequired())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to use the narrower compLocallocUsed here instead of isFramePointerRequired, with a comment to the effect of "LCLHEAP codegen uses FP for unwind info establishment..." (we wouldn't need the FP with a different scheme).

isFramePointerRequired is set with lots of conditions that are deeply irrelevant for WASM and probably should be ifdef-ed out. There are no FP chains on WASM.

//
void WasmRegAlloc::CollectReferencesForLclHeap(GenTreeOp* lclHeapNode)
{
ConsumeTemporaryRegForOperand(lclHeapNode->gtGetOp1() DEBUGARG("lcl heap"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ConsumeTemporaryRegForOperand(lclHeapNode->gtGetOp1() DEBUGARG("lcl heap"));
ConsumeTemporaryRegForOperand(lclHeapNode->gtGetOp1() DEBUGARG("lcl heap size"));


// TODO-WASM-CQ: possibly do small fills directly
//
GetEmitter()->emitIns_I(INS_memory_fill, EA_4BYTE, 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(This should rendezvous with the block store PR on using a named constant for the memory index)

// Save the aligned size for the zeroing call below if needed.
if (needsZeroing)
{
GetEmitter()->emitIns_I(INS_local_tee, EA_PTRSIZE, WasmRegToIndex(sizeReg));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetMultiUseOperandReg regs cannot be modified like this in the general case (they may still be live if they're candidates).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good point.

I will revise this to use an internal register. Doing that I can also avoid the need for the extra casting logic in lower.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

arch-wasm WebAssembly architecture area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants