Hello, I could use some help with a new model for our return/call SIMD* typing that I am implementing, but first a few examples of what is happening now.
1 example:
[MethodImpl(MethodImplOptions.NoInlining)]
static Vector4 ReturnVector4()
{
return new Vector4(1);
}
[MethodImpl(MethodImplOptions.NoInlining)]
static Vector4 ReturnVector4UsingCall()
{
return ReturnVector4();
}
The IL for ReturnVector4UsingCall is very simple: call ReturnVector4; ret.
The IR would be ASG(LCL_VAR, call); RETURN(LCL_VAR).
The complexity is that Arm64 supports both vector and HFA calling conventions. In this case
Vector4 is an HFA value, so we have to return it as v0.s[0], v1.s[0], v2.s[0], v3.s[0].
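To make the HFA point concrete, here is a minimal sketch of the AArch64 rule that an HFA has 1 to 4 members of the same floating-point type, and that each member is returned in its own vector register. The names FieldKind, IsHfa and HfaReturnRegCount are hypothetical and exist only for this illustration; they are not JIT APIs, and the real classification handles more cases (nested structs, short vectors).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only: these names are illustrative and are not JIT APIs.
enum class FieldKind { Float, Double, Int32, Other };

// An AArch64 HFA has 1 to 4 members, all of the same floating-point type.
bool IsHfa(const std::vector<FieldKind>& fields)
{
    if (fields.empty() || fields.size() > 4)
    {
        return false;
    }
    const FieldKind first = fields[0];
    if (first != FieldKind::Float && first != FieldKind::Double)
    {
        return false;
    }
    for (FieldKind f : fields)
    {
        if (f != first)
        {
            return false;
        }
    }
    return true;
}

// Number of vector registers (v0, v1, ...) used to return an HFA:
// one register per member, or 0 if the type is not an HFA.
std::size_t HfaReturnRegCount(const std::vector<FieldKind>& fields)
{
    return IsHfa(fields) ? fields.size() : 0;
}
```

Vector4 has four float fields (X, Y, Z, W), so under this sketch it is an HFA that occupies four registers, which matches the v0..v3 return described above.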
Now let's see how we import this call and with which type:
1. create it as TYP_STRUCT, using callRetTyp = JITtype2varType(calliSig.retType) in impImportCall;
2. change it to TYP_SIMD16 in impImportCall: callRetTyp = impNormStructType(actualMethodRetTypeSigClass); call->gtType = callRetTyp;
3. change it back to TYP_STRUCT in impAssignStructPtr: src->gtType = genActualType(returnType); and that is the final value of the type.
A fun side effect: even if the call result is not used, we still create ASG(LCL_VAR, call), change the call type to struct, and only later delete the ASG, leaving the call with the correct struct type.
Note for !compDoOldStructRetyping(): I don't do steps 2 and 3, so the call is created as TYP_STRUCT and stays that way.
and the return in this case is STRUCT, so we end up with nice IR:
***** BB01
STMT00000 (IL 0x000...0x005)
N005 ( 15, 4) [000003] -ACXG---R--- * ASG simd16 (copy)
N004 ( 1, 1) [000001] D------N---- +--* LCL_VAR simd16<System.Numerics.Vector4> V01 tmp1 d:1
N003 ( 15, 4) [000000] --CXG------- \--* CALL r2r_ind struct TestHFAandHVA.ReturnVector4,NA,NA,NA
N002 ( 1, 1) [000006] ------------ arg0 in x11 \--* CNS_INT(h) long 0x29e89a04b90 ftn REG x11
***** BB01
STMT00001 (IL ???... ???)
N002 ( 2, 2) [000005] ------------ * RETURN struct
N001 ( 1, 1) [000004] -------N---- \--* LCL_VAR simd16<System.Numerics.Vector4> V01 tmp1 u:1 (last use)
Note/todo/fun fact: if we did not set the LCL_VAR type to SIMD16 and kept it as a struct, then copy prop would optimize it as:
N002 ( 2, 2) [000005] ------------ * RETURN struct
N003 ( 15, 4) [000000] --CXG------- \--* CALL r2r_ind struct TestHFAandHVA.ReturnVector4,NA,NA,NA
N002 ( 1, 1) [000006] ------------ arg0 in x11 \--* CNS_INT(h) long 0x29e89a04b90 ftn REG x11
Summary 1: in the HFA case we type both the call and the return as TYP_STRUCT, with some confusing transformations in the middle.
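The sequence of type changes on the call node can be written down as a small trace. This is only a sketch of the behaviour described in this example: ToyType and CallTypeTrace are hypothetical names, not the JIT's var_types or any importer function, and the non-HFA branch models a value returned in a single vector register.

```cpp
#include <cassert>
#include <vector>

// Toy stand-ins for the JIT's type values; not the real var_types enum.
enum class ToyType { Struct, Simd16 };

// Records the type of the call node after each of the three import steps,
// for a 16-byte SIMD return value.
std::vector<ToyType> CallTypeTrace(bool returnsAsHfa)
{
    std::vector<ToyType> trace;
    trace.push_back(ToyType::Struct); // step 1: created from the signature
    trace.push_back(ToyType::Simd16); // step 2: normalized to a SIMD type
    // step 3: the struct assignment resets the call to the actual return type
    trace.push_back(returnsAsHfa ? ToyType::Struct : ToyType::Simd16);
    return trace;
}
```

For the HFA case the trace is Struct, Simd16, Struct, which is the round trip that makes the final type hard to predict from the source code alone.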
2 example:
[MethodImpl(MethodImplOptions.NoInlining)]
static Vector<int> ReturnVectorInt()
{
return new Vector<int>();
}
[MethodImpl(MethodImplOptions.NoInlining)]
static Vector<int> ReturnVectorIntUsingCall()
{
return ReturnVectorInt();
}
1. create it as TYP_STRUCT, using callRetTyp = JITtype2varType(calliSig.retType) in impImportCall;
2. change it to TYP_SIMD16 in impImportCall: callRetTyp = impNormStructType(actualMethodRetTypeSigClass); call->gtType = callRetTyp;
3. keep it as TYP_SIMD16 in impAssignStructPtr: src->gtType = genActualType(returnType);.
and IR looks good:
***** BB01
STMT00000 (IL 0x000...0x005)
N005 ( 15, 4) [000003] -ACXG---R--- * ASG simd16 (copy)
N004 ( 1, 1) [000001] D------N---- +--* LCL_VAR simd16<System.Numerics.Vector4> V01 tmp1 d:1
N003 ( 15, 4) [000000] --CXG------- \--* CALL r2r_ind struct TestHFAandHVA.ReturnVector4,NA,NA,NA
N002 ( 1, 1) [000006] ------------ arg0 in x11 \--* CNS_INT(h) long 0x29e89a04b90 ftn REG x11
***** BB01
STMT00001 (IL ???... ???)
N002 ( 2, 2) [000005] ------------ * RETURN struct
N001 ( 1, 1) [000004] -------N---- \--* LCL_VAR simd16<System.Numerics.Vector4> V01 tmp1 u:1 (last use)
Summary 1, 2: based on these 2 examples we could think that TYP_SIMD16 on a call or a return means the value is passed in a single vector register, that the node has TYP_STRUCT when the value is an HFA,
and that TYP_STRUCT can be assigned to TYP_SIMD16, but...
3 example:
struct A
{
bool a;
}
[MethodImpl(MethodImplOptions.NoInlining)]
static Vector<A> ReturnVectorNotKnown()
{
return new Vector<A>();
}
[MethodImpl(MethodImplOptions.NoInlining)]
static Vector<A> ReturnVectorNotKnownUsingCall()
{
return ReturnVectorNotKnown();
}
Guess which type the JIT will use for it before you read the answer :-)
The IR after importation will be:
[000001] --C-G------- * RETURN simd16
[000000] --C-G------- \--* CALL r2r_ind struct TestHFAandHVA.ReturnVectorNotKnown
because for the RETURN type we ask the VM, and for the call type we use getBaseTypeAndSizeOfSIMDType, which can only parse known primitive types, so we get a nice mistyping out of nowhere.
It does not look like a problem so far: the JIT can handle it using morph::fgFixupStructReturn, which sets the call type to simd16.
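A toy model of this mismatch, assuming only what is described above: the RETURN type comes from one source (the VM), the call type from another (a base-type lookup that only knows primitive element types), and a later fixup reconciles them. All names here (ToyType, ElemKind, ReturnTypeFromVm, CallTypeFromBaseType, FixupCallType) are hypothetical and do not reproduce the real JIT functions.

```cpp
#include <cassert>

// Toy stand-ins; not the JIT's real types.
enum class ToyType { Struct, Simd16 };
enum class ElemKind { Int32, Float, UserStruct };

// Stand-in for "ask the VM": in this sketch Vector<T> is always a 16-byte
// SIMD value, whatever T is.
ToyType ReturnTypeFromVm(ElemKind)
{
    return ToyType::Simd16;
}

// Stand-in for a base-type lookup that only recognizes known primitive
// element types and falls back to a plain struct otherwise.
ToyType CallTypeFromBaseType(ElemKind elem)
{
    return (elem == ElemKind::UserStruct) ? ToyType::Struct : ToyType::Simd16;
}

struct ToyCall
{
    ToyType type;
};

// Later fixup in the spirit of the morph step mentioned above:
// retype the call to match how the value is actually returned.
void FixupCallType(ToyCall& call, ToyType actualReturnType)
{
    call.type = actualReturnType;
}
```

With ElemKind::UserStruct the two sources disagree (Struct for the call, Simd16 for the return) until the fixup runs, which is the window in which the IR is mistyped.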
3.1. example:
add a temp local var to the last example:
[MethodImpl(MethodImplOptions.NoInlining)]
static Vector<A> ReturnVectorNotKnownUsingCallAndTemp()
{
var a = ReturnVectorNotKnown();
return a;
}
and we have IR that we want right after importation, thanks to impAssignStructPtr from the first example:
***** BB01
STMT00000 (IL 0x000...0x005)
[000003] -AC-G------- * ASG simd16 (copy)
[000001] D------N---- +--* LCL_FLD simd16 V00 loc0 [+0]
[000000] --C-G------- \--* CALL r2r_ind simd16 TestHFAandHVA.ReturnVectorNotKnown
***** BB01
STMT00001 (IL 0x006...0x007)
[000005] ------------ * RETURN simd16
[000004] ------------ \--* LCL_FLD simd16 V00 loc0 [+0]
but V00 is created as STRUCT, so we can't put it in a register, sad:
Generating: N009 ( 15, 4) [000000] --CXG------- t0 = * CALL r2r_ind simd16 TestHFAandHVA.ReturnVectorNotKnown REG d0 $140
IN0003: ldr x0, [x11]
Call: GCvars=0000000000000000 {}, gcrefRegs=0000 {}, byrefRegs=0000 {}
IN0004: blr x0
/--* t0 simd16
Generating: N011 ( 19, 9) [000003] DA-XG------- * STORE_LCL_FLD simd16 V00 loc0 d:2[+0] NA REG NA
IN0005: str q0, [fp,#16]
Live vars: {} => {V00}
Added IP mapping: 0x0006 STACK_EMPTY (G_M38418_IG02,ins#5,ofs#20)
Generating: N013 (???,???) [000010] ------------ IL_OFFSET void IL offset: 0x6 REG NA
Generating: N015 ( 3, 4) [000004] ------------ t4 = LCL_FLD simd16 V00 loc0 u:2[+0] d16 (last use) REG d16 $141
IN0006: ldr q16, [fp,#16]
Live vars: {V00} => {}
/--* t4 simd16
Generating: N017 ( 4, 5) [000005] ------------ * RETURN simd16 REG NA $142
IN0007: mov v0.16b, v16.16b
Note for !compDoOldStructRetyping(): we do not want all this retyping to happen in random places, so we want types not to change from the moment we create them during importation until they reach lowering.
Question: but what type should we use in the last example? TYP_STRUCT works much better, because then we don't need to access the LCL_VAR as a LCL_FLD: they have exactly the same types and the JIT knows that!
For now, I am sticking with TYP_STRUCT in all cases for all call types and keeping RETURN as the VM sees it, but that causes asserts that I can't avoid without implementing #11413, because we start getting IND SIMD16(ADDR byref(call STRUCT)) for such calls, and ADDR(call) is not valid IR (we sometimes create it, but we are lucky in those examples and I am not lucky in mine).
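To illustrate why ADDR(call) is a problem shape, here is a minimal sketch of a checker over a toy single-operand IR that flags an ADDR node whose operand is a CALL. The Oper enum, Node struct, Make helper and ContainsAddrOfCall function are all hypothetical; the real JIT uses GenTree and its own validity checks.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Toy IR with at most one operand per node; not the JIT's GenTree.
enum class Oper { Call, LclVar, Addr, Ind };

struct Node
{
    Oper                  oper;
    std::unique_ptr<Node> op1; // null for leaves
};

// Builds a node, taking ownership of its optional operand.
std::unique_ptr<Node> Make(Oper oper, std::unique_ptr<Node> op1 = nullptr)
{
    auto node  = std::make_unique<Node>();
    node->oper = oper;
    node->op1  = std::move(op1);
    return node;
}

// Returns true if the tree contains an ADDR whose operand is a CALL.
bool ContainsAddrOfCall(const Node* node)
{
    if (node == nullptr)
    {
        return false;
    }
    if (node->oper == Oper::Addr && node->op1 != nullptr && node->op1->oper == Oper::Call)
    {
        return true;
    }
    return ContainsAddrOfCall(node->op1.get());
}
```

In this model IND(ADDR(CALL)) is flagged while IND(ADDR(LCL_VAR)) is not, which is the distinction that makes assigning a struct-typed call result to a SIMD-typed local awkward.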
Summary 1, 2, 3: do not try to guess Jit TYP looking at C# code.
4 example:
[MethodImpl(MethodImplOptions.NoInlining)]
static Vector<T> ReturnVectorTWithMerge<T>(int v, T init1, T init2, T init3, T init4) where T : struct
{
if (v == 0)
{
return new Vector<T>();
}
else if (v == 1)
{
return new Vector<T>(init1);
}
else if (v == 2)
{
return new Vector<T>(init2);
}
else if (v == 3)
{
return new Vector<T>(init3);
}
else
{
return new Vector<T>(init4);
}
}
struct A
{
bool a;
}
ReturnVectorTWithMerge<A>(v, a, b, c, d);
So we know that return->gtType == TYP_SIMD and that the call types would be TYP_STRUCT. We also know that it somehow works fine: after morph we change the call types to TYP_SIMD, and it is great.
But here comes my favorite thing, return merging: we create one local var where we put all return results, and it happens before global morph, during PHASE Morph - Add internal blocks.
Can you guess the type of this LCL_VAR?
lvaGrabTemp returning 12 (V12 tmp5) called for Single return block return value.
SIMD Candidate Type System.Numerics.Vector`1[System.__Canon]
Unknown SIMD Vector<T>
mergeReturns statement tree [000071] added to genReturnBB BB10 [0009]
[000071] ------------ * RETURN struct
[000070] -------N---- \--* LCL_VAR struct<System.Numerics.Vector`1[__Canon], 16> V12 tmp5
And the return somehow knows it is a struct... But maybe morph will fix it like it fixes calls? Nope... it will bail out with an assert that you can easily repro in the current master, see #37247:
Assert failure(PID 198612 [0x000307d4], Thread: 221228 [0x3602c]): Assertion failed '!"Incompatible types for gtNewTempAssign"' in 'TestHFAandHVA:ReturnVectorTWithMerge(int,System.__Canon,System.__Canon,System.__Canon,System.__Canon):System.Numerics.Vector`1[__Canon]' during 'Morph - Global' (IL size 54)
File: F:\git\runtime\src\coreclr\src\jit\gentree.cpp Line: 15159
Image: F:\git\runtime\artifacts\bin\coreclr\Windows_NT.arm64.Checked\x64\crossgen.exe
This happens when we try to do ASG(LCL_VAR struct (our merge lclVar), LCL_FLD SIMD16 (from our calls)).
Nowadays it doesn't lead to bad codegen in release builds, because asserts are ignored there and lowering has handling for this case, which actually exists for compDoOldStructRetyping() == false, and it does the right thing by setting the RETURN type back to SIMD16. I do not have an older version of the runtime to check what was happening before compDoOldStructRetyping.
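The failing assignment can be modelled in a few lines. This is a simplified sketch, not the logic of gtNewTempAssign: ToyType, MergedReturnTempType and TempAssignTypesMatch are hypothetical names, and the real compatibility check accepts more combinations than strict equality.

```cpp
#include <cassert>

// Toy stand-ins; not the JIT's real types.
enum class ToyType { Struct, Simd16 };

// Type given to the merged-return temp in this sketch. When the SIMD base
// type cannot be determined ("Unknown SIMD Vector<T>" in the dump above),
// the temp stays a plain struct.
ToyType MergedReturnTempType(bool simdBaseTypeKnown)
{
    return simdBaseTypeKnown ? ToyType::Simd16 : ToyType::Struct;
}

// Simplified stand-in for the compatibility check behind the assert:
// here a temp assignment is accepted only when both sides have the same type.
bool TempAssignTypesMatch(ToyType dst, ToyType src)
{
    return dst == src;
}
```

With an unknown base type the temp is Struct while the values being stored are Simd16, so the check fails, which corresponds to the "Incompatible types" assert in the log.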
Summary 1, 2, 3, 4: with compDoOldStructRetyping == true the old system is very unpredictable and fragile, with failures in simple cases.
compDoOldStructRetyping == false, which I am trying to support on arm64, has the same difficulties, and I would like to hear the opinions of @CarolEidt, @tannergooding and @dotnet/jit-contrib about the types that we should choose in each case. I have tried many options and none of them was good enough.
I have started working on #11413, so that I could always keep calls as TYP_STRUCT, ignoring SIMD and avoiding creating IND(ADDR(CALL)) when we assign their results to LCL_VAR/FLD SIMD*. What do you think?