ARM64 - Optimizing a % b operations part 2#66407
Conversation
|
Tagging subscribers to this area: @JulieLeeMSFT Issue DetailsAddressing part of this issue: #34937 Description There are various ways to optimize
negs x1, x0
and x0, x0, #(n - 1)
and x1, x1, #(n - 1)
csneg x0, x0, x1, miAcceptance Criteria
|
tannergooding
left a comment
There was a problem hiding this comment.
Changes look generally good. I left a couple comments around the possibility of using existing helpers rather than directly querying gtFlags
|
@kunalspathak @EgorBo this is ready. CI is failing due to unrelated reasons it looks like. I think @AndyAyersMS and I reviewed this together a few weeks back as well. |
kunalspathak
left a comment
There was a problem hiding this comment.
Changes looks good, just a comment.
Also, can you check why there are regressions in crossgen2 and other collections?
Top method regressions (bytes):
24 (1.69 % of base) : 85784.dasm - System.Collections.Generic.HashSet`1:SymmetricExceptWithEnumerable(System.Collections.Generic.IEnumerable`1[System.__Canon]):this
16 (0.60 % of base) : 107306.dasm - System.Text.GB18030Encoding:GetChars(long,int,long,int,System.Text.DecoderNLS):int:this
12 (0.94 % of base) : 85783.dasm - System.Collections.Generic.HashSet`1:CheckUniqueAndUnfoundElements(System.Collections.Generic.IEnumerable`1[System.__Canon],bool):System.ValueTuple`2[System.Int32, System.Int32]:this
12 (1.94 % of base) : 146511.dasm - System.Formats.Asn1.AsnWriter:WriteNamedBitList(System.Nullable`1[System.Formats.Asn1.Asn1Tag],long):this
8 (0.98 % of base) : 5636.dasm - Microsoft.CodeAnalysis.PEModule:IsNoPiaLocalType(System.Reflection.Metadata.TypeDefinitionHandle,byref):bool:this
8 (7.69 % of base) : 4080.dasm - Microsoft.CodeAnalysis.XmlCharType:SplitSurrogateChar(int,byref,byref)
8 (0.75 % of base) : 85786.dasm - System.Collections.Generic.HashSet`1:IntersectWithEnumerable(System.Collections.Generic.IEnumerable`1[System.__Canon]):this
8 (2.47 % of base) : 59774.dasm - System.Data.RBTree`1:FreeNode(int):this
8 (2.53 % of base) : 62804.dasm - System.Data.RBTree`1:FreeNode(int):this
8 (0.99 % of base) : 97269.dasm - System.Globalization.HebrewNumber:Append(System.Text.StringBuilder,int)
8 (0.73 % of base) : 138410.dasm - System.Net.NetworkInformation.PhysicalAddress:TryParse(System.ReadOnlySpan`1[System.Char],byref):bool
8 (3.92 % of base) : 98078.dasm - System.Net.WebUtility:ConvertSmpToUtf16(int,byref,byref)
8 (7.69 % of base) : 120074.dasm - System.Xml.XmlCharType:SplitSurrogateChar(int,byref,byref)
4 (1.59 % of base) : 5634.dasm - Microsoft.CodeAnalysis.PEModule:RecordNoPiaLocalTypeCheck(System.Reflection.Metadata.TypeDefinitionHandle):this
4 (3.12 % of base) : 85613.dasm - System.Collections.Generic.BitHelper:IsMarked(int):bool:this
4 (3.23 % of base) : 85614.dasm - System.Collections.Generic.BitHelper:MarkBit(int):this
4 (1.43 % of base) : 96963.dasm - System.Globalization.OrdinalCasing:InitCasingTable():System.UInt16[][]
4 (0.83 % of base) : 132782.dasm - System.Net.FtpControlStream:FormatAddress(System.Net.IPAddress,int):System.String
4 (0.83 % of base) : 138407.dasm - System.Net.NetworkInformation.UnicastIPAddressInformation:PrefixLengthToSubnetMask(ubyte,int):System.Net.IPAddress
| void Lowering::ContainCheckDivOrMod(GenTreeOp* node) | ||
| { | ||
| assert(node->OperIs(GT_DIV, GT_UDIV)); | ||
| assert(node->OperIs(GT_DIV, GT_UDIV, GT_MOD)); |
There was a problem hiding this comment.
why not also GT_UMOD?
There was a problem hiding this comment.
This optimization doesn't impact GT_UMOD, only GT_MOD so that's why the additional GT_MOD check is there.
|
I will look into the regressions that are there to see what was impacted. |
|
@kunalspathak , yes it looks like there is a bit of a regression here: |
Thanks for confirming offline that these are happening for scenarios where there is both |
|
Yea, the regression only occurs in CSE for 'a / b' operations. If we believe the benefits of this optimization outweighs that of CSE for 'a / b', then we should do it. I'm not sure the best way to resolve it either; I thought about matching 'a - (a / cns) * cns' and turn it into this specific optimization, but it gets more complicated as 'multiply' can be optimized to a LSH operation, 'a - ((a / cns) << shift' if the 'cns' is the power of 2. As a test, I tried turning 'a - ((a / cns) << shift' back into a mod op so it would get picked up by the lowering optimization: if (tree->TypeIs(TYP_INT, TYP_LONG) && !tree->IsUnsigned() && op1->OperIs(GT_LCL_VAR))
{
GenTree* mul = op2;
if (mul->OperIs(GT_LSH) && !gtIsActiveCSE_Candidate(mul))
{
GenTree* div = mul->gtGetOp1();
GenTree* cns1 = mul->gtGetOp2();
if (div->OperIs(GT_DIV) && !gtIsActiveCSE_Candidate(div) && cns1->IsIntegralConst())
{
GenTree* a = div->gtGetOp1();
GenTree* cns2 = div->gtGetOp2();
if (a->OperIs(GT_LCL_VAR) && cns2->IsIntegralConstPow2() &&
op1->AsLclVar()->GetLclNum() == a->AsLclVar()->GetLclNum())
{
size_t cnsValue1 = cns1->AsIntConCommon()->IntegralValue();
size_t cnsValue2 = cns2->AsIntConCommon()->IntegralValue();
if ((cnsValue2 >> cnsValue1) == 1)
{
tree->ChangeOper(GT_MOD);
tree->AsOp()->gtOp2 = cns1;
op2 = tree->gtGetOp2();
}
}
}
}
}But, I would like to do this after CSE has occurred; that way if there were CSEs that happened in the expression, it wouldn't match this logic. The next logically thing to look at is lowering if we could do this, but I think we would have to mark multiple nodes as contained - so it still gets more complicated. |
|
I made an issue regarding this minor regression: #67983 |
|
Possible regression: #68624 |

Addressing part of this issue: #34937
Description
There are various ways to optimize
%for integers on ARM64.a % bcan be transformed into a specific sequence of instructions for ARM64 if the operation is signed andbis a constant with the power of 2.Acceptance Criteria
ARM64 diffs based on tests