Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Conversation

@CarolEidt
Copy link

Instead of selecting a single relatedInterval for preferencing,
follow the chain of relatedIntervals (preferenced intervals).

Change when preferences are propagated to the relatedInterval;
do it in allocateRegisters, so that we can update it when we see a last use.

Also tune when and how intervals are preferenced, including allowing multiple
uses on an RMW node to have the target as their preference.

Fixes #11463
Contributes to #16359

@CarolEidt
Copy link
Author

@dotnet-bot test Windows_NT x64 perf

@CarolEidt
Copy link
Author

@dotnet-bot test Windows_NT x64 throughput
@dotnet-bot test Windows_NT x86 throughput
@dotnet-bot test linux perf flow

@CarolEidt
Copy link
Author

@dotnet-bot test Windows_NT x64 perf
@dotnet-bot test Windows_NT x86 perf
@dotnet-bot test linux perf flow

@CarolEidt
Copy link
Author

@dotnet-bot test Windows_NT x64 perf
@dotnet-bot test Windows_NT x86 perf
@dotnet-bot test linux perf flow

@benaadams
Copy link
Member

This would resolve the issue with wrappers in https://github.com/dotnet/coreclr/issues/18542? (The pointer sized ones)

@CarolEidt
Copy link
Author

This would resolve the issue with wrappers in #18542? (The pointer sized ones)

It improves them; it eliminates the ping-pong copies (five copies removed from the ViaStruct2 and subsequent cases), but there's still unnecessary spill. Unfortunately, as with all register allocation changes, there are also regressions. I've analyzed the significant ones and they are basically just the fallout from changed heuristics. However, this "back-burner" work, and I've not been able to get a successful perf run, which I'd like to do before getting this reviewed and merged.

@CarolEidt
Copy link
Author

@dotnet-bot test Windows_NT x64 Checked CoreFX Tests

@CarolEidt
Copy link
Author

@dotnet-bot test Windows_NT x64 perf
@dotnet-bot test Windows_NT x86 perf
@dotnet-bot test linux perf flow

@CarolEidt
Copy link
Author

@dotnet-bot test Windows_NT x64 perf
@dotnet-bot test Windows_NT x86 perf
@dotnet-bot test linux perf flow

@mikedn
Copy link

mikedn commented Dec 20, 2018

I've yet to download your changes and see the diffs but FWIW cases like ViaStruct5 can be improved by better handling of struct copies (I guess that qualifies as 1st class structs improvements, it looks like I'm going to do just that in my quest to get rid of GT_ASG). Not to say that LSRA improvements aren't good but it looks like at least in cases ViaStruct5 the problem is actually elsewhere in the JIT.

@CarolEidt
Copy link
Author

test OSX10.12 x64 Checked Innerloop Build and Test
test Ubuntu x64 Checked CoreFX Tests
test Windows_NT x86 Release Innerloop Build and Test

@CarolEidt
Copy link
Author

@dotnet-bot test OSX10.12 x64 Checked Innerloop Build and Test
@dotnet-bot test Ubuntu x64 Checked CoreFX Tests

@CarolEidt
Copy link
Author

test OSX10.12 x64 Checked Innerloop Build and Test
test Ubuntu x64 Checked CoreFX Tests

test Windows_NT x64 perf
test Windows_NT x86 perf

@CarolEidt
Copy link
Author

test Windows_NT x64 perf
test Windows_NT x86 perf

@CarolEidt
Copy link
Author

@dotnet-bot test Windows_NT x64 perf

@benaadams
Copy link
Member

Curiosity got the better of me....

Looking at System.Private.CoreLib.dasm

Total bytes of diff: -14524 (-0.45% of base)
    diff is an improvement.

Total byte diff includes 0 bytes from reconciling methods
        Base had    0 unique methods,        0 unique bytes
        Diff had    0 unique methods,        0 unique bytes

Top file improvements by size (bytes):
      -14524 : System.Private.CoreLib.dasm (-0.45% of base)

1 total files with size differences (1 improved, 0 regressed), 0 unchanged.

Top method regessions by size (bytes):
         138 : System.Private.CoreLib.dasm - CalendricalCalculationsHelper:SumLongSequenceOfPeriodicTerms(double):double
          84 : System.Private.CoreLib.dasm - ValueTuple`8:Equals(ref):bool:this (17 methods)
          59 : System.Private.CoreLib.dasm - PropertyValue:GetFactory(ref):ref
          51 : System.Private.CoreLib.dasm - KeyCollection:CopyTo(ref,int):this (30 methods)
          49 : System.Private.CoreLib.dasm - ArraySortHelper`1:SwapIfGreater(ref,ref,int,int) (23 methods)
          45 : System.Private.CoreLib.dasm - ValueTuple`8:Equals(struct):bool:this (17 methods)
          44 : System.Private.CoreLib.dasm - GenericArraySortHelper`1:PickPivotAndPartition(ref,int,int):int (18 methods)
          42 : System.Private.CoreLib.dasm - ValueCollection:CopyTo(ref,int):this (30 methods)
          41 : System.Private.CoreLib.dasm - DecCalc:DecAddSub(byref,byref,bool)
          36 : System.Private.CoreLib.dasm - Type:FindMembers(int,int,ref,ref):ref:this
          33 : System.Private.CoreLib.dasm - ValueTuple`8:CompareTo(struct):int:this (17 methods)
          32 : System.Private.CoreLib.dasm - Array:Resize(byref,int) (4 methods)
          32 : System.Private.CoreLib.dasm - ReaderWriterLockSlim:TryEnterReadLockCore(struct):bool:this
          26 : System.Private.CoreLib.dasm - Utf8Formatter:TryFormatDateTimeO(struct,struct,struct,byref):bool
          26 : System.Private.CoreLib.dasm - AsyncTaskMethodBuilder`1:GetStateMachineBox(byref):ref:this (26 methods)
          24 : System.Private.CoreLib.dasm - Utf8Formatter:TryFormatDateTimeG(struct,struct,struct,byref):bool
          23 : System.Private.CoreLib.dasm - ArraySortHelper`1:PickPivotAndPartition(ref,int,int,ref):int (23 methods)
          22 : System.Private.CoreLib.dasm - UnicodeEncoding:GetCharCount(long,int,ref):int:this
          21 : System.Private.CoreLib.dasm - DecCalc:Unscale(byref,byref,byref)
          21 : System.Private.CoreLib.dasm - EventProvider:WriteEvent(byref,long,long,long,ref):bool:this
          20 : System.Private.CoreLib.dasm - DateTimeFormat:TryFormatO(struct,struct,struct,byref):bool
          19 : System.Private.CoreLib.dasm - MemoryStream:ReadAsync(ref,int,int,struct):ref:this
          18 : System.Private.CoreLib.dasm - DateTimeParse:ProcessTerminalState(int,byref,byref,byref,byref,ref):bool
          18 : System.Private.CoreLib.dasm - Marvin:ComputeHash32(byref,int,int,int):int
          18 : System.Private.CoreLib.dasm - ReadOnlySpan`1:op_Implicit(struct):struct (5 methods)

Top method improvements by size (bytes):
       -1215 : System.Private.CoreLib.dasm - ValueTuple`8:GetHashCodeCore(ref):int:this (17 methods)
       -1097 : System.Private.CoreLib.dasm - ValueTuple`8:GetHashCode():int:this (17 methods)
        -379 : System.Private.CoreLib.dasm - List`1:RemoveAll(ref):int:this (23 methods)
        -253 : System.Private.CoreLib.dasm - List`1:InsertRange(int,ref):this (23 methods)
        -229 : System.Private.CoreLib.dasm - TextInfo:ChangeCaseCommon(ref):ref:this (3 methods)
        -223 : System.Private.CoreLib.dasm - ValueTuple`8:System.IValueTupleInternal.ToStringEnd():ref:this (17 methods)
        -209 : System.Private.CoreLib.dasm - EventPipePayloadDecoder:DecodePayload(byref,struct):ref
        -175 : System.Private.CoreLib.dasm - AsyncMethodBuilderCore:Start(byref) (25 methods)
        -168 : System.Private.CoreLib.dasm - Dictionary`2:GetObjectData(ref,struct):this (28 methods)
        -138 : System.Private.CoreLib.dasm - ArraySortHelper`1:IntroSort(ref,int,int,int,ref) (23 methods)
        -135 : System.Private.CoreLib.dasm - TimeSpanRawInfo:FullMatch(struct):bool:this
        -133 : System.Private.CoreLib.dasm - Number:TryParseSingle(struct,int,ref,byref):bool
        -132 : System.Private.CoreLib.dasm - Number:TryParseDouble(struct,int,ref,byref):bool
        -124 : System.Private.CoreLib.dasm - Enum:InternalFlagsFormat(ref,ref,long):ref
        -119 : System.Private.CoreLib.dasm - TimeSpanRawInfo:FullAppCompatMatch(struct):bool:this
        -119 : System.Private.CoreLib.dasm - TimeSpanRawInfo:FullDHMSMatch(struct):bool:this
        -119 : System.Private.CoreLib.dasm - TimeSpanRawInfo:FullHMSFMatch(struct):bool:this
        -108 : System.Private.CoreLib.dasm - GenericArraySortHelper`1:IntroSort(ref,int,int,int) (18 methods)
        -104 : System.Private.CoreLib.dasm - DefaultBinder:BindToMethod(int,ref,byref,ref,ref,ref,byref):ref:this
         -96 : System.Private.CoreLib.dasm - TimeSpanRawInfo:PartialAppCompatMatch(struct):bool:this
         -96 : System.Private.CoreLib.dasm - TimeSpanRawInfo:FullDHMMatch(struct):bool:this
         -96 : System.Private.CoreLib.dasm - TimeSpanRawInfo:FullHMSMatch(struct):bool:this
         -88 : System.Private.CoreLib.dasm - ConfiguredValueTaskAwaiter:OnCompleted(ref):this (4 methods)
         -87 : System.Private.CoreLib.dasm - Memory`1:CopyTo(struct):this (3 methods)
         -87 : System.Private.CoreLib.dasm - ReadOnlyMemory`1:CopyTo(struct):this (3 methods)

1868 total methods with size differences (1412 improved, 456 regressed), 15336 unchanged.

In the largest non-generic EventPipePayloadDecoder:DecodePayload(byref,struct):ref improvement changes look like

-     lea      ecx, [rdi-8]
-     mov      edi, ecx
-     lea      rcx, bword ptr [rsi+8]
-     mov      rsi, rcx
-     mov      rcx, rsi
-     mov      rsi, rcx
+     add      edi, -8
+     add      rsi, 8

Which is similar to the peephole optimize #21959 but for better reasons? /cc @AndyAyersMS

Top regression CalendricalCalculationsHelper:SumLongSequenceOfPeriodicTerms

Starts with a reduction, which has a knock on repeated increase from an argument reversal?

-      xorps    xmm1, xmm1
+      xorps    xmm7, xmm7
-      movaps   xmm7, xmm0
-      addsd    xmm7, xmm1
+      addsd    xmm7, xmm0

; repeated change
       mulsd    xmm0, qword ptr [reloc @RWD112]
-      addsd    xmm7, xmm0
+      addsd    xmm0, xmm7
+      movaps   xmm7, xmm0
       movaps   xmm0, xmm6
       mulsd    xmm0, qword ptr [reloc @RWD120]

More interesting to me are async and span, so I had a look at the regressions there

AsyncTaskMethodBuilder`1:GetStateMachineBox(byref):ref:this just seems a change in register for 1 byte (x 26 methods)

-      mov      dword ptr [rax+52], 0xD1FFAB1E
       mov      r14, rax
+      mov      dword ptr [r14+52], 0xD1FFAB1E

Had a look at the regression in ReadOnlySpan`1:op_Implicit(struct):struct, which doesn't seem significant

lea -> add + mov

G_M62951_IG03:
       xor      r9, r9
-      xor      edx, edx
+      xor      r10d, r10d
       jmp      SHORT G_M62951_IG06
G_M62951_IG04:
       mov      r9d, r8d
       mov      r10d, edx
       add      r9, r10
       mov      r10d, dword ptr [rax+8]
       mov      r10d, r10d
       cmp      r9, r10
       ja       SHORT G_M62951_IG09
G_M62951_IG05:
       add      rax, 16
-      movsxd   r8, r8d
-      lea      r9, bword ptr [r8+rax]
+      movsxd   r9, r8d
+      add      r9, rax
+      mov      r10d, edx
G_M62951_IG06:
       mov      bword ptr [rcx], r9
-      mov      dword ptr [rcx+8], edx
+      mov      dword ptr [rcx+8], r10d
       mov      rax, rcx

The ReadOnlySpan`1:get_Item(struct):struct:this Range indexer decides to use r14 instead of rax so gains a register (except at end for return value where it uses rax)

+        push     r14

+        pop      r14
       call     [ReadOnlySpan`1:Slice(int,int):struct:this]
       mov      rax, rdi

- ; Total bytes of code 119, prolog size 13 for method ReadOnlySpan`1:get_Item(struct):struct:this
+ ; Total bytes of code 126, prolog size 15 for method ReadOnlySpan`1:get_Item(struct):struct:this

And improvements

AsyncMethodBuilderCore:Start(byref) loses a register

-; Lcl frame size = 64
+; Lcl frame size = 72
G_M39858_IG01:
        push     rbp
-       push     rdi
        push     rsi
-       sub      rsp, 64
+       sub      rsp, 72

...

G_M39858_IG08:
-      lea      rsp, [rbp-10H]
+      lea      rsp, [rbp-08H]
       pop      rsi
-      pop      rdi
       pop      rbp
       ret      

OTOH MemoryStream:ReadAsync(ref,int,int,struct):ref:this gains a register

       push     rbp
+      push     r14
       push     rdi
       push     rsi
       push     rbx
       sub      rsp, 48

ConfiguredValueTaskAwaiter:....AwaitUnsafeOnCompleted(ref):this which was one of the things I was looking at in https://github.com/dotnet/coreclr/issues/18542 improves nicely

-        test     rax, rax
+        mov      rcx, rax
+        test     rcx, rcx

-        mov      rcx, rax

-        mov      rcx, gword ptr [rax+0488H]
+        mov      rdx, gword ptr [rax+0488H]
-        movsx    rdx, word  ptr [rsi+8]
+        movsx    r9, word  ptr [rsi+8]
+        mov      rcx, rbx
+        mov      r8, rdi

-        mov      r8, rdi
-        mov      r9d, edx

-        mov      gword ptr [rsp+28H], rcx

-        mov      gword ptr [rsp+28H], rcx
-        mov      r8, rdi
-        mov      r9d, ed

-        mov      rcx, rbx
-        mov      rdx, gword ptr [rsp+28H]

- ; Total bytes of code 191, prolog size 7 for method ConfiguredValueTaskAwaiter:System.Runtime.CompilerServices.IStateMachineBoxAwareAwaiter.AwaitUnsafeOnCompleted(ref):this
- ; Total bytes of code 167, prolog size 7 for method ConfiguredValueTaskAwaiter:System.Runtime.CompilerServices.IStateMachineBoxAwareAwaiter.AwaitUnsafeOnCompleted(ref):this

Adding "Peephole optimize redundant and shuffle moves" #21959 on top still gives further improvements (only looking at System.Private.CoreLib.dasm)

Summary:
(Note: Lower is better)

Total bytes of diff: -27 (0.00% of base)
    diff is an improvement.

Total byte diff includes 0 bytes from reconciling methods
        Base had    0 unique methods,        0 unique bytes
        Diff had    0 unique methods,        0 unique bytes

Top file improvements by size (bytes):
         -27 : System.Private.CoreLib.dasm (0.00% of base)

1 total files with size differences (1 improved, 0 regressed), 0 unchanged.

Top method improvements by size (bytes):
          -6 : System.Private.CoreLib.dasm - DateTimeParse:Lex(int,byref,byref,byref,byref,byref,int):bool
          -3 : System.Private.CoreLib.dasm - DefaultBinder:SelectMethod(int,ref,ref,ref):ref:this
          -3 : System.Private.CoreLib.dasm - Enumerator:System.Collections.IEnumerator.get_Current():ref:this (116 methods)
          -3 : System.Private.CoreLib.dasm - Number:NumberToStringFormat(byref,byref,struct,ref)
          -3 : System.Private.CoreLib.dasm - ReaderWriterLockSlim:TryEnterWriteLockCore(struct):bool:this
          -3 : System.Private.CoreLib.dasm - CustomAttributeTypedArgument:ToString(bool):ref:this
          -3 : System.Private.CoreLib.dasm - DateTimeFormatInfo:get_DecimalSeparator():ref:this
          -3 : System.Private.CoreLib.dasm - Enumerator:System.Collections.IDictionaryEnumerator.get_Entry():struct:this (28 methods)

8 total methods with size differences (8 improved, 0 regressed), 17196 unchanged.

@filipnavara
Copy link
Member

Any news about this one? It eliminates lot of the ping-pong copies I happen to see quite often.

@redknightlois
Copy link

@CarolEidt I am working on a few optimizations suffering from this. Is there any ETA for this PR to enter into daily build?

@CarolEidt
Copy link
Author

Sorry for the delays - my priorities are elsewhere at the moment. The previous perf runs showed some unfortunate impacts on Span benchmarks, which I believe I've addressed but I need to make sure I've got those changes pushed, and re-run the perf tests. I'll attempt to get that done ASAP.

Instead of selecting a single relatedInterval for preferencing,
follow the chain of relatedIntervals (preferenced intervals).

Change when preferences are propagated to the relatedInterval;
do it in allocateRegisters, so that we can update it when we see a last use.

Also tune when and how intervals are preferenced, including allowing multiple
uses on an RMW node to have the target as their preference.

Fixes #11463
Contributes to #16359
@CarolEidt
Copy link
Author

test Windows_NT x64 perf
test Windows_NT x86 perf

@CarolEidt CarolEidt changed the title [WIP] Propagate preferences Propagate preferences Jan 31, 2019
@CarolEidt
Copy link
Author

@dotnet/jit-contrib PTAL
I was hoping to get another round of perf tests on this, but I propose that we merge this now.
These kinds of heuristic tunings are never 100% positive, and the impact overall is good. There are still a few extra spills in the Span/Indexer benchmark on x86 that I believe accounted for a slowdown in that one on a previous perf run.
Overall, here are the PMI diffs for frameworks + benchmarks:
X64 Windows:

PMI Diffs for System.Private.CoreLib.dll, framework assemblies, benchstones and benchmarks game in F:\git\coreclr\bin\tests\Windows_NT.x64.Checked for  default jit
Summary:
(Lower is better)
Total bytes of diff: -122460 (-0.31% of base)
    diff is an improvement.
Top file regressions by size (bytes):
          17 : Benchstones\BenchF\InvMt\InvMt\InvMt.dasm (0.84% of base)
           9 : Benchstones\BenchF\Simpsn\Simpsn\Simpsn.dasm (1.10% of base)
           6 : System.Security.Principal.Windows.dasm (0.01% of base)
           5 : BenchmarksGame\fasta\fasta-1\fasta-1.dasm (0.04% of base)
           5 : Benchstones\BenchF\InProd\InProd\InProd.dasm (0.19% of base)
Top file improvements by size (bytes):
      -24895 : System.Private.CoreLib.dasm (-0.55% of base)
      -17045 : Microsoft.CodeAnalysis.VisualBasic.dasm (-0.30% of base)
      -11630 : Microsoft.CodeAnalysis.CSharp.dasm (-0.27% of base)
       -6128 : Microsoft.CodeAnalysis.dasm (-0.38% of base)
       -5449 : System.Private.Xml.dasm (-0.15% of base)
172 total files with size differences (158 improved, 14 regressed), 39 unchanged.
Top method regressions by size (bytes):
         217 ( 4.01% of base) : System.Private.Xml.dasm - System.Xml.Xsl.XsltOld.HtmlElementProps:CreatePropsTable():ref
         184 ( 6.95% of base) : System.Private.CoreLib.dasm - System.Globalization.CalendricalCalculationsHelper:SumLongSequenceOfPeriodicTerms(double):double
         136 ( 1.06% of base) : System.Reflection.Metadata.dasm - System.Reflection.Metadata.MetadataReader:InitializeTableReaders(struct,ubyte,ref,ref):this
          99 ( 4.94% of base) : Microsoft.CodeAnalysis.CSharp.dasm - Microsoft.CodeAnalysis.CSharp.LocalRewriter:MakeConversion(ref,ref,ref,ubyte,ref,bool,bool,bool,bool,ref,ref):ref:this
          91 (11.58% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - Microsoft.CodeAnalysis.VisualBasic.Syntax.PropertyStatementSyntax:.ctor(ushort,ref,ref,ref,ref,ref,ref,ref,ref,ref,ref):this
Top method improvements by size (bytes):
        -489 (-3.86% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - Microsoft.CodeAnalysis.VisualBasic.BoundTreeVisitor`2[Vector`1,Int64][System.Numerics.Vector`1[System.Single],System.Int64]:VisitInternal(ref,struct):long:this
        -307 (-3.92% of base) : System.Memory.dasm - System.Buffers.BuffersExtensions:CopyToMultiSegment(byref,struct) (5 methods)
        -272 (-3.08% of base) : System.Private.Xml.dasm - System.Xml.Schema.XmlUntypedConverter:ChangeType(ref,ref,ref):ref:this (2 methods)
        -258 (-4.35% of base) : NuGet.Frameworks.dasm - NuGet.Frameworks.DefaultPortableFrameworkMappings:get_ProfileFrameworks():ref:this
        -257 (-2.75% of base) : System.Linq.Expressions.dasm - System.Linq.Expressions.Interpreter.CallInstruction:FastCreate(ref,ref):ref (11 methods)
Top method regressions by size (percentage):
          13 (48.15% of base) : System.Private.Xml.dasm - System.Xml.Serialization.ReflectionXmlSerializationReader:ProcessUnknownNode(ref):this
          13 (43.33% of base) : Microsoft.CodeAnalysis.CSharp.dasm - Microsoft.CodeAnalysis.CSharp.MethodCompiler:PassesFilter(ref,ref):bool
          13 (43.33% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - Microsoft.CodeAnalysis.VisualBasic.MethodCompiler:PassesFilter(ref,ref):bool
           9 (40.91% of base) : System.Private.CoreLib.dasm - System.Threading.SynchronizationContext:Send(ref,ref):this
           9 (40.91% of base) : xunit.execution.dotnet.dasm - Xunit.Sdk.MaxConcurrencySyncContext:Send(ref,ref):this
Top method improvements by size (percentage):
         -16 (-36.36% of base) : Newtonsoft.Json.dasm - Newtonsoft.Json.JsonSerializerSettings:get_ReferenceResolver():ref:this
         -16 (-36.36% of base) : System.ComponentModel.Annotations.dasm - System.ComponentModel.DataAnnotations.ValidationContext:GetService(ref):ref:this
         -16 (-36.36% of base) : System.Threading.Tasks.Dataflow.dasm - System.Threading.Tasks.Dataflow.WriteOnceBlock`1[Int32][System.Int32]:CloneItem(int):int:this
         -16 (-35.56% of base) : Microsoft.CodeAnalysis.dasm - <>c__DisplayClass41_0:<CreateAndAttachToCompilation>b__0(ref,ref,ref):this
         -16 (-35.56% of base) : System.Threading.Tasks.Dataflow.dasm - System.Threading.Tasks.Dataflow.WriteOnceBlock`1[__Canon][System.__Canon]:CloneItem(ref):ref:this
26427 total methods with size differences (20861 improved, 5566 regressed), 201144 unchanged.

X86 windows:

PMI Diffs for System.Private.CoreLib.dll, framework assemblies, benchstones and benchmarks game in F:\git\coreclr\bin\tests\Windows_NT.x86.Checked for  default jit
Summary:
(Lower is better)
Total bytes of diff: -51538 (-0.17% of base)
    diff is an improvement.
Top file regressions by size (bytes):
        1293 : CommandLine.dasm (0.47% of base)
         472 : Microsoft.DotNet.ProjectModel.dasm (0.30% of base)
         401 : NuGet.ProjectModel.dasm (0.61% of base)
         360 : System.CommandLine.dasm (0.69% of base)
         313 : xunit.execution.dotnet.dasm (0.20% of base)
Top file improvements by size (bytes):
      -11011 : System.Private.CoreLib.dasm (-0.29% of base)
       -8082 : Microsoft.CodeAnalysis.VisualBasic.dasm (-0.18% of base)
       -6949 : Microsoft.CodeAnalysis.CSharp.dasm (-0.20% of base)
       -2777 : Microsoft.CodeAnalysis.dasm (-0.22% of base)
       -2603 : System.Memory.dasm (-1.23% of base)
181 total files with size differences (142 improved, 39 regressed), 30 unchanged.
Top method regressions by size (bytes):
         334 (13.46% of base) : CommandLine.dasm - <>c:<EnforceRequired>b__2_0(ref):ref:this
         314 ( 4.58% of base) : System.Memory.dasm - System.Buffers.BuffersExtensions:CopyToMultiSegment(byref,struct) (5 methods)
         267 (11.73% of base) : CommandLine.dasm - CommandLine.UnParserExtensions:FormatCommandLine(ref,int,ref):ref
         267 (11.69% of base) : CommandLine.dasm - CommandLine.UnParserExtensions:FormatCommandLine(ref,double,ref):ref
         267 (11.67% of base) : CommandLine.dasm - CommandLine.UnParserExtensions:FormatCommandLine(ref,struct,ref):ref
Top method improvements by size (bytes):
        -252 (-5.37% of base) : System.Private.Uri.dasm - System.Uri:CheckAuthorityHelper(int,ushort,ushort,byref,byref,ref,byref):ushort:this
        -201 (-3.42% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - Microsoft.CodeAnalysis.VisualBasic.Binder:ReportOverloadResolutionFailureAndProduceBoundNode(ref,int,ref,struct,ref,struct,struct,ref,ref,ref,ref,bool,ref,ref,ref):ref:this
        -181 (-14.26% of base) : System.Reflection.Metadata.dasm - System.Reflection.Metadata.Ecma335.NamespaceCache:PopulateTableWithExportedTypes(ref):this
        -168 (-15.01% of base) : System.Linq.Expressions.dasm - System.Linq.Expressions.Interpreter.NotEqualInstruction:Create(ref,bool):ref
        -160 (-0.98% of base) : System.Text.RegularExpressions.dasm - System.Text.RegularExpressions.RegexInterpreter:Go():this (2 methods)
Top method regressions by size (percentage):
           5 (38.46% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - Microsoft.CodeAnalysis.VisualBasic.CompileTimeCalculations:UncheckedCULng(int):long
           5 (38.46% of base) : System.Private.CoreLib.dasm - System.IntPtr:op_Explicit(int):long
           6 (25.00% of base) : Microsoft.CodeAnalysis.CSharp.dasm - Microsoft.CodeAnalysis.CSharp.MethodCompiler:PassesFilter(ref,ref):bool
           6 (25.00% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - Microsoft.CodeAnalysis.VisualBasic.MethodCompiler:PassesFilter(ref,ref):bool
           5 (25.00% of base) : System.Private.CoreLib.dasm - System.Threading.SynchronizationContext:Send(ref,ref):this
Top method improvements by size (percentage):
         -14 (-46.67% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.Etlx.TraceActivity:GuidToLongId(byref):long
          -9 (-45.00% of base) : System.Data.Common.dasm - RBTreeEnumerator[Int64][System.Int64]:get_Current():long:this
          -9 (-45.00% of base) : System.Linq.Parallel.dasm - System.Linq.Parallel.Pair`2[__Canon,Int64][System.__Canon,System.Int64]:get_Second():long:this
          -9 (-45.00% of base) : System.Linq.Parallel.dasm - System.Linq.Parallel.Pair`2[Int64,Int64][System.Int64,System.Int64]:get_First():long:this
          -9 (-45.00% of base) : System.Private.CoreLib.dasm - Buf12:get_Low64():long:this
24064 total methods with size differences (17811 improved, 6253 regressed), 203520 unchanged.

@CarolEidt
Copy link
Author

Once I've got a couple of approvals on this, I plan to run a bunch of jitStressRegs legs.

@michellemcdaniel
Copy link

@CarolEidt perf failure is being tracked by #22179

@CarolEidt
Copy link
Author

@adiaaida - thanks; I'm proposing we skip the perf tests and go ahead and merge this once reviewed.

Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good so far. I want to go over it once more.

src/jit/lsra.h Outdated
{
return this;
}
// if (nextRefPosition->lastUse)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this if should be removed or uncommented... ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the if (and the dead code below it) and added a comment.
Once this is merged, I'll file a new issue to further tune this - specifically in this area and also wrt the spills in Span.IndexerBench.

Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looked things over again... still looks good. Take a look at the one comment from before.

@CarolEidt
Copy link
Author

test Windows_NT x64 Checked Innerloop Build and Test

@CarolEidt CarolEidt merged commit 0f8b7e8 into dotnet:master Feb 2, 2019
@CarolEidt CarolEidt deleted the Issue18542 branch February 11, 2019 23:17
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
* Propagate preferences

Instead of selecting a single relatedInterval for preferencing,
follow the chain of relatedIntervals (preferenced intervals).

Change when preferences are propagated to the relatedInterval;
do it in allocateRegisters, so that we can update it when we see a last use.

Also tune when and how intervals are preferenced, including allowing multiple
uses on an RMW node to have the target as their preference.

Fixes dotnet/coreclr#11463
Contributes to dotnet/coreclr#16359

Commit migrated from dotnet/coreclr@0f8b7e8
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants