Propagate preferences #19429

CarolEidt · 2018-08-11T00:01:25Z

Instead of selecting a single relatedInterval for preferencing,
follow the chain of relatedIntervals (preferenced intervals).

Change when preferences are propagated to the relatedInterval;
do it in allocateRegisters, so that we can update it when we see a last use.

Also tune when and how intervals are preferenced, including allowing multiple
uses on an RMW node to have the target as their preference.

Fixes #11463
Contributes to #16359

CarolEidt · 2018-08-20T18:11:46Z

@dotnet-bot test Windows_NT x64 perf

CarolEidt · 2018-08-21T02:01:03Z

@dotnet-bot test Windows_NT x64 throughput
@dotnet-bot test Windows_NT x86 throughput
@dotnet-bot test linux perf flow

CarolEidt · 2018-09-10T19:09:56Z

@dotnet-bot test Windows_NT x64 perf
@dotnet-bot test Windows_NT x86 perf
@dotnet-bot test linux perf flow

CarolEidt · 2018-10-16T20:02:07Z

@dotnet-bot test Windows_NT x64 perf
@dotnet-bot test Windows_NT x86 perf
@dotnet-bot test linux perf flow

benaadams · 2018-10-31T00:24:21Z

This would resolve the issue with wrappers in https://github.com/dotnet/coreclr/issues/18542? (The pointer sized ones)

CarolEidt · 2018-11-01T05:07:05Z

This would resolve the issue with wrappers in #18542? (The pointer sized ones)

It improves them; it eliminates the ping-pong copies (five copies removed from the ViaStruct2 and subsequent cases), but there's still unnecessary spill. Unfortunately, as with all register allocation changes, there are also regressions. I've analyzed the significant ones and they are basically just the fallout from changed heuristics. However, this "back-burner" work, and I've not been able to get a successful perf run, which I'd like to do before getting this reviewed and merged.

CarolEidt · 2018-11-02T19:13:52Z

@dotnet-bot test Windows_NT x64 Checked CoreFX Tests

CarolEidt · 2018-11-03T00:14:47Z

@dotnet-bot test Windows_NT x64 perf
@dotnet-bot test Windows_NT x86 perf
@dotnet-bot test linux perf flow

CarolEidt · 2018-12-20T01:34:52Z

@dotnet-bot test Windows_NT x64 perf
@dotnet-bot test Windows_NT x86 perf
@dotnet-bot test linux perf flow

mikedn · 2018-12-20T07:02:15Z

I've yet to download your changes and see the diffs but FWIW cases like ViaStruct5 can be improved by better handling of struct copies (I guess that qualifies as 1st class structs improvements, it looks like I'm going to do just that in my quest to get rid of GT_ASG). Not to say that LSRA improvements aren't good but it looks like at least in cases ViaStruct5 the problem is actually elsewhere in the JIT.

CarolEidt · 2018-12-20T17:49:16Z

test OSX10.12 x64 Checked Innerloop Build and Test
test Ubuntu x64 Checked CoreFX Tests
test Windows_NT x86 Release Innerloop Build and Test

CarolEidt · 2018-12-21T05:46:51Z

@dotnet-bot test OSX10.12 x64 Checked Innerloop Build and Test
@dotnet-bot test Ubuntu x64 Checked CoreFX Tests

CarolEidt · 2018-12-21T15:38:30Z

test OSX10.12 x64 Checked Innerloop Build and Test
test Ubuntu x64 Checked CoreFX Tests

test Windows_NT x64 perf
test Windows_NT x86 perf

CarolEidt · 2019-01-09T23:37:29Z

test Windows_NT x64 perf
test Windows_NT x86 perf

CarolEidt · 2019-01-12T00:48:47Z

@dotnet-bot test Windows_NT x64 perf

benaadams · 2019-01-15T16:46:04Z

Curiosity got the better of me....

Looking at System.Private.CoreLib.dasm

Total bytes of diff: -14524 (-0.45% of base)
    diff is an improvement.

Total byte diff includes 0 bytes from reconciling methods
        Base had    0 unique methods,        0 unique bytes
        Diff had    0 unique methods,        0 unique bytes

Top file improvements by size (bytes):
      -14524 : System.Private.CoreLib.dasm (-0.45% of base)

1 total files with size differences (1 improved, 0 regressed), 0 unchanged.

Top method regessions by size (bytes):
         138 : System.Private.CoreLib.dasm - CalendricalCalculationsHelper:SumLongSequenceOfPeriodicTerms(double):double
          84 : System.Private.CoreLib.dasm - ValueTuple`8:Equals(ref):bool:this (17 methods)
          59 : System.Private.CoreLib.dasm - PropertyValue:GetFactory(ref):ref
          51 : System.Private.CoreLib.dasm - KeyCollection:CopyTo(ref,int):this (30 methods)
          49 : System.Private.CoreLib.dasm - ArraySortHelper`1:SwapIfGreater(ref,ref,int,int) (23 methods)
          45 : System.Private.CoreLib.dasm - ValueTuple`8:Equals(struct):bool:this (17 methods)
          44 : System.Private.CoreLib.dasm - GenericArraySortHelper`1:PickPivotAndPartition(ref,int,int):int (18 methods)
          42 : System.Private.CoreLib.dasm - ValueCollection:CopyTo(ref,int):this (30 methods)
          41 : System.Private.CoreLib.dasm - DecCalc:DecAddSub(byref,byref,bool)
          36 : System.Private.CoreLib.dasm - Type:FindMembers(int,int,ref,ref):ref:this
          33 : System.Private.CoreLib.dasm - ValueTuple`8:CompareTo(struct):int:this (17 methods)
          32 : System.Private.CoreLib.dasm - Array:Resize(byref,int) (4 methods)
          32 : System.Private.CoreLib.dasm - ReaderWriterLockSlim:TryEnterReadLockCore(struct):bool:this
          26 : System.Private.CoreLib.dasm - Utf8Formatter:TryFormatDateTimeO(struct,struct,struct,byref):bool
          26 : System.Private.CoreLib.dasm - AsyncTaskMethodBuilder`1:GetStateMachineBox(byref):ref:this (26 methods)
          24 : System.Private.CoreLib.dasm - Utf8Formatter:TryFormatDateTimeG(struct,struct,struct,byref):bool
          23 : System.Private.CoreLib.dasm - ArraySortHelper`1:PickPivotAndPartition(ref,int,int,ref):int (23 methods)
          22 : System.Private.CoreLib.dasm - UnicodeEncoding:GetCharCount(long,int,ref):int:this
          21 : System.Private.CoreLib.dasm - DecCalc:Unscale(byref,byref,byref)
          21 : System.Private.CoreLib.dasm - EventProvider:WriteEvent(byref,long,long,long,ref):bool:this
          20 : System.Private.CoreLib.dasm - DateTimeFormat:TryFormatO(struct,struct,struct,byref):bool
          19 : System.Private.CoreLib.dasm - MemoryStream:ReadAsync(ref,int,int,struct):ref:this
          18 : System.Private.CoreLib.dasm - DateTimeParse:ProcessTerminalState(int,byref,byref,byref,byref,ref):bool
          18 : System.Private.CoreLib.dasm - Marvin:ComputeHash32(byref,int,int,int):int
          18 : System.Private.CoreLib.dasm - ReadOnlySpan`1:op_Implicit(struct):struct (5 methods)

Top method improvements by size (bytes):
       -1215 : System.Private.CoreLib.dasm - ValueTuple`8:GetHashCodeCore(ref):int:this (17 methods)
       -1097 : System.Private.CoreLib.dasm - ValueTuple`8:GetHashCode():int:this (17 methods)
        -379 : System.Private.CoreLib.dasm - List`1:RemoveAll(ref):int:this (23 methods)
        -253 : System.Private.CoreLib.dasm - List`1:InsertRange(int,ref):this (23 methods)
        -229 : System.Private.CoreLib.dasm - TextInfo:ChangeCaseCommon(ref):ref:this (3 methods)
        -223 : System.Private.CoreLib.dasm - ValueTuple`8:System.IValueTupleInternal.ToStringEnd():ref:this (17 methods)
        -209 : System.Private.CoreLib.dasm - EventPipePayloadDecoder:DecodePayload(byref,struct):ref
        -175 : System.Private.CoreLib.dasm - AsyncMethodBuilderCore:Start(byref) (25 methods)
        -168 : System.Private.CoreLib.dasm - Dictionary`2:GetObjectData(ref,struct):this (28 methods)
        -138 : System.Private.CoreLib.dasm - ArraySortHelper`1:IntroSort(ref,int,int,int,ref) (23 methods)
        -135 : System.Private.CoreLib.dasm - TimeSpanRawInfo:FullMatch(struct):bool:this
        -133 : System.Private.CoreLib.dasm - Number:TryParseSingle(struct,int,ref,byref):bool
        -132 : System.Private.CoreLib.dasm - Number:TryParseDouble(struct,int,ref,byref):bool
        -124 : System.Private.CoreLib.dasm - Enum:InternalFlagsFormat(ref,ref,long):ref
        -119 : System.Private.CoreLib.dasm - TimeSpanRawInfo:FullAppCompatMatch(struct):bool:this
        -119 : System.Private.CoreLib.dasm - TimeSpanRawInfo:FullDHMSMatch(struct):bool:this
        -119 : System.Private.CoreLib.dasm - TimeSpanRawInfo:FullHMSFMatch(struct):bool:this
        -108 : System.Private.CoreLib.dasm - GenericArraySortHelper`1:IntroSort(ref,int,int,int) (18 methods)
        -104 : System.Private.CoreLib.dasm - DefaultBinder:BindToMethod(int,ref,byref,ref,ref,ref,byref):ref:this
         -96 : System.Private.CoreLib.dasm - TimeSpanRawInfo:PartialAppCompatMatch(struct):bool:this
         -96 : System.Private.CoreLib.dasm - TimeSpanRawInfo:FullDHMMatch(struct):bool:this
         -96 : System.Private.CoreLib.dasm - TimeSpanRawInfo:FullHMSMatch(struct):bool:this
         -88 : System.Private.CoreLib.dasm - ConfiguredValueTaskAwaiter:OnCompleted(ref):this (4 methods)
         -87 : System.Private.CoreLib.dasm - Memory`1:CopyTo(struct):this (3 methods)
         -87 : System.Private.CoreLib.dasm - ReadOnlyMemory`1:CopyTo(struct):this (3 methods)

1868 total methods with size differences (1412 improved, 456 regressed), 15336 unchanged.

In the largest non-generic EventPipePayloadDecoder:DecodePayload(byref,struct):ref improvement changes look like

-     lea      ecx, [rdi-8]
-     mov      edi, ecx
-     lea      rcx, bword ptr [rsi+8]
-     mov      rsi, rcx
-     mov      rcx, rsi
-     mov      rsi, rcx
+     add      edi, -8
+     add      rsi, 8

Which is similar to the peephole optimize #21959 but for better reasons? /cc @AndyAyersMS

Top regression CalendricalCalculationsHelper:SumLongSequenceOfPeriodicTerms

Starts with a reduction, which has a knock on repeated increase from an argument reversal?

-      xorps    xmm1, xmm1
+      xorps    xmm7, xmm7
-      movaps   xmm7, xmm0
-      addsd    xmm7, xmm1
+      addsd    xmm7, xmm0

; repeated change
       mulsd    xmm0, qword ptr [reloc @RWD112]
-      addsd    xmm7, xmm0
+      addsd    xmm0, xmm7
+      movaps   xmm7, xmm0
       movaps   xmm0, xmm6
       mulsd    xmm0, qword ptr [reloc @RWD120]

More interesting to me are async and span, so I had a look at the regressions there

AsyncTaskMethodBuilder`1:GetStateMachineBox(byref):ref:this just seems a change in register for 1 byte (x 26 methods)

-      mov      dword ptr [rax+52], 0xD1FFAB1E
       mov      r14, rax
+      mov      dword ptr [r14+52], 0xD1FFAB1E

Had a look at the regression in ReadOnlySpan`1:op_Implicit(struct):struct, which doesn't seem significant

lea -> add + mov

G_M62951_IG03:
       xor      r9, r9
-      xor      edx, edx
+      xor      r10d, r10d
       jmp      SHORT G_M62951_IG06
G_M62951_IG04:
       mov      r9d, r8d
       mov      r10d, edx
       add      r9, r10
       mov      r10d, dword ptr [rax+8]
       mov      r10d, r10d
       cmp      r9, r10
       ja       SHORT G_M62951_IG09
G_M62951_IG05:
       add      rax, 16
-      movsxd   r8, r8d
-      lea      r9, bword ptr [r8+rax]
+      movsxd   r9, r8d
+      add      r9, rax
+      mov      r10d, edx
G_M62951_IG06:
       mov      bword ptr [rcx], r9
-      mov      dword ptr [rcx+8], edx
+      mov      dword ptr [rcx+8], r10d
       mov      rax, rcx

The ReadOnlySpan`1:get_Item(struct):struct:this Range indexer decides to use r14 instead of rax so gains a register (except at end for return value where it uses rax)

+        push     r14

+        pop      r14
       call     [ReadOnlySpan`1:Slice(int,int):struct:this]
       mov      rax, rdi

- ; Total bytes of code 119, prolog size 13 for method ReadOnlySpan`1:get_Item(struct):struct:this
+ ; Total bytes of code 126, prolog size 15 for method ReadOnlySpan`1:get_Item(struct):struct:this

And improvements

AsyncMethodBuilderCore:Start(byref) loses a register

-; Lcl frame size = 64
+; Lcl frame size = 72
G_M39858_IG01:
        push     rbp
-       push     rdi
        push     rsi
-       sub      rsp, 64
+       sub      rsp, 72

...

G_M39858_IG08:
-      lea      rsp, [rbp-10H]
+      lea      rsp, [rbp-08H]
       pop      rsi
-      pop      rdi
       pop      rbp
       ret

OTOH MemoryStream:ReadAsync(ref,int,int,struct):ref:this gains a register

       push     rbp
+      push     r14
       push     rdi
       push     rsi
       push     rbx
       sub      rsp, 48

ConfiguredValueTaskAwaiter:....AwaitUnsafeOnCompleted(ref):this which was one of the things I was looking at in https://github.com/dotnet/coreclr/issues/18542 improves nicely

-        test     rax, rax
+        mov      rcx, rax
+        test     rcx, rcx

-        mov      rcx, rax

-        mov      rcx, gword ptr [rax+0488H]
+        mov      rdx, gword ptr [rax+0488H]
-        movsx    rdx, word  ptr [rsi+8]
+        movsx    r9, word  ptr [rsi+8]
+        mov      rcx, rbx
+        mov      r8, rdi

-        mov      r8, rdi
-        mov      r9d, edx

-        mov      gword ptr [rsp+28H], rcx

-        mov      gword ptr [rsp+28H], rcx
-        mov      r8, rdi
-        mov      r9d, ed

-        mov      rcx, rbx
-        mov      rdx, gword ptr [rsp+28H]

- ; Total bytes of code 191, prolog size 7 for method ConfiguredValueTaskAwaiter:System.Runtime.CompilerServices.IStateMachineBoxAwareAwaiter.AwaitUnsafeOnCompleted(ref):this
- ; Total bytes of code 167, prolog size 7 for method ConfiguredValueTaskAwaiter:System.Runtime.CompilerServices.IStateMachineBoxAwareAwaiter.AwaitUnsafeOnCompleted(ref):this

Adding "Peephole optimize redundant and shuffle moves" #21959 on top still gives further improvements (only looking at System.Private.CoreLib.dasm)

Summary:
(Note: Lower is better)

Total bytes of diff: -27 (0.00% of base)
    diff is an improvement.

Total byte diff includes 0 bytes from reconciling methods
        Base had    0 unique methods,        0 unique bytes
        Diff had    0 unique methods,        0 unique bytes

Top file improvements by size (bytes):
         -27 : System.Private.CoreLib.dasm (0.00% of base)

1 total files with size differences (1 improved, 0 regressed), 0 unchanged.

Top method improvements by size (bytes):
          -6 : System.Private.CoreLib.dasm - DateTimeParse:Lex(int,byref,byref,byref,byref,byref,int):bool
          -3 : System.Private.CoreLib.dasm - DefaultBinder:SelectMethod(int,ref,ref,ref):ref:this
          -3 : System.Private.CoreLib.dasm - Enumerator:System.Collections.IEnumerator.get_Current():ref:this (116 methods)
          -3 : System.Private.CoreLib.dasm - Number:NumberToStringFormat(byref,byref,struct,ref)
          -3 : System.Private.CoreLib.dasm - ReaderWriterLockSlim:TryEnterWriteLockCore(struct):bool:this
          -3 : System.Private.CoreLib.dasm - CustomAttributeTypedArgument:ToString(bool):ref:this
          -3 : System.Private.CoreLib.dasm - DateTimeFormatInfo:get_DecimalSeparator():ref:this
          -3 : System.Private.CoreLib.dasm - Enumerator:System.Collections.IDictionaryEnumerator.get_Entry():struct:this (28 methods)

8 total methods with size differences (8 improved, 0 regressed), 17196 unchanged.

filipnavara · 2019-01-25T10:17:41Z

Any news about this one? It eliminates lot of the ping-pong copies I happen to see quite often.

redknightlois · 2019-01-30T14:56:57Z

@CarolEidt I am working on a few optimizations suffering from this. Is there any ETA for this PR to enter into daily build?

CarolEidt · 2019-01-30T18:06:04Z

Sorry for the delays - my priorities are elsewhere at the moment. The previous perf runs showed some unfortunate impacts on Span benchmarks, which I believe I've addressed but I need to make sure I've got those changes pushed, and re-run the perf tests. I'll attempt to get that done ASAP.

Instead of selecting a single relatedInterval for preferencing, follow the chain of relatedIntervals (preferenced intervals). Change when preferences are propagated to the relatedInterval; do it in allocateRegisters, so that we can update it when we see a last use. Also tune when and how intervals are preferenced, including allowing multiple uses on an RMW node to have the target as their preference. Fixes #11463 Contributes to #16359

CarolEidt · 2019-01-30T19:14:57Z

test Windows_NT x64 perf
test Windows_NT x86 perf

CarolEidt · 2019-01-31T16:54:21Z

@dotnet/jit-contrib PTAL
I was hoping to get another round of perf tests on this, but I propose that we merge this now.
These kinds of heuristic tunings are never 100% positive, and the impact overall is good. There are still a few extra spills in the Span/Indexer benchmark on x86 that I believe accounted for a slowdown in that one on a previous perf run.
Overall, here are the PMI diffs for frameworks + benchmarks:
X64 Windows:

PMI Diffs for System.Private.CoreLib.dll, framework assemblies, benchstones and benchmarks game in F:\git\coreclr\bin\tests\Windows_NT.x64.Checked for  default jit
Summary:
(Lower is better)
Total bytes of diff: -122460 (-0.31% of base)
    diff is an improvement.
Top file regressions by size (bytes):
          17 : Benchstones\BenchF\InvMt\InvMt\InvMt.dasm (0.84% of base)
           9 : Benchstones\BenchF\Simpsn\Simpsn\Simpsn.dasm (1.10% of base)
           6 : System.Security.Principal.Windows.dasm (0.01% of base)
           5 : BenchmarksGame\fasta\fasta-1\fasta-1.dasm (0.04% of base)
           5 : Benchstones\BenchF\InProd\InProd\InProd.dasm (0.19% of base)
Top file improvements by size (bytes):
      -24895 : System.Private.CoreLib.dasm (-0.55% of base)
      -17045 : Microsoft.CodeAnalysis.VisualBasic.dasm (-0.30% of base)
      -11630 : Microsoft.CodeAnalysis.CSharp.dasm (-0.27% of base)
       -6128 : Microsoft.CodeAnalysis.dasm (-0.38% of base)
       -5449 : System.Private.Xml.dasm (-0.15% of base)
172 total files with size differences (158 improved, 14 regressed), 39 unchanged.
Top method regressions by size (bytes):
         217 ( 4.01% of base) : System.Private.Xml.dasm - System.Xml.Xsl.XsltOld.HtmlElementProps:CreatePropsTable():ref
         184 ( 6.95% of base) : System.Private.CoreLib.dasm - System.Globalization.CalendricalCalculationsHelper:SumLongSequenceOfPeriodicTerms(double):double
         136 ( 1.06% of base) : System.Reflection.Metadata.dasm - System.Reflection.Metadata.MetadataReader:InitializeTableReaders(struct,ubyte,ref,ref):this
          99 ( 4.94% of base) : Microsoft.CodeAnalysis.CSharp.dasm - Microsoft.CodeAnalysis.CSharp.LocalRewriter:MakeConversion(ref,ref,ref,ubyte,ref,bool,bool,bool,bool,ref,ref):ref:this
          91 (11.58% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - Microsoft.CodeAnalysis.VisualBasic.Syntax.PropertyStatementSyntax:.ctor(ushort,ref,ref,ref,ref,ref,ref,ref,ref,ref,ref):this
Top method improvements by size (bytes):
        -489 (-3.86% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - Microsoft.CodeAnalysis.VisualBasic.BoundTreeVisitor`2[Vector`1,Int64][System.Numerics.Vector`1[System.Single],System.Int64]:VisitInternal(ref,struct):long:this
        -307 (-3.92% of base) : System.Memory.dasm - System.Buffers.BuffersExtensions:CopyToMultiSegment(byref,struct) (5 methods)
        -272 (-3.08% of base) : System.Private.Xml.dasm - System.Xml.Schema.XmlUntypedConverter:ChangeType(ref,ref,ref):ref:this (2 methods)
        -258 (-4.35% of base) : NuGet.Frameworks.dasm - NuGet.Frameworks.DefaultPortableFrameworkMappings:get_ProfileFrameworks():ref:this
        -257 (-2.75% of base) : System.Linq.Expressions.dasm - System.Linq.Expressions.Interpreter.CallInstruction:FastCreate(ref,ref):ref (11 methods)
Top method regressions by size (percentage):
          13 (48.15% of base) : System.Private.Xml.dasm - System.Xml.Serialization.ReflectionXmlSerializationReader:ProcessUnknownNode(ref):this
          13 (43.33% of base) : Microsoft.CodeAnalysis.CSharp.dasm - Microsoft.CodeAnalysis.CSharp.MethodCompiler:PassesFilter(ref,ref):bool
          13 (43.33% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - Microsoft.CodeAnalysis.VisualBasic.MethodCompiler:PassesFilter(ref,ref):bool
           9 (40.91% of base) : System.Private.CoreLib.dasm - System.Threading.SynchronizationContext:Send(ref,ref):this
           9 (40.91% of base) : xunit.execution.dotnet.dasm - Xunit.Sdk.MaxConcurrencySyncContext:Send(ref,ref):this
Top method improvements by size (percentage):
         -16 (-36.36% of base) : Newtonsoft.Json.dasm - Newtonsoft.Json.JsonSerializerSettings:get_ReferenceResolver():ref:this
         -16 (-36.36% of base) : System.ComponentModel.Annotations.dasm - System.ComponentModel.DataAnnotations.ValidationContext:GetService(ref):ref:this
         -16 (-36.36% of base) : System.Threading.Tasks.Dataflow.dasm - System.Threading.Tasks.Dataflow.WriteOnceBlock`1[Int32][System.Int32]:CloneItem(int):int:this
         -16 (-35.56% of base) : Microsoft.CodeAnalysis.dasm - <>c__DisplayClass41_0:<CreateAndAttachToCompilation>b__0(ref,ref,ref):this
         -16 (-35.56% of base) : System.Threading.Tasks.Dataflow.dasm - System.Threading.Tasks.Dataflow.WriteOnceBlock`1[__Canon][System.__Canon]:CloneItem(ref):ref:this
26427 total methods with size differences (20861 improved, 5566 regressed), 201144 unchanged.

X86 windows:

PMI Diffs for System.Private.CoreLib.dll, framework assemblies, benchstones and benchmarks game in F:\git\coreclr\bin\tests\Windows_NT.x86.Checked for  default jit
Summary:
(Lower is better)
Total bytes of diff: -51538 (-0.17% of base)
    diff is an improvement.
Top file regressions by size (bytes):
        1293 : CommandLine.dasm (0.47% of base)
         472 : Microsoft.DotNet.ProjectModel.dasm (0.30% of base)
         401 : NuGet.ProjectModel.dasm (0.61% of base)
         360 : System.CommandLine.dasm (0.69% of base)
         313 : xunit.execution.dotnet.dasm (0.20% of base)
Top file improvements by size (bytes):
      -11011 : System.Private.CoreLib.dasm (-0.29% of base)
       -8082 : Microsoft.CodeAnalysis.VisualBasic.dasm (-0.18% of base)
       -6949 : Microsoft.CodeAnalysis.CSharp.dasm (-0.20% of base)
       -2777 : Microsoft.CodeAnalysis.dasm (-0.22% of base)
       -2603 : System.Memory.dasm (-1.23% of base)
181 total files with size differences (142 improved, 39 regressed), 30 unchanged.
Top method regressions by size (bytes):
         334 (13.46% of base) : CommandLine.dasm - <>c:<EnforceRequired>b__2_0(ref):ref:this
         314 ( 4.58% of base) : System.Memory.dasm - System.Buffers.BuffersExtensions:CopyToMultiSegment(byref,struct) (5 methods)
         267 (11.73% of base) : CommandLine.dasm - CommandLine.UnParserExtensions:FormatCommandLine(ref,int,ref):ref
         267 (11.69% of base) : CommandLine.dasm - CommandLine.UnParserExtensions:FormatCommandLine(ref,double,ref):ref
         267 (11.67% of base) : CommandLine.dasm - CommandLine.UnParserExtensions:FormatCommandLine(ref,struct,ref):ref
Top method improvements by size (bytes):
        -252 (-5.37% of base) : System.Private.Uri.dasm - System.Uri:CheckAuthorityHelper(int,ushort,ushort,byref,byref,ref,byref):ushort:this
        -201 (-3.42% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - Microsoft.CodeAnalysis.VisualBasic.Binder:ReportOverloadResolutionFailureAndProduceBoundNode(ref,int,ref,struct,ref,struct,struct,ref,ref,ref,ref,bool,ref,ref,ref):ref:this
        -181 (-14.26% of base) : System.Reflection.Metadata.dasm - System.Reflection.Metadata.Ecma335.NamespaceCache:PopulateTableWithExportedTypes(ref):this
        -168 (-15.01% of base) : System.Linq.Expressions.dasm - System.Linq.Expressions.Interpreter.NotEqualInstruction:Create(ref,bool):ref
        -160 (-0.98% of base) : System.Text.RegularExpressions.dasm - System.Text.RegularExpressions.RegexInterpreter:Go():this (2 methods)
Top method regressions by size (percentage):
           5 (38.46% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - Microsoft.CodeAnalysis.VisualBasic.CompileTimeCalculations:UncheckedCULng(int):long
           5 (38.46% of base) : System.Private.CoreLib.dasm - System.IntPtr:op_Explicit(int):long
           6 (25.00% of base) : Microsoft.CodeAnalysis.CSharp.dasm - Microsoft.CodeAnalysis.CSharp.MethodCompiler:PassesFilter(ref,ref):bool
           6 (25.00% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - Microsoft.CodeAnalysis.VisualBasic.MethodCompiler:PassesFilter(ref,ref):bool
           5 (25.00% of base) : System.Private.CoreLib.dasm - System.Threading.SynchronizationContext:Send(ref,ref):this
Top method improvements by size (percentage):
         -14 (-46.67% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.Etlx.TraceActivity:GuidToLongId(byref):long
          -9 (-45.00% of base) : System.Data.Common.dasm - RBTreeEnumerator[Int64][System.Int64]:get_Current():long:this
          -9 (-45.00% of base) : System.Linq.Parallel.dasm - System.Linq.Parallel.Pair`2[__Canon,Int64][System.__Canon,System.Int64]:get_Second():long:this
          -9 (-45.00% of base) : System.Linq.Parallel.dasm - System.Linq.Parallel.Pair`2[Int64,Int64][System.Int64,System.Int64]:get_First():long:this
          -9 (-45.00% of base) : System.Private.CoreLib.dasm - Buf12:get_Low64():long:this
24064 total methods with size differences (17811 improved, 6253 regressed), 203520 unchanged.

CarolEidt · 2019-01-31T16:55:07Z

Once I've got a couple of approvals on this, I plan to run a bunch of jitStressRegs legs.

michellemcdaniel · 2019-01-31T17:35:34Z

@CarolEidt perf failure is being tracked by #22179

CarolEidt · 2019-01-31T17:39:49Z

@adiaaida - thanks; I'm proposing we skip the perf tests and go ahead and merge this once reviewed.

AndyAyersMS

Looks good so far. I want to go over it once more.

AndyAyersMS · 2019-01-31T18:12:58Z

src/jit/lsra.h

+        {
+            return this;
+        }
+        // if (nextRefPosition->lastUse)


Should this if should be removed or uncommented... ?

I removed the if (and the dead code below it) and added a comment.
Once this is merged, I'll file a new issue to further tune this - specifically in this area and also wrt the spills in Span.IndexerBench.

AndyAyersMS

Looked things over again... still looks good. Take a look at the one comment from before.

CarolEidt · 2019-02-01T23:36:39Z

test Windows_NT x64 Checked Innerloop Build and Test

* Propagate preferences Instead of selecting a single relatedInterval for preferencing, follow the chain of relatedIntervals (preferenced intervals). Change when preferences are propagated to the relatedInterval; do it in allocateRegisters, so that we can update it when we see a last use. Also tune when and how intervals are preferenced, including allowing multiple uses on an RMW node to have the target as their preference. Fixes dotnet/coreclr#11463 Contributes to dotnet/coreclr#16359 Commit migrated from dotnet/coreclr@0f8b7e8

CarolEidt force-pushed the Issue18542 branch from fef89cb to 03773a6 Compare September 10, 2018 19:04

CarolEidt force-pushed the Issue18542 branch from 46ee703 to b979f09 Compare October 16, 2018 19:50

CarolEidt force-pushed the Issue18542 branch from b979f09 to 96193a3 Compare November 1, 2018 15:38

CarolEidt force-pushed the Issue18542 branch from 96193a3 to 58cd07b Compare November 2, 2018 20:24

CarolEidt force-pushed the Issue18542 branch from 58cd07b to 5a42b08 Compare December 20, 2018 01:31

CarolEidt force-pushed the Issue18542 branch from 5a42b08 to 8bc2e70 Compare January 2, 2019 16:47

CarolEidt force-pushed the Issue18542 branch from 8bc2e70 to 31ce62f Compare January 9, 2019 23:05

AndyAyersMS mentioned this pull request Jan 29, 2019

WIP: Peephole optimize redundant and shuffle moves #21959

Closed

CarolEidt force-pushed the Issue18542 branch from 31ce62f to a562f61 Compare January 30, 2019 19:13

CarolEidt changed the title ~~[WIP] Propagate preferences~~ Propagate preferences Jan 31, 2019

AndyAyersMS reviewed Jan 31, 2019

View reviewed changes

AndyAyersMS approved these changes Feb 1, 2019

View reviewed changes

CarolEidt added 2 commits February 1, 2019 11:39

Remove commented-out code & add comment.

d188173

Formatting

d8f450c

CarolEidt merged commit 0f8b7e8 into dotnet:master Feb 2, 2019

CarolEidt deleted the Issue18542 branch February 11, 2019 23:17

benaadams mentioned this pull request Jan 31, 2020

Inlined Struct wrapper .ctors are non-zero cost dotnet/runtime#10540

Closed

CarolEidt mentioned this pull request Jan 31, 2020

[LSRA] Further tuning of preferencing is needed dotnet/runtime#11959

Closed

This was referenced Nov 14, 2020

FMA codegen dotnet/runtime#11260

Closed

[RyuJit] Stackspace reservation is for pre-optimized code dotnet/runtime#7279

Closed

Propagate preferences #19429

Propagate preferences #19429

Uh oh!

Conversation

CarolEidt commented Aug 11, 2018

Uh oh!

CarolEidt commented Aug 20, 2018

Uh oh!

CarolEidt commented Aug 21, 2018

Uh oh!

CarolEidt commented Sep 10, 2018

Uh oh!

CarolEidt commented Oct 16, 2018

Uh oh!

benaadams commented Oct 31, 2018

Uh oh!

CarolEidt commented Nov 1, 2018

Uh oh!

CarolEidt commented Nov 2, 2018

Uh oh!

CarolEidt commented Nov 3, 2018

Uh oh!

CarolEidt commented Dec 20, 2018

Uh oh!

mikedn commented Dec 20, 2018

Uh oh!

CarolEidt commented Dec 20, 2018

Uh oh!

CarolEidt commented Dec 21, 2018

Uh oh!

CarolEidt commented Dec 21, 2018

Uh oh!

CarolEidt commented Jan 9, 2019

Uh oh!

CarolEidt commented Jan 12, 2019

Uh oh!

benaadams commented Jan 15, 2019

Uh oh!

filipnavara commented Jan 25, 2019

Uh oh!

redknightlois commented Jan 30, 2019

Uh oh!

CarolEidt commented Jan 30, 2019

Uh oh!

CarolEidt commented Jan 30, 2019

Uh oh!

CarolEidt commented Jan 31, 2019

Uh oh!

CarolEidt commented Jan 31, 2019

Uh oh!

michellemcdaniel commented Jan 31, 2019

Uh oh!

CarolEidt commented Jan 31, 2019

Uh oh!

AndyAyersMS left a comment

Choose a reason for hiding this comment

Uh oh!

AndyAyersMS Jan 31, 2019

Choose a reason for hiding this comment

Uh oh!

CarolEidt Feb 1, 2019

Choose a reason for hiding this comment

Uh oh!

AndyAyersMS left a comment

Choose a reason for hiding this comment

Uh oh!

CarolEidt commented Feb 1, 2019

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants