Disable nop padding in R2R images and remove jumpstamps from the runtime #26740

davmason · 2019-09-16T22:59:23Z

This is still a work in progress, there are some failing tests and I still need to run perf tests on it, but I want to get eyes on it as early as possible.

This change does three things:

Remove nop padding from R2R images
Remove jumpstamps from the runtime and switch rejit to use exclusively precodes
Merge my changes with Kount's recent changes in code versioning

Remaining work before it's ready to merge:

Get all tests passing
Hopefully refactor ReJitManager::UpdateActiveILVersions so it no longer requires a runtime suspension (moved to a separte issue #26790)
Run perf tests on it to prove there are no startup regressions

src/zap/zapinfo.cpp

jkotas · 2019-09-17T02:14:55Z

This change needs READYTORUN_MAJOR_VERSION bump since it changes the contract of precompiled code. The existing runtimes are not able to run the new code anymore.

…rejit versioning.

…sion

davmason · 2019-09-17T07:19:52Z

@jkotas I don't need to bump MINIMUM_READYTORUN_MAJOR_VERSION though, right? Since the runtime can still use the images with nop padding in them, it's just old runtimes can't use the new ones.

jkotas · 2019-09-17T14:35:12Z

I don't need to bump MINIMUM_READYTORUN_MAJOR_VERSION though, right?

Right.

davmason · 2019-09-17T17:42:54Z

@jkotas @noahfalk I was running in to failures where we would have an R2R image with methods that had precodes baked in but not native code slots. When tests disabled tiered compilation and rejit (not a lot, but there are a few) initial jitting would fail for these methods because we expect if there is a precode we also have a native code slot. My latest commit e20fac2 changes it so if we generate a precode in a saved MethodDesc we also generate a saved native code slot.

I haven't been able to convince myself that this is always the right thing to do, but it does make the tests pass. I'd appreciate any input on this.

kouvel · 2019-09-18T19:45:31Z

failures where we would have an R2R image with methods that had precodes baked in but not native code slots

I wasn't aware that some MethodDescs are baked into R2R images, I thought that was only done for NGen and was avoided in R2R to have better versionability, though it would be reasonable for CoreLib methods. What type of methods are these?

if we generate a precode in a saved MethodDesc we also generate a saved native code slot

I don't see an issue with adding a native code slot other than some size increase. I guess it would depend on what information is available at the time, if it's possible to check the preconditions that would satisfy versionability excluding run-time components then it could be limited to only those cases. My understanding is that there are many types of methods that get precodes and are not versionable (stubs for instance), so not sure about the extent of the issue.

src/vm/codeman.cpp

src/vm/rejit.cpp

src/vm/codeversion.cpp

src/vm/prestub.cpp

src/vm/method.hpp

src/vm/method.cpp

src/vm/methodtablebuilder.cpp

jkotas · 2019-09-18T21:51:00Z

I wasn't aware that some MethodDescs are baked into R2R images, I thought that was only done for NGen and was avoided in R2R to have better versionability,

Correct. MethodDescs are never baked into R2R images, not even for CoreLib.

Co-Authored-By: Koundinya Veluri <[email protected]>

This reverts commit a60bb59.

Co-Authored-By: Koundinya Veluri <[email protected]>

kouvel

LGTM so far

noahfalk

Looks good so far, given the draft status I wasn't sure if any significant change was still to come.

src/vm/codeversion.cpp

src/vm/codeversion.h

src/vm/method.hpp

src/vm/methodtablebuilder.cpp

noahfalk · 2019-09-19T11:11:07Z

src/vm/prestub.cpp

-
 #ifdef FEATURE_CODE_VERSIONING
-    if (IsVersionableWithoutJumpStamp())
+    if (IsVersionable())


IsVersionableWithoutJumpStamp() was previously a single bit check and now IsVersionable() aggregates a variety of different checks. I think @kouvel had the bit cached because the checks were being called frequently enough that the expense of the uncached check became noticeable. When you profile the new code you and @kouvel may want to keep an eye if the expense of IsVersionable() merits any change in the caching strategy.

Also for calls to IsVersionableWith*() functions. Now for methods that are not eligible for tiered compilation it would also evaluate IsEligibleForReJIT(). I think ReJitManager::IsReJITEnabled() could be made faster by caching the configured value for CLRConfig::EXTERNAL_ProfAPI_RejitOnAttach in EEConfig as done for other config vars. That would turn IsEligibleForReJIT() into just a few checks on memory up to the point where IsEligibleForReJIT() returns false, which hopefully won't be noticeable.

May be possible to turn IsReJITEnabled() into a single check. IsEligibleForTieredCompilation() is called directly in many places, so it may still be beneficial to continue having the cached bit represent that instead of IsVersionable().

Actually looks like EXTERNAL_ProfAPI_RejitOnAttach is already cached in a static local variable, so it may be ok already

Both of the checks in ReJitManager::IsReJITEnabled are cached in local statics, so I don't believe there should be a perf concern here. Doing the perf testing will prove that right or wrong.

src/inc/readytorun.h

Co-Authored-By: Noah Falk <[email protected]>

davmason · 2019-09-19T19:31:36Z

I just removed the draft status, I was originally intending to do some more work and hopefully remove the runtime suspension during a ReJIT, but I think this change is big enough and complicated enough on its own. Filed #26790 to track the additional work.

briansull · 2019-09-19T20:37:34Z

src/vm/codeman.cpp

-    // rejitted code, or to reinstate original code on a revert).
-    else if ((g_pConfig->GenOptimizeType() != OPT_SIZE) ||
-        pMD->IsVersionableWithJumpStamp())
+    else if ((g_pConfig->GenOptimizeType() != OPT_SIZE))


This change will save 4 bytes of code per method in crossgen images.
And never emitting the variable sized nops in the method prolog will likely save another one or two bytes per method.
It would be interesting to get the codesize of the larger crossgen images before and after your change,

briansull · 2019-09-19T20:42:30Z

src/vm/methodtablebuilder.cpp

        (pMDMethod->GetMethodType() == METHOD_TYPE_NORMAL || pMDMethod->GetMethodType() == METHOD_TYPE_INSTANTIATED))
+
+#ifdef FEATURE_REJIT
+        ||


I am not a fan of all these compound && and || clauses.

My preference is to write code that checks one or two things then returns true or false.
And repeat.

So:

if (!g_pConfig->TieredCompilation()) { return false; } // Policy - If QuickJit is disabled and the module does not have any pregenerated code, the method would // be ineligible for tiering currently to avoid some unnecessary overhead if (!g_pConfig->TieredCompilation_QuickJit() && !GetModule()->HasNativeOrReadyToRunImage()) { rerturn false; }

davmason · 2019-09-27T07:38:48Z

I ran startup perf tests and see no regressions (details below). At this point I have responded to all feedback and this is in a state where I feel confident checking it in. If there are any outstanding issues, please let me know so I can address them.

cc @brianrob @billwert

Here is the size on disk change (if you want to see other assemblies, let me know)

System.Private.Corelib size
  Before:     9,296 KB on disk
  After :     9,262 KB on disk

Savings: 34 KB

All perf tests were run warm by running for 2 iterations and discarding the results before collecting the data below. The baseline here is master before my changes.

The high level summary is I ran startup tests for WPF, ASP.Net and Powershell and all results were within the noise - no gains, but no losses either.

Here is the timing for the WPF app Bill provided:

    Before:   
              601 msec
              582 msec
              620 msec
              587 msec
              575 msec
              Average: 593 msec
    After :   
              611 msec
              594 msec
              582 msec
              588 msec
              583 msec
              Average: 592 msec

I tested powershell using instructions from @kouvel, grabbing the latest powershell build and replacing the coreclr binaries with those built from master. The script is as follows (it measures the average time to start up powershell and exit):

$iterationCount = 32
$totalDuration = 0.0

1..$iterationCount | % {
  $duration = (Measure-Command { .\pwsh\pwsh.exe -noprofile -c exit }).TotalMilliseconds
  $duration.ToString("0.000")
  $totalDuration += $duration
}

"Average: " + ($totalDuration / $iterationCount).ToString("0.000")

Here is the timings:

Baseline powershell:
  309.215
  293.471
  295.329
  298.485
  290.840
  307.471
  302.760
  309.857
  302.411
  314.054
  302.502
  295.069
  332.554
  305.581
  291.474
  304.776
  299.794
  317.381
  293.785
  295.340
  302.556
  292.813
  305.382
  293.320
  292.424
  301.300
  293.092
  315.784
  292.458
  297.204
  292.774
  293.598
  Average: 301.089

With my changes:
  300.420
  294.193
  297.597
  290.947
  301.054
  313.396
  292.750
  299.602
  292.826
  305.875
  295.943
  293.836
  293.632
  292.098
  301.508
  293.552
  297.416
  296.621
  314.508
  295.243
  297.459
  325.361
  307.380
  303.980
  302.602
  290.661
  325.002
  294.576
  295.850
  292.906
  316.386
  294.502
  Average: 300.303

To test ASP.NET startup I used MusicStore, although none of the automated ways to run it appear to be working so I had to manually build it.


MusicStore baseline:
  D:\jumpstamp_binaries\musicstore_bin>CoreRun.exe MusicStore.dll
  Server started in 704ms

  Starting request to http://localhost:5000
  Response: OK
  Request took 442ms

  Cold start time (server start + first request time): 1146ms


  Batch 0: running 1001 requests
  Steadystate min response time: 2.00ms
  Steadystate max response time: 237.00ms
  Steadystate average response time: 3.45ms
  Steadystate median response time: 3.04ms
  Batch 1: running 1001 requests
  Steadystate min response time: 2.00ms
  Steadystate max response time: 5.00ms
  Steadystate average response time: 2.89ms
  Steadystate median response time: 2.78ms
  Batch 2: running 1001 requests
  Steadystate min response time: 2.00ms
  Steadystate max response time: 4.00ms
  Steadystate average response time: 2.87ms
  Steadystate median response time: 2.78ms
  Batch 3: running 1001 requests
  Steadystate min response time: 2.00ms
  Steadystate max response time: 144.00ms
  Steadystate average response time: 3.06ms
  Steadystate median response time: 2.83ms
  Batch 4: running 1001 requests
  Steadystate min response time: 2.00ms
  Steadystate max response time: 4.00ms
  Steadystate average response time: 2.86ms
  Steadystate median response time: 2.77ms

MusicStore with Changes:
  D:\jumpstamp_binaries\musicstore_bin>CoreRun.exe MusicStore.dll
  Server started in 707ms

  Starting request to http://localhost:5000
  Response: OK
  Request took 449ms

  Cold start time (server start + first request time): 1156ms


  Batch 0: running 1001 requests
  Steadystate min response time: 2.00ms
  Steadystate max response time: 37.00ms
  Steadystate average response time: 3.34ms
  Steadystate median response time: 3.09ms
  Batch 1: running 1001 requests
  Steadystate min response time: 2.00ms
  Steadystate max response time: 221.00ms
  Steadystate average response time: 3.11ms
  Steadystate median response time: 2.81ms
  Batch 2: running 1001 requests
  Steadystate min response time: 2.00ms
  Steadystate max response time: 4.00ms
  Steadystate average response time: 2.85ms
  Steadystate median response time: 2.74ms
  Batch 3: running 1001 requests
  Steadystate min response time: 2.00ms
  Steadystate max response time: 5.00ms
  Steadystate average response time: 2.83ms
  Steadystate median response time: 2.73ms
  Batch 4: running 1001 requests
  Steadystate min response time: 2.00ms
  Steadystate max response time: 163.00ms
  Steadystate average response time: 3.01ms
  Steadystate median response time: 2.77ms

noahfalk

LGTM (always satisfying to see all the red)

briansull

LGTM

davmason added the area-Diagnostics label Sep 16, 2019

davmason added this to the 5.0 milestone Sep 16, 2019

davmason requested review from briansull, kouvel and noahfalk September 16, 2019 22:59

davmason self-assigned this Sep 16, 2019

jkotas reviewed Sep 17, 2019

View reviewed changes

src/zap/zapinfo.cpp Show resolved Hide resolved

davmason added 3 commits September 17, 2019 00:13

Remove jumpstamp support from the runtime and switch to precodes for …

a046592

…rejit versioning.

more jumpstamp changes

9722665

Remove duplicate call to SetProfilerMayHaveActivatedNonDefaultCodeVer…

a60bb59

…sion

Review feedback and cleaning up todos

0f37e82

davmason force-pushed the jumpstamp_merge branch from 496aea2 to 0f37e82 Compare September 17, 2019 08:23

Always add a native code slot for a R2R method if it has a precode

e20fac2

kouvel reviewed Sep 18, 2019

View reviewed changes

davmason and others added 3 commits September 18, 2019 16:48

Apply suggestions from code review

4e4792f

Co-Authored-By: Koundinya Veluri <[email protected]>

Code review feedback

4a16b00

This reverts commit a60bb59.

Update src/vm/codeversion.cpp

f39eabb

Co-Authored-By: Koundinya Veluri <[email protected]>

kouvel reviewed Sep 19, 2019

View reviewed changes

noahfalk reviewed Sep 19, 2019

View reviewed changes

jkotas reviewed Sep 19, 2019

View reviewed changes

src/inc/readytorun.h Show resolved Hide resolved

davmason and others added 3 commits September 19, 2019 12:19

Update R2R version in crossgen2

4aaaf9f

Apply suggestions from code review

31aaa24

Co-Authored-By: Noah Falk <[email protected]>

Update src/vm/codeversion.cpp

cf00950

Co-Authored-By: Noah Falk <[email protected]>

davmason marked this pull request as ready for review September 19, 2019 19:28

code review feedback

1629b4a

briansull reviewed Sep 19, 2019

View reviewed changes

noahfalk approved these changes Sep 30, 2019

View reviewed changes

davmason merged commit 7fd7985 into dotnet:master Sep 30, 2019

davmason deleted the jumpstamp_merge branch September 30, 2019 22:05

briansull reviewed Sep 30, 2019

View reviewed changes

davmason mentioned this pull request Jan 31, 2020

Remove runtime suspension when updating active IL version in ReJIT dotnet/runtime#13442

Closed

This was referenced Feb 1, 2020

ARM JIT support for "jump stamps" for ReJIT dotnet/runtime#9194

Closed

Minor performance regression going to dotnet core 3.1 from .NET 4.8 dotnet/runtime#31613

Open

Disable nop padding in R2R images and remove jumpstamps from the runtime #26740

Disable nop padding in R2R images and remove jumpstamps from the runtime #26740

Uh oh!

Conversation

davmason commented Sep 16, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

jkotas commented Sep 17, 2019

Uh oh!

davmason commented Sep 17, 2019

Uh oh!

jkotas commented Sep 17, 2019

Uh oh!

davmason commented Sep 17, 2019

Uh oh!

kouvel commented Sep 18, 2019

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

jkotas commented Sep 18, 2019

Uh oh!

kouvel left a comment

Choose a reason for hiding this comment

Uh oh!

noahfalk left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

noahfalk Sep 19, 2019

Choose a reason for hiding this comment

Uh oh!

kouvel Sep 19, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kouvel Sep 19, 2019

Choose a reason for hiding this comment

Uh oh!

kouvel Sep 19, 2019

Choose a reason for hiding this comment

Uh oh!

davmason Sep 19, 2019

Choose a reason for hiding this comment

Uh oh!

Uh oh!

davmason commented Sep 19, 2019

Uh oh!

briansull Sep 19, 2019

Choose a reason for hiding this comment

Uh oh!

briansull Sep 19, 2019

Choose a reason for hiding this comment

Uh oh!

davmason commented Sep 27, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

noahfalk left a comment

Choose a reason for hiding this comment

Uh oh!

briansull left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

davmason commented Sep 16, 2019 •

edited

Loading

kouvel Sep 19, 2019 •

edited

Loading

davmason commented Sep 27, 2019 •

edited

Loading