-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Disable nop padding in R2R images and remove jumpstamps from the runtime #26740
Conversation
|
This change needs |
|
@jkotas I don't need to bump |
496aea2 to
0f37e82
Compare
Right. |
|
@jkotas @noahfalk I was running in to failures where we would have an R2R image with methods that had precodes baked in but not native code slots. When tests disabled tiered compilation and rejit (not a lot, but there are a few) initial jitting would fail for these methods because we expect if there is a precode we also have a native code slot. My latest commit e20fac2 changes it so if we generate a precode in a saved MethodDesc we also generate a saved native code slot. I haven't been able to convince myself that this is always the right thing to do, but it does make the tests pass. I'd appreciate any input on this. |
I wasn't aware that some MethodDescs are baked into R2R images, I thought that was only done for NGen and was avoided in R2R to have better versionability, though it would be reasonable for CoreLib methods. What type of methods are these?
I don't see an issue with adding a native code slot other than some size increase. I guess it would depend on what information is available at the time, if it's possible to check the preconditions that would satisfy versionability excluding run-time components then it could be limited to only those cases. My understanding is that there are many types of methods that get precodes and are not versionable (stubs for instance), so not sure about the extent of the issue. |
Correct. MethodDescs are never baked into R2R images, not even for CoreLib. |
Co-Authored-By: Koundinya Veluri <[email protected]>
This reverts commit a60bb59.
Co-Authored-By: Koundinya Veluri <[email protected]>
kouvel
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM so far
noahfalk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good so far, given the draft status I wasn't sure if any significant change was still to come.
|
|
||
| #ifdef FEATURE_CODE_VERSIONING | ||
| if (IsVersionableWithoutJumpStamp()) | ||
| if (IsVersionable()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IsVersionableWithoutJumpStamp() was previously a single bit check and now IsVersionable() aggregates a variety of different checks. I think @kouvel had the bit cached because the checks were being called frequently enough that the expense of the uncached check became noticeable. When you profile the new code you and @kouvel may want to keep an eye if the expense of IsVersionable() merits any change in the caching strategy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also for calls to IsVersionableWith*() functions. Now for methods that are not eligible for tiered compilation it would also evaluate IsEligibleForReJIT(). I think ReJitManager::IsReJITEnabled() could be made faster by caching the configured value for CLRConfig::EXTERNAL_ProfAPI_RejitOnAttach in EEConfig as done for other config vars. That would turn IsEligibleForReJIT() into just a few checks on memory up to the point where IsEligibleForReJIT() returns false, which hopefully won't be noticeable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May be possible to turn IsReJITEnabled() into a single check. IsEligibleForTieredCompilation() is called directly in many places, so it may still be beneficial to continue having the cached bit represent that instead of IsVersionable().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually looks like EXTERNAL_ProfAPI_RejitOnAttach is already cached in a static local variable, so it may be ok already
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both of the checks in ReJitManager::IsReJITEnabled are cached in local statics, so I don't believe there should be a perf concern here. Doing the perf testing will prove that right or wrong.
Co-Authored-By: Noah Falk <[email protected]>
Co-Authored-By: Noah Falk <[email protected]>
|
I just removed the draft status, I was originally intending to do some more work and hopefully remove the runtime suspension during a ReJIT, but I think this change is big enough and complicated enough on its own. Filed #26790 to track the additional work. |
| // rejitted code, or to reinstate original code on a revert). | ||
| else if ((g_pConfig->GenOptimizeType() != OPT_SIZE) || | ||
| pMD->IsVersionableWithJumpStamp()) | ||
| else if ((g_pConfig->GenOptimizeType() != OPT_SIZE)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change will save 4 bytes of code per method in crossgen images.
And never emitting the variable sized nops in the method prolog will likely save another one or two bytes per method.
It would be interesting to get the codesize of the larger crossgen images before and after your change,
| (pMDMethod->GetMethodType() == METHOD_TYPE_NORMAL || pMDMethod->GetMethodType() == METHOD_TYPE_INSTANTIATED)) | ||
|
|
||
| #ifdef FEATURE_REJIT | ||
| || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not a fan of all these compound && and || clauses.
My preference is to write code that checks one or two things then returns true or false.
And repeat.
So:
if (!g_pConfig->TieredCompilation())
{
return false;
}
// Policy - If QuickJit is disabled and the module does not have any pregenerated code, the method would
// be ineligible for tiering currently to avoid some unnecessary overhead
if (!g_pConfig->TieredCompilation_QuickJit() && !GetModule()->HasNativeOrReadyToRunImage())
{
rerturn false;
}
|
I ran startup perf tests and see no regressions (details below). At this point I have responded to all feedback and this is in a state where I feel confident checking it in. If there are any outstanding issues, please let me know so I can address them. Here is the size on disk change (if you want to see other assemblies, let me know) All perf tests were run warm by running for 2 iterations and discarding the results before collecting the data below. The baseline here is master before my changes. The high level summary is I ran startup tests for WPF, ASP.Net and Powershell and all results were within the noise - no gains, but no losses either. Here is the timing for the WPF app Bill provided: I tested powershell using instructions from @kouvel, grabbing the latest powershell build and replacing the coreclr binaries with those built from master. The script is as follows (it measures the average time to start up powershell and exit): Here is the timings: To test ASP.NET startup I used MusicStore, although none of the automated ways to run it appear to be working so I had to manually build it. |
noahfalk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM (always satisfying to see all the red)
briansull
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This is still a work in progress, there are some failing tests and I still need to run perf tests on it, but I want to get eyes on it as early as possible.
This change does three things:
Remaining work before it's ready to merge: