
Conversation

@GrabYourPitchforks
Member

This takes advantage of Roslyn's new ref reassignment feature to speed up the inner loop in Array.Reverse. Testing shows that for large arrays the reversal time is cut by approx. 20% from the baseline.
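For context, a minimal sketch of the shape a ref-reassignment reversal loop takes (illustrative only, not the exact committed code; it assumes the usual Unsafe helpers and glosses over how the refs are obtained - the real code obtains them in a way that avoids the array covariance check discussed later in this thread):

ref T first = ref array[index];
ref T last = ref Unsafe.Add(ref first, length - 1);
while (Unsafe.IsAddressLessThan(ref first, ref last))
{
    T temp = first;          // swap the values the two refs point at
    first = last;
    last = temp;
    first = ref Unsafe.Add(ref first, 1);   // ref reassignment: walk forward from the front
    last = ref Unsafe.Add(ref last, -1);    // ref reassignment: walk backward from the back
}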

Old loop codegen:

00007ff8`8311fc2c 4c63c0          movsxd  r8,eax
00007ff8`8311fc2f 468b0c81        mov     r9d,dword ptr [rcx+r8*4]
00007ff8`8311fc33 4e8d0481        lea     r8,[rcx+r8*4]
00007ff8`8311fc37 4c63d2          movsxd  r10,edx
00007ff8`8311fc3a 468b1c91        mov     r11d,dword ptr [rcx+r10*4]
00007ff8`8311fc3e 458918          mov     dword ptr [r8],r11d
00007ff8`8311fc41 46890c91        mov     dword ptr [rcx+r10*4],r9d
00007ff8`8311fc45 ffc0            inc     eax
00007ff8`8311fc47 ffca            dec     edx
00007ff8`8311fc49 3bc2            cmp     eax,edx
00007ff8`8311fc4b 7cdf            jl      ... (00007ff8`8311fc2c)

New loop codegen:

00007ff8`831238e2 8b11            mov     edx,dword ptr [rcx]
00007ff8`831238e4 448b00          mov     r8d,dword ptr [rax]
00007ff8`831238e7 448901          mov     dword ptr [rcx],r8d
00007ff8`831238ea 8910            mov     dword ptr [rax],edx
00007ff8`831238ec 4883c104        add     rcx,4
00007ff8`831238f0 4883c0fc        add     rax,0FFFFFFFFFFFFFFFCh
00007ff8`831238f4 483bc8          cmp     rcx,rax
00007ff8`831238f7 72e9            jb      ... (00007ff8`831238e2)

The codegen samples above are for reversing an int[]. I saw similar performance gains for other types. Reference types like object[] don't get as large a benefit due to the card table checks, but there is still some speedup.

I also took the opportunity to collapse the precondition checks using the recommended patterns for (index, count) parameters.
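For illustration, the collapsed (index, count) validation pattern looks roughly like this (a sketch with plain throws for brevity; the actual code uses ThrowHelper, and the exception-type consequences are discussed later in this thread):

if (array == null)
    throw new ArgumentNullException(nameof(array));
// One unsigned comparison rejects both negative and too-large values of index.
if ((uint)index > (uint)array.Length)
    throw new ArgumentOutOfRangeException(nameof(index));
// Likewise, one unsigned comparison covers both "length < 0" and "index + length > array.Length".
if ((uint)length > (uint)(array.Length - index))
    throw new ArgumentException("Invalid offset and length.");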

@jkotas
Member

jkotas commented May 5, 2018

MemoryExtensions.Reverse can use the same treatment.

@GrabYourPitchforks
Member Author

@jkotas Do you see any issue with having Array.Reverse<T> call into MemoryExtensions.Reverse<T>, as proposed in the latest commit? You've mentioned in the past that there are some weird codegen considerations, but I'm not sure whether those would apply here.

Update Array.Reverse to call it
@GrabYourPitchforks
Member Author

Quick question to reviewers who are more familiar with the JIT and GC:

I played around with two different patterns here. The first, Unsafe.Add(ref first, length - 1), generates this machine code:

dec length
movsxd length64, length
lea last, [first + length * stride]

The second, Unsafe.Add(ref Unsafe.Add(ref first, length), -1), generates this machine code:

movsxd length64, length
lea last, [first + length64 * stride - stride]

Is it possible in theory for the JIT to generate something like the below in the second case?

movsxd length64, length
lea last, [first + length64 * stride]
add last, -stride

And if it does generate this, is it possible that immediately after the lea instruction but before the add, first and last technically point to different objects, which could allow the GC to relocate first but not last, causing the final add instruction to leave last pointing to garbage?

FYI, the reason I asked the above question is that this same pattern of "get a ref to just past the end of the array, then back up" also appears in Buffer.Memmove(ref byte, ref byte, ...). I don't recall that particular code pattern being mentioned in the review, so I'm not sure whether it just slipped under the radar during review or whether it's actually safe.

@jkotas
Member

jkotas commented May 6, 2018

Do you see any issue with having Array.Reverse call into MemoryExtensions.Reverse

It would make Array.Reverse<T> slower, in particular where T is a reference type or a shared generic type.

Is it possible in theory for the JIT to generate something like the below in the second case?

Yes, it is possible. It is not just theory. It is pretty likely that the JIT will generate something like this on non-Intel architectures that have less rich addressing modes.

first and last technically point to different objects

A byref that points right past the end of an array is treated by the garbage collector as if it were pointing into the array.

From the ECMA-335 spec: Managed pointers can point to ... the address where an element just past the end of an array would be stored (for pointer indexes into managed arrays).

We take advantage of this in a number of places.
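As an illustration of the "past-the-end, then back up" pattern being discussed (a sketch assuming a non-empty int[] and the standard Unsafe helpers):

ref int first = ref array[0];
ref int pastEnd = ref Unsafe.Add(ref first, array.Length); // one past the end: still a GC-tracked interior pointer
ref int last = ref Unsafe.Add(ref pastEnd, -1);            // back up to the last element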

ThrowHelper.ThrowArgumentNullException(ExceptionArgument.array);
Reverse(array, 0, array.Length);

// The Span ctor might perform an unnecessary type check if T is a reference type,
Member

@jkotas jkotas May 6, 2018


This type check would be a correctness problem for co-variant arrays. Avoiding it is not just a performance optimization; it is required for correctness.
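A hypothetical example of why (the identifiers below are made up for illustration):

string[] strings = { "a", "b", "c" };
object[] objects = strings;          // legal via array covariance; the runtime element type is string
// new Span<object>(objects) would throw ArrayTypeMismatchException because the runtime element
// type is string, so Array.Reverse(objects) must obtain its span without that check.
Array.Reverse(objects);              // must still succeed and reverse the array in place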

ThrowHelper.ThrowLengthArgumentOutOfRange_ArgumentOutOfRange_NeedNonNegNum();
if (array.Length - index < length)
if ((uint)index > (uint)array.Length)
ThrowHelper.ThrowArgumentOutOfRange_IndexException();
Member


This will change exception messages. The same argument validation is in non-generic Array.Reverse - we should keep the two in sync.

Member Author


Is this necessary? MSDN explicitly documents the specific exception types thrown by the non-generic method, and I'd rather not make changes to the parameter validation for these older methods. At least with the generic method the particular exception types aren't explicitly documented, so we have a little more leeway to go between ArgumentException and ArgumentOutOfRangeException in order to simplify the validation logic.

Member


I had not realized that this changes exception types too. Have you gone through the exercise of assessing how breaking that is?

Member Author

@GrabYourPitchforks GrabYourPitchforks May 7, 2018


This code previously threw ArgumentOutOfRangeException if length was negative and ArgumentException if length was too large. Now it throws ArgumentException in both cases.

This shouldn't break anybody, and in the past when I've participated in full framework shiproom we wouldn't have considered this a breaking change. (The reason it wouldn't be considered a breaking change is that we're not introducing a code path that now throws where previously it didn't, and we're still throwing an exception type that would have been called out in the documentation were docs to exist.)

If you have specific guidelines I should go by, send them along and I'll take a look.

Member


The breaking change rules are documented in https://github.com/dotnet/corefx/blob/master/Documentation/coding-guidelines/breaking-change-rules.md#exceptions . Your change throws a less specific exception in some cases, which is not listed as an allowed change.

Also, note that you typically pick up the new generic overloads silently upon recompilation, which makes the behavior divergence between the non-generic and generic overloads more questionable.

Member


I agree that the probability of breaking anybody here is low. However, we don't take paper-cut breaks unless the benefit is significant.

Member Author


Ok, reverted.

@jkotas
Member

jkotas commented May 6, 2018

@dotnet-bot test Windows_NT x64 Checked corefx_baseline

i++;
j--;
}
MemoryMarshal.CreateSpan(ref Unsafe.Add(ref Unsafe.As<byte, T>(ref array.GetRawSzArrayData()), index), length).Reverse();
Member


It would be useful to measure how much slower Array.Reverse is going to be for small arrays.

@GrabYourPitchforks
Member Author

GrabYourPitchforks commented May 6, 2018

@jkotas Thanks for the advice so far! I'll get a new iteration out to address your feedback and collect some firmer numbers.

One more idea - we could go even further and optimize for various T widths. For example, if we're running as a 64-bit process but T is 32 bits wide, we know that T can't contain references, so we can do something like this (pseudo-code, not tested against edge cases):

// assume first and last are "ref ulong" into the buffer
do {
    // Swap the two 64-bit words; rotating each by 32 bits reverses the pair of ints it holds.
    ulong back = Unsafe.ReadUnaligned<ulong>(ref Unsafe.As<ulong, byte>(ref last));
    ulong front = Unsafe.ReadUnaligned<ulong>(ref Unsafe.As<ulong, byte>(ref first));
    Unsafe.WriteUnaligned(ref Unsafe.As<ulong, byte>(ref last), (front >> 32) | (front << 32));
    Unsafe.WriteUnaligned(ref Unsafe.As<ulong, byte>(ref first), (back >> 32) | (back << 32));
    first = ref Unsafe.Add(ref first, 1);
    last = ref Unsafe.Add(ref last, -1);
} while (!Unsafe.IsAddressGreaterThan(ref first, ref last));

Obviously this would add complexity to the Reverse method, but it would significantly speed up reversal for certain T.

So the question: is this added complexity actually worth it? Do we have evidence that - say - reversing an int[] is a common enough operation that it deserves to be special-cased?

Edit: Reversing a byte[] is a surprisingly common operation, especially in crypto. So maybe some limited optimization does make sense in the end. But opinions from people with experience are always good!

@gfoidl
Member

gfoidl commented May 7, 2018

to reverse an int[] is a common enough operation

For my day-to-day work it is quite common. It involves int counters that are sorted in ascending order; in some parts of the algorithms the reverse order is useful, so I Array.Reverse them.

That said, I believe byte[] will be more common, so special-casing byte could be worthwhile. int support is a nice-to-have for me.

@stephentoub
Member

@dotnet-bot test Linux-musl x64 Debug Build please
@dotnet-bot test Windows_NT x64 Checked Build and Test (Jit - CoreFx) please

@jkotas
Member

jkotas commented May 7, 2018

Is ArrayReverse_Object the overload that takes just the array, or the overload that takes the indices as well?

@GrabYourPitchforks
Member Author

GrabYourPitchforks commented May 7, 2018

@jkotas It's the overload that takes only the T[], but that method internally calls the overload that takes indices. These benchmark numbers are for the first iteration of the PR before any of the MemoryExtensions.Reverse<T> work was performed.

Additionally fix integer underflow in non-generic Array.Reverse
@GrabYourPitchforks
Member Author

Unit tests are failing because of the changes to parameter validation. The PR dotnet/corefx#29559 addresses the failing unit tests.

@jkotas
Member

jkotas commented May 8, 2018

@dotnet-bot test Windows_NT x64 Checked corefx_baseline

@jkotas
Member

jkotas commented May 8, 2018

@dotnet-bot test Windows_NT x64 Checked Build and Test (Jit - CoreFx) please

@GrabYourPitchforks
Member Author

I fixed most of the error / stddev issues with my benchmark runner.
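For reference, a hypothetical sketch of a BenchmarkDotNet harness that would produce tables with these columns and benchmark names (the actual runner is not shown in this thread; the sizes are inferred from the Size column):

using System;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

public class ReverseBenchmarks
{
    // Array sizes matching the Size column in the tables below.
    [Params(0, 1, 4, 16, 128, 1024)]
    public int Size;

    private byte[] _bytes;
    private int[] _ints;
    private object[] _objects;

    [GlobalSetup]
    public void Setup()
    {
        _bytes = new byte[Size];
        _ints = new int[Size];
        _objects = new object[Size];
    }

    [Benchmark] public void ArrayReverse_Byte() => Array.Reverse(_bytes);
    [Benchmark] public void ArrayReverse_Int() => Array.Reverse(_ints);
    [Benchmark] public void ArrayReverse_Object() => Array.Reverse(_objects);
}

public static class Program
{
    public static void Main() => BenchmarkRunner.Run<ReverseBenchmarks>();
}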

Old (System.Private.CoreLib as it is today)

Method Size Mean Error StdDev
ArrayReverse_Byte 0 2.469 ns 0.0243 ns 0.0215 ns
ArrayReverse_Int 0 2.252 ns 0.0871 ns 0.1276 ns
ArrayReverse_Object 0 2.574 ns 0.0238 ns 0.0172 ns
ArrayReverse_Byte 1 2.629 ns 0.0272 ns 0.0241 ns
ArrayReverse_Int 1 2.264 ns 0.0122 ns 0.0095 ns
ArrayReverse_Object 1 3.004 ns 0.0620 ns 0.0550 ns
ArrayReverse_Byte 4 3.189 ns 0.0819 ns 0.0726 ns
ArrayReverse_Int 4 3.119 ns 0.0401 ns 0.0335 ns
ArrayReverse_Object 4 11.957 ns 0.0855 ns 0.0668 ns
ArrayReverse_Byte 16 8.438 ns 0.1104 ns 0.0730 ns
ArrayReverse_Int 16 8.469 ns 0.1493 ns 0.1166 ns
ArrayReverse_Object 16 42.609 ns 0.5579 ns 0.4034 ns
ArrayReverse_Byte 128 50.724 ns 0.9750 ns 0.8643 ns
ArrayReverse_Int 128 50.025 ns 0.2537 ns 0.2119 ns
ArrayReverse_Object 128 331.447 ns 5.9050 ns 5.5235 ns
ArrayReverse_Byte 1024 382.151 ns 7.6200 ns 9.0711 ns
ArrayReverse_Int 1024 394.005 ns 2.6319 ns 2.0548 ns
ArrayReverse_Object 1024 2,576.236 ns 14.3080 ns 11.9478 ns

New (latest iteration of the PR)

Method Size Mean Error StdDev
ArrayReverse_Byte 0 1.933 ns 0.0780 ns 0.2262 ns
ArrayReverse_Int 0 1.994 ns 0.0721 ns 0.0801 ns
ArrayReverse_Object 0 2.864 ns 0.0885 ns 0.0739 ns
ArrayReverse_Byte 1 1.710 ns 0.0594 ns 0.0527 ns
ArrayReverse_Int 1 1.901 ns 0.0698 ns 0.0545 ns
ArrayReverse_Object 1 2.725 ns 0.0854 ns 0.1403 ns
ArrayReverse_Byte 4 2.919 ns 0.0295 ns 0.0230 ns
ArrayReverse_Int 4 3.316 ns 0.0983 ns 0.0872 ns
ArrayReverse_Object 4 12.309 ns 0.0822 ns 0.0641 ns
ArrayReverse_Byte 16 7.793 ns 0.0577 ns 0.0418 ns
ArrayReverse_Int 16 7.392 ns 0.1829 ns 0.2033 ns
ArrayReverse_Object 16 39.336 ns 0.1710 ns 0.1428 ns
ArrayReverse_Byte 128 38.962 ns 0.2893 ns 0.2565 ns
ArrayReverse_Int 128 39.529 ns 0.5673 ns 0.5306 ns
ArrayReverse_Object 128 292.475 ns 5.7809 ns 7.5168 ns
ArrayReverse_Byte 1024 300.793 ns 0.9178 ns 0.6637 ns
ArrayReverse_Int 1024 301.494 ns 1.0527 ns 0.8791 ns
ArrayReverse_Object 1024 2,330.756 ns 45.4055 ns 55.7620 ns

@jkotas
Member

jkotas commented May 8, 2018

@dotnet-bot test Ubuntu x64 Checked corefx_baseline

@jkotas jkotas merged commit ba4225e into dotnet:master May 8, 2018
dotnet-bot pushed a commit to dotnet/corert that referenced this pull request May 8, 2018
* Speed up Array.Reverse by using ref reassignment

* Optimize MemoryExtensions.Reverse

Signed-off-by: dotnet-bot <[email protected]>
dotnet-bot pushed a commit to dotnet/corefx that referenced this pull request May 8, 2018
* Speed up Array.Reverse by using ref reassignment

* Optimize MemoryExtensions.Reverse

Signed-off-by: dotnet-bot-corefx-mirror <[email protected]>
jkotas pushed a commit to dotnet/corert that referenced this pull request May 8, 2018
* Speed up Array.Reverse by using ref reassignment

* Optimize MemoryExtensions.Reverse

Signed-off-by: dotnet-bot <[email protected]>
jkotas pushed a commit to dotnet/corefx that referenced this pull request May 8, 2018
* Speed up Array.Reverse by using ref reassignment

* Optimize MemoryExtensions.Reverse

Signed-off-by: dotnet-bot-corefx-mirror <[email protected]>
@GrabYourPitchforks GrabYourPitchforks deleted the array_reverse branch March 11, 2019 06:38