Improve Enumerable.ToArray to avoid excessive copying for lazy enumerables #11208
Conversation
@stephentoub can you PTAL? I admit that the added logic is a bit complex; however, this is a frequently-used path in cases like ….
I want to think about this some more. It's not clear to me yet that it's entirely a win, even if a microbenchmark for throughput and Gen0 GCs shows an improvement. It may be; I'm slightly concerned, though, that this will end up with more objects held onto for the duration of the operation (plus the few additional small objects that get allocated), etc. As you say, it's also a lot more code.
We want the implementation to work well for both JITed and AOT environments. This trick does not work for AOT - you will pay for the extra code there.
> This trick does not work for AOT - you will pay for the extra code there.
How much is the cost? Since the method is compiled in advance, there is no first-time overhead invoking the method like when it is jitted.
The cost is in binary size and compile time.
What is the improvement for this case, or for other common LINQ use cases? LINQ has ToArray optimized in the layer above via ….
If the number of customers is greater than 32, for example 50, then we exchange the …
Curious - why not new T[first.Length * 2]?
I didn't want to be overly optimistic with the allocations: having first.Length * 2 (which would be 64, as this PR is currently implemented) would lead us to allocate more than 2N elements for an N-length enumerable in some cases. For example, a 100-element enumerable would lead us to allocate space for 224 elements.
Makes sense. In a typical doubling algorithm, you'd have 32 elements and double the size of the array to 64, making room for another 32. Then double again to 128, making room for another 64. Etc. Since we're allocating an array that won't include the initial set of elements, we don't want to do that initial doubling. So we allocate an additional array of length 32 for 32 elements (64 total), then an additional array of length 64 for 64 elements (128 total), etc.
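To make the two schedules concrete, here is a quick sketch (Python for brevity; the function names are mine, and the 32-element initial buffer size is taken from the discussion above, not from the actual implementation):

```python
def doubling_buffers(n, initial=32):
    """Classic doubling: each extra buffer is twice the size of the
    previous one (first.Length * 2, then * 4, ...)."""
    sizes = [initial]
    while sum(sizes) < n:
        sizes.append(sizes[-1] * 2)
    return sizes

def catch_up_buffers(n, initial=32):
    """Scheme described above: each extra buffer equals the total
    number of elements buffered so far, so the total capacity still
    doubles with each new buffer, without the initial over-allocation."""
    sizes = [initial]
    while sum(sizes) < n:
        sizes.append(sum(sizes))
    return sizes
```

For a 100-element enumerable, doubling allocates buffers of [32, 64, 128] (224 slots, the over-allocation mentioned above), while the catch-up scheme allocates [32, 32, 64] (128 slots).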
I have an idea that should sometimes reduce the number of allocations/copying:
Allocate first.Length * 2 up front, but place the elements in the new array starting at first.Length. Once you reach the end of the new array, or the end of the iterator:
- If there are more items, put them at the start of the new array (you will have to reorder them when you de-fragment the array list)
- If there are no more items, place the elements from first in the new array, from 0 to first.Length
@NSIL That seems like it would increase allocations for e.g. lengths 33-64, since before we would allocate a T[32] and a T[32], and now we would allocate a T[32] and a T[64]. What would be an example in which it decreases allocations/copying?
If there are between 32 and 64 elements, you can just return the new array.
Edit: I don't know if resizing down allocates. If it does, then it doesn't help at all.
@NSIL Resizing down does allocate, unfortunately.
Never mind then :) I assumed it didn't; will check next time.
I like the change. It seems that for ….
LGTM |
@jamesqo - could you try benchmarking scenarios with T being a reference type (or a struct containing a reference type)?
stephentoub left a comment:
I don't love the complexity this adds, but I can see it being worth it for larger enumerables. Thanks for working on it.
We should just use Array.Resize / Array.Copy and let them employ appropriate optimizations.
IEnumerator.MoveNext is spec'd to return false on subsequent calls once it has returned false, but I'm concerned that not every IEnumerator implementation follows that, and this is introducing an additional MoveNext call after MoveNext may have already returned false in certain cases. Let's avoid that.
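As an illustration of the hazard (a Python analogue I constructed, not code from this PR): a conforming iterator keeps signaling exhaustion on every subsequent call, but nothing stops a hand-written one from misbehaving if you poke it again after the end.

```python
class ResettingIterator:
    """A hand-written iterator that violates the 'stay exhausted'
    contract: after signaling exhaustion once, the next call starts
    over -- analogous to MoveNext returning true after it already
    returned false."""
    def __init__(self, items):
        self._items = list(items)
        self._i = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._i >= len(self._items):
            self._i = 0  # misbehaves: resets instead of staying done
            raise StopIteration
        item = self._items[self._i]
        self._i += 1
        return item

it = ResettingIterator([1, 2])
assert list(it) == [1, 2]
# Consuming the *same* iterator again yields elements a second time --
# exactly the kind of surprise an extra MoveNext call risks.
assert list(it) == [1, 2]
```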
If we've reached here, we must have exited the loop after calling get_Current. MoveNext is only called while index < buffer.Length; if they're equal, then we exited via break. (Perhaps I should add a comment clarifying this?)
A comment would be helpful. This is part of the complexity I was talking about.
Same comment as earlier. Please just use Array.Resize / Array.Copy.
We don't need this assert; this was just checked two lines earlier.
Should this be checked?
Will do
Rather than special-casing first, should we just put first into buffers at the very beginning and then treat it like any other array in the list? Then you can delete this copy. The one extra slot in the buffers list's underlying array of arrays should not matter.
It would cause a slight overhead for enumerables of length 33-64, since adding it to the list would cause the List to allocate one T[4][]. I wanted to defer that allocation as much as possible.
> I wanted to defer that allocation as much as possible.
I see. Ok.
Nit: could just foreach, e.g.
foreach (T[] buffer in buffers)
{
    Array.Copy(buffer, 0, result, copied, buffer.Length);
    copied += buffer.Length;
}
(I also changed previous to buffer here, as I'm not sure what previous is referring to here as a name.)
That will be slower; I checked some time ago and get_Item and set_Item of List are both inlined.
I'm not suggesting this because of performance. Does it make a measurable difference here?
Hmm... no. I'll change it to use foreach, then.
Do we need to carry over any of the logic from https://github.com/dotnet/corefx/pull/11208/files#diff-be1bf738e84b4f261cc144a01fdb196cL63 ? If not, the local isn't necessary and this could just be:
current = new T[current.Length * 2];
@stephentoub No, since if the enumerable is between 2^30 and 2^31 - 1 elements, we will not attempt to allocate an array over MaxArrayLength, unlike the old algorithm. Will change.
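The capping being referred to might look like this (a sketch under my assumptions: MAX_ARRAY_LENGTH stands in for the runtime's maximum array length, commonly 0x7FEFFFFF, and the growth step is simplified):

```python
MAX_ARRAY_LENGTH = 0x7FEFFFFF  # assumed value of the runtime's array-length cap

def next_capacity(current_length):
    """Double the buffer size, but clamp to the maximum array length,
    so an enumerable between 2^30 and 2^31 - 1 elements never requests
    an array larger than the runtime allows."""
    return min(current_length * 2, MAX_ARRAY_LENGTH)
```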
Thanks for your efforts on this, @jamesqo.
Improve Enumerable.ToArray to avoid excessive copying for lazy enumerables (dotnet/corefx#11208)
* Improve Enumerable.ToArray implementation to avoid excessive copying
Commit migrated from dotnet/corefx@c9c676e
The idea is that instead of resizing the array (i.e. throwing it out, allocating a new one, and re-copying the data) every time the sequence doesn't fit, we can keep the old array around, allocate a new one of size * 2, and just continue reading into the new array at index 0. This requires keeping around a list of the arrays we've filled up so far; however, the overhead of this list will be substantially less than the size of the data itself.
A visualization of what I'm talking about for a 300-length enumerable: [image omitted]
(I also put this in the comments to help future readers understand.)
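For readers without the image, the scheme can be sketched roughly as follows (Python for brevity; names like `filled` are mine, and the real implementation works on T[] buffers with Array.Copy rather than Python lists):

```python
def to_array(source, initial_capacity=32):
    """Buffer an iterable into one final list without re-copying
    earlier elements on every resize: completed buffers are kept in
    a list, and each new buffer doubles the total capacity."""
    it = iter(source)

    # Fill the first fixed-size buffer.
    first = []
    for item in it:
        first.append(item)
        if len(first) == initial_capacity:
            break
    else:
        # The source ran out before the first buffer filled.
        return list(first)

    filled = []                   # completed buffers, kept rather than re-copied
    current = []                  # buffer currently being filled
    capacity = initial_capacity   # target size of `current`
    count = len(first)            # elements buffered so far

    for item in it:
        if len(current) == capacity:
            filled.append(current)
            capacity = count      # next buffer == total so far => capacity doubles
            current = []
        current.append(item)
        count += 1

    # Single final copy: first, then each filled buffer, then the tail.
    result = []
    result.extend(first)
    for buf in filled:
        result.extend(buf)
    result.extend(current)
    return result
```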
Then when we need the final array to return from the method, we just calculate first.Length + list.Sum(a => a.Length) + indexWeFinishedReadingIntoInCurrentArray, pre-allocate an array of that size, and then copy each of the arrays into it.
I opted to only use this algorithm for enumerables of length > 32, since for smaller sizes it will lead to increased fragmentation, and the overhead of the list allocation will probably be noticeable in comparison to, say, 20 elements. ~32 is also the threshold at around which Array.Copy starts getting faster than a for-loop.
Benchmark results:
https://gist.github.com/jamesqo/399b72bf5de8e2cbd83d044836cbefa4 (includes results/source code)
You can see that the new implementation consistently has about 2/3 the gen0 collections as the old one.
The timings are somewhat inconsistent (disclaimer: I only did 100,000 iterations, since the tests were taking far too long to run), but you can see that the new implementation is generally faster than or the same speed as the old one. (I suspect the differences may be hard to measure due to all of the interface invocations going on, in addition to the covariant array type checks when T is a reference type.)
cc @stephentoub @VSadov @JonHanna