
Conversation

@jamesqo (Contributor) commented Aug 28, 2016

The idea is that instead of resizing the array (aka throwing it out, allocating a new one, and re-copying the data) every time the sequence doesn't fit, we can keep the old array around, allocate a new one of twice the total size, and just continue reading into the new array at index 0. This requires keeping around a list of the arrays we've filled up so far; however, the overhead of this list will be substantially less than the size of the data itself.

A visualization of what I'm talking about for a 300-length enumerable:

First array: [items 0-31]

List of arrays:
[0]: [items 32-63]
[1]: [items 64-127]
[2]: [items 128-255]

Current array: [items 256-299]

(I also put this in the comments to help future readers understand.)

Then when we need the final array to return from the method, we just calculate first.Length + list.Sum(a => a.Length) + indexWeFinishedReadingIntoInCurrentArray, pre-allocate an array of that size, and then copy each of the arrays into that array.

I opted to only use this algorithm for enumerables of length > 32, since for smaller sizes it will lead to increased fragmentation, and the overhead of the list allocation will probably be noticeable in comparison to, say, 20 elements. ~32 is also roughly the threshold at which Array.Copy starts getting faster than a for-loop.
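
A minimal sketch of the scheme described above (illustrative names, not the PR's actual code; unlike the PR it doesn't defer the list allocation or special-case the first array):

using System;
using System.Collections.Generic;

internal static class ToArraySketch
{
    // Filled buffers are kept in a list instead of being copied on every grow;
    // each element is copied exactly once, into the final exact-size array.
    internal static T[] ToArrayChunked<T>(IEnumerable<T> source)
    {
        using (IEnumerator<T> e = source.GetEnumerator())
        {
            var buffers = new List<T[]>();   // filled chunks: 32, 32, 64, 128, ...
            T[] current = new T[32];
            int index = 0;                   // next free slot in current
            int count = 0;                   // total elements read so far

            while (e.MoveNext())
            {
                if (index == current.Length)
                {
                    buffers.Add(current);    // keep the filled chunk around
                    current = new T[count];  // new chunk doubles total capacity
                    index = 0;
                }
                current[index++] = e.Current;
                count++;
            }

            // Single final copy into an exactly-sized array.
            var result = new T[count];
            int copied = 0;
            foreach (T[] buffer in buffers)
            {
                Array.Copy(buffer, 0, result, copied, buffer.Length);
                copied += buffer.Length;
            }
            Array.Copy(current, 0, result, copied, index);
            return result;
        }
    }
}

For the 300-element example, this sketch ends with chunks of 32, 32, 64, and 128 elements in the list plus 44 elements in the current 256-length chunk, and each element is copied only once, during the final assembly.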

Benchmark results

https://gist.github.com/jamesqo/399b72bf5de8e2cbd83d044836cbefa4 (includes results/source code)

You can see that the new implementation consistently has about 2/3 as many gen0 collections as the old one.

The timings are somewhat inconsistent (disclaimer: I only did 100,000 iterations since the tests were taking far too long to run), but you can see that the new implementation is generally as fast as or faster than the old one. (I suspect the differences may be hard to measure due to all of the interface invocations going on, in addition to the covariant array type checks when T is a reference type.)

cc @stephentoub @VSadov @JonHanna

@jamesqo (Author) commented Sep 4, 2016

@stephentoub can you PTAL?

I admit that the added logic is a bit complex; however, this is a frequently used path in cases like customers.Where(c => c.Id == 3).ToArray(), and the memory improvements are substantial.

@stephentoub (Member) commented:

I want to think about this some more. It's not clear to me yet that it's entirely a win, even if a microbenchmark for throughput and Gen0 GCs shows an improvement. It may be; I'm slightly concerned, though, that this will end up with more objects held onto for the duration of the operation (plus the few additional small objects that get allocated), etc. As you say, it's also a lot more code.

@VSadov, @justinvp, @jkotas, thoughts?

Member:

We want the implementation to work well for both JITed and AOT environments. This trick does not work for AOT - you will pay for the extra code there.

@jamesqo (Author):

This trick does not work for AOT - you will pay for the extra code there.

How much is the cost? Since the method is compiled in advance, there is no first-time overhead when invoking the method, as there is when it is jitted.

Member:

The cost is in binary size and compile time.

@jkotas (Member) commented Sep 5, 2016

however this is a frequently-used path in cases like customers.Where(c => c.Id == 3).ToArray()

What is the improvement for this case, or other common Linq use cases?

Linq has ToArray optimized in the layer above via the IIListProvider interface. I am wondering how frequently this optimization would actually kick in; and if there are common cases where it does not, whether it would be better to optimize those using IIListProvider instead.

@jamesqo (Author) commented Sep 5, 2016

What is the improvement for this case, or other common Linq use cases?

If the number of customers is greater than 32, for example 50, then we exchange the Customer[64] allocation at the end for a Customer[32] allocation. And we keep the Customer[32] which holds the first 32 customers around, instead of copying all of its data into a new array and then throwing it away.

Linq has ToArray optimized in the layer above via IIListProvider interface. I am wondering how frequently would this optimization actually kicks in

IIListProvider (to my understanding) represents an iterator whose count we know in advance, so we can make exactly one allocation for ToArray / ToList. For example, if we call customers.Select(c => Foo(c)) and we know that customers.Count == 38, we know the resulting enumerable will have exactly 38 elements and can allocate an array of exactly that size. I don't think this can be done with Where iterators.
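
For reference, this is roughly the shape of the internal corefx interface being discussed (paraphrased from the sources; treat the exact member list as an approximation):

using System.Collections.Generic;

internal interface IIListProvider<TElement> : IEnumerable<TElement>
{
    // Produce the results with a single exact-size allocation when the count is known.
    TElement[] ToArray();
    List<TElement> ToList();
    // Returns the count, or -1 when onlyIfCheap is true and counting would require enumerating.
    int GetCount(bool onlyIfCheap);
}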

Member:

Curious: why not new T[first.Length * 2]?

@jamesqo (Author):

I didn't want to be overly optimistic with the allocations: using first.Length * 2 (which would be 64 as this PR is currently implemented) would lead us to allocate more than 2N elements for an N-length enumerable in some cases. For example, a length of 100 would lead us to allocate space for 224 elements (32 + 64 + 128).

Member:

Makes sense. In a typical doubling algorithm, you'd have 32 elements and double the size of the array to 64, making room for another 32. Then double again to 128, making room for another 64. Etc. Since we're allocating an array that won't include the initial set of elements, we don't want to do that initial doubling. So we allocate an additional array of length 32 for 32 elements (64 total), then an additional array of length 64 for 64 elements (128 total), etc.
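
To make that schedule concrete, a tiny standalone illustration that just prints the progression:

// Each additional chunk is sized to the running total, so capacity doubles
// without ever copying the chunks that are already full.
int total = 32;                          // the initial 32-element array
for (int i = 0; i < 4; i++)
{
    int next = total;                    // size of the next chunk to allocate
    total += next;
    Console.WriteLine($"allocate T[{next}]; total capacity now {total}");
}
// allocate T[32]; total capacity now 64
// allocate T[64]; total capacity now 128
// allocate T[128]; total capacity now 256
// allocate T[256]; total capacity now 512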

@nbaraz:

I have an idea that should sometimes reduce the number of allocations/copying:
Allocate first.Length * 2 up front, but place the elements in the new array starting at first.Length. Once you reach the end of the new array, or the end of the iterator:

  • If there are more items, put them at the start of the new array (You will have to reorder them when you de-fragment the array list)
  • If there are no more items, place the elements from first in the new array, from 0 to first.Length

@jamesqo (Author):

@NSIL That seems like it would increase allocations for e.g. lengths 33-64, since before we would allocate a T[32] and a T[32], and now we would allocate a T[32] and a T[64]. What would be an example in which it decreases allocations/copying?

@nbaraz commented Oct 2, 2016:

If there are between 32 and 64 elements, you can just return the new array.
Edit: I don't know if resizing down allocates. If it does, then it doesn't help at all.

@jamesqo (Author):

@NSIL Resizing down does allocate, unfortunately.

@nbaraz:

Never mind then :) I assumed it didn't; I'll check next time.

@VSadov (Member) commented Sep 15, 2016

I like the change. It seems that for n <= 32 it would have roughly the same behavior as before, while for large enumerables the transient bytes allocated and copying churn would be reduced by ~50%.

@VSadov (Member) commented Sep 15, 2016

LGTM

@VSadov (Member) commented Sep 15, 2016

@jamesqo, could you try benchmarking scenarios with T being a reference type (or a struct containing a reference type)?
Copying references is more expensive because of GC fences, so I wonder how this change affects the throughput.

@jamesqo (Author) commented Sep 18, 2016

@VSadov Here are the results for some types which are/contain references. I additionally took a look in PerfView and memmoveGCRefs seems to be taking up less time than it was before.

@karelz added the enhancement (Product code improvement that does NOT require public API changes/additions), area-System.Collections, and area-System.Linq labels, and removed area-System.Collections, on Sep 26, 2016
@stephentoub (Member) left a review:

I don't love the complexity this adds, but I can see it being worth it for larger enumerables. Thanks for working on it.

Copy link
Member

@stephentoub stephentoub Sep 26, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should just use Array.Resize / Array.Copy and let them employ appropriate optimizations.
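
For reference, Array.Resize wraps the allocate-and-copy pattern in a single call:

int[] buffer = new int[32];
// Allocates a new int[64], copies the 32 existing elements into it, and
// stores the new array back through the ref parameter; nothing grows in place.
Array.Resize(ref buffer, buffer.Length * 2);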

Member:

IEnumerator.MoveNext is spec'd to keep returning false on subsequent calls once it has returned false, but I'm concerned that not every IEnumerator implementation follows that, and this introduces an additional MoveNext call after MoveNext may have returned false in certain cases. Let's avoid that.

@jamesqo (Author):

If we reached here, we must have exited the loop after calling get_Current. MoveNext is only called while index < buffer.Length; if they're equal, then we exited via break. (Perhaps I should add a comment clarifying this?)
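
A sketch of the loop shape being described (illustrative, not the PR's exact code):

// MoveNext is only called while there is room left in the buffer. If
// index == buffer.Length after the loop, we exited via the second break,
// right after a successful read, so MoveNext is never called again
// after it has returned false.
while (true)
{
    if (!e.MoveNext())
    {
        break;                  // enumerator exhausted
    }
    buffer[index++] = e.Current;
    if (index == buffer.Length)
    {
        break;                  // buffer full; the caller grows and continues
    }
}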

Member:

A comment would be helpful. This is part of the complexity I was talking about.

@stephentoub (Member) commented Sep 26, 2016:

Same comment as earlier. Please just use Array.Resize / Array.Copy.

Member:

We don't need this assert; this was just checked two lines earlier.


Member:

Should this be checked?

@jamesqo (Author):

Will do

@stephentoub (Member) commented Sep 26, 2016:

Rather than special-casing first, should we just put first into buffers at the very beginning and then treat it like any other array in the list? Then you can delete this copy. The one extra slot in buffers' underlying array of arrays should not matter.

@jamesqo (Author) commented Sep 26, 2016:

It would cause a slight overhead for enumerables of length 33-64, since adding it to the list would cause the List to allocate a T[4][]. I wanted to defer that allocation as much as possible.

Member:

I wanted to defer that allocation as much as possible.

I see. Ok.

Member:

Nit: could just foreach, e.g.

foreach (T[] buffer in buffers)
{
    Array.Copy(buffer, 0, result, copied, buffer.Length);
    copied += buffer.Length;
}

(I also changed previous to buffer here, as I'm not sure what previous is referring to here as a name.)

@jamesqo (Author) commented Sep 26, 2016:

That will be slower; I checked some time ago and get_Item and set_Item of List are both inlined.

Member:

I'm not suggesting this because of performance. Does it make a measurable difference here?

@jamesqo (Author):

Hmm... no. I'll change it to use foreach, then.

@stephentoub (Member) commented Sep 26, 2016:

Do we need to carry over any of the logic from https://github.com/dotnet/corefx/pull/11208/files#diff-be1bf738e84b4f261cc144a01fdb196cL63 ? If not, the local isn't necessary and this could just be:

current = new T[current.Length * 2];

@jamesqo (Author) commented Sep 26, 2016:

@stephentoub No; if the enumerable has between 2^30 and 2^31 - 1 elements, we will not attempt to allocate an array over MaxArrayLength, unlike the old algorithm. Will change.
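
For context, a sketch of the clamping the old growth path performed (the MaxArrayLength value is copied from the corefx sources and should be treated as an assumption here):

static int GrowSize(int length)
{
    const int MaxArrayLength = 0x7FEFFFFF;  // runtime array-length limit per corefx
    int doubled = length * 2;               // may overflow to a negative value
    // The unsigned comparison also catches overflow, clamping to the maximum.
    return (uint)doubled > MaxArrayLength ? MaxArrayLength : doubled;
}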

@stephentoub (Member):

Thanks for your efforts on this, @jamesqo.

@stephentoub merged commit c9c676e into dotnet:master Sep 28, 2016
@jamesqo deleted the better-to-array branch Sep 29, 2016 00:43
@karelz modified the milestone: 1.2.0 Dec 3, 2016
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
…ables (dotnet/corefx#11208)

* Improve Enumerable.ToArray implementation to avoid excessive copying

Commit migrated from dotnet/corefx@c9c676e