Increase ObjectManager.MaxArraySize to improve BinaryFormatter deserialization time #17949
Conversation
To support fix-ups during BinaryFormatter deserialization, ObjectManager maintains a table of ObjectHolders. This table is an array, where each element/bucket of the array is a linked list of ObjectHolders, and where the right bucket is found via `id % MaxArraySize`. IDs are allocated monotonically and contiguously, such that the max list size is equal to the total number of object holders divided by MaxArraySize. Every time an object is searched for, we need to find the right bucket and then walk the list, so for large numbers of objects, searching for every object is effectively an O(N^2) operation.

There are a variety of ways to address this, but a very simple stop-gap is to simply increase MaxArraySize. Doing so doesn't impact small deserializations, as the array wouldn't have grown beyond MaxArraySize anyway, and for large deserializations, we allocate more memory for the array but only proportionally to the number of objects added to the table. The primary downside is that such a large array is more likely to end up in the LOH, which will make collecting the array more expensive, but nowhere near as expensive as the O(N^2) algorithm that would result with a smaller array of long lists. Thus, this mitigation simply increases MaxArraySize from 4K to 1M.
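For illustration, here is a minimal sketch of the bucket scheme described above. It uses simplified names and types and is not the actual ObjectManager source (which grows the array on demand and stores richer fix-up state per holder):

```csharp
// Simplified model of the ObjectHolder table described above; illustrative only.
internal sealed class HolderTable
{
    // 0x1000 (4K) before this change; the PR raises it to 0x100000 (1M).
    private const int MaxArraySize = 0x1000;

    private sealed class Holder
    {
        public long Id;
        public object Value;
        public Holder Next; // next holder chained into the same bucket
    }

    private readonly Holder[] _buckets = new Holder[MaxArraySize];

    public void Add(long id, object value)
    {
        int bucket = (int)(id % MaxArraySize);
        _buckets[bucket] = new Holder { Id = id, Value = value, Next = _buckets[bucket] };
    }

    public object Find(long id)
    {
        // Each lookup walks one bucket's list. With N holders, the average list
        // length is N / MaxArraySize, so resolving every object during
        // deserialization costs on the order of N^2 / MaxArraySize node visits.
        for (Holder h = _buckets[(int)(id % MaxArraySize)]; h != null; h = h.Next)
        {
            if (h.Id == id)
            {
                return h.Value;
            }
        }
        return null;
    }
}
```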
Just to point out how bad the performance is: stephentoub's O(N^2) I just can't understand. O(N) should be possible, I think.
You mean you don't understand my analysis that says the current implementation is O(N^2)? Worst case, you're doing a lookup for N items, and since those N items are stored in lists that are being walked, each lookup can be O(N), hence O(N^2).
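A rough back-of-envelope illustration of that analysis, using the ~500K-object repro size discussed in this PR (the exact counts in the real code will differ):

```csharp
using System;

// With N holders spread over MaxArraySize buckets, the average bucket list has
// N / MaxArraySize entries, and resolving all N objects walks roughly
// N * (N / MaxArraySize) nodes in total.
long n = 500_000;
long oldBuckets = 0x1000;    // 4K buckets
long newBuckets = 0x100000;  // 1M buckets

Console.WriteLine($"avg list length @4K buckets: {n / oldBuckets}");                     // ~122
Console.WriteLine($"approx node visits @4K:      {n * (n / oldBuckets):N0}");            // ~61,000,000
Console.WriteLine($"avg list length @1M buckets: {Math.Max(1, n / newBuckets)}");        // ~1
Console.WriteLine($"approx node visits @1M:      {n * Math.Max(1, n / newBuckets):N0}"); // ~500,000
```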
Sorry, I was unclear. I argue that just increasing the buffer size is not a satisfactory solution.
As I stated in the PR, it's a workaround that helps to mitigate the problem, with few downsides. If you'd like to submit a PR that does better, we'd welcome it. But we're not planning to spend a lot of time working on BinaryFormatter, which was brought to .NET Core primarily for compatibility and is not a technology we're heavily investing in moving forward.
Excuse my illiteracy. What does PR mean?
"Pull Request", meaning you could implement a solution and submit it to this repo on GitHub, which would allow us to review the changes and potentially merge them into .NET Core. |
I might consider taking a deep dive into BinaryFormatter, but having browsed the code I realize it's not an easy task to understand/get a grip on it... I have 2 reasons for pushing this issue. First: ... Second: ... Now my trust in .NET decreases. (I recently also experienced bad performance in string.IndexOf; I found a workaround.)
Increase ObjectManager.MaxArraySize to improve BinaryFormatter deserialization time
Commit migrated from dotnet/corefx@afde3c1
To support fix-ups during BinaryFormatter deserialization, ObjectManager maintains a table of ObjectHolders. This table is an array, where each element/bucket of the array is a linked list of ObjectHolders, and where the right bucket is found via `id % MaxArraySize`, where MaxArraySize is a const power of 2. IDs are allocated monotonically and contiguously, such that the max list size is equal to the total number of object holders divided by MaxArraySize. Every time an object is searched for, we need to find the right bucket and then walk the list, so for large numbers of objects, searching for every object as is done during deserialization is effectively an O(N^2) operation.

There are a variety of ways to address this, but a very simple stop-gap is to simply increase MaxArraySize. Doing so doesn't impact small deserializations, as the array wouldn't have grown beyond MaxArraySize anyway, and for large deserializations, we allocate more memory for the array but only proportionally to the number of objects added to the table. The primary downside is that such a large array is more likely to end up in the LOH, which will make collecting the array more expensive, but nowhere near as expensive as the O(N^2) algorithm that would result with a smaller array of long lists.

Thus, this mitigation simply increases MaxArraySize from 4K to 1M.
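A minimal sketch of what the one-line change described here amounts to (the exact declaration, accessibility, and surrounding code in ObjectManager.cs may differ):

```csharp
// MaxArraySize is a const power of 2, so the bucket index id % MaxArraySize can be
// computed with a cheap bit mask; only the constant's value changes in this mitigation.
internal const int MaxArraySize = 0x100000; // 1,048,576 buckets; previously 0x1000 (4,096)
```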
For the repro in https://github.com/dotnet/corefx/issues/16991, before this change I get:
and after I get:
If I increase the size further, from 500K to 1M Book objects in the list, before I get:
and after I get:
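A hedged sketch of the kind of repro and measurement described here (the `Book` type, its fields, and the element count are assumptions based on the description above; the original repro is in the linked issue):

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

[Serializable]
public class Book
{
    public string Title;
    public int Pages;
}

public static class Program
{
    public static void Main()
    {
        // Build a large object graph; each element gets its own object ID
        // and thus its own ObjectHolder during deserialization.
        var books = new List<Book>();
        for (int i = 0; i < 500_000; i++)
            books.Add(new Book { Title = "Book " + i, Pages = i });

        var formatter = new BinaryFormatter();
        using (var stream = new MemoryStream())
        {
            formatter.Serialize(stream, books);
            stream.Position = 0;

            var sw = Stopwatch.StartNew();
            var roundTripped = (List<Book>)formatter.Deserialize(stream);
            sw.Stop();
            Console.WriteLine($"Deserialized {roundTripped.Count} books in {sw.Elapsed}");
        }
    }
}
```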
For super huge graphs, we could still end up with long bucket lists and thus still exhibit some polynomial behavior, but the chances/impact of that are significantly decreased.
cc: @morganbr, @jkotas, @Alois-xx
Closes https://github.com/dotnet/corefx/issues/16991