Increase ObjectManager.MaxArraySize to improve BinaryFormatter deserialization time #17949
Conversation
To support fix-ups during BinaryFormatter deserialization, ObjectManager maintains a table of ObjectHolders. This table is an array, where each element/bucket of the array is a linked list of ObjectHolders, and where the right bucket is found via `id % MaxArraySize`. IDs are allocated monotonically and contiguously, such that the max list size is equal to the total number of object holders divided by MaxArraySize. Every time an object is searched for, we need to find the right bucket and then walk the list, so for large numbers of objects, searching for every object is effectively an O(N^2) operation.

There are a variety of ways to address this, but a very simple stop-gap is to simply increase MaxArraySize. Doing so doesn't impact small deserializations, as the array wouldn't have grown beyond MaxArraySize anyway, and for large deserializations, we allocate more memory for the array but only proportionally to the number of objects added to the table. The primary downside is that such a large array is more likely to end up in the LOH, which will make collecting the array more expensive, but nowhere near as expensive as the O(N^2) algorithm that would result with a smaller array of long lists. Thus, this mitigation simply increases MaxArraySize from 4K to 1M.
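For illustration, here is a minimal sketch of the bucket scheme described above. It uses simplified names and types and is not the actual ObjectManager source (which grows the array on demand and stores richer fix-up state per holder):

```csharp
// Simplified model of the ObjectHolder table described above; illustrative only.
internal sealed class HolderTable
{
    // 0x1000 (4K) before this change; the PR raises it to 0x100000 (1M).
    private const int MaxArraySize = 0x1000;

    private sealed class Holder
    {
        public long Id;
        public object Value;
        public Holder Next; // next holder chained into the same bucket
    }

    private readonly Holder[] _buckets = new Holder[MaxArraySize];

    public void Add(long id, object value)
    {
        int bucket = (int)(id % MaxArraySize);
        _buckets[bucket] = new Holder { Id = id, Value = value, Next = _buckets[bucket] };
    }

    public object Find(long id)
    {
        // Each lookup walks one bucket's list. With N holders, the average list
        // length is N / MaxArraySize, so resolving every object during
        // deserialization costs on the order of N^2 / MaxArraySize node visits.
        for (Holder h = _buckets[(int)(id % MaxArraySize)]; h != null; h = h.Next)
        {
            if (h.Id == id)
            {
                return h.Value;
            }
        }
        return null;
    }
}
```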
Just to point out how bad the performance is: stephentoub's O(N^2) I just can't understand. O(N) should be possible, I think.
You mean you don't understand my analysis that says the current implementation is O(N^2)? Worst case, you're doing a lookup for N items, and since those N items are stored in lists that are being walked, each lookup can be O(N), hence O(N^2).
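A rough back-of-envelope illustration of that analysis, using the ~500K-object repro size discussed in this PR (the exact counts in the real code will differ):

```csharp
using System;

// With N holders spread over MaxArraySize buckets, the average bucket list has
// N / MaxArraySize entries, and resolving all N objects walks roughly
// N * (N / MaxArraySize) nodes in total.
long n = 500_000;
long oldBuckets = 0x1000;    // 4K buckets
long newBuckets = 0x100000;  // 1M buckets

Console.WriteLine($"avg list length @4K buckets: {n / oldBuckets}");                     // ~122
Console.WriteLine($"approx node visits @4K:      {n * (n / oldBuckets):N0}");            // ~61,000,000
Console.WriteLine($"avg list length @1M buckets: {Math.Max(1, n / newBuckets)}");        // ~1
Console.WriteLine($"approx node visits @1M:      {n * Math.Max(1, n / newBuckets):N0}"); // ~500,000
```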
Sorry, I was unclear. I argue that just increasing the buffer size is not a satisfactory solution.
As I stated in the PR, it's a workaround that helps to mitigate the problem, with few downsides. If you'd like to submit a PR that does better, we'd welcome it. But we're not planning to spend a lot of time working on BinaryFormatter, which was brought to .NET Core primarily for compatibility and is not a technology we're heavily investing in moving forward.
Excuse my illiteracy. What does PR mean?
"Pull Request", meaning you could implement a solution and submit it to this repo on GitHub, which would allow us to review the changes and potentially merge them into .NET Core. |
I might consider taking a deep dive into BinaryFormatter, but having browsed the code I realize it's not an easy task to understand/get a grip on it... I have 2 reasons for pushing this issue. First: ... Second: ... Now my trust in .NET decreases. (I recently also experienced bad performance in string.IndexOf; I found a workaround.)
Increase ObjectManager.MaxArraySize to improve BinaryFormatter deserialization time
Commit migrated from dotnet/corefx@afde3c1
To support fix-ups during BinaryFormatter deserialization, ObjectManager maintains a table of ObjectHolders. This table is an array, where each element/bucket of the array is a linked list of ObjectHolders, and where the right bucket is found via `id % MaxArraySize`, where MaxArraySize is a const power of 2. IDs are allocated monotonically and contiguously, such that the max list size is equal to the total number of object holders divided by MaxArraySize. Every time an object is searched for, we need to find the right bucket and then walk the list, so for large numbers of objects, searching for every object as is done during deserialization is effectively an O(N^2) operation.

There are a variety of ways to address this, but a very simple stop-gap is to simply increase MaxArraySize. Doing so doesn't impact small deserializations, as the array wouldn't have grown beyond MaxArraySize anyway, and for large deserializations, we allocate more memory for the array but only proportionally to the number of objects added to the table. The primary downside is that such a large array is more likely to end up in the LOH, which will make collecting the array more expensive, but nowhere near as expensive as the O(N^2) algorithm that would result with a smaller array of long lists.

Thus, this mitigation simply increases MaxArraySize from 4K to 1M.
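A minimal sketch of what the one-line change described here amounts to (the exact declaration, accessibility, and surrounding code in ObjectManager.cs may differ):

```csharp
// MaxArraySize is a const power of 2, so the bucket index id % MaxArraySize can be
// computed with a cheap bit mask; only the constant's value changes in this mitigation.
internal const int MaxArraySize = 0x100000; // 1,048,576 buckets; previously 0x1000 (4,096)
```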
For the repro in https://github.com/dotnet/corefx/issues/16991, before this change I get:
and after I get:
If I increase the size further, from 500K to 1M Book objects in the list, before I get:
and after I get:
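A hedged sketch of the kind of repro and measurement described here (the `Book` type, its fields, and the element count are assumptions based on the description above; the original repro is in the linked issue):

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

[Serializable]
public class Book
{
    public string Title;
    public int Pages;
}

public static class Program
{
    public static void Main()
    {
        // Build a large object graph; each element gets its own object ID
        // and thus its own ObjectHolder during deserialization.
        var books = new List<Book>();
        for (int i = 0; i < 500_000; i++)
            books.Add(new Book { Title = "Book " + i, Pages = i });

        var formatter = new BinaryFormatter();
        using (var stream = new MemoryStream())
        {
            formatter.Serialize(stream, books);
            stream.Position = 0;

            var sw = Stopwatch.StartNew();
            var roundTripped = (List<Book>)formatter.Deserialize(stream);
            sw.Stop();
            Console.WriteLine($"Deserialized {roundTripped.Count} books in {sw.Elapsed}");
        }
    }
}
```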
For super huge graphs, we could still end up with long bucket lists and thus still exhibit some polynomial behavior, but the chances/impact of that are significantly decreased.
cc: @morganbr, @jkotas, @Alois-xx
Closes https://github.com/dotnet/corefx/issues/16991