-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Improve performance of CancellationToken.Register / CancellationTokenRegistration.Dispose #12819
Conversation
|
Just as an example of this in a higher-level test, this code: using System;
using System.IO;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;
class Program
{
public static async Task Main()
{
var uri = new Uri("http://httpbin.org/get");
using (var hc = new HttpClient() { Timeout = Timeout.InfiniteTimeSpan })
for (int i = 0; i < 100; i++)
using (var r = await hc.GetStreamAsync(uri))
await r.CopyToAsync(Stream.Null);
}
}uses CancellationTokens internally because HttpClient uses them to implement its CancelPendingRequests functionality. On Windows, before this change, a profile includes the following allocations related to CancellationToken: cc: @geoffkizer |
benaadams
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whole lot of C# clean up in same commit; so a little noisy to follow - but LGTM
| [DebuggerDisplay("IsCancellationRequested = {IsCancellationRequested}")] | ||
| public struct CancellationToken | ||
| { | ||
| private readonly static Action<object> s_actionToActionObjShunt = obj => ((Action)obj)(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One day the compiler will do this 😢
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Though even then we might still want to do this to avoid the laziness / null check.
| return m_source != null && m_source.CanBeCanceled; | ||
| } | ||
| } | ||
| public bool CanBeCanceled => _source != null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Non-cancelable source is now only represented by null?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Do you see a case where that breaks down? As far as I can tell, the only way you could get a non-cancelable source previously was with default(CancellationToken)/new CancellationToken()/CancellationToken.None, and all of those (and I believe only those) will have a null CTS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, was just confirming my understanding of the change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok :) Was worried I missed something.
| ); | ||
| } | ||
| public CancellationTokenRegistration Register(Action<object> callback, object state) => | ||
| Register(callback, state, useSyncContext: false, useExecutionContext: true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No throw on null callback? (Is with throw expression in Register above) - though above may not be needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No throw on null callback?
The public Register methods that take an Action have to do a check because that Action is actually passed as the object state. The public Register methods that take an Action<object> don't have the check and just pass it through to the internal Register, which then does the consolidated check.
| public CancellationTokenRegistration Register(Action callback) => | ||
| Register( | ||
| s_actionToActionObjShunt, | ||
| callback ?? throw new ArgumentNullException(nameof(callback)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is throw expression needed? Is a pass through call and its picked up by final Register method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is throw expression needed?
Yes. It's being passed here as the object state, not as the callback, and object state isn't validated as non-null because it's allowed to be null.
| registeredCallbacksLists = Interlocked.CompareExchange(ref m_registeredCallbacksLists, list, null); | ||
| if (registeredCallbacksLists == null) registeredCallbacksLists = list; | ||
| partitions = new CallbackPartition[s_numPartitions]; | ||
| partitions = Interlocked.CompareExchange(ref _callbackPartitions, partitions, null) ?? partitions; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting pattern; might use this elsewhere 😄
|
This is a good change of approach 👍 |
Thanks
Yeah, sorry, I'd initially had it split out into several commits, but I messed up in a rebase and decided not to spend the time revisiting it all to resplit it apart. |
| procs > 2 ? 4 : | ||
| procs > 1 ? 2 : | ||
| 1; | ||
| Debug.Assert(count > 0 && (count & (count - 1)) == 0, "Got {count}, but expected a power of 2"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
$ attribute on string
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. Fixed.
…okenRegistration.Dispose CancellationToken.Register currently allocates an object per call. This PR makes it allocation-free (amortized). The current implementation was optimized for scalability, using a lock-free algorithm to minimize contention when lots of threads are registering with a token concurrently. Each registration allocates an object which is then stored into a lock-free sparse array data structure. But the level of scalability enabled is unnecessary for the vast majority of CancellationToken usage, adds non-trivial complexity to the implementation, and adds costs to the most common usage scenarios, e.g. a single CancellationToken that registered with and the unregistered with repeatedly. The new implementation strikes a (IMO) better balance between performance and scalability, retaining most of the latter while doing much better for the former in the most interesting use cases. CancellationTokenSource now maintains an array of partition objects, each of which is lazily-initialized, contains a lock that protects that partition, and contains a linked list of registration objects. The striped locking helps maintain most of the scalability and also enables node reuse due to not suffering from ABA problems. (All numbers that follow are 64-bit.) The size of CancellationTokenSource itself doesn't change (64 bytes), so allocating a CancellationTokenSource and then never registering with its token remains the same in both speed and space. The size overhead associated with the first registration on a CancellationTokenSource improves. With the old implementation, the first registration would result in 272 bytes allocated, whereas with the new implementation that shrinks to 218. Each additional registration at the same time in the CancellationToken is now larger than in the old implementation, 80 bytes instead of ~56 bytes (it varies in the old implementation due to the details of the lock-free algorithm and when new segments are created), but these new node allocations are reusable. So if you register 100 registrations with the same CancellationToken and don't unregister any of them, you'll incur ~8000 bytes of allocation instead of ~5600, but if you instead register and unregister them 100 times, you'll only incur ~80 bytes of allocation instead of ~5600, with objects reused for the former case but not the latter case. Speed is also significantly improved. A single thread registering and unregistering repeatedly from a token is now ~2x the old implementation in throughput. Similarly with eight threads on eight logical cores all registering/unregistering concurrently. Finally, the size of CancellationTokenRegistration also shrinks. This struct is often contained in other classes that have registered with the token, and so will end up shrinking the size of those allocations as well.
|
@dotnet-bot test OSX10.12 x64 Checked Build and Test please ("ERROR: Build step failed with exception hudson.remoting.RemotingSystemException: java.util.concurrent.ExecutionException: Invalid object ID 326 iota=327" |
|
@maririos PTAL at the OSX test failure. |
|
Issue https://github.com/dotnet/core-eng/issues/1259 created |


CancellationToken.Register currently allocates an object per call. This PR makes it allocation-free (amortized).
The current implementation was optimized for scalability, using a lock-free algorithm to minimize contention when lots of threads are registering with a token concurrently. Each registration allocates an object which is then stored into a lock-free sparse array data structure. But the level of scalability enabled is unnecessary for the vast majority of CancellationToken usage, adds non-trivial complexity to the implementation, and adds costs to the most common usage scenarios, e.g. a single CancellationToken that registered with and the unregistered with repeatedly.
The new implementation strikes a (IMO) better balance between performance and scalability, retaining most of the latter while doing much better for the former in the most interesting use cases. CancellationTokenSource now maintains an array of partition objects, each of which is lazily-initialized, contains a lock that protects that partition, and contains a linked list of registration objects. The striped locking helps maintain most of the scalability and also enables node reuse due to not suffering from ABA problems.
(All numbers that follow are 64-bit.)
The size of CancellationTokenSource itself doesn't change (64 bytes), so allocating a CancellationTokenSource and then never registering with its token remains the same in both speed and space.
The size overhead associated with the first registration on a CancellationTokenSource improves. With the old implementation, the first registration would result in 272 bytes allocated, whereas with the new implementation that shrinks to 218.
Each additional registration at the same time in the CancellationToken is now larger than in the old implementation, 80 bytes instead of ~56 bytes (it varies in the old implementation due to the details of the lock-free algorithm and when new segments are created), but these new node allocations are reusable. So if you register 100 registrations with the same CancellationToken and don't unregister any of them, you'll incur ~8000 bytes of allocation instead of ~5600, but if you instead register and unregister them 100 times, you'll only incur ~80 bytes of allocation instead of ~5600, with objects reused for the former case but not the latter case.
Speed is also significantly improved. A single thread registering and unregistering repeatedly from a token is now ~2x the old implementation in throughput. Similarly with eight threads on eight logical cores all registering/unregistering concurrently.
Finally, the size of CancellationTokenRegistration also shrinks by a few bytes. This struct is often contained in other classes that have registered with the token, and so will end up shrinking the size of those allocations as well.
cc: @kouvel, @alexperovich, @benaadams