Improve performance of CancellationToken.Register / CancellationTokenRegistration.Dispose #12819

stephentoub · 2017-07-14T21:25:31Z

CancellationToken.Register currently allocates an object per call. This PR makes it allocation-free (amortized).

The current implementation was optimized for scalability, using a lock-free algorithm to minimize contention when lots of threads are registering with a token concurrently. Each registration allocates an object which is then stored into a lock-free sparse array data structure. But the level of scalability enabled is unnecessary for the vast majority of CancellationToken usage, adds non-trivial complexity to the implementation, and adds costs to the most common usage scenarios, e.g. a single CancellationToken that is registered with and then unregistered from repeatedly.

The new implementation strikes a (IMO) better balance between performance and scalability, retaining most of the latter while doing much better for the former in the most interesting use cases. CancellationTokenSource now maintains an array of partition objects, each of which is lazily-initialized, contains a lock that protects that partition, and contains a linked list of registration objects. The striped locking helps maintain most of the scalability and also enables node reuse due to not suffering from ABA problems.
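To make the shape of that data structure concrete, here is a minimal sketch of striped locking with node reuse. It is not the code from this PR: the names `StripedRegistry`, `Partition`, `Node`, `Registration`, and `NodesAllocated` are hypothetical, choosing the partition by managed thread ID is an assumption, and the per-node `Id` check for stale registrations is an illustrative choice. Because each partition's list and free list are only touched under that partition's lock, an unlinked node can be handed out again without ABA concerns.

```csharp
using System;
using System.Threading;

// Hypothetical sketch, not the PR's implementation.
public sealed class StripedRegistry
{
    internal sealed class Node
    {
        public Partition Owner;
        public Action<object> Callback;
        public object State;
        public long Id;            // 0 means "not currently registered"
        public Node Prev, Next;
    }

    internal sealed class Partition
    {
        public readonly object Lock = new object();
        public Node Head;          // doubly linked list of live registrations
        public Node FreeList;      // singly linked (via Next) list of reusable nodes
        public long NextId = 1;
    }

    public struct Registration
    {
        internal readonly Node Node;
        internal readonly long Id;
        internal Registration(Node node, long id) { Node = node; Id = id; }
    }

    private readonly Partition[] _partitions;   // entries are created lazily
    private int _nodesAllocated;

    public StripedRegistry(int partitionCount) { _partitions = new Partition[partitionCount]; }

    public int NodesAllocated => Volatile.Read(ref _nodesAllocated);

    public Registration Register(Action<object> callback, object state)
    {
        if (callback == null) throw new ArgumentNullException(nameof(callback));

        // Assumption: spread threads over partitions by managed thread ID.
        int index = Environment.CurrentManagedThreadId % _partitions.Length;
        Partition p = _partitions[index];
        if (p == null)
        {
            p = new Partition();
            p = Interlocked.CompareExchange(ref _partitions[index], p, null) ?? p;
        }

        lock (p.Lock)
        {
            Node node = p.FreeList;
            if (node != null)
            {
                p.FreeList = node.Next;              // reuse a previously unlinked node
            }
            else
            {
                node = new Node { Owner = p };       // allocate only when nothing is free
                Interlocked.Increment(ref _nodesAllocated);
            }

            node.Callback = callback;
            node.State = state;
            node.Id = p.NextId++;
            node.Prev = null;
            node.Next = p.Head;
            if (p.Head != null) p.Head.Prev = node;
            p.Head = node;
            return new Registration(node, node.Id);
        }
    }

    public bool Unregister(Registration registration)
    {
        Node node = registration.Node;
        if (node == null) return false;

        Partition p = node.Owner;
        lock (p.Lock)
        {
            // A mismatched Id means the node was already unregistered or recycled.
            if (node.Id != registration.Id) return false;

            if (node.Prev != null) node.Prev.Next = node.Next; else p.Head = node.Next;
            if (node.Next != null) node.Next.Prev = node.Prev;

            node.Id = 0;
            node.Callback = null;
            node.State = null;
            node.Prev = null;
            node.Next = p.FreeList;                  // push onto the free list for reuse
            p.FreeList = node;
            return true;
        }
    }
}
```

With this sketch, a single thread that registers and unregisters repeatedly allocates one node in total, which is the amortized allocation-free behavior the description refers to.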

(All numbers that follow are 64-bit.)

The size of CancellationTokenSource itself doesn't change (64 bytes), so allocating a CancellationTokenSource and then never registering with its token remains the same in both speed and space.

The size overhead associated with the first registration on a CancellationTokenSource improves. With the old implementation, the first registration would result in 272 bytes allocated, whereas with the new implementation that shrinks to 218.

Each additional registration held at the same time on the same CancellationToken is now larger than in the old implementation, 80 bytes instead of ~56 bytes (it varies in the old implementation due to the details of the lock-free algorithm and when new segments are created), but these new node allocations are reusable. So if you register 100 registrations with the same CancellationToken and don't unregister any of them, you'll incur ~8000 bytes of allocation instead of ~5600, but if you instead register and unregister 100 times, you'll only incur ~80 bytes of allocation instead of ~5600, because the new implementation reuses the node object whereas the old implementation allocates a new object each time.
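One way to observe the difference between the two usage patterns is to measure allocated bytes around a loop. The following is a rough sketch, not the benchmark used for the numbers above: the class and method names are hypothetical, and it assumes a runtime that provides `GC.GetAllocatedBytesForCurrentThread`. The results depend on the runtime version, so no expected figures are given here.

```csharp
using System;
using System.Threading;

// Hypothetical probe for comparing "register and hold" with "register then unregister".
public static class RegistrationAllocationProbe
{
    // Cached delegate so the loop itself does not allocate a delegate per iteration.
    private static readonly Action<object> s_noop = _ => { };

    public static long Measure(int iterations, bool disposeEachTime)
    {
        using (var cts = new CancellationTokenSource())
        {
            CancellationToken token = cts.Token;

            // Warm-up: pay the one-time cost of the first registration up front.
            token.Register(s_noop, null).Dispose();

            long before = GC.GetAllocatedBytesForCurrentThread();
            for (int i = 0; i < iterations; i++)
            {
                CancellationTokenRegistration registration = token.Register(s_noop, null);
                if (disposeEachTime)
                {
                    registration.Dispose();   // unregister before the next registration
                }
            }
            return GC.GetAllocatedBytesForCurrentThread() - before;
        }
    }
}
```

Calling `Measure(100, disposeEachTime: false)` and `Measure(100, disposeEachTime: true)` on builds before and after this change should show the trade-off described above.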

Speed is also significantly improved. A single thread registering and unregistering repeatedly from a token is now ~2x the old implementation in throughput. Similarly with eight threads on eight logical cores all registering/unregistering concurrently.

Finally, the size of CancellationTokenRegistration also shrinks by a few bytes. This struct is often contained in other classes that have registered with the token, and so will end up shrinking the size of those allocations as well.

cc: @kouvel, @alexperovich, @benaadams

stephentoub · 2017-07-15T13:36:42Z

Just as an example of this in a higher-level test, this code:

using System;
using System.IO;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    public static async Task Main()
    {
        var uri = new Uri("http://httpbin.org/get");
        using (var hc = new HttpClient() { Timeout = Timeout.InfiniteTimeSpan })
            for (int i = 0; i < 100; i++)
                using (var r = await hc.GetStreamAsync(uri))
                    await r.CopyToAsync(Stream.Null);
    }
}

uses CancellationTokens internally because HttpClient uses them to implement its CancelPendingRequests functionality. On Windows, before this change, a profile includes the following allocations related to CancellationToken:

After the change, a profile includes these:

cc: @geoffkizer

benaadams

Whole lot of C# clean up in same commit; so a little noisy to follow - but LGTM

benaadams · 2017-07-18T20:32:37Z

src/mscorlib/src/System/Threading/CancellationToken.cs

    [DebuggerDisplay("IsCancellationRequested = {IsCancellationRequested}")]
    public struct CancellationToken
    {
+        private readonly static Action<object> s_actionToActionObjShunt = obj => ((Action)obj)();


One day the compiler will do this 😢

Yes. Though even then we might still want to do this to avoid the laziness / null check.
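For readers unfamiliar with the shunt in the diff above, here is a small sketch of the same idea written against the public API. The `ActionShuntExample` class and its `RegisterAction` helper are hypothetical. The `Action` travels as the state object and a single cached `Action<object>` casts it back and invokes it, so no closure or extra delegate is created per registration, and a static readonly field avoids the lazy null check that compiler-generated delegate caching performs.

```csharp
using System;
using System.Threading;

public static class ActionShuntExample
{
    // Created once at type initialization; the state object is the user's Action.
    private static readonly Action<object> s_shunt = obj => ((Action)obj)();

    // Hypothetical helper that registers an Action through the Action<object> overload.
    public static CancellationTokenRegistration RegisterAction(CancellationToken token, Action callback)
    {
        // The callback is passed as state, and state may legitimately be null,
        // so it has to be validated here.
        if (callback == null) throw new ArgumentNullException(nameof(callback));
        return token.Register(s_shunt, callback);
    }
}
```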

benaadams · 2017-07-18T20:33:32Z

src/mscorlib/src/System/Threading/CancellationToken.cs

-                return m_source != null && m_source.CanBeCanceled;
-            }
-        }
+        public bool CanBeCanceled => _source != null;


Non-cancelable source is now only represented by null?

Yes. Do you see a case where that breaks down? As far as I can tell, the only way you could get a non-cancelable source previously was with default(CancellationToken)/new CancellationToken()/CancellationToken.None, and all of those (and I believe only those) will have a null CTS.

No, was just confirming my understanding of the change.

Ok :) Was worried I missed something.
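The observable behavior discussed in this thread can be checked with a few lines. This is an illustrative sketch with a hypothetical `CanBeCanceledExample` class. It covers only the three spellings of the default token mentioned above, plus a token obtained from a CancellationTokenSource.

```csharp
using System;
using System.Threading;

public static class CanBeCanceledExample
{
    // Returns CanBeCanceled for the three default spellings, then for a CTS-backed token.
    public static bool[] Check()
    {
        using (var cts = new CancellationTokenSource())
        {
            return new[]
            {
                default(CancellationToken).CanBeCanceled, // false: no source
                new CancellationToken().CanBeCanceled,    // false: no source
                CancellationToken.None.CanBeCanceled,     // false: no source
                cts.Token.CanBeCanceled                   // true: backed by a source
            };
        }
    }
}
```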

benaadams · 2017-07-18T20:37:49Z

src/mscorlib/src/System/Threading/CancellationToken.cs

-             );
-        }
+        public CancellationTokenRegistration Register(Action<object> callback, object state) =>
+            Register(callback, state, useSyncContext: false, useExecutionContext: true);


No throw on null callback? (Is with throw expression in Register above) - though above may not be needed.

No throw on null callback?

The public Register methods that take an Action have to do a check because that Action is actually passed as the object state. The public Register methods that take an Action<object> don't have the check and just pass it through to the internal Register, which then does the consolidated check.

benaadams · 2017-07-18T20:40:35Z

src/mscorlib/src/System/Threading/CancellationToken.cs

+        public CancellationTokenRegistration Register(Action callback) =>
+            Register(
+                s_actionToActionObjShunt,
+                callback ?? throw new ArgumentNullException(nameof(callback)),


Is throw expression needed? Is a pass through call and it's picked up by final Register method?

Is throw expression needed?

Yes. It's being passed here as the object state, not as the callback, and object state isn't validated as non-null because it's allowed to be null.
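The validation behavior described in the last two threads can be seen from the public surface. The sketch below uses a hypothetical `NullCallbackExample` class: a null callback is rejected by both overload families, while a null state is accepted.

```csharp
using System;
using System.Threading;

public static class NullCallbackExample
{
    private static bool Throws(Action register)
    {
        try { register(); return false; }
        catch (ArgumentNullException) { return true; }
    }

    // Returns { Action overload threw, Action<object> overload threw, null state threw }.
    public static bool[] Run()
    {
        using (var cts = new CancellationTokenSource())
        {
            CancellationToken token = cts.Token;
            bool nullAction = Throws(() => token.Register((Action)null));
            bool nullActionOfObject = Throws(() => token.Register((Action<object>)null, null));
            bool nullState = Throws(() => token.Register(state => { }, null));
            return new[] { nullAction, nullActionOfObject, nullState };
        }
    }
}
```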

benaadams · 2017-07-18T20:52:33Z

src/mscorlib/src/System/Threading/CancellationTokenSource.cs

-                    registeredCallbacksLists = Interlocked.CompareExchange(ref m_registeredCallbacksLists, list, null);
-                    if (registeredCallbacksLists == null) registeredCallbacksLists = list;
+                    partitions = new CallbackPartition[s_numPartitions];
+                    partitions = Interlocked.CompareExchange(ref _callbackPartitions, partitions, null) ?? partitions;


Interesting pattern; might use this elsewhere 😄
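The pattern in the added lines works because `Interlocked.CompareExchange` returns the value that was in the location before the call. If that value is null, this thread's candidate was published and `??` keeps it. Otherwise another thread won the race and its instance is used instead. A standalone sketch, with a hypothetical `LazyBuffer` type:

```csharp
using System;
using System.Threading;

public sealed class LazyBuffer
{
    private int[] _items;

    public int[] GetOrCreate()
    {
        int[] items = Volatile.Read(ref _items);
        if (items == null)
        {
            items = new int[16];
            // Publish our array only if the field is still null; otherwise adopt the winner's.
            items = Interlocked.CompareExchange(ref _items, items, null) ?? items;
        }
        return items;
    }
}
```

The losing thread's array simply becomes garbage, which is usually an acceptable cost for avoiding a lock on the initialization path.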

benaadams · 2017-07-18T20:59:49Z

This is a good change of approach 👍

stephentoub · 2017-07-18T21:12:07Z

This is a good change of approach

Thanks

Whole lot of C# clean up in same commit; so a little noisy to follow

Yeah, sorry, I'd initially had it split out into several commits, but I messed up in a rebase and decided not to spend the time revisiting it all to resplit it apart.

kouvel · 2017-07-19T10:15:18Z

src/mscorlib/src/System/Threading/CancellationTokenSource.cs

+                procs > 2 ? 4 :
+                procs > 1 ? 2 :
+                1;
+            Debug.Assert(count > 0 && (count & (count - 1)) == 0, "Got {count}, but expected a power of 2");


$ attribute on string

Good catch. Fixed.
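For context on this comment: without the `$` prefix the braces are literal text, so the assert message would read "Got {count}" rather than the actual value. The sketch below, with a hypothetical `PartitionCountExample` class, shows the same power-of-two test used in the assert alongside both forms of the message.

```csharp
using System;

public static class PartitionCountExample
{
    // A positive power of two has exactly one bit set, so clearing the lowest set bit yields zero.
    public static bool IsPowerOfTwo(int count) => count > 0 && (count & (count - 1)) == 0;

    // Interpolated string: {count} is replaced with the value.
    public static string Message(int count) => $"Got {count}, but expected a power of 2";

    // Regular string literal: {count} stays as literal text.
    public static string MessageWithoutDollar() => "Got {count}, but expected a power of 2";
}
```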

stephentoub · 2017-07-19T15:34:55Z

@dotnet-bot test OSX10.12 x64 Checked Build and Test please ("ERROR: Build step failed with exception hudson.remoting.RemotingSystemException: java.util.concurrent.ExecutionException: Invalid object ID 326 iota=327")
@dotnet/dnceng, FYI

dleeapho · 2017-07-19T18:19:56Z

@maririos PTAL at the OSX test failure.

maririos · 2017-07-19T20:54:10Z

Issue https://github.com/dotnet/core-eng/issues/1259 created

dnfclas added the cla-already-signed label Jul 14, 2017

stephentoub changed the title ~~Improve the performance of CancellationToken.Register / CancellationTokenRegistration.Dispose~~ Improve performance of CancellationToken.Register / CancellationTokenRegistration.Dispose Jul 17, 2017

benaadams approved these changes Jul 18, 2017

kouvel approved these changes Jul 19, 2017

stephentoub force-pushed the cts_perf branch from c9b03dd to a4094d2 Compare July 19, 2017 13:54

stephentoub merged commit 8a163ca into dotnet:master Jul 19, 2017

stephentoub deleted the cts_perf branch July 19, 2017 17:24

joperezr mentioned this pull request Jul 20, 2017

Update CoreClr, CoreFx, CoreSetup, ProjectNTfs, ProjectNTfsTestILC, Standard to preview2-25520-01, preview2-25520-92, preview2-25519-02, beta-25520-01, beta-25520-01, preview1-25520-01, respectively (master) dotnet/corefx#22459

Merged

karelz modified the milestone: 2.1.0 Aug 28, 2017

stephentoub added the netfx-port-consider label Oct 17, 2017

stephentoub mentioned this pull request Feb 11, 2018

Enable SocketsHttpHandler cancellation support dotnet/corefx#27029

Merged

stephentoub mentioned this pull request Nov 26, 2019

Fix CancellationTokenRegistration.Unregister race condition #27949

Merged

fraser-lowndes mentioned this pull request Aug 30, 2022

Fix S2930 FN: Track System.Threading.CancellationTokenSource SonarSource/sonar-dotnet#6050

Closed
