Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Conversation

@stephentoub
Copy link
Member

CancellationToken.Register currently allocates an object per call. This PR makes it allocation-free (amortized).

The current implementation was optimized for scalability, using a lock-free algorithm to minimize contention when lots of threads are registering with a token concurrently. Each registration allocates an object which is then stored into a lock-free sparse array data structure. But the level of scalability enabled is unnecessary for the vast majority of CancellationToken usage, adds non-trivial complexity to the implementation, and adds costs to the most common usage scenarios, e.g. a single CancellationToken that registered with and the unregistered with repeatedly.

The new implementation strikes a (IMO) better balance between performance and scalability, retaining most of the latter while doing much better for the former in the most interesting use cases. CancellationTokenSource now maintains an array of partition objects, each of which is lazily-initialized, contains a lock that protects that partition, and contains a linked list of registration objects. The striped locking helps maintain most of the scalability and also enables node reuse due to not suffering from ABA problems.

(All numbers that follow are 64-bit.)

The size of CancellationTokenSource itself doesn't change (64 bytes), so allocating a CancellationTokenSource and then never registering with its token remains the same in both speed and space.

The size overhead associated with the first registration on a CancellationTokenSource improves. With the old implementation, the first registration would result in 272 bytes allocated, whereas with the new implementation that shrinks to 218.

Each additional registration at the same time in the CancellationToken is now larger than in the old implementation, 80 bytes instead of ~56 bytes (it varies in the old implementation due to the details of the lock-free algorithm and when new segments are created), but these new node allocations are reusable. So if you register 100 registrations with the same CancellationToken and don't unregister any of them, you'll incur ~8000 bytes of allocation instead of ~5600, but if you instead register and unregister them 100 times, you'll only incur ~80 bytes of allocation instead of ~5600, with objects reused for the former case but not the latter case.

Speed is also significantly improved. A single thread registering and unregistering repeatedly from a token is now ~2x the old implementation in throughput. Similarly with eight threads on eight logical cores all registering/unregistering concurrently.

Finally, the size of CancellationTokenRegistration also shrinks by a few bytes. This struct is often contained in other classes that have registered with the token, and so will end up shrinking the size of those allocations as well.

cc: @kouvel, @alexperovich, @benaadams

@stephentoub
Copy link
Member Author

stephentoub commented Jul 15, 2017

Just as an example of this in a higher-level test, this code:

using System;
using System.IO;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    public static async Task Main()
    {
        var uri = new Uri("http://httpbin.org/get");
        using (var hc = new HttpClient() { Timeout = Timeout.InfiniteTimeSpan })
            for (int i = 0; i < 100; i++)
                using (var r = await hc.GetStreamAsync(uri))
                    await r.CopyToAsync(Stream.Null);
    }
}

uses CancellationTokens internally because HttpClient uses them to implement its CancelPendingRequests functionality. On Windows, before this change, a profile includes the following allocations related to CancellationToken:
image
After the change, a profile includes these:
image

cc: @geoffkizer

@stephentoub stephentoub changed the title Improve the performance of CancellationToken.Register / CancellationTokenRegistration.Dispose Improve performance of CancellationToken.Register / CancellationTokenRegistration.Dispose Jul 17, 2017
Copy link
Member

@benaadams benaadams left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whole lot of C# clean up in same commit; so a little noisy to follow - but LGTM

[DebuggerDisplay("IsCancellationRequested = {IsCancellationRequested}")]
public struct CancellationToken
{
private readonly static Action<object> s_actionToActionObjShunt = obj => ((Action)obj)();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One day the compiler will do this 😢

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Though even then we might still want to do this to avoid the laziness / null check.

return m_source != null && m_source.CanBeCanceled;
}
}
public bool CanBeCanceled => _source != null;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-cancelable source is now only represented by null?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Do you see a case where that breaks down? As far as I can tell, the only way you could get a non-cancelable source previously was with default(CancellationToken)/new CancellationToken()/CancellationToken.None, and all of those (and I believe only those) will have a null CTS.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, was just confirming my understanding of the change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok :) Was worried I missed something.

);
}
public CancellationTokenRegistration Register(Action<object> callback, object state) =>
Register(callback, state, useSyncContext: false, useExecutionContext: true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No throw on null callback? (Is with throw expression in Register above) - though above may not be needed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No throw on null callback?

The public Register methods that take an Action have to do a check because that Action is actually passed as the object state. The public Register methods that take an Action<object> don't have the check and just pass it through to the internal Register, which then does the consolidated check.

public CancellationTokenRegistration Register(Action callback) =>
Register(
s_actionToActionObjShunt,
callback ?? throw new ArgumentNullException(nameof(callback)),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is throw expression needed? Is a pass through call and its picked up by final Register method?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is throw expression needed?

Yes. It's being passed here as the object state, not as the callback, and object state isn't validated as non-null because it's allowed to be null.

registeredCallbacksLists = Interlocked.CompareExchange(ref m_registeredCallbacksLists, list, null);
if (registeredCallbacksLists == null) registeredCallbacksLists = list;
partitions = new CallbackPartition[s_numPartitions];
partitions = Interlocked.CompareExchange(ref _callbackPartitions, partitions, null) ?? partitions;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting pattern; might use this elsewhere 😄

@benaadams
Copy link
Member

This is a good change of approach 👍

@stephentoub
Copy link
Member Author

This is a good change of approach

Thanks

Whole lot of C# clean up in same commit; so a little noisy to follow

Yeah, sorry, I'd initially had it split out into several commits, but I messed up in a rebase and decided not to spend the time revisiting it all to resplit it apart.

procs > 2 ? 4 :
procs > 1 ? 2 :
1;
Debug.Assert(count > 0 && (count & (count - 1)) == 0, "Got {count}, but expected a power of 2");
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$ attribute on string

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Fixed.

…okenRegistration.Dispose

CancellationToken.Register currently allocates an object per call.  This PR makes it allocation-free (amortized).

The current implementation was optimized for scalability, using a lock-free algorithm to minimize contention when lots of threads are registering with a token concurrently.  Each registration allocates an object which is then stored into a lock-free sparse array data structure.  But the level of scalability enabled is unnecessary for the vast majority of CancellationToken usage, adds non-trivial complexity to the implementation, and adds costs to the most common usage scenarios, e.g. a single CancellationToken that registered with and the unregistered with repeatedly.

The new implementation strikes a (IMO) better balance between performance and scalability, retaining most of the latter while doing much better for the former in the most interesting use cases.  CancellationTokenSource now maintains an array of partition objects, each of which is lazily-initialized, contains a lock that protects that partition, and contains a linked list of registration objects.  The striped locking helps maintain most of the scalability and also enables node reuse due to not suffering from ABA problems.

(All numbers that follow are 64-bit.)

The size of CancellationTokenSource itself doesn't change (64 bytes), so allocating a CancellationTokenSource and then never registering with its token remains the same in both speed and space.

The size overhead associated with the first registration on a CancellationTokenSource improves.  With the old implementation, the first registration would result in 272 bytes allocated, whereas with the new implementation that shrinks to 218.

Each additional registration at the same time in the CancellationToken is now larger than in the old implementation, 80 bytes instead of ~56 bytes (it varies in the old implementation due to the details of the lock-free algorithm and when new segments are created), but these new node allocations are reusable.  So if you register 100 registrations with the same CancellationToken and don't unregister any of them, you'll incur ~8000 bytes of allocation instead of ~5600, but if you instead register and unregister them 100 times, you'll only incur ~80 bytes of allocation instead of ~5600, with objects reused for the former case but not the latter case.

Speed is also significantly improved.  A single thread registering and unregistering repeatedly from a token is now ~2x the old implementation in throughput.  Similarly with eight threads on eight logical cores all registering/unregistering concurrently.

Finally, the size of CancellationTokenRegistration also shrinks.  This struct is often contained in other classes that have registered with the token, and so will end up shrinking the size of those allocations as well.
@stephentoub
Copy link
Member Author

@dotnet-bot test OSX10.12 x64 Checked Build and Test please ("ERROR: Build step failed with exception hudson.remoting.RemotingSystemException: java.util.concurrent.ExecutionException: Invalid object ID 326 iota=327"
@dotnet/dnceng, FYI

@stephentoub stephentoub merged commit 8a163ca into dotnet:master Jul 19, 2017
@stephentoub stephentoub deleted the cts_perf branch July 19, 2017 17:24
@dleeapho
Copy link

@maririos PTAL at the OSX test failure.

@maririos
Copy link
Member

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants