
Faster reference-count overflow protection


By Jonathan Corbet
July 24, 2017
Improving the security of a system often involves tradeoffs, with the costs measured in terms of convenience and performance, among others. To their frustration, security-oriented developers often discover that the tolerance for these costs is quite low. Defenses against reference-count overflows have run into that sort of barrier, slowing their adoption considerably. Now, though, it would appear that a solution has been found to the performance cost imposed by reference-count hardening, clearing the way toward its adoption throughout the kernel.

Reference-count overflows typically come about as the result of a programming error. Code that increments the reference count on an object may neglect to decrement it in certain error paths, for example. Such errors can allow an attacker to repeatedly increment a counter until it overflows, at which point the object in question can be made to appear to be unused and freed while it is, in fact, still in use. The resulting use-after-free vulnerability is often exploitable to fully compromise the system.
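The bug class can be sketched in a few lines of C. This is a hypothetical example (the `struct object` and function names are illustrative, not from any real kernel subsystem): the error path forgets to drop the reference it took, so each failing call leaks one count, and an attacker who can trigger the failure repeatedly can eventually wrap the 32-bit counter.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

/* Hypothetical refcounted object; names are illustrative only. */
struct object { _Atomic int refcount; };

/* Buggy pattern: the error path returns without dropping the
 * reference taken at the top, leaking one count per failing call. */
static int object_do_op(struct object *obj, int will_fail)
{
    atomic_fetch_add(&obj->refcount, 1);   /* take a reference */
    if (will_fail)
        return -1;                         /* BUG: missing decrement */
    /* ... do the actual work ... */
    atomic_fetch_sub(&obj->refcount, 1);   /* drop the reference */
    return 0;
}
```

Driven through the failing path roughly four billion times, the counter wraps past INT_MAX, goes negative, and eventually reaches zero, at which point the still-referenced object can be freed and reused.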

The path toward protection against reference-count overflows in the kernel has been a long one. It started with code from the PaX/grsecurity patch set, but the initial approach of adding protection to the core atomic_t type ran into opposition and had to be changed. The next step was to introduce a new refcount_t type specifically for reference counts and to add the protections there. This type was merged for the 4.11 development cycle and various kernel subsystems were changed to use it, but refcount_t upset the networking developers, who were unwilling to pay the performance cost associated with it.

The networking layer is often where such patches run into trouble, but it was not the only place this time around. Andrew Morton recently complained about a refcount_t conversion in the interprocess communication (IPC) subsystem, for example, saying that there was no point in slowing down "simple, safe, old, well-tested code". It began to appear that, even if reference-count protection were added throughout the kernel, it would be disabled by distributors who feared the performance hit.

One of the core truths of secure-systems development is that disabled (or never implemented) protective measures are remarkably ineffective at stopping attackers. Another one is that "safe, old, well-tested" code may be merely old, as Ingo Molnar pointed out:

It's old, well-tested code _for existing, sane parameters_, until someone finds a decade old bug in one of these with an insane parameters no-one stumbled upon so far, and builds an exploit on top of it.

Truly protecting the kernel against reference-count overflows requires making the checks as universal as possible. That, in turn, requires either convincing developers to accept the performance cost of those checks or finding a way to reduce that cost to acceptable levels. The latter course is almost certainly the path of least resistance — if a solution to the performance cost can be found.

Update: the single-instruction technique described below has been claimed by the PaX Team as his work. The patch set remains Kees's.
With his fast refcount overflow protection patch set, Kees Cook would indeed appear to have found that solution. It works by adding a single instruction to the existing (highly optimized) atomic_t implementation that catches the case where the reference count goes negative (as happens when the counter overflows). The instruction is especially easy for the processor's branch-prediction logic to guess correctly, so it performs well, as demonstrated by microbenchmark results posted with the patch set. The standard atomic_t implementation ran the benchmark in 82.249 billion cycles; the new refcount_t code, instead, took 82.211 billion cycles — exactly the same within the margin of error, in other words. The older refcount_t implementation required 144.8 billion cycles to run the test, for comparison.
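The idea can be approximated in portable C as follows. This is a sketch of the scheme, not the actual kernel code: the increment runs at full speed, and a single, highly predictable sign test afterward catches the wrap past INT_MAX, which leaves a 32-bit counter negative. (In the x86 patch itself, that test is a conditional jump emitted after the existing locked increment, branching to an out-of-line handler.)

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

/* Sketch only: increment, then one sign check.  An overflowed 32-bit
 * counter wraps to a negative value, so "result < 0" is the whole test. */
static int refcount_inc_checked(_Atomic int *r)
{
    int old = atomic_fetch_add(r, 1);             /* wraparound is defined for atomics */
    int newval = (int)((unsigned int)old + 1u);   /* unsigned add avoids signed-overflow UB */
    if (newval < 0) {
        /* the real patch saturates the counter and emits a warning here */
        return -1;
    }
    return 0;
}
```

Because reference counts in correct code never approach INT_MAX, the branch is essentially never taken and the predictor gets it right nearly every time, which is why the check is close to free.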

The current patch set is for the x86 architecture only. Since assembly work is required, each of the other architectures will need to be added individually when somebody gets around to doing it. There do not appear to be significant obstacles to making this technique work on the other major architectures.

There is a cost to this change, relative to the full refcount_t implementation: it no longer detects the "increment from zero" case. If an object's reference count drops to zero, that object will normally be freed; a subsequent increment operation suggests that a reference still existed and the freed object may still be in use. This, obviously, would be a good situation to catch, but nobody has found a way to do so without adding to the expense of increment operations. Cook claimed in the patch set that the overflow case that the new refcount_t does catch is the most common, though, and cited two exploitable vulnerabilities (CVE-2014-2851 and CVE-2016-0728) that would have been blocked had that checking been in place.
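The check being given up looks roughly like this sketch of the full refcount_t behavior (illustrative code, not the kernel's implementation): refusing to move a counter off zero requires reading the old value before the increment, which forces a compare-and-swap loop — exactly the extra cost the fast version avoids.

```c
#include <assert.h>
#include <stdatomic.h>

/* Sketch of increment-from-zero protection: the counter is only
 * incremented if it is still nonzero, which requires a CAS loop
 * instead of a single unconditional atomic add. */
static int refcount_inc_not_zero(_Atomic int *r)
{
    int old = atomic_load(r);
    do {
        if (old == 0)
            return 0;   /* object already freed; taking a reference is a bug */
    } while (!atomic_compare_exchange_weak(r, &old,
                                           (int)((unsigned int)old + 1u)));
    return 1;
}
```

A plain `atomic_fetch_add()` cannot make this refusal, because by the time it returns the old value, the increment has already happened.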

There are still some developers who remain unenthusiastic about the refcount_t type; see this complaint from Eric Biederman (and Cook's response) for example. The remaining disagreements seemed to be based on a couple of arguments: (1) refcount_t doesn't fix all reference-count-related problems, and (2) using it implies a presumption of bugginess that some developers find hurtful to their pride. But, with the performance issue seemingly solved, those other complaints seem unlikely to block the implementation of reference-count hardening in most of the kernel. That can only be good news for those who are concerned about security.

Index entries for this article
Kernel: Reference counting
Kernel: Security/Kernel hardening
Security: Linux kernel



Faster reference-count overflow protection

Posted Jul 24, 2017 23:36 UTC (Mon) by ianmcc (guest, #88379) [Link] (7 responses)

using it implies a presumption of bugginess that some developers find hurtful to their pride

Any developer who thinks that shouldn't be allowed anywhere near any security related code.

Faster reference-count overflow protection

Posted Jul 25, 2017 7:06 UTC (Tue) by billev2k (subscriber, #32054) [Link]

"any security related code", i.e. *any* code.

Faster reference-count overflow protection

Posted Jul 25, 2017 13:58 UTC (Tue) by flussence (guest, #85566) [Link] (4 responses)

There are only really two types of software: slow programs that prefer correctness over speed, and “fast” programs where you learn to react to each disaster they cause by adding more sandboxes, firewalls, service monitors, cleanup tools, etc.

I don't know why any kernel developer would argue for the latter. Do they not run *anything* on top of their kernel besides toy microbenchmarks? Userspace is running the ship aground and there's a million tons of Electron and containers spilling everywhere while they're complaining about the new lighthouse bulb needing 5 watts more than before.

Faster reference-count overflow protection

Posted Jul 25, 2017 14:24 UTC (Tue) by tao (subscriber, #17563) [Link] (2 responses)

The kernel developers themselves aren't the issue when it comes to micro-optimisations. Personally I run my kernel with tons of extra debugging enabled that slows the system down. But for the heavy-hitting users of the kernel (think 498 out of the 500 fastest super computers in the world, or Facebook, or Google) every cycle matters. Sad but true.

For such users, ensuring that security fixes, no matter how important they may be, don't introduce performance regressions is paramount.

Faster reference-count overflow protection

Posted Jul 25, 2017 16:56 UTC (Tue) by tlamp (subscriber, #108540) [Link]

Make the security enhancement opt out through kconfig then and we're done.

Those who really need every cycle can disable it then and sane distros can ship their kernel with them, maybe ship even two versions.

Faster reference-count overflow protection

Posted Aug 3, 2017 9:41 UTC (Thu) by Wol (subscriber, #4433) [Link]

Yep. If you introduce a security fix on a path that nearly every single kernel call goes through, it's going to hurt no matter how much you try to sugar the pill. And these things have a habit of hurting in ways you didn't expect - like say the tiny increase in size bumping your code out of L1 cache and triggering cache misses all over the place ...

Tight, correct, code is great. It's also generally very expensive.

Cheers,
Wol

Faster reference-count overflow protection

Posted Aug 3, 2017 18:36 UTC (Thu) by tuna (guest, #44480) [Link]

Maybe you have pretty hard timing and latency targets you want to reach, like 8.3 ms frame time with 3 frames total latency. Then you have to write code that does not do unnecessary operations.

Faster reference-count overflow protection

Posted Jul 25, 2017 16:31 UTC (Tue) by quotemstr (subscriber, #45331) [Link]

Yet here we are. We have to deal with human nature as it is, not as we'd like it to be.

Faster reference-count overflow protection

Posted Jul 25, 2017 1:55 UTC (Tue) by OrbatuThyanD (guest, #114326) [Link] (18 responses)

kees continues to amaze.

Faster reference-count overflow protection

Posted Jul 25, 2017 2:02 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link] (1 responses)

To be fair, the first time I've seen this approach was in a message from spender.

Faster reference-count overflow protection

Posted Jul 25, 2017 2:23 UTC (Tue) by OrbatuThyanD (guest, #114326) [Link]

i'm impressed by spender too (despite him currently being on the outs with the linux community .. ?), but spender's never reviewed any of my code, so I feel more of a kinship with kees.

Faster reference-count overflow protection

Posted Jul 25, 2017 8:56 UTC (Tue) by PaXTeam (guest, #24616) [Link] (15 responses)

hey, thanks for the compliments, the fast refcount code is also mine after all, not his (despite what this another not-paid-for LWN piece would imply ;).

Faster reference-count overflow protection

Posted Jul 25, 2017 14:18 UTC (Tue) by nix (subscriber, #2304) [Link] (4 responses)

Oh, come on. The article said "It started with code from the PaX/grsecurity patch set", Kees specifically thanked you... what more do you want? Active abasement from everyone and admission that they are not fit to touch your code at all? Because that's not how development works *anywhere*.

Faster reference-count overflow protection

Posted Jul 25, 2017 17:09 UTC (Tue) by PaXTeam (guest, #24616) [Link] (2 responses)

can't speak for you but i'm fine with the truth ;). now you tell me who you think authored this fast 'single instruction overhead' refcount mechanism when you read "With his fast refcount overflow protection patch set, Kees Cook would indeed appear to have found that solution."

Faster reference-count overflow protection

Posted Jul 25, 2017 21:47 UTC (Tue) by renox (guest, #23785) [Link]

Well you can speak for me: given the lack of precision of the source of the improvement in the article, I thought that kees was the creator of the improvement.

And this is not the first time that I'm disturbed by this kind of imprecision in lwn..

Faster reference-count overflow protection

Posted Jul 26, 2017 13:37 UTC (Wed) by mjthayer (guest, #39183) [Link]

I was not wanting to get drawn into this, but did you try contacting Jon privately before posting, asking him to change the article to better reflect your part in this? It might be worth trying, even if you are not very hopeful. Perhaps it can even still be done for this article at this point in time.

Faster reference-count overflow protection

Posted Jul 26, 2017 12:21 UTC (Wed) by flussence (guest, #85566) [Link]

That's, somewhat ironically, the ultimate outcome of KSPP: pretty soon nobody will have a reason to touch his code at all. We'll all be getting reverse-engineered, properly reviewed, protection-racket-free versions and his name can live on forever, at the bottom of a git commit message.

Credits and payments

Posted Jul 25, 2017 22:40 UTC (Tue) by corbet (editor, #1) [Link] (9 responses)

So, PaX...is $OTHER_PUBLICATION paying you nicely for repeatedly trashing our writing? :)

The patch set is described as coming from Kees Cook, based on work in PaX/grsecurity. That is objectively true. More to the point, that patch set contains a great deal of work to separate out the changes, make them acceptable to mainline, document them, run benchmarks, etc. All part of the normal work required to get a change upstream. You have, as is your right, declined to do that work. But when somebody else does it for you, the result is their work as much as yours.

I write pretty routinely about patch sets here. They often contain work from multiple people, often in ways that are hard to tell. If I were to credit everybody who somehow influenced a patch set, the articles would be long indeed. The current patch set originated in your work — which was noted in the article — but also contains contributions from Kees, Josh Poimboeuf, Ingo Molnar, Li Kun, and probably others, most of whom were not credited in the article. But only you, who were actually credited, chose to publicly question my integrity (again).

The point is coming where I'm just not going to write about this work anymore, it's simply not worth the shitstorm that results every time. There is a lot of other kernel work going on — work that your patch sets depend on — that I can write about without wishing I'd chosen a different line of work.

Credits and payments

Posted Jul 25, 2017 22:55 UTC (Tue) by andresfreund (subscriber, #69562) [Link]

> The point is coming where I'm just not going to write about this work anymore, it's simply not worth the shitstorm that results every time. There is a lot of other kernel work going on — work that your patch sets depend on — that I can write about without wishing I'd chosen a different line of work.

That'd be sad - at least I appreciate them. And from my POV it seems that minimizing mainline security work is pretty much PaXTeam's goal. Reducing positive-ish coverage of such efforts seems like part of that.

If necessary I'd rather see you disable comments on these articles, or just you putting PaXTeam on ignore.

the higher road

Posted Jul 25, 2017 23:16 UTC (Tue) by sfeam (subscriber, #2841) [Link]

Don't let the bastards wear you down. The LWN kernel coverage benefits your subscribers and the larger community of readers. Publication of incessant sniping by PaXTeam does not. The better choice is to continue the former and block the latter. There is no shame in choosing to killfile sources of predictable aggravation. I know there is always a temptation to peek at messages in the spam bin, but resist it.

Credits and payments

Posted Jul 26, 2017 0:25 UTC (Wed) by PaXTeam (guest, #24616) [Link] (2 responses)

> "With his fast refcount overflow protection patch set, Kees Cook would indeed appear to have found that solution."

who do you think that attributes my very own ideas and work to if not Kees Cook? apparently i'm not the only person who noticed your deliberate and intentional attempt at misattribution and diminishing my work i invested 16+ years into. the same thing happened recently with the 'write rarely' article, remember? and when i point out all this, you play the drama queen and even defend your actions instead of admitting, never mind correcting, your error?

> You have, as is your right, declined to do that work.

this lie just can't die, can it? i didn't "decline to do that work" as there was no offer to decline (and it can't be done in my free time, noone in the KSPP does it that way either). why do you keep spreading these lies coming from companies that chose competition over cooperation with us?

> But when somebody else does it for you, the result is their work as much as yours.

first, copy-pasting code doesn't establish copyright (Intel tried with spender's code and didn't get away with it), second, i wish you attributed the work proportional to the effort invested into it (if you think it's 'work' to upstream our code, imagine how much more work it was to create and maintain it).

> The point is coming where I'm just not going to write about this work anymore, it's simply not worth the shitstorm that results every time.

sadly, this would be good riddance at this point. your agenda to diminish the amount of effort and effect of our work we invested in over the past 16 years shows in every single KSPP/grsec/PaX related article and if you're unable to write objective (and let's not get started about being technically correct) articles about these topics, it's probably best to leave it for those who can.

Credits and payments

Posted Jul 26, 2017 12:35 UTC (Wed) by mathstuf (subscriber, #69389) [Link] (1 responses)

> why do you keep spreading these lies coming from companies that chose competition over cooperation with us?

Why do you choose to compete with mainline instead of cooperation with them?

> second, i wish you attributed the work proportional to the effort invested into it

If maintenance is such a burden and upstreaming so easy, why is upstreaming not a net time savings from your side?

> if you're unable to write objective (and let's not get started about being technically correct) articles about these topics, it's probably best to leave it for those who can.

So do you want to write articles covering KSPP and how everything they do is wrong and how PaX does it better? There's a link on the left about how to write for LWN yourself.

Personally, I find it useful to know how things are changing in the kernel I actually use rather than some perfect, unattainable, pristine kernel you have now locked behind a paywall.

Credits and payments

Posted Jul 26, 2017 13:56 UTC (Wed) by PaXTeam (guest, #24616) [Link]

> Why do you choose to compete with mainline instead of cooperation with them?

you got that backwards, i didn't make that choice, the companies behind the KSPP did. let's also not forget upstream's lack of interest and often outright hostility towards anything security related that existed for a long time and it's debatable if that attitude has changed all that much. even with the KSPP around it feels like several kernel developers are dragging their feet and gnashing their teeth whenever something security related gets submitted.

> If maintenance is such a burden and upstreaming so easy, why is upstreaming not a net time savings from your side?

because it's a strawman. first, (for me) maintenance is much easier compared to R&D, second, upstreaming would be a waste of my free time that i'd rather spend on new development instead (never mind the family and other things people have in life). case in point, i extended my REFCOUNT feature to the refcount_t API in something like a day maybe whereas upstream's been working on it for what, over half a year now (not to mention the year beforehand on the original attempt that went to waste completely)? why do you think i'd have enough free time to do this when everyone else has to get paid in order to be able to do it?

> So do you want to write articles covering KSPP and how everything they do is wrong and how PaX does it better?

not just 'want' but 'did': https://grsecurity.net/grsecurity_4_11_updates.php or https://grsecurity.net/an_ancient_kernel_hole_is_not_clos... and there'll be more in the future. if you have suggestions for topics you'd like us to discuss in the future, you know where to contact us.

Credits and payments

Posted Jul 26, 2017 7:50 UTC (Wed) by patrick_g (subscriber, #44470) [Link]

> The point is coming where I'm just not going to write about this work anymore

Please don't.
I'm sure I'm not alone to think that your kernel articles are precious and much needed to try to understand what's going on in the kernel world.

Credits and payments

Posted Jul 26, 2017 8:19 UTC (Wed) by epa (subscriber, #39769) [Link] (1 responses)

I also understood the article as saying that the original idea came from PaX/grsecurity, but that "The next step was to introduce a new refcount_t type" was by other kernel developers.

Credits and payments

Posted Jul 26, 2017 9:12 UTC (Wed) by PaXTeam (guest, #24616) [Link]

note that i did not create the refcount_t abstraction myself (for various reasons, can elaborate if anyone cares, but i guess few do at this point ;) and it was the right move (at least conceptually speaking). you can check the kernel-hardening list about my contributions to that discussion where i explained how the actual refcount_t proposal was wrong on both the design (it doesn't expose/deal with references but low level plumbing instead) and implementation level (it's just a wrapper around atomic_t). given my experience with the latter (the PaX REFCOUNT feature works on atomic_t and variants) i also explained that the proposed implementation was a code size and performance disaster. did anyone heed my words? of course not as NIH raised its ugly head. predictably, my words proved to be true: first the code size problem forced the implementation to move out-of-line even before Linus would accept the code. this of course only exacerbated the already bad performance problem (the function call overhead in addition to the completely unnecessary open-coded cmpxchg loops) so the developers were forced to go back to my code once again. this is where the PaX REFCOUNT implementation comes into play which has always been optimized for both code size and performance (beyond its security goal of course, see the original version's description at https://forums.grsecurity.net/viewtopic.php?f=7&t=417...). it is this latter attribution (never mind technically correct explanation of what's done for what reason) that is lacking in these REFCOUNT articles you find on LWN.

Credits and payments

Posted Jul 27, 2017 17:52 UTC (Thu) by diederich (subscriber, #26007) [Link]

This has been a difficult situation for some time, and I appreciate you weathering it so far.

I think what makes it so hard is a combination of PaX being generally, needlessly abrasive and that he is usually correct in the most technical, specific sense.

> The point is coming where I'm just not going to write about this work anymore, it's simply not worth the shitstorm that results every time.

This kind of material draws me to LWN more than any other, so I hope you do continue.

Here is a suggestion: write up a short, straightforward poster code of conduct (if one doesn't already exist), and announce it one week. That code would specifically ban users who repeatedly post in an overtly negative, disruptive fashion.

Then, after a few warnings, ban anybody who continues. You will likely be accused of applying a double standard. In the end, though, this will become a more positive, useful place.

I have found myself visiting some parts of LWN less frequently because of the post behaviour of some. I'm no delicate flower; far from it. The older I get, the more I realize that dealing with such distractions is just not worth it.

And for the record: I think that the work done by the PaX/grsecurity is and has been smart and innovative. Further, I'd love to hear from those folks, here on LWN and elsewhere. Just with a less negative tone.

Faster reference-count overflow protection

Posted Jul 25, 2017 15:12 UTC (Tue) by jreiser (subscriber, #11027) [Link] (2 responses)

Related: 1) If the empty set of references is represented by a count of -2, then the "sign trick" also detects "increment from the empty set". Of course there is ample opportunity to confuse "empty set of references" with "reference counter is 0".

2) Runaway counts (bugs) can be detected sooner by scaling the counter. Count 127, 126, or 64 at a time. Those scale factors fit into a one-byte literal operand on x86*. A scale factor divisible by a larger power of 2 eases detection of stale code that counts by 1.

3) On x86*, the INC and DEC instructions do not write the Carry bit of the processor status word; the previous Carry is preserved. This creates a dataflow dependency from the most-recent instruction which does write the Carry bit, and in theory resolving the dependency can require multiple machine cycles. Perhaps in this case the latency is likely to be small (no recent long dependency chains that set Carry), or masked by the multiple-tens-of-cycles that are required by a LOCKed memory reference. An explicit operand value such as "lock add $1,counter" avoids the dataflow dependency because ADD writes all the status bits. It takes one more byte of instruction space, and instruction decode might take one more cycle if the chip implementation allows only one literal operand and/or address displacement to be decoded at a time.

[I comment at LWN because: 1) I'm not subscribed to LKML, and I don't know how to Reply with a reference to the correct thread. 2) It's probably not that important.]

Faster reference-count overflow protection

Posted Jul 26, 2017 1:08 UTC (Wed) by PaXTeam (guest, #24616) [Link]

1. unfortunately the detection of use-after-free based on the 0 refcount is a pointless exercise as the first thing an exploit writer will make sure is that the underlying memory of the just freed object gets reused and thus the 0 refcount value disappears (remember, this is supposed to be a security feature meant to prevent the exploitation of a certain bug class).

2. the problem with scaling is that it'd effectively put a lower overflow detection limit on the refcount and would thus further restrict the usable number of references (going from 4G to 2G made already some people worry as there're real life use cases which can get close to those limits and would force a move to atomic64_t or a not-yet-existing refcount64_t type).

3. note that inc/dec are simply 'inherited' from the atomic_t type and its accessors used by the refcount_t API, which despite its name, isn't about references but merely a restricted wrapper over atomic_t.

Faster reference-count overflow protection

Posted Aug 11, 2017 9:32 UTC (Fri) by jzbiciak (guest, #5246) [Link]

I have to say, "wow": I was unaware of the partial flag update weirdness on x86. I had presumed Intel's processors were sophisticated enough to track the flag bits independently; however, Agner Fog's enormous doc backs you up on this point.

Basically, on some processors, a partial flag update incurs an additional μop for a read-modify-write operation, and more generally the partial update still has ordering constraints, likely in order to easily provide a consistent view of the FLAGS in case an interrupt or exception occurs.

I noticed while fiddling around at godbolt.org that GCC avoids ADD instructions if it can get the same effect with LEA, assuming it doesn't need the flags. This makes sense, as LEA sets no flags.

Faster reference-count overflow protection

Posted Jul 26, 2017 9:26 UTC (Wed) by mm7323 (subscriber, #87386) [Link] (2 responses)

I see the patch is using inline asm. Wouldn't it be easier and cleaner to use the GCC builtins for this purpose? The compiler will try to use hardware where supported:

"The compiler will attempt to use hardware instructions to implement these built-in functions where possible, like conditional jump on overflow after addition, conditional jump on carry etc. "

Or does the kernel not trust the compiler?

Faster reference-count overflow protection

Posted Jul 26, 2017 13:10 UTC (Wed) by thestinger (guest, #91827) [Link] (1 responses)

GCC doesn't have those builtins for atomics.

Faster reference-count overflow protection

Posted Jul 26, 2017 20:57 UTC (Wed) by mm7323 (subscriber, #87386) [Link]

Ah, yes. Thanks!

Faster reference-count overflow protection

Posted Jul 29, 2017 12:20 UTC (Sat) by eru (subscriber, #2753) [Link]

Wouldn't it be simpler to just use a 64-bit refcount, at least on 64-bit machines? A back-of-the-envelope calculation suggests that with realistic worst-case incrementing rates, it would require centuries to overflow, an eternity for all practical purposes.


Copyright © 2017, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds