Faster reference-count overflow protection
Benefits for LWN subscribersImproving the security of a system often involves tradeoffs, with the costs measured in terms of convenience and performance, among others. To their frustration, security-oriented developers often discover that the tolerance for these costs is quite low. Defenses against reference-count overflows have run into that sort of barrier, slowing their adoption considerably. Now, though, it would appear that a solution has been found to the performance cost imposed by reference-count hardening, clearing the way toward its adoption throughout the kernel.The primary benefit from subscribing to LWN is helping to keep us publishing, but, beyond that, subscribers get immediate access to all site content and access to a number of extra site features. Please sign up today!
Reference-count overflows typically come about as the result of a programming error. Code that increments the reference count on an object may neglect to decrement it in certain error paths, for example. Such errors can allow an attacker to repeatedly increment a counter until it overflows, at which point the object in question can be made to appear to be unused and freed while it is, in fact, still in use. The resulting use-after-free vulnerability is often exploitable to fully compromise the system.
The path toward protection against reference-count overflows in the kernel has been a long one. It started with code from the PaX/grsecurity patch set, but the initial approach of adding protection to the core atomic_t type ran into opposition and had to be changed. The next step was to introduce a new refcount_t type specifically for reference counts and to add the protections there. This type was merged for the 4.11 development cycle and various kernel subsystems were changed to use it, but refcount_t upset the networking developers, who were unwilling to pay the performance cost associated with it.
The networking layer is often where such patches run into trouble, but it
was not the only place this time around. Andrew Morton recently complained about a refcount_t
conversion in the interprocess communication (IPC) subsystem, for example,
saying that
there was no point in slowing down "simple, safe, old, well-tested
code
". It began to appear that, even if reference-count
protection were added throughout the kernel, it would be disabled by
distributors who feared the performance hit.
One of the core truths of secure-systems development is that disabled (or never implemented) protective measures are remarkably ineffective at stopping attackers. Another one is that "safe, old, well-tested" code may be merely old, as Ingo Molnar pointed out:
Truly protecting the kernel against reference-count overflows requires making the checks as universal as possible. That, in turn, requires either convincing developers to accept the performance cost of those checks or finding a way to reduce that cost to acceptable levels. The latter course is almost certainly the path of least resistance — if a solution to the performance cost can be found.
The current patch set is for the x86 architecture only. Since assembly work is required, each of the other architectures will need to be added individually when somebody gets around to doing it. There do not appear to be significant obstacles to making this technique work on the other major architectures.
There is a cost to this change, relative to the full refcount_t implementation: it no longer detects the "increment from zero" case. If an object's reference count drops to zero, that object will normally be freed; a subsequent increment operation suggests that a reference still existed and the freed object may still be in use. This, obviously, would be a good situation to catch, but nobody has found a way to do so without adding to the expense of increment operations. Cook claimed in the patch set that the overflow case that the new refcount_t does catch is the most common, though, and cited two exploits published in 2016 (CVE-2014-2851 and CVE-2016-0728) that would have been blocked had that checking been in place.
There are still some developers who remain unenthusiastic about the
refcount_t type; see this
complaint from Eric Biederman (and Cook's
response) for example. The remaining disagreements seemed to be based
on a couple of arguments: (1) refcount_t doesn't fix all
reference-count-related problems, and (2) using it implies a
presumption of bugginess that some developers find hurtful to their pride.
But, with the performance issue seemingly solved, those other complaints
seem unlikely to block the implementation of reference-count hardening in
most of the kernel. That can only be good news for those who are concerned
about security.
| Index entries for this article | |
|---|---|
| Kernel | Reference counting |
| Kernel | Security/Kernel hardening |
| Security | Linux kernel |
