bpo-31773: _PyTime_GetPerfCounter() uses _PyTime_t #3983
Conversation
* Rewrite win_perf_counter() to only use integers internally.
* Add _PyTime_MulDiv() which computes "ticks * SEC_TO_NS / perf_freq" in two parts (seconds and nanoseconds) to prevent integer overflow.
* Check the clock frequency at initialization for integer overflow.
* Also enhance pymonotonic() to reduce the precision loss on macOS (mach_absolute_time() clock).

I found a solution to prevent integer overflow with a new _PyTime_MulDiv() which computes the result in two parts. This PR should now avoid any precision loss in _PyTime_GetPerfCounter() on Windows (QueryPerformanceCounter) and Linux (clock_gettime). Only the final conversion from _PyTime_t (int) to double in time.perf_counter() can lose precision, but only after 104 days.
Add another sanity check: timebase.denom < _PyTime_t
methane
left a comment
I don't know much about mach_absolute_time().
But the code looks good and the API looks far better than the current one.
    #if defined(MS_WINDOWS) || defined(__APPLE__)
    Py_LOCAL_INLINE(_PyTime_t)
    _PyTime_MulDiv(_PyTime_t ticks, _PyTime_t mul, _PyTime_t div)
_PyTime_t is a time duration in nanoseconds. Neither of the arguments is a _PyTime_t. I would use long long.
I prefer to use the _PyTime_t type for the 3 parameters so that I don't have to worry about casting or handling integer overflow.
Using the _PyTime_t type doesn't mean that the value must be a duration. _PyTime_t is just a signed integer ;-)
Okay, I see that _PyTime_t is already used not only for time in nanoseconds.
Python/pytime.c
Outdated
    Check also that timebase.numer and timebase.denom can be casted to
    _PyTime_t. */
    if (timebase.numer > _PyTime_MAX / timebase.denom
        || timebase.denom > _PyTime_MAX) {
timebase.denom > _PyTime_MAX is redundant. If timebase.denom > _PyTime_MAX then _PyTime_MAX / timebase.denom == 0.
While it's correct, I prefer to keep the test to be very explicit. The test is only run once, it doesn't matter.
Maybe then add the explicit test timebase.numer > _PyTime_MAX?
The additional timebase.denom > _PyTime_MAX doesn't add clarity for me. It rather makes me puzzle and spend a minute figuring out why this test is added explicitly.
Maybe then add the explicit test timebase.numer > _PyTime_MAX?
Done
Python/pytime.c
Outdated
    compute the number of seconds ("integer part"), and then the remaining
    number of nanoseconds ("floating part").
    The caller has to check that ticks * mul, with ticks < div, cannot
We can support an even larger mul if we represent the fraction mul / div as int + mul2 / div, where int = round(mul/div). But it seems there is no need to do this now.
It may be useful tomorrow if we need to support other weird clocks, but I confirm that I don't see the need for that right now.
CPU frequencies seem to be stuck at 4 GHz for physical reasons, and the maximum mul accepted by win_perf_counter() is (2^63 - 1) // 10^9, which is larger than 9 GHz. Moreover, recent Windows versions now use a fixed frequency of 10^7 Hz (10 MHz): a resolution of 100 ns. QueryPerformanceFrequency() doesn't seem to be related to the CPU frequency anymore.
ncoghlan
left a comment
I like this version. Some readability suggestions inline for the revised integer arithmetic, but I don't consider them blockers.
Python/pytime.c
Outdated
    compute the number of seconds ("integer part"), and then the remaining
    number of nanoseconds ("floating part").
    The caller has to check that ticks * mul, with ticks < div, cannot
This constraint would be more clearly expressed as:
The caller must ensure that "div * mul" cannot overflow, as this ensures that neither 'ticks / div * mul' (the seconds calculation) nor '(ticks % div) * mul / div' (the nanosecond calculation) will overflow.
I rephrased (and simplified) this comment.
Python/pytime.c
Outdated
    ticks %= div;
    nsec = ticks * mul;
    nsec /= div;
    return sec * mul + nsec;
I'm wondering if it might be clearer to spell out the full calculations, and then rely on the compiler being smart enough to figure out whether or not there are any subcalculations it can re-use:
    sec = ticks / div * mul;          /* Avoid overflow in the integer part */
    nsec = (ticks % div) * mul / div; /* Avoid loss of precision in the fractional part */
    return sec + nsec;
If you write it out that way, then the suggested comment above could be simplified to:
The caller must ensure that "div * mul" cannot overflow, as this ensures that neither the seconds calculation nor the nanoseconds calculation will overflow.
I prefer to leave the code as it is. I prefer to write explicit code and don't rely on the computer to be smart.
I just hope that some smart compilers are able to compile "ticks / div" and "ticks % div" as a single CPU instruction ;-) (At least, Intel AMD64 is able to do that.)
Nick's shorter variant looks clearer to me too.
But the name sec looks misleading. This isn't a time in seconds, and if it were, adding seconds and nanoseconds would be wrong.
That's a good point about the names: sec_part and nsec_part would be more accurate (since they're both in units of nanoseconds, it's just that the first one has a zero nanosecond component, while the latter has a zero seconds component).
Regarding "what are compilers and CPUs good at optimising?", the thing you really, really, really want to avoid is pipeline stalls as they wait to touch the same piece of memory multiple times without enough other work in between. Setting out the calculations as distinct operations makes it clear to the compiler that the two calculations are entirely independent of each other, and the only time they need to be synchronised is in the final addition.
So that's why I favour the approach of making the algorithmic intent as clear as possible, for the benefit of both human readers and compilers, and in this case, that means making the two expressions independent of each other rather than requiring the reader to track the local variable mutation and reverse engineer the original expressions from that.
Hand-optimisation might make sense if a profiler indicated this was a hotspot, but even then, I'd be surprised if switching from "two independent subexpressions" to "touch the same local variable multiple times" actually sped things up.
I renamed variables to intpart and remaining since I'm trying to not hardcode the _PyTime_t unit in pytime.c. The API is "unit agnostic".
I also exchanged the operands of the overflow check on mach_timebase_info() to keep the explicit "timebase.denom > _PyTime_MAX" check, even if it's technically redundant.
serhiy-storchaka
left a comment
Besides a few style nitpicks, in general the PR LGTM.
sec/nsec = intpart/remaining
Hum, it's not convenient to work on this PR since the modified code is only compiled on macOS and Windows: I have to use 3 computers to work on it. I fixed the last warnings on macOS and Windows. I tested time.time() manually and ran test_time on macOS and Windows. Everything seems fine (to me). I'm now waiting for Travis CI and AppVeyor before merging this PR.
https://bugs.python.org/issue31773