
Conversation

@vstinner
Member

@vstinner vstinner commented Oct 13, 2017

  • Rewrite win_perf_counter() to only use integers internally.
  • Add _PyTime_MulDiv() which computes "ticks * SEC_TO_NS / perf_freq"
    in two parts (seconds and nanoseconds) to prevent integer overflow.
  • The clock frequency is checked at initialization for integer overflow.
  • Also enhance pymonotonic() to reduce the precision loss on macOS
    (mach_absolute_time() clock).

https://bugs.python.org/issue31773

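The two-part trick can be sketched in Python (a model of the idea only; the hypothetical muldiv() below mirrors the new C helper _PyTime_MulDiv() in Python/pytime.c):

```python
def muldiv(ticks, mul, div):
    """Compute ticks * mul / div in two parts to limit overflow.

    ticks is split as intpart * div + remaining, so the large
    intermediate product remaining * mul stays strictly below
    div * mul: the caller only has to check that div * mul fits.
    """
    intpart, remaining = divmod(ticks, div)
    return intpart * mul + remaining * mul // div

SEC_TO_NS = 10**9
ticks, freq = 12_345_678_901_234_567, 3_579_545  # made-up counter reading and frequency
assert muldiv(ticks, SEC_TO_NS, freq) == ticks * SEC_TO_NS // freq
```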
@vstinner vstinner requested a review from a team October 13, 2017 09:30
@serhiy-storchaka serhiy-storchaka removed the request for review from a team October 13, 2017 09:36
@vstinner
Member Author

I found a solution to prevent integer overflow with a new _PyTime_MulDiv() by computing the result in two parts.

This PR now avoids any precision loss in _PyTime_GetPerfCounter() on Windows (QueryPerformanceCounter) and Linux (clock_gettime).

Only the final conversion from _PyTime_t (int) to double in time.perf_counter() can lose precision, but only after 104 days:

# nanoseconds (_PyTime_t) => seconds (double)
# _PyTime_GetPerfCounter => time.perf_counter()
>>> x=2**52+1;int(float(x*1e-9)*1e9)-x   # no precision loss
0
>>> x=2**53+1;int(float(x*1e-9)*1e9)-x   # precision loss! (1 nanosecond)
-1

>>> import datetime; datetime.timedelta(seconds=2**53 / 1e9)
datetime.timedelta(104, 21599, 254741)
>>> print(datetime.timedelta(seconds=2**53 / 1e9))
104 days, 5:59:59.254741
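The 2**53 threshold comes from the 53-bit significand of an IEEE 754 double: above it, consecutive integers are no longer representable, which is what the session above demonstrates:

```python
import sys

assert sys.float_info.mant_dig == 53      # a C double carries 53 significand bits
assert float(2**52 + 1) != float(2**52)   # below 2**53: integers still exact
assert float(2**53 + 1) == float(2**53)   # 2**53 + 1 rounds back to 2**53
```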

Add another sanity check: timebase.denom <= _PyTime_MAX
Member

@methane methane left a comment

I don't know much about mach_absolute_time(), but the code looks good and the API looks far better than the current one.


#if defined(MS_WINDOWS) || defined(__APPLE__)
Py_LOCAL_INLINE(_PyTime_t)
_PyTime_MulDiv(_PyTime_t ticks, _PyTime_t mul, _PyTime_t div)
Member

_PyTime_t is a time duration in nanoseconds. Neither of the arguments is a _PyTime_t. I would use long long.

Member Author

I prefer to use the _PyTime_t type for the 3 parameters so I don't have to worry about casting or handling integer overflow.

Using the _PyTime_t type doesn't mean that the value must be a duration. _PyTime_t is just a signed integer ;-)

Member

Okay, I see that _PyTime_t is already used not only for time in nanoseconds.

Python/pytime.c Outdated
Check also that timebase.numer and timebase.denom can be casted to
_PyTime_t. */
if (timebase.numer > _PyTime_MAX / timebase.denom
|| timebase.denom > _PyTime_MAX) {
Member

timebase.denom > _PyTime_MAX is redundant. If timebase.denom > _PyTime_MAX then _PyTime_MAX / timebase.denom == 0.
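The claim is easy to check with Python integers standing in for the C arithmetic (2**63 - 1 is the assumed value of _PyTime_MAX for a 64-bit _PyTime_t):

```python
PyTime_MAX = 2**63 - 1       # assumed _PyTime_MAX
denom = PyTime_MAX + 1       # a denominator that is itself out of range

# Integer division by an out-of-range denominator yields 0 ...
assert PyTime_MAX // denom == 0
# ... so "numer > _PyTime_MAX / denom" already fires for any numer >= 1,
# making the separate "denom > _PyTime_MAX" test redundant.
assert 1 > PyTime_MAX // denom
```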

Member Author

While that's correct, I prefer to keep the test to be very explicit. The test is only run once, so it doesn't matter.

Member

Maybe then add the explicit test timebase.numer > _PyTime_MAX?

The additional timebase.denom > _PyTime_MAX doesn't add clarity for me. It rather makes me puzzle and spend a minute figuring out why this test was added explicitly.

Member Author

Maybe than add the explicit test timebase.numer > _PyTime_MAX?

Done

Python/pytime.c Outdated
compute the number of seconds ("integer part"), and then the remaining
number of nanoseconds ("floating part").
The caller has to check that ticks * mul, with ticks < div, cannot
Member

We could support an even larger mul if we represent the fraction mul / div as int + mul2 / div, where int = round(mul / div). But it seems there is no need to do this now.

Member Author

It may be useful tomorrow if we need to support other weird clocks, but I confirm that I don't see the need for that right now.

CPU frequencies seem to be stuck at 4 GHz for physical reasons, and the maximum mul accepted by win_perf_counter() is (2**63 - 1) // 10**9, which is larger than 9 GHz. Moreover, recent Windows versions now use a fixed frequency of 10**7 Hz (10 MHz): a resolution of 100 ns. QueryPerformanceFrequency() doesn't seem to be related to the CPU frequency anymore.
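Worked out with Python integers (assuming a signed 64-bit _PyTime_t), the bound is about 9.2 GHz:

```python
PyTime_MAX = 2**63 - 1   # assumed _PyTime_MAX
SEC_TO_NS = 10**9

max_freq = PyTime_MAX // SEC_TO_NS   # largest clock frequency the check accepts
assert max_freq == 9_223_372_036     # ~9.2 GHz, above any current CPU clock
assert 10**7 <= max_freq             # the fixed 10 MHz Windows frequency fits easily
```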

Contributor

@ncoghlan ncoghlan left a comment

I like this version. Some readability suggestions inline for the revised integer arithmetic, but I don't consider them blockers.

Python/pytime.c Outdated
compute the number of seconds ("integer part"), and then the remaining
number of nanoseconds ("floating part").
The caller has to check that ticks * mul, with ticks < div, cannot
Contributor

This constraint would be more clearly expressed as:

The caller must ensure that "div * mul" cannot overflow, as this ensures that neither 'ticks / div * mul' (the seconds calculation) nor '(ticks % div) * mul / div' (the nanosecond calculation) will overflow.

Member Author

I rephrased (and simplified) this comment.

Python/pytime.c Outdated
ticks %= div;
nsec = ticks * mul;
nsec /= div;
return sec * mul + nsec;
Contributor

I'm wondering if it might be clearer to spell out the full calculations, and then rely on the compiler being smart enough to figure out whether or not there are any subcalculations it can re-use:

sec = ticks / div * mul; /* Avoid overflow in the integer part */
nsec = (ticks % div) * mul / div; /* Avoid loss of precision in the fractional part */
return sec + nsec;

If you write it out that way, then the suggested comment above could be simplified to:

The caller must ensure that "div * mul" cannot overflow, as this ensures that neither the seconds calculation nor the nanoseconds calculation will overflow.
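Both spellings compute the same value; a quick Python check (with // standing in for C integer division on non-negative operands):

```python
def muldiv_mutating(ticks, mul, div):
    # The PR's version: reuses the ticks local after taking the quotient
    sec = ticks // div
    ticks %= div
    nsec = ticks * mul // div
    return sec * mul + nsec

def muldiv_independent(ticks, mul, div):
    # The suggested version: two independent subexpressions
    sec = ticks // div * mul
    nsec = (ticks % div) * mul // div
    return sec + nsec

for ticks in (0, 1, 999, 10**15 + 424_242):
    assert muldiv_mutating(ticks, 10**9, 3_579_545) == \
           muldiv_independent(ticks, 10**9, 3_579_545)
```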

Member Author

I prefer to leave the code as it is: I prefer to write explicit code and not rely on the compiler being smart.

I just hope that some smart compilers are able to compile "ticks / div" and "ticks % div" into a single CPU instruction ;-) (At least, x86-64 is able to do that.)

Member

Nick's shorter variant looks clearer to me too.

But the name sec looks misleading. This isn't a time in seconds, and if it were, adding seconds and nanoseconds would be wrong.

Contributor

That's a good point about the names: sec_part and nsec_part would be more accurate (since they're both in units of nanoseconds, it's just that the first one has a zero nanosecond component, while the latter has a zero seconds component).

Regarding "what are compilers and CPUs good at optimising?", the thing you really, really, really want to avoid is pipeline stalls as they wait to touch the same piece of memory multiple times without enough other work in between. Setting out the calculations as distinct operations makes it clear to the compiler that the two calculations are entirely independent of each other, and the only time they need to be synchronised is in the final addition.

So that's why I favour the approach of making the algorithmic intent as clear as possible, for the benefit of both human readers and compilers, and in this case, that means making the two expressions independent of each other rather than requiring the reader to track the local variable mutation and reverse engineer the original expressions from that.

Hand-optimisation might make sense if a profiler indicated this was a hotspot, but even then, I'd be surprised if switching from "two independent subexpressions" to "touch the same local variable multiple times" actually sped things up.

Member Author

I renamed variables to intpart and remaining since I'm trying to not hardcode the _PyTime_t unit in pytime.c. The API is "unit agnostic".

Exchange also the overflow checks on mach_timebase_info() to keep the
explicit "timebase.denom > _PyTime_MAX" check, even if it's
technically redundant.
Member

@serhiy-storchaka serhiy-storchaka left a comment

Besides a few style nitpicks, in general the PR LGTM.



@vstinner
Member Author

Hmm, it's not convenient to work on this PR since the modified code is only compiled on macOS and Windows. I have to use 3 computers to work on it.

I fixed the last warnings on macOS and Windows. I tested time.time() manually and ran test_time on macOS and Windows. Everything seems fine (to me). I'm now waiting for Travis CI and AppVeyor before merging this PR.

@vstinner vstinner merged commit bdaeb7d into python:master Oct 16, 2017
@vstinner vstinner deleted the perf_counter_int branch October 16, 2017 15:44