
Conversation

@vstinner (Member) commented Oct 10, 2017:

  • Rewrite _PyTime_GetWinPerfCounter() for _PyTime_t
  • Cleanup pytime.c

https://bugs.python.org/issue31415

@vstinner (Member Author) commented:

The first commit of this PR ("Rewrite _PyTime_GetWinPerfCounter() for _PyTime_t") modifies _PyTime_GetWinPerfCounter() to get the clock value as a _PyTime_t directly. The tricky part is avoiding integer overflow, since the function computes value * SEC_TO_NS / perf_frequency, where SEC_TO_NS = 1_000_000_000.

In my Windows 8.1 VM, perf_frequency = 10_000_000, so the ratio reduces to 100/1. In the end, _PyTime_GetWinPerfCounter() just has to multiply the clock value by 100, and there is no risk of precision loss from dividing by an awkward denominator.

While modifying Windows time.perf_counter(), I also modified macOS time.monotonic(), which also uses a numerator/denominator ratio. That change is just for consistency: on macOS Sierra the ratio is 1/1, so computing the GCD is useless there, but since the GCD is only computed once it doesn't matter.
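
To make the idea concrete, here is a minimal sketch of the approach (my own illustration; the helper names gcd(), reduce_fraction() and ticks_to_ns() are hypothetical, not the actual pytime.c code): reduce SEC_TO_NS / frequency once at initialization using the GCD, then apply the reduced fraction on every clock read.

#include <stdint.h>

typedef int64_t _PyTime_t;
#define SEC_TO_NS ((_PyTime_t)1000000000)

/* Euclid's algorithm: greatest common divisor of two positive integers. */
static _PyTime_t
gcd(_PyTime_t a, _PyTime_t b)
{
    while (b != 0) {
        _PyTime_t t = b;
        b = a % b;
        a = t;
    }
    return a;
}

/* Done once at init: reduce SEC_TO_NS / frequency to its lowest terms.
   With frequency = 10_000_000 this gives numer = 100 and denom = 1. */
static void
reduce_fraction(_PyTime_t frequency, _PyTime_t *numer, _PyTime_t *denom)
{
    _PyTime_t k = gcd(SEC_TO_NS, frequency);
    *numer = SEC_TO_NS / k;
    *denom = frequency / k;
}

/* Done on every read: convert raw clock ticks to nanoseconds
   (overflow handling omitted in this sketch). */
static _PyTime_t
ticks_to_ns(_PyTime_t ticks, _PyTime_t numer, _PyTime_t denom)
{
    return ticks * numer / denom;
}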

I finished with a code cleanup to make the code conform to PEP 7 (adding { ... } braces), since I like "my" tiny pytime.c file :-)

@ncoghlan (Contributor) left a comment:

This doesn't look wrong to me, but I'd never really followed how the old version worked, so I'm not sure that actually counts for much :)

@vstinner (Member Author) commented:

@ncoghlan: This doesn't look wrong to me, but I'd never really followed how the old version worked, so I'm not sure that actually counts for much :)

Using pseudo-code, this big PR replaces:

double diff = QueryPerformanceCounter();  // floating-point number
return diff / (double)QueryPerformanceFrequency();   // seconds

with

_PyTime_t numer, denom;
magic_fraction(SEC_TO_NS, QueryPerformanceFrequency(), &numer, &denom);
...
_PyTime_t diff = QueryPerformanceCounter();  // integer
return (diff * numer) / denom;   // nanoseconds

@ncoghlan (Contributor) left a comment:

Thanks for the explanation - I agree this patch does indeed achieve that goal.

One comment inline regarding a function name that you may want to change to improve readability, but I don't consider that a blocker for merging.

Python/pytime.c Outdated
Contributor:

It occurs to me that if the low order bits in mul are zero, and the high order bits in div are also zero, then shifting mul right and div left could allow even more computations to avoid overflow.

I'd be surprised if that was worth the extra complexity given the GCD calculations further down, though.

Member Author:

Using GCD, there is no need to count the number of zero low order bits.

I'd be surprised if that was worth the extra complexity given the GCD calculations further down, though.

On Windows, the ratio is SEC_TO_NS / perf_frequency where SEC_TO_NS = 1_000_000_000. On my Windows 8.1 VM, perf_frequency = 10_000_000. In the end, you get numer=100 and denom=1.

If you only use binary shifts, you lose the common divisor 5.
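
To make the difference concrete, a small worked example (my own arithmetic, not code from the PR): with the 10 MHz frequency above, stripping only the common low-order zero bits leaves a factor of 5^7 in both terms, while the GCD removes it entirely.

#include <stdio.h>

int main(void)
{
    /* Shift-only reduction: drop the 7 common trailing zero bits.
       The common factor 5^7 = 78125 is still there. */
    printf("%d / %d\n", 1000000000 >> 7, 10000000 >> 7);   /* 7812500 / 78125 */

    /* GCD reduction: gcd(1000000000, 10000000) = 10000000,
       so the fraction collapses to the bare multiplier. */
    printf("%d / %d\n", 1000000000 / 10000000, 1);          /* 100 / 1 */
    return 0;
}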

Python/pytime.c Outdated
Contributor:

I almost missed that this initialised t0 in addition to perf_frequency. Since it initialises all the statics for this function rather than just perf_frequency, perhaps _init_win_perf_counter would be a better name?

Member Author:

I renamed the function to init_win_perf_counter(). There is no need for a "_" prefix, since the function is already marked as private with the "static" keyword.

@serhiy-storchaka self-requested a review October 11, 2017 10:22
@vstinner (Member Author) commented:

I wasn't comfortable with checking for integer overflow and raising an exception on every clock read. So I replaced the check at each clock read with a check done only once, when a clock is initialized. The maximum multiplier should now be lower than 291.

On macOS, the multiplier is always 1, so it's OK.

On my Windows 8.1 VM, the multiplier is 100 (so it's OK on my VM). I don't know about older Windows versions; we will see during the beta stage :-)

If the check fails on some machines, we will have to find a different strategy, like using larger (128-bit) integers to prevent any risk of integer overflow.
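
A minimal sketch of that kind of init-time check (the function name check_multiplier(), its signature and the max_counter bound are my assumptions, not the actual patch): verify once that applying the multiplier cannot overflow a signed 64-bit nanosecond timestamp, and report failure so a fallback can be chosen.

#include <stdint.h>

typedef int64_t _PyTime_t;

/* Hypothetical sketch: called once when the clock is initialized.
   Returns 0 if 'counter * numer' cannot overflow for any counter value
   up to 'max_counter', or -1 if a different strategy is needed. */
static int
check_multiplier(_PyTime_t numer, _PyTime_t max_counter)
{
    if (numer <= 0 || max_counter <= 0) {
        return -1;
    }
    if (numer > INT64_MAX / max_counter) {
        /* a later clock read could overflow 64-bit nanoseconds */
        return -1;
    }
    return 0;
}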

@serhiy-storchaka (Member) left a comment:

What is the purpose of this change? Why is using integer arithmetic better than using floats?

The large number of added braces is distracting and makes reviewing harder. I'm not sure that I haven't skipped some meaningful changes.

Python/pytime.c Outdated
Member:

It seems to me that _PyTime_GetWinPerfCounterWithInfo() can now return -1 without setting an error.

Member Author:

Yeah, it's on purpose.

_PyTime_Init() makes sure that the time.time(), time.monotonic() and time.perf_counter() clocks work. If a function returns -1 without raising an exception, that's OK: _Py_InitializeMainInterpreter() calls Py_FatalError() with a nice error message if _PyTime_Init() fails. If an exception is raised, it is displayed; if no exception is raised, we only have the Py_FatalError() message, which should be enough.

I only used -1 without any exception set for cases that "should never occur in practice" :-)

Member:

Why not use asserts instead?

It would be better to document this behavior explicitly, because it differs from the conventional behavior of the C API.

Member Author:

Why not use asserts instead?

I prefer not to make too many assumptions about clocks. Some platforms are really weird.

When I implemented time.monotonic(), I added a runtime check, but only in debug mode, to really make sure that the clock is monotonic. The test failed on a strange Debian VM. We decided to just drop the assertion... Honestly, I only added the assertion to check whether weird clocks really exist in the wild. So yes, they do exist :-)

Member Author:

It would be better to document this behavior explicitly, because it is different from conventional behavior of C API.

In fact, the same C function is used in 3 different ways:

  • _PyTime_Init(): read each clock once to check if the clock works. The check is required since clock_gettime(CLOCK_MONOTONIC) can return an error on Hurd for example.
  • time.get_clock_info()
  • time.time(), time.perf_counter(), time.monotonic()

Python/pytime.c Outdated
Member:

Why not reuse a and b, as in the common implementation of Euclid's algorithm? This would slightly simplify the code. In any case you need one temporary variable.

Member Author:

OK, I will do that once I rebase this PR.

Python/pytime.c Outdated
Member:

When is this possible?

Member Author:

These are sanity checks, "just in case". In practice, mach_timebase_info() always returns (1, 1).

These checks avoid a possible division by zero.
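
For illustration, a hedged sketch of that kind of sanity check (not the exact pytime.c code; init_timebase() is a hypothetical helper):

#include <mach/mach_time.h>

/* Fail early at initialization instead of dividing by zero later. */
static int
init_timebase(mach_timebase_info_data_t *timebase)
{
    if (mach_timebase_info(timebase) != KERN_SUCCESS) {
        return -1;
    }
    if (timebase->numer == 0 || timebase->denom == 0) {
        /* mach_timebase_info() returns (1, 1) in practice, but a zero
           denominator would otherwise crash the nanosecond conversion */
        return -1;
    }
    return 0;
}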

Python/pytime.c Outdated
Member:

When is this possible?

Member Author:

freq.QuadPart is supposed to be >= 1... but I prefer to be super safe and check everything, "just in case".

The check is cheap and only done once, whereas a crash or bug can be very annoying and complex to debug.

Python/pytime.c Outdated
Member:

Can this overflow?

Member Author:

(now.QuadPart - t0) shouldn't overflow. Both values are supposed to be positive and the clock is supposed to be monotonic. In practice there is no guarantee, the clock might go backward, so it isn't used for time.monotonic(), only for time.perf_counter().

Note: my PR doesn't introduce this code; it has existed since at least Python 2.7:
https://github.com/python/cpython/blob/2.7/Modules/timemodule.c#L184

Member:

I don't mean that the subtraction can overflow. I mean that casting the result from LARGE_INTEGER to _PyTime_t can overflow. In the current code the result is cast to double, which never overflows.

Member Author:

(Stupid GitHub, my rebase dropped the commits, I don't have the context anymore.)

No, in practice, _PyTime_t == LONGLONG, it's the same type. I only added an assertion to be "future proof".

@vstinner (Member Author) commented:

Hum, I'm making more changes than I expected. I created PR #3955. Once the cleanup PR is merged, I will rebase this one on top of it to keep this PR reviewable.

@vstinner (Member Author) commented:

What is the purpose of this change? Why is using integer arithmetic better than using floats?

This PR is a follow-up to my first PR, which was already merged. See my comment:
#3936 (comment)
"The drawback of this change is that time.perf_counter() now converts QueryPerformanceCounter() / QueryPerformanceFrequency() double into a _PyTime_t (integer) and then back to double. Two useless conversions required by the standard _PyTime_t format."

This PR is supposed to reduce the risk of precision loss even further.

@vstinner (Member Author) commented:

I checked test.pythoninfo on Windows 3.x buildbots and AppVeyor. I looked at the resolution of time.perf_counter:

  • AppVeyor: resolution=1e-07
  • x86 Windows7 3.x: resolution=2.793651148400146e-07
  • AMD64 Windows8 3.x: resolution=1e-07
  • AMD64 Windows8.1 Non-Debug 3.x: resolution=1e-07

Full example:

time.perf_counter: namespace(adjustable=False, implementation='QueryPerformanceCounter()', monotonic=True, resolution=1e-07)

resolution=2.793651148400146e-07 can be an issue. It corresponds to QueryPerformanceFrequency() == 3579545, and that fraction cannot be simplified much:

>>> import fractions; fractions.Fraction(10**9, 3579545)
Fraction(200000000, 715909)

@vstinner (Member Author) commented:

Note: 3579545 Hz comes from the Programmable Interval Timer (PIT) chip (also called an 8253/8254 chip).

https://stackoverflow.com/questions/2345599/why-is-my-stopwatch-frequency-so-low

3,579,545 is the magic number. That's the frequency in Hertz before dividing it by 3 and feeding it into the 8053 timer chip in the original IBM PC. The odd looking number wasn't chosen by accident, it is the frequency of the color burst signal in the NTSC TV system used in the US and Japan. The IBM engineers were looking for a cheap crystal to implement the oscillator, nothing was cheaper than the one used in every TV set.

[Commit message]

The _PyTime_GetWinPerfCounterWithInfo() function now only uses
_PyTime_t (integer) internally, rather than using C double to divide
the clock by the performance frequency.

Add _PyTime_GCD() to compute the smallest numerator and denominator
to apply the performance frequency in
_PyTime_GetWinPerfCounterWithInfo().

Also modify the pymonotonic() macOS implementation, which uses
mach_absolute_time() and mach_timebase_info(), to use _PyTime_GCD().

Also add _PyTime_MulDiv() to apply a numerator and denominator to a
timestamp while also checking for integer overflow.

@vstinner (Member Author) commented:

I abandon this PR; my approach doesn't work. It is superseded by PR #3964, which uses C double for time.clock() and time.perf_counter().

While my PR was fun to write and works nicely with QueryPerformanceFrequency() == 10**7 (10_000_000, 10 MHz, 100 ns resolution), it quickly leads to integer overflow in _PyTime_MulDiv() with QueryPerformanceFrequency() == 3,579,545.
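
For a rough sense of scale (my own back-of-the-envelope numbers, not from the PR): with the reduced fraction 200_000_000 / 715_909, the product diff * 200_000_000 overflows a signed 64-bit integer once diff exceeds about 4.6e10 ticks, which at 3,579,545 Hz is only a few hours of counter difference.

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    const int64_t numer = 200000000;   /* 10**9 / gcd(10**9, 3579545) */
    const int64_t freq = 3579545;      /* QueryPerformanceFrequency() */

    /* Largest tick count whose product with numer still fits in int64_t. */
    int64_t max_ticks = INT64_MAX / numer;
    printf("overflow after ~%lld ticks, i.e. ~%lld seconds\n",
           (long long)max_ticks, (long long)(max_ticks / freq));
    /* prints roughly 46116860184 ticks, ~12883 seconds (about 3.6 hours) */
    return 0;
}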

I don't know how to implement 128-bit integer multiplication and division in a portable way, especially on 32-bit architectures. I tried __int128 in Visual Studio but got an error in x64 (64-bit) mode (so I don't expect it to work in 32-bit mode). And I don't want to create temporary "heavy" PyLongObject objects... just to read a clock.

Note: 3,579,545 decomposes into prime factors as 5 * 715,909. The number 715,909 is not really convenient for arithmetic with 64-bit integers (since I would like to store nanoseconds).

@vstinner closed this Oct 12, 2017
@vstinner deleted the pytime_gcd branch October 12, 2017 14:51