
Conversation

@vstinner (Member) commented Oct 10, 2017:

  • Rewrite _PyTime_GetWinPerfCounter() for _PyTime_t
  • Cleanup pytime.c

https://bugs.python.org/issue31415

@vstinner (Member Author) commented:

The first commit of this PR ("Rewrite _PyTime_GetWinPerfCounter() for _PyTime_t") modifies _PyTime_GetWinPerfCounter() to get the clock value as a _PyTime_t directly. The tricky part is avoiding integer overflow, since the function computes value * SEC_TO_NS / perf_frequency, where SEC_TO_NS = 1_000_000_000.

In my Windows 8.1 VM, perf_frequency = 10_000_000, so the ratio reduces to 100/1. In the end, _PyTime_GetWinPerfCounter() just has to multiply the clock value by 100, and there is no risk of precision loss from dividing by an awkward denominator.

While modifying Windows time.perf_counter(), I also modified macOS time.monotonic(), which also uses a numerator/denominator ratio. That change is just for consistency: on macOS Sierra the ratio is 1/1, so computing the GCD is useless there, but since the GCD is only computed once it doesn't matter.
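
To make the idea concrete, here is a minimal sketch of the approach (my own illustration; the helper names gcd(), reduce_fraction() and ticks_to_ns() are hypothetical, not the actual pytime.c code): reduce SEC_TO_NS / frequency once at initialization using the GCD, then apply the reduced fraction on every clock read.

#include <stdint.h>

typedef int64_t _PyTime_t;
#define SEC_TO_NS ((_PyTime_t)1000000000)

/* Euclid's algorithm: greatest common divisor of two positive integers. */
static _PyTime_t
gcd(_PyTime_t a, _PyTime_t b)
{
    while (b != 0) {
        _PyTime_t t = b;
        b = a % b;
        a = t;
    }
    return a;
}

/* Done once at init: reduce SEC_TO_NS / frequency to its lowest terms.
   With frequency = 10_000_000 this gives numer = 100 and denom = 1. */
static void
reduce_fraction(_PyTime_t frequency, _PyTime_t *numer, _PyTime_t *denom)
{
    _PyTime_t k = gcd(SEC_TO_NS, frequency);
    *numer = SEC_TO_NS / k;
    *denom = frequency / k;
}

/* Done on every read: convert raw clock ticks to nanoseconds
   (overflow handling omitted in this sketch). */
static _PyTime_t
ticks_to_ns(_PyTime_t ticks, _PyTime_t numer, _PyTime_t denom)
{
    return ticks * numer / denom;
}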

I finished with a code cleanup to make the code conform to PEP 7 (adding { ... } braces), since I like "my" tiny pytime.c file :-)

@ncoghlan (Contributor) left a comment:

This doesn't look wrong to me, but I'd never really followed how the old version worked, so I'm not sure that actually counts for much :)

@vstinner (Member Author) commented:

@ncoghlan: This doesn't look wrong to me, but I'd never really followed how the old version worked, so I'm not sure that actually counts for much :)

Using pseudo-code, this big PR replaces:

double diff = QueryPerformanceCounter();  // floating-point number
return diff / (double)QueryPerformanceFrequency();   // seconds

with

_PyTime_t numer, denom;
magic_fraction(SEC_TO_NS, QueryPerformanceFrequency(), &numer, &denom);
...
_PyTime_t diff = QueryPerformanceCounter();  // integer
return (diff * numer) / denom;   // nanoseconds

@ncoghlan (Contributor) left a comment:

Thanks for the explanation - I agree this patch does indeed achieve that goal.

One comment inline regarding a function name that you may want to change to improve readability, but I don't consider that a blocker for merging.

Python/pytime.c Outdated
Contributor:

It occurs to me that if the low order bits in mul are zero, and the high order bits in div are also zero, then shifting mul right and div left could allow even more computations to avoid overflow.

I'd be surprised if that was worth the extra complexity given the GCD calculations further down, though.

Member Author:

Using GCD, there is no need to count the number of zero low order bits.

I'd be surprised if that was worth the extra complexity given the GCD calculations further down, though.

On Windows, the ratio is SEC_TO_NS / perf_frequency where SEC_TO_NS = 1_000_000_000. On my Windows 8.1 VM, perf_frequency = 10_000_000. In the end, you get numer=100 and denom=1.

If you only use binary shifts, you lose the common divisor 5.
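
To make the difference concrete, a small worked example (my own arithmetic, not code from the PR): with the 10 MHz frequency above, stripping only the common low-order zero bits leaves a factor of 5^7 in both terms, while the GCD removes it entirely.

#include <stdio.h>

int main(void)
{
    /* Shift-only reduction: drop the 7 common trailing zero bits.
       The common factor 5^7 = 78125 is still there. */
    printf("%d / %d\n", 1000000000 >> 7, 10000000 >> 7);   /* 7812500 / 78125 */

    /* GCD reduction: gcd(1000000000, 10000000) = 10000000,
       so the fraction collapses to the bare multiplier. */
    printf("%d / %d\n", 1000000000 / 10000000, 1);          /* 100 / 1 */
    return 0;
}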

Python/pytime.c Outdated
Contributor:

I almost missed that this initialised t0 in addition to perf_frequency. Since it initialises all the statics for this function rather than just perf_frequency, perhaps _init_win_perf_counter would be a better name?

Member Author:

I renamed the function to init_win_perf_counter(). There is no need for a "_" prefix, since the function is already marked as private with the "static" keyword.

@serhiy-storchaka self-requested a review October 11, 2017 10:22
@vstinner (Member Author) commented:

I wasn't comfortable with checking for integer overflow and raising an exception on every clock read. So I replaced the check at each clock read with a check done only once, when a clock is initialized. The maximum multiplier should now be lower than 291.

On macOS, the multiplier is always 1, so it's OK.

On my Windows 8.1 VM, the multiplier is 100 (so it's OK on my VM). I don't know about older Windows versions; we will see during the beta stage :-)

If the check fails on some machines, we will have to find a different strategy, like using larger (128-bit) integers to prevent any risk of integer overflow.
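
A minimal sketch of that kind of init-time check (the function name check_multiplier(), its signature and the max_counter bound are my assumptions, not the actual patch): verify once that applying the multiplier cannot overflow a signed 64-bit nanosecond timestamp, and report failure so a fallback can be chosen.

#include <stdint.h>

typedef int64_t _PyTime_t;

/* Hypothetical sketch: called once when the clock is initialized.
   Returns 0 if 'counter * numer' cannot overflow for any counter value
   up to 'max_counter', or -1 if a different strategy is needed. */
static int
check_multiplier(_PyTime_t numer, _PyTime_t max_counter)
{
    if (numer <= 0 || max_counter <= 0) {
        return -1;
    }
    if (numer > INT64_MAX / max_counter) {
        /* a later clock read could overflow 64-bit nanoseconds */
        return -1;
    }
    return 0;
}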

@serhiy-storchaka (Member) left a comment:

What is the purpose of this change? Why is using integer arithmetic better than using floats?

The large number of added braces is distracting and makes reviewing harder. I'm not sure that I haven't skipped some meaningful changes.

Python/pytime.c Outdated
Member:

It seems to me that _PyTime_GetWinPerfCounterWithInfo() can now return -1 without setting an error.

Member Author:

Yeah, it's on purpose.

_PyTime_Init() makes sure that the time.time(), time.monotonic() and time.perf_counter() clocks work. If a function returns -1 without raising an exception, that's OK: _Py_InitializeMainInterpreter() calls Py_FatalError() with a nice error message if _PyTime_Init() fails. If an exception is raised, it is displayed; if no exception is raised, we only have the Py_FatalError() message, which should be enough.

I only used -1 without any exception set for cases that "should never occur in practice" :-)

Member:

Why not use asserts instead?

It would be better to document this behavior explicitly, because it differs from the conventional behavior of the C API.

Member Author:

Why not use asserts instead?

I prefer not to make too many assumptions about clocks. Some platforms are really weird.

When I implemented time.monotonic(), I added a runtime check, but only in debug mode, to really make sure that the clock is monotonic. The test failed on a strange Debian VM. We decided to just drop the assertion... Honestly, I only added the assertion to check whether weird clocks really exist in the wild. So yes, they do exist :-)

Member Author:

It would be better to document this behavior explicitly, because it is different from conventional behavior of C API.

In fact, the same C function is used in 3 different ways:

  • _PyTime_Init(): read each clock once to check if the clock works. The check is required since clock_gettime(CLOCK_MONOTONIC) can return an error on Hurd for example.
  • time.get_clock_info()
  • time.time(), time.perf_counter(), time.monotonic()

Python/pytime.c Outdated
Member:

Why not reuse a and b, as in the common implementation of Euclid's algorithm? This would slightly simplify the code. In any case you need one temporary variable.

Member Author:

OK, I will do that once I rebase this PR.

Python/pytime.c Outdated
Member:

When is this possible?

Member Author:

These are sanity checks, "just in case". In practice, mach_timebase_info() always returns (1, 1).

These checks avoid a possible division by zero.
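
For illustration, a hedged sketch of that kind of sanity check (not the exact pytime.c code; init_timebase() is a hypothetical helper):

#include <mach/mach_time.h>

/* Fail early at initialization instead of dividing by zero later. */
static int
init_timebase(mach_timebase_info_data_t *timebase)
{
    if (mach_timebase_info(timebase) != KERN_SUCCESS) {
        return -1;
    }
    if (timebase->numer == 0 || timebase->denom == 0) {
        /* mach_timebase_info() returns (1, 1) in practice, but a zero
           denominator would otherwise crash the nanosecond conversion */
        return -1;
    }
    return 0;
}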

Python/pytime.c Outdated
Member:

When is this possible?

Member Author:

freq.QuadPart is supposed to be >= 1... but I prefer to be super safe and check everything, "just in case".

The check is cheap and only done once, whereas a crash or bug can be very annoying and complex to debug.

Python/pytime.c Outdated
Member:

Can this overflow?

Member Author:

(now.QuadPart - t0) shouldn't overflow. Both values are supposed to be positive and the clock is supposed to be monotonic. In practice there is no guarantee, the clock might go backward, so it isn't used for time.monotonic(), only for time.perf_counter().

Note: my PR doesn't introduce this code; it has existed since at least Python 2.7:
https://github.com/python/cpython/blob/2.7/Modules/timemodule.c#L184

Member:

I don't mean that the subtraction can overflow. I mean that casting the result from LARGE_INTEGER to _PyTime_t can overflow. In the current code the result is cast to double, which never overflows.

Member Author:

(Stupid GitHub, my rebase dropped the commits, I don't have the context anymore.)

No, in practice, _PyTime_t == LONGLONG, it's the same type. I only added an assertion to be "future proof".

@vstinner (Member Author) commented:

Hum, I'm making more changes than I expected. I created PR #3955. Once the cleanup PR is merged, I will rebase this one on top of it to keep this PR reviewable.

@vstinner (Member Author) commented:

What is the purpose of this change? Why is using integer arithmetic better than using floats?

This PR is a follow-up to my first PR, which was already merged. See my comment:
#3936 (comment)
"The drawback of this change is that time.perf_counter() now converts QueryPerformanceCounter() / QueryPerformanceFrequency() double into a _PyTime_t (integer) and then back to double. Two useless conversions required by the standard _PyTime_t format."

This PR is supposed to reduce the risk of precision loss even further.

@vstinner (Member Author) commented:

I checked test.pythoninfo on Windows 3.x buildbots and AppVeyor. I looked at the resolution of time.perf_counter:

  • AppVeyor: resolution=1e-07
  • x86 Windows7 3.x: resolution=2.793651148400146e-07
  • AMD64 Windows8 3.x: resolution=1e-07
  • AMD64 Windows8.1 Non-Debug 3.x: resolution=1e-07

Full example:

time.perf_counter: namespace(adjustable=False, implementation='QueryPerformanceCounter()', monotonic=True, resolution=1e-07)

resolution=2.793651148400146e-07 can be an issue. It corresponds to QueryPerformanceFrequency() == 3579545, and that fraction cannot be simplified much:

>>> import fractions; fractions.Fraction(10**9, 3579545)
Fraction(200000000, 715909)

@vstinner (Member Author) commented:

Note: 3579545 Hz comes from the Programmable Interval Timer (PIT) chip (also called an 8253/8254 chip).

https://stackoverflow.com/questions/2345599/why-is-my-stopwatch-frequency-so-low

3,579,545 is the magic number. That's the frequency in Hertz before dividing it by 3 and feeding it into the 8053 timer chip in the original IBM PC. The odd looking number wasn't chosen by accident, it is the frequency of the color burst signal in the NTSC TV system used in the US and Japan. The IBM engineers were looking for a cheap crystal to implement the oscillator, nothing was cheaper than the one used in every TV set.

[Commit message]

The _PyTime_GetWinPerfCounterWithInfo() function now only uses
_PyTime_t (integer) internally, rather than using C double to divide
the clock by the performance frequency.

Add _PyTime_GCD() to compute the smallest numerator and denominator
to apply the performance frequency in
_PyTime_GetWinPerfCounterWithInfo().

Also modify the pymonotonic() macOS implementation, which uses
mach_absolute_time() and mach_timebase_info(), to use _PyTime_GCD().

Also add _PyTime_MulDiv() to apply a numerator and denominator to a
timestamp while also checking for integer overflow.

@vstinner (Member Author) commented:

I abandon this PR; my approach doesn't work. It is superseded by PR #3964, which uses C double for time.clock() and time.perf_counter().

While my PR was fun to write and works nicely with QueryPerformanceFrequency() == 10**7 (10_000_000, 10 MHz, 100 ns resolution), it quickly leads to integer overflow in _PyTime_MulDiv() with QueryPerformanceFrequency() == 3,579,545.
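
For a rough sense of scale (my own back-of-the-envelope numbers, not from the PR): with the reduced fraction 200_000_000 / 715_909, the product diff * 200_000_000 overflows a signed 64-bit integer once diff exceeds about 4.6e10 ticks, which at 3,579,545 Hz is only a few hours of counter difference.

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    const int64_t numer = 200000000;   /* 10**9 / gcd(10**9, 3579545) */
    const int64_t freq = 3579545;      /* QueryPerformanceFrequency() */

    /* Largest tick count whose product with numer still fits in int64_t. */
    int64_t max_ticks = INT64_MAX / numer;
    printf("overflow after ~%lld ticks, i.e. ~%lld seconds\n",
           (long long)max_ticks, (long long)(max_ticks / freq));
    /* prints roughly 46116860184 ticks, ~12883 seconds (about 3.6 hours) */
    return 0;
}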

I don't know how to implement 128-bit integer multiplication and division in a portable way, especially on 32-bit architectures. I tried __int128 in Visual Studio but got an error in x64 (64-bit) mode (so I don't expect it to work in 32-bit mode). And I don't want to create temporary "heavy" PyLongObject objects... just to read a clock.

Note: 3,579,545 decomposes into prime factors as 5 * 715,909. The number 715,909 is not really convenient for arithmetic with 64-bit integers (since I would like to store nanoseconds).

@vstinner closed this Oct 12, 2017
@vstinner deleted the pytime_gcd branch October 12, 2017 14:51