Conversation

@rhettinger
Contributor

@rhettinger rhettinger commented Aug 10, 2020

@rhettinger rhettinger added the performance Performance or resource usage label Aug 10, 2020
@rhettinger rhettinger changed the title from "bpo-41613: Improve speed and accuracy of math.hypot()" to "bpo-41513: Improve speed and accuracy of math.hypot()" Aug 10, 2020
Member

@serhiy-storchaka serhiy-storchaka left a comment

If this requires making the tests less strict, it is a potentially breaking change and should therefore be mentioned in What's New.

for (i=0 ; i < n ; i++) {
    x = vec[i];
    assert(Py_IS_FINITE(x) && fabs(x) <= max);
    x *= scale;
Member

Which is faster, x *= scale or x = ldexp(x, -max_e)?

Contributor Author

  • The test as written was over-specified. It should have been written this way from the start.

  • x *= scale is faster than x = ldexp(x, -max_e). The former compiles to a single, fast inline multiply instruction, while the latter is an external library call.
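
To make the comparison concrete, here is a minimal C sketch of the two alternatives. It assumes the scale factor is an exact power of two, 2**-max_e; the helper names scale_by_multiply, scale_by_ldexp and check_scaling_agrees are hypothetical and do not come from the patch. Multiplying by a power of two only adjusts the exponent, so barring overflow or underflow both forms should produce the same value, and the difference is purely one of cost.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper: scale x by 2**-max_e with a plain multiply.
   In a real loop the scale factor would be computed once, before the
   loop, so each element costs only one multiplication. */
double scale_by_multiply(double x, int max_e)
{
    double scale = ldexp(1.0, -max_e);   /* exact power of two */
    return x * scale;
}

/* Hypothetical helper: the same scaling done with a library call
   for every element. */
double scale_by_ldexp(double x, int max_e)
{
    return ldexp(x, -max_e);
}

/* Illustrative self-check: both forms agree for an ordinary value. */
void check_scaling_agrees(void)
{
    assert(scale_by_multiply(3.0, 2) == scale_by_ldexp(3.0, 2));
}
```

Since the results match, the choice between the two comes down to speed, which is the point made in the reply above.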

Here's the generated code for the loop:

L284:
    movsd   (%r12,%rax,8), %xmm0
    addq    $1, %rax
    cmpq    %rax, %rbp
    mulsd   %xmm4, %xmm0           <-- x *= scale
    movapd  %xmm0, %xmm1
    mulsd   %xmm0, %xmm1           <-- x *= x
    movapd  %xmm2, %xmm0
    addsd   %xmm1, %xmm2            <-- csum += x
    subsd   %xmm2, %xmm0
    addsd   %xmm1, %xmm0
    addsd   %xmm0, %xmm3
    jg  L284
    subsd   %xmm
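
Read back into C, the annotated instructions in the listing correspond roughly to the loop below. This is a sketch reconstructed from the assembly, not a copy of the merged source: the function name sum_scaled_squares, the variable names oldcsum and frac, and the choice to start csum at 1.0 are assumptions made here for illustration. Starting at 1.0 keeps the running sum at least as large as every squared term (each is at most 1.0 after scaling), which is the condition under which (oldcsum - csum) + x recovers the rounding error of the addition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: compensated sum of the squares of vec[i] * scale.
   Assumes |vec[i] * scale| <= 1.0 for every element. */
double sum_scaled_squares(const double *vec, size_t n, double scale)
{
    double csum = 1.0;   /* assumed starting value, removed again on return */
    double frac = 0.0;   /* accumulated rounding errors */
    double oldcsum, x;
    size_t i;

    for (i = 0; i < n; i++) {
        x = vec[i];
        x *= scale;                     /* mulsd: x *= scale */
        x = x * x;                      /* mulsd: x *= x */
        assert(x <= 1.0);
        oldcsum = csum;
        csum += x;                      /* addsd: csum += x */
        frac += (oldcsum - csum) + x;   /* subsd, addsd, addsd */
    }
    return csum - 1.0 + frac;
}
```

For the vector {3.0, 4.0} with scale 0.125, the function returns 0.390625; taking the square root (0.625) and dividing by the scale gives 5.0. The sketch relies on strict IEEE arithmetic, so it should not be compiled with options such as -ffast-math that allow the compiler to simplify the error term away.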

@rhettinger rhettinger merged commit fff3c28 into python:master Aug 16, 2020
shihai1991 pushed a commit to shihai1991/cpython that referenced this pull request Aug 20, 2020
xzy3 pushed a commit to xzy3/cpython that referenced this pull request Oct 18, 2020
Labels

performance Performance or resource usage

5 participants