Conversation

@vstinner
Member

When comparing suites with more than one benchmark, compute the
geometric mean of benchmark speeds, to compare a whole suite to the
reference suite with a single number.

@vstinner
Member Author

@methane @pablogsal @serhiy-storchaka: Would you mind taking a look at this new feature? I'm not sure that I'm computing the geometric mean of the right thing.

IMO it's a big feature: it should help a lot when comparing two benchmark suites, by giving a single number rather than 30 numbers.

A geometric mean has no unit. It's an unusual value. In short, geo mean > 1.0 means "faster", geo mean < 1.0 means "slower".


PyPy uses the geometric mean to compare PyPy to CPython as a single number. speed.pypy.org announces:

"The geometric average of all benchmarks is 0.24 or 4.2 times faster than cpython"

I'm not sure that speed.pypy.org and my PR compute the geometric mean of the same thing, since it announces that 0.24 means "faster".

My PR computes the geometric mean of all "speeds". The speed of a benchmark is the ratio: (benchmark mean) / (reference benchmark mean).
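
For illustration, here is a minimal sketch of that computation (the helper name and the sample ratios are mine, not pyperf's actual API):

import math

def geometric_mean(ratios):
    # nth root of the product of n ratios
    return math.prod(ratios) ** (1.0 / len(ratios))

# One dimensionless "speed" per benchmark; the time units cancel in each
# ratio, which is why the geometric mean itself has no unit.
# With one benchmark twice as fast (0.5) and another twice as slow (2.0),
# the two cancel out exactly:
print(geometric_mean([0.5, 2.0]))  # 1.0 (an arithmetic mean would give 1.25)

Note that statistics.geometric_mean() in the standard library (Python 3.8+) computes the same thing.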

@vstinner
Copy link
Member Author

> I'm not sure that speed.pypy.org and my PR compute the geometric mean of the same thing, since it announces that 0.24 means "faster".

Oops, I computed the geometric mean backwards. I normalized reference / benchmark, but the correct formula is benchmark / reference.

I also updated the PR's documentation to explain how the geometric mean is computed and what it means.

Use (benchmark mean) / (reference mean), rather than
(reference mean) / (benchmark mean), to use the same ratio as the
geometric mean: normalize the mean to the reference.
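
With this normalization, a geometric mean above 1.0 means the measured suite took longer than the reference. A hypothetical helper (not pyperf's actual code) that turns the value into the usual wording, and shows why PyPy's 0.24 is announced as "faster":

def describe(geo_mean):
    # geo_mean is the geometric mean of (benchmark mean) / (reference mean)
    # ratios: above 1.0 the suite took longer, below 1.0 it took less time.
    if geo_mean >= 1.0:
        return "%.2fx slower" % geo_mean
    return "%.2fx faster" % (1.0 / geo_mean)

print(describe(1.22))  # 1.22x slower
print(describe(0.24))  # 4.17x faster (speed.pypy.org rounds this to 4.2)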
@vstinner
Member Author

I wrote this PR when I saw bench_results.txt in https://bugs.python.org/issue41972: it's really hard to read all these numbers and tell whether it's faster or slower.

vstinner mentioned this pull request on Oct 13, 2020
@vstinner
Member Author

I rewrote this PR as a series of commits. It's now implemented in the master branch, so I'm closing this PR.

vstinner closed this on Oct 26, 2020
vstinner deleted the geo_mean branch on Oct 26, 2020
@vstinner
Member Author

The last commit adding the feature to the "compare" command is commit 0518a22.

Example:

$ python3 -m pyperf compare_to ./pyperf/tests/mult_list_py36.json ./pyperf/tests/mult_list_py37.json
[1]*1000: Mean +- std dev: [mult_list_py36] 2.13 us +- 0.06 us -> [mult_list_py37] 2.09 us +- 0.04 us: 1.02x faster (-2%)
[1,2]*1000: Mean +- std dev: [mult_list_py36] 3.70 us +- 0.05 us -> [mult_list_py37] 5.28 us +- 0.09 us: 1.42x slower (+42%)
[1,2,3]*1000: Mean +- std dev: [mult_list_py36] 4.61 us +- 0.13 us -> [mult_list_py37] 6.05 us +- 0.11 us: 1.31x slower (+31%)

Geometric mean: 1.22 (slower)
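
As a sanity check (mine, not part of the PR), the reported geometric mean can be reproduced from the three per-benchmark ratios above:

import math

# (mult_list_py37 mean) / (mult_list_py36 mean) for each benchmark
ratios = [2.09 / 2.13, 5.28 / 3.70, 6.05 / 4.61]
print(math.prod(ratios) ** (1.0 / len(ratios)))  # ~1.22, i.e. "slower"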


$ python3 -m pyperf compare_to ./pyperf/tests/mult_list_py36.json ./pyperf/tests/mult_list_py37.json -G
Slower (2):
- [1,2]*1000: 3.70 us +- 0.05 us -> 5.28 us +- 0.09 us: 1.42x slower (+42%)
- [1,2,3]*1000: 4.61 us +- 0.13 us -> 6.05 us +- 0.11 us: 1.31x slower (+31%)

Faster (1):
- [1]*1000: 2.13 us +- 0.06 us -> 2.09 us +- 0.04 us: 1.02x faster (-2%)

Geometric mean: 1.22 (slower)


$ python3 -m pyperf compare_to ./pyperf/tests/mult_list_py36.json ./pyperf/tests/mult_list_py37.json --table
+----------------+----------------+------------------------------+
| Benchmark      | mult_list_py36 | mult_list_py37               |
+================+================+==============================+
| [1]*1000       | 2.13 us        | 2.09 us: 1.02x faster (-2%)  |
+----------------+----------------+------------------------------+
| [1,2]*1000     | 3.70 us        | 5.28 us: 1.42x slower (+42%) |
+----------------+----------------+------------------------------+
| [1,2,3]*1000   | 4.61 us        | 6.05 us: 1.31x slower (+31%) |
+----------------+----------------+------------------------------+
| Geometric mean | (ref)          | 1.22 (slower)                |
+----------------+----------------+------------------------------+


$ python3 -m pyperf compare_to ./pyperf/tests/mult_list_py36.json ./pyperf/tests/mult_list_py37.json --table -G
+----------------+----------------+------------------------------+
| Benchmark      | mult_list_py36 | mult_list_py37               |
+================+================+==============================+
| [1]*1000       | 2.13 us        | 2.09 us: 1.02x faster (-2%)  |
+----------------+----------------+------------------------------+
| [1,2,3]*1000   | 4.61 us        | 6.05 us: 1.31x slower (+31%) |
+----------------+----------------+------------------------------+
| [1,2]*1000     | 3.70 us        | 5.28 us: 1.42x slower (+42%) |
+----------------+----------------+------------------------------+
| Geometric mean | (ref)          | 1.22 (slower)                |
+----------------+----------------+------------------------------+
