bpo-47009: Streamline list.append for the common case #31864

sweeneyde · 2022-03-14T04:58:43Z

https://bugs.python.org/issue47009

sweeneyde · 2022-03-14T04:59:56Z

Benchmarks are good:

from pyperf import Runner, perf_counter

def bench_listcomp(loops, length):
    src = list(map(float, range(length)))
    t0 = perf_counter()
    for i in range(loops):
        [x for x in src]
    return perf_counter() - t0

def bench_append(loops, length):
    src = list(map(float, range(length)))
    t0 = perf_counter()
    for i in range(loops):
        arr = []
        for x in src:
            arr.append(x)
    return perf_counter() - t0

runner = Runner()
for n in [100, 1_000, 10_000, 100_000]:
    runner.bench_time_func(f"listcomp {n}", bench_listcomp, n)
    runner.bench_time_func(f"append {n}", bench_append, n)

Results from GCC on WSL, built with --enable-optimizations --with-lto:

Faster (8):
- listcomp 10000: 118 us +- 2 us -> 92.6 us +- 1.5 us: 1.28x faster
- listcomp 100000: 1.16 ms +- 0.02 ms -> 916 us +- 26 us: 1.27x faster
- listcomp 1000: 12.3 us +- 0.2 us -> 9.89 us +- 0.41 us: 1.25x faster
- listcomp 100: 1.59 us +- 0.03 us -> 1.32 us +- 0.05 us: 1.21x faster
- append 100000: 1.69 ms +- 0.05 ms -> 1.45 ms +- 0.04 ms: 1.17x faster
- append 10000: 168 us +- 4 us -> 145 us +- 3 us: 1.16x faster
- append 1000: 17.4 us +- 0.3 us -> 15.2 us +- 0.6 us: 1.14x faster
- append 100: 2.03 us +- 0.06 us -> 1.81 us +- 0.08 us: 1.12x faster

Geometric mean: 1.20x faster
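
As a sanity check on the summary line, the geometric mean can be recomputed from the rounded per-benchmark speedups listed above. This is only a sketch: pyperf presumably works from the unrounded measurements, so agreement is expected only to about two decimal places.

```python
from statistics import geometric_mean

# Per-benchmark speedups copied from the "Faster (8)" list (already rounded).
speedups = [1.28, 1.27, 1.25, 1.21, 1.17, 1.16, 1.14, 1.12]

overall = geometric_mean(speedups)
print(f"Geometric mean: {overall:.2f}x faster")
```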

markshannon

I think it would be simpler, and just as fast, to rename app1 to _PyList_AppendTakeRef (with appropriate refcount adjustments).
No need for the inline functions.

markshannon · 2022-03-18T11:40:25Z

Include/internal/pycore_list.h

+PyAPI_FUNC(int)
+_PyList_AppendTakeRefListResize(PyListObject *self, PyObject *newitem);
+
+static inline int


No need for an inline function.
Let the LTO pass decide if it wants to inline it.

markshannon · 2022-03-18T11:45:03Z

Objects/listobject.c

-app1(PyListObject *self, PyObject *v)
+/* internal, used by _PyList_AppendTakeRef */
+int
+_PyList_AppendTakeRefListResize(PyListObject *self, PyObject *newitem)


Do we need the extra function?
Converting app1 to _PyList_AppendTakeRef would seem to be enough.

I tried something like that, and the results are on the bpo issue. There seems to be a significant benefit to having a separate, as-small-as-possible function that always gets inlined, as opposed to a slightly longer version whose inlining is left to the compiler.
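
The split being discussed (a tiny fast path that handles the common case, plus a separate out-of-line resize function) can be illustrated with a toy model in Python. This is not the C code from the PR: `ToyList`, its attributes, and the growth rule are all hypothetical names and choices made for illustration.

```python
class ToyList:
    """Toy model of a list with spare capacity; not CPython's implementation."""

    def __init__(self):
        self.items = []        # backing storage; its length plays the role of "allocated"
        self.size = 0          # number of slots actually in use
        self.resize_calls = 0  # how many times the slow path ran

    def append(self, item):
        # Fast path: spare capacity exists, so just store the item.
        # Kept as short as possible, mirroring the "always inlined" piece.
        if len(self.items) > self.size:
            self.items[self.size] = item
            self.size += 1
            return
        # Slow path: delegate to the separate resize function.
        self._append_resize(item)

    def _append_resize(self, item):
        self.resize_calls += 1
        # Hypothetical over-allocation rule, chosen only for this sketch.
        new_capacity = self.size + (self.size >> 3) + 6
        self.items.extend([None] * (new_capacity - len(self.items)))
        self.items[self.size] = item
        self.size += 1


toy = ToyList()
for value in range(1000):
    toy.append(value)
print(toy.size, toy.resize_calls)
```

Because the storage is over-allocated, most appends in the loop take the short branch and only a small fraction reach `_append_resize`, which is the situation in which keeping the fast path minimal pays off.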

Presumably the compiler isn't inlining the function because it isn't hot enough.
PRECALL_NO_KW_LIST_APPEND and LIST_APPEND are only 0.3% of all instructions executed, so maybe it is best not to inline.

Do you have benchmark numbers for the standard suite?

https://gist.github.com/sweeneyde/6cbbe1c9d216d117370a809c704b6cfc

Geometric mean: 1.00x faster

Also, IMO there is a relative lack of list comprehensions in the pyperformance suite.
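
For context on why list comprehensions matter here: a comprehension compiles to the LIST_APPEND instruction, while an explicit loop goes through a method call on `arr.append`. A minimal sketch using the standard `dis` module shows the difference; the helper name `opnames` is made up for this example, and specialized instruction forms are not shown in this default disassembly.

```python
import dis
import types


def opnames(source):
    """Collect opcode names from a compiled snippet, including nested code objects."""
    names = set()
    pending = [compile(source, "<sketch>", "exec")]
    while pending:
        code = pending.pop()
        names.update(ins.opname for ins in dis.get_instructions(code))
        # Depending on the Python version, a comprehension may live in a nested code object.
        pending.extend(c for c in code.co_consts if isinstance(c, types.CodeType))
    return names


# The snippets are only compiled, never executed, so `src` need not exist.
listcomp_ops = opnames("result = [x for x in src]")
loop_ops = opnames("arr = []\nfor x in src:\n    arr.append(x)")

print("LIST_APPEND" in listcomp_ops)
print("LIST_APPEND" in loop_ops)
```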


sweeneyde · 2022-03-31T21:17:31Z

@markshannon Since pyperformance results were negligible and this has a significant speedup on operations that live in hot loops (albeit not in pyperformance), is it okay if I merge this?

markshannon · 2022-04-01T10:17:18Z

Sorry, this dropped off my radar.

sweeneyde requested a review from markshannon as a code owner March 14, 2022 04:58

bedevere-bot added the awaiting core review label Mar 14, 2022

the-knights-who-say-ni added the CLA signed label Mar 14, 2022

sweeneyde requested a review from methane March 14, 2022 05:16

sweeneyde added the performance Performance or resource usage label Mar 14, 2022

Streamline list.append for the common case

a240df1

sweeneyde force-pushed the listappend branch from f1951b6 to a240df1 Compare March 14, 2022 09:43

📜🤖 Added by blurb_it.

e038176

markshannon reviewed Mar 18, 2022


kumaraditya303 reviewed Mar 21, 2022


sweeneyde and others added 2 commits March 21, 2022 03:21

PyAPI_FUNC --> extern

8ad202f

Merge branch 'main' into listappend

ea3f634

markshannon merged commit a0ea7a1 into python:main Apr 1, 2022

bedevere-bot removed the awaiting core review label Apr 1, 2022

sweeneyde deleted the listappend branch April 1, 2022 16:13
