bpo-47009: Streamline list.append for the common case#31864
bpo-47009: Streamline list.append for the common case#31864markshannon merged 4 commits intopython:mainfrom
Conversation
|
Benchmarks are good: from pyperf import Runner, perf_counter
def bench_listcomp(loops, length):
src = list(map(float, range(length)))
t0 = perf_counter()
for i in range(loops):
[x for x in src]
return perf_counter() - t0
def bench_append(loops, length):
src = list(map(float, range(length)))
t0 = perf_counter()
for i in range(loops):
arr = []
for x in src:
arr.append(x)
return perf_counter() - t0
runner = Runner()
for n in [100, 1_000, 10_000, 100_000]:
runner.bench_time_func(f"listcomp {n}", bench_listcomp, n)
runner.bench_time_func(f"append {n}", bench_append, n)Results from GCC on WSL with |
markshannon
left a comment
There was a problem hiding this comment.
I think this would be simpler, and just as fast, by renaming app1 to _PyList_AppendTakeRef (with appropriate refcount adjustments).
No need for the inline functions.
| PyAPI_FUNC(int) | ||
| _PyList_AppendTakeRefListResize(PyListObject *self, PyObject *newitem); | ||
|
|
||
| static inline int |
There was a problem hiding this comment.
No need for an inline function.
Let the LTO pass decide if it wants to inline it.
| app1(PyListObject *self, PyObject *v) | ||
| /* internal, used by _PyList_AppendTakeRef */ | ||
| int | ||
| _PyList_AppendTakeRefListResize(PyListObject *self, PyObject *newitem) |
There was a problem hiding this comment.
Do we need the extra function?
Converting app1 to _PyList_AppendTakeRef would seem to be enough.
There was a problem hiding this comment.
I tried something like that and the results are on the bpo issue, but it seems there is a significant benefit to having the separate as-small-as-possible function that always gets inlined as opposed to a slightly longer version that we let the compiler figure out.
There was a problem hiding this comment.
Presumably the compiler isn't inlining the function because it isn't hot enough.
PRECALL_NO_KW_LIST_APPEND and LIST_APPEND are only 0.3% of all instructions executed, so maybe it is best not to inline.
Do you have benchmark numbers for the standard suite?
There was a problem hiding this comment.
https://gist.github.com/sweeneyde/6cbbe1c9d216d117370a809c704b6cfc
Geometric mean: 1.00x faster
There was a problem hiding this comment.
Also, IMO there is a relative lack of list comprehensions in the pyperformance suite.
|
@markshannon Since pyperformance results were negligible and this has a significant speedup on operations that live in hot loops (albeit not in pyperformance), is it okay if I merge this? |
|
Sorry, this dropped off my radar. |
https://bugs.python.org/issue47009