
Following the question about Chaining *= += operators and the good comment of Tom Wojcik ("Why would you assume aaa *= 200 is faster than aaa = aaa * 200 ?"), I tested it in Jupyter notebook:

%%timeit aaa = np.arange(1,101,1)
aaa*=100

%%timeit aaa = np.arange(1,101,1)
aaa=aaa*100

And I was surprised, because the first test is slower than the second one: 1530 ns versus 952 ns. Why are these values so different?
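Outside a notebook, the same comparison can be reproduced with the standard `timeit` module. This is just a sketch; the absolute numbers depend on your machine and NumPy build, only the relative comparison matters:

```python
# Reproduce the notebook benchmark with the stdlib timeit module.
# Absolute timings vary by machine; compare the two values to each other.
import timeit

setup = "import numpy as np; aaa = np.arange(1, 101, 1)"

t_inplace = timeit.timeit("aaa *= 100", setup=setup, number=100_000)
t_rebind = timeit.timeit("aaa = aaa * 100", setup=setup, number=100_000)

print(f"aaa *= 100     : {t_inplace:.4f} s")
print(f"aaa = aaa * 100: {t_rebind:.4f} s")
```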

  • If you reverse the order here, what are the results? Commented Apr 20, 2021 at 14:11
  • This is related to NumPy. It doesn't happen with regular ints or floats. Commented Apr 20, 2021 at 14:16
  • Changing the range to np.arange(1,10001,1) actually reverses the results: aaa*=100 is faster! So in-place is still faster as the input grows in size. For small arrays, for some reason, creating a new array is more efficient... Commented Apr 20, 2021 at 14:22
  • The difference is that one modifies the data structure itself (an in-place operation, aaa *= 100) while the other just reassigns the variable (a = a * 100), which I guess is the source of the slower behavior. Commented Apr 20, 2021 at 14:22
  • @MaPy I think you missed the point. The one assigning a new array is faster... Commented Apr 20, 2021 at 14:23

2 Answers


TL;DR: this question comes down to the performance difference between an in-place binary operation (INPLACE_*, from aaa*=100) and a plain binary operation (BINARY_*, from aaa=aaa*100). The difference can be seen with the dis module:

import numpy as np
import dis

aaa = np.arange(1,101,1)
dis.dis('''
for i in range(1000000):
  aaa*=100
''')
  3          14 LOAD_NAME                2 (aaa)
             16 LOAD_CONST               1 (100)
             18 INPLACE_MULTIPLY
             20 STORE_NAME               2 (aaa)
             22 JUMP_ABSOLUTE           10
        >>   24 POP_BLOCK
        >>   26 LOAD_CONST               2 (None)
             28 RETURN_VALUE
dis.dis('''
for i in range(1000000):
  aaa=aaa*100
''')
  3          14 LOAD_NAME                2 (aaa)
             16 LOAD_CONST               1 (100)
             18 BINARY_MULTIPLY
             20 STORE_NAME               2 (aaa)
             22 JUMP_ABSOLUTE           10
        >>   24 POP_BLOCK
        >>   26 LOAD_CONST               2 (None)
             28 RETURN_VALUE
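At runtime these two opcodes dispatch to different special methods: INPLACE_MULTIPLY tries `__imul__` first (which ndarray implements as an in-place update), while BINARY_MULTIPLY calls `__mul__`, which always produces a new object. A minimal sketch with a logging class (the `Tracker` class is hypothetical, just to show which hook each form invokes):

```python
class Tracker:
    """Minimal class that records which multiply hook Python invokes."""
    def __init__(self):
        self.calls = []

    def __imul__(self, other):
        self.calls.append("__imul__")
        return self            # in-place: return the same object

    def __mul__(self, other):
        self.calls.append("__mul__")
        return self            # simplified: a real class would build a new object

t = Tracker()
t *= 100       # INPLACE_MULTIPLY -> __imul__
t = t * 100    # BINARY_MULTIPLY  -> __mul__
print(t.calls)  # ['__imul__', '__mul__']
```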

Then, back to your question: which one is absolutely faster?

Unfortunately, it's hard to say which form is faster, and here's why:

You can check compile.c in the CPython source directly. If you trace a bit into the CPython code, the call chains differ:

  • inplace_binop -> compiler_augassign -> compiler_visit_stmt
  • binop -> compiler_visit_expr1 -> compiler_visit_expr -> compiler_visit_kwonlydefaults

Since the function calls and logic are different, there are tons of factors (including your input size(*), CPU, etc.) that can affect performance as well; you'll need to profile in order to optimize your code for your use case.

*: as noted in the other comments, you can check this post for how performance varies with input size.
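The size effect mentioned in the comments can be checked directly. A sketch (the crossover point, if any, varies by machine and NumPy version):

```python
# Compare in-place multiply vs. rebinding as the array grows.
import timeit

for n in (100, 10_000, 1_000_000):
    setup = f"import numpy as np; aaa = np.arange(1, {n} + 1, 1)"
    t_inplace = timeit.timeit("aaa *= 100", setup=setup, number=1_000)
    t_rebind = timeit.timeit("aaa = aaa * 100", setup=setup, number=1_000)
    print(f"n={n:>9}: in-place {t_inplace:.5f}s, rebind {t_rebind:.5f}s")
```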


4 Comments

Many thanks to all! I didn't expect my question would take me so far. I am going to try to understand this very interesting answer.
Thanks for your answer. I am a Python beginner (I come from the C world) and it is very interesting to see the Python bytecode. You made me discover the dis module. I wonder if it is possible to go deeper, for example down to the asm code?
@Stef1611 The missing pieces are the bytecode and the VM. You can start with The AST and Me by Emily Morehouse-Valcarcel @ PyCon 2018 and this SO answer for details. To go further, please consider the CPython internals class by Philip Guo.
Thanks a lot. I think I will be occupied for some days or months ... But very very interesting.

The += symbol appeared in the C language in the 1970s and, in keeping with C's idea of a "smart assembler", corresponded to a clearly different machine instruction and addressing mode.

a = a * 100 and a *= 100 produce the same effect, but correspond at a low level to different ways the processor works.

a *= 100 means:

  • find the place identified by a
  • multiply its contents by 100 in place

a = a * 100 means:

  • evaluate a * 100:
    • find the place identified by a
    • copy its contents into an accumulator
    • multiply the accumulator by 100
  • store the result in a:
    • find the place identified by a
    • copy the accumulator to it

Python is written in C and inherited this syntax from C, but since an interpreted language performs no translation/optimization step before execution, the two forms are not necessarily so intimately related. However, an interpreter can dispatch to different execution routines for the two forms of the expression, taking advantage of different machine code depending on how the expression is formed and on the evaluation context.
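The semantic difference is easy to observe in Python itself: the in-place form mutates the existing ndarray, while the rebinding form allocates a new one. A small check using id():

```python
import numpy as np

aaa = np.arange(1, 101, 1)
before = id(aaa)

aaa *= 100          # in-place: the same ndarray object is mutated
assert id(aaa) == before

aaa = aaa * 100     # rebinding: a brand-new ndarray is created
assert id(aaa) != before
```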

Comments
