
Following the question about Chaining *= += operators and the good comment of Tom Wojcik ("Why would you assume aaa *= 200 is faster than aaa = aaa * 200 ?"), I tested it in Jupyter notebook:

%%timeit aaa = np.arange(1,101,1)
aaa*=100

%%timeit aaa = np.arange(1,101,1)
aaa=aaa*100

And I was surprised, because the first test is slower than the second one: 1530 ns versus 952 ns. Why are these values so different?
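Outside a notebook, the same comparison can be reproduced with the standard `timeit` module. This is just a sketch; the absolute numbers depend on your machine and NumPy build, only the relative comparison matters:

```python
# Reproduce the notebook benchmark with the stdlib timeit module.
# Absolute timings vary by machine; compare the two values to each other.
import timeit

setup = "import numpy as np; aaa = np.arange(1, 101, 1)"

t_inplace = timeit.timeit("aaa *= 100", setup=setup, number=100_000)
t_rebind = timeit.timeit("aaa = aaa * 100", setup=setup, number=100_000)

print(f"aaa *= 100     : {t_inplace:.4f} s")
print(f"aaa = aaa * 100: {t_rebind:.4f} s")
```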

  • If you reverse the order here, what are the results? Commented Apr 20, 2021 at 14:11
  • This is related to NumPy. It doesn't happen with regular ints or floats. Commented Apr 20, 2021 at 14:16
  • Changing the range to np.arange(1,10001,1) actually reverses the results: aaa*=100 is faster! So in-place is still faster as the input grows in size. For small arrays, for some reason, creating a new array is more efficient... Commented Apr 20, 2021 at 14:22
  • The difference is that one modifies the data structure itself (an in-place operation, aaa *= 100) while the other just reassigns the variable (a = a * 100), which I guess is the source of the slower behavior. Commented Apr 20, 2021 at 14:22
  • @MaPy I think you missed the point. The one assigning a new array is faster... Commented Apr 20, 2021 at 14:23

2 Answers


TL;DR: this question comes down to the performance difference between an in-place binary operation (INPLACE_*, from aaa*=100) and a plain binary operation (BINARY_*, from aaa=aaa*100). The difference can be seen with the dis module:

import numpy as np
import dis

aaa = np.arange(1,101,1)
dis.dis('''
for i in range(1000000):
  aaa*=100
''')
  3          14 LOAD_NAME                2 (aaa)
             16 LOAD_CONST               1 (100)
             18 INPLACE_MULTIPLY
             20 STORE_NAME               2 (aaa)
             22 JUMP_ABSOLUTE           10
        >>   24 POP_BLOCK
        >>   26 LOAD_CONST               2 (None)
             28 RETURN_VALUE
dis.dis('''
for i in range(1000000):
  aaa=aaa*100
''')
  3          14 LOAD_NAME                2 (aaa)
             16 LOAD_CONST               1 (100)
             18 BINARY_MULTIPLY
             20 STORE_NAME               2 (aaa)
             22 JUMP_ABSOLUTE           10
        >>   24 POP_BLOCK
        >>   26 LOAD_CONST               2 (None)
             28 RETURN_VALUE
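At runtime these two opcodes dispatch to different special methods: INPLACE_MULTIPLY tries `__imul__` first (which ndarray implements as an in-place update), while BINARY_MULTIPLY calls `__mul__`, which always produces a new object. A minimal sketch with a logging class (the `Tracker` class is hypothetical, just to show which hook each form invokes):

```python
class Tracker:
    """Minimal class that records which multiply hook Python invokes."""
    def __init__(self):
        self.calls = []

    def __imul__(self, other):
        self.calls.append("__imul__")
        return self            # in-place: return the same object

    def __mul__(self, other):
        self.calls.append("__mul__")
        return self            # simplified: a real class would build a new object

t = Tracker()
t *= 100       # INPLACE_MULTIPLY -> __imul__
t = t * 100    # BINARY_MULTIPLY  -> __mul__
print(t.calls)  # ['__imul__', '__mul__']
```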

Then, back to your question: which one is absolutely faster?

Unfortunately, it's hard to say which form is faster, and here's why:

You can check compile.c in the CPython source directly. If you trace a bit into the CPython code, the call chains differ:

  • inplace_binop -> compiler_augassign -> compiler_visit_stmt
  • binop -> compiler_visit_expr1 -> compiler_visit_expr -> compiler_visit_kwonlydefaults

Since the function calls and logic are different, there are tons of factors (including your input size(*), CPU, etc.) that can affect performance as well; you'll need to profile in order to optimize your code for your use case.

*: as noted in the other comments, you can check this post for how performance varies with input size.
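The size effect mentioned in the comments can be checked directly. A sketch (the crossover point, if any, varies by machine and NumPy version):

```python
# Compare in-place multiply vs. rebinding as the array grows.
import timeit

for n in (100, 10_000, 1_000_000):
    setup = f"import numpy as np; aaa = np.arange(1, {n} + 1, 1)"
    t_inplace = timeit.timeit("aaa *= 100", setup=setup, number=1_000)
    t_rebind = timeit.timeit("aaa = aaa * 100", setup=setup, number=1_000)
    print(f"n={n:>9}: in-place {t_inplace:.5f}s, rebind {t_rebind:.5f}s")
```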


4 Comments

Many thanks to all! I didn't expect my question would take me so far. I am going to try to understand this very interesting answer.
Thanks for your answer. I am a Python beginner (I come from the C world) and it is very interesting to see the Python bytecode. You made me discover the dis module. I wonder if it is possible to go deeper, for example down to the asm code?
@Stef1611 The missing pieces are the bytecode and the VM. You can start with The AST and Me by Emily Morehouse-Valcarcel @ PyCon 2018 and this SO answer for details. To go further, please consider the CPython internals class by Philip Guo.
Thanks a lot. I think I will be occupied for some days or months ... But very very interesting.

The += symbol appeared in the C language in the 1970s and, in keeping with C's idea of a "smart assembler", corresponded to a clearly different machine instruction and addressing mode.

a = a * 100 and a *= 100 produce the same effect, but correspond at a low level to different ways the processor works.

a *= 100 means:

  • find the place identified by a
  • multiply its contents by 100 in place

a = a * 100 means:

  • evaluate a * 100:
    • find the place identified by a
    • copy its contents into an accumulator
    • multiply the accumulator by 100
  • store the result in a:
    • find the place identified by a
    • copy the accumulator to it

Python is written in C and inherited this syntax from C, but since an interpreted language performs no translation/optimization step before execution, the two forms are not necessarily so intimately related. However, an interpreter can dispatch to different execution routines for the two forms of the expression, taking advantage of different machine code depending on how the expression is formed and on the evaluation context.
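The semantic difference is easy to observe in Python itself: the in-place form mutates the existing ndarray, while the rebinding form allocates a new one. A small check using id():

```python
import numpy as np

aaa = np.arange(1, 101, 1)
before = id(aaa)

aaa *= 100          # in-place: the same ndarray object is mutated
assert id(aaa) == before

aaa = aaa * 100     # rebinding: a brand-new ndarray is created
assert id(aaa) != before
```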

Comments
