I am trying to speed up some code with multiprocessing in Python, but I cannot understand one point. Assume I have the following dumb function:
import time
from multiprocessing.pool import Pool

def foo(_):
    for _ in range(100000000):
        a = 3
When I run this code without multiprocessing (see the code below) on my laptop (Intel, 8-core CPU), it takes ~2.31 seconds.
t1 = time.time()
foo(1)
print(f"Without multiprocessing {time.time() - t1}")
When I instead run it with Python's multiprocessing library (see the code below), it takes ~6.0 seconds.
pool = Pool(8)
t1 = time.time()
pool.map(foo, range(8))
print(f"Sample multiprocessing {time.time() - t1}")
To the best of my knowledge, multiprocessing adds some time overhead, mainly from spawning the new processes and copying the memory state. However, this should happen only once, when the processes are spawned at the very beginning, and it should not be that large.
So what am I missing here? Is there something wrong in my reasoning?
Edit: I think it is better to be more explicit about my question. I expected the multiprocessing code to be slightly slower than the sequential one. It is true that I don't split the work across the 8 cores; instead, each of the 8 cores does the same job in parallel (hence in an ideal world the processing time should stay more or less the same). Considering the overhead of spawning new processes, I expected the total time to increase by some (not too big) percentage, not by the ~2.60x I got here.
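One way to check this expectation is to time each task inside its worker, so that the per-task time can be compared with the sequential time and with the wall-clock time of the whole map call. This is only a minimal sketch: timed_foo and compare are hypothetical helper names (not part of the code above), and the iteration count is reduced so that it finishes quickly.

```python
import time
from multiprocessing import Pool


def timed_foo(n_iterations):
    """Run the same busy loop as foo and return how long this call took."""
    start = time.perf_counter()
    for _ in range(n_iterations):
        a = 3
    return time.perf_counter() - start


def compare(n_iterations, n_workers):
    """Return (sequential seconds, per-task seconds, wall-clock seconds of map)."""
    sequential = timed_foo(n_iterations)
    with Pool(n_workers) as pool:
        # The pool already exists here, so process startup is not measured.
        wall_start = time.perf_counter()
        per_task = pool.map(timed_foo, [n_iterations] * n_workers)
        wall = time.perf_counter() - wall_start
    return sequential, per_task, wall


if __name__ == "__main__":
    # Assumption: 10_000_000 iterations keeps the run short;
    # use 100_000_000 to match the loop in the question.
    sequential, per_task, wall = compare(n_iterations=10_000_000, n_workers=8)
    print(f"Sequential run: {sequential:.3f}s")
    print(f"Slowest task inside a worker: {max(per_task):.3f}s")
    print(f"Wall-clock time of pool.map: {wall:.3f}s")
```

If the slowest task inside a worker takes about as long as the whole map call, the extra time is being spent in the loop itself while 8 copies run at once, rather than in starting processes or passing data around.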
Instead of time, use something like timeit or another performance testing module to work out how long code is taking.
timeit should do multiple comparisons by executing the same code multiple times, so the measurements should be more reliable, right?
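As a rough illustration of the timeit suggestion, the sketch below repeats the sequential measurement several times with timeit.repeat. The n_iterations parameter and the measure helper are assumptions added here so that the loop size can be reduced; they are not part of the original foo.

```python
import timeit


def foo(_, n_iterations=100_000_000):
    # Same busy loop as in the question; n_iterations is an added,
    # hypothetical parameter so the example can run quickly.
    for _ in range(n_iterations):
        a = 3


def measure(n_iterations, repeat=5):
    """Time foo `repeat` times, one call per measurement, and return all timings."""
    return timeit.repeat(lambda: foo(1, n_iterations), number=1, repeat=repeat)


if __name__ == "__main__":
    timings = measure(n_iterations=1_000_000)
    # The minimum is usually the least noisy summary of repeated timings.
    print(f"Best of {len(timings)} runs: {min(timings):.4f}s")
```

The same approach should work for the pool version by passing a callable that wraps pool.map, with the pool created beforehand so that its startup is not included in each measurement.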