{"id":1315,"date":"2018-10-31T05:39:44","date_gmt":"2018-10-31T05:39:44","guid":{"rendered":"https:\/\/machinelearningplus.com\/?p=1315"},"modified":"2022-04-20T09:34:25","modified_gmt":"2022-04-20T09:34:25","slug":"parallel-processing-python","status":"publish","type":"post","link":"https:\/\/machinelearningplus.com\/python\/parallel-processing-python\/","title":{"rendered":"Parallel Processing in Python &#8211; A Practical Guide with Examples"},"content":{"rendered":"<p><em>Parallel processing is a mode of operation where a task is executed simultaneously on multiple processors in the same computer. It is meant to reduce the overall processing time. In this tutorial, you&#8217;ll learn how to parallelize typical logic using Python&#8217;s multiprocessing module.<\/em><\/p>\r\n<h2>1. Introduction<\/h2>\r\n<p>Parallel processing splits a job across multiple processors (CPU cores) in the same computer so that parts of it run at the same time, which can reduce the overall processing time.<\/p>\r\n<p>However, starting processes and communicating between them carries some overhead, which can actually increase the overall time taken for small tasks instead of decreasing it.<\/p>\r\n<p>In Python, the <code>multiprocessing<\/code> module runs independent tasks in parallel by using subprocesses (instead of threads). Because each subprocess runs its own Python interpreter, this sidesteps CPython&#8217;s Global Interpreter Lock (GIL), which stops threads from executing Python bytecode in parallel.<\/p>\r\n<p>It lets you leverage multiple processors on a machine (on both Windows and Unix). Each process runs in its own, completely separate memory space. 
By the end of this tutorial, you will know:<\/p>\r\n<ol>\r\n<li>How to structure the code and use the syntax that enables parallel processing with <code>multiprocessing<\/code>.<\/li>\r\n<li>How to implement synchronous and asynchronous parallel processing.<\/li>\r\n<li>How to parallelize a Pandas DataFrame.<\/li>\r\n<li>How to solve 3 different use cases with the <code>multiprocessing.Pool()<\/code> interface.<\/li>\r\n<\/ol>\r\n<h2 id=\"2howmanyparallelprocesses\">2. How many parallel processes can you run?<\/h2>\r\n<p>The number of processes that can actually run at the same time is limited by the number of processors (CPU cores) in your computer. You can start more processes than that, but the extra ones have to take turns on the available cores, so for CPU-bound work they don&#8217;t make things faster. If you don&#8217;t know how many processors your machine has, the <code>cpu_count()<\/code> function in <code>multiprocessing<\/code> will tell you.<\/p>\r\n<pre><code class=\"python language-python\">import multiprocessing as mp\r\n\r\n# cpu_count() returns the number of CPUs (logical cores) in the system\r\nprint(\"Number of processors: \", mp.cpu_count())\r\n<\/code><\/pre>\r\n<h2 id=\"3whatissynchronousandasynchronousexecution\">3. What are synchronous and asynchronous execution?<\/h2>\r\n<p>In parallel processing, there are two types of execution: synchronous and asynchronous.<\/p>\r\n<p>In synchronous execution, the results come back in the same order in which the tasks were started. This is achieved by blocking the main program until the respective tasks are finished.<\/p>\r\n<p>Asynchronous execution, on the other hand, doesn&#8217;t block the main program. 
As a result, your code can carry on while the tasks run, and results can become available in a different order than the tasks were started.<\/p>\r\n<p>There are 2 main classes in <code>multiprocessing<\/code> for executing a function in parallel: the <code>Pool<\/code> class and the <code>Process<\/code> class.<\/p>\r\n<ol>\r\n<li><code>Pool<\/code> Class\r\n<ol>\r\n<li>Synchronous execution\r\n<ul>\r\n<li><code>Pool.map()<\/code> and <code>Pool.starmap()<\/code><\/li>\r\n<li><code>Pool.apply()<\/code><\/li>\r\n<\/ul>\r\n<\/li>\r\n<li>Asynchronous execution\r\n<ul>\r\n<li><code>Pool.map_async()<\/code> and <code>Pool.starmap_async()<\/code><\/li>\r\n<li><code>Pool.apply_async()<\/code><\/li>\r\n<\/ul>\r\n<\/li>\r\n<\/ol>\r\n<\/li>\r\n<li><code>Process<\/code> Class<\/li>\r\n<\/ol>\r\n<p>Let&#8217;s take up a typical problem and implement parallelization using the above techniques.<\/p>\r\n<p>In this tutorial, we stick to the <code>Pool<\/code> class, because it is the most convenient to use and covers the most common practical applications.<\/p>\r\n<h2 id=\"4problemstatementcounthowmanynumbersexistbetweenagivenrangeineachrow\">4. Problem Statement: Count how many numbers lie within a given range in each row<\/h2>\r\n<p>The first problem is: given a 2D matrix (or list of lists), count how many numbers in each row lie within a given range. 
We will work on the list prepared below.<\/p>\r\n<pre><code class=\"python language-python\">import numpy as np\r\nfrom time import time\r\n\r\n# Prepare data: 200,000 rows of 5 random integers between 0 and 9\r\nnp.random.seed(100)  # seed the global random generator so the data is reproducible\r\narr = np.random.randint(0, 10, size=[200000, 5])\r\ndata = arr.tolist()\r\nprint(data[:5])\r\n<\/code><\/pre>\r\n<p>Since the data is random, the exact counts you get may differ from the sample outputs shown in this post.<\/p>\r\n<h2 id=\"solutionwithoutparallelization\">Solution without parallelization<\/h2>\r\n<p>Let&#8217;s see how long it takes to compute this without parallelization.<\/p>\r\n<p>For this, we call the function <code>howmany_within_range()<\/code> (defined below) on each row. It checks how many numbers in the row lie within the range and returns the count.<\/p>\r\n<pre><code class=\"python language-python\"># Solution Without Parallelization\r\n\r\ndef howmany_within_range(row, minimum, maximum):\r\n    \"\"\"Returns how many numbers in `row` lie between `minimum` and `maximum` (inclusive)\"\"\"\r\n    count = 0\r\n    for n in row:\r\n        if minimum &lt;= n &lt;= maximum:\r\n            count = count + 1\r\n    return count\r\n\r\nstart = time()\r\nresults = []\r\nfor row in data:\r\n    results.append(howmany_within_range(row, minimum=4, maximum=8))\r\n\r\nprint(\"Time taken (seconds):\", time() - start)\r\nprint(results[:10])\r\n#&gt; [3, 1, 4, 4, 4, 2, 1, 1, 3, 3]\r\n<\/code><\/pre>\r\n<h2 id=\"5howtoparallelizeanyfunction\">5. 
How to parallelize any function?<\/h2>\r\n<p>The general way to parallelize an operation is to take the function that has to be run many times and make it run in parallel on different processors.<\/p>\r\n<p>To do this, you initialize a <code>Pool<\/code> with n worker processes and pass the function you want to parallelize to one of the <code>Pool<\/code>&#8217;s parallelization methods.<\/p>\r\n<p><code>multiprocessing.Pool()<\/code> provides the <code>apply()<\/code>, <code>map()<\/code> and <code>starmap()<\/code> methods to make any function run in parallel.<\/p>\r\n<p>Nice!<\/p>\r\n<p>So what&#8217;s the difference between <code>apply()<\/code> and <code>map()<\/code>?<\/p>\r\n<p>Both <code>apply<\/code> and <code>map<\/code> take the function to be parallelized as the main argument.<\/p>\r\n<p>The difference is that <code>apply()<\/code> takes an <code>args<\/code> argument, a tuple of the arguments for a single call of the function-to-be-parallelized, whereas <code>map()<\/code> takes a single iterable and calls the function once for each of its elements, passing the element as the only argument.<\/p>\r\n<p>So <code>map()<\/code> is suited to functions that take one argument, but it usually does the job much faster: it splits the iterable into chunks and distributes them across the workers, while each <code>apply()<\/code> call runs a single task and blocks until it finishes.<\/p>\r\n<p>We will get to <code>starmap()<\/code> once we see how to parallelize the <code>howmany_within_range()<\/code> function with <code>apply()<\/code> and <code>map()<\/code>.<\/p>\r\n<h2 id=\"51parallelizingusingpoolapply\">5.1. 
Parallelizing using Pool.apply()<\/h2>\r\n<p>Let&#8217;s parallelize the <code>howmany_within_range()<\/code> function using <code>multiprocessing.Pool()<\/code>.<\/p>\r\n<p>A note before running the examples: on Windows and macOS, new processes are started with the spawn method, which re-imports your main module. There, put the code that creates the <code>Pool<\/code> under an <code>if __name__ == \"__main__\":<\/code> block in a script. Functions defined interactively (for example, in a Jupyter notebook) may not be available to the worker processes in that setting, so define them in an importable module.<\/p>\r\n<pre><code class=\"python language-python\"># Parallelizing using Pool.apply()\r\n\r\nimport multiprocessing as mp\r\n\r\n# Step 1: Init multiprocessing.Pool()\r\npool = mp.Pool(mp.cpu_count())\r\n\r\n# Step 2: Call `howmany_within_range()` on each row with `pool.apply`.\r\n# apply() blocks until each call returns, so the rows are processed one at a time.\r\nresults = [pool.apply(howmany_within_range, args=(row, 4, 8)) for row in data]\r\n\r\n# Step 3: Don't forget to close\r\npool.close()\r\n\r\nprint(results[:10])\r\n#&gt; [3, 1, 4, 4, 4, 2, 1, 1, 3, 3]\r\n<\/code><\/pre>\r\n<h2 id=\"52parallelizingusingpoolmap\">5.2. Parallelizing using Pool.map()<\/h2>\r\n<p><code>Pool.map()<\/code> accepts only one iterable as argument and passes each of its elements to the function as the only argument.<\/p>\r\n<p>So, as a workaround, I create a new function, <code>howmany_within_range_rowonly()<\/code>, that gives the <code>minimum<\/code> and <code>maximum<\/code> parameters default values, so it needs only a row as input.<\/p>\r\n<p>I know this is not a nice use case of <code>map()<\/code>, but it clearly shows how it differs from <code>apply()<\/code>.<\/p>\r\n<pre><code class=\"python language-python\"># Parallelizing using Pool.map()\r\nimport multiprocessing as mp\r\n\r\n# Redefine, with only 1 mandatory argument.\r\ndef howmany_within_range_rowonly(row, minimum=4, maximum=8):\r\n    count = 0\r\n    for n in row:\r\n        if minimum &lt;= n &lt;= maximum:\r\n            count = count + 1\r\n    return count\r\n\r\npool = mp.Pool(mp.cpu_count())\r\n\r\n# map() calls the function once per row, distributing the rows across the workers\r\nresults = pool.map(howmany_within_range_rowonly, data)\r\n\r\npool.close()\r\n\r\nprint(results[:10])\r\n#&gt; [3, 1, 4, 4, 4, 2, 1, 1, 3, 3]\r\n<\/code><\/pre>\r\n<h2 id=\"53parallelizingusingpoolstarmap\">5.3. 
Parallelizing using Pool.starmap()<\/h2>\r\n<p>In the previous example, we had to redefine the <code>howmany_within_range<\/code> function so that a couple of its parameters take default values.<\/p>\r\n<p>Using <code>starmap()<\/code>, you can avoid doing this.<\/p>\r\n<p>How, you ask?<\/p>\r\n<p>Like <code>Pool.map()<\/code>, <code>Pool.starmap()<\/code> also accepts only one iterable as argument, but in <code>starmap()<\/code>, each element in that iterable is also an iterable.<\/p>\r\n<p>In each inner iterable, you provide the arguments for one call of the function-to-be-parallelized, in the same order as the function&#8217;s parameters. Each inner iterable is then unpacked into those arguments during execution.<\/p>\r\n<p>So effectively, <code>Pool.starmap()<\/code> is a version of <code>Pool.map()<\/code> for functions that take multiple arguments.<\/p>\r\n<pre><code class=\"python language-python\"># Parallelizing with Pool.starmap()\r\nimport multiprocessing as mp\r\n\r\npool = mp.Pool(mp.cpu_count())\r\n\r\n# Each tuple (row, 4, 8) is unpacked into howmany_within_range(row, 4, 8)\r\nresults = pool.starmap(howmany_within_range, [(row, 4, 8) for row in data])\r\n\r\npool.close()\r\n\r\nprint(results[:10])\r\n#&gt; [3, 1, 4, 4, 4, 2, 1, 1, 3, 3]\r\n<\/code><\/pre>\r\n<h2 id=\"6asynchronousparallelprocessing\">6. 
Asynchronous Parallel Processing<\/h2>\r\n<p>The asynchronous equivalents <code>apply_async()<\/code>, <code>map_async()<\/code> and <code>starmap_async()<\/code> let you submit tasks without blocking. Each call returns immediately, and the main program can continue while the workers process the tasks. Each task&#8217;s result becomes available as soon as that task finishes, regardless of the order in which the tasks were started.<\/p>\r\n<p>As a result, if you collect results as they finish (for example, with a callback), there is no guarantee that they will be in the same order as the input. <code>map_async()<\/code> and <code>starmap_async()<\/code>, however, still return their results in input order when you call <code>get()<\/code>.<\/p>\r\n<h2 id=\"61parallelizingwithpoolapply_async\">6.1 Parallelizing with Pool.apply_async()<\/h2>\r\n<p><code>apply_async()<\/code> is very similar to <code>apply()<\/code>, except that it returns immediately instead of waiting for the result. You can optionally provide a callback function, which is called with each result as soon as it is ready. Here we use one to store the computed results.<\/p>\r\n<p>However, a caveat of using a callback is that the results are collected in the order in which the tasks finish, which may differ from the order in which they were started.<\/p>\r\n<p>As a workaround, we define a new function, <code>howmany_within_range2()<\/code>, that accepts and returns the iteration number (<code>i<\/code>) as well, and then sort the final results by it.<\/p>\r\n<pre><code class=\"python language-python\"># Parallel processing with Pool.apply_async()\r\n\r\nimport multiprocessing as mp\r\n\r\nresults = []\r\n\r\n# Step 1: Redefine, to accept `i`, the iteration number\r\ndef howmany_within_range2(i, row, minimum, maximum):\r\n    \"\"\"Returns `i` and how many numbers in `row` lie between `minimum` and `maximum`\"\"\"\r\n    count = 0\r\n    for n in row:\r\n        if minimum &lt;= n &lt;= maximum:\r\n            count = count + 1\r\n    return (i, count)\r\n\r\n\r\n# Step 2: Define callback function to collect the output in `results`\r\n# (callbacks run in the main process, so they can append to `results` directly)\r\ndef collect_result(result):\r\n    results.append(result)\r\n\r\n\r\n# Step 3: Create the pool after the worker function is defined (with the fork\r\n# start method, workers only know about functions that existed when they started),\r\n# then submit one task per row\r\npool = mp.Pool(mp.cpu_count())\r\nfor i, row in enumerate(data):\r\n    pool.apply_async(howmany_within_range2, args=(i, row, 4, 8), callback=collect_result)\r\n\r\n# Step 4: 
Close Pool and let all the processes complete\r\npool.close()\r\npool.join()  # blocks until all the worker processes have finished their tasks\r\n\r\n# Step 5: Sort results by iteration number to restore the input order\r\nresults.sort(key=lambda x: x[0])\r\nresults_final = [r for i, r in results]\r\n\r\nprint(results_final[:10])\r\n#&gt; [3, 1, 4, 4, 4, 2, 1, 1, 3, 3]\r\n<\/code><\/pre>\r\n<p>It is possible to use <code>apply_async()<\/code> without providing a <code>callback<\/code> function.<\/p>\r\n<p>In that case, you keep the objects that <code>apply_async()<\/code> returns, which are instances of <code>multiprocessing.pool.AsyncResult<\/code>, and call each object&#8217;s <code>get()<\/code> method. <code>get()<\/code> waits until that task has finished and returns its result.<\/p>\r\n<p>Because you call <code>get()<\/code> in the order in which the tasks were submitted, the results come back in input order, so no sorting is needed.<\/p>\r\n<pre><code class=\"python language-python\"># Parallel processing with Pool.apply_async() without callback function\r\n\r\nimport multiprocessing as mp\r\npool = mp.Pool(mp.cpu_count())\r\n\r\n# call apply_async() without callback\r\nresult_objects = [pool.apply_async(howmany_within_range2, args=(i, row, 4, 8)) for i, row in enumerate(data)]\r\n\r\n# result_objects is a list of AsyncResult objects; get() waits for each result.\r\n# howmany_within_range2 returns (i, count), so keep only the count.\r\nresults = [r.get()[1] for r in result_objects]\r\n\r\npool.close()\r\npool.join()\r\nprint(results[:10])\r\n#&gt; [3, 1, 4, 4, 4, 2, 1, 1, 3, 3]\r\n<\/code><\/pre>\r\n<h2 id=\"62parallelizingwithpoolstarmap_async\">6.2 Parallelizing with Pool.starmap_async()<\/h2>\r\n<p>You saw how <code>apply_async()<\/code> works.<\/p>\r\n<p>Can you imagine and write up an equivalent version for <code>starmap_async<\/code> and <code>map_async<\/code>?<\/p>\r\n<p>The implementation is below anyway.<\/p>\r\n<pre><code class=\"python language-python\"># Parallelizing with Pool.starmap_async()\r\n\r\nimport multiprocessing as mp\r\npool = mp.Pool(mp.cpu_count())\r\n\r\n# starmap_async() returns an AsyncResult immediately; get() waits for the full list\r\nresults = pool.starmap_async(howmany_within_range2, [(i, row, 4, 8) for i, row in 
enumerate(data)]).get()\r\n\r\n# starmap_async() keeps the input order, so just drop the iteration number\r\nresults = [count for i, count in results]\r\n\r\n# With map, use `howmany_within_range_rowonly` instead\r\n# results = pool.map_async(howmany_within_range_rowonly, data).get()\r\n\r\npool.close()\r\nprint(results[:10])\r\n#&gt; [3, 1, 4, 4, 4, 2, 1, 1, 3, 3]\r\n<\/code><\/pre>\r\n<h2 id=\"7howtoparallelizeapandasdataframe\">7. How to Parallelize a Pandas DataFrame?<\/h2>\r\n<p>So far you&#8217;ve seen how to parallelize a function by making it work on lists.<\/p>\r\n<p>But when working on data analysis or machine learning projects, you might want to parallelize work on Pandas DataFrames, which are the most commonly used objects (besides NumPy arrays) for storing tabular data.<\/p>\r\n<p>When it comes to parallelizing a <code>DataFrame<\/code>, you can make the function-to-be-parallelized take as its input:<\/p>\r\n<ul>\r\n<li>one row of the dataframe<\/li>\r\n<li>one column of the dataframe<\/li>\r\n<li>the entire dataframe itself<\/li>\r\n<\/ul>\r\n<p>The first two can be done with the <code>multiprocessing<\/code> module itself.<\/p>\r\n<p>For the last one, that is, passing whole dataframes (or chunks of one) to the function, we will use the <code>pathos<\/code> package. It uses <code>dill<\/code> instead of the standard <code>pickle<\/code> module for serialization internally, and can therefore handle a wider range of objects, such as lambdas.<\/p>\r\n<p>First, let&#8217;s create a sample dataframe and see how to do row-wise and column-wise parallelization.<\/p>\r\n<p>This is similar to calling <code>df.apply()<\/code> with a user-defined function, but in parallel.<\/p>\r\n<pre><code class=\"python language-python\">import numpy as np\r\nimport pandas as pd\r\nimport multiprocessing as mp\r\n\r\ndf = pd.DataFrame(np.random.randint(3, 10, size=[5, 2]))\r\nprint(df.head())\r\n#&gt;    0  1\r\n#&gt; 0  8  5\r\n#&gt; 1  5  3\r\n#&gt; 2  3  4\r\n#&gt; 3  4  4\r\n#&gt; 4  7  9\r\n<\/code><\/pre>\r\n<p>We have a dataframe. 
Let&#8217;s apply the <code>hypotenuse<\/code> function to each row, running 4 processes at a time.<\/p>\r\n<p>To do this, we pass <code>df.itertuples(name=False)<\/code> to <code>pool.imap()<\/code>, a lazier version of <code>map()<\/code> that returns an iterator over the results, in input order.<\/p>\r\n<p>By setting <code>name=False<\/code>, each row of the dataframe is passed to the <code>hypotenuse<\/code> function as a plain tuple instead of a namedtuple. The first element of the tuple is the row index, so <code>row[1]<\/code> and <code>row[2]<\/code> are the values of the two columns.<\/p>\r\n<pre><code class=\"python language-python\"># Row wise Operation\r\ndef hypotenuse(row):\r\n    # row is (index, column 0 value, column 1 value)\r\n    return (row[1]**2 + row[2]**2)**0.5\r\n\r\nwith mp.Pool(4) as pool:\r\n    result = pool.imap(hypotenuse, df.itertuples(name=False), chunksize=10)\r\n    output = [round(x, 2) for x in result]\r\n\r\nprint(output)\r\n#&gt; [9.43, 5.83, 5.0, 5.66, 11.4]\r\n<\/code><\/pre>\r\n<p>That was an example of row-wise parallelization.<\/p>\r\n<p>Let&#8217;s also do a column-wise parallelization.<\/p>\r\n<p>For this, I use <code>df.items()<\/code> to pass each column to the <code>sum_of_squares<\/code> function as a <code>(column name, Series)<\/code> pair. Older pandas versions also offered this as <code>iteritems()<\/code>, which was removed in pandas 2.0.<\/p>\r\n<pre><code class=\"python language-python\"># Column wise Operation\r\ndef sum_of_squares(column):\r\n    # column is a (column name, Series) pair; column[1] holds the values\r\n    return sum([i**2 for i in column[1]])\r\n\r\nwith mp.Pool(2) as pool:\r\n    result = pool.imap(sum_of_squares, df.items(), chunksize=10)\r\n    output = [x for x in result]\r\n\r\nprint(output)\r\n#&gt; [163, 147]\r\n<\/code><\/pre>\r\n<p>Now comes the third part &#8211; parallelizing a function that accepts a Pandas DataFrame, NumPy array, etc. 
Pathos follows the same pattern as <code>multiprocessing<\/code>: Pool &gt; Map &gt; Close &gt; Join &gt; Clear.<\/p>\r\n<p>Check out the <a href=\"https:\/\/github.com\/uqfoundation\/pathos\" target=\"_blank\" rel=\"noopener noreferrer\">pathos docs<\/a> for more info.<\/p>\r\n<pre><code class=\"python language-python\">import numpy as np\r\nimport pandas as pd\r\nimport multiprocessing as mp\r\nfrom pathos.multiprocessing import ProcessingPool as Pool\r\n\r\ndf = pd.DataFrame(np.random.randint(3, 10, size=[500, 2]))\r\n\r\n# Placeholder: replace with your own function that works on a chunk of the dataframe\r\ndef func(df):\r\n    return df.shape\r\n\r\ncores = mp.cpu_count()\r\n\r\n# Split the row positions into one chunk per core, then select those rows\r\ndf_split = [df.iloc[idx] for idx in np.array_split(np.arange(len(df)), cores)]\r\n\r\n# create the multiprocessing pool\r\npool = Pool(cores)\r\n\r\n# process the DataFrame by mapping function to each df across the pool\r\ndf_out = np.vstack(pool.map(func, df_split))\r\n\r\n# close down the pool and join\r\npool.close()\r\npool.join()\r\npool.clear()\r\n<\/code><\/pre>\r\n<p>Thanks to <a href=\"https:\/\/www.reddit.com\/user\/notsoprocoder\">notsoprocoder<\/a> for this contribution based on pathos.<\/p>\r\n<p>If you are familiar with pandas dataframes but want to get hands-on and master them, check out these <a href=\"https:\/\/machinelearningplus.com\/python\/101-pandas-exercises-python\/\">pandas exercises<\/a>.<\/p>\r\n<h2 id=\"8exercises\">8. 
Exercises<\/h2>\r\n<p><strong>Problem 1:<\/strong> Use <code>Pool.apply()<\/code> to get the row-wise common items in <code>list_a<\/code> and <code>list_b<\/code>.<\/p>\r\n<pre><code class=\"python language-python\">list_a = [[1, 2, 3], [5, 6, 7, 8], [10, 11, 12], [20, 21]]\r\nlist_b = [[2, 3, 4, 5], [6, 9, 10], [11, 12, 13, 14], [21, 24, 25]]\r\n<\/code><\/pre>\r\n<details class=\"blogv4-expand\"><summary class=\"blogv4-expand__toggle\">Show Solution<\/summary><div class=\"blogv4-expand__body\">\r\n<pre><code class=\"python language-python\">import multiprocessing as mp\r\n\r\nlist_a = [[1, 2, 3], [5, 6, 7, 8], [10, 11, 12], [20, 21]]\r\nlist_b = [[2, 3, 4, 5], [6, 9, 10], [11, 12, 13, 14], [21, 24, 25]]\r\n\r\ndef get_commons(list_1, list_2):\r\n    return list(set(list_1).intersection(list_2))\r\n\r\npool = mp.Pool(mp.cpu_count())\r\nresults = [pool.apply(get_commons, args=(l1, l2)) for l1, l2 in zip(list_a, list_b)]\r\npool.close()\r\nprint(results[:10])\r\n<\/code><\/pre>\r\n<\/div><\/details>\r\n<p><strong>Problem 2:<\/strong> Use <code>Pool.map()<\/code> to run the following Python scripts in parallel. Script names: &#8216;script1.py&#8217;, &#8216;script2.py&#8217; and &#8216;script3.py&#8217;.<\/p>\r\n<details class=\"blogv4-expand\"><summary class=\"blogv4-expand__toggle\">Show Solution<\/summary><div class=\"blogv4-expand__body\">\r\n<pre><code class=\"python language-python\">import subprocess\r\nimport sys\r\nimport multiprocessing as mp\r\n\r\nprocesses = ('script1.py', 'script2.py', 'script3.py')\r\n\r\ndef run_python(process):\r\n    # Run one script with the same Python interpreter that runs this code\r\n    subprocess.run([sys.executable, process])\r\n\r\npool = mp.Pool(processes=3)\r\npool.map(run_python, processes)\r\npool.close()\r\npool.join()\r\n<\/code><\/pre>\r\n<\/div><\/details>\r\n<p><strong>Problem 3:<\/strong> Normalize each row of a 2D array (list of lists) so its values vary between 0 and 1.<\/p>\r\n<pre><code class=\"python language-python\">list_a = [[2, 3, 4, 5], [6, 9, 10, 12], [11, 12, 13, 14], [21, 24, 25, 26]]\r\n<\/code><\/pre>\r\n<details class=\"blogv4-expand\"><summary class=\"blogv4-expand__toggle\">Show Solution<\/summary><div class=\"blogv4-expand__body\">\r\n<pre><code class=\"python language-python\">import multiprocessing as mp\r\n\r\nlist_a = [[2, 3, 4, 5], [6, 9, 10, 12], [11, 12, 13, 14], [21, 24, 25, 26]]\r\n\r\ndef normalize(mylist):\r\n    # Min-max scaling: the row's minimum maps to 0 and its maximum to 1\r\n    mini = min(mylist)\r\n    maxi = max(mylist)\r\n    return [(i - mini)\/(maxi-mini) for i in mylist]\r\n\r\npool = mp.Pool(mp.cpu_count())\r\nresults = [pool.apply(normalize, args=(l1, )) for l1 in list_a]\r\npool.close()\r\nprint(results[:10])\r\n<\/code><\/pre>\r\n<\/div><\/details>\r\n<h2 id=\"9conclusion\">9. Conclusion<\/h2>\r\n<p>Hope you were able to solve the above exercises; congratulations if you did! In this post, we saw the overall procedure and various ways to implement parallel processing using the multiprocessing module. 
The procedure described above is pretty much the same even if you work on larger machines with many more processors, where you may reap the real speed benefits of parallel processing. Happy coding, and I&#8217;ll see you in the <a href=\"https:\/\/machinelearningplus.com\/python\/dask-tutorial\" target=\"_blank\" rel=\"noopener noreferrer\">next one<\/a>!<\/p>\r\n<h2>Recommended Posts<\/h2>\r\n<p><a href=\"https:\/\/machinelearningplus.com\/python\/dask-tutorial\/\" target=\"_blank\" rel=\"noopener noreferrer\">Dask Tutorial &#8211; How to handle large data in Python<\/a> <a href=\"https:\/\/machinelearningplus.com\/python-json-guide\/\" target=\"_blank\" rel=\"noopener noreferrer\">Python JSON Guide<\/a> <a href=\"https:\/\/machinelearningplus.com\/python\/python-regex-tutorial-examples\/\" target=\"_blank\" rel=\"noopener noreferrer\">Python RegEx Tutorial<\/a> <a href=\"https:\/\/machinelearningplus.com\/python\/python-logging-guide\/\" target=\"_blank\" rel=\"noopener noreferrer\">Python Logging Guide<\/a> <a href=\"https:\/\/machinelearningplus.com\/python-collections-guide\/\" target=\"_blank\" rel=\"noopener noreferrer\">Python Collections Guide<\/a> <a href=\"https:\/\/machinelearningplus.com\/python\/requests-in-python\/\" target=\"_blank\" rel=\"noopener noreferrer\">Guide to Python Requests Module<\/a><\/p>","protected":false},"excerpt":{"rendered":"<p>Parallel processing is a mode of operation where the task is executed simultaneously in multiple processors in the same computer. It is meant to reduce the overall processing time. In this tutorial, you&#8217;ll understand the procedure to parallelize any typical logic using python&#8217;s multiprocessing module. 1. 
Introduction Parallel processing is a mode of operation where [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":1327,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"site-sidebar-layout":"default","site-content-layout":"default","ast-site-content-layout":"default","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","ast-disable-related-posts":"","theme-transparent-header-meta":"default","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"footnotes":""},"categories":[21],"tags":[58,57,22],"class_list":["post-1315","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-python","tag-multiprocessing","tag-parallel-processing","tag-python","ads-data-visualization-with-pandas"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.4 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Parallel Processing in Python - A Practical Guide with Examples | ML+<\/title>\n<meta name=\"description\" content=\"Parallel processing is when the task is executed simultaneously in multiple processors. 
In this tutorial, you&#039;ll understand the procedure to parallelize any typical logic using python&#039;s multiprocessing module.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/localhost:8080\/python\/parallel-processing-python\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Parallel Processing in Python - A Practical Guide with Examples | ML+\" \/>\n<meta property=\"og:description\" content=\"Parallel processing is when the task is executed simultaneously in multiple processors. In this tutorial, you&#039;ll understand the procedure to parallelize any typical logic using python&#039;s multiprocessing module.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/localhost:8080\/python\/parallel-processing-python\/\" \/>\n<meta property=\"og:site_name\" content=\"machinelearningplus\" \/>\n<meta property=\"article:author\" content=\"https:\/\/www.facebook.com\/rtipaday\/\" \/>\n<meta property=\"article:published_time\" content=\"2018-10-31T05:39:44+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2022-04-20T09:34:25+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/localhost:8080\/wp-content\/uploads\/2018\/10\/parallel_processing_feature.png\" \/>\n\t<meta property=\"og:image:width\" content=\"560\" \/>\n\t<meta property=\"og:image:height\" content=\"315\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Selva Prabhakaran\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@https:\/\/twitter.com\/R_Programming\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Selva Prabhakaran\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"11 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"TechArticle\",\"@id\":\"https:\\\/\\\/localhost:8080\\\/python\\\/parallel-processing-python\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/localhost:8080\\\/python\\\/parallel-processing-python\\\/\"},\"author\":{\"name\":\"Selva Prabhakaran\",\"@id\":\"https:\\\/\\\/machinelearningplus.com\\\/#\\\/schema\\\/person\\\/510885c0515804366fa644c38258391e\"},\"headline\":\"Parallel Processing in Python &#8211; A Practical Guide with Examples\",\"datePublished\":\"2018-10-31T05:39:44+00:00\",\"dateModified\":\"2022-04-20T09:34:25+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/localhost:8080\\\/python\\\/parallel-processing-python\\\/\"},\"wordCount\":1419,\"commentCount\":12,\"publisher\":{\"@id\":\"https:\\\/\\\/machinelearningplus.com\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/localhost:8080\\\/python\\\/parallel-processing-python\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/machinelearningplus.com\\\/wp-content\\\/uploads\\\/2018\\\/10\\\/parallel_processing_feature.png\",\"keywords\":[\"Multiprocessing\",\"Parallel Processing\",\"Python\"],\"articleSection\":[\"Python\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/localhost:8080\\\/python\\\/parallel-processing-python\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/localhost:8080\\\/python\\\/parallel-processing-python\\\/\",\"url\":\"https:\\\/\\\/localhost:8080\\\/python\\\/parallel-processing-python\\\/\",\"name\":\"Parallel Processing in Python - A Practical Guide with Examples | 
ML+\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/machinelearningplus.com\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/localhost:8080\\\/python\\\/parallel-processing-python\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/localhost:8080\\\/python\\\/parallel-processing-python\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/machinelearningplus.com\\\/wp-content\\\/uploads\\\/2018\\\/10\\\/parallel_processing_feature.png\",\"datePublished\":\"2018-10-31T05:39:44+00:00\",\"dateModified\":\"2022-04-20T09:34:25+00:00\",\"description\":\"Parallel processing is when the task is executed simultaneously in multiple processors. In this tutorial, you'll understand the procedure to parallelize any typical logic using python's multiprocessing module.\",\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/localhost:8080\\\/python\\\/parallel-processing-python\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/localhost:8080\\\/python\\\/parallel-processing-python\\\/#primaryimage\",\"url\":\"https:\\\/\\\/machinelearningplus.com\\\/wp-content\\\/uploads\\\/2018\\\/10\\\/parallel_processing_feature.png\",\"contentUrl\":\"https:\\\/\\\/machinelearningplus.com\\\/wp-content\\\/uploads\\\/2018\\\/10\\\/parallel_processing_feature.png\",\"width\":560,\"height\":315,\"caption\":\"parallel processing python\"},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/machinelearningplus.com\\\/#website\",\"url\":\"https:\\\/\\\/machinelearningplus.com\\\/\",\"name\":\"machinelearningplus\",\"description\":\"Learn Data Science (AI \\\/ ML) 
Online\",\"publisher\":{\"@id\":\"https:\\\/\\\/machinelearningplus.com\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/machinelearningplus.com\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/machinelearningplus.com\\\/#organization\",\"name\":\"machinelearningplus\",\"url\":\"https:\\\/\\\/machinelearningplus.com\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/machinelearningplus.com\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/machinelearningplus.com\\\/wp-content\\\/uploads\\\/2022\\\/05\\\/MachineLearningplus-logo.svg\",\"contentUrl\":\"https:\\\/\\\/machinelearningplus.com\\\/wp-content\\\/uploads\\\/2022\\\/05\\\/MachineLearningplus-logo.svg\",\"width\":348,\"height\":36,\"caption\":\"machinelearningplus\"},\"image\":{\"@id\":\"https:\\\/\\\/machinelearningplus.com\\\/#\\\/schema\\\/logo\\\/image\\\/\"}},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/machinelearningplus.com\\\/#\\\/schema\\\/person\\\/510885c0515804366fa644c38258391e\",\"name\":\"Selva Prabhakaran\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/machinelearningplus.com\\\/wp-content\\\/litespeed\\\/avatar\\\/a994280177da541405c016f593e86ea7.jpg?ver=1776363207\",\"url\":\"https:\\\/\\\/machinelearningplus.com\\\/wp-content\\\/litespeed\\\/avatar\\\/a994280177da541405c016f593e86ea7.jpg?ver=1776363207\",\"contentUrl\":\"https:\\\/\\\/machinelearningplus.com\\\/wp-content\\\/litespeed\\\/avatar\\\/a994280177da541405c016f593e86ea7.jpg?ver=1776363207\",\"caption\":\"Selva Prabhakaran\"},\"description\":\"Selva is an experienced Data Scientist and leader, specializing in executing AI projects for large companies. 
Selva started machinelearningplus to make Data Science \\\/ ML \\\/ AI accessible to everyone. The website enjoys 4 Million+ readership. His courses, lessons, and videos are loved by hundreds of thousands of students and practitioners.\",\"sameAs\":[\"https:\\\/\\\/localhost:8080\\\/\",\"https:\\\/\\\/www.facebook.com\\\/rtipaday\\\/\",\"https:\\\/\\\/x.com\\\/https:\\\/\\\/twitter.com\\\/R_Programming\"],\"url\":\"https:\\\/\\\/machinelearningplus.com\\\/author\\\/selva86\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Parallel Processing in Python - A Practical Guide with Examples | ML+","description":"Parallel processing is when the task is executed simultaneously in multiple processors. In this tutorial, you'll understand the procedure to parallelize any typical logic using python's multiprocessing module.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/localhost:8080\/python\/parallel-processing-python\/","og_locale":"en_US","og_type":"article","og_title":"Parallel Processing in Python - A Practical Guide with Examples | ML+","og_description":"Parallel processing is when the task is executed simultaneously in multiple processors. 
In this tutorial, you'll understand the procedure to parallelize any typical logic using python's multiprocessing module.","og_url":"https:\/\/localhost:8080\/python\/parallel-processing-python\/","og_site_name":"machinelearningplus","article_author":"https:\/\/www.facebook.com\/rtipaday\/","article_published_time":"2018-10-31T05:39:44+00:00","article_modified_time":"2022-04-20T09:34:25+00:00","og_image":[{"width":560,"height":315,"url":"https:\/\/localhost:8080\/wp-content\/uploads\/2018\/10\/parallel_processing_feature.png","type":"image\/png"}],"author":"Selva Prabhakaran","twitter_card":"summary_large_image","twitter_creator":"@https:\/\/twitter.com\/R_Programming","twitter_misc":{"Written by":"Selva Prabhakaran","Est. reading time":"11 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"TechArticle","@id":"https:\/\/localhost:8080\/python\/parallel-processing-python\/#article","isPartOf":{"@id":"https:\/\/localhost:8080\/python\/parallel-processing-python\/"},"author":{"name":"Selva Prabhakaran","@id":"https:\/\/machinelearningplus.com\/#\/schema\/person\/510885c0515804366fa644c38258391e"},"headline":"Parallel Processing in Python &#8211; A Practical Guide with Examples","datePublished":"2018-10-31T05:39:44+00:00","dateModified":"2022-04-20T09:34:25+00:00","mainEntityOfPage":{"@id":"https:\/\/localhost:8080\/python\/parallel-processing-python\/"},"wordCount":1419,"commentCount":12,"publisher":{"@id":"https:\/\/machinelearningplus.com\/#organization"},"image":{"@id":"https:\/\/localhost:8080\/python\/parallel-processing-python\/#primaryimage"},"thumbnailUrl":"https:\/\/machinelearningplus.com\/wp-content\/uploads\/2018\/10\/parallel_processing_feature.png","keywords":["Multiprocessing","Parallel 
Processing","Python"],"articleSection":["Python"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/localhost:8080\/python\/parallel-processing-python\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/localhost:8080\/python\/parallel-processing-python\/","url":"https:\/\/localhost:8080\/python\/parallel-processing-python\/","name":"Parallel Processing in Python - A Practical Guide with Examples | ML+","isPartOf":{"@id":"https:\/\/machinelearningplus.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/localhost:8080\/python\/parallel-processing-python\/#primaryimage"},"image":{"@id":"https:\/\/localhost:8080\/python\/parallel-processing-python\/#primaryimage"},"thumbnailUrl":"https:\/\/machinelearningplus.com\/wp-content\/uploads\/2018\/10\/parallel_processing_feature.png","datePublished":"2018-10-31T05:39:44+00:00","dateModified":"2022-04-20T09:34:25+00:00","description":"Parallel processing is when the task is executed simultaneously in multiple processors. 
In this tutorial, you'll understand the procedure to parallelize any typical logic using python's multiprocessing module.","inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/localhost:8080\/python\/parallel-processing-python\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/localhost:8080\/python\/parallel-processing-python\/#primaryimage","url":"https:\/\/machinelearningplus.com\/wp-content\/uploads\/2018\/10\/parallel_processing_feature.png","contentUrl":"https:\/\/machinelearningplus.com\/wp-content\/uploads\/2018\/10\/parallel_processing_feature.png","width":560,"height":315,"caption":"parallel processing python"},{"@type":"WebSite","@id":"https:\/\/machinelearningplus.com\/#website","url":"https:\/\/machinelearningplus.com\/","name":"machinelearningplus","description":"Learn Data Science (AI \/ ML) Online","publisher":{"@id":"https:\/\/machinelearningplus.com\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/machinelearningplus.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/machinelearningplus.com\/#organization","name":"machinelearningplus","url":"https:\/\/machinelearningplus.com\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/machinelearningplus.com\/#\/schema\/logo\/image\/","url":"https:\/\/machinelearningplus.com\/wp-content\/uploads\/2022\/05\/MachineLearningplus-logo.svg","contentUrl":"https:\/\/machinelearningplus.com\/wp-content\/uploads\/2022\/05\/MachineLearningplus-logo.svg","width":348,"height":36,"caption":"machinelearningplus"},"image":{"@id":"https:\/\/machinelearningplus.com\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/machinelearningplus.com\/#\/schema\/person\/510885c0515804366fa644c38258391e","name":"Selva 
Prabhakaran","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/machinelearningplus.com\/wp-content\/litespeed\/avatar\/a994280177da541405c016f593e86ea7.jpg?ver=1776363207","url":"https:\/\/machinelearningplus.com\/wp-content\/litespeed\/avatar\/a994280177da541405c016f593e86ea7.jpg?ver=1776363207","contentUrl":"https:\/\/machinelearningplus.com\/wp-content\/litespeed\/avatar\/a994280177da541405c016f593e86ea7.jpg?ver=1776363207","caption":"Selva Prabhakaran"},"description":"Selva is an experienced Data Scientist and leader, specializing in executing AI projects for large companies. Selva started machinelearningplus to make Data Science \/ ML \/ AI accessible to everyone. The website enjoys 4 Million+ readership. His courses, lessons, and videos are loved by hundreds of thousands of students and practitioners.","sameAs":["https:\/\/localhost:8080\/","https:\/\/www.facebook.com\/rtipaday\/","https:\/\/x.com\/https:\/\/twitter.com\/R_Programming"],"url":"https:\/\/machinelearningplus.com\/author\/selva86\/"}]}},"_links":{"self":[{"href":"https:\/\/machinelearningplus.com\/wp-json\/wp\/v2\/posts\/1315","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/machinelearningplus.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/machinelearningplus.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/machinelearningplus.com\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/machinelearningplus.com\/wp-json\/wp\/v2\/comments?post=1315"}],"version-history":[{"count":0,"href":"https:\/\/machinelearningplus.com\/wp-json\/wp\/v2\/posts\/1315\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/machinelearningplus.com\/wp-json\/wp\/v2\/media\/1327"}],"wp:attachment":[{"href":"https:\/\/machinelearningplus.com\/wp-json\/wp\/v2\/media?parent=1315"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/machinelearningplus.com\/wp-json\/wp\/v2\/categories
?post=1315"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/machinelearningplus.com\/wp-json\/wp\/v2\/tags?post=1315"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}