{"id":10836,"date":"2021-04-28T13:26:22","date_gmt":"2021-04-28T07:56:22","guid":{"rendered":"http:\/\/www.pythonpool.com\/?p=10836"},"modified":"2021-05-01T17:16:28","modified_gmt":"2021-05-01T11:46:28","slug":"python-flatmap","status":"publish","type":"post","link":"https:\/\/www.pythonpool.com\/python-flatmap\/","title":{"rendered":"How to use the Pyspark flatMap() function in Python?"},"content":{"rendered":"\n<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_74 counter-hierarchy ez-toc-counter ez-toc-transparent ez-toc-container-direction\">\n<div class=\"ez-toc-title-container\">\n<p class=\"ez-toc-title\" style=\"cursor:inherit\">Contents<\/p>\n<span class=\"ez-toc-title-toggle\"><a href=\"#\" class=\"ez-toc-pull-right ez-toc-btn ez-toc-btn-xs ez-toc-btn-default ez-toc-toggle\" aria-label=\"Toggle Table of Content\"><span class=\"ez-toc-js-icon-con\"><span class=\"\"><span class=\"eztoc-hide\" style=\"display:none;\">Toggle<\/span><span class=\"ez-toc-icon-toggle-span\"><svg style=\"fill: #990303;color:#990303\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" class=\"list-377408\" width=\"20px\" height=\"20px\" viewBox=\"0 0 24 24\" fill=\"none\"><path d=\"M6 6H4v2h2V6zm14 0H8v2h12V6zM4 11h2v2H4v-2zm16 0H8v2h12v-2zM4 16h2v2H4v-2zm16 0H8v2h12v-2z\" fill=\"currentColor\"><\/path><\/svg><svg style=\"fill: #990303;color:#990303\" class=\"arrow-unsorted-368013\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"10px\" height=\"10px\" viewBox=\"0 0 24 24\" version=\"1.2\" baseProfile=\"tiny\"><path d=\"M18.2 9.3l-6.2-6.3-6.2 6.3c-.2.2-.3.4-.3.7s.1.5.3.7c.2.2.4.3.7.3h11c.3 0 .5-.1.7-.3.2-.2.3-.5.3-.7s-.1-.5-.3-.7zM5.8 14.7l6.2 6.3 6.2-6.3c.2-.2.3-.5.3-.7s-.1-.5-.3-.7c-.2-.2-.4-.3-.7-.3h-11c-.3 0-.5.1-.7.3-.2.2-.3.5-.3.7s.1.5.3.7z\"\/><\/svg><\/span><\/span><\/span><\/a><\/span><\/div>\n<nav><ul class='ez-toc-list ez-toc-list-level-1 eztoc-toggle-hide-by-default' ><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-1\" 
href=\"https:\/\/www.pythonpool.com\/python-flatmap\/#Introduction\" >Introduction<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/www.pythonpool.com\/python-flatmap\/#What_is_RDD\" >What is RDD?<\/a><ul class='ez-toc-list-level-3' ><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/www.pythonpool.com\/python-flatmap\/#Example_for_RDD\" >Example for RDD<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/www.pythonpool.com\/python-flatmap\/#What_is_flatMap_function\" >What is flatMap() function?<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-5\" href=\"https:\/\/www.pythonpool.com\/python-flatmap\/#Syntax\" >Syntax<\/a><ul class='ez-toc-list-level-3' ><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-6\" href=\"https:\/\/www.pythonpool.com\/python-flatmap\/#Example_of_Python_flatMap_function\" >Example of Python flatMap() function<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-7\" href=\"https:\/\/www.pythonpool.com\/python-flatmap\/#Complete_Python_PySpark_flatMap_function_example\" >Complete Python PySpark flatMap() function example<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-8\" href=\"https:\/\/www.pythonpool.com\/python-flatmap\/#Some_more_example_of_flatMap_function\" >Some more example of flatMap() function<\/a><ul class='ez-toc-list-level-3' ><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-9\" href=\"https:\/\/www.pythonpool.com\/python-flatmap\/#1_Using_range_in_flatmap_function\" >1. 
Using range() in flatmap() function<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-10\" href=\"https:\/\/www.pythonpool.com\/python-flatmap\/#2_Making_pairs_with_using_lambda_function\" >2. Making pairs with using lambda() function<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-11\" href=\"https:\/\/www.pythonpool.com\/python-flatmap\/#Difference_between_map_and_flatMap_in_python\" >Difference between map() and flatMap() in python<\/a><ul class='ez-toc-list-level-3' ><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-12\" href=\"https:\/\/www.pythonpool.com\/python-flatmap\/#map\" >map():<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-13\" href=\"https:\/\/www.pythonpool.com\/python-flatmap\/#flatMap\" >flatMap():<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-14\" href=\"https:\/\/www.pythonpool.com\/python-flatmap\/#Conclusion\" >Conclusion<\/a><\/li><\/ul><\/nav><\/div>\n<h2 class=\"wp-block-heading\" id=\"h-introduction\"><span class=\"ez-toc-section\" id=\"Introduction\"><\/span>Introduction<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>Sometimes each record in a dataset expands into several values, and we need those values flattened into a single collection. In this tutorial, we will discuss the <strong>flatMap() function in the PySpark module. <\/strong>The flatMap() function applies a function to every element of an RDD and flattens the results into a new RDD.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-what-is-rdd\"><span class=\"ez-toc-section\" id=\"What_is_RDD\"><\/span>What is RDD?<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>RDD stands for Resilient Distributed Dataset. It is the basic data abstraction of Spark. 
Each dataset is divided into logical partitions, which can be computed on different nodes of the cluster and processed in parallel. <\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-example-for-rdd\"><span class=\"ez-toc-section\" id=\"Example_for_RDD\"><\/span>Example for RDD<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>In this example, we only create an RDD and print its contents; flatMap() is not used yet. Firstly, we will take the input data. Then, the <strong>sparkContext.parallelize()<\/strong> method is used to create a parallelized collection (an RDD). Through this, Spark can distribute the data across multiple nodes instead of depending on a single node to process it. Then, we will bring the elements back to the driver with collect() and print them with the <a href=\"http:\/\/www.pythonpool.com\/python-help-function\/\" target=\"_blank\" rel=\"noreferrer noopener\"><span style=\"text-decoration: underline\">help<\/span><\/a> of a for loop. 
Let us look at the example to understand the concept in detail.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><strong>NOTE: You need to install PySpark first (for example, with pip install pyspark) to run these programs.<\/strong><\/code><\/pre>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nfrom pyspark.sql import SparkSession\n\n# Create (or reuse) a SparkSession\nspark = SparkSession.builder.appName('SparkByExamples.com').getOrCreate()\ninput_data = &#x5B;&quot;Python Pool&quot;,\n        &quot;Latracal Solutions&quot;,\n        &quot;Python pool is best&quot;,\n        &quot;Basic command in python&quot;]\n# Distribute the list as an RDD\nrdd = spark.sparkContext.parallelize(input_data)\n# collect() brings the elements back to the driver\nfor ele in rdd.collect():\n    print(ele)\n<\/pre><\/div>\n\n\n<p class=\"has-medium-font-size\"><strong>Output:<\/strong><\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" width=\"916\" height=\"127\" src=\"http:\/\/www.pythonpool.com\/wp-content\/uploads\/2021\/04\/1-8.png\" alt=\"Example for RDD\" class=\"wp-image-10837\" srcset=\"https:\/\/www.pythonpool.com\/wp-content\/uploads\/2021\/04\/1-8.png 916w, https:\/\/www.pythonpool.com\/wp-content\/uploads\/2021\/04\/1-8-300x42.png 300w, https:\/\/www.pythonpool.com\/wp-content\/uploads\/2021\/04\/1-8-768x106.png 768w\" sizes=\"(max-width: 916px) 100vw, 916px\" \/><\/figure>\n\n\n\n<p class=\"has-medium-font-size\"><strong>Explanation:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Firstly, we store the input strings in input_data.<\/li><li>Then, we call the sparkContext.parallelize() method to create an RDD from them.<\/li><li>With a for loop over rdd.collect(), we print each element of the RDD.<\/li><li>Hence, the output shows each input string on its own line.<\/li><\/ul>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-what-is-flatmap-function\"><span class=\"ez-toc-section\" id=\"What_is_flatMap_function\"><\/span>What is flatMap() function?<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p><strong>The flatMap() function in the PySpark 
module is a transformation that applies a function to every element of an RDD and flattens the results, returning a new RDD. In PySpark it is a method of RDD; a DataFrame has no flatMap() method, so array\/map DataFrame columns are usually flattened with explode() instead.<\/strong> <\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-syntax\"><span class=\"ez-toc-section\" id=\"Syntax\"><\/span>Syntax<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<pre class=\"wp-block-code\"><code>RDD.flatMap(f, preservesPartitioning=False)<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-example-of-python-flatmap-function\"><span class=\"ez-toc-section\" id=\"Example_of_Python_flatMap_function\"><\/span>Example of Python flatMap() function<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>In this example, you will see the flatMap() function used with a lambda function that splits each string into words. Firstly, we will take the input data. Then, the sparkContext.parallelize() method is used to create a parallelized collection, so Spark can distribute the data across multiple nodes instead of depending on a single node to process it. Then, we will apply the flatMap() function, passing it the lambda function. And then, we will print each element of the flattened RDD with the help of a for loop. 
Let us look at the example to understand the concept in detail.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nfrom pyspark.sql import SparkSession\n\nspark = SparkSession.builder.appName('SparkByExamples.com').getOrCreate()\ninput_data = &#x5B;&quot;Python Pool&quot;,\n        &quot;Latracal Solutions&quot;,\n        &quot;Python pool is best&quot;,\n        &quot;Basic command in python&quot;]\nrdd = spark.sparkContext.parallelize(input_data)\n# Split every string into words and flatten the word lists into one RDD\nrdd2 = rdd.flatMap(lambda x: x.split(&quot; &quot;))\nfor ele in rdd2.collect():\n    print(ele)\n<\/pre><\/div>\n\n\n<p class=\"has-medium-font-size\"><strong>Output:<\/strong><\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" width=\"905\" height=\"347\" src=\"http:\/\/www.pythonpool.com\/wp-content\/uploads\/2021\/04\/2-3.png\" alt=\"Example of Python flatMap() function\" class=\"wp-image-10838\" srcset=\"https:\/\/www.pythonpool.com\/wp-content\/uploads\/2021\/04\/2-3.png 905w, https:\/\/www.pythonpool.com\/wp-content\/uploads\/2021\/04\/2-3-300x115.png 300w, https:\/\/www.pythonpool.com\/wp-content\/uploads\/2021\/04\/2-3-768x294.png 768w\" sizes=\"(max-width: 905px) 100vw, 905px\" \/><\/figure>\n\n\n\n<p class=\"has-medium-font-size\"><strong>Explanation:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Firstly, we store the input strings in input_data.<\/li><li>Then, we call the sparkContext.parallelize() method to create an RDD.<\/li><li>After that, we apply the flatMap() function with a lambda function that splits each string on spaces.<\/li><li>At last, we print each element with the help of a for loop.<\/li><li>Hence, the output shows one word per line.<\/li><\/ul>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-complete-python-pyspark-flatmap-function-example\"><span class=\"ez-toc-section\" id=\"Complete_Python_PySpark_flatMap_function_example\"><\/span>Complete Python PySpark flatMap() function example<span 
class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>This example combines the two previous programs: it prints the original RDD first and then the flattened RDD, so the two can be compared. Firstly, we will take the input data. Then, the sparkContext.parallelize() method is used to create a parallelized collection, so Spark can distribute the data across multiple nodes instead of depending on a single node to process it. Then, we will apply the flatMap() function, passing it a lambda function that splits each string into words. And then, we will print each element of the result with the help of a for loop. Let us look at the example to understand the concept in detail.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nfrom pyspark.sql import SparkSession\n\nspark = SparkSession.builder.appName('SparkByExamples.com').getOrCreate()\ninput_data = &#x5B;&quot;Python Pool&quot;,\n        &quot;Latracal Solutions&quot;,\n        &quot;Python pool is best&quot;,\n        &quot;Basic command in python&quot;]\nrdd = spark.sparkContext.parallelize(input_data)\n# Print the original RDD: one string per element\nfor element in rdd.collect():\n    print(element)\nprint(&quot;\\n&quot;)\n# Print the flattened RDD: one word per element\nrdd2 = rdd.flatMap(lambda x: x.split(&quot; &quot;))\nfor ele in rdd2.collect():\n    print(ele)\n<\/pre><\/div>\n\n\n<p class=\"has-medium-font-size\"><strong>Output:<\/strong><\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" width=\"904\" height=\"491\" src=\"http:\/\/www.pythonpool.com\/wp-content\/uploads\/2021\/04\/3-5.png\" alt=\"Complete Python PySpark flatMap() function example\" class=\"wp-image-10839\" srcset=\"https:\/\/www.pythonpool.com\/wp-content\/uploads\/2021\/04\/3-5.png 904w, https:\/\/www.pythonpool.com\/wp-content\/uploads\/2021\/04\/3-5-300x163.png 300w, https:\/\/www.pythonpool.com\/wp-content\/uploads\/2021\/04\/3-5-768x417.png 768w\" sizes=\"(max-width: 904px) 100vw, 904px\" \/><\/figure>\n\n\n\n<p 
class=\"has-medium-font-size\"><strong>Explanation:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\" id=\"block-5cec7bdb-740a-4c91-9f3f-c41e14f2bd7a\"><li>Firstly, we store the input strings in input_data.<\/li><li>Then, we call the sparkContext.parallelize() method to create an RDD.<\/li><li>With the first for loop, we print the elements of the original RDD.<\/li><li>After that, we apply the flatMap() function with a lambda function that splits each string on spaces.<\/li><li>At last, we print each element of the flattened RDD with the second for loop.<\/li><li>Hence, the output shows the original strings followed by the individual words.<\/li><\/ul>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-some-more-example-of-flatmap-function\"><span class=\"ez-toc-section\" id=\"Some_more_example_of_flatMap_function\"><\/span>Some more example of flatMap() function<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-1-using-range-in-flatmap-function\"><span class=\"ez-toc-section\" id=\"1_Using_range_in_flatmap_function\"><\/span>1. Using range() in flatmap() function<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>In this example, you will see the flatMap() function used with a lambda function and the range() function. The <strong>sparkContext.parallelize()<\/strong> method is used to create a parallelized collection, so Spark can distribute the data across multiple nodes instead of depending on a single node to process it. Then, we will apply the flatMap() function, passing it a lambda function that calls range(). 
Let us look at the example to understand the concept in detail.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nfrom pyspark.sql import SparkSession\n\nspark = SparkSession.builder.appName('SparkByExamples.com').getOrCreate()\nR = spark.sparkContext.parallelize(&#x5B;2, 3, 4])\noutput = R.flatMap(lambda x: range(1, x)).collect()\nprint(output)\n<\/pre><\/div>\n\n\n<p class=\"has-medium-font-size\"><strong>Output:<\/strong><\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" width=\"896\" height=\"86\" src=\"http:\/\/www.pythonpool.com\/wp-content\/uploads\/2021\/04\/4-3.png\" alt=\"Using range() in Python flatmap() function\" class=\"wp-image-10840\" srcset=\"https:\/\/www.pythonpool.com\/wp-content\/uploads\/2021\/04\/4-3.png 896w, https:\/\/www.pythonpool.com\/wp-content\/uploads\/2021\/04\/4-3-300x29.png 300w, https:\/\/www.pythonpool.com\/wp-content\/uploads\/2021\/04\/4-3-768x74.png 768w\" sizes=\"(max-width: 896px) 100vw, 896px\" \/><\/figure>\n\n\n\n<p class=\"has-medium-font-size\"><strong>Explanation:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Firstly, we create an RDD with the sparkContext.parallelize() method.<\/li><li>Then, we apply the flatMap() function.<\/li><li>Inside it, the lambda function calls range(1, x) for each element x.<\/li><li>Then we print the output.<\/li><li>range(1, x) yields the integers from 1 up to, but not including, x.<\/li><li>So for x=2 it yields 1; for x=3 it yields 1 and 2; and for x=4 it yields 1, 2 and 3. flatMap() flattens these into a single list.<\/li><li>Hence, you can see the output, and if you want the values in sorted order, you can apply the sorted() function to the result.<\/li><\/ul>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-2-making-pairs-with-using-lambda-function\"><span class=\"ez-toc-section\" id=\"2_Making_pairs_with_using_lambda_function\"><\/span>2. 
Making pairs with using lambda() function<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>In this example, you will see the flatMap() function used with a <a href=\"http:\/\/python.org\/search\/?q=lambda\">lambda function<\/a> that returns a list of pairs. The <strong>sparkContext.parallelize()<\/strong> method is used to create a parallelized collection, so Spark can distribute the data across multiple nodes instead of depending on a single node to process it. Then, we will apply the flatMap() function, passing it the lambda function. Let us look at the example to understand the concept in detail.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nfrom pyspark.sql import SparkSession\n\nspark = SparkSession.builder.appName('SparkByExamples.com').getOrCreate()\nR = spark.sparkContext.parallelize(&#x5B;2, 3, 4])\noutput = R.flatMap(lambda x: &#x5B;(x, x), (x, x)]).collect()\nprint(output)\n<\/pre><\/div>\n\n\n<p class=\"has-medium-font-size\"><strong>Output:<\/strong><\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" width=\"909\" height=\"58\" src=\"http:\/\/www.pythonpool.com\/wp-content\/uploads\/2021\/04\/6-1.png\" alt=\"Making pairs with using lambda() function\" class=\"wp-image-10841\" srcset=\"https:\/\/www.pythonpool.com\/wp-content\/uploads\/2021\/04\/6-1.png 909w, https:\/\/www.pythonpool.com\/wp-content\/uploads\/2021\/04\/6-1-300x19.png 300w, https:\/\/www.pythonpool.com\/wp-content\/uploads\/2021\/04\/6-1-768x49.png 768w\" sizes=\"(max-width: 909px) 100vw, 909px\" \/><\/figure>\n\n\n\n<p class=\"has-medium-font-size\"><strong>Explanation:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\" id=\"block-775151df-3a5a-490b-804e-8f5252c591a3\"><li>Firstly, we create an RDD with the sparkContext.parallelize() method.<\/li><li>Then, we apply the flatMap() function.<\/li><li>Inside it, we pass the lambda function.<\/li><li>Then we print the output.<\/li><li>The output follows the lambda expression. 
It returns two (x, x) pairs for each element x, so (2, 2), (2, 2) comes first, followed by the pairs for 3 and 4.<\/li><li>Hence, you can see the output. <\/li><\/ul>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-difference-between-map-and-flatmap-in-python\"><span class=\"ez-toc-section\" id=\"Difference_between_map_and_flatMap_in_python\"><\/span>Difference between map() and flatMap() in python<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-map\"><span class=\"ez-toc-section\" id=\"map\"><\/span>map():<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p><strong>The map() function returns a new RDD by applying a function to each element of the RDD. It produces exactly one output element for each input element.<\/strong><\/p>\n\n\n\n<p>To know how the <a href=\"http:\/\/www.pythonpool.com\/python-map-function\/\" target=\"_blank\" rel=\"noreferrer noopener\"><span style=\"text-decoration: underline\">python map()<\/span><\/a> function works, you can read our in-depth guide <a href=\"http:\/\/www.pythonpool.com\/python-map-function\/\" target=\"_blank\" rel=\"noreferrer noopener\"><span style=\"text-decoration: underline\">from here<\/span><\/a>. <\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-flatmap\"><span class=\"ez-toc-section\" id=\"flatMap\"><\/span>flatMap():<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p><strong>The flatMap() function works like map(): it returns a new RDD by applying a function to each element of the RDD, but the output is flattened. 
The function passed to it can return an iterable of zero or more elements for each input element.<\/strong><\/p>\n\n\n\n<p>Let us look at the example to understand the difference in detail.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nfrom pyspark.sql import SparkSession\n\nspark = SparkSession.builder.appName('SparkByExamples.com').getOrCreate()\n# map(): one output element (a range object) per input element\nR = spark.sparkContext.parallelize(&#x5B;3, 4, 5])\noutput = R.map(lambda x: range(1, x)).collect()\nprint(&quot;Output : &quot;,output)\n\n# flatMap(): the ranges are flattened into individual integers\nS = spark.sparkContext.parallelize(&#x5B;3, 4, 5])\noutput1 = S.flatMap(lambda x: range(1, x)).collect()\nprint(&quot;Output : &quot;,output1)\n<\/pre><\/div>\n\n\n<p class=\"has-medium-font-size\"><strong>Output:<\/strong><\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" width=\"904\" height=\"75\" src=\"http:\/\/www.pythonpool.com\/wp-content\/uploads\/2021\/04\/15.png\" alt=\"difference python map vs flatmap\" class=\"wp-image-10860\" srcset=\"https:\/\/www.pythonpool.com\/wp-content\/uploads\/2021\/04\/15.png 904w, https:\/\/www.pythonpool.com\/wp-content\/uploads\/2021\/04\/15-300x25.png 300w, https:\/\/www.pythonpool.com\/wp-content\/uploads\/2021\/04\/15-768x64.png 768w\" sizes=\"(max-width: 904px) 100vw, 904px\" \/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-conclusion\"><span class=\"ez-toc-section\" id=\"Conclusion\"><\/span>Conclusion<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>In this tutorial, we have learned about the flatMap() function in PySpark. We have also seen what an RDD is and what the flatMap() function does. Then, we discussed some examples of the function, each explained in detail. You can choose map() or flatMap() according to the requirements of your program. Finally, we discussed the difference between the map() and flatMap() functions.<\/p>\n\n\n\n<p>However, if you have any doubts or questions, do let me know in the comment section below. I will try to help you as soon as possible. 
<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Introduction In python, we have discussed many concepts and conversions. But sometimes, we come to a situation where we need to flatten the data frames\/RDD. &#8230; <\/p>\n<p class=\"read-more-container\"><a title=\"How to use the Pyspark flatMap() function in Python?\" class=\"read-more button\" href=\"https:\/\/www.pythonpool.com\/python-flatmap\/#more-10836\" aria-label=\"More on How to use the Pyspark flatMap() function in Python?\">Read more<\/a><\/p>\n","protected":false},"author":17,"featured_media":10880,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_mi_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[3896],"tags":[3897,3898,3900,3901,3899],"class_list":["post-10836","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-pyspark","tag-flatmap-in-python","tag-flatmap-python","tag-flatmap-python-spark","tag-spark-python-flatmap","tag-whats-a-flatmap-in-python","infinite-scroll-item"],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v20.1 (Yoast SEO v25.0) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>How to use the Pyspark flatMap() function in Python? 
- Python Pool<\/title>\n<meta name=\"description\" content=\"The python flatMap() function in the PySpark module is the transformation operation used for flattening the Dataframes\/RDD(array\/map columns).\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.pythonpool.com\/python-flatmap\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"How to use the Pyspark flatMap() function in Python?\" \/>\n<meta property=\"og:description\" content=\"Introduction In python, we have discussed many concepts and conversions. But sometimes, we come to a situation where we need to flatten the data\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.pythonpool.com\/python-flatmap\/\" \/>\n<meta property=\"og:site_name\" content=\"Python Pool\" \/>\n<meta property=\"article:published_time\" content=\"2021-04-28T07:56:22+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2021-05-01T11:46:28+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.pythonpool.com\/wp-content\/uploads\/2021\/04\/Pyspark-flatMap-function-in-Python.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1350\" \/>\n\t<meta property=\"og:image:height\" content=\"650\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Siddharth Jain\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@pythonpool\" \/>\n<meta name=\"twitter:site\" content=\"@pythonpool\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Siddharth Jain\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"7 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.pythonpool.com\/python-flatmap\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.pythonpool.com\/python-flatmap\/\"},\"author\":{\"name\":\"Siddharth Jain\",\"@id\":\"https:\/\/www.pythonpool.com\/#\/schema\/person\/75a3240fabe5ea90200777a9d8d3b4fa\"},\"headline\":\"How to use the Pyspark flatMap() function in Python?\",\"datePublished\":\"2021-04-28T07:56:22+00:00\",\"dateModified\":\"2021-05-01T11:46:28+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.pythonpool.com\/python-flatmap\/\"},\"wordCount\":1160,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/www.pythonpool.com\/#organization\"},\"image\":{\"@id\":\"https:\/\/www.pythonpool.com\/python-flatmap\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.pythonpool.com\/wp-content\/uploads\/2021\/04\/Pyspark-flatMap-function-in-Python.jpg\",\"keywords\":[\"flatmap in python\",\"flatmap python\",\"flatmap python spark\",\"spark python flatmap\",\"whats a flatmap in python?\"],\"articleSection\":[\"PySpark\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/www.pythonpool.com\/python-flatmap\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.pythonpool.com\/python-flatmap\/\",\"url\":\"https:\/\/www.pythonpool.com\/python-flatmap\/\",\"name\":\"How to use the Pyspark flatMap() function in Python? 
- Python Pool\",\"isPartOf\":{\"@id\":\"https:\/\/www.pythonpool.com\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.pythonpool.com\/python-flatmap\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.pythonpool.com\/python-flatmap\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.pythonpool.com\/wp-content\/uploads\/2021\/04\/Pyspark-flatMap-function-in-Python.jpg\",\"datePublished\":\"2021-04-28T07:56:22+00:00\",\"dateModified\":\"2021-05-01T11:46:28+00:00\",\"description\":\"The python flatMap() function in the PySpark module is the transformation operation used for flattening the Dataframes\/RDD(array\/map columns).\",\"breadcrumb\":{\"@id\":\"https:\/\/www.pythonpool.com\/python-flatmap\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.pythonpool.com\/python-flatmap\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.pythonpool.com\/python-flatmap\/#primaryimage\",\"url\":\"https:\/\/www.pythonpool.com\/wp-content\/uploads\/2021\/04\/Pyspark-flatMap-function-in-Python.jpg\",\"contentUrl\":\"https:\/\/www.pythonpool.com\/wp-content\/uploads\/2021\/04\/Pyspark-flatMap-function-in-Python.jpg\",\"width\":1350,\"height\":650,\"caption\":\"python flatmap\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.pythonpool.com\/python-flatmap\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.pythonpool.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"How to use the Pyspark flatMap() function in Python?\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.pythonpool.com\/#website\",\"url\":\"https:\/\/www.pythonpool.com\/\",\"name\":\"Python Pool\",\"description\":\"Your One-Stop Python Learning 
Destination\",\"publisher\":{\"@id\":\"https:\/\/www.pythonpool.com\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.pythonpool.com\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.pythonpool.com\/#organization\",\"name\":\"Python Pool\",\"url\":\"https:\/\/www.pythonpool.com\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.pythonpool.com\/#\/schema\/logo\/image\/\",\"url\":\"http:\/\/www.pythonpool.com\/wp-content\/uploads\/2020\/08\/aa.png\",\"contentUrl\":\"http:\/\/www.pythonpool.com\/wp-content\/uploads\/2020\/08\/aa.png\",\"width\":452,\"height\":185,\"caption\":\"Python Pool\"},\"image\":{\"@id\":\"https:\/\/www.pythonpool.com\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/x.com\/pythonpool\",\"https:\/\/www.youtube.com\/c\/pythonpool\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.pythonpool.com\/#\/schema\/person\/75a3240fabe5ea90200777a9d8d3b4fa\",\"name\":\"Siddharth Jain\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.pythonpool.com\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/152b0ac2e5fa2e6328f374499fff4a7a6299477b9cf7bbb15ebc01a88f8f673f?s=96&d=wavatar&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/152b0ac2e5fa2e6328f374499fff4a7a6299477b9cf7bbb15ebc01a88f8f673f?s=96&d=wavatar&r=g\",\"caption\":\"Siddharth Jain\"}}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"How to use the Pyspark flatMap() function in Python? 
- Python Pool","description":"The python flatMap() function in the PySpark module is the transformation operation used for flattening the Dataframes\/RDD(array\/map columns).","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.pythonpool.com\/python-flatmap\/","og_locale":"en_US","og_type":"article","og_title":"How to use the Pyspark flatMap() function in Python?","og_description":"Introduction In python, we have discussed many concepts and conversions. But sometimes, we come to a situation where we need to flatten the data","og_url":"https:\/\/www.pythonpool.com\/python-flatmap\/","og_site_name":"Python Pool","article_published_time":"2021-04-28T07:56:22+00:00","article_modified_time":"2021-05-01T11:46:28+00:00","og_image":[{"width":1350,"height":650,"url":"https:\/\/www.pythonpool.com\/wp-content\/uploads\/2021\/04\/Pyspark-flatMap-function-in-Python.jpg","type":"image\/jpeg"}],"author":"Siddharth Jain","twitter_card":"summary_large_image","twitter_creator":"@pythonpool","twitter_site":"@pythonpool","twitter_misc":{"Written by":"Siddharth Jain","Est. 
reading time":"7 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.pythonpool.com\/python-flatmap\/#article","isPartOf":{"@id":"https:\/\/www.pythonpool.com\/python-flatmap\/"},"author":{"name":"Siddharth Jain","@id":"https:\/\/www.pythonpool.com\/#\/schema\/person\/75a3240fabe5ea90200777a9d8d3b4fa"},"headline":"How to use the Pyspark flatMap() function in Python?","datePublished":"2021-04-28T07:56:22+00:00","dateModified":"2021-05-01T11:46:28+00:00","mainEntityOfPage":{"@id":"https:\/\/www.pythonpool.com\/python-flatmap\/"},"wordCount":1160,"commentCount":0,"publisher":{"@id":"https:\/\/www.pythonpool.com\/#organization"},"image":{"@id":"https:\/\/www.pythonpool.com\/python-flatmap\/#primaryimage"},"thumbnailUrl":"https:\/\/www.pythonpool.com\/wp-content\/uploads\/2021\/04\/Pyspark-flatMap-function-in-Python.jpg","keywords":["flatmap in python","flatmap python","flatmap python spark","spark python flatmap","whats a flatmap in python?"],"articleSection":["PySpark"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.pythonpool.com\/python-flatmap\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.pythonpool.com\/python-flatmap\/","url":"https:\/\/www.pythonpool.com\/python-flatmap\/","name":"How to use the Pyspark flatMap() function in Python? 
- Python Pool","isPartOf":{"@id":"https:\/\/www.pythonpool.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.pythonpool.com\/python-flatmap\/#primaryimage"},"image":{"@id":"https:\/\/www.pythonpool.com\/python-flatmap\/#primaryimage"},"thumbnailUrl":"https:\/\/www.pythonpool.com\/wp-content\/uploads\/2021\/04\/Pyspark-flatMap-function-in-Python.jpg","datePublished":"2021-04-28T07:56:22+00:00","dateModified":"2021-05-01T11:46:28+00:00","description":"The python flatMap() function in the PySpark module is the transformation operation used for flattening the Dataframes\/RDD(array\/map columns).","breadcrumb":{"@id":"https:\/\/www.pythonpool.com\/python-flatmap\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.pythonpool.com\/python-flatmap\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.pythonpool.com\/python-flatmap\/#primaryimage","url":"https:\/\/www.pythonpool.com\/wp-content\/uploads\/2021\/04\/Pyspark-flatMap-function-in-Python.jpg","contentUrl":"https:\/\/www.pythonpool.com\/wp-content\/uploads\/2021\/04\/Pyspark-flatMap-function-in-Python.jpg","width":1350,"height":650,"caption":"python flatmap"},{"@type":"BreadcrumbList","@id":"https:\/\/www.pythonpool.com\/python-flatmap\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.pythonpool.com\/"},{"@type":"ListItem","position":2,"name":"How to use the Pyspark flatMap() function in Python?"}]},{"@type":"WebSite","@id":"https:\/\/www.pythonpool.com\/#website","url":"https:\/\/www.pythonpool.com\/","name":"Python Pool","description":"Your One-Stop Python Learning 
Destination","publisher":{"@id":"https:\/\/www.pythonpool.com\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.pythonpool.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.pythonpool.com\/#organization","name":"Python Pool","url":"https:\/\/www.pythonpool.com\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.pythonpool.com\/#\/schema\/logo\/image\/","url":"http:\/\/www.pythonpool.com\/wp-content\/uploads\/2020\/08\/aa.png","contentUrl":"http:\/\/www.pythonpool.com\/wp-content\/uploads\/2020\/08\/aa.png","width":452,"height":185,"caption":"Python Pool"},"image":{"@id":"https:\/\/www.pythonpool.com\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/pythonpool","https:\/\/www.youtube.com\/c\/pythonpool"]},{"@type":"Person","@id":"https:\/\/www.pythonpool.com\/#\/schema\/person\/75a3240fabe5ea90200777a9d8d3b4fa","name":"Siddharth Jain","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.pythonpool.com\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/152b0ac2e5fa2e6328f374499fff4a7a6299477b9cf7bbb15ebc01a88f8f673f?s=96&d=wavatar&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/152b0ac2e5fa2e6328f374499fff4a7a6299477b9cf7bbb15ebc01a88f8f673f?s=96&d=wavatar&r=g","caption":"Siddharth 
Jain"}}]}},"_links":{"self":[{"href":"https:\/\/www.pythonpool.com\/wp-json\/wp\/v2\/posts\/10836","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.pythonpool.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.pythonpool.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.pythonpool.com\/wp-json\/wp\/v2\/users\/17"}],"replies":[{"embeddable":true,"href":"https:\/\/www.pythonpool.com\/wp-json\/wp\/v2\/comments?post=10836"}],"version-history":[{"count":11,"href":"https:\/\/www.pythonpool.com\/wp-json\/wp\/v2\/posts\/10836\/revisions"}],"predecessor-version":[{"id":10957,"href":"https:\/\/www.pythonpool.com\/wp-json\/wp\/v2\/posts\/10836\/revisions\/10957"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.pythonpool.com\/wp-json\/wp\/v2\/media\/10880"}],"wp:attachment":[{"href":"https:\/\/www.pythonpool.com\/wp-json\/wp\/v2\/media?parent=10836"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.pythonpool.com\/wp-json\/wp\/v2\/categories?post=10836"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.pythonpool.com\/wp-json\/wp\/v2\/tags?post=10836"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}