How to save a DataFrame to PostgreSQL in pyspark


Recipe Objective: How to save a DataFrame to PostgreSQL in pyspark?

In most big data scenarios, data merging and aggregation are an essential part of day-to-day activities on big data platforms. In this scenario, we will create a DataFrame and save it to a PostgreSQL table.

System requirements:

  • Install Ubuntu in a virtual machine
  • Set up a single-node Hadoop installation
  • Install PySpark (Spark) on Ubuntu
  • The code below can be run in a Jupyter notebook or any Python console.
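Before Spark can talk to Postgres, the PostgreSQL JDBC driver jar must be on Spark's classpath. Step 1 below does this via the spark.jars config; an alternative is to pass the jar when launching the shell. The jar path here is an assumption — adjust it to wherever you downloaded the driver.

```shell
# Launch pyspark with the PostgreSQL JDBC driver on the classpath
# (the jar path is an example; point it at your downloaded driver)
pyspark --jars /usr/local/postgresql-42.2.5.jar
```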


Step 1: Import the modules

In this scenario, we import the PySpark and PySpark SQL modules and create a Spark session as shown below:

import pyspark
from pyspark.sql import SparkSession
from pyspark.sql import Row

spark = SparkSession.builder \
    .config("spark.jars", "/usr/local/postgresql-42.2.5.jar") \
    .master("local") \
    .appName("PySpark_Postgres_test") \
    .getOrCreate()

The output of the code:

bigdata_1.jpg

Step 2: Create Dataframe to store in Postgres

Here we create a DataFrame to save in a Postgres table. The Row class lives in the pyspark.sql module, which we imported above.

studentDf = spark.createDataFrame([
    Row(id=1, name='vijay', marks=67),
    Row(id=2, name='Ajay', marks=88),
    Row(id=3, name='jay', marks=79),
    Row(id=4, name='vinay', marks=67),
])


The output of the code:

bigdata_2.jpg

Step 3: To View the Data in the DataFrame

Here we view the top 5 rows of the dataframe as shown below.

studentDf.show(5)

The output of the code:

bigdata_3.jpg

Step 4: To Save the DataFrame to a Postgres Table

Here we save the dataframe to the Postgres table we created earlier. To do this, we use the write method with the JDBC format and then call save, as shown in the code below.

studentDf.select("id", "name", "marks").write.format("jdbc") \
    .option("url", "jdbc:postgresql://localhost:5432/dezyre_new") \
    .option("driver", "org.postgresql.Driver") \
    .option("dbtable", "students") \
    .option("user", "hduser") \
    .option("password", "bigdata") \
    .save()
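One caveat: with no save mode set, the JDBC writer uses the default "error" mode and fails if the target table already exists. A mode can be set explicitly. This sketch reuses the connection details assumed above and can only run against a reachable Postgres instance:

```python
# Same write as above, but with an explicit save mode:
#   "append"    - add rows to an existing table
#   "overwrite" - replace the table's contents with the dataframe's rows
# (connection details repeat the assumptions used earlier in this recipe)
studentDf.select("id", "name", "marks").write.format("jdbc") \
    .mode("append") \
    .option("url", "jdbc:postgresql://localhost:5432/dezyre_new") \
    .option("driver", "org.postgresql.Driver") \
    .option("dbtable", "students") \
    .option("user", "hduser") \
    .option("password", "bigdata") \
    .save()
```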

The output of the code:

bigdata_4.jpg

To check the output of the saved dataframe in the Postgres table, log in to the Postgres database.
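One way to inspect the table from the shell is with psql (the host, user, and database names below are the ones assumed in this recipe; psql will prompt for the password):

```shell
# Query the saved rows from the command line
psql -h localhost -U hduser -d dezyre_new -c "SELECT * FROM students;"
```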

The output of the saved dataframe:

bigdata_5.jpg

As shown in the above image, the dataframe has been written to a table in Postgres.

Conclusion

Here we learned to save a DataFrame to PostgreSQL in PySpark.


Relevant Projects

SQL Project for Data Analysis using Oracle Database-Part 2
In this SQL Project for Data Analysis, you will learn to efficiently analyse data using JOINS and various other operations accessible through SQL in Oracle Database.

Build Data Pipeline using Azure Medallion Architecture Approach
In this Azure Project, you will build a data pipeline to analyze large sensor data collected from water bodies across different European countries over several years using Azure Services and SQL Server to generate visualizations to gain valuable insights into water quality trends and determinands.

AWS CDK Project for Building Real-Time IoT Infrastructure
AWS CDK Project for Beginners to build real-time IoT infrastructure and migrate and analyze data.

SQL Project for Data Analysis using Oracle Database-Part 4
In this SQL Project for Data Analysis, you will learn to efficiently write queries using WITH clause and analyse data using SQL Aggregate Functions and various other operators like EXISTS, HAVING.

How to deal with slowly changing dimensions using Snowflake?
Implement Slowly Changing Dimensions using Snowflake Method - Build Type 1 and Type 2 SCD in Snowflake using the Stream and Task Functionalities

Migration of MySQL Databases to Cloud AWS using AWS DMS
IoT-based Data Migration Project using AWS DMS and Aurora Postgres aims to migrate real-time IoT-based data from a MySQL database to the AWS cloud.

Log Analytics Project with Spark Streaming and Kafka
In this spark project, you will use the real-world production logs from NASA Kennedy Space Center WWW server in Florida to perform scalable log analytics with Apache Spark, Python, and Kafka.

Azure Data Factory and Databricks End-to-End Project
Azure Data Factory and Databricks End-to-End Project to implement analytics on trip transaction data using Azure Services such as Data Factory, ADLS Gen2, and Databricks, with a focus on data transformation and pipeline resiliency.

Build a Data Pipeline in AWS using NiFi, Spark, and ELK Stack
In this AWS Project, you will learn how to build a data pipeline using Apache NiFi, Apache Spark, AWS S3, Amazon EMR cluster, Amazon OpenSearch, Logstash and Kibana.

Deploy an Application to Kubernetes in Google Cloud using GKE
In this Kubernetes Big Data Project, you will automate and deploy an application using Docker, Google Kubernetes Engine (GKE), and Google Cloud Functions.