Two simple ways to upload csv files directly into a PostgreSQL database with Python
In this article, we will discuss two different ways to populate a PostgreSQL database from a csv file. PostgreSQL is a type of data storage called a relational database, and it is one of the biggest open-source relational databases. Relational databases are commonly used as data storage for web content, big businesses, and other data platforms.
Now, before moving further, make sure you have the necessary tools installed on your system: you should have a local Postgres server installed on your computer, or access to a remote Postgres server. (See the Postgres installation guide wiki page.)
The resources we are using here are two Python libraries, pandas and psycopg2, and SQLAlchemy, an open-source SQL toolkit for the Python programming language. Now we will run through an example of creating a database and a table from scratch and then loading a data set into a locally running Postgres server. First, let us start with the psycopg2 library installation.
Using psycopg2:
pip install psycopg2
Connecting to Postgres server
import psycopg2 as ps
import pandas as pd

conn = ps.connect(host='name_of_host_server',
                  user='your_username',
                  password='XXXX',
                  port=5432)
Let us describe the parameters we passed into the psycopg2.connect() method. It is necessary to specify a host describing where your Postgres server is running; using 'localhost' for a local server is standard practice. Then we need to pass in the port number, which by default is 5432 for a Postgres server, along with a username and password. But can you see that I have not passed a 'database' parameter? Why? Because we will create one through this connection! For now I have only created a connection to the server; in the next lines of code we will create a database.
conn.autocommit = True  # CREATE DATABASE cannot run inside a transaction block
cur = conn.cursor()
cur.execute("CREATE DATABASE school")  # 'school' is an example name; it must not already exist
print(conn.get_dsn_parameters())
conn.close()
So, with the connect() method we created a connection object, and through it a cursor object, which is what we use to execute our commands. Because CREATE DATABASE is not allowed inside a transaction block, we switched the connection to autocommit mode first. At the end we closed the connection.
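A handy pattern worth knowing here (a minimal sketch, not something the steps below depend on): psycopg2 connections and cursors can themselves be used as context managers, so the transaction is committed if the block succeeds and rolled back if it raises. Note that leaving the block does not close the connection itself.
import psycopg2 as ps

conn = ps.connect(host='localhost', user='your_username',
                  password='XXXX', port=5432)
with conn:                      # commits on success, rolls back on error
    with conn.cursor() as cur:  # cursor is closed on exit
        cur.execute("SELECT version()")
        print(cur.fetchone())
conn.close()  # the with block does NOT close the connection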
Now, we will create a table.
conn = ps.connect(host='name_of_host_server',
                  database='school',
                  user='your_username',
                  password='XXXX',
                  port=5432)
cur = conn.cursor()
cur.execute("""CREATE TABLE class(
    roll integer PRIMARY KEY,
    first_name text,
    surname text,
    address text
)
""")
conn.commit()
If you are familiar with the concept of SQL transactions, then you will know that the commit() method must be run at the end for the table to actually be created. (By this point you may have encountered a couple of glitches, but don't worry: mention them in the comments and we will sort them out. Hints: rollback() and set_isolation_level(); a sketch of how they help follows.)
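To illustrate those hints (a minimal sketch, separate from the main flow): set_isolation_level() is an alternative to the autocommit attribute we used earlier for statements such as CREATE DATABASE, which refuse to run inside a transaction block, and rollback() clears a failed transaction so the connection accepts commands again.
import psycopg2 as ps
from psycopg2.extensions import ISOLATION_LEVEL_AUTOCOMMIT

conn = ps.connect(host='localhost', user='your_username',
                  password='XXXX', port=5432)
# Run statements such as CREATE DATABASE outside a transaction block.
conn.set_isolation_level(ISOLATION_LEVEL_AUTOCOMMIT)
cur = conn.cursor()
try:
    cur.execute("CREATE DATABASE school")
except ps.Error as e:
    conn.rollback()  # clears the aborted transaction so the connection is usable again
    print("Could not create database:", e)
finally:
    conn.close()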
Loading the csv file: Here we use the copy_from() method, which, like the execute() method, is attached to the cursor object. However, it differs from execute() in its parameters: here we pass f, a file object, as an argument, along with the name of the table into which we want to upload the csv file ('class' in our case), and sep, the delimiter of the fields (columns) of the csv file. At the end we commit the whole transaction block, and class.csv is successfully loaded into our table.
cur = conn.cursor()
with open('class.csv', 'r') as f:
    # Skip the header row.
    next(f)
    cur.copy_from(f, 'class', sep=',')
conn.commit()
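As a quick sanity check (an optional sketch, assuming the connection is still open), you can count the rows that landed in the table before closing the connection:
cur = conn.cursor()
cur.execute("SELECT count(*) FROM class")
print("rows loaded:", cur.fetchone()[0])
conn.close()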
Second method: Using SQLAlchemy
This method of uploading a csv file into a Postgres server is very simple, and the SQLAlchemy toolkit takes care of many things in the background. Primarily, it uses database URLs, which follow RFC 1738, to create an engine object. The syntax of the URL is something like this:
dialect+driver://username:password@host:port/database
Where the dialect name is a SQLAlchemy dialect such as sqlite, mysql, postgresql, oracle, etc., and the driver name is the Python DBAPI used to connect to the database, such as psycopg2, pymysql, pg8000, cx_oracle, etc. (we will sketch a variant with a different driver after the URL breakdown below).
Now, we will instantiate an engine object by the following code:
from sqlalchemy import create_engine
#dialect+driver://username:password@host:port/database
engine = create_engine("postgresql+psycopg2://postgres:xxxx@localhost:5432/school")
Where the correspondence, in this order, is:
"Dialect+DBAPI_Python://username:your_password@name_of_host_server:port_number/databasename"
Once the engine object is created, we just have to use the pandas to_sql() method to upload our csv file directly into the table. The arguments we pass are the table name, the engine object, and the if_exists keyword argument, to which we give 'replace' here; you can also pass 'append' if you want to add to the existing table in the database (a sketch of that variant follows the code below). Finally, the engine is closed by the engine.dispose() method, and your class.csv is successfully uploaded.
import pandas as pd

df = pd.read_csv('class.csv')
try:
    df.to_sql('class', engine, if_exists='replace', index=False)
except Exception as e:
    print("Sorry, an error has occurred:", e)
finally:
    engine.dispose()
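And if you want to keep adding rows to the same table instead of recreating it, the only change is the if_exists argument. A minimal sketch, assuming further csv files with the same columns (the file names here are hypothetical) and a fresh engine, since the one above has already been disposed:
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql+psycopg2://postgres:xxxx@localhost:5432/school")
# Append further csv files with the same columns to the existing table.
for path in ['class_2.csv', 'class_3.csv']:  # hypothetical file names
    pd.read_csv(path).to_sql('class', engine, if_exists='append', index=False)
engine.dispose()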
So, in these two easy ways you can populate your PostgreSQL server directly from a csv file. I hope you find this article interesting and useful. Thank you!