6

I need to upload a lot of data to a MySQL db. For most models I use django's ORM, but one of my models will have billions (!) of instances and I would like to optimize its insert operation.

I can't seem to find a way to make executemany() work, and after googling it seems there are almost no examples out there.

I'm looking for the correct sql syntax + correct command syntax + correct values data structure to support an executemany command for the following sql statement:

INSERT INTO `some_table` (`int_column1`, `float_column2`, `string_column3`, `datetime_column4`) VALUES (%d, %f, %s, %s)

Yes, I'm explicitly stating the id (int_column1) for efficiency.

A short example code would be great

4 Answers 4

19

Here's a solution that actually uses executemany() !

Basically the idea in the example here will work.

But note that in Django, you need to use the %s placeholder rather than the question mark.

Also, you will want to manage your transactions. I'll not get into that here as there is plenty of documentation available.

    from django.db import connection,transaction
    cursor = connection.cursor()
    
    
    
    query = ''' INSERT INTO table_name 
            (var1,var2,var3) 
            VALUES (%s,%s,%s) '''
    
    
    query_list = build_query_list() 
    
    # here build_query_list() represents some function to populate
    # the list with multiple records
    # in the tuple format (value1, value2, value3).
    
    
    cursor.executemany(query, query_list)
    
    transaction.commit()

Sign up to request clarification or add additional context in comments.

6 Comments

Half a year late, but finally an answer regarding executemany(). I haven't tested your answer, but I'm accepting it and hoping your code works
I can't find documentation for a function buildQueryList() anywhere. Is it part of Django?
if you read the comments...it's up to YOU to build that function based on your application's specific needs / database schema... #here buildQueryList() represents some function to populate #the list with multiple records #in the tuple format (value1,value2,value3).
For future people looking for the same thing: jsh's snippet works. Thanks
Note that if you are using MySQLdb 1.2 you'll run into a very unfortunate performance regression. The workaround is to use lowercase "values". Seriously. See stackoverflow.com/questions/3945642/…
|
1

are you serisouly suggesting loading billions of rows (sorry instances) of data via some ORM data access layer - how long do you have ?

bulk load if possible - http://dev.mysql.com/doc/refman/5.1/en/load-data.html

7 Comments

I think you missed where I wrote I'm using ORM for most models (which have only a few hundred of instances) and using raw SQL for the single model who has billions of instances, but the link is helpful thanks
Also, I can't really use load-data, as I do manipulate the data before I enter it to the db...
Lastly, yes, it's that important, even if it takes a few days to insert to the db. It's a one time operation (until the next time...)
why dont you write a Dump2CSV method which processs your instances and creates a series of .dat files which you can load into your db ?
So what I guess you're saying is that no matter what the modification on the data is, I should separate the loading process from the data manipulation part and load using load-data
|
1

If you need to modify the data, bulk load with load data into a temporary table as is. Then apply modifications with an insert into select command. IME, this is by far the fastest way to get a lot of data into a table.

Comments

0

I'm not sure how to use the executemany() command, but you can use a single SQL INSERT statement to insert multiple records

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.