
I want to merge two CSV files on a common column using Python pandas. On a 32-bit system a single process can only address about 2 GB of memory, after which it raises a MemoryError. How can I do this with multiprocessing or some other method?

import gc
import pandas as pd

# Read each file in chunks, then stack the chunks row-wise.
# (Chunks are pieces of rows, so they must be concatenated along axis=0,
# not axis=1, or the columns get scrambled.)
csv1_chunks = pd.read_csv('/home/subin/Desktop/a.txt', dtype=str, iterator=True, chunksize=1000)
csv1 = pd.concat(csv1_chunks, ignore_index=True)
csv2_chunks = pd.read_csv('/home/subin/Desktop/b.txt', dtype=str, iterator=True, chunksize=1000)
csv2 = pd.concat(csv2_chunks, ignore_index=True)

# Keep the rows of the first file whose key appears in the second file.
new_df = csv1[csv1["PROFILE_MSISDN"].isin(csv2["L_MSISDN"])]
new_df.to_csv("/home/subin/Desktop/apyb.txt", index=False)
gc.collect()
  • The memory error, as I understand it, is because a 32-bit CPU can only handle up to 2 GB for one process; beyond that it throws a MemoryError. Commented Jul 15, 2016 at 7:28
  • Try dask first. If that fails, use a real database or Spark. Commented Jul 15, 2016 at 7:46
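
Besides dask, the filter itself can be done in plain pandas without ever holding both files in memory: load only the key column of the smaller file into a set, then stream the larger file chunk by chunk and append matching rows to the output. A minimal sketch, using the column names from the question (`PROFILE_MSISDN`, `L_MSISDN`) and hypothetical sample files standing in for `a.txt` and `b.txt`:

```python
import os
import tempfile
import pandas as pd

# Hypothetical small sample files standing in for a.txt and b.txt.
tmp = tempfile.mkdtemp()
a_path = os.path.join(tmp, "a.txt")
b_path = os.path.join(tmp, "b.txt")
out_path = os.path.join(tmp, "apyb.txt")
pd.DataFrame({"PROFILE_MSISDN": ["1", "2", "3", "4"],
              "NAME": ["ann", "bob", "cat", "dan"]}).to_csv(a_path, index=False)
pd.DataFrame({"L_MSISDN": ["2", "4"]}).to_csv(b_path, index=False)

# Load only the key column of the smaller file into a set (cheap in memory).
keys = set(pd.read_csv(b_path, dtype=str, usecols=["L_MSISDN"])["L_MSISDN"])

# Stream the large file chunk by chunk, appending matching rows to the output,
# so only one chunk is ever held in memory at a time.
first = True
for chunk in pd.read_csv(a_path, dtype=str, chunksize=1000):
    matched = chunk[chunk["PROFILE_MSISDN"].isin(keys)]
    matched.to_csv(out_path, mode="w" if first else "a", header=first, index=False)
    first = False
```

Peak memory here is one chunk of the large file plus the key set, so it stays well under the 2 GB limit regardless of file size, as long as the key column of the smaller file fits in memory.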
