This is reproducible in the current latest pandas release, 1.5.2.
In Python, the zipfile.Path class is intended to behave similarly (but not identically!) to pathlib.Path. pandas accepts the latter but not the former.
Steps to reproduce:
- Create a zip file named foo.zip containing one CSV file named bar.csv.
- Create a path object pointing directly to that CSV file inside the zip file:
  zp = zipfile.Path('foo.zip', 'bar.csv')
- Pass that path object (zp) to pandas.read_csv() as the path argument.
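The steps above can be sketched as a small script. The temporary directory, the CSV contents, and the helper name `reproduce` are my own assumptions for illustration; only foo.zip, bar.csv, and the `zipfile.Path(...)` call come from the steps. pandas is imported inside the helper so that the setup part runs even where pandas is not installed.

```python
import tempfile
import zipfile
from pathlib import Path

# Step 1: create foo.zip containing a single bar.csv (sample data is made up).
workdir = Path(tempfile.mkdtemp())
zip_file = workdir / "foo.zip"
with zipfile.ZipFile(zip_file, "w") as zf:
    zf.writestr("bar.csv", "a,b\n1,2\n3,4\n")

# Step 2: a path object pointing directly at the CSV inside the archive.
zp = zipfile.Path(str(zip_file), "bar.csv")


def reproduce(path_in_zip):
    """Step 3: hand the zipfile.Path to pandas (hypothetical helper)."""
    import pandas as pd  # local import: the setup above needs only the stdlib

    return pd.read_csv(path_in_zip)
```

Calling `reproduce(zp)` on pandas 1.5.2 should trigger the error described in this report.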
Because of this part of your code:

    # is_file_like requires (read | write) & __iter__ but __iter__ is only
    # needed for read_csv(engine=python)
    if not (
        hasattr(filepath_or_buffer, "read") or hasattr(filepath_or_buffer, "write")
    ):
        msg = f"Invalid file path or buffer object type: {type(filepath_or_buffer)}"
        raise ValueError(msg)
Python raises "ValueError: Invalid file path or buffer object type: <class 'zipfile.Path'>".
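To illustrate why the check rejects the object, here is a minimal stdlib-only sketch. The function `passes_check` is a hypothetical stand-in that copies only the `hasattr` condition from the quoted snippet; it is not pandas code. The in-memory archive is also just an assumption to keep the example self-contained.

```python
import io
import zipfile

# Build a tiny archive in memory (contents are made up for the example).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("bar.csv", "a,b\n1,2\n")

zp = zipfile.Path(zipfile.ZipFile(buf), "bar.csv")


def passes_check(obj):
    # Same condition as in the quoted snippet: the object needs read or write.
    return hasattr(obj, "read") or hasattr(obj, "write")


# zipfile.Path itself offers open(), read_text(), read_bytes(), but no read().
path_ok = passes_check(zp)

# The handle returned by zp.open() is file-like and does have read().
with zp.open() as handle:
    handle_ok = passes_check(handle)

print(path_ok, handle_ok)
```

So the path object fails the condition, while a handle opened from it passes.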
EDIT:
I'm aware that pandas.read_csv() offers the compression argument and can read compressed CSV files on its own. But this doesn't help in my case. I'm using pandas as a backend for a higher-level API that reads data files. pandas is just one part of it. And one shortcoming of pandas here is that it cannot deal with ZIP files containing multiple CSV files.
pathlib.Path and zipfile.Path are both part of the Python standard library, and IMHO pandas should be able to deal with both.
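For context, a possible workaround on the caller's side is to open each member myself and pass the resulting handle to pandas. The sketch below is only an assumption about how that could look: the function name `read_csv_members` and its `reader` parameter are hypothetical, and the `encoding` keyword of `zipfile.Path.open()` assumes Python 3.9 or newer. When `reader` is left as None, it falls back to `pandas.read_csv`.

```python
import io
import zipfile


def read_csv_members(zip_source, reader=None):
    """Read every top-level *.csv member of an archive (hypothetical helper)."""
    if reader is None:
        import pandas as pd  # local import so the helper loads without pandas

        reader = pd.read_csv

    results = {}
    for member in zipfile.Path(zip_source).iterdir():
        if member.is_file() and member.name.endswith(".csv"):
            # Pass an open file handle instead of the zipfile.Path object.
            with member.open("r", encoding="utf-8") as handle:
                results[member.name] = reader(handle)
    return results


# Demo with a made-up in-memory archive and a plain-text reader.
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("bar.csv", "a,b\n1,2\n")
    zf.writestr("baz.csv", "c,d\n3,4\n")
    zf.writestr("notes.txt", "not a csv")

raw = read_csv_members(archive, reader=lambda handle: handle.read())
print(sorted(raw))
```

With pandas installed, `read_csv_members("foo.zip")` would return one DataFrame per CSV member. This works, but it is exactly the kind of boilerplate that direct zipfile.Path support would make unnecessary.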
The snippet quoted above is from pandas/pandas/io/common.py, lines 446 to 452 at commit 3b09765.