Problem description
I was looking into how to convert a DataFrame to a numpy array so that both column dtypes and column names are retained, preferably in an efficient way so that memory is not duplicated along the way. Ideally, I would like a view on the data the DataFrame already stores internally, exposed as a numpy array. I am fine with whatever dtypes and names the DataFrame already uses.
The issue is that both as_matrix and values upcast all columns to a single common dtype, and to_records does not produce a plain numpy array (it returns a recarray).
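For illustration, here is a minimal sketch of that behaviour (the column names and values are made up):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [1.5, 2.5]})  # int64 and float64 columns

# .values (and .as_matrix()) upcast everything to a common dtype,
# so the int64 column is silently converted
print(df.values.dtype)  # float64

# .to_records keeps per-column dtypes, but returns a numpy recarray,
# not a plain ndarray, and it copies the data
rec = df.to_records(index=False)
print(type(rec))   # <class 'numpy.recarray'>
print(rec.dtype)   # fields ('a', '<i8') and ('b', '<f8') are preserved
```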
I have found two potentially relevant StackOverflow answers:
- https://stackoverflow.com/questions/40554179/how-to-keep-column-names-when-converting-from-pandas-to-numpy
- https://stackoverflow.com/questions/13187778/convert-pandas-dataframe-to-numpy-array-preserving-index
But it seems to me that all of those solutions copy the data through intermediate data structures and then store it in a new numpy array.
So I am asking for a way to get the data as it is, without any dtype conversions, as a numpy array.
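To make the copying concrete, here is a minimal sketch of what the linked answers essentially boil down to, assuming plain numeric columns (the DataFrame here is made up): a structured dtype is built from the frame's column names and dtypes, and each column is then copied into a freshly allocated structured array, so no view on the existing data is obtained.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [1.5, 2.5]})

# Build a structured dtype from the frame's column names and dtypes
dtype = np.dtype(list(zip(df.columns, df.dtypes)))

# Allocate a new structured array and copy each column into it --
# this per-column copy is the overhead the answers above incur
out = np.empty(len(df), dtype=dtype)
for name in df.columns:
    out[name] = df[name].values

print(out.dtype)  # [('a', '<i8'), ('b', '<f8')]
```

For what it's worth, I suspect a true zero-copy view may not be possible in general, since pandas stores same-dtype columns together in homogeneous blocks, while a structured array lays its fields out row by row; but even an explicit, efficient conversion API would help.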
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.27-moby
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.20.1
pytest: None
pip: 9.0.1
setuptools: 20.7.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
pandas_gbq: None
pandas_datareader: None