
Efficient conversion from dataframes to numpy arrays retaining column dtypes and names #16561


@mitar

Problem description

I was looking into how to convert DataFrames to numpy arrays so that both column dtypes and names are retained, preferably in an efficient way that does not duplicate memory. Ideally, I would like a view onto the data a DataFrame already stores internally as numpy arrays. The dtypes and column names already used by the DataFrame are fine as they are.

The issue is that both as_matrix and values coerce all columns to a single common dtype, and to_records does not produce a plain numpy array (it returns a recarray).
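A minimal sketch of the behaviour described above (the column names "a" and "b" and the dtypes are made up for the example):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": np.arange(3, dtype=np.int64),
                   "b": np.linspace(0.0, 1.0, 3)})

# .values (and .as_matrix(), which behaves the same way) coerce all
# columns to one common dtype, so the int64 column is upcast to float64.
print(df.values.dtype)          # float64

# .to_records() keeps per-column names and dtypes, but returns a
# numpy.recarray rather than a plain ndarray, and copies the data.
rec = df.to_records(index=False)
print(type(rec))                # <class 'numpy.recarray'>
print(rec.dtype)                # [('a', '<i8'), ('b', '<f8')]
```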

I have found two potentially relevant StackOverflow answers:

However, it seems to me that all of those solutions copy the data through intermediate data structures and only then store it in a new numpy array.

So I am asking for a way to get the data as it is, as a numpy array, without any dtype conversions.
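For comparison, a copy-based workaround in the spirit of those StackOverflow answers might look like the following sketch. The helper name to_structured_array is made up, and it assumes every column is backed by a plain numpy dtype; it preserves column names and per-column dtypes but still copies the data, so it is not the zero-copy view requested above:

```python
import numpy as np
import pandas as pd

def to_structured_array(df):
    """Copy a DataFrame into a plain structured ndarray, keeping
    per-column dtypes and column names. This copies the data and
    assumes all columns have plain numpy dtypes."""
    dtype = np.dtype([(str(name), df[name].dtype) for name in df.columns])
    out = np.empty(len(df), dtype=dtype)
    for name in df.columns:
        out[str(name)] = df[name].values
    return out

df = pd.DataFrame({"a": np.arange(3, dtype=np.int64),
                   "b": np.linspace(0.0, 1.0, 3)})
arr = to_structured_array(df)
print(type(arr))   # <class 'numpy.ndarray'>
print(arr.dtype)   # [('a', '<i8'), ('b', '<f8')]
```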

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.27-moby
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.20.1
pytest: None
pip: 9.0.1
setuptools: 20.7.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
pandas_gbq: None
pandas_datareader: None
