
Some argument combinations make reindex fail on an empty DataFrame #27315

@ajspera

Description

```python
import pandas as pd
from datetime import datetime, timedelta

end = datetime.utcnow()
begin = end - timedelta(minutes=1)
data_interval = 10
# Target index: one timestamp every 10 seconds over the last minute
date_index = pd.date_range(start=begin,
                           end=end,
                           freq='{}s'.format(data_interval))

# Empty frame with a 'time' index and two data columns
df = pd.DataFrame([], columns=['time', 'a', 'b'])
df = df.set_index('time', drop=True)
tol = timedelta(seconds=9)

# Case 1: method='pad' combined with tolerance=
df = df.reindex(date_index, method='pad', tolerance=tol)
# IndexError: index -1 is out of bounds for axis 0 with size 0

# Case 2: method='nearest' (no tolerance)
df = pd.DataFrame([], columns=['time', 'a', 'b'])
df = df.reindex(date_index, method='nearest')
# IndexError: index -1 is out of bounds for axis 0 with size 0
```

Reindexing an empty DataFrame raises an IndexError when either tolerance= or method='nearest' is passed.
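One plausible mechanism, inferred from the error message rather than confirmed against the pandas source (so treat it as an assumption): the fill logic marks target labels that have no match with -1, and the nearest/tolerance code paths then use that indexer to look up values in the source index. On an empty index even position -1 is out of bounds. The same NumPy error can be reproduced in isolation; `source_values` and `indexer` are illustrative names, not pandas internals.

```python
import numpy as np

# Illustrative stand-in for the values of an empty DatetimeIndex
source_values = np.array([], dtype="datetime64[ns]")

# A fill indexer for three target labels; -1 means "no match found"
indexer = np.array([-1, -1, -1])

message = None
try:
    # -1 wraps to the last element, but an empty array has no last element
    source_values[indexer]
except IndexError as exc:
    message = str(exc)

print(message)  # index -1 is out of bounds for axis 0 with size 0
```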

This does not happen with other uses of reindex, so it can come as a surprise when reindexing a window of data that happens to be empty. I would expect it to behave the same as it does without these arguments.

Expected Output

It should match reindex with no extra arguments, which in this case returns:

                              a    b
2019-07-09 22:35:05.165640  NaN  NaN
2019-07-09 22:35:15.165640  NaN  NaN
2019-07-09 22:35:25.165640  NaN  NaN
2019-07-09 22:35:35.165640  NaN  NaN
2019-07-09 22:35:45.165640  NaN  NaN
2019-07-09 22:35:55.165640  NaN  NaN
2019-07-09 22:36:05.165640  NaN  NaN
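The expected behavior can be written as a small check. This is a sketch: it pins the timestamps from the sample output above instead of calling utcnow(), and it only exercises the plain reindex call that already works, not the failing pad/nearest calls.

```python
import pandas as pd

# Fixed timestamps matching the sample output (7 points, 10 s apart)
date_index = pd.date_range("2019-07-09 22:35:05.165640", periods=7, freq="10s")

# Empty frame indexed by 'time', as in the report
df = pd.DataFrame([], columns=["time", "a", "b"]).set_index("time")

# Plain reindex: every target row exists but holds NaN
expected = df.reindex(date_index)
print(expected)
```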

Temporary Workaround

A simple workaround is to check the length before reindexing, but this bug can still catch someone off guard at a bad time, as it did us.

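```python
# Compare with ==, not `is`: identity checks on ints are unreliable
if len(df) == 0:
    # No rows to fill from, so a plain reindex gives the all-NaN frame
    df = df.reindex(date_index)
else:
    df = df.reindex(date_index, method='pad', tolerance=tol)
```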
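The length check can be wrapped in a small helper so that both failing combinations (method='nearest' and tolerance=) go through one guard. `safe_reindex` is a hypothetical name, not part of pandas; it assumes that for a frame with no rows, a plain reindex is the desired result, as described under Expected Output.

```python
from datetime import timedelta

import pandas as pd


def safe_reindex(df, target, **fill_kwargs):
    """Hypothetical helper: reindex, dropping fill arguments when df has no rows."""
    if len(df.index) == 0:
        # No source rows means nothing to pad from or match against,
        # so a plain reindex gives the all-NaN frame directly
        return df.reindex(target)
    return df.reindex(target, **fill_kwargs)


date_index = pd.date_range("2019-07-09 22:35:05", periods=3, freq="10s")
tol = timedelta(seconds=9)

# Empty input: fill arguments are skipped
empty = pd.DataFrame([], columns=["time", "a", "b"]).set_index("time")
print(safe_reindex(empty, date_index, method="pad", tolerance=tol))

# Non-empty input: pad only within the 9 s tolerance, so only the
# exact match at the first timestamp is filled
one_row = pd.DataFrame({"a": [1.0], "b": [2.0]}, index=date_index[:1])
print(safe_reindex(one_row, date_index, method="pad", tolerance=tol))
```

Checking `len(df.index)` rather than `df.empty` ties the guard to the row count; `df.empty` is also True for a frame that has rows but no columns.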

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-54-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.24.2
pytest: 5.0.1
pip: 18.0
setuptools: 40.4.1
Cython: None
numpy: 1.16.4
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.8.0
pytz: 2019.1
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: 1.1.8
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: 1.3.5
pymysql: 0.9.3
psycopg2: None
jinja2: 2.10.1
s3fs: None
fastparquet: None
pandas_gbq: 0.6.1
pandas_datareader: None
gcsfs: None

Metadata

Labels: Bug, Indexing (related to indexing on series/frames, not to indexes themselves)
