Describe the bug
CSVDataset accepts pandas DataFrames as input for src, but it implicitly assumes that the DataFrame's index labels match the row positions.
This is because convert_tables_to_dicts uses .loc instead of .iloc. It generates ordinal (positional) indices to subset on but treats them as index labels.
data_ = df.loc[rows] if col_names is None else df.loc[rows, col_names]
To Reproduce
import numpy
import pandas
import monai
df = pandas.DataFrame(numpy.random.random((50, 3)))
df_subset = df.iloc[numpy.arange(0, 50, 5)]
print(df_subset.shape) # (10, 3)
ds = monai.data.CSVDataset(df_subset)
print(len(ds)) # 3
Expected behavior
print(len(ds)) should return 10.
It returns 3 because the lookup passes slice(10) to .loc, which slices by label and includes the end label. In the subset, only the labels 0, 5 and 10 fall within that range.
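The mismatch can be reproduced with pandas alone, without MONAI. This is a minimal sketch, assuming the row selector is slice(10) as described above; the variable names by_label and by_position are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.random((50, 3)))
df_subset = df.iloc[np.arange(0, 50, 5)]  # index labels: 0, 5, 10, ..., 45

# Assumed row selector: one ordinal position per row of the input.
rows = slice(df_subset.shape[0])  # slice(None, 10, None)

by_label = df_subset.loc[rows]      # label-based, end label inclusive
by_position = df_subset.iloc[rows]  # position-based, end exclusive

print(list(by_label.index))             # [0, 5, 10]
print(len(by_label), len(by_position))  # 3 10
```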
Environment
Shouldn't be relevant?
Additional context
Simple fix:
data_ = df.loc[rows] if col_names is None else df.loc[rows, col_names]
The first .loc should be .iloc, and the second should be .iloc[rows][col_names]
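A rough sketch of what the suggested change does, written as a standalone function rather than as a patch to MONAI. The helper name select_rows and the column names "a", "b", "c" are hypothetical, and col_names is assumed to be a list of column labels.

```python
import numpy as np
import pandas as pd


def select_rows(df, rows, col_names=None):
    """Hypothetical helper: pick rows by position, then columns by label."""
    data = df.iloc[rows]
    return data if col_names is None else data[col_names]


df = pd.DataFrame(np.arange(150).reshape(50, 3), columns=["a", "b", "c"])
df_subset = df.iloc[np.arange(0, 50, 5)]  # non-contiguous index labels

rows = slice(df_subset.shape[0])
all_cols = select_rows(df_subset, rows)
two_cols = select_rows(df_subset, rows, ["a", "c"])

print(all_cols.shape)  # (10, 3)
print(two_cols.shape)  # (10, 2)
```

Selecting rows with .iloc first and columns by label afterwards keeps the row lookup positional while still allowing column names, which is what .iloc[rows][col_names] expresses.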
For reference, the line in question is MONAI/monai/data/utils.py, line 1494 at commit 0bb20a8.