While working on a small ML project, I wanted to make the initial data validation step a bit faster.
Instead of going column by column to check missing values, correlations, distributions, duplicates, etc., I generated an automated profiling report from the dataframe.
It gave a pretty detailed breakdown:
-
Missing value patterns
-
Correlation heatmaps
-
Statistical summaries
-
Potential outliers
-
Duplicate rows
-
Warnings for constant/highly correlated features
I still dig into things manually afterward, but for a first pass it saves some time.
Curious....do you prefer fully manual EDA or using profiling tools for the initial sweep?