Description
- I have checked that this issue has not already been reported (might be another variant of Correlation inconsistencies between Series and DataFrame #20954).
- I have confirmed this bug exists on the latest version of pandas (1.1.3).
- (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
import pandas as pd

for length in [2, 3, 5, 10, 20]:
    print(pd.DataFrame(length * [[0.42, 0.1]], columns=["A", "B"]).corr())

gives
     A    B
A  NaN  NaN
B  NaN  NaN

     A    B
A  NaN  NaN
B  NaN  1.0

     A    B
A  1.0  NaN
B  NaN  NaN

     A    B
A  1.0 -1.0
B -1.0  1.0

     A    B
A  1.0  1.0
B  1.0  1.0
Problem description
The output changes as the number of rows varies slightly, even though every row is identical. I would expect the correlation between two series to be NaN whenever at least one of them is constant.
This makes code that relies on calling dropna() after corr() difficult to write and error-prone, because the behaviour is inconsistent.
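Until the behaviour is consistent, one possible workaround is to mask constant columns explicitly after calling corr(). The following is only a sketch: `corr_constant_as_nan` is a hypothetical helper name, not a pandas API, and it assumes every column is numeric and that "constant" means at most one distinct non-null value.

```python
import numpy as np
import pandas as pd


def corr_constant_as_nan(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: corr() with constant columns forced to NaN."""
    corr = df.corr()
    # Flag columns with at most one distinct non-null value (zero variance).
    is_const = (df[corr.columns].nunique() <= 1).to_numpy()
    values = corr.to_numpy(copy=True)
    # Blank out every row and column that belongs to a constant series.
    values[is_const, :] = np.nan
    values[:, is_const] = np.nan
    return pd.DataFrame(values, index=corr.index, columns=corr.columns)


for length in [2, 3, 5, 10, 20]:
    df = pd.DataFrame(length * [[0.42, 0.1]], columns=["A", "B"])
    print(corr_constant_as_nan(df))
```

With this helper, the result for the constant data above no longer depends on what corr() happens to return for a given number of rows, so a subsequent dropna() sees the same NaN pattern each time.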
Expected Output
Either consistent NaN output when calculating correlation with constant data, or a warning in the pandas.DataFrame.corr documentation stating that the returned correlation between constant series can be any of 1.0, -1.0, or NaN.