I am having a strange problem in pandas. I have a DataFrame with several NaN values. I thought I could fill each NaN with the mean of its column, but when I try the following:

  col_means = mydf.apply(np.mean, 0)
  mydf = mydf.fillna(value=col_means)

I still see some NaN values. Why?

Is it because I have more NaN values in my original dataframe than entries in col_means? And what exactly is the difference between fill-by-column vs fill-by-row?

1 Answer

You can just fillna with the df.mean() Series (which is dict-like):

In [11]: df = pd.DataFrame([[1, np.nan], [np.nan, 4], [5, 6]])

In [12]: df
Out[12]:
    0   1
0   1 NaN
1 NaN   4
2   5   6

In [13]: df.fillna(df.mean())
Out[13]:
   0  1
0  1  5
1  3  4
2  5  6

Note that df.mean() takes the mean of each column by default (axis=0, skipping NaN), which gives the fill values:

In [14]: df.mean()
Out[14]:
0    3
1    5
dtype: float64
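
On the fill-by-column vs fill-by-row part of the question, here is a minimal sketch reusing the same small df. The names col_means, row_means, filled_by_col and filled_by_row are just illustrative, and the row-wise apply is one possible approach, not necessarily the fastest:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, np.nan], [np.nan, 4], [5, 6]])

# axis=0 (the default): one mean per column, NaN skipped
col_means = df.mean()
# axis=1: one mean per row, NaN skipped
row_means = df.mean(axis=1)

# Fill by column: fillna matches the Series index against the column labels,
# so each NaN gets the mean of the column it sits in.
filled_by_col = df.fillna(col_means)

# Fill by row: handle each row as a Series and fill it with its own mean.
filled_by_row = df.apply(lambda row: row.fillna(row.mean()), axis=1)

print(filled_by_col)
print(filled_by_row)
```

So passing a Series to DataFrame.fillna fills per column; to fill with row means you have to go row by row (or transpose) yourself.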

Note: if df.mean() itself contains NaN values (for example, for a column that is entirely NaN), those NaNs are used as the fill values, so the gaps in that column remain. In that case you may want to call fillna on the Series first, e.g.:

df.mean().fillna(0)
df.fillna(df.mean().fillna(0))
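
That is likely what is happening when NaNs survive the fill. A small sketch, assuming a made-up frame with one all-NaN column (the column names "a" and "b" and the fallback value 0 are only for illustration):

```python
import numpy as np
import pandas as pd

# Column "b" has no valid values at all, so it has no mean to fill with.
df = pd.DataFrame({"a": [1.0, np.nan, 5.0],
                   "b": [np.nan, np.nan, np.nan]})

means = df.mean()                     # a -> 3.0, b -> NaN
still_nan = df.fillna(means)          # "a" is filled, "b" stays NaN
filled = df.fillna(means.fillna(0))   # fall back to 0 where no mean exists

print(still_nan)
print(filled)
```

Whether 0 is a sensible fallback depends on your data; dropping the empty column may be a better choice.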

14 Comments

Thanks. I get nan is not defined with your first line.
You need to do from numpy import nan
Ah ok. Carry on then. :)
@PhillipCloud ah good point. I guess the answer is having NaNs in the dict fills them with NaNs :)
That could be a wart. Not sure whether the axis argument has any effect in fillna.