Given a series

s = pd.Series([1.1, 1.2, np.nan])
s
0    1.1
1    1.2
2    NaN
dtype: float64

If the need arises to convert the NaNs to None (for example, to work with Parquet files), then I would like to have

0     1.1
1     1.2
2    None
dtype: object

I would assume Series.replace would be the obvious way of doing this, but here's what the function returns:

s.replace(np.nan, None)

0    1.1
1    1.2
2    1.2
dtype: float64

The NaN was forward filled, instead of being replaced. Going through the docs, I see that if the second argument is None, then the first argument should be a dictionary. Based on this, I would expect replace to either replace as intended, or throw an exception.

I believe the workaround here is

pd.Series([x if pd.notna(x) else None for x in s], dtype=object) 
0     1.1
1     1.2
2    None
dtype: object

Which is fine. But I would like to understand why this behaviour occurs, whether it is documented, or if it is just a bug and I have to dust off my git profile and log one on the issue tracker... any ideas?

  • s.where(s.notnull(), None) is another, cleaner workaround, I guess. Commented Jan 3, 2019 at 11:38
  • To me this looks like a bug. I would expect it to throw an exception or do nothing; forward filling is incorrect. I would file this as an issue: github.com/pandas-dev/pandas/issues Commented Jan 3, 2019 at 11:42
  • @coldspeed Yes, now I get it, it is different. The worst part is that I am now going through some of my own implementations just to check whether a bug has crept in because of this. Thanks for the question! s.replace(np.nan, None) is in fact counterintuitive when it forward fills. Commented Jan 3, 2019 at 11:43
  • This works: s.replace({np.nan:None}), but I'd expect the less verbose method to behave the same. Commented Jan 3, 2019 at 11:44
  • Here's Nicki's workaround. We might close that as a duplicate if you get an authoritative response to this one. Commented Jan 3, 2019 at 11:56
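
The workarounds mentioned in the question and the comments can be compared side by side. The following is only a sketch: the `describe` helper and the `candidates` dictionary are names made up for illustration, and the `astype(object)` cast in front of `where` is an assumption added here so that a float dtype cannot turn None back into NaN. What each variant returns may depend on the installed pandas version, so the script prints the results rather than asserting them.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.1, 1.2, np.nan])


def describe(result):
    """Return (dtype name, True if the last element is literally None)."""
    return str(result.dtype), result.iloc[-1] is None


candidates = {
    # From the question: rebuild the Series element by element.
    "list comprehension": pd.Series(
        [x if pd.notna(x) else None for x in s], dtype=object
    ),
    # From the comments: dict form of replace.
    "replace with dict": s.replace({np.nan: None}),
    # Variation on the where() comment; the object cast is an assumption
    # added for this sketch, not part of the original comment.
    "astype(object).where": s.astype(object).where(s.notna(), None),
}

for name, result in candidates.items():
    print(name, describe(result))
```

If a variant reports a float64 dtype or a last element that is not None, it did not do the conversion on that pandas version, and the explicit list comprehension is the safer fallback.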

1 Answer

This behaviour is in the documentation of the method parameter:

method : {‘pad’, ‘ffill’, ‘bfill’, None}

The method to use when for replacement, when to_replace is a scalar, list or tuple and value is None.

So in your example, to_replace is a scalar and value is None, which means the method parameter is what gets used. The method by default is pad, which the documentation of fillna describes as:

pad / ffill: propagate last valid observation forward to next valid
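
To see what "pad" means for the series in the question, here is a small sketch that calls `Series.ffill` directly instead of going through replace. It reproduces the values the question reported for s.replace(np.nan, None); it does not show how replace is implemented internally.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.1, 1.2, np.nan])

# Forward fill ("pad" / "ffill"): each NaN takes the last valid value before it.
padded = s.ffill()
print(padded.tolist())  # [1.1, 1.2, 1.2]
```

The dtype stays float64, which matches the output shown in the question.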

11 Comments

This still doesn't explain why the NaNs are forward filled, though?
Well that would suggest s.replace(np.nan, None, method=None) would work but it doesn't and borks
@coldspeed the method by default is pad
To me this is unexpected. It's a special edge case, and not what I would expect given that if there was no match, say s.replace('foo',None), it would return the original series unchanged.
@ayhan ah OK. To me this is weird. It will probably not be changed given that it's documented, but I wouldn't expect this behaviour: normally nothing happens or the exact matched value is replaced. As a consequence, I wouldn't use replace to ffill or bfill.