pandas replace NaN to None exhibits counterintuitive behaviour

Question

Given a series

s = pd.Series([1.1, 1.2, np.nan])
s
0    1.1
1    1.2
2    NaN
dtype: float64

If I need to convert the NaNs to None (for example, to work with Parquet files), I would like to get

0     1.1
1     1.2
2    None
dtype: object

I would assume Series.replace would be the obvious way of doing this, but here's what the function returns:

s.replace(np.nan, None)

0    1.1
1    1.2
2    1.2
dtype: float64

The NaN was forward filled, instead of being replaced. Going through the docs, I see that if the second argument is None, then the first argument should be a dictionary. Based on this, I would expect replace to either replace as intended, or throw an exception.

I believe the workaround here is

pd.Series([x if pd.notna(x) else None for x in s], dtype=object) 
0     1.1
1     1.2
2    None
dtype: object

That works, but I would like to understand why this behaviour occurs, whether it is documented, or whether it is just a bug and I have to dust off my git profile and log one on the issue tracker. Any ideas?

s.where(s.notnull(), None) is another, cleaner workaround, I guess.
– Vivek Kalyanarangan, Commented Jan 3, 2019 at 11:38
To me this looks like a bug. I would expect it to throw an exception or do nothing; forward filling is incorrect. I would file this as an issue: github.com/pandas-dev/pandas/issues
– EdChum, Commented Jan 3, 2019 at 11:42
@coldspeed Yes, now I get it, it is different. The worst part is that I am now going through some of my own implementations to check whether a bug has crept in because of this. Thanks for the question! s.replace(np.nan, None) is in fact counterintuitive when it forward fills.
– Vivek Kalyanarangan, Commented Jan 3, 2019 at 11:43
This works: s.replace({np.nan: None}), but I'd expect the less verbose method to behave the same.
– EdChum, Commented Jan 3, 2019 at 11:44
Here's Nicki's workaround. We might close that as a duplicate if you get an authoritative response to this one.
– user2285236, Commented Jan 3, 2019 at 11:56

Dani Mesejo · Accepted Answer · 2022-01-04 23:15:33Z

This behaviour is described in the documentation of the method parameter:

method : {‘pad’, ‘ffill’, ‘bfill’, None}

The method to use when for replacement, when to_replace is a scalar, list or tuple and value is None.

In your example, to_replace is a scalar and value is None, so the method parameter applies. The default method is pad, which the fillna documentation describes as:

pad / ffill: propagate last valid observation forward to next valid
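
A minimal sketch of what this means for the series in the question. It avoids calling s.replace(np.nan, None) itself, since that result may differ between pandas versions. Instead it uses s.ffill() to show what a pad fill produces, followed by a conversion that does not depend on replace at all. The helper name nan_to_none is hypothetical, not part of pandas; it is the question's list-comprehension workaround with the index and name carried over.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.1, 1.2, np.nan])

# A pad / forward fill: each NaN takes the last valid value before it.
# This matches the output shown in the question.
padded = s.ffill()
print(padded.tolist())  # [1.1, 1.2, 1.2]


def nan_to_none(series):
    """Hypothetical helper: object-dtype copy with missing values as None."""
    values = [None if pd.isna(x) else x for x in series]
    # dtype=object stops pandas from coercing None back to NaN.
    return pd.Series(values, index=series.index, name=series.name, dtype=object)


converted = nan_to_none(s)
print(converted.tolist())  # [1.1, 1.2, None]
```

Building the object Series explicitly keeps the result independent of how replace interprets a None value.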

edited Jan 4, 2022 at 23:15

answered Jan 3, 2019 at 11:46


11 Comments

coldspeed95 Over a year ago

This still doesn't explain why the NaNs are forward filled, though?

EdChum Over a year ago

Well, that would suggest s.replace(np.nan, None, method=None) would work, but it doesn't; it fails instead.

Dani Mesejo Over a year ago

@coldspeed the method by default is pad

EdChum Over a year ago

To me this is unexpected. It's a special edge case, which is not what I would expect given that when there is no match, say s.replace('foo', None), it would return the original series unchanged.

EdChum Over a year ago

@ayhan Ah, OK. To me this is weird. It will probably not be changed given that it's documented, but I wouldn't expect this behaviour: normally either nothing happens or the exactly matched value is replaced. As a consequence, I wouldn't use replace to ffill or bfill.
