4

I have to fill a column only if all the values of that column are null. For example:

import numpy as np
import pandas as pd

df = pd.DataFrame(data = {"col1":[3, np.nan, np.nan, 21, np.nan],
                          "col2":[4, np.nan, 12, np.nan, np.nan],
                          "col3":[33, np.nan, 55, np.nan, np.nan],
                          "col4":[np.nan, np.nan, np.nan, np.nan, np.nan]})


>>> df
   col1  col2  col3  col4
0   3.0   4.0  33.0   NaN
1   NaN   NaN   NaN   NaN
2   NaN  12.0  55.0   NaN
3  21.0   NaN   NaN   NaN
4   NaN   NaN   NaN   NaN

In the above example, I have to replace the values of col4 with 100 since all of its values are null/NaN.

So for the above example, the expected output is:

   col1  col2  col3  col4
0   3.0   4.0  33.0   100
1   NaN   NaN   NaN   100
2   NaN  12.0  55.0   100
3  21.0   NaN   NaN   100
4   NaN   NaN   NaN   100

I tried the command below, but it replaces values in a column only if that column contains at least one non-NaN value:

df.where(df.isnull().all(axis=1), df.fillna(100), inplace=True)

Could you please let me know how to do this?

Thanks

5 Answers

2

Use indexing:

df.loc[:, df.isna().all()] = 100
print(df)

# Output:
   col1  col2  col3   col4
0   3.0   4.0  33.0  100.0
1   NaN   NaN   NaN  100.0
2   NaN  12.0  55.0  100.0
3  21.0   NaN   NaN  100.0
4   NaN   NaN   NaN  100.0
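
For reference, here is a self-contained sketch of the same idea with the boolean mask pulled out into its own variable. The name all_null is only illustrative, and the frame is the one from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": [3, np.nan, np.nan, 21, np.nan],
    "col2": [4, np.nan, 12, np.nan, np.nan],
    "col3": [33, np.nan, 55, np.nan, np.nan],
    "col4": [np.nan] * 5,
})

# One boolean per column: True only where every value in that column is NaN.
all_null = df.isna().all()

# Overwrite only the flagged columns; partially filled columns keep their NaNs.
df.loc[:, all_null] = 100
```

Because the mask is computed per column, no column name has to be written out, which should also suit frames with many columns.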

2

This command should help:

df.loc[:, df.isnull().all(axis=0)] = 100

Output:

    col1    col2    col3    col4
0   3.0     4.0     33.0    100.0
1   NaN     NaN     NaN     100.0
2   NaN     12.0    55.0    100.0
3   21.0    NaN     NaN     100.0
4   NaN     NaN     NaN     100.0
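
The axis argument is what separates this from the attempt in the question. A small sketch, assuming the question's frame, that compares the two reductions:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": [3, np.nan, np.nan, 21, np.nan],
    "col2": [4, np.nan, 12, np.nan, np.nan],
    "col3": [33, np.nan, 55, np.nan, np.nan],
    "col4": [np.nan] * 5,
})

# axis=0 reduces down each column: one flag per column name.
col_mask = df.isnull().all(axis=0)
# axis=1 reduces across each row: one flag per row label.
row_mask = df.isnull().all(axis=1)

print(col_mask.tolist())  # [False, False, False, True]
print(row_mask.tolist())  # [False, True, False, False, True]
```

Only the axis=0 result lines up with the columns, so it is the one that can be used as a column selector in df.loc[:, ...].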

2

Warning: after OP modified the question and gave more details on the exact expected output, this answer is no longer valid.

You almost had it ;)

df['col4'] = df['col4'].mask(df.isnull().all(axis=1), 100)

Your errors:

  • you used where, so you should have inverted your condition (I used mask as an alternative)
  • you only need a scalar as replacement
  • you only needed to apply the operation on 'col4'

output:

       col1      col2      col3      col4
0    3.0000    4.0000   33.0000       NaN
1       NaN       NaN       NaN  100.0000
2       NaN   12.0000   55.0000       NaN
3   21.0000       NaN       NaN       NaN
4       NaN       NaN       NaN  100.0000
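
To illustrate the first point about where and mask, here is a minimal sketch assuming the question's frame. It reproduces the row-wise behaviour of this answer (fill col4 only in rows that are entirely NaN), not the column-wise fill the question ended up asking for. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": [3, np.nan, np.nan, 21, np.nan],
    "col2": [4, np.nan, 12, np.nan, np.nan],
    "col3": [33, np.nan, 55, np.nan, np.nan],
    "col4": [np.nan] * 5,
})

row_all_null = df.isnull().all(axis=1)

# mask replaces values where the condition is True ...
via_mask = df["col4"].mask(row_all_null, 100)
# ... while where keeps values where the condition is True,
# so the condition has to be inverted to get the same result.
via_where = df["col4"].where(~row_all_null, 100)

print(via_mask.equals(via_where))  # True
```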

2 Comments

Thanks for the quick reply. The issue is that I will be receiving an Avro file and the number of columns may be more than 50, so I can't do this for each column. Could you please help me do this without referencing the column name? Thanks again.
Can you provide a short example and the expected output?

1

Since more than one column could be entirely NaN, you can first collect those columns:

In [717]: cols = df.columns[df.isna().all()]
In [718]: cols
Out[718]: Index(['col4'], dtype='object')

You can replace multiple columns at once:

In [720]: df[cols] = 100

In [721]: df
Out[721]: 
   col1  col2  col3   col4
0   3.0   4.0  33.0  100.0
1   NaN   NaN   NaN  100.0
2   NaN  12.0  55.0  100.0
3  21.0   NaN   NaN  100.0
4   NaN   NaN   NaN  100.0
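
If this has to be applied to many incoming frames, the two steps could be wrapped in a small helper. This is only a sketch: fill_all_null_columns is a hypothetical name, not a pandas function, and it returns a copy rather than modifying its input.

```python
import numpy as np
import pandas as pd


def fill_all_null_columns(frame: pd.DataFrame, value) -> pd.DataFrame:
    """Return a copy of frame with every all-NaN column set to value.

    Hypothetical helper for illustration; not part of pandas.
    """
    out = frame.copy()
    # Column labels whose values are all NaN.
    cols = out.columns[out.isna().all()]
    if len(cols) > 0:
        out[cols] = value
    return out


# Example frame with two all-NaN columns (made up for this sketch).
wide = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan] * 3,
    "c": [np.nan] * 3,
})
filled = fill_all_null_columns(wide, 100)
```

One caveat: on a frame with zero rows, all() is vacuously True for every column, so every column would be selected.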

1

Use the loc accessor to update every column that has no non-null values:

df.loc[:, ~df.notna().any()] = 100

output

   col1  col2  col3   col4
0   3.0   4.0  33.0  100.0
1   NaN   NaN   NaN  100.0
2   NaN  12.0  55.0  100.0
3  21.0   NaN   NaN  100.0
4   NaN   NaN   NaN  100.0
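
The selector here is the logical negation of "has any non-null value", which should be the same mask as "all values are null". A quick check, assuming the question's frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": [3, np.nan, np.nan, 21, np.nan],
    "col2": [4, np.nan, 12, np.nan, np.nan],
    "col3": [33, np.nan, 55, np.nan, np.nan],
    "col4": [np.nan] * 5,
})

# "No non-null value in the column" ...
via_any = ~df.notna().any()
# ... is the same statement as "every value in the column is null".
via_all = df.isna().all()

print(via_any.equals(via_all))  # True
```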
