In pandas, how to pivot a dataframe on a categorical series with missing categories?

Question

I have a pandas dataframe with a categorical series that has missing categories.

In the example shown below, group has the categories "a", "b", and "c", but there are no cases of "c" in the dataframe.

import pandas as pd
dfr = pd.DataFrame({
    "id": ["111", "222", "111", "333"], 
    "group": ["a", "a", "b", "b"], 
    "value": [1, 4, 9, 16]})
dfr["group"] = pd.Categorical(dfr["group"], categories=["a", "b", "c"])
dfr.pivot(index="id", columns="group")

The resulting pivoted dataframe has columns a and b. I expected a c column containing all missing value as well.

      value      
group     a     b
id               
111     1.0   9.0
222     4.0   NaN
333     NaN  16.0

How can I pivot a dataframe on a categorical series to include columns with all categories, regardless of whether they were present in the original dataframe?

Learning is a mess · Accepted Answer · 2021-12-01 15:56:25Z

5

pd.pivot_table has a dropna argument which dictates dropping or not value columns full of NaNs.

Try setting it to False:

import pandas as pd
dfr = pd.DataFrame({
    "id": ["111", "222", "111", "333"], 
    "group": ["a", "a", "b", "b"], 
    "value": [1, 4, 9, 16]})
dfr["group"] = pd.Categorical(dfr["group"], categories=["a", "b", "c"])
pd.pivot_table(dfr, index="id", columns="group", dropna=False)

edited Dec 1, 2021 at 15:56

answered Dec 1, 2021 at 15:50

Learning is a mess

8,3178 gold badges42 silver badges82 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

not_speshal · Accepted Answer · 2021-12-01 16:02:18Z

1

You can reindex. This will work even if your value column is not numerical (unlike pivot_table):

output = (dfr.pivot(index="id", columns="group")
             .reindex(columns=pd.MultiIndex.from_product([["value"],
                                                          dfr["group"].cat.categories]
                                                         )
                      )
             )

>>> output
    value          
        a     b   c
id                 
111   1.0   9.0 NaN
222   4.0   NaN NaN
333   NaN  16.0 NaN

answered Dec 1, 2021 at 16:02

not_speshal

23.2k2 gold badges18 silver badges33 bronze badges

Collectives™ on Stack Overflow

In pandas, how to pivot a dataframe on a categorical series with missing categories?

2 Answers 2

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related