7

Hi I am having trouble reshaping my df.

I have:

Netflix     TV      DVD 
   0.1      0.2     0.3
   0.12     0.5     0.15
   0.4      0.6     0.8
            0.5     0.41
            0.41
            0.2 

And I want to convert my df to look like:

Netflix  [0.1, 0.12, 0.4]
TV       [0.2, 0.5, 0.6, 0.5, 0.41, 0.2] 
DVD      [0.3, 0.15, 0.8, 0.41]

Not sure how stack() or pivot() would work on a df of this kind. Any help appreciated.

5
  • df.transpose() ? Commented Jul 8, 2019 at 14:19
  • So you are trying to turn each column into lists? Commented Jul 8, 2019 at 14:20
  • How can your pandas df have columns with different size ? Are the missing values in Netflix and DVD NaNs or None ? Commented Jul 8, 2019 at 14:24
  • @OlivierG yes they were NaNs, I wanted to create a list without these NaN values. Commented Jul 9, 2019 at 8:42
  • In this case, no. only have 24 columns with ~100 rows of data. Commented Jul 9, 2019 at 8:45

5 Answers 5

9

stack

Stacking drops null values while reshaping the array

df.stack().groupby(level=1).agg(list)

DVD                 [0.3, 0.15, 0.8, 0.41]
Netflix                   [0.1, 0.12, 0.4]
TV         [0.2, 0.5, 0.6, 0.5, 0.41, 0.2]
dtype: object
Sign up to request clarification or add additional context in comments.

Comments

7

Remove missing values by Series.dropna and convert to Series in dictionary comprehension:

s = pd.Series({x: df[x].dropna().tolist() for x in df.columns})
print (s)
Netflix                   [0.1, 0.12, 0.4]
TV         [0.2, 0.5, 0.6, 0.5, 0.41, 0.2]
DVD                 [0.3, 0.15, 0.8, 0.41]
dtype: object

...or in DataFrame.apply:

s = df.apply(lambda x: x.dropna().tolist())
print (s)

Netflix                   [0.1, 0.12, 0.4]
TV         [0.2, 0.5, 0.6, 0.5, 0.41, 0.2]
DVD                 [0.3, 0.15, 0.8, 0.41]
dtype: object

Last if need 2 columns DataFrame:

df1 = s.rename_axis('a').reset_index(name='b')
print (df1)
         a                                b
0  Netflix                 [0.1, 0.12, 0.4]
1       TV  [0.2, 0.5, 0.6, 0.5, 0.41, 0.2]
2      DVD           [0.3, 0.15, 0.8, 0.41]

1 Comment

Your comprehension answer in golf form pd.Series({k: [*df[k].dropna()] for k in df})
4

I think this is what you are looking for:

> df.T.apply(lambda x: x.dropna().tolist(), axis=1)

Netflix    [0.1, 0.12, 0.4, 0.5, 0.41, 0.2]
TV                    [0.2, 0.5, 0.6, 0.41]
DVD                        [0.3, 0.15, 0.8]
dtype: object

Comments

2

Using groupby with columns

df.groupby(level=0,axis=1).apply(lambda x : x.dropna().iloc[:,0].tolist())
Out[20]: 
DVD                 [0.3, 0.15, 0.8, 0.41]
Netflix                   [0.1, 0.12, 0.4]
TV         [0.2, 0.5, 0.6, 0.5, 0.41, 0.2]
dtype: object

Comments

1

If missing values in each columns are NaNs, you can go with this :

df1 = pd.DataFrame({
    "Netflix":  [0.1, 0.12, 0.4, None, None, None],
    "TV":       [0.2, 0.5, 0.6, 0.5, 0.41, 0.2],
    "DVD":      [0.3, 0.15, 0.8, 0.41, None, None]
}
)
print(df1)

df2 = pd.DataFrame(df1.columns, columns=["Type"])
df2["List_for_Type"] = [
    list(df1[f].dropna())
    for f in df1.columns
]
print(df2)

The corresponding output is :

  Netflix    TV   DVD
0     0.10  0.20  0.30
1     0.12  0.50  0.15
2     0.40  0.60  0.80
3      NaN  0.50  0.41
4      NaN  0.41   NaN
5      NaN  0.20   NaN

      Type                    List_for_Type
0  Netflix                 [0.1, 0.12, 0.4]
1       TV  [0.2, 0.5, 0.6, 0.5, 0.41, 0.2]
2      DVD           [0.3, 0.15, 0.8, 0.41]

Hope this helps.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.