4

I am working with two dataframes:

  • df contains a column be/me for stocks for a 20-year period (on a monthly basis).
  • df2, a subset of df (with only certain stocks, only for June) contains the column decile, created via the pd.qcut() method for every year in the 20-year period based on an altered version of df's be/me.

Considering the deciles that I created in df2, I wonder if it's possible to rank df's be/me based on df2's decile column. In other words, I wonder if it's possible to assign df's be/me values to the deciles created in df2.

Please see dataframes below for a better understanding of the issue:

             
df

date         stock_id      be/me    
2000-01-31    1004.0        0.3      
2000-02-29    1004.0        0.7 
2000-03-31    1004.0        1.2 
2000-04-30    1004.0        2.3 
2000-05-31    1004.0        0.9 
...            ...          ...
2020-12-31    3900.0        1.7
2020-12-31    3900.0        2.8
2020-12-31    3900.0        3.0
2020-12-31    3900.0        0.2
2020-12-31    3900.0        2.1

1218855 rows × 3 columns


df2['deciles'] = df2.groupby('date')['be/me'].transform(lambda x: pd.qcut(x, 10, labels=False, duplicates = 'drop'))
df2

date        stock_id      be/me  deciles                    
2000-06-30  2061.0      0.653684    5
2000-06-30  4383.0      0.053660    2
2000-06-30  13561.0     0.092509    2
2000-06-30  4065.0      1.342187    6
2000-06-30  2731.0      0.235582    3
  ...         ...          ...     ...
2020-06-30  7022.0      0.072534    2
2020-06-30  30990.0     1.071096    6
2020-06-30  22867.0     1.627155    6
2020-06-30  15247.0     0.051387    2
2020-06-30  61574.0     1.684690    6

24095 rows × 4 columns

Note: date is of type datetime and, for each date, there are multiple stocks (stock_id). Thank you so much for your time.

EDIT

What I want to do is to check in which df2-created decile the original be/me values (from the original dataframe df) fit. The expected output should be a new column in df with df2-created deciles attributed to each and every be/me value in df. Please let me know if there is any additional clarification necessary.

I created a function that loops through the deciles to fetch the maximum decile value for every date in df2. Not sure if I am heading in the right direction since the output is an array with no date... take a look below:

In: def attribution(deciles,dates):
    deciles = df2['deciles'].unique()
    dates = df2.index.unique()
    body_max = []
    body_min = []
    for x in deciles:
        for y in dates:
            body_max.append(df2[df2['deciles'] == x].loc[y]['be/me'].max())
            body_min.append(df2[df2['deciles'] == x].loc[y]['be/me'].min())
            
    return body_max, body_min

In: attribution(deciles, dates)
Out: [0.9343106070197438,
 1.2747264875802489,
 1.9700461181925901,
 0.7888946814157697,
 0.9304702071896337,
 0.9651423313922733,
 0.7238677612487585,
 1.0358317574924074,
 ...]
2
  • Will you please add a sample dataframe containing your expected output? Commented Nov 21, 2021 at 20:45
  • Please see edit above, @user17242583. Thank you. Commented Nov 22, 2021 at 0:19

1 Answer 1

2

To be clear: you want to know for each be/me value in df which decile it would have fallen into if that value had been in df2? I see two cases:

  1. If df2 covers the whole month of June (as you wrote), I am afraid there is no answer to that question: each day in the month will have decile bins with different edges (since you are doing a groupby('date') on df2). The same be/me value in df could belong to different deciles in df2 depending on the day in June you consider.

  2. If df2 actually covers only one day in June (as your example above seems to indicate: 2020-06-30), then you have one well defined set of decile bins.

In case 2), you could do that:

df
        date  stock_od  be/me
0 2000-01-31    1004.0    0.3
1 2000-02-29    1004.0    0.7
2 2000-03-31    1004.0    1.2
3 2000-04-30    1004.0    2.3
4 2000-05-31    1004.0    0.9
5 2020-12-31    3900.0    1.7
6 2020-12-31    3900.0    2.8
7 2020-12-31    3900.0    3.0
8 2020-12-31    3900.0    0.2
9 2020-12-31    3900.0    2.1

df2
        date  stock_id     be/me
0 2000-06-30    2061.0  0.653684
1 2000-06-30    4383.0  0.053660
2 2000-06-30   13561.0  0.092509
3 2000-06-30    4065.0  1.342187
4 2000-06-30    2731.0  0.235582
5 2000-06-30    7022.0  0.072534
6 2000-06-30   30990.0  1.071096
7 2000-06-30   22867.0  1.627155
8 2000-06-30   15247.0  0.051387
9 2000-06-30   61574.0  1.684690

deciles = pd.qcut(df2['be/me'], 10, labels=False, duplicates = 'drop', retbins=True)

deciles
(0    5
 1    1
 2    3
 3    7
 4    4
 5    2
 6    6
 7    8
 8    0
 9    9
 Name: be/me, dtype: int64,
 array([0.051387 , 0.0534327, 0.0687592, 0.0865165, 0.1783528, 0.444633 ,
        0.8206488, 1.1524233, 1.3991806, 1.6329085, 1.68469  ]))

df.loc[:,'deciles'] = np.digitize(df['be/me'],deciles[1])-1

df
        date  stock_od  be/me  deciles
0 2000-01-31    1004.0    0.3        4
1 2000-02-29    1004.0    0.7        5
2 2000-03-31    1004.0    1.2        7
3 2000-04-30    1004.0    2.3       10
4 2000-05-31    1004.0    0.9        6
5 2020-12-31    3900.0    1.7       10
6 2020-12-31    3900.0    2.8       10
7 2020-12-31    3900.0    3.0       10
8 2020-12-31    3900.0    0.2        4
9 2020-12-31    3900.0    2.1       10

With the argument retbins=True to pd.qcut() you get a tuple where the second item is an array containing the bin (here decile) edges. You then apply the very handy numpy function np.digitize() (https://numpy.org/doc/stable/reference/generated/numpy.digitize.html) to the df column be/me, which gives you for each value which bin (decile) it belongs to.

Note: I added a -1 because the numpy function np.digitize() returns the next decile compared to what pd.qcut() delivered. Probably because np.digitize() reserves deciles 0 and 10 for values that fall outside the lower and higher bin edges, respectively.

Sign up to request clarification or add additional context in comments.

1 Comment

Image
Thank you so much for your (very detailed) answer! Case 2 was indeed the one I was referring to. Cheers!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.