I am working with two dataframes:
df contains a column be/me for stocks over a 20-year period (on a monthly basis).
df2, a subset of df (with only certain stocks, and only for June), contains the column deciles, created via the pd.qcut() method for every year in the 20-year period, based on an altered version of df's be/me.
Given the deciles that I created in df2, I wonder if it's possible to rank df's be/me based on df2's deciles column. In other words, I wonder if it's possible to assign df's be/me values to the deciles created in df2.
Please see dataframes below for a better understanding of the issue:
df
date        stock_id  be/me
2000-01-31    1004.0    0.3
2000-02-29    1004.0    0.7
2000-03-31    1004.0    1.2
2000-04-30    1004.0    2.3
2000-05-31    1004.0    0.9
...              ...    ...
2020-12-31    3900.0    1.7
2020-12-31    3900.0    2.8
2020-12-31    3900.0    3.0
2020-12-31    3900.0    0.2
2020-12-31    3900.0    2.1
1218855 rows × 3 columns
df2['deciles'] = df2.groupby('date')['be/me'].transform(
    lambda x: pd.qcut(x, 10, labels=False, duplicates='drop')
)
df2
date        stock_id     be/me  deciles
2000-06-30    2061.0  0.653684        5
2000-06-30    4383.0  0.053660        2
2000-06-30   13561.0  0.092509        2
2000-06-30    4065.0  1.342187        6
2000-06-30    2731.0  0.235582        3
...              ...       ...      ...
2020-06-30    7022.0  0.072534        2
2020-06-30   30990.0  1.071096        6
2020-06-30   22867.0  1.627155        6
2020-06-30   15247.0  0.051387        2
2020-06-30   61574.0  1.684690        6
24095 rows × 4 columns
Note: date is of type datetime and, for each date, there are multiple stocks (stock_id).
Thank you so much for your time.
EDIT
What I want to do is to check in which df2-created decile the original be/me values (from the original dataframe df) fit. The expected output should be a new column in df with df2-created deciles attributed to each and every be/me value in df.
Please let me know if there is any additional clarification necessary.
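To make the expected output concrete, here is a minimal sketch of the kind of assignment I have in mind. It is not a confirmed solution. The function name assign_deciles and its parameters are hypothetical, and it rests on two assumptions that may not hold for my data: date is a regular datetime column in both dataframes, and the June breakpoints of a given calendar year apply to every df row dated in that same year. It recovers the bin edges that pd.qcut used on df2 (via retbins=True) and reuses them with pd.cut on df.

```python
import numpy as np
import pandas as pd


def assign_deciles(df, df2, value_col="be/me", q=10):
    """Label each row of df with the df2-based quantile bin of its value.

    Assumption (not from the original post): breakpoints computed from
    df2 rows of calendar year Y are applied to all df rows dated in Y.
    """
    # Step 1: recover the bin edges that qcut uses for each year of df2.
    edges_by_year = {}
    for year, grp in df2.groupby(df2["date"].dt.year):
        _, edges = pd.qcut(
            grp[value_col], q, labels=False, retbins=True, duplicates="drop"
        )
        edges = edges.copy()
        # Open the outer bins so df values outside df2's range get a label.
        edges[0], edges[-1] = -np.inf, np.inf
        edges_by_year[year] = edges

    # Step 2: cut df's values with the edges of the matching year.
    out = df.copy()
    out["deciles"] = np.nan  # years without df2 breakpoints stay NaN
    years = out["date"].dt.year
    for year, edges in edges_by_year.items():
        mask = years == year
        if mask.any():
            out.loc[mask, "deciles"] = pd.cut(
                out.loc[mask, value_col], bins=edges, labels=False
            ).to_numpy()
    return out
```

Calling assign_deciles(df, df2) would return a copy of df with a new deciles column. If the breakpoints should instead apply from July of one year to June of the next, the year key would need to be shifted accordingly.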
I wrote a function that loops through the deciles and dates to fetch the maximum and minimum be/me value in each decile for every date in df2. I am not sure I am heading in the right direction, since the output contains no dates. Take a look below:
In: def attribution(deciles, dates):
        # Collect the largest and smallest be/me in each decile on each date.
        # Assumes df2 is indexed by date, so that .loc[y] selects one date.
        body_max = []
        body_min = []
        for x in deciles:
            for y in dates:
                body_max.append(df2[df2['deciles'] == x].loc[y]['be/me'].max())
                body_min.append(df2[df2['deciles'] == x].loc[y]['be/me'].min())
        return body_max, body_min
In: deciles = df2['deciles'].unique()
    dates = df2.index.unique()
    attribution(deciles, dates)
Out: [0.9343106070197438,
1.2747264875802489,
1.9700461181925901,
0.7888946814157697,
0.9304702071896337,
0.9651423313922733,
0.7238677612487585,
1.0358317574924074,
...]
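For comparison, the same per-date, per-decile minimum and maximum can be sketched with a groupby, which keeps the date and decile labels next to each value instead of returning bare lists. The helper name decile_bounds is hypothetical, and the sketch assumes date is a regular column (if it is the index, as in the loop above, df2.reset_index() would be needed first).

```python
import pandas as pd


def decile_bounds(df2, value_col="be/me"):
    """Return the min and max of value_col for every (date, decile) pair.

    Assumption: 'date' and 'deciles' are regular columns of df2.
    """
    return (
        df2.groupby(["date", "deciles"])[value_col]
        .agg(["min", "max"])  # one row per (date, decile) pair
        .reset_index()        # turn the group keys back into columns
    )
```

The result has the columns date, deciles, min and max, so each bound can be traced back to the date and decile it belongs to.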