My knowledge of packages such as pandas is fairly shallow, and I've been looking for a way to flatten key/value records into rows. I have a list of dicts like this, with a surrogate key called entry_id:
data = [
    {"id": 1, "entry_id": 123, "type": "ticker", "value": "IBM"},
    {"id": 2, "entry_id": 123, "type": "company_name", "value": "International Business Machines"},
    {"id": 3, "entry_id": 123, "type": "cusip", "value": "01234567"},
    {"id": 4, "entry_id": 321, "type": "ticker", "value": "AAPL"},
    {"id": 5, "entry_id": 321, "type": "permno", "value": "123456"},
    {"id": 6, "entry_id": 321, "type": "company_name", "value": "Apple, Inc."},
    {"id": 7, "entry_id": 321, "type": "formation_date", "value": "1976-04-01"},
]
I would like to flatten the data into one row per surrogate key entry_id, with each type becoming a column, like this (empty strings or None for missing values, it doesn't matter which):
[
    {"entry_id": 123, "ticker": "IBM", "permno": "", "company_name": "International Business Machines", "cusip": "01234567", "formation_date": ""},
    {"entry_id": 321, "ticker": "AAPL", "permno": "123456", "company_name": "Apple, Inc.", "cusip": "", "formation_date": "1976-04-01"}
]
I've tried DataFrame's groupby and json_normalize, but I haven't managed the right level of sorcery for the desired result. I could walk the data in pure Python, but I'm fairly sure that would be slow. Essentially, I don't know how to tell pandas that type supplies the column names, value supplies the cell values, and entry_id is the aggregation key. I'm open to packages other than pandas as well.
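For what it's worth, the pure-Python walk I'm hoping to avoid would look roughly like this (the flatten name and the hard-coded types list are just for the sketch; the real type set could be collected from the data):

```python
def flatten(records, types):
    """Group records by entry_id, turning each "type" into a column.

    Missing types are filled with "".
    """
    rows = {}
    for rec in records:
        # Start a blank row for each new entry_id, pre-filled with "".
        row = rows.setdefault(
            rec["entry_id"],
            {"entry_id": rec["entry_id"], **{t: "" for t in types}},
        )
        row[rec["type"]] = rec["value"]
    return list(rows.values())
```

This produces the output above, but it feels like something a pandas pivot should handle in a line or two.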