I have the sample dataframe below:
import pandas as pd

d = {'key': ['foo', 'foo', 'foo', 'foo', 'bar', 'bar', 'bar', 'bar', 'crow', 'crow', 'crow', 'crow'],
     'count': [12, 3, 5, 5, 3, 1, 4, 1, 7, 3, 8, 2],
     'text': ["hello", "i", "am", "a", "piece", "of", "text", "have", "a", "nice", "day", "friends"],
}
df = pd.DataFrame(data=d)
df
output:
key count text
0 foo 12 hello
1 foo 3 i
2 foo 5 am
3 foo 5 a
4 bar 3 piece
5 bar 1 of
6 bar 4 text
7 bar 1 have
8 crow 7 a
9 crow 3 nice
10 crow 8 day
11 crow 2 friends
I stacked the dataframe with:
df.set_index("key").stack()
To get:
key
foo count 12
text hello
count 3
text i
count 5
text am
count 5
text a
bar count 3
text piece
count 1
text of
count 4
text text
count 1
text have
crow count 7
text a
count 3
text nice
count 8
text day
count 2
text friends
dtype: object
I am now trying to output the stacked df as a JSON file, but when I use to_json(), I get the error:
ValueError: Series index must be unique for orient='index'
The expected output would have text and count grouped by key:
[
  {
    "key": "foo",
    "values": [
      {
        "text": "hello",
        "count": 12
      },
      {
        "text": "i",
        "count": 3
      },
      {
        "text": "am",
        "count": 5
      },
      ...
    ]
  }
]
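For reference, here is a minimal sketch of one way that target structure could be built. It is an assumption on my part, not a confirmed solution: it skips stack() entirely, groups the original df by key, and serializes with the standard json module instead of to_json(). The names records and json_text are only illustrative.

```python
import json

import pandas as pd

d = {'key': ['foo', 'foo', 'foo', 'foo', 'bar', 'bar', 'bar', 'bar', 'crow', 'crow', 'crow', 'crow'],
     'count': [12, 3, 5, 5, 3, 1, 4, 1, 7, 3, 8, 2],
     'text': ["hello", "i", "am", "a", "piece", "of", "text", "have", "a", "nice", "day", "friends"],
}
df = pd.DataFrame(data=d)

# Build one dict per key; sort=False keeps the keys in order of first appearance.
records = [
    {
        "key": key,
        # int() converts NumPy integers to plain Python ints so json can serialize them.
        "values": [
            {"text": text, "count": int(count)}
            for text, count in zip(group["text"], group["count"])
        ],
    }
    for key, group in df.groupby("key", sort=False)
]

# Serialize the nested structure; this could be written to a file with json.dump instead.
json_text = json.dumps(records, indent=2)
print(json_text)
```

Grouping the flat dataframe avoids the duplicate index entries that stack() produces, which is what orient='index' rejects.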