I have the sample dataframe below:

import pandas as pd

d = {'key': ['foo', 'foo', 'foo', 'foo', 'bar', 'bar', 'bar', 'bar', 'crow', 'crow', 'crow', 'crow'], 
     'count': [12, 3, 5, 5, 3, 1, 4, 1, 7, 3, 8, 2],
     'text': ["hello", "i", "am", "a", "piece", "of", "text", "have", "a", "nice", "day", "friends"],
}
df = pd.DataFrame(data=d)
df

output:

    key count   text
0   foo    12   hello
1   foo     3   i
2   foo     5   am
3   foo     5   a
4   bar     3   piece
5   bar     1   of
6   bar     4   text
7   bar     1   have
8   crow    7   a
9   crow    3   nice
10  crow    8   day
11  crow    2   friends

I stacked the dataframe with: df.set_index("key").stack()

To get:

key        
foo   count         12
      text       hello
      count          3
      text           i
      count          5
      text          am
      count          5
      text           a
bar   count          3
      text       piece
      count          1
      text          of
      count          4
      text        text
      count          1
      text        have
crow  count          7
      text           a
      count          3
      text        nice
      count          8
      text         day
      count          2
      text     friends
dtype: object

I am now trying to output the stacked df as a JSON file, but when I use to_json(), I get the error:

ValueError: Series index must be unique for orient='index'
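
For context, the error can be reproduced with the sketch below. It assumes a reasonably recent pandas version; the variable names `stacked` and `error_message` are illustrative only. The stacked Series has a (key, column name) index in which every pair, such as ('foo', 'count'), occurs four times, and Series.to_json() defaults to orient='index', which needs unique index entries.

```python
import pandas as pd

d = {'key': ['foo', 'foo', 'foo', 'foo', 'bar', 'bar', 'bar', 'bar',
             'crow', 'crow', 'crow', 'crow'],
     'count': [12, 3, 5, 5, 3, 1, 4, 1, 7, 3, 8, 2],
     'text': ["hello", "i", "am", "a", "piece", "of", "text", "have",
              "a", "nice", "day", "friends"]}
df = pd.DataFrame(data=d)

stacked = df.set_index("key").stack()

# The index repeats: each (key, column) pair appears once per original row.
print(stacked.index.is_unique)

error_message = None
try:
    # Series.to_json() uses orient='index' by default.
    stacked.to_json()
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```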

The expected output would have text and count grouped by key:

[
  {
    "key": "19",
    "values": [
        {
            text: 'hello',
            count: 12
        },
        {
            content: 'i',
            count: 3
        },
        {
            content: 'am',
            count: 5
        },
        ...
    ]
]
  • Your expected output is not a valid JSON object. Commented Jan 4, 2021 at 17:55
  • Apologies, I modified my post to include "some_key" as "values" Commented Jan 4, 2021 at 18:01

1 Answer

As mentioned in the comment, your expected output is not a valid JSON string. You need "some_key":[...] at the same level as "key":"bar".

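For example, using groupby:

import json

# One dict per key; the 'key' column is dropped so it is not repeated in each record
json_str = json.dumps([ {'key':k, 'values':d.to_dict('records')}
                       for k,d in df.drop('key',axis=1).groupby(df['key'])
                      ], indent=2)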

Output:

[
  {
    "key": "bar",
    "values": [
      {
        "count": 3,
        "text": "piece"
      },
      {
        "count": 1,
        "text": "of"
      },
      {
        "count": 4,
        "text": "text"
      },
      {
        "count": 1,
        "text": "have"
      }
    ]
  },
  {
    "key": "crow",
    "values": [
      {
        "count": 7,
        "text": "a"
      },
      {
        "count": 3,
        "text": "nice"
      },
      {
        "count": 8,
        "text": "day"
      },
      {
        "count": 2,
        "text": "friends"
      }
    ]
  },
  {
    "key": "foo",
    "values": [
      {
        "count": 12,
        "text": "hello"
      },
      {
        "count": 3,
        "text": "i"
      },
      {
        "count": 5,
        "text": "am"
      },
      {
        "count": 5,
        "text": "a"
      }
    ]
  }
]
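
Since the question asks for a JSON file rather than a string, the same structure can be written to disk with json.dump. This is a minimal sketch, not part of the original answer: the file name grouped.json and the temporary directory are hypothetical placeholders, and the names `records`, `out_path` and `loaded` are illustrative.

```python
import json
import os
import tempfile

import pandas as pd

d = {'key': ['foo', 'foo', 'foo', 'foo', 'bar', 'bar', 'bar', 'bar',
             'crow', 'crow', 'crow', 'crow'],
     'count': [12, 3, 5, 5, 3, 1, 4, 1, 7, 3, 8, 2],
     'text': ["hello", "i", "am", "a", "piece", "of", "text", "have",
              "a", "nice", "day", "friends"]}
df = pd.DataFrame(data=d)

# Same grouping as above, kept as Python objects instead of a string.
records = [
    {'key': k, 'values': g.to_dict('records')}
    for k, g in df.drop('key', axis=1).groupby(df['key'])
]

# Hypothetical output location; replace with the path you actually want.
out_path = os.path.join(tempfile.mkdtemp(), "grouped.json")
with open(out_path, "w") as fh:
    json.dump(records, fh, indent=2)

# Read the file back to confirm it is valid JSON.
with open(out_path) as fh:
    loaded = json.load(fh)
print([entry['key'] for entry in loaded])
```

groupby sorts the group keys by default, so the entries come out in the order bar, crow, foo, matching the output above.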

2 Comments

This worked perfectly. Question: why did you drop the key column in the for loop?
@mehsheenman Otherwise the key would still be present in each record inside values.
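
The effect of dropping the column can be seen by comparing the two groupings side by side. The sketch below uses a trimmed three-row subset of the sample data for brevity; `small`, `with_key` and `without_key` are illustrative names, not from the answer.

```python
import pandas as pd

# Trimmed subset of the sample data from the question.
small = pd.DataFrame({'key': ['foo', 'foo', 'bar'],
                      'count': [12, 3, 3],
                      'text': ['hello', 'i', 'piece']})

# Grouping the full frame: every record still carries the 'key' field.
with_key = [g.to_dict('records') for _, g in small.groupby('key')]

# Dropping 'key' first and grouping by the original column removes it.
without_key = [
    g.to_dict('records')
    for _, g in small.drop('key', axis=1).groupby(small['key'])
]

print(with_key[0][0])     # record from the 'bar' group, includes 'key'
print(without_key[0][0])  # same record without 'key'
```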
