I have JSON of the form:

{
    "abc":
      {
        "123":[45600,null,3567],
        "378":[78689,2345,5678],
        "343":[23456,null,null]
      }
    }

I fetched the JSON data from a URL like this:

json_data = json.loads(url.read().decode())

I need to convert it into a pandas DataFrame like this:

ds    y_ds1  y_ds2  y_ds3
123   45600  null   3567
378   78689  2345   5678
343   23456  null   null

I'm trying to do it this way:

df = pd.read_json(url,orient='columns')

It gives a result in the following form:

          abc
123      [45600,null,3567]
378      [78689,2345,5678]
343      [23456,null,null]

Is there a simple way to split the array column into as many columns as the array has elements, and rename the headings as shown above?

Edit: In the given JSON every array has 3 elements, but what if the array size is 2, 4, or 5 for all the items? Note: all keys in the JSON will have arrays of the same size, but that size can be anything.

1 Answer

Does this work for you?

import pandas as pd
import numpy as np

null = np.nan

my_json = {
    "abc":
      {
        "123":[45600,null,3567],
        "378":[78689,2345,5678],
        "343":[23456,null,null]
      }
    }

pd.DataFrame(my_json.get('abc')).T.reset_index().rename(columns={'index':'ds',0:'y_ds1',1:'y_ds2',2:'y_ds3'})

    ds    y_ds1   y_ds2   y_ds3
0  123  45600.0     NaN  3567.0
1  343  23456.0     NaN     NaN
2  378  78689.0  2345.0  5678.0

If the index can serve as the ds column, then you can do this:

pd.DataFrame(my_json.get('abc')).T.rename(columns=(lambda x: 'y_ds' + str(x)))

       y_ds0   y_ds1   y_ds2
123  45600.0     NaN  3567.0
343  23456.0     NaN     NaN
378  78689.0  2345.0  5678.0

Edit: Given the DataFrame you showed in your question, you can convert it like so:

temp = df['abc'].apply(lambda x: pd.Series(x)).rename(columns=(lambda x: 'y_ds' + str(x)))

temp

       y_ds0   y_ds1   y_ds2
123  45600.0     NaN  3567.0
378  78689.0  2345.0  5678.0
343  23456.0     NaN     NaN
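
For the case where the array size is not known in advance, here is a minimal sketch that builds the column names from whatever width the data has. The helper name `arrays_to_frame` and the inline `raw` string are assumptions for illustration; in practice `json_data` would come from the URL as in the question. It uses `pd.DataFrame.from_dict(..., orient="index")` instead of the transpose shown above, and numbers the columns from 1 to match the requested headings.

```python
import json

import pandas as pd

# Stand-in for the payload fetched from the URL (assumed for illustration).
raw = '{"abc": {"123": [45600, null, 3567], "378": [78689, 2345, 5678], "343": [23456, null, null]}}'
json_data = json.loads(raw)  # JSON null becomes Python None


def arrays_to_frame(mapping, prefix="y_ds"):
    """Turn {key: [v1, v2, ...]} into a frame with columns ds, y_ds1, y_ds2, ..."""
    # Each key becomes a row; each array position becomes a column (0, 1, 2, ...).
    frame = pd.DataFrame.from_dict(mapping, orient="index")
    # Name the value columns by position, starting from 1, for any array width.
    frame.columns = [f"{prefix}{i + 1}" for i in range(frame.shape[1])]
    # Move the keys out of the index into a regular column called ds.
    return frame.rename_axis("ds").reset_index()


result = arrays_to_frame(json_data["abc"])
print(result)
```

Because the names are generated from `frame.shape[1]`, the same call should work unchanged when every array has 2, 4, or 5 elements.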

7 Comments

Please check the edit and see if you can suggest something.
I think your last one will work, but the index column with values 123, 378, etc. should be included under the heading ds.
Just do temp.index.name = 'ds'
What I meant is that the index column should be a regular column of the DataFrame with the heading ds. Even if I do temp.index.name = 'ds', the DataFrame has 3 columns, not 4. It should have 4 columns.
OK, just do temp.reset_index() after that and you're all set.
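
Putting the steps from these comments together, a minimal sketch might look like the following. The `df` here is rebuilt by hand to resemble the frame the question shows (an `abc` column holding lists), which is an assumption; the variable name `out` is also just for illustration. Note that `reset_index()` returns a new DataFrame rather than modifying `temp` in place, so the result needs to be assigned.

```python
import pandas as pd

# Hand-built stand-in for the frame shown in the question: one column of lists.
df = pd.Series(
    {
        "123": [45600, None, 3567],
        "378": [78689, 2345, 5678],
        "343": [23456, None, None],
    },
    name="abc",
).to_frame()

# Expand each list into its own columns and prefix the positions with y_ds.
temp = df["abc"].apply(pd.Series).rename(columns=lambda x: "y_ds" + str(x))

# Name the index, then turn it into a regular column.
temp.index.name = "ds"
out = temp.reset_index()  # returns a new frame; temp itself is unchanged

print(out.shape)
```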