Is there a python function to get `value_counts()` for pandas dataframe column with list?

Question

I have a dataframe with a column whose values are lists:

import pandas as pd

data_dict = {"Trace" : [["A-M", "B&M", "B&Q", "BLOG", "BYPAS", "CIM"],
                        ["B&M", "B&Q", "BLOG", "BYPAS"], 
                        ["BLOG", "BYPAS", "CIM"], 
                        ["A-M", "B&M", "B&Q", "BLOG"],
                        ["A-M", "B&M", "B&Q", "BLOG", "BYPAS", "CIM"],
                        ["A-M", "B&M", "B&Q", "BLOG", "BYPAS", "CIM"],
                        ["BLOG", "BYPAS", "CIM"],
                        ["BLOG", "BYPAS", "CIM"],
                        ["BLOG", "BYPAS", "CIM"]]}

data = pd.DataFrame(data_dict)

    Trace
0   [A-M, B&M, B&Q, BLOG, BYPAS, CIM]
1   [B&M, B&Q, BLOG, BYPAS]
2   [BLOG, BYPAS, CIM]
3   [A-M, B&M, B&Q, BLOG]
4   [A-M, B&M, B&Q, BLOG, BYPAS, CIM]
5   [A-M, B&M, B&Q, BLOG, BYPAS, CIM]
6   [BLOG, BYPAS, CIM]
7   [BLOG, BYPAS, CIM]
8   [BLOG, BYPAS, CIM]

Is there a way to count how often each distinct list occurs in the column, like value_counts(normalize=True) does for hashable values in pandas? The desired output:


                            Trace         Count    Percentage  
0   [A-M, B&M, B&Q, BLOG, BYPAS, CIM]   
1   [B&M, B&Q, BLOG, BYPAS] 
2   [BLOG, BYPAS, CIM]  
3   [A-M, B&M, B&Q, BLOG]

data['Trace'].apply(tuple).value_counts() should do it. You have to convert each list into a tuple, which is immutable and hashable.
– Ch3steR, Commented Aug 1, 2021 at 11:42
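To illustrate the point made in this comment, here is a minimal sketch on a small made-up Series (the values are hypothetical, not the question's data). It checks that a list cannot be hashed and that converting each list to a tuple lets value_counts() group identical rows.

```python
import pandas as pd

# Small hypothetical column of lists, for illustration only.
s = pd.Series([["a", "b"], ["a", "b"], ["c"]])

# Lists are mutable, so Python refuses to hash them.
try:
    hash(["a", "b"])
    list_is_hashable = True
except TypeError:
    list_is_hashable = False

# Tuples of hashable items are hashable, so value_counts() can group them.
counts = s.apply(tuple).value_counts()

print(list_is_hashable)
print(counts)
```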

Anurag Dabas · Accepted Answer · 2021-08-01 11:56:11Z

1

As mentioned in the comments by @Ch3steR, you can use:

out=data['Trace'].map(tuple).value_counts().rename_axis(index='Trace').reset_index(name='Count')
out=out.assign(Trace=out['Trace'].map(list),Percentage=out['Count']/out['Count'].sum())

Output of out:

    Trace                               Count   Percentage
0   [BLOG, BYPAS, CIM]                  4       0.444444
1   [A-M, B&M, B&Q, BLOG, BYPAS, CIM]   3       0.333333
2   [B&M, B&Q, BLOG, BYPAS]             1       0.111111
3   [A-M, B&M, B&Q, BLOG]               1       0.111111
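If this is needed in more than one place, the same steps could be wrapped in a small helper. The sketch below is only one possible arrangement: the name list_value_counts is hypothetical (it is not a pandas function), and the column names simply follow the output above.

```python
import pandas as pd


def list_value_counts(series: pd.Series) -> pd.DataFrame:
    """Count identical lists in a Series (hypothetical helper, not a pandas API)."""
    # Tuples are hashable, so value_counts() can group them.
    counts = series.map(tuple).value_counts()
    out = pd.DataFrame({
        "Trace": [list(t) for t in counts.index],  # back to lists for display
        "Count": counts.to_numpy(),
    })
    out["Percentage"] = out["Count"] / out["Count"].sum()
    return out


# Tiny made-up example, not the question's full data.
data = pd.DataFrame({"Trace": [["BLOG", "BYPAS", "CIM"],
                               ["A-M", "B&M"],
                               ["BLOG", "BYPAS", "CIM"]]})
result = list_value_counts(data["Trace"])
print(result)
```

Note that tuples compare element by element, so two lists with the same items in a different order are counted as different values.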


Ailurophile · Accepted Answer · 2021-08-02 04:18:37Z

0

As mentioned in the comments by @Ch3steR:

Using apply, value_counts() and pd.concat

# Convert each list to a tuple so the values are hashable, and keep the result
traces = data['Trace'].apply(tuple)

out = pd.concat([traces.value_counts(),
                 traces.value_counts(normalize=True).mul(100)],
                axis=1, keys=['Counts', 'Percentage'])

edited Aug 2, 2021 at 4:18

answered Aug 1, 2021 at 12:01


Kum_R · Accepted Answer · 2021-08-02 04:23:37Z

0

Try this:

data['Trace'].apply(tuple).value_counts()
or
data['Trace'].apply(tuple).value_counts(normalize=True)
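As a quick check of how the two calls relate, the sketch below rebuilds the question's dataframe and compares normalize=True against dividing the raw counts by their total. The variable names traces, counts and shares are only illustrative.

```python
import pandas as pd

# The dataframe from the question.
data = pd.DataFrame({"Trace": [
    ["A-M", "B&M", "B&Q", "BLOG", "BYPAS", "CIM"],
    ["B&M", "B&Q", "BLOG", "BYPAS"],
    ["BLOG", "BYPAS", "CIM"],
    ["A-M", "B&M", "B&Q", "BLOG"],
    ["A-M", "B&M", "B&Q", "BLOG", "BYPAS", "CIM"],
    ["A-M", "B&M", "B&Q", "BLOG", "BYPAS", "CIM"],
    ["BLOG", "BYPAS", "CIM"],
    ["BLOG", "BYPAS", "CIM"],
    ["BLOG", "BYPAS", "CIM"],
]})

traces = data["Trace"].apply(tuple)               # hashable version of the column
counts = traces.value_counts()                    # absolute frequencies
shares = traces.value_counts(normalize=True)      # relative frequencies

# normalize=True should match counts divided by the number of rows.
manual = counts / counts.sum()
print(counts)
print(shares)
```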
