I have a DataFrame with a column whose values are lists:

import pandas as pd

data_dict = {"Trace" : [["A-M", "B&M", "B&Q", "BLOG", "BYPAS", "CIM"],
                        ["B&M", "B&Q", "BLOG", "BYPAS"], 
                        ["BLOG", "BYPAS", "CIM"], 
                        ["A-M", "B&M", "B&Q", "BLOG"],
                        ["A-M", "B&M", "B&Q", "BLOG", "BYPAS", "CIM"],
                        ["A-M", "B&M", "B&Q", "BLOG", "BYPAS", "CIM"],
                        ["BLOG", "BYPAS", "CIM"],
                        ["BLOG", "BYPAS", "CIM"],
                        ["BLOG", "BYPAS", "CIM"]]}

data = pd.DataFrame(data_dict)
    Trace
0   [A-M, B&M, B&Q, BLOG, BYPAS, CIM]
1   [B&M, B&Q, BLOG, BYPAS]
2   [BLOG, BYPAS, CIM]
3   [A-M, B&M, B&Q, BLOG]
4   [A-M, B&M, B&Q, BLOG, BYPAS, CIM]
5   [A-M, B&M, B&Q, BLOG, BYPAS, CIM]
6   [BLOG, BYPAS, CIM]
7   [BLOG, BYPAS, CIM]
8   [BLOG, BYPAS, CIM]

Is there a way to get the unique count of lists in the column, like value_counts(normalize=True) for hashable values in pandas?


                            Trace         Count    Percentage  
0   [A-M, B&M, B&Q, BLOG, BYPAS, CIM]   
1   [B&M, B&Q, BLOG, BYPAS] 
2   [BLOG, BYPAS, CIM]  
3   [A-M, B&M, B&Q, BLOG]   
df['Trace'].apply(tuple).value_counts() should do it. You have to convert each list into a tuple, which is immutable and hashable. Commented Aug 1, 2021 at 11:42
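To illustrate the hashability point in the comment above, here is a minimal sketch on a shortened, made-up version of the question's column. It is an illustration under those assumptions, not part of the original comment. The flag name list_is_hashable is hypothetical.

```python
import pandas as pd

# Shortened sample in the shape of the question's data (illustrative only).
data = pd.DataFrame({"Trace": [["BLOG", "BYPAS", "CIM"],
                               ["A-M", "B&M"],
                               ["BLOG", "BYPAS", "CIM"]]})

# Lists are mutable, so Python refuses to hash them.
try:
    hash(data["Trace"].iloc[0])
    list_is_hashable = True
except TypeError:
    list_is_hashable = False

# Tuples of strings are hashable, so value_counts can group equal rows.
counts = data["Trace"].apply(tuple).value_counts()
print(list_is_hashable)
print(counts)
```

Because value_counts relies on hashing to group equal values, the list-to-tuple conversion is what makes the counting possible.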

3 Answers

As mentioned in the comments by @Ch3ster, you can use:

out=data['Trace'].map(tuple).value_counts().rename_axis(index='Trace').reset_index(name='Count')
out=out.assign(Trace=out['Trace'].map(list),Percentage=out['Count']/out['Count'].sum())

Output of out:

    Trace                               Count   Percentage
0   [BLOG, BYPAS, CIM]                  4       0.444444
1   [A-M, B&M, B&Q, BLOG, BYPAS, CIM]   3       0.333333
2   [B&M, B&Q, BLOG, BYPAS]             1       0.111111
3   [A-M, B&M, B&Q, BLOG]               1       0.111111

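As mentioned in the comments by @Ch3ster, convert the lists to tuples first, then combine value_counts() and pd.concat:

# Tuples are hashable, lists are not, so keep the converted Series.
traces = data['Trace'].apply(tuple)

# Absolute counts and percentages side by side, one row per unique trace.
out = pd.concat([traces.value_counts(),
                 traces.value_counts(normalize=True).mul(100)],
                axis=1, keys=('Counts', 'Percentage'))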


Try this (df is the DataFrame, named data in the question):

df['Trace'].apply(tuple).value_counts()
or
df['Trace'].apply(tuple).value_counts(normalize=True)
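For reference, both calls can be run end to end on the question's data. This is a sketch that assumes the frame is named data, as in the question. The names traces, counts and shares are introduced here for illustration only.

```python
import pandas as pd

full = ["A-M", "B&M", "B&Q", "BLOG", "BYPAS", "CIM"]
tail = ["BLOG", "BYPAS", "CIM"]

# Same nine rows as in the question.
data = pd.DataFrame({"Trace": [full,
                               ["B&M", "B&Q", "BLOG", "BYPAS"],
                               tail,
                               ["A-M", "B&M", "B&Q", "BLOG"],
                               full, full, tail, tail, tail]})

# Convert once, then reuse the hashable version for both calls.
traces = data["Trace"].apply(tuple)

counts = traces.value_counts()                 # absolute counts
shares = traces.value_counts(normalize=True)   # fractions that sum to 1

print(counts)
print(shares)
```

Multiplying shares by 100 gives percentages, if that form is preferred.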
