I am trying to assign the output of value_counts to a new DataFrame. My code follows.

import pandas as pd
import glob


df = pd.concat((pd.read_csv(f, names=['date','bill_id','sponsor_id']) for f in glob.glob('/home/jayaramdas/anaconda3/df/s11?_s_b')))


column_list = ['date', 'bill_id']

df = df.set_index(column_list, drop = True)
df = df['sponsor_id'].value_counts()

df.columns=['sponsor', 'num_bills']
print (df)

The value counts are not being assigned the column headers I specified, 'sponsor' and 'num_bills'. I'm getting the following output when I print the head of the result:

1036    426
791     408
1332    401
1828    388
136     335
Name: sponsor_id, dtype: int64
  • What is your output for print (df) ? Commented Mar 9, 2016 at 13:39
  • df = df['sponsor_id'].value_counts() didn't you drop sponsor_id? Commented Mar 9, 2016 at 13:40
  • @ Anton: I just edited to show my output. Commented Mar 9, 2016 at 13:42
  • value_counts produces a Series, so there is only a single column; you need to reset_index and then overwrite the columns, see my answer Commented Mar 9, 2016 at 13:43

2 Answers

Your column lengths don't match. You read three columns from the CSV and then set two of them as the index. You then called value_counts, which produces a Series with the distinct column values as the index and their counts as the values, so there is no second column to name. You need to call reset_index and then overwrite the column names:

df = df.reset_index()
df.columns=['sponsor', 'num_bills']

Example:

In [276]:
df = pd.DataFrame({'col_name':['a','a','a','b','b']})
df

Out[276]:
  col_name
0        a
1        a
2        a
3        b
4        b

In [277]:
df['col_name'].value_counts()

Out[277]:
a    3
b    2
Name: col_name, dtype: int64

In [278]:    
type(df['col_name'].value_counts())

Out[278]:
pandas.core.series.Series

In [279]:
df = df['col_name'].value_counts().reset_index()
df.columns = ['col_name', 'count']
df

Out[279]:
  col_name  count
0        a      3
1        b      2
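As a variation on the same idea, here is a minimal sketch that names both columns in one chain instead of overwriting df.columns afterwards. The sample rows are made up to stand in for the question's CSV files, and `counts` is a hypothetical variable name; only the column names 'sponsor' and 'num_bills' come from the question.

```python
import pandas as pd

# Hypothetical sample data standing in for the question's CSV files.
df = pd.DataFrame({
    'date': ['2016-01-01', '2016-01-02', '2016-01-03', '2016-01-04', '2016-01-05'],
    'bill_id': ['s1', 's2', 's3', 's4', 's5'],
    'sponsor_id': [1036, 1036, 791, 1036, 791],
})

counts = (
    df['sponsor_id']
    .value_counts()                  # Series: index = sponsor ids, values = counts
    .rename_axis('sponsor')          # name the index so it becomes the 'sponsor' column
    .reset_index(name='num_bills')   # move the index into a column and name the counts
)
print(counts)
```

Naming the index and the values explicitly avoids depending on whatever default names reset_index would otherwise pick, which should make the chain less sensitive to the pandas version in use.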

To get value_counts() of a multi-column DataFrame back as a DataFrame with a count column:

df = pd.DataFrame({'C1':['A','B','A'],'C2':['A','B','A']})
vc_df = df.value_counts().to_frame('Count').reset_index()
display(df, vc_df)

    C1  C2
0   A   A
1   B   B
2   A   A

    C1   C2 Count
0   A   A   2
1   B   B   1
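If the installed pandas does not provide DataFrame.value_counts, a similar table can probably be built with groupby. This is a sketch of a swapped-in technique, not the method used above; note that groupby orders rows by the key columns rather than by descending count, so the row order can differ on other data.

```python
import pandas as pd

df = pd.DataFrame({'C1': ['A', 'B', 'A'], 'C2': ['A', 'B', 'A']})

# Count rows per unique (C1, C2) combination, then turn the
# group keys back into ordinary columns next to a 'Count' column.
vc_df = df.groupby(['C1', 'C2']).size().reset_index(name='Count')
print(vc_df)
```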
