166

Hi, I want to get the counts of unique values in a dataframe column. value_counts does this, but I want to use its output somewhere else. How can I convert the output of .value_counts to a pandas dataframe? Here is some example code:

import pandas as pd
df = pd.DataFrame({'a':[1, 1, 2, 2, 2]})
value_counts = df['a'].value_counts(dropna=True, sort=True)
print(value_counts)
print(type(value_counts))

output is:

2    3
1    2
Name: a, dtype: int64
<class 'pandas.core.series.Series'>

What I need is a dataframe like this:

unique_values  counts
2              3
1              2

Thank you.

5 Answers

243

Use rename_axis to name the column that will be created from the index, then reset_index:

df = value_counts.rename_axis('unique_values').reset_index(name='counts')
print(df)
   unique_values  counts
0              2       3
1              1       2

Or, if you need a one-column DataFrame, use Series.to_frame:

df = value_counts.rename_axis('unique_values').to_frame('counts')
print(df)
               counts
unique_values        
2                   3
1                   2
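
For reference, here is a self-contained sketch of both variants, assuming the counts are taken on the Series df['a'] from the question rather than on the whole DataFrame. The names counts_df and counts_frame are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2, 2, 2]})

# Variant 1: name the index, then move it into a regular column.
counts_df = (
    df['a']
    .value_counts()
    .rename_axis('unique_values')
    .reset_index(name='counts')
)
print(counts_df)

# Variant 2: keep the unique values as the index of a one-column DataFrame.
counts_frame = (
    df['a']
    .value_counts()
    .rename_axis('unique_values')
    .to_frame('counts')
)
print(counts_frame)
```

The first variant is usually the easier one to merge or export, since both pieces of information are ordinary columns.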

3 Comments

For anyone who wants unique_values as its own column (and not the index) after the to_frame version above, call df.reset_index(level=0, inplace=True) as a separate statement; chaining it onto the df = ... assignment would set df to None.
Does not work when value_counts is passed a list of columns
df.value_counts().reset_index() implicitly converts the value counts to a dataframe.
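
On the multi-column case raised in these comments, a minimal sketch, assuming a pandas version that has DataFrame.value_counts (1.1 or later). The frame df2 and its column 'b' are hypothetical and not part of the question. Here the result is indexed by one level per column, so a single name passed to rename_axis may not apply; reset_index(name=...) alone is enough because the levels already carry the column names.

```python
import pandas as pd

# Hypothetical two-column frame, used only for illustration.
df2 = pd.DataFrame({
    'a': [1, 1, 2, 2, 2],
    'b': ['x', 'x', 'y', 'y', 'z'],
})

# Count unique (a, b) rows; the index levels are named 'a' and 'b',
# so reset_index turns them into columns and name= labels the counts.
multi_counts = df2.value_counts(['a', 'b']).reset_index(name='counts')
print(multi_counts)
```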
42

I just ran into the same problem, so here are my thoughts.

Warning

When you work with pandas data structures, you have to be aware of the return type.

Another solution here

As @jezrael mentioned above, pandas provides the pd.Series.to_frame API.

Step 1

You can also wrap the pd.Series in a pd.DataFrame:

df_value_counts = pd.DataFrame(value_counts) # wrap pd.Series in pd.DataFrame

You then have a pd.DataFrame with a single column named 'a', and the unique values become the index:

Input:  print(df_value_counts.index.values)
Output: [2 1]

Input:  print(df_value_counts.columns)
Output: Index(['a'], dtype='object')

Step 2

What now?

If you want the unique values as a regular column with your own column names, first reset the index with reset_index().

Then rename the columns by assigning a list to df.columns:

df_value_counts = df_value_counts.reset_index()
df_value_counts.columns = ['unique_values', 'counts']

You now have what you need:

Output:

       unique_values    counts
    0              2         3
    1              1         2

Full Answer here

import pandas as pd

df = pd.DataFrame({'a':[1, 1, 2, 2, 2]})
value_counts = df['a'].value_counts(dropna=True, sort=True)

# solution here
df_val_counts = pd.DataFrame(value_counts)
df_value_counts_reset = df_val_counts.reset_index()
df_value_counts_reset.columns = ['unique_values', 'counts'] # change column names
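
One point worth checking, as a hedged sketch: the intermediate column labels depend on the pandas version. Before 2.0, reset_index() here yields the labels 'index' and 'a'; from 2.0 on, value_counts names its result 'count' and names the index after the original column, so the labels are 'a' and 'count'. Assigning both labels by position, as above, sidesteps the difference. The chained set_axis call below is my own variant of the same idea, and the name result is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2, 2, 2]})
value_counts = df['a'].value_counts(dropna=True, sort=True)

wrapped = pd.DataFrame(value_counts).reset_index()
# The labels printed here differ between pandas versions.
print(list(wrapped.columns))

# Relabel both columns by position, so the version-specific names do not matter.
result = wrapped.set_axis(['unique_values', 'counts'], axis=1)
print(result)
```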

1 Comment

Side note: the dropna parameter of DataFrame.value_counts was introduced in pandas 1.3.0 (July 2021).
9

I'll throw my hat in as well. This is essentially the same as @wy-hsu's solution, but as a function:

def value_counts_df(df, col):
    """
    Returns pd.value_counts() as a DataFrame

    Parameters
    ----------
    df : Pandas Dataframe
        Dataframe on which to run value_counts(), must have column `col`.
    col : str
        Name of column in `df` for which to generate counts

    Returns
    -------
    Pandas Dataframe
        Returned dataframe will have a single column named "count" which contains the value_counts()
        for each unique value of df[col]. The index name of this dataframe is `col`.

    Example
    -------
    >>> value_counts_df(pd.DataFrame({'a':[1, 1, 2, 2, 2]}), 'a')
       count
    a
    2      3
    1      2
    """
    df = pd.DataFrame(df[col].value_counts())
    df.index.name = col
    df.columns = ['count']
    return df

1
pd.DataFrame(
    df.groupby(['groupby_col'])['column_to_perform_value_count'].value_counts()
).rename(
    columns={'old_column_name': 'new_column_name'}
).reset_index()

1 Comment

Consider adding more context/information to your code. It also looks similar to the other solutions, though more condensed.
0

Example of selecting a subset of columns from a dataframe, grouping, applying value_counts per group, naming the counts column Count, and displaying the first n groups.

# Select 5 columns (A..E) from a dataframe (data_df).
# Sort on A,B. groupby B. Display first 3 groups.
df = data_df[['A','B','C','D','E']].sort_values(['A','B'])
g = df.groupby(['B'])
for n,(k,gg) in enumerate(list(g)[:3]): # display first 3 groups
    display(k,gg.value_counts().to_frame('Count').reset_index())
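
A self-contained sketch of the same idea, assuming a pandas version with DataFrame.value_counts (1.1 or later). The contents of data_df are made up, only three columns are used, and print stands in for the notebook-only display. The dict tables is an illustrative addition so the per-group frames can be reused afterwards.

```python
import pandas as pd

# Made-up data standing in for data_df.
data_df = pd.DataFrame({
    'A': [1, 1, 2, 2, 3],
    'B': ['x', 'x', 'x', 'y', 'y'],
    'C': [10, 10, 20, 30, 30],
})

df = data_df[['A', 'B', 'C']].sort_values(['A', 'B'])

tables = {}
# Iterate over at most the first 3 groups of B.
for key, group in list(df.groupby('B'))[:3]:
    # Count unique rows within the group and name the counts column 'Count'.
    tables[key] = group.value_counts().to_frame('Count').reset_index()
    print(key)
    print(tables[key])
```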
