How to add columns to pandas df where grouped values are counted for ordered plotting of categories

Question

I build a plotly dashboard in python that displays multiple variables over time. One of the variables is here called "color" and I would like to sort the resulting plot by it.

import pandas as pd
import plotly.express as px
import string
import random
import numpy as np

# for the color mapping
color_dict = {"Colors": {
        "green": "rgb(0, 255, 0)",
        "black": "rgb(0, 0, 0)",  
        "red": "rgb(255, 0, 0)"
    }}


# creating the df
random.seed(30)
letters = list(string.ascii_lowercase)[0:20]

data = {"letters":letters}       
df = pd.DataFrame(data)
df = pd.DataFrame(np.repeat(df.values, 3, axis=0), columns=df.columns) # repeat each row 2 times
df['attempt'] = np.where(df.index%2==0, 1, 2)  # adds either 1 or 2 in column "attempts"
lst = ['2022_10_10', '2022_10_11', '2022_10_12']
N = len(df)
df["date"] = pd.Series(np.tile(lst, N//len(lst))).iloc[:N] # add date column with 3 repeating dates
df["colors"] = random.choices(["green", "black", "red"], k=len(df)) # add randomly the colors
df.head()


#letters attempt date   colors
#0  a   1   2022_10_10  black
#1  a   2   2022_10_11  green
#2  a   1   2022_10_12  green
#3  b   2   2022_10_10  black
#4  b   1   2022_10_11  green


# the plot
fig = px.scatter(
        df,
        x="date",
        y="letters",
        symbol="attempt",
        opacity=0.8,
        color="colors",
        color_discrete_map=color_dict["Colors"],
        width=1000,
        height=800,
    )
fig.update_layout(
    yaxis={
            "type": "category",
            "showgrid": False,
        },
    xaxis={
            "type": "category",
            "showgrid": False,
        },
)
fig

However, as the original df (I assume?) undergoes some groupby etc for the plotting, my pre-plot sorting (I tried sort_values, custom sorting functions etc) seems to have no affect. Therefore, I would like to create additional columns "black", "green", "red", that hold the count of how often black / green / red appeared e.g. on row "a".

df["black"] = ""
df["red"] = ""
df["green"] = ""


#letters attempt date   colors  black   red green
#0  a   1   2022_10_10  black       1   0   2
#1  a   2   2022_10_11  green       1   0   2
#2  a   1   2022_10_12  green       1   0   2
#3  b   2   2022_10_10  black           
#4  b   1   2022_10_11  green

So my questions are: a) how to get color count values into there columns? b) how to use the values of the column "green", then "red", then "black" for sorting the order of the y axis of the plot?

thanks!

EDIT: SORRY, I know this is a complicated task. But I'm looking solely for a way to sort/order the full rows on the y axis. The order WITHIN a row (e.g. row "a") has to be maintained.

EDIT2: I attached a (terribly manufactured) image of the result I'm looking for:

Expected result:

EDIT post answer: in case someone wants to perform this type of row sorting, but only take the latest date (last column on x axis) into account, you can adjust the count status function like this:

color_counts = (
    df[df["date"] == df["date"].max()]
    .groupby('letters')['colors']
    .value_counts()
    .unstack(fill_value=0)
)

What does this visualize? I am not sure, even after carefully reading the description. Is there an expected output? Also, y-axis sorting is not possible since there are 3 categorical variables on the x-axis. It is possible to make a subplot. — r-beginners
– r-beginners, Commented Aug 9, 2022 at 10:05
I would like to have the y axis sorted by the amount of black (or red or green) dots per row. That is why I came up with first calculating this amount and then using this column to sort. — crazysantaclaus
– crazysantaclaus, Commented Aug 9, 2022 at 11:29
"to sort/order the full rows on the y axis" What do you mean? Could you explain, for example, what erroneous assumptions were made in the answers provided? — Vitalizzare
– Vitalizzare, Commented Aug 10, 2022 at 9:38
@Vitalizzare: I was on the way to try your answer, as it looks quite good. But before that I wanted to add some more information, so people don't waste their time — crazysantaclaus
– crazysantaclaus, Commented Aug 10, 2022 at 10:00

Vitalizzare · Accepted Answer · 2022-08-10 10:58:16Z

How to get color count values into their columns

Let's add the number of colors per letter:

color_counts = (
    df.groupby('letters')['colors']
    .value_counts()
    .unstack(fill_value=0)
)
df = df.merge(color_counts, on='letters')

The first 5 records of the modified data frame:

  letters  attempt        date colors  black  green  red
0       a        1  2022_10_10  black      1      2    0
1       a        2  2022_10_11  green      1      2    0
2       a        1  2022_10_12  green      1      2    0
3       b        2  2022_10_10  black      1      2    0
4       b        1  2022_10_11  green      1      2    0

In order to add the black, green, red counts to the plot, we have to include hover_data to px.scatter like this:

fig = px.scatter(
    ...
    hover_data=['black','red','green'],
    ...
)

How to use color per letter counts for sorting letters on the plot

To sort letters along the y-axis, we can use the category_orders parameter of px.scatter this way:

color_counts = (
    df.groupby('letters')['colors']
    .value_counts()
    .unstack(fill_value=0)
)

def sort_letters(color):
    return color_counts[color].sort_values().index

desired_color = 'black'
fig = px.scatter(
    ...
    category_orders={'letters': sort_letters(desired_color)},
    ...
)

Here we pass the exact order of the letters, how they should be positioned along the axis.

Code and Plot

import pandas as pd
import numpy as np
import plotly.express as px
from string import ascii_lowercase
import random

# Color mapping
color_map = {
    "green": "YellowGreen",
    "black": "Black",  
    "red": "Crimson"
}

# Mocking data
random.seed(30)
num_letters = 20
letters = [*ascii_lowercase][:num_letters]
dates = ['2022_10_10', '2022_10_11', '2022_10_12']
df = pd.DataFrame(np.repeat(letters, len(dates)), columns=['letters']) 
df['attempt'] = df.index % 2 + 1
df['date'] = np.tile(dates, num_letters)
df['colors'] = random.choices([*color_map], k=len(df)) 

color_counts = (
    df.groupby('letters')['colors']
    .value_counts()
    .unstack(fill_value=0)
)
df = df.merge(color_counts, on='letters')
#display(df.sample(5))

def sort_letters(color):
    return color_counts[color].sort_values().index

# The plot
desired_color = 'black'

fig = px.scatter(
    df,
    x="date",
    y="letters",
    category_orders={'letters':sort_letters(desired_color)},
    color="colors",    
    color_discrete_map=color_map,
    symbol='attempt',
    symbol_map={1:'circle-open',2:'circle'},
    hover_data=[*color_map],
    opacity=0.8,
    width=800,
    height=800,
    title=f"Sorted by {desired_color} counts"
)

fig.update_layout(
    yaxis={
        "type": "category",
        "showgrid": False,
    },
    xaxis={
        "type": "category",
        "showgrid": False,
    },
)

fig

Update: sort letters by color counts in lexicographic order

To sort by color counts, as in lexicographic order, we can use the code from the original post with this addition at the end:

color_counts = (
    df.groupby('letters')['colors']
    .value_counts()
    .unstack(fill_value=0)
)

fig.update_yaxes(
    categoryorder="array", 
    categoryarray=color_counts.sort_values(by=['green','red','black']).index,
)

Here's the output:

Collectives™ on Stack Overflow

How to add columns to pandas df where grouped values are counted for ordered plotting of categories

1 Answer 1

How to get color count values into their columns

How to use color per letter counts for sorting letters on the plot

Code and Plot

Update: sort letters by color counts in lexicographic order

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

How to get color count values into their columns

How to use color per letter counts for sorting letters on the plot

Code and Plot

Update: sort letters by color counts in lexicographic order

Comments

Your Answer

Sign up or log in

Post as a guest

Related