Using pandas and pyplot to group on multiple columns, get the value counts, and plot this information

Question

I am analyzing some data runs from an agent-based model that (TL;DR) simulates the life cycle of a species to predict survival rates given certain input parameters. I am struggling with how to use pandas and pyplot to accomplish this, and would love some suggestions. I have a CSV that looks like this:

"run","day","Lifestate","Lat","Long","habitat_sample"
1, 1.0,"adult",0.0,0.0,0
1, 1.0,"adult",0.0,0.0,0
1, 1.0,"larva",0.0,0.0,0
1, 2.0,"adult",0.0,0.0,0
1, 2.0,"nymph",0.0,0.0,0
1, 2.0,"nymph",0.0,0.0,0
1, 2.0,"nymph",0.0,0.0,0
1, 3.0,"nymph",0.0,0.0,0
1, 3.0,"nymph",0.0,0.0,0
1, 3.0,"nymph",0.0,0.0,0
1, 4.0,"nymph",0.0,0.0,0
1, 4.0,"nymph",0.0,0.0,0
1, 4.0,"nymph",0.0,0.0,0
1, 4.0,"nymph",0.0,0.0,0
2, 1.0,"adult",0.0,0.0,0
2, 1.0,"adult",0.0,0.0,0
2, 1.0,"adult",0.0,0.0,0
3, 1.0,"nymph",0.0,0.0,0
3, 1.0,"nymph",0.0,0.0,0
3, 2.0,"larva",0.0,0.0,0
3, 2.0,"larva",0.0,0.0,0

What I need to do is plot the survival rate of the different life stages over time for each run. In other words, for each run, I need to plot the number of adults, larva, and nymphs present on each day. So, for example, on day 1 there were 3 adults, 1 nymph, and 2 larva; on day 2 there were 2 adults, 2 nymphs, and 6 larva; and so on. I'd like to wind up with something like this (apologies for the crap sketch):

I am very new to pandas and struggling to wrap my head around all of the different techniques available to me. I can't figure out how to break down and plot the 'Lifestate' column by the number of adults/nymphs/larva per day. I've tried grouping by run/day and getting value_counts() for the Lifestate column, tried grouping by just run and extracting the number of individuals per life stage, etc. I can get the numbers I want, but I can't get them in a form that I can plot. It doesn't make sense to plot days vs. value_counts, since these wind up being different dimensions, right? My iterative approaches feel inefficient, and my instinct tells me this is not the right approach. An example of one of the many things I have tried:

grouped = data.groupby(['run','day'])

for name, group in grouped:
    valcounts = group['Lifestate'].value_counts()

This does get me the numbers I need, but then I am unsure how to plot them. Another concern: is looping like this going to be slow once I start using my actual (large) data sets?

My current idea is to extract the data I want and create a new data frame for each run. I'm thinking I want something like this for each run:

"day","num_adults","num_nymphs", "num_larva"
1, 2, 4, 6
2, 1, 3, 5
3, 1, 3, 5
4, 1, 2, 4

and so on. Does this sound like the right way to approach this problem? What am I missing or not thinking of? Any advice on logic or design would be greatly appreciated. Thanks.

Diziet Asahi · Accepted Answer · 2020-02-13 21:49:51Z

I wasn't sure what you wanted to do with the "runs" in your example. If you need to consider each run separately, here is my take on it:

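import pandas as pd
# count rows per (run, day, Lifestate); reindexing on the full product fills absent combinations with 0
mix = pd.MultiIndex.from_product([df['run'].unique(), df['day'].unique(), df['Lifestate'].unique()], names=['run','day','Lifestate'])
new = df.groupby(['run','day','Lifestate']).size().reindex(mix, fill_value=0).unstack().reset_index()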

The new dataframe, new, looks like this (first five rows shown):

Lifestate  run  day  adult  larva  nymph
0            1  1.0      2      1      0
1            1  2.0      1      0      3
2            1  3.0      0      0      3
3            1  4.0      0      0      4
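For comparison, here is a minimal sketch of the same kind of wide count table built with pd.crosstab. The sample rows are illustrative and only mirror the layout of the question's CSV, and the names counts and wide are made up for this example. Note one difference from the reindex approach above: crosstab only produces rows for (run, day) pairs that actually occur in the data, so days missing from a run are not padded with zeros.

```python
import pandas as pd

# Small illustrative sample in the same layout as the question's CSV.
df = pd.DataFrame({
    "run": [1, 1, 1, 1, 1, 1, 1, 2, 2, 2],
    "day": [1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0, 1.0, 1.0, 1.0],
    "Lifestate": ["adult", "adult", "larva", "adult", "nymph", "nymph", "nymph",
                  "adult", "adult", "adult"],
})

# Cross-tabulate: rows are (run, day) pairs, columns are life stages.
# Stages absent from a given (run, day) pair get a count of 0.
counts = pd.crosstab([df["run"], df["day"]], df["Lifestate"])

# Turn the (run, day) index back into ordinary columns.
wide = counts.reset_index()
print(wide)
```

Either form avoids an explicit Python loop over groups, which is usually what matters for larger data sets.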
4            2  1.0      3      0      0

Then it's pretty trivial to plot each run individually:

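import matplotlib.pyplot as plt

# create one subplot per run
runs = new.groupby('run')
fig, axs = plt.subplots(len(runs), 1, sharex=True, sharey=True, constrained_layout=True)
for ax, (g, temp) in zip(axs, runs):
    # Axes.is_first_row() is gone from newer Matplotlib releases; ask the subplot spec instead
    first = ax.get_subplotspec().is_first_row()
    temp.plot(x='day', y=['nymph','larva','adult'], ax=ax, legend=first)
    ax.set_title("run #{:d}".format(g))
plt.show()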
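One caveat with the loop above: plt.subplots returns a single Axes object rather than an array when only one subplot is requested, so iterating over axs fails if the data contains a single run. The sketch below is one possible way around that, using squeeze=False. The function name plot_runs, its stages parameter, and the small stand-in dataframe are assumptions made for this example, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the answer's `new` dataframe (illustrative counts, one run).
new = pd.DataFrame({
    "run":   [1, 1, 1, 1],
    "day":   [1.0, 2.0, 3.0, 4.0],
    "adult": [2, 1, 0, 0],
    "larva": [1, 0, 0, 0],
    "nymph": [0, 3, 3, 4],
})

def plot_runs(counts, stages=("nymph", "larva", "adult")):
    """Draw one subplot per run; also works when there is only one run."""
    runs = counts.groupby("run")
    # squeeze=False always returns a 2-D array of axes, so indexing is safe
    fig, axs = plt.subplots(len(runs), 1, sharex=True, sharey=True,
                            squeeze=False, constrained_layout=True)
    for i, (ax, (run_id, frame)) in enumerate(zip(axs[:, 0], runs)):
        # show the legend on the top subplot only
        frame.plot(x="day", y=list(stages), ax=ax, legend=(i == 0))
        ax.set_title("run #{}".format(run_id))
    return fig

fig = plot_runs(new)
```

In an interactive session you would drop the matplotlib.use("Agg") line and call plt.show() after plot_runs(new).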
