I have a large data frame. I am trying to plot sales for two different years on the same plot as a line graph, to show the month-by-month variation across the two years. I did a long series of grouping and filtering steps before arriving at the final data frame, which has three columns (month, sales and year).

When I try to plot the sales across the different years like this:

ggplot(df,aes(x=month.sales,y=sales/100000,color=year)) + 
  geom_line()

I get a blank graph showing only the x and y axis labels, whereas a column chart of the same data works. Please help. Thank you.

  • It probably has to do with the class of your columns (my guess is that month.sales is a factor in your dataframe). Commented Sep 1, 2020 at 8:16
  • Why is this question tagged python? Can you post sample data? Please edit the question with the output of dput(df). Or, if it is too big with the output of dput(head(df, 20)). Commented Sep 1, 2020 at 8:18
  • @maarvd Thank you. I checked, and yes, month.sales is a factor column. How do I go further to get the desired plot? I am a complete beginner, so please guide me. Commented Sep 1, 2020 at 8:32
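
For readers following along, the column classes mentioned in the comments can be inspected with base R. This is only a sketch: `sales_df` is a hypothetical stand-in, since the real data frame was never posted.

```r
# Hypothetical stand-in for the asker's data frame (the real one was not shown)
sales_df <- data.frame(
  month.sales = factor(c("Jan", "Feb", "Mar"), levels = month.abb),
  year        = 2018L,
  sales       = c(114570.1, 123197.1, 166092.7)
)

# Class of every column, returned as a named character vector
col_classes <- sapply(sales_df, class)
print(col_classes)

# Direct check of the column used on the x axis
is.factor(sales_df$month.sales)
```

`str(sales_df)` gives the same information together with a preview of the values.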

1 Answer

I'm guessing your data looks something like this:

set.seed(69)

df <- data.frame(month.sales = factor(rep(month.abb, 2), month.abb),
                 year = rep(2018:2019, each = 12),
                 sales = runif(24, 1, 2) * 100000)

df
#>    month.sales year    sales
#> 1          Jan 2018 114570.1
#> 2          Feb 2018 123197.1
#> 3          Mar 2018 166092.7
#> 4          Apr 2018 163214.1
#> 5          May 2018 109486.6
#> 6          Jun 2018 131429.8
#> 7          Jul 2018 167363.6
#> 8          Aug 2018 191097.6
#> 9          Sep 2018 127427.4
#> 10         Oct 2018 145360.1
#> 11         Nov 2018 134577.1
#> 12         Dec 2018 169486.6
#> 13         Jan 2019 168493.2
#> 14         Feb 2019 147552.5
#> 15         Mar 2019 139811.3
#> 16         Apr 2019 156351.2
#> 17         May 2019 199368.3
#> 18         Jun 2019 130953.6
#> 19         Jul 2019 148150.5
#> 20         Aug 2019 166307.3
#> 21         Sep 2019 121830.8
#> 22         Oct 2019 101838.1
#> 23         Nov 2019 109716.9
#> 24         Dec 2019 125407.9

In which case you can draw a line plot like this:

library(ggplot2)

ggplot(df, aes(x = month.sales, y = sales / 100000, 
               color = factor(year), group = factor(year))) + 
  geom_line()

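Note that you need to add the group aesthetic. Without it, ggplot groups the data points by the discrete variables in the plot, including the factor levels on the x axis; each group can then contain just one observation, and geom_line has nothing to connect.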
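
An alternative that avoids the group aesthetic is to put the month on a continuous axis. The following is a minimal sketch rather than part of the original answer: `sales_df` and `month_num` are hypothetical names, and it assumes the month factor has its levels in calendar order.

```r
library(ggplot2)

# Hypothetical stand-in data with the same three columns as in the question
set.seed(1)
sales_df <- data.frame(
  month.sales = factor(rep(month.abb, 2), levels = month.abb),
  year        = rep(2018:2019, each = 12),
  sales       = runif(24, 1, 2) * 100000
)

# as.integer() on a factor returns the level index, so Jan = 1, ..., Dec = 12
sales_df$month_num <- as.integer(sales_df$month.sales)

# With a numeric x, the only discrete variable left is factor(year),
# so ggplot draws one line per year without an explicit group
p <- ggplot(sales_df, aes(x = month_num, y = sales / 100000,
                          color = factor(year))) +
  geom_line() +
  scale_x_continuous(breaks = 1:12, labels = month.abb)

p
```

The explicit group in the answer keeps the factor on the x axis, which is simpler when the month column is already a correctly ordered factor.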

2 Comments

Yeah, the data looks like this. I tried this code and it works, in a way. However, for 2018 the sales records start from around September. The plot I am getting has the months labeled in random order on the x-axis, plus a label NA. What could be the reason for this? I checked the month.due column and it has a lot of NA entries after this ggplot operation...
@SurbhiMishra I have no idea what the month.due column is because you didn't include any data in your question. The ggplot operation does not affect your data frame at all, so it will not introduce any NA values into your data frame that are not already there. I do not know what format your months column is in (is it numbers or words?) but the order is likely to be alphabetical, so you need to ensure the levels are ordered correctly to get them in the correct order. I can't tell you how to do that if you don't edit your question to include your data.
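
As an illustration of the level ordering mentioned in the last comment, here is a sketch that assumes the months are stored as three-letter English abbreviations. The actual format was never posted, so `month_chr` is made-up example data.

```r
# Hypothetical month column stored as plain text, starting in September.
# The last entry is deliberately misspelled to show where an NA can come from.
month_chr <- c("Sep", "Oct", "Nov", "Dec", "Jan", "Feb", "sept")

# Default factor(): levels are the sorted unique values, not calendar order
levels(factor(month_chr))

# Explicit levels from the built-in month.abb constant give calendar order;
# any value that does not match a level becomes NA
month_fct <- factor(month_chr, levels = month.abb)
levels(month_fct)
is.na(month_fct)
```

If the real column uses full month names, `month.name` would play the same role as `month.abb`. Values that already are NA, or that do not match any level, would show up as an NA label on the axis.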
