I'm looking at the Altair docs, and I don't understand why the last two groups (85 and 90), which have a very small IQR, have whiskers about as long as those of the first group, whose IQR is huge. What am I missing? Why do the last two groups have such long lines?
- It is not clear what this is trying to show: the US population pyramid has not been that uncertain over the last 20 years. (Henry, Jun 28, 2022 at 10:42)
- @Henry The plot evidently shows variation across some kind of Census unit, not uncertainty. David: one part of the answer must be that these are not standard boxplots. In a standard boxplot, no whisker has a length longer than 1.5 times the IQR (as claimed in the text below the title), but obviously that is not the case for the 85 and 90 year olds here. (whuber ♦, Jun 28, 2022 at 12:59)
- @whuber But these would be big census units: the medians seem to add up to roughly 60 million (or 300 million if multiplied by the number of years in each age group, which makes me think it might be the whole USA). (Henry, Jun 28, 2022 at 13:14)
- @Henry That's perceptive--and a real puzzle. It turns out this plot is visual nonsense. Each boxplot displays the distribution of population by age for the 1850 through 2000 decennial censuses, with sexes counted separately. The data are at cdn.jsdelivr.net/npm/[email protected]/data/population.json. (whuber ♦, Jun 28, 2022 at 14:11)
1 Answer
This is probably because the distribution is heavily right-skewed for the higher ages. The Y-axis appears to be a count (number of people), which is bounded below at zero because you cannot have a negative number of people.
For the highest age categories, the minimum, first quartile, and median are very close to each other because there are many entities (states?) with very few old people. But there is at least one entity (and probably a few) that does have a lot of old people, which is why the upper whisker is fairly long.
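To make the effect concrete, here is a minimal sketch using made-up counts (not the Altair example's data) for one old-age group: most values are small and a couple are large. It compares an upper whisker capped at 1.5 times the IQR with one that simply extends to the maximum. Whether the plot in question uses min-max whiskers is an assumption here; the comments only establish that its whiskers exceed 1.5 times the IQR.

```python
import statistics

# Hypothetical counts for one old-age group: many small values, a few large ones.
counts = [1, 2, 2, 3, 3, 4, 5, 6, 40, 90]

# Quartiles with linear interpolation between data points.
q1, median, q3 = statistics.quantiles(counts, n=4, method="inclusive")
iqr = q3 - q1

# Tukey-style whisker: largest observation within 1.5 * IQR of the upper quartile.
tukey_upper = max(x for x in counts if x <= q3 + 1.5 * iqr)

# Min-max whisker: simply the largest observation.
minmax_upper = max(counts)

print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}")
print(f"upper whisker (1.5 * IQR rule): {tukey_upper}")
print(f"upper whisker (min-max):        {minmax_upper}")
```

With skewed data like this, the box stays narrow while a min-max whisker stretches all the way to the largest value; under the 1.5 times IQR rule the same large values would be drawn as outlier points instead.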
Plotting these as violin plots or (sideways) histograms would make the pattern clearer by showing the full distributions.
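As a rough illustration of what a sideways histogram would reveal, the sketch below bins some made-up, right-skewed counts (hypothetical values, not the data behind the plot) and prints one bar per bin as text. The bin width of 10 is an arbitrary choice for the example.

```python
from collections import Counter

# Hypothetical right-skewed counts: many small values, a few large ones.
counts = [1, 2, 2, 3, 3, 4, 5, 6, 40, 90]

bin_width = 10  # arbitrary width chosen for illustration

# Map each value to the lower edge of its bin and tally the bins.
bins = Counter((x // bin_width) * bin_width for x in counts)

# Print one row per bin, including empty ones, so the gap is visible.
for start in range(0, max(counts) + bin_width, bin_width):
    label = f"{start}-{start + bin_width - 1}"
    print(f"{label:>6} {'#' * bins[start]}")
```

Almost all the mass sits in the lowest bin, with isolated values far above it, which is the shape a box-and-whisker summary hides.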
