Plots and Tables
2025-09-29
Number of students per grade
school_grade | Frequency_of_Students |
---|---|
1 | 23 |
2 | 20 |
3 | 15 |
4 | 12 |
5 | 10 |
6 | 8 |
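The cumulative frequency of a value is the number of individuals equal to or less than that value; in this case, the number of students at or below a given grade. Dividing by the total number of individuals gives the relative cumulative frequency, a proportion between 0 and 1.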
Sort data, calculate cumulative frequency…
school_grade | Frequency_of_Students | Cumul_Freq | Rel_CFreq |
---|---|---|---|
1 | 23 | 23 | 0.2613636 |
2 | 20 | 43 | 0.4886364 |
3 | 15 | 58 | 0.6590909 |
4 | 12 | 70 | 0.7954545 |
5 | 10 | 80 | 0.9090909 |
6 | 8 | 88 | 1.0000000 |
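The sort-and-accumulate step can be sketched in base R as follows. This is a minimal sketch, not necessarily the code used to build the slide: the object name `grades` is an assumption, while the column names and counts are copied from the table above.

```r
# Grade counts copied from the table above
grades <- data.frame(
  school_grade = 1:6,
  Frequency_of_Students = c(23, 20, 15, 12, 10, 8)
)

# Sort by grade so the running total accumulates from the lowest grade up
grades <- grades[order(grades$school_grade), ]

# Cumulative frequency: running count of students at or below each grade
grades$Cumul_Freq <- cumsum(grades$Frequency_of_Students)

# Relative cumulative frequency: running count divided by the total count
grades$Rel_CFreq <- grades$Cumul_Freq / sum(grades$Frequency_of_Students)

print(grades)
```

The last value of `Cumul_Freq` equals the total number of students, so the last value of `Rel_CFreq` is always 1.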
Example 2: Temperatures near LaGuardia (1973)
ECDF: empirical cumulative distribution function
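A minimal sketch of an ECDF in base R. It assumes the temperatures come from R's built-in `airquality` data set (daily New York readings, May to September 1973, with temperature recorded at LaGuardia Airport); the slide's actual data source is not shown, and the names `temps` and `temp_ecdf` are illustrative.

```r
# Assumption: the built-in `airquality` data set stands in for the slide's data
temps <- airquality$Temp   # daily maximum temperature, degrees Fahrenheit

# ecdf() returns a step function: temp_ecdf(x) is the proportion of
# observations less than or equal to x
temp_ecdf <- ecdf(temps)

print(temp_ecdf(80))       # proportion of days at or below 80 degrees F
print(mean(temps <= 80))   # the same proportion computed by hand

# Draw the step plot only in an interactive session
if (interactive()) {
  plot(temp_ecdf,
       xlab = "Temperature (F)",
       ylab = "Cumulative relative frequency",
       main = "ECDF of daily temperature")
}
```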
Example 3: Spider running speed
Figure caption: Figure 3.4-1 from the textbook
Cumulative frequency distributions clearly communicate quantiles.
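To illustrate the link between quantiles and the cumulative distribution, here is a hedged sketch in base R. The spider data are not reproduced in these notes, so the built-in `airquality` temperatures are used as a stand-in (an assumption), and the object names are illustrative.

```r
# Stand-in data (assumption): built-in daily temperatures, not the spider data
temps <- airquality$Temp
temp_ecdf <- ecdf(temps)

# Quartiles read off the ECDF. type = 1 inverts the ECDF exactly;
# R's default (type = 7) interpolates between observations instead.
probs <- c(0.25, 0.50, 0.75)
qs <- quantile(temps, probs = probs, type = 1)
print(qs)

# Each quantile is the smallest observed value whose cumulative relative
# frequency reaches the requested proportion
print(temp_ecdf(qs))
```

On a cumulative frequency plot, this corresponds to drawing a horizontal line at 0.25, 0.50, or 0.75 and reading the x-value where it first meets the curve.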
Histograms, density plots, and cumulative frequency plots can reveal the shapes of distributions, which is important for understanding data and choosing a statistical approach.
Histograms
Cumulative Frequency Distributions
These are more often used to look at the association between a numerical and a categorical variable:
Two or more variables …
Contingency Table
From: Whitlock & Schluter, The Analysis of Biological Data
Grouped bar plot
Figure caption: Figure 2.3-1 from Whitlock & Schluter, The Analysis of Biological Data
Mosaic plot
Figure caption: Figure 2.3-2 from Whitlock & Schluter, The Analysis of Biological Data
Works for two or more variables. There is no upper limit, but too many variables make the plot confusing.
Width indicates the relative proportion of the corresponding value
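A contingency table, grouped bar plot, and mosaic plot can be sketched in base R as below. The textbook data are not reproduced here, so two categorical variables from the built-in `mtcars` data set are used as a stand-in (an assumption); `car_data` and `tab` are illustrative names.

```r
# Stand-in data (assumption): two categorical variables from built-in `mtcars`
car_data <- data.frame(
  cylinders    = factor(mtcars$cyl),
  transmission = factor(mtcars$am, labels = c("automatic", "manual"))
)

# Contingency table: counts for each combination of categories
tab <- table(car_data$transmission, car_data$cylinders,
             dnn = c("transmission", "cylinders"))
print(tab)

# Proportions within each column (each cylinder group sums to 1)
col_props <- prop.table(tab, margin = 2)
print(col_props)

if (interactive()) {
  # Grouped bar plot: one cluster of bars per cylinder group
  barplot(tab, beside = TRUE, legend.text = TRUE,
          xlab = "Cylinders", ylab = "Frequency")
  # Mosaic plot: tile widths reflect the relative size of each cylinder group
  mosaicplot(t(tab), main = "", color = TRUE)
}
```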
Scatter Plot
Multiple Histograms
Strip Chart
Boxplot
Violin Plot
Multiple Scatterplots with legend
Multiple scatterplots plotted separately
Line Graphs Show Data Over Time
For temporal data, mark every observation with a data point and connect consecutive points with a line.
A grid of line charts that uses the same scales and axes
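A grid of line charts on shared scales might be drawn in base R roughly as follows. This is a sketch under assumptions: the built-in `airquality` data set stands in for the slide's data, and `by_month` and `temp_range` are illustrative names.

```r
# Stand-in data (assumption): daily temperatures from built-in `airquality`
by_month   <- split(airquality, airquality$Month)  # one data frame per month
temp_range <- range(airquality$Temp)               # shared y-axis limits

if (interactive()) {
  # One panel per month, all drawn with the same x and y limits
  old_par <- par(mfrow = c(1, length(by_month)))
  for (m in names(by_month)) {
    d <- by_month[[m]]
    # type = "b" draws a point for every observation and joins them with lines
    plot(d$Day, d$Temp, type = "b",
         xlim = c(1, 31), ylim = temp_range,
         xlab = "Day of month", ylab = "Temperature (F)",
         main = paste("Month", m))
  }
  par(old_par)  # restore the previous plotting layout
}
```

Fixing `xlim` and `ylim` across panels is what makes the panels directly comparable.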
Maps
Spatial data do not have to be shown on a geographical map
How to make good plots
Mistakes in displaying data:
This plot hides the variation between positions.
Over-plotting hides data by placing data points on top of each other.
This plot shows all the observations
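One common remedy for over-plotting is to add a small amount of random noise (jitter) to the plotting positions. The sketch below uses two discrete variables from the built-in `mtcars` data set as a stand-in (an assumption, not the data behind the slide); the jitter amount of 0.2 is an arbitrary illustrative choice.

```r
# Stand-in data (assumption): two discrete variables, so many points coincide
x <- mtcars$cyl
y <- mtcars$gear

# Number of distinct plotting positions versus number of observations
n_positions <- nrow(unique(data.frame(x, y)))
cat(length(x), "observations fall on", n_positions, "distinct positions\n")

# jitter() shifts each value by uniform noise in [-amount, amount]
set.seed(42)  # make the random offsets reproducible
x_jit <- jitter(x, amount = 0.2)
y_jit <- jitter(y, amount = 0.2)

if (interactive()) {
  old_par <- par(mfrow = c(1, 2))
  plot(x, y, main = "Over-plotted", xlab = "Cylinders", ylab = "Gears")
  plot(x_jit, y_jit, main = "Jittered", xlab = "Cylinders", ylab = "Gears")
  par(old_par)
}
```

The jitter only changes where points are drawn; any summary statistics should still be computed from the original values.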
👎 How to hide data
👍 How to reveal data
Mistakes in displaying data:
Reordering factors makes pattern clear
Cause_of_death | Number |
---|---|
Congen. abnor. | 222 |
Heart disease | 463 |
Accidents | 6688 |
All other cause | 1653 |
Other tumor | 52 |
Suicide | 1615 |
Homicide | 2093 |
Chronic res. disease | 107 |
Cerebrov. disease | 67 |
Flu/pneumonia | 73 |
Malig. tumor | 745 |
👎
Nonsense order hides patterns
Alphabetical order is usually a bad idea.
Cause_of_death | Number |
---|---|
Accidents | 6688 |
Homicide | 2093 |
Suicide | 1615 |
Malig. tumor | 745 |
Heart disease | 463 |
Congen. abnor. | 222 |
Chronic res. disease | 107 |
Flu/pneumonia | 73 |
Cerebrov. disease | 67 |
Other tumor | 52 |
All other cause | 1653 |
👍
Order to reveal patterns
List ordinal factors in a meaningful order
List nominal factors from greatest to least, with “all others” last.
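The reordering rule for nominal factors can be sketched in base R as follows. The counts are copied from the first table above; the object name `deaths` and the helper `is_other` are assumptions made for illustration.

```r
# Counts copied from the unordered table above
deaths <- data.frame(
  Cause_of_death = c("Congen. abnor.", "Heart disease", "Accidents",
                     "All other cause", "Other tumor", "Suicide", "Homicide",
                     "Chronic res. disease", "Cerebrov. disease",
                     "Flu/pneumonia", "Malig. tumor"),
  Number = c(222, 463, 6688, 1653, 52, 1615, 2093, 107, 67, 73, 745),
  stringsAsFactors = FALSE
)

# Sort from greatest to least, but keep the catch-all category last:
# order() puts FALSE before TRUE, then sorts by descending count
is_other <- deaths$Cause_of_death == "All other cause"
deaths <- deaths[order(is_other, -deaths$Number), ]

# Fix the factor levels in this order so plots follow it instead of the alphabet
deaths$Cause_of_death <- factor(deaths$Cause_of_death,
                                levels = deaths$Cause_of_death)
print(deaths, row.names = FALSE)

if (interactive()) {
  barplot(deaths$Number, names.arg = deaths$Cause_of_death,
          las = 2, ylab = "Number")
}
```

Without the explicit `levels` argument, R orders factor levels alphabetically, which is the ordering the slide warns against.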
How to hide patterns 👎
How to reveal patterns 👍
In this plot, the wide axis range hides the pattern (the difference between the two groups).
From: makeameme.org
B21: Biostatistics with R