7.1. Goodness-of-fit tests

The binomial test

Bárbara D. Bitarello

2025-11-19

Outline

•Goodness-of-fit tests

•The proportional model

•The \(\chi^2\) distribution

•Degrees of freedom

•The \(\chi^2\) test

•Using \(\chi^2\) to test if data are distributed according to a Poisson distribution

Goodness-of-Fit Tests

Do the data come from a given distribution with specified parameters?

NHL player birth months 🏒

Biological hypothesis: Being born earlier in the year makes players stand out when young, helping them achieve later success.

\(H_0\): Elite hockey players are born randomly across the year.

\(H_A\): Elite hockey players are NOT born randomly across the year.

Testing!

\(H_0\): The proportion of NHL players born in each month matches that of other humans.

\(H_A\): The proportion of NHL players born in each month differs from that of other humans.

Human births by month
month births (%)
Jan 7.94
Feb 7.63
Mar 8.72
Apr 8.63
May 8.95
June 8.57
Jul 8.76
Aug 8.50
Sep 8.54
Oct 8.19
Nov 7.70
Dec 7.87

Introducing \(\chi^2\)

\(\chi^2\) summarizes the fit of categorical data to expectations.

\[\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}\]

where \(O_i\) = observed count and \(E_i\) = expected count in category \(i\).

\(\chi^2 = 0\): Data perfectly match null hypothesis expectations

\(\chi^2 > 0\): Data show deviation from null hypothesis expectations

Finding Expectations

Expected number of births in month \(i\):

\[E[X_i] = n_{\text{total}} \times \text{expected proportion}_i\]

\(n_{\text{total}} = 1245\) (the sum of the births column)

Code
nhl |>
    mutate(expected.prop = c(0.0794, 0.0763, 0.0872, 0.0863, 0.0895, 0.0857, 0.0876,
        0.085, 0.0854, 0.0819, 0.077, 0.0787)) |>
    mutate(expected.n = expected.prop * sum(births), `expected %` = 100 * expected.prop)
month births expected % expected.n
Jan 133 7.94 98.9
Feb 125 7.63 95.0
Mar 114 8.72 108.6
Apr 119 8.63 107.4
May 119 8.95 111.4
June 123 8.57 106.7
Jul 96 8.76 109.1
Aug 91 8.50 105.8
Sep 83 8.54 106.3
Oct 84 8.19 102.0
Nov 73 7.70 95.9
Dec 85 7.87 98.0

Finding \(\chi^2\) [1/2]


Example: January

\(E=98.9\)

\(O=133\)

\(\frac{(O_{i}-E_{i})^2}{E_{i}}=?\)

\(\frac{(133-98.9)^2}{98.9}=\frac{1162.81}{98.9}=11.76\)

Finding \(\chi^2\) [2/2]

Code
nhl |>
    mutate(sq_dev_over_expect = (expected.n - births)^2/expected.n)
month month.num births expected.prop expected.n expected % sq_dev_over_expect
Jan 1 133 0.079 98.9 7.94 11.795
Feb 2 125 0.076 95.0 7.63 9.478
Mar 3 114 0.087 108.6 8.72 0.272
Apr 4 119 0.086 107.4 8.63 1.243
May 5 119 0.090 111.4 8.95 0.515
June 6 123 0.086 106.7 8.57 2.491
Jul 7 96 0.088 109.1 8.76 1.564
Aug 8 91 0.085 105.8 8.50 2.077
Sep 9 83 0.085 106.3 8.54 5.116
Oct 10 84 0.082 102.0 8.19 3.165
Nov 11 73 0.077 95.9 7.70 5.454
Dec 12 85 0.079 98.0 7.87 1.720

Now find \(\chi^2\) by summing over all \(\frac{(\text{O}_i - \text{E}_i)^2}{\text{E}_i}\)

Code
obs.chi2 <- nhl |>
    summarise(chi2 = sum(sq_dev_over_expect))
obs.chi2
## # A tibble: 1 × 1
##    chi2
##   <dbl>
## 1  44.9

\(\chi^2 = 44.89\)

From Test Stat to P-value [1/2]

By simulation

• Randomly assign hockey players birth dates according to the null proportions.

• Calculate \(\chi^2\) for these random data following the null.

• Repeat this many times.

• Compare the observed \(\chi^2\) to the values generated under the null (see the sketch below).
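A minimal simulation sketch of this procedure (not from the original slides; null.props, n.players, and sim.chi2 are names introduced here):

Code
# simulate birth-month counts under the null proportions, recompute chi-squared
null.props <- c(0.0794, 0.0763, 0.0872, 0.0863, 0.0895, 0.0857, 0.0876,
    0.085, 0.0854, 0.0819, 0.077, 0.0787)
n.players <- 1245
set.seed(1)
sim.chi2 <- replicate(10000, {
    sim.births <- as.vector(rmultinom(1, size = n.players, prob = null.props))
    expected.n <- n.players * null.props
    sum((sim.births - expected.n)^2/expected.n)
})
# fraction of simulated values at least as large as the observed 44.89; with a
# P-value near 5e-06, most runs of 10,000 replicates will return 0, so the
# chi-squared approximation described below is more practical here
mean(sim.chi2 >= 44.89)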

From Test Stat to P-value [2/2]

The \(\chi^2\) distribution describes the values of \(\chi^2\) expected under \(H_0\).
The P-value is the area under the upper tail of the distribution.
Our simulated values agree closely with the \(\chi^2\) distribution, which is why we often use it when comparing categorical counts to expectations.

Why Does This Work? The \(\chi^2\) Distribution's Properties

Introduction to \(\chi^2\) distribution

• The \(\chi^2\) distribution describes the sampling distribution of \(\chi^2\) under a null hypothesis predicting expected categorical counts.

• There are many \(\chi^2\) distributions, each associated with a different number of degrees of freedom.

• The \(\chi^2\) distributions are continuous probability distributions, so we use the area under the curve (not the height of the curve) to obtain an APPROXIMATE P-value. NOTE: This is a one-tailed test!


What Is A Degree of Freedom?

For \(\chi^2\) tests,
df = # categories - 1 - # params estimated from data

More broadly…
The number of data points that can “wobble around” (vary freely) after the initial estimation of your model.

E.g., the sample variance \(s^2=\frac{\sum (x-\bar x)^2}{n-1}\), with \(n=10\):

• First we calculate the mean \(\bar x\), then we calculate \((x-\bar x)^2\) for each data point.

• But only \(n-1\) data points are truly free to take any value: if you know \(\bar x\) and the values of 9 of your 10 data points, the last one can be inferred (see the sketch below). So this statistic has \(n-1\) degrees of freedom.
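A toy illustration of this constraint (all numbers here are made up):

Code
# 9 of 10 (hypothetical) data points, plus the known mean of all 10
x.first9 <- c(3, 5, 2, 8, 6, 4, 7, 5, 9)
xbar <- 5.5
# the 10th point is fully determined by the mean; it cannot 'wobble'
x10 <- 10 * xbar - sum(x.first9)
x10
## [1] 6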

Assumptions of \(\chi^2\) tests

  • No expected values \(< 1\)
  • No more than \(20\%\) of expected values \(< 5\).

Do we meet these?

month Jan Feb Mar Apr May June Jul Aug Sep Oct Nov Dec
expected.n 99 95 109 107 111 107 109 106 106 102 96 98
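Yes: every expected count is at least 95, so both assumptions hold. A quick programmatic check (a sketch; expected.n is typed in from the row above):

Code
expected.n <- c(99, 95, 109, 107, 111, 107, 109, 106, 106, 102, 96, 98)
all(expected.n >= 1)  # no expected values below 1
## [1] TRUE
mean(expected.n < 5) <= 0.2  # no more than 20% of expected values below 5
## [1] TRUE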

\(\chi^2\) P-Value and Stats Tables

Statistical Table A in your book [old fashioned, for exam].

df | \(\alpha\) 0.1 0.05 10^-2 10^-3 10^-4 10^-5 10^-6 10^-7
1 2.71 3.84 6.63 10.8 15.1 19.5 23.9 28.4
2 4.61 5.99 9.21 13.8 18.4 23.0 27.6 32.2
3 6.25 7.81 11.34 16.3 21.1 25.9 30.7 35.4
4 7.78 9.49 13.28 18.5 23.5 28.5 33.4 38.2
5 9.24 11.07 15.09 20.5 25.7 30.9 35.9 40.9
6 10.64 12.59 16.81 22.5 27.9 33.1 38.3 43.3
7 12.02 14.07 18.48 24.3 29.9 35.3 40.5 45.7
8 13.36 15.51 20.09 26.1 31.8 37.3 42.7 48.0
9 14.68 16.92 21.67 27.9 33.7 39.3 44.8 50.2
10 15.99 18.31 23.21 29.6 35.6 41.3 46.9 52.3
11 17.28 19.68 24.72 31.3 37.4 43.2 48.9 54.4
12 18.55 21.03 26.22 32.9 39.1 45.1 50.8 56.4

\(\chi^2\) P-Value and Stats Tables

We had \(\chi^2=44.89; df=12-1-0=11\)

df | \(\alpha\) 0.1 0.05 10^-2 10^-3 10^-4 10^-5 10^-6 10^-7
11 17.3 19.7 24.7 31.3 37.4 43.2 48.9 54.4

Note: A critical value is the value of a test statistic that marks the boundary of a specified area in the tail (or tails) of the sampling distribution under \(H_{0}\). For \(df=11\) and \(\alpha=0.05\), the critical value here is 19.7.
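Critical values can also be found directly in R with qchisq() (a cross-check, not from the original slides):

Code
# critical value for df = 11 at alpha = 0.05; should be about 19.68
qchisq(p = 0.05, df = 11, lower.tail = FALSE)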

We can determine that \(10^{-6}<P<10^{-5}\)

Code
# find pval for chisq test without stats table
pchisq(q = 44.89, df = 11, lower.tail = FALSE)
## [1] 5.07e-06
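As a further cross-check (not in the original slides), R's built-in chisq.test() computes the statistic and P-value in one call, assuming the nhl tibble from earlier and the null.props vector from the simulation sketch:

Code
# should report X-squared of about 44.89, df = 11, and P of about 5e-06
chisq.test(x = nhl$births, p = null.props)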

P is very small. Data like these would rarely be generated under the null.

We reject \(H_0\) & conclude that birth months of NHL players do not follow that of the rest of the population.

Note: This is a one-tailed test by definition: we only care if data are too far from expectations, not if they're too close.

To summarize what we just did

• This was an example of using the \(\chi^2\) goodness-of-fit test to test whether the data follow a proportional model

• The proportional model is a probability model in which the frequency of occurrence of events is proportional to the number of opportunities.

• Critical values are used in association with statistical tables to determine whether the P-value is below a pre-determined threshold

• You don’t need critical values when you use something like R to calculate your P-value

• It is always better to report the actual P-value than a range

• P-values from this test are always approximate because \(\chi^2\) is a continuously distributed variable, while the counts it summarizes are discrete

The \(\chi^2\) Distribution is Versatile: We Can Use It for Any Discrete Distribution

• Provided the aforementioned assumptions are met

• In the next example, we test the null hypothesis that meiosis in monkey flower hybrids follows the expectations from a Punnett square.

• We use a \(\chi^2\) goodness-of-fit test to see if data meet binomial expectations. Note that this is not a binomial test; know the difference.

Is Meiosis in Monkey Flowers Fair?

When making hybrids between monkey flower species, in one cross Lila Fishman found:

• 48 GG homozygotes

• 37 GN heterozygotes

• 4 NN homozygotes

• This surprised her.

What’s the probability that Lila would see this (or something more extreme) under chance alone?

Are these genotype frequencies unusual?

\(H_0\): Genotypes follow Mendelian expectations of 1:2:1.
\(H_A\): Genotypes do not follow Mendelian expectations of 1:2:1.

\(df =\) # categories - 1 - # params estimated \(= 3 - 1 - 0 = 2\)

Code
monkeyflowers <- tibble(geno = c("GG", "GN", "NN"), observed = c(48, 37, 4), expected.prop = c(0.25,
    0.5, 0.25))
monkeyflowers |>
    kable() |>
    kable_styling(full_width = FALSE)
geno observed expected.prop
GG 48 0.25
GN 37 0.50
NN 4 0.25

Are these genotype frequencies unusual?

Code
monkeyflowers <- tibble(geno = c("GG", "GN", "NN"), observed = c(48, 37, 4), expected.prop = c(0.25,
    0.5, 0.25)) |>
    mutate(expected.n = sum(observed) * expected.prop, sq_dev = (expected.n - observed)^2/expected.n) |>
    mutate(sq_dev = round(sq_dev, digits = 3))
monkeyflowers |>
    kable() |>
    kable_styling(full_width = FALSE)
geno observed expected.prop expected.n sq_dev
GG 48 0.25 22.2 29.80
GN 37 0.50 44.5 1.26
NN 4 0.25 22.2 14.97

Meiosis isn’t fair in monkey flowers

Code
monkey_chi <- monkeyflowers |>
    summarise(chi2 = sum(sq_dev)) |>
    pull()
monkey_p <- pchisq(q = monkey_chi, df = 2, lower.tail = FALSE)
monkey_p
## [1] 1.01e-10

We observe a \(\chi^2\) value of 46 (\(df = 2\)) and a P-value of \(1.01 \times 10^{-10}\).

Remember: The low P-value reflects the weight of evidence against the null hypothesis, not how big the difference is between the true proportion and the null expectation.

We reject the null hypothesis. Meiosis isn’t fair.
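A one-line cross-check with R's built-in chisq.test() (not from the original slides):

Code
# should report X-squared of about 46, df = 2, and P of about 1e-10
chisq.test(x = c(48, 37, 4), p = c(0.25, 0.5, 0.25))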

Centromere-Associated Female Meiotic Drive Entails Male Fitness Costs in Monkeyflowers

Abstract (Fishman & Saunders 2008, Science): Female meiotic drive, in which paired chromosomes compete for access to the egg, is a potentially powerful but rarely documented evolutionary force. In interspecific monkeyflower (Mimulus) hybrids, a driving M. guttatus allele (D) exhibits a 98:2 transmission advantage via female meiosis. We show that extreme interspecific drive is most likely caused by divergence in centromere-associated repeat domains and document cytogenetic and functional polymorphism for drive within a population of M. guttatus. In conspecific crosses, D had a 58:42 transmission advantage over nondriving alternative alleles. However, individuals homozygous for the driving allele suffered reduced pollen viability. These fitness effects and molecular population genetic data suggest that balancing selection prevents the fixation or loss of D and that selfish chromosomal transmission may affect both individual fitness and population genetic load.