Poisson distribution
2025-11-25
• Goodness-of-fit tests
• The proportional model
• The \(\chi^2\) distribution
• Degrees of freedom
• The \(\chi^2\) test
• Using \(\chi^2\) to test if data are distributed according to a Poisson distribution
Do data come from a given distribution with specified parameters?
We often want to know if events are independent in time/space.
Perhaps we can compare real data with the expectations under a null model and test the difference. What might that null distribution look like?
Assuming events are independent, the Poisson distribution describes the expected probability of \(X\) events (successes) in a block of time or space.
Random Events in Time or Space
The number of successes in the experiment can be counted (discrete).
The mean number of events (successes) that occur during a specific interval of time (or space) is known.
Each outcome is independent.
The probability that a success will occur is proportional to the size of the interval.
Example: Number of births per hour in a given hospital
Formally: a mathematical description of the probability of \(X\) successful outcomes when:
the number of attempts, \(n\), is high
the probability of success for each attempt, \(p\), is low and determined through a random process
the probability of success in each attempt, \(p\), is independent of prior success or failure
The Poisson distribution describes the expected probability of \(X\) independent events in time/space.
Assuming events are random and independent, the probability of observing \(X\) events in a block of time or space equals:
\[P[X] = \frac{e^{-\lambda} \lambda^{X}}{X!}\]
\(X\): Number of successes
\(P[X]\): probability of observing exactly \(X\) successes in one block of time or space
\(\lambda\) (or \(\mu\)): the expected (mean) number of events in a block.
\(e\): the base of the natural logarithm \(\ln\), aka Euler’s number (a constant, exp(1) in R).
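A minimal sketch of the formula, written in Python rather than the course's R. It also compares the Poisson probabilities with a binomial that has many attempts and a low success probability, which is the setting described under "Formally" above. The function names and the values of `n` and `p` are made up for illustration.

```python
import math

def poisson_pmf(x, lam):
    """P[X = x] for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam**x / math.factorial(x)

def binomial_pmf(x, n, p):
    """P[X = x] successes in n independent attempts, each with probability p."""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

# Hypothetical numbers chosen for illustration: many attempts, rare successes.
n, p = 10_000, 0.0001
lam = n * p  # expected number of successes per block

# Columns: X, Poisson probability, binomial probability.
for x in range(5):
    print(x, round(poisson_pmf(x, lam), 5), round(binomial_pmf(x, n, p), 5))
```

The two columns of probabilities should agree closely, because with high \(n\) and low \(p\) the binomial is well approximated by a Poisson with \(\lambda = np\).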

Parasites are a major force in human health, as well as in evolution, ecology, agronomy, and animal husbandry.
Shaw et al. asked whether parasites are distributed randomly across individual Shad fish, or whether some fish carry an exceptional parasite burden. Here are their data:
| num_parasites | num_fish |
|---|---|
| 0 | 103 |
| 1 | 72 |
| 2 | 44 |
| 3 | 14 |
| 4 | 3 |
| 5 | 1 |
| 6 | 1 |
| 7 | 0 |
| 8 | 0 |
| 9 | 0 |
\(H_0\): Parasites are placed randomly on fish in a population, i.e., parasite numbers follow a Poisson distribution.
\(H_A\): Parasites are not placed randomly on fish in a population, i.e., parasite numbers do not follow a Poisson distribution.
State hypotheses
Find expectations (via simulation or via model)
| total_fish | total_parasites |
|---|---|
| 238 | 225 |
We can place 225 parasites on 238 fish at random, many times over, to generate a null distribution.
Take 225 parasites and:
randomly place one parasite in one of 238 fish
repeat for each parasite
the same fish can be the target more than once
Note: the observed data do not include \(X > 6\), but such values are possible and are thus shown here.
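The placement procedure above can be sketched as a small simulation. This is an illustration in Python, not the code used to build the slides; the function names, the number of replicates, and the seed are assumptions.

```python
import random
from collections import Counter

def simulate_once(n_fish, n_parasites, rng):
    """Place each parasite on a randomly chosen fish.

    Returns a Counter mapping parasites-per-fish -> number of fish.
    """
    load = [0] * n_fish
    for _ in range(n_parasites):
        load[rng.randrange(n_fish)] += 1  # the same fish can be hit repeatedly
    return Counter(load)

def null_expectation(n_fish=238, n_parasites=225, n_reps=1000, seed=1):
    """Average number of fish in each parasite class over n_reps simulations."""
    rng = random.Random(seed)
    totals = Counter()
    for _ in range(n_reps):
        totals.update(simulate_once(n_fish, n_parasites, rng))
    return {k: totals[k] / n_reps for k in range(max(totals) + 1)}

expected = null_expectation()
for k, v in expected.items():
    print(k, round(v, 2))
```

Each replicate always contains 238 fish and 225 parasites, so only the spread of parasites across fish varies from run to run.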
\(P[X]=\frac{e^{-\lambda}\lambda^X}{X!}\)
\(e=2.718282\)
\(\lambda = \frac{\text{total parasites}}{\text{total fish}} = 225/238 \approx 0.945\)
Let’s try the first row:
\(X=0\)
\(P[0]=\frac{e^{-0.945}\times 0.945^0}{0!}\approx 0.3885\)
\(\text{\# fish expected}=0.3885\times 238\approx 92.5\)
Repeat for each row…
| num_parasites | num_fish | expect |
|---|---|---|
| 0 | 103 | 92.471 |
| 1 | 72 | 87.420 |
| 2 | 44 | 41.322 |
| 3 | 14 | 13.022 |
| 4 | 3 | 3.078 |
| 5 | 1 | 0.582 |
| 6 | 1 | 0.092 |
| 7 | 0 | 0.012 |
| 8 | 0 | 0.001 |
| 9 | 0 | 0.000 |
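The row-by-row calculation can be sketched as follows (Python here rather than R; the variable names are illustrative). It estimates \(\lambda\) from the data and multiplies each Poisson probability by the number of fish.

```python
import math

# Observed data: number of parasites -> number of fish.
observed = {0: 103, 1: 72, 2: 44, 3: 14, 4: 3, 5: 1, 6: 1, 7: 0, 8: 0, 9: 0}

total_fish = sum(observed.values())                        # 238
total_parasites = sum(k * n for k, n in observed.items())  # 225
lam = total_parasites / total_fish                         # about 0.945

def poisson_pmf(x, lam):
    """P[X = x] for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam**x / math.factorial(x)

# Expected number of fish in each class under the Poisson null.
expect = {k: total_fish * poisson_pmf(k, lam) for k in observed}
for k in observed:
    print(k, observed[k], round(expect[k], 3))
```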
Do we meet \(\chi^2\) assumptions?
If yes, go on. If no, brainstorm.
Options:
Go on despite this: if test is robust to violations. (I don’t support this option!)
Combine categories: if biologically sensible.
Find a more appropriate test.
Bypass traditional tests by using simulations (which we’ve done)
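A sketch of the "combine categories" option in Python. The slides do not spell out the \(\chi^2\) assumptions, so the check below uses a commonly taught rule of thumb as an assumption: no expected count below 1, and no more than 20% of expected counts below 5. The function names and the cut point are illustrative.

```python
import math

observed = [103, 72, 44, 14, 3, 1, 1, 0, 0, 0]  # fish with 0, 1, ..., 9 parasites
total_fish = sum(observed)
lam = sum(k * n for k, n in enumerate(observed)) / total_fish

def poisson_pmf(x, lam):
    """P[X = x] for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam**x / math.factorial(x)

def meets_rule_of_thumb(expected):
    """Assumed rule: no expected count < 1 and at most 20% of them < 5."""
    below_one = sum(e < 1 for e in expected)
    below_five = sum(e < 5 for e in expected)
    return below_one == 0 and below_five <= 0.2 * len(expected)

expect_all = [total_fish * poisson_pmf(k, lam) for k in range(len(observed))]

# Pool every class with 4 or more parasites into a single "4+" class.
# Its expected count is the number of fish left over after classes 0 to 3.
cut = 4
obs_pooled = observed[:cut] + [sum(observed[cut:])]
exp_pooled = expect_all[:cut] + [total_fish - sum(expect_all[:cut])]

print(meets_rule_of_thumb(expect_all))  # unpooled classes
print(meets_rule_of_thumb(exp_pooled))  # after pooling into "4+"
print(obs_pooled, [round(e, 2) for e in exp_pooled])
```

Pooling at 4+ leaves five classes, only one of which has an expected count below 5.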
| num_parasites | num_fish | tot.fish | tot.parasite | mu | expect | sq_dev |
|---|---|---|---|---|---|---|
| 0 | 103 | 238 | 225 | 0.945 | 92.47 | 1.20 |
| 1 | 72 | 238 | 225 | 0.945 | 87.42 | 2.72 |
| 2 | 44 | 238 | 225 | 0.945 | 41.32 | 0.17 |
| 3 | 14 | 238 | 225 | 0.945 | 13.02 | 0.07 |
| 4+ | 5 | 238 | 225 | 0.945 | 3.77 | 0.40 |
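\(\chi^2 = \sum \frac{(\text{observed} - \text{expect})^2}{\text{expect}} = 1.20 + 2.72 + 0.17 + 0.07 + 0.40 = 4.56\)
df = # categories - 1 - # params estimated = 5 - 1 - 1 = 3 (one parameter, \(\lambda\), was estimated from the data)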
\(df=3; \chi^2 = 4.56\)
| df / \(\alpha\) | 0.1 | 0.05 | 10^-2 | 10^-3 | 10^-4 | 10^-5 | 10^-6 | 10^-7 |
|---|---|---|---|---|---|---|---|---|
| 1 | 2.71 | 3.84 | 6.63 | 10.8 | 15.1 | 19.5 | 23.9 | 28.4 |
| 2 | 4.61 | 5.99 | 9.21 | 13.8 | 18.4 | 23.0 | 27.6 | 32.2 |
| 3 | 6.25 | 7.81 | 11.34 | 16.3 | 21.1 | 25.9 | 30.7 | 35.4 |
Assuming \(\alpha=0.05\), the critical value for df = 3 is 7.81. Because \(\chi^2 = 4.56 < 7.81\), \(P>\alpha\) and we fail to reject the null hypothesis. We cannot exclude the idea that parasites are distributed at random across fish.
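The test can be sketched end to end in Python. The expected counts are taken from the table, with the 4+ class set to the remaining fish. The P-value uses a closed-form upper tail that holds only for 3 degrees of freedom; it is included so the sketch needs no extra library, and the helper name `chi2_sf_df3` is made up here. In R, `pchisq()` with `lower.tail = FALSE` gives the same tail probability for any df.

```python
import math

obs = [103, 72, 44, 14, 5]                 # classes 0, 1, 2, 3, 4+
expect = [92.471, 87.420, 41.322, 13.022]  # Poisson expectations, classes 0 to 3
expect.append(238 - sum(expect))           # 4+ class: the remaining fish, about 3.77

# Chi-squared statistic: sum of (observed - expected)^2 / expected.
chi_sq = sum((o - e) ** 2 / e for o, e in zip(obs, expect))
df = len(obs) - 1 - 1  # one parameter (lambda) was estimated from the data

critical = 7.81        # from the table above: df = 3, alpha = 0.05
reject = chi_sq > critical

def chi2_sf_df3(x):
    """Upper-tail probability of a chi-squared variable with exactly 3 df."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

p_value = chi2_sf_df3(chi_sq)
print(round(chi_sq, 2), df, reject, round(p_value, 2))
```

The statistic computed from unrounded terms can differ from 4.56 in the second decimal place, since the slide sums terms that were already rounded.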
A particular feature of the Poisson distribution is that the variance equals the mean.
Sample counts therefore vary more as the mean increases.
If the variance greatly exceeds the mean, events are clumped.
If the variance is much less than the mean, events are dispersed.
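As a quick check of this property on the parasite data, the variance-to-mean ratio can be computed directly. This is a sketch in Python with illustrative variable names; it uses the sample variance with \(n - 1\) in the denominator.

```python
# Observed data: number of parasites -> number of fish (empty classes omitted).
observed = {0: 103, 1: 72, 2: 44, 3: 14, 4: 3, 5: 1, 6: 1}

n = sum(observed.values())
mean = sum(k * f for k, f in observed.items()) / n
# Sample variance (n - 1 in the denominator).
variance = sum(f * (k - mean) ** 2 for k, f in observed.items()) / (n - 1)
ratio = variance / mean

print(round(mean, 3), round(variance, 3), round(ratio, 2))
```

The ratio comes out at roughly 1.2, only mildly above 1, which fits with the \(\chi^2\) test failing to reject the Poisson model.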
• Proportional probability model: events fall in different categories in proportion to the number of opportunities.
• The Poisson distribution model: describes the frequency distribution of successes in blocks of time or space when successes happen independently and with equal probability over time or space.
B215: Biostatistics with R