8.1. Contingency Analyses


Bárbara D. Bitarello

2025-11-26

Outline

  • Quantifying associations between two categorical variables
      • Relative risk
      • Odds ratio
  • Testing associations between categorical variables
      • \(\chi^2\) contingency test
      • G-test
      • Fisher’s test

We Often Care About Associations

  • Does eating chocolate decrease the chance you’ll have a bad day?
  • Does the home team have an advantage?
  • Is the use of “bath salts” associated with cannibalism?*
  • Does fertilizing tomatoes increase the chance they set fruit?
  • Does a given drug affect the disease outcome in patients?

Two kinds of chi-squared tests (1/2)

  1. Chi-square goodness-of-fit tests
  • one variable (observed vs. expected)
  • uses the \(\chi^2\) distribution to compare the observed frequencies of one variable in a sample with the frequencies expected under a theoretical distribution (e.g., the proportional model, the Poisson model)

Two kinds of chi-squared tests (2/2)

  2. Chi-square contingency (independence) tests (e.g., 2×2 tables)

  • use the \(\chi^2\) statistic to compare the observed counts for two categorical variables with the counts expected if the variables were independent
  • the \(\chi^2\) contingency test is a special case of the more general goodness-of-fit test, in which the probability model being tested is the independence of the two variables
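To make "independence as the probability model" concrete, here is a minimal sketch for a 2×2 table: expected counts under independence are (row total × column total) / grand total, and the Pearson statistic sums (observed − expected)² / expected over the cells. The function name `chi2_contingency_2x2` is hypothetical, and the sketch assumes a 2×2 table (1 degree of freedom) so the p-value can be written with `math.erfc`. It applies no continuity correction, whereas R's `chisq.test()` and SciPy's `scipy.stats.chi2_contingency` both apply Yates' correction to 2×2 tables by default, so their statistic will differ somewhat.

```python
import math


def chi2_contingency_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table of counts.

    table: [[n11, n12], [n21, n22]]; no continuity correction is applied.
    Returns (chi-square statistic, p-value, expected counts).
    """
    if len(table) != 2 or any(len(row) != 2 for row in table):
        raise ValueError("this sketch only handles 2x2 tables")
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / grand total
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    # Pearson statistic: sum over cells of (observed - expected)^2 / expected
    chi2 = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    # A 2x2 table has 1 degree of freedom; a chi-square variable with 1 df is Z^2,
    # so P(X^2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value, expected


# Titanic counts as given in the slides: rows = death, survival; columns = women, men
titanic = [[109, 1329], [316, 109]]
chi2, p, expected = chi2_contingency_2x2(titanic)
print(f"X^2 = {chi2:.1f}, p = {p:.3g}")
```

A table whose rows are exactly proportional (for example `[[10, 20], [30, 60]]`) has expected counts equal to the observed counts, so \(X^2 = 0\) and \(p = 1\).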

Quantifying Associations Between Categorical Variables

  • Relative risk
  • Odds ratio

Relative Risk: Definition

Relative risk is the probability of an undesired outcome in the treatment group divided by the probability of that outcome in the control group.

Parameter: \(RR=\frac{p_1}{p_2}\)

Estimate: \(\hat{RR}=\frac{\hat{p}_1}{\hat{p}_2}\), where

\(\hat{p}_i=\frac{n_{\text{bad outcomes in group } i}}{n_{\text{individuals in group } i}}\)

  • Probabilities (\(p_1, p_2\)) range from 0 to 1.
  • RR ranges from 0 to \(\infty\).

\(\hat{RR} < 1\): the risk is lower in group 1

\(\hat{RR} = 1\): the risk is the same in both groups

\(\hat{RR} > 1\): the risk is higher in group 1

Reduction in relative risk is also of interest: \(1- \hat {RR}\)

Relative risk vs. odds ratio: Titanic example!

[Figure: decorative image of the Titanic. Source: Wikimedia Commons]

Surviving the Titanic: RR (1/)

[Figure: mosaic plot made in R, with Sex on the x axis and Survived on the y axis. Fewer than 1/2 of the women died, compared with more than 3/4 of the men.]

  • What is being calculated?
  • What is the focal outcome?
  • What’s the treatment group?
  • What’s the control group?
  • How to interpret RR?
  • How to interpret OR?

Surviving the Titanic: RR (2/)

  • Comparing the risk of death for women to men.
  • Relative Risk: female as treatment, and male as control.

Women’s relative risk of dying on the Titanic compared to men:

\[\hat{RR}=\frac{\hat{p}_{w}}{\hat{p}_{m}}\], where

\(\hat{p}_{w}=\frac{\text{women who died}}{\text{all women}}\)

\(\hat{p}_{m}=\frac{\text{men who died}}{\text{all men}}\)

Surviving the Titanic: RR (3/)

  • Comparing the risk of death for women to men.
  • Relative Risk: female as treatment, and male as control.

\(\hat{p}_{w}=\frac{109}{109+316}\approx0.256\)

\(\hat{p}_{m}=\frac{1329}{1329+109}\approx0.924\)

Women’s RR of dying compared to men:

\(\hat{RR}=\frac{0.256}{0.924}\approx0.277\) (women’s risk of dying was less than 1/3 that of men)

Women’s reduction in relative risk of dying compared to men:

\(1-\hat {RR}=0.723\)
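The arithmetic above can be checked with a few lines of Python. This is a minimal sketch: `relative_risk` is an illustrative helper name (not from any library), and the counts are the ones in the slides' Titanic table.

```python
def relative_risk(bad1, n1, bad2, n2):
    """Estimated RR: proportion with the focal outcome in group 1 / same in group 2."""
    p1_hat = bad1 / n1
    p2_hat = bad2 / n2
    return p1_hat / p2_hat


# Counts as given in the slides' Titanic table (focal outcome = death)
women_died, women_total = 109, 109 + 316
men_died, men_total = 1329, 1329 + 109

rr = relative_risk(women_died, women_total, men_died, men_total)
reduction = 1 - rr  # reduction in relative risk
print(f"RR = {rr:.3f}, 1 - RR = {reduction:.3f}")
```

Working from unrounded proportions gives \(\hat{RR}\approx 0.278\) and \(1-\hat{RR}\approx 0.722\); the 0.277 above comes from rounding \(\hat{p}_w\) and \(\hat{p}_m\) before dividing.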

Relative Risk: Pros and Cons

Good:

  • Straightforward, intuitive interpretation
  • Should be reported whenever probabilities (\(p_{1},p_{2}\)) can be estimated without bias

Bad:

  • We often cannot study rare outcomes without some selection bias
  • It has some undesirable properties for statistical modeling.
  • In those cases we might have to deal with the much less intuitive odds ratio (or log odds ratio).

Odds: Definition (1/)

  • Odds: the probability of success divided by the probability of failure. “Success” refers to the focal outcome.
  • Note: odds range from 0 to \(\infty\), unlike probabilities, which range from 0 to 1.

Parameter: \(O=\frac{p}{1-p}\)

Estimate: \(\hat {O}=\frac{\hat p}{1-\hat p}\)

\(\hat{O}>1\) : Success more likely than failure

\(\hat{O}<1\) : Success less likely than failure

\(\hat{O}=1\) : Success and failure equally likely
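The three cases follow from how odds map onto probabilities: \(p = 0.5\) gives odds of exactly 1, \(p < 0.5\) gives odds below 1, and \(p > 0.5\) gives odds above 1. A minimal sketch of the conversion in both directions (the function names are illustrative):

```python
def odds(p):
    """Odds of success, p / (1 - p), for a probability 0 <= p < 1."""
    if not 0 <= p < 1:
        raise ValueError("p must be in [0, 1)")
    return p / (1 - p)


def probability_from_odds(o):
    """Convert odds back to a probability: p = O / (1 + O)."""
    if o < 0:
        raise ValueError("odds must be non-negative")
    return o / (1 + o)


# Odds below 1, equal to 1, and above 1
for p in (0.2, 0.5, 0.8):
    o = odds(p)
    print(f"p = {p}: odds = {o:.2f}, back to p = {probability_from_odds(o):.2f}")
```

For example, \(p = 0.2\) gives odds of 0.25 (success is 1/4 as likely as failure), and \(p = 0.8\) gives odds of 4.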

Odds Ratio (OR) (1/)

Contingency table: Cells contain counts

This is a two-way contingency table:

Outcome    Treatment   Control
Success    a           b
Failure    c           d

Success: the focal outcome

Treatment/Control: The two categories identify the two groups whose probability of success is being compared.

Odds Ratio (OR) (2/)

Contingency table: Cells contain counts

This is a two-way contingency table:

Outcome    Treatment   Control
Success    a           b
Failure    c           d

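\(O_{1}=\frac{P(\text{Success}\mid\text{Group}_1)}{1-P(\text{Success}\mid\text{Group}_1)}\), where group 1 is the treatment group

\(O_{2}=\frac{P(\text{Success}\mid\text{Group}_2)}{1-P(\text{Success}\mid\text{Group}_2)}\), where group 2 is the control group

\(OR=\frac{O_{1}}{O_{2}}\)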

Odds Ratio (OR) (3/)

Contingency table: Cells contain counts

This is a two-way contingency table:

Outcome    Treatment   Control
Success    a           b
Failure    c           d

\(OR=\frac{O_{1}}{O_{2}}\)

\(\hat{OR}=\frac{\hat{O}_{1}}{\hat{O}_{2}}=\frac{a/c}{b/d}=\frac{ad}{bc}\)
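The step from \(\hat{O}_1\) to \(a/c\) is where the group totals cancel. Writing each group's odds in terms of the cell counts, with group 1 as the treatment column (\(a, c\)) and group 2 as the control column (\(b, d\)):

\[
\hat{O}_1=\frac{\hat{p}_1}{1-\hat{p}_1}=\frac{a/(a+c)}{c/(a+c)}=\frac{a}{c},
\qquad
\hat{O}_2=\frac{\hat{p}_2}{1-\hat{p}_2}=\frac{b/(b+d)}{d/(b+d)}=\frac{b}{d}
\]

The column totals \(a+c\) and \(b+d\) never appear in \(\hat{OR}=\frac{ad}{bc}\).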

Surviving the Titanic: OR (1/)

  • “Success” refers to the focal outcome: dying on the Titanic
Outcome     Women   Men
Death       109     1329
Survival    316     109

Surviving the Titanic: OR (2/)

Outcome     Women   Men
Death       109     1329
Survival    316     109

\(\hat{OR}=\frac{\hat{O}_{women}}{\hat{O}_{men}}\).

\(\hat {O}_{women}=\frac{\hat p}{1-\hat p}=\frac{0.256}{1-0.256}\approx 0.344\). We have \(\hat{p_{w}}\) from before

\(\hat {O}_{men}=\frac{\hat p}{1-\hat p}=\frac{0.924}{1-0.924}\approx 12.2\). We have \(\hat{p_{m}}\) from before

\[\hat{OR}=\frac{\hat{O}_{women}}{\hat{O}_{men}}=\frac{0.344}{12.2}\approx 0.0282\]

Surviving the Titanic: OR (3/)

Outcome     Women   Men
Death       109     1329
Survival    316     109

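Alternatively, even without \(\hat{p_{w}}\) and \(\hat{p_{m}}\), we can compute the OR directly from the counts in the table:

\[\hat{OR}=\frac{ad}{bc}=\frac{109\times 109}{316\times 1329}\approx 0.0283\]

(the same as before, up to rounding)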
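A quick check that the ratio of the two odds and the cross-product \(ad/bc\) give the same estimate. This is a minimal sketch; `odds_ratio` is an illustrative helper name, and the counts are the ones in the slides' Titanic table.

```python
def odds_ratio(a, b, c, d):
    """Cross-product odds ratio ad / bc for the table
               group 1  group 2
    success       a        b
    failure       c        d
    """
    return (a * d) / (b * c)


# Counts as given in the slides' Titanic table (success = death; group 1 = women)
a, b = 109, 1329  # deaths: women, men
c, d = 316, 109   # survivals: women, men

or_cross = odds_ratio(a, b, c, d)
or_via_odds = (a / c) / (b / d)  # odds of death for women / odds of death for men
print(f"cross-product: {or_cross:.4f}, ratio of odds: {or_via_odds:.4f}")
```

Both routes agree, since \((a/c)/(b/d)\) is algebraically \(ad/bc\); both print 0.0283.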

Odds Ratio: Pros and Cons

Good:

  • The group totals (the denominators of \(\hat{p}_1\) and \(\hat{p}_2\)) cancel out of the numerator and denominator, so the OR can be estimated even when we cannot estimate the true absolute probabilities.
  • This is the case in studies of rare outcomes, where it makes sense to oversample individuals with the rare outcome (as we do in case-control studies).

Bad:

  • Not intuitive at all.
  • The odds ratio does not equal the relative risk (although the two are close when the focal outcome is rare).
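To see why the odds ratio survives case-control sampling while the relative risk does not, the sketch below starts from a made-up cohort with a rare outcome and then keeps every case but only 1% of the non-cases. Scaling the failure row by a constant multiplies both \(c\) and \(d\), which cancels in \(ad/bc\) but not in the group proportions. The counts, the sampling fraction, and the helper name `risk_and_odds_ratio` are all hypothetical.

```python
def risk_and_odds_ratio(a, b, c, d):
    """Return (RR, OR) computed naively from the table
               treatment  control
    success        a         b
    failure        c         d
    """
    rr = (a / (a + c)) / (b / (b + d))
    or_ = (a * d) / (b * c)
    return rr, or_


# Hypothetical full cohort with a rare outcome (made-up numbers for illustration)
a, b = 20, 10        # cases: treatment, control
c, d = 9980, 9990    # non-cases: treatment, control
rr_full, or_full = risk_and_odds_ratio(a, b, c, d)

# Case-control style sampling: keep every case but only 1% of non-cases
frac = 0.01
rr_cc, or_cc = risk_and_odds_ratio(a, b, c * frac, d * frac)

print(f"full cohort:  RR = {rr_full:.3f}, OR = {or_full:.3f}")
print(f"case-control: RR = {rr_cc:.3f}, OR = {or_cc:.3f}")
```

In this made-up example the full cohort has RR = 2.0 and OR ≈ 2.002 (close, because the outcome is rare). After sampling, the naive RR drops to about 1.83, while the OR stays at about 2.002.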