Nicola Lo Russo

About Bayes' Theorem, Disease Testing, and the Prosecutor's Fallacy

Scenario

A medical test is designed to detect a rare disease that affects only 1 in 1000 people. The test correctly identifies 99% of those with the disease and 95% of those without it. In statistical terms, the test has:

99% sensitivity (correct identification of those with the disease).
95% specificity (correct identification of those without the disease).

If a patient tests positive, what is the probability they actually have the disease?

Intuition

At first glance, one might assume that a positive test result indicates a high likelihood of having the disease. However, that intuition ignores how rare the disease is in the first place, an error known as the "base rate fallacy".

Let’s consider a screening of a population of 1000 people. Out of this group, we expect on average only 1 person to have the disease, reflecting our prior knowledge about the disease's rarity (P(D), the prior). This means that the probability of having the disease before conducting the test is 0.1%, or 1 in 1000.

Now, let’s analyze what happens during testing. The test correctly identifies those with the disease 99% of the time (P(T | D), the likelihood of a positive result when the disease is present), so the 1 infected person is almost certainly flagged as positive. On the other hand, the test also has a 5% false positive rate, meaning that among the 999 uninfected people, approximately 50 (999 × 0.05 = 49.95) will be incorrectly flagged as positive.

The final step is to consider all the positive test results together, both true positives and false positives. The evidence (P(T)) is the overall probability of observing a positive result: here, about 51 positive results out of 1000 tests (1 true positive and about 50 false positives). By combining the prior, likelihood, and evidence, Bayes’ theorem lets us compute the probability of actually having the disease given a positive test result.

Therefore, the probability of actually having the disease, given a positive test result, is:

\[ P(D \mid T) \approx \frac{1}{51} \approx 2\% \]

Figure: negative test results in blue, false positives in orange, true positives in green.
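The counting argument can be reproduced with expected counts rather than whole people. The sketch below is only an illustration: the variable names are chosen here, and the counts are expectations (averages), not the outcome of any particular screening.

```python
# Expected counts for the screening example (averages, not a simulation).
population = 1000
prevalence = 1 / 1000   # P(D): 1 in 1000 people has the disease
sensitivity = 0.99      # P(T | D)
specificity = 0.95      # P(not T | not D)

infected = population * prevalence   # expected number of infected people
healthy = population - infected      # expected number of uninfected people

true_positives = infected * sensitivity         # infected and flagged positive
false_positives = healthy * (1 - specificity)   # uninfected but flagged positive

all_positives = true_positives + false_positives
posterior = true_positives / all_positives      # P(D | T)

print(f"true positives:  {true_positives:.2f}")
print(f"false positives: {false_positives:.2f}")
print(f"P(D | T):        {posterior:.2%}")
```

Without rounding to whole people, the false positives come to 49.95 and the posterior to about 1.94%, which is where the rounded figure of 2% comes from.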

Formal Solution

More formally, to compute the probability of having the disease given a positive test result, we rely on the following formula.

Bayes' Theorem

\[ P(D \mid T) = \frac{P(T \mid D) \cdot P(D)}{P(T \mid D) \cdot P(D) + P(T \mid \neg D) \cdot P(\neg D)} \]

where:

\( P(D \mid T) \): Probability of having the disease given a positive test.
\( P(T \mid D) \): Probability of testing positive if you have the disease (sensitivity).
\( P(D) \): Probability of having the disease (prevalence).
\( P(T \mid \neg D) \): Probability of testing positive if you don't have the disease (false positive rate).
\( P(\neg D) \): Probability of not having the disease.

Calculation

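Let: \[ P(D) = 0.001, \quad P(\neg D) = 0.999, \quad P(T \mid D) = 0.99, \quad P(T \mid \neg D) = 1 - 0.95 = 0.05 \] Substituting into Bayes' theorem: \[ P(D \mid T) = \frac{0.99 \cdot 0.001}{0.99 \cdot 0.001 + 0.05 \cdot 0.999} = \frac{0.00099}{0.05094} \approx 0.0194 \] so the probability is approximately 2%.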
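The same calculation can be checked numerically. The function below is a minimal sketch of Bayes' theorem for this two-outcome case; the name `posterior_given_positive` and its parameter names are chosen here for illustration.

```python
def posterior_given_positive(prevalence, sensitivity, specificity):
    """Return P(D | T) for a binary test, using Bayes' theorem."""
    for name, value in (("prevalence", prevalence),
                        ("sensitivity", sensitivity),
                        ("specificity", specificity)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")

    false_positive_rate = 1.0 - specificity                  # P(T | not D)
    true_pos = sensitivity * prevalence                      # P(T | D) * P(D)
    false_pos = false_positive_rate * (1.0 - prevalence)     # P(T | not D) * P(not D)
    evidence = true_pos + false_pos                          # P(T)
    if evidence == 0.0:
        raise ValueError("a positive result has probability zero with these inputs")
    return true_pos / evidence


p = posterior_given_positive(prevalence=0.001, sensitivity=0.99, specificity=0.95)
print(f"P(D | T) = {p:.4f}")
```

The denominator is the evidence P(T), split into its true-positive and false-positive parts exactly as in the formula.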

Playground

Adjust the parameters to see how the probability changes:

Prevalence of the disease (e.g., 1 in 1,000 = 0.001):

Sensitivity of the test (e.g., 99% = 0.99):

Specificity of the test (e.g., 95% = 0.95):
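The same exploration can be sketched in a few lines of Python. The prevalence values in the loop are arbitrary choices made here for illustration, and the helper name is hypothetical; sensitivity and specificity are kept at the values from the scenario.

```python
def posterior_given_positive(prevalence, sensitivity, specificity):
    """Return P(D | T) for a binary test, using Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


sensitivity = 0.99
specificity = 0.95

# Illustrative prevalence values, from very rare to very common.
results = []
for prevalence in (0.0001, 0.001, 0.01, 0.1, 0.5):
    p = posterior_given_positive(prevalence, sensitivity, specificity)
    results.append((prevalence, p))
    print(f"prevalence = {prevalence:<7} ->  P(D | T) = {p:.2%}")
```

With the test's accuracy held fixed, the posterior rises steadily with the prevalence, which is the base rate effect in a nutshell: the rarer the disease, the less a single positive result tells us.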

Why Bayes Matters

As we have seen, in the context of medical testing, Bayes' theorem helps us interpret the significance of test results and avoid common misconceptions. But the implications of Bayes' theorem (and its misuses) extend far beyond healthcare.

Misapplying Bayes’ theorem can also lead to significant errors in judgment and decision-making. A famous fallacy arising from such misuse is the Prosecutor’s Fallacy.

This fallacy occurs when the probability of observing certain evidence if a defendant is innocent is confused with the probability that the defendant is innocent given the evidence. In legal contexts, this misinterpretation can lead to wrongful convictions.
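The gap between the two probabilities can be made concrete with a small sketch. All numbers below are hypothetical, chosen here only for illustration, and are not taken from any real case: a piece of evidence that an innocent person would match with probability 1 in a million, and a pool of 10 million people who could equally well be the source.

```python
# Hypothetical numbers for illustration only; not taken from any real case.
pool_size = 10_000_000       # assumed number of equally likely candidates
p_e_given_innocent = 1e-6    # assumed P(E | innocent): chance of a coincidental match
p_e_given_guilty = 1.0       # assumed P(E | guilty): the true source always matches

prior_guilty = 1 / pool_size
prior_innocent = 1 - prior_guilty

# Evidence P(E): a match from the guilty person or from an innocent one.
evidence = p_e_given_guilty * prior_guilty + p_e_given_innocent * prior_innocent

# Bayes' theorem: P(innocent | E)
p_innocent_given_e = p_e_given_innocent * prior_innocent / evidence

print(f"P(E | innocent) = {p_e_given_innocent:.6f}")
print(f"P(innocent | E) = {p_innocent_given_e:.3f}")
```

Under these assumptions about 10 innocent people are expected to match by coincidence, so a match alone leaves roughly a 10 in 11 chance of innocence, even though the chance of a coincidental match for any one person is one in a million.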

For example, in the infamous Sally Clark case, Clark was wrongfully convicted of murdering her two children based on flawed statistical reasoning. The prosecution argued that the chance of two siblings dying of Sudden Infant Death Syndrome (SIDS) was 1 in 73 million, and this figure was effectively presented as the probability of Clark’s innocence. That interpretation was incorrect: the figure is the probability of the evidence assuming innocence, not the probability of innocence given the evidence. It also treated the two deaths as independent events, and it was never weighed against the probability of the alternative explanation, a double murder, which is itself extremely rare. The correct application of Bayes’ theorem could have prevented this miscarriage of justice.

References

Below are resources and articles related to the topics discussed:

Bayes' Theorem: [1a] [1b]

Base Rate Fallacy: [2]
Prosecutor's Fallacy: [3]
Sally Clark Case: [4a] [4b]