## Bayesian Analysis

### What is Bayesian Analysis?

- Bayesian analysis is a statistical method that updates beliefs (probabilities) when new evidence becomes available.

- It starts with a prior probability (initial belief), incorporates new evidence, and produces a posterior probability (updated belief).

The updating rule is Bayes’ Rule: **P(A∣B) = P(B∣A)⋅P(A) / P(B)**, where A is the hypothesis and B is the observed evidence.
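
The examples that follow each compute a "denominator" before applying the rule. As a short worked step, that denominator comes from the law of total probability, splitting the evidence over the two cases "A is true" and "A is false" (written ¬A here, a notation these notes do not otherwise use):

$$
P(B) = P(B \mid A)\,P(A) + P(B \mid \neg A)\,P(\neg A)
\quad\Longrightarrow\quad
P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B \mid A)\,P(A) + P(B \mid \neg A)\,P(\neg A)}
$$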


### Example: Email Spam Detection

- **Prior (before evidence)**: 20% of all emails are spam (P(spam)=0.2).
- **Evidence**: The word “money” appears in the email.
- **Likelihoods**:
    - P(money∣spam)=0.08
    - P(money∣ham)=0.01
- **Computation (using Bayes)**:
    - Denominator P(money)=0.08×0.2+0.01×0.8=0.016+0.008=0.024.
    - Posterior P(spam∣money)=0.08×0.2/0.024=0.016/0.024≈0.667.

✅ Seeing “money” increases belief from 20% → **~67%** that the email is spam.
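
The arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch for a yes/no hypothesis; the function name `posterior` and its parameter names are illustrative choices, not part of any library.

```python
def posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Return P(H | evidence) for a yes/no hypothesis H using Bayes' Rule."""
    # Numerator: P(evidence | H) * P(H)
    numerator = p_evidence_given_h * prior
    # Denominator: total probability of the evidence over both cases (H, not H)
    evidence = numerator + p_evidence_given_not_h * (1.0 - prior)
    return numerator / evidence


# Spam example: P(spam)=0.2, P(money|spam)=0.08, P(money|ham)=0.01
p_spam_given_money = posterior(
    prior=0.2,
    p_evidence_given_h=0.08,
    p_evidence_given_not_h=0.01,
)
print(f"P(spam | money) = {p_spam_given_money:.3f}")  # prints 0.667
```

Passing the prior and both likelihoods as arguments makes it easy to see how the posterior moves when the base rate changes.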

### Example: Medical Testing

- **Prior**: Disease prevalence is 1% (P(disease)=0.01).

- **Test performance**:
    - Sensitivity = 95% → P(positive∣disease)=0.95.
    - False positive rate = 2% → P(positive∣no disease)=0.02.

- **Computation**:
    - Denominator P(positive)=0.95×0.01+0.02×0.99=0.0095+0.0198=0.0293.
    - Posterior P(disease∣positive)=0.95×0.01/0.0293≈0.324.

✅ Even with a positive test, the chance of disease is **only ~32.4%**, not 95%.
This is because the disease is rare: false positives (0.0198 of everyone tested) outnumber true positives (0.0095) roughly two to one, so **false positives dominate**.
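
One way to check this result is to count people instead of multiplying probabilities. The sketch below assumes a hypothetical cohort of 100,000 people (a number chosen only to give round counts) and applies the prevalence, sensitivity, and false positive rate from the example.

```python
population = 100_000          # hypothetical cohort size, chosen for round numbers
prevalence = 0.01             # P(disease)
sensitivity = 0.95            # P(positive | disease)
false_positive_rate = 0.02    # P(positive | no disease)

sick = population * prevalence                    # about 1,000 people
healthy = population - sick                       # about 99,000 people
true_positives = sick * sensitivity               # about 950 positive tests
false_positives = healthy * false_positive_rate   # about 1,980 positive tests

# Among everyone who tests positive, what share is actually sick?
p_disease_given_positive = true_positives / (true_positives + false_positives)

print(f"true positives:  {true_positives:,.0f}")
print(f"false positives: {false_positives:,.0f}")
print(f"P(disease | positive) = {p_disease_given_positive:.3f}")  # prints 0.324
```

The counts make the base-rate effect visible: the healthy group is so much larger that even a 2% error rate produces more positive tests than the sick group does.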

### Why Bayesian Analysis Matters
- It teaches us to always consider the base rate (prior).
- Strong evidence (like a positive test) doesn’t guarantee a high posterior probability when the condition is rare.
- This reasoning is critical in spam filtering, diagnostics, fraud detection, and AI classifiers.

### ✅ Summary:

- Bayesian analysis = Prior belief + New evidence → Updated belief.
- It prevents misinterpretation by balancing how common something is (base rate) with how strong the evidence is (likelihoods).

## Warner’s Randomized Response Model

### The Problem

When surveys ask sensitive questions (e.g., “Have you ever cheated on an exam?”), respondents may lie because of **embarrassment, stigma, or fear of consequences**.
This leads to **biased results**, typically underreporting of the true prevalence.

### Warner’s Randomized Response Technique (1965)

To encourage honesty while protecting privacy, Warner introduced randomization into the survey itself. A common coin-flip version of the technique works as follows:

- Each respondent privately flips a coin **twice**.

- Based on the first toss, they answer **one of two questions**:

    1. **Sensitive Question (Q1)**: “Have you ever cheated on an exam?”

    2. **Random Question (Q2)**: “Did the coin land tails on the second toss?”

- If the first toss is tails, they answer Q1.

- If the first toss is heads, they answer Q2.

The researcher doesn’t know which question the student answered, so **individual privacy is protected**.
But in aggregate, statistics can be used to estimate the true proportion.

### Why This Works

- A **“yes”** answer could mean:

    - The student cheated (from Q1), OR

    - The second coin landed tails (from Q2).

- Since the second coin toss is **random with known probability (50%)**, researchers can use probability theory to “filter out” the noise introduced by randomization.
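
To see why the noise is predictable in aggregate, the survey can be simulated. This is a rough sketch under stated assumptions: the true cheating rate of 30% is an invented value for illustration, every respondent answers truthfully, and `simulate_yes_rate` is a hypothetical helper name.

```python
import random


def simulate_yes_rate(n_respondents, true_rate, seed=0):
    """Simulate the coin-flip survey and return the fraction of "yes" answers."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    yes_count = 0
    for _ in range(n_respondents):
        first_toss_tails = rng.random() < 0.5
        if first_toss_tails:
            # Q1: the respondent truthfully reports whether they cheated
            answered_yes = rng.random() < true_rate
        else:
            # Q2: "yes" exactly when the second toss lands tails
            answered_yes = rng.random() < 0.5
        yes_count += answered_yes
    return yes_count / n_respondents


assumed_true_rate = 0.30  # illustrative assumption, not a figure from a real survey
observed = simulate_yes_rate(200_000, assumed_true_rate)
expected = 0.5 * assumed_true_rate + 0.5 * 0.5

print(f"observed yes rate: {observed:.3f}")
print(f"expected yes rate: {expected:.3f}")
```

With a large sample, the observed "yes" rate should land close to the expected mixture of the two questions, even though no individual answer reveals which question was asked.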

### Example Calculation

**Suppose**:

- 50% of students answer Q1 (cheating), 50% answer Q2 (coin toss).

- Probability of “yes” from Q2 = 0.5.

If **47%** of all survey responses are “yes”, then:

**P(yes) = 0.5⋅P(cheated) + 0.5⋅0.5**

Solve for P(cheated):

- 0.47=0.5⋅P(cheated)+0.25
- 0.47−0.25=0.22=0.5⋅P(cheated)
- P(cheated)=0.22/0.5=0.44

✅ The estimated proportion of students who have cheated is **44%**, even though direct answers might have underreported it.
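
The same algebra can be wrapped in a small function. This is a minimal sketch; the name `estimate_sensitive_rate` and its parameters are illustrative, with defaults matching the fair-coin setup described above.

```python
def estimate_sensitive_rate(yes_rate, p_sensitive=0.5, p_yes_random=0.5):
    """Solve yes_rate = p_sensitive * rate + (1 - p_sensitive) * p_yes_random for rate.

    yes_rate      -- observed fraction of "yes" answers in the survey
    p_sensitive   -- probability a respondent answers the sensitive question
    p_yes_random  -- probability of "yes" on the random question
    """
    if not 0.0 < p_sensitive <= 1.0:
        raise ValueError("p_sensitive must be in (0, 1]")
    # Subtract the "yes" answers expected from the random question,
    # then rescale by the share of respondents who got the sensitive one.
    return (yes_rate - (1.0 - p_sensitive) * p_yes_random) / p_sensitive


estimate = estimate_sensitive_rate(0.47)
print(f"Estimated P(cheated) = {estimate:.2f}")  # prints 0.44
```

Because the observed "yes" rate comes from a sample, the result is an estimate; with small samples it can even fall slightly outside the range 0 to 1.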