# 🔴 Lab 3 - Inference Attack (Membership Inference)
### Certified AI Penetration Tester – Red Team (CAIPT-RT)

---

## 🎯 The Story

A hospital trained a machine learning model on real patient records: sensitive data including age, family background, financial situation, and health history. They never release the records themselves, but they do offer the trained model as a public tool: send it a patient profile, get a risk prediction back.

You are an attacker. You do not have the patient records. But you have access to the model. Can you figure out **whether a specific person's data was used to train it?**

If you can, you have violated that person's privacy: you now know they were a patient at this hospital and that their data was part of a sensitive medical study.

This is a **Membership Inference Attack**.

---

## 📖 What is a Membership Inference Attack?

A membership inference attack tries to determine whether a specific data point was part of a model's training dataset.

**Why does this work?** Models tend to behave slightly differently on data they trained on versus data they have never seen. They are more confident and make fewer errors on training data. An attacker exploits this difference.
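
The intuition can be sketched with a toy confidence-threshold test. The confidence values below are made up for illustration and do not come from this lab's model; `guess_is_member` and the 0.9 threshold are hypothetical choices for the example, which needs only the standard library.

```python
# Toy illustration: guess "member" when the model is very confident.
# All confidence values are invented for illustration only.

def guess_is_member(confidence, threshold=0.9):
    """Return True when the model's top-class confidence meets the threshold."""
    return confidence >= threshold

# Hypothetical top-class confidences returned by a model
member_confidences = [0.99, 0.97, 1.00, 0.88, 0.95]      # records it trained on
nonmember_confidences = [0.91, 0.62, 0.75, 0.99, 0.55]   # records it never saw

correct_members = sum(guess_is_member(c) for c in member_confidences)
correct_nonmembers = sum(not guess_is_member(c) for c in nonmember_confidences)

total = len(member_confidences) + len(nonmember_confidences)
attack_accuracy = (correct_members + correct_nonmembers) / total

print(f"Members flagged correctly     : {correct_members}/{len(member_confidences)}")
print(f"Non-members flagged correctly : {correct_nonmembers}/{len(nonmember_confidences)}")
print(f"Toy attack accuracy           : {attack_accuracy:.0%}")
```

With these invented numbers the guess is right for 7 of 10 records. The attack is imperfect, but it does better than a coin flip, and that margin is the leak.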

**Real-world examples:**
- Determining if a specific person's medical record was in a clinical trial dataset
- Confirming if someone's financial data trained a credit scoring model
- Exposing an organisation to GDPR or HIPAA liability by revealing membership in a sensitive dataset

---

## 🗂️ What We Will Do in This Lab

1. Load the Nursery dataset and understand what it contains
2. Train a classifier and observe the train vs test accuracy gap
3. Run a rule-based membership inference attack
4. Run ART's black-box membership inference attack
5. Understand why overfitting makes models leak privacy

---

## ⚙️ Step 1: Import the Tools We Need

In [None]:
import warnings
warnings.filterwarnings('ignore')

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

from art.estimators.classification import SklearnClassifier
from art.attacks.inference.membership_inference import (
    MembershipInferenceBlackBox,
    MembershipInferenceBlackBoxRuleBased
)

np.random.seed(42)
print("All tools imported successfully.")

---

## 📂 Step 2: Load and Understand the Nursery Dataset

The **Nursery dataset** was created to rank applications for nursery school enrollment. It contains information about families that people would consider private: financial standing, family structure, housing conditions.

Think of each row as a person's application record. The attack we are about to perform could reveal whether a specific family's private application was used to train the model.

In [None]:
# =============================================================================
# LOAD THE NURSERY DATASET
# =============================================================================

column_names = [
    'parents',   # parents' occupation: usual, pretentious, great_pret
    'has_nurs',  # child nursery quality: proper, less_proper, improper, critical, very_crit
    'form',      # family form: complete, completed, incomplete, foster
    'children',  # number of children: 1, 2, 3, more
    'housing',   # housing conditions: convenient, less_conv, critical
    'finance',   # financial standing: convenient, inconv
    'social',    # social conditions: nonprob, slightly_prob, problematic
    'health',    # health conditions: recommended, priority, not_recom
    'target'     # enrollment decision: recommend, priority, not_recom, very_recom, spec_prior
]

df = pd.read_csv(
    '../datasets/nursery.data',
    header=None,
    names=column_names
)

print(f"Dataset loaded: {len(df)} records")
print("")
print("First 5 records:")
print("-" * 70)
print(df.head().to_string())
print("")
print("Enrollment decision distribution:")
print(df['target'].value_counts())
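
As a quick sanity check on the load, the record count can be compared with the size of the feature space. The sketch below hard-codes the category counts from the column comments above. It assumes the file is the standard UCI Nursery data, which contains one record for every combination of the eight feature values.

```python
from math import prod

# Number of distinct values per feature, taken from the column comments
category_counts = {
    'parents': 3,
    'has_nurs': 5,
    'form': 4,
    'children': 4,
    'housing': 3,
    'finance': 2,
    'social': 3,
    'health': 3,
}

expected_records = prod(category_counts.values())
print(f"Expected records if every combination appears once: {expected_records}")
```

If `len(df)` matches this number and `df.duplicated().sum()` is 0, no two records share the same feature values, so no non-member in the later split is an exact copy of a member.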

### 👀 What Do You See?

- What kind of information does this dataset contain about families?
- Would you consider this information sensitive? Why?
- If this were a medical dataset, what columns might exist instead?

---

## 🔢 Step 3: Prepare the Data

In [None]:
# =============================================================================
# CONVERT CATEGORICAL TEXT TO NUMBERS
# =============================================================================
# LabelEncoder converts each unique text value into a number.
# Example: parents column: usual=2, pretentious=1, great_pret=0
# We apply this to every column including the target.
# =============================================================================

df_encoded = df.copy()
for column in df_encoded.columns:
    le = LabelEncoder()
    df_encoded[column] = le.fit_transform(df_encoded[column])

X = df_encoded.drop('target', axis=1).values
y = df_encoded['target'].values

# Split into training (members) and testing (non-members)
# We use these terms deliberately: members = in training set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

print(f"Input features: {X.shape[1]} columns per record")
print("")
print(f"Training set : {len(X_train)} records  <- these are the MEMBERS")
print(f"Testing set  : {len(X_test)} records   <- these are the NON-MEMBERS")
print("")
print("Members    = records the model trained on")
print("Non-members = records the model has never seen")
print("The attack will try to tell these two groups apart.")
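
To see where the numbers in the encoding comment come from: `LabelEncoder` sorts the distinct values and numbers them in that order. A minimal pure-Python sketch of the same mapping follows; the helper name `encode_labels` is hypothetical and not part of scikit-learn.

```python
def encode_labels(values):
    """Map each distinct value to its index in sorted order."""
    classes = sorted(set(values))
    mapping = {label: index for index, label in enumerate(classes)}
    return [mapping[v] for v in values], mapping

parents = ['usual', 'pretentious', 'great_pret', 'usual']
encoded, mapping = encode_labels(parents)

print(mapping)   # {'great_pret': 0, 'pretentious': 1, 'usual': 2}
print(encoded)   # [2, 1, 0, 2]
```

The codes follow alphabetical order, so they are identifiers rather than meaningful ranks.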

---

## 🏋️ Step 4: Train the Target Model

In [None]:
# =============================================================================
# TRAIN THE TARGET MODEL
# =============================================================================
# Random Forest = an ensemble of many decision trees that vote together.
# n_estimators=100 means we build 100 decision trees.
# =============================================================================

print("Training target model (Random Forest with 100 trees)...")
print("(May take 10-20 seconds)")
print("")

target_model = RandomForestClassifier(n_estimators=100, random_state=42)
target_model.fit(X_train, y_train)

train_accuracy = accuracy_score(y_train, target_model.predict(X_train))
test_accuracy = accuracy_score(y_test, target_model.predict(X_test))

print("Training complete!")
print("")
print("=" * 50)
print(f"Training accuracy : {train_accuracy*100:.2f}%")
print(f"Testing accuracy  : {test_accuracy*100:.2f}%")
print(f"Gap               : {(train_accuracy - test_accuracy)*100:.2f} percentage points")
print("=" * 50)
print("")
print("IMPORTANT: Notice the gap between training and testing accuracy.")
print("This gap is exactly what membership inference attacks exploit.")

### 👀 What Do You See?

This is a critical observation. The model performs better on training data than on test data. This is called **overfitting**: the model has partially memorised its training examples.

This overfitting is exactly what membership inference attacks exploit. If the model behaves differently on data it has seen versus data it has not, an attacker can use that difference to identify members.

A perfectly generalising model with no gap would be much harder to attack. Why?

---

## 🔴 Step 5: Rule-Based Membership Inference Attack

In [None]:
# =============================================================================
# RULE-BASED MEMBERSHIP INFERENCE ATTACK
# =============================================================================
# Simple rule:
#   IF model predicts correctly on a sample -> guess it IS a member
#   IF model predicts incorrectly           -> guess it is NOT a member
#
# This works because models tend to be more accurate on training data.
# =============================================================================

art_model = SklearnClassifier(model=target_model)
rule_attack = MembershipInferenceBlackBoxRuleBased(art_model)

sample_size = 200

member_idx = np.random.choice(len(X_train), sample_size, replace=False)
X_member = X_train[member_idx]
y_member = y_train[member_idx]

nonmember_idx = np.random.choice(len(X_test), sample_size, replace=False)
X_nonmember = X_test[nonmember_idx]
y_nonmember = y_test[nonmember_idx]

# infer() returns 1 if it thinks sample is a member, 0 if not
member_inferred = rule_attack.infer(X_member, y_member)
nonmember_inferred = rule_attack.infer(X_nonmember, y_nonmember)

member_accuracy = np.mean(member_inferred == 1)
nonmember_accuracy = np.mean(nonmember_inferred == 0)
overall_accuracy = (member_accuracy + nonmember_accuracy) / 2

print("Rule-Based Attack Results:")
print("=" * 50)
print(f"Correctly identified members     : {member_accuracy*100:.1f}%")
print(f"Correctly identified non-members : {nonmember_accuracy*100:.1f}%")
print(f"Overall attack accuracy          : {overall_accuracy*100:.1f}%")
print("")
print("Random guessing baseline         : 50.0%")
print(f"Advantage over random guessing   : {(overall_accuracy-0.5)*100:+.1f} percentage points")

### 👀 What Do You See?

- Random guessing gives 50%. Anything above that means the attacker is gaining real information.
- Even a small advantage above 50% is a **privacy violation** in a sensitive context like medical data.
- How much better than random guessing did the rule-based attack perform?
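
The rule-based result can be predicted from the accuracy figures alone. Under the rule used in this step, a member is identified correctly whenever the model predicts it correctly, and a non-member is identified correctly whenever the model predicts it wrongly. With equal numbers of members and non-members, the expected attack accuracy is therefore 50% plus half the train/test accuracy gap. A small sketch, using a hypothetical helper name and made-up accuracy values:

```python
def expected_rule_attack_accuracy(train_acc, test_acc):
    """Expected balanced accuracy of the 'correct prediction => member' rule."""
    member_hit_rate = train_acc          # members are flagged when predicted correctly
    nonmember_hit_rate = 1.0 - test_acc  # non-members are cleared only when predicted wrongly
    return (member_hit_rate + nonmember_hit_rate) / 2

# Illustrative values only, not results from this lab
print(f"{expected_rule_attack_accuracy(1.00, 0.98):.3f}")  # 0.510
print(f"{expected_rule_attack_accuracy(1.00, 0.80):.3f}")  # 0.600
print(f"{expected_rule_attack_accuracy(0.90, 0.90):.3f}")  # 0.500
```

A model with no gap leaves this particular rule at chance level, which is why a small gap in Step 4 translates into a small rule-based advantage.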

---

## 🔴 Step 6: Black-Box Membership Inference Attack

In [None]:
# =============================================================================
# BLACK-BOX MEMBERSHIP INFERENCE ATTACK
# =============================================================================
# More sophisticated than rule-based. Trains its own small attack model
# that learns to distinguish members from non-members based on the victim
# model's output probabilities.
#
# Members tend to get higher confidence scores because the model has seen
# them before. The attack model learns to exploit this pattern.
# =============================================================================

print("Running black-box membership inference attack...")
print("(Trains an attack model - may take 15-30 seconds)")
print("")

bb_attack = MembershipInferenceBlackBox(art_model, attack_model_type='rf')

train_split = sample_size // 2

# Arguments are passed positionally: member features, member labels,
# then non-member features, non-member labels.
bb_attack.fit(
    X_member[:train_split],
    y_member[:train_split],
    X_nonmember[:train_split],
    y_nonmember[:train_split]
)

print("Attack model trained. Evaluating on held-out data...")
print("")

bb_member_inferred = bb_attack.infer(X_member[train_split:], y_member[train_split:])
bb_nonmember_inferred = bb_attack.infer(X_nonmember[train_split:], y_nonmember[train_split:])

bb_member_acc = np.mean(bb_member_inferred == 1)
bb_nonmember_acc = np.mean(bb_nonmember_inferred == 0)
bb_overall = (bb_member_acc + bb_nonmember_acc) / 2

print("Black-Box Attack Results:")
print("=" * 50)
print(f"Correctly identified members     : {bb_member_acc*100:.1f}%")
print(f"Correctly identified non-members : {bb_nonmember_acc*100:.1f}%")
print(f"Overall attack accuracy          : {bb_overall*100:.1f}%")
print("")
print("Random guessing baseline         : 50.0%")
print(f"Advantage over random guessing   : {(bb_overall-0.5)*100:+.1f} percentage points")
print("")
print("Comparison:")
print(f"  Rule-based attack : {overall_accuracy*100:.1f}%")
print(f"  Black-box attack  : {bb_overall*100:.1f}%")

### 👀 What Do You See?

- Compare rule-based vs black-box results. Which performed better?
- If this were a medical dataset and the attack had 70% accuracy, what is the real-world privacy implication?
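
The train-then-evaluate pattern used in this step can be illustrated without ART. The sketch below is a deliberately simplified stand-in, not ART's implementation: instead of a random forest attack model, it learns a single confidence threshold from labelled examples and then scores it on held-out examples. All confidence values and function names are hypothetical.

```python
def fit_threshold(member_conf, nonmember_conf):
    """Pick the candidate threshold that best separates the two training groups."""
    candidates = sorted(set(member_conf + nonmember_conf))
    best_threshold, best_accuracy = candidates[0], 0.0
    for t in candidates:
        hits = sum(c >= t for c in member_conf) + sum(c < t for c in nonmember_conf)
        accuracy = hits / (len(member_conf) + len(nonmember_conf))
        if accuracy > best_accuracy:
            best_threshold, best_accuracy = t, accuracy
    return best_threshold

def balanced_attack_accuracy(threshold, member_conf, nonmember_conf):
    """Balanced accuracy of the rule: confidence >= threshold => member."""
    member_rate = sum(c >= threshold for c in member_conf) / len(member_conf)
    nonmember_rate = sum(c < threshold for c in nonmember_conf) / len(nonmember_conf)
    return (member_rate + nonmember_rate) / 2

# Invented confidences: the first pair trains the attack, the second evaluates it
train_members, train_nonmembers = [0.99, 0.96, 0.93, 0.85], [0.94, 0.80, 0.70, 0.60]
held_members, held_nonmembers = [0.98, 0.95, 0.89, 0.97], [0.92, 0.75, 0.96, 0.66]

threshold = fit_threshold(train_members, train_nonmembers)
held_out = balanced_attack_accuracy(threshold, held_members, held_nonmembers)

print(f"Learned threshold : {threshold}")
print(f"Held-out accuracy : {held_out:.3f}")
```

Evaluating on records the attack model did not train on matters for the same reason it matters for the target model: accuracy on the attack's own training examples would overstate how well it separates members from non-members.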

### 🧪 Try This

Go back to the model training step and change `n_estimators=100` to `n_estimators=10`. Retrain and rerun the attacks.

- Did the train/test accuracy gap grow or shrink, and did the attacks become easier or harder?
- What does this tell you about the relationship between model quality and privacy?

---

## 💭 Step 7: Reflect

In [None]:
reflection = """
LAB 3 - INFERENCE ATTACK REFLECTION
=====================================

Q1: In plain English, what is a membership inference attack and why
    is it a privacy violation?
A1: [TYPE YOUR ANSWER HERE]

Q2: The attack worked because the model had a training/test accuracy gap
    (overfitting). What does overfitting mean and why does it help attackers?
A2: [TYPE YOUR ANSWER HERE]

Q3: Even a small advantage above 50% could be a serious problem in a
    medical context. What could an attacker do with even partial knowledge
    of who was in a training dataset?
A3: [TYPE YOUR ANSWER HERE]

Q4: What defensive measures could reduce membership inference risk?
    (Hint: look up differential privacy)
A4: [TYPE YOUR ANSWER HERE]

Q5: Which regulation (GDPR, HIPAA, etc.) would apply if an attack
    revealed that someone's medical data was in a hospital's AI training set?
A5: [TYPE YOUR ANSWER HERE]
"""

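import os

# Create the output folder if it does not exist yet
os.makedirs('../outputs', exist_ok=True)

with open('../outputs/Lab3_Reflection.txt', 'w', encoding='utf-8') as f:
    f.write(reflection)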

print("Reflection saved to outputs/Lab3_Reflection.txt")
print(reflection)

---

## ✅ Lab 3 Complete

Return to [START_HERE.ipynb](START_HERE.ipynb) and open Lab 4 - Extraction Attack.

---
*Built with the Adversarial Robustness Toolbox (ART): https://github.com/Trusted-AI/adversarial-robustness-toolbox*