# PyCaret Anomaly Detection — Credit Card Transactions

Dataset: `creditcard.csv`. The `Class` label is used only for inspection; the model itself is trained unsupervised. The notebook has been trimmed to avoid long t-SNE/UMAP runs.

## Environment
Python 3.11, `pycaret==3.3.0`, `scikit-learn==1.2.2`. If imports fail, reinstall the requirements and restart the kernel.

In [None]:
import pandas as pd
import numpy as np
import sklearn
from pycaret import __version__ as pycaret_version
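# Import only the PyCaret anomaly functions this notebook uses
from pycaret.anomaly import setup, create_model, assign_model, save_model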

print('PyCaret', pycaret_version)
print('sklearn', sklearn.__version__)

In [None]:
# Load data
df = pd.read_csv('../data/creditcard.csv')
print(df.head())
print('Shape:', df.shape)
print('Class balance (for reference only):')
print(df['Class'].value_counts())

In [None]:
# Train on a random sample to keep runtime reasonable; set SAMPLE_FRAC to None to use the full data
SAMPLE_FRAC = 0.25  # 25% of the data (~71k rows)
# Drop the label so the model never sees it; the original row index is kept
features = df.drop(columns=['Class'], errors='ignore')
if SAMPLE_FRAC is not None:
    features = features.sample(frac=SAMPLE_FRAC, random_state=42)
print('Features shape used for training:', features.shape)

In [None]:
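# Initialise the experiment; no target column is passed, so training is unsupervised
s = setup(
    data=features,
    session_id=42,         # fixed seed for reproducibility
    normalize=True,        # rescale numeric features (z-score by default)
    use_gpu=False,
    log_experiment=False,
    verbose=True,
)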
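
With `normalize=True`, PyCaret rescales numeric features before modelling, and its default `normalize_method` is `'zscore'`. As a rough illustration of that transform (not PyCaret's internal code), the sketch below standardizes a small made-up frame with plain pandas. The column values and the `zscore` helper are hypothetical.

```python
import pandas as pd

# Hypothetical toy frame; the real notebook passes `features` to setup().
toy = pd.DataFrame({
    "Amount": [10.0, 20.0, 30.0, 40.0],
    "Time": [0.0, 100.0, 200.0, 300.0],
})

def zscore(frame: pd.DataFrame) -> pd.DataFrame:
    """Center each column at 0 and scale it to unit (population) variance."""
    return (frame - frame.mean()) / frame.std(ddof=0)

scaled = zscore(toy)
print(scaled.round(3))
```

Isolation Forest splits on one feature at a time, so it is largely insensitive to this rescaling; the setting matters more if you later try distance-based detectors such as `knn` or `lof`.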

In [None]:
# Train an Isolation Forest; with the default fraction=0.05, about 5% of rows are flagged as anomalies
iforest = create_model('iforest')
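
`create_model` accepts a `fraction` argument (default `0.05`) giving the expected share of outliers, which determines how many rows end up flagged. A minimal sketch of the idea, assuming higher scores mean more anomalous and using made-up scores rather than real model output: the cut-off is the `(1 - fraction)` quantile of the scores. `flag_by_fraction` is a hypothetical helper, not a PyCaret function.

```python
import numpy as np

def flag_by_fraction(scores: np.ndarray, fraction: float = 0.05) -> np.ndarray:
    """Return 1 for scores above the (1 - fraction) quantile, else 0."""
    threshold = np.quantile(scores, 1.0 - fraction)
    return (scores > threshold).astype(int)

rng = np.random.default_rng(42)
scores = rng.normal(size=1000)          # stand-in anomaly scores, not model output
flags = flag_by_fraction(scores, 0.05)
print("flagged:", flags.sum(), "of", flags.size)
```

If fraud is much rarer than 5% in your data, the default will flag many legitimate transactions; passing a smaller `fraction` to `create_model` is one way to tighten that.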


In [None]:
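# Attach anomaly labels (1 = outlier, 0 = inlier) and scores to the training data
scored = assign_model(iforest)
print(scored[['Anomaly', 'Anomaly_Score']].head())
print('Anomaly counts:')
print(scored['Anomaly'].value_counts())
# Save the trained pipeline for reuse (written as anomaly_creditcard_iforest.pkl)
save_model(iforest, 'anomaly_creditcard_iforest')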
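
Since `Class` is kept for inspection only, one way to use it is to cross-tabulate the flags against the labels after scoring. The sketch below assumes `assign_model` preserves the row index of the sampled `features` (worth verifying in your run), so that labels can be looked up with `df.loc[scored.index, 'Class']`. It runs on tiny made-up stand-ins for `df` and `scored`; `compare_flags` is a hypothetical helper.

```python
import pandas as pd

def compare_flags(flags: pd.Series, labels: pd.Series) -> dict:
    """Cross-tabulate anomaly flags (1 = outlier) against true labels (1 = fraud)."""
    labels = labels.loc[flags.index]           # align on the shared row index
    tp = int(((flags == 1) & (labels == 1)).sum())
    fp = int(((flags == 1) & (labels == 0)).sum())
    fn = int(((flags == 0) & (labels == 1)).sum())
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "precision": precision, "recall": recall}

# Made-up stand-ins: `labels` plays the role of df['Class'],
# `flags` the role of scored['Anomaly'] on a sampled subset of rows.
labels = pd.Series([0, 0, 1, 0, 1, 0, 0, 1], name="Class")
flags = pd.Series([0, 1, 1, 0, 1, 0], index=[0, 1, 2, 3, 4, 6], name="Anomaly")

result = compare_flags(flags, labels)
print(result)
```

In the notebook itself the call would look like `compare_flags(scored['Anomaly'], df['Class'])`, under the index assumption stated above. Recall here is measured only over the sampled rows.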

## Notes
- Plotting t-SNE/UMAP on the full data can take 30–40+ minutes, so it is skipped here.
- Adjust `SAMPLE_FRAC` (e.g., `0.1`, or `None` for the full dataset) to trade runtime against fidelity.
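
If a projection is still wanted, one way to bound the cost is to plot every flagged row plus a random subset of the unflagged ones. The sketch below shows only that subsampling step on synthetic data; `plot_subset` and `max_normal` are hypothetical names, and the result would then be handed to whichever t-SNE/UMAP routine you use.

```python
import numpy as np
import pandas as pd

def plot_subset(scored: pd.DataFrame, max_normal: int = 2000, seed: int = 42) -> pd.DataFrame:
    """Keep all rows with Anomaly == 1 and at most `max_normal` other rows."""
    anomalies = scored[scored["Anomaly"] == 1]
    normal = scored[scored["Anomaly"] == 0]
    normal = normal.sample(n=min(max_normal, len(normal)), random_state=seed)
    return pd.concat([anomalies, normal]).sort_index()

# Synthetic stand-in for the output of assign_model().
rng = np.random.default_rng(0)
demo = pd.DataFrame({"V1": rng.normal(size=10_000)})
demo["Anomaly"] = (demo["V1"].abs() > 2.5).astype(int)

subset = plot_subset(demo, max_normal=500)
print(subset["Anomaly"].value_counts())
```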