Real vs AI-Generated Image Detection

A machine learning pipeline for classifying images as real (human-captured) or AI-generated. Built as a capstone project for ADS 504 (Machine Learning) at the University of San Diego.

Repository Layout

ADS504-Final/
├── data/
│   ├── images/          # Raw images (train / test / validation splits)
│   ├── arrays/          # Preprocessed 224x224x3 numpy arrays
│   └── metadata/        # Parquet files with labels and engineered features
├── features/            # Extracted feature sets (color histogram, GLCM, HOG, EfficientNet)
├── models/              # Trained model artifacts
├── experiments/         # Keras Tuner hyperparameter search results
├── config/              # Importable path configuration (paths.py)
├── utils/               # Helper modules for image reading and feature extraction
├── resources/           # Reference papers
├── MSADS_504_Final_Project.ipynb   # Main project notebook
├── Paola_s_Final_Notebook.ipynb    # EDA, classical ML, and CNN evaluation
├── Paola_s_Notebook-2.ipynb        # Additional exploration and modeling
└── Taylors_notebook.ipynb          # Feature engineering and model tuning
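
The layout above mentions an importable path configuration in config/paths.py. The contents of that module are not shown here, so the following is only a minimal sketch of what such a module could look like, assuming the directory names from the tree. Every constant name, the split_dir helper, and the assumption that each split lives in its own subfolder of data/images/ are hypothetical.

```python
from pathlib import Path

# Hypothetical constants mirroring the repository tree; the real
# config/paths.py may use different names or an absolute root.
PROJECT_ROOT = Path("ADS504-Final")

DATA_DIR = PROJECT_ROOT / "data"
IMAGES_DIR = DATA_DIR / "images"        # raw images
ARRAYS_DIR = DATA_DIR / "arrays"        # preprocessed 224x224x3 arrays
METADATA_DIR = DATA_DIR / "metadata"    # Parquet files with labels and features
FEATURES_DIR = PROJECT_ROOT / "features"
MODELS_DIR = PROJECT_ROOT / "models"
EXPERIMENTS_DIR = PROJECT_ROOT / "experiments"

# Assumed subfolder names for the three splits listed in the tree.
SPLITS = ("train", "test", "validation")


def split_dir(split: str) -> Path:
    """Return the raw-image directory for one split (hypothetical helper)."""
    if split not in SPLITS:
        raise ValueError(f"unknown split {split!r}; expected one of {SPLITS}")
    return IMAGES_DIR / split
```

Keeping paths in one importable module means the notebooks can share locations without hard-coding them, for example by calling split_dir("train") instead of repeating the string in each notebook.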

Summary of Findings

We compared classical machine learning models (Logistic Regression, SVM, Random Forest) against a fine-tuned EfficientNetB0 CNN on a multi-source dataset of ~15,000 images compiled from LAION, Open Images, and the AI vs Human study.

Key results:

Random Forest on engineered metadata features achieved the best performance at 92% accuracy (AUC 0.98), with balanced precision and recall across both classes.
Logistic Regression on metadata reached 81% accuracy (AUC 0.87).
The fine-tuned EfficientNetB0 CNN reached 77-80% accuracy, below both the Random Forest and Logistic Regression results reported above for hand-crafted features.
Engineered image properties (color statistics, edge density, texture descriptors, brightness, contrast) proved more discriminative than raw CNN embeddings for this task.
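
To make the idea of engineered image properties concrete, here is a minimal sketch of how brightness, contrast, per-channel color statistics, and edge density could be computed from one preprocessed array. It is not the project's actual extraction code (that lives in utils/ and is not shown here). The function name engineered_features, the edge_threshold default, and the gradient-based definition of edge density are assumptions made for illustration.

```python
import numpy as np


def engineered_features(image: np.ndarray, edge_threshold: float = 0.1) -> dict:
    """Compute simple summary statistics from an HxWx3 uint8 RGB array.

    Hypothetical helper: the definitions below are one plausible choice,
    not necessarily the ones used in the project.
    """
    if image.ndim != 3 or image.shape[2] != 3:
        raise ValueError("expected an HxWx3 RGB array")

    # Scale pixel values to [0, 1] so the statistics are comparable.
    rgb = image.astype(np.float64) / 255.0

    # Grayscale luminance using the ITU-R BT.601 channel weights.
    gray = rgb @ np.array([0.299, 0.587, 0.114])

    # Finite-difference gradients along rows and columns.
    grad_rows, grad_cols = np.gradient(gray)
    magnitude = np.hypot(grad_rows, grad_cols)

    features = {
        "brightness": float(gray.mean()),
        "contrast": float(gray.std()),
        # Fraction of pixels whose gradient magnitude exceeds the threshold.
        "edge_density": float((magnitude > edge_threshold).mean()),
    }
    for index, channel in enumerate("rgb"):
        features[f"mean_{channel}"] = float(rgb[..., index].mean())
        features[f"std_{channel}"] = float(rgb[..., index].std())
    return features
```

Applied to every image, a function like this yields one row of numeric features per image, which is the kind of tabular input a Random Forest or Logistic Regression model can consume directly.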

Contact

For questions or collaboration, reach out to any of the authors:

Name	Email
Taylor Kirk	tkirk@sandiego.edu
Tommy Baron	tbarron@sandiego.edu
Paola Rodriguez	prodriguez2@sandiego.edu

References

K. Balakrishna Maruthiram, G. Venkataramireddy, M. Klick. "Real VS AI Generated Image Detection and Classification." IJIRT, Volume 11, Issue 2, July 2024. IJIRT 166462.
LAION (Large-scale Artificial Intelligence Open Network) dataset.
Open Images dataset.
AI vs Human Images study dataset.

