An end-to-end Machine Learning project to classify chest CT scans as Normal or Adenocarcinoma Cancer (and potentially other classes). This project demonstrates a complete MLOps pipeline using TensorFlow/Keras, DVC (Data Version Control), MLflow, Flask, Docker, and GitHub Actions.
Check out the live application here: Chest Cancer Diagnostic Demo
This application takes a chest CT scan image as input and uses a VGG16-based Deep Learning model to predict the diagnosis. It provides a user-friendly web interface for easy interaction. The project is structured to be modular, reproducible, and easily deployable.
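The last step of a prediction turns the model's softmax output into a label. Below is a minimal sketch that assumes a two-class model whose output order is `adenocarcinoma`, `normal`. Keras' `flow_from_directory` assigns label indices in alphanumeric order of the class subfolder names, so the real order depends on how the training folders are named. The `decode_prediction` helper and `CLASS_NAMES` tuple are hypothetical.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical class order: alphanumeric sort of the training subfolder names.
CLASS_NAMES: Tuple[str, ...] = ("adenocarcinoma", "normal")


def decode_prediction(probs: Sequence[float],
                      class_names: Sequence[str] = CLASS_NAMES) -> Dict[str, object]:
    """Return the most likely class and its probability from one softmax vector."""
    if len(probs) != len(class_names):
        raise ValueError(f"expected {len(class_names)} probabilities, got {len(probs)}")
    # argmax without NumPy: index of the largest probability
    best = max(range(len(probs)), key=lambda i: probs[i])
    return {"label": class_names[best], "confidence": float(probs[best])}


print(decode_prediction([0.12, 0.88]))
# {'label': 'normal', 'confidence': 0.88}
```

Passing `class_names` explicitly also covers a model trained on more than two classes.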
├── .github/workflows/ # CI/CD pipelines (GitHub Actions)
├── config/ # Configuration files
│   └── config.yaml # Main config for data paths, model params
├── src/ # Source code
│   └── cnnClassifier/ # Main package
│       ├── components/ # Core logic (Ingestion, Training, Evaluation)
│       ├── config/ # Configuration manager
│       ├── constants/ # Constant values
│       ├── entity/ # Data classes
│       ├── pipeline/ # Pipeline orchestration
│       └── utils/ # Utility functions
├── templates/ # HTML templates for Flask
├── artifacts/ # Generated artifacts (data, models; gitignored)
├── logs/ # Application and training logs
├── app.py # Flask application entry point
├── main.py # Training entry point
├── dvc.yaml # DVC pipeline definition
├── params.yaml # Hyperparameters
├── requirements.txt # Python dependencies
├── setup.py # Package setup
├── Dockerfile # Docker configuration
└── .dockerignore # Docker ignore rules
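The `pipeline/` package and `main.py` entry point suggest a common layout: each stage is a small unit, and the entry point runs the stages in order with logging. Below is a minimal sketch of that orchestration. It assumes each stage is a plain callable. The `Stage` dataclass and the `run_pipeline` function are hypothetical and are not the project's actual code.

```python
import logging
import time
from dataclasses import dataclass
from typing import Callable, List

logging.basicConfig(level=logging.INFO, format="[%(asctime)s] %(levelname)s: %(message)s")
logger = logging.getLogger("cnnClassifier")


@dataclass(frozen=True)
class Stage:
    """One pipeline step; `run` stands in for a component such as data ingestion."""
    name: str
    run: Callable[[], None]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order, logging each one and stopping at the first failure."""
    completed: List[str] = []
    for stage in stages:
        logger.info(">>>>> stage %s started <<<<<", stage.name)
        start = time.perf_counter()
        try:
            stage.run()
        except Exception:
            # log the traceback, then re-raise so later stages never run
            logger.exception("stage %s failed", stage.name)
            raise
        logger.info(">>>>> stage %s completed in %.2fs <<<<<",
                    stage.name, time.perf_counter() - start)
        completed.append(stage.name)
    return completed


if __name__ == "__main__":
    # Placeholder callables; real stages would live in src/cnnClassifier/pipeline/.
    stages = [
        Stage("Data Ingestion", lambda: None),
        Stage("Training", lambda: None),
        Stage("Evaluation", lambda: None),
    ]
    print(run_pipeline(stages))
```

Re-raising after logging means a failed ingestion step stops training from running on missing data.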
- Python 3.8+
- Git
- Docker (Optional, for containerization)
- AWS Account (Optional, for deployment)
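A small script can confirm the prerequisites before installation. The sketch below checks the Python version and uses `shutil.which` to look up `git` and `docker` on `PATH`. An AWS account cannot be verified locally, so the script skips it. The `check_prerequisites` helper is hypothetical.

```python
import shutil
import sys
from typing import Dict, Tuple


def check_prerequisites(min_python: Tuple[int, int] = (3, 8)) -> Dict[str, bool]:
    """Report which of the listed prerequisites are available on this machine."""
    python_key = f"python>={min_python[0]}.{min_python[1]}"
    return {
        python_key: sys.version_info >= min_python,
        # shutil.which returns None when the executable is not on PATH
        "git": shutil.which("git") is not None,
        "docker (optional)": shutil.which("docker") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name:<20} {'OK' if ok else 'missing'}")
```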
- Clone the repository:
  git clone https://github.com/trunglap923/Chest-Cancer-Classification.git
  cd Chest-Cancer-Classification
- Create and activate a virtual environment (recommended):
  # Windows
  python -m venv .venv
  .venv\Scripts\activate
  # Linux/Mac
  python3 -m venv .venv
  source .venv/bin/activate
- Install dependencies:
  pip install -r requirements.txt
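After running `pip install -r requirements.txt`, you can check that every listed package resolved. The sketch below uses `importlib.metadata`, which has been in the standard library since Python 3.8. Its parser is deliberately simple and skips option lines such as `-e .`. It is a convenience check, not a replacement for `pip check`. Both helper functions are hypothetical.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from pathlib import Path
from typing import List


def requirement_names(path: str = "requirements.txt") -> List[str]:
    """Extract bare distribution names from a simple requirements file."""
    names = []
    for raw in Path(path).read_text().splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        # skip blank lines and pip options such as "-e ." (editable local package)
        if not line or line.startswith("-"):
            continue
        # the name ends at the first version specifier, extra, marker or space
        names.append(re.split(r"[\s\[<>=!~;@]", line, maxsplit=1)[0])
    return names


def missing_requirements(path: str = "requirements.txt") -> List[str]:
    """Return the listed distributions that are not installed in this environment."""
    missing = []
    for name in requirement_names(path):
        try:
            version(name)
        except PackageNotFoundError:
            missing.append(name)
    return missing
```

Call `missing_requirements()` from the project root with the virtual environment activated. An empty list means every listed distribution was found.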
To start the web application on your local machine:
python app.py
Open your browser and navigate to http://localhost:8080.
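To call the running app from a script instead of the browser, you could use something like the code below. It assumes the app exposes a JSON `/predict` endpoint that accepts a base64-encoded image under an `image` key. Both the route and the payload shape are assumptions, so check `app.py` for the actual interface.

```python
import base64
import json
import urllib.request


def build_predict_request(image_bytes: bytes,
                          url: str = "http://localhost:8080/predict") -> urllib.request.Request:
    """Wrap raw image bytes as base64 in a JSON POST request (hypothetical endpoint/payload)."""
    payload = {"image": base64.b64encode(image_bytes).decode("ascii")}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(image_path: str) -> object:
    """Send one CT scan image to the running app and return its decoded JSON reply."""
    with open(image_path, "rb") as f:
        req = build_predict_request(f.read())
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With `python app.py` running, call `predict("path/to/scan.png")`.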
- Build the image:
  docker build -t chest-cancer-app .
- Run the container:
  docker run -p 8080:8080 chest-cancer-app
  Access the app at http://localhost:8080.
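The app inside the container may need a moment to start before it answers. The sketch below polls the URL until it gets a response, which is useful in scripts that run right after `docker run`. It assumes the root URL serves the web page. The `wait_until_up` helper is hypothetical.

```python
import http.client
import time
import urllib.request


def wait_until_up(url: str = "http://localhost:8080", timeout: float = 60.0,
                  interval: float = 1.0) -> bool:
    """Poll `url` until it returns a non-error HTTP response or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # urlopen raises HTTPError (an OSError subclass) for 4xx/5xx responses
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except (OSError, http.client.HTTPException):
            # connection refused/reset or a timeout while the app is still starting
            time.sleep(interval)
    return False
```

For example, `wait_until_up()` returns `True` once the container serves http://localhost:8080. It returns `False` if nothing answers within a minute.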
This project uses DVC to manage the training pipeline (Ingestion -> Training -> Evaluation). To reproduce it, run:
dvc repro
DVC only reruns stages that are out of date, for example because a dependency or parameter changed. Alternatively, run main.py directly to execute the pipeline without DVC's caching:
python main.py
The project includes a GitHub Actions workflow (.github/workflows/main.yaml) to automate deployment to AWS EC2 using ECR.
Setup Steps:
- AWS Console:
  - Create an IAM user with the AmazonEC2ContainerRegistryFullAccess and AmazonEC2FullAccess policies.
  - Create an ECR repository (e.g., chest-cancer-repo).
  - Launch an EC2 instance (Ubuntu).
  - Install Docker on the EC2 instance.
- Self-Hosted Runner:
  - Go to GitHub Repo > Settings > Actions > Runners.
  - Follow the instructions to install the runner on your EC2 instance.
- GitHub Secrets: Add the following secrets in GitHub Repo > Settings > Secrets and variables > Actions:
  - AWS_ACCESS_KEY_ID: Your IAM access key.
  - AWS_SECRET_ACCESS_KEY: Your IAM secret key.
  - AWS_REGION: e.g., us-east-1.
  - ECR_REPOSITORY_NAME: Name of your ECR repository.
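To catch a missing secret before a deploy step fails, you could run a small check as the first workflow step. Map the secrets into environment variables under the step's `env:` key, for example `AWS_REGION: ${{ secrets.AWS_REGION }}`. The helpers below are a hypothetical sketch, not part of the repository.

```python
import os
from typing import Iterable, List, Mapping, Optional

# The four secrets listed in the setup steps.
REQUIRED_SECRETS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_REGION",
    "ECR_REPOSITORY_NAME",
)


def missing_secrets(env: Optional[Mapping[str, str]] = None,
                    required: Iterable[str] = REQUIRED_SECRETS) -> List[str]:
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


def require_secrets() -> None:
    """Exit with a readable message if any required secret is missing."""
    missing = missing_secrets()
    if missing:
        raise SystemExit("Missing secrets: " + ", ".join(missing))
```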
Once configured, every push to the main branch will trigger the pipeline to build the Docker image, push it to ECR, and deploy it to your EC2 instance.
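These steps correspond roughly to the Docker and AWS CLI commands below. The sketch does a dry run: it only builds and prints the command strings and does not execute them. ECR image URIs take the form `<account-id>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>`. Piping `aws ecr get-login-password` into `docker login --password-stdin` is the standard ECR login. The account ID `123456789012`, the `latest` tag and the container name are placeholders, and the actual workflow in `.github/workflows/main.yaml` may differ.

```python
import os
from typing import List


def ecr_image_uri(account_id: str, region: str, repository: str, tag: str = "latest") -> str:
    """Build an ECR image URI: <account>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


def deploy_commands(account_id: str, region: str, repository: str, tag: str = "latest") -> List[str]:
    """Return shell commands for login -> build -> push -> run, in order (not executed)."""
    image = ecr_image_uri(account_id, region, repository, tag)
    registry = image.split("/", 1)[0]
    return [
        # authenticate Docker against the ECR registry
        f"aws ecr get-login-password --region {region} | docker login --username AWS --password-stdin {registry}",
        # build, tag and push the image
        f"docker build -t {repository}:{tag} .",
        f"docker tag {repository}:{tag} {image}",
        f"docker push {image}",
        # on the EC2 runner: pull, remove any old container, start the new one on port 8080
        f"docker pull {image}",
        "docker rm -f chest-cancer-app || true",
        f"docker run -d -p 8080:8080 --name chest-cancer-app {image}",
    ]


if __name__ == "__main__":
    region = os.environ.get("AWS_REGION", "us-east-1")
    repo = os.environ.get("ECR_REPOSITORY_NAME", "chest-cancer-repo")
    for cmd in deploy_commands("123456789012", region, repo):  # placeholder account ID
        print(cmd)
```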