# Cell Segmentation & Counting with Deep U-Net
A robust, industry-standard Deep Learning pipeline for segmenting and counting cell nuclei in biomedical images. Trained on the **2018 Data Science Bowl (BBBC038)** dataset, this project leverages a custom Deep U-Net architecture to handle complex biological textures and overlapping cells.
## Features
* **Real Data Pipeline:** Automatically downloads and parses the ~85 MB BBBC038 dataset (2018 Data Science Bowl).
* **Deep U-Net Architecture:** A 5-level U-Net with Batch Normalization and He initialization for stable training on textured biomedical images.
* **Advanced Post-Processing:** A watershed algorithm with distance transform separates touching cells (crucial for accurate counting).
* **Industry-Standard Evaluation:** Bland-Altman plots and IoU distribution analysis to validate scientific accuracy.
* **Data Augmentation:** Real-time rotation, flipping, and zooming to prevent overfitting.
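The watershed post-processing mentioned above can be sketched roughly as follows. This is a minimal illustration using SciPy and scikit-image, not the project's actual code; the function name and the `min_distance` value are illustrative choices.

```python
import numpy as np
from scipy import ndimage
from skimage.feature import peak_local_max
from skimage.segmentation import watershed

def separate_and_count(binary_mask, min_distance=7):
    """Split touching cells with a distance-transform watershed; return labels and count."""
    binary_mask = binary_mask.astype(bool)
    # Distance to the nearest background pixel peaks at cell centers
    distance = ndimage.distance_transform_edt(binary_mask)
    # Local maxima of the distance map serve as watershed markers (one per cell)
    coords = peak_local_max(distance, min_distance=min_distance,
                            labels=binary_mask.astype(int))
    markers = np.zeros(binary_mask.shape, dtype=int)
    markers[tuple(coords.T)] = np.arange(1, len(coords) + 1)
    # Flood the inverted distance map from the markers, constrained to the mask
    labels = watershed(-distance, markers, mask=binary_mask)
    return labels, int(labels.max())
```

Seeding the watershed from distance-map maxima is what lets two overlapping nuclei, which form a single connected component in the binary mask, be counted as two objects.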
## Getting Started
Follow these steps to set up the project and train your own model.

First, clone the repository and navigate into the project directory:

```bash
git clone https://github.com/yourusername/cell-segmentation-unet.git
cd cell-segmentation-unet
```
Next, create a virtual environment (recommended) and install the required dependencies:

```bash
# Create a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```
## Training
To start the training pipeline, run the `train.py` script. It handles the entire workflow:
- Downloads the BBBC038 dataset.
- Preprocesses the images and merges the mask files.
- Augments the data in real time.
- Trains the Deep U-Net model.

```bash
python train.py
```

- Output: The script saves the best-performing model to `best_model.keras` and the test dataset to `.npy` files for evaluation.
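The real-time augmentation step can be illustrated with a minimal NumPy sketch (the `augment` function below is illustrative, not the project's actual implementation; zooming is omitted for brevity). The key point is that the image and its mask must receive identical transforms:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image, mask):
    """Apply the same random flips and 90-degree rotation to an image and its mask."""
    if rng.random() < 0.5:                       # horizontal flip
        image, mask = image[:, ::-1], mask[:, ::-1]
    if rng.random() < 0.5:                       # vertical flip
        image, mask = image[::-1, :], mask[::-1, :]
    k = rng.integers(0, 4)                       # rotate by a random multiple of 90 degrees
    image, mask = np.rot90(image, k), np.rot90(mask, k)
    return image, mask
```

If the image and mask were transformed independently, the pixel-wise labels would no longer line up with the cells, which is why augmentation pipelines for segmentation always share randomness between the two.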
## Evaluation
Once training is complete, generate a comprehensive performance report with `evaluate.py`. The script loads the trained model and the test data to produce industry-standard metrics.

```bash
python evaluate.py
```

- Output: This generates `evaluation_report.png`, containing:
  - IoU Histogram: distribution of segmentation quality.
  - Bland-Altman Plot: analysis of counting bias and agreement.
  - Visual Overlays: qualitative comparison of predictions vs. ground truth.
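The statistics behind a Bland-Altman plot reduce to a few lines. This sketch (illustrative names, not the project's code) computes the counting bias and the 95% limits of agreement from per-image cell counts:

```python
import numpy as np

def bland_altman_stats(pred_counts, true_counts):
    """Counting bias (mean difference) and 95% limits of agreement."""
    diffs = np.asarray(pred_counts, dtype=float) - np.asarray(true_counts, dtype=float)
    bias = diffs.mean()
    spread = 1.96 * diffs.std(ddof=1)  # 1.96 sigma covers ~95% of differences
    return bias, bias - spread, bias + spread
```

The bias tells you whether the model systematically over- or under-counts; the limits of agreement tell you how far an individual image's count can be expected to stray from the ground truth.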
## Dataset
The project uses the 2018 Data Science Bowl (BBBC038) dataset, hosted by the Broad Institute.
- Content: Diverse microscopy images (fluorescence, histology, brightfield).
- Ground Truth: High-quality masks in which each nucleus is annotated individually.
- Preprocessing: The `RealBiologicalLoader` class merges the individual mask files into a single binary map for semantic segmentation.
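The merging step can be sketched as follows. This is a simplified stand-in for `RealBiologicalLoader`: in the dataset each nucleus ships as its own mask file, represented here as already-loaded arrays.

```python
import numpy as np

def merge_masks(nucleus_masks):
    """Union per-nucleus masks into a single binary map for semantic segmentation."""
    merged = np.zeros_like(nucleus_masks[0], dtype=np.uint8)
    for m in nucleus_masks:
        merged |= (m > 0).astype(np.uint8)  # any labeled pixel becomes foreground
    return merged
```

Note that the union discards instance identity; individual nuclei are recovered later by the watershed post-processing rather than by the network itself.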
## Model Architecture
The model is a Deep U-Net optimized for biomedical segmentation:
- Encoder: 4 downsampling blocks (Conv2D -> BatchNorm -> ReLU -> MaxPool).
- Bottleneck: 512 filters with Dropout (0.3) to capture high-level features.
- Decoder: 4 upsampling blocks with skip connections to preserve spatial resolution.
- Output: Sigmoid activation for pixel-wise probability.
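A single encoder level of the kind described above might look like this in Keras (a sketch assuming TensorFlow 2.x from the requirements; the function names and filter counts are illustrative, not the project's actual code):

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_block(x, filters):
    """Conv2D -> BatchNorm -> ReLU, applied twice, with He initialization."""
    for _ in range(2):
        x = layers.Conv2D(filters, 3, padding="same",
                          kernel_initializer="he_normal", use_bias=False)(x)
        x = layers.BatchNormalization()(x)
        x = layers.Activation("relu")(x)
    return x

def encoder_block(x, filters):
    """One downsampling step: conv block, then 2x2 max pooling; keep the skip tensor."""
    skip = conv_block(x, filters)
    return layers.MaxPooling2D(2)(skip), skip
```

The pre-pooling tensor is returned alongside the downsampled one so the decoder can concatenate it back in via the skip connections that preserve spatial resolution.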
## Evaluation Metrics
Counting accuracy is validated with Bland-Altman analysis, the standard method-comparison technique in biomedical measurement.

| Metric | Target (approx.) | Description |
|---|---|---|
| Mean IoU | ≥ 0.85 | Intersection over Union (segmentation quality) |
| Counting Bias | < 1.0 | Average difference between predicted and ground-truth counts |
| Pixel Accuracy | > 98% | Accuracy of background/foreground classification |
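For reference, the IoU metric in the table is straightforward to compute for binary masks (a minimal sketch, not the project's evaluation code):

```python
import numpy as np

def iou(pred, gt):
    """Intersection over Union between two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:          # both masks empty: treat as perfect agreement
        return 1.0
    return np.logical_and(pred, gt).sum() / union
```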
## Requirements
- Python 3.8+
- TensorFlow 2.x
- OpenCV
- scikit-image
- Matplotlib / Seaborn
- Pandas / NumPy
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the project
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
## License
Distributed under the MIT License. See `LICENSE` for more information.