Segment and Replace

Objective: design a novel method for segmenting humans in images and replacing them with a target of choice

What's important?

  • Hiera is a hierarchical transformer-based encoder
  • The encoder is loaded with pretrained weights
  • An FPN is attached immediately after the encoder to reproduce a (1, 256, 256) binary mask (see the sketch below)
  • The model is large (> 50M params), so a decent amount of GPU RAM is needed
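A minimal sketch of that wiring, assuming a pretrained backbone that returns multi-scale feature maps (the real Hiera API and the channel widths here are assumptions; names are illustrative):

import torch
import torch.nn as nn
import torch.nn.functional as F

class SegmentationModel(nn.Module):
    def __init__(self, encoder, stage_dims=(96, 192, 384, 768)):  # stage_dims are hypothetical
        super().__init__()
        self.encoder = encoder  # e.g. a pretrained Hiera encoder returning features fine-to-coarse
        # FPN-style lateral 1x1 convs project every stage to a shared channel width
        self.laterals = nn.ModuleList([nn.Conv2d(d, 256, 1) for d in stage_dims])
        self.head = nn.Conv2d(256, 1, 1)  # logits for the binary mask

    def forward(self, x):
        feats = self.encoder(x)  # assumed: list of (B, C_i, H_i, W_i) maps, fine-to-coarse
        out = None
        # top-down pathway: start at the coarsest stage, upsample and add finer laterals
        for feat, lateral in zip(reversed(feats), reversed(self.laterals)):
            p = lateral(feat)
            out = p if out is None else p + F.interpolate(out, size=p.shape[-2:], mode="nearest")
        logits = self.head(out)
        return F.interpolate(logits, size=(256, 256), mode="bilinear", align_corners=False)

At inference, applying torch.sigmoid to the logits and thresholding at 0.5 yields the binary mask.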

Requirements (saved in a yaml file):

  • torch with GPU compatibility
  • timm

Future Upgrades:

  • V1: replace with static person
  • V2: text to image
  • V3: image to image

Inspiration:

  • Segment Anything (SAM)
  • Inpaint Anything

Directory

inpainting.ipynb

  • A test Jupyter notebook that uses an out-of-the-box inpainting model as a reference for how our model's output should look

example_h100_job.sh

  • A template for running a script on the H100s in PACE.

/assets/

  • A folder to contain all object files like images, environment ymls, etc. necessary to run the model

/assets/Messi_Filtered

  • A folder containing all filtered (pre-processed and edited) images of Lionel Messi

/assets/Messi_Unfiltered

  • A folder containing all un-filtered images of Lionel Messi that need to be filtered

/assets/env/tristan_env.yml

  • An Anaconda environment yml used to ensure all packages are kept consistent

/assets/cap.png

  • A generic, copyright free png of a hat to help increase our dataset

/assets/sunglasses.png

  • A generic, copyright free png of sunglasses to help increase our dataset

/data/

  • A folder containing all relevant Jupyter notebooks for data pre-processing

/data/clean.py

  • A Python script to aid in cleaning up some parts of the Messi Dataset

/data/DataAugment.ipynb

  • A Jupyter notebook for adding hats / sunglasses to pictures to increase the size of the dataset (see the overlay sketch below)
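A rough sketch of the overlay idea, assuming simple alpha compositing with Pillow (paths, sizes, and coordinates are placeholders; the notebook's actual logic may differ):

from PIL import Image

base = Image.open("assets/Messi_Filtered/example.jpg").convert("RGBA")  # placeholder path
cap = Image.open("assets/cap.png").convert("RGBA")
cap = cap.resize((120, 80))            # scale the overlay to roughly fit the head region
base.paste(cap, (60, 10), cap)         # third argument uses the PNG's alpha channel as a mask
base.convert("RGB").save("assets/Messi_Filtered/example_cap.jpg")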

/data/ImagePreprocessing.ipynb

  • A Jupyter notebook that runs facial recognition on our unfiltered image dataset in order to remove images without Messi's actual face in them (see the filtering sketch below).
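A hedged sketch of what that filtering could look like with the face_recognition package (the notebook's actual library, thresholds, and paths may differ):

import os
import face_recognition

# Encode one reference image of Messi's face (placeholder path).
ref = face_recognition.load_image_file("assets/Messi_Filtered/reference.jpg")
ref_encoding = face_recognition.face_encodings(ref)[0]

unfiltered_dir = "assets/Messi_Unfiltered"
for name in os.listdir(unfiltered_dir):
    image = face_recognition.load_image_file(os.path.join(unfiltered_dir, name))
    encodings = face_recognition.face_encodings(image)
    # Keep the image only if at least one detected face matches the reference.
    if not any(face_recognition.compare_faces([ref_encoding], enc)[0] for enc in encodings):
        print("no match, would drop:", name)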

/data/train_analysis.ipynb

  • A Jupyter notebook that pre-processes our image dataset into the filename format the model expects

/gen/

  • A folder containing all relevant code for generating and visualizing images for our unsupervised learning method

/gen/consistency_models

  • The base consistency model code (OpenAI's consistency_models library; see the log below) that we are adapting to Messi

/gen/consitency_model.ipynb

  • A Jupyter notebook for testing a diffusion image generation model

/gen/main.ipynb

  • A Jupyter notebook used to test training consistency_models on a pre-made dataset, to learn more about the library

/gen/viz_samples.ipynb

  • A Jupyter notebook to visualize samples generated as .npz

/seg/

  • A folder containing all relevant code for segmenting an image (our supervised learning method)

/seg/hiera.py, /seg/hiera_mae.py, /seg/hiera_utils.py

  • Python files taken from Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles by Meta (temporarily for testing)

/seg/model.py

  • A Python file to define and train our segmentation model. Includes a Dataloader, Model definition, and Training Loop

/seg/decoder.py

  • An implementation of a Feature Pyramid Network decoder

/seg/playground.ipynb

  • A Jupyter notebook for testing how a full model (Hiera) works; it provides a baseline for our future implementation

/detect/

/detect/dataset

  • Where the MS COCO dataset is downloaded and stored, including a script to download the dataset

/detect/out_charts

  • Where the output plots of Loss and Accuracy are stored during the training of the model

/detect/detection.ipynb

  • A Jupyter notebook for designing, creating, training, and testing of our Human Detection Model

Env Setup

  1. On the Slurm cluster, allocate an A100:
salloc -N1 -n1 --mem-per-gpu 20GB -t 8:00:00 -G A100:1
  2. Load modules:
module load python/3.9.12-h6yxcg
module load anaconda3/2022.05.0.1
module load gcc/10.3.0-o57x6h
module load mvapich2/2.3.6
module load cuda/11.7.0-7sdye3

NOTE: we require gcc 10.3 and mvapich2 for MPI

  3. Create the conda env:
conda create -n torch python=3.8
  4. Get a compatible torch version (with cuda 11.7):
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2
  5. Install the consistency models package:
cd <path to consistency models>
pip install -e .
  6. Ensure the installation works:
python # or ~/.conda/envs/torch/bin/python
>>> import cm
>>> import torch
>>> torch.rand(1).cuda()

Log

03/30/2024

Tristan - setup Consistency Model library

Using OpenAI's consistency models library.

Why? We don't want to worry about removing the latent text-vector entanglement from stable diffusion models.

Text embeddings are a crucial component of latent diffusion models, so even if I reimplemented the UNet structure and loaded the pretrained weights, I'm not confident the model would generate what we want.

The library needs mpi4py because its training is designed to be distributed. mpi4py requires an MPI compiler that isn't compatible with gcc 12, so I had to revert to gcc 10.3 to make it work, which renders all previously installed packages useless.

Then lots of pace-quota-exceeded issues when holding two envs at once.

03/31/2024

Tristan - sampling and training

Training Details:

  • Only 162 images of Messi so far
  • 2 nodes, 4 threads, 4 A100s
  • Documentation on using MPI jobs on pace-ice found here
  • 100k steps, saved every 5k
  • bs = 64, lr = 0.001

04/01/2024

Tristan - more images training

Training Details:

  • Adarsh and Bijan got an additional 1k images from the internet
  • data naming scheme is: 0_<data_creator><sample_number>.jpg (see the sketch after this list)
  • 1 node, 1 thread, 1 A100
  • changed save step to 500 and resumed from checkpoint 5k
  • decreased learning rate to 0.0001
  • still only ran for 500 iterations before NaN losses appeared
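A small illustrative check of that naming scheme (the data directory and creator-name format are assumptions):

import os
import re

pattern = re.compile(r"^0_([A-Za-z]+)(\d+)\.jpg$")  # 0_<data_creator><sample_number>.jpg
for name in os.listdir("datasets/messi"):           # placeholder data directory
    match = pattern.match(name)
    if match is None:
        print("does not follow the naming scheme:", name)
    else:
        creator, sample_number = match.group(1), int(match.group(2))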

Many, many out-of-memory issues:

  • increased GPU memory allocation to 80 GB
  • torch.cuda.empty_cache() after every checkpoint loaded
  • max_split_size = 256 (not sure what this means; see the sketch after this list)
  • created an Ed Discussion post for some aid on how to manage large models
  • the model takes roughly 40 GB
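A sketch of the two workarounds noted above; max_split_size_mb is an option of PyTorch's CUDA caching allocator that caps how large a cached block it will split, which can reduce fragmentation (the checkpoint name below is a placeholder):

import os
import torch

# Must be set before the first CUDA allocation.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:256"

state = torch.load("model005000.pt", map_location="cpu")  # placeholder checkpoint name
# ... copy the weights into the model here ...
torch.cuda.empty_cache()  # release cached blocks after each checkpoint load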

TODO:

  • Going to relaunch training from scratch with lower learning rate to start
  • Investigate how to normalize these images
    • find the mean and std dev of the dataset (will differ from ImageNet's); see the sketch after this list
  • Fix multi-GPU issue
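A sketch of computing per-channel mean and std over the Messi images (the directory layout and resize are assumptions):

import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Placeholder path; ImageFolder expects images grouped in class subfolders.
ds = datasets.ImageFolder("datasets/messi", transform=transforms.Compose([
    transforms.Resize((256, 256)), transforms.ToTensor()]))
loader = DataLoader(ds, batch_size=64, num_workers=4)

n, mean, sq = 0, torch.zeros(3), torch.zeros(3)
for images, _ in loader:                       # images: (B, 3, H, W) scaled to [0, 1]
    mean += images.mean(dim=(2, 3)).sum(dim=0)
    sq += (images ** 2).mean(dim=(2, 3)).sum(dim=0)
    n += images.size(0)
mean /= n
std = (sq / n - mean ** 2).sqrt()
print("mean:", mean, "std:", std)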

04/02/2024

Tristan - restarting training

Restarting training with a lower learning rate and higher batch size

lr=1e-4 and batch=128

Improved results: the model moves more slowly from the ImageNet distribution to the Messi distribution and is less fixated on naive features like hair and blue/white colors.

04/18/2024

Tristan - training results with lower learning rate

Seeing further improvement; trained to around 18k iterations.

04/20/2024

Tristan - moving to H100

Had major issues allocating 1 of the 4 A100s that PACE has, so switching to H100s.

New modules to load (not a 1-to-1 mapping, so a rebuild is required):

module load anaconda3
module load gcc/12.3.0
module load python/3.10.10
module load cuda/12.1.1
module load mvapich2/2.3.7-1 

Have to rebuild environment

pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu121

The major issue is that flash-attn is now out of date with the current version of PyTorch, requiring an update to the attention mechanisms.

With torch 2.1.0 (previously 2.0.1), we can use F.scaled_dot_product_attention.
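A hedged sketch of that swap; tensor shapes assume (batch, heads, seq_len, head_dim), and the commented-out flash-attn call is only an approximate stand-in:

import torch
import torch.nn.functional as F

q = torch.randn(2, 8, 196, 64, device="cuda", dtype=torch.float16)  # illustrative shapes
k = torch.randn_like(q)
v = torch.randn_like(q)

# Previously (flash-attn, layout details omitted):
#   out = flash_attn_func(q, k, v)
# With torch >= 2.0, the built-in fused attention can dispatch to a flash kernel on supported GPUs:
out = F.scaled_dot_product_attention(q, k, v, dropout_p=0.0, is_causal=False)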

04/22/2024

Tristan - memory errors all around

Running out of storage in my scratch dir: when I checkpoint too frequently (every 500 - 1k steps), I get all the checkpoints I want but run out of space too quickly.

About

ML course project for Segment and Replace project
