# Raw data processing
## separate mixed data
```
In each data folder, images (.jpg and .JPG files) and labels (.xml files) are mixed together. We need to separate them into two folders and rename them.
Code for replacing spaces with underscores in file names:
```
```bash
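# "~" is not expanded inside double quotes, so use $HOME instead
data_dir="$HOME/Documents/Defect-Detection-yolov5/data/缺陷识别raw/设备部件识别"
cd "$data_dir"
# rename only the entries whose names contain a space
for f in *\ *; do [ -e "$f" ] && mv -- "$f" "${f// /_}"; done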
```
### Code for separating images and labels:
```bash
# "~" is not expanded inside double quotes, so use $HOME instead
data_dir="$HOME/Documents/Defect-Detection-yolov5/data/缺陷识别raw/设备部件识别"
mkdir -p "$data_dir/images" "$data_dir/labels"
# move every .jpg/.JPG file into images/ (files already there are skipped)
find "$data_dir" -type f -iname '*.jpg' ! -path "$data_dir/images/*" -exec mv {} "$data_dir/images/" \;
# move every .xml file into labels/ (files already there are skipped)
find "$data_dir" -type f -iname '*.xml' ! -path "$data_dir/labels/*" -exec mv {} "$data_dir/labels/" \;
```


## deal with wrongly labeled images in the dataset
```
Labeling errors are common in the dataset, so we need to deal with them.
For some unknown reason, the labels of some images are rotated by 90 degrees (seemingly at random, clockwise or counterclockwise).
```
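One way such a rotation could be detected is to compare the image size recorded in the label with the size of the image file itself: if width and height are swapped, the label was probably rotated. The sketch below is only an illustration under assumptions, not the actual content of `wrong_size.py`. It assumes the labels are Pascal-VOC-style XML with a `<size>` element, and that the real image size is obtained elsewhere (for example with an image library). The names `label_size` and `classify_label` are hypothetical.

```python
import xml.etree.ElementTree as ET


def label_size(xml_text):
    """Return (width, height) recorded in the <size> element of a VOC-style label."""
    size = ET.fromstring(xml_text).find("size")
    return int(size.findtext("width")), int(size.findtext("height"))


def classify_label(xml_text, image_size):
    """Compare the size stored in the label with the real image size.

    image_size is the (width, height) of the image file itself.
    """
    label_w, label_h = label_size(xml_text)
    img_w, img_h = image_size
    if (label_w, label_h) == (img_w, img_h):
        return "ok"
    if (label_w, label_h) == (img_h, img_w):
        return "rotated"  # width and height are swapped: a 90-degree rotation
    return "wrong_size"  # the sizes disagree for some other reason


# Hypothetical label: the annotation claims a 640x480 image
example_xml = """
<annotation>
  <size><width>640</width><height>480</height><depth>3</depth></size>
</annotation>
"""
print(classify_label(example_xml, (640, 480)))
print(classify_label(example_xml, (480, 640)))
print(classify_label(example_xml, (800, 600)))
```

A size comparison like this cannot flag square images, and it cannot tell a clockwise rotation from a counterclockwise one.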
```bash
# find the images with wrong label
python data/wrong_size.py
```
```
The output is a txt file listing the paths of the images with wrong labels, including rotated images and images whose labels are incorrect for other reasons. Then separate them from the dataset.
```
```bash
data_dir="/Users/wzilai/Documents/Defect-Detection-yolov5/data/缺陷识别raw/设备部件识别"
# -p also creates the missing parent directory bad_examples/
mkdir -p "$data_dir/bad_examples/images" "$data_dir/bad_examples/labels"
# each line of wrong_size.txt is used as a file name under both images/ and labels/
while IFS= read -r i; do
  mv "$data_dir/images/$i" "$data_dir/bad_examples/images/"
  mv "$data_dir/labels/$i" "$data_dir/bad_examples/labels/"
done < "$data_dir/wrong_size.txt"
```
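If the list contains image file names while the labels carry a different extension, the image and its label have to be matched by file stem. The following is a minimal Python sketch of that idea. It assumes each entry is an image name and that the label shares its stem with an `.xml` extension; the function name `move_bad_examples` is hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def move_bad_examples(data_dir, names):
    """Move each listed image and its same-stem .xml label into bad_examples/."""
    data_dir = Path(data_dir)
    bad_images = data_dir / "bad_examples" / "images"
    bad_labels = data_dir / "bad_examples" / "labels"
    bad_images.mkdir(parents=True, exist_ok=True)
    bad_labels.mkdir(parents=True, exist_ok=True)
    moved = []
    for name in names:
        image = data_dir / "images" / Path(name).name
        label = data_dir / "labels" / (image.stem + ".xml")
        for src, dst in ((image, bad_images), (label, bad_labels)):
            if src.exists():  # silently skip files that are already gone
                shutil.move(str(src), str(dst / src.name))
                moved.append(src.name)
    return moved


# Demo on a throwaway directory with one image/label pair
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "images").mkdir()
    (root / "labels").mkdir()
    (root / "images" / "a.jpg").touch()
    (root / "labels" / "a.xml").touch()
    print(move_bad_examples(root, ["a.jpg"]))
```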
```
All label problems are recorded in label_error.ipynb.
The remaining part of the dataset is correctly labeled, and I chose to train the model on that part only. The wrongly labeled images may be rescued in the future.
```



## change label format
```
Raw labels are stored in XML format, but YOLO models need input in YOLO format.
prepare.py converts the XML labels to YOLO format and splits the data into a training set and a validation set.
Before running prepare.py, change the class names and the YOLO-format directory name in prepare.py. Directory creation is not fully automatic in prepare.py, so for now you need to create the directories yourself.
The yolov5-format and yolov7-format directory trees are as follows:
```
### data dir tree
```
yolov5_data
└── sub_yolo_data
    ├── images
    │   ├── train
    │   └── val
    └── labels
        ├── train
        └── val

yolov7_data
└── sub_yolo_data
    ├── test
    │   ├── images
    │   └── labels
    └── train
        ├── images
        └── labels
```
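Since the directories have to be created by hand for now, a small helper can build either tree in one call. This is a sketch rather than part of `prepare.py`; the names `make_tree`, `YOLOV5_DIRS` and `YOLOV7_DIRS` are hypothetical, and the sub-directory lists simply mirror the two trees shown above.

```python
import tempfile
from pathlib import Path

# Sub-directories of sub_yolo_data for each layout
YOLOV5_DIRS = ["images/train", "images/val", "labels/train", "labels/val"]
YOLOV7_DIRS = ["train/images", "train/labels", "test/images", "test/labels"]


def make_tree(root, subdirs):
    """Create every sub-directory under root; existing directories are left alone."""
    root = Path(root)
    for sub in subdirs:
        (root / sub).mkdir(parents=True, exist_ok=True)
    # Return the created layout as sorted POSIX-style relative paths
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_dir())


# Demo in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    print(make_tree(Path(tmp) / "yolov5_data" / "sub_yolo_data", YOLOV5_DIRS))
```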
### run prepare.py

```bash
python data/prepare.py
```
```
In the experiment, data of the classes yx, ddjt and cysb_tg was not labeled properly, so I excluded them in prepare.py.
To change the tracked classes, you only need to change class_name in prepare.py.
```
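For orientation, the conversion itself can be sketched as follows. This is not the code of `prepare.py`; it assumes Pascal-VOC-style XML (`<size>`, `<object>`, `<bndbox>` with pixel corners) and the usual YOLO text format of one `class_id x_center y_center width height` line per box, with all four values normalized by the image size. The function name `voc_to_yolo` and the class names `part_a` and `part_b` are hypothetical. Objects whose class is not in the tracked list are skipped, which is how excluded classes such as yx drop out.

```python
import xml.etree.ElementTree as ET


def voc_to_yolo(xml_text, class_names):
    """Convert one VOC-style XML label into a list of YOLO-format text lines."""
    root = ET.fromstring(xml_text)
    img_w = float(root.findtext("size/width"))
    img_h = float(root.findtext("size/height"))
    lines = []
    for obj in root.iter("object"):
        name = obj.findtext("name")
        if name not in class_names:
            continue  # class is not tracked, e.g. an excluded class
        box = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (
            float(box.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        # Box center and size, normalized to the 0..1 range
        x_c = (xmin + xmax) / 2 / img_w
        y_c = (ymin + ymax) / 2 / img_h
        w = (xmax - xmin) / img_w
        h = (ymax - ymin) / img_h
        lines.append(f"{class_names.index(name)} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}")
    return lines


# Hypothetical label: a 200x100 image with one tracked and one excluded object
example_xml = """
<annotation>
  <size><width>200</width><height>100</height><depth>3</depth></size>
  <object><name>part_b</name>
    <bndbox><xmin>50</xmin><ymin>20</ymin><xmax>150</xmax><ymax>80</ymax></bndbox>
  </object>
  <object><name>yx</name>
    <bndbox><xmin>0</xmin><ymin>0</ymin><xmax>10</xmax><ymax>10</ymax></bndbox>
  </object>
</annotation>
"""
print(voc_to_yolo(example_xml, ["part_a", "part_b"]))
# -> ['1 0.500000 0.500000 0.500000 0.600000']
```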





## create yaml file
```
The yaml file records the project information.
After running prepare.py, create a yaml file for the dataset, then set the train/test image paths and the class names in it. train.py reads the yaml file automatically and finds the labels.
```
### sample yolov5 yaml file
```yaml
path: ./data/dataset
#train/val/test dir are subdirs of path
train: images/train
val: images/val
test: images/test #can be empty if no test set
#Number of classes
nc: 3
#Classes
names:
  0: cat
  1: dog
  2: person
```
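Because `nc` has to match the number of entries under `names`, it can be convenient to generate the file from a single class list. The snippet below is a minimal sketch using plain string formatting; the function name `build_yolov5_yaml` is hypothetical, and the layout simply follows the sample above.

```python
def build_yolov5_yaml(dataset_path, class_names, with_test=False):
    """Return the text of a yolov5-style dataset yaml for the given classes."""
    lines = [
        f"path: {dataset_path}",
        "train: images/train",
        "val: images/val",
    ]
    if with_test:
        lines.append("test: images/test")
    # nc is derived from the list, so it cannot drift out of sync with names
    lines.append(f"nc: {len(class_names)}")
    lines.append("names:")
    lines.extend(f"  {i}: {name}" for i, name in enumerate(class_names))
    return "\n".join(lines) + "\n"


print(build_yolov5_yaml("./data/dataset", ["cat", "dog", "person"]))
```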

### sample yolov7 yaml file
```yaml
train: ./data/dataset/train.txt  # or a directory: ./data/dataset/train
val: ./data/dataset/val.txt  # or a directory: ./data/dataset/val
test: ./data/dataset/test.txt  # or a directory: ./data/dataset/test
# the txt files are generated automatically by prepare.py
# Number of classes
nc: 3
# Class names
names: ['cat', 'dog', 'person']
```


# training

## requirements
```
Prerequisites: Python 3.11, CUDA, PyTorch
```
### create virtual environment
```bash
python -m venv Defect-Detection
source Defect-Detection/bin/activate 
```
### install requirements
```bash
cd $project_path
pip install -r requirements.txt
```
```
yolov5 and yolov7 have different dependencies; please install them separately.
```


## track and visualize

### yolov5
```python
!pip install comet_ml
```
### yolov7
```python
!pip install wandb
import wandb
wandb.login()
```
#### modified Defect-Detection-yolov7/utils/google_utils.py
```
Delete dtype=np.int32 from cum_counts = np.cumsum(np.greater(counts, 0, dtype=np.int32)),
because this dtype argument causes an error in wandb.
```

## train
### change hyperparameters and parameters
```
Hyperparameter files are stored in data/ (for example data/hyp.scratch-low.yaml).
```
```bash
python train.py --epochs 150 --data 设备部件识别_without_ddjt_yx_tg.yaml --weights yolov5s.pt --batch-size 16 --img 640 --device 0 --hyp data/hyp.scratch-low.yaml --freeze 50 
```



# Detect