### Clean data files

In [None]:
import pandas as pd
import numpy as np
import os
import shutil

Before running, make sure that:
- the notebook is in the same directory as the image folder and the annotation file
- `detecto` is installed in the environment

Note: change file names and paths where needed for your use case.

In [None]:
files_name = os.listdir('train') #folder with all the images
annot_data = pd.read_csv('annot_train.csv') #original csv file with all the annotations
len(files_name), len(annot_data)

In [None]:
#keep only the annotations whose image is present in the image folder
#(.copy() so the assignments below modify an independent DataFrame)
new_annot_data = annot_data[annot_data.photo_filename.isin(files_name)].copy()

#add the 'image_id' column required by detecto; the length is taken from the
#annotations (not the file list) so it always matches the number of rows
new_annot_data['image_id'] = range(1, len(new_annot_data) + 1)

#relabel every class that is not one of the 5 compulsory logos as 'Other'
logos = ['Nike', 'Adidas', 'Under Armour', 'Puma', 'The North Face']
new_annot_data.loc[~new_annot_data['class'].isin(logos), 'class'] = 'Other'

Split the dataset into train and test sets (roughly 80% and 20%), then create an annotation file for each image folder.

In [None]:
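np.random.seed(123)
os.makedirs('test', exist_ok=True)  #create the 'test' folder if it does not exist yet
for f in sorted(files_name):  #sorted, because os.listdir returns files in arbitrary order
    if np.random.rand() < 0.2:  #each image goes to the test set with probability 0.2
        shutil.move(os.path.join('train', f), os.path.join('test', f))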
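
As an alternative to a per-image random draw, the split can be computed first as two lists of file names, which gives an exact 80/20 ratio. The following is a minimal sketch, not part of the original notebook: the helper name `split_filenames` and its parameters are hypothetical, and it uses Python's `random` module rather than NumPy.

```python
import random


def split_filenames(filenames, test_fraction=0.2, seed=123):
    """Return (train_names, test_names) with an exact test fraction."""
    # Sort first so the result does not depend on the listing order.
    names = sorted(filenames)
    # A dedicated generator keeps the split independent of global random state.
    rng = random.Random(seed)
    rng.shuffle(names)
    n_test = round(len(names) * test_fraction)
    return names[n_test:], names[:n_test]
```

The returned lists could then be used with `shutil.move` to move the test images, and with `isin` when building the two annotation files.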

In [None]:
#getting file names in each folder
train_files = os.listdir('train')
test_files = os.listdir('test')
len(train_files), len(test_files)

In [None]:
#getting annotations only for train set images
#note: this overwrites the original 'annot_train.csv' read above; choose another output path to keep it
annot_train = new_annot_data[new_annot_data.photo_filename.isin(train_files)]
annot_train.to_csv('annot_train.csv', index=False, header=True)

In [None]:
#getting annotations only for test set images
annot_test = new_annot_data[new_annot_data.photo_filename.isin(test_files)]
annot_test.to_csv(r'annot_test.csv', index=False, header=True)  #choose the output path to store the csv

In [None]:
#check that the number of annotation rows matches the number of image files
len(annot_train), len(annot_test)
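
Comparing lengths only works if every image has exactly one annotation row. Assuming an image may have several rows (for example one per bounding box), comparing the sets of file names is a more direct check. A minimal sketch; the helper name `compare_files_and_annotations` is hypothetical:

```python
def compare_files_and_annotations(image_files, annotated_files):
    """Report file names that appear on only one side."""
    images = set(image_files)
    annotated = set(annotated_files)  # duplicates (several boxes per image) collapse here
    return {
        'images_without_annotations': sorted(images - annotated),
        'annotations_without_images': sorted(annotated - images),
    }
```

For example, `compare_files_and_annotations(train_files, annot_train.photo_filename)` should return two empty lists if the train folder and its csv agree.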

### Detecto
- Ref: https://towardsdatascience.com/build-a-custom-trained-object-detection-model-with-5-lines-of-code-713ba7f6c0fb
- Documentation: https://detecto.readthedocs.io/en/latest/api/core.html#detecto.core.Dataset

In [None]:
from detecto import core, utils, visualize
from torchvision import transforms #torchvision is installed together with detecto

In [None]:
augmentations = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize(900),
    transforms.RandomHorizontalFlip(0.5),
    transforms.ColorJitter(saturation=0.4),
    transforms.ToTensor(),
    utils.normalize_transform(),
])

Change the file and folder names/paths to match where your data is stored.

In [None]:
dataset = core.Dataset(label_data='annot_train.csv', image_folder='train', transform=augmentations)
loader = core.DataLoader(dataset, batch_size=2, shuffle=True)
model = core.Model(classes=['Nike', 'Adidas', 'Under Armour', 'Puma', 'The North Face', 'Other'])

I stopped here because I had a `RuntimeError`:

```python
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
```

The cells below are the intended next steps; I have not run them yet.

In [None]:
model.fit(dataset, epochs=8, learning_rate=0.005, verbose=True)

To save the model and the current progress to come back to it later:
```python
model.save('model_weights.pth')
```

To load the model from files:
```python
model = core.Model.load('model_weights.pth', ['Nike', 'Adidas', 'Under Armour', 'Puma', 'The North Face', 'Other'])
```


The method `predict_top` returns the top-scoring predictions for each detected label in each image.

In [None]:
from detecto.utils import read_image

#need to find a way to predict an entire folder of test images
image = read_image('image.jpg') #example
top_preds = model.predict_top(image)
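
One possible way to predict a whole folder is to loop over its files and collect the predictions in a dictionary. The sketch below is an assumption, not something detecto provides: `predict_folder` is a hypothetical helper, and the reading and prediction functions are passed in as arguments so that it does not depend on a loaded model.

```python
import os


def predict_folder(folder, read_fn, predict_fn, extensions=('.jpg', '.jpeg', '.png')):
    """Map each image file name in `folder` to the output of `predict_fn`."""
    results = {}
    for name in sorted(os.listdir(folder)):
        # Skip anything that does not look like an image file.
        if not name.lower().endswith(extensions):
            continue
        image = read_fn(os.path.join(folder, name))
        results[name] = predict_fn(image)
    return results
```

With a trained model this would be called as `predict_folder('test', read_image, model.predict_top)`.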

Next steps: calculate the IoU on the test set
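
For reference, the IoU (intersection over union) of two boxes is the area of their overlap divided by the area of their union. A minimal sketch, assuming boxes are given as `(xmin, ymin, xmax, ymax)`; the function name `iou` is hypothetical and not part of detecto:

```python
def iou(box_a, box_b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap; clamped to zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    # Degenerate (zero-area) boxes give a union of zero; return 0 instead of dividing.
    return inter / union if union > 0 else 0.0
```

On the test set, each predicted box would be compared with the ground-truth box of the same class from `annot_test.csv`.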