# Basic App

Exploratory work on building the fish ID app. This is mostly a scratch notebook: functionality gets moved into module files once it's ready.

In [1]:
# Needed in some environments, harmless in others.
import sys

sys.path[0] = sys.path[0].replace("/notebooks", "")
assert sys.path[0].endswith("ichthywhat")

# Portable path setup.
from ichthywhat.constants import ROOT_PATH, DEFAULT_DATA_PATH, DEFAULT_MODELS_PATH
from ichthywhat.training import train_app_model

DATASET_PATH = DEFAULT_DATA_PATH / "rls-species-m1-all"

%config InlineBackend.figure_format = 'retina'

## Initial model

While there's no validation set, it seems likely that this model performs similarly to `rls-species-min-images-2` with `shorter-freeze`. The validation set was removed to maximise the training set and include species that have a single photo.

In [3]:
train_app_model(DATASET_PATH, model_version=1)



epoch,train_loss,valid_loss,time
0,9.449698,,00:30
1,8.923815,,00:30
2,7.813148,,00:30
3,6.534555,,00:31
4,5.370297,,00:31


  warn("Your generator is empty.")


epoch,train_loss,valid_loss,time
0,4.103325,,00:32
1,3.844926,,00:32
2,3.677434,,00:32
3,3.40747,,00:31
4,3.194624,,00:31
5,2.960518,,00:32
6,2.763527,,00:32
7,2.569397,,00:31
8,2.41414,,00:32
9,2.156993,,00:32


## Improved(?) model

Updated model based on the settings from `02b-initial-experiments-larger-datasets.ipynb` that were shown to improve performance (run on 6 Jan 2022). However, the results are suspicious because training loss is much higher than with the previous settings. The gap in training loss wasn't as large in the smaller-dataset experiments. Nonetheless, anecdotal results on my test sets are encouraging.

**June 2023**: Reran training using `train_app_model()` (rather than inlined code), with resumption from a checkpoint. See the repo's history for the previous version of the notebook, which included a single run on a slower GPU.

In [None]:
train_app_model(DATASET_PATH, model_version=2)

epoch,train_loss,valid_loss,time
0,9.396283,,01:03
1,8.847295,,01:04
2,8.192937,,01:04
3,7.496504,,01:05
4,6.923687,,01:04
5,6.486866,,01:04
6,5.963583,,01:05
7,5.790339,,01:05
8,5.434147,,01:05
9,5.034936,,01:05


epoch,train_loss,valid_loss,time
0,4.364036,,01:25
1,4.032345,,01:25
2,3.844149,,01:25
3,3.689682,,01:22
4,3.440499,,01:21
5,3.308518,,01:22
6,3.163404,,01:24
7,3.148466,,01:25
8,2.966457,,01:25
9,2.928413,,01:25


Checkpoint saved for epoch 10
Checkpoint saved for epoch 20
Checkpoint saved for epoch 30
Checkpoint saved for epoch 40
Checkpoint saved for epoch 50
Checkpoint saved for epoch 60
Checkpoint saved for epoch 70


In [2]:
train_app_model(DATASET_PATH, model_version=2)



epoch,train_loss,valid_loss,time
0,,,00:00
1,,,00:00
2,,,00:00
3,,,00:00
4,,,00:00
5,,,00:00
6,,,00:00
7,,,00:00
8,,,00:00
9,,,00:00


  warn("Your generator is empty.")


Checkpoint saved for epoch 90
Checkpoint saved for epoch 100
Checkpoint saved for epoch 110
Checkpoint saved for epoch 120
Checkpoint saved for epoch 130
Checkpoint saved for epoch 140
Checkpoint saved for epoch 150
Checkpoint saved for epoch 160
Checkpoint saved for epoch 170
Checkpoint saved for epoch 180
Checkpoint saved for epoch 190
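
The implementation of `train_app_model()` isn't shown in this notebook. As a rough illustration of the general resume-from-checkpoint pattern (not the actual code), the sketch below finds the most recent checkpoint and works out which epochs remain. The `epoch-<N>.pth` file naming and both helper names are hypothetical.

```python
import re
from pathlib import Path
from typing import Optional

# Hypothetical naming scheme: one file per checkpoint, named "epoch-<N>.pth".
CHECKPOINT_PATTERN = re.compile(r"^epoch-(\d+)\.pth$")


def latest_checkpoint_epoch(checkpoint_dir: Path) -> Optional[int]:
    """Return the highest checkpointed epoch number, or None if there is none."""
    epochs = [
        int(match.group(1))
        for path in checkpoint_dir.glob("*.pth")
        if (match := CHECKPOINT_PATTERN.match(path.name))
    ]
    return max(epochs, default=None)


def epochs_to_run(total_epochs: int, checkpoint_dir: Path) -> range:
    """Return the epochs still to be trained, skipping checkpointed ones."""
    last_done = latest_checkpoint_epoch(checkpoint_dir)
    start = 0 if last_done is None else last_done
    return range(start, total_epochs)
```

With a scheme like this, rerunning the same training call after an interruption would pick up from the last saved epoch rather than starting over.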


## New (June 2023): Model testing on the QUT cropped & controlled fish dataset

Without a proper test set, there was little assurance that the fast.ai upgrades and the switch to checkpoints hadn't introduced regressions. To get the benefits of _some_ testing, [the QUT fish dataset](https://www.kaggle.com/datasets/sripaadsrinivasan/fish-species-image-data) is used as a test set. Since that dataset includes some RLS photos that are also in the training dataset, it was processed to include only the cropped versions of images for which the `setting` is `controlled` and the species appears in the RLS dataset. This should exclude any RLS photos because they were all shot in situ.
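
The processing script isn't part of this notebook. A minimal sketch of the filtering rule described above might look as follows, assuming the QUT index has already been parsed into simple records. The `QutRecord` class, its field names other than `setting`, and `select_test_records` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class QutRecord:
    """Hypothetical flattened view of one QUT index entry."""

    species: str
    setting: str
    cropped_filename: str


def select_test_records(
    records: Iterable[QutRecord], rls_species: Iterable[str]
) -> List[QutRecord]:
    """Keep controlled-setting records whose species also appears in RLS."""
    # Compare names case-insensitively to tolerate capitalisation differences.
    rls_normalised = {name.strip().lower() for name in rls_species}
    return [
        record
        for record in records
        if record.setting == "controlled"
        and record.species.strip().lower() in rls_normalised
    ]
```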

While QUT cropped & controlled images are different from RLS non-cropped & in-situ images, we still get the expected results:
* For the original models (trained with the previous version of this notebook), version 2 of the model is better than version 1.
* The retrained models yield performance that is in line with the original models.

In [4]:
from fastai.learner import load_learner
from ichthywhat import experiments

In [9]:
# Trick sys.path to load the legacy models that predate the ichthywhat/ directory.
sys.path.insert(0, sys.path[0] + "/ichthywhat")
model_v1 = load_learner(DEFAULT_MODELS_PATH / "old/app-v1.pkl", cpu=False)
model_v2 = load_learner(DEFAULT_MODELS_PATH / "old/app-v2.pkl", cpu=False)
_ = sys.path.pop(0)

In [6]:
QUT_DATASET_PATH = ROOT_PATH / "data/qut-cropped-controlled"
qut_paths = list(QUT_DATASET_PATH.glob("*.png"))
qut_labels = [" ".join(p.name.split("-")[:2]).capitalize() for p in qut_paths]
list(zip(qut_paths, qut_labels))[:5]

[(Path('/home/studio-lab-user/ichthywhat/data/qut-cropped-controlled/acanthaluteres-spilomelanurus-1.png'),
  'Acanthaluteres spilomelanurus'),
 (Path('/home/studio-lab-user/ichthywhat/data/qut-cropped-controlled/acanthaluteres-vittiger-2.png'),
  'Acanthaluteres vittiger'),
 (Path('/home/studio-lab-user/ichthywhat/data/qut-cropped-controlled/acanthaluteres-vittiger-9.png'),
  'Acanthaluteres vittiger'),
 (Path('/home/studio-lab-user/ichthywhat/data/qut-cropped-controlled/acanthistius-cinctus-1.png'),
  'Acanthistius cinctus'),
 (Path('/home/studio-lab-user/ichthywhat/data/qut-cropped-controlled/acanthistius-cinctus-3.png'),
  'Acanthistius cinctus')]
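
The label derivation above relies on file names following a `genus-species-index.png` pattern, as the sample output suggests. The same logic as a standalone helper, which fails loudly on names that don't fit that assumed pattern (`label_from_path` is a hypothetical name, not part of `ichthywhat`):

```python
from pathlib import Path


def label_from_path(path: Path) -> str:
    """Derive a 'Genus species' label from a 'genus-species-index.png' name."""
    # Unpacking raises ValueError if the name has fewer than two
    # hyphen-separated parts, instead of silently producing a bad label.
    genus, species = path.name.split("-")[:2]
    return f"{genus} {species}".capitalize()
```

Names with extra hyphens inside the genus or species would be mislabelled by this approach, so it is only suitable where the naming pattern is known to hold.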

In [10]:
experiments.test_learner(model_v1, qut_paths, qut_labels, show_grid=False)

{'top_1_accuracy': 0.060253698378801346,
 'top_3_accuracy': 0.10200845450162888,
 'top_10_accuracy': 0.17494714260101318}
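
The internals of `experiments.test_learner` aren't shown here. As a sketch of what a top-k accuracy figure generally means (the fraction of images whose true species is among the model's k highest-scoring predictions), a pure-Python version could look like this. The function and its inputs are illustrative assumptions, not the `ichthywhat` implementation.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]], true_indices: Sequence[int], k: int
) -> float:
    """Fraction of samples whose true class is among the k highest scores."""
    if len(scores) != len(true_indices):
        raise ValueError("scores and true_indices must have the same length")
    if not scores:
        raise ValueError("at least one sample is required")
    hits = 0
    for row, true_index in zip(scores, true_indices):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        hits += true_index in ranked[:k]
    return hits / len(scores)
```

By construction, top-k accuracy can only stay the same or increase as k grows, which matches the ordering of the top-1, top-3, and top-10 figures in the outputs.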

In [11]:
experiments.test_learner(model_v2, qut_paths, qut_labels, show_grid=False)

{'top_1_accuracy': 0.09778012335300446,
 'top_3_accuracy': 0.15433403849601746,
 'top_10_accuracy': 0.24418604373931885}

In [7]:
model_v1_retrained = load_learner(
    DEFAULT_MODELS_PATH / "pre-ckpt" / "app-v1.pkl", cpu=False
)
experiments.test_learner(model_v1_retrained, qut_paths, qut_labels, show_grid=False)

{'top_1_accuracy': 0.06395348906517029,
 'top_3_accuracy': 0.10729386657476425,
 'top_10_accuracy': 0.17547568678855896}

In [8]:
model_v2_retrained = load_learner(DEFAULT_MODELS_PATH / "app-v2.pkl", cpu=False)
experiments.test_learner(model_v2_retrained, qut_paths, qut_labels, show_grid=False)

{'top_1_accuracy': 0.08826638758182526,
 'top_3_accuracy': 0.15274842083454132,
 'top_10_accuracy': 0.24154333770275116}
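
To make the comparison between the four runs easier to read, the sketch below tabulates the differences. The accuracies are copied from the cell outputs above and rounded to four decimal places. The `accuracy_deltas` helper is just for illustration, and no claim about statistical significance is implied.

```python
from typing import Dict

# Accuracies copied from the cell outputs above, rounded to four decimals.
RESULTS: Dict[str, Dict[str, float]] = {
    "v1 original": {"top_1": 0.0603, "top_3": 0.1020, "top_10": 0.1749},
    "v2 original": {"top_1": 0.0978, "top_3": 0.1543, "top_10": 0.2442},
    "v1 retrained": {"top_1": 0.0640, "top_3": 0.1073, "top_10": 0.1755},
    "v2 retrained": {"top_1": 0.0883, "top_3": 0.1527, "top_10": 0.2415},
}


def accuracy_deltas(
    baseline: Dict[str, float], candidate: Dict[str, float]
) -> Dict[str, float]:
    """Return candidate minus baseline for each metric in the baseline."""
    return {metric: candidate[metric] - baseline[metric] for metric in baseline}


if __name__ == "__main__":
    comparisons = [
        ("v1 original", "v2 original"),
        ("v1 original", "v1 retrained"),
        ("v2 original", "v2 retrained"),
    ]
    for baseline_name, candidate_name in comparisons:
        deltas = accuracy_deltas(RESULTS[baseline_name], RESULTS[candidate_name])
        formatted = ", ".join(f"{m}: {d:+.4f}" for m, d in deltas.items())
        print(f"{candidate_name} vs {baseline_name}: {formatted}")
```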