# Detect plankton on macOS (arm64) using YOLOv5 (2022_06_08)

Inference with CoreML takes about 5 s per image, compared with about 25 s per image with PyTorch on the CPU.
The notebook has to be run locally from inside your yolov5 directory.

tested with torchvision == 0.14.0.dev20220603, torch == 1.13.0.dev20220607.
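
Since the results depend on development builds, it can help to record the installed versions next to the results. A minimal sketch using only the standard library; the function name and the default package list are assumptions made for this illustration:

```python
from importlib import metadata

def installed_versions(packages=("torch", "torchvision", "coremltools")):
    """Return {package: version string}, with None for packages that are not installed."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

print(installed_versions())
```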

In [36]:
import csv
import cv2 as cv
import numpy as np
import os

0. Export the pretrained weights into the CoreML format. In this case the weights are yolov5s trained on copepods from CUSCO, using Colab with this notebook: https://github.com/DariGor/plancton_classifier/blob/main/Plankton_training_YOLOv5.ipynb. Important: choose the image size so that it matches the images used for detection, and pass the data.yaml file of the original training data.

In [48]:
#%%capture
!python export.py --weights best_colab_yolo5s_peru.pt --include coreml --imgsz 5120 --data data.yaml
#!python export.py --help
#!python export.py --weights best_colab_yolo5s_peru.pt --include coreml --imgsz 640 --conf-thres 0.4 --data data.yaml

[34m[1mexport: [0mdata=data.yaml, weights=['best_colab_yolo5s_peru.pt'], imgsz=[5120], batch_size=1, device=cpu, half=False, inplace=False, train=False, keras=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=12, verbose=False, workspace=4, nms=False, agnostic_nms=False, topk_per_class=100, topk_all=100, iou_thres=0.45, conf_thres=0.25, include=['coreml']
YOLOv5 🚀 v6.2-80-g55b0096 Python-3.8.13 torch-1.13.0.dev20220831 CPU

Fusing layers... 
Model summary: 224 layers, 7056607 parameters, 0 gradients

[34m[1mPyTorch:[0m starting from best_colab_yolo5s_peru.pt with output shape (1, 1612800, 7) (54.4 MB)
TensorFlow version 2.9.2 has not been tested with coremltools. You may run into unexpected errors. TensorFlow 2.6.2 is the most recent version that has been tested.
Keras version 2.9.0 has not been tested with coremltools. You may run into unexpected errors. Keras 2.6.0 is the most recent version that has been tested.
Torch version 1.13.0.dev20220831 has not be
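
The output shape `(1, 1612800, 7)` in the export log follows directly from the chosen image size, which is a quick way to confirm that the export used `--imgsz 5120`. The sketch below assumes the standard YOLOv5 detection head with three scales at strides 8, 16 and 32 and three anchors per grid cell; the function name is hypothetical.

```python
def yolov5_prediction_count(imgsz, strides=(8, 16, 32), anchors_per_cell=3):
    """Number of candidate boxes a YOLOv5 head emits for a square input of size imgsz."""
    if any(imgsz % s for s in strides):
        raise ValueError("imgsz must be a multiple of every stride")
    # Each scale has a grid of (imgsz / stride)^2 cells, each with anchors_per_cell boxes.
    return anchors_per_cell * sum((imgsz // s) ** 2 for s in strides)

print(yolov5_prediction_count(5120))  # 1612800
print(yolov5_prediction_count(640))   # 25200
```

The last dimension, 7, would then be 4 box coordinates + 1 objectness score + 2 class scores, which suggests (but does not prove) that data.yaml defines two classes.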

1. Rename the dataset and find the duplicate images (buffer error) in it. Rename using the buggy script called "Rename_and_pressure_correct". Pay attention to the order of the directories in the image root folder, and check that every profile ends with a file that has a pressure value > 0 (otherwise an index-out-of-range error occurs). Remove all duplicate images (I did it with Duplicate File Finder for macOS).
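
As an alternative to a GUI tool, byte-identical duplicates can be listed with a short script. This is a minimal sketch and not the method used for this dataset: it compares SHA-256 hashes of the file contents, so it only finds exact copies. The function name is hypothetical.

```python
import hashlib
import os
from collections import defaultdict

def find_duplicate_files(root, chunk_size=1 << 20):
    """Group files under `root` that have identical content (same SHA-256 digest)."""
    by_hash = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            digest = hashlib.sha256()
            with open(path, "rb") as f:
                # Read in chunks so that large images do not have to fit in memory.
                for chunk in iter(lambda: f.read(chunk_size), b""):
                    digest.update(chunk)
            by_hash[digest.hexdigest()].append(path)
    # Keep only the groups that contain more than one file.
    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]
```

Each returned group lists the paths that share the same content; all but one file per group can then be reviewed and removed.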

2. Define the function that extracts the object information from the detection results. It is only used after detection, but has to be defined here.

In [35]:
import csv
import cv2 as cv
import numpy as np
import os

def get_object_info(file_path, result_path, txt_path):

    pixel_square = (13.5*10**-3)**2 #area of one pixel in mm^2 (pixel edge length 13.5 um)

    def find_object(img):
        """Finds the largest dark object and computes its ESD.
        Args:
            img (np.array): gray scale image
        Returns:
            (esd, max_area): ESD = 2 * sqrt(Area / PI) in pixels and the
            contour area in square pixels; (0, 0) if no contour is found.
        """
        #pixels darker than mean - std count as object (255), all others as background (0)
        max_val = np.mean(img) - np.std(img)
        thresh = np.where(img < max_val, 255, 0).astype(np.uint8)

        #[-2] picks the contour list in both OpenCV 3 and OpenCV 4
        cnts = cv.findContours(thresh, cv.RETR_EXTERNAL, cv.CHAIN_APPROX_NONE)[-2]
        if len(cnts) > 0:
            max_area = max(cv.contourArea(cnt) for cnt in cnts)
            esd = 2 * np.sqrt(max_area / np.pi)
            return esd, max_area
        else:
            #return a pair so that the caller can always unpack two values
            return 0, 0

    fns = sorted(os.listdir(file_path))
    info = []
    fileinfo = []

    for fn_r in fns:
        counter = 0
        bounding_info = []
        img_mass = []
        img_mass2 = []
        img_esd = []
        txtfn = os.path.join(txt_path, fn_r[:-4])
        fn = os.path.join(file_path, fn_r)
        fn, ending = fn[:-4], fn[-4:]
        
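        #the pressure in bar directly precedes 'bar' in the file name, e.g. ..._1.688bar
        pos = fn.rfind('bar')
        try:
            try:
                press = float(fn[pos-5:pos])
            except ValueError:
                press = float(fn[pos-4:pos])
        except ValueError:
            continue  #no readable pressure value in the file name
        depth = (press-1)/0.1  #1 bar at the surface, 0.1 bar per meter of water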
            
        #print(txtfn)
        if ending == ".jpg" or ending == ".tif" or ending == ".png":
            img = cv.imread(fn + ending)
            if img is None:
                continue  #image could not be read
            try:
                with open(txtfn + '.txt') as file:
                    for line in file:
                        if line.strip():
                            bounding_info.append([float(i) for i in line.split()])
                #print('found: ' + fn +'.txt')
            except (OSError, ValueError):
                continue  #no label file for this image or a malformed line
            img_h, img_w = img.shape[:2]

            for bbox in bounding_info:
                counter = counter + 1
                instclass, rel_x, rel_y, rel_w, rel_h, conf = bbox
                c_x = round(rel_x * img_w)  #x-position of center
                c_y = round(rel_y * img_h)  #y-position of center
                w = round(rel_w * img_w)
                h = round(rel_h * img_h)
                x = max(c_x - w // 2, 0)    #x-position of top-left corner, clamped to the image
                y = max(c_y - h // 2, 0)    #y-position of top-left corner, clamped to the image

                crop = img[y:y+h, x:x+w]
                
                gray = cv.cvtColor(crop, cv.COLOR_BGR2GRAY)
                esd,p_area = find_object(gray)
                biovolume = (np.pi*(esd * 13.5 * 10**-4)**3)/6 #in cubic centimeters
                biomass_vol =  biovolume * 1.060  #biomass calculation from sphere volume derived from ESD, with a density of 1.060 g cm-3
                #area = np.pi*((esd * 13.5 * 10**-3)/2)**2 #area of the ESD in square millimeters
                area= p_area*pixel_square
                biomass_area = 43.97*(area**1.52)*0.48 #according to Lahette et al. 2009, dry weight to biomass C conversion following Kiorboe et al 2013
                img_esd.append(esd)
                img_mass.append(biomass_vol)
                img_mass2.append(biomass_area)
                info.append([fn, counter, c_x, c_y, w, h, x, y, esd, instclass, conf, depth, biomass_vol, biomass_area])
            
            if counter == 0:
                continue  #empty label file: nothing to summarize, avoids a division by zero

            meanESD = sum(img_esd)/counter
            total_biomass = sum(img_mass)
            total_biomass2 = sum(img_mass2)

            fileinfo.append([fn_r,fn_r[24:32],fn_r[33:39],fn_r[16:18],depth,counter,meanESD*13.5,total_biomass,total_biomass2])

    with open(result_path + '_detailed.csv', "w", newline="") as f:
        writer = csv.writer(f, delimiter=";")
        writer.writerow(["Filename", "ID", "Center x in px", "Center y in px", "Width in px", "Height in px", "x in px", "y in px", "ESD in px", "class", "confidence", "depth in meters below surface", "biomass in g(C) using volume", "biomass in ug(C) after Lahette 2009"])
        for row in info:
            writer.writerow(row)    
    with open(result_path + '_overview.csv', "w", newline="") as f:
        writer = csv.writer(f, delimiter=";")
        writer.writerow(["Filename", "Sampling day", "local time", "Mesocosm-#", "depth in meters below surface", "Copepod-#", "mean-ESD in um", "Biomass in g(C) using Volume", "Biomass in ug(C) after Lahette 2009"])
        for row in fileinfo:
            writer.writerow(row)   
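
The size and biomass arithmetic inside `get_object_info` can be checked in isolation. The sketch below repeats the same formulas and constants for a single contour area; the function name and the dictionary keys are hypothetical, and the 13.5 µm pixel size is taken from the code above.

```python
import math

PIXEL_SIZE_UM = 13.5  # pixel edge length in micrometers, as used in get_object_info

def size_and_biomass(area_px):
    """Convert a contour area in square pixels into ESD, biovolume and both biomass estimates."""
    esd_px = 2 * math.sqrt(area_px / math.pi)         # equivalent spherical diameter in pixels
    esd_cm = esd_px * PIXEL_SIZE_UM * 1e-4            # 1 um = 1e-4 cm
    biovolume_cm3 = math.pi * esd_cm ** 3 / 6         # volume of a sphere with diameter esd_cm
    area_mm2 = area_px * (PIXEL_SIZE_UM * 1e-3) ** 2  # 1 um = 1e-3 mm
    return {
        "esd_um": esd_px * PIXEL_SIZE_UM,
        "biovolume_cm3": biovolume_cm3,
        "mass_from_volume_g": biovolume_cm3 * 1.060,  # density of 1.060 g cm^-3
        "area_mm2": area_mm2,
        "mass_from_area_ug": 43.97 * area_mm2 ** 1.52 * 0.48,  # same constants as above
    }

# A circular object with a radius of 50 px has an ESD of 100 px, i.e. 1350 um.
print(size_and_biomass(math.pi * 50 ** 2))
```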
    


3. Start the detection using the right images as source. The results will be in /runs/detect/exp.../:

In [None]:

#define the folders containing all images and where the detection information is stored
rootfolder = '/Users/vdausmann/yolo/datasets/PIScO_Peru_source'
#detect_dir (the folder that holds the detection results) has to be defined before this cell is run
for dayfolder in os.listdir(rootfolder):
    print(dayfolder)
    daydir = os.path.join(rootfolder, dayfolder)
    if os.path.isdir(daydir):
        for meso in os.listdir(daydir):
            print(meso)
            mesodir = os.path.join(daydir, meso)
            if os.path.isdir(mesodir):
                print(mesodir)
                #--name takes the last 13 characters of mesodir, i.e. <DD_MM_YYYY>/<Mx>
                !python detect.py --weights best_colab_yolo5s_peru.mlmodel --source {mesodir} --img 5120 --data data.yaml --save-txt --save-crop --save-conf --nosave --conf-thres 0.4 --name {mesodir[-13:]}
                IMG_PATH = mesodir
                #--save-txt writes the label files into a 'labels' subfolder of the run
                TXT_PATH = os.path.join(detect_dir, dayfolder, meso, 'labels')
                os.makedirs(os.path.join(detect_dir, 'results', dayfolder), exist_ok=True)
                RESULT_PATH = os.path.join(detect_dir, 'results', dayfolder, meso)
                get_object_info(IMG_PATH, RESULT_PATH, TXT_PATH)
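
With `--save-txt --save-conf`, detect.py writes one text file per image, and each line holds `class x_center y_center width height confidence`, with the four box values normalized to the image size. A minimal sketch of the conversion to pixel coordinates that `get_object_info` performs; the function name is hypothetical:

```python
def parse_label_line(line, img_w, img_h):
    """Turn one normalized `class x y w h conf` label line into a pixel box."""
    cls, rel_x, rel_y, rel_w, rel_h, conf = (float(v) for v in line.split())
    w = round(rel_w * img_w)
    h = round(rel_h * img_h)
    # Top-left corner from the box center, clamped so that it stays inside the image.
    x = max(round(rel_x * img_w) - w // 2, 0)
    y = max(round(rel_y * img_h) - h // 2, 0)
    return {"class": int(cls), "conf": conf, "x": x, "y": y, "w": w, "h": h}

print(parse_label_line("0 0.5 0.5 0.1 0.2 0.87", img_w=1000, img_h=500))
# {'class': 0, 'conf': 0.87, 'x': 450, 'y': 200, 'w': 100, 'h': 100}
```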

In [46]:
!python val.py --data data.yaml --weights best_colab_yolo5s_peru.mlmodel --conf-thres 0.4

[34m[1mval: [0mdata=data.yaml, weights=['best_colab_yolo5s_peru.mlmodel'], batch_size=32, imgsz=640, conf_thres=0.4, iou_thres=0.6, task=val, device=, workers=8, single_cls=False, augment=False, verbose=False, save_txt=False, save_hybrid=False, save_conf=False, save_json=False, project=runs/val, name=exp, exist_ok=False, half=False, dnn=False
YOLOv5 🚀 v6.1-246-g2dd3db0 Python-3.8.13 torch-1.13.0.dev20220607 CPU

Loading best_colab_yolo5s_peru.mlmodel for CoreML inference...
TensorFlow version 2.9.2 has not been tested with coremltools. You may run into unexpected errors. TensorFlow 2.6.2 is the most recent version that has been tested.
Keras version 2.9.0 has not been tested with coremltools. You may run into unexpected errors. Keras 2.6.0 is the most recent version that has been tested.
Torch version 1.13.0.dev20220607 has not been tested with coremltools. You may run into unexpected errors. Torch 1.10.2 is the most recent version that has been tested.
Forcing --batch-size 1 square

4. The detection results will be translated into the desired form, with the calculation of object ESD and biomass (following XXX):

In [40]:
#rootfolder = '/Users/vdausmann/yolo/datasets/PIScO_Peru_source'
rootfolder = '/Volumes/Extreme Pro/PIScO_Peru_source'

#detect_dir = '/Users/vdausmann/pytorch-test/yolov5/runs/detect/PIScO_Peru_detected+summary'
detect_dir = '/Users/vdausmann/pytorch-test/yolov5/runs/detect/PIScO_Peru_detected_not_yet_analysed'

result_dir = os.path.join(detect_dir,'results')
os.makedirs(result_dir, exist_ok=True)

for dayfolder in os.listdir(detect_dir):
    daydir = os.path.join(detect_dir,dayfolder)
    if os.path.isdir(daydir):
        for meso in os.listdir(daydir):
            mesodir = os.path.join(rootfolder,dayfolder,meso)
            if os.path.isdir(mesodir):
                #print(mesodir)
                print('processing ...'+dayfolder+' '+meso)
                IMG_PATH = mesodir
                TXT_PATH = os.path.join(detect_dir, dayfolder, meso, 'labels')
                #print(TXT_PATH)
                os.makedirs(os.path.join(result_dir, dayfolder), exist_ok=True)
                RESULT_PATH = os.path.join(result_dir, dayfolder, meso)
                #print(RESULT_PATH)
                get_object_info(IMG_PATH, RESULT_PATH, TXT_PATH)

processing ...13_03_2020 M7


In [2]:
#test
rootfolder = '/Volumes/Elements/PIScO_Peru_cleaned'
os.listdir(rootfolder)

['.DS_Store',
 '01_03_2020',
 '03_03_2020',
 '05_03_2020',
 '07_03_2020',
 '09_03_2020',
 '11_03_2020',
 '13_03_2020']
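
The day folders are named `DD_MM_YYYY`, so a plain alphabetical sort would put them in the wrong order as soon as a second month is involved, and `os.listdir` also returns entries such as `.DS_Store`. A minimal sketch that sorts the folders chronologically and skips everything that is not a date; the function name is hypothetical:

```python
from datetime import datetime

def sorted_day_folders(names):
    """Return the DD_MM_YYYY folder names in chronological order, skipping other entries."""
    dated = []
    for name in names:
        try:
            dated.append((datetime.strptime(name, "%d_%m_%Y"), name))
        except ValueError:
            continue  # not a date folder, e.g. '.DS_Store'
    return [name for _date, name in sorted(dated)]

print(sorted_day_folders(['.DS_Store', '13_03_2020', '01_04_2020', '01_03_2020']))
# ['01_03_2020', '13_03_2020', '01_04_2020']
```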

5. Plotting results:

This will plot the following figures:

Per day, for all mesocosms:
- Depth vs copepod-counts
- Depth vs equivalent spherical diameter (ESD)
- Depth vs biomass formula


For each mesocosm:
- Copepod-# vs depth over time
- ESD vs depth over time
- Biomass vs depth over time

- Sum of #, ESD, biomass over time

- Average of #, ESD, biomass per depth over time

- Concentration per liter per day

Differences:
- ESD LL vs HL
- Normalized against PA -> DW%
- Size top vs bottom
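
Most of the depth plots listed above need the per-image values grouped into depth bins first. The sketch below does this for the copepod count using the `_overview.csv` files written by `get_object_info` (semicolon-delimited, with the column names from its header row). The function name and the 1 m default bin width are assumptions made for this illustration.

```python
import csv
from collections import defaultdict

def mean_count_per_depth_bin(overview_csv, bin_width_m=1.0):
    """Average the per-image copepod count within depth bins of bin_width_m meters."""
    counts = defaultdict(list)
    with open(overview_csv, newline="") as f:
        for row in csv.DictReader(f, delimiter=";"):
            depth = float(row["depth in meters below surface"])
            bin_start = (depth // bin_width_m) * bin_width_m  # lower edge of the bin
            counts[bin_start].append(int(row["Copepod-#"]))
    return {b: sum(v) / len(v) for b, v in sorted(counts.items())}
```

The returned dictionary maps the lower edge of each depth bin to the mean count, which can be passed directly to a plotting library. The same grouping works for the ESD and biomass columns.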

In [49]:
!python detect.py --weights best_colab_yolo5s_peru.mlmodel --source '/Users/vdausmann/yolo/datasets/CUSCO-Peru-2020_M6_PISCO20200323_133905.7300_1.688bar.tif' --img 5120 --data data.yaml --conf-thres 0.4 

[34m[1mdetect: [0mweights=['best_colab_yolo5s_peru.mlmodel'], source=/Users/vdausmann/yolo/datasets/CUSCO-Peru-2020_M6_PISCO20200323_133905.7300_1.688bar.tif, data=data.yaml, imgsz=[5120, 5120], conf_thres=0.4, iou_thres=0.45, max_det=1000, device=, view_img=False, save_txt=False, save_conf=False, save_crop=False, nosave=False, classes=None, agnostic_nms=False, augment=False, visualize=False, update=False, project=runs/detect, name=exp, exist_ok=False, line_thickness=3, hide_labels=False, hide_conf=False, half=False, dnn=False
YOLOv5 🚀 v6.2-80-g55b0096 Python-3.8.13 torch-1.13.0.dev20220831 CPU

Loading best_colab_yolo5s_peru.mlmodel for CoreML inference...
TensorFlow version 2.9.2 has not been tested with coremltools. You may run into unexpected errors. TensorFlow 2.6.2 is the most recent version that has been tested.
Keras version 2.9.0 has not been tested with coremltools. You may run into unexpected errors. Keras 2.6.0 is the most recent version that has been tested.
Torch versi
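
The image file names, like the one passed to `--source` above, encode the mesocosm, the sampling date and time, and the pressure in bar. `get_object_info` reads these fields with fixed character positions. As a more tolerant alternative, the sketch below uses a regular expression; the pattern is an assumption derived from this single file name, and the function name is hypothetical.

```python
import re
from datetime import datetime

# Assumed layout: ..._<Mx>_PISCO<YYYYMMDD>_<HHMMSS>.<fraction>_<pressure>bar.<ext>
FILENAME_RE = re.compile(
    r"_(?P<mesocosm>M\d+)_PISCO(?P<date>\d{8})_(?P<time>\d{6})\.\d+_(?P<pressure>\d+\.\d+)bar"
)

def parse_image_name(name):
    """Extract mesocosm, timestamp, pressure and depth from an image file name, or None."""
    match = FILENAME_RE.search(name)
    if match is None:
        return None
    pressure_bar = float(match["pressure"])
    return {
        "mesocosm": match["mesocosm"],
        "timestamp": datetime.strptime(match["date"] + match["time"], "%Y%m%d%H%M%S"),
        "pressure_bar": pressure_bar,
        # Same conversion as in get_object_info: 1 bar at the surface, 0.1 bar per meter.
        "depth_m": (pressure_bar - 1) / 0.1,
    }

print(parse_image_name("CUSCO-Peru-2020_M6_PISCO20200323_133905.7300_1.688bar.tif"))
```

Unlike the fixed slices, the pattern also accepts pressure values with two digits before the decimal point and mesocosm numbers with more than one digit.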