<a href="https://colab.research.google.com/github/ua-datalab/geospatial_2025/blob/main/notebooks/deepforest_colab_e.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Tree Detection with DeepForest

This Jupyter notebook uses the Python library DeepForest to detect trees in aerial imagery and draw bounding boxes around them.

If you use the software, please cite:
Geographic Generalization in Airborne RGB Deep Learning Tree Detection. Ben G. Weinstein, Sergio Marconi, Stephanie A. Bohlman, Alina Zare, Ethan P. White. bioRxiv 790071; doi: https://doi.org/10.1101/790071

Documentation for DeepForest can be found at https://deepforest.readthedocs.io/en/latest/index.html

In [None]:
#Install the deepforest python library. After installing, you may need to restart the kernel before moving to the next code snippet
!pip install DeepForest --quiet

In [None]:
# Uninstall the current version of albumentations
!pip uninstall -y albumentations

# Install a compatible version of albumentations
!pip install albumentations==1.4.24

In [None]:
##After restarting the kernel, import libraries into environment...
from deepforest import main
from deepforest import get_data
from deepforest.utilities import boxes_to_shapefile
from deepforest.utilities import shapefile_to_annotations
from deepforest.preprocess import split_raster
from deepforest.visualize import plot_predictions

import matplotlib.pyplot as plt
import os
import time
import numpy
import rasterio
import geopandas as gpd
from rasterio.plot import show
import torch


In [None]:
#Bring a DeepForest pretrained model into environment. It is trained to identify trees from aerial imagery
#It is located at https://huggingface.co/weecology/deepforest-tree
model = main.deepforest()
model.load_model(model_name="weecology/deepforest-tree", revision="main")

In [None]:
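## Use the GPU for prediction and training when one is available, otherwise stay on the CPU
print("Current device is {}".format(model.device))
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
print("Current device is {}".format(model.device))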

## Predict Tree Crowns on Raw (Non-georeferenced) Images

In [None]:
#Paths to the images in which you want to identify trees.
#These are non-georeferenced single JPEG drone images stored in the CyVerse Data Store
# 720 x 540 pixels

!wget https://data.cyverse.org/dav-anon/iplant/commons/cyverse_curated/Gillan_Ecosphere_2021/raw_images/May_2019/15-g2/100_0123/100_0123_0086.JPG
image_path = get_data("/content/100_0123_0086.JPG")

!wget https://data.cyverse.org/dav-anon/iplant/projects/cyverse_training/datalab/nextgen_geospatial/DJI_0184.jpeg
image_path2 = get_data("/content/DJI_0184.jpeg")

!wget https://data.cyverse.org/dav-anon/iplant/projects/cyverse_training/datalab/nextgen_geospatial/100_0407_0064.jpeg
image_path3 = get_data("/content/100_0407_0064.jpeg")

!wget https://data.cyverse.org/dav-anon/iplant/projects/cyverse_training/datalab/nextgen_geospatial/DJI_0468.jpeg
image_path4 = get_data("/content/DJI_0468.jpeg")

!wget https://data.cyverse.org/dav-anon/iplant/projects/cyverse_training/datalab/nextgen_geospatial/101_0472_0074.jpeg
image_path5 = get_data("/content/101_0472_0074.jpeg")


In [None]:
#Identify and put bounding boxes around all trees in the image
#This will create a table showing image coordinates of every predicted tree
#The 'score' is the confidence that the prediction is correct. Values closer to 1 are better.
trees = model.predict_image(path=image_path2, return_plot = False)
trees
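
The `score` column can be used to drop low-confidence boxes before plotting or exporting. The sketch below is only an illustration: it uses plain Python dictionaries as a stand-in for rows of the prediction table, and the sample values and the 0.4 cutoff are made up. Assuming `trees` is a pandas DataFrame, the equivalent filter would be `trees[trees["score"] >= 0.4]`.

```python
# Hypothetical rows standing in for the prediction table (values are made up).
sample_predictions = [
    {"xmin": 10, "ymin": 20, "xmax": 60, "ymax": 80, "label": "Tree", "score": 0.91},
    {"xmin": 100, "ymin": 40, "xmax": 150, "ymax": 95, "label": "Tree", "score": 0.35},
    {"xmin": 200, "ymin": 10, "xmax": 260, "ymax": 70, "label": "Tree", "score": 0.58},
]


def filter_by_score(rows, min_score):
    """Keep only predictions whose confidence score is at least min_score."""
    return [row for row in rows if row["score"] >= min_score]


confident = filter_by_score(sample_predictions, min_score=0.4)
print(f"Kept {len(confident)} of {len(sample_predictions)} boxes")
```

A higher cutoff removes more false positives but can also discard real trees, so the value is worth tuning per image set.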

In [None]:
#Show the image with the bounding boxes
plot = model.predict_image(path=image_path2, return_plot = True, color=(0, 255, 255), thickness=6)
plt.imshow(plot[:,:,::-1])

## Predict Tree Crowns on Georeferenced Images

In [None]:
#Set the path to the georeferenced image on which you want to predict tree crowns
#This example image is 735 MB and 10088 x 26175 pixels
!wget https://data.cyverse.org/dav-anon/iplant/projects/cyverse_training/datalab/nextgen_geospatial/hole_17_ortho_utm.tif
raster_path = get_data("/content/hole_17_ortho_utm.tif")



In [None]:
##Predict tree crowns on a georeferenced image
predicted_raster = model.predict_tile(raster_path, return_plot = True, patch_size=600, patch_overlap=0.25, color=(255, 255, 0), thickness=20)
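
To get a feel for how `patch_size` and `patch_overlap` interact, the sketch below counts sliding windows along each axis of a 10088 x 26175 pixel image. It is a simplified illustration of overlapping tiling, not DeepForest's actual windowing code, so the exact number of patches the library produces may differ. The function name `window_starts` is hypothetical.

```python
import math


def window_starts(length, patch_size, patch_overlap):
    """Start offsets of sliding windows along one axis (simplified sketch)."""
    stride = int(patch_size * (1 - patch_overlap))  # step between window starts
    if length <= patch_size:
        return [0]
    n_windows = math.ceil((length - patch_size) / stride) + 1
    # Clamp the last window so it ends exactly at the image edge.
    return [min(i * stride, length - patch_size) for i in range(n_windows)]


# Image dimensions taken from the comment in the download cell.
axis_a = window_starts(10088, patch_size=600, patch_overlap=0.25)
axis_b = window_starts(26175, patch_size=600, patch_overlap=0.25)
print(f"stride: {axis_a[1] - axis_a[0]} px")
print(f"windows: {len(axis_a)} x {len(axis_b)} = {len(axis_a) * len(axis_b)}")
```

Smaller patches or larger overlap increase the window count, and with it the prediction time.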

In [None]:
plt.figure(figsize=(20, 20))
plt.imshow(predicted_raster)
plt.show()

## Improve Model with Training
If the pre-trained model does not identify all trees correctly, we can improve it by adding training data and fine-tuning the model.
Manual labeling of trees (bounding boxes) can be done in Label Studio.

In [None]:
## Split raster into 1200x1200 pixel chips

from deepforest import preprocess

#large geospatial image, probably a geotiff
train_image_path = get_data("/content/hole_17_ortho_utm.tif")

#output directory on colab
output_dir = "/content/chip"

#parameters for splitting
output_crops = preprocess.split_raster(
    path_to_raster=train_image_path,
    annotations_file=None,   # no labels yet
    save_dir=output_dir,
    patch_size=1200,          # chip size (pixels)
    patch_overlap=0        # no overlap between chips; increase to capture trees cut off at chip edges
)

print(f"Created {len(output_crops)} chips in {output_dir}")

In [None]:
##Download the chips to your local machine

from google.colab import files
import os

# Directory to download (no trailing slash, so the zip file is created next to the directory, not inside it)
directory_to_download = '/content/chip'
zip_filename = f'{directory_to_download}.zip'

# Zip the directory
!zip -r "$zip_filename" "$directory_to_download"

# Download the zipped file
files.download(zip_filename)

# Optional: Remove the zipped file after downloading
# os.remove(zip_filename)
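
As an alternative to the shell `zip` command, the standard library can build the archive. The sketch below is a minimal example that runs on a throwaway temporary directory; the chip file names are placeholders, and in the notebook the directory would be `/content/chip`.

```python
import os
import shutil
import tempfile

# Build a throwaway directory with two placeholder "chip" files (hypothetical names).
work_dir = tempfile.mkdtemp()
chip_dir = os.path.join(work_dir, "chip")
os.makedirs(chip_dir)
for name in ("chip_0.png", "chip_1.png"):
    with open(os.path.join(chip_dir, name), "wb") as f:
        f.write(b"placeholder")

# make_archive appends ".zip" itself, so pass the base name without an extension.
zip_path = shutil.make_archive(
    base_name=os.path.join(work_dir, "chip"),
    format="zip",
    root_dir=work_dir,   # paths inside the archive are relative to this directory
    base_dir="chip",     # only this subdirectory is archived
)
print(zip_path)
```

Writing the archive next to the directory rather than inside it avoids zipping the zip file itself.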

Bring your image chips into Label Studio on your local machine, label the trees, and export the labels (annotations and images) in Pascal VOC XML format. Make sure the label is 'Tree', not 'tree'.
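
Because the label is case-sensitive, it can be worth checking the exported XML before training. The sketch below is a minimal illustration that parses a small Pascal VOC annotation written inline (the file name and box values are made up) and reports any object label other than 'Tree'. The helper name `unexpected_labels` is hypothetical.

```python
import xml.etree.ElementTree as ET

# A minimal Pascal VOC annotation written inline for illustration (values are made up).
SAMPLE_VOC = """<annotation>
  <filename>chip_0.png</filename>
  <object>
    <name>Tree</name>
    <bndbox><xmin>12</xmin><ymin>30</ymin><xmax>88</xmax><ymax>104</ymax></bndbox>
  </object>
  <object>
    <name>tree</name>
    <bndbox><xmin>150</xmin><ymin>40</ymin><xmax>210</xmax><ymax>99</ymax></bndbox>
  </object>
</annotation>"""


def unexpected_labels(voc_xml, expected="Tree"):
    """Return the object labels in a VOC annotation that differ from `expected`."""
    root = ET.fromstring(voc_xml)
    names = [obj.findtext("name") for obj in root.iter("object")]
    return [name for name in names if name != expected]


print(unexpected_labels(SAMPLE_VOC))
```

For exported files on disk, the same function could be applied to the text of each `.xml` file in the annotations folder.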

In [None]:
#Bring annotations and chips (single zip file) into Colab

from google.colab import files
import zipfile, io, os

# Upload your zip
uploaded = files.upload()
zip_path = next(iter(uploaded.keys()))

# Unzip
base_dir = "/content/labels"
with zipfile.ZipFile(io.BytesIO(uploaded[zip_path])) as z:
    z.extractall(base_dir)

# Inspect to see folder names
print("Contents of dataset:", os.listdir(base_dir))

In [None]:
#Convert Pascal VOC XML annotations to the CSV format that DeepForest expects

from deepforest.utilities import read_pascal_voc
import pandas as pd
from pathlib import Path
import glob

# Directory with your VOC XMLs
VOC_DIR = Path("/content/labels/Annotations")

# Collect all XML files into one DataFrame
dfs = []
for xml in glob.glob(str(VOC_DIR / "*.xml")):
    df = read_pascal_voc(xml)
    dfs.append(df)

all_df = pd.concat(dfs, ignore_index=True)

# Save as CSV for training
all_df.to_csv("/content/deepforest_annotations.csv", index=False)
print(all_df.head())


### Split annotation data into training and validation

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split

# Path to your consolidated annotations CSV
CSV_IN = "/content/deepforest_annotations.csv"
TRAIN_OUT = "/content/deepforest_train.csv"
VAL_OUT   = "/content/deepforest_valid.csv"

df = pd.read_csv(CSV_IN)

# Unique images/chips
images = df["image_path"].dropna().unique()

# 75/25 split at the image level (reproducible)
train_imgs, val_imgs = train_test_split(
    images, test_size=0.25, random_state=42, shuffle=True
)

# Filter rows by image split
train_df = df[df.image_path.isin(train_imgs)].copy()
val_df   = df[df.image_path.isin(val_imgs)].copy()

# Save
train_df.to_csv(TRAIN_OUT, index=False)
val_df.to_csv(VAL_OUT, index=False)

# Sanity checks & summary
print("Images — total/train/val:", len(images), len(train_imgs), len(val_imgs))
print("Boxes  — total/train/val:", len(df), len(train_df), len(val_df))

# Leakage check
leak = set(train_imgs).intersection(set(val_imgs))
print("Leakage (should be 0):", len(leak))
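
Before training, it can also help to confirm that every `image_path` in an annotation CSV points to a file in the image directory (`/content/labels/images` in this notebook). The sketch below is a self-contained illustration that runs on a temporary folder with made-up file names; `missing_images` is a hypothetical helper, not part of DeepForest.

```python
import csv
import os
import tempfile


def missing_images(csv_path, root_dir):
    """Return image_path values in the CSV that have no file under root_dir."""
    with open(csv_path, newline="") as f:
        names = {row["image_path"] for row in csv.DictReader(f)}
    return sorted(n for n in names if not os.path.isfile(os.path.join(root_dir, n)))


# Demo on a temporary folder with one image present and one absent.
demo_dir = tempfile.mkdtemp()
open(os.path.join(demo_dir, "chip_0.png"), "wb").close()
demo_csv = os.path.join(demo_dir, "annotations.csv")
with open(demo_csv, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["image_path", "xmin", "ymin", "xmax", "ymax", "label"])
    writer.writerow(["chip_0.png", 12, 30, 88, 104, "Tree"])
    writer.writerow(["chip_1.png", 150, 40, 210, 99, "Tree"])

print(missing_images(demo_csv, demo_dir))
```

An empty list means every annotated image was found; any names printed are files the training loader would fail to open.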


## Train

In [None]:
##Set parameters for the training run

#Define the pre-trained model (done earlier in the notebook)
#model = main.deepforest()

model.config['gpus'] = '-1' #move to GPU and use all the GPU resources

#model.config["save-snapshot"] = False
model.config["train"]["fast_dev_run"] = False

#The annotation table
model.config["train"]["csv_file"] = TRAIN_OUT
#The directory where the training imagery is located
model.config["train"]["root_dir"] = "/content/labels/images"

model.config["validation"]["csv_file"] = VAL_OUT
model.config["validation"]["root_dir"] = "/content/labels/images"

model.config["train"]["epochs"] = 5          # 4 will run, but 8–20 is more typical
model.config["train"]["batch_size"] = 2       # if out of memory, drop to 1; if comfy, try 4
model.config["workers"] = 2                   # dataloader threads; bump if IO-bound
model.config["score_thresh"] = 0.4


model.create_trainer()


In [None]:
##TRAIN THE MODEL!
start_time = time.time()
model.trainer.fit(model)
print(f"--- Training on GPU: {(time.time() - start_time):.2f} seconds ---")

## Visualize the prediction after model fine-tuning

In [None]:
##Predict tree crowns on a georeferenced image
predicted_raster = model.predict_tile(raster_path, return_plot = True, patch_size=1000, patch_overlap=0.25, color=(255, 255, 0), thickness=20)

In [None]:
plt.figure(figsize=(20, 20))
plt.imshow(predicted_raster)
plt.show()

## Output and save prediction results for each image crop

In [None]:
import os

# Folder where you want results to go
save_dir = "/content/pred_result"
os.makedirs(save_dir, exist_ok=True)

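# Directory containing the chip images (the same root_dir used for training)
IMG_ROOT = "/content/labels/images"

# Evaluate the training set
results = model.evaluate(
    TRAIN_OUT,      # CSV of training annotations
    IMG_ROOT,       # directory containing the chip images
    iou_threshold=0.4,
    savedir=save_dir
)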
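
The `iou_threshold` refers to intersection-over-union: the area shared by a predicted box and a labeled box, divided by the area covered by either. The sketch below is a minimal standalone illustration with made-up boxes; it is not DeepForest's internal implementation, and the exact matching rule the library applies is assumed here to be "IoU above the threshold".

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (xmin, ymin, xmax, ymax)."""
    # Corners of the overlapping rectangle (if any).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    inter = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


predicted = (0, 0, 10, 10)   # made-up predicted box
labeled = (5, 0, 15, 10)     # made-up labeled box, shifted half a box to the right
overlap = iou(predicted, labeled)
print(f"IoU = {overlap:.3f}, match at 0.4: {overlap > 0.4}")
```

Two equal-sized boxes offset by half their width overlap by only one third, which shows how quickly IoU falls as boxes drift apart.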

## Assessing the Quality of our Tree Predictions

In [None]:
#show assessment of results
results

In [None]:
results['results']

In [None]:
results['box_precision']

In [None]:
results["box_recall"]

In [None]:
results["class_recall"]
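
As a reminder of what precision and recall measure, the sketch below computes both from simple counts. The numbers are made up, the function name is hypothetical, and DeepForest's own evaluation may aggregate these values differently (for example per image), so treat this only as an illustration of the definitions.

```python
def box_precision_recall(n_matched, n_predicted, n_labeled):
    """Precision and recall from counts of matched, predicted, and labeled boxes."""
    # Precision: share of predicted boxes that match a labeled tree.
    precision = n_matched / n_predicted if n_predicted else 0.0
    # Recall: share of labeled trees that were found by a prediction.
    recall = n_matched / n_labeled if n_labeled else 0.0
    return precision, recall


# Made-up counts: 50 predictions, 80 labeled trees, 40 matches.
precision, recall = box_precision_recall(n_matched=40, n_predicted=50, n_labeled=80)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

High precision with low recall, as in this example, would mean the boxes drawn are mostly correct but many trees are missed.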

## Save and load the fine-tuned model

In [None]:
#Save the fine-tuned model out to your storage
save_model_dir = os.path.join("/content", 'golf_course_deepforest.pt')
torch.save(model.model.state_dict(),save_model_dir)

In [None]:
#Bring existing model into environment
fine_tuned_model = main.deepforest()
fine_tuned_model.model.load_state_dict(torch.load(save_model_dir))

## Save Fine-tuned model to Hugging Face

In [None]:
#Install python libraries that allow you to connect with Hugging Face
!pip install huggingface_hub

In [None]:
#Enter your Hugging Face access token to authenticate your account
from huggingface_hub import notebook_login

notebook_login()

In [None]:
#Push fine-tuned model up to Hugging Face
from huggingface_hub import HfApi

# Set up repository details
repo_name = "deepforest_fine_tuning"
model_file = "/content/golf_course_deepforest.pt"

# Create the API client used for the upload below
api = HfApi()

# Create a new repo if it doesn't exist
#api.create_repo(repo_name)

# Upload model to Hugging Face
api.upload_file(
    path_or_fileobj=model_file,   # Path to your .pt file
    repo_id=f"jgillan/{repo_name}",   # replace 'jgillan' with your own Hugging Face username
    path_in_repo="golf_course_deepforest.pt"  # The name you want for the file on the Hub
)


## Download a model from Hugging Face and bring into Colab

In [None]:

from huggingface_hub import hf_hub_download

# Download the .pt file from Hugging Face
model_file = hf_hub_download(repo_id="jgillan/deepforest_fine_tuning", filename="golf_course_deepforest.pt")

fine_tuned_model = main.deepforest()
fine_tuned_model.model.load_state_dict(torch.load(model_file))