---
title: "Reevaluating Automated Wildlife Species Detection: A Reproducibility Study on a Custom Image Dataset"
subtitle: "Testing the Pre-trained Inception-ResNet-v2 Computer Vision Model on a Google Image Dataset"
keywords: ["machine learning", "reproducibility", "camera trap", "pre-trained model", "animal species classification", "computer vision", "neural networks", "cnn", "resnet", "tensorflow", "wildlife monitoring"]
exports: 
  - format: pdf
    template: arxiv_two_column
    output: document/reproducibility-experiment-animal-identification-carl.pdf
---

+++ {"part": "abstract"}
This experiment reproduces the results of the paper *Automated detection of European wild mammal species in camera trap images with an existing and pre-trained computer vision model* {cite}`doi:10.1007/s10344-020-01404-y`,
which tests the pretrained Google Inception-ResNet-v2 model for animal species identification.
We describe the required software, image loading processes, and model outputs. Furthermore, we calculate the prediction accuracies for each present species and the whole dataset and compare them to the metrics from the original paper. The observed total prediction accuracy of 62% is close to the 71% reported by Carl et al. The large spread in per-class accuracy, ranging from 0% to 100%, is also observed in our experiment. Like Carl et al., we recommend the use of the pretrained Inception-ResNet-v2 model for simple animal species identification tasks, emphasizing the need to refit the model to the species relevant for the specific use case if high prediction accuracies and consistency are desired.
+++

# Introduction

While biodiversity is decreasing at a rapid pace, the rise of specific species, be they invasive or predatory, concerns societies around the world. As a consequence, researchers and conservationists are interested in continuously monitoring wildlife populations in terms of their geographical distribution, size, and behavior. Researchers successfully deploy camera traps that photograph passing animals without disturbing them {cite}`trolliet2014camera`. The photos are typically collected manually from the traps and annotated with the name of the species present in the image {cite}`10.1145/3615893.3628760`. This experiment tests one popular approach to eliminating the manual annotation of images: deep convolutional neural networks.

# Experiment Setup

We reimplement the Python code for the experiment from scratch because all the necessary components (data {cite}`banerjee2024animal`, model {cite}`doi:10.48550/arXiv.1602.07261`, and metrics {cite}`scikit-learn`) are available from stable public sources. To maximize the readability and reproducibility of the experiment, we choose a minimal setup that defines all necessary code, data, and requirements in one GitHub project.
We install and import current Python packages; the exact versions are shown in {numref}`table-requirements`. The Jupyter notebook is run locally on a Thinkpad T14 with an AMD Ryzen 5 PRO 5650U processor, 16 GB of memory, and Linux Mint 22.1 installed. No GPU is used; we expect that the results would not change if one were used.

In [1]:
!python3.12 -m pip install -r requirements.txt



In [2]:
# image & data processing
from pathlib import Path
from PIL import Image
import numpy as np
import pandas as pd
# neural network
import tensorflow as tf
from tensorflow.keras.applications import InceptionResNetV2
from tensorflow.keras.applications.inception_resnet_v2 import decode_predictions
# evaluation
from sklearn.metrics import accuracy_score, confusion_matrix


## Dependency Table

In [3]:
package_versions = {}
with open("requirements.txt", "r") as f:
    for line in f:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if "==" in line:
            pkg, ver = line.split("==")
            package_versions[pkg] = ver
        else:
            package_versions[line] = "unknown"

pd.DataFrame({
    "package": package_versions.keys(),
    "version": package_versions.values()
}).to_markdown("figure/requirements.md", index=False)

````{table} Runtime dependencies
:name: table-requirements
:align: center

```{include} figure/requirements.md
```
````

# Model
After setting up the Python runtime and importing the packages, the Inception-ResNet-v2 model is downloaded from the TensorFlow repository {cite}`tensorflow2015-whitepaper` together with weights that were pre-trained on the ImageNet dataset {cite}`doi:10.1109/CVPR.2009.5206848`. No fitting takes place in this experiment, which eliminates all model design and training work.

In [4]:
model = InceptionResNetV2(weights="imagenet")



# Data
Carl et al. cite the source of the wildlife images in their dataset {cite}`nationalpark2019schwarzwild`. This source is no longer available, so we run the experiment on a different dataset. To test the generalizability of the model, we use a larger public dataset containing images of 90 different species {cite}`banerjee2024animal`. To mimic the original experiment setup, only 10 samples are used for each species, resulting in a total test set of 900 images.

```{image} data/kaggle-90-different-animals/wombat/2a6c3fd292.jpg
:align: center
:alt: Example wildlife image (wombat)
:width: 300px
```

In [5]:
DATA_PATH = Path("data/kaggle-90-different-animals")
GITHUB_PROJECT = "https://github.com/tobsel7/research-vetmedwien-animal-species-identification"
input_shape = model.input_shape[1:3] # the required image dimensions (299, 299)

## Data Preprocessing
We load the images with all three color channels (RGB), resize them to 299 by 299 pixels, and convert each of them into a three-dimensional array of shape (299, 299, 3). The color intensities are scaled to floating-point numbers from 0 to 1. This is the minimal preprocessing needed to match the input size of the neural network.

In [6]:
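def load_image(path, target_size):
    """Load an image as an RGB array of shape (height, width, 3) scaled to [0, 1]."""
    image = Image.open(path).convert("RGB")  # force three color channels
    image = image.resize(target_size)  # PIL expects (width, height)
    return np.array(image) / 255.0  # uint8 intensities -> floats in [0, 1]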
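
The scaling step deserves a short illustration. The sketch below contrasts the [0, 1] scaling used in this experiment with the [-1, 1] scaling applied by the `preprocess_input` helper of the Keras `inception_resnet_v2` module. The function names and the toy image are assumptions made for illustration only; whether switching the scaling would change the reported accuracy is not evaluated here.

```python
import numpy as np

def scale_unit(pixels):
    """Scale 8-bit intensities to [0, 1], as load_image does in this experiment."""
    return pixels.astype(np.float32) / 255.0

def scale_signed(pixels):
    """Scale 8-bit intensities to [-1, 1], the range produced by
    tensorflow.keras.applications.inception_resnet_v2.preprocess_input."""
    return pixels.astype(np.float32) / 127.5 - 1.0

# hypothetical 2x2 RGB image that covers the extreme intensities
toy_image = np.array([[[0, 0, 0], [255, 255, 255]],
                      [[128, 64, 32], [255, 0, 0]]], dtype=np.uint8)

unit = scale_unit(toy_image)
signed = scale_signed(toy_image)
print("unit range:  ", unit.min(), unit.max())
print("signed range:", signed.min(), signed.max())
```

Both variants keep the array shape unchanged, so either could feed the same model input; only the value range differs.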

In [7]:
wildlife_image_paths = []

animal_species = sorted([d.name for d in DATA_PATH.iterdir() if d.is_dir()])

for species_name in animal_species:
    animal_image_folder = DATA_PATH / species_name # every species has its image folder
    for image_path in animal_image_folder.glob("*.jpg"):
        wildlife_image_paths.append(image_path)

Then we construct the testing dataset by stacking all normalized image arrays and using the folder names as the labels.

In [8]:
animal_images = [load_image(p, input_shape) 
                 for p in wildlife_image_paths]
animal_species = [p.parent.name 
                  for p in wildlife_image_paths]

X_test = np.stack(animal_images, axis=0)
y_true = animal_species

In [9]:
X_test.shape

(900, 299, 299, 3)
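
The shape confirms 900 images in total, but not that every species contributes exactly 10 of them. A minimal sketch of such a balance check follows; `check_balance` and the toy labels are hypothetical and not part of the notebook, where the list `y_true` would take the place of `toy_labels`.

```python
from collections import Counter

def check_balance(labels, expected_per_class):
    """Return the classes whose sample count differs from the expected one."""
    counts = Counter(labels)
    return {name: n for name, n in counts.items() if n != expected_per_class}

# hypothetical labels; one class is deliberately short by one image
toy_labels = ["bat"] * 10 + ["bear"] * 10 + ["wombat"] * 9

unbalanced = check_balance(toy_labels, expected_per_class=10)
print(unbalanced)
```

An empty dictionary would indicate a balanced test set.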

# Test
The model yields a probability for each of the 1000 classes of the ImageNet database. For this experiment, we take the class with the highest probability in the final softmax layer (the top-1 prediction) and compare its label to the true label.
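
Conceptually, the top-1 selection is an argmax over the class probabilities of each image. The following sketch shows this on a toy probability matrix; the three class names and the probabilities are made up for illustration and stand in for the 1000 ImageNet classes.

```python
import numpy as np

# hypothetical class names and softmax outputs for two images (rows sum to 1)
class_names = ["wombat", "brown_bear", "tabby"]
probabilities = np.array([
    [0.70, 0.20, 0.10],
    [0.05, 0.15, 0.80],
])

top1_index = probabilities.argmax(axis=1)    # most probable class per image
top1_labels = [class_names[i] for i in top1_index]
top1_scores = probabilities.max(axis=1)      # probability of that class

for label, score in zip(top1_labels, top1_scores):
    print(label, score)
```

All other class probabilities are discarded in this setting, so a near miss in second place counts the same as a completely unrelated prediction.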

In [10]:
y_pred = model.predict(X_test)
y_pred = [pred[0][1] # take output label
          for pred 
          in decode_predictions(y_pred, top=1)]

29/29 ━━━━━━━━━━━━━━━━━━━━ 121s 4s/step


Looking at the results, the model yields usable predictions: almost all inference outputs are animal species related to the one present in the image. This already suggests that InceptionResNetV2 generalizes to some extent.

In [11]:
species_recognition_result = pd.DataFrame({
    "y_true": y_true,
    "y_pred": y_pred
})

In [12]:
species_recognition_result.groupby("y_true", as_index=False).first()[["y_pred", "y_true"]][:10].to_markdown("figure/species_recognition_result_raw_short.md", index=False)

````{table} Subset of Inception-ResNet-v2 raw predictions
:name: species_recognition_result_raw_short
:align: center

```{include} figure/species_recognition_result_raw_short.md
```
````

## Label Mapping
The main issue with this experiment is that the set of classes known to the model does not match the classes of the dataset used for testing. This is not unique to this dataset; it is very likely to happen in any realistic setup. We manually define a mapping table that relates the model output labels to the labels from the dataset.

The mapping rules follow a best-effort approach respecting the Linnaean system of taxonomy. We acknowledge some shortcomings of this approach:

- Some semantic information is lost because species are sometimes mapped to their families (e.g., all bear species are mapped to bear).
- The dataset contains images of species that are not directly related to any class from the model (e.g., all bats and deer).

In [13]:
imagenet_to_kaggle = {
    "gazelle": "antelope",
    "impala": "antelope",
    "American_black_bear": "bear",
    "brown_bear": "bear",
    "ground_beetle": "beetle",
    "leaf_beetle": "beetle",
    "rhinoceros_beetle": "beetle",
    "dung_beetle": "beetle",
    "wild_boar": "boar",
    "ringlet": "butterfly",
    "monarch": "butterfly",
    "sulphur_butterfly": "butterfly",
    "lycaenid": "butterfly",
    "Egyptian_cat": "cat",
    "tabby": "cat",
    "Siamese_cat": "cat",
    "Persian_cat": "cat",
    "lynx": "cat",
    "water_buffalo": "cow",
    "Dungeness_crab": "crab",
    "red_deer": "deer",
    "elk": "deer",
    "Labrador_retriever": "dog",
    "basset": "dog",
    "Border_collie": "dog",
    "Chihuahua": "dog",
    "Bouvier_des_Flandres": "dog",
    "Brittany_spaniel": "dog",
    "English_setter": "dog",
    "Greater_Swiss_Mountain_dog": "dog",
    "Ibizan_hound": "dog",
    "Mexican_hairless": "dog",
    "tox_terrier": "dog",
    "Pekinese": "dog",
    "Pomeranian": "dog", 
    "golden_retriever": "dog",
    "pug": "dog",
    "ass": "donkey",
    "mallard": "duck",
    "bald_eagle": "eagle",
    "golden_eagle": "eagle",
    "African_elephant": "elephant",
    "Indian_elephant": "elephant",
    "Arctic_fox": "fox",
    "red_fox": "fox",
    "grey_fox": "fox",
    "kit_fox": "fox",
    "ibex": "goat",
    "mountain_goat": "goat",
    "Arabian_horse": "horse",
    "Appaloosa": "horse",
    "wallaby": "kangaroo",
    "agama": "lizard",
    "alligator_lizard": "lizard",
    "banded_gecko": "lizard",
    "Komodo_dragon": "lizard",
    "whiptail": "lizard", 
    "American_lobster": "lobster",
    "spiny_lobster": "lobster",
    "house_mouse": "mouse",
    "fiddler_crab": "crab",
    "rock_crab": "crab",
    "king_crab": "crab",
    "magpie": "crow",
    "jay": "crow",
    "drake": "duck",
    "tusker": "elephant",
    "great_grey_owl": "owl",
    "giant_panda": "panda",
    "African_grey": "parrot",
    "macaw": "parrot",
    "sulphur-crested_cockatoo": "parrot",
    "pelican": "pelecaniformes",
    "black_stork": "pelecaniformes",
    "king_penguin": "penguin",
    "hog": "pig",
    "red-backed_sandpiper": "sandpiper",
    "redshank": "sandpiper",
    "dowitcher": "sandpiper",
    "great_white_shark": "shark",
    "hammerhead": "shark",
    "tiger_shark": "shark",
    "horned_viper": "snake",
    "ram": "sheep",
    "bighorn": "sheep",
    "vine_snake": "snake",
    "king_snake": "snake",
    "night_snake": "snake",
    "Indian_cobra": "snake",
    "ringneck_snake": "snake",
    "rock_python": "snake",
    "thunder_snake": "snake",
    "fox_squirrel": "squirrel",
    "black_swan": "swan",
    "box_turtle": "turtle",
    "loggerhead": "turtle",
    "leatherback_turtle": "turtle",
    "mud_turtle": "turtle",
    "grey_whale": "whale",
    "killer_whale": "whale",
    "timber_wolf": "wolf",
    "white_wolf": "wolf"
}

# labels without a mapping rule remain unchanged
y_pred_mapped = species_recognition_result["y_pred"].map(
    lambda label: imagenet_to_kaggle.get(label, label)
)
species_recognition_result["y_pred_mapped"] = y_pred_mapped
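
Because the mapping is written by hand, it can help to audit it. The sketch below lists the dataset labels that can never be predicted and the mapping keys that the model does not know (for example, misspelled keys). All names and the toy data are hypothetical; the notebook does not contain such a check.

```python
def reachable_labels(model_labels, mapping):
    """Labels that can appear after mapping the raw model output."""
    return {mapping.get(label, label) for label in model_labels}

def audit_mapping(dataset_labels, model_labels, mapping):
    """Return (dataset labels never predicted, mapping keys unknown to the model)."""
    unreachable = sorted(set(dataset_labels) - reachable_labels(model_labels, mapping))
    unknown_keys = sorted(set(mapping) - set(model_labels))
    return unreachable, unknown_keys

# hypothetical toy data; "red_der" is a deliberately misspelled key
toy_model_labels = ["hog", "tabby", "wombat", "brown_bear"]
toy_mapping = {"hog": "pig", "tabby": "cat", "brown_bear": "bear", "red_der": "deer"}
toy_dataset_labels = ["pig", "cat", "wombat", "bear", "bat", "deer"]

unreachable, unknown_keys = audit_mapping(
    toy_dataset_labels, toy_model_labels, toy_mapping
)
print("never predicted:", unreachable)
print("unknown mapping keys:", unknown_keys)
```

In the notebook, the list of model labels would have to be obtained separately from the ImageNet class index that `decode_predictions` relies on; that step is not shown here.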

### ImageNet Label Mapping Table

In [14]:
pd.DataFrame({
    "ImageNet label": imagenet_to_kaggle.keys(),
    "dataset label": imagenet_to_kaggle.values()
}).groupby("dataset label")["ImageNet label"].apply(
    lambda labels: ", ".join(labels)
).reset_index()[["ImageNet label", "dataset label"]][:10].to_markdown("figure/imagenet_label_mapping.md", index=False)

````{table} Mapping rules between ImageNet classes and test data classes
:name: imagenet_label_mapping
:align: center

```{include} figure/imagenet_label_mapping.md
```
````

# Evaluation
Carl et al. provide two kinds of performance metrics: the overall model accuracy and the accuracy for each species. By grouping the samples by true species, we can calculate semantically equivalent metrics and compare them directly.

In [15]:
accuracy = accuracy_score(y_true, y_pred_mapped)

In [16]:
accuracy

0.62
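
To judge how precisely 900 images pin down this value, one option is a binomial confidence interval. The sketch below computes a 95% Wilson score interval for 558 correct predictions out of 900 (0.62 × 900). It assumes independent images, which is a simplification, and the function name is chosen for illustration only.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for about 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# 0.62 accuracy on 900 images corresponds to 558 correct predictions
low, high = wilson_interval(558, 900)
print(f"95% interval: {low:.3f} to {high:.3f}")
```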

In [17]:
group_accuracy = species_recognition_result.assign(
    accuracy = species_recognition_result["y_true"] == species_recognition_result["y_pred_mapped"]
).groupby("y_true")["accuracy"].mean().sort_values(ascending=False)

In [18]:
group_accuracy

y_true
bison         1.0
bear          1.0
boar          1.0
crab          1.0
elephant      1.0
             ... 
reindeer      0.0
squid         0.0
sparrow       0.0
turkey        0.0
woodpecker    0.0
Name: accuracy, Length: 90, dtype: float64
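
Since the samples are grouped by their true species, the per-species accuracy computed here is what is commonly called per-class recall. With the same number of images per species, the overall accuracy equals the mean of the per-species values. A minimal pure-Python sketch with made-up labels illustrates both points:

```python
from collections import defaultdict

def per_class_accuracy(y_true, y_pred):
    """Share of correctly predicted samples for every true class (per-class recall)."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        totals[truth] += 1
        hits[truth] += int(truth == pred)
    return {label: hits[label] / totals[label] for label in totals}

# hypothetical labels with two images per species
toy_true = ["bear", "bear", "bat", "bat"]
toy_pred = ["bear", "bear", "hamster", "bat"]

scores = per_class_accuracy(toy_true, toy_pred)
overall = sum(t == p for t, p in zip(toy_true, toy_pred)) / len(toy_true)
mean_of_classes = sum(scores.values()) / len(scores)
print(scores, overall, mean_of_classes)
```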

## Group Accuracy Table

In [19]:
grouped_species = group_accuracy.reset_index()
grouped_species.columns = ["species", "accuracy"]

summary_grouped = (
    grouped_species.groupby("accuracy")["species"]
    .apply(lambda x: ", ".join(x))
    .reset_index()
    .sort_values(by="accuracy", ascending=False)
)[["species", "accuracy"]]

md_table = pd.concat([
    summary_grouped,
    pd.DataFrame([["...", "..."]], columns=["species", "accuracy"]),
    pd.DataFrame([["TOTAL", accuracy]], columns=["species", "accuracy"])
], ignore_index=True)

md_table.to_markdown("figure/species_recognition_result_summary.md", index=False)

````{table} Prediction accuracy per species and the total accuracy
:name: species_recognition_result_summary
:align: center

```{include} figure/species_recognition_result_summary.md
```
````

# Summary

TODO: Carl et al. did a good job, and their results are valid. It is important to emphasize that the model can only predict a limited number of species. Without refitting, this makes the model almost certainly unsuitable for real use cases. The neural network architecture is powerful and very accurate in predicting the classes it was trained on.

# Future Work

TODO: Find a paper that demonstrates transfer learning using the Inception-ResNet-v2 model and explain how high accuracy can also be achieved for other species. Also mention that there are still few projects that deploy camera traps with such models and fully automate the process.

# Experiment Artifact

## Inference Output Table

In [20]:
github_animal_image_paths = [
    f"[{p.parent.name}]({GITHUB_PROJECT}/raw/main/{p})"
    for p in wildlife_image_paths
]

pd.DataFrame({
    "image link": github_animal_image_paths,
    "truth": species_recognition_result["y_true"],
    "mapped prediction": species_recognition_result["y_pred_mapped"],
    "model prediction": species_recognition_result["y_pred"]
}).to_csv("figure/model_prediction.csv", index=False)
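
The exported CSV can serve as a starting point for inspecting individual misclassifications. The sketch below filters such rows using only the standard library. The column names follow the export above, while the two rows and the example.org links are hypothetical stand-ins for the real file.

```python
import csv
import io

# hypothetical rows with the same columns as figure/model_prediction.csv
toy_csv = io.StringIO(
    "image link,truth,mapped prediction,model prediction\n"
    "[bear](https://example.org/bear/1.jpg),bear,bear,brown_bear\n"
    "[bat](https://example.org/bat/1.jpg),bat,hamster,hamster\n"
)

# keep only the rows where the mapped prediction misses the true label
misclassified = [
    row for row in csv.DictReader(toy_csv)
    if row["truth"] != row["mapped prediction"]
]

for row in misclassified:
    print(row["truth"], "->", row["model prediction"])
```

To read the real artifact, the `io.StringIO` object would be replaced by an open file handle on `figure/model_prediction.csv`.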