---
title: "Reevaluating Automated Wildlife Species Detection: A Reproducibility Study on a Custom Image Dataset"
subtitle: "Testing the Pre-trained Inception-ResNet-v2 Computer Vision Model on a Google Image Dataset"
date: 2025-09-03
keywords: ["machine learning", "reproducibility", "animal species classification", "computer vision", "neural networks", "cnn", "resnet", "tensorflow", "wildlife monitoring"]
exports: 
  - format: pdf
    template: arxiv_nips
---

+++ {"part": "abstract"}
This experiment attempts to reproduce the results of the paper [Automated detection of European wild mammal species in camera trap images with an existing and pre-trained computer vision model](doi:10.1007/s10344-020-01404-y), 
which tests the pretrained Google Inception-ResNet-v2 model for predicting animal species.
We describe the required software, the image loading process, and the model outputs. Furthermore, we calculate global and per-class prediction accuracies and compare them to the metrics from the original paper.
+++

# Introduction


# Experiment Setup

Carl et al. give little detail about their runtime environment, likely because the model can be used in a standardized way through the TensorFlow library. To maximize the readability and reproducibility of this experiment, a minimal setup was chosen: all necessary code, data, and requirements are defined in one GitHub project.
Current releases of the required Python packages were installed and imported; the exact versions are listed in {numref}`table-requirements`. The Jupyter notebook was run locally on a ThinkPad T14 with an AMD Ryzen 5 PRO 5650U processor, 16 GB of memory, and Linux Mint 22.1. No GPU was used; the predictions are not expected to differ materially on a GPU.

In [1]:
!python3.12 -m pip install -r requirements.txt



In [2]:
# image & data processing
from pathlib import Path
from PIL import Image
import numpy as np
import pandas as pd
# neural network
import tensorflow as tf
from tensorflow.keras.applications import InceptionResNetV2
from tensorflow.keras.applications.inception_resnet_v2 import decode_predictions
# evaluation
from sklearn.metrics import accuracy_score, confusion_matrix


# Model
Once the Python runtime is set up and the packages are imported, the neural network can be instantiated with weights pre-trained on the ImageNet dataset. This eliminates all model design and training work.

In [3]:
model = InceptionResNetV2(weights="imagenet")



# Data
For the experiment, images of 90 common animal species are used. They are sourced from Google Images and provided with labels in @banerjee2024animal. Because the Kaggle dataset is rather large, and to mimic the original experiment setup, only 10 images are used per species, resulting in a total test set of 900 images.

```{image} data/kaggle-90-different-animals/wombat/2a6c3fd292.jpg
:align: center
:alt: Example wildlife image (wombat)
:width: 300px
```
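How the ten images per species were selected is not described here. One reproducible way to draw such a subset is a seeded random sample per class folder. The sketch below assumes one sub-folder per species containing `.jpg` files; the function name `sample_images_per_class` and the seed value are hypothetical and not part of the original notebook.

```python
import random
from pathlib import Path


def sample_images_per_class(root, per_class=10, seed=0):
    """Return {class_name: [paths]} with at most `per_class` images per folder.

    Hypothetical helper: sorting before sampling makes the draw independent
    of the file-system listing order, and the fixed seed makes it repeatable.
    """
    rng = random.Random(seed)
    subset = {}
    for class_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        images = sorted(class_dir.glob("*.jpg"))
        k = min(per_class, len(images))  # small folders are kept whole
        subset[class_dir.name] = sorted(rng.sample(images, k))
    return subset
```

Calling the function twice with the same seed on an unchanged folder tree returns the same file list, which makes the test set itself reproducible.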

In [4]:
DATA_PATH = Path("data/kaggle-90-different-animals")
GITHUB_PROJECT = "https://github.com/tobsel7/research-vetmedwien-animal-species-identification"
model_input_size = model.input_shape[1:3] # the required image dimensions (299, 299)

## Data Preprocessing
The images are loaded with three color channels (RGB), resized to 299 by 299 pixels, and converted into a 3-dimensional array of shape (299, 299, 3). The color intensities are scaled to floating-point numbers from 0 to 1. This is the minimal preprocessing needed to match the input size of the neural network.

In [5]:
def load_normalized_image(path, target_size=model_input_size):
    image = Image.open(path).convert("RGB")
    image = image.resize(target_size)
    return np.array(image) / 255.0
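
For comparison, Keras provides `tensorflow.keras.applications.inception_resnet_v2.preprocess_input`, which scales pixel values to the range -1 to 1 rather than 0 to 1. The NumPy sketch below reproduces both value ranges on a toy array so that the difference is visible. The names `scale_unit` and `scale_symmetric` are hypothetical, and whether the choice of scaling changes the accuracies reported here was not tested.

```python
import numpy as np


def scale_unit(pixels):
    """Map uint8 intensities [0, 255] to [0, 1], the range used in this notebook."""
    return np.asarray(pixels, dtype=np.float32) / 255.0


def scale_symmetric(pixels):
    """Map uint8 intensities [0, 255] to [-1, 1] via x / 127.5 - 1."""
    return np.asarray(pixels, dtype=np.float32) / 127.5 - 1.0


toy = np.array([0, 255], dtype=np.uint8)  # darkest and brightest intensity
print(scale_unit(toy))       # [0. 1.]
print(scale_symmetric(toy))  # [-1.  1.]
```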

In [6]:
wildlife_image_paths = []

animal_species = sorted([d.name for d in DATA_PATH.iterdir() if d.is_dir()])

for species_name in animal_species:
    animal_image_folder = DATA_PATH / species_name # every species has its image folder
    for image_path in animal_image_folder.glob("*.jpg"):
        wildlife_image_paths.append(image_path) # only the path is stored; pixel data is loaded in the next cell

The test data is constructed by stacking the normalized image arrays and using the folder names as labels.

In [7]:
animal_images = [load_normalized_image(p) for p in wildlife_image_paths]
animal_species = [p.parent.name for p in wildlife_image_paths]

X_test = np.stack(animal_images, axis=0)
y_true = animal_species

In [8]:
X_test.shape

(900, 299, 299, 3)
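
The shape also determines the memory footprint. Dividing a `uint8` array by `255.0` produces `float64` values in NumPy, so the stacked array occupies roughly 1.9 GB; casting to `float32` would halve that. The arithmetic, as a small sketch (the helper name `array_nbytes` is hypothetical):

```python
from math import prod


def array_nbytes(shape, bytes_per_value):
    """Bytes needed for a dense array of the given shape."""
    return prod(shape) * bytes_per_value


shape = (900, 299, 299, 3)
print(array_nbytes(shape, 8))  # float64: 1931061600 bytes
print(array_nbytes(shape, 4))  # float32: 965530800 bytes
```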

# Test
The model yields a probability for each of the 1000 ImageNet classes it was trained on. For this experiment, we take the class with the highest probability in the final softmax layer (the top-1 prediction) and compare its label to the true label.
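
As a toy illustration of this top-1 selection: the real model has 1000 classes, whereas the three class names and the probabilities below are made up, and `top1_labels` is a hypothetical helper.

```python
import numpy as np

class_names = ["red_fox", "tabby", "macaw"]  # made-up subset of labels
probabilities = np.array([
    [0.7, 0.2, 0.1],  # image 1: highest probability at index 0
    [0.1, 0.3, 0.6],  # image 2: highest probability at index 2
])


def top1_labels(probs, names):
    """Return the name of the highest-probability class for each row."""
    return [names[i] for i in np.argmax(probs, axis=1)]


print(top1_labels(probabilities, class_names))  # ['red_fox', 'macaw']
```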

In [9]:
y_pred = model.predict(X_test)
y_pred = [pred[0][1] for pred in decode_predictions(y_pred, top=1)]



29/29 ━━━━━━━━━━━━━━━━━━━━ 111s 4s/step


A look at the results shows that the model yields usable predictions: almost all predicted labels are animal species related to the one shown in the image. This suggests that Inception-ResNet-v2 generalizes to this dataset to some extent.

In [10]:
species_recognition_result = pd.DataFrame({
    "y_true": y_true,
    "y_pred": y_pred
})

In [11]:
with open("figure/species_recognition_result_raw_short.md", "w") as f:
    f.write(
        species_recognition_result.groupby("y_true", as_index=False).first()[:10].to_markdown(index=False)
    )

````{table} Subset of Inception-ResNet-v2 raw predictions
:name: species_recognition_result_raw_short
:align: center

```{include} figure/species_recognition_result_raw_short.md
```
````

## Label Mapping
The main issue with this experiment is that the classes known to the model do not match the classes of the test dataset. This is not specific to this dataset; it is likely to occur in any realistic setup. We therefore define a mapping table that relates the model output labels to the dataset labels.

The mapping is a manual, best-effort approach following the Linnaean system of taxonomy, and it has known shortcomings:

- Some semantic information is lost because species are sometimes mapped to their families (e.g., all bear species are mapped to bear).
- The dataset contains images of species that are not directly related to any class known to the model (e.g., bats and deer).


All defined mappings are documented in {numref}`table-imagenet-label-mapping`.
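
A mapping of this kind can be audited by checking which mapped predictions still fall outside the dataset's label set, since those can never count as correct. A minimal sketch on toy labels follows; the function name `map_and_audit` and the toy data are hypothetical and not part of the original notebook.

```python
def map_and_audit(predictions, mapping, dataset_classes):
    """Map model labels to dataset labels and collect labels that stay unmatched."""
    # Labels without a mapping entry are passed through unchanged.
    mapped = [mapping.get(label, label) for label in predictions]
    # Anything that is still not a dataset class can never be a correct prediction.
    unmatched = sorted({m for m in mapped if m not in dataset_classes})
    return mapped, unmatched


toy_mapping = {"brown_bear": "bear", "tabby": "cat"}
toy_classes = {"bear", "cat", "zebra"}
mapped, unmatched = map_and_audit(["brown_bear", "zebra", "vase"], toy_mapping, toy_classes)
print(mapped)     # ['bear', 'zebra', 'vase']
print(unmatched)  # ['vase']
```

Unmatched labels are either genuine misclassifications or model classes that are missing from the mapping table, so the list is a useful starting point when extending the mapping.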

In [12]:
imagenet_to_kaggle = {
    "gazelle": "antelope",
    "impala": "antelope",
    "American_black_bear": "bear",
    "brown_bear": "bear",
    "ground_beetle": "beetle",
    "leaf_beetle": "beetle",
    "rhinoceros_beetle": "beetle",
    "dung_beetle": "beetle",
    "wild_boar": "boar",
    "ringlet": "butterfly",
    "monarch": "butterfly",
    "sulphur_butterfly": "butterfly",
    "lycaenid": "butterfly",
    "Egyptian_cat": "cat",
    "tabby": "cat",
    "Siamese_cat": "cat",
    "Persian_cat": "cat",
    "lynx": "cat",
    "water_buffalo": "cow",
    "Dungeness_crab": "crab",
    "red_deer": "deer",
    "elk": "deer",
    "Labrador_retriever": "dog",
    "basset": "dog",
    "Border_collie": "dog",
    "Chihuahua": "dog",
    "Bouvier_des_Flandres": "dog",
    "Brittany_spaniel": "dog",
    "English_setter": "dog",
    "Greater_Swiss_Mountain_dog": "dog",
    "Ibizan_hound": "dog",
    "Mexican_hairless": "dog",
    "toy_terrier": "dog",
    "Pekinese": "dog",
    "Pomeranian": "dog",
    "golden_retriever": "dog",
    "pug": "dog",
    "ass": "donkey",
    "mallard": "duck",
    "bald_eagle": "eagle",
    "golden_eagle": "eagle",
    "African_elephant": "elephant",
    "Indian_elephant": "elephant",
    "Arctic_fox": "fox",
    "red_fox": "fox",
    "grey_fox": "fox",
    "kit_fox": "fox",
    "ibex": "goat",
    "mountain_goat": "goat",
    "Arabian_horse": "horse",
    "Appaloosa": "horse",
    "wallaby": "kangaroo",
    "agama": "lizard",
    "alligator_lizard": "lizard",
    "banded_gecko": "lizard",
    "Komodo_dragon": "lizard",
    "whiptail": "lizard", 
    "American_lobster": "lobster",
    "spiny_lobster": "lobster",
    "house_mouse": "mouse",
    "fiddler_crab": "crab",
    "rock_crab": "crab",
    "king_crab": "crab",
    "magpie": "crow",
    "jay": "crow",
    "drake": "duck",
    "tusker": "elephant",
    "great_grey_owl": "owl",
    "giant_panda": "panda",
    "African_grey": "parrot",
    "macaw": "parrot",
    "sulphur-crested_cockatoo": "parrot",
    "pelican": "pelecaniformes",
    "black_stork": "pelecaniformes",
    "king_penguin": "penguin",
    "hog": "pig",
    "red-backed_sandpiper": "sandpiper",
    "redshank": "sandpiper",
    "dowitcher": "sandpiper",
    "great_white_shark": "shark",
    "hammerhead": "shark",
    "tiger_shark": "shark",
    "horned_viper": "snake",
    "ram": "sheep",
    "bighorn": "sheep",
    "vine_snake": "snake",
    "king_snake": "snake",
    "night_snake": "snake",
    "Indian_cobra": "snake",
    "ringneck_snake": "snake",
    "rock_python": "snake",
    "thunder_snake": "snake",
    "fox_squirrel": "squirrel",
    "black_swan": "swan",
    "box_turtle": "turtle",
    "loggerhead": "turtle",
    "leatherback_turtle": "turtle",
    "mud_turtle": "turtle",
    "grey_whale": "whale",
    "killer_whale": "whale",
    "timber_wolf": "wolf",
    "white_wolf": "wolf"
}

species_recognition_result["y_pred_mapped"] = species_recognition_result["y_pred"].map(
    lambda l: imagenet_to_kaggle.get(l, l) # safe map accessor (labels that are not present in the dict keys, remain unchanged)
)

# Evaluation
Carl et al. provide two kinds of performance metrics: overall model accuracy and the accuracy for each species. By grouping the samples by the true species, it is straightforward to calculate metrics that can be directly compared to the results from Carl et al.

In [25]:
total_accuracy = accuracy_score(species_recognition_result["y_true"], species_recognition_result["y_pred_mapped"])

In [14]:
group_accuracy = species_recognition_result.assign(
    accuracy = species_recognition_result["y_true"] == species_recognition_result["y_pred_mapped"]
).groupby("y_true")["accuracy"].mean().sort_values(ascending=False)
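
To make the two metrics concrete: the per-class accuracy is the fraction of images of one species that were predicted correctly, and with equally sized classes (as here, with ten images per species) the overall accuracy equals the mean of the per-class accuracies. A toy sketch in plain Python follows; the labels are made up and the function name `accuracies` is hypothetical.

```python
from collections import defaultdict


def accuracies(y_true, y_pred):
    """Return (overall accuracy, {class: accuracy}) for two equally long label lists."""
    hits = defaultdict(list)
    for truth, pred in zip(y_true, y_pred):
        hits[truth].append(truth == pred)  # group correctness flags by true class
    per_class = {c: sum(h) / len(h) for c, h in hits.items()}
    overall = sum(sum(h) for h in hits.values()) / len(y_true)
    return overall, per_class


toy_true = ["cat", "cat", "dog", "dog"]
toy_pred = ["cat", "dog", "dog", "dog"]
overall, per_class = accuracies(toy_true, toy_pred)
print(overall)    # 0.75
print(per_class)  # {'cat': 0.5, 'dog': 1.0}
```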

In [45]:
example_species = group_accuracy.reset_index().iloc[np.linspace(0, len(group_accuracy) - 1, 5, dtype=int)]
example_species.columns = ["species", "accuracy"]

example_rows = (
    species_recognition_result.groupby("y_true", as_index=False).first()
    .loc[lambda df: df["y_true"].isin(example_species["species"])]
)

summary_table = pd.merge(example_rows, example_species, left_on="y_true", right_on="species")
summary_table = summary_table[["y_true", "accuracy"]]
summary_table.columns = ["species", "accuracy"]
summary_table.sort_values(by="accuracy", inplace=True, ascending=False)

summary_table = pd.concat([
    summary_table,
    pd.DataFrame([["...", "..."]], columns=summary_table.columns),
    pd.DataFrame([["TOTAL", total_accuracy]], columns=summary_table.columns)
], ignore_index=True)

with open("figure/species_recognition_result_summary.md", "w") as f:
    f.write(summary_table.to_markdown(index=False))

````{table} Prediction accuracy for 5 different species and the total accuracy
:name: species_recognition_result_summary
:align: center

```{include} figure/species_recognition_result_summary.md
```
````

Refer to {numref}`group_accuracy` for the group accuracies for each species.

# Summary

# Future Work

# Appendix

## Dependency Table

In [15]:
package_versions = {}
with open("requirements.txt", "r") as f:
    for line in f:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if "==" in line:
            pkg, ver = line.split("==")
            package_versions[pkg] = ver
        else:
            package_versions[line] = "unknown"
            
with open("figure/requirements.md", "w") as f:
    f.write(
        pd.DataFrame({
            "package": package_versions.keys(),
            "version": package_versions.values()
        }).to_markdown(index=False)
    )
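
The table above reflects the versions requested in `requirements.txt`. As a complementary check, the versions actually installed in the running environment can be read with the standard library's `importlib.metadata`. This is a sketch only: the helper name `installed_versions` is hypothetical and the package names in the call are examples.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Look up the installed version of each distribution name."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # Distribution is not present in the current environment.
            found[name] = "not installed"
    return found


print(installed_versions(["numpy", "pandas", "tensorflow"]))
```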

````{table} Runtime dependencies
:name: table-requirements
:align: center

```{include} figure/requirements.md
```
````

## ImageNet Label Mapping Table

In [16]:
with open("figure/imagenet_label_mapping.md", "w") as f:
    f.write(
        pd.DataFrame({
            "imagenet label": imagenet_to_kaggle.keys(),
            "mapped label": imagenet_to_kaggle.values()
        }).groupby("mapped label")["imagenet label"].apply(
            lambda labels: ", ".join(labels)
        ).reset_index().to_markdown(index=False)
    )

````{table} ImageNet label mapping
:name: table-imagenet-label-mapping
:align: center

```{include} figure/imagenet_label_mapping.md
```
````

## Inference Output Table

In [17]:
github_animal_image_paths = [
    f"[{p.parent.name}]({GITHUB_PROJECT}/raw/main/{p})"
    for p in wildlife_image_paths
]

with open("figure/species_recognition_result.md", "w") as f:
    f.write(
        pd.DataFrame({
            "truth": github_animal_image_paths,
            "mapped prediction": species_recognition_result["y_pred_mapped"],
            "model prediction": species_recognition_result["y_pred"]
        }).to_markdown(index=False)
    )

````{table} Inception-ResNet-v2 predictions
:name: species-recognition-result
:align: center

```{include} figure/species_recognition_result.md
```
````

## Per-Class Accuracy Table

In [18]:
with open("figure/group_accuracy.md", "w") as f:
    f.write(
        group_accuracy.reset_index().rename(columns={"y_true": "species"}).to_markdown(index=False)
    )

````{table} The prediction accuracy for each animal species
:name: group_accuracy
:align: center

```{include} figure/group_accuracy.md
```
````