---
title: "Reevaluating Automated Wildlife Species Detection: A Reproducibility Study on a Custom Image Dataset"
subtitle: "Testing Inception-ResNet-v2 on a Public Google Image Dataset"
keywords: ["machine learning", "reproducibility", "camera trap", "pre-trained model", "animal species classification", "computer vision", "neural networks", "cnn", "resnet", "tensorflow", "wildlife monitoring"]
exports: 
  - format: pdf
    template: arxiv_two_column
    output: document/reproducibility-experiment-animal-identification-carl.pdf
---

+++ {"part": "abstract"}
This study revisits the findings of Carl et al. {cite}`doi:10.1007/s10344-020-01404-y`, who evaluated the pre-trained Google Inception-ResNet-v2 model for automated detection of European wild mammal species in camera trap images. To assess the reproducibility and generalizability of their approach, we reimplemented the experiment from scratch using openly available resources and a different dataset consisting of 900 images spanning 90 species. After minimal preprocessing, we obtained an overall classification accuracy of 62%, which is comparable to the 71% reported in the original work despite the differences between the datasets. As in the original study, per-class performance varied substantially, ranging from 0% to 100% accuracy, which highlights the limits of generalization when labels do not align directly with ImageNet classes. Our results confirm that pretrained convolutional neural networks can provide a practical baseline for wildlife species identification but also reinforce the need for dataset-specific adaptation or transfer learning to achieve consistent, high-quality predictions.
+++

# Introduction

While biodiversity is decreasing at a rapid pace, the rise of specific species, be they invasive or predatory, concerns societies around the world. As a consequence, researchers and conservationists are interested in continuously monitoring wildlife populations in terms of their geographical distribution, size, and behavior. Researchers successfully deploy camera traps that can take photographs of passing animals without disturbing them {cite}`trolliet2014camera`. The photos are typically manually collected from the traps and annotated with the name of the species present in the image {cite}`10.1145/3615893.3628760`.

Deep convolutional neural networks (CNNs) have emerged as a promising solution to automate this process, offering robust image classification capabilities. Building on prior work, this study evaluates the reproducibility of results reported by Carl et al. {cite}`doi:10.1007/s10344-020-01404-y`, who applied a pre-trained Inception-ResNet-v2 model for European mammal species detection in camera trap images. By reconstructing their experiment using a different dataset and a reproducible, open-source workflow, we examine both the reliability of the original findings and the generalizability of pretrained CNNs to broader wildlife monitoring scenarios.

# Experiment Setup

We reimplemented the Python code for the experiment from scratch because all the necessary components (data {cite}`banerjee2024animal`, model {cite}`doi:10.48550/arXiv.1602.07261`, and metrics {cite}`scikit-learn`) can be taken from stable public sources. To maximize the readability and reproducibility of the experiment, a minimal setup was chosen, defining all necessary code, data, and requirements in one GitHub project {cite}`doi:10.5281/zenodo.17116549`.
We use current, widely used Python packages; the exact versions are listed in {numref}`table-requirements`. The Jupyter notebook is run locally on a ThinkPad T14 with an AMD Ryzen 5 PRO 5650U processor and 16 GB of memory but no GPU. The operating system is Linux Mint 22.1 and the Python kernel is version 3.12. We expect no meaningful deviation in the results on different hardware or runtimes, although small floating-point differences are possible.

In [1]:
!python3.12 -m pip install -r requirements.txt



In [2]:
# image & data processing
from pathlib import Path
from PIL import Image
import numpy as np
import pandas as pd
# neural network
import tensorflow as tf
from tensorflow.keras.applications import InceptionResNetV2
from tensorflow.keras.applications.inception_resnet_v2 import decode_predictions
# evaluation
from sklearn.metrics import accuracy_score, confusion_matrix


## Dependency Table

In [3]:
package_versions = {}
with open("requirements.txt", "r") as f:
    for line in f:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if "==" in line:
            pkg, ver = line.split("==")
            package_versions[pkg] = ver
        else:
            package_versions[line] = "unknown"

pd.DataFrame({
    "package": package_versions.keys(),
    "version": package_versions.values()
}).to_markdown("figure/requirements.md", index=False)

````{table}
:name: table-requirements
:align: center

Runtime dependencies
```{include} figure/requirements.md
```
````

# Model

After setting up the Python runtime and importing the required packages, we load the Inception-ResNet-v2 model from the TensorFlow model repository {cite}`tensorflow2015-whitepaper`. We use the publicly available pretrained weights, which were obtained by training the model on the ImageNet dataset {cite}`doi:10.1109/CVPR.2009.5206848`. This approach eliminates the model design and training phase completely but limits the model prediction space to the 1000 classes from the ImageNet dataset.

In [4]:
model = InceptionResNetV2(weights="imagenet")



# Data

Carl et al. provide the source of the wildlife images used in their dataset {cite}`nationalpark2019schwarzwild`. This source is no longer available, requiring us to run the experiment on a different dataset. To test the generalizability of the model, we take a larger public dataset containing images of 90 different species {cite}`banerjee2024animal`. To mimic the original experiment setup, only 10 samples are used for each species, resulting in a total test sample size of 900 images.

```{image} data/kaggle-90-different-animals/wombat/2a6c3fd292.jpg
:align: center
:alt: Example wildlife image (wombat)
:width: 300px
```

In [5]:
DATA_PATH = Path("data/kaggle-90-different-animals")
GITHUB_PROJECT = "https://github.com/tobsel7/research-vetmedwien-animal-species-identification"
input_shape = model.input_shape[1:3] # the required image dimensions (299, 299)
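
The text states that 10 images per species are used, but not how they were selected. As a minimal sketch of one reproducible way to draw such a subset from a folder-per-species layout, the hypothetical helper below sorts the files before sampling with a fixed seed. The function name, the seed, and the `per_species` parameter are assumptions for illustration and are not taken from the original project.

```python
import random
from pathlib import Path


def sample_per_species(data_path, per_species=10, seed=0):
    """Pick a fixed number of .jpg files from every species folder (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same subset
    selection = {}
    folders = sorted(p for p in Path(data_path).iterdir() if p.is_dir())
    for folder in folders:
        # Sort first: the order returned by glob() is not guaranteed to be stable.
        images = sorted(folder.glob("*.jpg"))
        # Folders with fewer images than requested are used in full.
        selection[folder.name] = rng.sample(images, min(per_species, len(images)))
    return selection
```

Called as `sample_per_species(DATA_PATH)`, this would return a dictionary from species name to the selected image paths.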

## Data Preprocessing

We load the images with all three color channels (RGB), resize them to 299 by 299 pixels, and convert each into a three-dimensional array (height, width, channel). The color intensities are scaled to floating-point numbers from 0 to 1. This is the minimal preprocessing needed to match the input size of the neural network.

In [6]:
def load_image(path, target_size):
    image = Image.open(path).convert("RGB")
    image = image.resize(target_size)
    return np.array(image) / 255.0
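
The cell above scales pixel values to the range [0, 1]. Keras also ships a `preprocess_input` function for this architecture (in `tensorflow.keras.applications.inception_resnet_v2`) that scales inputs to [-1, 1] instead. The NumPy-only sketch below contrasts the two ranges. The helper names are hypothetical, and the results reported in this study were obtained with the [0, 1] scaling, so the effect of switching is untested here.

```python
import numpy as np


def scale_unit(pixels):
    """Scaling used in this experiment: 0..255 -> 0..1."""
    return pixels.astype(np.float32) / 255.0


def scale_symmetric(pixels):
    """Scaling to -1..1, the range produced by Keras' preprocess_input for this model."""
    return pixels.astype(np.float32) / 127.5 - 1.0


# Darkest, middle, and brightest 8-bit intensity.
demo_pixels = np.array([0, 128, 255], dtype=np.uint8)
unit_scaled = scale_unit(demo_pixels)
symmetric_scaled = scale_symmetric(demo_pixels)
```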

In [7]:
wildlife_image_paths = []

animal_species = sorted([d.name for d in DATA_PATH.iterdir() if d.is_dir()])

for species_name in animal_species:
    animal_image_folder = DATA_PATH / species_name # every species has its image folder
    for image_path in animal_image_folder.glob("*.jpg"):
        wildlife_image_paths.append(image_path)

Then we construct the test dataset by stacking all normalized image arrays and using the folder names as labels.

In [8]:
animal_images = [load_image(p, input_shape) 
                 for p in wildlife_image_paths]
animal_species = [p.parent.name 
                  for p in wildlife_image_paths]

X_test = np.stack(animal_images, axis=0)
y_true = animal_species

In [9]:
X_test.shape

(900, 299, 299, 3)
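
The shape above also determines the memory footprint of the test set. Dividing a `uint8` array by `255.0` produces `float64` values in NumPy, so the stacked array takes twice the space of a `float32` copy. The short calculation below is a sketch of that arithmetic. The `float32` figure equals the 965530800-byte allocation that TensorFlow reports during prediction, presumably because the input is converted to `float32` there.

```python
import numpy as np

# Dimensions of the stacked test set.
n_images, height, width, channels = 900, 299, 299, 3
n_values = n_images * height * width * channels

# Bytes needed for the same data at double and single precision.
bytes_float64 = n_values * np.dtype(np.float64).itemsize
bytes_float32 = n_values * np.dtype(np.float32).itemsize

print(f"float64: {bytes_float64 / 1e9:.2f} GB")
print(f"float32: {bytes_float32 / 1e9:.2f} GB")
```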

# Test

The Inception-ResNet-v2 model outputs a probability distribution over 1,000 classes, corresponding to the categories defined in the ImageNet dataset. For this study, we use only the top-1 prediction (the class with the highest softmax probability) as the model’s output and compare it to the ground-truth label from our test dataset.
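
To make the top-1 rule concrete, the following sketch applies it to a toy probability matrix with NumPy only. The three labels and the probabilities are invented for illustration; they are not ImageNet classes or model outputs.

```python
import numpy as np

# Hypothetical label list and class probabilities (one row per image).
toy_labels = ["wombat", "badger", "brown_bear"]
toy_probabilities = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
])


def top1_labels(probabilities, labels):
    """Return the label with the highest probability for every row."""
    best_index = np.argmax(probabilities, axis=1)
    return [labels[i] for i in best_index]


toy_predictions = top1_labels(toy_probabilities, toy_labels)
```

In the experiment itself, `decode_predictions` performs this lookup against the ImageNet class names.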

In [10]:
y_pred = model.predict(X_test)
y_pred = [pred[0][1] # take output label
          for pred 
          in decode_predictions(y_pred, top=1)]

2025-09-30 16:37:31.143939: W external/local_xla/xla/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 965530800 exceeds 10% of free system memory.


29/29 ━━━━━━━━━━━━━━━━━━━━ 117s 4s/step


A first look at the predictions suggests that the model yields usable results: almost all predicted labels name an animal species related to the one present in the image. This already indicates that Inception-ResNet-v2 generalizes to this dataset to some extent.

In [11]:
species_recognition_result = pd.DataFrame({
    "y_true": y_true,
    "y_pred": y_pred
})

In [12]:
species_recognition_result.groupby("y_true", as_index=False).first()[["y_pred", "y_true"]][:10].to_markdown("figure/species_recognition_result_raw_short.md", index=False)

````{table}
:name: species_recognition_result_raw_short
:align: center

Subset of Inception-ResNet-v2 raw predictions
```{include} figure/species_recognition_result_raw_short.md
```
````

## Label Mapping

A key challenge in this setup is that ImageNet classes do not align directly with the wildlife species in our dataset. To enable evaluation, we constructed a manual mapping table linking model output labels to the target species. This mapping followed a best-effort approach based on the Linnaean system of taxonomy.

Several limitations arise from this process:

 - When we collapse multiple species into higher-level taxa (e.g., mapping all bear species to "bear"), we lose some species-level detail.

 - Certain species in the dataset are not represented in ImageNet classes at all (e.g., bats, deer), preventing meaningful predictions for these cases.

While this mapping introduces ambiguity, it reflects a realistic challenge when applying pretrained ImageNet models to ecological data and illustrates the need for task-specific model adaptation.

In [13]:
imagenet_to_kaggle = {
    "gazelle": "antelope",
    "impala": "antelope",
    "American_black_bear": "bear",
    "brown_bear": "bear",
    "ground_beetle": "beetle",
    "leaf_beetle": "beetle",
    "rhinoceros_beetle": "beetle",
    "dung_beetle": "beetle",
    "wild_boar": "boar",
    "ringlet": "butterfly",
    "monarch": "butterfly",
    "sulphur_butterfly": "butterfly",
    "lycaenid": "butterfly",
    "Egyptian_cat": "cat",
    "tabby": "cat",
    "Siamese_cat": "cat",
    "Persian_cat": "cat",
    "lynx": "cat",
    "water_buffalo": "cow",
    "Dungeness_crab": "crab",
    "red_deer": "deer",
    "elk": "deer",
    "Labrador_retriever": "dog",
    "basset": "dog",
    "Border_collie": "dog",
    "Chihuahua": "dog",
    "Bouvier_des_Flandres": "dog",
    "Brittany_spaniel": "dog",
    "English_setter": "dog",
    "Greater_Swiss_Mountain_dog": "dog",
    "Ibizan_hound": "dog",
    "Mexican_hairless": "dog",
    "toy_terrier": "dog",
    "Pekinese": "dog",
    "Pomeranian": "dog", 
    "golden_retriever": "dog",
    "pug": "dog",
    "ass": "donkey",
    "mallard": "duck",
    "bald_eagle": "eagle",
    "golden_eagle": "eagle",
    "African_elephant": "elephant",
    "Indian_elephant": "elephant",
    "Arctic_fox": "fox",
    "red_fox": "fox",
    "grey_fox": "fox",
    "kit_fox": "fox",
    "ibex": "goat",
    "mountain_goat": "goat",
    "Arabian_horse": "horse",
    "Appaloosa": "horse",
    "wallaby": "kangaroo",
    "agama": "lizard",
    "alligator_lizard": "lizard",
    "banded_gecko": "lizard",
    "Komodo_dragon": "lizard",
    "whiptail": "lizard", 
    "American_lobster": "lobster",
    "spiny_lobster": "lobster",
    "house_mouse": "mouse",
    "fiddler_crab": "crab",
    "rock_crab": "crab",
    "king_crab": "crab",
    "magpie": "crow",
    "jay": "crow",
    "drake": "duck",
    "tusker": "elephant",
    "great_grey_owl": "owl",
    "giant_panda": "panda",
    "African_grey": "parrot",
    "macaw": "parrot",
    "sulphur-crested_cockatoo": "parrot",
    "pelican": "pelecaniformes",
    "black_stork": "pelecaniformes",
    "king_penguin": "penguin",
    "hog": "pig",
    "red-backed_sandpiper": "sandpiper",
    "redshank": "sandpiper",
    "dowitcher": "sandpiper",
    "great_white_shark": "shark",
    "hammerhead": "shark",
    "tiger_shark": "shark",
    "horned_viper": "snake",
    "ram": "sheep",
    "bighorn": "sheep",
    "vine_snake": "snake",
    "king_snake": "snake",
    "night_snake": "snake",
    "Indian_cobra": "snake",
    "ringneck_snake": "snake",
    "rock_python": "snake",
    "thunder_snake": "snake",
    "fox_squirrel": "squirrel",
    "black_swan": "swan",
    "box_turtle": "turtle",
    "loggerhead": "turtle",
    "leatherback_turtle": "turtle",
    "mud_turtle": "turtle",
    "grey_whale": "whale",
    "killer_whale": "whale",
    "timber_wolf": "wolf",
    "white_wolf": "wolf"
}

# Map ImageNet labels to dataset labels; labels without a mapping rule stay unchanged.
y_pred_mapped = species_recognition_result["y_pred"].map(
    lambda label: imagenet_to_kaggle.get(label, label)
)
species_recognition_result["y_pred_mapped"] = y_pred_mapped
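
Because the mapping is written by hand, it can help to check it against the model's label list. The sketch below is a hypothetical audit helper, not part of the original experiment. It reports mapping keys that the model can never emit (for example, misspelled keys) and dataset labels that no model label can reach. All names and the toy data are assumptions for illustration.

```python
def audit_mapping(mapping, model_labels, dataset_labels):
    """Return (mapping keys unknown to the model, dataset labels no prediction can reach)."""
    model_labels = set(model_labels)
    dataset_labels = set(dataset_labels)
    # Keys that are not model labels are never looked up successfully.
    unknown_keys = sorted(key for key in mapping if key not in model_labels)
    # Every model label either maps to a dataset label or stays unchanged.
    reachable = {mapping.get(label, label) for label in model_labels}
    unreachable = sorted(dataset_labels - reachable)
    return unknown_keys, unreachable


# Toy data: the second mapping key is deliberately misspelled.
toy_model_labels = ["brown_bear", "toy_terrier", "wombat"]
toy_dataset_labels = ["bear", "dog", "wombat", "bat"]
toy_mapping = {"brown_bear": "bear", "toy_terier": "dog"}

unknown_keys, unreachable = audit_mapping(toy_mapping, toy_model_labels, toy_dataset_labels)
```

In this toy case the misspelled key is flagged, and both "bat" and "dog" are reported as unreachable, the latter only because of the misspelling.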

### ImageNet Label Mapping Table

In [14]:
pd.DataFrame({
    "ImageNet label": imagenet_to_kaggle.keys(),
    "dataset label": imagenet_to_kaggle.values()
}).groupby("dataset label")["ImageNet label"].apply(
    lambda labels: ", ".join(labels)
).reset_index()[["ImageNet label", "dataset label"]][:10].to_markdown("figure/imagenet_label_mapping_short.md", index=False)

````{table}
:name: imagenet_label_mapping
:align: center

Mapping rules between ImageNet classes and test data classes
```{include} figure/imagenet_label_mapping_short.md
```
````

# Evaluation

Carl et al. reported two performance metrics for their study: overall classification accuracy across the dataset and per-species accuracy. To enable a direct comparison, we adopt the same evaluation strategy. After applying the label mapping described above, predictions are grouped by true species, and accuracy is computed at both the global and class levels.

In [15]:
accuracy = accuracy_score(y_true, y_pred_mapped)

In [16]:
accuracy

0.62

In [17]:
group_accuracy = species_recognition_result.assign(
    accuracy = species_recognition_result["y_true"] == species_recognition_result["y_pred_mapped"]
).groupby("y_true")["accuracy"].mean().sort_values(ascending=False)
group_accuracy = group_accuracy.reset_index()
group_accuracy.columns = ["species", "accuracy"]
group_accuracy

     species  accuracy
0      bison       1.0
1       bear       1.0
2       boar       1.0
3       crab       1.0
4   elephant       1.0
..       ...       ...
85  reindeer       0.0
86     squid       0.0
87   sparrow       0.0
88    turkey       0.0


In [18]:
group_accuracy.groupby("accuracy").size()

accuracy
0.0    26
0.1     1
0.3     3
0.4     1
0.6     1
0.7     4
0.8     6
0.9    18
1.0    30
dtype: int64
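
Since every species contributes the same number of images (10), the overall accuracy should equal the mean of the per-species accuracies. The following sketch recomputes it from the distribution above as a consistency check; the dictionary simply restates the printed counts.

```python
# Number of species per accuracy level, copied from the distribution above.
species_per_accuracy = {
    0.0: 26, 0.1: 1, 0.3: 3, 0.4: 1, 0.6: 1,
    0.7: 4, 0.8: 6, 0.9: 18, 1.0: 30,
}

n_species = sum(species_per_accuracy.values())
# Equal class sizes: the overall accuracy is the species-weighted mean.
mean_accuracy = sum(acc * n for acc, n in species_per_accuracy.items()) / n_species
# Species predicted with an accuracy of at least 90%.
n_high = sum(n for acc, n in species_per_accuracy.items() if acc >= 0.9)

print(n_species, round(mean_accuracy, 2), n_high)  # prints: 90 0.62 48
```

The recomputed value matches the overall accuracy of 0.62 returned by `accuracy_score`.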

## Group Accuracy Table

In [19]:
summary_grouped = (
    group_accuracy.groupby("accuracy")["species"]
    .apply(lambda x: ", ".join(x))
    .reset_index()
    .sort_values(by="accuracy", ascending=False)
)[["species", "accuracy"]]

md_table = pd.concat([
    summary_grouped,
    pd.DataFrame([["TOTAL", accuracy]], columns=["species", "accuracy"])
], ignore_index=True)

md_table.to_markdown("figure/species_recognition_result_summary.md", index=False)

````{table}
:name: species_recognition_result_summary
:align: center

Prediction accuracy per species and the total accuracy
```{include} figure/species_recognition_result_summary.md
```
````

# Summary

Our reproduced results confirm the findings of Carl et al. We achieve an overall top-1 prediction accuracy of 62% on a larger dataset that includes many species not present in the original study. This result is comparable to the 71% reported by Carl et al. Consistent with their study, we observe substantial variation in per-species accuracies. Of the 90 species, 48 are predicted with an accuracy of at least 90%, and 26 are never predicted correctly (0% accuracy). A detailed summary of per-species prediction accuracies is provided in {numref}`species_recognition_result_summary`.

The experiment shows that pretrained convolutional neural networks such as Inception-ResNet-v2 are a viable baseline for the annotation of camera trap images. The network identifies many animal species reliably without any retraining, but the 26 species with 0% accuracy show that this holds only where the ImageNet classes cover the target species.

# Future Work

Prior research has explored transfer learning of convolutional neural networks for recognizing animals and their facial features, showing that retraining pretrained networks for a specific use case can substantially improve prediction accuracy {cite}`doi:10.3390/app13021178`.
The investigated model is also large (about 55 million parameters) and therefore less suitable for outdoor deployment under strict energy and memory constraints. Smaller architectures, such as MobileNet {cite}`doi:10.1109/ICCV.2019.00140` and the smaller EfficientNet {cite}`doi:10.48550/arXiv.1905.11946` variants, use significantly fewer parameters and could realistically be deployed in the field for live animal identification. Future work could evaluate these smaller models following the same experimental approach. Combining them with transfer learning may yield efficient and accurate models suitable for real-world wildlife monitoring.

# Experiment Artifact

## Inference Output Table

In [20]:
github_animal_image_paths = [
    f"{GITHUB_PROJECT}/raw/main/{p}"
    for p in wildlife_image_paths
]

pd.DataFrame({
    "image link": github_animal_image_paths,
    "truth": species_recognition_result["y_true"],
    "mapped prediction": species_recognition_result["y_pred_mapped"],
    "model prediction": species_recognition_result["y_pred"]
}).to_csv("figure/model_prediction.csv", index=False)