# GBIF Plant Image Fetching Notebook

This notebook fetches plant images from the GBIF (Global Biodiversity Information Facility) API.

We save the raw images to *../data/raw/*.  


#### Setup: Import Libraries and Create Target Directory

This cell imports the libraries used below (`requests` for HTTP calls, Pillow's `Image` for decoding and re-encoding images) and creates the *../data/raw/* folder if it does not already exist.


```python
import os
from io import BytesIO

import requests
from PIL import Image

# Create the folder for raw images; exist_ok=True makes this safe to re-run.
os.makedirs("../data/raw", exist_ok=True)
```

#### Query GBIF API for Plant Occurrences

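This cell sends a request to the GBIF occurrence search API for up to 200 occurrence records that include at least one still image and geographic coordinates.  
The taxon is selected with `taxonKey`, a key in the GBIF backbone taxonomy. Key 6 is the kingdom Plantae, so this query returns images of any plant, not only oaks (*Quercus*); to target a single genus, replace it with that genus's key.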


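```python
url = "https://api.gbif.org/v1/occurrence/search"
params = {
    "mediaType": "StillImage",  # only records with at least one still image
    "hasCoordinate": "true",    # only records that include coordinates
    "limit": 200,               # records per page (GBIF caps this at 300)
    "taxonKey": 6,              # GBIF backbone key 6 = kingdom Plantae (all plants)
}

response = requests.get(url, params=params, timeout=30)
response.raise_for_status()  # surface HTTP errors instead of a confusing KeyError below
data = response.json()["results"]

print(f"Found {len(data)} entries")
```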


Found 200 entries
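`taxonKey` 6 is the GBIF backbone key for the kingdom Plantae, not for *Quercus*. To narrow the query to one genus, one option is to resolve the name with GBIF's species match endpoint (`/v1/species/match`), whose JSON response reports the matched key as `usageKey` along with `matchType` and `rank`. The sketch below is a minimal illustration, not part of the original notebook: the helper names `parse_match` and `lookup_taxon_key` are hypothetical, and rejecting anything other than an exact match at the requested rank is an assumption about what counts as a usable result.

```python
from typing import Optional


def parse_match(match: dict, expected_rank: str = "GENUS") -> Optional[int]:
    """Return the backbone key from a species/match response, or None if unusable."""
    # Reject fuzzy or higher-rank matches so a typo cannot silently widen the query.
    if match.get("matchType") != "EXACT":
        return None
    if match.get("rank") != expected_rank:
        return None
    return match.get("usageKey")


def lookup_taxon_key(name: str, rank: str = "GENUS") -> Optional[int]:
    """Resolve a scientific name to a GBIF backbone key (performs a network request)."""
    import requests  # imported lazily so parse_match works without requests installed

    response = requests.get(
        "https://api.gbif.org/v1/species/match",
        params={"name": name, "rank": rank, "kingdom": "Plantae"},
        timeout=10,
    )
    response.raise_for_status()
    return parse_match(response.json(), expected_rank=rank)
```

With this in place, `params["taxonKey"] = lookup_taxon_key("Quercus")` would swap the kingdom-wide query for an oak-only one, provided the lookup returns a key rather than `None`.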
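A single request returns one page of results: `limit` sets the page size (GBIF documents a maximum of 300 for this endpoint), `offset` selects where the page starts, and each response reports `endOfRecords`. To collect more than one page, one approach is to advance the offset until `endOfRecords` is true or a target count is reached. The sketch below keeps the paging logic separate from the HTTP call so it can be exercised offline; `gbif_page` and `iter_occurrences` are hypothetical names, and the 300-record default page size is taken from GBIF's stated cap.

```python
from typing import Callable, Iterator

# A page fetcher takes (offset, limit) and returns the decoded JSON of one search page.
PageFetcher = Callable[[int, int], dict]


def gbif_page(params: dict) -> PageFetcher:
    """Build a fetcher for /v1/occurrence/search with fixed filters (network access)."""
    import requests  # imported lazily so the paging logic can be tested offline

    def fetch(offset: int, limit: int) -> dict:
        response = requests.get(
            "https://api.gbif.org/v1/occurrence/search",
            params={**params, "offset": offset, "limit": limit},
            timeout=30,
        )
        response.raise_for_status()
        return response.json()

    return fetch


def iter_occurrences(fetch: PageFetcher, max_records: int, page_size: int = 300) -> Iterator[dict]:
    """Yield up to max_records occurrence records, requesting one page at a time."""
    offset = 0
    while offset < max_records:
        limit = min(page_size, max_records - offset)
        page = fetch(offset, limit)
        results = page.get("results", [])
        # Never yield more than the caller asked for, even if a page is oversized.
        yield from results[: max_records - offset]
        # Stop when GBIF reports no more records, or when a page comes back empty.
        if page.get("endOfRecords", True) or not results:
            return
        offset += len(results)
```

For example, `list(iter_occurrences(gbif_page({"mediaType": "StillImage", "hasCoordinate": "true", "taxonKey": 6}), max_records=1000))` would request pages of 300, 300, 300, and 100 records, assuming that many matches exist. GBIF's documentation also limits how deep offset-based paging can go; much larger extracts are meant to go through its download service instead.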


#### Download Images from Retrieved GBIF Entries

This cell takes the URL of the first media item in each record, downloads the image, and converts it to RGB.  
Images are saved in *../data/raw/* as JPEGs named `leaf_<index>.jpg`, where the index is the image's position in the list of URLs.  
It counts successful and failed downloads and prints a summary at the end.


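```python
success, fail = 0, 0
image_urls = []

# Take the URL of the first media item in each record, skipping records without one.
for entry in data:
    media = entry.get("media")
    if media and media[0].get("identifier"):
        image_urls.append(media[0]["identifier"])

for i, img_url in enumerate(image_urls):
    try:
        # A separate name keeps the API response from the previous cell intact.
        img_response = requests.get(img_url, timeout=10)
        img_response.raise_for_status()  # count HTTP errors (404, 403, ...) as failures
        # Convert to RGB so palette or RGBA images can be written as JPEG.
        img = Image.open(BytesIO(img_response.content)).convert("RGB")
        img.save(f"../data/raw/leaf_{i}.jpg")
        success += 1
    except Exception as e:  # network errors, HTTP errors, undecodable images
        print(f"Failed {i}: {img_url}: {e}")
        fail += 1

print(f"\n✅ Downloaded: {success}")
print(f"❌ Failed: {fail}")
```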



✅ Downloaded: 200
❌ Failed: 0
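Taking only the first media item of each record assumes that item is a still image with a URL; a record can carry several media items (for example a sound recording before a photo), and the same image URL can appear under more than one occurrence. A more defensive extraction, sketched below, scans every media item, keeps the first one whose `type` is `StillImage` and which has an `identifier`, and drops duplicate URLs so the same picture is not saved twice. It assumes each media item is a dict with `type` and `identifier` keys, as in GBIF's occurrence JSON; the function names `first_image_url` and `collect_image_urls` are hypothetical.

```python
from typing import Iterable, List, Optional


def first_image_url(record: dict) -> Optional[str]:
    """Return the identifier of the first StillImage media item in an occurrence record."""
    for item in record.get("media") or []:
        # Skip sounds, videos, and items that have no URL.
        if item.get("type") == "StillImage" and item.get("identifier"):
            return item["identifier"]
    return None


def collect_image_urls(records: Iterable[dict]) -> List[str]:
    """Collect one image URL per record, dropping imageless records and duplicate URLs."""
    seen = set()
    urls = []
    for record in records:
        url = first_image_url(record)
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls
```

Used as `image_urls = collect_image_urls(data)`, this could stand in for the extraction loop above; the `leaf_<index>.jpg` numbering would still come from `enumerate` over the resulting list.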
