# Introduction

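This notebook downloads the JSON metadata for every xeno-canto bird recording made in the United States, then uses that metadata to download the mp3 files. The queries go through the site's documented [REST API](https://www.xeno-canto.org/article/153).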

## Download the JSON

In [1]:
import httplib2
import json
import os.path

In [2]:
h = httplib2.Http()

In [3]:
url = 'https://www.xeno-canto.org/api/2/recordings?query=cnt:%22United+States%22'
response, content = h.request(url, 'GET')
data = json.loads(content)
num_pages = data['numPages']
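
The query URL above is percent-encoded by hand. A minimal sketch of building the same URL with the standard library is shown here; `BASE_URL` and `build_query_url` are hypothetical names introduced for illustration and are not part of the original notebook.

```python
from urllib.parse import urlencode

# Assumed base endpoint, copied from the URL used in this notebook
BASE_URL = 'https://www.xeno-canto.org/api/2/recordings'

def build_query_url(country, page=None):
    """Build a recordings query URL for one country (hypothetical helper)."""
    params = {'query': 'cnt:"{}"'.format(country)}
    if page is not None:
        params['page'] = page
    # safe=':' keeps the colon in 'cnt:' unescaped, matching the hand-written URL
    return '{}?{}'.format(BASE_URL, urlencode(params, safe=':'))

first_page_url = build_query_url('United States')
third_page_url = build_query_url('United States', page=3)
```

Passing the page number as a parameter keeps the first request and the paged requests in one place, so the two URLs cannot drift apart.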

In [5]:
from tqdm import tqdm

dataset = []

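# Make sure the cache directory exists before reading or writing pages
os.makedirs('./json', exist_ok=True)

for i in tqdm(range(1, num_pages + 1)):
    fname = './json/{}.json'.format(i)
    if os.path.exists(fname):
        # Reuse the cached page instead of requesting it again
        with open(fname, 'rb') as f:
            dataset.append(json.load(f))
    else:
        url = 'https://www.xeno-canto.org/api/2/recordings?query=cnt:%22United+States%22&page={}'.format(i)
        _, content = h.request(url, 'GET')
        # Cache the raw JSON bytes exactly as the API returned them
        with open(fname, 'wb') as f:
            f.write(content)
        dataset.append(json.loads(content))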
        
print('{} pages of data'.format(len(dataset)))

100%|██████████████████████████████████████████████████████████████████████████████████| 83/83 [00:00<00:00, 89.15it/s]


83 pages of data


# Get the download URLs and output file names of each recording

In [6]:
def get_recording_download_info(item):
    url = 'https:{}'.format(item['file'])
    mp3 = './mp3/{}.mp3'.format(item['id'])
    return {
        'url': url,
        'mp3': mp3
    }
    
def get_recording_download_infos(dataset):
    recordings = []
    for data in dataset:
        download_infos = [get_recording_download_info(item) for item in data['recordings']]
        recordings.extend(download_infos)
    return recordings

In [7]:
download_info = get_recording_download_infos(dataset)

Check that we have 41,291 download entries, one per recording, and that the output paths are all unique.

In [8]:
len(download_info)

41291

In [9]:
len(set([info['mp3'] for info in download_info]))

41291
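
If the two counts above ever disagree, it helps to see which paths collide. A minimal sketch, assuming a hypothetical helper `find_duplicate_paths` and a made-up three-item sample in place of the real `download_info`:

```python
from collections import Counter

def find_duplicate_paths(download_info):
    """Return the sorted output paths that occur more than once (hypothetical helper)."""
    counts = Counter(info['mp3'] for info in download_info)
    return sorted(path for path, n in counts.items() if n > 1)

# Made-up sample for illustration only; not real xeno-canto data
sample = [
    {'url': 'https://example.org/1/download', 'mp3': './mp3/1.mp3'},
    {'url': 'https://example.org/2/download', 'mp3': './mp3/2.mp3'},
    {'url': 'https://example.org/1/download', 'mp3': './mp3/1.mp3'},
]
duplicates = find_duplicate_paths(sample)
```

An empty list means every recording maps to its own file, which is what the matching counts above indicate.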

In [10]:
download_info[0]

{'mp3': './mp3/316537.mp3',
 'url': 'https://www.xeno-canto.org/316537/download'}
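
The `'https:{}'.format(...)` step implies that the API's `file` field is a scheme-less URL beginning with `//`. A hedged alternative is `urllib.parse.urljoin`, which resolves such a value against a base URL and leaves an already absolute URL alone. `BASE` and `absolute_url` are hypothetical names, and the assumption that `file` is sometimes absolute is not confirmed by the notebook.

```python
from urllib.parse import urljoin

# Assumed base; only its scheme matters when the input starts with '//'
BASE = 'https://www.xeno-canto.org/'

def absolute_url(file_field):
    """Resolve a scheme-less or absolute URL against BASE (hypothetical helper)."""
    return urljoin(BASE, file_field)

from_scheme_less = absolute_url('//www.xeno-canto.org/316537/download')
from_absolute = absolute_url('https://www.xeno-canto.org/316537/download')
```

Both calls produce the same URL, so the helper is safe to apply regardless of which form the field takes.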

# Download the mp3 files

In [11]:
from httplib2 import RelativeURIError

def download_save(item):
    # Returns 1 if the mp3 already exists or was saved, 0 if the download failed
    h = httplib2.Http()
    url = item['url']
    mp3 = item['mp3']
    
    if os.path.exists(mp3):
        return 1
    
    try:
        _, content = h.request(url, 'GET')
        with open(mp3, 'wb') as f:
            f.write(content)
        return 1
    except RelativeURIError as e:
        # Take the text after '=' in the error message as the scheme-less URL,
        # prepend the scheme, and retry once
        s = str(e)
        eq_index = s.find('=')
        url = s[eq_index + 1:].strip()
        url = 'https:{}'.format(url)
        try:
            _, content = h.request(url, 'GET')
            with open(mp3, 'wb') as f:
                f.write(content)
            return 1
        except RelativeURIError:
            return 0

In [12]:
download_save(download_info[0])

1

In [17]:
from joblib import Parallel, delayed

results = Parallel(n_jobs=15, backend='threading', verbose=1)(delayed(download_save)(item) for item in download_info)

[Parallel(n_jobs=15)]: Done  20 tasks      | elapsed:    0.0s
[Parallel(n_jobs=15)]: Done 170 tasks      | elapsed:    0.0s
[Parallel(n_jobs=15)]: Done 420 tasks      | elapsed:    0.1s
[Parallel(n_jobs=15)]: Done 770 tasks      | elapsed:    0.3s
[Parallel(n_jobs=15)]: Done 1220 tasks      | elapsed:    0.4s
[Parallel(n_jobs=15)]: Done 1770 tasks      | elapsed:    0.6s
[Parallel(n_jobs=15)]: Done 2420 tasks      | elapsed:    0.9s
[Parallel(n_jobs=15)]: Done 3170 tasks      | elapsed:    1.2s
[Parallel(n_jobs=15)]: Done 4020 tasks      | elapsed:    1.6s
[Parallel(n_jobs=15)]: Done 4970 tasks      | elapsed:    2.0s
[Parallel(n_jobs=15)]: Done 6020 tasks      | elapsed:    2.4s
[Parallel(n_jobs=15)]: Done 7170 tasks      | elapsed:    2.9s
[Parallel(n_jobs=15)]: Done 8420 tasks      | elapsed:    3.3s
[Parallel(n_jobs=15)]: Done 9770 tasks      | elapsed:    3.9s
[Parallel(n_jobs=15)]: Done 11220 tasks      | elapsed:    4.4s
[Parallel(n_jobs=15)]: Done 12770 tasks      | elapsed:   
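
Once the parallel run finishes, `results` holds one return value per entry of `download_info`, in the same order. A minimal sketch for tallying them, assuming a hypothetical helper `summarize_results` that treats any value other than 1 as unconfirmed, with made-up sample inputs:

```python
def summarize_results(results, download_info):
    """Pair each return value with its entry and collect the unconfirmed ones."""
    failed = [info for info, r in zip(download_info, results) if r != 1]
    return {'ok': len(results) - len(failed), 'failed': failed}

# Made-up sample for illustration only
sample_info = [{'mp3': './mp3/1.mp3'}, {'mp3': './mp3/2.mp3'}, {'mp3': './mp3/3.mp3'}]
sample_results = [1, 0, 1]
summary = summarize_results(sample_results, sample_info)
```

The `failed` list can be passed back through `download_save` for another attempt without touching the files that already exist.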

# Take a Look!

Take a look at [Sir Ronald Fisher](https://en.wikipedia.org/wiki/Ronald_Fisher).