# Analyze downloaded bird sound files 

## Introduction
The program analyze the bird song dataset downloaded with the file "AM - Download dataset.py" and checks the metadata itself.
Each folder saved have at least one file inside: json with metadata. 

> JSON is short for JavaScript Object Notation, and is a way to store information in an organized, easy-to-access manner. In a nutshell, it gives us a human-readable collection of data that we can access in a really logical manner. (source: https://www.copterlabs.com/json-what-it-is-how-it-works-how-to-use-it/)

If the algorithm found at least one file under searched term, those sound files are saved in the folder.

Script in the notebook returns:
- [x] Number of files for a given bird, 
- [x] Minimum, maximum, and average length of a file, 
- [ ] Number of sounds with more than 1 tag,
- [x] Number of sounds with specified quality (e.g. none - 460, A- 102 recordings, B - 230 recordigns, ...).

Next, it analyze the recordings and:
- [ ] Calculate the signal to noise ratio -> return min, max and average values, 
- [ ] Shows 3 "A" quality bird-songs spectograms, melgrams and sound waves,
- [ ] Shows 3 random bird-songs spectograms, melgrams and sound waves, 



In [34]:
import json
import pandas as pd
import sys
import os

In [35]:
from AM_downloadDataset import read_data

In [36]:
countries = ['Poland', 'Germany', 'Slovakia', 'Czech', 'Lithuania']

# make and initialize dictionary of a bird
bird = {
        'type':'Parus major',
        'country':'',
        'number of files': 
            {
            'total' : 0,
            'quality':{'A':0,
                        'B':0,
                        'C':0,
                        'D':0,
                        'E':0,
                        'F':0,
                      }
            },
        'length':{'min':0,
                      'max':0,
                      'mean':0,
                      'median':0
                      }
        }

pd.DataFrame(bird) 


Unnamed: 0,type,country,number of files,length
max,Parus major,,,0.0
mean,Parus major,,,0.0
median,Parus major,,,0.0
min,Parus major,,,0.0
quality,Parus major,,"{'A': 0, 'B': 0, 'C': 0, 'D': 0, 'E': 0, 'F': 0}",
total,Parus major,,0,


In [37]:
from mutagen.mp3 import MP3
from statistics import mean, median

lengthData = list()
audioLength = []
for country in range(len(countries)):
        # find the driectory with recordings
        bird['country']=bird['country']+' ' + countries[country]
        path = '../data/xeno-canto-dataset/' + bird['type'] + ' cnt '+ countries[country] + ' type song'
        print('Loading data from folder ' + path)
        
        # load info about the quality of the recording from json file
        qualityData = read_data('q', path)
        bird['number of files']['total']=bird['number of files']['total']+len(qualityData)
        for quality in bird['number of files']['quality']:
             bird['number of files']['quality'][quality]=bird['number of files']['quality'][quality]+qualityData.count(quality)
        
        # load MP3 file of every recording and check the length of a file
        idData = read_data('id', path)
        genData = read_data('gen', path)
        for audioFile in range(len(idData)):
            audioLength.append(MP3(path+'/'+genData[audioFile]+idData[audioFile]+'.mp3').info.length) 
                        
        lengthData = list(audioLength) + list(lengthData)

      
bird['length']['max']=max(lengthData)
bird['length']['min']=min(lengthData)
bird['length']['mean']=mean(lengthData) 
bird['length']['median']=median(lengthData)  

pd.DataFrame(bird)       


Loading data from folder ../data/xeno-canto-dataset/Parus major cnt Poland type song
Loading data from folder ../data/xeno-canto-dataset/Parus major cnt Germany type song
Loading data from folder ../data/xeno-canto-dataset/Parus major cnt Slovakia type song
Loading data from folder ../data/xeno-canto-dataset/Parus major cnt Czech type song
Loading data from folder ../data/xeno-canto-dataset/Parus major cnt Lithuania type song


Unnamed: 0,type,country,number of files,length
max,Parus major,Poland Germany Slovakia Czech Lithuania,,974.244375
mean,Parus major,Poland Germany Slovakia Czech Lithuania,,102.053273
median,Parus major,Poland Germany Slovakia Czech Lithuania,,58.004725
min,Parus major,Poland Germany Slovakia Czech Lithuania,,3.325813
quality,Parus major,Poland Germany Slovakia Czech Lithuania,"{'A': 169, 'B': 145, 'C': 56, 'D': 46, 'E': 1,...",
total,Parus major,Poland Germany Slovakia Czech Lithuania,418,
