## What this notebook will do
This notebook combines the split-measuring of the observed and virtual audio files. Until now, the split-measuring was done in separate notebooks, which left things a bit scattered. This is my attempt at getting it all into one place.

### Jump to sections:

1. Split-measuring 
    * [Observed audio split measuring](#observed-audio) 
    * [Virtual audio split measuring](#virtual-audio)
1. [Keeping only 50 ms windows](#only50ms)
1. [Choosing non-silent windows](#loudwindows)
1. [Assigning number of bats to each annotation](#assignbatnum)
1. [Saving non-silent window measurements](#savingdata)


Some of the things I'll make sure to include are:

* Removal of the last audio window's measurements. The last audio window of a file is very likely to be <50 ms long. I had originally set a minimum duration of 45-50 ms to maximise the number of windows included in the analysis but, after discussions with N, it turned out that this just added to the variation in spectral and temporal resolution across windows. To simplify things, we decided it's best to get rid of windows that are <50 ms long. Either way, excluding the <50 ms windows is unlikely to noticeably affect sample sizes.
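
To make the resolution argument concrete: when a spectrum is computed over a whole window with an FFT, the spacing between frequency bins is the sampling rate divided by the number of samples in the window. The sketch below assumes the 250 kHz sampling rate set later in this notebook and a plain `np.fft.rfft` over the full window. `bin_spacing_hz` is a hypothetical helper written for this illustration, not part of `measure_annot_audio`.

```python
import numpy as np

fs = 250000  # sampling rate in Hz, as set later in this notebook


def bin_spacing_hz(window_duration_s, fs):
    """Spacing between FFT bins (Hz) for a window of the given duration."""
    n_samples = int(round(window_duration_s * fs))
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    return freqs[1] - freqs[0]


# Compare a full 50 ms window with shorter 'last' windows.
for duration_ms in (50, 47, 45):
    spacing = bin_spacing_hz(duration_ms / 1000.0, fs)
    print(f"{duration_ms} ms window: {spacing:.2f} Hz between bins")
```

A full 50 ms window has 12500 samples and so a 20 Hz bin spacing, while shorter windows have coarser spacing. Keeping only full-length windows therefore keeps the spectral measurements on a common frequency grid.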


Original date of notebook creation : 2020-11-21

Author: Thejasvi Beleyur

In [1]:
import datetime as dt
import glob
import sys 
sys.path.append('../../correct_call_annotations/')
sys.path.append('../')
sys.path.append('../../individual_call_analysis/analysis/')

import correct_call_annotations.correct_call_annotations as cca
import format_and_clean
from format_and_clean import ind_call_format as icf
import measure_annot_audio

from measure_annot_audio import split_measure
from measure_annot_audio.inbuilt_measurement_functions import dB
import os
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import soundfile as sf
import tqdm

In [2]:
%matplotlib notebook

In [3]:
print(f'Notebook started at: {dt.datetime.now()}')

Notebook started at: 2020-12-17 15:16:40.586850


<a id='observed-audio'></a>

## Observed audio split-measuring

In [4]:
verif_annots = pd.read_csv('../verified_annotations.csv').reset_index(drop=True) # manually re-checked set of audio files.
# See ../choosing valid annotation audio files.ipynb

verif_annots

Unnamed: 0.1,Unnamed: 0,valid_annotations,video_annot_id,num_bats
0,0,matching_annotaudio_Aditya_2018-08-16_21502300_18,Aditya_2018-08-16_21502300_18,1
1,1,matching_annotaudio_Aditya_2018-08-16_21502300_20,Aditya_2018-08-16_21502300_20,1
2,2,matching_annotaudio_Aditya_2018-08-16_21502300_22,Aditya_2018-08-16_21502300_22,1
3,3,matching_annotaudio_Aditya_2018-08-16_21502300_24,Aditya_2018-08-16_21502300_24,1
4,4,matching_annotaudio_Aditya_2018-08-16_21502300_25,Aditya_2018-08-16_21502300_25,1
...,...,...,...,...
409,430,matching_annotaudio_Aditya_2018-08-20_0300-040...,Aditya_2018-08-20_0300-0400_84,1
410,431,matching_annotaudio_Aditya_2018-08-20_0300-040...,Aditya_2018-08-20_0300-0400_85,1
411,432,matching_annotaudio_Aditya_2018-08-20_0300-040...,Aditya_2018-08-20_0300-0400_90,1
412,433,matching_annotaudio_Aditya_2018-08-20_0300-040...,Aditya_2018-08-20_0300-0400_91,1


In [5]:
all_measures = []

In [6]:
fs = 250000
kwargs = { 'inter_peak_difference':250, 
           'spectrum_smoothing_width': 100,
           'peak_range': 14,
           'fs':fs,
           'db_range':46,
           'overlap':0.5,
           'threshold':95,
           'nperseg':512,
           'fm_freqs_condition' : lambda X : X<98000,
           'non_signal_freqs_condition' : lambda X: X<70000
          
            }

In [7]:
audio_folder = '../../individual_call_analysis/hp_annotation_audio/'

for i, row in tqdm.tqdm(verif_annots.iterrows()):
    # locate the '_hp' audio file that belongs to this annotation
    file_path = cca.find_file_in_folder(row['valid_annotations']+'_hp.WAV', audio_folder)
    # split the audio into windows and measure each one
    measurements = split_measure.split_measure_audio(file_path[0],
                                  **kwargs)
    all_measures.append(measurements)


  power_spectrum_audio = 20*np.log10(np.abs(np.fft.rfft(audio)))


414it [03:21,  2.05it/s]

Match found!





In [8]:
all_split_measure = pd.concat(all_measures).reset_index(drop=True)

In [9]:
all_split_measure

Unnamed: 0,value,segment_number,measurement,file_name
0,0.045400,0,rms,matching_annotaudio_Aditya_2018-08-16_21502300...
1,0.118805,0,peak_amplitude,matching_annotaudio_Aditya_2018-08-16_21502300...
2,105660.000000,0,dominant_frequencies,matching_annotaudio_Aditya_2018-08-16_21502300...
3,87402.343750,0,fm_terminal_freqs,matching_annotaudio_Aditya_2018-08-16_21502300...
4,85449.218750,0,fm_terminal_freqs,matching_annotaudio_Aditya_2018-08-16_21502300...
...,...,...,...,...
402688,103600.000000,12,dominant_frequencies,matching_annotaudio_Aditya_2018-08-20_0300-040...
402689,103840.000000,12,dominant_frequencies,matching_annotaudio_Aditya_2018-08-20_0300-040...
402690,104540.000000,12,dominant_frequencies,matching_annotaudio_Aditya_2018-08-20_0300-040...
402691,105100.000000,12,dominant_frequencies,matching_annotaudio_Aditya_2018-08-20_0300-040...


In [10]:
all_split_measure['unique_window_id'] = all_split_measure['segment_number'].astype(str) +'_'+all_split_measure['file_name'] 

<a id='virtual-audio'></a>
## Virtual audio split-measuring 

In [11]:
audio_source_folder = '../virtual_multi_bat_audio/'
virtual_multibat_to_measure = glob.glob(audio_source_folder + '*.WAV')

In [12]:
all_virt_measures = []
for each in tqdm.tqdm(virtual_multibat_to_measure):

    measurements = split_measure.split_measure_audio(each,
                                  **kwargs)
    all_virt_measures.append(measurements)


100%|██████████████████████████████████████████████████████████████████████████████████| 89/89 [00:30<00:00,  2.90it/s]




In [13]:
all_virtual_splitmeasure = pd.concat(all_virt_measures).reset_index(drop=True)

all_virtual_splitmeasure['unique_window_id'] = all_virtual_splitmeasure['segment_number'].astype(str) +'_'+all_virtual_splitmeasure['file_name'] 


In [14]:
virt_rms_values = all_virtual_splitmeasure[all_virtual_splitmeasure['measurement']=='rms']
obs_rms_values = all_split_measure[all_split_measure['measurement']=='rms']
plt.figure()
plt.violinplot([dB(obs_rms_values['value']) , dB(virt_rms_values['value'])], quantiles=[[0.025,0.5,0.975]]*2)
plt.title('Window intensities')
plt.xticks([1,2],['Observed\nwindows','Virtual\nwindows'], fontsize=10)
plt.ylabel('Received level, dB rms', fontsize=12)


Text(0, 0.5, 'Received level, dB rms')

Both observed and virtual audio windows span a wide range of levels. We now need to filter out the windows that are too quiet: these are unlikely to contain calls, or contain only very faint ones. The threshold is set 20 dB above the level of manually chosen silent audio segments. 

<a id='only50ms'></a>
## Keeping only 50 ms windows 
This is done by removing the last measured window from each audio file. The last window of a file can be anywhere between 47 and 50 ms long because of the 90% allowance built into the ```split_measure.split_audio``` function. Files with only one measured window are kept as they are. 

In [15]:
# group the observed-audio measurements by audio file
byfilename = all_split_measure.groupby('file_name')

only50ms_windows_allfiles = []
for filename, df in byfilename:
    segment_numbers = np.sort(np.unique(df['segment_number']))
    
    if len(segment_numbers)>1:
        # drop the last window, which may be shorter than 50 ms
        only50ms_windows = df[df['segment_number']!=segment_numbers[-1]]
        only50ms_windows_allfiles.append(only50ms_windows)
    else:
        # files with a single window are kept as they are
        only50ms_windows_allfiles.append(df)
obs_50ms_splitmeasure = pd.concat(only50ms_windows_allfiles).reset_index(drop=True)

In [16]:
# same procedure for the virtual-audio measurements
virt_byfilename = all_virtual_splitmeasure.groupby('file_name')

virt_only50ms_windows_allfiles = []
for filename, df in virt_byfilename:
    segment_numbers = np.sort(np.unique(df['segment_number']))
    
    if len(segment_numbers)>1:
        # drop the last window, which may be shorter than 50 ms
        only50ms_windows = df[df['segment_number']!=segment_numbers[-1]]
        virt_only50ms_windows_allfiles.append(only50ms_windows)
    else:
        # files with a single window are kept as they are
        virt_only50ms_windows_allfiles.append(df)

virt_50ms_splitmeasure = pd.concat(virt_only50ms_windows_allfiles).reset_index(drop=True)
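
The two cells above repeat the same loop for the observed and virtual data. As a rough sketch only, the same rule (drop the last window of a file unless it is the file's only window) could be written once with a grouped `transform`. The helper name `drop_last_window` and the toy data are made up for illustration, and unlike the loops above this version keeps the original row order instead of sorting by file name.

```python
import pandas as pd


def drop_last_window(measure_df):
    """Drop the last window of every file that has more than one window.

    Hypothetical helper: expects the 'file_name' and 'segment_number'
    columns used in this notebook.
    """
    segments = measure_df.groupby('file_name')['segment_number']
    last_segment = segments.transform('max')       # last window number per file
    num_segments = segments.transform('nunique')   # number of windows per file
    keep = (measure_df['segment_number'] != last_segment) | (num_segments == 1)
    return measure_df[keep].reset_index(drop=True)


# toy long-format data: 'a.WAV' has 3 windows, 'b.WAV' has only 1
toy = pd.DataFrame({
    'file_name': ['a.WAV'] * 6 + ['b.WAV'] * 2,
    'segment_number': [0, 0, 1, 1, 2, 2, 0, 0],
    'measurement': ['rms', 'peak_amplitude'] * 4,
    'value': [0.02, 0.06, 0.03, 0.07, 0.01, 0.02, 0.05, 0.09],
})
trimmed = drop_last_window(toy)
print(trimmed)
```

In the toy data, window 2 of `a.WAV` is removed while the single window of `b.WAV` is kept.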

<a id='loudwindows'></a>
##  Choosing non-silent windows

Let's load the silent audio +20 dB thresholds to choose only those windows that are loud. 

In [17]:
plus20dB_threshold = pd.read_csv('../call_threshold_levels.csv')
plus20dB_threshold

Unnamed: 0.1,Unnamed: 0,call_dbpeak_threshold,call_dbrms_threshold
0,0,-23.0,-37.0


In [18]:
def choose_all_windows_above_threshold(measurement_df, threshold):
    '''
    Chooses all windows with rms greater than or equal to the given rms threshold.
    All measurements of a chosen window are kept, not only its rms row. 
    
    Parameters
    ----------
    measurement_df : pd.DataFrame
        A long dataframe with one measurement value per row
        Must have the columns 'measurement', 'value', 'unique_window_id'
        The 'measurement' column must also have some entries with 'rms' in them. 
    threshold : float>0
        The rms threshold (in linear units, not dB) to define windows which are not silent. 
    
    Returns 
    -------
    all_above_threshold : pd.DataFrame
        The dataframe with all segments across different files whose rms is greater than or equal to the 
        chosen threshold rms. 
    '''

    rms_values = measurement_df[measurement_df['measurement']=='rms'].reset_index(drop=True)
    above_threshold = rms_values[rms_values['value']>=threshold]
    windowids_above_threshold = above_threshold['unique_window_id']
    all_above_threshold = measurement_df[measurement_df['unique_window_id'].isin(windowids_above_threshold)].reset_index(drop=True)
    return all_above_threshold


In [19]:
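# convert the dB rms threshold into linear rms units (amplitude convention: 20*log10)
# .iloc[0] picks the single threshold value out of the one-row table
threshold_20db = float(10**(plus20dB_threshold['call_dbrms_threshold'].iloc[0]/20.0))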
threshold_20db

0.01412537544622754
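
As a quick check of the conversion, here is a minimal sketch of going from dB to linear amplitude and back. It assumes the amplitude convention (20·log10), which is what the `/20.0` in the cell above implies; the function names `db_to_linear` and `linear_to_db` are made up here and are not part of `measure_annot_audio`.

```python
import numpy as np


def db_to_linear(level_db):
    """Convert a level in dB into a linear amplitude (amplitude convention)."""
    return 10 ** (level_db / 20.0)


def linear_to_db(amplitude):
    """Convert a linear amplitude into dB (amplitude convention)."""
    return 20 * np.log10(amplitude)


threshold_linear = db_to_linear(-37.0)       # the dB rms threshold used above
roundtrip_db = linear_to_db(threshold_linear)
plus20db_factor = db_to_linear(20.0)         # +20 dB is a factor of 10 in amplitude

print(threshold_linear, roundtrip_db, plus20db_factor)
```

Under this convention, a threshold 20 dB above the silent level corresponds to an rms ten times that of the silent segments.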

In [20]:
non_silent_observedaudio_measures = choose_all_windows_above_threshold(obs_50ms_splitmeasure, threshold_20db)
non_silent_virtualaudio_measures = choose_all_windows_above_threshold(virt_50ms_splitmeasure, threshold_20db)
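
To make the long-format filtering concrete, here is a toy sketch that mirrors the logic of `choose_all_windows_above_threshold`. The helper name `select_loud_windows`, the window ids and the values are all made up. The point is that the decision is taken on the 'rms' rows only, and every other measurement of a loud window comes along with it.

```python
import pandas as pd


def select_loud_windows(measurement_df, threshold):
    """Keep every row of the windows whose rms is >= threshold (linear units)."""
    rms_rows = measurement_df[measurement_df['measurement'] == 'rms']
    loud_ids = rms_rows.loc[rms_rows['value'] >= threshold, 'unique_window_id']
    is_loud = measurement_df['unique_window_id'].isin(loud_ids)
    return measurement_df[is_loud].reset_index(drop=True)


# two toy windows, each with an rms row and a dominant frequency row
toy = pd.DataFrame({
    'unique_window_id': ['0_a.WAV', '0_a.WAV', '1_a.WAV', '1_a.WAV'],
    'measurement': ['rms', 'dominant_frequencies'] * 2,
    'value': [0.022, 105040.0, 0.004, 98000.0],
})

# -37 dB rms expressed in linear units
loud = select_loud_windows(toy, threshold=10 ** (-37 / 20.0))
print(loud)
```

Only window `0_a.WAV` passes the threshold, and both of its rows are kept.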


In [21]:
rms_df = non_silent_observedaudio_measures[non_silent_observedaudio_measures['measurement']=='rms']
virt_rmsdf = non_silent_virtualaudio_measures[non_silent_virtualaudio_measures['measurement']=='rms']

plt.figure()
plt.violinplot([dB(rms_df['value']), dB(virt_rmsdf['value'])], showextrema=False, 
               quantiles=[[0.025,0.5,0.975]]*2)
plt.xticks([1,2],['Observed windows','Virtual windows'])
plt.ylabel('Non-silent window received level, dB rms', fontsize=12)


Text(0, 0.5, 'Non-silent window received level, dB rms')

<a id='assignbatnum'></a>
## Assigning number of bats to each annotation

In [22]:
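# observed audio bat number assigning

# the annotation id is the file name without the 'matching_annotaudio_' prefix and the '_hp.WAV' suffix
# (anchored regex replacements are used because str.lstrip/str.rstrip remove sets of characters, not substrings;
#  this assumes every file name starts and ends with exactly this prefix and suffix)
non_silent_observedaudio_measures['video_annot_id'] = non_silent_observedaudio_measures['file_name'].str.replace(r'^matching_annotaudio_', '', regex=True)
non_silent_observedaudio_measures['video_annot_id'] = non_silent_observedaudio_measures['video_annot_id'].str.replace(r'_hp\.WAV$', '', regex=True)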

# match the annotation id to the number of bats 
video_annot_folder = '../../whole_data_analysis/annotations/corrected_HBC_video_annotations_Aditya/'
non_silent_observedaudio_measures['num_bats'] = icf.get_numbats_from_annotation_id(non_silent_observedaudio_measures['video_annot_id'], video_annot_folder)
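
Getting the annotation id out of a file name amounts to dropping a fixed prefix and a fixed suffix. One thing to keep in mind is that `str.lstrip` and `str.rstrip` remove any run of the given *characters*, not the given substring, so they only give the right answer when the id itself does not start or end with one of those characters. Below is a small sketch comparing the two approaches. The file names are illustrative (built from the prefix and suffix used in this notebook) and the helper `annotation_id` is hypothetical.

```python
import re

import pandas as pd

# illustrative file names, not taken from the dataset
file_names = pd.Series([
    'matching_annotaudio_Aditya_2018-08-16_21502300_100_hp.WAV',
    'matching_annotaudio_demo_12_hp.WAV',
])


def annotation_id(names, prefix='matching_annotaudio_', suffix='_hp.WAV'):
    """Remove an exact prefix and an exact suffix from each file name."""
    without_prefix = names.str.replace('^' + re.escape(prefix), '', regex=True)
    return without_prefix.str.replace(re.escape(suffix) + '$', '', regex=True)


by_regex = annotation_id(file_names)
# character-set stripping: fine for the first name, eats the 'd' of 'demo' in the second
by_strip = file_names.str.lstrip('matching_annotaudio_').str.rstrip('_hp.WAV')

print(by_regex.tolist())
print(by_strip.tolist())
```

Ids that start with an uppercase letter and end with a digit come out the same either way, since those characters are not in the stripped sets.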

In [23]:
non_silent_observedaudio_measures

Unnamed: 0,value,segment_number,measurement,file_name,unique_window_id,video_annot_id,num_bats
0,0.022077,0,rms,matching_annotaudio_Aditya_2018-08-16_21502300...,0_matching_annotaudio_Aditya_2018-08-16_215023...,Aditya_2018-08-16_21502300_100,1
1,0.062134,0,peak_amplitude,matching_annotaudio_Aditya_2018-08-16_21502300...,0_matching_annotaudio_Aditya_2018-08-16_215023...,Aditya_2018-08-16_21502300_100,1
2,105040.000000,0,dominant_frequencies,matching_annotaudio_Aditya_2018-08-16_21502300...,0_matching_annotaudio_Aditya_2018-08-16_215023...,Aditya_2018-08-16_21502300_100,1
3,90332.031250,0,fm_terminal_freqs,matching_annotaudio_Aditya_2018-08-16_21502300...,0_matching_annotaudio_Aditya_2018-08-16_215023...,Aditya_2018-08-16_21502300_100,1
4,89843.750000,0,fm_terminal_freqs,matching_annotaudio_Aditya_2018-08-16_21502300...,0_matching_annotaudio_Aditya_2018-08-16_215023...,Aditya_2018-08-16_21502300_100,1
...,...,...,...,...,...,...,...
17933,0.044647,3,peak_amplitude,matching_annotaudio_Aditya_2018-08-20_0300-040...,3_matching_annotaudio_Aditya_2018-08-20_0300-0...,Aditya_2018-08-20_0300-0400_91,1
17934,105200.000000,3,dominant_frequencies,matching_annotaudio_Aditya_2018-08-20_0300-040...,3_matching_annotaudio_Aditya_2018-08-20_0300-0...,Aditya_2018-08-20_0300-0400_91,1
17935,87402.343750,3,fm_terminal_freqs,matching_annotaudio_Aditya_2018-08-20_0300-040...,3_matching_annotaudio_Aditya_2018-08-20_0300-0...,Aditya_2018-08-20_0300-0400_91,1
17936,85449.218750,3,fm_terminal_freqs,matching_annotaudio_Aditya_2018-08-20_0300-040...,3_matching_annotaudio_Aditya_2018-08-20_0300-0...,Aditya_2018-08-20_0300-0400_91,1


In [24]:
# report the number of annotations (audio files) for each group size
print('Number of observed annotations by group size')
obs_by_batnum = non_silent_observedaudio_measures.groupby(['num_bats'])
for batnum, df in obs_by_batnum:
    print(batnum, np.unique(df['file_name']).size)

Number of observed annotations by group size
1 236
2 71
3 18
4 6
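
The same kind of count can also be obtained without an explicit loop. A minimal sketch on made-up data, using `groupby(...).nunique()` to count the unique files per group size:

```python
import pandas as pd

# toy long-format table: several measurement rows per file
toy = pd.DataFrame({
    'file_name': ['a.WAV', 'a.WAV', 'b.WAV', 'c.WAV', 'c.WAV'],
    'num_bats': [1, 1, 1, 2, 2],
})

# number of unique audio files for each group size
files_per_groupsize = toy.groupby('num_bats')['file_name'].nunique()
print(files_per_groupsize)
```

The result is a Series indexed by `num_bats`, which is convenient if the counts need to be reused later rather than only printed.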


In [25]:
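# virtual audio bat number assigning
# the annotation id is the file name without the 'matching_annotaudio_' prefix and the '_hp_singlebatmixed.WAV' suffix
# (anchored regex replacements are used because str.lstrip/str.rstrip remove sets of characters, not substrings;
#  this assumes every file name starts and ends with exactly this prefix and suffix)
non_silent_virtualaudio_measures['video_annot_id'] = non_silent_virtualaudio_measures['file_name'].str.replace(r'^matching_annotaudio_', '', regex=True)
non_silent_virtualaudio_measures['video_annot_id'] = non_silent_virtualaudio_measures['video_annot_id'].str.replace(r'_hp_singlebatmixed\.WAV$', '', regex=True)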
video_annot_folder = '../../whole_data_analysis/annotations/corrected_HBC_video_annotations_Aditya/'
non_silent_virtualaudio_measures['num_bats'] = icf.get_numbats_from_annotation_id(non_silent_virtualaudio_measures['video_annot_id'], video_annot_folder)


In [26]:
# report the number of annotations (audio files) for each group size
print('Number of virtual annotations by group size')
virt_by_batnum = non_silent_virtualaudio_measures.groupby(['num_bats'])
for batnum, df in virt_by_batnum:
    print(batnum, np.unique(df['file_name']).size)

Number of virtual annotations by group size
2 65
3 14
4 4


<a id='savingdata'></a>
## Saving non-silent window measurements

In [27]:
# observed audio data 
non_silent_observedaudio_measures.to_csv('obs_nonsilent_measurements_20dBthreshold.csv')

# virtual audio data 
non_silent_virtualaudio_measures.to_csv('virt_nonsilent_measurement_20dBthreshold.csv')
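
A note on the saved files: `to_csv` writes the row index as an unnamed first column by default, and `read_csv` then brings it back as a column called `Unnamed: 0` (columns of this kind are visible in the threshold table loaded earlier). The sketch below shows the difference that `index=False` makes, using an in-memory buffer and a made-up one-row table. Whether later notebooks expect the index column is not something this sketch can tell, so it is only an illustration.

```python
import io

import pandas as pd

toy = pd.DataFrame({'call_dbrms_threshold': [-37.0]})

# default behaviour: the index is written as an unnamed first column
with_index = io.StringIO()
toy.to_csv(with_index)
with_index.seek(0)
cols_with_index = pd.read_csv(with_index).columns.tolist()

# index=False: only the data columns are written
without_index = io.StringIO()
toy.to_csv(without_index, index=False)
without_index.seek(0)
cols_without_index = pd.read_csv(without_index).columns.tolist()

print(cols_with_index)
print(cols_without_index)
```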

In [28]:
print(f'Notebook run ended at : {dt.datetime.now()}')

Notebook run ended at : 2020-12-17 15:20:58.546331
