## This is a user manual on how to use the ```match_audio_to_video``` package. 

Author : Thejasvi Beleyur
         Acoustic and Functional Ecology Group, 
         Max Planck Institute for Ornithology, Seewiesen

last updated 6 August 2019 

This package is released with a GPL License - please refer to file 'LICENSE' for more details.

###  Requirements:
This package is built and tested on Python 2.7.15 running on Ubuntu 18.04. It requires the following packages

numpy, pandas, opencv, pytesseract, PIL, skimage, scipy, soundfile 

pytesseract is only a wrapper around the main Tesseract OCR. You must [install Tesseract OCR](https://github.com/tesseract-ocr/tesseract/wiki) for it to work. Windows users please be aware that you must [put the folder](https://stackoverflow.com/a/43785789/4955732) in which the ```tesseract.exe``` file is found into the PATH environment variable. 


###  Low-cost audio-video synchronisation :
Many experiments and recording situations require audio and video devices that cannot directly interface with each other, and may also involve custom equipment. Such setups need a dedicated audio-video sync protocol. One option is to use [SMPTE timestamps](https://en.wikipedia.org/wiki/SMPTE_timecode) with a special device. However, recently [Laurijssen et al. 2017](https://jeb.biologists.org/content/221/4/jeb173724.abstract) published their low-cost methodology, which relies on a simple and common ON/OFF signal passed to both a light source and the audio device. The ON/OFF signal of the light source in the video is then matched to the ON/OFF signal recorded in the audio. 

### What this package does:
This package was developed to find the audio snippets that match user annotated segments in a video file. 
For example, if the relevant video footage is between 23:00:05 and 23:01:10, then the blinking light signal and the timestamps from the footage are used to find an audio segment of the same length. 


### What this package requires:
The package assumes the following user annotation, audio and video inputs:






<a id='video_input'></a>
* #### Video Files : Each frame of the video file must have a timestamp and blinking light visible for best results. The position of the timestamp and blinking light should not move within a single video file. 


![](img/eg_video_frame.jpg)
Figure 1 : An example frame from the video file. The timestamp can be user specified and is not limited to YYYY-mm-dd HH:MM:SS. The ON/OFF of the blinking light source must be clearly visible for best results. 

<a id='audio_input'></a>
* #### Audio File/s : Audio files must be in a commonly used format (wav, flac, etc.), and one of the channels must be solely dedicated to recording the audio sync signal. Not all soundcards can handle DC signals like the ON/OFF signal used to drive a light, so this package can also handle capacitor gated signals. In capacitor gated signals, a positive spike appears when the light goes on and a negative spike appears when the light goes off.

![](img/capacitor_gated.PNG)
Figure 2 : The upper channel is the capacitor gated audio and the lower channel is the blinking lights' normalised intensity. 


<a id='annotation_input'></a>
* #### User annotations : the user annotations are expected to have a start and end with the timestamps corresponding to the video frames in the file. Unless the video fps is 1 Hz, there will be multiple frames with the same timestamps, and thus the start_framenumber and the end_framenumber must also be specified. The user annotations must be a .csv file in the format shown below. The csv file may have even more columns - but these will not be accessed by the package.
![](img/eg_csv_file.PNG)
Figure 3 : An example annotation csv file

# The workflow : 

## Steps 1-3 : Read timestamps and light intensity in video 

Step 1: Annotate all relevant video segments into a csv file with the format described [here](#annotation_input)

Step 2 : Get the 'borders' of your blinking light and the timestamp in the video. Open the 'browse_through_video.py' file through a command prompt or on your IDE of choice. Here we will show the command prompt example as it's quicker. 

### What a border is and how to get it yourself:
The 'borders' are admittedly somewhat un-intuitive - they are a set of 4 numbers describing the number of pixels to be cropped to the left, above, right and below the region of interest. This is the format used by ImageOps.crop of the PIL package. 
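
As a minimal sketch of how such a border behaves, the snippet below crops a blank stand-in frame with `ImageOps.crop`. The frame size and the region of interest are made up for illustration, and `box_to_border` is a hypothetical helper (not part of this package) that converts the corner coordinates of a region into a border.

```python
from PIL import Image, ImageOps


def box_to_border(frame_size, box):
    """Convert a region of interest given as (x0, y0, x1, y1) into a
    (left, top, right, bottom) border.
    Hypothetical helper for illustration, not part of the package."""
    width, height = frame_size
    x0, y0, x1, y1 = box
    return (x0, y0, width - x1, height - y1)


# a blank stand-in for a video frame; the size is an assumption
frame = Image.new('RGB', (1280, 720))

# region of interest: x from 550 to 1210 and y from 50 to 120
border = box_to_border(frame.size, (550, 50, 1210, 120))
cropped = ImageOps.crop(frame, border)

print(border)        # pixels removed from the left, top, right and bottom
print(cropped.size)  # (width, height) of what remains
```

The cropped image should contain only the timestamp (or only the light). If it does not, adjust the four numbers and try again.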

### Using the *browse_through_video* module to get borders:
Move to the match_audio_to_video folder. 
Be sure to activate your virtual environment and type ```python browse_through_video.py -v <path to your video here> ```

![](img/eg_browse_through_video.png)

An interactive window will pop up to show the video  <DETAILED INSTRUCTIONS HERE!!!>


Step 3: Generate the relevant data from the video file using the *generate_data_from_video* module. This step can be time consuming as it runs Optical Character Recognition on the timestamps. Describe where the timestamp and light can be found by entering the number of pixels to be cropped to the left, above, right and below. {DESCRIBE THIS PART BETTER!!} 

The keyword arguments must be ```timestamp_border``` and ```led_border```. If the positions of the timestamp and the LED differ across videos, remember to generate the data from those videos in separate runs. 
If you would like to run the data generation for a limited part of a video, you can also specify the ```start_frame``` and/or ```end_frame```. 


```
import pandas as pd
from generate_data_from_video import generate_videodata_from_annotations

annotations = pd.read_csv('video1_and_2_annotations.csv')

kwargs= {}
kwargs['timestamp_border'] = (550, 50, 70, 990)
kwargs['led_border'] = (874, 1025, 45, 38)

# kwargs['end_frame'] = 7000 # only if you want to stop processing at the 7000th frame
# kwargs['start_frame'] = 500 # only if you want to start processing from the 500th frame

generate_videodata_from_annotations(annotations, **kwargs)
 
```

Running this will generate a set of csv files with the automatically read timestamps and light intensity signal for each frame. The files will be named with the following convention ```videosync_{VIDEO_FILENAME}_.csv```. 

Step 4: The auto-read timestamps may not always be perfectly detected. These timestamps must be verified manually before further processing can be done. The verified timestamps must be in the ```timestamp_verified``` column in the videosync csv file. Copy the contents of the ```timestamp``` column onto the ```timestamp_verified``` column first. 

Check if there are any misreads. In the event of a misread, or when in doubt, it's best to go back to that particular frame, check the timestamp directly, and make the change in the ```timestamp_verified``` column. 
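
The copy-and-check in Step 4 can be sketched with pandas as below. This is only an illustration under assumptions: the small DataFrame stands in for a real videosync csv file, and the timestamp format is assumed to be `%Y-%m-%d %H:%M:%S`. Rows that fail to parse are flagged for a manual look. Misreads that still parse as a valid timestamp will not be caught this way, so a manual check remains necessary.

```python
import pandas as pd

timestamp_pattern = '%Y-%m-%d %H:%M:%S'  # assumed format, change it to match your video

# stand-in for pd.read_csv() on one of the videosync csv files
videosync = pd.DataFrame({'timestamp': ['2019-08-06 23:00:05',
                                        '2019-08-06 23:00:05',
                                        '2O19-08-06 23:00:06',  # misread: letter O instead of 0
                                        '2019-08-06 23:00:06']})

# copy the auto-read timestamps into the column that will be verified
videosync['timestamp_verified'] = videosync['timestamp']

# timestamps that do not fit the pattern become NaT and are flagged
parsed = pd.to_datetime(videosync['timestamp_verified'],
                        format=timestamp_pattern, errors='coerce')
suspect_rows = videosync.index[parsed.isnull()].tolist()

print(suspect_rows)  # row numbers to check against the video frames
```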


###  What if the timestamps are misread and the light intensity is not being picked up well?

## TODOOOOOO : You can input custom

## Steps 5 - 6 : Check for missing frames, variable fps and process annotations

Step 5 : Load the annotations file and the corresponding video_sync_data from a single video file. For proper audio-video synchronisation it is important that a sufficiently long signal is used to cross-correlate with the audio signal. 
This means that even though the relevant annotation may be only a few frames long, the actual signal used for cross-correlation may be a few seconds long. This minimum duration is set by the ```min_durn``` keyword argument. According to Laurijssen et al., the shortest duration for a reliable match is the period that has at least 10 transitions. For example, if the ON/OFF pulse durations are drawn from a distribution between 0.5 and 2.0 seconds, then ```min_durn``` should be set to at least 20 seconds (10 transitions x 2.0 seconds in the worst case). In practice it may be better to set it a bit longer if you suspect the video sync signal is faint or unreliable. 
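
The rule of thumb above is simple arithmetic. The function below is a hypothetical helper written for this manual (it is not part of the package): it assumes the worst case, in which every pulse lasts as long as the longest possible pulse.

```python
def shortest_reliable_duration(max_pulse_durn, n_transitions=10):
    """Worst-case duration (in seconds) needed to see n_transitions ON/OFF
    transitions when every pulse lasts max_pulse_durn seconds.
    Hypothetical helper for illustration, not part of the package."""
    return n_transitions * max_pulse_durn


# pulse durations between 0.5 and 2.0 seconds, as in the example above
min_durn = shortest_reliable_duration(2.0)
print(min_durn)
```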

The ```video_sync_over_annotation_block``` function checks for the following issues, and corrects them or alerts the user:

* Variable fps : consumer-grade DVRs can show varying fps, perhaps due to frame drops. If this is detected, and the fps never falls below ```min_fps```, then the light intensity signal in every second is resampled to ```common_fps```.

*ATTENTION* Currently the fps in any given second is counted as the number of frames with the same timestamp - this may mess things up if the timestamp on the frame shows sub-second resolution! 

* Missing frames : sometimes a whole second can be missing from the timestamps within a single annotation, indicating major frame drops. Such an annotation is dropped and not processed any further. 

* Inconsistent annotations : if the start time of an annotation is later than its end time, an error is thrown. 
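
To make the first two checks concrete, here is a rough sketch of how they could be done with pandas. It is not the package's own code: the function names are hypothetical, the timestamps are made up, and the timestamp format is assumed to have one-second resolution.

```python
import pandas as pd


def frames_per_second(timestamps):
    """Count how many frames carry each timestamp (hypothetical helper)."""
    return pd.Series(timestamps).value_counts().sort_index()


def has_missing_seconds(timestamps, timestamp_pattern='%Y-%m-%d %H:%M:%S'):
    """True if consecutive timestamps are more than one second apart
    (hypothetical helper)."""
    unique_times = pd.to_datetime(pd.Series(timestamps).drop_duplicates(),
                                  format=timestamp_pattern).sort_values()
    gaps = unique_times.diff().dropna()
    return bool((gaps > pd.Timedelta(seconds=1)).any())


# made-up timestamps: 25 frames, then 22 frames, then a skipped second
timestamps = (['2019-08-06 23:00:05'] * 25 +
              ['2019-08-06 23:00:06'] * 22 +
              ['2019-08-06 23:00:08'] * 25)

fps = frames_per_second(timestamps)
variable_fps = fps.nunique() > 1       # the fps is not the same in every second
below_min_fps = bool((fps < 22).any()) # compare against a min_fps of 22 Hz
missing_seconds = has_missing_seconds(timestamps)

print(variable_fps, below_min_fps, missing_seconds)
```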


Step 6 :  Run the ```video_sync_over_annotation_block``` over each annotation. ```min_fps``` is the minimum fps that must be maintained every second in the sync block - else the annotation segment is not processed further. If varying fps is detected within a sync block - then all the frames are resampled to ```common_fps``` . ```min_durn``` is the minimum duration that is required for a reliable match. 
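
For intuition, resampling the light intensity of a single second to ```common_fps``` could look like the sketch below. This is an assumption-laden illustration using linear interpolation with numpy. The package may resample differently, and `resample_one_second` is a hypothetical name.

```python
import numpy as np


def resample_one_second(intensity, common_fps):
    """Linearly interpolate the light intensity values recorded within one
    second onto common_fps evenly spaced points.
    Hypothetical helper for illustration, not part of the package."""
    intensity = np.asarray(intensity, dtype=float)
    old_positions = np.linspace(0.0, 1.0, intensity.size, endpoint=False)
    new_positions = np.linspace(0.0, 1.0, common_fps, endpoint=False)
    return np.interp(new_positions, old_positions, intensity)


# a second in which only 22 frames were recorded, and the light came on halfway
one_second = np.concatenate((np.zeros(11), np.ones(11)))
resampled = resample_one_second(one_second, 25)

print(resampled.size)  # number of samples after resampling
```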

Optional Parameters 
## TODO 


```
import pandas as pd

from process_video_annotations import video_sync_over_annotation_block

annotations = pd.read_csv('video1_annotations.csv')
video_sync_raw = pd.read_csv('videosync_video1.avi_.csv')

kwargs = {'timestamp_pattern': '%Y-%m-%d %H:%M:%S'}
kwargs['min_fps']= 22 # Hz
kwargs['min_durn'] = 5.0 # seconds 
kwargs['common_fps'] = 25 # Hz

# apply the function to every row (annotation) of the annotations file
success = annotations.apply(video_sync_over_annotation_block, axis=1,
                            video_sync_data=video_sync_raw, **kwargs)
```
For every annotation which is successfully processed, a csv file is output with the following name pattern : *common_fps_video_sync{annotation_id}.csv*. You can see which annotations were successfully processed by looking at the Boolean entries in ```success```. 


##  Steps 7 - 8 : Begin the search for matching audio segments 

Step 7 : After generating a reliable video sync signal - we can now begin looking for a matching audio segment. 
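
The search rests on cross-correlating the video sync signal with the audio sync signal. The sketch below shows the idea on made-up ON/OFF signals that share the same sampling rate. It is not the package's implementation, and `find_best_offset` is a hypothetical name.

```python
import numpy as np
from scipy import signal


def find_best_offset(audio_onoff, video_onoff):
    """Return the index in audio_onoff at which video_onoff matches best.
    Both signals must have the same sampling rate, and the audio must be
    longer than the video. Hypothetical helper, not part of the package."""
    audio = np.asarray(audio_onoff, dtype=float)
    video = np.asarray(video_onoff, dtype=float)
    # remove the mean so that ON and OFF stretches both contribute to the match
    audio = audio - audio.mean()
    video = video - video.mean()
    cross_corr = signal.correlate(audio, video, mode='valid')
    return int(np.argmax(cross_corr))


# made-up video sync signal with irregular ON/OFF durations
video_onoff = np.repeat([1, 0, 1, 0, 1, 0], [13, 7, 20, 11, 5, 17])

# made-up audio sync signal containing the same pattern from sample 120 onwards
audio_onoff = np.zeros(500)
audio_onoff[120:120 + video_onoff.size] = video_onoff

best_offset = find_best_offset(audio_onoff, video_onoff)
print(best_offset)
```

The longer and more irregular the ON/OFF pattern, the less likely it is that a wrong stretch of audio gives a similarly high peak, which is why ```min_durn``` matters.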

Optional Parameters - add these to the keyword arguments

* audio sampling rate : the ```audio_fs``` keyword argument can be used to set the audio sampling rate in Hz. If not set manually the first file encountered in the ```audio_folder``` will be used to get this information. 

* audio file format : the ```audio_fileformat``` keyword argument can be used to search for a specific set of audio files. If not set - the format is assumed to be '.WAV'

* audio sync channel : the ```audiosync_channel``` keyword argument can be used to set which channel is taken to have the audio sync signal. Defaults to the last channel. Channel numbering starts from 0 ! 

* contiguous files : the ```contiguous_files``` keyword argument can be used to flag whether all the files in the ```audio_folder``` are contiguous, ie. the end of one file corresponds to the start of the next file. The files are sorted by filename and chunks of them are contiguously loaded. Defaults to True. ATTENTION : ```contiguous_files=False``` is **not** implemented in the current version. 

* spikey audio sync : whether the audio sync signal is capacitor gated or not. If it is capacitor gated, the ON/OFF signal looks 'spikey' in the waveform representation. See [Figure 2](#audio_input) for an example. An ON/OFF signal is derived from the spikes and then used to cross-correlate with the video sync signal. This is set by the ```audio_sync_spikey``` keyword argument. Defaults to True. 
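
To illustrate what deriving an ON/OFF signal from the spikes means, here is a minimal sketch. It is not the package's own code: `spikes_to_onoff` and its `threshold` are hypothetical, and it simply switches ON at every large positive spike and OFF at every large negative spike.

```python
import numpy as np
import pandas as pd


def spikes_to_onoff(sync_channel, threshold=0.5):
    """Turn a capacitor gated ('spikey') signal into an ON/OFF signal.
    The threshold is relative to the largest absolute sample value.
    Hypothetical helper for illustration, not part of the package."""
    sync_channel = np.asarray(sync_channel, dtype=float)
    peak = np.max(np.abs(sync_channel))
    if peak == 0:
        return np.zeros(sync_channel.size)
    normalised = sync_channel / peak
    # mark only the samples where the state changes, leave the rest undefined
    events = np.full(sync_channel.size, np.nan)
    events[normalised >= threshold] = 1.0   # positive spike : light goes on
    events[normalised <= -threshold] = 0.0  # negative spike : light goes off
    # hold the last known state; assume the light is off before the first spike
    return pd.Series(events).ffill().fillna(0.0).values


# made-up spikey signal: light goes on at sample 2 and off at sample 6
spikey = np.zeros(10)
spikey[2] = 0.8
spikey[6] = -0.8

onoff = spikes_to_onoff(spikey)
print(onoff)
```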


```
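import glob 

import pandas as pd
import soundfile as sf

from audio_for_videoannotation import match_video_sync_to_audio

all_commonfps = glob.glob('common_fps_video_sync*') # get all the relevant common_fps_sync files
audio_folder = 'test_data/'
kwargs = {}

### uncomment if it applies 
# kwargs['audio_fs'] = 250000 # audio sampling rate in Hz
# kwargs['audio_fileformat'] = '.WAV'
# kwargs['audiosync_channel'] = 0 # channel numbering starts from 0

# find the best matching audio for each video sync file and save it
for somenumber, each_commonfps in enumerate(all_commonfps):
    video_sync = pd.read_csv(each_commonfps)
    best_audio = match_video_sync_to_audio(video_sync, audio_folder,
                                           **kwargs)
    # 250000 is the audio sampling rate in Hz for this example - use your own
    sf.write('matching_sync_'+str(somenumber)+'.WAV', best_audio, 250000)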
```
The output of ```match_video_sync_to_audio``` is a numpy array with Nchannel + 1 channels. The extra channel is the time-aligned video sync signal corresponding to this audio segment. 

Step 8 : *CHECK* the audio-video match. The video sync signal and audio sync signal must go ON and OFF relatively synchronously. In a good match the correspondence is fairly obvious. A bad match is also pretty obvious, with only a few matching ON-OFFs and the rest of the segment off-track. If you suspect a bad match, redo the video sync signal, check the files in the ```audio_folder``` once more, and check whether they could have overlapped with the video at all using the modified time or similar file information. If the audio files definitely overlapped with the annotated video segment, increase the ```min_durn``` to get a more precise match. 
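
Looking at the waveforms remains the main check. As an optional aid, the fraction of samples in which two ON/OFF signals agree can be computed as sketched below. This is not part of the package: `onoff_agreement` is a hypothetical helper, the signals are made up, and a spikey audio sync channel would first have to be converted into an ON/OFF signal.

```python
import numpy as np


def binarise(sync_signal):
    """True wherever the signal is above the midpoint of its range."""
    sync_signal = np.asarray(sync_signal, dtype=float)
    midpoint = (sync_signal.min() + sync_signal.max()) / 2.0
    return sync_signal > midpoint


def onoff_agreement(audio_onoff, video_onoff):
    """Fraction of samples in which both signals are in the same state.
    Hypothetical helper for illustration, not part of the package."""
    return float(np.mean(binarise(audio_onoff) == binarise(video_onoff)))


# made-up ON/OFF signals
video_onoff = np.repeat([0, 1, 0, 1], [10, 10, 10, 10])
well_aligned = video_onoff.copy()
badly_aligned = np.roll(video_onoff, 5)

print(onoff_agreement(well_aligned, video_onoff))
print(onoff_agreement(badly_aligned, video_onoff))
```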


### Example AV matches : top channel is the audio sync recording and the bottom channel is the video sync recording. 

The examples below illustrate what happens when the same video sync signal is matched in the presence of the right audio files, and an arbitrary audio file. Focus on the spikey audio sync channel!

#### A good match : 
![goodmatch](img/good_AV_match.PNG)
Figure 4: A good audio-video match.

##### Interpreting the waveforms in the 'spikey' case :
The green lines show where the audio spikes up and the LED voltage is high, and the red lines show where the voltage dropped. The audio-video alignment is not sample-accurate, mainly because of the physical delays in the light going on, and probably also because of shifts caused by the resampling procedure whenever the frame rate varied. To see how good this is, compare it to the bad match below! 
![goodmatch](img/good_AV_match_labelled.PNG)


#### A bad match : 
![badmatch](img/bad_AV_match.PNG)
Figure 5: A bad audio-video match.

##### and now look at how bad the ON/OFF correspondence is :
The audio spikes do not correspond so well with the video sync signal -- and only do so at one or two places.
![](img/bad_AV_match_labelled.PNG)