## <center> Data Combining Outline </center>
This outline brings together all the functions/scripts that were written to populate the final data frame. I will outline the general logic and each necessary step, and provide some skeleton code to get started. The ultimate goal is for the combining file to run on a machine without supervision. The outline will be updated after each meeting. **Please check out your own branch and work separately**.  

**Note**: the following discussion is broken down into two parts: multiple sources over a long period (MS+LP) and a single source over a short period (SS+SP). To simplify the problem for now, the main discussion will focus on SS+SP, which will gradually be extended to MS+LP.   

======================================================================================================================
### General Steps:
- [ ]  Sound source matching
- [ ]  Timestamp matching
- [ ]  Delay and significance calculation
- [ ]  Extract coordinates
- [ ]  Populate dataframe

======================================================================================================================
### Current Goal:
Formulate the data frame for a 1 min recording from 1:05 pm - 1:15 pm on March 25th, with a single stationary source.

======================================================================================================================

#### Sound Source Matching

The biggest challenge in data combining is sound source matching. Each array has 4 beams/channels, with each channel corresponding to a different sound source; the ODAS library can track and separate up to 4 different sources at the same time. So far we have seen two main problems: cross talk between channels within an array, and reflection. Cross talk happens when one sound source appears in multiple channels, as a result of a long silence or a failure of the ODAS sound source tracking (SST) algorithm. Reflection happens when the sound source is too close to a smooth surface: ODAS picks up sound from both the direct path and the reflected path, so two channels carry the same sound. 
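
Both problems show up as two channels of the same array carrying (nearly) the same signal, possibly delayed or attenuated. One possible way to flag such channels is to compare every channel pair within an array by the peak of their normalized cross correlation. The following is only a sketch under assumptions: the function names, the in-memory NumPy signals, and the `0.8` threshold are all hypothetical and would need tuning on real ODAS output.

```python
import itertools
import numpy as np

def peak_normalized_xcorr(a, b):
    """Peak absolute value of the normalized cross correlation of two signals."""
    a = a - np.mean(a)
    b = b - np.mean(b)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0:
        return 0.0  # a silent channel cannot match anything
    corr = np.correlate(a, b, mode="full")
    return float(np.max(np.abs(corr)) / denom)

def find_duplicate_channels(channels, threshold=0.8):
    """Return index pairs of channels that look like the same source.

    `channels` is a list of equal-length 1-D arrays from ONE array.
    `threshold` is an assumed cut-off, not a value taken from ODAS.
    """
    return [
        (i, j)
        for i, j in itertools.combinations(range(len(channels)), 2)
        if peak_normalized_xcorr(channels[i], channels[j]) >= threshold
    ]
```

Normalizing by the signal norms makes the score insensitive to attenuation, which matters for the reflection case where the second copy is usually quieter than the direct path.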

Since each raw recording file is approximately 5 mins long, there is a high chance that a channel within an array corresponds to more than one sound source over the course of the file. Calculating cross correlation pairs over the entire 5 min recording for sound source matching (SSM) would therefore not be reliable. To adapt to the volatile nature of the ODAS SST algorithm, we will implement local SSM using a windowing method with no overlap. 
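
The windowing idea can be sketched as follows: cut two channels into consecutive non-overlapping windows and cross correlate each window separately, so that a source switch in one window does not contaminate the others. This is a minimal sketch, not the final implementation; `windowed_lags`, the sample rate `fs`, and the assumption that both channels are already loaded as NumPy arrays are all hypothetical.

```python
import numpy as np

def windowed_lags(x, y, fs, window_s=1.0):
    """Cross correlate x and y in consecutive non-overlapping windows.

    Returns a list of (window_start_seconds, lag_samples, peak) tuples.
    A positive lag means y is delayed relative to x; `peak` is the
    normalized correlation at that lag.
    """
    n = int(round(window_s * fs))  # samples per window
    results = []
    for start in range(0, min(len(x), len(y)) - n + 1, n):
        xs = x[start:start + n] - np.mean(x[start:start + n])
        ys = y[start:start + n] - np.mean(y[start:start + n])
        corr = np.correlate(ys, xs, mode="full")
        # In "full" mode, index n - 1 corresponds to zero lag
        lag = int(np.argmax(corr)) - (n - 1)
        denom = np.linalg.norm(xs) * np.linalg.norm(ys)
        peak = float(np.max(corr) / denom) if denom > 0 else 0.0
        results.append((start / fs, lag, peak))
    return results
```

A low `peak` in a given window would suggest that the two channels are not tracking the same source during that window, which is the information local SSM needs.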

**MS+LP**: For a 1 min recording starting at 1:05pm, with a window length of 1s and 4 arrays (16 channels), the number of cross correlation calculations is 60s/1s * (16 choose 2) = 60 * 120 = 7200.

**SS+SP**: Assume only one stationary source is present and no reflection paths are recorded. Then each array should have only one valid channel corresponding to that source. Thus for a 1 min recording starting at 1:05pm, with a window length of 1s and 4 arrays (4 valid channels), the number of necessary cross correlation calculations is 60s/1s * (4 choose 2) = 60 * 6 = 360.
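
The two counts above can be checked with a few lines of Python. The helper name `num_cross_correlations` is made up for this illustration; it simply multiplies the number of non-overlapping windows by the number of unordered channel pairs.

```python
from math import comb

def num_cross_correlations(duration_s, window_s, n_channels):
    """Windows in the recording times the number of unordered channel pairs."""
    n_windows = int(duration_s // window_s)
    return n_windows * comb(n_channels, 2)

print(num_cross_correlations(60, 1, 16))  # MS+LP: 60 * 120 = 7200
print(num_cross_correlations(60, 1, 4))   # SS+SP: 60 * 6 = 360
```

The pair count grows quadratically with the number of channels, which is why starting with SS+SP keeps the first version 20 times cheaper than MS+LP.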

**Challenge**:

In [None]:
import pandas as pd

In [None]:
import itertools

WINDOW = 0.008      # window length in seconds used to group rows
FLUSH_SIZE = 50000  # write a CSV chunk once this many pairs have accumulated
OUT_DIR = "/Users/brian_wangst/Desktop/mlr/data/"

def source_matching(path):
    df = pd.read_csv(path)
    master_list = []
    split = df['Time In Seconds'].iloc[0] + WINDOW
    # Output file prefix: the part of the input name between "data/" and "hour"
    prefix = OUT_DIR + path[path.find("data/") + 5: path.find("hour") - 1]
    # Most recent row seen from each array in the current window (None = nothing yet)
    latest = {0: None, 1: None, 2: None, 3: None}
    counter = 0
    #Assuming only 1 channel for each array
    for _, row in df.iterrows():
        if row['Time In Seconds'] >= split:
            # Emit one entry per array pair: (0,1), (0,2), (0,3), (1,2), (1,3), (2,3)
            for a, b in itertools.combinations(range(4), 2):
                first, second = latest[a], latest[b]
                if first is None or second is None:
                    continue  # an array had no row in this window; skip the pair
                master_list.append({
                    'Column 1': first['Microphone Number'],
                    'Column 2': second['Microphone Number'],
                    'xyz 1': [first['X'], first['Y'], first['Z']],
                    'xyz 2': [second['X'], second['Y'], second['Z']],
                    'Grouping Time': split - WINDOW,
                })

            split = split + WINDOW
            latest = dict.fromkeys(latest)  # reset all arrays to None

            if len(master_list) >= FLUSH_SIZE:
                pd.DataFrame(master_list).to_csv(prefix + "_" + str(counter) + ".csv")
                master_list = []
                counter = counter + 1

        # Rows with Microphone Number 0, 1 or 2 go to that array; anything else to array 3
        mic = row['Microphone Number']
        latest[mic if mic in (0, 1, 2) else 3] = row

    # Write whatever is left after the last full chunk
    pd.DataFrame(master_list).to_csv(prefix + "_" + str(counter) + ".csv")



In [None]:
source_matching("/Users/brian_wangst/Desktop/mlr/data/recordingWednesday, March 25, 2020 01:05:31hour1.csv")

#### Timestamp Matching