## <center> Data Combining Outline </center>
This outline is created to help put together all functions/scripts that were written to populate the final data frame. I will outline the general logic and each necessary step, as well as create some skeleton code to get started. The ultimate goal is to make the combining file run on machine without supervisino. The outline be updated everytime after each meeting. **Please check out your own branch and work separately**.  

**Note**: the following discussion will be broken down into two parts. Multi sources in long period (MS+LP) and single source in short period (SS+SP). To simplify the problem for now, the main discussion will focus on SS+SP, which will be gradually extended to MS+LP.   

======================================================================================================================
### General Steps:
- [ ]  Sound source matching
- [ ]  Timestamp matching
- [ ]  Delay and significance calculation
- [ ]  Extract coordinates
- [ ]  Populate dataframe

======================================================================================================================
### Current Goal:
Formulate data frame for 1 min recording from 1:05 pm - 1:15pm, on March 25th. Single stationary source.

======================================================================================================================

#### Sound Source Matching

The biggest challenge for data combining is sound source matching. Each array has 4 beams/channels, with each channel corresponding to a different sound source. ODAS library is able to track and separate up to 4 different sources at the same time. Currently, we have seen two main problems: cross talk between channels within an array and reflection. The first problem happens when one sound source appears in multiple channels as a result of long silence or ODAS sound source tracking (SST) algorithm failure. The second problem happens when the sound source is too close to a smooth surface and ODAS is picking up sound from direct path and reflection path, and therefore two channels have the same sound. 

Since each recording raw file is approximatly 5 mins long, there is a high chance that each channel within an array corresponds to more than one sound source. Calculating cross corrleation pairs over the entire 5 mins recording for sound source matching (SSM) will not be reliable. Therefore, to adapt to the volatile nature of ODAS SST algorithm, we will implement local SSM, using windowing method with no overlap. 

**MS+LP**: For 1 min recording starting at 1:05pm, given a window length of 1s and 4 arrays (16 channels). The number of cross correlation calculation is 60s/1s * (16 pick 2).

**SS+SP**: Assume only one stationary source is present and no reflection paths are recorded. Then each array should only have one valid channel corresponding to that source. Thus for 1 min recording starting at 1:05pm, given a window length of 1s and 4 arrays (4 valid channels). The number of necessary cross correlation calculation is 60s/1s * (4 pick 2).

**Challenge**:

In [19]:
import pandas as pd
import glob as glob

In [20]:
def source_matching(path, number):
    df = pd.read_csv(path)
    master_list = []
    start_time = df['Time In Seconds'].iloc[0]
    split = start_time + 0.008
    end_time = df['Time In Seconds'].iloc[-1]
    temp = df.iloc[0]
    temp['X'] = None
    temp['Y'] = None
    temp['Z'] = None
    temp['Microphone Number'] = None
    array_0 = temp 
    array_1 = temp
    array_2 = temp
    array_3 = temp
    counter = 0
    #Assuming only 1 channel for each array
    for row in df.iterrows():
        if row[1]['Time In Seconds'] >= split:
            dic = {}
            dic['0'] = array_0['X']
            dic['1'] = array_0['Y']
            dic['2'] = array_0['Z']
            dic['3'] = array_1['X']
            dic['4'] = array_1['Y']
            dic['5'] = array_1['Z']
            dic['6'] = array_2['X']
            dic['7'] = array_2['Y']
            dic['8'] = array_2['Z']
            dic['9'] = array_3['X']
            dic['10'] = array_3['Y']
            dic['11'] = array_3['Z']
            master_list.append(dic)

            dic = {}
            dic['0'] = array_0['X']
            dic['1'] = array_0['Y']
            dic['2'] = array_0['Z']
            dic['3'] = array_1['X']
            dic['4'] = array_1['Y']
            dic['5'] = array_1['Z']
            dic['6'] = array_2['X']
            dic['7'] = array_2['Y']
            dic['8'] = array_2['Z']
            dic['9'] = array_3['X']
            dic['10'] = array_3['Y']
            dic['11'] = array_3['Z']
            master_list.append(dic)

            dic = {}
            dic['0'] = array_0['X']
            dic['1'] = array_0['Y']
            dic['2'] = array_0['Z']
            dic['3'] = array_1['X']
            dic['4'] = array_1['Y']
            dic['5'] = array_1['Z']
            dic['6'] = array_2['X']
            dic['7'] = array_2['Y']
            dic['8'] = array_2['Z']
            dic['9'] = array_3['X']
            dic['10'] = array_3['Y']
            dic['11'] = array_3['Z']
            master_list.append(dic)

            dic = {}
            dic['0'] = array_0['X']
            dic['1'] = array_0['Y']
            dic['2'] = array_0['Z']
            dic['3'] = array_1['X']
            dic['4'] = array_1['Y']
            dic['5'] = array_1['Z']
            dic['6'] = array_2['X']
            dic['7'] = array_2['Y']
            dic['8'] = array_2['Z']
            dic['9'] = array_3['X']
            dic['10'] = array_3['Y']
            dic['11'] = array_3['Z']
            master_list.append(dic)

            dic = {}
            dic['0'] = array_0['X']
            dic['1'] = array_0['Y']
            dic['2'] = array_0['Z']
            dic['3'] = array_1['X']
            dic['4'] = array_1['Y']
            dic['5'] = array_1['Z']
            dic['6'] = array_2['X']
            dic['7'] = array_2['Y']
            dic['8'] = array_2['Z']
            dic['9'] = array_3['X']
            dic['10'] = array_3['Y']
            dic['11'] = array_3['Z']
            master_list.append(dic)

            dic = {}
            dic['0'] = array_0['X']
            dic['1'] = array_0['Y']
            dic['2'] = array_0['Z']
            dic['3'] = array_1['X']
            dic['4'] = array_1['Y']
            dic['5'] = array_1['Z']
            dic['6'] = array_2['X']
            dic['7'] = array_2['Y']
            dic['8'] = array_2['Z']
            dic['9'] = array_3['X']
            dic['10'] = array_3['Y']
            dic['11'] = array_3['Z']
            master_list.append(dic)

            split = split + 0.008

            filler = row[1]
            filler['X'] = None
            filler['Y'] = None
            filler['Z'] = None
            filler['Microphone Number'] = None
            array_0 = filler
            array_1 = filler
            array_2 = filler
            array_3 = filler

            if len(master_list) >= 500000:
                output = pd.DataFrame(master_list)
                #output.to_csv(path_or_buf= "/Users/brian_wangst/Desktop/mlr/data/" + path[path.find("data/") + 5: path.find("hour") - 1] + "_" + str(counter) + ".csv")
                output.to_csv(path_or_buf= "/Users/brian_wangst/Desktop/mlr/data/Fridays/origdata/Ardel/" + path[54:path.find(":")-2] + str(number) + "hour " + str(counter) + ".csv")
                master_list = []
                counter = counter + 1
                
        if row[1]['Microphone Number'] == 0:
            array_0 = row[1]
        elif row[1]['Microphone Number'] == 1:
            array_1 = row[1]
        elif row[1]['Microphone Number'] == 2:
            array_2 = row[1]
        else:
            array_3 = row[1]
                
    output = pd.DataFrame(master_list)
    #output.to_csv(path_or_buf= "/Users/brian_wangst/Desktop/mlr/data/" + path[path.find("data/") + 5:path.find("hour") - 1] + "_" + str(counter) + ".csv")
    output.to_csv(path_or_buf= "/Users/brian_wangst/Desktop/mlr/data/Fridays/origdata/Ardel/" + path[54:path.find("/")-2] + str(number) + "hour " + str(counter) + ".csv")

    master_list = []


In [22]:
fridays = glob.glob("/Users/brian_wangst/Desktop/mlr/data/Fridays/*.csv" )
fridays.sort()
counter = 0
for day in fridays:
    source_matching(day, counter)
# source_matching("/Users/brian_wangst/Desktop/mlr/dataJanuary 08, 2020 11:54:14hour3.csv")

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  if __name__ == '__main__':
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  # Remove the CWD from sys.path while we load stuff.
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  # This is added back by InteractiveShellApp.init_path()


#### Timestamp Matching