This notebook details my attempts at getting video and acoustic tracking to work for the speaker playbacks. This should be a simple recording to handle.

* Speaker recording: SPKRPLAYBACK_multichirp_2018-07-29_09-42-59.WAV
* Video recording : 2018-07-28/P03/K1,2,3/02000.TMC


In [4]:
import datetime as dt
import scipy.signal as signal 
import scipy.spatial as spatial
import soundfile as sf
import numpy as np 
import matplotlib.pyplot as plt


In [5]:
print(f'Notebook cell run at {dt.datetime.now()}')

Notebook cell run at 2021-06-26 13:48:56.206729


In [6]:
import batracker
from batracker.localisation import friedlander_1987 as fr87
from batracker.localisation import schau_robinson_1987 as sr87
from batracker.localisation import spiesberger_wahlberg_2002 as sw02

from mpl_toolkits.mplot3d import Axes3D
import pandas as pd
from batracker.signal_detection.detection import cross_channel_threshold_detector
from batracker.signal_detection.detection import envelope_detector
from batracker.tdoa_estimation.tdoa_estimators import measure_tdoa
from batracker.correspondence_matching.multichannel_match import generate_crosscor_boundaries

In [7]:
%matplotlib notebook

In [8]:
audiofile = 'multichirp_sankenscamerasyncoutput_2018-07-29_09-42-59.wav'
# get only the first 7.5 s for now. 
audio, fs = sf.read(audiofile, stop=int(192000*7.5))

In [9]:
# get all audio that start from frame 1 of the camera sync signal (1st frame that is +ve)
first_frame_sample = np.min(np.argwhere(audio[:,-1]>=np.percentile(audio[:,-1],95)))
# audio sync'ed with 1st camera frame
cam_audio = audio[first_frame_sample:,:]

print(first_frame_sample/fs)
# get the array audio 
array_audio = cam_audio[:,:4]


1.2398802083333333
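The sync-onset step above can be tried on synthetic data. This is a minimal sketch, assuming the sync channel is silent until the first camera frame and then carries a 50% duty-cycle square wave; the 25 Hz frame rate and the helper `find_first_sync_sample` are made up for illustration and are not part of `batracker`.

```python
import numpy as np

def find_first_sync_sample(sync, percentile=95):
    """Index of the first sample at or above the given percentile of `sync`."""
    threshold = np.percentile(sync, percentile)
    # argmax on a boolean array returns the position of the first True
    return int(np.argmax(sync >= threshold))

fs = 192000
onset = 1000                                  # true first-frame sample
period = fs // 25                             # assumed 25 Hz frame rate
sync = np.zeros(fs)                           # 1 s of sync channel
# square wave that is high for the first half of every frame period
sync[onset:] = (np.arange(fs - onset) % period < period // 2).astype(float)

first_frame_sample = find_first_sync_sample(sync)
print(first_frame_sample / fs)
```

The percentile threshold only works when the sync signal is high for more than 5% of the samples read in, which is worth checking if a much shorter or longer stretch of audio is loaded.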


In [10]:
b,a = signal.butter(1,np.array([40e3,95e3])/(fs*.5),'bandpass')
array_audiohp = np.apply_along_axis(lambda X: signal.filtfilt(b,a,X),0,array_audio)
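To get a feel for what this zero-phase band-pass does, here is a small sketch on synthetic tones. The 10 kHz and 60 kHz test tones are assumptions chosen to sit below and inside the 40-95 kHz pass band; the filter design is the same as in the cell above. It also shows that `signal.filtfilt` can filter all channels at once through its `axis` argument.

```python
import numpy as np
import scipy.signal as signal

fs = 192000
t = np.arange(int(0.05 * fs)) / fs
low_tone = np.sin(2 * np.pi * 10e3 * t)   # out of band (below 40 kHz)
in_tone = np.sin(2 * np.pi * 60e3 * t)    # inside the pass band

# same filter design as in the notebook cell
b, a = signal.butter(1, np.array([40e3, 95e3]) / (fs * 0.5), 'bandpass')

def rms(x):
    return np.sqrt(np.mean(x ** 2))

# filtfilt runs the filter forwards and backwards: zero phase shift,
# and the magnitude response is applied twice
low_gain = rms(signal.filtfilt(b, a, low_tone)) / rms(low_tone)
in_gain = rms(signal.filtfilt(b, a, in_tone)) / rms(in_tone)
print(low_gain, in_gain)

# filtering a (samples x channels) array in one call
multich = np.column_stack([low_tone + in_tone] * 4)
filtered = signal.filtfilt(b, a, multich, axis=0)
```

Because the filter is applied forwards and backwards there is no net delay, which matters here since any filter-induced delay differences would leak into the TDOA estimates.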

In [11]:
plt.figure()
plt.specgram(array_audiohp[:,0],Fs=fs);


In [74]:
detections = cross_channel_threshold_detector(array_audiohp, fs,
                                              detector_function=envelope_detector,
                                              threshold_db_floor=12,
                                              lowpass_durn=0.004)
# for now just use manual detections to generate the correlation boundaries

              
    
            

  0%|                                                                                            | 0/4 [00:00<?, ?it/s]

4 1201943


100%|████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:02<00:00,  1.49it/s]


In [75]:
                                    
# Spectrogram of the cross-corr boundaries
plt.figure(figsize=(4,4))
ax= plt.subplot(411)
plt.specgram(array_audiohp[:,0], Fs=fs)
for each in detections[0]:
    plt.vlines(each, 0, fs*0.5, linewidth=0.4)

for i in range(2,5):
    plt.subplot(410+i, sharex=ax)
    plt.specgram(array_audiohp[:,i-1], Fs=fs)
    for each in detections[i-1]:
        plt.vlines(each, 0, fs*0.5, linewidth=0.4)


In [76]:
# filter all detections, and keep only those that are at least 7.5 ms long. 
min_durn = 0.0075
filtered_detections = []
for channel_dets in detections:
    long_detections = [] 
    for detn in channel_dets:
        if detn[1]-detn[0]>=min_durn:
            long_detections.append(detn)
    filtered_detections.append(long_detections)

In [77]:
[len(each)for each in filtered_detections]

[18, 18, 18, 18]
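The duration filter can also be written compactly. A minimal sketch, assuming each channel's detections are `(start, stop)` pairs in seconds; the helper name `keep_long_detections` and the example values are made up for illustration.

```python
def keep_long_detections(detections, min_durn):
    """Keep only (start, stop) detections lasting at least min_durn seconds."""
    return [[(start, stop) for start, stop in channel_dets
             if stop - start >= min_durn]
            for channel_dets in detections]

# two channels, each with one short (<7.5 ms) and one long detection
example = [[(0.100, 0.102), (0.200, 0.210)],
           [(0.101, 0.111), (0.300, 0.301)]]
filtered = keep_long_detections(example, min_durn=0.0075)
print([len(each) for each in filtered])
```

Getting the same count on every channel, as in the output above, is a quick sanity check before matching detections across channels.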

In [78]:
                                    
# Spectrogram of the cross-corr boundaries
plt.figure(figsize=(4,4))
ax= plt.subplot(411)
plt.specgram(array_audiohp[:,0], Fs=fs)
for each in filtered_detections[0]:
    plt.vlines(each, 0, fs*0.5, linewidth=0.4)

for i in range(2,5):
    plt.subplot(410+i, sharex=ax)
    plt.specgram(array_audiohp[:,i-1], Fs=fs)
    for each in filtered_detections[i-1]:
        plt.vlines(each, 0, fs*0.5, linewidth=0.4)


In [39]:
# Array geometry
## What we expect it to be theoretically
R = 1.2 # meters
theta = np.pi/3
other_x_position = 0.5
theta2 = np.arctan(other_x_position/(R*np.cos(theta)))
R_2 = np.sqrt(other_x_position**2 +  (R*np.cos(theta))**2)
arbit_y = 10e-4
mic_positions = np.array([[0,arbit_y,0],
                          [R_2*np.sin(theta2),  arbit_y, -R*np.cos(theta), ],
                          [-R*np.sin(theta), arbit_y, -R*np.cos(theta)],
                          [0,arbit_y,R]])

ag = pd.DataFrame(mic_positions)
ag.columns  = ['x','y','z']

In [41]:
crosscor_boundaries = generate_crosscor_boundaries(filtered_detections, ag)

In [46]:
# Spectrograms of all channels with the common cross-corr boundaries overlaid
plt.figure()
ax = plt.subplot(411)
for i in range(4):
    if i > 0:
        plt.subplot(411+i, sharex=ax)
    plt.specgram(array_audiohp[:,i], Fs=fs)
    # each boundary is a (start, stop) pair common to all channels
    for each in crosscor_boundaries:
        plt.vlines(each, 0, fs*0.5, linewidth=0.2, color='k', alpha=1)


In [42]:
reference_ch = 0

all_tdoas = {}
for i,each_common in enumerate(crosscor_boundaries):
    start, stop = each_common
    start_sample, stop_sample = int(start*fs), int(stop*fs)
    tdoas = measure_tdoa(array_audiohp[start_sample:stop_sample,:], fs, ref_channel=reference_ch)
    all_tdoas[i] = tdoas
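The internals of `measure_tdoa` are not shown in this notebook. As a rough illustration of the idea, here is a generic cross-correlation TDOA estimate on synthetic data. The helper `crosscor_tdoa`, the noise burst and the delays are all made up, and the sign convention (positive when a channel lags the reference) is an assumption that may differ from what `batracker` uses.

```python
import numpy as np
import scipy.signal as signal

def crosscor_tdoa(multich_audio, fs, ref_channel=0):
    """TDOA (s) of every non-reference channel relative to ref_channel."""
    ref = multich_audio[:, ref_channel]
    tdoas = []
    for ch in range(multich_audio.shape[1]):
        if ch == ref_channel:
            continue
        cc = signal.correlate(multich_audio[:, ch], ref, mode='full')
        # in 'full' mode zero lag sits at index len(ref)-1
        lag = np.argmax(cc) - (ref.size - 1)
        tdoas.append(lag / fs)
    return np.array(tdoas)

fs = 192000
rng = np.random.default_rng(0)
burst = rng.standard_normal(2000)             # broadband test signal
delays = [0, 50, -30, 120]                    # samples, relative to channel 0
audio = np.zeros((6000, 4))
for ch, d in enumerate(delays):
    audio[2000 + d:4000 + d, ch] = burst

tdoas = crosscor_tdoa(audio, fs)
print(tdoas * fs)                             # recovered delays in samples
```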

In [43]:
all_tdoas

{0: array([-0.00139844,  0.00063281,  0.00078906]),
 1: array([-0.00139583,  0.00063542,  0.00079167]),
 2: array([-0.00448958,  0.00063542,  0.00079167]),
 3: array([-0.00139583,  0.00063542,  0.00079167]),
 4: array([-0.00139583,  0.00063542,  0.00079167]),
 5: array([-0.00139583,  0.00063542,  0.00079167]),
 6: array([-0.00139583,  0.00063542,  0.00079167]),
 7: array([-0.00139844,  0.00063281,  0.00078906]),
 8: array([-0.00139583,  0.00063542,  0.00079687]),
 9: array([-0.00136458,  0.00021875,  0.00128125]),
 10: array([-0.00136198,  0.00021094,  0.00127865]),
 11: array([-0.00135417,  0.00021875,  0.00127604]),
 12: array([-0.00125521,  0.00021875,  0.00127604]),
 13: array([-0.00125781,  0.00021615,  0.00127344]),
 14: array([-0.00125781,  0.00021615,  0.00127344]),
 15: array([-0.0012526 ,  0.00021615,  0.00126823]),
 16: array([-0.0012474 ,  0.00022135,  0.00170573]),
 17: array([-0.00124219,  0.00022135,  0.00126302])}

In [44]:
vsound = 340.0 # m/s
# the solver returns two candidate positions per detection
calculated_positions = np.zeros((len(all_tdoas), 3, 2))
for det_number, tdoas in all_tdoas.items():
    # convert TDOAs (s) into range differences (m)
    d = (vsound*tdoas).reshape(-1,1)
    solution1, solution2 = sw02.spiesberger_wahlberg_solution(mic_positions, d)
    calculated_positions[det_number,:,0] = solution1
    calculated_positions[det_number,:,1] = solution2

  t_solution1 = (-b_quad + np.sqrt(b_quad**2 - 4*a_quad*c_quad))/(2*a_quad)
  t_solution2 = (-b_quad - np.sqrt(b_quad**2 - 4*a_quad*c_quad))/(2*a_quad)


In [45]:
valid_positions = calculated_positions[:,:,0]
valid_positions

array([[ -2.97772102,   4.73816446,   1.88634926],
       [ -3.00781225,   4.79623797,   1.90570181],
       [         nan,          nan,          nan],
       [ -3.00781225,   4.79623797,   1.90570181],
       [ -3.00781225,   4.79623797,   1.90570181],
       [ -3.00781225,   4.79623797,   1.90570181],
       [ -3.00781225,   4.79623797,   1.90570181],
       [ -2.97772102,   4.73816446,   1.88634926],
       [ -3.02735007,   4.83097694,   1.92392102],
       [ -3.61190672,   7.69956626,   3.91299155],
       [ -3.56124344,   7.64056153,   3.87623413],
       [ -3.66537241,   7.90709424,   3.97880293],
       [ -4.97811878,  12.12748195,   5.67635623],
       [ -4.85377912,  11.79691203,   5.52829927],
       [ -4.85377912,  11.79691203,   5.52829927],
       [ -4.87132998,  11.90496725,   5.54602852],
       [  7.58279211, -20.22343005, -11.30493552],
       [ -5.06243457,  12.48682172,   5.75606457]])
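The NaN row (detection 2) can be traced back to its TDOAs with a simple physical check: a TDOA between two microphones cannot exceed their separation divided by the speed of sound. The sketch below rebuilds the theoretical array geometry from above and tests three of the measured TDOA triplets. It assumes the three TDOA values correspond to channels 1, 2 and 3 relative to channel 0.

```python
import numpy as np

vsound = 340.0
# theoretical array geometry, as defined earlier in the notebook
R = 1.2
theta = np.pi / 3
other_x_position = 0.5
theta2 = np.arctan(other_x_position / (R * np.cos(theta)))
R_2 = np.sqrt(other_x_position**2 + (R * np.cos(theta))**2)
arbit_y = 10e-4
mic_positions = np.array([[0, arbit_y, 0],
                          [R_2 * np.sin(theta2), arbit_y, -R * np.cos(theta)],
                          [-R * np.sin(theta), arbit_y, -R * np.cos(theta)],
                          [0, arbit_y, R]])

# largest physically possible |TDOA| for mics 1-3 relative to mic 0
max_tdoa = np.linalg.norm(mic_positions[1:] - mic_positions[0], axis=1) / vsound

# three of the measured TDOA triplets from the output above
measured = {1: np.array([-0.00139583, 0.00063542, 0.00079167]),
            2: np.array([-0.00448958, 0.00063542, 0.00079167]),
            16: np.array([-0.0012474, 0.00022135, 0.00170573])}

implausible = [det for det, tdoas in measured.items()
               if np.any(np.abs(tdoas) > max_tdoa)]
print(max_tdoa, implausible)
```

This check flags detection 2, but not detection 16: its TDOAs are physically possible even though the resulting position is a clear outlier, so a bound like this is only a first filter.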

In [81]:
fig = plt.figure(figsize=(10,8))
ax = fig.add_subplot(111, projection='3d')
ax.view_init(elev=17, azim=122)
ax.plot(valid_positions[:,0], valid_positions[:,1],
            valid_positions[:,2],'*')

# plot all four microphones
ax.plot(mic_positions[:,0],mic_positions[:,1],mic_positions[:,2],'k*')


### Distance of positions from central microphone - acoustic tracking 



In [17]:
mic_positions[0,:]

array([0.   , 0.001, 0.   ])

In [18]:

def calc_dist_to_m0(X, refpos):
    try:
        distance = spatial.distance.euclidean(X,refpos)
    except ValueError:
        distance = np.nan
    return distance
dist_to_mic0 = np.apply_along_axis(calc_dist_to_m0,1,valid_positions,mic_positions[0,:])
dist_to_mic0

array([5.97268291, 5.97268291, 5.90473227, 5.97268291, 5.97268291,
       5.97268291, 5.97268291, 5.90749379, 6.01622787])

In [19]:
print(np.mean(dist_to_mic0), np.std(dist_to_mic0))

5.962727930309971 0.03311790419366706
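A vectorised alternative for the distance calculation is sketched below, using two of the positions from the table above plus a failed (NaN) localisation. The helper name `dist_to_reference` is made up. NaNs simply propagate through `np.linalg.norm`, and the `nan`-aware statistics ignore them.

```python
import numpy as np

def dist_to_reference(positions, refpos):
    """Euclidean distance of each row in positions to refpos."""
    return np.linalg.norm(positions - refpos, axis=1)

mic0 = np.array([0.0, 0.001, 0.0])
positions = np.array([[-2.97772102, 4.73816446, 1.88634926],
                      [-3.00781225, 4.79623797, 1.90570181],
                      [np.nan, np.nan, np.nan]])   # failed localisation

dists = dist_to_reference(positions, mic0)
mean_dist, sd_dist = np.nanmean(dists), np.nanstd(dists)
print(dists, mean_dist, sd_dist)
```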


Another way to estimate distance from speaker to mic0 is to utilise the digital copy of the playback signal. We can then estimate the time of flight of the playback. 

In [20]:
# crosscorrelate the output signal with channel 0. 
output_ch = cam_audio[:,-2]
plt.figure()
a0 = plt.subplot(211)
plt.plot(array_audiohp[:,0])
plt.subplot(212,sharex=a0)
plt.plot(output_ch)


[<matplotlib.lines.Line2D at 0x273da8b45e0>]

In [21]:

ind_forcc = [30000, 80000]
cc = signal.correlate(array_audiohp[ind_forcc[0]:ind_forcc[1],0], cam_audio[ind_forcc[0]:ind_forcc[1],-2],'same')
delay = (np.argmax(cc)-cc.size/2.0)/fs
delay

0.017666666666666667

### Getting mic0-speaker distance through the time of flight

In [22]:
print(f'The sync-channel delay based mic0-speaker distance is {delay*340.0} m')

The sync-channel delay based mic0-speaker distance is 6.006666666666667 m
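The time-of-flight estimate can be reproduced on synthetic data. In this sketch the downward chirp, its position in the snippet and the 50000-sample snippet length are assumptions; the true delay is set to 3392 samples, which is what the measured 0.017667 s corresponds to at 192 kHz.

```python
import numpy as np
import scipy.signal as signal

fs = 192000
vsound = 340.0
n = 50000                  # snippet length (even)
true_delay = 3392          # samples

# assumed playback signal: a ~5 ms downward chirp from 90 to 45 kHz
t = np.arange(1000) / fs
chirp = signal.chirp(t, f0=90e3, t1=t[-1], f1=45e3)

playback = np.zeros(n)     # digital copy of the playback
playback[10000:11000] = chirp
recorded = np.zeros(n)     # what the microphone picks up later
recorded[10000 + true_delay:11000 + true_delay] = chirp

cc = signal.correlate(recorded, playback, 'same')
# for equal-length, even-length inputs, zero lag is at index cc.size//2
delay_samples = np.argmax(cc) - cc.size // 2
distance = vsound * delay_samples / fs
print(delay_samples, distance)
```

Note that the `cc.size/2` zero-lag index is exact only when both snippets have the same, even length; for other cases it is safer to use `'full'` mode and subtract `len(reference) - 1`.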


### Distance of positions from central mic - video tracking

In [23]:
# get video tracked speaker positions
speaker_posns = pd.read_csv('video_tracking/speaker_pbks/DLTdv7_data_2018-07-28_p03_2000_spkr_pbksxyzpts.csv')
speaker_posns.columns = ['x','y','z']
speaker_posns = speaker_posns[~pd.isna(speaker_posns['x'])]
speaker_posns

            x         y         z
0    0.008015  2.229281 -0.078225
62   0.420424  2.148869 -0.162072
124  0.645528  2.175114 -1.026141
187  0.400062  1.887525  0.549685
249 -0.189985  1.400496  0.825609
312 -0.616268  1.152299  0.868576
374 -1.254743  0.820054  0.728765
479 -1.910361  0.170450 -0.037294

In [24]:
video_mic_positions = pd.read_csv('video_tracking/mic_positions_video/DLTdv7_data_mics9-12positionsxyzpts.csv')
mic_xyz = video_mic_positions[~pd.isna(video_mic_positions['pt1_X'])].reset_index(drop=True)
mic_xyz.columns=['x','y','z']
mic_xyz

          x         y         z
0 -0.162229 -3.854878  0.097387
1 -1.084803 -3.256231 -0.437829
2  0.278436 -4.036286 -0.544380
3 -0.118366 -3.985202  1.299513

The first speaker position in the video corresponds to the set of first playbacks (first 9 detections). Let's see the mic0 to speaker distance estimated here. 

### Camera based m0-speaker distance

In [25]:
video_dist_to_mic0 = spatial.distance.euclidean(mic_xyz.loc[0].tolist(),speaker_posns.loc[0].tolist())
video_dist_to_mic0

6.089073273607487

## Conclusion: it's not all crap - acoustic tracking and video tracking do work together and give ~consistent results!

* The camera based m0-speaker estimate is 6.09 m
* Acoustic tracking based m0-speaker estimate is 5.96 $\pm$ 0.03 m (mean, sd)
    * The time-of-flight based m0-speaker estimate is 6.0 m
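
To put a number on "~consistent", the three estimates above can be compared directly. This is plain arithmetic on the values reported in this notebook, rounded to the millimetre.

```python
# m0-speaker distance estimates (m) reported above
estimates = {'video': 6.089,
             'acoustic_tracking': 5.963,
             'time_of_flight': 6.007}

values = list(estimates.values())
spread = max(values) - min(values)
relative_spread = spread / (sum(values) / len(values))
print(f'spread: {spread:.3f} m ({100 * relative_spread:.1f} % of the mean)')
```

The three methods agree to within about 13 cm over a roughly 6 m distance.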

### Important lessons

* Audio processing is *very* important - the reverberation below 40 kHz made a *huge* difference to the TDOAs estimated. Choosing the correct bandpass parameters made all the difference.

### Next steps

* Now I'd like to extend the same exercise to more playback positions, and then finally get to aligning the audio and video tracking systems into a common system. 
* The `batracker` detection routines need some tuning!

In [26]:
print(f'Notebook cell run at {dt.datetime.now()}')

Notebook cell run at 2021-06-26 13:00:56.402670
