# Voice Identification

### n-shot learning

The idea of n-shot learning is to train a model so that, after seeing only `n` examples of something, it can identify that thing again (one-shot learning is the case `n = 1`).

Siamese networks attempt to do this by training a model to map a high-dimensional input to a feature vector. The network is trained on pairs of examples that are either similar or different, and the weights are updated to reduce the distance between the feature vectors of similar pairs and increase it for different pairs.
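
To make the "reduce/increase the distance" idea concrete, here is a minimal sketch of a contrastive loss in plain Python. This is one common way to train on pairs, not necessarily the loss used later in this notebook. The function names, the `margin` value and the 2-D embeddings are all made up for illustration.

```python
import math

def euclidean_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def contrastive_loss(emb1, emb2, same, margin=1.0):
    """Contrastive loss for a single pair of embeddings.

    same=True:  the loss grows with distance, so training pulls the pair together.
    same=False: the loss is positive only while the pair is closer than `margin`,
                so training pushes the pair apart until the margin is reached.
    """
    d = euclidean_distance(emb1, emb2)
    if same:
        return d ** 2
    return max(0.0, margin - d) ** 2

# Hypothetical 2-D embeddings, for illustration only.
anchor = [0.0, 0.0]
close = [0.1, 0.0]
far = [3.0, 4.0]

print(contrastive_loss(anchor, close, same=True))   # small: similar pair is already close
print(contrastive_loss(anchor, close, same=False))  # larger: different pair is too close
print(contrastive_loss(anchor, far, same=False))    # 0.0: different pair is beyond the margin
```

The margin stops the model from being rewarded for pushing already-distant pairs even further apart.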

### Applying to Voice Identification

The end goal of this model is to take an audio sample that has undergone speaker diarisation and identify each speaker in the set.
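
As a rough sketch of what that identification step could look like once we have a trained embedding model: each known speaker is enrolled with `n` example embeddings, and each diarised segment is assigned to the speaker whose mean embedding is closest. Everything here is an assumption for illustration (the helper names, the nearest-mean rule and the toy 2-D vectors); real embeddings would come from the network.

```python
import math

def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def identify_speaker(segment_embedding, enrolled):
    """Return the enrolled speaker whose mean embedding is closest to the segment."""
    centroids = {name: mean_vector(vecs) for name, vecs in enrolled.items()}
    return min(centroids, key=lambda name: distance(segment_embedding, centroids[name]))

# Hypothetical enrolment data: n = 2 embeddings per speaker.
enrolled = {
    "f0001": [[0.9, 0.1], [1.1, -0.1]],
    "m0001": [[-1.0, 0.2], [-0.8, 0.0]],
}
# Hypothetical embeddings of two diarised segments.
segments = [[1.0, 0.05], [-0.9, 0.1]]

print([identify_speaker(s, enrolled) for s in segments])  # ['f0001', 'm0001']
```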

### Datasets

| Name | Speakers | Min length | Max length |
| ------------- | ------------- | ----- | ----- |
| [VoxCeleb](http://www.robots.ox.ac.uk/~vgg/data/voxceleb/) | 7000+ | 3s | 3s |
| [10 English Speakers](http://www.openslr.org/resources/45/ST-AEDS-20180100_1-OS) | 10 | ? | ? |

### Articles

- https://github.com/zdmc23/oneshot-audio/blob/master/OneShot.ipynb

In [1]:
## Notebook settings
%reload_ext autoreload
%autoreload 2
%matplotlib inline

In [2]:
## fastai imports
from fastai.basics import *
from fastai.data_block import ItemList
from fastai.vision import *
from torch import nn
from exp.nb_AudioCommon import *
from exp.nb_DataBlock import *
from exp.nb_DataAugmentation import *

## 10 Speakers

Let's get a feel for the architecture by creating it and training it on a dataset that we know we can do well on using standard classification techniques.

In [3]:
## The actual url is http://www.openslr.org/resources/45/ST-AEDS-20180100_1-OS.tgz
## but we need to strip off the extension otherwise fastai gets confused.
data_url = 'http://www.openslr.org/resources/45/ST-AEDS-20180100_1-OS'
## Need this because the source tar file doesn't extract to its own folder
data_folder = datapath4file(url2name(data_url))
untar_data(data_url, dest=data_folder)

PosixPath('/home/h/.fastai/data/ST-AEDS-20180100_1-OS/ST-AEDS-20180100_1-OS')

In [4]:
max_length = (4*16000)
print(max_length)
tfm_params = {
    'max_to_pad':max_length,
    'use_spectro':True, 
    'cache_spectro':True, 
    'to_db_scale':True,
    'f_max': 120
}
label_pattern = r'_([mf]\d+)_'
audios = AudioList.from_folder(data_folder, **tfm_params).split_none().label_from_re(label_pattern)
audios.train.x.tfm_args = tfm_params
audios.valid.x.tfm_args = tfm_params

64000
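
The `label_pattern` above is what pulls the speaker ID out of each filename. Here is a quick check with Python's `re` module, using one of the filenames from this dataset. It assumes `label_from_re` searches the path for the pattern and uses the first capture group as the label.

```python
import re

label_pattern = r'_([mf]\d+)_'

# A filename from the 10-speaker dataset.
filename = 'f0001_us_f0001_00363.wav'

# The pattern needs an underscore on both sides, so it matches the
# second 'f0001' (between '_us_' and '_00363'), not the leading one.
match = re.search(label_pattern, filename)
speaker = match.group(1)
print(speaker)  # f0001
```

The leading `m` or `f` is part of the captured label, so each speaker ID keeps its prefix.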


## Loss functions 

In [5]:
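# For pairs that should be far apart: sigmoid of the negated RMSE, which shrinks as the distance grows.
def loss_max_sig(i, t): return torch.sigmoid(-torch.sqrt(mse(i, t)))
# For pairs that should be close: sigmoid of the RMSE, which shrinks as the distance shrinks.
def loss_min_sig(i, t): return torch.sigmoid(torch.sqrt(mse(i, t)))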

In [6]:
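class SiameseResnet(nn.Module):
    "Encode both inputs with one shared ResNet body, then score the pair with a single head."
    def __init__(self, encoder=models.resnet18):
        super().__init__()
        # Shared convolutional body; cut=-2 drops the final pooling and classifier layers.
        self.body = create_body(encoder, cut=-2)
        self.head = create_head(2048, 1, [512])

    def forward(self, x1, x2):
        # The same weights process both inputs.
        out1 = self.body(x1)
        out2 = self.body(x2)
        # Concatenate the two feature maps along the channel dimension.
        out = torch.cat((out1, out2), dim=1)
        out = self.head(out)
        # Flatten to one score per pair.
        return out.view(-1)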

In [7]:
class ItemTuple(ItemBase):
    "Group several items so they can be handled as a single item."
    def __init__(self, *items):
        self.items = items
        self.data = [x.data for x in items]

    def __len__(self):
        return len(self.items)

In [35]:
class SiameseLabelList(LabelList):

    def __init__(self):
        pass

    @classmethod
    def from_label_list(cls, ll:LabelList, max_pairs=None):
        # max_pairs is not used yet.
        if max_pairs is None: max_pairs = len(ll)
        x = ll.x
        y = ll.y

        # Group the file paths by class index.
        separated = [x.items[y.items==c] for c in range(ll.c)]

        # Same-speaker pairs: 10 random pairs per class, drawn with replacement.
        same_pairs = np.array([[0, 0]])
        for cis in separated:
            r = np.array([np.random.choice(cis, 10), np.random.choice(cis, 10)]).T
            same_pairs = np.concatenate([same_pairs, r])
        same_pairs = same_pairs[1:]  # drop the placeholder row

        # Different-speaker pairs: one random pair for every ordered pair of distinct classes.
        diff_pairs = np.array([[0, 0]])
        for i, cis in enumerate(separated):
            other = np.delete(np.arange(ll.c), i)
            for j in other:
                ocis = separated[j]
                dps = np.array([np.random.choice(cis, 1), np.random.choice(ocis, 1)]).T
                diff_pairs = np.concatenate([diff_pairs, dps])
        diff_pairs = diff_pairs[1:]  # drop the placeholder row
        # Label 1 for same-speaker pairs, 0 for different-speaker pairs.
        labels = np.concatenate([np.ones(len(same_pairs), dtype=np.int8), np.zeros(len(diff_pairs), dtype=np.int8)])
        al = np.concatenate([same_pairs, diff_pairs])
        print(al[0])
        return LabelList(AudioList(al), CategoryList(labels))

SiameseLabelList.from_label_list(audios.train)

[PosixPath('/home/h/.fastai/data/ST-AEDS-20180100_1-OS/f0001_us_f0001_00363.wav')
 PosixPath('/home/h/.fastai/data/ST-AEDS-20180100_1-OS/f0001_us_f0001_00011.wav')]


AttributeError: 'PosixPath' object has no attribute 'reshape'
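
Independently of fastai, the pair-sampling logic in `from_label_list` can be sketched in plain Python. This is a simplified stand-in under a few assumptions, not a fix for the error above: `sample_pairs` and its parameters are hypothetical names, and the filenames are invented. Like the cell above, it draws same-speaker pairs with replacement (so a pair can contain the same file twice) and one different-speaker pair per ordered pair of classes.

```python
import random
from collections import defaultdict

def sample_pairs(items, labels, same_per_class=10, seed=0):
    """Build (item_a, item_b) pairs with target 1 (same class) or 0 (different)."""
    rng = random.Random(seed)

    # Group the items by class label.
    by_class = defaultdict(list)
    for item, label in zip(items, labels):
        by_class[label].append(item)

    pairs, targets = [], []

    # Same-class pairs: `same_per_class` random pairs from each class.
    for members in by_class.values():
        for _ in range(same_per_class):
            pairs.append((rng.choice(members), rng.choice(members)))
            targets.append(1)

    # Different-class pairs: one random pair per ordered pair of distinct classes.
    classes = list(by_class)
    for a in classes:
        for b in classes:
            if a != b:
                pairs.append((rng.choice(by_class[a]), rng.choice(by_class[b])))
                targets.append(0)

    return pairs, targets

# Hypothetical filenames: 3 speakers with 4 clips each.
items = [f"{spk}_{i}.wav" for spk in ("f0001", "m0001", "m0002") for i in range(4)]
labels = [name.split("_")[0] for name in items]

pairs, targets = sample_pairs(items, labels, same_per_class=2)
print(len(pairs), sum(targets))  # 12 6
```

With `c` classes this gives `c * same_per_class` positive pairs and `c * (c - 1)` negative pairs, so the balance between the two shifts as the number of speakers grows.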

In [21]:
data = audios.databunch()
learn = Learner(data, SiameseResnet())

In [34]:
learn.lr_find()

LR Finder is complete, type {learner_name}.recorder.plot() to see the graph.


TypeError: forward() missing 1 required positional argument: 'x2'
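
The `TypeError` above is what we would expect if the training loop hands the model a single input per batch while `forward` wants two. Below is a toy reproduction in plain Python, with no fastai or PyTorch involved. `PairModel` and `run_batch` are hypothetical stand-ins, and the unpacking rule in `run_batch` is an assumption about how the training loop passes inputs to the model.

```python
class PairModel:
    """Stand-in for a Siamese model whose forward needs two inputs."""
    def forward(self, x1, x2):
        return (x1, x2)

    def __call__(self, *args):
        return self.forward(*args)

def run_batch(model, xb):
    """Call the model on one batch, unpacking list/tuple inputs as positional args."""
    if not isinstance(xb, (list, tuple)):
        xb = [xb]
    return model(*xb)

model = PairModel()

# A batch with a single input reproduces the error.
try:
    run_batch(model, "single_input")
except TypeError as err:
    print("single input fails:", err)

# A batch whose input is a pair reaches forward(x1, x2) as intended.
print(run_batch(model, ["left", "right"]))  # ('left', 'right')
```

If that assumption holds, the data pipeline needs to yield each `x` as a pair of inputs rather than a single spectrogram.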