### Speaker Identification Using MFCC and Cosine Similarity
#### Theory
Speaker recognition is the task of identifying or verifying a speaker from their voice. It starts by extracting features from the audio signal, such as Mel-Frequency Cepstral Coefficients (MFCCs), which summarize the short-term spectral envelope of speech. These features are then compared against reference data for each known speaker using a similarity measure such as cosine similarity. The reference speaker with the highest similarity score is reported as the best match.
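To make the similarity measure concrete, here is a minimal sketch of cosine similarity, cos(θ) = (a · b) / (‖a‖ ‖b‖), written in plain Python. The function name `cosine_similarity` and the sample vectors are illustrative assumptions and are not real MFCC values. `scipy.spatial.distance.cosine` returns the cosine *distance*, which is one minus this quantity.

```python
import math

def cosine_similarity(a, b):
    """Return cos(theta) between two equal-length vectors a and b."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Illustrative vectors (made up, not real MFCCs)
u = [1.0, 2.0, 3.0]
v = [2.0, 4.0, 6.0]   # same direction as u, twice the length
w = [-3.0, 0.0, 1.0]  # orthogonal to u: 1*(-3) + 2*0 + 3*1 = 0

print(cosine_similarity(u, v))  # close to 1: direction matches, length is ignored
print(cosine_similarity(u, w))  # 0: the vectors are orthogonal
```

Because the measure depends only on direction, scaling a feature vector (for example, a louder recording that scales every coefficient equally) would not change the score.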

#### Objectives:
1. To record a speech sample and extract its MFCC features.
2. To compare the extracted features with reference speaker data using cosine similarity.
3. To identify the speaker with the highest similarity score from the reference dataset.
4. To understand the process of feature extraction and similarity computation in speaker recognition.
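The comparison in objective 2 needs one fixed-length vector per recording, even though recordings differ in length. A minimal sketch of that step is shown below, assuming the features arrive as a matrix with one row per coefficient and one column per frame (the layout `librosa.feature.mfcc` returns). The helper name `time_average` and the toy matrix are hypothetical; the plain-Python loop stands in for `np.mean(mfcc, axis=1)`.

```python
def time_average(feature_matrix):
    """Average each coefficient (row) over all frames (columns).

    feature_matrix: list of rows, one row per coefficient,
    each row holding that coefficient's value in every frame.
    Returns one value per coefficient.
    """
    if not feature_matrix or any(len(row) == 0 for row in feature_matrix):
        raise ValueError("feature matrix must have at least one frame")
    return [sum(row) / len(row) for row in feature_matrix]

# Toy 3-coefficient, 4-frame matrix (made-up numbers, not real MFCCs)
toy_features = [
    [1.0, 2.0, 3.0, 4.0],
    [0.0, 0.0, 0.0, 0.0],
    [-2.0, -2.0, 2.0, 2.0],
]

summary = time_average(toy_features)
print(len(summary), summary)  # one value per coefficient, regardless of frame count
```

Averaging discards how the features change over time, so two recordings of different durations yield vectors of the same length that can be compared directly.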

In [None]:
import os
import numpy as np
import librosa
from scipy.spatial.distance import cosine
import soundfile as sf
import sounddevice as sd

sample_rate = 16000
n_mfcc = 13
reference_folder = "speaker_rec"



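duration = 5  # recording length in seconds
os.makedirs("speeches", exist_ok=True)  # make sure the output folder exists
print("Recording...")
# Record mono audio; sd.rec returns immediately, sd.wait blocks until it is done
rec = sd.rec(duration * sample_rate, samplerate=sample_rate, channels=1, dtype='int16')
sd.wait()
print("Recording finished.")
sf.write("speeches/speaker.wav", rec, sample_rate)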

y_test, sr_test = librosa.load("speeches/speaker.wav", sr=sample_rate)

mfcc_test = librosa.feature.mfcc(y=y_test, sr=sample_rate, n_mfcc=n_mfcc)
mfcc_test_mean = np.mean(mfcc_test, axis=1)

scores = {}
for file in os.listdir(reference_folder):
	speaker_name = file
	ref_path = os.path.join(reference_folder, file)
	# librosa.load resamples to sample_rate while loading, so no separate resampling step is needed
	y_ref, _ = librosa.load(ref_path, sr=sample_rate)
	mfcc_ref = librosa.feature.mfcc(y=y_ref, sr=sample_rate, n_mfcc=n_mfcc)
	mfcc_ref_mean = np.mean(mfcc_ref, axis=1)
	similarity = 1 - cosine(mfcc_test_mean, mfcc_ref_mean)
	scores[speaker_name] = similarity
	print(f"Similarity with {speaker_name}: {similarity:.4f}")

best_match = max(scores, key=scores.get)
print(f"\nIdentified speaker: {best_match}")

Recording...
Recording finished.
Similarity with shanto.mp3: 0.9727
Similarity with mamun.mp3: 0.9778
Similarity with parvez.mp3: 0.9611

Identified speaker: mamun.mp3


Note: Illegal Audio-MPEG-Header 0x3833332c at offset 99840.
Note: Trying to resync...
Note: Hit end of (available) data during resync.
Note: Illegal Audio-MPEG-Header 0x3439372c at offset 202560.
Note: Trying to resync...
Note: Hit end of (available) data during resync.
Note: Illegal Audio-MPEG-Header 0x31352c31 at offset 104640.
Note: Trying to resync...
Note: Hit end of (available) data during resync.
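
The three scores printed above fall in a narrow band (0.9611 to 0.9778), so it can help to look at the full ranking and at the gap between the top two candidates rather than only at the winner. The sketch below reuses the scores from the output. The helper names `rank_speakers` and `top_margin` are hypothetical, and the source does not define how small a margin is too small to trust.

```python
def rank_speakers(scores):
    """Sort (speaker, score) pairs from most to least similar."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

def top_margin(scores):
    """Difference between the best and second-best similarity scores."""
    ranked = rank_speakers(scores)
    if len(ranked) < 2:
        raise ValueError("need at least two reference speakers")
    return ranked[0][1] - ranked[1][1]

# Scores copied from the notebook output above
scores = {"shanto.mp3": 0.9727, "mamun.mp3": 0.9778, "parvez.mp3": 0.9611}

ranked = rank_speakers(scores)
best_name, best_score = ranked[0]
margin = top_margin(scores)

for name, score in ranked:
    print(f"{name}: {score:.4f}")
print(f"Best match: {best_name}, margin over runner-up: {margin:.4f}")
```

A decision rule such as "accept only if the margin exceeds some threshold" would be an additional assumption that needs tuning on labeled recordings.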
