# Using pairwise EEG sequences generated in [Notebook - Eegs Pairing Analysis & Features](https://www.kaggle.com/code/seshurajup/eegs-pairing-analysis-features)
# Useful for training WaveNet

## Brain activity notebook series

### [EEGS 10–20 system](https://www.kaggle.com/code/seshurajup/eegs-10-20-system)
A better understanding of the EEG 10-20 system
### [Missing Eeg_ids Train.csv vs train_eegs [Resolved]](https://www.kaggle.com/code/seshurajup/missing-eeg-ids-in-train-csv-vs-train-eegs-parquet)
Extra training EEGs [Resolved]: they can safely be ignored
### [EDA train.csv](https://www.kaggle.com/code/seshurajup/eda-train-csv)
Detailed analysis of the train.csv
### [Eegs Pairing Analysis & Features](https://www.kaggle.com/code/seshurajup/eegs-pairing-analysis-features)
Analysis of the pairing features and how to build them
### [Eegs Target Analysis - Correct way to merge target](https://www.kaggle.com/code/seshurajup/eegs-target-analysis-correct-way-to-merge-target)
How to choose the target votes for training
### [Eegs Train Split (CV)](https://www.kaggle.com/seshurajup/eegs-train-splits-cv)
Generates a better train split without patient_id overlap
### [Eegs Pairing Wav](https://www.kaggle.com/seshurajup/eeg-pairing-wav)
Converts pairwise EEG sequences into WAV format

### Datasets [eegs pairing dataset](https://www.kaggle.com/datasets/seshurajup/eegs-pairing-dataset), [eegs pairing wav dataset](https://www.kaggle.com/datasets/seshurajup/eegs-pairing-wav-dataset)

#### **Upvote my work if it is useful**

In [None]:
import numpy as np
import pandas as pd
from tqdm import tqdm
from scipy.io import wavfile
import matplotlib.pyplot as plt
from IPython.display import Audio, display, HTML

In [None]:
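def eeg_to_wav(eeg_data, sample_rate, output_file):
    """Flatten an EEG array, peak-normalize it to 16-bit PCM, and write it as a WAV file."""
    # Replace NaNs with 0 so they do not poison the peak or the integer cast.
    eeg_data_flattened = np.nan_to_num(eeg_data.flatten())
    # Fall back to a peak of 1.0 for an all-zero signal to avoid dividing by zero.
    peak = np.max(np.abs(eeg_data_flattened)) or 1.0
    eeg_normalized = np.int16((eeg_data_flattened / peak) * 32767)
    wavfile.write(output_file, sample_rate, eeg_normalized)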
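
The conversion above relies on peak normalization: the flattened signal is divided by its largest absolute value and scaled to the signed 16-bit range. Below is a minimal sketch of that step on a synthetic array. The helper name `peak_normalize_to_int16` and the 4 x 2 demo array are hypothetical, and the NaN and all-zero handling is an added assumption about what a robust version might want, not something the pairing dataset is known to require.

```python
import numpy as np

def peak_normalize_to_int16(signal):
    """Scale a float array so its largest absolute value maps to 32767 (hypothetical helper)."""
    # flatten() is row-major, so a 2D (samples, channels) array is interleaved sample by sample.
    flat = np.nan_to_num(np.asarray(signal, dtype=np.float64).flatten())
    peak = np.max(np.abs(flat)) if flat.size else 0.0
    if peak == 0:
        # An all-zero (or empty) signal stays silent instead of dividing by zero.
        return np.zeros(flat.shape, dtype=np.int16)
    return (flat / peak * 32767).astype(np.int16)

# Synthetic stand-in for an EEG array: 4 samples x 2 channels, with one NaN.
demo = np.array([[0.5, -2.0], [1.0, 0.0], [np.nan, 0.25], [-1.0, 2.0]])
pcm = peak_normalize_to_int16(demo)
print(pcm.dtype, pcm.shape, pcm.min(), pcm.max())
```

Because the scaling is relative to each file's own peak, absolute amplitude differences between recordings are lost in the WAV output.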

In [None]:
!mkdir -p /kaggle/working/pair_wavs

In [None]:
df = pd.read_csv("/kaggle/input/hms-harmful-brain-activity-classification/train.csv")
TARGETS = [x for x in df.columns if 'vote' in x]
train = df.groupby('eeg_id')[['spectrogram_id','spectrogram_label_offset_seconds']].agg(
    {'spectrogram_id':'first','spectrogram_label_offset_seconds':'min'})
train.columns = ['spec_id','min']

tmp = df.groupby('eeg_id')[['spectrogram_id','spectrogram_label_offset_seconds']].agg(
    {'spectrogram_label_offset_seconds':'max'})
train['max'] = tmp

tmp = df.groupby('eeg_id')[['patient_id']].agg('first')
train['patient_id'] = tmp

tmp = df.groupby('eeg_id')[TARGETS].agg('sum')
for t in TARGETS:
    train[t] = tmp[t].values
    
y_data = train[TARGETS].values
y_data = y_data / y_data.sum(axis=1,keepdims=True)
train[TARGETS] = y_data

tmp = df.groupby('eeg_id')[['expert_consensus']].agg('first')
train['target'] = tmp

train = train.reset_index()
print('Train non-overlapping eeg_id shape:', train.shape)
# Largest normalized vote share per eeg_id; 1.0 means all votes agree on one class.
train['max_votes'] = train[TARGETS].max(axis=1)
train.head()
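
The aggregation above sums the vote columns per `eeg_id` and then divides each row by its total, so every row becomes a probability distribution over the classes. A minimal sketch on a toy frame, where the two vote columns and their values are made up for illustration:

```python
import pandas as pd

# Toy stand-in for train.csv: two label rows for eeg_id 1, one for eeg_id 2.
toy = pd.DataFrame({
    "eeg_id": [1, 1, 2],
    "seizure_vote": [3, 1, 0],
    "other_vote": [0, 2, 5],
})
vote_cols = [c for c in toy.columns if "vote" in c]

# Sum the votes over all rows that share an eeg_id.
summed = toy.groupby("eeg_id")[vote_cols].sum()

# Divide each row by its total so the vote shares sum to 1.
probs = summed.div(summed.sum(axis=1), axis=0)

# The largest share per row; 1.0 means every vote went to a single class.
max_votes = probs.max(axis=1)
print(probs)
print(max_votes)
```

In this toy frame only `eeg_id` 2 has a unanimous vote, which is the kind of row the `max_votes == 1` filter later selects.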

In [None]:
sample = np.load("/kaggle/input/eegs-pairing-dataset/pair_features/568657.npy")
sample.shape

In [None]:
eeg = pd.read_parquet("/kaggle/input/hms-harmful-brain-activity-classification/train_eegs/568657.parquet")
eeg.shape

In [None]:
sample_rate = 44_100  # WAV playback rate (44.1 kHz), not the EEG sampling rate
sample_wave_path = "/kaggle/working/pair_wavs/568657.wav"
eeg_to_wav(sample, sample_rate, sample_wave_path)

In [None]:
Audio(sample_wave_path)
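
The sample rate written into the WAV header determines how long the clip plays: duration in seconds is the number of samples divided by the sample rate. The sketch below checks this on a synthetic tone. It uses the standard-library `wave` module instead of `scipy.io.wavfile`, purely to keep the example dependency-light, and the tone, file name and temporary directory are all hypothetical.

```python
import os
import tempfile
import wave

import numpy as np

sample_rate = 44_100          # playback rate written into the WAV header
n_samples = 22_050            # half a second of audio at 44.1 kHz

# Synthetic 440 Hz tone as little-endian 16-bit PCM (a stand-in for a converted EEG).
t = np.arange(n_samples) / sample_rate
pcm = (np.sin(2 * np.pi * 440 * t) * 32767).astype("<i2")

path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(path, "wb") as wf:
    wf.setnchannels(1)        # mono, matching a flattened 1D signal
    wf.setsampwidth(2)        # 2 bytes per sample = 16-bit
    wf.setframerate(sample_rate)
    wf.writeframes(pcm.tobytes())

# Read the header back and derive the playback duration.
with wave.open(path, "rb") as wf:
    n_frames = wf.getnframes()
    rate = wf.getframerate()

duration_seconds = n_frames / rate
print(n_frames, rate, duration_seconds)
```

The same arithmetic applies to the converted EEG files: a flattened array with N values plays for N / 44100 seconds at this rate.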

In [None]:
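for i, row in tqdm(train.iterrows(), total=len(train)):
    # Load the pair features for this eeg_id (assumes one .npy per eeg_id, as for the sample above).
    pair = np.load(f"/kaggle/input/eegs-pairing-dataset/pair_features/{row['eeg_id']}.npy")
    eeg_to_wav(pair, sample_rate, f"/kaggle/working/pair_wavs/{row['eeg_id']}.wav")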

In [None]:
# Pick 5 EEGs per target class where every vote agrees (max_votes == 1), then play them.
selected_wavs = train[train['max_votes'] == 1].groupby('target').sample(5)

for target, group in selected_wavs.groupby('target'):
    display(HTML(f"<h1>{target}</h1>"))
    for i, row in group.iterrows():
        display(Audio(f"/kaggle/working/pair_wavs/{row['eeg_id']}.wav"))