
In [None]:
# IMPORTANT: RUN THIS CELL IN ORDER TO IMPORT YOUR KAGGLE DATA SOURCES,
# THEN FEEL FREE TO DELETE THIS CELL.
# NOTE: THIS NOTEBOOK ENVIRONMENT DIFFERS FROM KAGGLE'S PYTHON
# ENVIRONMENT SO THERE MAY BE MISSING LIBRARIES USED BY YOUR
# NOTEBOOK.
import kagglehub
paultimothymooney_medical_speech_transcription_and_intent_path = kagglehub.dataset_download('paultimothymooney/medical-speech-transcription-and-intent')
paultimothymooney_scispacy_pretrained_models_path = kagglehub.dataset_download('paultimothymooney/scispacy-pretrained-models')

print('Data source import complete.')


**Medical Text and Audio Classification with Fastai**

I stumbled across an interesting dataset containing verbal descriptions of medical symptoms (.wav, audio data) paired with text transcriptions (.csv, text data) and labeled according to the category of the ailment. I had never worked with audio data before, so I decided to explore this dataset.

Here I use the fastai library to classify medical text and audio according to the category of the ailment being described.


In [None]:
# **Step 1: Import Python Packages**

# fastai, librosa, spaCy, scispaCy, PySoundFile, seaborn, etc.

In [None]:
!pip install scispacy
!pip install pysoundfile
!apt-get install libav-tools -y
!apt-get install zip
!pip freeze > '../working/dockerimage_snapshot.txt'


In [None]:
from fastai.text import *
from fastai.vision import *
import spacy
from spacy import displacy
import scispacy
import librosa
import librosa.display
import soundfile as sf
from nltk.corpus import stopwords
from wordcloud import WordCloud, STOPWORDS
from collections import Counter
import IPython
import os
from glob import glob
from tqdm import tqdm
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib
import matplotlib.pyplot as plt
import pylab
import gc
import warnings
warnings.filterwarnings('ignore')
%matplotlib inline

In [None]:
# **Step 2: Define Helper Functions**

# Create spectrograms and word frequency plots

In [None]:
def get_wav_info(wav_file):
    data, rate = sf.read(wav_file)
    return data, rate

def create_spectrogram(wav_file):
    # adapted from Andrew Ng Deep Learning Specialization Course 5
    data, rate = get_wav_info(wav_file)
    nfft = 200 # Length of each window segment
    fs = 8000 # Sampling frequency assumed for the frequency axis (the file's own rate is not used)
    noverlap = 120 # Overlap between windows
    if data.ndim > 1:
        data = data[:, 0] # Multi-channel audio: keep only the first channel
    pxx, freqs, bins, im = plt.specgram(data, NFFT=nfft, Fs=fs, noverlap=noverlap)
    return pxx

def create_melspectrogram(filename,name):
    # adapted from https://www.kaggle.com/devilsknight/sound-classification-using-spectrogram-images
    plt.interactive(False)
    clip, sample_rate = librosa.load(filename, sr=None)
    fig = plt.figure(figsize=[0.72,0.72])
    ax = fig.add_subplot(111)
    ax.axes.get_xaxis().set_visible(False)
    ax.axes.get_yaxis().set_visible(False)
    ax.set_frame_on(False)
    S = librosa.feature.melspectrogram(y=clip, sr=sample_rate)
    librosa.display.specshow(librosa.power_to_db(S, ref=np.max))
    filename  = Path('/kaggle/working/spectrograms/' + name + '.jpg')
    plt.savefig(filename, dpi=400, bbox_inches='tight',pad_inches=0)
    plt.close()
    fig.clf()
    plt.close(fig)
    plt.close('all')
    del filename,name,clip,sample_rate,fig,ax,S

def wordBarGraphFunction(df,column,title):
    # adapted from https://www.kaggle.com/benhamner/most-common-forum-topic-words
    topic_words = [ z.lower() for y in
                       [ x.split() for x in df[column] if isinstance(x, str)]
                       for z in y]
    word_count_dict = dict(Counter(topic_words))
    popular_words = sorted(word_count_dict, key = word_count_dict.get, reverse = True)
    stop_words = set(stopwords.words("english")) # build the stop-word set once, not once per word
    popular_words_nonstop = [w for w in popular_words if w not in stop_words]
    top_words = popular_words_nonstop[0:50][::-1] # reversed so the most frequent word is drawn at the top
    plt.barh(range(len(top_words)), [word_count_dict[w] for w in top_words])
    plt.yticks(range(len(top_words)), top_words)
    plt.title(title)
    plt.show()

def wordCloudFunction(df,column,numWords):
    topic_words = [ z.lower() for y in
                       [ x.split() for x in df[column] if isinstance(x, str)]
                       for z in y]
    word_count_dict = dict(Counter(topic_words))
    stop_words = set(stopwords.words("english")) | set(STOPWORDS)
    # Pass the actual counts so that word size reflects word frequency
    word_counts_nonstop = {w: c for w, c in word_count_dict.items() if w not in stop_words}
    wordcloud = WordCloud(background_color='white',
                          max_words=numWords,
                          width=1000,height=1000,
                         ).generate_from_frequencies(word_counts_nonstop)
    plt.clf()
    plt.imshow(wordcloud)
    plt.axis('off')
    plt.show()

In [None]:
csv_path = '../input/medical-speech-transcription-and-intent/medical speech transcription and intent/Medical Speech, Transcription, and Intent/overview-of-recordings.csv'
overview = pd.read_csv(csv_path)
overview = overview[['file_name','phrase','prompt','overall_quality_of_the_audio','speaker_id']]
overview = overview.dropna()
overviewAudio = overview[['file_name','prompt']].copy()
# Remove only the trailing '.wav'; str.rstrip('.wav') strips a set of characters, not a suffix
overviewAudio['spec_name'] = overviewAudio['file_name'].str.replace(r'\.wav$', '', regex=True)
overviewAudio = overviewAudio[['spec_name','prompt']]
overviewText = overview[['phrase','prompt']]
# Save a copy of the full table without NaN rows to the working directory
noNaNcsv = pd.read_csv(csv_path).dropna()
noNaNcsv.to_csv('overview-of-recordings.csv',index=False)
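
A note on deriving `spec_name` from `file_name`: Python's `str.rstrip` (and the pandas `.str.rstrip` wrapper) removes any trailing characters drawn from the given set, not a literal suffix. The file names in this dataset end in digits before the extension, so the difference does not show up here, but it can with other names. The sketch below illustrates the difference; `strip_wav_suffix` and the name `nav.wav` are hypothetical and only serve as an illustration.

```python
def strip_wav_suffix(file_name: str) -> str:
    """Remove one trailing '.wav' extension, leaving the rest of the name intact."""
    suffix = ".wav"
    return file_name[: -len(suffix)] if file_name.endswith(suffix) else file_name


# str.rstrip treats its argument as a set of characters, not as a suffix.
numeric_name = "1249120_44176037_58635902.wav"   # name pattern used in this dataset
tricky_name = "nav.wav"                           # hypothetical name, for illustration

print(numeric_name.rstrip(".wav"))     # digits are not in the set, so only the extension goes
print(tricky_name.rstrip(".wav"))      # strips more than the extension
print(strip_wav_suffix(tricky_name))   # keeps the stem intact
```

On Python 3.9 or later, `str.removesuffix('.wav')` does the same job as the helper.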

**Part 1 of 3: Exploratory Data Analysis and Data Visualization**


The dataset consists of verbal descriptions of medical symptoms (.wav, audio data) paired with text transcriptions (.csv, text data) and labeled according to the category of the ailment.

Here is a sample of the .csv file that accompanies the .wav audio files.

In [None]:
overview[110:120]

The number of recordings per ailment category and the distribution of audio quality ratings are plotted below:

In [None]:
sns.set_style("whitegrid")
promptsPlot = sns.countplot(y='prompt',data=overview)
promptsPlot

qualityPlot = sns.FacetGrid(overview,aspect=2.5)
qualityPlot.map(sns.kdeplot,'overall_quality_of_the_audio',shade= True)
qualityPlot.set(xlim=(2.5, overview['overall_quality_of_the_audio'].max()))
qualityPlot.set_axis_labels('overall_quality_of_the_audio', 'Density')
qualityPlot

And here we zoom in on one specific example:


In [None]:
overview[62:63]

In [None]:
en_core_sci_sm = '../input/scispacy-pretrained-models/scispacy pretrained models/Scispacy Pretrained Models/en_core_sci_sm-0.1.0/en_core_sci_sm/en_core_sci_sm-0.1.0'
nlp = spacy.load(en_core_sci_sm)
text = overview['phrase'].iloc[62]  # positional lookup, matching the overview[62:63] slice above
doc = nlp(text)
print(list(doc.sents))
print(doc.ents)
displacy.render(next(doc.sents), style='dep', jupyter=True,options = {'compact': True, 'word_spacing': 45, 'distance': 90})

In [None]:
IPython.display.Audio('../input/medical-speech-transcription-and-intent/medical speech transcription and intent/Medical Speech, Transcription, and Intent/recordings/test/1249120_20518958_23074828.wav')

Here is another example:


In [None]:
overview[118:119]

In [None]:
en_core_sci_sm = '../input/scispacy-pretrained-models/scispacy pretrained models/Scispacy Pretrained Models/en_core_sci_sm-0.1.0/en_core_sci_sm/en_core_sci_sm-0.1.0'
nlp = spacy.load(en_core_sci_sm)
text = overview['phrase'].iloc[118]  # positional lookup, matching the overview[118:119] slice above
doc = nlp(text)
print(list(doc.sents))
print(doc.ents)
displacy.render(next(doc.sents), style='dep', jupyter=True,options = {'compact': True, 'word_spacing': 45, 'distance': 90})

In [None]:
IPython.display.Audio('../input/medical-speech-transcription-and-intent/medical speech transcription and intent/Medical Speech, Transcription, and Intent/recordings/test/1249120_43788827_53247832.wav')

These are the most common words in the text descriptions:


In [None]:
plt.figure(figsize=(15,15))
wordCloudFunction(overview,'phrase',10000000)

In [None]:
plt.figure(figsize=(10,10))
wordBarGraphFunction(overview,'phrase',"Most Common Words in Medical Text Transcripts")
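
The counting behind these plots can be sketched in a few lines. This is a minimal illustration, not the helper functions defined earlier: `top_words`, `DEMO_STOPWORDS`, and the example phrases are all made up for the sketch, and the small stop-word set stands in for NLTK's English list.

```python
from collections import Counter

# Stand-in stop-word list for illustration; the notebook uses NLTK's English list.
DEMO_STOPWORDS = {"i", "my", "a", "the", "in", "when", "have", "and"}


def top_words(phrases, stop_words, n=50):
    """Return the n most frequent non-stop words as (word, count) pairs."""
    counts = Counter(
        word
        for phrase in phrases
        if isinstance(phrase, str)          # skip NaN / non-text entries
        for word in phrase.lower().split()
        if word not in stop_words
    )
    return counts.most_common(n)


# Made-up phrases, not taken from the dataset.
demo_phrases = [
    "I have a sharp pain in my knee",
    "My knee hurts when I walk",
    "The pain in my shoulder is sharp",
]
print(top_words(demo_phrases, DEMO_STOPWORDS, n=3))
```

Filtering stop words before counting, with the stop-word list held in a set, avoids rebuilding the list for every word.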

**Part 2 of 3: Classify Ailment from Text Description**

Next I will use the [fastai.text_classifier_learner()](https://docs.fast.ai/text.learner.html) function to categorize the text descriptions according to the ailment category being described.

In [None]:
np.random.seed(7)
path = Path('../input/medical-speech-transcription-and-intent/medical speech transcription and intent/Medical Speech, Transcription, and Intent/')
data_clas = (TextList.from_csv(path, 'overview-of-recordings.csv',
                               cols='phrase')
                   .random_split_by_pct(.2)
                   .label_from_df(cols='prompt')
                   .databunch(bs=42))
MODEL_PATH = "/tmp/model/"
learn = text_classifier_learner(data_clas,model_dir=MODEL_PATH,arch=AWD_LSTM)
learn.fit_one_cycle(5)
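
The idea behind the seeded 80/20 random split can be pictured with a small standard-library sketch. This is not fastai's implementation (which, as far as I know, draws from NumPy's random state, hence the `np.random.seed(7)` call); `random_split` is a hypothetical helper that only shows why fixing the seed makes the validation set reproducible.

```python
import random


def random_split(n_items, valid_pct=0.2, seed=7):
    """Shuffle item indices with a fixed seed and cut off a validation slice."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)         # local RNG, so global state is untouched
    n_valid = int(n_items * valid_pct)
    return indices[n_valid:], indices[:n_valid]  # (train indices, validation indices)


train_idx, valid_idx = random_split(100)
print(len(train_idx), len(valid_idx))
```

Calling the function twice with the same seed returns the same split, which keeps accuracy figures comparable between runs.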

In [None]:
learn.unfreeze()
learn.fit_one_cycle(5)

In [None]:
interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix(figsize=(10,10), dpi=60)

It worked! We classified the category of the ailment from the text description of the symptoms, and did so with high accuracy.
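
To put a number on "high accuracy", the confusion matrix can be reduced to a single fraction of correct predictions. The sketch below assumes plain Python lists of labels; the helper names and the example labels are made up for illustration and are not results from this notebook.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count (actual, predicted) label pairs; matching pairs are correct predictions."""
    return Counter(zip(y_true, y_pred))


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Made-up labels for illustration only; these are not results from this notebook.
actual    = ["Knee pain", "Cough", "Cough", "Acne", "Knee pain"]
predicted = ["Knee pain", "Cough", "Acne",  "Acne", "Knee pain"]

print(accuracy(actual, predicted))
for (true_label, pred_label), count in sorted(confusion_counts(actual, predicted).items()):
    print(f"actual={true_label!r} predicted={pred_label!r} count={count}")
```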

Now let's try to do the same thing but with the audio descriptions.

**Part 3 of 3: Classify Ailment from Audio Description**

Next I will convert the .wav files into .jpg spectrograms and then classify the audio descriptions according to the category of the ailment being described, just as I did with the text.

A spectrogram is a visual representation of a sound. The x-axis represents time, the y-axis represents frequency, and the third dimension (intensity or color) represents the amplitude of a specific frequency at a specific point in time.
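
The three dimensions can be made concrete with a short NumPy sketch that computes a magnitude spectrogram by hand. It is an illustration under stated assumptions, not the plotting code used in this notebook: `magnitude_spectrogram` is a hypothetical helper, the input is a synthetic 440 Hz tone, and the window length of 200 with a hop of 80 samples is chosen to mirror a 200-sample window with 120 samples of overlap.

```python
import numpy as np


def magnitude_spectrogram(signal, n_fft=200, hop=80):
    """Short-time Fourier transform magnitudes, shaped (frequency bins, time frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T


# Synthetic one-second 440 Hz tone sampled at 8 kHz (not taken from the dataset).
rate = 8000
t = np.arange(rate) / rate
tone = np.sin(2 * np.pi * 440 * t)

spec = magnitude_spectrogram(tone)
freqs = np.fft.rfftfreq(200, d=1 / rate)      # frequency in Hz of each row
peak_hz = freqs[spec.mean(axis=1).argmax()]   # row with the most energy
print(spec.shape, peak_hz)
```

Each column of `spec` is one moment in time, each row is one frequency band, and the stored value is the amplitude, so a pure tone shows up as a single bright horizontal line.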


In [None]:
testAudio = "../input/medical-speech-transcription-and-intent/medical speech transcription and intent/Medical Speech, Transcription, and Intent/recordings/train/1249120_44176037_58635902.wav"
x = create_spectrogram(testAudio)

Prior work has shown that it can be advantageous to transform the spectrogram into a melspectrogram before proceeding with computer vision applications.  For more information, see: https://en.wikipedia.org/wiki/Mel-frequency_cepstrum.
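
The mel scale spaces frequencies roughly the way pitch is perceived: finely at low frequencies and coarsely at high ones. The sketch below uses the HTK-style conversion formula as an assumption for illustration; librosa's default mel conversion uses a different (Slaney-style) variant, so the exact numbers will not match its output. The helper names and the choice of ten band edges up to 4000 Hz are hypothetical.

```python
import math


def hz_to_mel(freq_hz):
    """HTK-style mel formula; other variants (e.g. Slaney) give different values."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


# Ten band edges spaced evenly on the mel axis between 0 Hz and 4000 Hz.
n_edges = 10
top_mel = hz_to_mel(4000.0)
edges_hz = [mel_to_hz(top_mel * i / (n_edges - 1)) for i in range(n_edges)]
widths_hz = [hi - lo for lo, hi in zip(edges_hz, edges_hz[1:])]

print([round(f) for f in edges_hz])
print([round(w) for w in widths_hz])   # bands get wider as frequency increases
```

A melspectrogram sums the ordinary spectrogram's energy inside such bands, which gives more resolution to the low frequencies where most speech information sits.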

Here is a representative melspectrogram:

In [None]:
filename = "../input/medical-speech-transcription-and-intent/medical speech transcription and intent/Medical Speech, Transcription, and Intent/recordings/train/1249120_44176037_58635902.wav"
clip, sample_rate = librosa.load(filename, sr=None)
fig = plt.figure(figsize=[5,5])
S = librosa.feature.melspectrogram(y=clip, sr=sample_rate)
librosa.display.specshow(librosa.power_to_db(S, ref=np.max))
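
The `power_to_db(S, ref=np.max)` step converts power values to decibels relative to the loudest value, which compresses the huge dynamic range of audio into something an image can show. The sketch below reflects my understanding of that conversion and of librosa's default floor (`amin=1e-10`) and range (`top_db=80`); `power_to_db_sketch` is a hypothetical stand-in, not the library function.

```python
import numpy as np


def power_to_db_sketch(power, amin=1e-10, top_db=80.0):
    """Convert power values to decibels relative to the loudest value."""
    power = np.asarray(power, dtype=float)
    ref = power.max()
    db = 10.0 * np.log10(np.maximum(power, amin)) - 10.0 * np.log10(max(ref, amin))
    return np.maximum(db, db.max() - top_db)   # clip everything more than top_db below the peak


# Toy power values spanning many orders of magnitude.
toy_power = np.array([1.0, 0.1, 1e-3, 1e-12])
print(power_to_db_sketch(toy_power))
```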

Next I convert all of the .wav audio files into .jpg melspectrogram files.

In [None]:
!mkdir -p /kaggle/working/spectrograms

recordings_dir = "../input/medical-speech-transcription-and-intent/medical speech transcription and intent/Medical Speech, Transcription, and Intent/recordings/"

# Convert every recording in the train, test, and validate folders
for split in ["train", "test", "validate"]:
    for file in tqdm(glob(recordings_dir + split + "/*")):
        filename,name = file,file.split('/')[-1].split('.')[0]
        create_melspectrogram(filename,name)

Then I use [fastai.create_cnn()](https://docs.fast.ai/vision.learner.html) to classify the melspectrogram images according to the category of the ailment that is being described in the audio description.

In [None]:
path = Path('/kaggle/working/')
np.random.seed(7)
data = ImageDataBunch.from_df(path,df=overviewAudio, folder="spectrograms", valid_pct=0.2, suffix='.jpg',
        ds_tfms=get_transforms(), size=299, num_workers=0).normalize(imagenet_stats)
learn = create_cnn(data, models.resnet50, metrics=accuracy)
learn.fit_one_cycle(10)

In [None]:
learn.unfreeze()
learn.lr_find()
learn.recorder.plot()

In [None]:
learn.fit_one_cycle(50)

In [None]:
interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix(figsize=(10,10), dpi=60)

In [None]:
!zip -r spectrograms.zip /kaggle/working/spectrograms/
!rm -rf spectrograms/*

In the end, we classified the category of the ailment from the audio description of the symptoms with an accuracy much better than random chance, although well below the accuracy obtained earlier from the text transcriptions.
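
"Better than random chance" depends on how many classes there are and how balanced they are. A minimal sketch of two simple baselines follows; `chance_baselines` is a hypothetical helper and the class counts are invented for illustration, not taken from the dataset.

```python
def chance_baselines(label_counts):
    """Return (uniform-guess accuracy, majority-class accuracy) for given class counts."""
    total = sum(label_counts.values())
    uniform = 1.0 / len(label_counts)
    majority = max(label_counts.values()) / total
    return uniform, majority


# Hypothetical class counts for illustration; substitute
# overview['prompt'].value_counts().to_dict() to use the real ones.
demo_counts = {"Knee pain": 30, "Cough": 25, "Acne": 25, "Headache": 20}
uniform, majority = chance_baselines(demo_counts)
print(f"uniform guess: {uniform:.2f}, always-majority: {majority:.2f}")
```

A model is only doing useful work if its validation accuracy clearly exceeds the larger of these two numbers.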


**Summary**

Here we use the [fastai.text_classifier_learner()](https://docs.fast.ai/text.learner.html) function to classify text descriptions of medical symptoms according to the category of the ailment being described. Likewise, we use the [fastai.create_cnn()](https://docs.fast.ai/vision.learner.html) function to classify melspectrograms of the audio descriptions according to the category of the ailment being described in the audio file.


Please note that some of the labels are incorrect and some of the audio files are of poor quality. To improve the models produced by this kernel, I would recommend cleaning the dataset much more thoroughly.

**Credit:**

Inspired by:
* [Jeremy Howard's Deep Learning Course](https://course.fast.ai/) (Lesson 1: Fastai and Convolutional Neural Networks; Lesson 4: NLP; Tabular data; Collaborative filtering; Embeddings)
* [Andrew Ng's Deep Learning Course](https://www.coursera.org/specializations/deep-learning) (Course 5: Spectrograms and Audio Data)

With select functions adapted from:

* [most-common-forum-topic-words](https://www.kaggle.com/benhamner/most-common-forum-topic-words) (plot word frequencies)
* [play-audio-read-the-files-create-a-spectrogram](https://www.kaggle.com/vbookshelf/play-audio-read-the-files-create-a-spectrogram) (preview audio files)
* [sound-classification-using-spectrogram-images](https://www.kaggle.com/devilsknight/sound-classification-using-spectrogram-images) (create spectrograms)



