**A scrapbook to explore librosa methods and how they impact our data.**

# Import the usual stuff 

In [7]:
# Usual Libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
import sklearn

# Librosa (the mother of audio files)
import librosa
import librosa.display
import IPython.display as ipd
import warnings
warnings.filterwarnings('ignore')

## Getting data/wav files from (local) folder

In [8]:
import os
general_path = '../k_means_klang/raw_data/Data'
print(list(os.listdir(f'{general_path}/genres_original/')))

['hiphop', 'classical', 'blues', 'metal', 'jazz', 'country', 'pop', 'rock', 'disco', 'reggae']


In [30]:
data = pd.read_csv(f'{general_path}/features_30_sec.csv')
data.iloc[3:4]


| Unnamed: 0 | filename | length | chroma_stft_mean | chroma_stft_var | rms_mean | rms_var | spectral_centroid_mean | spectral_centroid_var | spectral_bandwidth_mean | spectral_bandwidth_var | ... | mfcc16_var | mfcc17_mean | mfcc17_var | mfcc18_mean | mfcc18_var | mfcc19_mean | mfcc19_var | mfcc20_mean | mfcc20_var | label |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | blues.00003.wav | 661794 | 0.404785 | 0.093999 | 0.141093 | 0.006346 | 1070.106615 | 184355.942417 | 1596.412872 | 166441.494769 | ... | 44.427753 | -3.319597 | 50.206673 | 0.636965 | 37.31913 | -0.619121 | 37.259739 | -3.407448 | 31.949339 | blues |


# A. Creating variables 


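- Sound (y): a sequence of air-pressure variations over time. librosa stores it as a 1-D NumPy array of amplitude samples.
- The sample rate (sr) is the number of audio samples per second, measured in Hz (or kHz).
- audio_file is the same signal y after leading and trailing silence has been trimmed with a librosa function (librosa.effects.trim).
note: librosa.load() loads an audio file and returns y and sr. By default it resamples to 22050 Hz and mixes down to mono.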

The example file for this exercise is blues.00003.wav.

In [43]:
# Use librosa.load to get the sound (y) and the sample rate (sr)
y, sr = librosa.load(f'{general_path}/genres_original/blues/blues.00003.wav')

y.shape

(661794,)

**The method below** (librosa.effects.trim) returns two values:

- a trimmed version of the audio signal (i.e., without the silence at the beginning and end), which we store in audio_file;
- the interval [start, end] of the non-silent region, in samples, so that audio_file equals y[start:end]. We ignore it here by assigning it to _.

In [36]:
# Trim leading/trailing silence (by default, frames more than 60 dB below the peak count as silent)
audio_file, _ = librosa.effects.trim(y)

audio_file.shape[0]

661794
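The output shows the length is unchanged (661794 samples), so nothing was trimmed at the default threshold. The NumPy sketch below re-implements the idea under simplifying assumptions so it is clear what counts as "silence". It uses frames of 2048 samples with a hop of 512 and no centre padding. A frame counts as silent when its RMS is more than top_db = 60 dB below the loudest frame. `trim_silence` is a hypothetical helper for illustration, not librosa's code. librosa also works on frames, but its padding and index details differ.

```python
import numpy as np

def trim_silence(y, top_db=60.0, frame_length=2048, hop_length=512):
    """Drop leading/trailing frames whose RMS is more than top_db dB below the loudest frame.

    Returns (trimmed signal, np.array([start, end])) with trimmed == y[start:end].
    """
    n_frames = 1 + max(0, len(y) - frame_length) // hop_length
    rms = np.array([
        np.sqrt(np.mean(y[i * hop_length : i * hop_length + frame_length] ** 2))
        for i in range(n_frames)
    ])
    peak = rms.max()
    if peak == 0:  # all-zero input: nothing is "loud"
        return y[:0], np.array([0, 0])
    db = 20 * np.log10(np.maximum(rms, 1e-10) / peak)  # level relative to the loudest frame
    loud = np.flatnonzero(db > -top_db)                 # indices of non-silent frames
    start = loud[0] * hop_length
    end = min(len(y), loud[-1] * hop_length + frame_length)
    return y[start:end], np.array([start, end])
```

For example, a 1-second tone padded with 4096 zeros on each side comes back shorter, starting and ending within one frame of the tone's edges.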

# B. Understanding the librosa functions needed to create the CSV (for 30-second wav files)

**The different features (58 in total)**
 1. Zero Crossing Rate (mean and var)
 2. Harmonic component (mean and var)
 3. Percussive component (mean and var)
 4. Tempo
 5. Spectral Centroid (mean and var)
 6. Spectral Rolloff (mean and var)
 7. Spectral Bandwidth (mean and var)
 8. Mel-Frequency Cepstral Coefficients (20 coefficients, mean and var of each)
 9. Chroma (mean and var)
 10. RMS energy (mean and var)
 11. Length of the audio file (audio_file.shape[0])

*58 features in total: 2 + 2 + 2 + 1 + 2 + 2 + 2 + 40 + 2 + 2 + 1 = 58*


### 1. Zero Crossing Rate
- the rate at which the signal changes sign, i.e. crosses from positive to negative or back.



In [60]:
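# Boolean array for our 1 song: True wherever two consecutive samples differ in sign
zero_crossings = librosa.zero_crossings(audio_file, pad=False)
# print(sum(zero_crossings))  # total number of crossings
# The mean is the fraction of samples at which a crossing occurs
print('Zero crossing mean is:', np.mean(zero_crossings))
print('Zero crossing var is:', np.var(zero_crossings))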




Zero crossing mean is: 0.03335932329395552
Zero crossing var is: 0.03224647884332488
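librosa.zero_crossings marks individual samples, so its mean (about 0.033 here) is the fraction of samples where the sign flips. For a True/False array, the variance is fully determined by that mean: p(1 - p) = 0.03336 × 0.96664 ≈ 0.03225, which matches the printed value. A frame-based rate, like the one from librosa.feature.zero_crossing_rate, gives a variance that actually reflects how the rate changes over time. The NumPy sketch below illustrates both, assuming 2048-sample frames, a 512-sample hop and no padding. `zero_crossings` and `frame_zcr` are hypothetical helpers. Unlike librosa, they do not zero out near-silent samples before checking signs.

```python
import numpy as np

def zero_crossings(y):
    """True at sample n when y[n] and y[n - 1] differ in sign (sample 0 is always False)."""
    negative = np.signbit(y)
    crossings = np.zeros(len(y), dtype=bool)
    crossings[1:] = negative[1:] != negative[:-1]
    return crossings

def frame_zcr(y, frame_length=2048, hop_length=512):
    """Fraction of sign changes within each frame (no padding)."""
    zc = zero_crossings(y)
    n_frames = 1 + (len(y) - frame_length) // hop_length
    return np.array([
        zc[i * hop_length : i * hop_length + frame_length].mean()
        for i in range(n_frames)
    ])
```

On a 100 Hz sine lasting one second there are 200 crossings. The per-sample variance equals p(1 - p) exactly, while the per-frame rates stay close to 200 / 22050.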


### 2. Harmonic and 3. Percussive components
- Harmonic component: the sustained, pitched part of the sound (held notes and chords), which carries much of its tonal colour
- Percussive component: the short, transient part (drum hits, note attacks), which carries much of the rhythm
- here we use librosa.effects.hpss (harmonic-percussive source separation) on our trimmed y, i.e. our "audio_file"

In [63]:
y_harm, y_perc = librosa.effects.hpss(audio_file)
print('harmony var is:', np.var(y_harm))
print('harmony mean is:', np.mean(y_harm))

print('percussive var is:', np.var(y_perc))
print('percussive mean is:', np.mean(y_perc))


harmony var is: 0.019055042
harmony mean is: 3.6126078e-06
percussive var is: 0.0027113643
percussive mean is: -1.8110986e-05
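librosa.effects.hpss computes an STFT, separates it with median filtering (librosa.decompose.hpss), and converts both parts back to audio. The core idea fits in a few lines. Sustained (harmonic) energy forms horizontal lines in a spectrogram, so a median filter along time keeps it. Percussive energy forms vertical lines, so a median filter along frequency keeps it. The sketch below builds soft masks from the two filtered versions. The defaults kernel_size=31 and power=2 mirror librosa's. `median_filter_1d` and `hpss_masks` are hypothetical helpers, and the toy spectrogram in the test is made up.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def median_filter_1d(x, size, axis):
    """Running median of odd length `size` along one axis (edge-padded)."""
    pad = size // 2
    pad_width = [(0, 0)] * x.ndim
    pad_width[axis] = (pad, pad)
    padded = np.pad(x, pad_width, mode="edge")
    windows = sliding_window_view(padded, size, axis=axis)  # window dim is appended last
    return np.median(windows, axis=-1)

def hpss_masks(S, kernel_size=31, power=2.0):
    """Soft harmonic/percussive masks for a magnitude spectrogram S (freq x time)."""
    harm = median_filter_1d(S, kernel_size, axis=1)  # smooth along time: keeps horizontal lines
    perc = median_filter_1d(S, kernel_size, axis=0)  # smooth along frequency: keeps vertical lines
    h, p = harm ** power, perc ** power
    total = h + p + 1e-12
    return h / total, p / total
```

Multiplying S by each mask (and inverting the STFT) gives the two components. In a toy spectrogram, a constant-frequency line lands in the harmonic mask and a single-frame click lands in the percussive one.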



### 4. Tempo BPM (beats per minute)
- using the librosa.beat.beat_track method on audio_file


In [143]:
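# beat_track returns the estimated tempo (BPM) and the beat positions (frame indices)
tempo_value, _ = librosa.beat.beat_track(y=audio_file, sr=sr)
tempo_value.item()  # convert the NumPy value to a plain Python float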




63.02400914634146
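librosa estimates tempo from the autocorrelation of an onset-strength envelope, computed with hop_length=512 by default. A lag of L frames maps to 60 × sr / (hop_length × L) BPM. The printed 63.02400914634146 is exactly 60 × 22050 / (512 × 41), i.e. a lag of 41 frames, so the result sits on that lag grid. The NumPy sketch below shows the idea on a synthetic envelope with a pulse every 41 frames. `estimate_tempo` is a hypothetical helper. It skips the onset-strength step and the tempo prior librosa applies, which favours values near 120 BPM.

```python
import numpy as np

def estimate_tempo(onset_env, sr=22050, hop_length=512, min_bpm=30.0, max_bpm=300.0):
    """Pick the autocorrelation lag with the strongest periodicity and convert it to BPM."""
    env = onset_env - onset_env.mean()
    # Autocorrelation for non-negative lags 0 .. len(env) - 1
    ac = np.correlate(env, env, mode="full")[len(env) - 1:]
    lags = np.arange(len(ac))
    with np.errstate(divide="ignore"):
        bpms = 60.0 * sr / (hop_length * lags)  # lag 0 -> inf, excluded below
    valid = (bpms >= min_bpm) & (bpms <= max_bpm)
    best_lag = lags[valid][np.argmax(ac[valid])]
    return 60.0 * sr / (hop_length * best_lag)
```

With `env = np.zeros(1000); env[::41] = 1.0`, the strongest lag is 41 and the function returns the same 63.024 BPM value as above.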

### 5. Spectral Centroid
- indicates where the "centre of mass" of a sound's spectrum is located. It is calculated, frame by frame, as the mean of the frequencies present, weighted by their magnitudes.
- we use the 'librosa.feature.spectral_centroid' method (it returns a 2-D array of shape (1, n_frames), so we take [0] to get a 1-D array of per-frame values)

In [93]:
# Calculate the Spectral Centroids
spectral_centroids = librosa.feature.spectral_centroid(y=audio_file, sr=sr)[0]
print(spectral_centroids)

print('spectral_centroids var is:', np.var(spectral_centroids))
print('spectral_centroids mean is:', np.mean(spectral_centroids))

[ 694.20629018  696.33642143  659.9642591  ... 1008.04442371 1007.3869005
 1124.11357557]
spectral_centroids var is: 184366.00943826674
spectral_centroids mean is: 1070.1534175250665
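Per frame, the centroid is sum_k f_k·|S_k| / sum_k |S_k|, where f_k is the frequency of STFT bin k and |S_k| its magnitude. The sketch below computes that in NumPy and checks it on a pure 1000 Hz tone, whose centroid should sit at about 1000 Hz in every frame. It assumes 2048-sample frames, a 512-sample hop, a Hann window and no padding. `stft_magnitude` and `spectral_centroid` are hypothetical helpers. Unlike librosa, they return a 1-D array, so no [0] is needed.

```python
import numpy as np

def stft_magnitude(y, n_fft=2048, hop_length=512):
    """Magnitude STFT without padding: shape (1 + n_fft // 2, n_frames)."""
    window = np.hanning(n_fft)  # symmetric Hann; librosa uses a periodic one
    n_frames = 1 + (len(y) - n_fft) // hop_length
    frames = np.stack(
        [y[i * hop_length : i * hop_length + n_fft] * window for i in range(n_frames)],
        axis=1,
    )
    return np.abs(np.fft.rfft(frames, axis=0))

def spectral_centroid(y, sr, n_fft=2048, hop_length=512):
    """Magnitude-weighted mean frequency of each frame, in Hz."""
    S = stft_magnitude(y, n_fft, hop_length)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)  # centre frequency of each bin
    return (freqs[:, None] * S).sum(axis=0) / (S.sum(axis=0) + 1e-10)
```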


### 6. Spectral Rolloff
- describes the shape of the spectrum: it is the frequency below which a specified percentage of the total spectral energy lies (librosa's default roll_percent is 0.85, i.e. 85%)
- we will use the 'librosa.feature.spectral_rolloff' method


In [102]:
spectral_rolloff = librosa.feature.spectral_rolloff(y=audio_file, sr=sr)[0]

print('spectral_rolloff var is:', np.var(spectral_rolloff))
print('spectral_rolloff mean is:', np.mean(spectral_rolloff))

spectral_rolloff var is: 1493077.8489404472
spectral_rolloff mean is: 2184.8790286035382
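For one frame, the rolloff is the lowest bin frequency at which the running sum of the spectrum reaches roll_percent of its total. librosa computes this on the magnitude spectrogram. The sketch below applies the rule to a single hand-made four-bin spectrum. `spectral_rolloff_frame` and the toy values are hypothetical.

```python
import numpy as np

def spectral_rolloff_frame(mag, freqs, roll_percent=0.85):
    """Lowest frequency at which the cumulative magnitude reaches roll_percent of the total."""
    cumulative = np.cumsum(mag)
    threshold = roll_percent * cumulative[-1]
    return freqs[np.argmax(cumulative >= threshold)]  # argmax finds the first True

freqs = np.array([0.0, 100.0, 200.0, 300.0])
mag = np.array([0.0, 1.0, 0.0, 1.0])
print(spectral_rolloff_frame(mag, freqs))       # 300.0
print(spectral_rolloff_frame(mag, freqs, 0.5))  # 100.0
```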


### 7. Spectral Bandwidth
- The spectral bandwidth measures the spread of the spectrum around its centroid (mean frequency). It gives an idea of how "wide" or "narrow" the distribution of energy across frequencies is. With librosa's default (p=2), it is the magnitude-weighted standard deviation of frequency around the centroid, in Hz. This can be useful in characterizing the timbre or texture of a sound.
- we will use the 'librosa.feature.spectral_bandwidth' method



In [144]:
# Calculate the spectral bandwidth
bandwidth = librosa.feature.spectral_bandwidth(y=audio_file, sr=sr)

# Print the shape and summary statistics (mean and variance of the bandwidth)
print("Spectral bandwidth shape:", bandwidth.shape)
print("Mean spectral bandwidth (Hz):", np.mean(bandwidth))
print("Var spectral bandwidth (Hz^2):", np.var(bandwidth))

Spectral bandwidth shape: (1, 1293)
Mean spectral bandwidth (Hz): 1596.422564453577
Var spectral bandwidth (Hz^2): 166551.84424342896
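With librosa's defaults (norm=True, p=2), each frame's magnitudes are first normalized to sum to 1. The bandwidth is then (sum_k w_k·|f_k - centroid|^p)^(1/p). The sketch below reproduces that formula for one hand-made spectrum. Energy split equally between 100 Hz and 300 Hz has its centroid at 200 Hz and a bandwidth of 100 Hz, while energy in a single bin has zero bandwidth. `spectral_bandwidth_frame` and the toy values are hypothetical.

```python
import numpy as np

def spectral_bandwidth_frame(mag, freqs, p=2):
    """Bandwidth of one spectrum frame: (sum_k w_k * |f_k - centroid|**p) ** (1/p)."""
    weights = mag / mag.sum()           # normalize the frame to sum to 1
    centroid = np.sum(weights * freqs)  # magnitude-weighted mean frequency
    return np.sum(weights * np.abs(freqs - centroid) ** p) ** (1.0 / p)

freqs = np.array([0.0, 100.0, 200.0, 300.0])
print(spectral_bandwidth_frame(np.array([0.0, 1.0, 0.0, 1.0]), freqs))  # 100.0
```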


### 8. Mel-Frequency Cepstral Coefficients
- i.e. the timbre!
- The Mel-frequency cepstral coefficients (MFCCs) of a signal are a small set of features (usually about 10–20) that concisely describe the overall shape of the spectral envelope. They were developed for speech processing, and the mel scale they use approximates how human hearing perceives pitch.
- we will use the librosa.feature.mfcc method


In [116]:
mfccs = librosa.feature.mfcc(y=audio_file, sr=sr)

# librosa returns 20 coefficients per frame by default (n_mfcc=20): shape (20, n_frames)
print('mfccs shape:', mfccs.shape)

#now to calculate mean and var for each mfcc coefficient (ie. each row) 
mfcc_means = np.mean(mfccs, axis=1)
mfcc_vars = np.var(mfccs, axis=1)

print("Mean of each MFCC coefficient:", mfcc_means)
print("Variance of each MFCC coefficient:", mfcc_vars)

mfccs shape: (20, 1293)
Mean of each MFCC coefficient: [-199.57513     150.0861        5.663404     26.855282      1.7700713
   14.232647     -4.827845      9.286853     -0.75612015    8.134435
   -3.200026      6.078081     -2.4784453    -1.0815871    -2.8744543
    0.77399397   -3.3240693     0.63631064   -0.6159675    -3.405046  ]
Variance of each MFCC coefficient: [5508.266     456.30908   257.10977   158.36111   267.97372   126.80741
  155.94673    81.2703     92.3018     71.381836  110.269356   48.214516
   56.776222   62.243008   51.609818   44.432903   50.218452   37.325726
   37.257774   31.965254]
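librosa.feature.mfcc first builds a mel power spectrogram (128 mel bands by default) and converts it to decibels. It then applies an orthonormal type-II DCT along the mel axis and keeps the first n_mfcc = 20 coefficients. The DCT step turns a log-mel spectrum into a compact description of its envelope. A flat spectrum yields only a first coefficient, equal to sqrt(128) times its level. This is consistent with the first MFCC mean (-199.6) being much larger in magnitude than the others, since it tracks overall level. The NumPy sketch below assumes you already have a log-mel matrix. `mfcc_from_log_mel` is a hypothetical helper, and the test inputs are made up.

```python
import numpy as np

def mfcc_from_log_mel(log_mel, n_mfcc=20):
    """Orthonormal DCT-II along the mel axis (axis 0), keeping the first n_mfcc rows.

    Same transform as scipy.fft.dct(log_mel, type=2, norm="ortho", axis=0)[:n_mfcc].
    """
    n_mels = log_mel.shape[0]
    k = np.arange(n_mels)[:, None]  # coefficient index
    m = np.arange(n_mels)[None, :]  # mel band index
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n_mels))
    basis[0] *= np.sqrt(1.0 / n_mels)   # orthonormal scaling for coefficient 0
    basis[1:] *= np.sqrt(2.0 / n_mels)  # and for all other coefficients
    return (basis @ log_mel)[:n_mfcc]
```

Given a (n_mels, n_frames) matrix, the result is (n_mfcc, n_frames). Taking the mean and variance over axis=1, as in the cell above, gives the 40 MFCC features.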


### 9. Chroma Frequencies
- Chroma features are an interesting and powerful representation for music audio in which the entire spectrum is projected onto 12 bins representing the 12 distinct semitones (or chroma) of the musical octave.
- we will use 'librosa.feature.chroma_stft'

In [120]:
# Increase or decrease hop_length to change how granular you want your data to be
hop_length = 5000

# Chromagram: 12 pitch classes x n_frames
chromagram = librosa.feature.chroma_stft(y=audio_file, sr=sr, hop_length=hop_length)
print('Chromagram shape:', chromagram.shape)

Chromagram shape: (12, 133)
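The 133 frames follow from the hop length. With librosa's default centred framing, an n-sample signal gives 1 + n // hop_length frames. That is 1 + 661794 // 5000 = 133 here, and 1 + 661794 // 512 = 1293 with the default hop of 512, which matches the (20, 1293) MFCC and (1, 1293) bandwidth shapes. The hop length can also shift the statistics. The chroma mean computed in the next cell (0.426) differs noticeably from the CSV's chroma_stft_mean (0.404785). Meanwhile the other features in this notebook, computed with the default hop, land very close to their CSV values. The larger hop is a plausible reason, though we have not checked how the CSV was built. `n_frames_centered` is a hypothetical helper.

```python
def n_frames_centered(n_samples, hop_length):
    """Number of frames when the signal is padded so frames are centred (librosa's center=True)."""
    return 1 + n_samples // hop_length

print(n_frames_centered(661794, 5000))  # 133
print(n_frames_centered(661794, 512))   # 1293
```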


In [121]:
print('chromagram var is:', np.var(chromagram))
print('chromagram mean is:', np.mean(chromagram))

chromagram var is: 0.094975464
chromagram mean is: 0.42627874
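Each chroma bin collects the energy of one pitch class across all octaves. A simplified version maps every frequency to its nearest equal-tempered note via the MIDI number 69 + 12·log2(f / 440). It takes that number modulo 12 (C = 0, ..., A = 9) and sums the magnitudes per class. librosa.feature.chroma_stft instead uses a chroma filter bank (librosa.filters.chroma) with soft weights across neighbouring bins. By default it scales each frame so its largest bin is 1, which is why the chromagram values stay between 0 and 1. The sketch mimics that per-frame scaling. `pitch_class` and `fold_to_chroma` are hypothetical helpers, and the test frequencies are the notes C4, C5, E4 and G4.

```python
import numpy as np

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def pitch_class(freq_hz, a4_hz=440.0):
    """Index 0-11 (C = 0) of the nearest equal-tempered pitch class."""
    midi = 69 + 12 * np.log2(freq_hz / a4_hz)  # MIDI 69 is A4
    return int(np.round(midi)) % 12

def fold_to_chroma(mag, freqs, fmin=20.0):
    """Hard-assign each bin's magnitude to one of 12 pitch classes, then scale to max 1."""
    chroma = np.zeros(12)
    for m, f in zip(mag, freqs):
        if f >= fmin:  # skip DC and sub-audio bins
            chroma[pitch_class(f)] += m
    peak = chroma.max()
    return chroma / peak if peak > 0 else chroma
```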


### 10. RMS energy
- RMS energy is a measure of an audio signal's power: the square root of the mean of the squared amplitude values within a time frame. It represents the "loudness" or energy of the audio, smoothing out rapid fluctuations in amplitude to give a stable measure of how strong the signal is on average.
- It is widely used in audio processing for tasks such as volume normalization, dynamic range compression, and silence detection.
- we will use the 'librosa.feature.rms' method (one value per frame)


In [126]:
# Compute RMS values per frame
rms_values = librosa.feature.rms(y=audio_file)
print(rms_values)

print('rms var is:', np.var(rms_values))
print('rms mean is:', np.mean(rms_values))

[[0.10660144 0.13829172 0.16152988 ... 0.01791866 0.01520183 0.01265691]]
rms var is: 0.006347602
rms mean is: 0.14104027
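Per frame, RMS = sqrt((1/N)·sum_n y[n]^2) over N samples. librosa.feature.rms uses frame_length=2048 and hop_length=512 by default and pads the signal so frames are centred. The NumPy sketch below skips the padding. For a sine of amplitude A the RMS is A / sqrt(2), and scaling the signal scales the RMS by the same factor. `frame_rms` is a hypothetical helper. The test tone is chosen so each 2048-sample frame holds exactly 93 cycles.

```python
import numpy as np

def frame_rms(y, frame_length=2048, hop_length=512):
    """RMS of each frame (no padding): sqrt(mean(y**2)) over frame_length samples."""
    n_frames = 1 + (len(y) - frame_length) // hop_length
    return np.array([
        np.sqrt(np.mean(y[i * hop_length : i * hop_length + frame_length] ** 2))
        for i in range(n_frames)
    ])
```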


### 11. Length
- length of the audio file in samples (audio_file.shape[0])

In [127]:
audio_file.shape[0]

661794
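Dividing the length in samples by the sample rate gives the duration. Assume sr is librosa.load's default of 22050 Hz; the notebook does not print sr. Then 661794 samples correspond to about 30.01 s, consistent with these being 30-second clips. librosa.get_duration(y=audio_file, sr=sr) computes the same quantity. `duration_seconds` is a hypothetical helper.

```python
def duration_seconds(n_samples, sr=22050):
    """Duration in seconds = number of samples / samples per second."""
    return n_samples / sr

print(round(duration_seconds(661794), 3))  # 30.013
```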