# Week 10: Sliding-window audio analysis

### 19 March 2017

# Goals #

After doing this lab, you should be able to:
* Reason about what RMS and spectral centroid are telling you about an audio signal
* Implement a sliding-window feature extraction to analyse how sound is changing over time

# Part 1: Sliding window analysis using RMS #

In this section, you will see how to use RMS for simple sound volume analysis.

## Loading some sound files ##
a. Start by downloading the following sound files:

* http://www.doc.gold.ac.uk/~mas01rf/PMC2014-15/IPython/lab13/song1.wav
* http://www.doc.gold.ac.uk/~mas01rf/PMC2016-17/lab19/loud.wav
* http://www.doc.gold.ac.uk/~mas01rf/PMC2016-17/lab19/soft.wav
* http://www.doc.gold.ac.uk/~mas01rf/PMC2016-17/lab19/saw.wav
* http://www.doc.gold.ac.uk/~mas01rf/PMC2016-17/lab19/sine.wav

The first is a free track downloaded from the Free Music Archive:
http://freemusicarchive.org/music/Jahzzar/Travellers_Guide/Siesta

The next two are modified GarageBand loops (Traffic Jam Guitar, Kyoto Night Guitar), and the last two are files I synthesised from scratch.

b. Now load them into variables:

In [None]:
song1 = wavReadMono("song1.wav")
loud = wavReadMono("loud.wav")
soft = wavReadMono("soft.wav")
sineTone = wavReadMono("sine.wav")
sawTone = wavReadMono("saw.wav")
#Listen to them if you'd like:
play(song1)
play(loud)
play(soft)
play(sineTone)
play(sawTone)

As we saw in lecture, RMS stands for "root mean square." To compute the RMS of a signal over some analysis frame, you will:
* Square every sample
* Take the average of these squares
* Take the square root of the average

As a mathematical equation, this looks like:

$$ r = \sqrt{\frac{1}{N}(x_1^2 + x_2^2 + ... + x_N^2)} $$

where $x_n$ is the $n$th sample and $N$ is the number of samples in the sound.
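As a quick sanity check of the formula, here is a minimal sketch using NumPy on synthetic signals rather than the lab's sound files. The 44,100 Hz sample rate, the helper name `rms`, and the test signals are illustrative assumptions. A sine wave with peak amplitude $A$ has an RMS of $A/\sqrt{2}$, so halving the amplitude halves the RMS, and silence has an RMS of zero:

```python
import numpy as np

def rms(x):
    """Root mean square (hypothetical helper): square, average, square root."""
    return np.sqrt(np.mean(np.power(x, 2)))

fs = 44100                                      # assumed sample rate (samples per second)
t = np.arange(fs) / float(fs)                   # one second of time stamps (float division)
loud_sine = 0.8 * np.sin(2 * np.pi * 440 * t)   # peak amplitude 0.8
soft_sine = 0.4 * np.sin(2 * np.pi * 440 * t)   # same tone at half the amplitude
silence = np.zeros(fs)

print(rms(loud_sine))   # close to 0.8 / sqrt(2), about 0.566
print(rms(soft_sine))   # close to 0.4 / sqrt(2), about 0.283
print(rms(silence))     # 0.0
```

The one-second length holds exactly 440 cycles, so the average of the squared samples comes out at (almost exactly) half the squared amplitude.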

Use the `sqrt`, `mean`, and `pow` functions in Python to compute the RMS for `loud` (compute a single value over all the samples in the file). Hint: You can do this in one line of code without a for-loop.

Now do the same for all the samples in the `soft` file, and verify that its RMS is indeed lower than that of the `loud` file.

## Analysing change over time ##

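The following code slides an *analysis frame* of 128 samples across a sound file, advancing it by `hop_size` (here 10) samples at each step. For each frame, it computes the average of the samples and appends it to a results array. This isn't very useful, though: audio samples swing between positive and negative values, so their average tends to stay close to zero. Edit the code so that it computes the RMS of each frame instead.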


In [None]:
sound = song1 #choose which sound to analyse

win_length = 128 #number of samples in analysis frame
hop_size = 10 #number of samples to advance the frame at each step

results = [] #an empty list to hold one result per frame
win_index = 0 #start with the analysis frame at the beginning of the file
while (win_index + win_length <= size(sound)) :  #slide the frame until it would run past the end of the file (full-length frames only)
    next_frame = sound[win_index:(win_index + win_length)] #grab the next frame of audio
    next_result = sum(next_frame)/win_length #CHANGE THIS LINE TO COMPUTE RMS INSTEAD!
    results.append(next_result) #add this result to the end of the results list
    win_index = win_index + hop_size #advance the analysis frame location by 'hop_size' samples

plot(results) #plot the analysis results over time


Once your RMS version of this code works, examine the RMS of the loud and soft sounds over time, as well as the RMS of song1 over time. Use the space below to explore at least one of the following questions, then write a few sentences about what you did.

* Can the RMS help you identify louder and softer parts in a file? For example, can you detect when the sound is silent? Can you detect louder versus softer notes?
* How useful is the RMS for showing you where notes begin? Are there some notes whose beginnings ("onsets") are easier or harder to see using RMS? Are there certain instruments whose onsets are easier to spot using RMS?
* If you tried to implement a search system that ranked all the audio files on your computer based on the closeness of their RMS value to a query file, would this be useful for anything? If so, what? 
* If you ranked all the frames in a file according to their RMS value, from low to high, would this be useful for anything? If so, what?
* How else might you use RMS in audio analysis, search, or recommendation?
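For the ranking questions above, a minimal sketch of computing per-frame RMS and sorting frames from quietest to loudest might look like this. It assumes a mono NumPy array and a 44,100 Hz sample rate; the function name `frame_rms` and the synthetic test signal are hypothetical, chosen only for illustration. Only full-length frames are analysed, so a signal of `N` samples yields `1 + (N - win_length) // hop_size` frames, and frame `i` starts at sample `i * hop_size`.

```python
import numpy as np

def frame_rms(x, win_length=128, hop_size=10):
    """RMS of every full-length frame of x, in frame order (hypothetical helper)."""
    num_frames = 1 + (len(x) - win_length) // hop_size
    results = np.zeros(num_frames)
    for i in range(num_frames):
        start = i * hop_size
        frame = x[start:start + win_length]
        results[i] = np.sqrt(np.mean(frame ** 2))
    return results

fs = 44100                                   # assumed sample rate
t = np.arange(fs) / float(fs)
# Synthetic test signal: half a second of quiet tone, then half a second of loud tone
x = np.where(t < 0.5, 0.1, 0.9) * np.sin(2 * np.pi * 440 * t)

hop_size = 10
rms_values = frame_rms(x, win_length=128, hop_size=hop_size)
order = np.argsort(rms_values)               # frame indices, quietest first
start_times = order * hop_size / float(fs)   # start time (seconds) of each ranked frame

print(len(rms_values))    # 4398 full-length frames
print(start_times[:3])    # the quietest frames start in the first half second
print(start_times[-3:])   # the loudest frames start in the second half second
```

Converting frame indices back to times like this is what lets you jump to, or play back, the quietest or loudest parts of a file.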

# Part 2: Sliding window analysis using spectral centroid #

The spectral centroid is another easy-to-compute audio feature; it summarises where the "centre of mass" of a sound's magnitude spectrum lies. It can tell us whether one sound has a "brighter" timbre than another, and it can give us a hint about the instrumentation or mastering process.

The spectral centroid of a frame of audio is a weighted average of the frequencies of the FFT bins (from 0 up to the Nyquist frequency), where each bin's frequency is weighted by that bin's magnitude.

In math, this looks like:

$$ c = \frac{\sum_{k=0}^{N/2-1} f_k \lvert X(k)\rvert }{ \sum_{k=0}^{N/2-1} \lvert X(k)\rvert } $$

where $\lvert X(k)\rvert$ is the magnitude of the $k$th FFT bin, $f_k$ is the frequency corresponding to that bin, and $N$ is the FFT size (the number of samples in the frame).
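To check the formula by hand: a signal made of two equal-amplitude cosines at 440 Hz and 1760 Hz has equal magnitudes at those two bins, so its centroid should sit halfway between them, at 1100 Hz. The minimal sketch below assumes a 44,100 Hz sample rate and a one-second signal, so that the FFT bins fall exactly on multiples of 1 Hz and there is no spectral leakage; the helper name `spectral_centroid` is hypothetical.

```python
import numpy as np

def spectral_centroid(x, fs):
    """Magnitude-weighted mean frequency over bins 0 .. N/2-1 (hypothetical helper)."""
    half = len(x) // 2
    mags = np.abs(np.fft.fft(x)[:half])              # |X(k)| from 0 up to (not including) Nyquist
    freqs = np.fft.fftfreq(len(x), 1.0 / fs)[:half]  # f_k for the same bins
    return np.sum(freqs * mags) / np.sum(mags)

fs = 44100                      # assumed sample rate
t = np.arange(fs) / float(fs)   # exactly one second, so bins are 1 Hz apart
pure = np.cos(2 * np.pi * 440 * t)
mix = np.cos(2 * np.pi * 440 * t) + np.cos(2 * np.pi * 1760 * t)

print(spectral_centroid(pure, fs))   # approximately 440.0
print(spectral_centroid(mix, fs))    # approximately 1100.0
```

With real recordings the tones rarely line up with bin centres, so energy leaks into neighbouring bins and the centroid is only an approximation of where the energy sits.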

The code below computes the spectral centroid of the audio stored in the variable `sound` (the whole signal, treated as a single frame). Make sure you understand how it implements the equation above.


In [None]:
halfSize = int(size(sound)/2)
f = abs(fft.fft(sound)[0:halfSize]) #the magnitude spectrum |X(k)|, from 0 up to (but not including) nyquist
freqs = (fft.fftfreq(size(sound), 1.0/44100))[0:halfSize] #the frequency f_k of each bin; 1.0 (not 1) avoids integer division in Python 2; assumes a 44100 Hz sample rate
centroid = sum(freqs * f)/sum(f) #the frequency-weighted sum of magnitudes divided by the unweighted sum of magnitudes
print(centroid) #print it out

Compute the spectral centroid for loud, soft, sineTone, and sawTone. What do you observe?

Using the sliding window analysis code for average / RMS above, implement a sliding window analysis for spectral centroid below. Use an analysis frame size of 2048 and a hop size of at least 100.

In [None]:
#put your sliding window centroid analysis here






Experiment with applying your analysis to song1, loud, soft, sawTone, and sineTone. You might want to add samples of your own (e.g., examples of speech, drum tracks, or sound effects). Explore at least a few of the following questions, then write a few sentences about what you did and what you found.

* How does the spectral centroid relate to your perception of brightness for the sine tone and saw tone?
* What does the spectral centroid seem to tell you about the music files? e.g., brightness, instrumentation, ...?
* There are 4 places in the loud file where the centroid is very high. What has happened here?
* If you were to rank all frames within a file from low spectral centroid to high centroid, what might this sound like? How might you use this?
* If you were to build a search tool that finds audio files on your computer with a similar spectral centroid to an example file you provide, how might this be useful?

In [None]:
# your work and answers here







# Part 3: STFT analysis of pitch #

Notice that if your sliding-window spectral centroid analysis works, you have just implemented a short-time Fourier transform (STFT)! Instead of computing the centroid from each FFT frame, why not compute something else?
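Viewed this way, a sliding-window FFT analysis produces a matrix with one row per frame and one column per frequency bin. Below is a minimal NumPy sketch of that magnitude matrix, using a rectangular window (plain slicing, as in the lab code). The name `stft_magnitudes`, the 44,100 Hz sample rate, and the 1000 Hz test tone are assumptions for illustration. Any per-frame feature, such as the centroid or the strongest bin, can then be computed row by row.

```python
import numpy as np

def stft_magnitudes(x, fs, win_length=2048, hop_size=100):
    """Rows = frames, columns = |X(k)| for bins 0 .. win_length/2 - 1 (hypothetical helper)."""
    half = win_length // 2
    num_frames = 1 + (len(x) - win_length) // hop_size   # full-length frames only
    mags = np.zeros((num_frames, half))
    for i in range(num_frames):
        frame = x[i * hop_size : i * hop_size + win_length]
        mags[i] = np.abs(np.fft.fft(frame)[:half])
    freqs = np.fft.fftfreq(win_length, 1.0 / fs)[:half]
    return mags, freqs

fs = 44100                                   # assumed sample rate
t = np.arange(fs) / float(fs)
x = np.sin(2 * np.pi * 1000 * t)             # one second of a 1000 Hz tone

mags, freqs = stft_magnitudes(x, fs)
print(mags.shape)                            # (421, 1024)
centroids = mags.dot(freqs) / mags.sum(axis=1)   # one spectral centroid per frame
```

A frame of pure digital silence would make `mags.sum(axis=1)` zero for that row, so a real analysis would need to skip or special-case such frames before dividing.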

For instance, let's try writing a simple pitch tracker. The simplest way to guess the pitch of a frame might be to take the frequency of the bin with the highest magnitude. In lecture, we talked about how this may often be wrong. But how wrong?

Modify your sliding window analysis so that each element of `results` holds your pitch estimate for that frame: the frequency of the FFT bin with the highest magnitude. Don't forget, you can use `argmax` to get the index of the largest element in an array.

You can test this on sineTone and sawTone, both of which have a pitch of 440 Hz. Try tracking pitch over time for the `soft` file. How well do you think it works?
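To see two ways the highest-magnitude-bin guess can go wrong, here is a minimal sketch on single synthetic frames. The 44,100 Hz sample rate, the helper name `peak_frequency`, and the harmonic amplitudes are illustrative assumptions. First, the estimate can only land on a bin frequency, and with 2048-sample frames the bins are 44100 / 2048 ≈ 21.5 Hz apart. Second, if a harmonic is stronger than the fundamental, `argmax` picks the harmonic instead of the pitch.

```python
import numpy as np

def peak_frequency(frame, fs):
    """Frequency of the highest-magnitude FFT bin below Nyquist (hypothetical helper)."""
    half = len(frame) // 2
    mags = np.abs(np.fft.fft(frame)[:half])
    freqs = np.fft.fftfreq(len(frame), 1.0 / fs)[:half]
    return freqs[np.argmax(mags)]

fs = 44100                           # assumed sample rate
win_length = 2048                    # same frame size as the centroid analysis
t = np.arange(win_length) / float(fs)

# A pure 440 Hz sine: the nearest bin is 20 * 44100 / 2048, about 430.7 Hz
sine_frame = np.sin(2 * np.pi * 440 * t)
print(peak_frequency(sine_frame, fs))

# A 440 Hz tone whose second harmonic (880 Hz) is louder than its fundamental
harmonic_frame = (0.3 * np.sin(2 * np.pi * 440 * t)
                  + 1.0 * np.sin(2 * np.pi * 880 * t)
                  + 0.5 * np.sin(2 * np.pi * 1320 * t))
print(peak_frequency(harmonic_frame, fs))   # near 880 Hz: an octave too high
```

Longer frames give finer bin spacing, but they also blur changes in pitch over time, so the frame size is a trade-off rather than a fix.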

In [None]:
# your work here




