<a href="https://colab.research.google.com/github/youngmoo/ECES-435/blob/main/Midterm_Mini_Project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Midterm Mini-Project Overview: Audio Codec

In this mini-project you will compress (encode) a music file by saving only select frequencies in the STFT. You will report the compression ratio and observe how it changes for different encoding parameters. Then you will build a decoder for the compressed data and compare your decoded version to the original. 

## Set Up Your Colab Environment


First, mount Google Drive so that you can access the shared class drive and files. (You may want to check the notebooks from lecture for a reminder of how this is done.)

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


Import the libraries you'll be using:


*   `soundfile as sf` for reading audio files
*   `IPython.display as ipd` for audio playback
*   `matplotlib as rc` for figure customization
*   `matplotlib.pyplot as plt` for plotting
*   `numpy as np` for some math functions
* `scipy` for STFT calculations
* from `scipy` import `signal` for STFT calculations
* `time` for getting compute time of cells and functions
* `sys` and `pickle` to find the size of an object and serialize the data

In [None]:
#Your code here
import soundfile as sf
import IPython.display as ipd
import matplotlib as rc
import matplotlib.pyplot as plt
import numpy as np
import scipy
from scipy import signal
import time
import sys
import pickle

Tip: To quickly view documentation for a function, you can use the help function. See below.

In [None]:
#help(ipd.Audio)

Tip: To quickly and neatly hide code cell outputs, press Ctrl + C + M + O (for Windows) or Cmnd + C + M + O (for Mac).

## Load the Test Audio for the Mini-Project

In this mini-project you are given 3 mono test files.

Remember, your mini-project will be evaluated on two other files that we haven't given you, so we encourage you to try all test samples and make sure that your codec works for any sample we could give it.


In [None]:
load_path_1 = '/content/drive/MyDrive/eces435-work/Labs/Lab5/data/CoolSong_mono.wav'
load_path_2 = '/content/drive/MyDrive/eces435-work/Labs/Lab5/data/WildSong_mono.wav'
load_path_3 = '/content/drive/MyDrive/eces435-work/Labs/Lab5/data/ChillSong_mono.wav'

In [None]:
y, fs = sf.read(load_path_3)
y = np.float16(y) #This will make sure that the audio you are comparing with for the project is 16 bit
ipd.Audio(y, rate=fs)

In [None]:
print(type(y))
print(type(y[0]))

# Part 1: Encoder Design

To compress a music excerpt, you will save only select frequencies occurring at a given time. This is an example of lossy compression: the song will certainly lose some information. But the hope is that the perceptually relevant information is saved.

To do this you will create an audio `encode()` function. The goal of your encoder is as follows:

* Take in audio and compute the STFT
* For each column/time-step in the STFT (remember each column is a DFT for that time window), pick $n$ frequencies which contribute the most relevant information to the audio
* You must determine how to choose which frequencies and the number of frequencies to save.
  * You might try frequencies with the highest magnitude...or maybe another strategy could provide better audio quality.
* Save the relevant information from the STFT for each frequency contribution:
  * Remember, the frequency weights are complex. You could save real and imaginary parts (or magnitude and phase) for each frequency.
  * Be sure to also save the index of these frequencies (so that your decoder can reconstruct the STFT).
* Output a data structure that contains all the encoded information necessary to decode the audio, so that for each time step you have $n$ frequency contributions saved as an index, and then the relevant frequency information

<br>

You could encode the audio in a variety of ways to increase compression, sound quality, or compute time. Try varying things like $n$ (the number of frequencies you save) and the way you choose relevant frequencies, and find the optimal encoder system. Feel free to refine this even more as you build the decoder.
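
As one illustration of the "highest magnitude" strategy mentioned above, here is a minimal sketch on a made-up toy matrix. The helper name `top_n_bins` and the toy values are assumptions for illustration only; this is one possible selection rule, not the required or best one.

```python
import numpy as np

def top_n_bins(S, n):
    """Hypothetical helper: keep the n largest-magnitude bins in each STFT column."""
    mag = np.abs(S)
    # argpartition moves the n largest magnitudes of every column into the last n rows
    idx = np.argpartition(mag, -n, axis=0)[-n:, :]
    # gather the complex values that sit at those row indices
    vals = np.take_along_axis(S, idx, axis=0)
    return idx, vals

# Toy "STFT": 6 frequency bins x 2 time frames (made-up numbers)
S_toy = np.array([[0.1, 2.0],
                  [3.0, 0.2],
                  [0.5, 0.1],
                  [4.0 + 1.0j, 0.3],
                  [0.2, 5.0j],
                  [0.0, 0.1]], dtype=np.complex64)

idx, vals = top_n_bins(S_toy, n=2)
print(sorted(idx[:, 0].tolist()))  # bins kept in frame 0 -> [1, 3]
print(sorted(idx[:, 1].tolist()))  # bins kept in frame 1 -> [0, 4]
```

The rows returned by `argpartition` are not sorted by magnitude, which does not matter as long as each value is stored alongside its index.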

## 1. Compute STFT

First, use `signal.stft` to compute the Short-time Fourier Transform (STFT) of the song and store it in the variable S.
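
For reference, a minimal sketch of how a window length and hop size map onto the arguments of `signal.stft`, run here on a synthetic tone rather than the project audio. The sample rate, window, and hop values (and the `_demo` names) are arbitrary assumptions, not recommended settings.

```python
import numpy as np
from scipy import signal

fs_demo = 8000                        # assumed sample rate for the toy signal
t = np.arange(fs_demo) / fs_demo      # one second of samples
x = np.sin(2 * np.pi * 440 * t)       # 440 Hz test tone

n_win_demo = 512                      # window length in samples (arbitrary)
n_hop_demo = 256                      # hop between windows in samples (arbitrary)

# scipy expects the overlap, so a hop size is expressed as nperseg - hop
f_demo, t_demo, S_demo = signal.stft(
    x, fs=fs_demo, window='hann',
    nperseg=n_win_demo, noverlap=n_win_demo - n_hop_demo, nfft=n_win_demo)

print(S_demo.shape, S_demo.dtype)
```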

In [None]:
n_fft = 
n_win = 
n_hop = 
f1, t1, S = signal.stft()

Use the code below to plot the STFT.

In [None]:
fig = plt.figure(figsize=(20,8))
S_mag = np.abs(S)
S_dB = 20*np.log10(S_mag + 1e-10) #Small offset avoids log10(0) = -inf in bins that are exactly zero
plt.pcolormesh(t1, f1, S_dB)
plt.ylabel('Frequency (Hz)')
plt.xlabel('Time (sec)')

What are the dimensions of S? Which dimension is for frequency, which for time bins? 

**Your response here:**



In [None]:
#Your code here

## 2. Create the Encoder
Create a function `encode(y, fs)` which performs the encoding task described above. The inputs should be `y` (time domain audio signal) and `fs` (sample rate), and the output should be `y_compressed` (the data necessary to reconstruct the encoded audio for each STFT frame, and any other information your decoder needs).

Inside this function you can try various techniques to encode the audio, like changing $n$ or your frequency peak picking strategy.

*Hint:* The output of `signal.stft` is a matrix of Python complex numbers. Complex numbers in Python are 128 bits (64-bit floating point numbers for both the real and imaginary components). You don't necessarily need to quantize at such a high resolution (it will definitely affect your compression ratio). Try using `dtype = ` to control the type of data and number of bits used for different elements of `y_compressed`.
* [List of NumPy standard data types](https://numpy.org/doc/stable/user/basics.types.html)
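
To make the size effect of the hint concrete, here is a small sketch on random complex numbers (not real STFT data). It compares the in-memory size of 128-bit complex values with two lower-precision alternatives and checks the rounding error of the smallest one. The variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# 1000 random complex values stored as complex128 (two float64 parts each)
vals128 = rng.standard_normal(1000) + 1j * rng.standard_normal(1000)

# Option A: one complex64 array (two float32 parts each)
vals64 = vals128.astype(np.complex64)

# Option B: separate float16 arrays for the real and imaginary parts
re16 = vals128.real.astype(np.float16)
im16 = vals128.imag.astype(np.float16)

print(vals128.nbytes, vals64.nbytes, re16.nbytes + im16.nbytes)

# Rebuild complex values from option B and measure the worst-case rounding error
restored = re16.astype(np.float32) + 1j * im16.astype(np.float32)
print(np.max(np.abs(restored - vals128)))
```

Keep in mind that `float16` has a limited range as well as limited precision, so very small values can round to zero; whether that is audible is something to check by listening.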

In [None]:
'''
Some examples of changing the dtype
'''

#Setting dtype when creating an array:
example_1 = np.zeros((10,10), dtype = 'float16') #Use dtype = 'type'
example_1_datatype = type(example_1[0,0]) #Looking at the type of an element in the array (example_1 is type 'numpy.ndarray')
print('Element [0,0] of example_1 is ', example_1[0,0], ' a ', example_1_datatype )

#Changing dtype of an existing item
example_1_f = np.int16(example_1) #Use np.'type'()
example_1_f_datatype = type(example_1_f[0,0]) 
print('Element [0,0] of example_1_f is ', example_1_f[0,0], ' a ', example_1_f_datatype )

Element [0,0] of example_1 is  0.0  a  <class 'numpy.float16'>
Element [0,0] of example_1_f is  0  a  <class 'numpy.int16'>


In [None]:
def encode(y, fs): #input is audio and sample rate
    f1, t1, S = signal.stft()

    y_compressed =  [indices, values, ...]
    return y_compressed                          #return all information needed for the decoder you will build

## 3. Encode the Audio

Encode your sample and save the output to `y_compressed`.


In [None]:
#Your code here
y_compressed = encode(y,fs)

### Compression Ratio
You will use the compression ratio to test the performance of your encoder. Compute it with your current settings first; you may then want to experiment to obtain a better compression ratio. <br>

Here, the compression ratio is the compressed size divided by the original size, i.e. the fraction of the original size that remains, so smaller is better.

To find the size of a Python object, we first serialize the object into a string of bytes using `pickle.dumps()` and then get its size using `sys.getsizeof()`.

For instance to get the size of an array x, do: `sys.getsizeof(pickle.dumps(x))`
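
A minimal sketch of that measurement on placeholder arrays. The array shapes and dtypes and the helper name `serialized_size` are assumptions chosen only to show the mechanics, not a suggested encoder layout.

```python
import pickle
import sys
import numpy as np

def serialized_size(obj):
    """Hypothetical helper: size in bytes of the pickled object."""
    return sys.getsizeof(pickle.dumps(obj))

# Stand-in for one second of 16-bit audio at 44.1 kHz
original = np.zeros(44100, dtype=np.float16)

# Stand-in for encoder output: 8 bin indices and 8 complex values for 100 frames
compressed = [np.zeros((8, 100), dtype=np.uint16),
              np.zeros((8, 100), dtype=np.complex64)]

ratio = serialized_size(compressed) / serialized_size(original)
print(f'Compression ratio: {ratio:.3f}')
```

The pickled size includes some header overhead on top of the raw array bytes, so it is slightly larger than the sum of the arrays' `nbytes`.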

Reminder: The original audio `y` you compare to should be 16 bit audio. This should have been taken care of when you loaded in `y`, but you can double check the type of a Python item using `type()`. The type of `y_compressed` depends on your encoder.


In [None]:
#Your code here
Size_y = sys.getsizeof(pickle.dumps(y))
print('Size of the original file: ', Size_y)
Size_y_compressed = 
print('Size of the compressed file: ', Size_y_compressed)
Compression_Ratio =

# Part 2: Decoder Design

Now you will create a function to decode the encoded signal in `y_compressed`. Remember that the encoding was made by keeping only information from certain frequencies and indices that tell you which frequency the data belongs to.

Reconstruct the approximated STFT from your compressed data. Then you can perform an inverse short-time Fourier transform (ISTFT) to get back to time domain audio.
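
The reconstruction step can be sketched as follows on a synthetic tone. This assumes the encoder stored, for each frame, an array of bin indices and an array of matching complex values; the `_demo` parameters, `n_keep`, and the top-magnitude selection are illustrative assumptions, and your own data structure may differ.

```python
import numpy as np
from scipy import signal

fs_demo = 8000
n_win_demo, n_hop_demo = 512, 256          # arbitrary demo settings
t = np.arange(fs_demo) / fs_demo
x = np.sin(2 * np.pi * 440 * t)            # synthetic test tone

_, _, S_full = signal.stft(x, fs=fs_demo, nperseg=n_win_demo,
                           noverlap=n_win_demo - n_hop_demo)

# Stand-in "encoded" data: indices and values of the n_keep largest bins per frame
n_keep = 4
idx = np.argpartition(np.abs(S_full), -n_keep, axis=0)[-n_keep:, :]
vals = np.take_along_axis(S_full, idx, axis=0)

# Decode: start from an all-zero STFT and write the saved values back in place
S_rec = np.zeros(S_full.shape, dtype=np.complex64)
np.put_along_axis(S_rec, idx, vals, axis=0)

# Invert with the same window and overlap that were used for the forward STFT
_, x_rec = signal.istft(S_rec, fs=fs_demo, nperseg=n_win_demo,
                        noverlap=n_win_demo - n_hop_demo)
x_rec = x_rec[:len(x)]                      # drop padding samples added by the STFT
```

Note that the decoder needs the STFT shape and parameters as well as the indices and values, which is why the encoder output may have to carry them.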


## 4. Create the Decoder

Create a function `decode()` which takes in the output of the encoder `encode()` (`y_compressed`) and outputs `S_reconstructed` (the STFT reconstruction, which you will use to plot the STFT after compression) and `y_reconstructed` (the audio reconstructed using `signal.istft()`).


In [None]:
def decode(y_compressed):
    
    return S_reconstructed, y_reconstructed

In [None]:
#Decode y_compressed here
S_reconstructed, y_reconstructed = decode(y_compressed)

## 5. Plot the Reconstructed STFT

Plot S_reconstructed using the code below. Compare it to the original to see how much information you are saving in y_compressed vs the original signal.

In [None]:
fig = plt.figure(figsize=(20,8))
S_mag = np.abs(S_reconstructed)
S_dB = 20*np.log10(S_mag + 1e-10) #Small offset avoids log10(0) = -inf in the bins that were not saved
plt.pcolormesh(t1, f1, S_dB)
plt.ylabel('Frequency (Hz)')
plt.xlabel('Time (sec)')
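
Alongside the visual comparison, one rough numeric summary is the fraction of STFT bins that are non-zero after reconstruction. The sketch below uses a made-up matrix in place of `S_reconstructed`; the shape and the three kept bins are assumptions for illustration.

```python
import numpy as np

# Stand-in for a reconstructed STFT: 257 bins x 10 frames, all zero at first
S_demo = np.zeros((257, 10), dtype=np.complex64)
# Pretend the encoder kept the same 3 bins in every frame
S_demo[[5, 20, 40], :] = 1.0 + 1.0j

# Fraction of STFT coefficients that survived encoding
kept_fraction = np.count_nonzero(S_demo) / S_demo.size
print(f'Fraction of bins kept: {kept_fraction:.4f}')
```

This counts coefficients only; it is not the same as the compression ratio, which also depends on the data types and on the stored indices.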

## 6. Listen to the Reconstructed Audio

In [None]:
ipd.Audio(y_reconstructed, rate=fs)

# Part 3: Evaluate your system

For each of the following evaluations (Perception, Compression Ratio, SNR, and Computation Time) report the results for **all three test samples** provided.

## 7. Perception

First, you will perform a perceptual evaluation. Listen to the original audio clip and then the audio reconstructed using the decoder.

Explain the differences you hear. Try a few parameter settings for the encoder and decoder and explain which ones you think affect reconstructed audio quality the most.

**Your response here:** <br>


## 8. Compression Ratio
Now, compute the compression ratio of this encoder for the given song and report it below. Refer to the instructions in Part 1.

Report the compression ratio of your encoder at the optimal settings you chose, and report those settings.

**Your response here:** <br>
Compression Ratio:

Settings Used:
* 
* <br>
...

In [None]:
#Your code here


## 9. Signal to Noise Ratio (SNR)
Signal-to-noise ratio provides a way to measure the similarity between two signals. Use the `SNR()` function provided to compute the SNR between `y` and `y_reconstructed` (the decoded audio, not the encoded data in `y_compressed`).
Report your SNR below for the optimal Encoder/Decoder and report those settings. <br>

<br>

**Your response here:** <br>
SNR:

Settings: 
* 
* <br> 
...


In [None]:
def SNR(original, output):
    # Normalize
    original = original/np.max(np.abs(original)) # Normalize to -1 to 1
    output =  output/np.max(np.abs(output)) # Normalize to -1 to 1
    
    # Compute SNR
    noise = original-output
    
    powS = np.mean(original**2)
    powN = np.mean(noise**2)

    snr = powS/powN
    snr = 10*np.log10(snr)
    return snr
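
A usage sketch for the function above, on synthetic signals. It assumes the decoded audio may come back slightly longer than the original (for example because of STFT padding), in which case the two arrays must be trimmed to a common length before subtracting. The signals, noise level, and variable names are made up; a compact copy of `SNR()` is included only so the sketch runs on its own.

```python
import numpy as np

def SNR(original, output):
    # Compact copy of the provided function: normalize, then compare powers in dB
    original = original / np.max(np.abs(original))
    output = output / np.max(np.abs(output))
    noise = original - output
    return 10 * np.log10(np.mean(original**2) / np.mean(noise**2))

rng = np.random.default_rng(0)
# Stand-in "original": 5 cycles of a sine wave over 1000 samples
clean = np.sin(2 * np.pi * 5 * np.linspace(0, 1, 1000, endpoint=False))
# Stand-in "decoded" signal: slightly noisy and 24 samples longer
decoded = np.concatenate([clean + 0.01 * rng.standard_normal(1000), np.zeros(24)])

# Trim both signals to the shorter length so the subtraction is well defined
n = min(len(clean), len(decoded))
snr_db = SNR(clean[:n].astype(np.float64), decoded[:n].astype(np.float64))
print(f'SNR: {snr_db:.1f} dB')
```

Casting to `float64` before the call is a precaution against accumulating rounding error when the inputs are stored as `float16`.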

In [None]:
#Your code here

## 10. Computation Time

You can time how long your encoder and decoder take to run by using the `%time` command introduced in Lab 4. Use the code below to evaluate the computation time for your system.

In [None]:
%time encode(y, fs) #Gets runtime for a line in the cell

In [None]:
%time decode(y_compressed) 
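
`%time` is an IPython magic, so it only works inside a notebook cell. If you want the elapsed time as a number (for example to average several runs), a plain-Python alternative is sketched below. The helper name `timed` is an assumption, and the built-in `sum` stands in for `encode` or `decode`.

```python
import time

def timed(fn, *args, **kwargs):
    """Hypothetical helper: call fn and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed

# Stand-in workload; in the notebook this could be timed(encode, y, fs)
result, seconds = timed(sum, range(1_000_000))
print(f'Elapsed: {seconds:.4f} s')
```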

Report the runtime for the optimal settings you described in the Compression Ratio and SNR section.

**Your response here:** <br>
* Encoder Time:
* Decoder Time:

## 11. Answer the following questions.

a. How did you choose the parameters for your Encoder and Decoder? Explain the advantages and tradeoffs of your chosen parameters.

b. Did you try other numbers to find the optimal settings? How many?


**Your responses here:**

a.

b.


# Completing the Mini-Project

To submit this Mini-Project *share the notebook with charis.cochran@excitecenter.org* AND *share the notebook with youngmoo@excitecenter.org* AND *submit the link as a text submission on the Mini-Project assignment on BbLearn* to receive credit for this lab. (ONLY share with charis.cochran@excitecenter.org AND youngmoo@excitecenter.org)

*Ensure all cells and plots have been run and are visible in the notebook before submitting. Also, make sure you responded to the short answer questions. Try the encoder and decoder with all test files to ensure they can work against the test files we will use (we did not give you the test files we will use for grading!). Submitting the link to BbLearn means the lab has been submitted and is ready for grading. DO THIS LAST.*