# Week 03

## Bigger Lists

## Setup

Include some helper functions and libraries

In [None]:
!wget -q https://github.com/DM-GY-9103-2024F-H/9103-utils/raw/main/src/data_utils.py

In [None]:
import matplotlib.pyplot as plt

from data_utils import object_from_json_url

### Load ANSUR 2 Database

The `JSON` file has a subset of the measurements found [here](https://www.openlab.psu.edu/ansur2/).

In [None]:
ANSUR_JSON_URL = "https://raw.githubusercontent.com/DM-GY-9103-2024F-H/9103-utils/main/datasets/json/ansur.json"
ansur = object_from_json_url(ANSUR_JSON_URL)

# TODO: look at the data

# Answer:
#   - how many rows/records/items ?
#   - longest ear ?
#   - height of person with longest ear ?
#   - tallest height ?
#   - average height ?

### Let's look at simpler versions:

In [None]:
AHW_JSON_URL = "https://raw.githubusercontent.com/DM-GY-9103-2024F-H/9103-utils/main/datasets/json/ansur_age_height_weight_object.json"
ahw_objs = object_from_json_url(AHW_JSON_URL)

# TODO: look at data
# How is it organized ?

In [None]:
AHW_LIST_URL = "https://raw.githubusercontent.com/DM-GY-9103-2024F-H/9103-utils/main/datasets/json/ansur_age_height_weight.json"
ahws = object_from_json_url(AHW_LIST_URL)

# TODO: look at data
# How is it organized ?

# Answer the following:
#   - how many items ?
#   - how do we access the height of a person ?
#   - tallest height ?
#   - average height ?

## List of Lists

Just like we can put lists inside objects, and objects inside lists, we can also put lists inside lists.

If we want to get to a particular value we have to use $2$ indices instead of using just one:
`list[i][j]`

The first index tells Python which of the sub-lists we want, and the second specifies the item on that list.

<img src="./imgs/list-of-lists00.jpg" width="700px" />

<img src="./imgs/list-of-lists01.jpg" width="700px" />

Sometimes we'll refer to the first index as the row index and the second index as the column index.

That's because if we imagine our list of lists as a 2-dimensional matrix of numbers, the first index tells Python which row we want to access and the second tells which column:

<img src="./imgs/list-of-lists02.jpg" width="700px" />

<img src="./imgs/list-of-lists03.jpg" width="700px" />
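As a small sketch of the two-index idea, here's a made-up `grid` list (the name and the values are only for illustration):

```python
# A made-up list of lists: 3 rows, 4 columns
grid = [
  [10, 11, 12, 13],
  [20, 21, 22, 23],
  [30, 31, 32, 33],
]

# the first index picks the row (sub-list), the second picks the column (item)
second_row = grid[1]
item = grid[1][2]

# a column is the same index taken from every row
third_column = [row[2] for row in grid]

print(second_row)
print(item)
print(third_column)
```

Note that getting a row takes one index, but getting a column means visiting every row.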

### Datasets

We'll see this kind of structure a lot.

It's very common for datasets to be organized by rows/columns, where each column specifies a different *property* (or *feature*) and each row is a different *measurement* (or *record*) of those features.

In our example above, our dataset had $3$ *features* (age, height, weight), and one *record* per person.

<img src="./imgs/datasets00.jpg" width="700px" />

### JSON

It's also common to find datasets specified in the JSON format.

Instead of just being a list of lists with values, each *record* is an object that specifies the names and values of its *features*:

<img src="./imgs/datasets01.jpg" width="700px" />

There are advantages and disadvantages to each. We'll soon look at another way to organize datasets that will make it easier to go from one type to the other if we have to.
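As a rough sketch of the two layouts, here's the same tiny dataset written both ways, with one possible way to go from one to the other. The `records`, `columns` and `rows` names and all of the values are invented for illustration; they don't come from the ANSUR files.

```python
# Hypothetical records in the "list of objects" layout
records = [
  {"age": 30, "height": 70, "weight": 160},
  {"age": 25, "height": 64, "weight": 130},
]

# The column order we want for the "list of lists" layout
columns = ["age", "height", "weight"]

# objects -> rows: look up each feature by name, in column order
rows = [[rec[c] for c in columns] for rec in records]

# rows -> objects: pair each column name with the value at the same index
records_again = [
  {columns[i]: row[i] for i in range(len(columns))}
  for row in rows
]

print(rows)
print(records_again == records)
```

The list of lists is more compact, but we have to remember which column is which. The objects carry their feature names with them.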

## Plots

We can use the [matplotlib](https://matplotlib.org/stable/api/pyplot_summary.html) library to visualize our data.

In [None]:
# TODO: get heights
heights = []

plt.plot(heights, 'bo', markersize=2)
plt.show()

In [None]:
# TODO: get weights
weights = []

plt.plot(weights, 'ro', markersize=2)
plt.show()

In [None]:
# TODO: plot ages in green
ages = []

### Sorting data can give a different perspective

In [None]:
sorted_heights = sorted(heights)
plt.plot(sorted_heights, 'bo', markersize=2)
plt.show()

### Histograms

In [None]:
min_height = min(heights)
max_height = max(heights)
plt.hist(heights, bins=range(min_height, max_height + 1))
plt.grid()
plt.show()

## Correlation

A measure of how $2$ variables (features) are related to each other.

<img src="./imgs/correlation.jpg" width="800px" />

They can have *positive* or *direct* correlation, if an increase in one of the variables comes with an increase in the other.

They can have *negative* or *inverse* correlation if an increase in one of the variables is accompanied by a decrease in the other.

Or, there can be *weak* or *NO* correlation, if a change in one variable doesn't seem to be accompanied by a change in the other.
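One common way to put a number on this is the Pearson correlation coefficient, which is close to $+1$ for a strong direct correlation, close to $-1$ for a strong inverse one, and close to $0$ when there's little or no linear relationship. The notebook itself only looks at plots, so this is just a minimal sketch with a hypothetical `pearson()` helper and made-up values:

```python
def pearson(xs, ys):
  # hypothetical helper: Pearson correlation coefficient of two equal-length lists
  n = len(xs)
  mean_x = sum(xs) / n
  mean_y = sum(ys) / n

  # how much the two variables move together
  cov = sum([(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)])

  # how much each variable moves on its own
  var_x = sum([(x - mean_x) ** 2 for x in xs])
  var_y = sum([(y - mean_y) ** 2 for y in ys])

  return cov / ((var_x * var_y) ** 0.5)

# made-up values
xs = [1, 2, 3, 4, 5]
up = [2, 4, 6, 8, 10]
down = [10, 8, 6, 4, 2]

print(pearson(xs, up))
print(pearson(xs, down))
```

The function assumes neither list is constant, otherwise the division at the end would be by zero.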

In [None]:
# use "column" lists from above to plot scatter plot
plt.scatter(ages, heights, marker='o', alpha=0.2)
plt.xlabel("age")
plt.ylabel("height")
plt.show()

In [None]:
# TODO plot other combinations of variables
# TODO: any correlation ?

# Other Kinds of Lists

## Audio

### Setup

Run the following 2 cells to import all necessary libraries and helpers for this week's exercises.

In [None]:
!wget -q https://github.com/DM-GY-9103-2024F-H/9103-utils/raw/main/src/audio_utils.py

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import wave

from IPython.display import Audio

from audio_utils import wav_to_list, list_to_wav
from audio_utils import fft, stft, cluster_fft_freqs

## Digital Audio

Digital audio starts as air pressure waves that are converted to electrical pulses, which are then sampled and turned into a sequence of numbers.

<img src="./imgs/audio-00.jpg" width="720px">

### Playing an audio file

Easy !

In [None]:
display(Audio("./data/two-bits.wav"))

### Look at the `data/` directory, and load and play some of the other files:

In [None]:
# TODO: play other files

### Loading an audio file for analysis, manipulation, etc

Loading an audio file is a bit more work than playing it. The Python [wave module](https://docs.python.org/3/library/wave.html) helps a lot, but there are still some steps that we have to take.

Let's first open a `.wav` file and read it into a wave object:

In [None]:
sound_file_path = "./data/air-horn.wav"
wav_in = wave.open(sound_file_path, mode="rb")

print(wav_in.getparams())
display(Audio(sound_file_path))

### Audio length, channels, samples, rate, depth

<img src="./imgs/audio-01.jpg" width="720px">

`Audio length`: The duration of an audio file in seconds. $Audio\ Length = \frac{Number\ of\ Frames}{Sample\ Rate}$

`Channels`: The different signals that make up an audio file.

`Amplitude`: The strength of the audio signal. Related to volume.

`Samples`: List of numbers that represent the amplitude of an audio signal at specific time intervals.

`Frame`: Collection of samples from all channels at a given time. $Number\ of\ Frames = \frac{Number\ of\ Samples}{Number\ of\ Channels}$

`Sample Rate`: How many times per second the original audio signal was recorded. $Sample\ Rate = \frac{Number\ of\ Frames}{Audio\ Length}$

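`Bit Depth` / `Sample Width`: How many bits (bit depth) or bytes (sample width) are used to store each sample. This determines how many different unique numbers can be used to represent a sample.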
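To see how these quantities relate, here's a small sketch that builds a made-up wav file in memory (so it doesn't depend on any file in `data/`) and then reads its parameters back with the `wave` module. The channel count, rate and length are arbitrary values chosen for illustration.

```python
import io
import wave

# Build a tiny made-up wav file in memory:
# 2 channels, 2 bytes per sample, 8000 frames per second, 4000 frames of silence
buffer = io.BytesIO()
with wave.open(buffer, mode="wb") as wav_out:
  wav_out.setnchannels(2)
  wav_out.setsampwidth(2)
  wav_out.setframerate(8000)
  wav_out.writeframes(bytes(4000 * 2 * 2))

# Read the parameters back, like we would for a file on disk
buffer.seek(0)
with wave.open(buffer, mode="rb") as wav_in:
  n_channels = wav_in.getnchannels()
  n_frames = wav_in.getnframes()
  sample_rate = wav_in.getframerate()
  sample_width = wav_in.getsampwidth()

n_samples = n_frames * n_channels        # one sample per channel in every frame
length_seconds = n_frames / sample_rate  # frames divided by frames-per-second
bit_depth = 8 * sample_width             # sample width is in bytes

print(n_samples, length_seconds, bit_depth)
```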

### Open some of the other files in the `data/` directory and print their parameters

Do those make more sense now?

In [None]:
# TODO: open and print params for some of the other files

## Samples
### Getting sample values

We first have to open a `.wav` file with `wave.open()` to get a file object.

We can then use the file object's `readframes()` function to read the file's contents into a buffer of `bytes`, and NumPy's `frombuffer()` function to turn that buffer of `bytes` into an array of `integers`.

And, finally, we can use `list()` to put it all inside a regular Python list.

<img src="./imgs/audio-02.jpg" width="720px">

# 😫

That's a lot of cryptic lines of code just to open a file and get a list of numbers !

In [None]:
sound_file_path = "./data/western.wav"
wav_in = wave.open(sound_file_path, mode="rb")
read_buffer = wav_in.readframes(wav_in.getnframes())
my_samples = list(np.frombuffer(read_buffer, dtype=np.int16))

### Get number of samples
There's a way to calculate the number of samples from the wave object's parameters.

Can you get it from the `my_samples` list?

What about the min and max sample values?

In [None]:
# TODO: get number of samples
# TODO: get min/max sample values

### Visualizing

At least we can visualize it now using matplotlib and play it from the list of samples.

In [None]:
plt.plot(my_samples)
plt.show()

display(Audio(my_samples, rate=44100))

# 😫😫

For sound files with more than one channel, the `Audio()` function expects the samples in a format that is different from the one we get from `wave.open()` and `readframes()`.

<img src="./imgs/wav-x-audio.jpg" height="400px">

Argh!

We can give `Audio()` every other sample and listen to just one of the channels.

We could do this with a for-loop or list comprehension, but we can also use slicing with a third parameter.

Just like the `range()` function can take a third parameter to specify the steps/skips in a sequence, we can use slice with a third parameter to step through the array by every other sample:

In [None]:
display(Audio(my_samples[::2], rate=44100))
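As a small illustration of the step parameter, here's a made-up list standing in for interleaved 2-channel samples. The values are invented, and the sketch assumes the usual left, right, left, right ordering:

```python
# made-up interleaved samples: L0, R0, L1, R1, ...
interleaved = [1, -1, 2, -2, 3, -3, 4, -4]

left = interleaved[0::2]   # start at index 0, then take every 2nd value
right = interleaved[1::2]  # start at index 1, then take every 2nd value

# the same start/stop/step idea with range() gives us the indices instead
left_idx = list(range(0, len(interleaved), 2))

print(left)
print(right)
print(left_idx)
```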

But, it's better to use a function to read our wave files and return a single-channel array that combines all of the channels in an audio file.

The `wav_to_list()` function does exactly this:
<br>It goes through the samples array and returns the average of the sample values for each frame.

For a 2-channel audio file it sums every $2$ samples and divides that sum by $2$.
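The averaging step could look something like the sketch below. This is not the actual `wav_to_list()` code, which isn't shown here; `average_channels()` is a hypothetical helper that only illustrates the idea on a made-up list.

```python
def average_channels(samples, n_channels=2):
  # hypothetical helper: returns one averaged value per frame
  mono = []
  for i in range(0, len(samples) - n_channels + 1, n_channels):
    frame = samples[i : i + n_channels]  # all the samples that belong to one frame
    mono.append(sum(frame) / n_channels)
  return mono

# made-up interleaved samples for 3 frames of 2-channel audio
interleaved = [10, 20, 30, 50, -4, 4]
mono = average_channels(interleaved)

print(mono)
```

The result has one value per frame, so it's half as long as the interleaved list.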

In [None]:
sound_file_path = "./data/western.wav"
my_samples = wav_to_list(sound_file_path)

# TODO : check the length of the samples array, and its min and max sample values

Now we can plot and listen to the samples correctly and do any processing using only one list of samples

In [None]:
plt.plot(my_samples)
plt.show()

display(Audio(my_samples, rate=44100))

# 👯

### Repeat the previous process for a different audio file
Open an audio file and get a list of its samples.
Play the audio to make sure it sounds like what you expect.

In [None]:
# TODO: read another file into an array of samples

## Manipulating Audio

Once we have a list of samples we can process, analyze and manipulate the audio by performing list operations and simple arithmetic.

<img src="./imgs/audio-02.jpg" width="720px">

### Change volume

To change the volume of an audio file all we have to do is multiply its samples by a constant.

If the constant is greater than $1$ it will get louder, if it's between $0$ and $1$ it will get softer.

<img src="./imgs/audio-04.jpg" width="720px">
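Here's a minimal sketch of the multiplication on a short made-up list. The `clamp()` helper is hypothetical, and it assumes 16-bit samples, whose values have to stay between $-32768$ and $32767$ if we later want to store them in a 16-bit file.

```python
# made-up 16-bit sample values
samples = [0, 1000, -2000, 30000, -30000]

# a constant between 0 and 1 makes every sample smaller
softer = [s * 0.5 for s in samples]

# hypothetical helper: keep a value inside the 16-bit range
def clamp(value, lo=-32768, hi=32767):
  return max(lo, min(hi, value))

# a constant greater than 1 makes every sample bigger,
# so the loudest samples might have to be clamped
louder = [clamp(s * 2) for s in samples]

print(softer)
print(louder)
```

Clamping is a form of distortion, since the loudest peaks get flattened, so it's worth picking a constant that keeps most samples in range.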

### Process the samples array to make the audio softer and then louder
Check results visually and by listening to the audio

In [None]:
sound_file_path = "./data/air-horn.wav"
my_samples = wav_to_list(sound_file_path)

plt.plot(my_samples)
plt.show()
display(Audio(sound_file_path))

# TODO: make samples softer and louder
softer_samples = []
louder_samples = []

### Check modified samples

In [None]:
# Check modified samples
plt.plot(softer_samples)
plt.show()

display(Audio(softer_samples, rate=44100))

# TODO: check louder samples

### Change speed

If we just duplicate each sample in our sequence, while keeping the sample rate the same, we'll end up with an audio file that is twice as long as the original.

<img src="./imgs/audio-05.jpg" width="720px">

And, conversely, if we remove every other sample, we'll get an audio signal that is half of the original length.
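On a short made-up list, the two operations could be sketched like this:

```python
# made-up sample values
samples = [1, 2, 3, 4, 5, 6]

# duplicate every sample: the list gets twice as long
longer = []
for s in samples:
  longer.append(s)
  longer.append(s)

# keep every other sample: the list gets half as long
shorter = samples[::2]

print(len(samples), len(longer), len(shorter))
```

Played back at the same sample rate, the longer list takes twice the time and the shorter one half of it.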

### Process the samples array to make the audio shorter and longer
Check results visually and by listening to the audio

In [None]:
sound_file_path = "./data/horn.wav"
my_samples = wav_to_list(sound_file_path)

plt.plot(my_samples)
plt.show()
display(Audio(sound_file_path))
print(len(my_samples), "samples")

# TODO: double and halve the samples to hear the effects
double_samples = []
half_samples = []

### Check modified samples

In [None]:
# TODO: check visually and by listening

### Reverse

Flipping the order of the samples will make the audio sound backwards.

<img src="./imgs/audio-06.jpg" width="720px">

The following cell reverses the samples

In [None]:
sound_file_path = "./data/two-bits.wav"
my_samples = wav_to_list(sound_file_path)

rev_samples = list(reversed(my_samples))

And we can check the effect by running the cell below.

In [None]:
plt.plot(my_samples)
plt.show()
display(Audio(sound_file_path))
print(my_samples[:16])

plt.plot(rev_samples)
plt.show()
display(Audio(rev_samples, rate=44100))
print(rev_samples[-16:])

### Combining sounds

To combine two audio signals, to have them play on top of each other, we just have to add every sample $S_{A_i}$ of our first audio file to its corresponding sample $S_{B_i}$ in the second audio file.

<img src="./imgs/audio-07.jpg" width="720px">

In this situation we can use the `zip()` function, which returns a sequence that is made up of pairs of elements from other sequences.

For example, if we have:
```python
A = [10,11,12,13,14]
B = [20,21,22,23,24]
```

then, `list(zip(A,B))` will give us this list:
```python
[(10,20), (11,21), (12,22), (13,23), (14,24)]
```

It's like a zipper, where it builds its elements from one element of each of its arguments.

The `zip()` function is smart and will stop zipping once either of the two sequences runs out of samples.
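A short sketch with two made-up lists of different lengths shows both the pairing and the early stop:

```python
A = [10, 11, 12, 13, 14]
B = [20, 21, 22]

# zip() stops after 3 pairs because B only has 3 items
pairs = list(zip(A, B))

# each pair can be unpacked into two variables inside a comprehension
sums = [a + b for a, b in zip(A, B)]

print(pairs)
print(sums)
```

The last two items of `A` are simply ignored, so the result is only as long as the shorter list.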

### Use zip and the two sample sequences below to combine two sequences of samples
We might have to soften the resulting sums to avoid distortions by guaranteeing that the samples don't get too loud.

In [None]:
two_bit_file_path = "./data/two-bits.wav"
two_bit_samples = wav_to_list(two_bit_file_path)

air_horn_file_path = "./data/air-horn.wav"
air_horn_samples = wav_to_list(air_horn_file_path)

plt.plot(two_bit_samples)
plt.show()
display(Audio(two_bit_file_path))
print(len(two_bit_samples), "samples")

plt.plot(air_horn_samples)
plt.show()
display(Audio(air_horn_file_path))
print(len(air_horn_samples), "samples")

# TODO: sum samples
sum_samples = []

### Check results of sum

In [None]:
plt.plot(sum_samples)
plt.show()
display(Audio(sum_samples, rate=44100))
print(len(sum_samples), "samples")

### Splicing

Here we want to add the second wave after the first.

In Python we can use addition to concatenate two lists:
```python
A = [0,1,2,3]
B = [4,5,6,7]
C = A + B
```

The `C` variable now holds `[0,1,2,3,4,5,6,7]`.

We can also use slicing to select parts of the two sounds before adding them.

In [None]:
two_bit_file_path = "./data/two-bits.wav"
two_bit_samples = wav_to_list(two_bit_file_path)

air_horn_file_path = "./data/air-horn.wav"
air_horn_samples = wav_to_list(air_horn_file_path)

plt.plot(two_bit_samples)
plt.show()
display(Audio(two_bit_file_path))
print(len(two_bit_samples), "samples")

plt.plot(air_horn_samples)
plt.show()
display(Audio(air_horn_file_path))
print(len(air_horn_samples), "samples")


### This sum just places the second audio right after the first

In [None]:
sum_samples = two_bit_samples + air_horn_samples

plt.plot(sum_samples)
plt.show()
display(Audio(sum_samples, rate=44100))
print(len(sum_samples), "samples")

### This sum keeps $60\%$ of the first audio and then starts the second audio

In [None]:
end_idx = int(0.6 * len(two_bit_samples))

sum_samples = two_bit_samples[:end_idx] + air_horn_samples

plt.plot(sum_samples)
plt.show()
display(Audio(sum_samples, rate=44100))
print(len(sum_samples), "samples")

### Saving our samples

We can use the `list_to_wav()` function to save a sequence of samples as a mono `.wav` file:

```py
list_to_wav(sum_samples, "out.wav")
```

### Save your favorite modified sample list as a wave file

In [None]:
# TODO: save a list of samples as a wav file
# TODO: find it on the file explorer and download it to your computer.

## Audio Analysis

### Time-Domain

There are a couple of simple analyses and transformations that we can perform on our samples to extract information about them and our audio signal as a whole.

These are sometimes called _time-domain features_ because they are concerned with how an audio signal changes over time.

Since the information we want to extract from the samples will hopefully tell us something about the audio's characteristics in terms of loudness or pitch, it's useful if we work with chunks of audio that are long enough for us to notice these properties.

What this means is that we will further split our list of samples into smaller lists that contain about $10$ - $50$ milliseconds of audio.

This process is sometimes called _windowing_ or _blocking_, and the result is a list of lists, where the outer list gives us a list of windows or blocks and the internal lists are just regular lists of samples:

<img src="./imgs/window-00.jpg" height="250px">

<img src="./imgs/window-01.jpg" height="250px">

### Let's open up an audio file and split it into lists of 1024 samples

In [None]:
file_path = "./data/two-bits.wav"
all_samples = wav_to_list(file_path)

# variable for number of samples per window, or, the window length
WLEN = 1024

# first sample index for each window: [ 0, 1024, 2048, 3072, 4096, ... ]
wx = range(0, len(all_samples), WLEN)

samples_win = []
for s in wx:
  samples_win.append(all_samples[s : s + WLEN])

### Root Mean Square Energy

Now that we have our list split into chunks/blocks/windows, we can calculate some properties for each of these windows.

The first will be a measurement of loudness called the root mean square energy. This is calculated by taking the square root of the arithmetic mean of the squares of our sample values, or:

$ rms = \sqrt{\frac{1}{n} ({s_0}^2 + {s_1}^2 + {s_2}^2 + ... + {s_{n-1}^2})}$

<img src="./imgs/window-02.jpg" height="250px">
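As a quick worked example, take a made-up window with just $4$ samples: $1$, $-3$, $5$ and $-1$.

$$rms = \sqrt{\frac{1}{4} \left(1^2 + (-3)^2 + 5^2 + (-1)^2\right)} = \sqrt{\frac{1 + 9 + 25 + 1}{4}} = \sqrt{\frac{36}{4}} = \sqrt{9} = 3$$

The plain average of these same samples is only $0.5$, because the positive and negative values cancel each other out. Squaring first makes every term positive, which is why the rms is a better measurement of loudness.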

### Let's write a function that implements this

It will receive a list of samples and return their rms value.

First we can calculate the squares of all the samples with a comprehension, then find the average value of this array $\displaystyle \left(\frac{sum}{length}\right)$, and finally take the square root.

Remember that in python we can take the square root of a number $x$ by raising it to $0.5$, like `x ** 0.5`.

In [None]:
# TODO: implement rms

### Now, we'll use that to compute the rms for each of our windows

In [None]:
# TODO: compute the rms of each window in samples_win

samples_rms = []

If we compare the length of our two arrays (`all_samples` and `samples_rms`) and also plot their contents, we'll see that even though one of them is 1000 times smaller, it's still able to represent enough information about how the loudness of the sound changes over time.

This is good because if we wanted to compare it to other sounds to find similarities, instead of comparing $100,000$ values we can now compare $100$.

We'll see more about this in the homework.

In [None]:
print(len(all_samples), len(samples_rms))

plt.plot(all_samples)
plt.plot(wx, samples_rms, 'r')
plt.show()

### Zero-Crossing Rate

Another time-domain feature we can extract from our samples is their zero-crossing rate, or, how frequently the wave changes from a positive value to a negative one, or vice versa.

<img src="./imgs/window-04.jpg" height="250px">

This can give us some idea about the frequency of our sound at different points in time because higher tones, with higher frequencies, tend to have higher zero-crossing rates.

The formula for computing the zero-crossing rate for a window of samples is:

$\displaystyle zcr = \frac{1}{2} \sum{\left|{\frac{|s_n|}{s_n} - \frac{|s_{n+1}|}{s_{n+1}}} \right|}$

This looks more complicated than it should.

The first thing we do is determine the sign of each sample. That's what the $\displaystyle \frac{|s_n|}{s_n}$ calculation does. It gives us a $+1$ if our sample is a positive number and $-1$ if it's a negative number. The fraction isn't defined when the sample is $0$, so in that case we just say that the sign is $0$.

Then we look at pairs of consecutive samples and subtract their signs. We'll get a $-2$ if the signal goes from a negative number to a positive number and a $+2$ if it goes from positive to negative.

Finally, we sum up the absolute value of all of these $+2$ and $-2$ values and divide by $2$.
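These three steps can be traced on a tiny made-up window. The `sign_of()` helper here is hypothetical and is only the plain sign, without any filtering of quiet samples:

```python
def sign_of(sample):
  # hypothetical helper: +1 for positive, -1 for negative, 0 for zero
  return (sample > 0) - (sample < 0)

# made-up window of samples
window = [3, -2, -5, 4, 1, -1]

# step 1: the sign of each sample
signs = [sign_of(s) for s in window]

# step 2: subtract the signs of each pair of consecutive samples
diffs = [signs[i] - signs[i + 1] for i in range(len(signs) - 1)]

# step 3: sum the absolute values and divide by 2
crossings = sum([abs(d) for d in diffs]) / 2

print(signs)
print(diffs)
print(crossings)
```

The wave in this window changes sign $3$ times: between $3$ and $-2$, between $-5$ and $4$, and between $1$ and $-1$.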

### Let's write a function that implements this

It will receive a list of samples and return the number of times the values change sign.

We'll also implement a separate `sign()` function to do the $\displaystyle \frac{|s_n|}{s_n}$ calculation with a little bit of filtering to avoid counting zero crossings in noisy and quiet parts of the audio.

In [None]:
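def sign(sample):
  # treat quiet samples (noise around zero) as having no sign
  if abs(sample) < 256:
    return 0
  else:
    return (abs(sample) / sample)

def zcr(samples):
  signs = [sign(s) for s in samples]

  # difference between the signs of each pair of consecutive samples
  twos = []
  for i in range(0, len(samples) - 1):
    sign_diff = signs[i] - signs[i+1]
    twos.append(abs(sign_diff))

  # each full crossing contributes 2, so we halve the total
  return (sum(twos) / 2)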

### Now, we can use that to compute the zero-crossing rate for each of our windows

We can then also plot this result overlaid with the original wave and rms plots.

We might have to scale the `zcr()` results to make them comparable in scale to the original sample values.

In [None]:
# TODO: compute the zcr of each window in samples_win 
# and plot the results along with the original wave and rms plots


### Repeat the time-domain feature extraction for another audio file

Open the file and get a list of samples, then do the windowing, the rms analysis and the zero-crossing rate calculation, and plot the results.

How do they compare to the `two-bits.wav` file ?

In [None]:
# TODO: repeat analysis with different file
file_path = ""

all_samples = []

# first index of each window
wx = range(0, len(all_samples), WLEN)

samples_win = []
samples_rms = []
samples_zcr = []

# TODO: plots

### Frequency-Domain

We saw that the zero-crossing rate can sometimes tell us something about the pitch of a sound, but there's a better way to get frequency information from a sound signal.

There's a mathematical operation called a Fourier Transform that we can use to decompose our audio signal into simpler, basic waves of pure frequencies.

A complex audio wave made up of many frequencies:<br>
<img src="./imgs/fft-00.jpg" width="600px">

Gets separated into sine waves of single frequencies:<br>
<img src="./imgs/fft-01.jpg" width="600px">

This is useful because it can tell us which frequencies are present in our audio at any given time.

The math is a bit beyond our scope here, but luckily there are many packages and libraries that implement the Fast Fourier Transform (`FFT`) algorithm for extracting frequency information from audio waves, and its inverse, the `IFFT`, which is used for transforming frequency information back into sound waves.
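To get a feel for what the `FFT` gives us, here's a minimal sketch that uses NumPy's `np.fft.rfft()` on a made-up signal built from two sine waves. The sample rate, frequencies and amplitudes are arbitrary values chosen for illustration, and the `fft()` helper from `audio_utils` might scale or organize its results differently.

```python
import numpy as np

SAMPLE_RATE = 8000  # made-up rate, in samples per second

# time stamps for one second of audio
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE

# made-up signal: a loud 440 Hz sine wave plus a quieter 1000 Hz one
signal = 1.0 * np.sin(2 * np.pi * 440 * t) + 0.3 * np.sin(2 * np.pi * 1000 * t)

# amount of energy in each frequency band
energy = np.abs(np.fft.rfft(signal))

# frequency, in Hz, that corresponds to each band
freqs = np.fft.rfftfreq(len(signal), d=1 / SAMPLE_RATE)

# the band with the most energy should be the loudest sine wave
strongest = freqs[np.argmax(energy)]
print(strongest)
```

Plotting `energy` against `freqs` would show two spikes, a tall one at $440$ Hz and a shorter one at $1000$ Hz.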

### Let's open up a file, read its samples and run the fft()

In [None]:
file_path = "./data/two-bits.wav"
all_samples = wav_to_list(file_path)

fft_energy, fft_freqs = fft(all_samples)

Running the `fft()` on an array of samples returns two lists: one with the amount of energy in different frequency bands, and the other with the specific values of the frequency bands (in units of Hertz).

We can then plot these to get information about the frequencies present in our sound.

In [None]:
plt.plot(all_samples)
plt.show()

plt.plot(fft_freqs, fft_energy)
plt.xlabel('Freq (Hz)')
plt.ylabel('FFT Energy')
plt.show()

Let's zoom in on the x-axis since it doesn't look like we have any frequencies less than $200$ Hz or greater than $1000$ Hz.

In [None]:
plt.plot(fft_freqs, fft_energy)
plt.xlabel('Freq (Hz)')
plt.ylabel('FFT Energy')
plt.xlim(200, 1000)
plt.show()

We can combine the two arrays that we got from `fft()` and sort them to get a list of the more prevalent frequencies.

First we'll combine them using zip, then sort by the fft energy values by using a key function.

We'll also round the frequencies to the nearest Hz just to make it easier to analyze the results.

In [None]:
fft_energy_freq = [(round(f), e) for f,e in zip(fft_freqs, fft_energy)]

def byFft(A):
  return A[1]

fft_sorted = sorted(fft_energy_freq, key=byFft, reverse=True)

If we just look at the first 5 elements of the resulting array we'll see that they all have pretty similar frequencies.

Must be a very strong component of the original signal.

In [None]:
fft_sorted[:5]

### Other frequencies ?

Take a look at other parts of the list and see which additional frequencies are dominant in our audio signal.

In [None]:
# TODO: Look into list for other frequencies


We can plot the top-100 strongest frequencies in a scatter plot to see how these (frequency, energy) pairs are distributed.

In [None]:
top_freqs = [x[0] for x in fft_sorted[:100]]
top_energy = [x[1] for x in fft_sorted[:100]]
plt.scatter(top_freqs, top_energy)
plt.show()

And if we only plot the frequencies along a diagonal, we can see some pretty well-defined frequency clusters.

In [None]:
plt.scatter(top_freqs, top_freqs)
plt.show()

### Clustering

We'll see a lot more about this in a few weeks, but this is a perfect situation where we can use a Machine Learning technique called clustering to "learn" how to combine similar frequencies into representative groups.

The `cluster_fft_freqs()` function takes a list of fft frequency values and another list of the corresponding energy at each of those frequencies, and then calculates frequency cluster groups.

There are additional optional parameters that we can use to tune this function.

The `top` parameter can be used to determine how many of the top frequencies we want to use to do the clustering. The default is $50$, but since we looked at the top-100 strongest frequencies a few cells above, we can use $100$ for this parameter.

Another parameter, `clusters`, can be used to specify how many groups we want to combine our data into. The default is $6$. From looking at the graphs above, maybe we can try $7$.

In [None]:
cluster_fft_freqs(fft_freqs, fft_energy, top=100, clusters=7)

### Repeat the `FFT` analysis and get the strongest $n$ frequencies in the `horn` audio file.

For the horn file $n$ might be different than 7. Once we start plotting we'll see how many clusters we want.

In [None]:
# TODO: repeat FFT on horn.wav

### STFT

We can run a windowed version of the `FFT` on our samples to see which frequencies are present at different times. This is called a Short-Time Fourier Transform (`STFT`) because instead of running on the entire audio at once, it runs the `FFT` on small chunks/windows of audio.

Running the `stft()` on an array of samples returns three lists: one with the amount of energy in different frequency bands, at specific times, another with the specific value for the frequency bands, and a third with the specific times when the `FFT` was performed (the chunk/window time).

We can plot these to get information about the frequencies present in our sound at different times.

In [None]:
fft_res, fft_freqs, fft_times = stft(all_samples)

plt.pcolormesh(fft_times, fft_freqs, np.array(fft_res).T)
plt.show()

We can see some frequency activity on the lower frequencies.

Let's zoom in on frequencies less than $2500$.

In [None]:
plt.pcolormesh(fft_times, fft_freqs, np.array(fft_res).T)
plt.ylim(0, 2500)
plt.show()

We can definitely see where each of the notes is being played and how their pitches are related to each other.

For now we'll only take this quick look at the `STFT`. It can be a bit harder to use for analysis and comparisons since it has $3$ dimensions of values (time, frequency, energy), but we'll get back to it in a couple of weeks and see how it can be used in more complex Machine Learning tasks.
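To make the windowed idea a bit more concrete, here's a rough sketch that runs NumPy's `np.fft.rfft()` on consecutive, non-overlapping windows of a made-up signal and keeps only the strongest frequency of each window. This is a simplification and not how the `stft()` helper from `audio_utils` is implemented; the sample rate, window length and frequencies are all assumptions for illustration.

```python
import numpy as np

SAMPLE_RATE = 8000  # made-up rate, in samples per second
WLEN = 1024         # samples per window

# made-up signal: one second at 440 Hz followed by one second at 880 Hz
t = np.arange(2 * SAMPLE_RATE) / SAMPLE_RATE
signal = np.where(t < 1.0, np.sin(2 * np.pi * 440 * t), np.sin(2 * np.pi * 880 * t))

# the frequency bands are the same for every window
freqs = np.fft.rfftfreq(WLEN, d=1 / SAMPLE_RATE)

times = []
peak_freqs = []
for start in range(0, len(signal) - WLEN + 1, WLEN):
  window = signal[start : start + WLEN]
  energy = np.abs(np.fft.rfft(window))

  times.append(start / SAMPLE_RATE)            # when this window starts, in seconds
  peak_freqs.append(freqs[np.argmax(energy)])  # strongest frequency in this window

print(len(times), "windows")
print(peak_freqs[0], peak_freqs[-1])
```

With windows this short the frequency bands are about $8$ Hz wide, so the peaks land near $440$ and $880$ rather than exactly on them.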