# Final Project - Data Collection

This is a **project group assignment**. Groups may have no more than 4 individuals.

**Due: Wednesday, October 21 @ 11:59 PM**

# Problem 1

Create a final project group with no more than 4 individuals.

1. Create your Group Team on GitHub by following this link: https://classroom.github.com/g/J5VxKG0U
    * Only 1 person per team should create a GitHub Team, and the other members will join the team
    * Select a name for your team. This will be used throughout the semester and to announce contest winners.

2. Create your Group Team on Canvas using the same Team Name. This will be helpful for grading your group assignments.
    * Once again, only 1 member should create the Canvas group; the other team members will join it afterwards.
    * In your Canvas group, you can upload files to share with your team members, including the .wav files. (This [FAQs list](https://elearning.ufl.edu/e-learning-basics/uf-e-learning-faqs/) may be helpful.)

**I encourage you to use Zoom Conference meetings to meet regularly with your team members. You can create Zoom meetings using your UFL account: https://ufl.zoom.us/.**

# Problem 2

## 2.1 Instructions
Each group will collect part of the training dataset. To collect the data, you will read a sentence out loud in a particular *emotional tone*. You will record this reading using the [Voice Recorder App](https://play.google.com/store/apps/details?id=com.media.bestrecorder.audiorecorder&hl=en_US) with the following settings:

* **Microphone Adjustment:** Device auto control
* **Record file type:** .wav
* **Recording Quality:** Mono - 44kHz
* **Default File Name:** see the instructions below
* **Duration:** Trim every voice recording to exactly 2 seconds (this is important!). If speech is cut off, re-record the trial (also crucially important!).
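
The settings above can be checked after recording. The following is a minimal sketch, assuming the app writes standard PCM .wav files that Python's built-in `wave` module can read; the helper names `wav_info` and `is_valid_recording` and the 0.01 s tolerance are illustrative assumptions, not part of the assignment.

```python
import wave


def wav_info(path):
    """Return (channels, sample_rate_hz, duration_seconds) for a PCM .wav file."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        duration = wav.getnframes() / rate
    return channels, rate, duration


def is_valid_recording(path, seconds=2.0, tolerance=0.01):
    """True if the file is mono and within `tolerance` seconds of the target length.

    The tolerance value is an assumption chosen for illustration.
    """
    channels, _, duration = wav_info(path)
    return channels == 1 and abs(duration - seconds) <= tolerance
```

Calling `is_valid_recording("4-5-2-3.wav")` on each file before sharing it is one way to catch recordings that still need trimming or re-recording.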

The **emotion labels** are: 
* 1 - neutral
* 2 - calm
* 3 - happy
* 4 - sad
* 5 - angry
* 6 - fearful
* 7 - disgust
* 8 - surprise

The speech statements you should read are: 
* 1 - "Kids are talking by the door"
* 2 - "Dogs are sitting by the door"

Each student should record **5 trials** of each statement for each emotional label, giving a total of 80 recordings. So, for a group with 4 members, there should be a total of 320 recordings.
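
The totals above follow from 5 trials × 2 statements × 8 emotion labels. As a small sketch of that arithmetic, the snippet below enumerates every (ID, trial, statement, label) combination; the function name `expected_recordings` is a hypothetical helper introduced only for illustration.

```python
from itertools import product

TRIALS = range(1, 6)      # 5 trials
STATEMENTS = range(1, 3)  # 2 statements
EMOTIONS = range(1, 9)    # 8 emotion labels


def expected_recordings(member_id):
    """List every (id, trial, statement, label) tuple one student should record."""
    return [(member_id, t, s, e) for t, s, e in product(TRIALS, STATEMENTS, EMOTIONS)]


per_student = len(expected_recordings(1))
per_group = sum(len(expected_recordings(m)) for m in range(1, 5))
print(per_student, per_group)  # prints: 80 320
```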

I recommend that you save your files using a **coding system**, e.g. **ID-trial-statement-label**.

* First, give each team member a number from 1 to 4; this is the ID. Then, for example, when the team member with ID 4 records her/his 5th trial of statement (2) "Dogs are sitting by the door" in a happy tone (emotion label 3), the file name should read "4-5-2-3.wav".

* You can find examples of actors emoting these statements in this YouTube video: https://www.youtube.com/watch?v=Y7OQoNEu3dY
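
To make the coding system concrete, here is a minimal sketch that builds and parses file names of the form ID-trial-statement-label. The helper names `make_filename` and `parse_filename` are hypothetical and not part of the assignment code.

```python
def make_filename(member_id, trial, statement, label):
    """Build a file name such as '4-5-2-3.wav' from the four codes."""
    return f"{member_id}-{trial}-{statement}-{label}.wav"


def parse_filename(filename):
    """Split 'ID-trial-statement-label.wav' back into its four integer codes."""
    stem = filename.rsplit(".", 1)[0]  # drop the '.wav' extension
    member_id, trial, statement, label = (int(part) for part in stem.split("-"))
    return {"id": member_id, "trial": trial, "statement": statement, "label": label}


print(make_filename(4, 5, 2, 3))      # prints: 4-5-2-3.wav
print(parse_filename("4-5-2-3.wav"))  # prints: {'id': 4, 'trial': 5, 'statement': 2, 'label': 3}
```

Splitting on `-` rather than indexing fixed character positions keeps the parsing correct even if a code ever has more than one digit.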

Create a shared folder (you have access to Google Drive with your UFL account) where you will place all the recordings. Once completed, everyone can download this folder and have a local copy of the group's data.

## 2.2 Install library ```librosa```

Follow instructions here: https://pypi.org/project/librosa/
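
Once the installation finishes, a quick check from Python can confirm that the packages used later in this notebook are visible to your environment. This is only a sketch; `is_installed` is a hypothetical helper built on the standard `importlib` module.

```python
import importlib.util


def is_installed(package_name):
    """Return True if Python can locate the named top-level package."""
    return importlib.util.find_spec(package_name) is not None


for name in ("numpy", "librosa"):
    status = "found" if is_installed(name) else "MISSING"
    print(f"{name}: {status}")
```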

## 2.3 Create your Data for Submission

Create a dictionary with your .wav recordings, as well as a target label array and a statement label array.

Before you start, change the ```mydir``` variable below to the path of the folder containing your .wav recordings.

In [None]:
import numpy as np
import os
import librosa
import pickle
from IPython.display import display, Audio

# Folder where all recordings are located
# mydir = 'change-this-to-your-data-directory-local-path'
mydir = 'C:/Users/catia/Dropbox (UFL)/Teaching/2020 Fall/EEE 4773 Fundamentals of Machine Learning/GitHub/Final Project'

Now, there are two options to create your data:

### Option 1

Use the code below to play one file at a time, and manually label each recording.

This code will output and save the data files in the desired format for assignment submission.

In [None]:
labels = np.array([])
data = {}
statements = np.array([])
i=0
for file in os.listdir(mydir):
    if file.endswith(".wav"): # Will only read .wav files
        filename = mydir+'/'+file
        
        y, sr = librosa.load(filename, sr=44000)
        data[i] = y
        display(Audio(filename, rate=44000, autoplay=True)) # load a local WAV file
        l = input('Type the emotion label (1,2,3,4,5,6,7,8) in this recording and then press Enter...\n')
        labels = np.hstack((labels, l))
        s = input('Type the sentence (1 or 2) being read in this recording and then press Enter...\n')
        statements = np.hstack((statements,s))
        i+=1

print('-------------------------------------------------------')
print('----------------------DONE-----------------------------')
print('-------------------------------------------------------')
if np.sum(labels=='')>0:
    print('ATTENTION, ',np.sum(labels==''), ' LABEL/S IS/ARE MISSING')
    
if np.sum(statements=='')>0:
    print('ATTENTION, ',np.sum(statements==''), ' STATEMENT/S IS/ARE MISSING')
    
print('There are ', len(data),' recordings')
print('There are ', len(labels[labels!='']),' labels')
print('There are ', len(statements[statements!='']),' statement recordings')

# Saves the files to your current directory
f = open("data.pkl","wb")
pickle.dump(data,f)
f.close()
np.save('labels', labels)
np.save('statements', statements)

### Option 2

Use the **coding system** from data collection to automatically create and save your data.

The code below will help you with that, and it will output and save the data files in the desired format for assignment submission.

In [None]:
labels = np.array([])
data = {}
statements = np.array([])
i=0
for file in os.listdir(mydir):
    if file.endswith(".wav"): # Will only read .wav files
        filewav = file
        filename = mydir+'/'+file
        y, sr = librosa.load(filename, sr=44000)
        data[i] = y
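        # File names follow ID-trial-statement-label, e.g. "4-5-2-3.wav":
        # character 4 is the statement and character 6 is the emotion label
        labels = np.hstack((labels, int(filewav[6])))
        statements = np.hstack((statements,int(filewav[4])))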
        i+=1

print('-------------------------------------------------------')
print('----------------------DONE-----------------------------')
print('-------------------------------------------------------')

# Saves the files to your current directory
f = open("data.pkl","wb")
pickle.dump(data,f)
f.close()
np.save('labels', labels)
np.save('statements', statements)
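
After either option has written its output, it can be worth reloading the three files to confirm they are consistent. The sketch below assumes the files are named data.pkl, labels.npy, and statements.npy as above; `check_saved_data` and `to_int_or_none` are hypothetical helpers, and the checks they perform are suggestions rather than grading criteria.

```python
import os
import pickle

import numpy as np


def to_int_or_none(value):
    """Convert '3', 3.0 or '3.0' to 3; return None for blanks or other text."""
    try:
        return int(float(value))
    except ValueError:
        return None


def check_saved_data(folder="."):
    """Reload the saved files and return a list of problems (empty if none found)."""
    with open(os.path.join(folder, "data.pkl"), "rb") as f:
        data = pickle.load(f)
    labels = np.load(os.path.join(folder, "labels.npy"))
    statements = np.load(os.path.join(folder, "statements.npy"))

    problems = []
    if not (len(data) == len(labels) == len(statements)):
        problems.append("data, labels and statements have different lengths")
    # Emotion labels must be 1-8 and statements must be 1 or 2
    if any(to_int_or_none(v) not in range(1, 9) for v in labels):
        problems.append("labels contains values outside 1-8")
    if any(to_int_or_none(v) not in (1, 2) for v in statements):
        problems.append("statements contains values outside 1-2")
    return problems
```

Running `print(check_saved_data())` in the directory where the files were saved should print an empty list when everything lines up.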

## 2.4 Gather All Files for Submission

To receive full credit for this question, you should submit to Canvas:

1. Compressed folder (.zip) with the recordings from all team members. (80 recordings per student should be included.)
2. File "data.pkl"
3. File "labels.npy"
4. File "statements.npy"
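
The compressed folder in item 1 can be created with any tool. As one possibility, the sketch below uses Python's standard `zipfile` module; the function name `zip_recordings` and the default archive name `recordings.zip` are assumptions made for illustration.

```python
import os
import zipfile


def zip_recordings(recordings_dir, zip_path="recordings.zip"):
    """Write every .wav file in `recordings_dir` into a .zip archive.

    Returns the sorted list of file names that were added.
    """
    wav_files = sorted(f for f in os.listdir(recordings_dir) if f.endswith(".wav"))
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for name in wav_files:
            # arcname keeps only the file name, not the full local path
            archive.write(os.path.join(recordings_dir, name), arcname=name)
    return wav_files
```

For example, `len(zip_recordings(mydir))` should equal 80 times the number of team members if all recordings are present.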

## Submit your Solution

Confirm that you've successfully completed the assignment.

```add``` and ```commit``` the final version of your work, and ```push``` your code/data to your GitHub repository. **You may run into memory issues. If this happens, disregard this step and only submit the data files to Canvas.**

Submit the URL of your GitHub Repository along with all data as your assignment submission on Canvas.