# Component Specification

## 1. Software components

### 1.1 Component 1: Visualization manager

The upload video component lets users upload a video of their cat. After the user clicks "Upload", the video is ready for preprocessing.
<br>
Input: an MP4 video showing the cat's face and containing its meowing
<br>
Output: one audio file and three images of the cat's face
<br>

##### User interface (Minh)

###### main.py

The main script serves an HTML web page for user interaction and receives user input. After receiving the video from the user through the web page, the main script runs video_input.py, which extracts the audio data and image frames.

Next, it runs the sound analysis components to analyze the audio data, and the image analysis components to obtain three random cat-face images from the frames. The script then shows the user these cropped images and asks the user to select the best one for emotion analysis.

After receiving the user's choice, the script runs the SVM model on the analyzed audio data and the selected image. The model's output, the predicted emotion, is then shown to the user on the web page.
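
The control flow of main.py can be sketched as one function. This is a minimal sketch only: the five step functions (`extract`, `analyze_audio`, `pick_images`, `ask_user`, `classify`) are hypothetical placeholders passed in as arguments, not the project's real function names. The web page interaction is reduced to the `ask_user` callback.

```python
from typing import Callable, List, Tuple


def run_pipeline(
    video_path: str,
    extract: Callable[[str], Tuple[str, List[str]]],
    analyze_audio: Callable[[str], List[float]],
    pick_images: Callable[[List[str]], List[str]],
    ask_user: Callable[[List[str]], str],
    classify: Callable[[List[float], str], str],
) -> str:
    """Run the main.py steps in order and return the predicted emotion.

    All five callables are hypothetical stand-ins for the project's modules.
    """
    # Step 1: split the video into an audio file and image frames (video_input.py).
    audio_path, frame_paths = extract(video_path)
    # Step 2: turn the audio into model features (audio analysis modules).
    audio_features = analyze_audio(audio_path)
    # Step 3: detect cat faces and offer three candidate crops (image analysis modules).
    candidates = pick_images(frame_paths)
    # Step 4: the user chooses the best crop on the web page.
    chosen = ask_user(candidates)
    # Defensive check (an assumption): only accept one of the offered images.
    if chosen not in candidates:
        raise ValueError("Selected image is not one of the candidates")
    # Step 5: classify using the audio features and the chosen image (svm.py).
    return classify(audio_features, chosen)
```

Passing the steps in as arguments keeps the orchestration testable without a browser, a video file, or a trained model.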

###### video_input.py

This module contains a single function. It receives an MP4 video as input and returns the extracted sound data and image frames.

The function first extracts the sound data and saves it in WAV format. It then creates a directory, /frames, to store the frames. Finally, a for loop saves each frame into that directory in PNG format, ready to be analyzed.
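
One way to implement this module is to call the ffmpeg command-line tool. This is an assumption: the source does not say which library video_input.py uses, and a per-frame for loop like the one described is more typical of OpenCV's `cv2.VideoCapture`. The sketch below builds the two ffmpeg commands and creates the frames directory. Running it requires ffmpeg on the PATH. The function names and the output file names are hypothetical.

```python
import os
import subprocess
from typing import List, Tuple


def build_commands(video: str, wav_out: str, frame_dir: str) -> Tuple[List[str], List[str]]:
    """Return ffmpeg commands that extract WAV audio and PNG frames."""
    # -vn drops the video stream; pcm_s16le writes uncompressed 16-bit WAV audio.
    audio_cmd = ["ffmpeg", "-y", "-i", video, "-vn", "-acodec", "pcm_s16le", wav_out]
    # The %05d pattern writes the frames as a numbered PNG sequence.
    frame_cmd = ["ffmpeg", "-y", "-i", video, os.path.join(frame_dir, "frame_%05d.png")]
    return audio_cmd, frame_cmd


def video_input(video: str, out_dir: str = ".") -> Tuple[str, str]:
    """Extract audio and frames from an MP4 file (hypothetical signature)."""
    wav_out = os.path.join(out_dir, "audio.wav")
    frame_dir = os.path.join(out_dir, "frames")
    os.makedirs(frame_dir, exist_ok=True)  # the /frames directory from the description
    audio_cmd, frame_cmd = build_commands(video, wav_out, frame_dir)
    subprocess.run(audio_cmd, check=True)  # raises CalledProcessError if ffmpeg fails
    subprocess.run(frame_cmd, check=True)
    return wav_out, frame_dir
```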


### 1.2 Component 2: Data manager

This component processes the raw image and audio data into analyzable datasets.
<br>
Input: raw audio and image data
<br>
Output: analyzable audio and image data

##### Image analysis (JerYo)

###### image_analysis.py

Once the MP4 video has been split into image frames, this module first converts each frame to grayscale and then detects the cat face with a Haar cascade classifier. The detected cat faces are cropped to a constant size and saved in a folder.
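
In OpenCV, the detection step is usually `cv2.CascadeClassifier(path).detectMultiScale(gray)`. It returns one `(x, y, w, h)` box per detected face, and OpenCV ships a `haarcascade_frontalcatface.xml` model for cats. The stdlib-only sketch below covers only what happens after detection: it keeps the largest box and clamps a square crop to the image bounds before resizing. Keeping the largest box and using a square crop are assumptions, not stated in the source.

```python
from typing import Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x, y, w, h) as returned by detectMultiScale


def square_crop(boxes: Sequence[Box], img_w: int, img_h: int) -> Optional[Box]:
    """Pick the largest detection and turn it into a square crop inside the image."""
    if not boxes:
        return None  # no cat face in this frame, so the frame is skipped
    x, y, w, h = max(boxes, key=lambda b: b[2] * b[3])  # largest area wins
    side = min(max(w, h), img_w, img_h)
    # Center the square on the detected box, then shift it back inside the image.
    cx, cy = x + w // 2, y + h // 2
    left = min(max(cx - side // 2, 0), img_w - side)
    top = min(max(cy - side // 2, 0), img_h - side)
    return left, top, side, side
```

A fixed square crop keeps the aspect ratio unchanged, so the later resize to a constant size does not distort the face.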

###### random_pick_3.py

To obtain a good cat-face image, we let the user choose from three candidate pictures taken from the video. This module randomly selects the three images from the cat-face image folder.
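
Here is a minimal sketch of the random selection. It assumes the cropped faces are stored as PNG files in one folder. The function signature, the `seed` parameter, and the error raised when fewer than three images exist are illustrative choices, not taken from the project.

```python
import os
import random
from typing import List, Optional


def random_pick_3(folder: str, seed: Optional[int] = None) -> List[str]:
    """Return paths of three distinct, randomly chosen PNG images from folder."""
    # Sorting makes the candidate list independent of directory listing order.
    images = sorted(f for f in os.listdir(folder) if f.lower().endswith(".png"))
    if len(images) < 3:
        raise ValueError("Need at least three cat-face images to choose from")
    rng = random.Random(seed)  # a fixed seed makes the choice reproducible in tests
    # sample() draws without replacement, so the three images are distinct.
    return [os.path.join(folder, name) for name in rng.sample(images, 3)]
```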

###### image_output.py

After the user selects the best of the three images, that image is compressed and converted into a CSV file for the modeling step.
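
Below is a stdlib-only sketch of the compress-and-flatten step. It assumes the chosen crop is already a grayscale 2D list of values from 0 to 255, and that "compressed" means downsampling to the 48 × 48 grid named in the preliminary plan. Nearest-neighbour sampling and the `px0`, `px1`, ... column names are assumptions. The catID column and the selected_image.csv file name match the names used by svm.py.

```python
import csv
from typing import List


def downsample(gray: List[List[int]], size: int = 48) -> List[List[int]]:
    """Nearest-neighbour resize of a 2D grayscale image to size x size."""
    h, w = len(gray), len(gray[0])
    # r * h // size is always < h, so every index stays inside the image.
    return [[gray[r * h // size][c * w // size] for c in range(size)] for r in range(size)]


def image_to_csv(gray: List[List[int]], cat_id: str, path: str = "selected_image.csv") -> List[int]:
    """Write one row: catID followed by the flattened 48 x 48 pixels."""
    pixels = [p for row in downsample(gray) for p in row]  # row-major flattening
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["catID"] + [f"px{i}" for i in range(len(pixels))])
        writer.writerow([cat_id] + pixels)
    return pixels
```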

##### Audio analysis (Weishi)

###### audio_input.py

This module contains the function that reads the user's input sound and saves it as audio_test.csv for the model.
The function, audio_input, reads the user's wav file, appends it to the whole dataset, and runs PCA to reduce the number of features. It saves the converted and reduced user input as audio_test.csv for the SVM step. audio_test.csv holds a 1D array of 20 features extracted from the wav file.
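
The append-then-reduce step can be sketched as follows. This sketch makes three assumptions. First, the raw training features are available as a list of rows. Second, `reduce` is whatever PCA routine create_model uses, passed in here as a hypothetical parameter. Third, the header row uses add_label-style letter names. Because the user's row is appended last, the last reduced row is the user's sample.

```python
import csv
import string
from typing import Callable, List

Rows = List[List[float]]


def audio_input(user_row: List[float], training: Rows,
                reduce: Callable[[Rows], Rows], path: str = "audio_test.csv") -> List[float]:
    """Append the user's features to the training rows, reduce, and save the user's row."""
    if training and len(user_row) != len(training[0]):
        raise ValueError("User sample has a different number of raw features")
    reduced = reduce(training + [user_row])  # PCA over the combined data
    user_reduced = reduced[-1]  # the user sample was appended last
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        # Letter feature names ('a', 'b', ...) are an assumption based on add_label.
        writer.writerow(list(string.ascii_lowercase[: len(user_reduced)]))
        writer.writerow(user_reduced)
    return user_reduced
```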

###### audio_create_model.py

This module contains a function that uses principal component analysis (PCA) to reduce the feature size and then creates a CSV file containing all features of the training data for the SVM.
The function, create_model, takes as input the directory that contains all the raw wav files from YouTube videos and applies mel-spectrogram analysis to them. The converted data contain over 12,000 features, far too many to build a model from directly, so PCA is used to reduce them to the top 20 principal components. The function then saves all data to a CSV file called full.csv, a 20 by N matrix (N = number of samples). This file is also needed for analyzing user input.
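
To make the PCA step concrete, here is a stdlib-only sketch that projects rows onto their top k principal components. It uses power iteration with deflation on the sample covariance matrix. It illustrates the technique and is not the project's code. With more than 12,000 features the covariance matrix alone would have over 144 million entries, so in practice a library routine based on singular value decomposition would be used instead, such as scikit-learn's `PCA(n_components=20)`.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def pca_reduce(rows: Matrix, k: int, iters: int = 200, seed: int = 0) -> Matrix:
    """Project each row onto the top-k principal components (power iteration)."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    # Sample covariance matrix (m x m).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(m)] for i in range(m)]
    rng = random.Random(seed)
    components = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(m)]  # random positive start vector
        for _step in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(m)) for i in range(m)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0  # avoid dividing by zero
            v = [x / norm for x in w]
        # Rayleigh quotient: the variance captured by this component.
        lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(m)) for i in range(m))
        # Deflate: remove this component's variance before finding the next one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(m)] for i in range(m)]
        components.append(v)
    # Each output row holds the k projections of one centered input row.
    return [[sum(c[j] * comp[j] for j in range(m)) for comp in components] for c in centered]
```

Each principal component is only defined up to its sign, so two PCA implementations can return projections that differ by a factor of -1.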


###### audio_training.py

This module contains essential functions for sound analysis, and the other modules in this software import it. It includes three functions: convert_mel_one, save_csv_raw, and add_label.
convert_mel_one converts a single wav file to 1D numerical data using mel-spectrogram analysis and returns a 1D array containing all features (more than 12,000).
save_csv_raw converts a batch of wav files in a directory and saves them as an M by N matrix (M = number of features per wav file, N = number of samples) in audio_raw.csv, in the same folder as the raw wav files. It also returns the label of each wav file, which is used in the next step.
add_label is a simple function that generates 20 letters of the alphabet as feature names for the SVM.
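
Two of these helpers are simple enough to sketch with the standard library alone. The letter names assume lowercase 'a' to 't', because the source only says "20 alphabets". The matrix writer shows the M by N layout described for audio_raw.csv, with one row per feature and one column per wav file. The header row of file names and the sorted column order are assumptions. The mel-spectrogram conversion in convert_mel_one depends on an audio library, for example `librosa.feature.melspectrogram`, and is not reproduced here.

```python
import csv
import string
from typing import Dict, List


def add_label(n: int = 20) -> List[str]:
    """Return n single-letter feature names: 'a', 'b', ... (lowercase is an assumption)."""
    if not 0 < n <= 26:
        raise ValueError("Letter labels only cover 1 to 26 features")
    return list(string.ascii_lowercase[:n])


def write_feature_matrix(features: Dict[str, List[float]], path: str) -> None:
    """Save features as an M x N matrix: one row per feature, one column per wav file."""
    names = sorted(features)  # hypothetical: column order follows sorted file names
    lengths = {len(features[name]) for name in names}
    if len(lengths) != 1:
        raise ValueError("Every wav file must yield the same number of features")
    m = lengths.pop()
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(names)  # header: one wav file name per column
        for i in range(m):
            writer.writerow([features[name][i] for name in names])
```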



### 1.3 Component 3: Model

After the user selects one cat-face image, both the audio and image data are fed into the trained SVM model, which returns the cat's emotion.
<br>
Input: one audio csv and one image csv 
<br>
Output: cat emotion (happy, angry, hungry)

##### Model training and classification (Yue)

###### svm.py

This module contains the SVM model for emotion classification. It includes two functions: csv_merge and classification.

The csv_merge function imports the image and audio datasets separately and merges them on ‘catID’. The merged user_csv.csv file is written to the userData folder for classification. If selected_image.csv or audio_test.csv cannot be found in the current working directory, a ValueError is raised with the message "Invalid input file".
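
The merge can be sketched with the standard library csv module. The file names selected_image.csv, audio_test.csv, and userData/user_csv.csv and the "Invalid input file" message come from the description above. Two details are assumptions: both files contain a catID column, and the merge behaves as an inner join, so rows without a match are dropped. With pandas, the same join would typically be written as `image_df.merge(audio_df, on="catID")`.

```python
import csv
import os
from typing import Dict, List, Tuple


def _read(path: str) -> Tuple[List[str], List[Dict[str, str]]]:
    """Load a CSV file, raising the documented error if it is missing."""
    if not os.path.isfile(path):
        raise ValueError("Invalid input file")
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        rows = list(reader)
        return list(reader.fieldnames or []), rows


def csv_merge(image_csv: str = "selected_image.csv",
              audio_csv: str = "audio_test.csv",
              out_path: str = os.path.join("userData", "user_csv.csv")) -> int:
    """Inner-join image and audio rows on catID; return the number of merged rows."""
    img_fields, img_rows = _read(image_csv)
    aud_fields, aud_rows = _read(audio_csv)
    audio_by_id = {row["catID"]: row for row in aud_rows}
    # Image columns first, then audio columns; catID appears only once.
    fields = img_fields + [c for c in aud_fields if c != "catID"]
    os.makedirs(os.path.dirname(out_path) or ".", exist_ok=True)
    count = 0
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        for row in img_rows:
            match = audio_by_id.get(row["catID"])
            if match is not None:  # rows without a matching catID are dropped
                writer.writerow({**row, **match})
                count += 1
    return count
```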

The classification function trains the LinearSVC model on the training data and classifies the cat's emotion. First, the image and audio training datasets are imported and merged on ‘catID’.

Linear Support Vector Classification (LinearSVC) is one of the support vector machine (SVM) models that can perform multi-class classification. It is similar to SVC with kernel='linear', but it is implemented with liblinear rather than libsvm. It therefore offers more flexibility in the choice of penalties and loss functions and should scale better to large numbers of samples.

For the LinearSVC model, the tolerance for the stopping criterion is 1e-5. The seed of the pseudo-random number generator used when shuffling the data for the dual coordinate descent is 0. The user data (userData/user_csv.csv) are then passed to the trained model to obtain the emotion classification. If user_csv.csv cannot be found in the working directory, a ValueError is raised with the message "Invalid input file".
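
In scikit-learn, the model described here would be constructed as `LinearSVC(tol=1e-5, random_state=0)`. By default it handles the three emotions one-vs-rest: it learns one weight vector per class, and the predicted label is the class whose score w·x + b is highest. The stdlib sketch below reproduces only that prediction rule, which is what `predict` computes from the fitted `coef_` and `intercept_` attributes. The weights in the test are made-up illustrations, not trained values.

```python
from typing import List, Sequence


def decision_scores(coef: Sequence[Sequence[float]], intercept: Sequence[float],
                    x: Sequence[float]) -> List[float]:
    """One score per class: w_k . x + b_k (mirrors coef_ and intercept_ in scikit-learn)."""
    return [sum(w * v for w, v in zip(w_k, x)) + b_k for w_k, b_k in zip(coef, intercept)]


def predict_emotion(coef: Sequence[Sequence[float]], intercept: Sequence[float],
                    classes: Sequence[str], x: Sequence[float]) -> str:
    """Pick the class with the highest one-vs-rest score."""
    if any(len(w_k) != len(x) for w_k in coef):
        raise ValueError("Feature count does not match the trained model")
    scores = decision_scores(coef, intercept, x)
    best = max(range(len(scores)), key=scores.__getitem__)  # first index wins ties
    return classes[best]
```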


## 2. Interactions to accomplish use cases

### 2.1 Interaction 1: Upload video 
The component accepts the video input from the user. If the file is not of the required type (.mp4), the user cannot proceed to the next step.
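
Here is a minimal check for the upload step. It assumes validation is done on the file name's extension, ignoring case, before preprocessing starts. The function name is hypothetical. An extension alone does not guarantee a valid MP4, so a stricter check might also inspect the file contents.

```python
from pathlib import PurePath
from typing import Tuple


def is_allowed_upload(filename: str, allowed: Tuple[str, ...] = (".mp4",)) -> bool:
    """Return True only for file names whose final extension is in the allowed list."""
    # PurePath.suffix is the last extension only, e.g. ".exe" for "cat.mp4.exe".
    return PurePath(filename).suffix.lower() in allowed
```

The function checks only the final suffix rather than searching for ".mp4" anywhere in the name. As a result, `cat.MP4` is accepted and `cat.mp4.exe` is rejected.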

### 2.2 Interaction 2: Select one cat image out of three cat images

The component receives the three cat-face images and displays them on the screen so that the user can select the one to use in the analysis.

### 2.3 Interaction 3: Evaluate cat emotion 

The component receives one audio CSV file and one image CSV file from the previous steps. After the user clicks the confirm button under the three cat-face images, the two CSV files are passed to the SVM analysis, and the resulting cat emotion is shown on the screen.

## 3. Preliminary plan

A list of tasks in priority order:

1. Download the video;
2. Dataset preparation:
    - 2.1 Extract the image and audio datasets from the video dataset;
    - 2.2 Capture the images with a cat face;
    - 2.3 Get the audio clips with meowing;
    - 2.4 Register the cat face image as 48 × 48 grayscale pixels;
    - 2.5 Get the frequency of the audio clip;
    - 2.6 Merge the image and audio datasets by catID;
3. Data analysis using SVM;
4. Develop the upload function;
5. Develop the evaluation function;
6. Code review;
7. Finalize the documentation.