# CBU5201 mini-project submission

The mini-project has two separate components:


1.   **Basic component** [6 marks]: Using the genki4k dataset, build a machine learning pipeline that takes as an input an image and predicts 1) whether the person in the image is similing or not 2) estimate the 3D head pose labels in the image.
2.   **Advanced component** [10 marks]: Formulate your own machine learning problem and build a machine learning solution using the genki4k dataset (https://inc.ucsd.edu/mplab/398/). 

Your submission will consist of two Jupyter notebooks, one for the basic component and another one for advanced component. Please **name each notebook**:

* CBU5201_miniproject_basic.ipynb
* CBU5201_miniproject_advanced.ipynb

then **zip and submit them toghether**.

Each uploaded notebook should include: 

*   **Text cells**, describing concisely each step and results.
*   **Code cells**, implementing each step.
*   **Output cells**, i.e. the output from each code cell.

and **should have the structure** indicated below. Notebooks might not be run, please make sure that the output cells are saved.

How will we evaluate your submission?

*   Conciseness in your writing (10%).
*   Correctness in your methodology (30%).
*   Correctness in your analysis and conclusions (30%).
*   Completeness (10%).
*   Originality (10%).
*   Efforts to try something new (10%).

Suggestion: Why don't you use **GitHub** to manage your project? GitHub can be used as a presentation card that showcases what you have done and gives evidence of your data science skills, knowledge and experience. 

Each notebook should be structured into the following 9 sections:


# 1 Author

**Student Name**:  
**Student ID**:  



# 2 Problem formulation

Describe the machine learning problem that you want to solve and explain what's interesting about it.

1.  input: image
2.  emotion stage: -1(very sad), -0.5(sad), 0.5(happy), 1(very happy)

# 3 Machine Learning pipeline

Describe your ML pipeline. Clearly identify its input and output, any intermediate stages (for instance, transformation -> models), and intermediate data moving from one stage to the next. It's up to you to decide which stages to include in your pipeline. 

# 4 Transformation stage

Describe any transformations, such as feature extraction. Identify input and output. Explain why you have chosen this transformation stage.

# 5 Modelling

Describe the ML model(s) that you will build. Explain why you have chosen them.

# 6 Methodology

Describe how you will train and validate your models, how model performance is assesssed (i.e. accuracy, confusion matrix, etc)

# 7 Dataset

Describe the dataset that you will use to create your models and validate them. If you need to preprocess it, do it here. Include visualisations too. You can visualise raw data samples or extracted features.

# 8 Results

Carry out your experiments here, explain your results.

In [1]:
import os, shutil
import keras
base_dir = 'D:\\desktop\\bupt\\ML\\Mini_Project\\dataset_advanced'

train_dir = os.path.join(base_dir, 'train')
validation_dir = os.path.join(base_dir, 'valid')
test_dir = os.path.join(base_dir, 'test')

In [2]:

import pandas as pd

train_data = pd.read_csv('D:\\desktop\\bupt\\ML\\Mini_Project\\dataset_advanced\\train\\label.csv')
valid_data = pd.read_csv('D:\\desktop\\bupt\\ML\\Mini_Project\\dataset_advanced\\valid\\label.csv')
test_data = pd.read_csv('D:\\desktop\\bupt\\ML\\Mini_Project\\dataset_advanced\\test\\label.csv')
print(valid_data)

             file  label
0    file1731.jpg    0.5
1    file1732.jpg    0.5
2    file1733.jpg    1.0
3    file1734.jpg    1.0
4    file1735.jpg    1.0
..            ...    ...
385  file3819.jpg   -1.0
386  file3820.jpg   -1.0
387  file3821.jpg   -1.0
388  file3822.jpg   -1.0
389  file3823.jpg   -0.5

[390 rows x 2 columns]


In [3]:
import tensorflow.python.keras as keras
from keras.preprocessing.image import ImageDataGenerator

# All images will be rescaled by 1./255
train_datagen = ImageDataGenerator(rescale=1./255)
test_datagen = ImageDataGenerator(rescale=1./255)
train_generator = train_datagen.flow_from_dataframe(
    dataframe=train_data,
    directory=os.path.join(train_dir, 'image'),
    x_col='File', 
    y_col='Label', 
    batch_size=20, 
    target_size=(150, 150), 
    class_mode='raw')
test_generator = test_datagen.flow_from_dataframe(
    dataframe=test_data,
    directory=os.path.join(test_dir, 'image'),
    x_col='file', 
    y_col='label', 
    batch_size=20, 
    target_size=(150, 150), 
    class_mode='raw')
valid_generator = test_datagen.flow_from_dataframe(
    dataframe=valid_data,
    directory=os.path.join(validation_dir, 'image'),
    x_col='file', 
    y_col='label', 
    batch_size=20, 
    target_size=(150, 150), 
    class_mode='raw')

Found 3200 validated image filenames.
Found 409 validated image filenames.
Found 390 validated image filenames.


  n_invalid, x_col))


**Building Model**

In [1]:
#TODO: Model的搭建设计

In [10]:
from keras import optimizers

model.compile(loss='mae',
              optimizer=optimizers.RMSprop(lr=1e-4),
              metrics=['acc'])

history = model.fit_generator(
      train_generator,
      steps_per_epoch=100,
      epochs=30,
      validation_data=valid_generator,
      validation_steps=390//20)

ValueError: Could not interpret optimizer identifier: <keras.optimizers.optimizer_v2.rmsprop.RMSprop object at 0x000002660035C348>

In [None]:
model.save('D:\\desktop\\bupt\\ML\\Mini_Project\\advanced_v1.h5')

In [None]:
import matplotlib.pyplot as plt

acc = history.history['acc']
val_acc = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(len(acc))

plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend()

plt.figure()

plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()
plt.show()

In [None]:
model = keras.models.load_model('D:\\desktop\\bupt\\ML\\Mini_Project\\advanced_v1.h5')
score = model.evaluate_generator(
  test_generator,
  steps=409//20,
  max_queue_size=10,
  workers=1,
  use_multiprocessing=False,
  verbose=0)
print("loss: %.6f - acc: %.6f" % (score[0], score[1]))

# 9 Conclusions

Your conclusions, improvements, etc should go here