# Tutorial

In [1]:
# Loading all helper functions
import sys
sys.path.insert(0, '..')
from src.models.data_utils import *
from src.models.model_utils import *
from src.models.train_model import *
from tqdm import tqdm

## General info:

### Code structure
- Helper functions are stored in src/models
- Data is stored in data/processed

### Corpora and annotation
#### Dicta-Sign
Annotations include (see more detail in the ortolang repo):
* **fls**: fully-lexical signs, encoded as categorical values (gloss indices)
* **PT**:    pointing signs, encoded as binary
* **PT_PRO1**, **PT_PRO2**, **PT_PRO3**, **PT_LOC**, **PT_DET**, **PT_LBUOY**, **PT_BUOY**: sub-categories for pointing signs, encoded as binary
* **DS**:    depicting signs, encoded as binary
* **DSA**, **DSG**, **DSL**, **DSM**, **DSS**, **DST**, **DSX**: sub-categories for depicting signs, encoded as binary
* **FBUOY**: fragment buoys, encoded as binary
* **N**:     numbering signs, encoded as binary
* **FS**:    fingerspelling signs, encoded as binary

#### NCSLGR
Annotations include (all data is binary):
* **lexical_with_ns_not_fs**: lexical signs, including numbering signs but excluding fingerspelling signs
* **fingerspelling**, **fingerspelled_loan_signs**: fingerspelling signs, fingerspelled loan signs
* **IX_1p**, **IX_2p**, **IX_3p**, **IX_loc**: sub-categories for pointing signs
* **POSS**, **SELF**: possessive pronouns
* **DCL**, **LCL**, **SCL**, **BCL**, **ICL**, **BPCL**, **PCL**: sub-categories for classifier signs (i.e. depicting signs)
* **gesture**: culturally shared gestures
* **part_indef**
* **other**

### Type of input features
Originally, this code was designed around preprocessed features for each frame. Possible feature types are:
- **2Draw**
- **2Draw_HS**
- **2Draw_HS_noOP**
- **2Draw_noHands**
- **2Dfeatures**
- **2Dfeatures_HS**
- **2Dfeatures_HS_noOP**
- **2Dfeatures_noHands**
- **3Draw**
- **3Draw_HS**
- **3Draw_HS_noOP**
- **3Draw_noHands**
- **3Dfeatures**
- **3Dfeatures_HS**
- **3Dfeatures_HS_noOP**
- **3Dfeatures_noHands**

which correspond to 2D or 3D data, raw OpenPose or preprocessed body and face data, including or excluding hand shape estimates, including or excluding OpenPose hand data.
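
The sixteen names above appear to follow a regular pattern: a dimension, then `raw` or `features`, then an optional suffix. A minimal sketch that rebuilds the list from those three parts, which may be handy when looping over input types. The variable names are illustrative and not part of the repository.

```python
from itertools import product

# Building blocks read off the list above (illustrative names, not repository code).
dimensions = ['2D', '3D']
kinds = ['raw', 'features']
suffixes = ['', '_HS', '_HS_noOP', '_noHands']

# Every combination, generated in the same order as the list above.
feature_types = [d + k + s for d, k, s in product(dimensions, kinds, suffixes)]

print(len(feature_types))
print(feature_types[:4])
```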

The main data function `get_data_concatenated` requires a features dictionary, which can be obtained with the `getFeaturesDict` function.

Recently, we also added direct image input to the model, but it has not been tested thoroughly.

### Model outputs
The model can be trained for the recognition of:
- one or several (N) mixed linguistic descriptors in parallel, possibly simultaneously true. Each descriptor includes a 'garbage/other' class.
    - **ex. 1** (N = 4) : Y1 = [Y1_1 : lexical signs (categorical with 4 different signs), Y1_2 : pointing signs, Y1_3 : depicting signs, Y1_4 : fragment buoys]
    - **ex. 2** (N = 2) : Y2 = [Y2_1 : pointing signs to PRO1/2/3, Y2_2 : lexical signs (all)]
    - **ex. 3** (N = 2) : Y3 = [Y3_1 : pointing signs to PRO1/2/3, Y3_2 : depicting signs A/G]
    - **ex. 4** (N = 1) : Y4 = [Y4_1 : depicting signs]
- sign types, i.e. the most probable sign type for each frame. In this case, lexical signs are seen as binary.
    - **ex. 5**: Y5 = [other, lexical signs,  pointing signs, depicting signs, fragment buoys]. 
    - **ex. 6**: Y6 = [other, depicting signs] : this should give the same results as ex. 4
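
To make the difference between the two output forms concrete, here is a small NumPy sketch on made-up annotations. It is not the repository implementation: the toy tracks are invented, and the 'other' column of the sign-types matrix is assumed to mean "no listed type is active on this frame".

```python
import numpy as np

# Toy binary annotations for 6 frames (made-up values, for illustration only).
DS = np.array([0, 1, 1, 0, 0, 0])
PT = np.array([0, 0, 0, 1, 0, 0])

def one_hot_binary(track):
    """Encode a 0/1 track as [other, active] with shape (1, time, 2)."""
    return np.stack([1 - track, track], axis=-1)[np.newaxis].astype(float)

# 'mixed': one array per descriptor, each with its own 'other' class.
Y_mixed = [one_hot_binary(DS), one_hot_binary(PT)]

# 'sign_types': a single matrix [other, DS, PT]; 'other' is assumed here
# to be 1 only where none of the listed types is active.
other = 1 - np.maximum(DS, PT)
Y_sign_types = np.stack([other, DS, PT], axis=-1)[np.newaxis].astype(float)

print([y.shape for y in Y_mixed])
print(Y_sign_types.shape)
```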

## Getting help on a function

In [2]:
# Use help(function_name)
# For instance:

help(get_raw_annotation_from_file)

Help on function get_raw_annotation_from_file in module src.models.data_utils:

get_raw_annotation_from_file(corpus, from_notebook=False)
    Gets raw annotation from data file
    
    Inputs:
        corpus: 'DictaSign' or 'NCSLGR'
        from_notebook: True if used in Jupyter notebook
    
    Outputs:
        Annotation data



## Main helper function for data handling (`data_utils.py`): `get_data_concatenated`

This is the main function. It extracts the data in a format that can be used directly for training.

`get_data_concatenated` returns [X_features, X_frames], Y, idx_trueData or [X_features, X_frames], Y (depending on return_idx_trueData)

#### Outputs
- **X_features** is a numpy array of size [1, total_time_steps, features_number] containing all retained preprocessed features for all retained frames
- **X_frames** is a list of paths for all retained frames (frames are not held in memory; they are read from these paths during training)
- **Y** is the annotation data (i.e. ground truth data) in the desired format

#### Inputs
- **corpus** (string)
- **output_form**:
    - 'mixed' for several separate outputs
    - 'sign_types' if annotation is only a binary matrix of sign types
- **types**: a list of lists of original names that are used to compose final outputs
- **nonZero**: a list of lists of nonzero values to consider. If 4 outputs with all nonZero values should be considered, nonZero=[[],[],[],[]]
- **binary**: only considered when output_form='mixed'. A list of booleans (True/False), one per output, indicating whether that output is binary (True) or categorical (False)
- **features_dict**: a dictionary indicating which features to keep, e.g.: {'features_HS':np.arange(0, 420), 'features_HS_norm':np.array([]), 'raw':np.array([]), 'raw_norm':np.array([])}
- **preloaded_features**: if features are already loaded, in the format of a list (features for each video)
- **provided_annotation**: raw annotation data (optional)
- **video_indices**: numpy array for a list of videos
- **separation**: in order to separate consecutive videos
- **from_notebook**: if notebook script, data is in parent folder
- **return_idx_trueData**: if True, returns a binary vector with 0 where separations are
- **features_type**: 'features', 'frames', 'both'            
- **frames_path_before_video**: video frames are expected to be in one folder per video, like '/localHD/DictaSign/convert/img/DictaSign_lsf_S7_T2_A10'
- **empty_image_path**: path of a white frame
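
The roles of **separation** and **return_idx_trueData** can be pictured with a toy concatenation. The helper below is hypothetical (its name, and the choice of zero-valued padding frames, are assumptions made for illustration); it only shows the idea of joining per-video arrays with a gap and keeping a vector that marks real frames.

```python
import numpy as np

def concatenate_videos(videos, separation):
    """Join per-video feature arrays, inserting `separation` zero frames between them.

    Hypothetical helper: zero padding is an assumption, not the repository's behaviour.
    """
    features_number = videos[0].shape[1]
    chunks, is_true = [], []
    for i, video in enumerate(videos):
        if i > 0 and separation > 0:
            chunks.append(np.zeros((separation, features_number)))
            is_true.append(np.zeros(separation, dtype=int))
        chunks.append(video)
        is_true.append(np.ones(len(video), dtype=int))
    X = np.concatenate(chunks)[np.newaxis]  # shape (1, total_time_steps, features_number)
    idx_true = np.concatenate(is_true)      # 1 on real frames, 0 on separations
    return X, idx_true

# Two toy "videos" with 4 and 2 frames of 3 features each.
videos = [np.ones((4, 3)), np.ones((2, 3))]
X, idx_true = concatenate_videos(videos, separation=2)
print(X.shape)
print(idx_true)
```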


## Main helper function for model handling (`model_utils.py`): `get_model`

This is the main function. It builds the Keras model.

`get_model` returns a Keras model

#### Outputs
- a Keras model

#### Inputs
- **output_names**: list of outputs (strings)
- **output_classes**: list of number of classes of each output type
- **output_weights**: list of weights for each output
- **conv** (bool): if True, applies convolution on input
- **conv_filt**: number of convolution filters
- **conv_ker**: size of convolution kernel
- **conv_strides**: size of convolution strides
- **rnn_number**: number of recurrent layers
- **rnn_type**: type of recurrent layers (string)
- **rnn_hidden_units**: number of hidden units
- **dropout**: how much dropout (0 to 1)
- **att_in_rnn**: if True, applies attention layer before recurrent layers
- **att_in_rnn_single**: single (shared) attention layer or not
- **att_in_rnn_type** (string): timewise or featurewise attention layer
- **att_out_rnn**: if True, applies attention layer after recurrent layers
- **att_out_rnn_single**: single (shared) attention layer or not
- **att_out_rnn_type** (string): timewise or featurewise attention layer
- **rnn_return_sequences**: if False, only last timestep of recurrent layers is returned
- **classif_local** (bool): whether classification is for each timestep (local) or globally for the sequence
- **mlp_layers_number**: number of additional dense layers
- **mlp_layers_size**: size of additional dense layers
- **optimizer**: gradient optimizer type (string)
- **learning_rate**: learning rate (float)
- **time_steps**: length of sequences (int)
- **features_number**: number of features (int)
- **features_type**: 'features' (1D vector of features), 'frames' (for a CNN processing) or 'both'
- **img_height** and **img_width**: size of CNN input
- **cnnType**: 'resnet', 'vgg' or 'mobilenet'
- **cnnFirstTrainedLayer**: index of first trainable layer in CNN (int)
- **cnnReduceDim**: if greater than 0, size of CNN flattened output is reduced to cnnReduceDim
- **print_summary** (bool)
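
In the examples of this tutorial, **output_classes** mirrors the `nonZero` and `binary` arguments given to `get_data_concatenated`: 2 for a binary output, and the number of kept values plus one 'other' class for a categorical output. A hypothetical helper (not part of the repository) that captures this rule:

```python
def infer_output_classes(nonZero, binary):
    """Classes per output: 2 if binary, else one per kept value plus 'other'.

    Hypothetical helper for illustration; it is not defined in model_utils.py.
    """
    classes = []
    for kept_values, is_binary in zip(nonZero, binary):
        if is_binary:
            classes.append(2)
        elif kept_values:
            classes.append(len(kept_values) + 1)
        else:
            # An empty list means "all values"; the count is then unknown here.
            raise ValueError('cannot infer classes for a categorical output with empty nonZero')
    return classes

flsKept = [41891, 43413, 43422, 42992]
print(infer_output_classes([flsKept, [], [], []], [False, True, True, True]))
```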

## Shortcut: script to recognize a unique output on DictaSign

Just use `python src/recognitionUniqueDictaSignFromScript.py`

Provided help:

In [71]:
!python ../src/recognitionUniqueDictaSignFromScript.py -h

usage: recognitionUniqueDictaSignFromScript.py [-h] [--outputName OUTPUTNAME]
                                               [--flsBinary {0,1}]
                                               [--flsKeep [FLSKEEP [FLSKEEP ...]]]
                                               [--comment COMMENT]
                                               [--videoSplitMode {manual,auto}]
                                               [--fractionValid FRACTIONVALID]
                                               [--fractionTest FRACTIONTEST]
                                               [--signerIndependent {0,1}]
                                               [--taskIndependent {0,1}]
                                               [--excludeTask9 {0,1}]
                                               [--tasksTrain [{1,2,3,4,5,6,7,8,9} [{1,2,3,4,5,6,7,8,9} ...]]]
                                               [--tasksValid [{1,2,3,4,5,6,7,8,9} [{1,2,3,4,5,6,7,8,9} ...]]]
                   

A few examples:

* `python src/recognitionUniqueDictaSignFromScript.py --outputName DS` to select DS as output
* `python src/recognitionUniqueDictaSignFromScript.py --outputName fls --flsBinary 1` to select binary FLS as output
* `python src/recognitionUniqueDictaSignFromScript.py --outputName fls --flsBinary 0 --flsKeep 41891 43413 43422 42992` to select categorical FLS as output, with 4 signs
* `python src/recognitionUniqueDictaSignFromScript.py --outputName DS --inputType 2Draw` to select 2D raw input preprocessed features
* `python src/recognitionUniqueDictaSignFromScript.py --outputName DS --inputFeaturesFrames frames` to select only frames as input (CNN)
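
For readers less familiar with `argparse`, the usage text above can be read as follows: `{0,1}` denotes a restricted set of choices, and `[FLSKEEP [FLSKEEP ...]]` denotes zero or more values. The sketch below rebuilds only a few of the listed flags; the argument types and the absence of defaults are assumptions, and the real script may define them differently.

```python
import argparse

# Partial reconstruction from the usage text above; types and defaults are assumptions.
parser = argparse.ArgumentParser(prog='recognitionUniqueDictaSignFromScript.py')
parser.add_argument('--outputName', type=str)
parser.add_argument('--flsBinary', type=int, choices=[0, 1])
parser.add_argument('--flsKeep', type=int, nargs='*')
parser.add_argument('--signerIndependent', type=int, choices=[0, 1])

# Equivalent to the third command-line example above.
args = parser.parse_args(['--outputName', 'fls', '--flsBinary', '0',
                          '--flsKeep', '41891', '43413', '43422', '42992'])
print(args.outputName, args.flsBinary, args.flsKeep)
```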

## Building data and model together, manually

In these examples we analyze the different types of output form Y

In [66]:
# let us split train/valid/test videos
# In this case we split by signers in a manual fashion

idxTrain, idxValid, idxTest = getVideoIndicesSplitDictaSign(tasksTrain=[],
                                                            tasksValid=[],
                                                            tasksTest=[],
                                                            signersTrain=[0,1,2,3,4,5,6,7,8,9],
                                                            signersValid=[10,11,12],
                                                            signersTest=[13,14,15],
                                                            excludeTask9=False,
                                                            videoSplitMode='manual',
                                                            checkSplits=True,
                                                            checkSets=True,
                                                            from_notebook=True)

Number of videos:
Train: 66
Valid: 10
Test: 18
Total: 94
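
The idea behind a signer-based split can be sketched without the corpus. The function and the signer list below are hypothetical and only illustrate the principle: each video goes to the set that owns its signer, and no signer may belong to two sets.

```python
# Hypothetical signer id for each video index (the real corpus metadata differs).
video_signers = [0, 0, 3, 10, 11, 13, 15, 7]

def split_by_signer(video_signers, signers_train, signers_valid, signers_test):
    """Return three lists of video indices, one per signer group."""
    train, valid, test = set(signers_train), set(signers_valid), set(signers_test)
    # A signer present in two groups would leak identity between sets.
    if train & valid or train & test or valid & test:
        raise ValueError('signer groups must be disjoint')

    def pick(group):
        return [i for i, signer in enumerate(video_signers) if signer in group]

    return pick(train), pick(valid), pick(test)

idx_train, idx_valid, idx_test = split_by_signer(
    video_signers, range(0, 10), [10, 11, 12], [13, 14, 15])
print(idx_train, idx_valid, idx_test)
```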


In [4]:
# Getting a dictionary for desired preprocessed features
# In this case we ask for normalized 3Dfeatures_HS (this corresponds to a total of 420 features):

features_dict, features_number = getFeaturesDict(inputType='3Dfeatures_HS', inputNormed=True)

print(features_dict)
print(features_number)

{'features_HS': array([], dtype=float64), 'features_HS_norm': array([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,
        13,  14,  15,  16,  17,  18,  19,  20,  21,  22,  23,  24,  25,
        26,  27,  28,  29,  30,  31,  32,  33,  34,  35,  36,  37,  38,
        39,  40,  41,  42,  43,  44,  45,  46,  47,  48,  49,  50,  51,
        52,  53,  54,  55,  56,  57,  58,  59,  60,  61,  62,  63,  64,
        65,  66,  67,  68,  69,  70,  71,  72,  73,  74,  75,  76,  77,
        78,  79,  80,  81,  82,  83,  84,  85,  86,  87,  88,  89,  90,
        91,  92,  93,  94,  95,  96,  97,  98,  99, 100, 101, 102, 103,
       104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116,
       117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129,
       130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142,
       143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155,
       156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168,
  

### First category of examples: `output_form = 'mixed'` (examples 1 to 4)

#### ex. 1 :

##### Data

In [5]:
import csv

idGloss = {}

with open('Dicta-Sign-LSF_ID.csv', newline='') as csvfile:
    glossreader = csv.reader(csvfile, delimiter=';', quotechar='|')
    for row in glossreader:
        idGloss[row[0]] = row[1]

In [6]:
# we only consider 4 lexical signs, indices 41891,43413,43422,42992

flsKept = [41891,43413,43422,42992]
N_fls = len(flsKept)

# that correspond to glosses:

for i in flsKept:
    print(idGloss[str(i)])

PARIS3 (TOUR EIFFEL):NS
RESTAURANT
QUALITE-DE GRANDE /EXCELLENT
VISITER1:VAR


In [59]:
[X_feat_train_1, X_frames_train_1], Y_train_1 =\
      get_data_concatenated(corpus='DictaSign',
                            output_form='mixed',
                            types=[['fls'], ['DS'], ['PT'], ['FBUOY']],
                            nonZero=[flsKept, [], [], []],
                            binary=[False, True, True, True],
                            video_indices=idxTrain,
                            features_dict=features_dict,
                            features_type='both',
                            from_notebook=True)

In [8]:
print(X_feat_train_1.shape)
print(X_frames_train_1.shape)
print(len(Y_train_1))

(1, 665612, 420)
(665612,)
4


As can be seen above, X_feat is a big matrix storing all 420 preprocessed features for each frame. We can print the first 15 features for frame number 192:

In [9]:
print(X_feat_train_1[0,192,0:15])

[2.56896496e-01 8.08442011e-02 6.51594326e-02 4.68903668e-02
 7.92527862e-05 2.17038882e-03 3.29143912e-01 1.75906322e-03
 2.52671860e-04 2.70878343e-04 4.83787793e-04 4.01530191e-02
 3.07233859e-04 4.62686792e-02 4.28746128e-03]


X_frames stores paths for all frames:

In [10]:
X_frames_train_1[192]

'/localHD/DictaSign/convert/img/DictaSign_lsf_S4_T8_B14_front/00193.jpg'

Y_train_1 stores the annotations for the 4 linguistic descriptors:

In [11]:
print(Y_train_1[0].shape)
print(Y_train_1[1].shape)
print(Y_train_1[2].shape)
print(Y_train_1[3].shape)

(1, 665612, 5)
(1, 665612, 2)
(1, 665612, 2)
(1, 665612, 2)


In [12]:
# First depicting frame:
i_DS_one = np.where(Y_train_1[1][0,:,1]==1)[0][0]
print(i_DS_one)

1916


In [13]:
print(Y_train_1[0][0,i_DS_one,:])
print(Y_train_1[1][0,i_DS_one,:])
print(Y_train_1[2][0,i_DS_one,:])
print(Y_train_1[3][0,i_DS_one,:])

[1. 0. 0. 0. 0.]
[0. 1.]
[1. 0.]
[1. 0.]


Frame 1916, the first depicting frame, is annotated as non-lexical, depicting, non-pointing, non-fbuoy.
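
The four one-hot rows printed above can be turned into class indices with `argmax`. This is a small sketch on the printed values; class 0 is taken to be the 'other' class of each descriptor, which matches how the rows are read in the sentence above.

```python
import numpy as np

# The four rows printed above for the first depicting frame.
rows = {
    'fls':   np.array([1., 0., 0., 0., 0.]),
    'DS':    np.array([0., 1.]),
    'PT':    np.array([1., 0.]),
    'FBUOY': np.array([1., 0.]),
}

# Index of the active class per descriptor (0 is assumed to be 'other').
labels = {name: int(np.argmax(row)) for name, row in rows.items()}
print(labels)
```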

##### Model

In [14]:
# training model can be built to take preprocessed features as input, frames as input or both
model_1_features = get_model(output_names=['fls', 'DS', 'PT', 'FBUOY'],
                    output_classes=[N_fls+1,2,2,2],
                    output_weights=[1,1,1,1],
                    features_number=features_number,
                    features_type='features')

Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            [(None, 100, 420)]   0                                            
__________________________________________________________________________________________________
conv1d (Conv1D)                 (None, 100, 200)     252200      input_1[0][0]                    
__________________________________________________________________________________________________
bidirectional (Bidirectional)   (None, 100, 110)     112640      conv1d[0][0]                     
__________________________________________________________________________________________________
bidirectional_1 (Bidirectional) (None, 100, 110)     73040       bidirectional[0][0]              
______________________________________________________________________________________________

In [15]:
model_1_frames = get_model(output_names=['fls', 'DS', 'PT', 'FBUOY'],
                    output_classes=[N_fls+1,2,2,2],
                    output_weights=[1,1,1,1],
                    features_number=features_number,
                    features_type='frames')

Model: "model_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_2 (InputLayer)            [(None, 100, 224, 22 0                                            
__________________________________________________________________________________________________
time_distributed_4 (TimeDistrib (None, 100, 2048)    23587712    input_2[0][0]                    
__________________________________________________________________________________________________
bidirectional_2 (Bidirectional) (None, 100, 110)     925760      time_distributed_4[0][0]         
__________________________________________________________________________________________________
bidirectional_3 (Bidirectional) (None, 100, 110)     73040       bidirectional_2[0][0]            
____________________________________________________________________________________________

In [16]:
model_1_both = get_model(output_names=['fls', 'DS', 'PT', 'FBUOY'],
                    output_classes=[N_fls+1,2,2,2],
                    output_weights=[1,1,1,1],
                    features_number=features_number,
                    features_type='both')

Model: "model_2"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_4 (InputLayer)            [(None, 100, 420)]   0                                            
__________________________________________________________________________________________________
input_5 (InputLayer)            [(None, 100, 224, 22 0                                            
__________________________________________________________________________________________________
conv1d_1 (Conv1D)               (None, 100, 200)     252200      input_4[0][0]                    
__________________________________________________________________________________________________
time_distributed_9 (TimeDistrib (None, 100, 2048)    23587712    input_5[0][0]                    
____________________________________________________________________________________________

#### ex. 2 :

##### Data

In [17]:
# in this example, data from 3 channels (PT_PRO1, 2 and 3) are assembled to form a new category, 
# and a second category corresponds to all lexical signs (binary)
[X_feat_train_2, X_frames_train_2], Y_train_2 =\
      get_data_concatenated(corpus='DictaSign',
                            output_form='mixed',
                            types=[['PT_PRO1','PT_PRO2', 'PT_PRO3'], ['fls']],
                            nonZero=[[],[]],
                            binary=[True,True],
                            video_indices=idxTrain,
                            features_dict=features_dict,
                            features_type='both',
                            from_notebook=True)

In [18]:
print(X_feat_train_2.shape)
print(X_frames_train_2.shape)
print(len(Y_train_2))

(1, 665612, 420)
(665612,)
2


In [21]:
print(Y_train_2[0].shape)
print(Y_train_2[1].shape)

(1, 665612, 2)
(1, 665612, 2)


##### Model

In [22]:
# using only preprocessed features as input:
model_2_features = get_model(output_names=['pointing-signs-pro123', 'lexical'],
                    output_classes=[2,2],
                    output_weights=[1,1],
                    features_number=features_number,
                    features_type='features')

Model: "model_3"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_7 (InputLayer)            [(None, 100, 420)]   0                                            
__________________________________________________________________________________________________
conv1d_2 (Conv1D)               (None, 100, 200)     252200      input_7[0][0]                    
__________________________________________________________________________________________________
bidirectional_6 (Bidirectional) (None, 100, 110)     112640      conv1d_2[0][0]                   
__________________________________________________________________________________________________
bidirectional_7 (Bidirectional) (None, 100, 110)     73040       bidirectional_6[0][0]            
____________________________________________________________________________________________

#### ex. 3 :

##### Data

In [23]:
# in this example, data from 3 channels (PT_PRO1, 2 and 3) are assembled to form a new category
# and data from 2 channels (DSA, DSG) are assembled to form a second category
[X_feat_train_3, X_frames_train_3], Y_train_3 =\
      get_data_concatenated(corpus='DictaSign',
                            output_form='mixed',
                            types=[['PT_PRO1','PT_PRO2', 'PT_PRO3'], ['DSA', 'DSG']],
                            nonZero=[[],[]],
                            binary=[True,True],
                            video_indices=idxTrain,
                            features_dict=features_dict,
                            features_type='both',
                            from_notebook=True)

In [24]:
print(X_feat_train_3.shape)
print(X_frames_train_3.shape)
print(len(Y_train_3))

(1, 665612, 420)
(665612,)
2


In [25]:
print(Y_train_3[0].shape)
print(Y_train_3[1].shape)

(1, 665612, 2)
(1, 665612, 2)


##### Model

In [26]:
# using only preprocessed features as input:
model_3_features = get_model(output_names=['pointing-signs-pro123', 'depicting-signs-AG'],
                    output_classes=[2, 2],
                    output_weights=[1, 1],
                    features_number=features_number,
                    features_type='features')

Model: "model_4"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_8 (InputLayer)            [(None, 100, 420)]   0                                            
__________________________________________________________________________________________________
conv1d_3 (Conv1D)               (None, 100, 200)     252200      input_8[0][0]                    
__________________________________________________________________________________________________
bidirectional_8 (Bidirectional) (None, 100, 110)     112640      conv1d_3[0][0]                   
__________________________________________________________________________________________________
bidirectional_9 (Bidirectional) (None, 100, 110)     73040       bidirectional_8[0][0]            
____________________________________________________________________________________________

#### ex. 4 :

##### Data

In [29]:
# in this example, only depicting signs as output
[X_feat_train_4, X_frames_train_4], Y_train_4 =\
      get_data_concatenated(corpus='DictaSign',
                            output_form='mixed',
                            types=[['DS']],
                            nonZero=[[]],
                            binary=[True],
                            video_indices=idxTrain,
                            features_dict=features_dict,
                            features_type='both',
                            from_notebook=True)

In [30]:
print(X_feat_train_4.shape)
print(X_frames_train_4.shape)
print(len(Y_train_4))

(1, 665612, 420)
(665612,)
1


In [53]:
print(Y_train_4[0].shape)
print(np.sum(Y_train_4[0],axis=1))

(1, 665612, 2)
[[623477.  42135.]]
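
The column sums above show a strong class imbalance: only about 6 % of frames are depicting signs. The sketch below computes the class fractions from the printed counts, together with inverse-frequency weights. These weights are one common heuristic, shown as an illustration; nothing in this tutorial indicates that the repository uses them.

```python
import numpy as np

# [other, DS] column sums printed above.
counts = np.array([623477., 42135.])

# Share of frames in each class.
fractions = counts / counts.sum()
print(fractions.round(4))

# Inverse-frequency weights (a common heuristic, not taken from the repository).
weights = counts.sum() / (len(counts) * counts)
print(weights.round(3))
```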


##### Model

In [32]:
# using only preprocessed features as input:
model_4_features = get_model(output_names=['DS'],
                    output_classes=[2],
                    output_weights=[1],
                    features_number=features_number,
                    features_type='features')

Model: "model_5"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_9 (InputLayer)         [(None, 100, 420)]        0         
_________________________________________________________________
conv1d_4 (Conv1D)            (None, 100, 200)          252200    
_________________________________________________________________
bidirectional_10 (Bidirectio (None, 100, 110)          112640    
_________________________________________________________________
bidirectional_11 (Bidirectio (None, 100, 110)          73040     
_________________________________________________________________
time_distributed_18 (TimeDis (None, 100, 2)            222       
Total params: 438,102
Trainable params: 438,102
Non-trainable params: 0
_________________________________________________________________
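
The parameter counts in this summary can be reproduced by hand. They are consistent with a Conv1D layer of 200 filters and kernel size 3, followed by two bidirectional LSTM layers with 55 units per direction and a dense layer applied at each timestep. These hyperparameters are inferred from the numbers, not read from the code, so treat them as an assumption.

```python
def conv1d_params(in_features, filters, kernel_size):
    # One kernel of size (kernel_size, in_features) per filter, plus one bias per filter.
    return in_features * kernel_size * filters + filters

def bilstm_params(in_features, units):
    # 4 gates, each with input weights, recurrent weights and a bias; two directions.
    return 2 * 4 * units * (in_features + units + 1)

def dense_params(in_features, out_features):
    # Weight matrix plus bias, shared across timesteps.
    return in_features * out_features + out_features

layers = [
    conv1d_params(420, 200, 3),  # 252200
    bilstm_params(200, 55),      # 112640
    bilstm_params(110, 55),      # 73040
    dense_params(110, 2),        # 222
]
total = sum(layers)
print(layers, total)
```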


### Second category of examples: `output_form = 'sign_types'` (examples 5 and 6)

#### ex. 5 :

##### Data

In [44]:
[X_feat_train_5, X_frames_train_5], Y_train_5 =\
      get_data_concatenated(corpus='DictaSign',
                            output_form='sign_types',
                            types=[['fls'],['DS'],['PT'],['FBUOY']],
                            nonZero=[[],[],[],[]],
                            binary=[],
                            video_indices=idxTrain,
                            features_dict=features_dict,
                            features_type='both',
                            from_notebook=True)

In [45]:
print(X_feat_train_5.shape)
print(X_frames_train_5.shape)
print(Y_train_5.shape) # when output_form='sign_types', Y is not a list, but a matrix

(1, 665612, 420)
(665612,)
(1, 665612, 5)


In [46]:
#print(Y_train_5[0,0:200,:])
print(np.sum(Y_train_5,axis=1))

[[488403. 126592.  42135.  15146.   9260.]]
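
A quick check on these column sums: they add up to more than the number of frames. Assuming every entry of the matrix is 0 or 1, this means that at least some frames have more than one column set to 1, so the rows are not strictly one-hot. The sketch below only does the arithmetic on the printed values.

```python
import numpy as np

# Column sums printed above and the number of frames from the shape (1, 665612, 5).
column_sums = np.array([488403., 126592., 42135., 15146., 9260.])
total_frames = 665612

# If each row contained exactly one 1, the sums would add up to total_frames.
excess = int(column_sums.sum()) - total_frames
print(excess)
```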


##### Model

In [47]:
# using only preprocessed features as input:
model_5_features = get_model(output_names=['fls-DS-PT-FBUOY'],
                    output_classes=[5],
                    output_weights=[1],
                    features_number=features_number,
                    features_type='features')

Model: "model_6"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_10 (InputLayer)        [(None, 100, 420)]        0         
_________________________________________________________________
conv1d_5 (Conv1D)            (None, 100, 200)          252200    
_________________________________________________________________
bidirectional_12 (Bidirectio (None, 100, 110)          112640    
_________________________________________________________________
bidirectional_13 (Bidirectio (None, 100, 110)          73040     
_________________________________________________________________
time_distributed_19 (TimeDis (None, 100, 5)            555       
Total params: 438,435
Trainable params: 438,435
Non-trainable params: 0
_________________________________________________________________


#### ex. 6 :

##### Data

In [48]:
[X_feat_train_6, X_frames_train_6], Y_train_6 =\
      get_data_concatenated(corpus='DictaSign',
                            output_form='sign_types',
                            types=[['DS']],
                            nonZero=[[]],
                            binary=[],
                            video_indices=idxTrain,
                            features_dict=features_dict,
                            features_type='both',
                            from_notebook=True)

In [49]:
print(X_feat_train_6.shape)
print(X_frames_train_6.shape)
print(Y_train_6.shape) # when output_form='sign_types', Y is not a list, but a matrix

(1, 665612, 420)
(665612,)
(1, 665612, 2)


In [52]:
print(Y_train_6[0,1000:1020,:])
print(np.sum(Y_train_6,axis=1))

[[1. 0.]
 [1. 0.]
 [1. 0.]
 [1. 0.]
 [1. 0.]
 [1. 0.]
 [1. 0.]
 [1. 0.]
 [1. 0.]
 [1. 0.]
 [1. 0.]
 [1. 0.]
 [1. 0.]
 [1. 0.]
 [1. 0.]
 [1. 0.]
 [1. 0.]
 [1. 0.]
 [1. 0.]
 [1. 0.]]
[[623477.  42135.]]
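
The column sums match those of ex. 4. To confirm that the two annotations agree frame by frame, and not only in total, the arrays can be compared directly. The sketch uses toy stand-ins so that it runs without the corpus; with the real data the comparison would be `np.array_equal(Y_train_4[0], Y_train_6)`.

```python
import numpy as np

# Toy stand-ins for Y_train_4 (a list with one array) and Y_train_6 (a single array).
track = np.array([0, 0, 1, 1, 0])
one_hot = np.stack([1 - track, track], axis=-1)[np.newaxis].astype(float)
Y_mixed_toy = [one_hot.copy()]
Y_sign_types_toy = one_hot.copy()

# 'mixed' returns a list, 'sign_types' a matrix, so index the list before comparing.
identical = np.array_equal(Y_mixed_toy[0], Y_sign_types_toy)
print(identical)
```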


##### Model

In [51]:
# using only preprocessed features as input:
model_6_features = get_model(output_names=['DS'],
                    output_classes=[2],
                    output_weights=[1],
                    features_number=features_number,
                    features_type='features')

Model: "model_7"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_11 (InputLayer)        [(None, 100, 420)]        0         
_________________________________________________________________
conv1d_6 (Conv1D)            (None, 100, 200)          252200    
_________________________________________________________________
bidirectional_14 (Bidirectio (None, 100, 110)          112640    
_________________________________________________________________
bidirectional_15 (Bidirectio (None, 100, 110)          73040     
_________________________________________________________________
time_distributed_20 (TimeDis (None, 100, 2)            222       
Total params: 438,102
Trainable params: 438,102
Non-trainable params: 0
_________________________________________________________________


## Training the models

Now we look at how to train the models.

In [62]:
batch_size=100
epochs=10
seq_length=100

In [60]:
[X_feat_valid_1, X_frames_valid_1], Y_valid_1 =\
      get_data_concatenated(corpus='DictaSign',
                            output_form='mixed',
                            types=[['fls'], ['DS'], ['PT'], ['FBUOY']],
                            nonZero=[flsKept, [], [], []],
                            binary=[False, True, True, True],
                            video_indices=idxValid,
                            features_dict=features_dict,
                            features_type='both',
                            from_notebook=True)

In [64]:
history = train_model(model_1_features,
                      [X_feat_train_1, X_frames_train_1],
                      Y_train_1,
                      [X_feat_valid_1, X_frames_valid_1],
                      Y_valid_1,
                      batch_size=batch_size,
                      epochs=epochs,
                      seq_length=seq_length)

Train for 67.0 steps, validate for 1 steps
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [65]:
history

{'loss': [0.4873052234969922,
  0.36351746926779177,
  0.3095500692169168,
  0.3050421726014187,
  0.3225172839836398,
  0.23277043945976159,
  0.27617699491666325,
  0.2752775760617719,
  0.23305533198055936,
  0.22476121366246424],
 'time_distributed_loss': [0.037806444,
  0.0075106374,
  0.0046897423,
  0.010325068,
  0.0080221575,
  0.0076571107,
  0.0066989427,
  0.007943476,
  0.006906639,
  0.0044149715],
 'time_distributed_1_loss': [0.23907705,
  0.18651846,
  0.16627303,
  0.15832663,
  0.15969767,
  0.11779289,
  0.14515437,
  0.14083445,
  0.11762877,
  0.12321183],
 'time_distributed_2_loss': [0.12885219,
  0.09632742,
  0.08722829,
  0.07934722,
  0.09076402,
  0.06719639,
  0.081218965,
  0.08266508,
  0.06670073,
  0.061022077],
 'time_distributed_3_loss': [0.081569545,
  0.07316094,
  0.05135904,
  0.057043206,
  0.06403344,
  0.040124036,
  0.0431047,
  0.04383459,
  0.041819192,
  0.036112316],
 'time_distributed_acc': [0.9846254,
  0.99911195,
  0.9994433,
  0.998707
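
The returned `history` appears to be a plain dictionary with one list per metric and one value per epoch. A minimal sketch that picks the epoch with the lowest total training loss, using the values printed above. If the dictionary also holds validation entries (not visible in the output shown here), those would usually be a better selection criterion.

```python
import numpy as np

# 'loss' values copied from the history printed above.
loss = [0.4873052234969922, 0.36351746926779177, 0.3095500692169168,
        0.3050421726014187, 0.3225172839836398, 0.23277043945976159,
        0.27617699491666325, 0.2752775760617719, 0.23305533198055936,
        0.22476121366246424]

# Epochs are numbered from 1 in the training log.
best_epoch = int(np.argmin(loss)) + 1
print(best_epoch, loss[best_epoch - 1])
```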