Scores are represented by
where
where
Let
where
Then we can define
1st iteration:
2nd iteration:
Calculate the sub-gradients of the margin rescaling loss
where
When the training set consists of samples of different lengths, we can normalize the sums of scores and the sub-gradients of the margin rescaling loss. For example:
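A minimal Python sketch of this kind of length normalization follows. It is an illustration under stated assumptions, not the formulation used above: it assumes a linear per-position score (the score of a labeling is the sum over positions of `weights[label] . features[t]`) and a Hamming-distance margin. The function name `margin_rescaling_subgradient` and its parameters are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def margin_rescaling_subgradient(
    weights: Matrix,
    features: Sequence[Vector],
    gold: Sequence[int],
    normalize: bool = True,
) -> Tuple[float, Matrix]:
    """Return (loss, sub-gradient w.r.t. weights) for one training sample.

    Assumed model (illustrative only):
      score(x, y) = sum_t weights[y_t] . features[t]
      margin(y, gold) = number of positions where y_t != gold_t
    """
    num_labels, dim = len(weights), len(weights[0])
    grad = [[0.0] * dim for _ in range(num_labels)]
    loss = 0.0

    for x_t, y_t in zip(features, gold):
        # Loss-augmented inference: every wrong label gets a margin of 1.
        augmented = [
            dot(weights[k], x_t) + (0.0 if k == y_t else 1.0)
            for k in range(num_labels)
        ]
        y_hat = max(range(num_labels), key=augmented.__getitem__)

        # Margin rescaling loss at this position (always >= 0).
        loss += augmented[y_hat] - dot(weights[y_t], x_t)

        # Sub-gradient: features of the violating label minus the gold label.
        if y_hat != y_t:
            for j in range(dim):
                grad[y_hat][j] += x_t[j]
                grad[y_t][j] -= x_t[j]

    if normalize and gold:
        # Divide by the sample length so long samples do not dominate.
        length = len(gold)
        loss /= length
        grad = [[g / length for g in row] for row in grad]

    return loss, grad
```

With `normalize=True`, both the summed loss and the sub-gradient are divided by the sample length, so samples of different lengths contribute on a comparable scale to each update.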