# Machine Learning
___
## Notes
In this notebook, we will discuss machine learning as a whole.

Specifically, we'll be talking about the following:
> [Overview](#Overview:-Machine-Learning)
>
> [System Evaluation](#System-Evaluation)
>
> [Random Forests](#Random-Forests)
>
> [Gradient Boosting](#Gradient-Boosting)

We finish the notebook off with a [review](#Review) of everything discussed. 

___

> ## Overview: Machine Learning
> ___
> **Machine Learning** refers to algorithms that use data to make predictions. 
>
> There are two types of machine learning: 
> - **Supervised Learning**: Learning a mapping function from labeled training data to make predictions on unseen data.
> 
> - **Unsupervised Learning**: Discovering hidden patterns/structures in unlabelled data.
> 
> We will be focusing on supervised learning for now. 

___

> ## System Evaluation
> ___
> **K-Fold Cross Validation** is a process of dividing a dataset into k subsets where a different holdout subset is used in each iteration to test our model. A nice graphic illustrating this process is shown below.
> 
> INSERT GRAPHIC
> 
> **Evaluation metrics** quantify the performance of our predictive model. 
>
> Here are a few common evaluation metrics we will use:
> - $\text{Accuracy} = \frac{\text{# predicted correctly}}{\text{ total # of observations}}$
> - $\text{Precision} = \frac{\text{# predicted as class A that are actually class A}}{\text{ total # predicted as class A}}$
> - $\text{Recall} = \frac{\text{# predicted as class A that are actually class A}}{\text{ total # that are actually class A}}$
>
> `Note`: If we want to limit false positives, we will optimize the model for precision. If we want to limit false negatives, we will optomize the model for recall. 

___

> ## Random Forests
> 
> **Random forests** are collections of decision trees whose predictions aggregated to obtain a final prediction. 
> 
> They use **ensemble learning**, a process by which multiple models are created and combined to make a more informed prediction (compared to a single model). 
>
> As with any other model, we can use things like **k-fold cross validation** and **train-test splits** to evaluate our random forests. We can also use things like **grid-search**, an exhaustive search process across all parameter combinations in a given grid for the best performing parameters, to improve our random forests. 
>
> Let's explore random forests a bit. 

First, we'll read in our data and clean it up. 

In [1]:
import nltk
import re
import string
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

stopwords = nltk.corpus.stopwords.words('english')
ps = nltk.PorterStemmer()

data = pd.read_csv('SMSSpamCollection.tsv', sep='\t', header=None)
data.columns = ['label','text']

In [7]:
def count_punctuation(text):
    count = sum([1 for char in text if char in string.punctuation])
    return round(count/(len(text) - text.count(" ")),3)*100

data['length'] = data['text'].apply(lambda x: len(x) - x.count(" "))
data['punctuation'] = data['text'].apply(lambda x: count_punctuation(x))

In [8]:
def clean_text(text):
    cleaned_text = ''.join([char for char in text if char not in string.punctuation])
    tokenized_text = re.split('\W+',cleaned_text)
    stemmed_tokens = [ps.stem(word) for word in tokenized_text if word not in stopwords]
    return stemmed_tokens

tdidf_vect = TfidfVectorizer(analyzer=clean_text)
X_tfidf_counts = tdidf_vect.fit_transform(data['text'])

X_features = pd.concat([data['length'], data['punctuation'], pd.DataFrame(X_tfidf_counts.toarray())], axis=1)
X_features.head()

Unnamed: 0,length,punctuation,0,1,2,3,4,5,6,7,...,8181,8182,8183,8184,8185,8186,8187,8188,8189,8190
0,160,2.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,128,4.7,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,49,4.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,62,3.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,28,7.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


Now, let's import our `RandomForestClassifier` and explore it through 10-fold cross-validation.

In [24]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score

rf = RandomForestClassifier(n_jobs=-1) # allows for trees to run in parallel
k_fold = KFold(n_splits=10)
cross_val_score(rf, X_features, data['label'], cv=k_fold, scoring='accuracy', n_jobs=-1)

array([0.98384201, 0.96947935, 0.97666068, 0.98743268, 0.97307002,
       0.98025135, 0.97127469, 0.96947935, 0.9676259 , 0.98201439])

Let's explore our `RandomForestClassifier` using a holdout test set. 

In [26]:
from sklearn.metrics import precision_recall_fscore_support as score
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X_features, data['label'], test_size = 0.2)
rf2 = RandomForestClassifier(n_estimators= 50,max_depth=20, n_jobs=-1)
rf2_model = rf2.fit(X_train, y_train)

Our `RandomForestClassifier` has a `feature_importances_` attribute we would like to make use of. 

In [28]:
sorted(zip(rf2_model.feature_importances_, X_train.columns), reverse=True)[0:10]

[(0.04727789288599296, 'length'),
 (0.039905742377050364, 2048),
 (0.02900918223533704, 1819),
 (0.0245632355485361, 7090),
 (0.023629581186565912, 4838),
 (0.02316410225353268, 3159),
 (0.021391812063942965, 295),
 (0.020058904539315043, 7422),
 (0.019354545728288715, 2188),
 (0.017632349642892228, 7864)]

The above results tell us that the length of text is a key factor in determining labels. 

Let's go ahead and make predictions using our model (since we've already fit the model to our training data). 

In [34]:
y_pred = rf2_model.predict(X_test)
precision, recall, fscore, support = score(y_test, y_pred, pos_label='spam', average='binary')
print('Precision: {}'.format(round(precision,3)))
print('Recall: {}'.format(round(recall,3)))
print('Accuracy: {}'.format(round((y_pred==y_test).sum() / len(y_pred),3)))

Precision: 1.0
Recall: 0.57
Accuracy: 0.942


The above results tell us that:
- 100% mail in the spam folder is actually spam
- 57% of all spam was properly marked as spam
- 94.2% of all emails were identified correctly (whether spam or ham)

To finish up, let's perform a grid-search for our `RandomForestClassifier`. 

In [37]:
def train_rf(num_estimators, depth_num):
    rf = RandomForestClassifier(n_estimators=num_estimators, max_depth=depth_num, n_jobs=-1)
    rf_model = rf.fit(X_train, y_train)
    y_pred = rf_model.predict(X_test)
    precision, recall, fscore, support = score(y_test, y_pred, pos_label='spam', average='binary')
    print('Est: {} / Depth: {} --- Precision: {} / Recall: {} / Accuracy: {}'.format(
        num_estimators, depth_num, round(precision,3), round(recall,3), 
        round((y_pred==y_test).sum() / len(y_pred),3)))

for num_estimators in [10, 50, 100]:
    for depth_num in [10, 20, 30, None]:
        train_rf(num_estimators, depth_num)

Est: 10 / Depth: 10 --- Precision: 1.0 / Recall: 0.146 / Accuracy: 0.884
Est: 10 / Depth: 20 --- Precision: 1.0 / Recall: 0.57 / Accuracy: 0.942
Est: 10 / Depth: 30 --- Precision: 1.0 / Recall: 0.742 / Accuracy: 0.965
Est: 10 / Depth: None --- Precision: 0.991 / Recall: 0.755 / Accuracy: 0.966
Est: 50 / Depth: 10 --- Precision: 1.0 / Recall: 0.238 / Accuracy: 0.897
Est: 50 / Depth: 20 --- Precision: 1.0 / Recall: 0.55 / Accuracy: 0.939
Est: 50 / Depth: 30 --- Precision: 1.0 / Recall: 0.675 / Accuracy: 0.956
Est: 50 / Depth: None --- Precision: 1.0 / Recall: 0.801 / Accuracy: 0.973
Est: 100 / Depth: 10 --- Precision: 1.0 / Recall: 0.258 / Accuracy: 0.899
Est: 100 / Depth: 20 --- Precision: 1.0 / Recall: 0.536 / Accuracy: 0.937
Est: 100 / Depth: 30 --- Precision: 1.0 / Recall: 0.682 / Accuracy: 0.957
Est: 100 / Depth: None --- Precision: 1.0 / Recall: 0.815 / Accuracy: 0.975


The above results tell us that:
- As depth increase, recall increases drastically without a change in precision
- As estimators increase, recall increases slightly without a change in precision

Therefore, the best random forest model for this problem would be one with a very high depth. 

We could also perform grid-search using `GridSearchCV`, which simultaneously performs cross-validation for us.

In [38]:
from sklearn.model_selection import GridSearchCV

rf3 = RandomForestClassifier()
param = {'n_estimators':[10, 150, 300],
        'max_depth':[30, 60, 90, None]} #defining our grid as a dictionary

gs = GridSearchCV(rf3, param, cv=5, n_jobs=-1)
gs_fit = gs.fit(X_features, data['label'])
pd.DataFrame(gs_fit.cv_results_).sort_values('mean_test_score', ascending=False)[0:5] #print some results

Unnamed: 0,mean_fit_time,std_fit_time,mean_score_time,std_score_time,param_max_depth,param_n_estimators,params,split0_test_score,split1_test_score,split2_test_score,split3_test_score,split4_test_score,mean_test_score,std_test_score,rank_test_score
7,22.201188,0.194872,0.433598,0.092399,90.0,150,"{'max_depth': 90, 'n_estimators': 150}",0.976661,0.977558,0.976661,0.967655,0.975741,0.974855,0.003646,1
4,20.388246,1.089177,0.423114,0.065336,60.0,150,"{'max_depth': 60, 'n_estimators': 150}",0.978456,0.977558,0.973968,0.967655,0.973944,0.974316,0.003802,2
11,30.940607,4.130551,0.267455,0.066316,,300,"{'max_depth': None, 'n_estimators': 300}",0.977558,0.978456,0.976661,0.965858,0.973046,0.974316,0.004611,3
8,39.967187,1.712187,0.449691,0.107516,90.0,300,"{'max_depth': 90, 'n_estimators': 300}",0.976661,0.979354,0.974865,0.967655,0.972147,0.974136,0.004002,4
10,22.674996,0.421129,0.382897,0.068638,,150,"{'max_depth': None, 'n_estimators': 150}",0.978456,0.977558,0.973968,0.966757,0.973046,0.973957,0.004145,5


In [39]:
gs_fit2 = gs.fit(X_tfidf_counts, data['label'])
pd.DataFrame(gs_fit2.cv_results_).sort_values('mean_test_score', ascending=False)[0:5] #print some results

Unnamed: 0,mean_fit_time,std_fit_time,mean_score_time,std_score_time,param_max_depth,param_n_estimators,params,split0_test_score,split1_test_score,split2_test_score,split3_test_score,split4_test_score,mean_test_score,std_test_score,rank_test_score
7,7.209251,0.047505,0.078798,0.001776,90.0,150,"{'max_depth': 90, 'n_estimators': 150}",0.976661,0.972172,0.974865,0.964061,0.973944,0.972341,0.004386,1
6,0.43194,0.008801,0.00797,0.000222,90.0,10,"{'max_depth': 90, 'n_estimators': 10}",0.974865,0.973968,0.977558,0.966757,0.967655,0.972161,0.004224,2
10,8.054038,0.08171,0.085057,0.005951,,150,"{'max_depth': None, 'n_estimators': 150}",0.977558,0.971275,0.972172,0.964061,0.974843,0.971982,0.004529,3
8,13.677317,0.693079,0.143896,0.017107,90.0,300,"{'max_depth': 90, 'n_estimators': 300}",0.976661,0.974865,0.972172,0.967655,0.968553,0.971981,0.003485,4
11,11.6015,1.343673,0.121324,0.006759,,300,"{'max_depth': None, 'n_estimators': 300}",0.977558,0.971275,0.971275,0.965858,0.972147,0.971623,0.003714,5


The above results tell us that deeper models tend to perform better. 

___

> ## Gradient Boosting
> 
> **Gradient Boosting** is an ensemble method, based on decision trees, that combines weak learners together to generate a strong learner. Specifically, mistakes made in prior iterations heavily impact future iterations in gradient boosting. 
>
> Typically, <u>gradient boosting</u> takes longer to train and is much easier to overfit when compared to <u>random forsts</u>. However, gradient boost generally performs much better in comparison. 
>
> Let's dive into gradient boosting. 

In [41]:
from sklearn.ensemble import GradientBoostingClassifier

def train_GB(est, max_depth, lr):
    gb = GradientBoostingClassifier(n_estimators=est, max_depth=max_depth, learning_rate=lr)
    gb_model = gb.fit(X_train, y_train)
    y_pred = gb_model.predict(X_test)
    precision, recall, fscore, support = score(y_test, y_pred, pos_label='spam', average='binary')
    print('Est: {} / Depth: {} / LR: {} --- Precision: {} / Recall: {} / Accuracy: {}'.format(
        num_estimators, depth_num, lr, round(precision,3), round(recall,3), 
        round((y_pred==y_test).sum() / len(y_pred),3)))

In [42]:
for n_est in [50, 100, 150]:
    for max_depth in [3, 7, 11, 13]:
        for lr in [0.01, 0.1, 1]:
            train_GB(n_est, max_depth, lr)

  _warn_prf(average, modifier, msg_start, len(result))


Est: 100 / Depth: None / LR: 0.01 --- Precision: 0.0 / Recall: 0.0 / Accuracy: 0.864
Est: 100 / Depth: None / LR: 0.1 --- Precision: 0.972 / Recall: 0.702 / Accuracy: 0.957
Est: 100 / Depth: None / LR: 1 --- Precision: 0.888 / Recall: 0.788 / Accuracy: 0.958


  _warn_prf(average, modifier, msg_start, len(result))


Est: 100 / Depth: None / LR: 0.01 --- Precision: 0.0 / Recall: 0.0 / Accuracy: 0.864
Est: 100 / Depth: None / LR: 0.1 --- Precision: 0.93 / Recall: 0.788 / Accuracy: 0.963
Est: 100 / Depth: None / LR: 1 --- Precision: 0.871 / Recall: 0.808 / Accuracy: 0.958
Est: 100 / Depth: None / LR: 0.01 --- Precision: 1.0 / Recall: 0.007 / Accuracy: 0.865
Est: 100 / Depth: None / LR: 0.1 --- Precision: 0.909 / Recall: 0.795 / Accuracy: 0.961
Est: 100 / Depth: None / LR: 1 --- Precision: 0.911 / Recall: 0.815 / Accuracy: 0.964


  _warn_prf(average, modifier, msg_start, len(result))


Est: 100 / Depth: None / LR: 0.01 --- Precision: 0.0 / Recall: 0.0 / Accuracy: 0.864
Est: 100 / Depth: None / LR: 0.1 --- Precision: 0.896 / Recall: 0.801 / Accuracy: 0.961
Est: 100 / Depth: None / LR: 1 --- Precision: 0.9 / Recall: 0.834 / Accuracy: 0.965
Est: 100 / Depth: None / LR: 0.01 --- Precision: 0.963 / Recall: 0.523 / Accuracy: 0.933
Est: 100 / Depth: None / LR: 0.1 --- Precision: 0.973 / Recall: 0.728 / Accuracy: 0.961
Est: 100 / Depth: None / LR: 1 --- Precision: 0.894 / Recall: 0.781 / Accuracy: 0.958
Est: 100 / Depth: None / LR: 0.01 --- Precision: 0.933 / Recall: 0.642 / Accuracy: 0.945
Est: 100 / Depth: None / LR: 0.1 --- Precision: 0.945 / Recall: 0.801 / Accuracy: 0.967
Est: 100 / Depth: None / LR: 1 --- Precision: 0.923 / Recall: 0.795 / Accuracy: 0.963
Est: 100 / Depth: None / LR: 0.01 --- Precision: 0.909 / Recall: 0.728 / Accuracy: 0.953
Est: 100 / Depth: None / LR: 0.1 --- Precision: 0.917 / Recall: 0.808 / Accuracy: 0.964
Est: 100 / Depth: None / LR: 1 --- Preci

The above results tell us that:
- A learning rate of 0.1 appears to give the best performance
- A large max depth appears to give the best performance
- A large number of estimators appears to give the best performance

What if we want to evaluate gradient boosting with `GridSearchCV`? 

In [44]:
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer

# TF-IDF
tdidf_vect = TfidfVectorizer(analyzer=clean_text)
X_tfidf = tdidf_vect.fit_transform(data['text'])
X_tfidf_feat = pd.concat([data['length'], data['punctuation'], pd.DataFrame(X_tfidf.toarray())], axis=1)

# CountVectorizer
count_vect = CountVectorizer(analyzer=clean_text)
X_count = tdidf_vect.fit_transform(data['text'])
X_count_feat = pd.concat([data['length'], data['punctuation'], pd.DataFrame(X_count.toarray())], axis=1)

X_count_feat.head()

Unnamed: 0,length,punctuation,0,1,2,3,4,5,6,7,...,8181,8182,8183,8184,8185,8186,8187,8188,8189,8190
0,160,2.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,128,4.7,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,49,4.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,62,3.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,28,7.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [45]:
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

gb = GradientBoostingClassifier()
param = {
    'n_estimators': [100, 150], # based on last section
    'max_depth': [7, 11, 15], # based on last section
    'learning_rate': [0.1] # default learning rate, so don't need to include
}

gs = GridSearchCV(gb, param, cv=5, n_jobs=-1)

In [50]:
cv_fit = gs.fit(X_tfidf_feat, data['label'])
pd.DataFrame(cv_fit.cv_results_).sort_values('mean_test_score', ascending=False)[0:5]

KeyboardInterrupt: 

In [None]:
cv_fit2 = gs.fit(X_count, data['label'])
pd.DataFrame(cv_fit2.cv_results_).sort_values('mean_test_score', ascending=False)[0:5]

## Model Selection

So far, we've been making a mistake. We were fitting our model with all the data, then splitting it into training and testing. We did this to make using GridSearchCV easier for us, but now we'll go ahead and correct this mistake. 

In [1]:
import nltk
import re
import string
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

stopwords = nltk.corpus.stopwords.words('english')
ps = nltk.PorterStemmer()

data = pd.read_csv('SMSSpamCollection.tsv', sep='\t', header=None)
data.columns = ['label','text']

In [7]:
def count_punctuation(text):
    count = sum([1 for char in text if char in string.punctuation])
    return round(count/(len(text) - text.count(" ")),3)*100

data['length'] = data['text'].apply(lambda x: len(x) - x.count(" "))
data['punctuation'] = data['text'].apply(lambda x: count_punctuation(x))

def clean_text(text):
    cleaned_text = ''.join([char for char in text if char not in string.punctuation])
    tokenized_text = re.split('\W+',cleaned_text)
    stemmed_tokens = [ps.stem(word) for word in tokenized_text if word not in stopwords]
    return stemmed_tokens

In [48]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(data[['text','length','punctuation']], data['label'], test_size = 0.2)

In [51]:
tfidf_vect = TfidfVectorizer(analyzer=clean_text)
tfidf_vect_fit = tfidf_vect.fit(X_train['text'])

tfidf_train = tfidf_vect_fit.transform(X_train['text'])
tfidf_test = tfidf_vect_fit.transform(X_test['text'])

X_train_vect = pd.concat([X_train[['length','punctuation']].reset_index(drop=True), 
           pd.DataFrame(tfidf_train.toarray())], axis=1) # 1 = side-by-side, 0 = on top of each other
X_test_vect = pd.concat([X_test[['length','punctuation']].reset_index(drop=True), 
           pd.DataFrame(tfidf_test.toarray())], axis=1) # 1 = side-by-side, 0 = on top of each other

X_train_vect.head()

Unnamed: 0,length,punctuation,0,1,2,3,4,5,6,7,...,7297,7298,7299,7300,7301,7302,7303,7304,7305,7306
0,39,7.7,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,46,13.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,30,6.7,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,41,7.3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,131,9.9,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [53]:
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.metrics import precision_recall_fscore_support as score
from time import time

In [55]:
rf = RandomForestClassifier(n_estimators=150, max_depth=None, n_jobs=-1)

start = time()
rf_model = rf.fit(X_train_vect, y_train)
end = time()
fit_time = (end-start)

start = time()
y_pred = rf_model.predict(X_test_vect)
end = time()
pred_time = (end - start)

precision, recall, fscore, support = score(y_test, y_pred, pos_label='spam', average='binary')
print('Fit Time: {} / Predict Time: {} --- Precision: {} / Recall: {} / Accuracy: {}'.format(
    round(fit_time, 3), round(pred_time, 3),round(precision,3), round(recall,3), round((y_pred==y_test).sum() / len(y_pred),3)))

Fit Time: 2.232 / Predict Time: 0.15 --- Precision: 1.0 / Recall: 0.82 / Accuracy: 0.974


In [57]:
gb = GradientBoostingClassifier(n_estimators=150, max_depth=11)

start = time()
gb_model = gb.fit(X_train_vect, y_train)
end = time()
fit_time = (end-start)

start = time()
y_pred = gb_model.predict(X_test_vect)
end = time()
pred_time = (end - start)

precision, recall, fscore, support = score(y_test, y_pred, pos_label='spam', average='binary')
print('Fit Time: {} / Predict Time: {} --- Precision: {} / Recall: {} / Accuracy: {}'.format(
    round(fit_time, 3), round(pred_time, 3),round(precision,3), round(recall,3), round((y_pred==y_test).sum() / len(y_pred),3)))

Fit Time: 171.472 / Predict Time: 0.173 --- Precision: 0.971 / Recall: 0.845 / Accuracy: 0.974


We would also want to perform further evaluation, such as:
- Examining certain text message types to see models performance
    - Text messages that are longer than 50 charactes
    - Text messages with no punctuation
    - and more...
    
Is prediction time a bottleneck for you business? Do you prefer one metrics performance over the other -- is precision more important than recall?