# Train a Core ML model using Turi Create

First, we import turicreate.

In [1]:
import turicreate as tc

Then we load the data from a CSV file into an SFrame.

In [3]:
data = tc.SFrame.read_csv('./data/train.csv')

------------------------------------------------------
Inferred types from first 100 line(s) of file as 
column_type_hints=[int,int,int,str,str,float,int,int,str,float,str,str]
If parsing fails due to incorrect types, you can correct
the inferred type list above and pass it to read_csv in
the column_type_hints argument
------------------------------------------------------


Now we drop the rows that contain missing values.

In [4]:
data = data.dropna()

Then we split the data randomly into training and test sets, with about 80% of the rows going to training.

In [5]:
train, test = data.random_split(0.8)
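
Because the split is random, each run produces different `train` and `test` sets, and the sizes are only approximately 80/20. The plain-Python sketch below illustrates the idea with a fixed seed. It is not Turi Create's implementation, the helper name `random_split_rows` is hypothetical, and the row count of 714 is just an example size. `SFrame.random_split` also accepts a `seed` argument that serves the same purpose.

```python
import random

def random_split_rows(rows, fraction, seed=None):
    """Assign each row to the first split with probability `fraction`."""
    rng = random.Random(seed)
    first, second = [], []
    for row in rows:
        (first if rng.random() < fraction else second).append(row)
    return first, second

rows = list(range(714))  # stand-in row ids; 714 is an arbitrary example size
train_a, test_a = random_split_rows(rows, 0.8, seed=42)
train_b, test_b = random_split_rows(rows, 0.8, seed=42)

print(len(train_a), len(test_a))  # sizes are close to 80/20, not exact
print(train_a == train_b)         # the same seed reproduces the same split
```

Fixing the seed makes the reported metrics comparable between runs of the notebook.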

We can save these SFrames to reuse them later.

In [13]:
train.save('data/train')
test.save('data/test')

Now we choose the target column and the features used to predict it.

In [7]:
target = 'Survived'
features = ['Pclass', 'Sex', 'Age']

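With these, we can create the classifier. `tc.classifier.create` trains several model types and returns the one with the best validation accuracy.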

In [8]:
model = tc.classifier.create(train, target=target, features=features)

PROGRESS: Creating a validation set from 5 percent of training data. This may take a while.
          You can set ``validation_set=None`` to disable validation tracking.

PROGRESS: The following methods are available for this type of problem.
PROGRESS: BoostedTreesClassifier, RandomForestClassifier, DecisionTreeClassifier, SVMClassifier, LogisticClassifier
PROGRESS: The returned model will be chosen according to validation accuracy.


PROGRESS: Model selection based on validation accuracy:
PROGRESS: ---------------------------------------------
PROGRESS: BoostedTreesClassifier          : 0.896551724137931
PROGRESS: RandomForestClassifier          : 0.8620689655172413
PROGRESS: DecisionTreeClassifier          : 0.896551724137931
PROGRESS: SVMClassifier                   : 0.8620689655172413
PROGRESS: LogisticClassifier              : 0.8620689655172413
PROGRESS: ---------------------------------------------
PROGRESS: Selecting BoostedTreesClassifier based on validation set performance.
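
The validation accuracies in the log take only two distinct values, which suggests a small validation set. As a rough check, the sketch below uses the standard library to recover the simplest fractions that match the printed values. That the validation set holds exactly 29 rows is an inference, since multiples such as 52/58 would print the same value.

```python
from fractions import Fraction

# Validation accuracies copied from the log above.
accuracies = {
    "BoostedTreesClassifier": 0.896551724137931,
    "RandomForestClassifier": 0.8620689655172413,
    "DecisionTreeClassifier": 0.896551724137931,
    "SVMClassifier": 0.8620689655172413,
    "LogisticClassifier": 0.8620689655172413,
}

# Closest fraction (in lowest terms) with a denominator of at most 100.
fractions_by_model = {
    name: Fraction(acc).limit_denominator(100)
    for name, acc in accuracies.items()
}

for name, frac in fractions_by_model.items():
    print(f"{name:<24} {frac.numerator}/{frac.denominator}")
```

Under that reading, the candidate models differ by a single validation example, so the ranking between them should be treated with caution.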


Now we can generate predictions for the test set we created earlier.

In [9]:
predictions = model.classify(test)

We can also evaluate the model on the test set; the results are returned as a dictionary.

In [11]:
metrics = model.evaluate(test)

In [12]:
metrics

{'accuracy': 0.7692307692307693,
 'auc': 0.8210715713714513,
 'confusion_matrix': Columns:
 	target_label	int
 	predicted_label	int
 	count	int
 
 Rows: 4
 
 Data:
 +--------------+-----------------+-------+
 | target_label | predicted_label | count |
 +--------------+-----------------+-------+
 |      0       |        1        |   9   |
 |      0       |        0        |   73  |
 |      1       |        1        |   37  |
 |      1       |        0        |   24  |
 +--------------+-----------------+-------+
 [4 rows x 3 columns],
 'f1_score': 0.691588785046729,
 'log_loss': 0.4887292391659126,
 'precision': 0.8043478260869565,
 'recall': 0.6065573770491803,
 'roc_curve': Columns:
 	threshold	float
 	fpr	float
 	tpr	float
 	p	int
 	n	int
 
 Rows: 100001
 
 Data:
 +-----------+-----+-----+----+----+
 | threshold | fpr | tpr | p  | n  |
 +-----------+-----+-----+----+----+
 |    0.0    | 1.0 | 1.0 | 61 | 82 |
 |   1e-05   | 1.0 | 1.0 | 61 | 82 |
 |   2e-05   | 1.0 | 1.0 | 61 | 82 |
 | 
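
The scalar metrics follow directly from the confusion matrix. The sketch below recomputes them from the four counts printed above, treating label 1 as the positive class (an assumption that is consistent with the reported values).

```python
# Counts copied from the confusion matrix above (positive class = 1).
tn = 73  # target 0, predicted 0
fp = 9   # target 0, predicted 1
fn = 24  # target 1, predicted 0
tp = 37  # target 1, predicted 1

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)  # share of predicted positives that are correct
recall = tp / (tp + fn)     # share of actual positives that are found
f1_score = 2 * precision * recall / (precision + recall)

print(f"accuracy:  {accuracy:.4f}")
print(f"precision: {precision:.4f}")
print(f"recall:    {recall:.4f}")
print(f"f1_score:  {f1_score:.4f}")
```

The recall is noticeably lower than the precision: 24 of the 61 positive examples in the test set are predicted as 0.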

Finally, we export the model in Core ML format.

In [None]:
model.export_coreml('Titanic.mlmodel')