## Single Party XGBoost on Single Party Data
First, we'll train an XGBoost model on only your party's data. Each party holds only a subset of the data available across the federation, so we expect this model to be less robust than one trained on all of the data. This exercise measures how well the locally trained model performs.
![title](img/exercise1.png)

Note that this notebook is the same for aggregators and non-aggregators.

### Data Preprocessing
Import the necessary libraries.

In [None]:
import xgboost as xgb
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, roc_auc_score
from Utils import load_training_data

Load in and examine the training data belonging to your party to get a better understanding of the data.

In [None]:
# Read in the training data and print out the first few rows of the dataset using the .head() function
training_data = load_training_data()
training_data.head()

In [None]:
# Split the training dataset into labels and features
y_train = training_data.iloc[:, 0]
y_train.head()

In [None]:
x_train = training_data.iloc[:, 1:]
x_train.head()

Preprocess the test data.
* Test data for the Higgs boson dataset is located at `/data/hb/hb_test.csv`
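
The training and test data are split the same way: column 0 holds the label and the remaining columns hold the features. As a minimal sketch of that `iloc` slicing, the toy frame below (hypothetical values, not taken from the Higgs boson dataset) mimics a headerless CSV with integer column names:

```python
import pandas as pd

# Hypothetical toy frame: column 0 is the label, columns 1..3 are features.
toy = pd.DataFrame([
    [1.0, 0.5, 1.2, 3.3],
    [0.0, 0.1, 0.7, 2.9],
    [1.0, 0.9, 1.5, 3.1],
])

labels = toy.iloc[:, 0]     # every row, first column only (a Series)
features = toy.iloc[:, 1:]  # every row, all remaining columns (a DataFrame)

print(labels.shape)    # (3,)
print(features.shape)  # (3, 3)
```

Because `iloc` selects by position, the split works regardless of how the columns are named.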

In [None]:
# Split the test data into labels and features
test_data_path = "/data/hb/hb_test.csv"
test_data = pd.read_csv(test_data_path, sep=",", header=None)
y_test = test_data.iloc[:, 0]
x_test = test_data.iloc[:, 1:]
x_test.head()

### Model Training and Evaluation
Train the model with the training data.

In [None]:
model = xgb.XGBClassifier()
print("Beginning training...")

# TODO: Train a classifier using the model.fit(features, labels) function
# ...

print("Training finished")

Get predictions and evaluate the model on the test data. Feel free to use other evaluation metrics; for classification, we suggest the sklearn `accuracy_score()` function.
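
The evaluation cell reports two numbers: accuracy, computed from hard class predictions, and AUC, computed from predicted probabilities. To make the difference concrete, here is a rough pure-Python sketch of what each metric measures. The helper names `accuracy` and `roc_auc` and the toy values are illustrative assumptions; this is not how sklearn implements them, and the 0.5 cutoff is assumed for turning probabilities into classes.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true labels."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


def roc_auc(y_true, scores):
    """Probability that a random positive is scored above a random negative.

    Ties count as half a win. The pairwise loop is quadratic, which is
    fine for a toy example but not for a large test set.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]                   # hypothetical probabilities
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 cutoff

acc = accuracy(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"accuracy = {acc:.2%}")  # accuracy = 75.00%
print(f"AUC = {auc:.3f}")       # AUC = 0.750
```

Accuracy depends on the chosen cutoff, while AUC depends only on how the scores rank positives against negatives, so the two can disagree on which model looks better.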

In [None]:
# TODO: use the model to get predictions for the test set and calculate the prediction error
# Use the model.predict(x_test) function
preds = # ...

# Compute the accuracy of the predictions
accuracy_percent = f"{accuracy_score(y_test, preds) * 100:.2f}%"
print("Your model achieved %s accuracy" % accuracy_percent)

# Compute the AUC from the predicted probability of the positive class
preds_probs = model.predict_proba(x_test)[:, 1]
auc = roc_auc_score(y_test, preds_probs)
rounded_auc = f"{auc:.3f}"
print("Your model achieved an AUC of %s" % rounded_auc)

Compare your locally trained model's accuracy with that of the models trained by other members of your federation. How do you think your locally trained model will perform on data that doesn't fit the distribution of your training data?

Once you've finished discussing, move on to [Exercise 2](exercise2-aggregator.ipynb).