**Data Set Information:**

This is perhaps the best known database to be found in the pattern recognition literature. Fisher's paper is a classic in the field and is referenced frequently to this day. (See Duda & Hart, for example.) The data set contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other 2; the latter are NOT linearly separable from each other.

**Predicted attribute:** class of iris plant.

This is an exceedingly simple domain.

This data differs from the data presented in Fisher's article (identified by Steve Chadwick, spchadwick '@' espeedaz.net). The 35th sample should be 4.9,3.1,1.5,0.2,"Iris-setosa", where the error is in the fourth feature. The 38th sample should be 4.9,3.6,1.4,0.1,"Iris-setosa", where the errors are in the second and third features.
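The two corrections above can be applied programmatically. The following is a minimal sketch, assuming the data has been read into a list of rows in file order; the helper name `apply_fisher_corrections` and the row layout are assumptions made for illustration and are not part of the original notebook.

```python
# Corrected rows as given in the note above, keyed by 1-based sample number.
FISHER_CORRECTIONS = {
    35: [4.9, 3.1, 1.5, 0.2, "Iris-setosa"],
    38: [4.9, 3.6, 1.4, 0.1, "Iris-setosa"],
}


def apply_fisher_corrections(rows):
    """Return a copy of `rows` with samples 35 and 38 replaced (hypothetical helper)."""
    fixed = [list(row) for row in rows]
    for sample_number, corrected in FISHER_CORRECTIONS.items():
        fixed[sample_number - 1] = list(corrected)  # convert to a 0-based index
    return fixed
```

The scikit-learn documentation for `load_iris` states that recent releases already ship the corrected values, so a patch like this should matter mainly when reading the raw UCI file.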

**Attribute Information:**

1. sepal length in cm
2. sepal width in cm
3. petal length in cm
4. petal width in cm
5. class:
-- Iris Setosa
-- Iris Versicolour
-- Iris Virginica

In [1]:
#Importing dataset from sklearn
from sklearn import datasets
from sklearn import metrics

iris = datasets.load_iris() #dataset loading
X = iris.data               #Features stored in X 
y = iris.target             #Class variable

In [2]:
#Splitting dataset into Training (80%) and testing data (20%) using train_test_split
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
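To make the 80/20 split concrete, here is a rough pure-Python sketch of a seeded shuffle split over sample indices. It is not scikit-learn's implementation (its shuffling and rounding rules differ); the function name `simple_train_test_split` is hypothetical.

```python
import random


def simple_train_test_split(n_samples, test_size=0.2, seed=42):
    """Return (train_indices, test_indices) from a shuffled index range."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded shuffle for reproducibility
    n_test = round(n_samples * test_size)  # simplified rounding, an assumption
    return indices[n_test:], indices[:n_test]


# Iris has 150 samples, so a 20% test share gives 30 test and 120 training rows.
train_idx, test_idx = simple_train_test_split(150)
print(len(train_idx), len(test_idx))
```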

In [3]:
# Create an XGBClassifier instance with default hyperparameters
from xgboost import XGBClassifier
clf = XGBClassifier()
clf

In [4]:
from codecarbon import track_emissions


@track_emissions(project_name='XGBoost model')
def fit_classifier(x, y, clf):
    clf.fit(x, y)

    return clf
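The decorator pattern used above wraps a function so that measurement starts before the call and stops after it, while the return value passes through unchanged. The sketch below illustrates only that pattern, with a wall-clock timer swapped in for emissions tracking; `track_runtime` is a hypothetical name, and this is not how codecarbon is implemented.

```python
import functools
import time


def track_runtime(project_name):
    """Hypothetical stand-in for a tracking decorator: measures wall-clock time only."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()  # "start tracking"
            try:
                return func(*args, **kwargs)  # run the wrapped function unchanged
            finally:
                elapsed = time.perf_counter() - start  # "stop tracking"
                print(f"[{project_name}] {func.__name__} took {elapsed:.4f} s")
        return wrapper
    return decorator


@track_runtime(project_name='demo')
def add(a, b):
    return a + b
```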


In [5]:
# Fit the classifier; codecarbon tracks emissions while this call runs
clf = fit_classifier(X_train, y_train, clf)

[codecarbon INFO @ 11:23:16] [setup] RAM Tracking...
[codecarbon INFO @ 11:23:16] [setup] GPU Tracking...
[codecarbon INFO @ 11:23:16] No GPU found.
[codecarbon INFO @ 11:23:16] [setup] CPU Tracking...
[codecarbon DEBUG @ 11:23:16] Not using PowerGadget, an exception occurred while instantiating IntelPowerGadget : Intel Power Gadget executable not found on darwin
[codecarbon DEBUG @ 11:23:16] Not using the RAPL interface, an exception occurred while instantiating IntelRAPL : Platform not supported by Intel RAPL Interface
[codecarbon DEBUG @ 11:23:16] CPU : We detect a Apple M1 Pro with a TDP of 10 W
[codecarbon INFO @ 11:23:16] CPU Model on constant consumption mode: Apple M1 Pro
[codecarbon INFO @ 11:23:16] >>> Tracker's metadata:
[codecarbon INFO @ 11:23:16]   Platform system: macOS-14.0-arm64-arm-64bit
[codecarbon INFO @ 11:23:16]   Python version: 3.9.6
[codecarbon INFO @ 11:23:16]   CodeCarbon version: 2.3.1
[codecarbon INFO @ 11:23:16]   Available RAM : 16.000 GB
[codecarbon INFO

In [6]:
y_pred = clf.predict(X_test)

In [7]:
#classification accuracy
from sklearn import metrics
print(metrics.accuracy_score(y_test, y_pred))

1.0
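Accuracy is the fraction of test samples whose predicted label equals the true label, so a score of 1.0 means every test sample was classified correctly. A minimal pure-Python sketch of that definition follows; the function name `accuracy` and the sample labels are illustrative, not taken from the notebook's data.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


print(accuracy([0, 1, 2, 2], [0, 1, 2, 1]))  # 3 of 4 correct -> 0.75
```

With only 30 test samples in a 20% split of 150 rows, a perfect score is plausible on this data set but is a small-sample estimate.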


**Implementation using xgb library**

In [8]:
#importing library and segregation of data as train and test using DMatrix Data structure
import xgboost as xgb

dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)

In [9]:
# Parameters
param = {
    'max_depth': 3,  # the maximum depth of each tree
    'eta': 0.3,  # learning rate (step-size shrinkage applied at each boosting round)
    'silent': 1,  # legacy quiet-logging flag; recent XGBoost versions ignore it
    'objective': 'multi:softprob',  # multiclass objective that outputs per-class probabilities
    'num_class': 3}  # the number of classes in this dataset
num_round = 5  # the number of boosting rounds
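With `multi:softprob`, each prediction is a vector of `num_class` probabilities rather than a single label. Conceptually, these come from applying a softmax to the per-class raw scores. A minimal pure-Python sketch of that transformation, using made-up scores rather than real model output:

```python
import math


def softmax(scores):
    """Convert raw per-class scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 0.5, -1.0])  # illustrative raw scores, not model output
predicted_class = probs.index(max(probs))  # index of the most probable class
```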

In [10]:
# Build the model using the training data
bst = xgb.train(param, dtrain, num_round)

Parameters: { "silent" } are not used.
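The warning above appears because `silent` is no longer a recognised parameter. In current XGBoost releases, logging is controlled by `verbosity` (0 = silent, 1 = warning, 2 = info, 3 = debug). Below is a sketch of the same dictionary with that substitution; the variable name `param_quiet` is chosen here for illustration.

```python
param_quiet = {
    'max_depth': 3,
    'eta': 0.3,
    'verbosity': 0,  # replaces the legacy 'silent': 1
    'objective': 'multi:softprob',
    'num_class': 3,
}
# Passed to training the same way as before:
# bst = xgb.train(param_quiet, dtrain, num_round)
```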



In [11]:
# Compute predictions on the test data
y_predict = bst.predict(dtest)
#print(y_predict)

**A generated dump file looks like this** (illustrative only; feature names such as cap-shape come from a different data set, not from Iris):

booster[0]:

0:[cap-shape=convex] yes=2,no=1

    1:leaf=0.426036
    2:leaf=-0.218845

booster[1]:

0:[cap-shape=convex] yes=2,no=1

    1:leaf=-0.213018
    2:[cap-shape=flat] yes=4,no=3
        3:[cap-shape=convex] yes=6,no=5
            5:leaf=0.409091
            6:leaf=-9.75349e-09
        4:[cap-shape=convex] yes=8,no=7
            7:leaf=-7.66345e-09
            8:leaf=-0.210219

booster[2]:

0:[cap-shape=convex] yes=2,no=1

    1:[cap-shape=flat] yes=4,no=3
        3:leaf=-0.217895
        4:[cap-shape=bell] yes=8,no=7
            7:leaf=-7.66345e-09
            8:leaf=-0.155172
    2:[cap-shape=flat] yes=6,no=5
        5:[cap-shape=convex] yes=10,no=9
            9:leaf=-0.036
            10:leaf=0.18
        6:[cap-shape=convex] yes=12,no=11
            11:leaf=0.128571
            12:leaf=0.420438
            ........

Dump_Model terms:

**ID:** unique identifier of a node

**Feature:** feature used in the tree to perform a split. When Leaf is indicated, the node is the end of a branch

**Split:** value of the chosen feature at which the split is made

**Yes:** ID of the next node in the branch when the split condition is met

**No:** ID of the next node in the branch when the split condition is not met

**Missing:** ID of the next node in the branch for observations where the feature used for the split is not provided
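The terms above map directly onto the text of each dump line. As a rough sketch, the two line shapes shown in the example dump (split nodes and leaves) can be parsed with regular expressions. The helper `parse_dump_line` and the dictionary layout are assumptions made for illustration; they cover only these shapes and are not part of the XGBoost API.

```python
import re

# Patterns for the two line shapes shown in the dump above.
SPLIT_RE = re.compile(r"^(\d+):\[(.+?)\] yes=(\d+),no=(\d+)(?:,missing=(\d+))?$")
LEAF_RE = re.compile(r"^(\d+):leaf=(\S+)$")


def parse_dump_line(line):
    """Parse one dump line into a dict; returns None for unrecognised lines."""
    text = line.strip()
    split = SPLIT_RE.match(text)
    if split:
        node_id, condition, yes, no, missing = split.groups()
        return {"id": int(node_id), "condition": condition, "yes": int(yes),
                "no": int(no), "missing": int(missing) if missing else None}
    leaf = LEAF_RE.match(text)
    if leaf:
        return {"id": int(leaf.group(1)), "leaf": float(leaf.group(2))}
    return None


tree = [parse_dump_line(raw) for raw in [
    "0:[cap-shape=convex] yes=2,no=1",
    "\t1:leaf=0.426036",
    "\t2:leaf=-0.218845",
]]
```

Here `tree[0]["yes"]` and `tree[0]["no"]` hold the IDs of the child nodes to follow, which is how a prediction walks from the root to a leaf.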

In [12]:
#Prediction using test data
preds = bst.predict(dtest)

In [13]:
# Compute macro-averaged precision from the most probable class per row
import numpy as np
from sklearn.metrics import precision_score
best_preds = np.asarray([np.argmax(line) for line in preds])
print(precision_score(y_test, best_preds, average='macro'))
# >> 1.0

1.0
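The score printed above is macro-averaged precision, not accuracy: precision is computed per class and then averaged with equal weight per class. A minimal pure-Python sketch of the two steps (argmax over probability rows, then macro precision) follows. The helper names and sample values are illustrative assumptions, and the zero-division convention noted in the code is a choice made here.

```python
def argmax(values):
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def macro_precision(y_true, y_pred):
    """Unweighted mean of per-class precision over the classes in y_true or y_pred."""
    classes = sorted(set(y_true) | set(y_pred))
    per_class = []
    for c in classes:
        predicted_c = sum(1 for p in y_pred if p == c)
        true_positive = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        # Convention used here: precision is 0 when a class is never predicted.
        per_class.append(true_positive / predicted_c if predicted_c else 0.0)
    return sum(per_class) / len(per_class)


probabilities = [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.1, 0.3, 0.6], [0.1, 0.6, 0.3]]
labels = [0, 1, 2, 2]  # illustrative values, not the notebook's data
predictions = [argmax(row) for row in probabilities]
print(macro_precision(labels, predictions))
```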


**That's It...!**