# XGBoost Algorithm:
* XGBoost is a robust machine-learning algorithm that can help you understand your data and make better decisions.
* XGBoost is an implementation of gradient-boosting decision trees. It has been used by data scientists and researchers     worldwide to optimize their machine-learning models.
* XGBoost is a classification algorithm. It's designed for problems where you have a bunch of training data that can be used to create a classifier, and then you have new data that you want to classify.
* XGBoost is used for these two reasons: Execution speed and Model performance.

# What Algorithm Does XGBoost Use?
* Gradient boosting is a ML algorithm that creates a series of models and combines them to create an overall model that is more accurate than any individual model in the sequence.
* It supports both regression and classification predictive modeling problems.
* To add new models to an existing one, it uses a gradient descent algorithm called gradient boosting.
* Gradient boosting is implemented by the XGBoost library, also known as multiple additive regression trees, stochastic gradient boosting, or gradient boosting machines.

# Advantages of XGBoost:
* Performance: XGBoost has a strong track record of producing high-quality results in various machine learning tasks, especially in Kaggle competitions, where it has been a popular choice for winning solutions.
* Scalability: XGBoost is designed for efficient and scalable training of machine learning models, making it suitable for large datasets.
* Customizability: XGBoost has a wide range of hyperparameters that can be adjusted to optimize performance, making it highly customizable.
* Handling of Missing Values: XGBoost has built-in support for handling missing values, making it easy to work with real-world data that often has missing values.
* Interpretability: Unlike some machine learning algorithms that can be difficult to interpret, XGBoost provides feature importances, allowing for a better understanding of which variables are most important in making predictions.

# Disadvantages of XGBoost:
* Computational Complexity: XGBoost can be computationally intensive, especially when training large models,making it less
suitable for resource-constrained systems.
* Overfitting: XGBoost can be prone to overfitting, especially when trained on small datasets or when too many trees are 
used in the model.
* Hyperparameter Tuning: XGBoost has many hyperparameters that can be adjusted, making it important to properly tune the 
parameters to optimize performance. However, finding the optimal set of parameters can be time-consuming and requires expertise.
* Memory Requirements: XGBoost can be memory-intensive, especially when working with large datasets, making it less suitable 
for systems with limited memory resources.

# Implementation Of XGBoost Algorithm:

In [10]:
import pandas as pd
import numpy as np
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score, accuracy_score, f1_score

# Load the dataset
data = pd.read_csv('New heart.csv')

In [11]:
categorical_columns = ['Sex', 'ChestPainType', 'RestingECG', 'ExerciseAngina', 'ST_Slope']
for col in categorical_columns:
    data[col] = data[col].astype('category')
    
data = pd.get_dummies(data, columns=categorical_columns)

In [12]:
# Separate features and labels
X = data.iloc[:, :-1]
y = data.iloc[:, -1]


In [13]:
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


In [14]:
# Create the XGBoost model
modell = xgb.XGBClassifier()

In [15]:
# Train the model
modell.fit(X_train, y_train)

In [16]:
# Make predictions on the test set
y_pred = modell.predict(X_test)


In [17]:
# Calculate evaluation metrics
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
accuracy = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

# Print the evaluation metrics
print("Precision:", precision*100)
print("Recall:", recall*100)
print("Accuracy:", accuracy*100)
print("F1 Score:", f1*100)


Precision: 100.0
Recall: 100.0
Accuracy: 100.0
F1 Score: 100.0


In [18]:
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC  # Example classifier, you can use any other classifier of your choice

# Assuming you have a dataset with features (X) and corresponding labels (y)
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Instantiate and train your classifier
classifier = SVC()  # Example classifier, you can use any other classifier of your choice
classifier.fit(X_train, y_train)

# Make predictions on the training set (for example purposes only, you should use the test set)
y_pred = classifier.predict(X_train)

# Calculate the confusion matrix
cm = confusion_matrix(y_train, y_pred)

print(cm)


[[340  87]
 [128 179]]
