In [9]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load data
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv"
columns = ['NumPregnancies', 'Glucose', 'BloodPressure', 'SkinThickness', 'Insulin', 'BMI', 'DiabetesPedigreeFunction', 'Age', 'Outcome']
data = pd.read_csv(url, names=columns)

# Separate features and target
X = data.drop('Outcome', axis=1)
y = data['Outcome'].values

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the data
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)


The next cell imports two classes from my custom library (**lrxai**). **LogisticRegression** is the model itself, and **LogisticRegressionExplainability** explains the model's predictions by analyzing its feature importances.

In [10]:
from lrxai.logistic_regression import LogisticRegression
from lrxai.explainability import LogisticRegressionExplainability

* **LogisticRegression(learning_rate=0.01, num_iterations=1000):** This initializes the **LogisticRegression** object with a learning rate of 0.01 and sets it to run 1000 iterations of the training algorithm (gradient descent).
* **model.fit(X_train, y_train):** This trains the model on the training data **X_train** (features) and **y_train** (labels), adjusting the model's weights at each iteration.
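To make the training step concrete, here is a minimal sketch of logistic regression fitted with batch gradient descent. It is not the **lrxai** implementation, whose internals are not shown here. The function names `sigmoid`, `fit_logistic_regression`, and `predict_labels`, the zero initialization, the 0.5 threshold, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(z):
    """Map real-valued scores to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic_regression(X, y, learning_rate=0.01, num_iterations=1000):
    """Batch gradient descent on the mean log loss (illustrative sketch only)."""
    n_samples, n_features = X.shape
    weights = np.zeros(n_features)  # assumed zero initialization
    bias = 0.0
    for _ in range(num_iterations):
        probs = sigmoid(X @ weights + bias)  # predicted P(y = 1)
        error = probs - y                    # gradient of the log loss w.r.t. the scores
        weights -= learning_rate * (X.T @ error) / n_samples
        bias -= learning_rate * error.mean()
    return weights, bias


def predict_labels(X, weights, bias, threshold=0.5):
    """Label an instance 1 when its predicted probability reaches the threshold."""
    return (sigmoid(X @ weights + bias) >= threshold).astype(int)


# Synthetic, linearly separable data (hypothetical, for illustration only)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 2))
y_demo = (X_demo[:, 0] + 0.5 * X_demo[:, 1] > 0).astype(int)

w_demo, b_demo = fit_logistic_regression(X_demo, y_demo, learning_rate=0.1, num_iterations=500)
demo_accuracy = np.mean(predict_labels(X_demo, w_demo, b_demo) == y_demo)
print(f"Training accuracy on synthetic data: {demo_accuracy:.2f}")
```

A larger learning rate takes bigger steps per iteration, and more iterations give the weights more time to converge; the values 0.01 and 1000 used in this notebook are one reasonable starting point for standardized features.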

In [11]:
# Initialize and train the logistic regression model
model = LogisticRegression(learning_rate=0.01, num_iterations=1000)
model.fit(X_train, y_train)

* **model.predict(X_test):** After training, this line uses the trained model to predict the outcomes for the test dataset **X_test**. The **predict** method applies the learned weights to the new data to estimate whether each instance belongs to the positive class (1) or the negative class (0).

* **np.mean(predictions == y_test):** This calculates the accuracy of the model by comparing the predicted labels (**predictions**) with the actual labels (**y_test**). It computes the proportion of correct predictions by taking the mean of a boolean array where True (1) represents a correct prediction and False (0) an incorrect one.
* **print(f"Accuracy: {accuracy:.2f}"):** This prints the accuracy of the model formatted to two decimal places.
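The boolean-mean trick can be seen on a tiny example. The labels below are made up purely for illustration and are not taken from the diabetes dataset.

```python
import numpy as np

# Hypothetical labels, invented for this illustration
toy_predictions = np.array([1, 0, 1, 1, 0])
toy_y_true = np.array([1, 0, 0, 1, 1])

correct = toy_predictions == toy_y_true   # [True, True, False, True, False]
toy_accuracy = np.mean(correct)           # True counts as 1, False as 0
print(f"Accuracy: {toy_accuracy:.2f}")    # prints "Accuracy: 0.60"
```

Three of the five predictions match, so the mean of the boolean array is 3/5.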

In [12]:
# Predict on the test set
predictions = model.predict(X_test)

# Evaluate the model (simple accuracy)
accuracy = np.mean(predictions == y_test)
print(f"Accuracy: {accuracy:.2f}")




Accuracy: 0.73


* **LogisticRegressionExplainability(model):** This initializes the explainability tool with the trained logistic regression model.
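* **explain.feature_importance():** This method calculates the importance of each feature in the model. The printed values are signed, which suggests they are derived from the learned weights: a positive value pushes the prediction toward class 1 and a negative value toward class 0. Because the features were standardized, the magnitudes are roughly comparable across features.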
* **print("Feature Importances:", importances):** This line prints out the calculated feature importances, which helps in understanding which features are most influential in predicting the outcome.

In [13]:
# Explainability: Feature Importance
explain = LogisticRegressionExplainability(model)
importances = explain.feature_importance()
print("Feature Importances:", importances)

Feature Importances: [ 0.09608736  0.35162005 -0.04189399  0.00262948 -0.01219192  0.24369275
  0.09218915  0.15969531]
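The raw array is easier to read once each value is paired with its feature name. The sketch below copies the importances printed above and ranks them by absolute value. It assumes the values come back in the same order as the feature columns (the **columns** list without **Outcome**); the names `printed_importances`, `feature_names`, and `ranked` are introduced here for illustration.

```python
import numpy as np

# Feature columns in their original order (Outcome excluded)
feature_names = ['NumPregnancies', 'Glucose', 'BloodPressure', 'SkinThickness',
                 'Insulin', 'BMI', 'DiabetesPedigreeFunction', 'Age']

# Values copied from the printed output above
printed_importances = np.array([0.09608736, 0.35162005, -0.04189399, 0.00262948,
                                -0.01219192, 0.24369275, 0.09218915, 0.15969531])

# Sort by magnitude, largest first, keeping the sign for interpretation
order = np.argsort(-np.abs(printed_importances))
ranked = [(feature_names[i], float(printed_importances[i])) for i in order]

for name, value in ranked:
    print(f"{name:<26}{value:+.4f}")
```

Under that ordering assumption, Glucose and BMI carry the largest magnitudes, while SkinThickness and Insulin are close to zero.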
