# Case Studies

In this section, we explore applications of logistic regression in healthcare, finance, and marketing. Each case study pairs a short description with a code example that trains a model and reports accuracy, precision, recall, F1 score, and ROC-AUC on a held-out test set.

## Real-World Examples

### Healthcare

Logistic regression is widely used in healthcare for predicting patient outcomes, understanding disease progression, and optimizing treatment plans.

1. **Predicting Disease Presence:**
   Logistic regression can be used to predict the presence or absence of a disease based on clinical parameters and patient history. For example, it can predict the likelihood of a patient having diabetes based on features such as age, BMI, blood pressure, and glucose levels.

2. **Understanding Disease Progression:**
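   Logistic regression can help quantify how different factors contribute to the progression of diseases such as diabetes, heart disease, and cancer, provided progression is framed as a binary outcome (for example, whether a patient progresses to a more severe stage within a fixed period). For example, it can model the relationship between lifestyle factors and the progression of diabetes, with each coefficient indicating how strongly a factor is associated with progression.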

```python
# Predicting Disease Presence

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score

# Load the dataset (e.g., Pima Indians Diabetes dataset)
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv"
column_names = [
    "Pregnancies",
    "Glucose",
    "BloodPressure",
    "SkinThickness",
    "Insulin",
    "BMI",
    "DiabetesPedigreeFunction",
    "Age",
    "Outcome",
]
data = pd.read_csv(url, names=column_names)

# Split the dataset into features and target variable
X = data.drop("Outcome", axis=1)
y = data["Outcome"]

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train the logistic regression model
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)[:, 1]

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
roc_auc = roc_auc_score(y_test, y_prob)

print(f"Accuracy: {accuracy:.2f}")
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
print(f"F1 Score: {f1:.2f}")
print(f"ROC-AUC: {roc_auc:.2f}")
```

```
Accuracy: 0.75
Precision: 0.64
Recall: 0.67
F1 Score: 0.65
ROC-AUC: 0.81
```

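The probability returned by `predict_proba` in the diabetes example comes from the logistic (sigmoid) function applied to a linear score, $p = 1 / (1 + e^{-(b_0 + \mathbf{w}^\top \mathbf{x})})$. The sketch below checks this by recomputing the probabilities by hand from the fitted `coef_` and `intercept_`. It assumes a synthetic dataset from `make_classification` in place of the Pima data, so it runs without a download.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for clinical features (not the Pima data)
X, y = make_classification(n_samples=500, n_features=8, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X, y)

# Linear score z = intercept + X @ coefficients, one value per sample
z = model.intercept_[0] + X @ model.coef_[0]

# The sigmoid maps each score to an estimate of P(y = 1 | x)
manual_prob = 1.0 / (1.0 + np.exp(-z))

# Compare with scikit-learn's own probability for the positive class
sklearn_prob = model.predict_proba(X)[:, 1]
print(np.allclose(manual_prob, sklearn_prob))
```

Because the score is linear in the features, a one-unit change in a feature always shifts the log-odds by the same amount, which is what makes the coefficients interpretable.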

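For the disease-progression use case, the usual quantity of interest is how each factor shifts the odds of the outcome: $e^{w_j}$ is the multiplicative change in the odds for a one-unit increase in feature $j$, holding the other features fixed. The sketch below is illustrative only. It simulates a hypothetical dataset, and the feature names `exercise_hours`, `smoker`, and `bmi` and the outcome `progressed` are assumptions, not real clinical variables.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000

# Hypothetical lifestyle factors (simulated, not real patient data)
df = pd.DataFrame({
    "exercise_hours": rng.uniform(0, 10, n),
    "smoker": rng.integers(0, 2, n),
    "bmi": rng.normal(28, 4, n),
})

# Simulated ground truth: smoking and higher BMI raise the log-odds of
# progression, while more exercise lowers them
true_logit = -4.0 - 0.3 * df["exercise_hours"] + 1.2 * df["smoker"] + 0.12 * df["bmi"]
df["progressed"] = rng.binomial(1, 1 / (1 + np.exp(-true_logit)))

features = ["exercise_hours", "smoker", "bmi"]
model = LogisticRegression(max_iter=1000).fit(df[features], df["progressed"])

# exp(coefficient) = odds ratio for a one-unit increase in that feature
# (scikit-learn applies an L2 penalty by default, which shrinks these slightly)
odds_ratios = pd.Series(np.exp(model.coef_[0]), index=features)
print(odds_ratios.round(2))
```

Odds ratios above 1 indicate a factor associated with higher odds of progression, and below 1 with lower odds. When confidence intervals are needed, an unpenalized fit from a statistics package such as statsmodels is the more common tool.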
### Finance

In finance, logistic regression is used mainly for binary risk classification, such as estimating whether an applicant is likely to default on a loan or whether a transaction is fraudulent.

1. **Credit Scoring:**
   Logistic regression can be used to develop credit scoring models that assess the creditworthiness of individuals based on factors such as income, debt, and credit history. These models help financial institutions make informed lending decisions.

2. **Fraud Detection:**
   Logistic regression can be used to detect fraudulent transactions by modeling the relationship between transaction features and the likelihood of fraud. This helps financial institutions identify and prevent fraudulent activities.

```python
# Credit Scoring

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score

# Generate a synthetic binary dataset as a stand-in for credit data
# (10 generic numeric features, not real income, debt, or credit-history variables)
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train the logistic regression model
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)[:, 1]

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
roc_auc = roc_auc_score(y_test, y_prob)

print(f"Accuracy: {accuracy:.2f}")
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
print(f"F1 Score: {f1:.2f}")
print(f"ROC-AUC: {roc_auc:.2f}")
```

```
Accuracy: 0.83
Precision: 0.87
Recall: 0.82
F1 Score: 0.84
ROC-AUC: 0.91
```

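Fraud detection differs from credit scoring mainly in class balance: fraudulent transactions are typically rare, so a model can reach high accuracy while missing most fraud. A minimal sketch, assuming a synthetic dataset from `make_classification` with roughly 3% positives as a stand-in for transaction data, compares the default model with one trained using `class_weight="balanced"`, which weights each class inversely to its frequency.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for transaction data (about 3% "fraud")
X, y = make_classification(
    n_samples=5000, n_features=10, n_informative=5, weights=[0.97], random_state=42
)

# stratify=y keeps the rare class at the same rate in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

recalls = {}
for class_weight in (None, "balanced"):
    model = LogisticRegression(max_iter=1000, class_weight=class_weight)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    recalls[class_weight] = recall_score(y_test, y_pred)
    print(
        f"class_weight={class_weight}: "
        f"accuracy={accuracy_score(y_test, y_pred):.2f}, "
        f"recall={recalls[class_weight]:.2f}"
    )
```

Reweighting usually trades some precision for recall. An alternative with a similar effect is to keep the default fit and lower the decision threshold applied to `predict_proba`.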

### Marketing

In marketing, logistic regression is used to model binary customer outcomes, such as whether a customer will churn, respond to a campaign, or make a purchase.

1. **Customer Churn Prediction:**
   Logistic regression can predict whether a customer will churn (i.e., stop using a service) based on factors such as usage patterns, customer service interactions, and demographic information. This helps businesses identify at-risk customers and take proactive measures to retain them.

2. **Email Spam Detection:**
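   Logistic regression can classify emails as spam or not spam based on features derived from the message text, such as weighted word frequencies. This helps email service providers filter out spam and improve the user experience. The example below uses SMS messages, which are handled the same way.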

```python
# Customer Churn Prediction

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score

# Load the dataset (e.g., Telco Customer Churn dataset)
url = "https://raw.githubusercontent.com/IBM/telco-customer-churn-on-icp4d/master/data/Telco-Customer-Churn.csv"
data = pd.read_csv(url)

# Preprocess the data
data["Churn"] = data["Churn"].map({"Yes": 1, "No": 0})
data = pd.get_dummies(data, drop_first=True)

# Split the dataset into features and target variable
X = data.drop("Churn", axis=1)
y = data["Churn"]

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train the logistic regression model
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)[:, 1]

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
roc_auc = roc_auc_score(y_test, y_prob)

print(f"Accuracy: {accuracy:.2f}")
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
print(f"F1 Score: {f1:.2f}")
print(f"ROC-AUC: {roc_auc:.2f}")
```

```
Accuracy: 0.82
Precision: 0.70
Recall: 0.59
F1 Score: 0.64
ROC-AUC: 0.86
```

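One caveat about the churn example: in the commonly distributed version of the Telco file, `customerID` is a unique string per customer, and `TotalCharges` is read as text because a few rows contain a blank value. `pd.get_dummies` therefore expands both into thousands of one-hot columns. A cleaner preprocessing step drops the identifier, converts `TotalCharges` to numbers, and only then one-hot encodes the remaining categorical columns. The sketch below runs on a tiny made-up frame that mimics those columns; `preprocess_churn` is a hypothetical helper, and filling blank charges with 0 assumes they belong to brand-new customers. The metrics printed above come from the original preprocessing and may differ with this version.

```python
import pandas as pd


def preprocess_churn(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: clean Telco-style churn data before one-hot encoding."""
    df = df.copy()
    # The customer ID only identifies a row; it carries no predictive signal
    df = df.drop(columns=["customerID"])
    # Blank strings become NaN, then 0.0 (assumption: blank means no charges yet)
    df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce").fillna(0.0)
    df["Churn"] = df["Churn"].map({"Yes": 1, "No": 0})
    # Only the remaining text columns (here, Contract) are one-hot encoded
    return pd.get_dummies(df, drop_first=True)


# Tiny made-up sample with the same column-type issues as the real file
sample = pd.DataFrame({
    "customerID": ["0001-A", "0002-B", "0003-C"],
    "tenure": [0, 12, 40],
    "Contract": ["Month-to-month", "One year", "Two year"],
    "MonthlyCharges": [70.0, 55.5, 90.25],
    "TotalCharges": [" ", "666.0", "3610.0"],
    "Churn": ["Yes", "No", "No"],
})

clean = preprocess_churn(sample)
print(clean.columns.tolist())
```

With the real data, `data = preprocess_churn(pd.read_csv(url))` would replace the two preprocessing lines in the churn example.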

```python
# Spam Detection (SMS messages as a stand-in for email)

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score

# Load the dataset (e.g., SMS Spam Collection Dataset)
url = "https://raw.githubusercontent.com/justmarkham/pycon-2016-tutorial/master/data/sms.tsv"
data = pd.read_csv(url, sep="\t", header=None, names=["label", "message"])

# Convert labels to binary (ham = 0, spam = 1)
data["label"] = data["label"].map({"ham": 0, "spam": 1})

# Split the dataset into features and target variable
X = data["message"]
y = data["label"]

# Convert text data to TF-IDF features (vocabulary and IDF weights are learned from all messages)
vectorizer = TfidfVectorizer(stop_words="english", max_features=1000)
X_tfidf = vectorizer.fit_transform(X)

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_tfidf, y, test_size=0.2, random_state=42)

# Create and train the logistic regression model
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)[:, 1]

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
roc_auc = roc_auc_score(y_test, y_prob)

print(f"Accuracy: {accuracy:.2f}")
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
print(f"F1 Score: {f1:.2f}")
print(f"ROC-AUC: {roc_auc:.2f}")
```

```
Accuracy: 0.98
Precision: 1.00
Recall: 0.87
F1 Score: 0.93
ROC-AUC: 0.99
```

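In the spam example, `TfidfVectorizer` is fit on every message before the train/test split, so its vocabulary and IDF weights have already seen the test messages, a mild form of data leakage. A common fix wraps the vectorizer and the classifier in a scikit-learn pipeline, so that calling `fit` learns the vocabulary from the training messages only. The six-message corpus below is a made-up placeholder for the SMS data; the vectorizer and model parameters mirror the original example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Made-up placeholder messages (1 = spam, 0 = ham)
train_texts = [
    "WIN a FREE prize now, call to claim",
    "Free entry in a weekly competition, text WIN",
    "Claim your cash reward today",
    "Are we still meeting for lunch?",
    "I'll call you when I get home",
    "Can you send me the notes from class",
]
train_labels = [1, 1, 1, 0, 0, 0]

# The vectorizer is fit inside the pipeline, on the training texts only
spam_model = make_pipeline(
    TfidfVectorizer(stop_words="english", max_features=1000),
    LogisticRegression(max_iter=1000),
)
spam_model.fit(train_texts, train_labels)

# Unseen messages are transformed with the training vocabulary, then scored
test_texts = ["Claim your free prize today", "See you at lunch tomorrow"]
probs = spam_model.predict_proba(test_texts)[:, 1]
print(probs)
```

With the real data, the same pattern is to split `data["message"]` and `data["label"]` first and then call `fit` on the training messages and `predict_proba` on the test messages.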

## Summary
These case studies highlight the versatility and practical utility of logistic regression across fields. In healthcare, it can predict disease presence and quantify the factors associated with disease progression. In finance, it can assess credit risk and detect fraud. In marketing, it can predict customer churn and classify messages as spam. In each case, the model produces a probability for a binary outcome, which organizations can use to rank cases, choose decision thresholds, and make data-driven decisions.