# Logistic Regression
## Mobile Price Classification - ML Assignment 2

This notebook implements **Logistic Regression** for mobile price classification.

### Dataset:
- 20 features (mobile specifications)
- 2000 samples
- 4 classes (price ranges: 0, 1, 2, 3)

### Evaluation Metrics:
- Accuracy
- AUC Score
- Precision
- Recall
- F1 Score
- MCC Score
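
To make the metric definitions concrete, here is a minimal from-scratch sketch of how weighted precision, recall, F1, and multiclass MCC can be derived from a confusion matrix. The matrix `cm_demo` and the helper names are hypothetical and for illustration only; the notebook itself relies on the scikit-learn functions.

```python
import numpy as np

def weighted_scores(cm):
    """Weighted precision, recall, F1 from a confusion matrix (rows = true, cols = predicted)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    support = cm.sum(axis=1)      # number of true samples per class
    predicted = cm.sum(axis=0)    # number of predictions per class
    # Return 0 where a denominator is 0, similar in spirit to zero_division=0
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, support, out=np.zeros_like(tp), where=support > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    weights = support / support.sum()   # 'weighted' averaging uses class support
    return (precision * weights).sum(), (recall * weights).sum(), (f1 * weights).sum()

def multiclass_mcc(cm):
    """Multiclass Matthews correlation coefficient from a confusion matrix."""
    cm = np.asarray(cm, dtype=float)
    t = cm.sum(axis=1)    # true counts per class
    p = cm.sum(axis=0)    # predicted counts per class
    c = np.trace(cm)      # correctly classified samples
    s = cm.sum()          # total samples
    den = np.sqrt((s**2 - p @ p) * (s**2 - t @ t))
    return (c * s - t @ p) / den if den > 0 else 0.0

# Hypothetical 4-class confusion matrix (not results from this notebook)
cm_demo = np.array([
    [45,  5,  0,  0],
    [ 4, 42,  4,  0],
    [ 0,  5, 41,  4],
    [ 0,  0,  3, 47],
])
precision_w, recall_w, f1_w = weighted_scores(cm_demo)
mcc_demo = multiclass_mcc(cm_demo)
print(precision_w, recall_w, f1_w, mcc_demo)
```

Note that support-weighted recall always equals plain accuracy, so the two numbers will match in the evaluation output.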

In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, label_binarize
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score, roc_auc_score, precision_score,
    recall_score, f1_score, matthews_corrcoef,
    confusion_matrix, classification_report
)
import pickle
import warnings
warnings.filterwarnings('ignore')

print('✓ All libraries imported successfully')

In [None]:
# Load dataset
print('Loading dataset...')
df = pd.read_csv('../data/train.csv')

print(f'✓ Dataset loaded: {df.shape}')
print(f'  Features: {df.shape[1] - 1}')
print(f'  Samples: {df.shape[0]}')

# Display first few rows
df.head()

In [None]:
# Target distribution
print('Target Distribution:')
print(df['price_range'].value_counts().sort_index())

# Visualize
plt.figure(figsize=(8, 5))
df['price_range'].value_counts().sort_index().plot(
    kind='bar',
    color=['#3498db', '#2ecc71', '#f39c12', '#e74c3c']
)
plt.title('Price Range Distribution', fontsize=14, fontweight='bold')
plt.xlabel('Price Range')
plt.ylabel('Count')
plt.xticks([0, 1, 2, 3], ['Low', 'Medium', 'High', 'Very High'], rotation=0)
plt.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.show()

In [None]:
# Separate features and target
X = df.drop('price_range', axis=1)
y = df['price_range']

# Split data (80-20)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(f'Training samples: {X_train.shape[0]}')
print(f'Testing samples: {X_test.shape[0]}')
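
The `stratify=y` argument keeps the class proportions the same in both splits. The sketch below illustrates the idea with plain NumPy on hypothetical balanced labels; it is not scikit-learn's implementation, and `stratified_split_indices` is a made-up helper name.

```python
import numpy as np

def stratified_split_indices(y, test_size=0.2, seed=42):
    """Pick a test_size fraction of indices from every class separately."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    test_idx = []
    for cls in np.unique(y):
        cls_idx = np.flatnonzero(y == cls)
        rng.shuffle(cls_idx)                       # shuffle within the class
        n_test = int(round(test_size * len(cls_idx)))
        test_idx.extend(cls_idx[:n_test])
    test_idx = np.sort(np.array(test_idx))
    train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
    return train_idx, test_idx

# Hypothetical labels: 4 classes with 50 samples each
y_demo = np.repeat([0, 1, 2, 3], 50)
train_idx, test_idx = stratified_split_indices(y_demo)

print('Train class counts:', np.bincount(y_demo[train_idx]))
print('Test class counts: ', np.bincount(y_demo[test_idx]))
```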

In [None]:
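# Feature scaling: not strictly required, but it helps the solver converge
# and puts coefficients on a comparable scale. Fit on the training set only.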
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print('✓ Features scaled using StandardScaler')

# Save scaler
with open('../scaler.pkl', 'wb') as f:
    pickle.dump(scaler, f)
print('✓ Scaler saved')
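
The scaler is fitted on the training split and then reused on the test split, so no test-set statistics leak into preprocessing. A minimal NumPy sketch of the same idea, using hypothetical random data, looks like this:

```python
import numpy as np

# Hypothetical data standing in for the real train/test feature matrices
rng = np.random.default_rng(0)
X_train_demo = rng.normal(loc=50.0, scale=10.0, size=(80, 3))
X_test_demo = rng.normal(loc=50.0, scale=10.0, size=(20, 3))

# Statistics come from the training data only
mean = X_train_demo.mean(axis=0)
std = X_train_demo.std(axis=0)   # population std (ddof=0), as StandardScaler uses

# The same training statistics are applied to both splits
X_train_std = (X_train_demo - mean) / std
X_test_std = (X_test_demo - mean) / std

print('Train mean per feature:', X_train_std.mean(axis=0).round(6))
print('Test mean per feature: ', X_test_std.mean(axis=0).round(6))
```

The standardized training features have zero mean and unit variance by construction, while the test features are only approximately centered, which is expected.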

In [None]:
# Train Logistic Regression
print('Training model...')

# n_jobs is omitted: it has no effect on the default multinomial lbfgs fit
model = LogisticRegression(max_iter=1000, random_state=42)
model.fit(X_train_scaled, y_train)

print('✓ Training completed!')
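
With the default `lbfgs` solver and a four-class target, recent scikit-learn versions fit a multinomial (softmax) model. The sketch below shows how class probabilities follow from a coefficient matrix and intercept vector; `coef_demo`, `intercept_demo`, and `X_demo` are hypothetical values, not the fitted parameters.

```python
import numpy as np

def softmax_proba(X, coef, intercept):
    """Class probabilities for a multinomial logistic regression model."""
    scores = X @ coef.T + intercept                 # shape: (n_samples, n_classes)
    scores -= scores.max(axis=1, keepdims=True)     # subtract row max for numerical stability
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)

# Hypothetical parameters: 3 classes, 2 (already scaled) features
coef_demo = np.array([
    [ 1.0, -0.5],
    [ 0.0,  0.2],
    [-1.0,  0.3],
])
intercept_demo = np.array([0.1, 0.0, -0.1])
X_demo = np.array([[2.0, 0.0], [-2.0, 1.0]])

proba = softmax_proba(X_demo, coef_demo, intercept_demo)
pred = proba.argmax(axis=1)   # predicted class = highest probability
print(proba.round(4))
print(pred)
```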

In [None]:
# Make predictions
y_pred = model.predict(X_test_scaled)

# Get probabilities if available
if hasattr(model, 'predict_proba'):
    y_pred_proba = model.predict_proba(X_test_scaled)
else:
    y_pred_proba = None

print('✓ Predictions completed')

In [None]:
# Calculate all metrics
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average='weighted', zero_division=0)
recall = recall_score(y_test, y_pred, average='weighted', zero_division=0)
f1 = f1_score(y_test, y_pred, average='weighted', zero_division=0)
mcc = matthews_corrcoef(y_test, y_pred)

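# AUC Score (one-vs-rest, weighted by class support)
if y_pred_proba is not None:
    # roc_auc_score accepts the integer labels directly for multiclass targets
    auc = roc_auc_score(y_test, y_pred_proba, multi_class='ovr', average='weighted')
else:
    # Report "not available" rather than 0.0, which would read as a real score
    auc = np.nan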

# Display
print('='*60)
print('EVALUATION METRICS - LOGISTIC REGRESSION')
print('='*60)
print(f'Accuracy:  {accuracy:.4f}')
print(f'AUC Score: {auc:.4f}')
print(f'Precision: {precision:.4f}')
print(f'Recall:    {recall:.4f}')
print(f'F1 Score:  {f1:.4f}')
print(f'MCC Score: {mcc:.4f}')
print('='*60)
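
The AUC above is a one-vs-rest average weighted by class support. As a rough from-scratch sketch of what that means, each class is scored against all others with a pairwise-comparison AUC and the results are averaged by the number of true samples per class. The data and helper names are hypothetical, and the sketch assumes column `k` of the probability matrix corresponds to the `k`-th sorted class label.

```python
import numpy as np

def binary_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size

def ovr_weighted_auc(y_true, proba):
    """One-vs-rest AUC per class, averaged with class-support weights."""
    classes = np.unique(y_true)
    aucs, weights = [], []
    for col, cls in enumerate(classes):
        is_cls = (y_true == cls).astype(int)     # current class vs. the rest
        aucs.append(binary_auc(is_cls, proba[:, col]))
        weights.append(is_cls.sum())
    return np.average(aucs, weights=weights)

# Hypothetical labels and probabilities for 3 classes
y_demo = np.array([0, 0, 1, 1, 2, 2])
proba_demo = np.array([
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.7, 0.2],
    [0.1, 0.2, 0.7],
    [0.2, 0.1, 0.7],
])
auc_demo = ovr_weighted_auc(y_demo, proba_demo)
print(auc_demo)
```

The pairwise comparison is quadratic in the number of samples, so it is only suitable for small illustrations like this one.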

In [None]:
# Confusion Matrix
cm = confusion_matrix(y_test, y_pred)

plt.figure(figsize=(10, 8))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=['Low', 'Medium', 'High', 'Very High'],
            yticklabels=['Low', 'Medium', 'High', 'Very High'],
            cbar_kws={'label': 'Count'})
plt.xlabel('Predicted Label', fontsize=12, fontweight='bold')
plt.ylabel('True Label', fontsize=12, fontweight='bold')
plt.title('Confusion Matrix - Logistic Regression', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

In [None]:
# Classification Report
print('Classification Report:')
print('='*60)
report = classification_report(
    y_test, y_pred,
    target_names=['Low', 'Medium', 'High', 'Very High'],
    digits=4
)
print(report)

In [None]:
# Feature Importance (Coefficients)
# coef_ has one row per class, so average the absolute values across all
# four classes instead of reading only the row for class 0.
feature_importance = pd.DataFrame({
    'Feature': X.columns,
    'Coefficient': np.abs(model.coef_).mean(axis=0)
}).sort_values('Coefficient', ascending=False)

print('Top 10 Features:')
print(feature_importance.head(10))

# Visualize
plt.figure(figsize=(10, 6))
top10 = feature_importance.head(10)
plt.barh(range(len(top10)), top10['Coefficient'])
plt.yticks(range(len(top10)), top10['Feature'])
plt.xlabel('Mean Absolute Coefficient (across classes)')
plt.title('Feature Importance')
plt.gca().invert_yaxis()
plt.tight_layout()
plt.show()
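
In a multiclass model, `coef_` holds one row of coefficients per class, so any single row describes only that class. The sketch below, using a hypothetical coefficient matrix and placeholder feature names, shows one way to look at each class separately and at an aggregate across classes.

```python
import numpy as np

# Placeholder names and values, not taken from the fitted model
feature_names = ['feature_a', 'feature_b', 'feature_c']
coef_demo = np.array([
    [-3.0, -0.8, -0.4],   # class 0
    [-1.0, -0.2,  0.1],   # class 1
    [ 1.0,  0.3, -0.1],   # class 2
    [ 3.0,  0.7,  0.4],   # class 3
])

# Strongest feature (by absolute coefficient) for each class
top_per_class = {
    cls: feature_names[int(np.argmax(np.abs(row)))]
    for cls, row in enumerate(coef_demo)
}

# One aggregate score per feature: mean absolute coefficient over classes
mean_abs = np.abs(coef_demo).mean(axis=0)

print(top_per_class)
print(dict(zip(feature_names, mean_abs)))
```

Because the features were standardized before training, coefficient magnitudes are comparable across features; without scaling they would mostly reflect each feature's units.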

In [None]:
# Save model and results
with open('../logistic_regression.pkl', 'wb') as f:
    pickle.dump(model, f)
print('✓ Model saved to: ../logistic_regression.pkl')

# Save results
results_df = pd.DataFrame([{
    'Model': 'Logistic Regression',
    'Accuracy': round(accuracy, 4),
    'AUC': round(auc, 4),
    'Precision': round(precision, 4),
    'Recall': round(recall, 4),
    'F1': round(f1, 4),
    'MCC': round(mcc, 4)
}])

results_df.to_csv('../logistic_regression_results.csv', index=False)
print('✓ Results saved')

results_df

## Summary

### Logistic Regression Model Results:

**Performance Metrics:**
- Accuracy, AUC, Precision, Recall, F1, and MCC are printed in the evaluation cell above and stored in the results CSV.

**Files Saved (in the parent directory):**
- Scaler: `scaler.pkl`
- Model: `logistic_regression.pkl`
- Results: `logistic_regression_results.csv`

**Next Steps:**
- Run other model notebooks
- Compare all results
- Deploy best model