Submitted by:
ירדן בן טל, ID: 308057785,
דניאל כהן, ID: 211377932,
יוסף כהן, ID: 208259002

In [132]:
import numpy as np
import time
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, confusion_matrix, log_loss,accuracy_score
from sklearn.model_selection import train_test_split

In [133]:
def testmymodel(model, X_data, Y_data):
    predictions = predict_with_model(model, X_data)

    return accuracy_score(Y_data, predictions)


In [134]:
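def predict_with_model(model, X_data):
    # A list is treated as an OvR ensemble: one binary classifier per class.
    if isinstance(model, list):
        # Row i holds classifier i's positive-class probability for every sample.
        probs = np.array([clf.predict_proba(X_data)[:, 1] for clf in model])
        # argmax returns a list index, which equals the class label only when
        # the classifiers are ordered by label 0..K-1.
        return np.argmax(probs, axis=0)
    return model.predict(X_data)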

In [135]:
X = np.load("cifar10_features.npy")
Y = np.load("cifar10_labels.npy")

X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=42)

In [136]:
# OvR Logistic Regression
start_time_ovr = time.time()
model_ovr = []
classes = np.unique(y_train)  # sorted unique labels, so classifier order matches label order

for target_class in classes:
    # Binary labels: 1 for current class, 0 for others
    binary_labels = (y_train == target_class).astype(int)

    clf = LogisticRegression(solver='lbfgs', max_iter=50000)
    clf.fit(X_train, binary_labels)
    model_ovr.append(clf)

end_time_ovr = time.time()

# Softmax Logistic Regression
start_time_softmax = time.time()
model_softmax = LogisticRegression(solver='newton-cg', max_iter=50000)
model_softmax.fit(X_train, y_train)
end_time_softmax = time.time()


ovr_success = testmymodel(model_ovr, X_test, y_test)
softmax_success = testmymodel(model_softmax, X_test, y_test)
print(f"OvR success rate: {ovr_success*100:.2f}%, OvR time: {end_time_ovr - start_time_ovr}")
print(f"softmax success rate: {softmax_success*100:.2f}%, softmax time: {end_time_softmax - start_time_softmax}")

OvR success rate: 96.10%, OvR time: 0.6329994201660156
softmax success rate: 96.29%, softmax time: 1.158998966217041


### Results – Accuracy & Training Time
| Model   | Accuracy   | Training Time (s) |
|---------|------------|-------------------|
| OvR     | **96.10%** | **0.6330**        |
| Softmax | **96.29%** | **1.1590**        |

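#### Softmax attains slightly higher accuracy (~0.19 percentage points) but requires ~83% more training time (1.16 s vs 0.63 s). The two models use different solvers (`lbfgs` vs `newton-cg`), so the timing gap reflects the solver choice as well as the formulation.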
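The gap figures can be re-derived from the printed output. This is a minimal sketch; the variable names are illustrative and the values are copied by hand from the run above, not recomputed from the models.

```python
# Values copied from the printed output of the training cell.
acc_ovr, acc_softmax = 96.10, 96.29                              # accuracy, percent
time_ovr, time_softmax = 0.6329994201660156, 1.158998966217041   # seconds

acc_gap_pp = acc_softmax - acc_ovr            # difference in percentage points
time_overhead = time_softmax / time_ovr - 1   # relative extra training time

print(f"Accuracy gap: {acc_gap_pp:.2f} percentage points")
print(f"Softmax training-time overhead: {time_overhead:.0%}")
```

A single timed run is noisy at sub-second scale, so repeating the fit several times and averaging would give a more stable overhead estimate.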


In [137]:
y_ovr_predict = predict_with_model(model_ovr, X_test)
y_softmax_predict = predict_with_model(model_softmax, X_test)
probs = np.array([model.predict_proba(X_test)[:, 1] for model in model_ovr])
ovr_predict_proba = probs.T
ovr_predict_proba /= ovr_predict_proba.sum(axis=1, keepdims=True)
ovr_loss = log_loss(y_test, ovr_predict_proba)

softmax_predict_proba = model_softmax.predict_proba(X_test)
softmax_loss = log_loss(y_test, softmax_predict_proba)
print(f"OvR_loss: {ovr_loss:.4f}, softmax_loss: {softmax_loss:.4f}")

OvR_loss: 0.1346, softmax_loss: 0.1077


### Cross‑Entropy (Log‑Loss)
| Model | Log‑Loss   |
|-------|------------|
| OvR   | **0.1346** |
| Softmax | **0.1077** |

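#### Lower log‑loss means the model assigns higher probability to the true class on average, so Softmax gives the more reliable confidence scores here. The OvR probabilities are renormalised scores from independent binary classifiers, which may partly explain their higher loss.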
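To make the metric concrete, the sketch below computes multi-class log-loss by hand and compares it with `sklearn.metrics.log_loss`. The helper `manual_log_loss` and the toy arrays are hypothetical and unrelated to the CIFAR-10 features; the helper assumes integer labels 0..K-1.

```python
import numpy as np
from sklearn.metrics import log_loss

def manual_log_loss(y_true, proba, eps=1e-15):
    """Mean negative log-probability assigned to the true class.

    Assumes labels are the integers 0..K-1 and proba has shape (n_samples, K).
    """
    proba = np.clip(proba, eps, 1 - eps)               # avoid log(0)
    true_class_proba = proba[np.arange(len(y_true)), y_true]
    return -np.mean(np.log(true_class_proba))

# Toy data: both sets of probabilities yield the same (perfect) argmax predictions.
y_toy = np.array([0, 1, 2, 1])
confident = np.array([[0.90, 0.05, 0.05],
                      [0.05, 0.90, 0.05],
                      [0.05, 0.05, 0.90],
                      [0.05, 0.90, 0.05]])
hesitant = np.array([[0.50, 0.25, 0.25],
                     [0.25, 0.50, 0.25],
                     [0.25, 0.25, 0.50],
                     [0.25, 0.50, 0.25]])

loss_confident = manual_log_loss(y_toy, confident)
loss_hesitant = manual_log_loss(y_toy, hesitant)
print(f"confident: {loss_confident:.4f}, hesitant: {loss_hesitant:.4f}")
print("matches sklearn:", np.isclose(loss_confident, log_loss(y_toy, confident, labels=[0, 1, 2])))
```

Both toy models have identical accuracy, yet their log-loss differs, which is why log-loss can separate OvR and Softmax even when accuracy barely does.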


In [138]:
f1_mean_ovr = f1_score(y_test, y_ovr_predict, average='macro')
f1_mean_softmax = f1_score(y_test, y_softmax_predict, average='macro')
print(f"f1_mean_ovr: {f1_mean_ovr:.4f}")
print(f"f1_mean_softmax: {f1_mean_softmax:.4f}")

f1_mean_ovr: 0.9612
f1_mean_softmax: 0.9631


### F1‑Macro
| Model | F1‑Macro   |
|-------|------------|
| OvR   | **0.9612** |
| Softmax | **0.9631** |

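#### The difference is small (0.0019). Because the test classes are roughly balanced, macro‑F1 closely tracks accuracy; no significance test was run, so the gap should be treated as suggestive only.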
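Whether a macro-F1 gap of this size is meaningful could be checked with a paired bootstrap over the test set. The sketch below is one possible approach, not something run in this notebook; `bootstrap_f1_diff` is a hypothetical helper and the demo labels are synthetic.

```python
import numpy as np
from sklearn.metrics import f1_score

def bootstrap_f1_diff(y_true, pred_a, pred_b, n_boot=1000, seed=0):
    """Percentile 95% interval for macro-F1(pred_b) - macro-F1(pred_a)."""
    rng = np.random.default_rng(seed)
    y_true, pred_a, pred_b = map(np.asarray, (y_true, pred_a, pred_b))
    n = len(y_true)
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample test indices with replacement
        diffs[i] = (f1_score(y_true[idx], pred_b[idx], average="macro")
                    - f1_score(y_true[idx], pred_a[idx], average="macro"))
    return np.percentile(diffs, [2.5, 97.5])

# Sanity check on synthetic labels: identical predictions give a zero-width interval.
demo_rng = np.random.default_rng(1)
y_demo = demo_rng.integers(0, 3, size=200)
low, high = bootstrap_f1_diff(y_demo, y_demo, y_demo, n_boot=50)
print(low, high)
```

With the notebook's variables the call would be `bootstrap_f1_diff(y_test, y_ovr_predict, y_softmax_predict)`; an interval that contains 0 would indicate the observed gap is within resampling noise.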

In [139]:
conf_mat = confusion_matrix(y_test,y_ovr_predict)
print(conf_mat)

[[1402    2   13    5    5    3    2   10   17    5]
 [   2 1464    4    1    0    1    0    1    5    7]
 [  11    1 1370   15   16   15    5    3    1    3]
 [  14    5   15 1450   12   51    5   10    3    4]
 [   6    0   13   15 1466    5    4    9    0    1]
 [   2    3   13   45   12 1437    8   11    2    1]
 [   5    5    9   17    4    6 1415    0    1    1]
 [   3    0    4   17    9    7    0 1456    0    1]
 [  13    2    1    5    1    0    2    0 1480    6]
 [  12    7    5    6    0    4    2    3    5 1475]]


### Confusion Matrix (OvR)
*Key misclassifications (true → predicted)*
- **Class 3 → 5**: 51 samples
- **Class 5 → 3**: 45 samples

#### These roughly symmetric errors suggest classes 3 and 5 share similar feature representations.
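The dominant confusions can be read off programmatically. The sketch below copies the printed OvR matrix and uses a hypothetical helper, `top_confusions`, to list the largest off-diagonal cells. In scikit-learn's `confusion_matrix`, rows are true labels and columns are predicted labels.

```python
import numpy as np

# OvR confusion matrix copied from the printed output (rows: true, columns: predicted).
conf_mat = np.array([
    [1402,    2,   13,    5,    5,    3,    2,   10,   17,    5],
    [   2, 1464,    4,    1,    0,    1,    0,    1,    5,    7],
    [  11,    1, 1370,   15,   16,   15,    5,    3,    1,    3],
    [  14,    5,   15, 1450,   12,   51,    5,   10,    3,    4],
    [   6,    0,   13,   15, 1466,    5,    4,    9,    0,    1],
    [   2,    3,   13,   45,   12, 1437,    8,   11,    2,    1],
    [   5,    5,    9,   17,    4,    6, 1415,    0,    1,    1],
    [   3,    0,    4,   17,    9,    7,    0, 1456,    0,    1],
    [  13,    2,    1,    5,    1,    0,    2,    0, 1480,    6],
    [  12,    7,    5,    6,    0,    4,    2,    3,    5, 1475],
])

def top_confusions(cm, k=3):
    """Return the k largest off-diagonal cells as (true, predicted, count)."""
    off_diag = cm.copy()
    np.fill_diagonal(off_diag, 0)                       # ignore correct predictions
    flat_order = np.argsort(off_diag, axis=None)[::-1][:k]
    rows, cols = np.unravel_index(flat_order, off_diag.shape)
    return [(int(r), int(c), int(off_diag[r, c])) for r, c in zip(rows, cols)]

pairs = top_confusions(conf_mat, k=2)
print(pairs)  # [(3, 5, 51), (5, 3, 45)]
```

The matrix sums to 15,000 test samples with 14,415 on the diagonal, which matches the reported 96.10% accuracy.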


In [140]:
mask = (y_train == 3) | (y_train == 5)
X_pair = X_train[mask]
Y_pair = y_train[mask]
binary_model = LogisticRegression(solver='lbfgs', max_iter=50000)
binary_model.fit(X_pair, Y_pair)

# Start from the OvR predictions and re-decide only the samples predicted as 3 or 5
refined_predictions = np.array(y_ovr_predict)
pair_mask = np.isin(refined_predictions, [3, 5])

if pair_mask.any():
    # One batched call replaces the per-sample loop; predictions are unchanged
    refined_predictions[pair_mask] = binary_model.predict(X_test[pair_mask])

# Evaluate
acc = accuracy_score(y_test, refined_predictions)
f1 = f1_score(y_test, refined_predictions, average='macro')

print(f"Refined Accuracy: {acc*100:.2f}%")
print(f"Refined F1-Mean: {f1:.4f}")

Refined Accuracy: 96.09%
Refined F1-Mean: 0.9611


### Binary Refinement (Classes 3 vs 5)
| Metric | Before Refinement | After Refinement |
|--------|-------------------|------------------|
| Accuracy | 96.10%            | 96.09%           |
| F1‑Macro | 0.9612            | 0.9611           |

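#### The specialised binary classifier did not improve the overall metrics: accuracy and F1‑macro are **slightly lower** (by 0.01 percentage points and 0.0001). Since the refinement only changes samples predicted as 3 or 5, any 3 ↔ 5 corrections it made were outweighed by new errors it introduced. This suggests: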

#### 1. **Sample‑volume effect** – only ~0.6% of the test set (96 of 15,000 samples) involves a 3 ↔ 5 misclassification, so correcting them cannot materially shift the macro metrics.
#### 2. **Model overlap** – the binary classifier learns essentially the same decision boundary already captured (imperfectly) by the original OvR models.
#### 3. **Over‑fitting risk** – a classifier trained on the restricted subset may lack generalisation and occasionally override correct OvR decisions.

#### Given the lack of any improvement and the added complexity, the original OvR ensemble remains the preferable solution for this task.
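One way to see how corrections and new errors trade off is to count, sample by sample, which predictions the refinement fixed and which it broke. The sketch below uses a hypothetical helper, `refinement_effect`, on toy labels; it was not run on the notebook's data.

```python
import numpy as np

def refinement_effect(y_true, before, after):
    """Count predictions a refinement step fixed or broke, and the net change."""
    y_true, before, after = map(np.asarray, (y_true, before, after))
    fixed = int(np.sum((before != y_true) & (after == y_true)))    # wrong -> right
    broken = int(np.sum((before == y_true) & (after != y_true)))   # right -> wrong
    return {"fixed": fixed, "broken": broken, "net": fixed - broken}

# Toy labels (hypothetical): one error is fixed, two correct predictions are broken.
y_true_demo = np.array([3, 5, 3, 5, 3, 7])
before_demo = np.array([5, 5, 3, 3, 3, 7])
after_demo = np.array([3, 3, 3, 3, 5, 7])

effect = refinement_effect(y_true_demo, before_demo, after_demo)
print(effect)  # {'fixed': 1, 'broken': 2, 'net': -1}
```

With the notebook's variables the call would be `refinement_effect(y_test, y_ovr_predict, refined_predictions)`; a negative `net` corresponds to the small accuracy drop reported in the table.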



### 📊 Final Performance Comparison

| **Metric**             | **OvR**     | **Softmax** | **Absolute difference** |
|------------------------|-------------|-------------|-------------------------|
| **Accuracy**           | 0.9610      | 0.9629      | 0.0019                  |
| **F1 Score (Macro)**   | 0.9612      | 0.9631      | 0.0019                  |
| **Log-Loss**           | 0.1346      | 0.1077      | 0.0269                  |
| **Training Time (sec)**| 0.6330      | 1.1590      | 0.5260                  |

---

### 🧾 Final Conclusion

#### - Both models achieve **high and nearly identical classification performance** (Accuracy and F1 Score differ by less than 0.2 percentage points); the clearer gap is in log-loss.
#### - **Softmax** consistently provides:
   - #### **Slightly better classification quality** (higher F1, lower log-loss),
   - #### But at the cost of **~1.8× longer training time**.
#### - **OvR (One-vs-Rest)** is a strong, faster alternative with comparable results, making it preferable in runtime-sensitive scenarios.
#### - The refinement experiment (targeting confusion between classes 3 & 5) yielded **no improvement** (accuracy moved from 96.10% to 96.09%), so the extra binary step is not worth its added complexity.

---

#### ✅ Recommendation

#### - Use **Softmax** if:
   - #### You prioritize lower log-loss, i.e. more reliable class-probability estimates.
   - #### Training time is not a constraint.

#### - Use **OvR** if:
   - #### You prefer faster training and interpretable binary classifiers.
   - #### The small drop in performance is acceptable for your application.