Visualize the confusion matrix. Define separate variables for the balanced and the unbalanced model so that both confusion matrices can be visualized.

In [None]:
import os
import glob
import cv2 as cv
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
import joblib


In [7]:
def extract_features(image_path):
    img = cv.imread(image_path, cv.IMREAD_GRAYSCALE)
    if img is None:
        print(f"⚠️ Could not load image: {image_path}")
        return None
    img = cv.resize(img, (100, 30))
    blur = cv.GaussianBlur(img, (5, 5), 0)
    edges = cv.Canny(blur, 30, 100)
    hist = cv.calcHist([edges], [0], None, [16], [0, 256])
    return hist.flatten()
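
One property of this feature vector may be worth checking. `cv.Canny` returns a binary edge map whose pixels are either 0 or 255, so a 16-bin histogram over `[0, 256)` can only fill the first and the last bin. The sketch below illustrates this with a synthetic edge map; the random image and the use of `np.histogram` in place of `cv.calcHist` are assumptions made so that the example runs without OpenCV or the dataset.

```python
import numpy as np

# Synthetic stand-in for a cv.Canny result: same shape as the resized image
# (30 rows x 100 columns), containing only the values 0 and 255.
rng = np.random.default_rng(0)
edges = np.where(rng.random((30, 100)) < 0.1, 255, 0).astype(np.uint8)

# np.histogram is used here in place of cv.calcHist; both split the range
# 0..256 into 16 equal-width bins.
hist, _ = np.histogram(edges, bins=16, range=(0, 256))

print(hist)
print("non-zero bins:", np.flatnonzero(hist))
```

If the same holds for the real images, the 16 values effectively encode a single number, the count of edge pixels, which could limit how well the classifier separates the classes.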


In [8]:
features = []
labels = []

base_path = "dataset"
ok_folder = "ok"
defect_folders = ["1_leer", "5_ObjektInSpritze", "6_kleineBlase", "7_grosseBlase", "8_Stoerung"]

# OK
ok_path = os.path.join(base_path, ok_folder)
for filepath in glob.glob(os.path.join(ok_path, "*.jpg")):
    f = extract_features(filepath)
    if f is not None:
        features.append(f)
        labels.append("ok")

# DEFECT
for folder in defect_folders:
    folder_path = os.path.join(base_path, folder)
    for filepath in glob.glob(os.path.join(folder_path, "*.jpg")):
        f = extract_features(filepath)
        if f is not None:
            features.append(f)
            labels.append("defect")

print(f"{len(features)} images processed. {labels.count('ok')} OK / {labels.count('defect')} DEFECT")


4305 images processed. 789 OK / 3516 DEFECT


## 🧪 Model comparison: standard vs. weighted classification model

To improve classification performance, we tested several model configurations.

### 🔹 1. Standard model without class weighting

```python
clf = RandomForestClassifier(n_estimators=100, random_state=42)
```

This model was trained without considering the class imbalance. As a result:

- Very high recall for the `defect` class: 0.99
- Very low recall for the `ok` class: 0.14
- F1-score for `ok` was only 0.24

**Conclusion:** The model favors the majority class `defect` and largely misses the minority class `ok`.


In [None]:
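X_train, X_test, y_train, y_test = train_test_split(features, labels, test_size=0.2, random_state=42)

# Unweighted baseline, kept under its own name so the weighted model below does not overwrite it
clf_unbalanced = RandomForestClassifier(n_estimators=100, random_state=42)
clf_unbalanced.fit(X_train, y_train)

y_pred_unbalanced = clf_unbalanced.predict(X_test)
print(classification_report(y_test, y_pred_unbalanced))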

              precision    recall  f1-score   support

      defect       0.82      0.99      0.90       692
          ok       0.71      0.14      0.24       169

    accuracy                           0.82       861
   macro avg       0.77      0.56      0.57       861
weighted avg       0.80      0.82      0.77       861
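
To put the 0.82 accuracy in context, it helps to compare it with a trivial classifier that always predicts the majority class. The following sketch does this arithmetic from the support values in the report above; the variable names are illustrative, and the balanced accuracy is only approximate because it is computed from the rounded recalls.

```python
# Test-set support per class, taken from the report above.
support = {"defect": 692, "ok": 169}
n_test = sum(support.values())

# A trivial classifier that always predicts the majority class gets every
# majority sample right and every minority sample wrong.
majority_class = max(support, key=support.get)
baseline_accuracy = support[majority_class] / n_test

# Balanced accuracy is the mean of the per-class recalls. The recalls are the
# rounded values from the report, so the result is approximate.
recall = {"defect": 0.99, "ok": 0.14}
balanced_accuracy = sum(recall.values()) / len(recall)

print(f"majority class: {majority_class}")
print(f"always-{majority_class} accuracy: {baseline_accuracy:.3f}")
print(f"balanced accuracy (approx.): {balanced_accuracy:.3f}")
```

The always-`defect` baseline already reaches about 0.80, so the unweighted model gains very little over it.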



### 🔹 2. Weighted model with `class_weight="balanced"`

Next, we trained the model with `class_weight="balanced"` to compensate for the class imbalance. This improved the model's ability to recognize the minority class:

- Recall for the `defect` class decreased from 0.99 to 0.63
- Recall for the `ok` class increased significantly, from 0.14 to 0.72
- F1-score for `ok` improved from 0.24 to 0.45

**Conclusion:** This configuration achieves a more balanced performance across both classes, although overall accuracy drops from 0.82 to 0.65 and precision for `ok` is only 0.32. We consider it better suited for practical quality control, where both false negatives and false positives need to be minimized.
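
As a rough illustration of what `class_weight="balanced"` does, the sketch below applies the heuristic documented by scikit-learn, `n_samples / (n_classes * count_per_class)`, to the training-set class counts. Those counts are not printed anywhere in the notebook; they are derived here as the dataset totals minus the test-set support, which is an assumption about the split.

```python
# Images per class in the full dataset and in the test split (from the cell outputs).
total = {"defect": 3516, "ok": 789}
test = {"defect": 692, "ok": 169}

# Assumption: whatever is not in the test split was used for training.
train = {name: total[name] - test[name] for name in total}

n_samples = sum(train.values())
n_classes = len(train)

# "balanced" heuristic: weight_c = n_samples / (n_classes * count_c)
weights = {name: n_samples / (n_classes * count) for name, count in train.items()}

for name in train:
    print(f"{name:>6}: {train[name]:4d} training images, weight {weights[name]:.3f}")
```

Under these assumptions, each `ok` sample counts roughly 4.5 times as much as a `defect` sample during training, which is consistent with the shift in recall toward `ok`.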


In [12]:
X_train, X_test, y_train, y_test = train_test_split(features, labels, test_size=0.2, random_state=42)

clf = RandomForestClassifier(n_estimators=100, class_weight="balanced", random_state=42)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
print(classification_report(y_test, y_pred))


              precision    recall  f1-score   support

      defect       0.90      0.63      0.74       692
          ok       0.32      0.72      0.45       169

    accuracy                           0.65       861
   macro avg       0.61      0.68      0.59       861
weighted avg       0.79      0.65      0.68       861
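
With imbalanced classes, it can also help to keep the class ratio identical in the training and test sets. A minimal sketch, assuming the `stratify` parameter of `train_test_split` is used; the zero-filled feature array is a placeholder so that the example runs without the dataset.

```python
from collections import Counter

import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data with the class counts reported above; the feature values
# are dummies because only the label distribution matters here.
labels = ["ok"] * 789 + ["defect"] * 3516
features = np.zeros((len(labels), 16))

# stratify=labels keeps the ok/defect ratio the same in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=42, stratify=labels
)

print("train:", Counter(y_train))
print("test: ", Counter(y_test))
```

Both splits then contain about 18 % `ok` samples, so the per-class metrics do not depend on how many `ok` images the random split happens to place in the test set.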



Visualization
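
A side-by-side plot of the two confusion matrices could look roughly like the following sketch. It uses synthetic, imbalanced data from `make_classification` so that it runs on its own; the names `models` and `matrices` and the display labels are illustrative assumptions, not part of the notebook.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the syringe features (about 80 % class 0).
X, y = make_classification(n_samples=1000, n_features=16, weights=[0.8, 0.2], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# One model per configuration, each kept under its own name.
models = {
    "unbalanced": RandomForestClassifier(n_estimators=100, random_state=42),
    "balanced": RandomForestClassifier(n_estimators=100, class_weight="balanced", random_state=42),
}

matrices = {}
fig, axes = plt.subplots(1, 2, figsize=(9, 4))
for ax, (name, model) in zip(axes, models.items()):
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    # Rows are true classes, columns are predicted classes.
    matrices[name] = confusion_matrix(y_test, y_pred, labels=[0, 1])
    display = ConfusionMatrixDisplay(matrices[name], display_labels=["majority", "minority"])
    display.plot(ax=ax, colorbar=False)
    ax.set_title(name)

fig.tight_layout()
plt.show()
```

In the notebook, the two fitted classifiers and the real `X_test` and `y_test` would take the place of the synthetic data, with `labels=["defect", "ok"]` passed to `confusion_matrix`.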

In [10]:
joblib.dump(clf, "syringe_model.pkl")
print("✅ Model saved to 'syringe_model.pkl'")


✅ Model saved to 'syringe_model.pkl'
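
For completeness, here is a minimal sketch of how a model saved this way could be loaded and used again. The tiny model trained on random features and the temporary directory are assumptions made so that the example is self-contained; in practice the file written above would be loaded instead.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Tiny stand-in model on random 16-dimensional features, only so that the
# example runs without the dataset.
rng = np.random.default_rng(0)
X = rng.random((40, 16))
y = np.array(["ok", "defect"] * 20)
clf = RandomForestClassifier(n_estimators=10, random_state=42).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    model_path = os.path.join(tmp, "syringe_model.pkl")
    joblib.dump(clf, model_path)
    # joblib.load unpickles the file, so only load files from a trusted source.
    loaded = joblib.load(model_path)

# predict expects a 2-D array: one row per image, 16 histogram bins per row.
sample = X[:3]
same = bool(np.array_equal(clf.predict(sample), loaded.predict(sample)))
print("loaded model reproduces the predictions:", same)
```

A single feature vector returned by `extract_features` would need to be reshaped to one row, for example with `f.reshape(1, -1)`, before it is passed to `predict`.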
