# Project 1 — Wine Quality (Red + White)
## Supervised Learning

**Task:** Multiclass classification — quality (discrete rating/class)  
**Metrics:** Macro-F1 and accuracy at minimum, plus a confusion matrix and a discussion of per-class performance.  
**Workflow:** EDA → Hypotheses → Train/tune (DT, kNN, SVM, NN-sklearn, NN-PyTorch) → Interpretation → Conclusion
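When the quality classes are imbalanced, accuracy and macro-F1 can disagree sharply. The toy example below uses hypothetical labels (not taken from the dataset) to compute both from a confusion matrix, with per-class F1 = 2TP / (2TP + FP + FN) averaged without weighting. `confusion_counts`, `per_class_f1`, and `macro_f1` are illustrative helpers. On the same labels, the macro value should match scikit-learn's `f1_score(..., average="macro")`.

```python
import numpy as np

def confusion_counts(y_true, y_pred, labels):
    """Rows = true class, columns = predicted class."""
    index = {label: i for i, label in enumerate(labels)}
    cm = np.zeros((len(labels), len(labels)), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[index[t], index[p]] += 1
    return cm

def per_class_f1(cm):
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp  # predicted as the class, but wrong
    fn = cm.sum(axis=1) - tp  # truly in the class, but missed
    denom = 2 * tp + fp + fn
    # F1 = 2TP / (2TP + FP + FN); a class with an empty denominator gets 0
    return np.divide(2 * tp, denom, out=np.zeros_like(tp), where=denom > 0)

def macro_f1(cm):
    # Unweighted mean: every class counts equally, however rare
    return float(per_class_f1(cm).mean())

# Hypothetical toy labels: the rare class 7 is never predicted correctly
y_true = [5, 5, 5, 5, 6, 6, 6, 7]
y_pred = [5, 5, 5, 5, 5, 6, 6, 5]
labels = sorted(set(y_true) | set(y_pred))

cm = confusion_counts(y_true, y_pred, labels)
accuracy = np.trace(cm) / cm.sum()
print(cm)                # rows: [4 0 0], [1 2 0], [1 0 0]
print(accuracy)          # 0.75
print(per_class_f1(cm))  # [0.8 0.8 0. ]
print(macro_f1(cm))      # (0.8 + 0.8 + 0) / 3 ≈ 0.533
```

Accuracy stays at 0.75 even though one class is never recognized. Macro-F1 drops to about 0.53 because the missed class contributes a zero on equal footing with the others.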

---
## 1. Setup and data loading

In [None]:
from config import RANDOM_SEED, DATA_PATH, TARGET_COLUMN
from utils import set_seed
from data_loading import load_wine, get_target_and_features

# Fix random seeds for reproducibility
set_seed()

# Load the dataset and separate features from the target
df = load_wine()
X, y = get_target_and_features(df)

print(f"Dataset shape: {df.shape}")
print(f"Target distribution ('{TARGET_COLUMN}'):\n{y.value_counts().sort_index()}")
print("\nFirst few rows:")
print(df.head())

---
## 2. Exploratory Data Analysis (EDA)

Examine the class distribution, summary statistics, and feature plots. These findings ground the hypotheses in Section 4.

In [None]:
from eda import run_eda

# Run full EDA
eda_results = run_eda(df, save_figures=True, save_results_to_file=True)
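The class-balance table is the EDA output that most directly shapes the metric choice and the hypotheses. The sketch below assumes the target is a pandas Series of integer quality ratings. `class_balance_summary` is a hypothetical helper and is not part of the project's `eda` module. The toy labels are made up for illustration.

```python
import pandas as pd

def class_balance_summary(y: pd.Series) -> tuple[pd.DataFrame, float]:
    """Counts and proportions per class, plus the largest/smallest class ratio."""
    counts = y.value_counts().sort_index()
    summary = pd.DataFrame({"count": counts, "proportion": counts / counts.sum()})
    imbalance_ratio = float(counts.max() / counts.min())
    return summary, imbalance_ratio

# Hypothetical toy ratings, not the real dataset
y_toy = pd.Series([5, 5, 5, 6, 6, 6, 6, 6, 7, 3], name="quality")
summary, ratio = class_balance_summary(y_toy)
print(summary)
print(f"Imbalance ratio (largest / smallest class): {ratio:.1f}")  # 5.0
```

On the real data the call would be `class_balance_summary(y)`. A large ratio signals that accuracy alone will overstate performance and that the per-class results deserve attention.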

---
## 3. Preprocessing

TODO: Apply preprocessing based on EDA findings.

In [None]:
from preprocessing import get_dataset

# Get preprocessed train/test splits
X_train, y_train, X_test, y_test = get_dataset()

print(f"Training set: {X_train.shape}")
print(f"Test set: {X_test.shape}")
print(f"Target distribution (train):\n{y_train.value_counts()}")
print(f"Target distribution (test):\n{y_test.value_counts()}")
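The internals of `get_dataset()` are not shown here. The sketch below illustrates two choices that typically matter for this task, assuming a stratified hold-out split and standardization. Stratifying keeps rare quality levels represented in both splits. Fitting the scaler on the training split only keeps test statistics from leaking into training. The data is synthetic (`make_classification`), not the wine data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 11 numeric features, 3 imbalanced classes
X_toy, y_toy = make_classification(
    n_samples=600, n_features=11, n_informative=6, n_classes=3,
    weights=[0.6, 0.3, 0.1], random_state=42,
)

# stratify=y_toy keeps class proportions (nearly) identical in train and test
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, stratify=y_toy, random_state=42,
)

# Fit scaling statistics on the training split only, then apply them to both
scaler = StandardScaler().fit(X_tr)
X_tr_s, X_te_s = scaler.transform(X_tr), scaler.transform(X_te)

train_props = np.bincount(y_tr) / len(y_tr)
test_props = np.bincount(y_te) / len(y_te)
print("train class proportions:", train_props.round(3))
print("test class proportions: ", test_props.round(3))
```

For cross-validated tuning, placing the scaler inside a `Pipeline` refits it within each fold for the same reason.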

---
## 4. Hypotheses

TODO: Develop and state hypotheses based on EDA findings.

---
## 5. Decision Trees (DT)

TODO: Implement DT experiments.

In [None]:
# TODO: Implement DT experiments
# from models_dt import run_dt_model_complexity, run_dt_learning_curves, run_dt_test_eval
# 
# # Step 1: Model complexity
# dt_results = run_dt_model_complexity(X_train, y_train, X_test, y_test)
# 
# # Step 2: Learning curves
# run_dt_learning_curves(X_train, y_train, X_test, y_test, best_config=dt_results['best_config'])
# 
# # Step 3: Test evaluation
# run_dt_test_eval(X_train, y_train, X_test, y_test, best_config=dt_results['best_config'])
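The "model complexity" step can be illustrated with scikit-learn's `validation_curve`. The sketch sweeps `max_depth` and scores each depth with macro-F1 under stratified 5-fold cross-validation. It uses synthetic data and does not reproduce the contents of `run_dt_model_complexity`. The depth range is an arbitrary choice.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, validation_curve
from sklearn.tree import DecisionTreeClassifier

X_toy, y_toy = make_classification(
    n_samples=800, n_features=11, n_informative=6, n_classes=3, random_state=0,
)
depths = list(range(1, 16))
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Score arrays: rows = max_depth values, columns = CV folds
train_scores, val_scores = validation_curve(
    DecisionTreeClassifier(random_state=0), X_toy, y_toy,
    param_name="max_depth", param_range=depths, cv=cv, scoring="f1_macro",
)
mean_train = train_scores.mean(axis=1)
mean_val = val_scores.mean(axis=1)
best_depth = depths[int(np.argmax(mean_val))]

for d, tr, va in zip(depths, mean_train, mean_val):
    print(f"max_depth={d:2d}  train F1={tr:.3f}  val F1={va:.3f}")
print("best max_depth by validation macro-F1:", best_depth)
```

A widening gap between training and validation scores as depth grows is the overfitting signal to discuss. Cost-complexity pruning via `ccp_alpha` offers an alternative complexity axis.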

---
## 6. k-Nearest Neighbors (kNN)

TODO: Implement kNN experiments.

In [None]:
# TODO: Implement kNN experiments
# from models_knn import run_knn_step2, run_knn_learning_curves, run_knn_test_eval
# 
# # Step 1: Model complexity
# knn_results = run_knn_step2(X_train, y_train, X_test, y_test)
# 
# # Step 2: Learning curves
# run_knn_learning_curves(X_train, y_train, X_test, y_test, best_config=knn_results['best_k_weights_metric'])
# 
# # Step 3: Test evaluation
# run_knn_test_eval(X_train, y_train, X_test, y_test, best_config=knn_results['best_k_weights_metric'])
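The result key `best_k_weights_metric` suggests a joint search over `n_neighbors`, `weights`, and the distance metric. The sketch below shows one such search, assuming `GridSearchCV` with macro-F1 scoring and a `StandardScaler` inside the pipeline, since distance-based models are sensitive to feature scale. The grid values are illustrative and the data is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X_toy, y_toy = make_classification(
    n_samples=600, n_features=11, n_informative=6, n_classes=3, random_state=0,
)

# Scaling lives inside the pipeline, so it is refit on each training fold
pipe = Pipeline([("scale", StandardScaler()), ("knn", KNeighborsClassifier())])
param_grid = {
    "knn__n_neighbors": [1, 5, 11, 21, 41],
    "knn__weights": ["uniform", "distance"],
    "knn__metric": ["euclidean", "manhattan"],
}
search = GridSearchCV(
    pipe, param_grid, scoring="f1_macro",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X_toy, y_toy)
print(search.best_params_)
print(f"best CV macro-F1: {search.best_score_:.3f}")
```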

---
## 7. Support Vector Machines (SVM)

TODO: Implement SVM experiments.

In [None]:
# TODO: Implement SVM experiments
# from models_svm import run_svm_model_complexity, run_svm_learning_curves, run_svm_test_eval
# 
# # Step 1: Model complexity
# svm_results = run_svm_model_complexity(X_train, y_train, X_test, y_test)
# 
# # Step 2: Learning curves
# run_svm_learning_curves(X_train, y_train, X_test, y_test, best_config=svm_results['best_config'])
# 
# # Step 3: Test evaluation
# run_svm_test_eval(X_train, y_train, X_test, y_test, best_config=svm_results['best_config'])
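For the SVM, a small search over kernel type and regularization strength `C` is a common starting point, with `gamma` added for the RBF kernel. The sketch passes a list of grids to `GridSearchCV` so that `gamma` is only tuned where it applies. It also includes `class_weight="balanced"` as one candidate setting, because reweighting classes may help macro-F1 under imbalance. Grid values are illustrative and the data is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X_toy, y_toy = make_classification(
    n_samples=500, n_features=11, n_informative=6, n_classes=3,
    weights=[0.6, 0.3, 0.1], random_state=0,
)

pipe = Pipeline([("scale", StandardScaler()), ("svm", SVC())])
param_grid = [
    # gamma has no effect on the linear kernel, so it is only searched for RBF
    {"svm__kernel": ["linear"], "svm__C": [0.1, 1, 10],
     "svm__class_weight": [None, "balanced"]},
    {"svm__kernel": ["rbf"], "svm__C": [0.1, 1, 10],
     "svm__gamma": ["scale", 0.01, 0.1],
     "svm__class_weight": [None, "balanced"]},
]
search = GridSearchCV(
    pipe, param_grid, scoring="f1_macro",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X_toy, y_toy)
print(search.best_params_)
print(f"best CV macro-F1: {search.best_score_:.3f}")
```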

---
## 8. Neural Networks — sklearn (MLPClassifier)

TODO: Implement sklearn NN experiments.

In [None]:
# TODO: Implement sklearn NN experiments
# from models_nn_sklearn import run_nn_model_complexity, run_nn_learning_curves, run_nn_test_eval
# 
# # Step 1: Model complexity
# nn_sklearn_results = run_nn_model_complexity(X_train, y_train, X_test, y_test)
# 
# # Step 2: Learning curves
# run_nn_learning_curves(X_train, y_train, X_test, y_test, best_config=nn_sklearn_results['best_config'])
# 
# # Step 3: Test evaluation
# run_nn_test_eval(X_train, y_train, X_test, y_test, best_config=nn_sklearn_results['best_config'])
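For `MLPClassifier`, network width and depth (`hidden_layer_sizes`) and L2 strength (`alpha`) are the main complexity knobs. The sketch below compares a few architectures by cross-validated macro-F1, assuming standardized inputs and `early_stopping=True`. With early stopping enabled, the estimator holds out 10% of each training fold by default to decide when to stop. The architectures and data are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X_toy, y_toy = make_classification(
    n_samples=600, n_features=11, n_informative=6, n_classes=3, random_state=0,
)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

results = {}
for hidden in [(16,), (64,), (64, 32)]:
    model = make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=hidden, alpha=1e-3, early_stopping=True,
                      max_iter=500, random_state=0),
    )
    scores = cross_val_score(model, X_toy, y_toy, cv=cv, scoring="f1_macro")
    results[hidden] = scores.mean()
    print(f"hidden_layer_sizes={hidden}: CV macro-F1 = {scores.mean():.3f} ± {scores.std():.3f}")

best_hidden = max(results, key=results.get)
print("best architecture:", best_hidden)
```

After fitting, an `MLPClassifier` exposes `loss_curve_`. With early stopping on, it also exposes `validation_scores_`. Either attribute can be plotted as an epoch-wise learning curve.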

---
## 9. Neural Networks — PyTorch

TODO: Implement PyTorch NN experiments.

In [None]:
# TODO: Implement PyTorch NN experiments
# Aliased so these names do not shadow the sklearn helpers imported in Section 8
# from models_nn_pytorch import (
#     run_nn_model_complexity as run_torch_model_complexity,
#     run_nn_learning_curves as run_torch_learning_curves,
#     run_nn_test_eval as run_torch_test_eval,
# )
# 
# # Step 1: Model complexity
# nn_pytorch_results = run_torch_model_complexity(X_train, y_train, X_test, y_test)
# 
# # Step 2: Learning curves
# run_torch_learning_curves(X_train, y_train, X_test, y_test, best_config=nn_pytorch_results['best_config'])
# 
# # Step 3: Test evaluation
# run_torch_test_eval(X_train, y_train, X_test, y_test, best_config=nn_pytorch_results['best_config'])
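One PyTorch detail matters for this target: `nn.CrossEntropyLoss` expects class indices in `[0, n_classes)`. Raw quality ratings therefore need to be remapped before training and mapped back afterwards. The sketch below performs that remapping with `np.unique(..., return_inverse=True)` around a small full-batch MLP. The data and labels are random placeholders. The architecture, learning rate, and epoch count are arbitrary choices, not those of `models_nn_pytorch`.

```python
import numpy as np
import torch
from torch import nn

torch.manual_seed(0)
rng = np.random.default_rng(0)

# Random placeholder data: 11 features, labels drawn from rating-like values
X_np = rng.normal(size=(300, 11)).astype(np.float32)
raw_labels = rng.choice([4, 5, 6, 7], size=300)

# CrossEntropyLoss needs indices 0..K-1; `classes` maps them back to ratings
classes, y_idx = np.unique(raw_labels, return_inverse=True)

X_t = torch.from_numpy(X_np)
y_t = torch.from_numpy(y_idx).long()

model = nn.Sequential(nn.Linear(11, 32), nn.ReLU(), nn.Linear(32, len(classes)))
loss_fn = nn.CrossEntropyLoss()  # applies log-softmax to the raw logits itself
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

losses = []
for epoch in range(100):  # full-batch training for brevity
    optimizer.zero_grad()
    loss = loss_fn(model(X_t), y_t)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

with torch.no_grad():
    pred_idx = model(X_t).argmax(dim=1).numpy()
pred_labels = classes[pred_idx]  # back to the original rating values
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

A real run would typically use mini-batches via `torch.utils.data.DataLoader` and evaluate on a held-out split. `nn.CrossEntropyLoss(weight=...)` accepts per-class weights if reweighting is explored for macro-F1.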

---
## 10. Cross-Model Comparison & Conclusions

TODO: Compare all models on Macro-F1 and accuracy, revisit the hypotheses from Section 4, and discuss findings and conclusions.
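The comparison can be organized as one table with a row per model, with every model scored on the same held-out split using both required metrics. scikit-learn's `classification_report` then adds per-class precision, recall, and F1. The sketch below assumes each model is already fitted and exposes `predict`. `compare_models` is a hypothetical helper, and the two toy models on synthetic data only stand in for the tuned models from Sections 5 to 9.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, classification_report, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

def compare_models(models, X_test, y_test):
    """One row per fitted model: macro-F1 and accuracy on the same test set."""
    rows = []
    for name, model in models.items():
        y_pred = model.predict(X_test)
        rows.append({
            "model": name,
            "macro_f1": f1_score(y_test, y_pred, average="macro"),
            "accuracy": accuracy_score(y_test, y_pred),
        })
    return pd.DataFrame(rows).set_index("model").sort_values("macro_f1", ascending=False)

X_toy, y_toy = make_classification(
    n_samples=600, n_features=11, n_informative=6, n_classes=3, random_state=0,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.25, stratify=y_toy, random_state=0,
)
models = {
    "DT": DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_tr, y_tr),
    "kNN": KNeighborsClassifier(n_neighbors=11).fit(X_tr, y_tr),
}
table = compare_models(models, X_te, y_te)
print(table.round(3))

# Per-class precision / recall / F1 for the top-ranked model
best_name = table.index[0]
print(classification_report(y_te, models[best_name].predict(X_te), digits=3))
```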