# CLASSIFICATION MODELS CHEAT SHEET
This notebook contains:
- 7 most common classification models
- How to know when they're ideal for a problem
- The kind of data to use them with
- Use cases/scenarios where they're most suitable
- Their syntax (for importing, fitting, testing, evaluating)



1. **Logistic Regression**:
   - How to know it's ideal: Suitable for binary or multi-class classification problems with linear decision boundaries.
   - Data Type: Numerical and categorical data (after encoding).
   - Syntax:
   ```python
   from sklearn.linear_model import LogisticRegression
   model = LogisticRegression()
   model.fit(X_train, y_train)
   y_pred = model.predict(X_test)
   ```
   - Use Cases/Scenarios: Customer churn prediction, email spam detection, medical diagnosis.
   - Best Evaluation Metrics: Accuracy, Precision, Recall, F1-score, Area Under the ROC Curve (AUC-ROC).
   - Metric Syntax and Graphs:
   ```python
   import matplotlib.pyplot as plt
   from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                                f1_score, roc_auc_score, roc_curve, confusion_matrix)

   # These defaults assume binary labels; for multi-class targets,
   # pass average=... to precision_score, recall_score, and f1_score.
   accuracy = accuracy_score(y_test, y_pred)
   precision = precision_score(y_test, y_pred)
   recall = recall_score(y_test, y_pred)
   f1 = f1_score(y_test, y_pred)
   cm = confusion_matrix(y_test, y_pred)

   # Probability of the positive class, computed once and reused below
   y_score = model.predict_proba(X_test)[:, 1]
   auc_roc = roc_auc_score(y_test, y_score)
   fpr, tpr, thresholds = roc_curve(y_test, y_score)

   # Plot ROC Curve
   plt.plot(fpr, tpr)
   plt.xlabel('False Positive Rate')
   plt.ylabel('True Positive Rate')
   plt.title('ROC Curve')
   plt.show()
   ```
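
The metric calls above assume binary labels. As a minimal sketch of the multi-class case, the example below uses scikit-learn's bundled Iris dataset (an assumption for illustration, not part of the original notebook) and passes `average="macro"` and `multi_class="ovr"`; other averaging strategies may suit imbalanced data better.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Iris has three classes, so the binary defaults would raise an error here.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
y_proba = model.predict_proba(X_test)  # shape: (n_samples, n_classes)

# average="macro" computes each metric per class, then takes the unweighted mean.
precision = precision_score(y_test, y_pred, average="macro")
recall = recall_score(y_test, y_pred, average="macro")
f1 = f1_score(y_test, y_pred, average="macro")

# Multi-class AUC needs the full probability matrix and a multi_class strategy.
auc_roc = roc_auc_score(y_test, y_proba, multi_class="ovr")

print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f} auc={auc_roc:.3f}")
```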

2. **K-Nearest Neighbors (KNN)**:
   - How to know it's ideal: Suitable for problems with a well-defined distance metric and non-linear decision boundaries.
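   - Data Type: Numerical data, scaled to comparable ranges (KNN is distance-based, so features with large ranges otherwise dominate).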
   - Syntax:
   ```python
   from sklearn.neighbors import KNeighborsClassifier
   model = KNeighborsClassifier(n_neighbors=5)  # k = 5 is the scikit-learn default; tune it
   model.fit(X_train, y_train)
   y_pred = model.predict(X_test)
   ```
   - Use Cases/Scenarios: Image classification, text classification, recommendation systems.
   - Best Evaluation Metrics: Accuracy, Precision, Recall, F1-score, AUC-ROC.
   - Metric Syntax and Graphs: Same as Logistic Regression.
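
Because KNN relies on distances, the choice of `k` and the feature scaling both matter. The sketch below is one possible way to handle both, assuming the bundled breast cancer dataset and an arbitrary candidate grid of odd `k` values; neither comes from the original notebook.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scaling lives inside the pipeline so each CV fold is scaled
# using statistics from its own training portion only.
pipeline = make_pipeline(StandardScaler(), KNeighborsClassifier())

# make_pipeline names each step after its lowercased class name.
candidate_ks = [1, 3, 5, 7, 9, 11]  # illustrative grid, not a recommendation
param_grid = {"kneighborsclassifier__n_neighbors": candidate_ks}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

best_k = search.best_params_["kneighborsclassifier__n_neighbors"]
test_accuracy = search.score(X_test, y_test)
print(f"best k: {best_k}, test accuracy: {test_accuracy:.3f}")
```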

3. **Decision Trees**:
   - How to know it's ideal: Suitable for problems with complex, non-linear decision boundaries where the results need to be interpretable.
   - Data Type: Numerical and categorical data (after encoding).
   - Syntax:
   ```python
   from sklearn.tree import DecisionTreeClassifier
   model = DecisionTreeClassifier()
   model.fit(X_train, y_train)
   y_pred = model.predict(X_test)
   ```
   - Use Cases/Scenarios: Medical diagnosis, credit risk analysis, customer segmentation.
   - Best Evaluation Metrics: Accuracy, Precision, Recall, F1-score, AUC-ROC.
   - Metric Syntax and Graphs: Same as Logistic Regression.
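
To illustrate what "interpretable" can look like in practice, the sketch below prints a fitted tree's rules with `export_text`. The Iris dataset and `max_depth=2` are assumptions chosen to keep the output short.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# A shallow tree keeps the printed rules readable; deeper trees fit
# the training data more closely but are harder to inspect.
model = DecisionTreeClassifier(max_depth=2, random_state=0)
model.fit(iris.data, iris.target)

# export_text renders the learned splits as nested if/else style rules.
rules = export_text(model, feature_names=list(iris.feature_names))
print(rules)
```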

4. **Random Forest**:
   - How to know it's ideal: Suitable for high-dimensional data and problems with complex decision boundaries. Averaging many trees usually generalizes better and overfits less than a single decision tree.
   - Data Type: Numerical and categorical data (after encoding).
   - Syntax:
   ```python
   from sklearn.ensemble import RandomForestClassifier
   model = RandomForestClassifier(n_estimators=100)  # number of trees; 100 is the scikit-learn default
   model.fit(X_train, y_train)
   y_pred = model.predict(X_test)
   ```
   - Use Cases/Scenarios: Predicting customer churn, image recognition, anomaly detection.
   - Best Evaluation Metrics: Accuracy, Precision, Recall, F1-score, AUC-ROC.
   - Metric Syntax and Graphs: Same as Logistic Regression.
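
One way to check the generalization claim on your own data is to cross-validate a single tree and a forest side by side. This is a rough sketch on the bundled breast cancer dataset (an assumption for illustration); results on other data may differ.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

tree = DecisionTreeClassifier(random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0)

# 5-fold cross-validated accuracy for each model
tree_scores = cross_val_score(tree, X, y, cv=5)
forest_scores = cross_val_score(forest, X, y, cv=5)
print(f"tree:   {tree_scores.mean():.3f}")
print(f"forest: {forest_scores.mean():.3f}")

# After fitting, impurity-based importances give a rough feature ranking.
forest.fit(X, y)
importances = forest.feature_importances_
```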

5. **Support Vector Machines (SVM)**:
   - How to know it's ideal: Suitable for binary and multi-class classification problems with a clear margin of separation between classes.
   - Data Type: Numerical data, scaled to comparable ranges (SVMs are sensitive to feature scale).
   - Syntax:
   ```python
   from sklearn.svm import SVC
   model = SVC(kernel='linear')
   model.fit(X_train, y_train)
   y_pred = model.predict(X_test)
   ```
   - Use Cases/Scenarios: Text categorization, hand-written digit recognition.
   - Best Evaluation Metrics: Accuracy, Precision, Recall, F1-score, AUC-ROC.
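   - Metric Syntax and Graphs: Same as Logistic Regression, except that `SVC` has no `predict_proba` by default. For AUC-ROC and the ROC curve, pass `model.decision_function(X_test)` as the score, or construct the model with `SVC(probability=True)`.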
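
By default `SVC` does not expose `predict_proba`, so the ROC code from the Logistic Regression item cannot be reused verbatim. A minimal sketch using `decision_function` scores instead, assuming the bundled breast cancer dataset and a scaling step that are not part of the original notebook:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scale first: SVMs are sensitive to feature ranges.
model = make_pipeline(StandardScaler(), SVC(kernel="linear"))
model.fit(X_train, y_train)

# Signed distance to the separating hyperplane; larger means "more positive".
# ROC metrics only need a ranking score, so probabilities are not required.
scores = model.decision_function(X_test)

auc_roc = roc_auc_score(y_test, scores)
fpr, tpr, thresholds = roc_curve(y_test, scores)
print(f"AUC-ROC: {auc_roc:.3f}")
```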

6. **Naive Bayes**:
   - How to know it's ideal: Suitable for text classification tasks and when the "naive" assumption of feature independence holds.
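   - Data Type: Textual or categorical data (after converting to non-negative numerical representations such as BoW counts or TF-IDF). `MultinomialNB` rejects negative feature values, so dense word embeddings need a different variant such as `GaussianNB`.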
   - Syntax:
   ```python
   from sklearn.naive_bayes import MultinomialNB
   model = MultinomialNB()
   model.fit(X_train, y_train)
   y_pred = model.predict(X_test)
   ```
   - Use Cases/Scenarios: Email spam detection, sentiment analysis, text classification.
   - Best Evaluation Metrics: Accuracy, Precision, Recall, F1-score, AUC-ROC.
   - Metric Syntax and Graphs: Same as Logistic Regression.
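
As a small illustration of the text workflow, the sketch below chains a bag-of-words vectorizer with `MultinomialNB`. The six messages and their labels are made up for this example; a real spam filter would need far more data.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up toy corpus for illustration only; 1 = spam, 0 = not spam.
texts = [
    "win a free prize now",
    "free money claim your prize",
    "limited offer win cash now",
    "meeting moved to monday morning",
    "please review the attached report",
    "lunch on friday with the team",
]
labels = [1, 1, 1, 0, 0, 0]

# CountVectorizer turns each message into non-negative word counts,
# which is the input MultinomialNB expects.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)

new_messages = ["claim your free cash prize", "report for the monday meeting"]
predictions = model.predict(new_messages)
print(predictions)
```

Swapping `CountVectorizer` for `TfidfVectorizer` is a one-line change if TF-IDF weights are preferred.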

7. **Gradient Boosting Machines (GBM)**:
   - How to know it's ideal: Suitable for binary and multi-class classification problems, particularly on tabular data when predictive accuracy matters more than training speed or interpretability.
   - Data Type: Numerical and categorical data (after encoding).
   - Syntax:
   ```python
   from sklearn.ensemble import GradientBoostingClassifier
   model = GradientBoostingClassifier(n_estimators=100)  # boosting stages; 100 is the scikit-learn default
   model.fit(X_train, y_train)
   y_pred = model.predict(X_test)
   ```
   - Use Cases/Scenarios: Click-through rate prediction, customer churn prediction.
   - Best Evaluation Metrics: Accuracy, Precision, Recall, F1-score, AUC-ROC.
   - Metric Syntax and Graphs: Same as Logistic Regression.
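
Choosing `n_estimators` by hand can be avoided with the built-in early stopping. The sketch below is one possible setup, assuming the bundled breast cancer dataset; the cap of 500 stages and the patience of 10 iterations are illustrative values, not recommendations.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = GradientBoostingClassifier(
    n_estimators=500,         # upper bound on boosting stages
    learning_rate=0.1,
    validation_fraction=0.2,  # share of training data held out internally
    n_iter_no_change=10,      # stop once the validation score stalls for 10 stages
    random_state=0,
)
model.fit(X_train, y_train)

# n_estimators_ is the number of stages actually fitted before stopping.
trees_used = model.n_estimators_
test_accuracy = model.score(X_test, y_test)
print(f"stages used: {trees_used}, test accuracy: {test_accuracy:.3f}")
```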
