In [1]:
# Import necessary libraries
import pandas as pd  # For data manipulation
from sklearn.model_selection import train_test_split  # To split the dataset into training and testing sets
from sklearn.linear_model import LogisticRegression  # Logistic Regression model
from sklearn.metrics import classification_report, confusion_matrix  # For model evaluation


In [2]:
# Load the dataset
data = pd.read_csv('credit_card_fraud.csv')  # Replace with your CSV file path

# Display the first few rows of the dataset
print(data.head())


   Time   V1   V2   V3  Amount  Class
0     0  1.0  1.0  0.1     100      0
1     1  2.0  1.5  0.4     200      0
2     2  1.5  1.2  0.3     150      0
3     3  1.2  1.8  0.5     300      1
4     4  2.2  1.1  0.2     250      0


In [3]:
# Separate features (X) and target variable (y)
X = data.drop('Class', axis=1)  # Features
y = data['Class']  # Target variable (0 = not fraud, 1 = fraud)


In [4]:
# Split the dataset into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Display the shapes of the sets
print(f"Training set shape: {X_train.shape}, Testing set shape: {X_test.shape}")


Training set shape: (8, 5), Testing set shape: (2, 5)
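
With only two rows in the test set, a plain random split can easily leave one class out of it. One common option is a stratified split. The sketch below is illustrative only: the `toy` frame, its two fraud rows, and the 50% test fraction are assumptions made for the example, not the contents of `credit_card_fraud.csv`. With so few rows, a 20% test set can still end up without a fraud case even when stratified, so the sketch uses a larger test fraction.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 rows, 2 of them fraud (Class == 1)
toy = pd.DataFrame({
    "Amount": [100, 200, 150, 300, 250, 120, 180, 310, 90, 220],
    "Class":  [0, 0, 0, 1, 0, 0, 0, 1, 0, 0],
})
X_toy = toy.drop("Class", axis=1)
y_toy = toy["Class"]

# stratify=y_toy asks for roughly the same class ratio in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.5, random_state=42, stratify=y_toy
)

# Count of each class per split
print(y_tr.value_counts().to_dict(), y_te.value_counts().to_dict())
```

Stratification requires at least two rows of every class, so it only helps once the dataset contains more than a single fraud example.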


In [5]:
# Initialize the Logistic Regression model
model = LogisticRegression()

# Train the model on the training data
model.fit(X_train, y_train)


LogisticRegression()

In [6]:
# Make predictions on the test data
y_pred = model.predict(X_test)


In [8]:
# Evaluate the model's performance
from sklearn.metrics import confusion_matrix, classification_report

print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))  # Confusion matrix

print("\nClassification Report:")
print(classification_report(y_test, y_pred, zero_division=0))  # Handle zero division


Confusion Matrix:
[[1 1]
 [0 0]]

Classification Report:
              precision    recall  f1-score   support

           0       1.00      0.50      0.67         2
           1       0.00      0.00      0.00         0

    accuracy                           0.50         2
   macro avg       0.50      0.25      0.33         2
weighted avg       1.00      0.50      0.67         2



In [9]:
r"""
The confusion matrix and classification report above summarize how the logistic regression model performed on the two-row test set. Here’s an explanation of what each part means:

### Confusion Matrix
```
Confusion Matrix:
[[1 1]
 [0 0]]
```
- **Matrix Structure**: The confusion matrix is structured as follows:

    \[
    \begin{array}{c|cc}
        & \text{Predicted 0} & \text{Predicted 1} \\
        \hline
        \text{Actual 0} & 1 & 1 \\
        \text{Actual 1} & 0 & 0 \\
    \end{array}
    \]

- **Interpretation**:
  - **True Negatives (TN)**: 1 (one non-fraudulent transaction correctly identified as non-fraudulent).
  - **False Positives (FP)**: 1 (one non-fraudulent transaction incorrectly identified as fraudulent).
  - **False Negatives (FN)**: 0 (no fraudulent transactions were incorrectly identified as non-fraudulent).
  - **True Positives (TP)**: 0 (no fraudulent transactions were correctly identified as fraudulent).
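
The four counts can also be read off programmatically. This is a minimal sketch: `y_true_demo` and `y_pred_demo` are hypothetical stand-ins reconstructed to reproduce the matrix above (two actual non-fraud rows, one predicted as fraud), not the notebook's real `y_test` and `y_pred`.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels that reproduce the matrix [[1 1], [0 0]]
y_true_demo = [0, 0]
y_pred_demo = [0, 1]

# labels=[0, 1] forces a 2x2 matrix even when a class is missing from the inputs
cm = confusion_matrix(y_true_demo, y_pred_demo, labels=[0, 1])

# For a binary problem, ravel() yields the counts in the order TN, FP, FN, TP
tn, fp, fn, tp = cm.ravel()
print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")
```

Passing `labels` explicitly is a defensive choice: without it, a test set and predictions that both contain only class 0 would produce a 1x1 matrix and the four-way unpacking would fail.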

### Classification Report
```
Classification Report:
              precision    recall  f1-score   support

           0       1.00      0.50      0.67         2
           1       0.00      0.00      0.00         0

    accuracy                           0.50         2
   macro avg       0.50      0.25      0.33         2
weighted avg       1.00      0.50      0.67         2
```

#### Breakdown of the Metrics:
- **Precision**:
  - For class 0 (not fraud): 1.00 (the single non-fraud prediction was correct).
  - For class 1 (fraud): 0.00 (the single fraud prediction was wrong).

- **Recall**:
  - For class 0: 0.50 (one of the two actual non-fraud cases was correctly identified).
  - For class 1: 0.00 (the test set contains no actual fraud cases, so recall is undefined, 0/0; it is reported as 0.00 only because `zero_division=0` was passed).

- **F1-Score**:
  - For class 0: 0.67 (the harmonic mean of precision 1.00 and recall 0.50).
  - For class 1: 0.00 (there are no true positives, so precision and recall are both reported as 0).

- **Support**:
  - For class 0: 2 (total instances of non-fraud).
  - For class 1: 0 (no instances of fraud in the test set).

- **Overall Accuracy**: 
  - 0.50 (50% of the total instances were correctly classified).

- **Macro Average**:
  - The unweighted mean of each metric across the two classes, which ignores class imbalance. For example, macro recall is (0.50 + 0.00) / 2 = 0.25.

- **Weighted Average**:
  - The mean of each metric weighted by support (number of actual occurrences), so classes with more instances count for more. Because class 1 has support 0 here, the weighted averages equal the class 0 values (1.00, 0.50, 0.67).
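
The numbers in the report can be re-derived by hand from the four confusion-matrix counts. The following is a sketch of that arithmetic, not scikit-learn's implementation; the helper `safe_div` is a hypothetical name that mimics the effect of `zero_division=0`.

```python
# Counts taken from the confusion matrix [[1 1], [0 0]]
tn, fp, fn, tp = 1, 1, 0, 0

def safe_div(num, den):
    # Return 0.0 when the denominator is zero (same effect as zero_division=0)
    return num / den if den else 0.0

# Class 0 treats "not fraud" as the positive label
precision_0 = safe_div(tn, tn + fn)
recall_0 = safe_div(tn, tn + fp)
f1_0 = safe_div(2 * precision_0 * recall_0, precision_0 + recall_0)

# Class 1 treats "fraud" as the positive label
precision_1 = safe_div(tp, tp + fp)
recall_1 = safe_div(tp, tp + fn)  # 0/0: undefined, reported as 0.0
f1_1 = safe_div(2 * precision_1 * recall_1, precision_1 + recall_1)

support_0, support_1 = tn + fp, tp + fn
total = support_0 + support_1

accuracy = (tn + tp) / total
macro_recall = (recall_0 + recall_1) / 2
weighted_recall = (recall_0 * support_0 + recall_1 * support_1) / total

print(round(precision_0, 2), round(recall_0, 2), round(f1_0, 2))
print(round(accuracy, 2), round(macro_recall, 2), round(weighted_recall, 2))
```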

### Key Takeaways
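1. **Model Performance**:
   - The test set contains no fraudulent transactions (support 0 for class 1), so the recall and F1-score of 0.00 for class 1 are placeholders produced by `zero_division=0`. They do not show whether the model can detect fraud.
   - The confusion matrix shows that the model incorrectly classified one of the two non-fraudulent transactions as fraudulent.

2. **Class Imbalance and Sample Size**:
   - With only 10 rows in total (8 for training, 2 for testing), the evaluation is too small to be reliable. The dataset may also be imbalanced, with significantly more non-fraudulent transactions than fraudulent ones, which makes it likely that a random split leaves no fraud cases in the test set.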

3. **Next Steps**:
   - Consider techniques to handle class imbalance, such as:
     - **Resampling**: Oversampling the minority class (fraud) or undersampling the majority class (non-fraud).
     - **Using different algorithms**: Consider algorithms better suited for imbalanced datasets, such as Decision Trees, Random Forests, or Gradient Boosting.
     - **Anomaly Detection Techniques**: Since fraud cases are rare, consider treating this as an anomaly detection problem instead.
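
As an illustration of the resampling option, the sketch below oversamples the minority class with `sklearn.utils.resample`. The data is synthetic: `make_classification`, the roughly 5% positive rate, and all variable names are assumptions made for the example, not the notebook's dataset.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.utils import resample

# Synthetic stand-in for an imbalanced fraud dataset (about 5% positives)
X_syn, y_syn = make_classification(
    n_samples=1000, n_features=5, n_informative=3, n_redundant=0,
    weights=[0.95, 0.05], random_state=42,
)

# Split rows by class
X_major = X_syn[y_syn == 0]
X_minor = X_syn[y_syn == 1]

# Oversample: draw minority rows with replacement until both classes match in size
X_minor_up = resample(
    X_minor, replace=True, n_samples=len(X_major), random_state=42
)

# Reassemble a balanced feature matrix and label vector
X_bal = np.vstack([X_major, X_minor_up])
y_bal = np.concatenate([
    np.zeros(len(X_major), dtype=int),
    np.ones(len(X_minor_up), dtype=int),
])

print("before:", np.bincount(y_syn), "after:", np.bincount(y_bal))
```

In practice, resampling is usually applied to the training split only, so that duplicated fraud rows do not leak into the test set and inflate the scores.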

This analysis shows that the model correctly classified only one of the two non-fraudulent test transactions, and that its ability to detect fraud could not be evaluated because the test set contained no fraud cases.
"""