### Exercises

### Exercise 1: Identify Suspicious Email Domains
- Find the top 5 most frequent email domains in fraudulent transactions.
- Write a function to flag transactions from less common domains.

- **Topics:** String manipulation, Pandas DataFrames, Aggregation
- **Resources:**
  - [Pandas String Methods](https://pandas.pydata.org/docs/user_guide/text.html)
  - [Regular Expressions in Python](https://docs.python.org/3/library/re.html)
  - [Finding Frequent Elements in Pandas](https://towardsdatascience.com/finding-the-most-frequent-elements-in-a-pandas-dataframe-b29d01fe43cf)

In [1]:
import pandas as pd
import re

# Exercise 1: use the dataset CC_FRAUD.csv

df_1 = pd.read_csv("CC_FRAUD.csv")

# Find the 5 most frequent email domains

df_2 = df_1.groupby("DOMAIN").size().sort_values(ascending=False).head(5)

print(df_2)

# Flag transactions by domain frequency: "low" for a top-5 domain, "high" for a less common one

# Reuse the top-5 result instead of hard-coding the domain names
top_domains = set(df_2.index)

def less_common(x):
    return "low" if x in top_domains else "high"

df_1["SECURITY"] = df_1["DOMAIN"].apply(less_common)

df_1["SECURITY"]

DOMAIN
TMA.COM        16451
XOSOP.COM      15814
VUHZRNB.COM    11544
TCN.COM         4029
NEKSXUK.NET     3918
dtype: int64


0        high
1         low
2         low
3         low
4         low
         ... 
94677     low
94678    high
94679     low
94680     low
94681     low
Name: SECURITY, Length: 94682, dtype: object
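
The exercise asks for the most frequent domains among fraudulent transactions, while the cell above counts domains over every row. A minimal sketch of the filtered version follows. It assumes the dataset carries a binary fraud label; the column name `IS_FRAUD` and the sample rows are hypothetical stand-ins for `CC_FRAUD.csv`, not taken from it.

```python
import pandas as pd

# Tiny stand-in for CC_FRAUD.csv; IS_FRAUD is a hypothetical label column.
df = pd.DataFrame({
    "DOMAIN": ["A.COM", "A.COM", "B.COM", "C.NET", "A.COM", "B.COM", "D.ORG"],
    "IS_FRAUD": [1, 1, 1, 0, 0, 1, 1],
})

def top_fraud_domains(frame, n=5, domain_col="DOMAIN", fraud_col="IS_FRAUD"):
    """Return the n most frequent domains among rows labelled fraudulent."""
    fraud_rows = frame[frame[fraud_col] == 1]
    return fraud_rows[domain_col].value_counts().head(n)

def flag_less_common(frame, common_domains, domain_col="DOMAIN"):
    """Label rows 'low' if their domain is in common_domains, else 'high'."""
    is_common = frame[domain_col].isin(common_domains)
    return is_common.map({True: "low", False: "high"})

# n=2 only because the sample is small; the exercise uses n=5.
top = top_fraud_domains(df, n=2)
df["SECURITY"] = flag_less_common(df, top.index)

print(top)
print(df)
```

`isin` checks the whole column at once, so the flagging step needs no per-row Python function.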

### Exercise 2: Regular Expressions for Data Validation
- Validate that email addresses in the dataset are correctly formatted.
- Identify and extract all numeric values appearing in descriptions.

- **Topics:** Regex for validation, extracting numerical data, pattern matching
- **Resources:**
  - [Python Regular Expressions Official Docs](https://docs.python.org/3/library/re.html)
  - [Regex101 - Online Regex Tester](https://regex101.com/) (for testing expressions)
  - [Validating Email Addresses with Regex](https://www.geeksforgeeks.org/check-if-email-address-valid-or-not-in-python/)
  - [Extracting Numbers from Text in Python](https://www.datacamp.com/tutorial/python-regular-expression-tutorial)

In [2]:
import re

import pandas as pd

# Exercise 2: uses the datasets bank_transactions_data_2.csv and fraud_detection_dataset.csv
# Validate the email addresses

df_3 = pd.read_csv("fraud_detection_dataset.csv")

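def isValide(email):
    # Simplified pattern: local part, "@", first domain label, ".", rest of the domain
    pattern = r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+"
    # Missing values (NaN) are not strings; treat them as invalid instead of raising
    if not isinstance(email, str):
        return "Invalid"
    # fullmatch requires the entire string to match, e.g. "user@example.com" -> "Valid"
    return "Valid" if re.fullmatch(pattern, email) else "Invalid"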

df_3["Email_validation"] = df_3["Customer_Email"].apply(isValide)

df_3[["Email_validation","Customer_Email"]].head()

|   | Email_validation | Customer_Email |
|---|---|---|
| 0 | Valid | amygreen@example.com |
| 1 | Valid | nicoleferguson@example.net |
| 2 | Valid | fergusonmatthew@example.net |
| 3 | Valid | williamsshirley@example.com |
| 4 | Valid | bondmitchell@example.org |


In [3]:
#Extract all numeric values from account ID

df_4 = pd.read_csv("bank_transactions_data_2.csv")

def isNumeric(ID):
    # Return every run of digits in the ID as a list of strings, e.g. "AC00128" -> ["00128"]
    return re.findall(r"\d+", ID)

df_4["AccountID(only number)"] = df_4["AccountID"].apply(isNumeric)

df_4[["AccountID","AccountID(only number)"]]

|   | AccountID | AccountID(only number) |
|---|---|---|
| 0 | AC00128 | [00128] |
| 1 | AC00455 | [00455] |
| 2 | AC00019 | [00019] |
| 3 | AC00070 | [00070] |
| 4 | AC00411 | [00411] |
| ... | ... | ... |
| 2507 | AC00297 | [00297] |
| 2508 | AC00322 | [00322] |
| 2509 | AC00095 | [00095] |
| 2510 | AC00118 | [00118] |
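
Both steps of this exercise can also be written with pandas' vectorized string methods instead of `apply`. The sketch below is one possible variant, not a replacement for the cells above. The sample values are made up for illustration, and the email pattern is the same simplified one, so it is a plausibility check rather than full address validation.

```python
import pandas as pd

# Made-up sample values standing in for the Customer_Email and AccountID columns.
emails = pd.Series(["amygreen@example.com", "not-an-email", None, "a.b@mail.example.org"])
account_ids = pd.Series(["AC00128", "AC00455", "X12Y34"])

# Simplified email pattern: local part, "@", first domain label, ".", rest of the domain.
EMAIL_PATTERN = r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+"

# str.fullmatch requires the whole string to match; na=False counts missing values as invalid.
is_valid = emails.str.fullmatch(EMAIL_PATTERN, na=False)
validation = is_valid.map({True: "Valid", False: "Invalid"})

# str.findall returns, for each row, the list of all digit runs in the string.
digits = account_ids.str.findall(r"\d+")

print(validation.tolist())
print(digits.tolist())
```

The `na=False` argument matters on real data: a missing email would otherwise propagate as a missing value instead of being reported as invalid.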


### Exercise 3: Optimize the Algorithm
- Improve fraud detection by incorporating past customer transaction history.
- Implement an efficient way to flag repeated transactions within a short period.

- **Topics:** Algorithm optimization, time complexity, transaction analysis
- **Resources:**
  - [Python Performance Optimization](https://realpython.com/python-performance/)
  - [Big-O Notation for Algorithm Complexity](https://www.geeksforgeeks.org/analysis-of-algorithms-big-o-analysis/)
  - [Efficient Transaction Processing Techniques](https://www.kaggle.com/learn/data-cleaning)
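
One way to approach the second task is to sort once and compare each transaction only with the previous one from the same customer, which costs O(n log n) for the sort instead of the O(n²) of comparing every pair. The sketch below assumes that "repeated" means same customer and same amount, and that "a short period" means five minutes. The column names `customer_id`, `amount`, and `timestamp` and the sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical sample data; the column names are assumptions, not from the course datasets.
tx = pd.DataFrame({
    "customer_id": ["c1", "c1", "c1", "c2", "c2"],
    "amount": [50.0, 50.0, 50.0, 20.0, 20.0],
    "timestamp": pd.to_datetime([
        "2024-01-01 10:00:00",
        "2024-01-01 10:01:30",
        "2024-01-01 12:00:00",
        "2024-01-01 09:00:00",
        "2024-01-01 09:10:00",
    ]),
})

def flag_rapid_repeats(frame, window="5min"):
    """Flag a transaction when the same customer paid the same amount
    no more than `window` before it. Assumes a unique row index."""
    # Sorting puts each customer's identical-amount transactions in time order.
    ordered = frame.sort_values(["customer_id", "amount", "timestamp"])
    # Time since the previous transaction in the same (customer, amount) group;
    # the first transaction of each group has no predecessor and gets NaT.
    gap = ordered.groupby(["customer_id", "amount"])["timestamp"].diff()
    # Comparisons against NaT evaluate to False, so first transactions are never flagged.
    flagged = gap <= pd.Timedelta(window)
    # Restore the original row order.
    return flagged.reindex(frame.index)

tx["repeat_flag"] = flag_rapid_repeats(tx)
print(tx)
```

Only the second `c1` transaction is flagged here: it follows an identical payment by 90 seconds, while the other gaps exceed the window.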

### Exercise 4: File Handling and Reporting
- Generate a summary report of fraudulent transactions and save it to a JSON file.
- Create a function that reads the JSON report and prints key insights.

- **Topics:** File I/O, JSON handling, saving structured reports
- **Resources:**
  - [Python File Handling](https://realpython.com/read-write-files-python/)
  - [Working with JSON in Python](https://realpython.com/python-json/)
  - [Generating and Parsing Reports in Pandas](https://towardsdatascience.com/how-to-generate-reports-with-python-and-pandas-166fdfaf0df4)
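
A minimal sketch of both tasks using pandas and the standard `json` module is shown below. The column names (`amount`, `domain`, `is_fraud`), the report fields, and the file name `fraud_report.json` are assumptions chosen for illustration. Values are converted to plain Python types before saving because `json` cannot serialize NumPy scalars.

```python
import json
import os
import tempfile

import pandas as pd

# Hypothetical sample data; the column names are assumptions.
tx = pd.DataFrame({
    "amount": [120.0, 35.5, 980.0, 15.0],
    "domain": ["A.COM", "B.COM", "A.COM", "C.NET"],
    "is_fraud": [1, 0, 1, 0],
})

def build_fraud_report(frame):
    """Summarize the fraudulent rows as a JSON-serializable dict."""
    fraud = frame[frame["is_fraud"] == 1]
    domain_counts = fraud["domain"].value_counts().head(5)
    return {
        "total_transactions": int(len(frame)),
        "fraud_count": int(len(fraud)),
        "fraud_rate": round(len(fraud) / len(frame), 4) if len(frame) else 0.0,
        "fraud_amount_total": float(fraud["amount"].sum()),
        "top_fraud_domains": {str(k): int(v) for k, v in domain_counts.items()},
    }

def save_report(report, path):
    """Write the report dict to a JSON file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(report, fh, indent=2)

def print_insights(path):
    """Read the JSON report back, print its key figures, and return it."""
    with open(path, encoding="utf-8") as fh:
        report = json.load(fh)
    print(f"Fraudulent transactions: {report['fraud_count']} of {report['total_transactions']}")
    print(f"Fraud rate: {report['fraud_rate']:.2%}")
    print(f"Total fraudulent amount: {report['fraud_amount_total']:.2f}")
    for domain, count in report["top_fraud_domains"].items():
        print(f"  {domain}: {count}")
    return report

# Written to the system temp directory here; any writable path works.
report_path = os.path.join(tempfile.gettempdir(), "fraud_report.json")
save_report(build_fraud_report(tx), report_path)
loaded = print_insights(report_path)
```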

### Exercise 5: Improve Fraud Detection Using Data Patterns
- **Topics:** Fraud detection, anomaly detection, historical analysis
- **Resources:**
  - [Introduction to Fraud Detection with Python](https://www.kaggle.com/datasets/ntnu-testimon/paysim1)
  - [Scikit-learn Outlier Detection Techniques](https://scikit-learn.org/stable/modules/outlier_detection.html)
  - [Building a Machine Learning-Based Fraud Detection System](https://towardsdatascience.com/credit-card-fraud-detection-using-machine-learning-726ed4e3b3af)
  - [Anomaly Detection in Pandas](https://towardsdatascience.com/anomaly-detection-in-python-part-1-49b65b0522dc)
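
As a starting point for the anomaly detection and historical analysis topics, the sketch below scores each transaction against the same customer's earlier transactions only. The z-score rule, the threshold of 3, the minimum history of three transactions, the column names, and the sample rows are all assumptions for illustration. The scikit-learn estimators linked above are a heavier alternative.

```python
import pandas as pd

# Hypothetical sample data; the column names are assumptions.
tx = pd.DataFrame({
    "customer_id": ["c1"] * 5 + ["c2"] * 4,
    "amount": [20.0, 22.0, 19.0, 21.0, 200.0, 500.0, 480.0, 510.0, 495.0],
    "timestamp": pd.to_datetime([
        "2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05",
        "2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04",
    ]),
})

def flag_amount_anomalies(frame, threshold=3.0, min_history=3):
    """Flag amounts that deviate by more than `threshold` standard deviations
    from the customer's own earlier transactions. Assumes a unique row index."""
    ordered = frame.sort_values(["customer_id", "timestamp"])
    grouped = ordered.groupby("customer_id")["amount"]
    # shift() drops the current row from its own baseline, so only history is used;
    # rows with fewer than `min_history` earlier transactions get NaN statistics.
    hist_mean = grouped.transform(
        lambda s: s.shift().expanding(min_periods=min_history).mean()
    )
    hist_std = grouped.transform(
        lambda s: s.shift().expanding(min_periods=min_history).std()
    )
    z = (ordered["amount"] - hist_mean) / hist_std
    # NaN scores (too little history) compare as False and are not flagged.
    return (z.abs() > threshold).reindex(frame.index)

tx["anomaly"] = flag_amount_anomalies(tx)
print(tx)
```

Only the 200.0 payment from `c1` is flagged. The larger `c2` amounts are not, because they are typical for that customer. Note that a customer whose earlier amounts are all identical has a standard deviation of zero, so any different amount gets an infinite score and is flagged.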
