# Fraud detection

**Process:**
- The dataset appears to contain tickets with code injection and text with HTML tags.
- Solution: build a filter for each case and add the results as two additional features.

In [1]:
import pandas as pd
df = pd.read_csv("../customer_support_tickets_processed.csv")

In [2]:
import re

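# Matches anything shaped like a tag: "<", one or more non-">" characters, ">".
HTML_TAG_PATTERN = re.compile(r"<[^>]+>")

def is_contains_html(text):
    """Return True if the text contains at least one tag-like substring."""
    return bool(HTML_TAG_PATTERN.search(text))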

def is_contains_code_injection(text):
    """Return True if the text contains any known injection keyword (case-insensitive)."""
    injection_keywords = [
        # JS functions / payloads
        'eval(', 'alert(', 'prompt(', 'console.log', 'Function(', 'setTimeout(', 'setInterval(',

        # SQL injection patterns
        "' OR '1'='1", "' OR 1=1 --", "'; DROP TABLE", "'; EXEC",
        'UNION SELECT', 'SELECT * FROM', 'INSERT INTO', 'DROP TABLE', 'UPDATE SET',
        'DELETE FROM', 'xp_cmdshell', 'sp_executesql',

        # Encoded variants (URL or base64)
        '%3Cscript%3E', 'base64,', 'data:text/html',

        # Suspicious paths
        '/etc/passwd', '../', '..\\', '%00',
    ]

    # Lowercase the text once instead of once per keyword.
    text_lower = text.lower()
    return any(kw.lower() in text_lower for kw in injection_keywords)

df['has_html'] = df['Ticket Description'].apply(is_contains_html)
df['has_code_injection'] = df['Ticket Description'].apply(is_contains_code_injection)
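
The tag pattern `<[^>]+>` flags anything that sits between a pair of angle brackets, so it can help to check it on a few strings before trusting the counts. The following is a minimal sketch: the sample strings are hand-written for illustration (they are not rows from the dataset), and the names `HTML_TAG`, `samples`, and `flags` are hypothetical.

```python
import re

# Same pattern as the HTML filter: "<", one or more non-">" characters, ">".
HTML_TAG = re.compile(r"<[^>]+>")

# Hand-written illustrative strings, not taken from the dataset.
samples = {
    "paragraph tag": "Please assist. <p> A representative will arrive soon.",
    "closing tag": "Please assist. </name> more text",
    "plain text": "The device will not turn on.",
    "comparison": "Battery drops when level < 20 and temperature > 30.",
}

# True means the filter would set has_html for that text.
flags = {name: bool(HTML_TAG.search(text)) for name, text in samples.items()}

for name, hit in flags.items():
    print(f"{name:15} {hit}")
```

The last sample is flagged even though it contains no markup, because the pattern only looks for a `<` followed later by a `>`. The feature is therefore best read as "contains tag-like text" rather than as a strict HTML detector.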


In [3]:
print(f"There are {len(df[df['has_html']])} tickets containing HTML tags")
print("Some examples:\n")
for d in df[df['has_html']].head(5)["Ticket Description"]:
    print(d)
    print("-" * 80)

There are 199 tickets containing HTML tags
Some examples:

I'm having an issue with the {product_purchased}. Please assist.

<p>

A full time customer service representative will arrive soon.

<p>

If you would like to respond to a message I've tried different settings and configurations on my {product_purchased}, but the issue persists.
--------------------------------------------------------------------------------
I'm having an issue with the {product_purchased}. Please assist. </name> <product_purchased_url >http://www.kyle@junebug.com/tutorial/cure-all- I've tried troubleshooting steps mentioned in the user manual, but the issue persists.
--------------------------------------------------------------------------------
I'm having an issue with the {product_purchased}. Please assist. <s3> Please provide the product name, location and shipping address in the Product Overview. <s3> This message will be unread for 12 seconds. I've recently updated the firmware of my {product_purchased}, 

In [4]:
print(f"There are {len(df[df['has_code_injection']])} tickets containing code injections")
print("Some examples:\n")
for d in df[df['has_code_injection']].head(5)["Ticket Description"]:
    print(d)
    print("-" * 80)

There are 5 tickets containing code injections
Some examples:

I'm having an issue with the {product_purchased}. Please assist. <script src="../libs/products/touches/touches.js"></script>

If we make our "instructions" I've noticed that the issue occurs consistently when I use a specific feature or application on my {product_purchased}.
--------------------------------------------------------------------------------
I'm having an issue with the {product_purchased}. Please assist. <script type="text/javascript">(function() { var c1 = document.getElementById('Product')[1].parentNode; c1. The issue I'm facing is intermittent. Sometimes it works fine, but other times it acts up unexpectedly.
--------------------------------------------------------------------------------
I'm having an issue with the {product_purchased}. Please assist.

You will need to enable JavaScript.

$tw.ready(function() { console.log('Your purchase exceeded the {product_purchased I'm not sure if this issue is specific 

For complaints, we can use the sentiment analyzer to select ticket descriptions with a negative label and a high score, as described before.
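
The selection step could look roughly like the sketch below. It assumes each ticket already carries the analyzer's output in two fields, here called `sentiment_label` and `sentiment_score`. Those field names, the `"NEGATIVE"` label string, and the `0.9` threshold are all assumptions made for illustration, not values taken from this notebook.

```python
def select_complaints(tickets, min_score=0.9):
    """Keep tickets labelled negative whose score is at least min_score.

    `tickets` is a list of dicts. The keys "sentiment_label" and
    "sentiment_score" are hypothetical names for the analyzer's output.
    """
    return [
        t for t in tickets
        if t["sentiment_label"].upper() == "NEGATIVE"
        and t["sentiment_score"] >= min_score
    ]


# Hand-written illustrative records, not taken from the dataset.
tickets = [
    {"id": 1, "sentiment_label": "NEGATIVE", "sentiment_score": 0.97},
    {"id": 2, "sentiment_label": "NEGATIVE", "sentiment_score": 0.55},
    {"id": 3, "sentiment_label": "POSITIVE", "sentiment_score": 0.99},
]

complaints = select_complaints(tickets)
complaint_ids = [t["id"] for t in complaints]
print(complaint_ids)
```

Requiring both conditions keeps low-confidence negatives out of the complaint set. The threshold would need tuning against the actual score distribution.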

In [5]:
import pandas as pd
df = pd.read_csv("../customer_support_tickets_processed.csv")