# Anomaly Detection in Traffic Logs

**Key Anomaly Detection Rules:**
- Unusually high or low connection duration.
- Suspicious traffic using non-standard ports.
- Excessive traffic from a single IP address (e.g., potential DoS attack).
- Unexpected protocols (anything other than tcp or udp, e.g., icmp).
- Failed connections (conn_state not SF).

## Steps in the Code
**Load the Zeek Logs:**

The gzip-compressed conn.log is first decompressed to a temporary file; the zat library's `LogToDataFrame` then reads it into a pandas DataFrame.
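Before parsing, it can help to check which columns a log actually contains. The following is a minimal sketch that assumes the log uses Zeek's default tab-separated format (not JSON); `read_zeek_fields` is a hypothetical helper and is not part of zat or the script in this notebook.

```python
import gzip

def read_zeek_fields(gz_path):
    """Return the column names from the '#fields' header of a gzipped Zeek log."""
    with gzip.open(gz_path, "rt", encoding="utf-8") as log_file:
        for line in log_file:
            if not line.startswith("#"):
                break  # header lines are over and no '#fields' line was seen
            if line.startswith("#fields"):
                # Assumption: header tokens are tab-separated (Zeek's default)
                return line.rstrip("\n").split("\t")[1:]
    return []
```

If the returned list lacks `duration`, `proto`, or `conn_state`, the rules in this notebook would fail with a `KeyError`, so this is a cheap sanity check before loading.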

**Anomaly Detection Rules:**

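- Rule 1: Flags connections lasting longer than 1000 seconds or shorter than 0.01 seconds.
- Rule 2: Flags connections where both the source and destination ports are above 1024, i.e., outside the common service port range.
- Rule 3: Flags every connection from a source IP address that appears in more than 100 connections in the log.
- Rule 4: Flags connections whose protocol is neither tcp nor udp (e.g., icmp).
- Rule 5: Flags connections whose conn_state is anything other than SF (normal establishment and termination).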
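To see how the five rules behave without a real capture, the sketch below applies the same filters to a small hand-made DataFrame. The rows are invented purely for illustration, and the Rule 3 threshold is lowered from 100 to 1 so that the toy data can trigger it; everything else mirrors the conditions listed above.

```python
import pandas as pd

# Five hand-made connections; all values are invented for illustration
toy = pd.DataFrame({
    "id.orig_h": ["10.0.0.1", "10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.3"],
    "id.orig_p": [50000, 50001, 40000, 12345, 8],
    "id.resp_p": [443, 8080, 53, 22, 0],
    "proto": ["tcp", "tcp", "udp", "tcp", "icmp"],
    "conn_state": ["SF", "S0", "SF", "REJ", "OTH"],
    "duration": pd.to_timedelta([1.5, 2000.0, 0.001, 30.0, 0.5], unit="s"),
})

seconds = toy["duration"].dt.total_seconds()
toy_anomalies = {
    "high_low_duration": toy[(seconds > 1000) | (seconds < 0.01)],
    "non_standard_ports": toy[(toy["id.orig_p"] > 1024) & (toy["id.resp_p"] > 1024)],
    "suspicious_protocols": toy[~toy["proto"].isin(["tcp", "udp"])],
    "failed_connections": toy[toy["conn_state"] != "SF"],
}

# Rule 3 with a lowered threshold (more than 1 connection per source IP)
ip_counts = toy["id.orig_h"].value_counts()
busy_ips = ip_counts[ip_counts > 1].index
toy_anomalies["excessive_traffic"] = toy[toy["id.orig_h"].isin(busy_ips)]

for rule_name, rows in toy_anomalies.items():
    print(rule_name, list(rows.index))
```

Row 1 (2000 seconds, state S0, ports 50001 to 8080) is matched by four of the five rules, which shows that a single connection can be counted under several rules at once.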

**Summarize and Save:**

- The number of anomalies detected by each rule is printed to the console.
- Each anomaly type can optionally be saved to a CSV file; the script as listed prints counts only.
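The optional CSV step might look like the following minimal sketch. `save_anomalies` and its `output_dir` parameter are hypothetical names that do not appear in the script; the sketch assumes the dictionary of DataFrames returned by `detect_anomalies` and writes one file per rule that matched at least one row.

```python
from pathlib import Path

import pandas as pd

def save_anomalies(anomalies, output_dir):
    """Write each non-empty anomaly DataFrame to <output_dir>/<rule_name>.csv."""
    out_path = Path(output_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for rule_name, anomaly_df in anomalies.items():
        if anomaly_df.empty:
            continue  # skip rules that matched nothing
        csv_path = out_path / f"{rule_name}.csv"
        anomaly_df.to_csv(csv_path, index=False)
        written.append(csv_path)
    return written
```

Calling `save_anomalies(anomalies, "anomaly_reports")` after `detect_anomalies` would then produce files such as `anomaly_reports/failed_connections.csv`; the directory name here is only an example.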

import gzip
from zat.log_to_dataframe import LogToDataFrame
import pandas as pd
import tempfile

# Anomaly detection function
def detect_anomalies(df):
    """
    Detect anomalies in the Zeek conn.log data.
    Rules:
    1. High or low duration connections
    2. Traffic using non-standard ports
    3. Excessive connections from a single IP address
    4. Unexpected protocols
    5. Failed connections (conn_state not 'SF')
    """
    anomalies = {}

    # Rule 1: High/Low Duration Connections (over 1000 s or under 0.01 s)
    duration_seconds = df['duration'].dt.total_seconds()
    anomalies['high_low_duration'] = df[
        (duration_seconds > 1000) | (duration_seconds < 0.01)
    ]

    # Rule 2: Traffic using non-standard ports (both endpoints above 1024)
    anomalies['non_standard_ports'] = df[
        (df['id.orig_p'] > 1024) & (df['id.resp_p'] > 1024)
    ]

    # Rule 3: Excessive connections from a single IP
    ip_counts = df['id.orig_h'].value_counts()
    excessive_ips = ip_counts[ip_counts > 100].index
    anomalies['excessive_traffic'] = df[df['id.orig_h'].isin(excessive_ips)]

    # Rule 4: Suspicious protocols (only allow tcp, udp)
    anomalies['suspicious_protocols'] = df[
        ~df['proto'].isin(['tcp', 'udp'])
    ]

    # Rule 5: Failed connections (conn_state not 'SF')
    anomalies['failed_connections'] = df[
        df['conn_state'] != 'SF'
    ]

    return anomalies

# Load and parse Zeek conn.log.gz file
def load_conn_log(file_path):
    """
    Load conn.log.gz using ZAT library after decompressing it.
    """
    # Decompress into a temporary directory that is deleted on exit,
    # so the uncompressed copy does not linger on disk
    with tempfile.TemporaryDirectory() as temp_dir:
        uncompressed_path = f"{temp_dir}/conn.log"
        with gzip.open(file_path, 'rb') as gzipped_file:
            with open(uncompressed_path, 'wb') as temp_file:
                temp_file.write(gzipped_file.read())

        # Use ZAT to parse Zeek log into a DataFrame
        log_to_df = LogToDataFrame()
        df = log_to_df.create_dataframe(uncompressed_path)

    print("Loaded Zeek conn.log.gz file successfully!")
    # print(df.info())
    return df

# Summarize anomalies
def summarize_anomalies(anomalies):
    """
    Print the number of rows flagged by each anomaly rule.
    """
    for rule_name, anomaly_df in anomalies.items():
        print(f"\n[Anomaly Rule: {rule_name}]")
        print(f"Number of anomalies detected: {len(anomaly_df)}")


# Main function
def main():
    # File path to Zeek conn.log.gz file
    file_path = "../zeek/logs/2024-12-16/conn.10:00:00-11:00:00.log.gz"

    # Step 1: Load the data
    df = load_conn_log(file_path).reset_index()

    # Step 2: Detect anomalies
    anomalies = detect_anomalies(df)

    # Step 3: Summarize anomalies (prints a count per rule)
    summarize_anomalies(anomalies)

# Run the script
if __name__ == "__main__":
    main()


Loaded Zeek conn.log.gz file successfully!

[Anomaly Rule: high_low_duration]
Number of anomalies detected: 585

[Anomaly Rule: non_standard_ports]
Number of anomalies detected: 1457

[Anomaly Rule: excessive_traffic]
Number of anomalies detected: 1887

[Anomaly Rule: suspicious_protocols]
Number of anomalies detected: 11

[Anomaly Rule: failed_connections]
Number of anomalies detected: 1502
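
A single connection can satisfy more than one rule, so the counts above are not necessarily disjoint and should not be summed. One way to check the overlap is sketched below; `rules_per_row` is a hypothetical helper, and it assumes the DataFrame passed to `detect_anomalies` has a unique index, as it does after the `reset_index()` call in `main`.

```python
from collections import Counter

import pandas as pd

def rules_per_row(anomalies):
    """Count, for each flagged row index, how many rules matched it."""
    counts = Counter()
    for anomaly_df in anomalies.values():
        # Each rule's DataFrame keeps the index labels of the original rows
        counts.update(anomaly_df.index)
    return pd.Series(dict(counts), dtype="int64").sort_index()
```

For example, `rules_per_row(anomalies).value_counts()` would show how many connections were flagged by exactly one rule, exactly two rules, and so on.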
