# **Task 1: Colecting security data**

Billy has determined this is not an area worth building any tools yet. He’s going to install a linux distro on one of his allocated systems. It comes with rsyslog:

- Configuration is incredibly simple
- Except for his Windows servers, all his security data can be forwarded via syslog
- Free
- Problem Solved.

This can simply consist of uncommenting the following lines in `/etc/rsyslog.conf` on almost every linux system:

```
$ModLoad imudp
$UDPServerRun 514
```

Billy chose a pretty simple configuration. He recommends reviwing the rsylog docs if doing this yourself

<div class="alert alert-block alert-info">
<b>Note on Windows events:</b> 
    
Billy knows this will be an issue in the future. And has some plans in mind to solve it. But he’s ignoring those for now. Because I only have 50 minutes to tell you his story.

- Windows events are not stored in plain text, so take a bit more effort to access
- There are 3rd party/commercial agents that can forward syslog. He might look into this. It’s the easiest solution.
-If he decides he can’t afford those he can always use a python library called “pywin” to remotely grab the events from his newly built syslog server
- For now, all security data is collected by a Windows Event Collector
- Priority is on Firewall and VPN data
</div>

# **Task 2: Parsing and storing data**

- Each log currently has its own format
- He needs to have a simple way to correlate and create rules to analyze the data
- Probably wants some indexes to make querying and rules simpler

#### The next block of code is just doing some simple file/directory prep before processing the logs and parsing them

In [1]:
# Billy has hard-coded his base and log directories using a relative path. 
# Making this more flexible would be a good future improvement

import os

base_dir = os.path.abspath(os.path.dirname('.'))
log_dir = os.path.join(base_dir, 'logs')
json_dir = os.path.join(base_dir, 'json')

# Billy's initial focus is just vpn and firewall logs
log_files = ['openvpnas.log', 'firewall.log']

log_paths = [os.path.join(log_dir, f) for f in log_files]

# If the director we will use as a pseudo database doesn't exist create it
# This should be an area Billy focuses on improving in the future to ensure this script works in other directories
if not os.path.isdir(json_dir):
    os.mkdir(json_dir)

#### Billy needs to define the parsing functions for re-use. Doing t his outside a function would get quite messy

In [2]:
import re
import datetime

from dateutil import parser

def parse_log(log_message):
    """
    Takes the syslog header from any type of syslog messages and builds the intial json used for all parsers
    and then attemps to parse further based on the application type
    
    This will require future updates to ensure application types are added, could be made easier with OOP
    """
    
    dt, log_source, log_app, message_body = log_message.split(' ', 3)
    log_app_map = {'sshd': 'ssh', 'openvpnas': 'vpn', 'kernel': 'firewall'}

    for app_name in log_app_map:
        if log_app.startswith(app_name):
            app = log_app_map[app_name]
            break
        else:
            app = log_app

    log_time = parser.parse(dt)
    timestamp = datetime.datetime.timestamp(log_time)
    
    parsed_syslog = {'log_source_time': timestamp, 'log_source': log_source, 'app': app, 'message': message_body}
    
    parse_funcs = {'firewall': parse_fw, 'vpn': parse_ovpn}
    
    # Try to parse the additional fields based on application
    if app in parse_funcs:
        try:
            parse_func = parse_funcs[app]
            parsed_message = parse_func(message_body)
            for k,v in parsed_message.items():
                parsed_syslog[k] = v
        except (AttributeError, KeyError):
            pass

    return parsed_syslog


def parse_fw(fw_message):
    """
    Parses IP tables firewalls (or any space seperated key=value formatted event message)
    
    This method could be a good one for re-use in the future since it isn't so locked to a one off format.
    """
    
    msg_dict = {'additional_data': []}
    for pair in fw_message.split(' '):
        try:
            k, v = pair.split('=', 1)
            msg_dict[k] = v
        except ValueError:
            msg_dict['additional_data'].append(pair)
    return msg_dict


def parse_ovpn(ovpn_message):
    """
    Parses Openvpn formatted log messages. This is a very specific and complicated format. Unlikely to be re-used
    and if Billy wants more than just authenticated users and remote IP's he'll need to make adjustments
    """
    
    # For now let's just get the IP assignment, that is post success and cleans things up a bit. 
    # This can be expanded later
    if 'primary virtual IP' not in ovpn_message:
        return None

    # If you aren't familiar with regex. Get familiar, you'll use it a lot in these types of tools
    user_pattern = re.compile('(\w+)/((\d{1,3}\.){3}\d{1,3})')
    date_pattern = re.compile('(\w{3}\s+\w{3}\s+\d{1,2}\s+(\d{1,2}:){2}\d{1,2}\s+\d{4})\s')

    user_ip = user_pattern.search(ovpn_message)
    if user_ip:
        user = user_ip.group(1)
        ip_addr = user_ip.group(2)
        date_search = date_pattern.search(ovpn_message)
        event_time = parser.parse(date_search.group(1))
        event_timestamp = datetime.datetime.timestamp(event_time)

        return {'remote_user': user, 'remote_ip': ip_addr, 'event_time': event_timestamp}

#### This is Billy's main formating/parsing prep block, calling the `parse_log` method and converting all logs to json

In [3]:
import hashlib
import json

events = []

for log_file in log_paths:
    
    with open(log_file, 'r') as lf:
        # Billy doesn't want massive json files to read. So he's going to lock each file down to 5000 events
        # This should keep files roughly under 5MBs (based on current logs) and make them easier to read
        count = 0
        event_block = {}
        
        for line in lf:
            parsed_log = parse_log(line)
            
            # Billy creates an MD5 value from each log to ensure they have a unique ID
            # He could have just incremented by one, but that added complexities and maybe even race conditions
            log_hash = hashlib.md5()
            log_hash.update(json.dumps(parsed_log).encode('utf-8'))
            log_id = log_hash.hexdigest()
            
            event_block[log_id] = parsed_log
            count += 1
            
            # This is where Billy does his count check.
            # He could also do a slightly more complicated size check if he needed to be exact in size
            if count == 5000:
                events.append(event_block)
                event_block = {}
                count = 0
        
        # For that last block that won't get to 5000 events
        events.append(event_block)

#### This is the last block for this task. Here, Billy is simply generating a unique file name for each event block and writing a json file

In [4]:
for event_block in events:
    # Billy is genearting a datetime based filename down to the microsecond.
    # This ensure even with very fast processing each file will almost surely have a unique filename.
    # He could also do a hashing method, or just incremental names
    
    dt = datetime.datetime.now()
    seconds = (dt.hour + dt.minute) * 60 + dt.second
    file_name = f"{dt.strftime('%Y%m%d')}_{seconds}.{dt.microsecond}"
    file_path = os.path.join(json_dir, file_name)
    
    # Billy convert's the python dictionary to json and write it to the file
    with open(file_path, 'w') as json_file:
        json.dump(event_block, json_file)

# Task 3: Building correlation rules

The SIEM will have a lot of correlation rules out of the box. But Billy already knows a few simple ways to identify user account compromises. He just needs to automate them:

- Users often use a VPN to work remotely. 
- This is a simple way to identify possible account compromises
- If a user logs into the VPN from two different countries within a short period of time, that’s probably bad
- Most SIEMs will include this rule by default. But Billy thinks he can write the same logic fairly easily


#### Billy needs a VPN index to make this rule a bit faster. He's going to create a user, remote ip, country code and timestamp index

This first block will be some house keeping to isolate VPN events for faster processing

In [5]:
vpn_events = {}

# This is where storing events in blocks comes in handy. 
# Processing 100s of MBs or more of data would be very resource intensive
for json_file in os.listdir(json_dir):
    fp = os.path.join(json_dir, json_file)
    with open(fp, 'r') as jf:
        events = json.load(jf)
    
    # reducing the data to just VPN events will make indexing and correlation faster
    for event_id, event in events.items():
        if event['app'] == 'vpn':
            vpn_events[event_id] = event  

#### Now Billy needs to start actually building the indexes

This will involve making an API call to enrich each IP with it's geolocation

In [6]:
# This is the only non standard libary module Billy needs for this project
import requests


def get_ip_country(ip_addr):
    """
    Takes in an IP address and queries it against ip-api.com a geo lookup resource with a free tree.
    """
    url = f'http://ip-api.com/json/{ip_addr}'
    r = requests.get(url)
    
    if r.status_code == 200:
        ip_data = r.json()
        return ip_data['countryCode']
    else:
        return None

#### Now he's use that method and some simple dictionaries to build out a set of indexes. He could right those to a file for future use, but for now he's only working with them in memory

In [7]:
import time
    
user_index = {}
ip_index = {}
cc_index = {}
time_index = {}


# Billy will be using a free API with rate limiting to get geo data. 
# This makes it important to track his call rate to ensure he doesn't get errors or not data
start_time = time.time()
call_count = 0

for event_id, event in vpn_events.items():
    user = event.get('remote_user')
    r_ip = event.get('remote_ip')
    
    if user:
        if not user_index.get(user):
            user_index[user] = [event_id]
        else:
            user_index[user].append(event_id)
        
        
    if r_ip:
        if not cc_index.get(r_ip):
            # checking the current rate
            if call_count == 145 and time.time() - start_time <= 60:
                time.sleep(60 - (time.time() - start_time))
                start_time = time.time()
                call_count = 0
            country = get_ip_country(r_ip)
            if country:
                cc_index[r_ip] = country
            call_count += 1
        
        if not ip_index.get(r_ip):
            ip_index[r_ip] = [event_id]
        else:
            ip_index[r_ip].append(event_id)
    
    if r_ip or user:
        time_index[event_id] = event['event_time']

### And now for the actual correlation piece. 

Don't be intimated if this feels like a lot of code for one thing. Billy is laying the foundation and almost everything here can be modified just slightly to be reusable by lots and lots of logs and rules.

This is just his first draft

In [8]:
from collections import Counter

rule_matches = []

# This is a huge area that Billy can clean up in the future.
# Creating some functions from this will make it more readable, repeat less code
# and make it more reusable in other code.
for user in user_index:
    matched_events = []
    
    for event in user_index[user]:
        event_time = time_index.get(event)
        
        for r_ip in ip_index:
            if event in ip_index[r_ip]:
                remote_ip = r_ip
                country = cc_index[r_ip]
                
    for c_event in user_index[user]:
        if c_event != event:
            c_event_time = time_index.get(c_event)
            
            for r_ip in ip_index:
                if c_event in ip_index[r_ip]:
                    c_remote_ip = r_ip
                    c_country = cc_index[r_ip]
            
            if country != c_country:
                if abs(c_event_time - event_time) <= 3600:
                    if event not in matched_events:
                        matched_events.append(event)
                    if c_event not in matched_events:
                        matched_events.append(c_event)
    
    if len(matched_events) > 1 and len(rule_matches) > 0:
        for match in rule_matches:
            if Counter(match) != Counter(matched_events):
                matches.append(matched_events)
    elif len(matched_events) > 1:
        rule_matches.append(matched_events)

#### And finally billy wrote some code to help give him the important information about each suspicious event

This is another area that Billy surely will be able to update to a function later. But for now he's just sticking to a simple code block

In [9]:
headers = 'time | user | country code | remote ip'
print(headers)
print ('-' * len(headers))

for match in rule_matches:
    for event_id in match:
        for f in os.listdir(json_dir):
            fp = os.path.join(json_dir, f)
            with open(fp, 'r') as infile:
                events = json.load(infile)

            if event_id in events:
                full_event = events[event_id]
                break

        event_time = datetime.datetime.fromtimestamp(full_event['event_time']).strftime('%Y-%m-%d %H:%M:%S')
        user = full_event['remote_user']
        ip = full_event['remote_ip']
        country_code = cc_index[ip]
        
        row_string = ' | '.join([event_time, user, ip, country_code])
        print(row_string)

time | user | country code | remote ip
--------------------------------------
2019-05-09 22:39:31 | smiley | 112.251.21.161 | CN
2019-05-09 21:57:04 | smiley | 76.210.33.168 | US
2019-05-09 21:59:31 | smiley | 76.210.33.168 | US
2019-05-09 22:25:36 | smiley | 94.176.148.227 | RO
2019-05-09 22:33:18 | smiley | 178.128.229.53 | CA
2019-05-09 22:35:52 | smiley | 71.85.118.117 | US


# **Task 4: Enrich our events**

Billy now can fairly quickly build rules and work with his event data. But he only has internal data, there’s a lot more info out there

- Open source and commercial threat intelligence, IOC lists
- Basic nslookup and whois data
- Port scanning
- Billy is pretty sure he can make is VPN rule more valuable in his teams’ investigations

In [10]:
# Billy is going to use an open Threat Intelligence exchange from Alienvault to get started.
# He'll be able to modify this as he adds other enrichment feeds in the future

def alienvault_ip_lookup(ip_addr):
    
    # In this case Billy has written his API token to a file to read later.
    # He could also have saved it as an environment variable and used os.environ.get('VARIABLENAME')
    # NEVER HARD CODE CREDENTIALS IN CODE. THEY WILL END UP IN GIT OR SOMEWHERE ELSE
    key_file = os.path.join(base_dir, 'alienvault.key')
    with open(key_file, 'r') as kf:
        api_token = kf.read()

    url = f'https://otx.alienvault.com:443/api/v1/indicators/IPv4/{ip_addr}'
    headers = {'X-OTX-API-KEY': api_token, 'Accept': 'application/json', 'Content-Type': 'application/json'}

    r = requests.get(url, headers=headers)

    if r.status_code == 200:
        threat = r.json()
    else:
        return None

    pulse_info = threat.get('pulse_info')
    if pulse_info:
        pulses = pulse_info.get('pulses')
        tags = [tag for pulse in pulses for tag in pulse.get('tags')]
    else:
        tags = None

    desc = threat['base_indicator'].get('description')
    reputation = threat.get('reputation')

    return {'description': desc, 'tags': tags, 'reputation': reputation}

#### Now that he has the ability to get enriched data, it's time to display it. 

Billy is just going to re-use his previous code with a slight modification. Now he's seeing why a function may have been a good idea earlier

He could also store the enriched data somewhere with his events or indexes for future correlations and investigations

In [11]:
headers = 'time | user | country code | remote ip | alienv desc | alienv tags | alienv reputation'
print(headers)
print ('-' * len(headers))

for match in rule_matches:
    for event_id in match:
        for f in os.listdir(json_dir):
            fp = os.path.join(json_dir, f)
            with open(fp, 'r') as infile:
                events = json.load(infile)

            if event_id in events:
                full_event = events[event_id]
                break

        event_time = datetime.datetime.fromtimestamp(full_event['event_time']).strftime('%Y-%m-%d %H:%M:%S')
        user = full_event['remote_user']
        ip = full_event['remote_ip']
        country_code = cc_index[ip]
        
        alv_rep = alienvault_ip_lookup(ip)
        
        print(f'{event_time} | {user} | {country_code} | {ip} | {alv_rep["description"]} | {alv_rep["tags"]}')

time | user | country code | remote ip | alienv desc | alienv tags | alienv reputation
--------------------------------------------------------------------------------------
2019-05-09 22:39:31 | smiley | CN | 112.251.21.161 | SSH bruteforce client IP | ['SSH', 'bruteforce', 'honeypot']
2019-05-09 21:57:04 | smiley | US | 76.210.33.168 | None | []
2019-05-09 21:59:31 | smiley | US | 76.210.33.168 | None | []
2019-05-09 22:25:36 | smiley | RO | 94.176.148.227 | None | []
2019-05-09 22:33:18 | smiley | CA | 178.128.229.53 | None | []
2019-05-09 22:35:52 | smiley | US | 71.85.118.117 | None | []


# **Task 5: Quick vulnerability scan**

After Billy finishes investigating the compromised VPN account, he is told there’s a large scale vulnerability in a fairly uncommon application and Billy isn’t sure if the company is vulnerable. But he doesn't want to assume they aren't

- Billy knows a string that is in the vuln application’s banner
- Billy knows the port the application uses
- So, he’s pretty sure he can quickly identify vulnerable hosts


#### This is a good opportunity for Billy to invest some time into his engineering skills

He's familiar with the concept of threading but has never used it. He does know, that it will make running several scans much quicker. So he takes about an hour before writing his scanning functions to research Threading and write his code in a way that supports it

In [12]:
import socket
import ipaddress

from threading import Thread

app_port = 9001
vuln_string = 'Vully the basic chat application'

def get_check_banner(target, port, banner_string):
    '''
    This is the core scan function that will run in a threat handler for each target
    '''
    try:
        s = socket.socket()
        s.connect((target, port))
        banner_bytes = s.recv(1024)
        print(f'Got a banner from {target} on {port}\n')
        banner = banner_bytes.decode('utf-8')
    except Exception as e:
        return

    if banner:
        if banner.startswith(banner_string):
            print(f'****{target} running vulnerable app****\n')
   

def banner_scan(network, port, banner_string):
    targets = [str(ip) for ip in ipaddress.IPv4Network(network)]
    
    threads = []
    
    print(f'Starting scan on {network}')
    
    for target in targets:
        t = Thread(target=get_check_banner, args=(target, port, banner_string))
        threads.append(t)
        t.start()
        
    for t in threads:
        t.join()
    
    print(f'Completed scan on {network}\n')


networks = ['192.168.1.0/24', '10.1.1.0/24', '10.0.1.0/24']

scan_threads = []

for network in networks:
    n = Thread(target=banner_scan, args=(network, app_port, vuln_string))
    scan_threads.append(n)
    n.start()
    
for scan in scan_threads:
    n.join()
    
print('All scans completed')

Starting scan on 192.168.1.0/24
Starting scan on 10.1.1.0/24
Starting scan on 10.0.1.0/24
Got a banner from 192.168.1.114 on 9001

****192.168.1.114 running vulnerable app****

Completed scan on 192.168.1.0/24

Completed scan on 10.0.1.0/24
Completed scan on 10.1.1.0/24

All scans completed

