# SOCBED Dataset Structure Analysis

* **Author:** Patrik Goldschmidt (igoldschmidt@fit.vut.cz)
* **Project:** Network Intrusion Datasets: A Survey, Limitations, and Recommendations
* **Date:** 2024

In [2]:
import pandas as pd
import os

In [3]:
# Local root of the extracted SOCBED dataset; adjust to your environment.
PATH = '/home/goldy/Documents/phd/papers/datasurv/data/socbed'
HOST1_BP_PATH = os.path.join(PATH, 'host1_bestpractice')

In [4]:
data_syslog = pd.read_json(os.path.join(HOST1_BP_PATH, 'syslog_01.jsonl'), lines=True)

In [5]:
data_syslog.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19133 entries, 0 to 19132
Data columns (total 15 columns):
 #   Column          Non-Null Count  Dtype                                
---  ------          --------------  -----                                
 0   @timestamp      19133 non-null  object                               
 1   @version        19133 non-null  int64                                
 2   facility        19133 non-null  int64                                
 3   facility_label  19133 non-null  object                               
 4   host            19133 non-null  object                               
 5   logsource       19133 non-null  object                               
 6   message         19133 non-null  object                               
 7   priority        19133 non-null  int64                                
 8   program         19133 non-null  object                               
 9   severity        19133 non-null  int64                        

In [8]:
data_winlogbeat = pd.read_json(os.path.join(HOST1_BP_PATH, 'winlogbeat_01.jsonl'), lines=True)

In [9]:
data_winlogbeat.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9370 entries, 0 to 9369
Data columns (total 22 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   @timestamp   9370 non-null   object
 1   agent        9370 non-null   object
 2   ecs          9370 non-null   object
 3   event        9370 non-null   object
 4   hash         972 non-null    object
 5   host         9370 non-null   object
 6   log          9370 non-null   object
 7   message      9370 non-null   object
 8   process      5885 non-null   object
 9   related      2792 non-null   object
 10  user         2774 non-null   object
 11  winlog       9370 non-null   object
 12  powershell   485 non-null    object
 13  registry     1116 non-null   object
 14  rule         3676 non-null   object
 15  file         1278 non-null   object
 16  dns          471 non-null    object
 17  network      2145 non-null   object
 18  sysmon       471 non-null    object
 19  error        12 non-null   

In [10]:
data_winlogbeat['network'].value_counts()

{'protocol': 'dns'}                                                                                                                    471
{'community_id': '1:oDcWV/WI0q8UkuKik+tDTxHN2hA=', 'direction': 'outbound', 'protocol': 'smtp', 'transport': 'tcp', 'type': 'ipv4'}      2
{'community_id': '1:D9J/8uHlxF/unPP0nL7M8qKFAPk=', 'direction': 'outbound', 'protocol': 'smtp', 'transport': 'tcp', 'type': 'ipv4'}      2
{'community_id': '1:mbKESePJZ2Ca8qeXssyjkAKzbA0=', 'direction': 'outbound', 'protocol': 'smtp', 'transport': 'tcp', 'type': 'ipv4'}      2
{'community_id': '1:nijJ7lKsAIgYwm8/tTC50fO9a/U=', 'direction': 'outbound', 'protocol': 'smtp', 'transport': 'tcp', 'type': 'ipv4'}      2
                                                                                                                                      ... 
{'community_id': '1:9Y6dLaof6ngn/Q197oljgnaklRY=', 'direction': 'outbound', 'protocol': 'smtp', 'transport': 'tcp', 'type': 'ipv4'}      1
{'community_id': '1:0SWoTof
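
Counting whole dictionaries, as above, makes every distinct `community_id` its own category, which hides the protocol distribution. A minimal sketch of flattening such a nested column first, assuming the values are plain dicts or missing. `flatten_column` is a hypothetical helper, and `sample` is a small stand-in for `data_winlogbeat`, not real dataset content:

```python
import pandas as pd


def flatten_column(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Expand a column of dicts into one flat column per key (hypothetical helper)."""
    present = df[column].dropna()               # skip rows where the field is missing
    flat = pd.json_normalize(present.tolist())  # nested keys become dotted column names
    flat.index = present.index                  # keep alignment with the original rows
    return flat.add_prefix(f"{column}.")


# Stand-in data shaped like the 'network' values shown above (illustrative only).
sample = pd.DataFrame({
    "network": [
        {"protocol": "dns"},
        {"direction": "outbound", "protocol": "smtp", "transport": "tcp", "type": "ipv4"},
        None,
    ]
})

flat_network = flatten_column(sample, "network")
protocol_counts = flat_network["network.protocol"].value_counts()
```

On the real frame, `flatten_column(data_winlogbeat, 'network')['network.protocol'].value_counts()` should give a per-protocol count instead of one row per flow.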

In [11]:
HOST2_DEFAULT_PATH = os.path.join(PATH, 'host2_default')

In [13]:
data_h2_winlog = pd.read_json(os.path.join(HOST2_DEFAULT_PATH, 'winlogbeat_01.jsonl'), lines=True)

In [14]:
data_h2_winlog.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5885 entries, 0 to 5884
Data columns (total 10 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   @timestamp  5885 non-null   object
 1   agent       5885 non-null   object
 2   ecs         5885 non-null   object
 3   event       5885 non-null   object
 4   host        5885 non-null   object
 5   log         5885 non-null   object
 6   message     5885 non-null   object
 7   powershell  308 non-null    object
 8   process     308 non-null    object
 9   winlog      5885 non-null   object
dtypes: object(10)
memory usage: 459.9+ KB
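
The two Winlogbeat exports differ in their top-level fields: the host1 best-practice file reports 22 columns, the host2 default file only 10. A minimal sketch of listing that difference explicitly. `compare_columns` is a hypothetical helper, and the two small frames are stand-ins for `data_winlogbeat` and `data_h2_winlog`:

```python
import pandas as pd


def compare_columns(left: pd.DataFrame, right: pd.DataFrame) -> dict:
    """Return column names unique to each frame and those they share (hypothetical helper)."""
    left_cols, right_cols = set(left.columns), set(right.columns)
    return {
        "only_left": sorted(left_cols - right_cols),
        "only_right": sorted(right_cols - left_cols),
        "shared": sorted(left_cols & right_cols),
    }


# Stand-in frames with a few of the column names seen above (illustrative only).
left_frame = pd.DataFrame(columns=["@timestamp", "message", "process", "dns", "network"])
right_frame = pd.DataFrame(columns=["@timestamp", "message", "process", "powershell"])

column_diff = compare_columns(left_frame, right_frame)
```

Calling `compare_columns(data_winlogbeat, data_h2_winlog)` on the loaded frames would show which fields appear only under one of the two logging configurations.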


In [15]:
data_h2_winlog.head()

Unnamed: 0,@timestamp,agent,ecs,event,host,log,message,powershell,process,winlog
0,2021-06-17T22:35:57.364Z,{'ephemeral_id': '3ea30d48-3644-4c38-8f58-b410...,{'version': '1.5.0'},"{'action': 'Provider Lifecycle', 'category': [...",{'name': 'CLIENT'},{'level': 'information'},"Provider ""Alias"" is Started. \n\nDetails: \n\t...",{'process': {'executable_version': '5.1.19041....,"{'args': ['powershell', '$cmd_man_ip', '=', 'n...","{'api': 'wineventlog', 'channel': 'Windows Pow..."
1,2021-06-17T22:35:57.364Z,{'ephemeral_id': '3ea30d48-3644-4c38-8f58-b410...,{'version': '1.5.0'},"{'action': 'Provider Lifecycle', 'category': [...",{'name': 'CLIENT'},{'level': 'information'},"Provider ""Environment"" is Started. \n\nDetails...",{'process': {'executable_version': '5.1.19041....,"{'args': ['powershell', '$cmd_man_ip', '=', 'n...","{'api': 'wineventlog', 'channel': 'Windows Pow..."
2,2021-06-17T22:35:57.696Z,{'ephemeral_id': '3ea30d48-3644-4c38-8f58-b410...,{'version': '1.5.0'},"{'action': 'Provider Lifecycle', 'category': [...",{'name': 'CLIENT'},{'level': 'information'},"Provider ""FileSystem"" is Started. \n\nDetails:...",{'process': {'executable_version': '5.1.19041....,"{'args': ['powershell', '$cmd_man_ip', '=', 'n...","{'api': 'wineventlog', 'channel': 'Windows Pow..."
3,2021-06-17T22:35:57.696Z,{'ephemeral_id': '3ea30d48-3644-4c38-8f58-b410...,{'version': '1.5.0'},"{'action': 'Provider Lifecycle', 'category': [...",{'name': 'CLIENT'},{'level': 'information'},"Provider ""Variable"" is Started. \n\nDetails: \...",{'process': {'executable_version': '5.1.19041....,"{'args': ['powershell', '$cmd_man_ip', '=', 'n...","{'api': 'wineventlog', 'channel': 'Windows Pow..."
4,2021-06-17T22:35:57.943Z,{'ephemeral_id': '3ea30d48-3644-4c38-8f58-b410...,{'version': '1.5.0'},"{'action': 'Provider Lifecycle', 'category': [...",{'name': 'CLIENT'},{'level': 'information'},"Provider ""Registry"" is Started. \n\nDetails: \...",{'process': {'executable_version': '5.1.19041....,{'args': ['C:\Windows\System32\RemoteFXvGPUDis...,"{'api': 'wineventlog', 'channel': 'Windows Pow..."


In [16]:
!head -n 20 "$HOST2_DEFAULT_PATH/attackconsole_01.log"

2021-06-18T00:49:10.049216+02:00 tbfconsole INFO [event="start_tbf_console" attacks="c2_change_wallpaper,infect_email_exe,c2_mimikatz,infect_flashdrive_exe,infect_email_url,misc_set_autostart,misc_sqlmap,c2_download_malware,c2_exfiltration,c2_take_screenshot,misc_execute_malware,disinfect_client,misc_exfiltration,misc_download_malware,c2_hashdump"] Start Console
2021-06-18T00:49:10.073180+02:00 tbfconsole INFO [event="run_attack" attack="misc_sqlmap" url="http://172.18.0.2/dvwa/vulnerabilities/sqli/?id=&Submit=Submit"] Run attack
2021-06-18T00:49:25.986140+02:00 tbfconsole INFO [event="attack_succeeded" attack="misc_sqlmap" url="http://172.18.0.2/dvwa/vulnerabilities/sqli/?id=&Submit=Submit"] Attack succeeded
2021-06-18T00:49:25.989259+02:00 tbfconsole INFO [event="exit_tbf_console"] Exit Console
2021-06-18T00:52:09.657490+02:00 tbfconsole INFO [event="start_tbf_console" attacks="c2_hashdump,c2_exfiltration,infect_email_exe,misc_exfiltration,c2_mimikatz,misc_sqlmap,infect_flashdrive_ex
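
The attack console log carries the attack labels as `key="value"` pairs inside square brackets. A minimal sketch of extracting them, assuming every complete line follows the layout seen above (timestamp, source, level, bracketed fields, free text). The regular expressions and `parse_attack_line` are hypothetical, not part of SOCBED:

```python
import re

# Assumed line layout: <timestamp> <source> <level> [<key="value" ...>] <text>
LINE_RE = re.compile(
    r'^(?P<timestamp>\S+) (?P<source>\S+) (?P<level>\S+) \[(?P<fields>.*)\] (?P<text>.*)$'
)
FIELD_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_attack_line(line: str):
    """Parse one log line into a dict, or return None if it does not match (hypothetical helper)."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None  # e.g. a truncated or differently formatted line
    record = match.groupdict()
    # Replace the raw bracketed string with its individual key/value pairs.
    record.update(dict(FIELD_RE.findall(record.pop("fields"))))
    return record


example = (
    '2021-06-18T00:49:10.073180+02:00 tbfconsole INFO '
    '[event="run_attack" attack="misc_sqlmap" '
    'url="http://172.18.0.2/dvwa/vulnerabilities/sqli/?id=&Submit=Submit"] Run attack'
)
parsed = parse_attack_line(example)
```

Applied to every line of the file, the `event`, `attack`, and `timestamp` fields could serve as ground-truth markers for when each attack started and finished.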

In [17]:
!head -n 20 "$HOST2_DEFAULT_PATH/attackconsole_01.stdout"


+ -- - -= [	 BREACH Attack Console 		]
+ -- - -= [	 15 attack(s)          		]

attackconsole > use misc_sqlmap
attackconsole (misc_sqlmap) > run
        ___
       __H__
 ___ ___[']_____ ___ ___  {1.5.2#stable}
|_ -| . [.]     | .'| . |
|___|_  [)]_|_|_|__,|  _|
      |_|V...       |_|   http://sqlmap.org

[!] legal disclaimer: Usage of sqlmap for attacking targets without prior mutual consent is illegal. It is the end user's responsibility to obey all applicable local, state and federal laws. Developers assume no liability and are not responsible for any misuse or damage caused by this program

[*] starting @ 00:49:11 /2021-06-18/

[00:49:11] [INFO] using 'STDIN' for parsing targets list
URL 1:


In [18]:
!head -n 20 "$HOST2_DEFAULT_PATH/vmconsole_01.log"

[INFO] Starting TBF Session
[INFO] Creating backup snapshots for ['Internet Router', 'Attacker', 'Company Router', 'Log Server', 'Internal Server', 'DMZ Server', 'Client']
[INFO] Creating clones
[INFO] Starting all VMs
[INFO] TBF Session started
[INFO] Found and loaded old session state
[INFO] Closing TBF Session
[INFO] Poweroff all VMs
[INFO] Deleting clones
[INFO] Restoring and deleting backup snapshots
[INFO] TBF Session closed


This dataset shows no sign of network data, although the COMIDDS project page (https://fkie-cad.github.io/COMIDDS/content/datasets/socbed_dataset/) states that it includes network data.
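
One way to back this observation is to count the file extensions under the dataset root and look for packet captures. A minimal sketch, assuming that network data would be stored as `.pcap` or `.pcapng` files. That suffix list and both helper names are assumptions, and flow records in other formats would not be detected:

```python
from collections import Counter
from pathlib import Path

# Assumption: packet captures, if present, use one of these suffixes.
CAPTURE_SUFFIXES = {".pcap", ".pcapng"}


def count_suffixes(root) -> Counter:
    """Count files per lower-cased suffix under root, recursively (hypothetical helper)."""
    return Counter(p.suffix.lower() for p in Path(root).rglob("*") if p.is_file())


def has_capture_files(root) -> bool:
    """Return True if any file under root has a capture-like suffix (hypothetical helper)."""
    return any(suffix in CAPTURE_SUFFIXES for suffix in count_suffixes(root))
```

Running `count_suffixes(PATH)` gives an overview of the file types in the download, and `has_capture_files(PATH)` reduces the check to a single boolean.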