To run these examples, the [Splunk Python SDK](https://github.com/splunk/splunk-sdk-python) needs to be present in your Python (venv) installation:  
`pip install splunk-sdk`

## Dataset structure

This is an example of how to work with the imported [CICIDS2017](http://www.unb.ca/cic/datasets/ids-2017.html) dataset through the Splunk Python SDK.  
The dataset has been imported as a CSV through the Web Interface (*Settings* -> *Add Data* -> *Upload*).  
It has been saved to the index *CICIDS17* to separate it from production netflows.  

Every entry in the dataset is structured like this:  

| Field Index | Field Name |
| -------------- | ---------------- |
| 0 | Flow ID |
| 1 | Source IP |
| 2 | Source Port |
| 3 | Destination IP |
| 4 | Destination Port |
| 5 | Protocol |
| 6 | Timestamp |
| 7 | Flow Duration |
| 8 | Total Fwd Packets |
| 9 | Total Backward Packets |
| 10 | Total Length of Fwd Packets |
| 11 | Total Length of Bwd Packets |
| 12 | Fwd Packet Length Max |
| 13 | Fwd Packet Length Min |
| 14 | Fwd Packet Length Mean |
| 15 | Fwd Packet Length Std |
| 16 | Bwd Packet Length Max |
| 17 | Bwd Packet Length Min |
| 18 | Bwd Packet Length Mean |
| 19 | Bwd Packet Length Std |
| 20 | Flow Bytes/s |
| 21 | Flow Packets/s |
| 22 | Flow IAT Mean |
| 23 | Flow IAT Std |
| 24 | Flow IAT Max |
| 25 | Flow IAT Min |
| 26 | Fwd IAT Total |
| 27 | Fwd IAT Mean |
| 28 | Fwd IAT Std |
| 29 | Fwd IAT Max |
| 30 | Fwd IAT Min |
| 31 | Bwd IAT Total |
| 32 | Bwd IAT Mean |
| 33 | Bwd IAT Std |
| 34 | Bwd IAT Max |
| 35 | Bwd IAT Min |
| 36 | Fwd PSH Flags |
| 37 | Bwd PSH Flags |
| 38 | Fwd URG Flags |
| 39 | Bwd URG Flags |
| 40 | Fwd Header Length |
| 41 | Bwd Header Length |
| 42 | Fwd Packets/s |
| 43 | Bwd Packets/s |
| 44 | Min Packet Length |
| 45 | Max Packet Length |
| 46 | Packet Length Mean |
| 47 | Packet Length Std |
| 48 | Packet Length Variance |
| 49 | FIN Flag Count |
| 50 | SYN Flag Count |
| 51 | RST Flag Count |
| 52 | PSH Flag Count |
| 53 | ACK Flag Count |
| 54 | URG Flag Count |
| 55 | CWE Flag Count |
| 56 | ECE Flag Count |
| 57 | Down/Up Ratio |
| 58 | Average Packet Size |
| 59 | Avg Fwd Segment Size |
| 60 | Avg Bwd Segment Size |
| 61 | Fwd Header Length |
| 62 | Fwd Avg Bytes/Bulk |
| 63 | Fwd Avg Packets/Bulk |
| 64 | Fwd Avg Bulk Rate |
| 65 | Bwd Avg Bytes/Bulk |
| 66 | Bwd Avg Packets/Bulk |
| 67 | Bwd Avg Bulk Rate |
| 68 | Subflow Fwd Packets |
| 69 | Subflow Fwd Bytes |
| 70 | Subflow Bwd Packets |
| 71 | Subflow Bwd Bytes |
| 72 | Init_Win_bytes_forward |
| 73 | Init_Win_bytes_backward |
| 74 | act_data_pkt_fwd |
| 75 | min_seg_size_forward |
| 76 | Active Mean |
| 77 | Active Std |
| 78 | Active Max |
| 79 | Active Min |
| 80 | Idle Mean |
| 81 | Idle Std |
| 82 | Idle Max |
| 83 | Idle Min |
| 84 | Label |
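
The field indices in the table can be turned into a small lookup so that a raw CSV line is accessed by name rather than by position. This is a minimal sketch, not part of the original notebook: `FIELDS`, `EXPECTED_FIELD_COUNT`, and `parse_flow` are hypothetical names, only a subset of the fields is mapped, and the sample line is made up (real-looking values in the named positions, zeros elsewhere).

```python
import csv

# Index -> name for the fields used here, copied from the table above.
FIELDS = {
    0: "Flow ID",
    1: "Source IP",
    2: "Source Port",
    3: "Destination IP",
    4: "Destination Port",
    5: "Protocol",
    6: "Timestamp",
    84: "Label",
}
EXPECTED_FIELD_COUNT = 85  # indices 0..84 in the table


def parse_flow(raw_line):
    """Turn one raw CSV line into a dict keyed by field name (hypothetical helper)."""
    values = next(csv.reader([raw_line]))
    if len(values) != EXPECTED_FIELD_COUNT:
        raise ValueError(
            f"expected {EXPECTED_FIELD_COUNT} fields, got {len(values)}"
        )
    # Strip surrounding whitespace, e.g. a trailing space after the label.
    return {name: values[idx].strip() for idx, name in FIELDS.items()}


# Made-up sample line: 7 leading fields, 77 zero placeholders, 1 label.
sample = ",".join(
    ["192.168.10.3-192.168.10.5-53-59851-17", "192.168.10.5", "59851",
     "192.168.10.3", "53", "17", "7/7/2017 12:59"]
    + ["0"] * 77
    + ["BENIGN "]
)
flow = parse_flow(sample)
print(flow["Source IP"], "->", flow["Destination IP"], flow["Label"])
```

Checking the field count up front makes a malformed or truncated event fail loudly instead of silently returning the wrong column.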



## Splunk SDK setup and search query

This example is based on this how-to: [How to run searches and jobs](http://dev.splunk.com/view/python-sdk/SP-CAAAEE5#oneshotjob)

In [1]:
import splunklib.client as client
import splunklib.results as results

# Import splunk host credentials
%run -i splunk_credentials
# This file contains the following lines:
# HOST = "10.x.x.x"
# PORT = "8089"
# USERNAME = "splunk_web_user"
# PASSWORD = "splunk_web_password"

service = client.connect(
    host=HOST,
    port=PORT,
    username=USERNAME,
    password=PASSWORD
)

Run a one-shot search and display the results using the results reader

Set the parameters for the search:
- Search everything in the six-day time range from July 2nd, 2017, 12:00 to July 8th, 2017, 12:00 (UTC)
- Return all events by setting `count` to 0 (by default, oneshot searches return only the first 100 events)

The search itself sets additional params:
- Display the first 10 results

In [2]:
kwargs_oneshot = {"earliest_time": "2017-07-02T12:00:00.000-00:00",
                  "latest_time":   "2017-07-08T12:00:00.000-00:00",
                  "count": 0}
searchquery_oneshot = "search index=cicids17 | head 10"


oneshotsearch_results = service.jobs.oneshot(searchquery_oneshot, **kwargs_oneshot) 
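
The time bounds above are hand-written ISO 8601 strings. As a minimal sketch, they could also be derived from a start time and a duration so that the two bounds cannot drift apart. `search_window` is a hypothetical helper, not part of the Splunk SDK, and the sketch assumes Splunk accepts the `+00:00` offset that Python emits for UTC in the same way as the `-00:00` used above.

```python
from datetime import datetime, timedelta, timezone


def search_window(start, duration, count=0):
    """Build oneshot kwargs spanning start to start + duration (hypothetical helper)."""
    end = start + duration
    return {
        # Millisecond precision matches the hand-written strings above.
        "earliest_time": start.isoformat(timespec="milliseconds"),
        "latest_time": end.isoformat(timespec="milliseconds"),
        # count=0 lifts the default limit on returned events.
        "count": count,
    }


# Same six-day window as above: July 2nd, 12:00 UTC to July 8th, 12:00 UTC.
kwargs_window = search_window(
    datetime(2017, 7, 2, 12, 0, tzinfo=timezone.utc),
    timedelta(days=6),
)
print(kwargs_window)
```

The resulting dictionary has the same keys as `kwargs_oneshot` and would be passed to `service.jobs.oneshot` in the same way.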

## Search result exploration

Now that we have the results, display them using ResultsReader

In [3]:
reader = results.ResultsReader(oneshotsearch_results)
for item in reader:
    raw = item['_raw']
    print("---------------\nraw:",raw,"\n")
    # Split the CSV line; positions follow the field index table above
    raw = raw.split(',')
    fid, src_ip, src_port, dest_ip, dest_port = raw[:5]
    timestamp, label = raw[6], raw[84]
    print('Name:\t{}  Date: {}\nFlow:\t{}:{} -> {}:{}\nLabel:\t{}'.format(fid, timestamp, src_ip, src_port, dest_ip, dest_port, label))


# If the results come in as anything other than XML (the default), the ResultsReader cannot be used.
# Instead, read the raw response stream directly:
# print(oneshotsearch_results.read())

---------------
raw: 192.168.10.3-192.168.10.5-53-59851-17,192.168.10.5,59851,192.168.10.3,53,17,7/7/2017 12:59,153,2,2,68,100,34,34,34,0,50,50,50,0,1098039.216,26143.79085,51,84.00595217,148,2,2,2,0,2,2,3,3,0,3,3,0,0,0,0,40,40,13071.89542,13071.89542,34,50,40.4,8.76356092,76.8,0,0,0,0,0,0,0,0,1,50.5,34,50,40,0,0,0,0,0,0,2,68,2,100,-1,-1,1,20,0,0,0,0,0,0,0,0,BENIGN 

Name:	192.168.10.3-192.168.10.5-53-59851-17  Date: 7/7/2017 12:59
Flow:	192.168.10.5:59851 -> 192.168.10.3:53
Label:	BENIGN
---------------
raw: 192.168.10.17-198.100.147.178-123-123-17,192.168.10.17,123,198.100.147.178,123,17,7/7/2017 12:59,16842,1,1,48,48,48,48,48,0,48,48,48,0,5700.035625,118.7507422,16842,0,16842,16842,0,0,0,0,0,0,0,0,0,0,0,0,0,0,20,20,59.3753711,59.3753711,48,48,48,0,0,0,0,0,0,0,0,0,0,1,72,48,48,20,0,0,0,0,0,0,1,48,1,48,-1,-1,0,20,0,0,0,0,0,0,0,0,BENIGN 

Name:	192.168.10.17-198.100.147.178-123-123-17  Date: 7/7/2017 12:59
Flow:	192.168.10.17:123 -> 198.100.147.178:123
Label:	BENIGN
---------------
raw