# 4-2: APIs

**A**pplication **P**rogramming **I**nterfaces are ways for one piece of software to communicate with another. In our case, we are specifically referring to HTTP endpoints where we can retrieve and send data. These are often referred to as [REST APIs](https://restfulapi.net/).

The potential value of connecting our Notebooks to external data sources cannot be overstated. By leveraging the power of the interwebs, we can verify our findings, submit new intelligence, and enrich/correlate our data to tell a more complete story.

To demonstrate this power, we will play a bit with the [VirusTotal API](https://developers.virustotal.com/reference/overview). If you're not familiar with VT, it's a fantastic resource for community-submitted malware samples and indicators. It's a primary part of my workflow.

In order to do this, you will need a **VirusTotal API Key**, which means you'll need a VirusTotal account. They are free! Go get one. I'll wait.

...

Got one? Cool. Now, don't tell anyone. Not even me.

## Seeeecrets

Okay, got your API Key? Cool. One thing that we'll commonly have to do is enter secrets like API keys or passwords into a Notebook to handle authentication. This is tricky because we certainly don't want to expose those in plaintext, and we _definitely_ don't want to save them in the state of our Notebook unencrypted. What's a Jupyteer (I just made that up) to do?

The `getpass` module allows us to collect secrets in a Jupyter Notebook without echoing them to the screen. As long as we never print them, they are _not_ stored in the saved state of the Notebook, meaning this process is Git-safe!

Let's import it and play with it.


In [None]:
from getpass import getpass
secret = getpass("Gimme a secret!")

Gimme a secret! ········


Now you _can_ print that out, but don't. Just use these secrets as you need them.
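If typing the key every session gets old, one common pattern is to check an environment variable first and only prompt when it's missing. Here's a minimal sketch of that idea. The `load_secret` helper and the `VT_API_KEY` variable name are my own inventions for illustration, not anything VirusTotal or Jupyter requires.

```python
import os
from getpass import getpass


def load_secret(env_var: str, prompt: str) -> str:
    """Return the secret from the environment if set, otherwise prompt for it."""
    value = os.environ.get(env_var)
    if value:
        return value
    # Nothing in the environment: fall back to a non-echoing prompt.
    return getpass(prompt)
```

You'd call it like `api_key = load_secret("VT_API_KEY", "VT API KEY")`. Either way the secret only lives in memory, never in the saved Notebook.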

## VT The Hard Way

VirusTotal does have a Python module we can use, but I want to show you how to use the REST API directly first. 

### Headers and Authentication

It is very common for REST APIs to use an HTTP header for authentication. In VT's case, the header `x-apikey` must be present and set to a valid API key. Let's set that up now (and import `requests`).

In [None]:
# Import our stuff
import requests
import json

In [3]:
# Set the API key and build the headers
api_key = getpass("VT API KEY")
headers = {
    "content-type": "application/json",
    "x-apikey": api_key
}

VT API KEY ········
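That `headers` dict now contains your key, so printing it while debugging would write the key into the Notebook's output. If you ever need to eyeball your headers, a small sketch like this masks the sensitive ones first. The `redact_headers` helper is hypothetical, just one way to do it.

```python
def redact_headers(headers: dict, secret_names: tuple = ("x-apikey",)) -> dict:
    """Return a copy of headers with secret values masked.

    Names in secret_names are expected to be lowercase.
    """
    return {
        name: "********" if name.lower() in secret_names else value
        for name, value in headers.items()
    }


# A fake key, so this example is safe to run and save.
example_headers = {"content-type": "application/json", "x-apikey": "not-a-real-key"}
print(redact_headers(example_headers))
```

The original dict is left untouched, so you can keep passing it to your requests.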


Now we need a sample to test. I have one for us: a sha256sum of a strange file I found on a webserver:

**SHA256:** `1c263b3f4d21039b2a89865a4ab6600f1cc034817bae6ab1f91599674e94be72`
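If you have the file itself rather than a hash somebody handed you, you can compute the SHA256 locally with the standard library. This is a minimal sketch, and the `sha256_of_file` name and the chunk size are my own choices.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large samples don't have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Something like `sha256_of_file(Path("suspicious.bin"))` (a made-up filename) gives you the same kind of 64-character hex string as the one above, without ever uploading the file.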

Now let's build our API request. Luckily, the endpoint for [Files](https://developers.virustotal.com/reference/file-info) is quite simple: a `GET` request to `https://www.virustotal.com/api/v3/files/[FILE HASH]`.

With that and our header, we should be good! Let's build that with `requests` and get our data!

In [4]:
# Build the VT Search

# How else could we get hashes? Think back to what we've done before.
search_item: str = "1c263b3f4d21039b2a89865a4ab6600f1cc034817bae6ab1f91599674e94be72"

url: str = f"https://www.virustotal.com/api/v3/files/{search_item}"

# Note the use of headers in the GET request
# A timeout keeps the cell from hanging forever if the network misbehaves
# We want the JSON result
res: dict = requests.get(url, headers=headers, timeout=30).json()

The thing about VirusTotal responses is that they can be kind of enormous. Let's break this one down to see what we have.

In [5]:
res.keys()

dict_keys(['data'])

Okay, just 1 key. Not so bad, right? Well... `data` is a little bigger.

In [6]:
data: dict = res["data"]
data.keys()

dict_keys(['attributes', 'type', 'id', 'links'])
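Everything so far assumes the lookup worked. When it doesn't (an unknown hash, a bad key), my understanding of the VT v3 API is that the JSON has an `error` object with a `code` and a `message` instead of `data`. Treat that shape as an assumption and check the VT docs. Here's a sketch of a guard you could put between the request and the digging. `VTError` and `unwrap` are names I made up.

```python
class VTError(Exception):
    """Raised when a VirusTotal response carries an error object."""


def unwrap(payload: dict) -> dict:
    """Return payload["data"], or raise VTError if the API reported a problem."""
    if "error" in payload:
        err = payload["error"]
        # Assumed error shape: {"error": {"code": ..., "message": ...}}
        raise VTError(f"{err.get('code', 'UnknownError')}: {err.get('message', '')}")
    return payload["data"]


# Hand-written payloads for illustration, not real API output.
good = {"data": {"type": "file", "id": "abc"}}
print(unwrap(good)["type"])
```

With that in place, `data = unwrap(res)` fails loudly with a readable message instead of a confusing `KeyError` a few cells later.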

In [9]:
# Now let's see attributes
attributes: dict = data["attributes"]
attributes.keys()

dict_keys(['type_description', 'tlsh', 'vhash', 'trid', 'crowdsourced_yara_results', 'names', 'last_modification_date', 'type_tag', 'elf_info', 'times_submitted', 'total_votes', 'size', 'popular_threat_classification', 'last_submission_date', 'last_analysis_results', 'crowdsourced_ids_stats', 'sandbox_verdicts', 'sha256', 'tags', 'crowdsourced_ids_results', 'last_analysis_date', 'unique_sources', 'first_submission_date', 'ssdeep', 'packers', 'md5', 'sha1', 'magic', 'last_analysis_stats', 'meaningful_name', 'reputation'])

So obviously there's a lot to look through. It helps to know what we're after. I like `type_description`, `names` and `popular_threat_classification` for starters. These help identify what kind of a thing it probably is.

In [10]:
# Print some basic data
print("Type Description:")
print(attributes["type_description"])
print("\nNames:")
print(attributes["names"])
print("\nPopular Threat Classification:")
print(attributes["popular_threat_classification"])

Type Description:
ELF

Names:
['MALICITOS.x86', 'db0fa4b8db0333367e9bda3ab68b8042.x86', 'jawsoutput', 'exploit.exe', 'jawsshell.x86', 'copy']

Popular Threat Classification:
{'suggested_threat_label': 'trojan.linux/mirai', 'popular_threat_category': [{'count': 21, 'value': 'trojan'}], 'popular_threat_name': [{'count': 22, 'value': 'linux'}, {'count': 13, 'value': 'mirai'}, {'count': 2, 'value': 'r002c0oik22'}]}
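Another field from that key list that's handy for a quick read is `last_analysis_stats`, which (as I understand it) maps verdict categories like `malicious` and `undetected` to engine counts. Here's a rough sketch that boils it down to one line. The numbers below are made up for illustration, and counting `suspicious` as "flagged" is my own choice.

```python
def detection_summary(stats: dict) -> str:
    """Summarise last_analysis_stats as 'flagged/total engines'."""
    flagged = stats.get("malicious", 0) + stats.get("suspicious", 0)
    # Total counts every category, including timeouts and failures.
    total = sum(stats.values())
    return f"{flagged}/{total} engines flagged this sample"


# Made-up numbers; use attributes["last_analysis_stats"] for the real ones.
sample_stats = {"malicious": 38, "suspicious": 0, "undetected": 24, "harmless": 0, "timeout": 0}
print(detection_summary(sample_stats))
```

In the Notebook you'd call `detection_summary(attributes["last_analysis_stats"])`.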


If you want to go a bit deeper on the results, `sandbox_verdicts` is always interesting.

In [11]:
attributes["sandbox_verdicts"]

{'Zenbox Linux': {'category': 'malicious',
  'confidence': 80,
  'sandbox_name': 'Zenbox Linux',
  'malware_classification': ['MALWARE', 'SPREADER', 'TROJAN'],
  'malware_names': ['Gafgyt', 'Mirai']}}
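One caveat: as far as I can tell, fields like `sandbox_verdicts` only show up when VT has something to report, so indexing straight into them can raise a `KeyError` on other samples. A small helper for walking nested dicts safely might look like this. `dig` is a hypothetical name, and the sample dict is trimmed down from the output above.

```python
def dig(mapping: dict, *keys, default=None):
    """Walk nested dicts key by key, returning default if any level is missing."""
    current = mapping
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Trimmed-down stand-in for the real attributes dict.
sample_attributes = {
    "sandbox_verdicts": {"Zenbox Linux": {"category": "malicious", "confidence": 80}}
}
print(dig(sample_attributes, "sandbox_verdicts", "Zenbox Linux", "category"))
print(dig(sample_attributes, "popular_threat_classification", "suggested_threat_label", default="n/a"))
```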

And of course if you want the whole list of detections, that'll be in `last_analysis_results`. Because the list is so huge, I might deconstruct it a little bit. Also, knowing that an engine which didn't flag the file has a `result` of `None`, I might use that to see just the engines that did detect it.

In [12]:
# Use a pretty complex list comprehension to get just successful detections
positive_detections: list[tuple[str, str]] = [(k, v["result"]) for k, v in attributes["last_analysis_results"].items() if v["result"]]
positive_detections

[('Lionic', 'Trojan.Linux.Mirai.K!c'),
 ('DrWeb', 'Linux.Mirai.4327'),
 ('MicroWorld-eScan', 'Trojan.Linux.GenericKD.3050'),
 ('FireEye', 'Trojan.Linux.GenericKD.3050'),
 ('ALYac', 'Trojan.Linux.GenericKD.3050'),
 ('VIPRE', 'Trojan.Linux.GenericKD.3050'),
 ('Sangfor', 'Backdoor.Linux.Mirai.Vq8o'),
 ('Arcabit', 'Trojan.Linux.Generic.DBEA'),
 ('Cyren', 'E32/ABRisk.YKSD-6'),
 ('Symantec', 'Linux.Mirai'),
 ('ESET-NOD32', 'a variant of Linux/Mirai.ATO'),
 ('TrendMicro-HouseCall', 'TROJ_GEN.R002C0OIK22'),
 ('Avast', 'ELF:Agent-AYQ [Trj]'),
 ('ClamAV', 'Unix.Trojan.Mirai-7669677-0'),
 ('Kaspersky', 'HEUR:Backdoor.Linux.Mirai.b'),
 ('BitDefender', 'Trojan.Linux.GenericKD.3050'),
 ('NANO-Antivirus', 'Trojan.Elf32.Mirai.jsogey'),
 ('Tencent', 'Backdoor.Linux.Mirai.wan'),
 ('Ad-Aware', 'Trojan.Linux.GenericKD.3050'),
 ('Emsisoft', 'Trojan.Linux.GenericKD.3050 (B)'),
 ('TrendMicro', 'TROJ_GEN.R002C0OIK22'),
 ('McAfee-GW-Edition', 'GenericRXSE-LK!C770547629BE'),
 ('Sophos', 'Linux/DDoS-CI'),
 ('Ika

At this point, we have a pretty good idea that our hash belongs to a Linux executable that is a **Mirai** variant. If this came from a machine under our purview, we'd have cause for alarm! Or at the very least, incident response.
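If you'd rather back that hunch up with a count than eyeball the list, here's a quick sketch that tallies how many detection labels mention a given family. The `count_families` helper is my own, and the sample rows are copied from the output above.

```python
from collections import Counter


def count_families(detections: list, families: tuple) -> Counter:
    """Count how many detection labels mention each family name (case-insensitive)."""
    counts = Counter()
    for _engine, label in detections:
        lowered = label.lower()
        for family in families:
            if family in lowered:
                counts[family] += 1
    return counts


# A few (engine, label) pairs taken from the detections listed above.
sample = [
    ("DrWeb", "Linux.Mirai.4327"),
    ("Symantec", "Linux.Mirai"),
    ("Sophos", "Linux/DDoS-CI"),
    ("FireEye", "Trojan.Linux.GenericKD.3050"),
]
print(count_families(sample, ("mirai", "gafgyt")))
```

In the Notebook, pass `positive_detections` instead of `sample`. Plain substring matching is crude, but it's usually enough for a first pass.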

## VT The Easy Way

Now that we've explored using the API "raw," we can talk about using the Python module. Now, you don't _have_ to use it! If you prefer manual HTTP requests, that's fine. But the library can make some of the ergonomics a little better. Let's import it and see.

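_Note: We also use the `nest_asyncio` library because the VT library makes its network requests with `asyncio`, which clashes with the event loop Jupyter is already running. The `nest_asyncio` import and the `apply()` call are necessary to make it work in Jupyter, but not outside of it._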

In [13]:
# import the VT library
import vt
import nest_asyncio
nest_asyncio.apply()

Many of these API libraries are built around a `Client` class that we instantiate with our credentials. This one is no different. So up first, we'll create our client.

In [14]:
# Instantiate the VT Client
client = vt.Client(api_key)

With the client, we can now perform the same search by specifying the path and using `get_json()` or `get_object()`. In both cases, you still need the last part of the API path (here, `/files/` plus the hash).

In [15]:
# Get the object version of our search.
r = client.get_object(f"/files/{search_item}")
r

<vt.object.Object file 1c263b3f4d21039b2a89865a4ab6600f1cc034817bae6ab1f91599674e94be72>

`r` may not look like much, but it has everything we need easily accessible. Look!

In [16]:
r.type_description

'ELF'

In [17]:
r.sandbox_verdicts

{'Zenbox Linux': {'category': 'malicious',
  'confidence': 80,
  'sandbox_name': 'Zenbox Linux',
  'malware_classification': ['MALWARE', 'SPREADER', 'TROJAN'],
  'malware_names': ['Gafgyt', 'Mirai']}}

Handy, right? That is much easier than navigating a ton of `dict`s.

## Check For Understanding

No auto test for this. Use this Notebook to experiment with the VirusTotal API! Check domains and IPs while getting comfortable with these objects. When you're ready, move on to the next lesson, where we'll scrape web content manually!
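To get you started on domains and IPs, here's a sketch that picks an endpoint based on what the indicator looks like. The `/domains/` and `/ip_addresses/` paths are what I believe the VT v3 API uses, so double-check them against the docs. The `vt_url` helper is hypothetical, and anything that isn't a hash or an IP is simply assumed to be a domain.

```python
import ipaddress
import re

VT_BASE = "https://www.virustotal.com/api/v3"
# MD5, SHA1, or SHA256 written as hex.
HASH_RE = re.compile(r"[0-9a-fA-F]{32}|[0-9a-fA-F]{40}|[0-9a-fA-F]{64}")


def vt_url(indicator: str) -> str:
    """Guess the VT endpoint for a hash, IP address, or domain."""
    indicator = indicator.strip()
    if HASH_RE.fullmatch(indicator):
        return f"{VT_BASE}/files/{indicator}"
    try:
        ipaddress.ip_address(indicator)
    except ValueError:
        # Not a hash and not an IP: assume it's a domain.
        return f"{VT_BASE}/domains/{indicator}"
    return f"{VT_BASE}/ip_addresses/{indicator}"


print(vt_url("8.8.8.8"))
print(vt_url("example.com"))
```

Feed the result to `requests.get(url, headers=headers)` just like we did with the file hash.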

In [18]:
# This cell is for your own work! Have fun!