## IP choropleth

This notebook renders a list of IP addresses as a choropleth map, using either plotly or folium.

### Local run

To run this notebook locally, download it to your workstation and start JupyterLab by running the following commands in a terminal:

```shell
mkdir /tmp/notebooks
cd /tmp/notebooks
wget "https://raw.githubusercontent.com/udgover/notebooks/main/IP%20choropleth.ipynb"
pip install jupyterlab
jupyter lab
```

Then open http://localhost:8888/ in your browser and double-click the notebook.

### No install

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/udgover/notebooks.git/HEAD)

### Requirements

First, install the dependencies:

In [None]:
%pip install -U plotly maxminddb requests pandas folium

Then import the required packages:

In [None]:
import io
import json
import logging
import os

import folium
import maxminddb
import pandas as pd
import plotly.express as px
import requests

## Geolocating your IPs

To render IP addresses on a map, we first need the geolocation of each address. There are several ways to get this information, either from online services through an API or from free and open-source MaxMind databases.

This notebook relies on the [CIRCL GeoOpen database](https://data.public.lu/en/datasets/geo-open-ip-address-geolocation-per-country-in-mmdb-format/), which is freely available and regularly updated. This database has country-level resolution. To map an IP address with plotly, we also need three-letter ISO country codes, which are also provided by CIRCL.
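To make the two-letter to three-letter conversion concrete, here is a minimal sketch. It assumes the country table is a dictionary keyed by two-letter code whose values carry an `Alpha-3 code` field, which is how the lookup code in this notebook reads it. The two entries and the `to_alpha3` helper are hand-written for illustration; they are not an excerpt of the real file.

```python
# Hand-written sample for illustration; the real table is loaded from country.json.
SAMPLE_COUNTRIES = {
    "LU": {"Alpha-3 code": "LUX"},
    "FR": {"Alpha-3 code": "FRA"},
}


def to_alpha3(iso_code, countries=SAMPLE_COUNTRIES):
    """Return the three-letter code for a two-letter code, or None if unknown."""
    return countries.get(iso_code, {}).get("Alpha-3 code")


print(to_alpha3("LU"))  # LUX
print(to_alpha3("ZZ"))  # None
```

Returning `None` for unknown codes lets the caller decide whether to log and skip the address.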

In [None]:
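def load_geoip_data(with_asn=True, force_download=False):
    """Return (countries, db): the ISO country table and an open GeoOpen reader."""
    # Pick the GeoOpen flavour: country only, or country plus ASN details.
    db_name = "mmdb-country-asn" if with_asn else "mmdb-country"
    db_file = f"{db_name}-latest.mmdb"
    # Download the database only when it is missing or a refresh is requested.
    if not os.path.exists(db_file) or force_download:
        url = f"https://cra.circl.lu/opendata/geo-open/{db_name}/latest.mmdb"
        response = requests.get(url, timeout=60)
        response.raise_for_status()  # avoid saving an HTTP error page as the database
        with open(db_file, "wb") as f:
            f.write(response.content)
    # The country table maps two-letter ISO codes to three-letter ones.
    if not os.path.exists("country.json"):
        response = requests.get("https://raw.githubusercontent.com/adulau/mmdb-server/main/db/country.json", timeout=60)
        response.raise_for_status()
        with open("country.json", "w", encoding="utf-8") as f:
            f.write(response.text)
    with open("country.json", encoding="utf-8") as f:
        countries = json.load(f)
    # MODE_MEMORY loads the whole database into memory for fast lookups.
    db = maxminddb.open_database(db_file, maxminddb.MODE_MEMORY)
    return countries, db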

countries, db = load_geoip_data()

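def ip_info(ip):
    """Return (iso_code, three-letter country code, AS number, AS organization), or None."""
    record = db.get(ip)
    # db.get() returns None when no record covers the address.
    entry = record.get('country') if record else None
    if not entry or entry.get('iso_code') in [None, 'None', 'Unknown']:
        logging.warning(f"{ip} not found in database")
        return
    iso_code = entry['iso_code']
    # plotly expects three-letter ISO codes, so translate the two-letter one.
    country = countries.get(iso_code, {}).get("Alpha-3 code")
    if not country:
        logging.warning(f"{ip}: {iso_code} can't be mapped to a three-letter code")
        return
    # The ASN fields are only present in the "mmdb-country-asn" database.
    as_number = entry.get('AutonomousSystemNumber', '')
    as_ou = entry.get('AutonomousSystemOrganization', '')
    return iso_code, country, as_number, as_ou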
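Input lists sometimes contain blank lines or malformed entries, and `maxminddb` is expected to raise a `ValueError` when asked to look up a string that is not an IP address. A hedged sketch of a pre-check using the standard-library `ipaddress` module follows; `is_valid_ip` is a hypothetical helper and not part of this notebook.

```python
import ipaddress


def is_valid_ip(value):
    """Return True if value is a string holding a valid IPv4 or IPv6 address."""
    if not isinstance(value, str):
        return False
    try:
        ipaddress.ip_address(value.strip())
    except ValueError:
        return False
    return True


candidates = ["192.0.2.1", "2001:db8::1", "not-an-ip", "", "192.0.2.999"]
valid = [ip for ip in candidates if is_valid_ip(ip)]
print(valid)  # ['192.0.2.1', '2001:db8::1']
```

Filtering like this before the lookup keeps one bad line from aborting the whole run.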

## Loading IPs and adding geolocation

Now, let's load the IP addresses we want to map from a CSV file into a pandas dataframe.

The following function is very generic and is provided as an example. By default, it interprets the provided file (path or stream) as a flat list of IP addresses, one per line.

If you want to parse a CSV with several columns, it must start with a header row and you have to set `ip_column` to the name of the column containing the IP addresses. Note that the map functions further down count rows through a column named `ip`, so they need adjusting if your column has a different name.

In [None]:
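def geoloc(csv_file, ip_column=None):
    """Load IP addresses from csv_file and add geolocation columns."""
    if ip_column:
        # CSV with a header row: the addresses are in the named column.
        df = pd.read_csv(csv_file)
    else:
        # Flat list, one address per line: name the single column 'ip'.
        ip_column = 'ip'
        df = pd.read_csv(csv_file, names=[ip_column])
    # ip_info() returns a 4-tuple; result_type='expand' spreads it over four columns.
    geo_columns = ['iso_code', 'country', 'as_number', 'as_ou']
    df[geo_columns] = df.apply(lambda row: ip_info(row[ip_column]), result_type='expand', axis='columns')
    return df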
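The two input shapes described above can be illustrated without touching the network. This sketch only shows how pandas reads each shape; the sample addresses come from documentation ranges, and the `seen` and `src_ip` column names are made up for the example.

```python
import io

import pandas as pd

# Shape 1: a flat list, one address per line, no header row.
flat = io.StringIO("192.0.2.1\n198.51.100.7\n")
df_flat = pd.read_csv(flat, names=["ip"])

# Shape 2: a CSV with a header row; the addresses live in "src_ip".
with_header = io.StringIO(
    "seen,src_ip\n2024-01-01,192.0.2.1\n2024-01-02,198.51.100.7\n"
)
df_cols = pd.read_csv(with_header)

print(df_flat["ip"].tolist())
print(df_cols["src_ip"].tolist())
```

With the second shape, the call would be `geoloc(with_header, ip_column="src_ip")`.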

In [None]:
ssh_ips = io.StringIO(requests.get('https://lists.blocklist.de/lists/ssh.txt').text)
df = geoloc(ssh_ips)

Let's display the content of our dataframe:

In [None]:
df

## Map!

### Plotly

We can now generate a choropleth map from our dataframe.

In [None]:
def plotly_map(df):
    df = df.groupby("country", as_index=False).count()
    fig = px.choropleth(df, locations="country",
                        color="ip",
                        hover_name="country",
                        color_continuous_scale="Viridis",)
    fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
    fig.update_layout(
        autosize=False,
        width=1200,
        height=600,
    )
    fig.show()
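The map colors each country by the number of rows that `groupby(...).count()` leaves in the `ip` column. The sketch below runs that aggregation on a small hand-made frame; the addresses are documentation ranges and their country assignments are invented for illustration. Rows without a country disappear, since pandas `groupby` excludes missing keys by default.

```python
import pandas as pd

# Hand-made sample: the last address has no resolved country.
sample = pd.DataFrame(
    {
        "ip": ["192.0.2.1", "192.0.2.2", "198.51.100.7", "203.0.113.9"],
        "country": ["FRA", "FRA", "DEU", None],
    }
)

# One row per country; the "ip" column now holds the number of addresses.
counts = sample.groupby("country", as_index=False).count()
per_country = dict(zip(counts["country"], counts["ip"]))
print(per_country)
```

Here `per_country` maps `FRA` to 2 and `DEU` to 1, and the unresolved address is not counted.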

In [None]:
# If the map does not show, force-reload the page (Ctrl+Shift+R) and run this cell again.
plotly_map(df)

### Folium

In [None]:
def folium_map(df):
    # Country outlines; feature ids are three-letter codes, like df["country"].
    world_geo = requests.get("https://raw.githubusercontent.com/python-visualization/folium/main/examples/data/world-countries.json").json()
    m = folium.Map(zoom_start=2, tiles="CartoDB Positron")
    # Count addresses per country; the "ip" column then holds the totals.
    df = df.groupby("country", as_index=False).count()
    folium.Choropleth(
        geo_data=world_geo,
        name="choropleth",
        data=df,
        columns=["country", "ip"],
        key_on="feature.id",
        fill_color="YlGn",
        fill_opacity=0.7,
        line_opacity=0.2,
        legend_name="IP",
    ).add_to(m)
    folium.LayerControl().add_to(m)
    # display() is provided by the notebook environment.
    display(m)

In [None]:
folium_map(df)