# Extracting data from network config files using Batfish

The code uses Batfish to get vendor-neutral information about the network. Batfish returns the data as pandas dataframes, which are then converted and stored as JSON files.

## Requirements:
 * Batfish tool and pybatfish.
 Both can be installed by following the directions on the [official GitHub page](https://github.com/batfish/batfish). A few things to keep in mind:
 * Python version should be >=3.6
 * Java 8 is required
 * Use a virtual environment to install pybatfish (as mentioned in the documentation).
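
The Python-side requirements can be checked from the interpreter that will run the notebook. This is a minimal sketch using only the standard library; the helper names `python_version_ok` and `pybatfish_installed` are made up for illustration, and the Java requirement is not checked here.

```python
import importlib.util
import sys


def python_version_ok(version_info=sys.version_info, minimum=(3, 6)):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


def pybatfish_installed():
    """Return True if a package named 'pybatfish' can be found for import."""
    return importlib.util.find_spec("pybatfish") is not None


print("Python >= 3.6:", python_version_ok())
print("pybatfish importable:", pybatfish_installed())
```

Running this inside the virtual environment shows whether pybatfish was installed into that environment rather than into the system Python.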
 


Before running the following cells, make sure that the batfish tool is running locally. If you've followed the directions given on the github page, the following command should do it:
> ```docker run -v $(pwd)/data:/data -p 9997:9997 -p 9996:9996 batfish/allinone```
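
One way to confirm that the container is listening before running any cells is to probe the two published ports. The sketch below uses only the standard library; the helper name `batfish_port_open` is made up for illustration, and an open port only shows that something is listening there, not that it is Batfish.

```python
import socket


def batfish_port_open(host="localhost", port=9996, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts
        return False


# Ports taken from the docker command above
for port in (9996, 9997):
    state = "open" if batfish_port_open(port=port) else "closed"
    print("port", port, "is", state)
```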

The following cell imports pybatfish and the other packages needed. If you get a ```ConnectionError```, it probably means that Batfish is not running locally. Beyond that, you don't need to understand the details of this cell.

In [1]:
import logging
import random
import os
import collections
import pandas as pd
from IPython.display import display
from pandas.io.formats.style import Styler

from pybatfish.client.commands import *
# noinspection PyUnresolvedReferences
from pybatfish.datamodel import Interface, Edge
from pybatfish.datamodel.flow import HeaderConstraints, PathConstraints
from pybatfish.question import bfq, load_questions  # noqa: F401
from pybatfish.util import get_html

bf_logger.setLevel(logging.WARN)

load_questions()

# pd.set_option('display.max_colwidth', -1)
# pd.set_option('display.max_columns', None)
# # Prevent rendering text between '$' as MathJax expressions
# pd.set_option('display.html.use_mathjax', False)

# # UUID for CSS styles used by pandas styler.
# # Keeps our notebook HTML deterministic when displaying dataframes
# _STYLE_UUID = "pybfstyle"


# class MyStyler(Styler):
#     """A custom styler for displaying DataFrames in HTML"""

#     def __repr__(self):
#         return repr(self.data)


# def show(df):
#     """
#     Displays a dataframe as HTML table.

#     Replaces newlines and double-spaces in the input with HTML markup, and
#     left-aligns the text.
#     """

#     # workaround for Pandas bug in Python 2.7 for empty frames
#     if not isinstance(df, pd.DataFrame) or df.size == 0:
#         display(df)
#         return
#     df = df.replace('\n', '<br>', regex=True).replace('  ', '&nbsp;&nbsp;',
#                                                       regex=True)
#     display(MyStyler(df).set_uuid(_STYLE_UUID).format(get_html)
#             .set_properties(**{'text-align': 'left', 'vertical-align': 'top'}))


## The following cell is where batfish analyzes the config files.
```NETWORK_NAME``` and ```SNAPSHOT_NAME``` don't matter if you're working with a single network. However, the final JSON files will be stored in a directory named ```NETWORK_NAME json files```. The important change to make is to ```SNAPSHOT_PATH```, which should point to the directory containing your config files. The same setup can be used to read the [example network](https://github.com/batfish/pybatfish/tree/master/jupyter_notebooks/networks/example) given in the [pybatfish tutorials](https://github.com/batfish/pybatfish/tree/master/jupyter_notebooks) by pointing ```SNAPSHOT_PATH``` at it.
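
Batfish expects device configuration files to sit in a ```configs``` subdirectory of the snapshot directory. As a quick check on ```SNAPSHOT_PATH```, the sketch below lists what is there. It uses only the standard library; the helper name `list_snapshot_configs` is hypothetical, and it looks only at ```configs```, not at other optional snapshot folders.

```python
from pathlib import Path


def list_snapshot_configs(snapshot_path):
    """Return the sorted file names found under <snapshot_path>/configs."""
    configs_dir = Path(snapshot_path) / "configs"
    if not configs_dir.is_dir():
        raise FileNotFoundError(
            "expected a 'configs' directory under {}".format(snapshot_path)
        )
    return sorted(p.name for p in configs_dir.iterdir() if p.is_file())
```

For example, `list_snapshot_configs(SNAPSHOT_PATH)` can be called in a cell of its own before the snapshot is initialized.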


In [2]:
NETWORK_NAME = "campus-anon-mixed-vnet2"
SNAPSHOT_NAME = "example_snapshot"

SNAPSHOT_PATH = "networks/campus-anon-mixed-vnet2"

# Now create the network and initialize the snapshot
bf_set_network(NETWORK_NAME)
bf_init_snapshot(SNAPSHOT_PATH, name=SNAPSHOT_NAME, overwrite=True)
load_questions()

One or more input files were not fully recognized by Batfish. Some unrecognized configuration snippets are not uncommon for new networks, and it is often fine to proceed with further analysis. You can help the Batfish developers improve support for your network by running:

    bf_upload_diagnostics(dry_run=False)

to share private, anonymized information. For more information, see the documentation with:

    help(bf_upload_diagnostics)
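
To see which files triggered a warning like the one above, the parse status of each file can be filtered. The sketch below works on plain `(file_name, status)` pairs so that it runs without a Batfish session. It assumes, without having checked against this snapshot, that the pairs would come from the ```fileParseStatus``` question with columns ```File_Name``` and ```Status```, and that fully parsed files report ```PASSED```. The helper name and sample rows are made up.

```python
def files_needing_attention(statuses):
    """Return the (file_name, status) pairs whose status is not 'PASSED'."""
    return [(name, status) for name, status in statuses if status != "PASSED"]


# Toy input. With a live session the pairs might be built like this
# (assumed question and column names):
#   frame = bfq.fileParseStatus().answer().frame()
#   statuses = zip(frame["File_Name"], frame["Status"])
sample = [
    ("configs/r1.cfg", "PASSED"),
    ("configs/r2.cfg", "PARTIALLY_UNRECOGNIZED"),
]

for name, status in files_needing_attention(sample):
    print(name, status)
```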


Batfish has a question ```namedStructures()``` which gives the named structures of the network. ```answer().frame()``` is used to get the data in a pandas dataframe.

The column ```Structure_Type``` denotes the type of the named structure (the code will create a JSON file for each value in the column).
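
To illustrate how ```Structure_Type``` partitions the rows, here is a small sketch on a toy dataframe with the same column names. The node names and definitions are invented for the example and are not taken from any real snapshot.

```python
import pandas as pd

# Toy stand-in for the namedStructures answer (values are made up)
toy = pd.DataFrame(
    {
        "Node": ["r1", "r1", "r2"],
        "Structure_Type": ["IP_Access_List", "Route_Filter_List", "IP_Access_List"],
        "Structure_Name": ["98", "from-PE", "99"],
        "Structure_Definition": [{"name": "98"}, {"lines": []}, {"name": "99"}],
    }
)

# One sub-frame per structure type; each would end up in its own JSON file
by_type = {
    stype: group.reset_index(drop=True)
    for stype, group in toy.groupby("Structure_Type")
}

for stype, group in by_type.items():
    print(stype, len(group))
```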


The following block will get the named structures in dataframe ```data```.

In [3]:
data = bfq.namedStructures().answer().frame()

In [4]:
data

Unnamed: 0,Node,Structure_Type,Structure_Name,Structure_Definition
0,st75hr82,Routing_Policy,MVS_Floating_Address,"{'name': 'MVS_Floating_Address', 'statements':..."
1,rt73sn14m4ce,Route_Filter_List,from-PE,"{'lines': [{'action': 'PERMIT', 'ipWildcard': ..."
2,st73in44p4as,IP_Access_List,98,"{'name': '98', 'lines': [{'action': 'PERMIT', ..."
3,st73in59p5as,IP_Access_List,98,"{'name': '98', 'lines': [{'action': 'PERMIT', ..."
4,rt55in04hrds,AS_Path_Access_List,93,"{'lines': [{'action': 'PERMIT', 'regex': '(,|\..."
5,st73in45p4as,IP_Access_List,125,"{'name': '125', 'lines': [{'action': 'PERMIT',..."
6,rt73sn14m4ce,IPSec_Policy,RT75SN14HRCE:10,"{'name': 'RT75SN14HRCE:10', 'pfsKeyGroupDynami..."
7,rt55in70hras,Route_Filter_List,PL-MVS-OCC-INI-EURO-RNI-OUT,"{'lines': [{'action': 'PERMIT', 'ipWildcard': ..."
8,rt73ve11m5ar,IP_Access_List,99,"{'name': '99', 'lines': [{'action': 'PERMIT', ..."
9,rt73in04m4ds,IP_Access_List,99,"{'name': '99', 'lines': [{'action': 'PERMIT', ..."


In [21]:
# Write one JSON file per structure type, with nodes as rows and
# structure names as columns.
Structure_types = list(data.Structure_Type.unique())
for struct in Structure_types:

    # Rows of the answer that belong to this structure type
    df = data[data['Structure_Type']==struct]

    col_names = list(df.Structure_Name.unique())
    unique_nodes = list(df.Node.unique())

    # Empty node x structure-name table; cells without a definition stay NaN
    struct_df = pd.DataFrame(index=unique_nodes,columns=col_names)

    nodes = df['Node']
    names = df['Structure_Name']
    values = df['Structure_Definition']

    # Each definition is stored wrapped in a one-element list
    for index,column,value in zip(nodes,names,values):
        struct_df.loc[index,column] = [value]

    # e.g. "./<NETWORK_NAME> json files/IP_Access_List.json"
    fileName=str(struct)+".json"
    directory = "./"+str(NETWORK_NAME)+" json files"
    if not os.path.exists(directory):
        os.mkdir(directory)
    fullName = os.path.join(directory, fileName)
    struct_df.to_json(fullName,orient="index")

print("JSON files saved")

JSON files saved
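
The shape of each saved file is roughly `{node: {structure_name: definition}}`. The sketch below builds that nesting with plain dictionaries and the standard `json` module, as a way to see the grouping without pandas. It is not the notebook's method: unlike the cell above, it omits missing node/name combinations instead of writing nulls, and it does not wrap definitions in a list. The function name and sample rows are made up.

```python
import json
from collections import defaultdict


def group_definitions(rows):
    """Group (node, type, name, definition) rows as type -> node -> name."""
    grouped = defaultdict(lambda: defaultdict(dict))
    for node, stype, name, definition in rows:
        grouped[stype][node][name] = definition
    return grouped


# Toy rows in the column order Node, Structure_Type, Structure_Name,
# Structure_Definition (values are invented)
rows = [
    ("r1", "IP_Access_List", "98", {"name": "98"}),
    ("r2", "IP_Access_List", "99", {"name": "99"}),
    ("r1", "Route_Filter_List", "from-PE", {"lines": []}),
]

grouped = group_definitions(rows)
ip_acl_json = json.dumps(grouped["IP_Access_List"], sort_keys=True)
print(ip_acl_json)
```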


In [5]:
df = pd.read_json("campus-anon-net1 json files/IP_Access_List.json")

In [6]:
df

Unnamed: 0,perimeter,corea,coreb
allow-internal-management,,"[{'name': 'allow-internal-management', 'lines'...","[{'name': 'allow-internal-management', 'lines'..."
sshFilter,"[{'name': 'sshFilter', 'lines': [{'action': 'P...",,
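
In the output above the nodes appear as columns and the structure names as rows. That is because the files were written with ```orient="index"``` while ```pd.read_json``` defaults to ```orient="columns"``` for dataframes. Passing ```orient="index"``` when reading restores nodes as rows. The sketch below shows the difference on a toy frame held in memory; the node names, structure names and values are invented.

```python
import io

import pandas as pd

# Toy frame shaped like the saved files: nodes as rows, names as columns
frame = pd.DataFrame(
    {"sshFilter": ["acl-a", None], "allow-mgmt": [None, "acl-b"]},
    index=["perimeter", "corea"],
)
text = frame.to_json(orient="index")

# Default orient: the outer JSON keys (nodes) become columns
default_read = pd.read_json(io.StringIO(text))

# Matching orient: the outer JSON keys (nodes) become the row index
index_read = pd.read_json(io.StringIO(text), orient="index")

print(list(default_read.columns))
print(list(index_read.index))
```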
