# Accessing and querying the JanusGraph database

This notebook demonstrates how to connect to the graph database, how to query it, and how to visualize the query results.


Let's start with the imports and a connection to the JanusGraph database:

In [1]:
from thoth.lab import obtain_location
from thoth.storages import GraphDatabase

# Let's obtain location of our JanusGraph instance.
JANUSGRAPH_LOCATION = obtain_location(
    "thoth-sbu-janusgraph",  # name of JanusGraph instance
    verify=False,            # do not verify TLS on internal network
    only_netloc=True         # return only the netloc, without the http scheme
)

# Instantiate and connect the JanusGraph database
adapter = GraphDatabase.create(JANUSGRAPH_LOCATION, port=80)
adapter.connect()

adapter.is_connected()

True
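
The `only_netloc=True` flag above asks for a network location rather than a full URL. How `obtain_location` derives it is not shown in this notebook; as an illustration of the term only, the standard library's `urllib.parse` splits a URL into its parts as follows. The URL is a made-up placeholder, not a real Thoth endpoint.

```python
from urllib.parse import urlparse

# Hypothetical URL used only for illustration; not a real Thoth endpoint.
example_url = "http://janusgraph.example.com:8182/some/path"

parts = urlparse(example_url)
scheme = parts.scheme      # protocol part in front of "://"
netloc = parts.netloc      # host and port, without scheme or path
hostname = parts.hostname  # host only

print(scheme, netloc, hostname)
```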

Next, let's perform some generic queries. We will use the `GraphQueryResult` wrapper, which wraps graph query results and exposes some useful methods:

In [2]:
from thoth.lab import GraphQueryResult as gqr

# Randomly pick 10 packages and get the number of versions of each that is available in the graph database.
query = adapter.g.V().has('package_version', 'ecosystem', 'pypi').groupCount().by('name').sample(10).toList()
query_result = gqr(query)
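
For readers new to Gremlin, `groupCount().by('name')` tallies vertices by the value of their `name` property. The sketch below is a plain-Python analogy built on made-up records; it is not how JanusGraph evaluates the traversal, and the `label` key is only a stand-in for the vertex label.

```python
from collections import Counter

# Made-up stand-ins for vertices; not real graph data.
vertices = [
    {"label": "package_version", "ecosystem": "pypi", "name": "click", "version": "6.7"},
    {"label": "package_version", "ecosystem": "pypi", "name": "click", "version": "7.0"},
    {"label": "package_version", "ecosystem": "pypi", "name": "pyyaml", "version": "3.12"},
    {"label": "package_version", "ecosystem": "npm", "name": "left-pad", "version": "1.3.0"},
]

# Roughly has('package_version', 'ecosystem', 'pypi') followed by groupCount().by('name').
version_counts = Counter(
    v["name"]
    for v in vertices
    if v["label"] == "package_version" and v["ecosystem"] == "pypi"
)

print(dict(version_counts))
```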

If we would like to access the raw response, we can do so through the `result` attribute:

In [3]:
query_result.result

[{'argcomplete': 89,
  'click': 34,
  'codegen': 1,
  'flexmock': 35,
  'graphviz': 31,
  'jsonschema': 19,
  'pyyaml': 12,
  'rainbow-logging-handler': 14,
  'selinon': 4,
  'selinonlib': 9}]
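
The raw result shown above is a list holding a single dictionary, so ordinary Python is enough for quick checks. As a small sketch (the `raw` literal simply copies the output above), the packages can be ranked by their number of versions:

```python
# Copy of the raw output shown above.
raw = [{'argcomplete': 89, 'click': 34, 'codegen': 1, 'flexmock': 35,
        'graphviz': 31, 'jsonschema': 19, 'pyyaml': 12,
        'rainbow-logging-handler': 14, 'selinon': 4, 'selinonlib': 9}]

counts = raw[0]  # the single grouped-count dictionary

# Sort (name, count) pairs by count, largest first.
ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
top_three = ranked[:3]
total_versions = sum(counts.values())

print(top_three, total_versions)
```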

We can also get the results as a pandas DataFrame:

In [4]:
query_result.to_dataframe()

|   | argcomplete | click | codegen | flexmock | graphviz | jsonschema | pyyaml | rainbow-logging-handler | selinon | selinonlib |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 89 | 34 | 1 | 35 | 31 | 19 | 12 | 14 | 4 | 9 |


If we get the result as a dictionary of values, we can easily plot some graphs:

In [5]:
query = adapter.g.V().has('package_version', 'ecosystem', 'pypi').groupCount().by('name').sample(10).next()
query_result = gqr(query)
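
The only difference from the earlier query is the terminal step: `toList()` collects every result of the traversal into a list, whereas `next()` returns just the next result, which here is the dictionary itself. A generator can stand in for the traversal to show the difference in shape; this only mimics the behaviour and does not use the Gremlin client.

```python
def fake_traversal():
    """Stand-in for a traversal that yields one grouped-count map."""
    yield {"click": 34, "pyyaml": 12}

as_list = list(fake_traversal())  # analogous to .toList(): a list of results
as_item = next(fake_traversal())  # analogous to .next(): the first result only

print(type(as_list).__name__, type(as_item).__name__)
```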

In [6]:
query_result.plot_bar()

In [7]:
# Or a pie chart

query_result.plot_pie()

To see more options for visualizing and interacting with data retrieved from the graph database, use the `help` function to access the up-to-date `GraphQueryResult` documentation:

In [8]:
help(gqr)

Help on class GraphQueryResult in module thoth.lab.graph:

class GraphQueryResult(builtins.object)
 |  Methods defined here:
 |  
 |  __init__(self, result)
 |      Initialization.
 |      
 |      :param result: the result to be used as a query result, can be directly coroutine that is fired.
 |  
 |  plot_bar(self)
 |      Plot histogram of results obtained.
 |  
 |  plot_pie(self)
 |      Plot a pie of results into Jupyter notebook.
 |  
 |  to_dataframe(self)
 |      Construct a panda's dataframe on results.
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |  
 |  __dict__
 |      dictionary for instance variables (if defined)
 |  
 |  __weakref__
 |      list of weak references to the object (if defined)



For more info on how to construct queries to the graph database, see [Practical Gremlin: An Apache TinkerPop Tutorial](http://kelvinlawrence.net/book/Gremlin-Graph-Guide.html). Also see [pandas.DataFrame](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html#pandas.DataFrame) documentation on how to use dataframes and [plotly documentation](https://plot.ly/api/) for creating interactive figures. To visualize data available in the graph, use the [graph explorer](https://url.corp.redhat.com/thoth-sbu-graphexp).