# Accessing and querying the JanusGraph database

This notebook demonstrates how to connect to the graph database, how to query it, and how to visualize the query results.


Let's start with the imports and a connection to the JanusGraph database:

In [80]:
from thoth.lab import obtain_location
from thoth.storages import GraphDatabase
from thoth.lab import GraphQueryResult as gqr

# Let's obtain location of our JanusGraph instance.
JANUSGRAPH_LOCATION = obtain_location(
    "thoth-test-core-janusgraph",  # name of JanusGraph instance
    verify=False,            # do not verify TLS on internal network
    only_netloc=True         # omit the http scheme, return only the netloc
)

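# Instantiate the adapter and connect to the JanusGraph database.
# Note: the host is hard-coded here; JANUSGRAPH_LOCATION obtained above is not used.
adapter = GraphDatabase.create('10.8.253.255', port=8182)
adapter.connect()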

adapter.is_connected()

True

Next, let's run some generic queries. We will use the `GraphQueryResult` wrapper, which wraps the results of our graph queries and exposes some useful methods:

In [87]:
%%time


query = (
    adapter.g.V()
    .has('__label__', 'python_package_version')
    .has('__type__', 'vertex')
    .groupCount().by('package_name')
    .next()
)
query_result = gqr(query)

CPU times: user 5.48 ms, sys: 596 µs, total: 6.07 ms
Wall time: 836 ms


If we would like to access the raw response, we can do so through the `result` attribute:

In [88]:
query_result.result

{'tzlocal': 1,
 'pycparser': 1,
 'markupsafe': 2,
 'redis': 1,
 'fedmsg': 1,
 'pyasn1-modules': 1,
 'stomper': 1,
 'moksha.common': 1,
 'lxml': 1,
 'docutils': 1,
 'm2crypto': 1,
 'werkzeug': 1,
 'click': 1,
 'sqlalchemy': 1,
 'pyyaml': 2,
 'prettytable': 1,
 'boto3': 1,
 'pyliblzma': 1,
 'pygithub': 1,
 'idna': 1,
 'decorator': 1,
 'python-daemon': 2,
 'headers.dist': 1,
 'cffi': 1,
 'extras': 1,
 'pyasn1': 1,
 'requests': 2,
 'kerberos': 1,
 'gutentag': 2,
 'pyopenssl': 1,
 'characteristic': 1,
 'selenium': 1,
 'txws': 1,
 'kitchen': 1,
 'pygments': 2,
 'botocore': 1,
 'twisted': 1,
 'beaker-common': 1,
 'jenkinsapi': 1,
 'jenkins-job-builder': 1,
 'generatejob2': 1,
 'babel': 2,
 'stevedore-examples': 1,
 'pytz': 3,
 'pyjwt': 1,
 'daiquiri': 1,
 'iniparse': 1,
 'kobo': 1,
 'arrow': 1,
 'python-dateutil': 1,
 'setuptools': 6,
 'shipshift': 1,
 'docopt': 1,
 'pip': 4,
 'pyzmq': 2,
 'certifi': 1,
 'errata-tool': 1,
 'service-identity': 1,
 'rfc5424-logging-handler': 1,
 'cryptography':
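
The raw result of a `groupCount()` step is a plain Python dictionary, so it can be post-processed with standard library tools before any plotting. A minimal sketch, assuming a small hand-copied subset of the output above; the `sample` dictionary and the `top_packages` helper are illustrative names and not part of `thoth.lab`:

```python
from collections import Counter

# Illustrative subset of the groupCount() output shown above.
sample = {"tzlocal": 1, "markupsafe": 2, "pytz": 3, "setuptools": 6, "pip": 4}


def top_packages(counts, n=3):
    """Return the n packages with the most versions, highest count first."""
    return Counter(counts).most_common(n)


print(top_packages(sample))
```

With the real data, `query_result.result` would take the place of `sample`.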

We can also get the results as a pandas DataFrame:

In [89]:
import pandas as pd

r = query_result.result
df = pd.DataFrame(list(r.items()), columns=['Package Name', '# Versions'])
df

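|   | Package Name   | # Versions |
|---|----------------|------------|
| 0 | tzlocal        | 1          |
| 1 | pycparser      | 1          |
| 2 | markupsafe     | 2          |
| 3 | redis          | 1          |
| 4 | fedmsg         | 1          |
| 5 | pyasn1-modules | 1          |
| 6 | stomper        | 1          |
| 7 | moksha.common  | 1          |
| 8 | lxml           | 1          |
| 9 | docutils       | 1          |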


When the result is a dictionary of values, we can easily plot it:

In [90]:
query = (
    adapter.g.V()
    .has('__label__', 'python_package_version')
    .groupCount().by('package_name')
    .next()
)
query_result = gqr(query)

In [91]:
query_result.plot_bar()

In [92]:
# Or a pie chart

query_result.plot_pie()

For more options on visualizing and interacting with data retrieved from the graph database, use the `help` function to access the up-to-date `GraphQueryResult` documentation:

In [93]:
help(gqr)

Help on class GraphQueryResult in module thoth.lab.graph:

class GraphQueryResult(builtins.object)
 |  Methods defined here:
 |  
 |  __init__(self, result)
 |      Initialization.
 |      
 |      :param result: the result to be used as a query result, can be directly coroutine that is fired.
 |  
 |  plot_bar(self)
 |      Plot histogram of results obtained.
 |  
 |  plot_pie(self)
 |      Plot a pie of results into Jupyter notebook.
 |  
 |  serialize(self)
 |      Serialize the output of graph query.
 |  
 |  to_dataframe(self)
 |      Construct a panda's dataframe on results.
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |  
 |  __dict__
 |      dictionary for instance variables (if defined)
 |  
 |  __weakref__
 |      list of weak references to the object (if defined)



In [94]:
%%time

# Gremlin helpers: anonymous traversal steps and traversal enums/predicates.
from gremlin_python.process.graph_traversal import (
    constant, fold, has, identity, inE, out, outE, project, select, values,
)
from gremlin_python.process.traversal import Operator, Pop, not_


# Starting at jinja2 2.7.2, repeatedly follow outgoing `depends_on` edges to
# `python_package_version` vertices and emit the path to every vertex reached.
query = (
    adapter.g.V()
    .has('package_name', 'jinja2')
    .has('package_version', '2.7.2')
    .repeat(
        outE().simplePath().has('__label__', 'depends_on')
        .inV().has('__label__', 'python_package_version')
    )
    .emit()
    .path()
    .by(project('package', 'version').by('package_name').by('package_version'))
    .by(project('depends_on').by('version_range'))
    .toList()
)

query_result = gqr(query)

print(query_result.result)


    

[]
CPU times: user 4.76 ms, sys: 0 ns, total: 4.76 ms
Wall time: 776 ms
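
The query above returned an empty list on this instance, so its output shape is not visible here. The traversal pattern itself (repeatedly follow `depends_on` edges, emit every path reached, and skip paths that would revisit an element, which is what `simplePath()` does) can be illustrated on a small in-memory graph. This is only a rough sketch: the `DEPENDS_ON` data is made up for illustration and not taken from the database, and `dependency_paths` is a hypothetical helper, not a JanusGraph or `thoth` API.

```python
# Toy adjacency list: package -> packages it depends on (illustrative data only).
DEPENDS_ON = {
    "jinja2": ["markupsafe", "babel"],
    "babel": ["pytz"],
    "pytz": [],
    "markupsafe": [],
}


def dependency_paths(graph, start):
    """Yield every cycle-free path from start, similar to repeat(...).emit().path()."""
    stack = [[start]]
    while stack:
        path = stack.pop()
        for dep in graph.get(path[-1], []):
            if dep in path:  # same role as simplePath(): never revisit a vertex
                continue
            new_path = path + [dep]
            yield new_path  # emit(): report the path at every depth
            stack.append(new_path)


for path in dependency_paths(DEPENDS_ON, "jinja2"):
    print(" -> ".join(path))
```

Each yielded list corresponds to one path from the starting package to a direct or transitive dependency.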


For more information on constructing graph database queries, see [Practical Gremlin: An Apache TinkerPop Tutorial](http://kelvinlawrence.net/book/Gremlin-Graph-Guide.html). See also the [pandas.DataFrame](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html#pandas.DataFrame) documentation on working with dataframes and the [plotly documentation](https://plot.ly/api/) on creating interactive figures. To visualize the data available in the graph, use the [graph explorer](https://url.corp.redhat.com/thoth-sbu-graphexp).

In [95]:

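# Alternative that returns the vertices themselves instead of their properties:
# query = adapter.g.V().has('package_name', 'markupsafe').toList()

# valueMap(True) returns each vertex's properties together with its id and label.
query = adapter.g.V().has('package_name', 'markupsafe').valueMap(True).toList()

query_result = gqr(query)
print(query_result.result)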


[{'package_name': ['markupsafe'], 'id': 987368, 'label': 'package', 'ecosystem': ['pypi'], '__label__': ['package'], '__type__': ['vertex']}, {'id': 42471472, 'package_version': ['0.23'], 'ecosystem': ['pypi'], '__label__': ['python_package_version'], '__type__': ['vertex'], 'package_name': ['markupsafe'], 'label': 'python_package_version'}, {'id': 82915384, 'package_version': ['0.11'], 'ecosystem': ['pypi'], '__label__': ['python_package_version'], '__type__': ['vertex'], 'package_name': ['markupsafe'], 'label': 'python_package_version'}]
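
In the output above, property values arrive wrapped in single-element lists, while `id` and `label` are plain scalars. If flat records are more convenient, the lists can be unwrapped. A minimal sketch, assuming records shaped like the ones printed above; `flatten_value_map` is a hypothetical helper, not part of `thoth.lab`:

```python
def flatten_value_map(value_map):
    """Unwrap single-element lists produced by valueMap(); keep other values as-is."""
    return {
        key: value[0] if isinstance(value, list) and len(value) == 1 else value
        for key, value in value_map.items()
    }


# One record copied from the output above.
record = {
    "id": 42471472,
    "package_version": ["0.23"],
    "ecosystem": ["pypi"],
    "__label__": ["python_package_version"],
    "__type__": ["vertex"],
    "package_name": ["markupsafe"],
    "label": "python_package_version",
}

print(flatten_value_map(record))
```

Properties that hold more than one value are left as lists, so no data is dropped.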
