## Discover Thoth's Graph Structure

This notebook is addressed to users and developers of Thoth who want to explore the content of Thoth's Graph Database.

First, it is important to look at the schema model of Thoth's Graph Database.

## 1. Connect to JanusGraph Instance

To discover what is inside the Graph Database, we need to connect to a JanusGraph instance.

In [1]:
from thoth.storages.graph import GraphDatabase
from thoth.lab import GraphQueryResult as gqr

graph_db = GraphDatabase.create('janusgraph.test.thoth-station.ninja', port=8182)
graph_db.connect()
g = graph_db.g   # We will use raw Gremlin traversal in examples.

ClientConnectorError: [Errno 110] Cannot connect to host janusgraph.test.thoth-station.ninja:8182 ssl:False [Can not connect to janusgraph.test.thoth-station.ninja:8182 [Connection timed out]]
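
The traceback above is a connection timeout rather than a query error, so it can help to check that the host and port are reachable before calling `graph_db.connect()`. The following is a minimal sketch using only the standard library; the helper name `can_reach` and the 5-second default timeout are assumptions made for illustration and are not part of thoth-storages.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS resolution failures.
        return False
```

Calling `can_reach('janusgraph.test.thoth-station.ninja', 8182)` before creating the `GraphDatabase` adapter helps separate a network problem from a problem in the traversal itself.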

We need to import the objects that will be used in the notebook.

In [5]:
import pandas as pd
from pprint import pprint
from thoth.solver import pip_compile
from thoth.storages.graph.models import ALL_MODELS
from gremlin_python.process.graph_traversal import has
from gremlin_python.process.traversal import Operator
from gremlin_python.process.traversal import Pop
from gremlin_python.process.traversal import not_
from gremlin_python.process.traversal import P
from gremlin_python.process.graph_traversal import identity
from gremlin_python.process.graph_traversal import outE
from gremlin_python.process.graph_traversal import out
from gremlin_python.process.graph_traversal import inE
from gremlin_python.process.graph_traversal import inV
from gremlin_python.process.graph_traversal import select
from gremlin_python.process.graph_traversal import values
from gremlin_python.process.graph_traversal import fold
from gremlin_python.process.graph_traversal import constant
from gremlin_python.process.graph_traversal import project

## 2. Vertex and Edge Labels

List all the available vertex labels in the graph database

In [142]:
# Create list of vertices 
vertex_labels = []
for element in ALL_MODELS:

    if element.__type__ == "vertex":
        vertex_labels.append(element.__label__)
        
# Create the pandas DataFrame 
df = pd.DataFrame(vertex_labels, columns = ['Vertex'])
df

Unnamed: 0,Vertex
0,rpm_requirement
1,buildtime_environment
2,python_artifact
3,python_package_index
4,deb_package_version
5,user_software_stack
6,build_observation
7,hardware_information
8,ecosystem_solver
9,rpm_package_version


List all the available edge labels in the graph database

In [143]:
# Create list of edges
edge_labels = []
for element in ALL_MODELS:

    if element.__type__ == "edge":
        edge_labels.append(element.__label__)
        
# Create the pandas DataFrame 
df = pd.DataFrame(edge_labels, columns = ['Edge'])
df

Unnamed: 0,Edge
0,observed
1,deb_pre_depends
2,requires
3,builds_on
4,solved
5,depends_on
6,runs_on
7,has_version
8,creates_stack
9,is_part_of


## 3. Vertex and Edge Instances

Let's get an idea of the size of Thoth's graph.

In [10]:
print(f"Number of vertex instances in the graph database: {gqr(g.V().count().next()).result:d}")
print(f"Number of edge instances in the graph database: {gqr(g.E().count().next()).result:d}")

Number of vertex instances in the graph database: 38877
Number of edge instances in the graph database: 586892


Let's see which vertex labels have the most instances.

In [11]:
# Extract the number of instances for each vertex label

# List of vertex labels
vertex_labels = [element.__label__ for element in ALL_MODELS if element.__type__ == "vertex"]

# Dict of vertex labels and counts 
vertices_number = gqr(g.V().has("__label__").has("__type__", "vertex").groupCount().by("__label__").next()).result

# Pair every known vertex label with its count (0 if the label has no instances)
list_vertices_counts = [[vertex, vertices_number.get(vertex, 0)] for vertex in vertex_labels]
        

print(f"\nNumber of vertex instances present in the graph database (sum): {sum(vertex_c[1] for vertex_c in list_vertices_counts)}")
print(f"Number of vertex instances present in the graph database: {gqr(graph_db.g.V().count().next()).result:d}")


Number of vertex instances present in the graph database (sum): 38876
Number of vertex instances present in the graph database: 38877
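
The two totals above differ by one (38876 versus 38877). One plausible explanation, stated here as an assumption rather than a finding, is that a vertex either lacks the `__label__`/`__type__` properties or carries a label that is not declared in `ALL_MODELS`, because the sum only covers declared labels. The sketch below shows how such a gap could be broken down; the function name `reconcile` and all numbers are illustrative.

```python
def reconcile(model_labels, db_counts, total):
    """Compare per-label counts against the overall count.

    Returns (rows, unknown_labels, unaccounted), where rows pairs every model
    label with its count (0 when the label is absent from the database).
    """
    rows = [(label, db_counts.get(label, 0)) for label in model_labels]
    # Labels seen in the database that no model declares.
    unknown_labels = sorted(set(db_counts) - set(model_labels))
    # Instances not covered by any declared label.
    unaccounted = total - sum(count for _, count in rows)
    return rows, unknown_labels, unaccounted


# Illustrative numbers, not the notebook's real data.
model_labels = ["package", "cve", "runtime_environment"]
db_counts = {"package": 5, "cve": 3, "legacy_label": 1}
rows, unknown_labels, unaccounted = reconcile(model_labels, db_counts, total=10)
print(rows)
print(unknown_labels)
print(unaccounted)
```

In this made-up case two instances are unaccounted for: one carries an undeclared label, and the other would have to be a vertex without a usable label at all.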


In [13]:
# Show the number of instances for each vertex label
df = pd.DataFrame(list_vertices_counts, columns = ['Vertex', 'N. Instances']).sort_values(by='N. Instances',ascending=False)
df

Unnamed: 0,Vertex,N. Instances
14,python_package_version,21733
5,python_artifact,14706
16,package,941
0,cve,796
6,rpm_requirement,513
15,rpm_package_version,174
7,python_package_index,7
13,ecosystem_solver,4
3,buildtime_environment,1
11,runtime_environment,1


Let's see which edge labels have the most instances.

In [6]:
%%time

# List of edge labels
edge_labels = [element.__label__ for element in ALL_MODELS if element.__type__ == "edge"]

# Dict of edge labels and counts 
edges_number = gqr(g.E().has("__label__").has("__type__", "edge").groupCount().by("__label__").next()).result

# Pair every known edge label with its count (0 if the label has no instances)
list_edges_counts = [[edge, edges_number.get(edge, 0)] for edge in edge_labels]
        

print(f"\nNumber of edge instances present in the graph database (sum): {sum(edge_c[1] for edge_c in list_edges_counts)}")
print(f"Number of edge instances present in the graph database: {gqr(graph_db.g.E().count().next()).result:d}")


Number of edge instances present in the graph database (sum): 599881
Number of edge instances present in the graph database: 599881
CPU times: user 8.6 ms, sys: 1.59 ms, total: 10.2 ms
Wall time: 47.8 s


In [7]:
# Show the number of instances for each edge label
df = pd.DataFrame(list_edges_counts, columns = ['Edge', 'N. Instances']).sort_values(by='N. Instances',ascending=False)
df

Unnamed: 0,Edge,N. Instances
9,solved,404583
10,depends_on,119157
1,has_vulnerability,34796
12,has_version,22276
4,has_artifact,15455
7,requires,3350
14,is_part_of,264
0,deb_depends,0
2,runs_in,0
3,builds_in,0


## 4. Discover the packages inside Thoth

Check the allowed sources (Python package indexes) for the packages inside Thoth's database.

In [17]:
urls_list = graph_db.get_python_package_index_urls()

df = pd.DataFrame(urls_list, columns = ['URL'])
df

Unnamed: 0,URL
0,https://pypi.org/simple
1,https://tensorflow.pypi.thoth-station.ninja/in...
2,https://tensorflow.pypi.thoth-station.ninja/in...
3,https://tensorflow.pypi.thoth-station.ninja/in...
4,https://tensorflow.pypi.thoth-station.ninja/in...
5,https://tensorflow.pypi.thoth-station.ninja/in...
6,https://tensorflow.pypi.thoth-station.ninja/in...


Let's take a look at which Python packages are inside Thoth.

In [22]:
# Select a letter
letter = 'g'

In [23]:
# Extract Packages for selected letter
all_packages = gqr(
    g.V()
    .has('__label__', 'package')
    .order().by('package_name')
    .project('package').by('package_name')
    .toList()
).result
    
packages_list = [package['package'] for package in all_packages if package['package'][0] == letter]    
print(f"The number of packages for letter {letter} is: {len(packages_list)}\n")

The number of packages for letter g is: 12



In [24]:
# Visualize packages for selected letter
df = pd.DataFrame(packages_list, columns = [letter])
df

Unnamed: 0,g
0,gandi-cli
1,gast
2,genshi
3,gevent
4,geventhttpclient
5,girder
6,gitlab-languages
7,gns3-gui
8,go-http
9,google-appengine


For the packages extracted, let's see how many versions are available

In [None]:
%%time
# Count all package versions (Python and RPM)
package_versions_results = []

for package in packages_list:
    
    n_python_package_versions = gqr(g.V()
                              .has("__label__", "package")
                              .has("package_name", package)
                              .outE()
                              .has("__label__","has_version")
                              .inV()
                              .has('__label__', 'python_package_version')
                              .count()
                              .next()
                             ).result
    
    n_rpm_package_versions = gqr(g.V()
                          .has("__label__", "package")
                          .has("package_name", package)
                          .outE()
                          .has("__label__","has_version")
                          .inV()
                          .has('__label__', 'rpm_package_version')
                          .count()
                          .next()
                         ).result
    
    package_versions_results.append([package,
                                 n_python_package_versions,
                                 n_rpm_package_versions,
                                 n_python_package_versions +  n_rpm_package_versions])
    

In [10]:
# Visualize the number of versions per package for the selected letter
df = pd.DataFrame(package_versions_results, columns = ['package_name', 'n_python_package_version',
                                                      'n_rpm_package_version', 'total_package_versions'])
df

Unnamed: 0,package_name,n_python_package_version,n_rpm_package_version,total_package_versions
0,gandi-cli,3,0,3
1,gast,5,0,5
2,genshi,15,0,15
3,gevent,18,0,18
4,geventhttpclient,2,0,2
5,girder,49,0,49
6,gitlab-languages,12,0,12
7,gns3-gui,51,0,51
8,google-appengine,1,0,1
9,google-pasta,3,0,3
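
The loop above sends two traversals per package, which adds up when each round trip takes seconds. As a sketch only: if a single traversal returned one `(package_name, label)` pair per version vertex (an assumption, no such query appears in this notebook), the same table could be built client-side. The function name `pivot_version_counts` and the sample pairs are hypothetical.

```python
from collections import Counter


def pivot_version_counts(pairs, packages):
    """Build [package, n_python, n_rpm, total] rows from (package, label) pairs."""
    counts = Counter(pairs)
    rows = []
    for package in packages:
        n_python = counts[(package, "python_package_version")]
        n_rpm = counts[(package, "rpm_package_version")]
        rows.append([package, n_python, n_rpm, n_python + n_rpm])
    return rows


# Illustrative pairs, one per version vertex; not real query output.
pairs = [
    ("gast", "python_package_version"),
    ("gast", "python_package_version"),
    ("curl", "rpm_package_version"),
]
rows = pivot_version_counts(pairs, ["gast", "curl", "genshi"])
print(rows)
```

The resulting rows have the same column order as the DataFrame above, so they could be passed to `pd.DataFrame` unchanged.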


## 5. Select one package and discover more about it

In [None]:
# Select the package
package_name = 'gast'

We retrieve all the versions available in Thoth's graph from any ecosystem for the selected package

In [None]:
%%time

gqr(
    g.V()
    .has('package_name', package_name)
    .outE().has('__label__', 'has_version')
    .inV()
    .order().by('package_version')
    .project('package', 'version', 'ecosystem')
    .by('package_name')
    .by('package_version')
    .by('ecosystem')
    .toList()
).to_dataframe()

## 7. Inspect packages

In [88]:
packages_vector_all = gqr(g.V().has('__label__', 'package').groupCount().by("package_name").next()).result

In [89]:
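# Check how many package names look malformed (heuristic: names of 55 characters or more)
packages_vector = [pkg for pkg in packages_vector_all.keys() if len(pkg) < 55]
# Use >= so that a name of exactly 55 characters is not dropped from both lists
wrong_packages_vector = [pkg for pkg in packages_vector_all.keys() if len(pkg) >= 55]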
print(packages_vector)
print(len(packages_vector))
print(wrong_packages_vector)
print(len(wrong_packages_vector))

['edrnsite-policy', 'django-autocomplete-light', 'html5', 'blinkpy', 'django-secure-auth', 'anncolvar', 'pyarmor', 'coincurve', 'lxml', 'soupsieve', 'definitions', 'passlib', 'repoze-lru', 'nova', 'django-anymail', 'django-safedelete', 'click', 'sqlalchemy', 'django-make-app', 'js-videojs', 'python-nomad', 'idna', 'notable', 'soappy', 'yaybu', 'beautifulsoup', 'django-modern-rpc', 'django-material', 'flask-micropub', 'pygresql', 'pyjwt', 'pycapnp', 'dulwich', 'eth-hash', 'etherweaver', 'flask-security-fork', 'reg', 'flask-i18n', 'serpent', 'cryptacular', 'satosa', 'pyvcloud', 'swift', 'cinder', 'python-smooch', 'django-anonymizer', 'cryptography', 'django-fiber', 'misago', 'plone', 'pyusb', 'seed-stage-based-messaging', 'pbr', 'rope', 'paste', 'mysql-connector-python', 'mezzanine', 'requests-kerberos', 'pcp', 'fresco', 'croniter', 'gandi-cli', 'bigchaindb-driver', 'pypiserver', 'vermin', 'swauth', 'foolscap', 'cbapi', 'frozendict', 'splash', 'argh', 'kinto-dist', 'katka-core', 'tmc', '

In [90]:
# Check that all package names are unique
len(set(packages_vector))

653

In [91]:
# Check python packages
python_package_versions = gqr(g.V().has('__label__', 'python_package_version').groupCount().by("package_name").next()).result
python_packages_v = [pkg for pkg in python_package_versions.keys()]
print(python_packages_v)
print(len(python_packages_v))

['sbp', 'edrnsite-policy', 'tensorflow-estimator', 'django-autocomplete-light', 'html5', 'blinkpy', 'webassets', 'markupsafe', 'django-cms-patched', 'django-secure-auth', 'anncolvar', 'pyarmor', 'trytond', 'mccabe', 'bleach', 'coincurve', 'lxml', 'soupsieve', 'django-hijack', 'uwsgi', 'kotti', 'definitions', 'passlib', 'nova', 'repoze-lru', 'werkzeug', 'django-anymail', 'django-safedelete', 'click', 'sqlalchemy', 'django-make-app', 'js-videojs', 'djblets', 'ooniprobe', 'idna', 'python-nomad', 'pillow-simd', 'cffi', 'py-espeak-ng', 'notable', 'bottle', 'django-storages', 'flask-admin', 'yaybu', 'soappy', 'dpaste', 'pynoorm', 'beautifulsoup', 'markerlib', 'bakercm', 'murano-dashboard', 'django-modern-rpc', 'pigar', 'waitress', 'django-material', 'pyserial', 'stargate', 'boss-cli', 'flask-micropub', 'unicef-locations', 'pygresql', 'pokedex-py', 'pyplanet', 'datacube', 'django-relatives', 'multidict', 'pyjwt', 'pastescript', 'pycapnp', 'rply', 'django-airplane', 'gast', 'mollie-api-python'

In [92]:
# Check RPM packages
rpm_package_versions = gqr(g.V().has("__label__", "rpm_package_version").groupCount().by("package_name").next()).result
rpm_packages = [pkg for pkg in rpm_package_versions.keys()]
print(rpm_packages)
print(len(rpm_packages))

['libidn2', 'rpm-libs', 'curl', 'libsolv', 'chkconfig', 'p11-kit', 'python3-gpg', 'openldap', 'p11-kit-trust', 'coreutils-common', 'dbus-libs', 'python3-pip', 'python3', 'libsss_idmap', 'libseccomp', 'sqlite-libs', 'libattr', 'dnf', 'python3-libdnf', 'gdbm-libs', 'gawk', 'dnf-data', 'libssh', 'keyutils-libs', 'libgcc', 'cryptsetup-libs', 'whois-nls', 'glibc-langpack-en', 'lua-libs', 'libuuid', 'libgpg-error', 'libarchive', 'libunistring', 'rpm-build-libs', 'libdnf', 'libnsl2', 'openssl', 'rpm-sign-libs', 'ncurses-libs', 'device-mapper-libs', 'fedora-gpg-keys', 'systemd-pam', 'glibc-common', 'python3-six', 'ima-evm-utils', 'libffi', 'bzip2-libs', 'libyaml', 'libksba', 'nettle', 'gmp', 'diffutils', 'cracklib', 'python3-hawkey', 'dbus-daemon', 'pcre', 'systemd-libs', 'sed', 'libreport-filesystem', 'libsigsegv', 'python3-rpm', 'vim-minimal', 'libevent', 'python3-unbound', 'libcom_err', 'audit-libs', 'rpm-plugin-systemd-inhibit', 'glibc-minimal-langpack', 'systemd', 'basesystem', 'rootfiles

In [93]:
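# Check whether the error is in Python packages or RPM packages
# Package names without any python_package_version vertex
cpp = set(packages_vector) - set(python_packages_v)
print(len(cpp))
# Package names that also appear among the RPM package versions
crr = set(packages_vector) & set(rpm_packages)
print(len(crr))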

0
0
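
Both results above are 0: every well-formed package name has at least one `python_package_version`, and none of them overlaps with the RPM package names. To make the two set operations concrete, here is a small sketch with made-up names; `orphan-package` in particular is hypothetical.

```python
# Illustrative name sets; the real ones come from the traversals above.
packages = {"gast", "click", "curl", "orphan-package"}
python_packages = {"gast", "click", "tensorflow-estimator"}
rpm_packages = {"curl", "libffi"}

# Set difference: package names with no python_package_version vertex.
without_python_version = packages - python_packages
# Set intersection: package names that also appear as RPM packages.
also_rpm = packages & rpm_packages

print(sorted(without_python_version))
print(sorted(also_rpm))
```

A non-empty difference would point to package vertices with no Python version attached, while a non-empty intersection would point to names shared between the two ecosystems.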


Check the Python package versions that have more than one artifact

In [35]:
gqr(g.V().has("__label__", "python_package_version").where(outE().has("__label__", "has_artifact").count().is_(P.gt(1))).valueMap(True).toList()).result

[{'index_url': ['https://pypi.org/simple'],
  'package_version': ['20.3.1'],
  '__label__': ['python_package_version'],
  'ecosystem': ['pypi'],
  'id': 147672,
  '__type__': ['vertex'],
  'label': 'python_package_version',
  'package_name': ['setuptools']},
 {'index_url': ['https://pypi.org/simple'],
  'package_version': ['3.4.0'],
  '__label__': ['python_package_version'],
  'ecosystem': ['pypi'],
  'id': 410018040,
  '__type__': ['vertex'],
  'label': 'python_package_version',
  'package_name': ['pytest']},
 {'index_url': ['https://pypi.org/simple'],
  'package_version': ['0.0.1'],
  '__label__': ['python_package_version'],
  'ecosystem': ['pypi'],
  'id': 1019912,
  '__type__': ['vertex'],
  'label': 'python_package_version',
  'package_name': ['certifi']},
 {'index_url': ['https://pypi.org/simple'],
  'package_version': ['2.1.0'],
  '__label__': ['python_package_version'],
  'ecosystem': ['pypi'],
  'id': 1294392,
  '__type__': ['vertex'],
  'label': 'python_package_version',
  'p

In [44]:
gqr(g.V().has("__label__", "python_package_version").where(outE().has("__label__", "has_artifact").count().is_(P.gt(1))).count().toList()).result

[1164]

Count the has_artifact edges connected to a single Python package version

In [55]:
%%time

test_package = 'setuptools'
test_package_version = '20.3.1'

n_edges_has_artifact = gqr(g.V().has("__label__", "python_package_version").has('package_name', test_package).has('package_version', test_package_version).outE().has("__label__", "has_artifact").count().next()).result

print(n_edges_has_artifact)

3
CPU times: user 1.18 ms, sys: 4.71 ms, total: 5.89 ms
Wall time: 5.28 s


Check the corresponding python artifacts for that package version

In [56]:
%%time

gqr(g.V().has("__label__", "python_package_version").has('package_name', test_package).has('package_version', test_package_version).outE().has("__label__", "has_artifact").inV().valueMap().toList()).result

CPU times: user 5.6 ms, sys: 0 ns, total: 5.6 ms
Wall time: 5.45 s


[{'__label__': ['python_artifact'],
  '__type__': ['vertex'],
  'artifact_hash_sha256': ['a1b3f74a1dc7c81368f2bc28a34366cfa6ffe80cdee1451261aabfba1ae1f4a8']},
 {'__label__': ['python_artifact'],
  '__type__': ['vertex'],
  'artifact_hash_sha256': ['5243d74b6f462e5c7042a71c251586ad2c9bb8ebc5474de8e96073c5457e55e8']},
 {'__label__': ['python_artifact'],
  '__type__': ['vertex'],
  'artifact_hash_sha256': ['9553b654fbfef3338c15e9c6751ea4032c3d9f28ef0ed4f73c64c7403cd37b18']}]

Check which python_artifact vertices have more than one incoming has_artifact edge

In [54]:
gqr(g.V().has("__label__", "python_artifact").where(inE().has("__label__", "has_artifact").count().is_(P.gt(1))).count().toList()).result

[150]
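
The `where(... .count().is_(P.gt(1)))` pattern used in these cells filters vertices by degree: out-degree for package versions, in-degree for artifacts. The same idea can be sketched client-side over a plain edge list. The edges below are hypothetical `(package version id, artifact hash)` pairs, and `shared_endpoints` is an illustrative helper name.

```python
from collections import Counter


def shared_endpoints(edges):
    """Return sources with out-degree > 1 and targets with in-degree > 1."""
    out_degree = Counter(source for source, _ in edges)
    in_degree = Counter(target for _, target in edges)
    multi_out = sorted(s for s, n in out_degree.items() if n > 1)
    multi_in = sorted(t for t, n in in_degree.items() if n > 1)
    return multi_out, multi_in


# Hypothetical has_artifact edges: (package version id, artifact hash).
edges = [
    (1, "aaa"),
    (1, "bbb"),
    (2, "ccc"),
    (3, "ccc"),
]
multi_out, multi_in = shared_endpoints(edges)
print(multi_out)  # package versions with more than one artifact
print(multi_in)   # artifacts reached by more than one has_artifact edge
```

Note that an in-degree above one also occurs when the same package version is linked to an artifact twice, so a high count alone does not show that two distinct package versions share the artifact.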

In [53]:
gqr(g.V().has("__label__", "python_artifact").where(inE().has("__label__", "has_artifact").count().is_(P.gt(1))).valueMap().toList()).result

[{'__label__': ['python_artifact'],
  '__type__': ['vertex'],
  'artifact_hash_sha256': ['295e1f38225ce2bdd85a0524e265e9adea40507333829f3a7d64c588dd78ff21']},
 {'__label__': ['python_artifact'],
  '__type__': ['vertex'],
  'artifact_hash_sha256': ['c8c2949c8d42af781437e356978f00a42b16a090612573cd7385c62451a00c2b']},
 {'__label__': ['python_artifact'],
  '__type__': ['vertex'],
  'artifact_hash_sha256': ['c4e77f3f9c7aa37ceb5184ee9f643a0fed0838433d3d2f155337404b63b8a3d4']},
 {'__label__': ['python_artifact'],
  '__type__': ['vertex'],
  'artifact_hash_sha256': ['900c5124ebdb6598ca8e8a0c5888f41a5f14117952d5515258e3d20222b21bfa']},
 {'__label__': ['python_artifact'],
  '__type__': ['vertex'],
  'artifact_hash_sha256': ['2898f992f898cd41eeb8d53b6df75495f2f423b6672890aadaf196ea1448edcc']},
 {'__label__': ['python_artifact'],
  '__type__': ['vertex'],
  'artifact_hash_sha256': ['3262c96a1ca437e7e4763e2843746588a965426550f3797a79fca9c6199c431f']},
 {'__label__': ['python_artifact'],
  '__type_

Select an artifact hash and check why there are two edges connected to it

In [96]:
artifact_hash = "295e1f38225ce2bdd85a0524e265e9adea40507333829f3a7d64c588dd78ff21"

In [75]:
gqr(g.V().has("__label__", "python_package_version").where(outE().has("__label__", "has_artifact").inV().has("__label__","python_artifact").has("artifact_hash_sha256", artifact_hash)).valueMap(True).toList()).result

[{'index_url': ['https://pypi.org/simple'],
  'label': 'python_package_version',
  'package_version': ['4.3'],
  'id': 1032384,
  '__label__': ['python_package_version'],
  'ecosystem': ['pypi'],
  '__type__': ['vertex'],
  'package_name': ['tornado']},
 {'index_url': ['https://pypi.org/simple'],
  'label': 'python_package_version',
  'package_version': ['4.3'],
  'id': 410489080,
  '__label__': ['python_package_version'],
  'ecosystem': ['pypi'],
  '__type__': ['vertex'],
  'package_name': ['tornado']}]
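
The query above returns two python_package_version vertices with identical properties (tornado 4.3 from https://pypi.org/simple) but different ids, which suggests duplicated vertices rather than an artifact genuinely shared between releases. A minimal sketch for spotting such duplicates in a `valueMap(True)` result follows; the helper name `find_duplicates` and the choice of key fields are assumptions.

```python
from collections import defaultdict


def find_duplicates(value_maps):
    """Group vertex ids by (package_name, package_version, index_url)."""
    by_key = defaultdict(list)
    for vm in value_maps:
        # valueMap(True) wraps property values in lists; 'id' is a plain value.
        key = (vm["package_name"][0], vm["package_version"][0], vm["index_url"][0])
        by_key[key].append(vm["id"])
    # Keep only keys shared by more than one vertex.
    return {key: ids for key, ids in by_key.items() if len(ids) > 1}


# The two entries returned by the query above, reduced to the fields used here.
value_maps = [
    {"id": 1032384, "package_name": ["tornado"], "package_version": ["4.3"],
     "index_url": ["https://pypi.org/simple"]},
    {"id": 410489080, "package_name": ["tornado"], "package_version": ["4.3"],
     "index_url": ["https://pypi.org/simple"]},
]
duplicates = find_duplicates(value_maps)
print(duplicates)
```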

In [97]:
gqr(g.E().has("__label__", "has_artifact").where(inV().has("__label__","python_artifact").has("artifact_hash_sha256", artifact_hash)).valueMap().toList()).result

KeyboardInterrupt: 

Get the ids of the vertex instances for each label

In [78]:
vertices_ids_per_label = gqr(g.V().has("__label__").group().by("__label__").next()).result

In [None]:
vertices_ids_total = gqr(g.V().has("__label__").valueMap(True).next()).result
print(vertices_ids_total)

In [85]:
print(vertices_ids_per_label)

{'ecosystem_solver': [v[410951800], v[4232], v[4280], v[8400]], 'runtime_environment': [v[820236456]], 'cve': [v[5128296], v[3317824], v[5869568], v[5644352], v[942144], v[2089072], v[1302544], v[3788816], v[3821680], v[946280], v[4796528], v[2469952], v[1933352], v[3723512], v[7319616], v[332024], v[1433600], v[3932408], v[643184], v[9994304], v[385272], v[61480], v[5091440], v[745576], v[5148688], v[745512], v[5361832], v[8720448], v[311408], v[1831000], v[5582912], v[2338880], v[290904], v[3039296], v[3670128], v[356368], v[2752616], v[1192184], v[1196096], v[8892480], v[929960], v[1232896], v[57512], v[893016], v[3694760], v[1962152], v[1728760], v[2629736], v[3326056], v[2109504], v[1454096], v[798824], v[3088488], v[5435496], v[3452968], v[3858448], v[410112120], v[4145152], v[5300224], v[1376296], v[1081512], v[3068152], v[5050384], v[1716392], v[4489304], v[1441856], v[1425472], v[1097896], v[1548288], v[820240632], v[3469352], v[1839208], v[1466456], v[3510384], v[410099832], 

Get the ids of all vertices

Check query optimization

In [119]:
%%time
gqr(g.E()
    .has("__label__", "solved")
    .has("solver_datetime")
    .has("solver_document_id")
    .has("solver_error")
    .has("solver_error_unsolvable")
    .has("solver_error_unparsable")
    .valueMap()
    .select("solver_document_id")
    .dedup()
    .count()
    .next()
   ).result

CPU times: user 4.17 ms, sys: 1.94 ms, total: 6.11 ms
Wall time: 33.7 s


3060

In [112]:
%%time
gqr(g.E()
    .has("__label__", "solved")
    .valueMap()
    .select("solver_document_id")
    .dedup()
    .count()
    .next()
   ).result

CPU times: user 2.54 ms, sys: 1.14 ms, total: 3.68 ms
Wall time: 26 s


3060
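
Both cells return the same count (3060), with wall times of 33.7 s for the version with the extra `has(...)` filters and 26 s without them. A single `%%time` measurement is noisy, so before drawing conclusions it may be worth repeating each variant. The helper `time_best` below is a hypothetical sketch, and the lambda is a stand-in workload rather than a real traversal.

```python
import time


def time_best(func, repeats=3):
    """Run func several times; return (last result, best wall time in seconds)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func()
        best = min(best, time.perf_counter() - start)
    return result, best


# Stand-in workload; in the notebook this would wrap a traversal call.
result, seconds = time_best(lambda: len({i % 3060 for i in range(100_000)}))
print(result, f"{seconds:.4f}s")
```

Repeated runs against a database may benefit from warm caches, so the first run can be slower than the rest; comparing the best time of each variant reduces that effect but does not remove it.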

In [122]:
%%time
gqr(g.E()
    .has("__label__", "requires")
    .has("__type__", "edge")
    .has("analysis_datetime")
    .has("analysis_document_id")
    .has("analyzer_name")
    .has("analyzer_version")
    .valueMap()
    .select("analysis_document_id")
    .dedup()
    .count()
    .next()
   ).result

CPU times: user 2.86 ms, sys: 1.86 ms, total: 4.72 ms
Wall time: 1.53 s


1

In [None]:
%%time
gqr(g.E()
    .has("__label__", "requires")
    .valueMap()
    .select("analysis_document_id")
    .dedup()
    .count()
    .next()
   ).result

profiling = gqr(
    g.E()
    .has("__label__", "solved")
    .flatMap(
        has("__type__", "edge")
        .has("solver_datetime")
        .has("solver_document_id")
        .has("solver_error")
        .has("solver_error_unsolvable")
        .has("solver_error_unparsable")
    )
    .valueMap()
    .select("solver_document_id")
    .dedup()
    .count()
    .profile()
    .next()
).result

profiling



In [113]:
for metric in profiling['@value']['metrics']:
    print()
    print('dur', metric['@value']['dur'])
    print('counts', metric['@value']['counts'])
    print('name', metric['@value']['name'])
    print('annotations', metric['@value']['annotations'])
    print('id', metric['@value']['id'])
    


dur 24682.713065
counts {'traverserCount': 402217, 'elementCount': 402217}
name JanusGraphStep([],[__label__.eq(solved)])
annotations {'percentDur': 23.304144933768587, 'condition': '(__label__ = solved)', 'isFitted': 'false', 'query': '[]', 'orders': '[]', 'isOrdered': 'true'}
id 9.0.0()

dur 64836.808368
counts {'traverserCount': 402217, 'elementCount': 402217}
name TraversalFlatMapStep([HasStep([__type__.eq(edge)]), ProfileStep, TraversalFilterStep([JanusGraphPropertiesStep([solver_datetime],value), ProfileStep]), ProfileStep, TraversalFilterStep([JanusGraphPropertiesStep([solver_document_id],value), ProfileStep]), ProfileStep, TraversalFilterStep([JanusGraphPropertiesStep([solver_error],value), ProfileStep]), ProfileStep, TraversalFilterStep([JanusGraphPropertiesStep([solver_error_unsolvable],value), ProfileStep]), ProfileStep, TraversalFilterStep([JanusGraphPropertiesStep([solver_error_unparsable],value), ProfileStep]), ProfileStep])
annotations {'percentDur': 61.215571208555545}
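
In the output above, the flatMap step accounts for about 61% of the duration, against about 23% for the initial label lookup. When a profile has many steps, ranking them by `percentDur` makes the bottleneck easier to see. The sketch below assumes the metric structure printed above; `rank_steps` is an illustrative helper, and the long flatMap step name is abbreviated.

```python
def rank_steps(metrics):
    """Return (step type, dur, percentDur) tuples sorted by share of duration."""
    rows = []
    for metric in metrics:
        value = metric["@value"]
        rows.append((
            value["name"].split("(", 1)[0],  # keep only the step type
            value["dur"],
            value["annotations"].get("percentDur", 0.0),
        ))
    return sorted(rows, key=lambda row: row[2], reverse=True)


# The two metrics printed above, with the flatMap step name abbreviated.
metrics = [
    {"@value": {"name": "JanusGraphStep([],[__label__.eq(solved)])",
                "dur": 24682.713065,
                "annotations": {"percentDur": 23.304144933768587}}},
    {"@value": {"name": "TraversalFlatMapStep([...])",
                "dur": 64836.808368,
                "annotations": {"percentDur": 61.215571208555545}}},
]
ranked = rank_steps(metrics)
for name, dur, percent in ranked:
    print(f"{name:<22} {dur:>12.1f} {percent:5.1f}%")
```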

In [None]:
%%time
gqr(g.V().has("__label__", "python_package_version").has("__type__", "vertex").has("ecosystem", "pypi").has("package_name").has("package_version").and_(inE().has("__label__", "solved").has("__type__", "edge").has("solver_error", True).has("solver_error_unsolvable", False).has("solver_error_unparsable", True)).count().next()
).result

In [None]:
%%time
gqr(g.V().has("__label__", "python_package_version").has("__type__", "vertex").has("ecosystem", "pypi").has("package_name").has("package_version").and_(inE().has("__label__", "solved").has("__type__", "edge").has("solver_error", True).has("solver_error_unsolvable", True).has("solver_error_unparsable", False)).count().next()
).result

In [None]:
%%time
gqr(g.V().flatMap(has("__label__", "python_package_version").has("__type__", "vertex").has("ecosystem", "pypi").has("package_name").has("package_version").and_(inE().has("__label__", "solved").has("__type__", "edge").has("solver_error", True).has("solver_error_unsolvable", True).has("solver_error_unparsable", False))).count().next()
).result

In [None]:
%%time
results_dict = graph_db.get_error_python_packages_count(unsolvable=True)
print(results_dict)