# Thoth-solver dataset

- Thoth datasets contain observations of software stacks.
- They provide information about dependency trees, installability, performance, security, etc.
- All of them were created by various parts of Project Thoth and are stored in the Thoth Knowledge Graph.
- This dataset was created by the [Thoth Dependency Solver](https://github.com/thoth-station/solver) and answers the question:

    What packages will be installed for the provided stack?

This dataset can be accessed through:
- [Thoth datasets GitHub repository](https://github.com/thoth-station/datasets/tree/master/notebooks/thoth-solver-dataset)
- [Kaggle](https://www.kaggle.com/thothstation/thoth-solver-dataset-v10)


## Goal 

The ultimate goal is to provide useful, easily accessible datasets that data scientists can use to train machine learning models.

## How to use the Data

To use the provided data:
- cite the Thoth Team as the source;
- accept that you are solely responsible for how you use the data; and
- do not sell this data to anyone, it is free!

## Import packages

In [2]:
from thoth.report_processing.components.solver import Solver
import pandas as pd

## Access the data

In [3]:
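from pathlib import Path
# The dataset directory is expected to be in the current working directory
current_path = Path.cwd()
# Aggregate the local solver documents into a dictionary keyed by document ID
solver_reports = Solver.aggregate_solver_results(repo_path=current_path.joinpath('thoth-solver-dataset-v2.0/solver'), is_local=True)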
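`Solver.aggregate_solver_results` takes care of loading the documents. If `thoth-report-processing` is not available, a rough stand-in can be written with the standard library. This is a sketch under assumptions, not how the library works internally: it assumes that each solver document is a standalone JSON file somewhere below the `solver` directory and that its ID is stored under `metadata.document_id`, as in the example report in this notebook. `load_solver_documents` is a hypothetical helper name.

```python
import json
from pathlib import Path


def load_solver_documents(solver_dir):
    """Hypothetical helper: map document_id -> parsed solver document.

    Assumption: every solver document is a standalone JSON file (any file
    name) somewhere below ``solver_dir``. Files that are not JSON are skipped.
    """
    documents = {}
    for path in sorted(Path(solver_dir).rglob("*")):
        if not path.is_file():
            continue
        try:
            document = json.loads(path.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, UnicodeDecodeError):
            continue  # not a JSON document, ignore it
        if not isinstance(document, dict):
            continue
        metadata = document.get("metadata")
        if isinstance(metadata, dict) and "document_id" in metadata:
            document_id = metadata["document_id"]
        else:
            document_id = path.name  # fall back to the file name
        documents[document_id] = document
    return documents
```

With the directory layout used above, this would be called as `load_solver_documents(Path.cwd() / 'thoth-solver-dataset-v2.0' / 'solver')`.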

## Access one solver report

Each report is created for a specific package, which is solved using a particular solver.

In this context, a **solver** such as solver-fedora-34-py-39 is named after the environment in which the **specified Python package** is installed:
- the operating system used (e.g. Fedora 34);
- the Python interpreter installed (e.g. Python 3.9).
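The naming convention can be turned into structured fields with a regular expression. The pattern below is an assumption inferred from the names that appear in this notebook (`solver-rhel-8-py38-...`, `solver-fedora-31-py37`, `solver-fedora-34-py-39`), not an official specification, and `parse_solver_name` is a hypothetical helper. It ignores whatever follows the Python part of a document ID.

```python
import re

# Assumed pattern: solver-<os name>-<os version>-py[-]<python major><python minor>...
SOLVER_NAME = re.compile(
    r"^solver-(?P<os_name>[a-z]+)-(?P<os_version>\d+)-py-?(?P<py_major>\d)(?P<py_minor>\d+)"
)


def parse_solver_name(name):
    """Split a solver name (or solver document ID) into OS and Python parts."""
    match = SOLVER_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized solver name: {name!r}")
    return {
        "os_name": match["os_name"],
        "os_version": match["os_version"],
        "python_version": f"{match['py_major']}.{match['py_minor']}",
    }


print(parse_solver_name("solver-rhel-8-py38-210712140154-9e9eab93c147ecab"))
```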

In [4]:
solver_report = solver_reports['solver-rhel-8-py38-210712140154-9e9eab93c147ecab']


Every solver run result consists of:
- **metadata**, information about the dependency solver run itself;
- **result**, the actual inputs and outputs of the solver.

In [5]:
solver_report

{'metadata': {'analyzer': 'thoth-solver',
  'analyzer_version': '1.10.1',
  'arguments': {'cli.py': {'verbose': False},
   'python': {'exclude_packages': None,
    'index': 'https://pypi.org/simple',
    'limited_output': False,
    'no_pretty': False,
    'no_transitive': True,
    'output': '/mnt/workdir/solver-rhel-8-py38-210712140154-9e9eab93c147ecab',
    'requirements': 'boto3===1.12.27',
    'virtualenv': '/opt/app-root/src/solver-venv'}},
  'datetime': '2021-07-12T15:43:58.548147',
  'distribution': {'codename': 'Ootpa',
   'id': 'rhel',
   'like': 'fedora',
   'version': '8.3',
   'version_parts': {'build_number': '', 'major': '8', 'minor': '3'}},
  'document_id': 'solver-rhel-8-py38-210712140154-9e9eab93c147ecab',
  'duration': 47,
  'hostname': 'solver-rhel-8-py38-210712140154-9e9eab93c147ecab-462235803',
  'os_release': {'id': 'rhel',
   'name': 'Red Hat Enterprise Linux',
   'platform_id': 'platform:el8',
   'redhat_bugzilla_product': 'Red Hat Enterprise Linux 8',
   'redh

## Metadata

The solver report metadata contains the following information:
- **analyzer**, name of the analyzer;
- **analyzer_version**, analyzer version;
- **arguments**, arguments passed to the analyzer:
    - **python**, inputs specific to the package to be analyzed (i.e. solved, in this case), such as `requirements` and `index`;
    - **cli.py**, command-line options (e.g. `verbose`);
- **datetime**, when the solver report was created;
- **distribution**, operating system distribution info;
- **document_id**, unique ID of the solver report, which includes the solver used (e.g. solver-fedora-31-py37);
- **duration**, duration of the solver run for the given Python package;
- **hostname**, name of the container in which the solver ran;
- **os_release**, OS release info;
- **python**, Python interpreter info;
- **thoth_deployment_name**, name of the Thoth deployment (e.g. `ocp4-stage`);
- **timestamp**, Unix timestamp (in seconds) of the report creation time.
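In the example report, **timestamp** (`1626104638`) and **datetime** (`2021-07-12T15:43:58.548147`) describe the same moment: read as seconds since the Unix epoch in UTC, the timestamp matches the datetime to the second. That `datetime` is always recorded in UTC is an inference from this one example, not something the dataset states.

```python
from datetime import datetime, timezone

# Values copied from the example report's metadata
timestamp = 1626104638
report_datetime = "2021-07-12T15:43:58.548147"

# Interpret the timestamp as seconds since the Unix epoch, in UTC
from_timestamp = datetime.fromtimestamp(timestamp, tz=timezone.utc)
# Parse the ISO 8601 string (it carries no time zone information)
from_datetime = datetime.fromisoformat(report_datetime)

print(from_timestamp.isoformat())  # 2021-07-12T15:43:58+00:00
# Assumption: datetime is UTC; then both fields agree once microseconds are dropped
print(from_timestamp.replace(tzinfo=None) == from_datetime.replace(microsecond=0))  # True
```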


In [8]:
pd.DataFrame([solver_report["metadata"]])

| | analyzer | analyzer_version | arguments | datetime | distribution | document_id | duration | hostname | os_release | python | thoth_deployment_name | timestamp |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | thoth-solver | 1.10.1 | {'cli.py': {'verbose': False}, 'python': {'exc... | 2021-07-12T15:43:58.548147 | {'codename': 'Ootpa', 'id': 'rhel', 'like': 'f... | solver-rhel-8-py38-210712140154-9e9eab93c147ecab | 47 | solver-rhel-8-py38-210712140154-9e9eab93c147ec... | {'id': 'rhel', 'name': 'Red Hat Enterprise Lin... | {'api_version': 1013, 'implementation_name': '... | ocp4-stage | 1626104638 |


In [30]:
pd.set_option('display.max_colwidth', None)

In [31]:
pd.DataFrame([solver_report["metadata"]])['arguments']

0    {'cli.py': {'verbose': False}, 'python': {'exclude_packages': None, 'index': 'https://pypi.org/simple', 'limited_output': False, 'no_pretty': False, 'no_transitive': True, 'output': '/mnt/workdir/solver-rhel-8-py38-210712140154-9e9eab93c147ecab', 'requirements': 'boto3===1.12.27', 'virtualenv': '/opt/app-root/src/solver-venv'}}
Name: arguments, dtype: object
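Instead of widening `display.max_colwidth`, the nested `arguments` (and `distribution`, `os_release`) dictionaries can be flattened into separate columns with `pandas.json_normalize`. The sketch below uses a trimmed, hand-copied subset of the example metadata; on the real data you would pass `solver_report["metadata"]` instead.

```python
import pandas as pd

# Trimmed copy of the example report's metadata (only a few fields kept)
metadata = {
    "analyzer": "thoth-solver",
    "analyzer_version": "1.10.1",
    "arguments": {
        "cli.py": {"verbose": False},
        "python": {
            "index": "https://pypi.org/simple",
            "no_transitive": True,
            "requirements": "boto3===1.12.27",
        },
    },
    "distribution": {"id": "rhel", "version": "8.3"},
}

# Nested dictionaries become dotted column names, e.g. "arguments.python.requirements"
flat = pd.json_normalize(metadata)
print(flat[["arguments.python.requirements", "distribution.version"]])
```

Applied to the full metadata, this yields one column per leaf value, which makes filtering on, for example, `arguments.python.requirements` straightforward.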

In [10]:
solver_subset_metadata = Solver.extract_data_from_solver_metadata(solver_report["metadata"])
pd.DataFrame([solver_subset_metadata])

| | document_id | datetime | requirements | solver | os_name | os_version | python_interpreter | analyzer_version |
|---|---|---|---|---|---|---|---|---|
| 0 | solver-rhel-8-py38-210712140154-9e9eab93c147ecab | 2021-07-12T15:43:58.548147 | boto3===1.12.27 | red hat enterprise linux-83-py38 | red hat enterprise linux | 83 | 3.8 | 1.10.1 |


## Access all available solver reports

In [11]:
solver_reports_metadata = []
for solver_document in solver_reports:
    solver_reports_metadata.append(
        Solver.extract_data_from_solver_metadata(solver_reports[solver_document]["metadata"])
    )

solver_reports_metadata_df = pd.DataFrame(solver_reports_metadata)

solver_reports_metadata_df.head()

| | document_id | datetime | requirements | solver | os_name | os_version | python_interpreter | analyzer_version |
|---|---|---|---|---|---|---|---|---|
| 0 | solver-rhel-8-py38-210712140154-9e9eab93c147ecab | 2021-07-12T15:43:58.548147 | boto3===1.12.27 | red hat enterprise linux-83-py38 | red hat enterprise linux | 83 | 3.8 | 1.10.1 |
| 1 | solver-rhel-8-py38-210712234008-1e13cb9a0ac76e9f | 2021-07-12T23:54:36.639212 | pip===6.0.6 | red hat enterprise linux-83-py38 | red hat enterprise linux | 83 | 3.8 | 1.10.1 |
| 2 | solver-rhel-8-py38-210713042150-50168537c03f93c5 | 2021-07-13T05:09:19.636851 | plotly===4.0.0a6 | red hat enterprise linux-83-py38 | red hat enterprise linux | 83 | 3.8 | 1.10.1 |
| 3 | solver-rhel-8-py38-210713022221-360476b82475b05e | 2021-07-13T03:08:51.411256 | jupyterlab===3.1.0a6 | red hat enterprise linux-83-py38 | red hat enterprise linux | 83 | 3.8 | 1.10.1 |
| 4 | solver-rhel-8-py38-210712140218-62d570f631918924 | 2021-07-12T17:19:58.964555 | boto3===1.16.18 | red hat enterprise linux-83-py38 | red hat enterprise linux | 83 | 3.8 | 1.10.1 |
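The `requirements` column holds the requested package and its pinned version in one string. For analysis it is often handier to split it into separate name and version columns. The sketch below uses a few values copied from the table above; on the full data the same line applies to `solver_reports_metadata_df`. It assumes every requirement uses the `===` operator, which holds for the rows shown here.

```python
import pandas as pd

# Requirements as they appear in the "requirements" column above
df = pd.DataFrame(
    {"requirements": ["boto3===1.12.27", "pip===6.0.6", "plotly===4.0.0a6", "boto3===1.16.18"]}
)

# Assumption: each requirement pins one version with "===", so split once on it
df[["package_name", "package_version"]] = df["requirements"].str.split("===", n=1, expand=True)

# Number of reports per requested package
print(df["package_name"].value_counts())
```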


## Solver report result

The report result contains the following information:
- **environment**, information about the environment in which the package was solved;
- **environment_packages**, packages already installed in the environment;
- **errors**, one entry per error if the installation of a package was not successful:
    - **details**:
        - command,
        - message,
        - return_code,
        - stderr,
        - stdout,
        - timeout;
    - **index_url**, the index from which the package was downloaded;
    - **package_name**;
    - **package_version**;
    - **is_provided_package**, flag for storing package;
    - **is_provided_package_version**, flag for storing package;
    - **type**, error type;
- **platform**, platform description, e.g. `linux-x86_64` (introduced in this version);
- **tree**, all the packages installed in the dependency tree and information about them:
    - **dependencies**;
    - **metadata** of the package as taken from importlib_metadata;
    - **index_url**, the index from which the package was downloaded;
    - **package_name**;
    - **package_version**;
    - **sha256**;
    - **packages** called list (introduced in this version);
- **unparsed**, packages in the tree that could not be parsed, if any;
- **unresolved**, packages in the tree that could not be resolved, if any.
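The `tree` is easiest to analyze as a flat list of edges. The sketch below walks a hand-made, heavily trimmed tree entry whose field names follow the example report (`package_name`, `package_version`, `dependencies`, `resolved_versions`); `tree_to_edges` is a hypothetical helper. Treating `marker_evaluation_result` as "this dependency applies in the solver environment" is an assumption based on the field name.

```python
# Hand-made, heavily trimmed tree entry modelled on the example report;
# real entries carry more fields (importlib_metadata, sha256, extras, markers, ...).
tree = [
    {
        "package_name": "boto3",
        "package_version": "1.12.27",
        "index_url": "https://pypi.org/simple",
        "dependencies": [
            {
                "package_name": "botocore",
                "marker_evaluation_result": True,
                "resolved_versions": [
                    {"index": "https://pypi.org/simple", "versions": ["1.15.27", "1.15.28"]},
                ],
            },
        ],
    },
]


def tree_to_edges(tree):
    """Yield (package, version, dependency, candidate dependency version) tuples."""
    for node in tree:
        for dependency in node.get("dependencies", []):
            # Assumption: False means the dependency's marker does not apply here
            if dependency.get("marker_evaluation_result") is False:
                continue
            for resolved in dependency.get("resolved_versions", []):
                for version in resolved.get("versions", []):
                    yield (node["package_name"], node["package_version"],
                           dependency["package_name"], version)


edges = list(tree_to_edges(tree))
print(edges)
```

The resulting tuples can be loaded directly into `pd.DataFrame(edges, columns=[...])` for graph or frequency analysis.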


In [12]:
pd.DataFrame([solver_report["result"]])

| | environment | environment_packages | errors | platform | tree | unparsed | unresolved |
|---|---|---|---|---|---|---|---|
| 0 | {'implementation_name': 'cpython', 'implementa... | [{'package_name': 'pipdeptree', 'package_versi... | [] | linux-x86_64 | [{'dependencies': [{'extra': [], 'extras': [],... | [] | [] |


In [35]:
pd.DataFrame([solver_report["result"]["environment"]])

| | implementation_name | implementation_version | os_name | platform_machine | platform_python_implementation | platform_release | platform_system | platform_version | python_full_version | python_version | sys_platform |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | cpython | 3.8.3 | posix | x86_64 | CPython | 4.18.0-193.41.1.el8_2.x86_64 | Linux | #1 SMP Wed Jan 13 11:33:33 EST 2021 | 3.8.3 | 3.8 | linux |
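The keys of `environment` match the PEP 508 environment marker variables (all of them except `extra`). That makes the recorded environment usable for evaluating markers such as those attached to dependencies in the `tree`. The sketch below uses the third-party `packaging` library, which this notebook does not otherwise use; the two marker strings are illustrative examples, not taken from the dataset.

```python
from packaging.markers import Marker

# Environment copied from the example report's result["environment"]
environment = {
    "implementation_name": "cpython",
    "implementation_version": "3.8.3",
    "os_name": "posix",
    "platform_machine": "x86_64",
    "platform_python_implementation": "CPython",
    "platform_release": "4.18.0-193.41.1.el8_2.x86_64",
    "platform_system": "Linux",
    "platform_version": "#1 SMP Wed Jan 13 11:33:33 EST 2021",
    "python_full_version": "3.8.3",
    "python_version": "3.8",
    "sys_platform": "linux",
}

# Evaluate PEP 508 markers against the recorded solver environment
# instead of the environment running this notebook
for marker in ['python_version < "3.8"', 'sys_platform == "linux" and platform_machine == "x86_64"']:
    print(marker, "->", Marker(marker).evaluate(environment))
```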


Look into the environment packages of a particular solver report:

In [14]:
pd.set_option('display.max_colwidth', None)

In [36]:
env_packs = pd.DataFrame([solver_report["result"]["environment_packages"]])

In [37]:
print(env_packs)

                                                            0
0  {'package_name': 'pipdeptree', 'package_version': '2.0.0'}
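Wrapping `environment_packages` in an extra list, as done above, produces a single row whose cells hold whole dictionaries. Passing the list of records directly yields one row per installed package with `package_name` and `package_version` as columns. The sketch uses the single record from the example report; on the real data this is `pd.DataFrame(solver_report["result"]["environment_packages"])`.

```python
import pandas as pd

# The environment_packages list from the example report
environment_packages = [{"package_name": "pipdeptree", "package_version": "2.0.0"}]

# A list of dicts maps to one row per dict and one column per key
env_packs = pd.DataFrame(environment_packages)
print(env_packs)
```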


## Consider all solver reports

In [17]:
solver_reports_extracted_data = []
solver_errors = []
for solver_document, report in solver_reports.items():
    # Start from the flattened metadata (document_id, requirements, solver, ...)
    solver_report_extracted_data = Solver.extract_data_from_solver_metadata(report["metadata"])
    # Copy every result field (environment, errors, tree, ...) into the same record
    for k, v in report["result"].items():
        solver_report_extracted_data[k] = v
        if k == "errors" and v:
            # Collect the errors of all reports into one flat list
            solver_errors.extend(Solver.extract_errors_from_solver_result(v))
    # Installed packages extracted from the dependency tree
    solver_report_extracted_data["packages"] = Solver.extract_tree_from_solver_result(report["result"])
    solver_reports_extracted_data.append(solver_report_extracted_data)

In [18]:
solver_report["result"]

{'environment': {'implementation_name': 'cpython',
  'implementation_version': '3.8.3',
  'os_name': 'posix',
  'platform_machine': 'x86_64',
  'platform_python_implementation': 'CPython',
  'platform_release': '4.18.0-193.41.1.el8_2.x86_64',
  'platform_system': 'Linux',
  'platform_version': '#1 SMP Wed Jan 13 11:33:33 EST 2021',
  'python_full_version': '3.8.3',
  'python_version': '3.8',
  'sys_platform': 'linux'},
 'environment_packages': [{'package_name': 'pipdeptree',
   'package_version': '2.0.0'}],
 'errors': [],
 'platform': 'linux-x86_64',
 'tree': [{'dependencies': [{'extra': [],
     'extras': [],
     'marker': None,
     'marker_evaluated': None,
     'marker_evaluation_error': None,
     'marker_evaluation_result': True,
     'normalized_package_name': 'botocore',
     'package_name': 'botocore',
     'resolved_versions': [{'index': 'https://pypi.org/simple',
       'versions': ['1.15.27',
        '1.15.28',
        '1.15.29',
        '1.15.30',
        '1.15.31',
     

In [19]:
pd.set_option('display.max_colwidth', 50)
solver_reports_metadata_df = pd.DataFrame(solver_reports_extracted_data)
solver_reports_metadata_df.head(10)

| | document_id | datetime | requirements | solver | os_name | os_version | python_interpreter | analyzer_version | environment | environment_packages | errors | platform | tree | unparsed | unresolved | packages |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | solver-rhel-8-py38-210712140154-9e9eab93c147ecab | 2021-07-12T15:43:58.548147 | boto3===1.12.27 | red hat enterprise linux-83-py38 | red hat enterprise linux | 83 | 3.8 | 1.10.1 | {'implementation_name': 'cpython', 'implementa... | [{'package_name': 'pipdeptree', 'package_versi... | [] | linux-x86_64 | [{'dependencies': [{'extra': [], 'extras': [],... | [] | [] | [{'package_name': 'boto3', 'package_version': ... |
| 1 | solver-rhel-8-py38-210712234008-1e13cb9a0ac76e9f | 2021-07-12T23:54:36.639212 | pip===6.0.6 | red hat enterprise linux-83-py38 | red hat enterprise linux | 83 | 3.8 | 1.10.1 | {'implementation_name': 'cpython', 'implementa... | [{'package_name': 'pipdeptree', 'package_versi... | [] | linux-x86_64 | [{'dependencies': [{'extra': ['testing'], 'ext... | [] | [] | [{'package_name': 'pip', 'package_version': '6... |
| 2 | solver-rhel-8-py38-210713042150-50168537c03f93c5 | 2021-07-13T05:09:19.636851 | plotly===4.0.0a6 | red hat enterprise linux-83-py38 | red hat enterprise linux | 83 | 3.8 | 1.10.1 | {'implementation_name': 'cpython', 'implementa... | [{'package_name': 'pipdeptree', 'package_versi... | [] | linux-x86_64 | [{'dependencies': [{'extra': [], 'extras': [],... | [] | [] | [{'package_name': 'plotly', 'package_version':... |
| 3 | solver-rhel-8-py38-210713022221-360476b82475b05e | 2021-07-13T03:08:51.411256 | jupyterlab===3.1.0a6 | red hat enterprise linux-83-py38 | red hat enterprise linux | 83 | 3.8 | 1.10.1 | {'implementation_name': 'cpython', 'implementa... | [{'package_name': 'pipdeptree', 'package_versi... | [] | linux-x86_64 | [{'dependencies': [{'extra': [], 'extras': [],... | [] | [] | [{'package_name': 'jupyterlab', 'package_versi... |
| 4 | solver-rhel-8-py38-210712140218-62d570f631918924 | 2021-07-12T17:19:58.964555 | boto3===1.16.18 | red hat enterprise linux-83-py38 | red hat enterprise linux | 83 | 3.8 | 1.10.1 | {'implementation_name': 'cpython', 'implementa... | [{'package_name': 'pipdeptree', 'package_versi... | [] | linux-x86_64 | [{'dependencies': [{'extra': [], 'extras': [],... | [] | [] | [{'package_name': 'boto3', 'package_version': ... |
| 5 | solver-rhel-8-py38-210713022109-f0494aca092d6244 | 2021-07-13T02:27:00.552040 | jupyter-tensorboard===0.1.1.dev0 | red hat enterprise linux-83-py38 | red hat enterprise linux | 83 | 3.8 | 1.10.1 | {'implementation_name': 'cpython', 'implementa... | [{'package_name': 'pipdeptree', 'package_versi... | [] | linux-x86_64 | [{'dependencies': [{'extra': [], 'extras': [],... | [] | [] | [{'package_name': 'jupyter-tensorboard', 'pack... |
| 6 | solver-rhel-8-py38-210713022239-8d67f375f7d48482 | 2021-07-13T03:17:50.681585 | jupyterlab-s3-browser===0.8.0.dev6 | red hat enterprise linux-83-py38 | red hat enterprise linux | 83 | 3.8 | 1.10.1 | {'implementation_name': 'cpython', 'implementa... | [{'package_name': 'pipdeptree', 'package_versi... | [] | linux-x86_64 | [{'dependencies': [{'extra': [], 'extras': [],... | [] | [] | [{'package_name': 'jupyterlab-s3-browser', 'pa... |
| 7 | solver-rhel-8-py38-210712140122-314670eb871374f4 | 2021-07-12T14:14:33.008896 | setuptools===10.1 | red hat enterprise linux-83-py38 | red hat enterprise linux | 83 | 3.8 | 1.10.1 | {'implementation_name': 'cpython', 'implementa... | [{'package_name': 'pipdeptree', 'package_versi... | [{'details': {'message': 'Failed to successful... | linux-x86_64 | [] | [] | [] | [] |
| 8 | solver-rhel-8-py38-210713003540-974d0e8c0a0de8e1 | 2021-07-13T00:57:01.618410 | boto3===1.9.1 | red hat enterprise linux-83-py38 | red hat enterprise linux | 83 | 3.8 | 1.10.1 | {'implementation_name': 'cpython', 'implementa... | [{'package_name': 'pipdeptree', 'package_versi... | [] | linux-x86_64 | [{'dependencies': [{'extra': [], 'extras': [],... | [] | [] | [{'package_name': 'boto3', 'package_version': ... |
| 9 | solver-rhel-8-py38-210713022123-a87f5f985d66aff7 | 2021-07-13T02:33:39.467249 | jupyterhub===1.0.0b2 | red hat enterprise linux-83-py38 | red hat enterprise linux | 83 | 3.8 | 1.10.1 | {'implementation_name': 'cpython', 'implementa... | [{'package_name': 'pipdeptree', 'package_versi... | [] | linux-x86_64 | [{'dependencies': [{'extra': [], 'extras': [],... | [] | [] | [{'package_name': 'jupyterhub', 'package_versi... |
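In the table above, the report for `setuptools===10.1` has a non-empty `errors` list and empty `tree` and `packages`. A quick way to see how often that happens across all reports is to flag each row by whether these lists are empty. The sketch works on a toy DataFrame with placeholder values (only the columns used here); on the real data use `solver_reports_metadata_df`.

```python
import pandas as pd

# Toy rows in the shape of the table above; the error message is a placeholder
df = pd.DataFrame(
    {
        "requirements": ["boto3===1.12.27", "setuptools===10.1", "pip===6.0.6"],
        "errors": [[], [{"details": {"message": "placeholder"}}], []],
        "packages": [[{"package_name": "boto3"}], [], [{"package_name": "pip"}]],
    }
)

# Empty lists are falsy, so bool() marks rows with at least one error / installed package
df["has_errors"] = df["errors"].map(bool)
df["installed_anything"] = df["packages"].map(bool)

# Cross-tabulate the two flags to see whether failures and empty package lists coincide
print(pd.crosstab(df["has_errors"], df["installed_anything"]))
```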


## Packages under different names in import

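To find packages in the ecosystem that provide modules under a name different from the package name itself, we compare the data in two columns:
- `requirements`, the package (and version) requested from the solver;
- `packages`, the packages extracted from the resolved dependency tree, whose first entry is compared with the requested name.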

In [40]:
solver_reports_metadata_df.loc[212]['requirements']

'jupyter_kernel_gateway===2.0.1'

In [39]:
solver_reports_metadata_df.loc[212]['packages']

[{'package_name': 'jupyter-kernel-gateway',
  'package_version': '2.0.1',
  'index_url': 'https://pypi.org/simple',
  'importlib_metadata': {'Author': 'Jupyter Development Team',
   'Author-email': 'jupyter@googlegroups.com',
   'Classifier': ['Intended Audience :: Developers',
    'Intended Audience :: System Administrators',
    'Intended Audience :: Science/Research',
    'License :: OSI Approved :: BSD License',
    'Programming Language :: Python',
    'Programming Language :: Python :: 2',
    'Programming Language :: Python :: 3'],
   'Home-page': 'http://github.com/jupyter-incubator/kernel_gateway',
   'Keywords': 'Interactive,Interpreter,Kernel,Web,Cloud',
   'License': 'BSD',
   'Metadata-Version': '2.0',
   'Name': 'jupyter-kernel-gateway',
   'Platform': ['Linux', 'Mac OS X', 'Windows'],
   'Requires-Dist': ['jupyter-client (>=4.2.0)',
    'jupyter-core (>=4.0)',
    'notebook (<6.0,>=5.0.0)',
    'requests (<3.0,>=2.7)',
    'tornado (>=4.2.0)',
    'traitlets (>=4.2.0)'],

## Check all the available solver reports

In [26]:
nonmatching_packages = []
empty_packages = []
for i, row in solver_reports_metadata_df.iterrows():
    # Requested package name, e.g. 'boto3===1.12.27' -> 'boto3'
    package_name_reqs_i = row['requirements'].split('==')[0]
    # Name of the first package in the resolved tree, or '' if nothing was installed
    package_name_i = row['packages'][0]['package_name'] if len(row['packages']) > 0 else ''
    if package_name_i != package_name_reqs_i:
        if package_name_i != '':
            nonmatching_packages.append([package_name_reqs_i, i, package_name_i])
            print(f'{package_name_reqs_i} != {package_name_i}')
        else:
            empty_packages.append([package_name_reqs_i, i])
            print(f'{package_name_reqs_i} and {package_name_i}')
print(f'Number of packages that provide modules under a different name than the package name itself = {len(nonmatching_packages)}')
print(f'Number of reports with an empty package list = {len(empty_packages)}')

setuptools and 
supervisor and 
sqlalchemy != SQLAlchemy
pandas and 
numpy and 
cython != Cython
tensorflow and 
scipy and 
sqlalchemy != SQLAlchemy
setuptools and 
tensorflow-gpu and 
sqlalchemy != SQLAlchemy
tensorflow and 
statsmodels and 
setuptools and 
scikit-image and 
sqlalchemy != SQLAlchemy
numpy and 
dask and 
statsmodels and 
numpy and 
torchvision and 
cython and 
cython and 
scipy and 
numpy and 
tensorflow and 
cython and 
sqlalchemy != SQLAlchemy
sqlalchemy and 
tensorflow and 
setuptools and 
sqlalchemy != SQLAlchemy
scikit-image and 
tensorflow and 
setuptools and 
setuptools and 
matplotlib and 
torch and 
dask and 
sqlalchemy != SQLAlchemy
cython and 
setuptools and 
cython != Cython
setuptools and 
tensorflow and 
sqlalchemy != SQLAlchemy
pyarrow and 
plotly and 
sqlalchemy != SQLAlchemy
cython and 
cython != Cython
torch and 
setuptools and 
scikit-learn and 
jupyter_kernel_gateway != jupyter-kernel-gateway
sqlalchemy and 
setuptools and 
sqlalchemy != SQLAlchemy
setuptools and 
setuptools and 
sqlalchemy != SQLAlchemy
setuptools and 
jupyter_kernel_gateway != jupyter-kernel-gateway
cython and 
scikit-learn and 
pandas and 
setuptools and 
plotly and 
sqlalchemy != SQLAlchemy
pandas and 
pandas and 
scikit-learn and 
setuptools and 
sqlalchemy != SQLAlchemy
scikit-learn and 
statsmodels and 
scikit-learn and 
cython and 
sqlalchemy != SQLAlchemy
numpy and 
numpy and 
pandas and 
pyarrow and 
setuptools and 
scikit-learn and 
setuptools and 
setuptools and 
setuptools and 
scikit-learn and 
seaborn and 
numpy and 
setuptools and 
cython != Cython
tensorflow-gpu and 
h5py and 
setuptools and 
setuptools and 
cython and 
pyarrow and 
sqlalchemy != SQLAlchemy
sqlalchemy and 
sqlalchemy != SQLAlchemy
tensorflow-gpu and 
sqlalchemy != SQLAlchemy
pyarrow and 
tensorflow and 
scipy and 
setuptools and 
cython != Cython
pandas and 
pandas and 
numpy and 
ipywidgets and 
setuptools and 
matplotlib and 
setuptools and 
sqlalchemy and 
jupyter_kernel_gatew

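Main differences:
- letter case (e.g. `sqlalchemy` vs `SQLAlchemy`, `cython` vs `Cython`);
- `_` replaced with `-` (e.g. `jupyter_kernel_gateway` vs `jupyter-kernel-gateway`);
- an empty `packages` list, i.e. nothing was installed, typically because the installation failed.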
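The case and separator differences are not real naming mismatches: PEP 503 specifies that package names are compared case-insensitively and that runs of `-`, `_` and `.` are equivalent. Normalizing both names before comparing would therefore remove the mismatches listed here, leaving only the empty `packages` entries. A minimal sketch using the regular expression given in PEP 503 (`canonicalize` is a hypothetical helper name; `packaging.utils.canonicalize_name` implements the same normalization):

```python
import re


def canonicalize(name):
    """Normalize a package name as described in PEP 503."""
    # Runs of "-", "_" and "." are equivalent, and comparison is case-insensitive
    return re.sub(r"[-_.]+", "-", name).lower()


pairs = [
    ("sqlalchemy", "SQLAlchemy"),
    ("cython", "Cython"),
    ("jupyter_kernel_gateway", "jupyter-kernel-gateway"),
]
for requested, installed in pairs:
    print(requested, installed, canonicalize(requested) == canonicalize(installed))
```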

In [27]:
nonmatching_packages

[['sqlalchemy', 19, 'SQLAlchemy'],
 ['cython', 29, 'Cython'],
 ['sqlalchemy', 44, 'SQLAlchemy'],
 ['sqlalchemy', 48, 'SQLAlchemy'],
 ['sqlalchemy', 63, 'SQLAlchemy'],
 ['sqlalchemy', 96, 'SQLAlchemy'],
 ['sqlalchemy', 107, 'SQLAlchemy'],
 ['sqlalchemy', 144, 'SQLAlchemy'],
 ['cython', 159, 'Cython'],
 ['sqlalchemy', 168, 'SQLAlchemy'],
 ['sqlalchemy', 173, 'SQLAlchemy'],
 ['cython', 202, 'Cython'],
 ['jupyter_kernel_gateway', 212, 'jupyter-kernel-gateway'],
 ['sqlalchemy', 222, 'SQLAlchemy'],
 ['sqlalchemy', 265, 'SQLAlchemy'],
 ['cython', 271, 'Cython'],
 ['cython', 333, 'Cython'],
 ['sqlalchemy', 335, 'SQLAlchemy'],
 ['sqlalchemy', 352, 'SQLAlchemy'],
 ['sqlalchemy', 363, 'SQLAlchemy'],
 ['sqlalchemy', 402, 'SQLAlchemy'],
 ['sqlalchemy', 441, 'SQLAlchemy'],
 ['sqlalchemy', 448, 'SQLAlchemy'],
 ['cython', 481, 'Cython'],
 ['sqlalchemy', 495, 'SQLAlchemy'],
 ['sqlalchemy', 508, 'SQLAlchemy'],
 ['jupyter_kernel_gateway', 523, 'jupyter-kernel-gateway'],
 ['sqlalchemy', 527, 'SQLAlchemy']

In [28]:
empty_packages

[['setuptools', 7],
 ['supervisor', 15],
 ['pandas', 21],
 ['numpy', 23],
 ['tensorflow', 37],
 ['scipy', 41],
 ['setuptools', 46],
 ['tensorflow-gpu', 47],
 ['tensorflow', 54],
 ['statsmodels', 55],
 ['setuptools', 56],
 ['scikit-image', 61],
 ['numpy', 70],
 ['dask', 71],
 ['statsmodels', 73],
 ['numpy', 74],
 ['torchvision', 80],
 ['cython', 86],
 ['cython', 89],
 ['scipy', 90],
 ['numpy', 91],
 ['tensorflow', 92],
 ['cython', 94],
 ['sqlalchemy', 98],
 ['tensorflow', 99],
 ['setuptools', 102],
 ['scikit-image', 111],
 ['tensorflow', 113],
 ['setuptools', 114],
 ['setuptools', 118],
 ['matplotlib', 122],
 ['torch', 130],
 ['dask', 142],
 ['cython', 149],
 ['setuptools', 158],
 ['setuptools', 162],
 ['tensorflow', 166],
 ['pyarrow', 169],
 ['plotly', 172],
 ['cython', 190],
 ['torch', 203],
 ['setuptools', 205],
 ['scikit-learn', 208],
 ['sqlalchemy', 220],
 ['setuptools', 221],
 ['statsmodels', 223],
 ['setuptools', 226],
 ['torchvision', 233],
 ['torch', 239],
 ['sqlalchemy', 244],

## Error data from solver reports

In [29]:
solver_total_errors_df = pd.DataFrame(solver_errors)

solver_total_errors_df.head()

| | package_name | package_version | index_url | type | command | message | return_code | stderr | stdout | timeout |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | setuptools | 10.1 | https://pypi.org/simple | command_error | | Failed to successfully execute function in Pyt... | | | | |
| 1 | supervisor | 3.3.0 | https://pypi.org/simple | command_error | /opt/app-root/src/solver-venv/bin/python3 -m p... | Command exited with non-zero status code (1): ... | 1.0 | ERROR: Command errored out with exit statu... | Collecting supervisor===3.3.0\n Downloading h... | 60.0 |
| 2 | pandas | 0.25.0 | https://pypi.org/simple | command_error | /opt/app-root/src/solver-venv/bin/python3 -m p... | Command exited with non-zero status code (1): ... | 1.0 | ERROR: Command errored out with exit statu... | Collecting pandas===0.25.0\n Downloading http... | 60.0 |
| 3 | numpy | 1.11.1 | https://pypi.org/simple | command_error | /opt/app-root/src/solver-venv/bin/python3 -m p... | Command exited with non-zero status code (1): ... | 1.0 | ERROR: Command errored out with exit statu... | Collecting numpy===1.11.1\n Downloading https... | 60.0 |
| 4 | ipywidgets | 5.0.0.b3 | https://pypi.org/simple | command_error | /opt/app-root/src/solver-venv/bin/python3 -m p... | Command exited with non-zero status code (1): ... | 1.0 | ERROR: Could not find a version that satisfies... | | 60.0 |
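To see which error types and packages dominate, the errors DataFrame can be aggregated with `value_counts` and `groupby`. The sketch rebuilds the five rows shown above (a subset of the columns); on the full data the same calls apply to `solver_total_errors_df`.

```python
import pandas as pd

# The five rows shown above, restricted to the columns used here
errors = pd.DataFrame(
    {
        "package_name": ["setuptools", "supervisor", "pandas", "numpy", "ipywidgets"],
        "package_version": ["10.1", "3.3.0", "0.25.0", "1.11.1", "5.0.0.b3"],
        "type": ["command_error"] * 5,
    }
)

# How many errors of each type were recorded
print(errors["type"].value_counts())
# How many distinct versions of each package failed
print(errors.groupby("package_name")["package_version"].nunique().sort_values(ascending=False))
```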
