Give context to solver errors in Solved Python Packages

Aim

Use the solver dataset provided by the Thoth Dependency Solver to derive context on why dependencies cannot be resolved, so that users can be better advised on why a package cannot be used.

Background on Thoth Dependency solver

The aim of the Thoth solver is to answer a simple question: which packages will be installed (resolved by pip or any compliant Python dependency resolver) for the provided stack?

What is a solver? In Thoth terminology, a solver such as solver-fedora-31-py37 is named after:

the operating system used (e.g. Fedora 31)
the Python interpreter installed (e.g. Python 3.7)

on which the specific Python package is going to be installed.
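
The naming scheme above can be illustrated with a small parser. This is only a sketch: the pattern solver-<os name>-<os version>-py<major><minor> is inferred from the single example solver-fedora-31-py37, and the names SolverName and parse_solver_name are hypothetical helpers, not part of Thoth.

```python
import re
from typing import NamedTuple


class SolverName(NamedTuple):
    """Parts of a solver name (hypothetical helper type)."""

    os_name: str
    os_version: str
    python_version: str


# Assumed pattern, inferred from "solver-fedora-31-py37":
# solver-<os name>-<os version>-py<major><minor>
_SOLVER_RE = re.compile(r"^solver-([a-z]+)-([0-9.]+)-py(\d)(\d+)$")


def parse_solver_name(name: str) -> SolverName:
    """Split a solver name into operating system and Python version."""
    match = _SOLVER_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized solver name: {name!r}")
    os_name, os_version, major, minor = match.groups()
    return SolverName(os_name, os_version, f"{major}.{minor}")


print(parse_solver_name("solver-fedora-31-py37"))
# SolverName(os_name='fedora', os_version='31', python_version='3.7')
```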

Thoth Solver Dataset

Thoth Solver Dataset is part of a series of datasets related to observations regarding software stacks (e.g. dependency tree, installability, performance, security, health) as part of Project Thoth.
The Thoth Solver Dataset consists of solver reports in JSON format. Each solver report is created for a specific package (e.g. a Python package from a certain index in a certain version) solved using a certain solver. The dataset is described in the notebook called Thoth Solver Dataset.

Thoth Solver Error Data

If the installation of a package was not successful, information about the error is stored for that package.

Solver error data from solver reports consists of the following information:

command: command used to install the package
message: error log
return_code
stderr
stdout
timeout
index_url: index from which the package was downloaded
package_name
package_version
type: error type
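
As a rough illustration, the fields listed above could be held in a small record type. This is a sketch under assumptions: the class name SolverError, the from_dict helper, the field types, and all sample values are made up for the example, and the real layout of a solver report may nest these fields differently.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional


@dataclass
class SolverError:
    """One error entry, using the field names listed above (types are assumed)."""

    command: Optional[str] = None
    message: Optional[str] = None
    return_code: Optional[int] = None
    stderr: Optional[str] = None
    stdout: Optional[str] = None
    timeout: Optional[Any] = None
    index_url: Optional[str] = None
    package_name: Optional[str] = None
    package_version: Optional[str] = None
    type: Optional[str] = None

    @classmethod
    def from_dict(cls, raw: Dict[str, Any]) -> "SolverError":
        """Build a record from a dict, ignoring keys that are not listed fields."""
        known = {f.name for f in fields(cls)}
        return cls(**{key: value for key, value in raw.items() if key in known})


# Made-up sample entry, for illustration only.
sample = {
    "command": "pip install example-package==1.0.0",
    "message": "installation failed",
    "return_code": 1,
    "index_url": "https://pypi.org/simple",
    "package_name": "example-package",
    "package_version": "1.0.0",
    "unrelated_key": "ignored",
}
error = SolverError.from_dict(sample)
print(error.package_name, error.package_version, error.return_code)
```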

This repo contains two notebooks:

These notebooks can be parametrized using papermill. This way, the template notebooks can be run automatically, either locally or in an Argo workflow.

1. Template notebook to pre-process the solver dataset and output a clean dataset, which is the input for clustering: notebooks/PreprocessSolverErrorData.ipynb

The purpose of this notebook is to preprocess solver data, i.e. extract error data from solver data, prepare it for clustering, clean and tokenize it, and save the clean dataset for the ClusterErrors notebook.
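
The cleaning and tokenizing step might look roughly like the following. This is a minimal sketch, not the notebook's actual code: the helper names clean_message and tokenize and the specific cleaning rules (lowercasing, dropping file paths, numbers and punctuation) are assumptions chosen for illustration.

```python
import re
from typing import List


def clean_message(message: str) -> str:
    """Normalize an error message so that similar errors look alike."""
    text = message.lower()
    # Drop file system paths, which differ between otherwise identical errors.
    text = re.sub(r"(/[\w.\-]+)+", " ", text)
    # Drop numbers and version strings such as 1.2.3.
    text = re.sub(r"\d+(\.\d+)*", " ", text)
    # Keep only letters and whitespace.
    text = re.sub(r"[^a-z\s]", " ", text)
    # Collapse repeated whitespace.
    return re.sub(r"\s+", " ", text).strip()


def tokenize(message: str) -> List[str]:
    """Split a cleaned message into whitespace-separated tokens."""
    return clean_message(message).split()


print(tokenize("ERROR: No matching distribution found for foo==1.2.3"))
# ['error', 'no', 'matching', 'distribution', 'found', 'for', 'foo']
```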

Execute via CLI:

papermill PreprocessSolverErrorData.ipynb PreprocessSolverErrorDataOutput.ipynb -p get_fresh_data False

where get_fresh_data is the parameter that decides whether to get fresh data from Ceph or use the data from a CSV file.
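
When the notebooks are run from a script, the same command line can be assembled programmatically. The sketch below only builds the argument list for the papermill CLI with its -p option; the function name papermill_command is a hypothetical helper, and actually running the command (for example with subprocess.run) requires papermill to be installed.

```python
import shlex
from typing import Dict, List


def papermill_command(
    input_nb: str, output_nb: str, parameters: Dict[str, object]
) -> List[str]:
    """Build the argument list for a papermill CLI call with -p parameters."""
    command = ["papermill", input_nb, output_nb]
    for name, value in parameters.items():
        command += ["-p", name, str(value)]
    return command


cmd = papermill_command(
    "PreprocessSolverErrorData.ipynb",
    "PreprocessSolverErrorDataOutput.ipynb",
    {"get_fresh_data": False},
)
print(shlex.join(cmd))
# papermill PreprocessSolverErrorData.ipynb PreprocessSolverErrorDataOutput.ipynb -p get_fresh_data False

# To execute it: subprocess.run(cmd, check=True)
```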

2. Template notebook to cluster errors to identify the type of errors that can appear in solver reports: notebooks/ClusterErrors.ipynb

The purpose of this notebook is to cluster solver errors in order to identify the type of errors that can appear in solver reports.
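
To give an idea of what grouping similar error messages means, here is a deliberately simple sketch. It is not the algorithm used in the notebook, which this README does not specify: it is a greedy grouping by Jaccard similarity of token sets, and the names jaccard and greedy_cluster, the threshold of 0.5, and the sample messages are all assumptions made for illustration.

```python
from typing import List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two token sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def greedy_cluster(messages: List[str], threshold: float = 0.5) -> List[List[str]]:
    """Assign each message to the first cluster whose first member is similar enough."""
    clusters: List[List[str]] = []
    representatives: List[Set[str]] = []
    for message in messages:
        tokens = set(message.lower().split())
        for index, representative in enumerate(representatives):
            if jaccard(tokens, representative) >= threshold:
                clusters[index].append(message)
                break
        else:
            # No similar cluster found: start a new one with this message.
            clusters.append([message])
            representatives.append(tokens)
    return clusters


# Made-up sample messages, for illustration only.
messages = [
    "no matching distribution found for foo",
    "no matching distribution found for bar",
    "command timed out after 300 seconds",
]
for group in greedy_cluster(messages):
    print(group)
```

With these sample messages, the first two share five of seven distinct tokens and end up in one group, while the third forms a group of its own.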

Execute via CLI:

papermill ClusterErrors.ipynb ClusterErrorsOutput.ipynb -p preprocessed_filename 'error-clean-data.csv'

where preprocessed_filename is the parameter that contains the filename of the preprocessed clean data.

Background/References

1. Thoth: https://thoth-station.ninja/
2. Thoth GitHub: https://github.com/thoth-station
3. Solver: https://github.com/thoth-station/solver
4. Thoth Solver dataset on Kaggle: https://www.kaggle.com/thothstation/thoth-solver-dataset-v10
5. Thoth Solver dataset on GitHub: https://github.com/thoth-station/datasets
