# Cluster errors to identify the type of errors that can appear in solver reports 

# Table of Contents

1. [Introduction](#Introduction)
2. [Import Packages](#Import_packages)
3. [Load the clean solver data saved by the 'PreprocessSolverErrorData' notebook](#load_clean_data)
4. [Filter data using Solver / datetime](#filter)
5. [Word to Vector Conversion using Continuous Bag of Words model (CBOW)](#word2vec)
6. [Sentence (error message) to vector conversion](#sent2vec)
7. [Clustering using DBScan](#clustering)
8. [Get cluster statistics such as "pattern", "mean_length", and "mean_similarity"](#cluster_stats)
9. [Save clustered data to Ceph](#save_to_ceph)
10. [View data from each cluster](#view_data)
    1. [Cluster No. 0: FileNotFoundError](#c0)
    2. [Cluster No. 1: UnableToExecuteGccError](#c1)
    3. [Cluster No. 3: NoMatchingDistributionFoundError](#c3)
11. [Clusters with more than one error](#clusters_with_more_than_one_error)
    1. [Cluster No. 10: ImportError, HTTPError](#c10)
    2. [Cluster No. 106: CalledProcessError, FileNotFoundError, KeyError, RuntimeError](#c106)
    3. [Cluster No. 116: ConnectionError, OSError, MaxRetryError, DistutilsError, ResponseError](#c16)
    4. [Cluster No. 7: CheckTheLogsError (needs further exploration)](#c7)

## Introduction  <a id='Introduction'></a>

The purpose of this notebook is to cluster solver errors so that we can understand why dependencies cannot be solved, and use that context to better advise users on why a package cannot be used.

#### Summary:
- Data preprocessed by the [PreprocessSolverErrorData](./PreprocessSolverErrorData.ipynb) notebook is loaded.
- Each word is converted into a vector using [Word2Vec](https://radimrehurek.com/gensim/models/word2vec.html) (Continuous Bag of Words model).
- Each error message is then converted into a vector (Sentence2vec using the Word2Vec model).
- Clustering is done using [DBScan](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html).
- Cluster statistics such as "pattern", "mean_length" and "mean_similarity" are calculated.
- An error class is defined and added to the dataframe.
- The classified error data is saved to Ceph.
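The summary does not spell out how Sentence2vec combines word vectors into one vector per error message. A common approach, shown here only as a minimal sketch under that assumption, is to average the vectors of the message's tokens. The `sentence_vector` helper and the toy 2-dimensional `toy_vectors` mapping are hypothetical stand-ins; with a trained gensim model you would pass the model's `wv` attribute, which supports the same `in` and `[]` lookups.

```python
import numpy as np


def sentence_vector(tokens, word_vectors, dim):
    """Average the embeddings of the tokens that are in the vocabulary.

    `word_vectors` can be any mapping from token to 1-D numpy array; a trained
    gensim model's `wv` attribute supports the same `in` / `[]` lookups.
    """
    known = [word_vectors[token] for token in tokens if token in word_vectors]
    if not known:
        # No token has an embedding: fall back to the zero vector.
        return np.zeros(dim)
    return np.mean(known, axis=0)


# Hypothetical 2-dimensional embeddings standing in for a trained Word2Vec model.
toy_vectors = {
    "unable": np.array([1.0, 0.0]),
    "execute": np.array([0.0, 1.0]),
    "gcc": np.array([1.0, 1.0]),
}

# "to" has no embedding here, so only three vectors are averaged.
vec = sentence_vector(["unable", "to", "execute", "gcc"], toy_vectors, dim=2)
print(vec)  # approximately [0.667 0.667]
```

Averaging keeps every message vector in the same space as the word vectors, so messages that share most of their tokens end up close together, which is what a density-based clusterer needs.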

## Import packages <a id='Import_packages'></a>

In [1]:
import pandas as pd
import multiprocessing
import pickle
import numpy as np
import difflib
import regex as re
import boto3
import os

from math import sqrt
from sklearn.cluster import DBSCAN
from sklearn.neighbors import NearestNeighbors
from gensim.models import Word2Vec
from kneed import KneeLocator
from string import punctuation    

In [2]:
pd.set_option('max_colwidth', 4000)
pd.set_option('display.max_rows', 200)

In [3]:
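# Number of CPU cores available (passed to tokens_vectorization for Word2Vec training)
cpu_number = multiprocessing.cpu_count()
# Word2Vec context window: maximum distance between the current and predicted word
w2v_window = 7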

## Load the clean solver data saved by the 'PreprocessSolverErrorData' notebook <a id='load_clean_data'></a>

In [4]:
preprocessed_filename = 'error-clean-data.csv'

In [5]:
entire_error_df = pd.read_csv(preprocessed_filename)

In [6]:
entire_error_df.head()

Unnamed: 0,index,document_id,command,package_name,package_version,solver,datetime,environment,analyzer_version,message,...,Error_info,command_info,cwd,Complete_output,ERROR,Exception,specific_error,clustering_data,tokenized_clustering_data,context_message
0,0,solver-fedora-31-py37-0022caa4,"/home/solver/venv/bin/python3 -m pip install --force-reinstall --no-cache-dir --no-deps Cython===0.23.4 --index-url ""https://pypi.org/simple"" --trusted-host pypi.org",Cython,0.23.4,solver-fedora-31-py37,2020-08-05T21:54:08.679464,"{'implementation_name': 'cpython', 'implementation_version': '3.7.7', 'os_name': 'posix', 'platform_machine': 'x86_64', 'platform_python_implementation': 'CPython', 'platform_release': '4.18.0-147.8.1.el8_1.x86_64', 'platform_system': 'Linux', 'platform_version': '#1 SMP Wed Feb 26 03:08:15 UTC 2020', 'python_full_version': '3.7.7', 'python_version': '3.7', 'sys_platform': 'linux'}",1.6.0,"Command exited with non-zero status code (1): ERROR: Command errored out with exit status 1:\n command: /home/solver/venv/bin/python3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '""'""'/tmp/pip-install-bnnj141r/Cython/setup.py'""'""'; __file__='""'""'/tmp/pip-install-bnnj141r/Cython/setup.py'""'""';f=getattr(tokenize, '""'""'open'""'""', open)(__file__);code=f.read().replace('""'""'\r\n'""'""', '""'""'\n'""'""');f.close();exec(compile(code, __file__, '""'""'exec'""'""'))' bdist_wheel -d /tmp/pip-wheel-9eme38jp\n cwd: /tmp/pip-install-bnnj141r/Cython/\n Complete output (343 lines):\n Unable to find pgen, not compiling formal grammar.\n running bdist_wheel\n running build\n running build_py\n creating build\n creating build/lib.linux-x86_64-3.7\n copying cython.py -> build/lib.linux-x86_64-3.7\n creating build/lib.linux-x86_64-3.7/Cython\n copying Cython/__init__.py -> build/lib.linux-x86_64-3.7/Cython\n copying Cython/Utils.py -> build/lib.linux-x86_64-3.7/Cython\n copying Cython/TestUtils.py -> build/lib.linux-x86_64-3.7/Cython\n copying Cython/StringIOTree.py -> build/lib.linux-x86_64-3.7/Cython\n copying Cython/Shadow.py -> build/lib.linux-x86_64-3.7/Cython\n copying Cython/Debugging.py -> build/lib.linux-x86_64-3.7/Cython\n copying Cython/Coverage.py -> build/lib.linux-x86_64-3.7/Cython\n copying 
Cython/CodeWriter.py -> build/lib.linux-x86_64-3.7/Cython\n creating build/lib.linux-x86_64-3.7/Cython/Build\n copying Cython/Build/__init__.py -> build/lib.linux-x86_64-3.7/Cython/Build\n copying Cython/Build/IpythonMagic.py -> build/lib.linux-x86_64-3.7/Cython/Build\n copying Cython/Build/Inline.py -> build/lib.linux-x86_64-3.7/Cython/Build\n copying Cython/Build/Dependencies.py -> build/lib.linux-x86_64-3.7/Cython/Build\n copying Cython/Build/Cythonize.py -> build/lib.linux-x86_64-3.7/Cython/Build\n copying Cython/Build/BuildExecutable.py -> build/lib.linux-x86_64-3.7/Cython/Build\n creating build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/__init__.py -> build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/Visitor.py -> build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/Version.py -> build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/UtilityCode.py -> build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/UtilNodes.py -> build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/TypeSlots.py -> build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/TypeInference.py -> build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/TreePath.py -> build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/TreeFragment.py -> build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/Symtab.py -> build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/StringEncoding.py -> build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/Scanning.py -> build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/PyrexTypes.py -> build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/Pipeline.py -> build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/Parsing.py -> build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/ParseTreeTransforms.py -> 
build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/Options.py -> build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/Optimize.py -> build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/Nodes.py -> build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/Naming.py -> build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/ModuleNode.py -> build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/MemoryView.py -> build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/Main.py -> build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/Lexi...",...,Command exited with non-zero status code (1): ERROR: Command errored out with exit status 1:,"['command: /home/solver/venv/bin/python3 -u -c \'import sys, setuptools, tokenize; sys.argv[0] = \'""\'""\'/tmp/pip-install-bnnj141r/Cython/setup.py\'""\'""\'; __file__=\'""\'""\'/tmp/pip-install-bnnj141r/Cython/setup.py\'""\'""\';f=getattr(tokenize, \'""\'""\'open\'""\'""\', open)(__file__);code=f.read().replace(\'""\'""\'\\r\\n\'""\'""\', \'""\'""\'\\n\'""\'""\');f.close();exec(compile(code, __file__, \'""\'""\'exec\'""\'""\'))\' bdist_wheel -d /tmp/pip-wheel-9eme38jp', 'command: /home/solver/venv/bin/python3 -u -c \'import sys, setuptools, tokenize; sys.argv[0] = \'""\'""\'/tmp/pip-install-bnnj141r/Cython/setup.py\'""\'""\'; __file__=\'""\'""\'/tmp/pip-install-bnnj141r/Cython/setup.py\'""\'""\';f=getattr(tokenize, \'""\'""\'open\'""\'""\', open)(__file__);code=f.read().replace(\'""\'""\'\\r\\n\'""\'""\', \'""\'""\'\\n\'""\'""\');f.close();exec(compile(code, __file__, \'""\'""\'exec\'""\'""\'))\' install --record /tmp/pip-record-d45ftp_t/install-record.txt --single-version-externally-managed --compile --install-headers /home/solver/venv/include/site/python3.7/Cython']",['cwd: /tmp/pip-install-bnnj141r/Cython/'],"['Complete output (343 lines):', 'Unable to find pgen, not compiling formal grammar.', 'running bdist_wheel', 
'running build', 'running build_py', 'creating build', 'creating build/lib.linux-x86_64-3.7', 'copying cython.py -> build/lib.linux-x86_64-3.7', 'creating build/lib.linux-x86_64-3.7/Cython', 'copying Cython/__init__.py -> build/lib.linux-x86_64-3.7/Cython', 'copying Cython/Utils.py -> build/lib.linux-x86_64-3.7/Cython', 'copying Cython/TestUtils.py -> build/lib.linux-x86_64-3.7/Cython', 'copying Cython/StringIOTree.py -> build/lib.linux-x86_64-3.7/Cython', 'copying Cython/Shadow.py -> build/lib.linux-x86_64-3.7/Cython', 'copying Cython/Debugging.py -> build/lib.linux-x86_64-3.7/Cython', 'copying Cython/Coverage.py -> build/lib.linux-x86_64-3.7/Cython', 'copying Cython/CodeWriter.py -> build/lib.linux-x86_64-3.7/Cython', 'creating build/lib.linux-x86_64-3.7/Cython/Build', 'copying Cython/Build/__init__.py -> build/lib.linux-x86_64-3.7/Cython/Build', 'copying Cython/Build/IpythonMagic.py -> build/lib.linux-x86_64-3.7/Cython/Build', 'copying Cython/Build/Inline.py -> build/lib.linux-x86_64-3.7/Cython/Build', 'copying Cython/Build/Dependencies.py -> build/lib.linux-x86_64-3.7/Cython/Build', 'copying Cython/Build/Cythonize.py -> build/lib.linux-x86_64-3.7/Cython/Build', 'copying Cython/Build/BuildExecutable.py -> build/lib.linux-x86_64-3.7/Cython/Build', 'creating build/lib.linux-x86_64-3.7/Cython/Compiler', 'copying Cython/Compiler/__init__.py -> build/lib.linux-x86_64-3.7/Cython/Compiler', 'copying Cython/Compiler/Visitor.py -> build/lib.linux-x86_64-3.7/Cython/Compiler', 'copying Cython/Compiler/Version.py -> build/lib.linux-x86_64-3.7/Cython/Compiler', 'copying Cython/Compiler/UtilityCode.py -> build/lib.linux-x86_64-3.7/Cython/Compiler', 'copying Cython/Compiler/UtilNodes.py -> build/lib.linux-x86_64-3.7/Cython/Compiler', 'copying Cython/Compiler/TypeSlots.py -> build/lib.linux-x86_64-3.7/Cython/Compiler', 'copying Cython/Compiler/TypeInference.py -> build/lib.linux-x86_64-3.7/Cython/Compiler', 'copying Cython/Compiler/TreePath.py -> 
build/lib.linux-x86_64-3.7/Cython/Compiler', 'copying Cython/Compiler/TreeFragment.py -> build/lib.linux-x86_64-3.7/Cython/Compiler', 'copying Cython/Compiler/Symtab.py -> build/lib.linux-x86_64-3.7/Cython/Compiler', 'copying Cython/Compiler/StringEncoding.py -> build/lib.linux-x86_64-3.7/Cython/Compiler', 'copying Cython/Compiler/Scanning.py -> build/lib.linux-x86_64-3.7/Cython/Compiler', 'copying Cython/Compiler/PyrexTypes.py -> build/lib.linux-x86_64-3.7/Cython/Compiler', 'copying Cython/Compiler/Pipeline.py -> build/lib.linux-x86_64-3.7/Cython/Compiler', 'copying Cython/Compiler/Parsing.py -> build/lib.linux-x86_64-3.7/Cython/Compiler', 'copying Cython/Compiler/ParseTreeTransforms.py -> build/lib.linux-x86_64-3.7/Cython/Compiler', 'copying Cython/Compiler/Options.py -> build/lib.linux-x86_64-3.7/Cython/Compiler', 'copying Cython/Compiler/Optimize.py -> build/lib.linux-x86_64-3.7/Cython/Compiler', 'copying Cython/Compiler/Nodes.py -> build/lib.linux-x86_64-3.7/Cython/Compiler', 'copying Cython/Compiler/Naming.py -> build/lib.linux-x86_64-3.7/Cython/Compiler', 'copying Cython/Compiler/ModuleNode.py -> build/lib.linux-x86_64-3.7/Cython/Compiler', 'copying Cython/Compiler/MemoryView.py -> build/lib.linux-x86_64-3.7/Cython/Compiler', 'copying Cython/Compiler/Main.py -> build/lib.linux-x86_64-3.7/Cython/Compiler', 'copying Cython/Compiler/Lexicon.py -> build/lib.linux-x86_64-3.7/Cython/Compiler', 'copying Cython/Compiler/Interpreter.py -> build/lib.linux-x86_64-3.7/Cython/Compiler', 'copying Cython/Compiler/Future.py -> build/lib.linux-x86_64-3.7/Cython/Compiler', 'copying Cython/Compiler/FusedNode.py -> build/lib.linux-x86_64-3.7/Cython/Compiler', 'copying Cython/Compiler/FlowControl.py -> build/lib.linux-x86_64-3.7/Cython/Compiler', 'copying Cython/Compiler/ExprNodes.py -> build/lib.linux-x86_64-3.7/Cython/Compiler', 'copying Cython/Compiler/Errors.py -> build/lib.linux-x86_64-3.7/Cyth...",['ERROR: Failed building wheel for Cython'],,"['SyntaxError: invalid 
syntax', 'SyntaxError: invalid syntax']",SyntaxError,['SyntaxError'],"['SyntaxError: invalid syntax', 'SyntaxError: invalid syntax']"
1,1,solver-fedora-31-py37-003d9de4,"/home/solver/venv/bin/python3 -m pip install --force-reinstall --no-cache-dir --no-deps word2vec===0.7.1 --index-url ""https://pypi.org/simple"" --trusted-host pypi.org",word2vec,0.7.1,solver-fedora-31-py37,2020-08-18T17:33:54.656033,"{'implementation_name': 'cpython', 'implementation_version': '3.7.7', 'os_name': 'posix', 'platform_machine': 'x86_64', 'platform_python_implementation': 'CPython', 'platform_release': '4.18.0-147.8.1.el8_1.x86_64', 'platform_system': 'Linux', 'platform_version': '#1 SMP Wed Feb 26 03:08:15 UTC 2020', 'python_full_version': '3.7.7', 'python_version': '3.7', 'sys_platform': 'linux'}",1.6.0,"Command exited with non-zero status code (1): ERROR: Command errored out with exit status 1:\n command: /home/solver/venv/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '""'""'/tmp/pip-install-zim3n1sc/word2vec/setup.py'""'""'; __file__='""'""'/tmp/pip-install-zim3n1sc/word2vec/setup.py'""'""';f=getattr(tokenize, '""'""'open'""'""', open)(__file__);code=f.read().replace('""'""'\r\n'""'""', '""'""'\n'""'""');f.close();exec(compile(code, __file__, '""'""'exec'""'""'))' egg_info --egg-base /tmp/pip-pip-egg-info-hv7fcspg\n cwd: /tmp/pip-install-zim3n1sc/word2vec/\n Complete output (6 lines):\n Traceback (most recent call last):\n File ""<string>"", line 1, in <module>\n File ""/tmp/pip-install-zim3n1sc/word2vec/setup.py"", line 29\n print ' '.join(command)\n ^\n SyntaxError: invalid syntax\n ----------------------------------------\nERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.\nWARNING: You are using pip version 20.1.1; however, version 20.2.2 is available.\nYou should consider upgrading via the '/home/solver/venv/bin/python3 -m pip install --upgrade pip' command.\n",...,Command exited with non-zero status code (1): ERROR: Command errored out with exit status 1:,"['command: /home/solver/venv/bin/python3 -c \'import sys, setuptools, 
tokenize; sys.argv[0] = \'""\'""\'/tmp/pip-install-zim3n1sc/word2vec/setup.py\'""\'""\'; __file__=\'""\'""\'/tmp/pip-install-zim3n1sc/word2vec/setup.py\'""\'""\';f=getattr(tokenize, \'""\'""\'open\'""\'""\', open)(__file__);code=f.read().replace(\'""\'""\'\\r\\n\'""\'""\', \'""\'""\'\\n\'""\'""\');f.close();exec(compile(code, __file__, \'""\'""\'exec\'""\'""\'))\' egg_info --egg-base /tmp/pip-pip-egg-info-hv7fcspg']",['cwd: /tmp/pip-install-zim3n1sc/word2vec/'],"['Complete output (6 lines):', 'Traceback (most recent call last):', 'File ""<string>"", line 1, in <module>', 'File ""/tmp/pip-install-zim3n1sc/word2vec/setup.py"", line 29', ""print ' '.join(command)"", '^', 'SyntaxError: invalid syntax']",['ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.'],,['SyntaxError: invalid syntax'],SyntaxError,['SyntaxError'],['SyntaxError: invalid syntax']
2,2,solver-fedora-31-py37-004344f0,"/home/solver/venv/bin/python3 -m pip install --force-reinstall --no-cache-dir --no-deps papermill===0.13.3 --index-url ""https://pypi.org/simple"" --trusted-host pypi.org",papermill,0.13.3,solver-fedora-31-py37,2020-08-11T04:00:49.015446,"{'implementation_name': 'cpython', 'implementation_version': '3.7.7', 'os_name': 'posix', 'platform_machine': 'x86_64', 'platform_python_implementation': 'CPython', 'platform_release': '4.18.0-147.8.1.el8_1.x86_64', 'platform_system': 'Linux', 'platform_version': '#1 SMP Wed Feb 26 03:08:15 UTC 2020', 'python_full_version': '3.7.7', 'python_version': '3.7', 'sys_platform': 'linux'}",1.6.0,"Command exited with non-zero status code (1): ERROR: Command errored out with exit status 1:\n command: /home/solver/venv/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '""'""'/tmp/pip-install-18sztwvd/papermill/setup.py'""'""'; __file__='""'""'/tmp/pip-install-18sztwvd/papermill/setup.py'""'""';f=getattr(tokenize, '""'""'open'""'""', open)(__file__);code=f.read().replace('""'""'\r\n'""'""', '""'""'\n'""'""');f.close();exec(compile(code, __file__, '""'""'exec'""'""'))' egg_info --egg-base /tmp/pip-pip-egg-info-k3uklu4_\n cwd: /tmp/pip-install-18sztwvd/papermill/\n Complete output (7 lines):\n Traceback (most recent call last):\n File ""<string>"", line 1, in <module>\n File ""/tmp/pip-install-18sztwvd/papermill/setup.py"", line 27, in <module>\n test_required = [req.strip() for req in read(test_req_path).splitlines() if req.strip()]\n File ""/tmp/pip-install-18sztwvd/papermill/setup.py"", line 20, in read\n with open(fname, 'rU' if python_2 else 'r') as fhandle:\n FileNotFoundError: [Errno 2] No such file or directory: 'requirements-dev.txt'\n ----------------------------------------\nERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.\nWARNING: You are using pip version 20.1.1; however, version 20.2.1 is available.\nYou should 
consider upgrading via the '/home/solver/venv/bin/python3 -m pip install --upgrade pip' command.\n",...,Command exited with non-zero status code (1): ERROR: Command errored out with exit status 1:,"['command: /home/solver/venv/bin/python3 -c \'import sys, setuptools, tokenize; sys.argv[0] = \'""\'""\'/tmp/pip-install-18sztwvd/papermill/setup.py\'""\'""\'; __file__=\'""\'""\'/tmp/pip-install-18sztwvd/papermill/setup.py\'""\'""\';f=getattr(tokenize, \'""\'""\'open\'""\'""\', open)(__file__);code=f.read().replace(\'""\'""\'\\r\\n\'""\'""\', \'""\'""\'\\n\'""\'""\');f.close();exec(compile(code, __file__, \'""\'""\'exec\'""\'""\'))\' egg_info --egg-base /tmp/pip-pip-egg-info-k3uklu4_']",['cwd: /tmp/pip-install-18sztwvd/papermill/'],"['Complete output (7 lines):', 'Traceback (most recent call last):', 'File ""<string>"", line 1, in <module>', 'File ""/tmp/pip-install-18sztwvd/papermill/setup.py"", line 27, in <module>', 'test_required = [req.strip() for req in read(test_req_path).splitlines() if req.strip()]', 'File ""/tmp/pip-install-18sztwvd/papermill/setup.py"", line 20, in read', ""with open(fname, 'rU' if python_2 else 'r') as fhandle:"", ""FileNotFoundError: [Errno 2] No such file or directory: 'requirements-dev.txt'""]",['ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.'],,"[""FileNotFoundError: [Errno 2] No such file or directory: 'requirements-dev.txt'""]",FileNotFoundError,['FileNotFoundError'],"[""FileNotFoundError: [Errno 2] No such file or directory: 'requirements-dev.txt'""]"
3,3,solver-fedora-31-py37-005ab6b4,"/home/solver/venv/bin/python3 -m pip install --force-reinstall --no-cache-dir --no-deps pomegranate===0.7.7 --index-url ""https://pypi.org/simple"" --trusted-host pypi.org",pomegranate,0.7.7,solver-fedora-31-py37,2020-08-01T02:36:45.424428,"{'implementation_name': 'cpython', 'implementation_version': '3.7.7', 'os_name': 'posix', 'platform_machine': 'x86_64', 'platform_python_implementation': 'CPython', 'platform_release': '4.18.0-147.8.1.el8_1.x86_64', 'platform_system': 'Linux', 'platform_version': '#1 SMP Wed Feb 26 03:08:15 UTC 2020', 'python_full_version': '3.7.7', 'python_version': '3.7', 'sys_platform': 'linux'}",1.6.0,"Command exited with non-zero status code (1): ERROR: Command errored out with exit status 1:\n command: /home/solver/venv/bin/python3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '""'""'/tmp/pip-install-e3nb_lq9/pomegranate/setup.py'""'""'; __file__='""'""'/tmp/pip-install-e3nb_lq9/pomegranate/setup.py'""'""';f=getattr(tokenize, '""'""'open'""'""', open)(__file__);code=f.read().replace('""'""'\r\n'""'""', '""'""'\n'""'""');f.close();exec(compile(code, __file__, '""'""'exec'""'""'))' bdist_wheel -d /tmp/pip-wheel-otmm0tho\n cwd: /tmp/pip-install-e3nb_lq9/pomegranate/\n Complete output (54 lines):\n running bdist_wheel\n running build\n running build_py\n creating build\n creating build/lib.linux-x86_64-3.7\n creating build/lib.linux-x86_64-3.7/pomegranate\n copying pomegranate/__init__.py -> build/lib.linux-x86_64-3.7/pomegranate\n running egg_info\n writing pomegranate.egg-info/PKG-INFO\n writing dependency_links to pomegranate.egg-info/dependency_links.txt\n writing requirements to pomegranate.egg-info/requires.txt\n writing top-level names to pomegranate.egg-info/top_level.txt\n reading manifest file 'pomegranate.egg-info/SOURCES.txt'\n reading manifest template 'MANIFEST.in'\n writing manifest file 'pomegranate.egg-info/SOURCES.txt'\n copying pomegranate/BayesClassifier.c -> 
build/lib.linux-x86_64-3.7/pomegranate\n copying pomegranate/BayesClassifier.pyx -> build/lib.linux-x86_64-3.7/pomegranate\n copying pomegranate/BayesianNetwork.c -> build/lib.linux-x86_64-3.7/pomegranate\n copying pomegranate/BayesianNetwork.pyx -> build/lib.linux-x86_64-3.7/pomegranate\n copying pomegranate/FactorGraph.c -> build/lib.linux-x86_64-3.7/pomegranate\n copying pomegranate/FactorGraph.pyx -> build/lib.linux-x86_64-3.7/pomegranate\n copying pomegranate/MarkovChain.c -> build/lib.linux-x86_64-3.7/pomegranate\n copying pomegranate/MarkovChain.pyx -> build/lib.linux-x86_64-3.7/pomegranate\n copying pomegranate/NaiveBayes.c -> build/lib.linux-x86_64-3.7/pomegranate\n copying pomegranate/NaiveBayes.pyx -> build/lib.linux-x86_64-3.7/pomegranate\n copying pomegranate/__init__.pyc -> build/lib.linux-x86_64-3.7/pomegranate\n copying pomegranate/base.c -> build/lib.linux-x86_64-3.7/pomegranate\n copying pomegranate/base.pxd -> build/lib.linux-x86_64-3.7/pomegranate\n copying pomegranate/base.pyx -> build/lib.linux-x86_64-3.7/pomegranate\n copying pomegranate/bayes.c -> build/lib.linux-x86_64-3.7/pomegranate\n copying pomegranate/bayes.pxd -> build/lib.linux-x86_64-3.7/pomegranate\n copying pomegranate/bayes.pyx -> build/lib.linux-x86_64-3.7/pomegranate\n copying pomegranate/distributions.c -> build/lib.linux-x86_64-3.7/pomegranate\n copying pomegranate/distributions.pxd -> build/lib.linux-x86_64-3.7/pomegranate\n copying pomegranate/distributions.pyx -> build/lib.linux-x86_64-3.7/pomegranate\n copying pomegranate/gmm.c -> build/lib.linux-x86_64-3.7/pomegranate\n copying pomegranate/gmm.pyx -> build/lib.linux-x86_64-3.7/pomegranate\n copying pomegranate/hmm.c -> build/lib.linux-x86_64-3.7/pomegranate\n copying pomegranate/hmm.pyx -> build/lib.linux-x86_64-3.7/pomegranate\n copying pomegranate/kmeans.c -> build/lib.linux-x86_64-3.7/pomegranate\n copying pomegranate/kmeans.pyx -> build/lib.linux-x86_64-3.7/pomegranate\n copying pomegranate/parallel.c -> 
build/lib.linux-x86_64-3.7/pomegranate\n copying pomegranate/parallel.pyx -> build/lib.linux-x86_64-3.7/pomegranate\n copying pomegranate/utils.c -> build/lib.linux-x86_64-3.7/pomegranate\n copying pomegranate/utils.h -> build/lib.linux-x86_64-3.7/pomegranate\n copying pomegranate/utils.pxd -> build/lib.linux-x86_64-3.7/pomegranate\n copying pomegranate/utils.pyx -> build/lib.linux-x86_64-3.7/pomegranate\n running build_ext\n building 'pomegranate.base' extension\n creating build/temp.linux-x86_64-3.7\n creating build/temp.linux-x86_64-3.7/pomegranate\n gcc -pthread -Wno-unused-result -Wsign-compare -DDYNAMIC_ANNOTATIONS_ENABLED=1 -DNDEBUG -O2 -g -pipe ...",...,Command exited with non-zero status code (1): ERROR: Command errored out with exit status 1:,"['command: /home/solver/venv/bin/python3 -u -c \'import sys, setuptools, tokenize; sys.argv[0] = \'""\'""\'/tmp/pip-install-e3nb_lq9/pomegranate/setup.py\'""\'""\'; __file__=\'""\'""\'/tmp/pip-install-e3nb_lq9/pomegranate/setup.py\'""\'""\';f=getattr(tokenize, \'""\'""\'open\'""\'""\', open)(__file__);code=f.read().replace(\'""\'""\'\\r\\n\'""\'""\', \'""\'""\'\\n\'""\'""\');f.close();exec(compile(code, __file__, \'""\'""\'exec\'""\'""\'))\' bdist_wheel -d /tmp/pip-wheel-otmm0tho', 'command: /home/solver/venv/bin/python3 -u -c \'import sys, setuptools, tokenize; sys.argv[0] = \'""\'""\'/tmp/pip-install-e3nb_lq9/pomegranate/setup.py\'""\'""\'; __file__=\'""\'""\'/tmp/pip-install-e3nb_lq9/pomegranate/setup.py\'""\'""\';f=getattr(tokenize, \'""\'""\'open\'""\'""\', open)(__file__);code=f.read().replace(\'""\'""\'\\r\\n\'""\'""\', \'""\'""\'\\n\'""\'""\');f.close();exec(compile(code, __file__, \'""\'""\'exec\'""\'""\'))\' install --record /tmp/pip-record-js1l_gix/install-record.txt --single-version-externally-managed --compile --install-headers /home/solver/venv/include/site/python3.7/pomegranate']",['cwd: /tmp/pip-install-e3nb_lq9/pomegranate/'],"['Complete output (54 lines):', 'running install', 'running build', 
'running build_py', 'creating build', 'creating build/lib.linux-x86_64-3.7', 'creating build/lib.linux-x86_64-3.7/pomegranate', 'copying pomegranate/__init__.py -> build/lib.linux-x86_64-3.7/pomegranate', 'running egg_info', 'writing pomegranate.egg-info/PKG-INFO', 'writing dependency_links to pomegranate.egg-info/dependency_links.txt', 'writing requirements to pomegranate.egg-info/requires.txt', 'writing top-level names to pomegranate.egg-info/top_level.txt', ""reading manifest file 'pomegranate.egg-info/SOURCES.txt'"", ""reading manifest template 'MANIFEST.in'"", ""writing manifest file 'pomegranate.egg-info/SOURCES.txt'"", 'copying pomegranate/BayesClassifier.c -> build/lib.linux-x86_64-3.7/pomegranate', 'copying pomegranate/BayesClassifier.pyx -> build/lib.linux-x86_64-3.7/pomegranate', 'copying pomegranate/BayesianNetwork.c -> build/lib.linux-x86_64-3.7/pomegranate', 'copying pomegranate/BayesianNetwork.pyx -> build/lib.linux-x86_64-3.7/pomegranate', 'copying pomegranate/FactorGraph.c -> build/lib.linux-x86_64-3.7/pomegranate', 'copying pomegranate/FactorGraph.pyx -> build/lib.linux-x86_64-3.7/pomegranate', 'copying pomegranate/MarkovChain.c -> build/lib.linux-x86_64-3.7/pomegranate', 'copying pomegranate/MarkovChain.pyx -> build/lib.linux-x86_64-3.7/pomegranate', 'copying pomegranate/NaiveBayes.c -> build/lib.linux-x86_64-3.7/pomegranate', 'copying pomegranate/NaiveBayes.pyx -> build/lib.linux-x86_64-3.7/pomegranate', 'copying pomegranate/__init__.pyc -> build/lib.linux-x86_64-3.7/pomegranate', 'copying pomegranate/base.c -> build/lib.linux-x86_64-3.7/pomegranate', 'copying pomegranate/base.pxd -> build/lib.linux-x86_64-3.7/pomegranate', 'copying pomegranate/base.pyx -> build/lib.linux-x86_64-3.7/pomegranate', 'copying pomegranate/bayes.c -> build/lib.linux-x86_64-3.7/pomegranate', 'copying pomegranate/bayes.pxd -> build/lib.linux-x86_64-3.7/pomegranate', 'copying pomegranate/bayes.pyx -> build/lib.linux-x86_64-3.7/pomegranate', 'copying 
pomegranate/distributions.c -> build/lib.linux-x86_64-3.7/pomegranate', 'copying pomegranate/distributions.pxd -> build/lib.linux-x86_64-3.7/pomegranate', 'copying pomegranate/distributions.pyx -> build/lib.linux-x86_64-3.7/pomegranate', 'copying pomegranate/gmm.c -> build/lib.linux-x86_64-3.7/pomegranate', 'copying pomegranate/gmm.pyx -> build/lib.linux-x86_64-3.7/pomegranate', 'copying pomegranate/hmm.c -> build/lib.linux-x86_64-3.7/pomegranate', 'copying pomegranate/hmm.pyx -> build/lib.linux-x86_64-3.7/pomegranate', 'copying pomegranate/kmeans.c -> build/lib.linux-x86_64-3.7/pomegranate', 'copying pomegranate/kmeans.pyx -> build/lib.linux-x86_64-3.7/pomegranate', 'copying pomegranate/parallel.c -> build/lib.linux-x86_64-3.7/pomegranate', 'copying pomegranate/parallel.pyx -> build/lib.linux-x86_64-3.7/pomegranate', 'copying pomegranate/utils.c -> build/lib.linux-x86_64-3.7/pomegranate', 'copying pomegranate/utils.h -> build/lib.linux-x86_64-3.7/pomegranate', 'copying pomegranate/utils.pxd -> build/lib.linux-x86_64-3.7/pomegranate', 'copying pomegranate/utils.pyx -> build/lib.linux-x86_64-3.7/pomegranate', 'running build_ext', ""building 'pomegranate.base' extension"", 'creating build/temp.linux-x86_64-3.7', 'creating build/temp.linux-x86_64-3.7/pomegranate', 'gcc -pthread -Wno-unused-result -Wsign-compare -DDYNAMIC_ANNOTATIONS_ENABLED=1 -DNDEBUG -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -D_GNU_SOURCE -fPIC -fwrapv -fPIC -I/home/solver/venv/include -I/usr/include/python3.7m -I/tmp/pip-install-e3nb_lq9/pomegranate/.eggs/numpy-1.19.1-py3.7-linux-x86_64.egg/numpy/core/include -c pomegranate/base.c -o build/temp.linux-x86_64-3.7/pomegranate/base.o', ""unable to execute 'gcc': No such file or directory"", ""error: command 'gcc' faile...",['ERROR: Failed building wheel 
for pomegranate'],,"[""unable to execute 'gcc': No such file or directory""]",unable to execute 'gcc': No such file or directory,"['unable', 'to', 'execute', 'gcc']","[""unable to execute 'gcc': No such file or directory""]"
4,4,solver-fedora-31-py37-0063867d,"/home/solver/venv/bin/python3 -m pip install --force-reinstall --no-cache-dir --no-deps catboost===0.6 --index-url ""https://pypi.org/simple"" --trusted-host pypi.org",catboost,0.6,solver-fedora-31-py37,2020-08-01T02:36:25.821970,"{'implementation_name': 'cpython', 'implementation_version': '3.7.7', 'os_name': 'posix', 'platform_machine': 'x86_64', 'platform_python_implementation': 'CPython', 'platform_release': '4.18.0-147.8.1.el8_1.x86_64', 'platform_system': 'Linux', 'platform_version': '#1 SMP Wed Feb 26 03:08:15 UTC 2020', 'python_full_version': '3.7.7', 'python_version': '3.7', 'sys_platform': 'linux'}",1.6.0,"Command exited with non-zero status code (1): ERROR: Could not find a version that satisfies the requirement catboost===0.6 (from versions: 0.1.1.2, 0.9.0a0, 0.9.0, 0.9, 0.9.1, 0.9.1.1, 0.10.0, 0.10.1, 0.10.2, 0.10.3, 0.10.4, 0.10.4.1, 0.11.0, 0.11.1, 0.11.2, 0.12.0, 0.12.1, 0.12.1.1, 0.12.2, 0.13, 0.13.1, 0.14.0, 0.14.1, 0.14.2, 0.15, 0.15.1, 0.15.2, 0.16, 0.16.1, 0.16.2, 0.16.3, 0.16.4, 0.16.5, 0.17, 0.17.1, 0.17.2, 0.17.3, 0.17.4, 0.17.5, 0.18, 0.18.1, 0.19.1, 0.20, 0.20.1, 0.20.2, 0.21, 0.22, 0.23, 0.23.1, 0.23.2)\nERROR: No matching distribution found for catboost===0.6\nWARNING: You are using pip version 20.1.1; however, version 20.2 is available.\nYou should consider upgrading via the '/home/solver/venv/bin/python3 -m pip install --upgrade pip' command.\n",...,"Command exited with non-zero status code (1): ERROR: Could not find a version that satisfies the requirement catboost===0.6 (from versions: 0.1.1.2, 0.9.0a0, 0.9.0, 0.9, 0.9.1, 0.9.1.1, 0.10.0, 0.10.1, 0.10.2, 0.10.3, 0.10.4, 0.10.4.1, 0.11.0, 0.11.1, 0.11.2, 0.12.0, 0.12.1, 0.12.1.1, 0.12.2, 0.13, 0.13.1, 0.14.0, 0.14.1, 0.14.2, 0.15, 0.15.1, 0.15.2, 0.16, 0.16.1, 0.16.2, 0.16.3, 0.16.4, 0.16.5, 0.17, 0.17.1, 0.17.2, 0.17.3, 0.17.4, 0.17.5, 0.18, 0.18.1, 0.19.1, 0.20, 0.20.1, 0.20.2, 0.21, 0.22, 0.23, 0.23.1, 0.23.2)",,,,['ERROR: No matching 
distribution found for catboost===0.6'],,,ERROR: No matching distribution found for catboost===0.6,"['No', 'matching', 'distribution', 'found']",['ERROR: No matching distribution found for catboost===0.6']


In [7]:
len(entire_error_df)

19219

## Filter data using Solver / datetime <a id='filter'></a>

In [8]:
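def filter_data(entire_error_df, solver_name=None, start_date='2019-12-27', end_date='2020-01-14', mode='solver'):
    """Select the rows to cluster.

    mode='solver'   -> rows produced by `solver_name`
    mode='datetime' -> rows whose 'datetime' lies between `start_date` and `end_date`
    mode='all'      -> every row
    """
    if mode == 'solver':
        error_df = entire_error_df.loc[entire_error_df['solver'] == solver_name]
    elif mode == 'datetime':
        # With ISO-8601 strings in 'datetime', this is a lexicographic comparison
        mask = (entire_error_df['datetime'] >= start_date) & (entire_error_df['datetime'] <= end_date)
        error_df = entire_error_df.loc[mask]
    elif mode == 'all':
        error_df = entire_error_df
    else:
        raise ValueError(f"Unknown mode {mode!r}; expected 'solver', 'datetime' or 'all'")
    # Return a copy so that adding columns later does not modify (or warn about) a slice of the original
    return error_df.copy()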
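`filter_data` compares the `datetime` column with `start_date` and `end_date` as strings. For ISO-8601 timestamps such as `2020-01-14T02:36:25.821970`, string order matches chronological order. However, a bare end date like `'2020-01-14'` sorts before every timestamp on that day, so the end day is effectively excluded. The sketch below shows a day-level check that includes both ends of the range. The helper `in_date_range` is hypothetical and not part of the notebook.

```python
from datetime import date, datetime


def in_date_range(timestamp, start_date, end_date):
    """True if an ISO-8601 timestamp falls on a day between start_date and end_date, both inclusive."""
    day = datetime.fromisoformat(timestamp).date()
    return date.fromisoformat(start_date) <= day <= date.fromisoformat(end_date)


ts = '2020-01-14T02:36:25.821970'
print(ts <= '2020-01-14')                             # False: plain string comparison drops the end day
print(in_date_range(ts, '2019-12-27', '2020-01-14'))  # True
```

Applied with `Series.map`, for example `entire_error_df['datetime'].map(lambda t: in_date_range(t, start_date, end_date))`, it yields a boolean mask that could stand in for `mask` in `filter_data`.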

In [9]:
entire_error_df['solver'].unique()

array(['solver-fedora-31-py37', 'solver-fedora-31-py38',
       'solver-fedora-32-py37', 'solver-fedora-32-py38',
       'solver-rhel-8-py36'], dtype=object)

In [10]:
# Alternatives: filter by solver or by date range instead of using every row
#error_df = filter_data(entire_error_df, solver_name='solver-fedora-31-py37', mode='solver')
#error_df = filter_data(entire_error_df, start_date='2019-12-24', end_date='2020-01-14', mode='datetime')
error_df = filter_data(entire_error_df, mode='all')

In [11]:
len(error_df)

19219

### Extract tokenized_clustering_data for clustering

In [12]:
clean_clustering_data = error_df['tokenized_clustering_data']

## Word to Vector Conversion using Continuous Bag of Words model (CBOW) <a id='word2vec'></a>

In [13]:
print('Number of rows in training data :', len(clean_clustering_data))

Number of rows in training data : 19219


In [14]:
def detect_embedding_size(tokens):
    """Heuristic Word2Vec vector size: (vocabulary size) ** (2/3), capped at 400."""
    vocab = {token for row in tokens for token in row}
    return min(round(len(vocab) ** (2 / 3)), 400)

w2v_size = detect_embedding_size(clean_clustering_data)
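As a formula, the heuristic in `detect_embedding_size` derives the vector size $d$ from the number of distinct tokens $|V|$:

$$
d = \min\left(\operatorname{round}\left(|V|^{2/3}\right),\ 400\right)
$$

For example, a vocabulary of $|V| = 1000$ tokens gives $1000^{2/3} = 100$, so $d = 100$. A vocabulary of $|V| = 10\,000$ gives $10\,000^{2/3} \approx 464.2$, which is capped at $d = 400$. These vocabulary sizes are illustrative and are not the size of this dataset's vocabulary.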

In [15]:
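def tokens_vectorization(clustering_data, w2v_size, w2v_window, cpu_number, model_name):
    """Train a CBOW Word2Vec model on the tokenized error messages and save it to `model_name`."""
    iterations = 100
    # Argument names follow gensim 3.x; gensim >= 4.0 renames size -> vector_size and iter -> epochs
    word2vec = Word2Vec(clustering_data,
                        size=w2v_size,
                        window=w2v_window,
                        min_count=1,        # keep every token, however rare
                        workers=cpu_number,
                        sg=0,               # 0 = CBOW (gensim's default), 1 = skip-gram
                        iter=iterations)
    word2vec.save(model_name)
    return word2vec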

In [16]:
word2vec = tokens_vectorization(clean_clustering_data, 
                                 w2v_size = w2v_size, 
                                 w2v_window= w2v_window, 
                                 cpu_number = cpu_number, 
                                 model_name='../models/word2vec.model')
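gensim's `Word2Vec` defaults to the Continuous Bag of Words model (`sg=0`). CBOW learns to predict each word from the words around it, up to `window` positions on either side, and by default averages the context vectors (`cbow_mean=1`). The hypothetical helper below lists the (context, target) pairs CBOW would see for one tokenized message. It is a simplified view: gensim also shrinks the window by a random amount at each position.

```python
def cbow_pairs(tokens, window):
    """List the (context words, target word) pairs a CBOW model trains on for one message."""
    pairs = []
    for i, target in enumerate(tokens):
        # Up to `window` words on each side of the target, excluding the target itself
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        if context:
            pairs.append((context, target))
    return pairs


for context, target in cbow_pairs(['No', 'matching', 'distribution', 'found'], window=1):
    print(context, '->', target)
```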



## Sentence (error message) to vector conversion <a id='sent2vec'></a>

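Each error message is represented by the average of its word vectors. Sum the vectors of every word in the message that appears in the Word2Vec vocabulary, then divide by the number of such words.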

In [17]:
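def sentence_vectorization(clustering_data, word2vec):
    """Represent each tokenized error message as the mean of its word vectors."""
    sent2vec = []
    for sent in clustering_data:
        # Keep only tokens that are in the trained vocabulary
        vectors = [word2vec.wv[w] for w in sent if w in word2vec.wv]
        if vectors:
            sent2vec.append(np.mean(vectors, axis=0))
        else:
            # No known tokens: use a zero vector instead of dividing by zero
            sent2vec.append(np.zeros(word2vec.wv.vector_size))
    return np.vstack(sent2vec)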

In [18]:
sent2vec = sentence_vectorization(clean_clustering_data, word2vec)

  



## Clustering using DBScan  <a id='clustering'></a>

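Given a set of points, DBSCAN groups together points that lie in dense regions. A point is a core point if at least `min_samples` points, itself included, lie within distance `eps` of it. Clusters grow by linking core points that are within `eps` of each other, together with their neighbours. Points that are not within `eps` of any core point are marked as outliers (noise, label -1).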
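The sketch below is a small pure-Python DBSCAN with Euclidean distance that makes the roles of `eps` and `min_samples` concrete. It is an O(n²) illustration written for this explanation; the names `euclidean` and `dbscan_sketch` are hypothetical. It is not scikit-learn's implementation, which is what the notebook actually uses.

```python
from math import sqrt


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences of numbers."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dbscan_sketch(points, eps, min_samples):
    """Return one label per point: cluster ids 0, 1, ... and -1 for noise."""
    n = len(points)
    labels = [None] * n  # None = not visited yet

    def region(i):
        # Neighbourhood of point i, including i itself
        return [j for j in range(n) if euclidean(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbours = region(i)
        if len(neighbours) < min_samples:
            labels[i] = -1  # noise for now; may become a border point later
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = list(neighbours)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # reachable from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region(j)
            if len(j_neighbours) >= min_samples:
                queue.extend(j_neighbours)  # j is also a core point: keep expanding
    return labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (50, 50)]
print(dbscan_sketch(points, eps=1.5, min_samples=2))  # [0, 0, 0, 1, 1, -1]
print(dbscan_sketch(points, eps=1.5, min_samples=1))  # [0, 0, 0, 1, 1, 2]
```

With `min_samples=1` the isolated point at (50, 50) becomes a cluster of its own instead of noise. This matches the setting used for the clustering below.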

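To choose `eps`, compute the distances from every message vector to its k nearest neighbours (k = √n) with scikit-learn's `NearestNeighbors`, then turn them into a sorted curve of average distances (`avg_distances`).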

In [19]:
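def kneighbors(sent2vec):
    """Return a sorted curve of average k-nearest-neighbour distances, used to pick eps."""
    k = round(sqrt(len(sent2vec)))  # k = sqrt(n) neighbours
    nbrs = NearestNeighbors(n_neighbors=k).fit(sent2vec)
    # Querying with the training points themselves makes column 0 each point's zero distance to itself
    distances, _ = nbrs.kneighbors(sent2vec)
    # Sort every neighbour-rank column over all points, then average across the k columns
    return list(np.sort(distances, axis=0).mean(axis=1))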

avg_distances = kneighbors(sent2vec)
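The sketch below is a brute-force NumPy version of the curve that `kneighbors` builds. It is meant to show what the sorting and averaging do on a tiny made-up input of three 1-D points. The helper `k_distance_curve` is hypothetical. It mimics `NearestNeighbors.kneighbors` when queried with the training data, where each point counts as its own nearest neighbour.

```python
import numpy as np


def k_distance_curve(points, k):
    """Brute-force sketch of the average k-nearest-neighbour distance curve."""
    points = np.asarray(points, dtype=float).reshape(len(points), -1)
    # Pairwise Euclidean distances; each point is its own nearest neighbour (distance 0)
    pairwise = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    distances = np.sort(pairwise, axis=1)[:, :k]  # k smallest distances per point
    # Sort each neighbour-rank column over all points, then average across the k columns
    return np.sort(distances, axis=0).mean(axis=1)


print(k_distance_curve([0.0, 0.1, 0.5], k=2))
```

Here the per-point distances are `[0, 0.1]`, `[0, 0.1]` and `[0, 0.4]`, so the curve is `[0.05, 0.05, 0.2]`. The isolated point at 0.5 produces the jump at the end of the curve.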

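Estimate `eps` as the knee of that curve, the point where the distances start to increase sharply, using `KneeLocator` from the `kneed` package. In DBSCAN, `eps` is the maximum distance between two samples for one to be considered part of the other's neighbourhood. If no knee is found, `eps` defaults to 1.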

In [20]:
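def epsilon_search(distances):
    """Pick eps as the knee of the sorted distance curve (falls back to 1 if no knee is found)."""
    # x = sorted distances, y = rank; with kneed's defaults (curve='concave', direction='increasing')
    # the knee marks where the distances start to grow quickly
    kneedle = KneeLocator(distances, list(range(len(distances))))
    return max(kneedle.all_elbows) if kneedle.all_elbows else 1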

In [21]:
epsilon = epsilon_search(avg_distances)
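`epsilon_search` delegates to kneed's `KneeLocator`, which implements the Kneedle method. The core idea fits in a few lines: normalise both axes to [0, 1], then take the point that rises furthest above the diagonal. The hypothetical `knee_sketch` below is a simplification. kneed also applies a sensitivity threshold and can report several candidate knees, of which the notebook keeps the largest. The sample distances are made up.

```python
def knee_sketch(x, y):
    """Return the x value where a concave, increasing curve bends most sharply.

    Simplified Kneedle: normalise both axes to [0, 1] and pick the point with the
    largest value of y_norm - x_norm. Assumes at least two distinct x and y values.
    """
    x_min, x_max = min(x), max(x)
    y_min, y_max = min(y), max(y)
    best_x, best_diff = None, float('-inf')
    for xi, yi in zip(x, y):
        diff = (yi - y_min) / (y_max - y_min) - (xi - x_min) / (x_max - x_min)
        if diff > best_diff:
            best_x, best_diff = xi, diff
    return best_x


# As in epsilon_search: x = sorted average distances, y = their rank
distances = [0.1, 0.11, 0.12, 0.13, 0.14, 0.5, 1.0, 2.0]
print(knee_sketch(distances, list(range(len(distances)))))  # 0.14
```

The returned value, 0.14, is the last distance before the curve takes off. Used as `eps`, it keeps the tightly packed points together.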

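Run DBSCAN with the estimated `eps` and `min_samples=1`. Because `min_samples` counts the point itself, every point is a core point with this setting. No message is labelled as noise (-1); an isolated message simply forms a cluster of its own.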

In [22]:
def dbscan(epsilon, min_samples, cpu_number, sent2vec):
    cluster_labels = DBSCAN(eps=epsilon,
                            min_samples= min_samples,
                            n_jobs=cpu_number).fit_predict(sent2vec)
    return cluster_labels

In [23]:
#cluster_labels = hierarchical(epsilon, sent2vec)
cluster_labels = dbscan(epsilon, 1, cpu_number, sent2vec)

In [24]:
len(cluster_labels)

19219

In [25]:
error_df['cluster_no.'] = cluster_labels

## Get cluster statistics such as : "pattern", "mean_length", "mean_similarity" <a id='cluster_stats'></a>

In [26]:
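def clustered_output(error_df, mode='INDEX'):
    """Group error messages by cluster number.

    Returns (groups, unique_rows), two dicts keyed by the cluster number as a string:
    - groups: the cluster's rows in the format selected by `mode`:
      'ALL' (full records), 'Tokenized' (tokenized_clustering_data) or 'CLEANED' (clustering_data);
      any other mode, including the default 'INDEX', leaves `groups` empty
    - unique_rows: the distinct clustering_data strings in the cluster
    """
    groups, unique_rows = {}, {}
    # Group by the column name rather than a one-element list: since pandas 2.0 a list
    # yields tuple keys such as (0,), which str() would turn into '(0,)'
    for key, value in error_df.groupby('cluster_no.'):
        unique_rows[str(key)] = set(value['clustering_data'])
        if mode == 'ALL':
            groups[str(key)] = value.to_dict(orient='records')
        elif mode == 'Tokenized':
            groups[str(key)] = value['tokenized_clustering_data'].values.tolist()
        elif mode == 'CLEANED':
            groups[str(key)] = value['clustering_data'].values.tolist()
    return groups, unique_rows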

In [27]:
table = str.maketrans(punctuation, ' ' * len(punctuation))  # maps every punctuation character to a space

def find_matching_blocks(strings):
    """Return the characters shared, in order, by all strings, with punctuation replaced by spaces."""
    curr = strings[0]
    # Drop boilerplate present in most messages so it does not dominate the pattern
    curr = curr.replace('ERROR', '')
    curr = curr.replace('Command exited with non-zero status code (1):', '')
    # Intersect the running pattern with each remaining message
    for other in strings[1:]:
        matches = difflib.SequenceMatcher(None, curr, other)
        curr = ''.join(curr[m.a:m.a + m.size] for m in matches.get_matching_blocks())
    if curr == '':
        curr = 'NO COMMON PATTERNS HAVE BEEN FOUND'
    return curr.translate(table).strip()

def get_similarity(rows):
    """Similarity (0-100) of every row to the first row, using difflib's ratio; rows[0] itself scores 100."""
    return [difflib.SequenceMatcher(None, rows[0], row).ratio() * 100 for row in rows]
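`get_similarity` scores each message against the first message of its cluster using difflib's `SequenceMatcher.ratio()`. That ratio is 2M/T, where M is the number of matched characters and T is the combined length of both strings; it is not a Levenshtein distance. Because the first message is also compared with itself, `mean_similarity` is pulled toward 100, noticeably so in small clusters. The toy strings below illustrate this. `mean_similarity` here is a hypothetical stand-alone helper.

```python
import difflib


def mean_similarity(rows):
    """Mean of 100 * ratio(rows[0], row) over all rows, rows[0] itself included."""
    scores = [difflib.SequenceMatcher(None, rows[0], row).ratio() * 100 for row in rows]
    return sum(scores) / len(scores)


# 'bcd' matches: M = 3 characters, T = 4 + 4 = 8, so ratio = 2 * 3 / 8
print(difflib.SequenceMatcher(None, 'abcd', 'bcde').ratio())  # 0.75
print(mean_similarity(['abcd', 'bcde']))  # (100 + 75) / 2 = 87.5
```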

In [28]:
STATISTICS = ["cluster_name", "cluster_size", "pattern", "CLASS", "mean_similarity"]

def statistics(error_df, output_mode='frame'):
    """
    Return statistics for every cluster, sorted by cluster size (largest first).
    "cluster_name" - cluster number (as a string)
    "cluster_size" - number of error messages in the cluster
    "pattern" - characters common to all cleaned messages in the cluster
    "CLASS" - characters common to all tokenized messages, later turned into the class label
    "mean_similarity" - mean difflib similarity (0-100) between the first message and every message
    in the cluster (the first message included)
    :param error_df: DataFrame with 'cluster_no.', 'clustering_data' and 'tokenized_clustering_data' columns
    :param output_mode: 'frame' returns a DataFrame; any other value returns a list of records
    :return: DataFrame or list of dicts
    """
    clusters = []
    clustered_df, _ = clustered_output(error_df, mode='CLEANED')
    clustered_df_class, _ = clustered_output(error_df, mode='Tokenized')
    for item, rows in clustered_df.items():
        clusters.append([item,
                         len(rows),
                         find_matching_blocks(rows),
                         find_matching_blocks(clustered_df_class[item]),
                         np.mean(get_similarity(rows))])
    df = pd.DataFrame(clusters, columns=STATISTICS).round(2).sort_values(by='cluster_size', ascending=False)
    return df if output_mode == 'frame' else df.to_dict(orient='records')

In [29]:
# statistics() already returns a DataFrame in 'frame' mode
stat_df = statistics(error_df, output_mode='frame')

In [30]:
print('Number of clusters : ', len(stat_df))

Number of clusters :  40


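Generate a CLASS label for each cluster from its token pattern. Patterns that already name a Python exception (e.g. `ModuleNotFoundError`) are kept as they are. Other patterns are turned into a CamelCase name ending in `Error`; for example, `unable to execute gcc` becomes `UnableToExecuteGccError`.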

In [31]:
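def get_class_label(stat_df):
    """Derive a CLASS label, the number of exception names and a MachineDefinedError flag per cluster.

    If the pattern contains an exception name (a word ending in "Error"), it is kept (several
    names are joined with ", ") and MachineDefinedError is 'YES'. Otherwise the pattern words
    are joined in CamelCase with an "Error" suffix and MachineDefinedError is 'NO'.
    """
    class_labels, number_of_errors, MachineDefinedError = [], [], []
    for item in stat_df['CLASS']:
        # When "Error" appears as a standalone word, remove every occurrence of "Error"
        if "Error" in item.split():
            item = item.replace('Error', '')
        row = item.split()
        if not re.search(r'\w+Error', item):
            # Human-readable pattern, e.g. "unable to execute gcc" -> "UnableToExecuteGccError"
            MachineDefinedError.append('NO')
            item = ''.join(word[0].upper() + word[1:] for word in row) + 'Error'
        else:
            # Pattern already names one or more Python exceptions
            item = ', '.join(row) if len(re.findall(r'Error', str(row))) > 1 else ''.join(row)
            MachineDefinedError.append('YES')
        class_labels.append(item)
        number_of_errors.append(len(re.findall(r'Error', item)))
    return class_labels, number_of_errors, MachineDefinedError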
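The stripped-down naming rule below shows how labels such as `UnableToExecuteGccError` in the cluster table arise. It is a hypothetical simplification of `get_class_label`. It omits the standalone-"Error" cleanup and the ", " joining of multiple exception names.

```python
import re


def label_from_pattern(pattern):
    """Simplified CLASS naming rule: returns (label, MachineDefinedError)."""
    if re.search(r'\w+Error', pattern):
        # The pattern already names a Python exception: keep it
        return ''.join(pattern.split()), 'YES'
    # Otherwise capitalise each word, join them and append "Error"
    return ''.join(word[0].upper() + word[1:] for word in pattern.split()) + 'Error', 'NO'


print(label_from_pattern('unable to execute gcc'))  # ('UnableToExecuteGccError', 'NO')
print(label_from_pattern('ModuleNotFoundError'))    # ('ModuleNotFoundError', 'YES')
```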

In [32]:
class_labels, number_of_errors, MachineDefinedError = get_class_label(stat_df)

In [33]:
stat_df['number_of_errors'] = number_of_errors
stat_df['MachineDefinedError?'] = MachineDefinedError
stat_df['CLASS'] = class_labels

In [34]:
stat_df.sort_values(by='cluster_size', ascending=False)

| | cluster_name | cluster_size | pattern | CLASS | mean_similarity | number_of_errors | MachineDefinedError? |
|---|---|---|---|---|---|---|---|
| 6 | 6 | 9718 | ModuleNotFoundError | ModuleNotFoundError | 100.0 | 1 | YES |
| 3 | 3 | 3214 | No matching distribution found for | NoMatchingDistributionFoundError | 84.82 | 1 | NO |
| 0 | 0 | 1345 | SyntaxError | SyntaxError | 100.0 | 1 | YES |
| 2 | 2 | 1294 | unable to execute gcc No such file or directory | UnableToExecuteGccError | 100.0 | 1 | NO |
| 5 | 5 | 1229 | AttributeError | AttributeError | 100.0 | 1 | YES |
| 1 | 1 | 432 | FileNotFoundError | FileNotFoundError | 100.0 | 1 | YES |
| 7 | 7 | 387 | Command errored out with exit status 1 | CheckTheLogsError | 99.9 | 1 | NO |
| 9 | 9 | 281 | CUDA could not be found on your system | CUDACouldNotBeFoundError | 100.0 | 1 | NO |
| 12 | 12 | 237 | RuntimeError | RuntimeError | 100.0 | 1 | YES |
| 14 | 14 | 205 | Failed building wheel for | FailedBuildingWheelError | 89.27 | 1 | NO |


In [35]:
error_df['CLASS'] = error_df['cluster_no.'].map(stat_df['CLASS'])
error_df['number_of_errors'] = error_df['cluster_no.'].map(stat_df['number_of_errors'])
error_df['MachineDefinedError?'] = error_df['cluster_no.'].map(stat_df['MachineDefinedError?'])

## Save clustered data to Ceph <a id='save_to_ceph'></a>

In [36]:
import os

# Ceph (S3-compatible storage) endpoint and credentials are read from environment variables
THOTH_S3_ENDPOINT_URL = os.environ['THOTH_S3_ENDPOINT_URL']
THOTH_CEPH_KEY_ID = os.environ['THOTH_CEPH_KEY_ID']
THOTH_CEPH_SECRET_KEY = os.environ['THOTH_CEPH_SECRET_KEY']
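The cell above reads the Ceph credentials. The sketch below shows one way to use them. It assumes the Ceph endpoint speaks the S3 API and uses `boto3`, which may not be the library this project actually uses. The helper names `upload_dataframe_csv` and `make_ceph_client` are hypothetical. The client is passed in as an argument, so the upload logic can be exercised without a live endpoint.

```python
import io
import os


def upload_dataframe_csv(df, bucket, key, client):
    """Serialise a DataFrame to CSV and store it under bucket/key via an S3-compatible client."""
    buffer = io.StringIO()
    df.to_csv(buffer, index=False)
    client.put_object(Bucket=bucket, Key=key, Body=buffer.getvalue().encode('utf-8'))


def make_ceph_client():
    """Create a boto3 S3 client from the THOTH_* environment variables."""
    import boto3  # imported here so upload_dataframe_csv can be used (and tested) without boto3

    return boto3.client(
        's3',
        endpoint_url=os.environ['THOTH_S3_ENDPOINT_URL'],
        aws_access_key_id=os.environ['THOTH_CEPH_KEY_ID'],
        aws_secret_access_key=os.environ['THOTH_CEPH_SECRET_KEY'],
    )
```

For example, `upload_dataframe_csv(error_df, bucket, key, make_ceph_client())` would store the clustered data. Here `bucket` and `key` stand for whatever storage location the project uses.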

In [37]:
error_df.head(50)

Unnamed: 0,index,document_id,command,package_name,package_version,solver,datetime,environment,analyzer_version,message,...,ERROR,Exception,specific_error,clustering_data,tokenized_clustering_data,context_message,cluster_no.,CLASS,number_of_errors,MachineDefinedError?
0,0,solver-fedora-31-py37-0022caa4,"/home/solver/venv/bin/python3 -m pip install --force-reinstall --no-cache-dir --no-deps Cython===0.23.4 --index-url ""https://pypi.org/simple"" --trusted-host pypi.org",Cython,0.23.4,solver-fedora-31-py37,2020-08-05T21:54:08.679464,"{'implementation_name': 'cpython', 'implementation_version': '3.7.7', 'os_name': 'posix', 'platform_machine': 'x86_64', 'platform_python_implementation': 'CPython', 'platform_release': '4.18.0-147.8.1.el8_1.x86_64', 'platform_system': 'Linux', 'platform_version': '#1 SMP Wed Feb 26 03:08:15 UTC 2020', 'python_full_version': '3.7.7', 'python_version': '3.7', 'sys_platform': 'linux'}",1.6.0,"Command exited with non-zero status code (1): ERROR: Command errored out with exit status 1:\n command: /home/solver/venv/bin/python3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '""'""'/tmp/pip-install-bnnj141r/Cython/setup.py'""'""'; __file__='""'""'/tmp/pip-install-bnnj141r/Cython/setup.py'""'""';f=getattr(tokenize, '""'""'open'""'""', open)(__file__);code=f.read().replace('""'""'\r\n'""'""', '""'""'\n'""'""');f.close();exec(compile(code, __file__, '""'""'exec'""'""'))' bdist_wheel -d /tmp/pip-wheel-9eme38jp\n cwd: /tmp/pip-install-bnnj141r/Cython/\n Complete output (343 lines):\n Unable to find pgen, not compiling formal grammar.\n running bdist_wheel\n running build\n running build_py\n creating build\n creating build/lib.linux-x86_64-3.7\n copying cython.py -> build/lib.linux-x86_64-3.7\n creating build/lib.linux-x86_64-3.7/Cython\n copying Cython/__init__.py -> build/lib.linux-x86_64-3.7/Cython\n copying Cython/Utils.py -> build/lib.linux-x86_64-3.7/Cython\n copying Cython/TestUtils.py -> build/lib.linux-x86_64-3.7/Cython\n copying Cython/StringIOTree.py -> build/lib.linux-x86_64-3.7/Cython\n copying Cython/Shadow.py -> build/lib.linux-x86_64-3.7/Cython\n copying Cython/Debugging.py -> build/lib.linux-x86_64-3.7/Cython\n copying Cython/Coverage.py -> build/lib.linux-x86_64-3.7/Cython\n copying 
Cython/CodeWriter.py -> build/lib.linux-x86_64-3.7/Cython\n creating build/lib.linux-x86_64-3.7/Cython/Build\n copying Cython/Build/__init__.py -> build/lib.linux-x86_64-3.7/Cython/Build\n copying Cython/Build/IpythonMagic.py -> build/lib.linux-x86_64-3.7/Cython/Build\n copying Cython/Build/Inline.py -> build/lib.linux-x86_64-3.7/Cython/Build\n copying Cython/Build/Dependencies.py -> build/lib.linux-x86_64-3.7/Cython/Build\n copying Cython/Build/Cythonize.py -> build/lib.linux-x86_64-3.7/Cython/Build\n copying Cython/Build/BuildExecutable.py -> build/lib.linux-x86_64-3.7/Cython/Build\n creating build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/__init__.py -> build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/Visitor.py -> build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/Version.py -> build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/UtilityCode.py -> build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/UtilNodes.py -> build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/TypeSlots.py -> build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/TypeInference.py -> build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/TreePath.py -> build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/TreeFragment.py -> build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/Symtab.py -> build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/StringEncoding.py -> build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/Scanning.py -> build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/PyrexTypes.py -> build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/Pipeline.py -> build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/Parsing.py -> build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/ParseTreeTransforms.py -> 
build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/Options.py -> build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/Optimize.py -> build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/Nodes.py -> build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/Naming.py -> build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/ModuleNode.py -> build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/MemoryView.py -> build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/Main.py -> build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/Lexi...",...,['ERROR: Failed building wheel for Cython'],,"['SyntaxError: invalid syntax', 'SyntaxError: invalid syntax']",SyntaxError,['SyntaxError'],"['SyntaxError: invalid syntax', 'SyntaxError: invalid syntax']",0,SyntaxError,1,YES
1,1,solver-fedora-31-py37-003d9de4,"/home/solver/venv/bin/python3 -m pip install --force-reinstall --no-cache-dir --no-deps word2vec===0.7.1 --index-url ""https://pypi.org/simple"" --trusted-host pypi.org",word2vec,0.7.1,solver-fedora-31-py37,2020-08-18T17:33:54.656033,"{'implementation_name': 'cpython', 'implementation_version': '3.7.7', 'os_name': 'posix', 'platform_machine': 'x86_64', 'platform_python_implementation': 'CPython', 'platform_release': '4.18.0-147.8.1.el8_1.x86_64', 'platform_system': 'Linux', 'platform_version': '#1 SMP Wed Feb 26 03:08:15 UTC 2020', 'python_full_version': '3.7.7', 'python_version': '3.7', 'sys_platform': 'linux'}",1.6.0,"Command exited with non-zero status code (1): ERROR: Command errored out with exit status 1:\n command: /home/solver/venv/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '""'""'/tmp/pip-install-zim3n1sc/word2vec/setup.py'""'""'; __file__='""'""'/tmp/pip-install-zim3n1sc/word2vec/setup.py'""'""';f=getattr(tokenize, '""'""'open'""'""', open)(__file__);code=f.read().replace('""'""'\r\n'""'""', '""'""'\n'""'""');f.close();exec(compile(code, __file__, '""'""'exec'""'""'))' egg_info --egg-base /tmp/pip-pip-egg-info-hv7fcspg\n cwd: /tmp/pip-install-zim3n1sc/word2vec/\n Complete output (6 lines):\n Traceback (most recent call last):\n File ""<string>"", line 1, in <module>\n File ""/tmp/pip-install-zim3n1sc/word2vec/setup.py"", line 29\n print ' '.join(command)\n ^\n SyntaxError: invalid syntax\n ----------------------------------------\nERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.\nWARNING: You are using pip version 20.1.1; however, version 20.2.2 is available.\nYou should consider upgrading via the '/home/solver/venv/bin/python3 -m pip install --upgrade pip' command.\n",...,['ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.'],,['SyntaxError: invalid 
syntax'],SyntaxError,['SyntaxError'],['SyntaxError: invalid syntax'],0,SyntaxError,1,YES
2,2,solver-fedora-31-py37-004344f0,"/home/solver/venv/bin/python3 -m pip install --force-reinstall --no-cache-dir --no-deps papermill===0.13.3 --index-url ""https://pypi.org/simple"" --trusted-host pypi.org",papermill,0.13.3,solver-fedora-31-py37,2020-08-11T04:00:49.015446,"{'implementation_name': 'cpython', 'implementation_version': '3.7.7', 'os_name': 'posix', 'platform_machine': 'x86_64', 'platform_python_implementation': 'CPython', 'platform_release': '4.18.0-147.8.1.el8_1.x86_64', 'platform_system': 'Linux', 'platform_version': '#1 SMP Wed Feb 26 03:08:15 UTC 2020', 'python_full_version': '3.7.7', 'python_version': '3.7', 'sys_platform': 'linux'}",1.6.0,"Command exited with non-zero status code (1): ERROR: Command errored out with exit status 1:\n command: /home/solver/venv/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '""'""'/tmp/pip-install-18sztwvd/papermill/setup.py'""'""'; __file__='""'""'/tmp/pip-install-18sztwvd/papermill/setup.py'""'""';f=getattr(tokenize, '""'""'open'""'""', open)(__file__);code=f.read().replace('""'""'\r\n'""'""', '""'""'\n'""'""');f.close();exec(compile(code, __file__, '""'""'exec'""'""'))' egg_info --egg-base /tmp/pip-pip-egg-info-k3uklu4_\n cwd: /tmp/pip-install-18sztwvd/papermill/\n Complete output (7 lines):\n Traceback (most recent call last):\n File ""<string>"", line 1, in <module>\n File ""/tmp/pip-install-18sztwvd/papermill/setup.py"", line 27, in <module>\n test_required = [req.strip() for req in read(test_req_path).splitlines() if req.strip()]\n File ""/tmp/pip-install-18sztwvd/papermill/setup.py"", line 20, in read\n with open(fname, 'rU' if python_2 else 'r') as fhandle:\n FileNotFoundError: [Errno 2] No such file or directory: 'requirements-dev.txt'\n ----------------------------------------\nERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.\nWARNING: You are using pip version 20.1.1; however, version 20.2.1 is available.\nYou should 
consider upgrading via the '/home/solver/venv/bin/python3 -m pip install --upgrade pip' command.\n",...,['ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.'],,"[""FileNotFoundError: [Errno 2] No such file or directory: 'requirements-dev.txt'""]",FileNotFoundError,['FileNotFoundError'],"[""FileNotFoundError: [Errno 2] No such file or directory: 'requirements-dev.txt'""]",1,FileNotFoundError,1,YES
3,3,solver-fedora-31-py37-005ab6b4,"/home/solver/venv/bin/python3 -m pip install --force-reinstall --no-cache-dir --no-deps pomegranate===0.7.7 --index-url ""https://pypi.org/simple"" --trusted-host pypi.org",pomegranate,0.7.7,solver-fedora-31-py37,2020-08-01T02:36:45.424428,"{'implementation_name': 'cpython', 'implementation_version': '3.7.7', 'os_name': 'posix', 'platform_machine': 'x86_64', 'platform_python_implementation': 'CPython', 'platform_release': '4.18.0-147.8.1.el8_1.x86_64', 'platform_system': 'Linux', 'platform_version': '#1 SMP Wed Feb 26 03:08:15 UTC 2020', 'python_full_version': '3.7.7', 'python_version': '3.7', 'sys_platform': 'linux'}",1.6.0,"Command exited with non-zero status code (1): ERROR: Command errored out with exit status 1:\n command: /home/solver/venv/bin/python3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '""'""'/tmp/pip-install-e3nb_lq9/pomegranate/setup.py'""'""'; __file__='""'""'/tmp/pip-install-e3nb_lq9/pomegranate/setup.py'""'""';f=getattr(tokenize, '""'""'open'""'""', open)(__file__);code=f.read().replace('""'""'\r\n'""'""', '""'""'\n'""'""');f.close();exec(compile(code, __file__, '""'""'exec'""'""'))' bdist_wheel -d /tmp/pip-wheel-otmm0tho\n cwd: /tmp/pip-install-e3nb_lq9/pomegranate/\n Complete output (54 lines):\n running bdist_wheel\n running build\n running build_py\n creating build\n creating build/lib.linux-x86_64-3.7\n creating build/lib.linux-x86_64-3.7/pomegranate\n copying pomegranate/__init__.py -> build/lib.linux-x86_64-3.7/pomegranate\n running egg_info\n writing pomegranate.egg-info/PKG-INFO\n writing dependency_links to pomegranate.egg-info/dependency_links.txt\n writing requirements to pomegranate.egg-info/requires.txt\n writing top-level names to pomegranate.egg-info/top_level.txt\n reading manifest file 'pomegranate.egg-info/SOURCES.txt'\n reading manifest template 'MANIFEST.in'\n writing manifest file 'pomegranate.egg-info/SOURCES.txt'\n copying pomegranate/BayesClassifier.c -> 
build/lib.linux-x86_64-3.7/pomegranate\n copying pomegranate/BayesClassifier.pyx -> build/lib.linux-x86_64-3.7/pomegranate\n copying pomegranate/BayesianNetwork.c -> build/lib.linux-x86_64-3.7/pomegranate\n copying pomegranate/BayesianNetwork.pyx -> build/lib.linux-x86_64-3.7/pomegranate\n copying pomegranate/FactorGraph.c -> build/lib.linux-x86_64-3.7/pomegranate\n copying pomegranate/FactorGraph.pyx -> build/lib.linux-x86_64-3.7/pomegranate\n copying pomegranate/MarkovChain.c -> build/lib.linux-x86_64-3.7/pomegranate\n copying pomegranate/MarkovChain.pyx -> build/lib.linux-x86_64-3.7/pomegranate\n copying pomegranate/NaiveBayes.c -> build/lib.linux-x86_64-3.7/pomegranate\n copying pomegranate/NaiveBayes.pyx -> build/lib.linux-x86_64-3.7/pomegranate\n copying pomegranate/__init__.pyc -> build/lib.linux-x86_64-3.7/pomegranate\n copying pomegranate/base.c -> build/lib.linux-x86_64-3.7/pomegranate\n copying pomegranate/base.pxd -> build/lib.linux-x86_64-3.7/pomegranate\n copying pomegranate/base.pyx -> build/lib.linux-x86_64-3.7/pomegranate\n copying pomegranate/bayes.c -> build/lib.linux-x86_64-3.7/pomegranate\n copying pomegranate/bayes.pxd -> build/lib.linux-x86_64-3.7/pomegranate\n copying pomegranate/bayes.pyx -> build/lib.linux-x86_64-3.7/pomegranate\n copying pomegranate/distributions.c -> build/lib.linux-x86_64-3.7/pomegranate\n copying pomegranate/distributions.pxd -> build/lib.linux-x86_64-3.7/pomegranate\n copying pomegranate/distributions.pyx -> build/lib.linux-x86_64-3.7/pomegranate\n copying pomegranate/gmm.c -> build/lib.linux-x86_64-3.7/pomegranate\n copying pomegranate/gmm.pyx -> build/lib.linux-x86_64-3.7/pomegranate\n copying pomegranate/hmm.c -> build/lib.linux-x86_64-3.7/pomegranate\n copying pomegranate/hmm.pyx -> build/lib.linux-x86_64-3.7/pomegranate\n copying pomegranate/kmeans.c -> build/lib.linux-x86_64-3.7/pomegranate\n copying pomegranate/kmeans.pyx -> build/lib.linux-x86_64-3.7/pomegranate\n copying pomegranate/parallel.c -> 
build/lib.linux-x86_64-3.7/pomegranate\n copying pomegranate/parallel.pyx -> build/lib.linux-x86_64-3.7/pomegranate\n copying pomegranate/utils.c -> build/lib.linux-x86_64-3.7/pomegranate\n copying pomegranate/utils.h -> build/lib.linux-x86_64-3.7/pomegranate\n copying pomegranate/utils.pxd -> build/lib.linux-x86_64-3.7/pomegranate\n copying pomegranate/utils.pyx -> build/lib.linux-x86_64-3.7/pomegranate\n running build_ext\n building 'pomegranate.base' extension\n creating build/temp.linux-x86_64-3.7\n creating build/temp.linux-x86_64-3.7/pomegranate\n gcc -pthread -Wno-unused-result -Wsign-compare -DDYNAMIC_ANNOTATIONS_ENABLED=1 -DNDEBUG -O2 -g -pipe ...",...,['ERROR: Failed building wheel for pomegranate'],,"[""unable to execute 'gcc': No such file or directory""]",unable to execute 'gcc': No such file or directory,"['unable', 'to', 'execute', 'gcc']","[""unable to execute 'gcc': No such file or directory""]",2,UnableToExecuteGccError,1,NO
4,4,solver-fedora-31-py37-0063867d,"/home/solver/venv/bin/python3 -m pip install --force-reinstall --no-cache-dir --no-deps catboost===0.6 --index-url ""https://pypi.org/simple"" --trusted-host pypi.org",catboost,0.6,solver-fedora-31-py37,2020-08-01T02:36:25.821970,"{'implementation_name': 'cpython', 'implementation_version': '3.7.7', 'os_name': 'posix', 'platform_machine': 'x86_64', 'platform_python_implementation': 'CPython', 'platform_release': '4.18.0-147.8.1.el8_1.x86_64', 'platform_system': 'Linux', 'platform_version': '#1 SMP Wed Feb 26 03:08:15 UTC 2020', 'python_full_version': '3.7.7', 'python_version': '3.7', 'sys_platform': 'linux'}",1.6.0,"Command exited with non-zero status code (1): ERROR: Could not find a version that satisfies the requirement catboost===0.6 (from versions: 0.1.1.2, 0.9.0a0, 0.9.0, 0.9, 0.9.1, 0.9.1.1, 0.10.0, 0.10.1, 0.10.2, 0.10.3, 0.10.4, 0.10.4.1, 0.11.0, 0.11.1, 0.11.2, 0.12.0, 0.12.1, 0.12.1.1, 0.12.2, 0.13, 0.13.1, 0.14.0, 0.14.1, 0.14.2, 0.15, 0.15.1, 0.15.2, 0.16, 0.16.1, 0.16.2, 0.16.3, 0.16.4, 0.16.5, 0.17, 0.17.1, 0.17.2, 0.17.3, 0.17.4, 0.17.5, 0.18, 0.18.1, 0.19.1, 0.20, 0.20.1, 0.20.2, 0.21, 0.22, 0.23, 0.23.1, 0.23.2)\nERROR: No matching distribution found for catboost===0.6\nWARNING: You are using pip version 20.1.1; however, version 20.2 is available.\nYou should consider upgrading via the '/home/solver/venv/bin/python3 -m pip install --upgrade pip' command.\n",...,['ERROR: No matching distribution found for catboost===0.6'],,,ERROR: No matching distribution found for catboost===0.6,"['No', 'matching', 'distribution', 'found']",['ERROR: No matching distribution found for catboost===0.6'],3,NoMatchingDistributionFoundError,1,NO
5,5,solver-fedora-31-py37-007b905b,"/home/solver/venv/bin/python3 -m pip install --force-reinstall --no-cache-dir --no-deps scrapy===0.9 --index-url ""https://pypi.org/simple"" --trusted-host pypi.org",scrapy,0.9,solver-fedora-31-py37,2020-08-01T02:36:23.186063,"{'implementation_name': 'cpython', 'implementation_version': '3.7.7', 'os_name': 'posix', 'platform_machine': 'x86_64', 'platform_python_implementation': 'CPython', 'platform_release': '4.18.0-147.8.1.el8_1.x86_64', 'platform_system': 'Linux', 'platform_version': '#1 SMP Wed Feb 26 03:08:15 UTC 2020', 'python_full_version': '3.7.7', 'python_version': '3.7', 'sys_platform': 'linux'}",1.6.0,"Command exited with non-zero status code (1): ERROR: Command errored out with exit status 1:\n command: /home/solver/venv/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '""'""'/tmp/pip-install-010ivlav/scrapy/setup.py'""'""'; __file__='""'""'/tmp/pip-install-010ivlav/scrapy/setup.py'""'""';f=getattr(tokenize, '""'""'open'""'""', open)(__file__);code=f.read().replace('""'""'\r\n'""'""', '""'""'\n'""'""');f.close();exec(compile(code, __file__, '""'""'exec'""'""'))' egg_info --egg-base /tmp/pip-pip-egg-info-52c_wd29\n cwd: /tmp/pip-install-010ivlav/scrapy/\n Complete output (11 lines):\n Traceback (most recent call last):\n File ""<string>"", line 1, in <module>\n File ""/tmp/pip-install-010ivlav/scrapy/setup.py"", line 65, in <module>\n data = [f for f in filenames if is_not_module(f)]\n File ""/tmp/pip-install-010ivlav/scrapy/setup.py"", line 65, in <listcomp>\n data = [f for f in filenames if is_not_module(f)]\n File ""/tmp/pip-install-010ivlav/scrapy/setup.py"", line 57, in is_not_module\n return os.path.splitext(f)[1] not in ['.py', '.pyc', '.pyo']\n File ""/usr/lib64/python3.7/posixpath.py"", line 122, in splitext\n p = os.fspath(p)\n TypeError: expected str, bytes or os.PathLike object, not _io.TextIOWrapper\n ----------------------------------------\nERROR: Command errored out with exit status 1: 
python setup.py egg_info Check the logs for full command output.\nWARNING: You are using pip version 20.1.1; however, version 20.2 is available.\nYou should consider upgrading via the '/home/solver/venv/bin/python3 -m pip install --upgrade pip' command.\n",...,['ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.'],,"['TypeError: expected str, bytes or os.PathLike object, not _io.TextIOWrapper']",TypeError,['TypeError'],"['TypeError: expected str, bytes or os.PathLike object, not _io.TextIOWrapper']",4,TypeError,1,YES
6,6,solver-fedora-31-py37-008441f2,,setuptools,8.0.1,solver-fedora-31-py37,2020-08-05T04:05:00.066809,"{'implementation_name': 'cpython', 'implementation_version': '3.7.7', 'os_name': 'posix', 'platform_machine': 'x86_64', 'platform_python_implementation': 'CPython', 'platform_release': '4.18.0-147.8.1.el8_1.x86_64', 'platform_system': 'Linux', 'platform_version': '#1 SMP Wed Feb 26 03:08:15 UTC 2020', 'python_full_version': '3.7.7', 'python_version': '3.7', 'sys_platform': 'linux'}",1.6.0,"Failed to successfully execute function in Python interpreter: Traceback (most recent call last):\n File ""<string>"", line 21, in <module>\n File ""<string>"", line 8, in _find_distribution_name\n File ""/home/solver/venv/lib/python3.7/site-packages/pkg_resources.py"", line 1596, in <module>\n register_loader_type(importlib_bootstrap.SourceFileLoader, DefaultProvider)\nAttributeError: module 'importlib._bootstrap' has no attribute 'SourceFileLoader'\n",...,,,"[""AttributeError: module 'importlib._bootstrap' has no attribute 'SourceFileLoader'""]",AttributeError,['AttributeError'],"[""AttributeError: module 'importlib._bootstrap' has no attribute 'SourceFileLoader'""]",5,AttributeError,1,YES
7,7,solver-fedora-31-py37-0084da4c,"/home/solver/venv/bin/python3 -m pip install --force-reinstall --no-cache-dir --no-deps statsmodels===0.6.1 --index-url ""https://pypi.org/simple"" --trusted-host pypi.org",statsmodels,0.6.1,solver-fedora-31-py37,2020-08-09T04:11:06.557921,"{'implementation_name': 'cpython', 'implementation_version': '3.7.7', 'os_name': 'posix', 'platform_machine': 'x86_64', 'platform_python_implementation': 'CPython', 'platform_release': '4.18.0-147.8.1.el8_1.x86_64', 'platform_system': 'Linux', 'platform_version': '#1 SMP Wed Feb 26 03:08:15 UTC 2020', 'python_full_version': '3.7.7', 'python_version': '3.7', 'sys_platform': 'linux'}",1.6.0,"Command exited with non-zero status code (1): ERROR: Command errored out with exit status 1:\n command: /home/solver/venv/bin/python3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '""'""'/tmp/pip-install-rfhg4q1r/statsmodels/setup.py'""'""'; __file__='""'""'/tmp/pip-install-rfhg4q1r/statsmodels/setup.py'""'""';f=getattr(tokenize, '""'""'open'""'""', open)(__file__);code=f.read().replace('""'""'\r\n'""'""', '""'""'\n'""'""');f.close();exec(compile(code, __file__, '""'""'exec'""'""'))' bdist_wheel -d /tmp/pip-wheel-qagwprnx\n cwd: /tmp/pip-install-rfhg4q1r/statsmodels/\n Complete output (824 lines):\n running bdist_wheel\n running build\n running build_py\n creating build\n creating build/lib.linux-x86_64-3.7\n creating build/lib.linux-x86_64-3.7/statsmodels\n copying statsmodels/info.py -> build/lib.linux-x86_64-3.7/statsmodels\n copying statsmodels/version.py -> build/lib.linux-x86_64-3.7/statsmodels\n copying statsmodels/api.py -> build/lib.linux-x86_64-3.7/statsmodels\n copying statsmodels/__init__.py -> build/lib.linux-x86_64-3.7/statsmodels\n creating build/lib.linux-x86_64-3.7/statsmodels/tsa\n copying statsmodels/tsa/arima_process.py -> build/lib.linux-x86_64-3.7/statsmodels/tsa\n copying statsmodels/tsa/varma_process.py -> build/lib.linux-x86_64-3.7/statsmodels/tsa\n copying 
statsmodels/tsa/arima_model.py -> build/lib.linux-x86_64-3.7/statsmodels/tsa\n copying statsmodels/tsa/seasonal.py -> build/lib.linux-x86_64-3.7/statsmodels/tsa\n copying statsmodels/tsa/ar_model.py -> build/lib.linux-x86_64-3.7/statsmodels/tsa\n copying statsmodels/tsa/adfvalues.py -> build/lib.linux-x86_64-3.7/statsmodels/tsa\n copying statsmodels/tsa/api.py -> build/lib.linux-x86_64-3.7/statsmodels/tsa\n copying statsmodels/tsa/descriptivestats.py -> build/lib.linux-x86_64-3.7/statsmodels/tsa\n copying statsmodels/tsa/mlemodel.py -> build/lib.linux-x86_64-3.7/statsmodels/tsa\n copying statsmodels/tsa/arma_mle.py -> build/lib.linux-x86_64-3.7/statsmodels/tsa\n copying statsmodels/tsa/x13.py -> build/lib.linux-x86_64-3.7/statsmodels/tsa\n copying statsmodels/tsa/tsatools.py -> build/lib.linux-x86_64-3.7/statsmodels/tsa\n copying statsmodels/tsa/stattools.py -> build/lib.linux-x86_64-3.7/statsmodels/tsa\n copying statsmodels/tsa/__init__.py -> build/lib.linux-x86_64-3.7/statsmodels/tsa\n creating build/lib.linux-x86_64-3.7/statsmodels/miscmodels\n copying statsmodels/miscmodels/tmodel.py -> build/lib.linux-x86_64-3.7/statsmodels/miscmodels\n copying statsmodels/miscmodels/nonlinls.py -> build/lib.linux-x86_64-3.7/statsmodels/miscmodels\n copying statsmodels/miscmodels/api.py -> build/lib.linux-x86_64-3.7/statsmodels/miscmodels\n copying statsmodels/miscmodels/count.py -> build/lib.linux-x86_64-3.7/statsmodels/miscmodels\n copying statsmodels/miscmodels/try_mlecov.py -> build/lib.linux-x86_64-3.7/statsmodels/miscmodels\n copying statsmodels/miscmodels/__init__.py -> build/lib.linux-x86_64-3.7/statsmodels/miscmodels\n creating build/lib.linux-x86_64-3.7/statsmodels/emplike\n copying statsmodels/emplike/elanova.py -> build/lib.linux-x86_64-3.7/statsmodels/emplike\n copying statsmodels/emplike/elregress.py -> build/lib.linux-x86_64-3.7/statsmodels/emplike\n copying statsmodels/emplike/api.py -> build/lib.linux-x86_64-3.7/statsmodels/emplike\n copying 
statsmodels/emplike/originregress.py -> build/lib.linux-x86_64-3.7/statsmodels/emplike\n copying statsmodels/emplike/descriptive.py -> build/lib.linux-x86_64-3.7/statsmodels/emplike\n copying statsmodels/emplike/__init__.py -> build/lib.linux-x86_64-3.7/statsmodels/emplike\n copying statsmodels/emplike/koul_and_mc.py -> build/lib.linux-x86_64-3.7/statsmodels/emplike\n copying statsmodels/emplike/aft_el.py -> build/lib.linux-x86_64-3.7/statsmodels/emplike\n creating build/lib.linux-x86_64-3.7/statsmodels/formula\n copying statsmodels/formula/api.py -> build/lib.linux-x86_64-3.7/statsmodels/formula\n copying statsmodels/formula/formulatools.py -> build/lib.linux-x86...",...,['ERROR: Failed building wheel for statsmodels'],,"[""unable to execute 'gcc': No such file or directory""]",unable to execute 'gcc': No such file or directory,"['unable', 'to', 'execute', 'gcc']","[""unable to execute 'gcc': No such file or directory""]",2,UnableToExecuteGccError,1,NO
8,8,solver-fedora-31-py37-00912d02,"/home/solver/venv/bin/python3 -m pip install --force-reinstall --no-cache-dir --no-deps horovod===0.14.1 --index-url ""https://pypi.org/simple"" --trusted-host pypi.org",horovod,0.14.1,solver-fedora-31-py37,2020-08-07T16:17:21.305951,"{'implementation_name': 'cpython', 'implementation_version': '3.7.7', 'os_name': 'posix', 'platform_machine': 'x86_64', 'platform_python_implementation': 'CPython', 'platform_release': '4.18.0-147.8.1.el8_1.x86_64', 'platform_system': 'Linux', 'platform_version': '#1 SMP Wed Feb 26 03:08:15 UTC 2020', 'python_full_version': '3.7.7', 'python_version': '3.7', 'sys_platform': 'linux'}",1.6.0,"Command exited with non-zero status code (1): ERROR: Command errored out with exit status 1:\n command: /home/solver/venv/bin/python3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '""'""'/tmp/pip-install-dwuvyc5h/horovod/setup.py'""'""'; __file__='""'""'/tmp/pip-install-dwuvyc5h/horovod/setup.py'""'""';f=getattr(tokenize, '""'""'open'""'""', open)(__file__);code=f.read().replace('""'""'\r\n'""'""', '""'""'\n'""'""');f.close();exec(compile(code, __file__, '""'""'exec'""'""'))' bdist_wheel -d /tmp/pip-wheel-6aw8om7_\n cwd: /tmp/pip-install-dwuvyc5h/horovod/\n Complete output (30 lines):\n WARNING: The directory '/.cache/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.\n WARNING: The directory '/.cache/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. 
If executing pip with sudo, you may want sudo's -H flag.\n running bdist_wheel\n running build\n running build_py\n creating build\n creating build/lib.linux-x86_64-3.7\n creating build/lib.linux-x86_64-3.7/horovod\n copying horovod/__init__.py -> build/lib.linux-x86_64-3.7/horovod\n creating build/lib.linux-x86_64-3.7/horovod/keras\n copying horovod/keras/__init__.py -> build/lib.linux-x86_64-3.7/horovod/keras\n copying horovod/keras/callbacks.py -> build/lib.linux-x86_64-3.7/horovod/keras\n creating build/lib.linux-x86_64-3.7/horovod/torch\n copying horovod/torch/mpi_ops.py -> build/lib.linux-x86_64-3.7/horovod/torch\n copying horovod/torch/__init__.py -> build/lib.linux-x86_64-3.7/horovod/torch\n creating build/lib.linux-x86_64-3.7/horovod/common\n copying horovod/common/__init__.py -> build/lib.linux-x86_64-3.7/horovod/common\n creating build/lib.linux-x86_64-3.7/horovod/tensorflow\n copying horovod/tensorflow/mpi_ops.py -> build/lib.linux-x86_64-3.7/horovod/tensorflow\n copying horovod/tensorflow/__init__.py -> build/lib.linux-x86_64-3.7/horovod/tensorflow\n creating build/lib.linux-x86_64-3.7/horovod/torch/mpi_lib\n copying horovod/torch/mpi_lib/__init__.py -> build/lib.linux-x86_64-3.7/horovod/torch/mpi_lib\n creating build/lib.linux-x86_64-3.7/horovod/torch/mpi_lib_impl\n copying horovod/torch/mpi_lib_impl/__init__.py -> build/lib.linux-x86_64-3.7/horovod/torch/mpi_lib_impl\n running build_ext\n gcc -pthread -Wno-unused-result -Wsign-compare -DDYNAMIC_ANNOTATIONS_ENABLED=1 -DNDEBUG -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -D_GNU_SOURCE -fPIC -fwrapv -fPIC -std=c++11 -fPIC -O2 -I/home/solver/venv/include -I/usr/include/python3.7m -c build/temp.linux-x86_64-3.7/test_compile/test_cpp_flags.cc -o build/temp.linux-x86_64-3.7/test_compile/test_cpp_flags.o\n unable 
to execute 'gcc': No such file or directory\n gcc -pthread -Wno-unused-result -Wsign-compare -DDYNAMIC_ANNOTATIONS_ENABLED=1 -DNDEBUG -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -D_GNU_SOURCE -fPIC -fwrapv -fPIC -std=c++11 -fPIC -O2 -stdlib=libc++ -I/home/solver/venv/include -I/usr/include/python3.7m -c build/temp.linux-x86_64-3.7/test_compile/test_cpp_flags.cc -o build/temp.linux-x86_64-3.7/test_compile/test_cpp_flags.o\n unable to execute 'gcc': No such file or directory\n error: Unable to determine C++ compilation flags (see error above).\n ----------------------------------------\n ERROR: Failed building wheel for horovod\n ERROR: Command errored out with exit status 1:\n command: /home/solver/venv/bin/python...",...,['ERROR: Failed building wheel for horovod'],,"[""unable to execute 'gcc': No such file or directory""]",unable to execute 'gcc': No such file or directory,"['unable', 'to', 'execute', 'gcc']","[""unable to execute 'gcc': No such file or directory""]",2,UnableToExecuteGccError,1,NO
9,9,solver-fedora-31-py37-0095681b,,setuptools,19.7,solver-fedora-31-py37,2020-07-31T14:24:01.548120,"{'implementation_name': 'cpython', 'implementation_version': '3.7.7', 'os_name': 'posix', 'platform_machine': 'x86_64', 'platform_python_implementation': 'CPython', 'platform_release': '4.18.0-147.8.1.el8_1.x86_64', 'platform_system': 'Linux', 'platform_version': '#1 SMP Wed Feb 26 03:08:15 UTC 2020', 'python_full_version': '3.7.7', 'python_version': '3.7', 'sys_platform': 'linux'}",1.6.0,"Failed to successfully execute function in Python interpreter: Traceback (most recent call last):\n File ""<string>"", line 21, in <module>\n File ""<string>"", line 9, in _find_distribution_name\nModuleNotFoundError: No module named 'pkg_resources._vendor.packaging.utils'\n",...,,,"[""ModuleNotFoundError: No module named 'pkg_resources._vendor.packaging.utils'""]",ModuleNotFoundError,['ModuleNotFoundError'],"[""ModuleNotFoundError: No module named 'pkg_resources._vendor.packaging.utils'""]",6,ModuleNotFoundError,1,YES


In [96]:
from io import StringIO
import datetime

import boto3

# Date stamp used in the object key, e.g. solver-error-context-2020-08-10.csv
DATETIME = datetime.datetime.utcnow()
DATE = DATETIME.strftime('%Y-%m-%d')

# Columns needed only for preprocessing/clustering; they are not stored.
COLUMNS_TO_DROP = ['index', 'command', 'environment', 'message', 'split_message', 'Error_info',
                   'command_info', 'cwd', 'Complete_output', 'ERROR', 'Exception',
                   'specific_error', 'tokenized_clustering_data', 'context_message']


def store_csv_to_ceph(error_df):
    """Store the clustered solver errors as a CSV file in Ceph (S3-compatible storage).

    The same CSV is uploaded to the `thoth` and `DH-PLAYPEN` buckets.
    THOTH_S3_ENDPOINT_URL, THOTH_CEPH_KEY_ID and THOTH_CEPH_SECRET_KEY must be
    defined before this function is called.
    """
    csv_buffer = StringIO()
    error_df = error_df.drop(columns=COLUMNS_TO_DROP)
    # The CSV is written without a header row and without the DataFrame index.
    error_df.to_csv(csv_buffer, header=False, index=False)

    # Both buckets live behind the same endpoint, so one resource is enough.
    s3_resource = boto3.resource('s3',
                                 endpoint_url=THOTH_S3_ENDPOINT_URL,
                                 aws_access_key_id=THOTH_CEPH_KEY_ID,
                                 aws_secret_access_key=THOTH_CEPH_SECRET_KEY)
    key = f'data/ocp-stage/solver-error-context/solver-error-context-{DATE}.csv'
    for bucket in ('thoth', 'DH-PLAYPEN'):
        s3_resource.Object(bucket, key).put(Body=csv_buffer.getvalue())

In [97]:
store_csv_to_ceph(error_df)
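`store_csv_to_ceph` serializes one CSV payload and uploads it to two buckets under a date-stamped key. To check the key naming and the "same payload to every bucket" logic without Ceph credentials, the upload step can be put behind a callable. The following is a minimal sketch under that assumption; `solver_error_key` and `upload_to_buckets` are hypothetical helpers, not part of the notebook.

```python
import datetime
from typing import Callable, Iterable, List


def solver_error_key(date: datetime.date) -> str:
    """Object key following the layout used by store_csv_to_ceph."""
    return f"data/ocp-stage/solver-error-context/solver-error-context-{date:%Y-%m-%d}.csv"


def upload_to_buckets(body: str, buckets: Iterable[str], key: str,
                      put: Callable[[str, str, str], object]) -> List[str]:
    """Send the same payload to every bucket through the injected `put` callable."""
    uploaded = []
    for bucket in buckets:
        put(bucket, key, body)
        uploaded.append(bucket)
    return uploaded


# Local check with a fake `put` that only records its arguments (no network access).
calls = []
uploaded = upload_to_buckets(
    "pkg,1.0,solver\n",
    ["thoth", "DH-PLAYPEN"],
    solver_error_key(datetime.date(2020, 8, 10)),
    lambda bucket, key, body: calls.append((bucket, key, body)),
)
print(uploaded)  # ['thoth', 'DH-PLAYPEN']
```

With boto3, `put` could be `lambda b, k, body: s3_resource.Object(b, k).put(Body=body)`, the same call `store_csv_to_ceph` makes, so only the thin wrapper needs real credentials.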

## View data from each cluster <a id='view_data'></a>

In [38]:
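def get_data_from_cluster(df_processed, clusters, cluster_number):
    """Return the rows of `df_processed` labelled `cluster_number` and print their count.

    `clusters` holds one cluster label per row of `df_processed`, in the same order.
    """
    indices = [i for i, x in enumerate(clusters) if x == cluster_number]
    df_grouped = df_processed.iloc[indices]
    print(len(df_grouped))
    return df_grouped

def split_log(log_messages):
    """Split a raw log message into a list of lines."""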
    log_messages = log_messages.split('\n')
    return log_messages
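`get_data_from_cluster` rescans the whole label list on every call, so looping over all clusters (as the package search below does) costs one full pass per cluster. A minimal standard-library sketch of the alternative, assuming the labels are in the same row order as the DataFrame: group all row positions by label in a single pass, then look up each cluster. `group_positions_by_label` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def group_positions_by_label(labels: Sequence[Hashable]) -> Dict[Hashable, List[int]]:
    """Map each cluster label to the row positions that carry it (single pass)."""
    positions: Dict[Hashable, List[int]] = defaultdict(list)
    for position, label in enumerate(labels):
        positions[label].append(position)
    return dict(positions)


# Toy DBSCAN-style labels, where -1 marks noise points.
labels = [0, 1, 0, -1, 1, 0]
by_label = group_positions_by_label(labels)
print(by_label)  # {0: [0, 2, 5], 1: [1, 4], -1: [3]}
```

In the notebook, the rows of one cluster would then be `error_df.iloc[by_label.get(cluster_number, [])]`, which selects the same rows as `get_data_from_cluster` without rescanning the labels.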

### Check packages with errors in a specific cluster <a id='c1'></a>

In [52]:
cluster_ = 0

get_data_from_cluster(error_df, cluster_labels, cluster_)[['package_name', 'package_version', 'solver','message', 
                                                    'specific_error', 'CLASS', 'MachineDefinedError?']]

1345


Unnamed: 0,package_name,package_version,solver,message,specific_error,CLASS,MachineDefinedError?
0,Cython,0.23.4,solver-fedora-31-py37,"Command exited with non-zero status code (1): ERROR: Command errored out with exit status 1:\n command: /home/solver/venv/bin/python3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '""'""'/tmp/pip-install-bnnj141r/Cython/setup.py'""'""'; __file__='""'""'/tmp/pip-install-bnnj141r/Cython/setup.py'""'""';f=getattr(tokenize, '""'""'open'""'""', open)(__file__);code=f.read().replace('""'""'\r\n'""'""', '""'""'\n'""'""');f.close();exec(compile(code, __file__, '""'""'exec'""'""'))' bdist_wheel -d /tmp/pip-wheel-9eme38jp\n cwd: /tmp/pip-install-bnnj141r/Cython/\n Complete output (343 lines):\n Unable to find pgen, not compiling formal grammar.\n running bdist_wheel\n running build\n running build_py\n creating build\n creating build/lib.linux-x86_64-3.7\n copying cython.py -> build/lib.linux-x86_64-3.7\n creating build/lib.linux-x86_64-3.7/Cython\n copying Cython/__init__.py -> build/lib.linux-x86_64-3.7/Cython\n copying Cython/Utils.py -> build/lib.linux-x86_64-3.7/Cython\n copying Cython/TestUtils.py -> build/lib.linux-x86_64-3.7/Cython\n copying Cython/StringIOTree.py -> build/lib.linux-x86_64-3.7/Cython\n copying Cython/Shadow.py -> build/lib.linux-x86_64-3.7/Cython\n copying Cython/Debugging.py -> build/lib.linux-x86_64-3.7/Cython\n copying Cython/Coverage.py -> build/lib.linux-x86_64-3.7/Cython\n copying Cython/CodeWriter.py -> build/lib.linux-x86_64-3.7/Cython\n creating build/lib.linux-x86_64-3.7/Cython/Build\n copying Cython/Build/__init__.py -> build/lib.linux-x86_64-3.7/Cython/Build\n copying Cython/Build/IpythonMagic.py -> build/lib.linux-x86_64-3.7/Cython/Build\n copying Cython/Build/Inline.py -> build/lib.linux-x86_64-3.7/Cython/Build\n copying Cython/Build/Dependencies.py -> build/lib.linux-x86_64-3.7/Cython/Build\n copying Cython/Build/Cythonize.py -> build/lib.linux-x86_64-3.7/Cython/Build\n copying Cython/Build/BuildExecutable.py -> build/lib.linux-x86_64-3.7/Cython/Build\n creating 
build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/__init__.py -> build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/Visitor.py -> build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/Version.py -> build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/UtilityCode.py -> build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/UtilNodes.py -> build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/TypeSlots.py -> build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/TypeInference.py -> build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/TreePath.py -> build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/TreeFragment.py -> build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/Symtab.py -> build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/StringEncoding.py -> build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/Scanning.py -> build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/PyrexTypes.py -> build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/Pipeline.py -> build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/Parsing.py -> build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/ParseTreeTransforms.py -> build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/Options.py -> build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/Optimize.py -> build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/Nodes.py -> build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/Naming.py -> build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/ModuleNode.py -> build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/MemoryView.py -> build/lib.linux-x86_64-3.7/Cython/Compiler\n copying Cython/Compiler/Main.py -> build/lib.linux-x86_64-3.7/Cython/Compiler\n 
copying Cython/Compiler/Lexi...","['SyntaxError: invalid syntax', 'SyntaxError: invalid syntax']",SyntaxError,YES
1,word2vec,0.7.1,solver-fedora-31-py37,"Command exited with non-zero status code (1): ERROR: Command errored out with exit status 1:\n command: /home/solver/venv/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '""'""'/tmp/pip-install-zim3n1sc/word2vec/setup.py'""'""'; __file__='""'""'/tmp/pip-install-zim3n1sc/word2vec/setup.py'""'""';f=getattr(tokenize, '""'""'open'""'""', open)(__file__);code=f.read().replace('""'""'\r\n'""'""', '""'""'\n'""'""');f.close();exec(compile(code, __file__, '""'""'exec'""'""'))' egg_info --egg-base /tmp/pip-pip-egg-info-hv7fcspg\n cwd: /tmp/pip-install-zim3n1sc/word2vec/\n Complete output (6 lines):\n Traceback (most recent call last):\n File ""<string>"", line 1, in <module>\n File ""/tmp/pip-install-zim3n1sc/word2vec/setup.py"", line 29\n print ' '.join(command)\n ^\n SyntaxError: invalid syntax\n ----------------------------------------\nERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.\nWARNING: You are using pip version 20.1.1; however, version 20.2.2 is available.\nYou should consider upgrading via the '/home/solver/venv/bin/python3 -m pip install --upgrade pip' command.\n",['SyntaxError: invalid syntax'],SyntaxError,YES
17,sqlalchemy,0.3.4,solver-fedora-31-py37,"Command exited with non-zero status code (1): ERROR: Command errored out with exit status 1:\n command: /home/solver/venv/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '""'""'/tmp/pip-install-4ckdmny1/sqlalchemy/setup.py'""'""'; __file__='""'""'/tmp/pip-install-4ckdmny1/sqlalchemy/setup.py'""'""';f=getattr(tokenize, '""'""'open'""'""', open)(__file__);code=f.read().replace('""'""'\r\n'""'""', '""'""'\n'""'""');f.close();exec(compile(code, __file__, '""'""'exec'""'""'))' egg_info --egg-base /tmp/pip-pip-egg-info-zpu9szq_\n cwd: /tmp/pip-install-4ckdmny1/sqlalchemy/\n Complete output (8 lines):\n Traceback (most recent call last):\n File ""<string>"", line 1, in <module>\n File ""/tmp/pip-install-4ckdmny1/sqlalchemy/setup.py"", line 1, in <module>\n from ez_setup import use_setuptools\n File ""/tmp/pip-install-4ckdmny1/sqlalchemy/ez_setup.py"", line 85\n except pkg_resources.VersionConflict, e:\n ^\n SyntaxError: invalid syntax\n ----------------------------------------\nERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.\nWARNING: You are using pip version 20.1.1; however, version 20.2.2 is available.\nYou should consider upgrading via the '/home/solver/venv/bin/python3 -m pip install --upgrade pip' command.\n",['SyntaxError: invalid syntax'],SyntaxError,YES
20,roundup,1.6.0,solver-fedora-31-py37,"Command exited with non-zero status code (1): ERROR: Command errored out with exit status 1:\n command: /home/solver/venv/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '""'""'/tmp/pip-install-lj520vir/roundup/setup.py'""'""'; __file__='""'""'/tmp/pip-install-lj520vir/roundup/setup.py'""'""';f=getattr(tokenize, '""'""'open'""'""', open)(__file__);code=f.read().replace('""'""'\r\n'""'""', '""'""'\n'""'""');f.close();exec(compile(code, __file__, '""'""'exec'""'""'))' egg_info --egg-base /tmp/pip-pip-egg-info-nfhsqavh\n cwd: /tmp/pip-install-lj520vir/roundup/\n Complete output (8 lines):\n Traceback (most recent call last):\n File ""<string>"", line 1, in <module>\n File ""/tmp/pip-install-lj520vir/roundup/setup.py"", line 23, in <module>\n from roundup.dist.command.build_scripts import build_scripts\n File ""/tmp/pip-install-lj520vir/roundup/roundup/dist/command/build_scripts.py"", line 143\n os.chmod(outfile, 0755)\n ^\n SyntaxError: invalid token\n ----------------------------------------\nERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.\nWARNING: You are using pip version 20.1.1; however, version 20.2.2 is available.\nYou should consider upgrading via the '/home/solver/venv/bin/python3 -m pip install --upgrade pip' command.\n",['SyntaxError: invalid token'],SyntaxError,YES
26,Flask,0.7.1,solver-fedora-31-py37,"Command exited with non-zero status code (1): ERROR: Command errored out with exit status 1:\n command: /home/solver/venv/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '""'""'/tmp/pip-install-ugbir2be/Flask/setup.py'""'""'; __file__='""'""'/tmp/pip-install-ugbir2be/Flask/setup.py'""'""';f=getattr(tokenize, '""'""'open'""'""', open)(__file__);code=f.read().replace('""'""'\r\n'""'""', '""'""'\n'""'""');f.close();exec(compile(code, __file__, '""'""'exec'""'""'))' egg_info --egg-base /tmp/pip-pip-egg-info-875u37sp\n cwd: /tmp/pip-install-ugbir2be/Flask/\n Complete output (6 lines):\n Traceback (most recent call last):\n File ""<string>"", line 1, in <module>\n File ""/tmp/pip-install-ugbir2be/Flask/setup.py"", line 62\n print ""Audit requires PyFlakes installed in your system.""""""\n ^\n SyntaxError: Missing parentheses in call to 'print'. Did you mean print(""Audit requires PyFlakes installed in your system."""""")?\n ----------------------------------------\nERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.\nWARNING: You are using pip version 20.1.1; however, version 20.2.1 is available.\nYou should consider upgrading via the '/home/solver/venv/bin/python3 -m pip install --upgrade pip' command.\n","['SyntaxError: Missing parentheses in call to \'print\'. Did you mean print(""Audit requires PyFlakes installed in your system."""""")?']",SyntaxError,YES
...,...,...,...,...,...,...,...
19183,numpy,1.3.0,solver-rhel-8-py36,"Command exited with non-zero status code (1): ERROR: Command errored out with exit status 1:\n command: /home/solver/venv/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '""'""'/tmp/pip-install-niz7k8zb/numpy/setup.py'""'""'; __file__='""'""'/tmp/pip-install-niz7k8zb/numpy/setup.py'""'""';f=getattr(tokenize, '""'""'open'""'""', open)(__file__);code=f.read().replace('""'""'\r\n'""'""', '""'""'\n'""'""');f.close();exec(compile(code, __file__, '""'""'exec'""'""'))' egg_info --egg-base /tmp/pip-pip-egg-info-7wc0erlq\n cwd: /tmp/pip-install-niz7k8zb/numpy/\n Complete output (6 lines):\n Traceback (most recent call last):\n File ""<string>"", line 1, in <module>\n File ""/tmp/pip-install-niz7k8zb/numpy/setup.py"", line 62\n print "" --- Could not run svn info --- ""\n ^\n SyntaxError: Missing parentheses in call to 'print'. Did you mean print("" --- Could not run svn info --- "")?\n ----------------------------------------\nERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.\nWARNING: You are using pip version 20.2; however, version 20.2.1 is available.\nYou should consider upgrading via the '/home/solver/venv/bin/python3 -m pip install --upgrade pip' command.\n","['SyntaxError: Missing parentheses in call to \'print\'. Did you mean print("" --- Could not run svn info --- "")?']",SyntaxError,YES
19201,sqlalchemy,0.2.8,solver-rhel-8-py36,"Command exited with non-zero status code (1): ERROR: Command errored out with exit status 1:\n command: /home/solver/venv/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '""'""'/tmp/pip-install-au6ulmyp/sqlalchemy/setup.py'""'""'; __file__='""'""'/tmp/pip-install-au6ulmyp/sqlalchemy/setup.py'""'""';f=getattr(tokenize, '""'""'open'""'""', open)(__file__);code=f.read().replace('""'""'\r\n'""'""', '""'""'\n'""'""');f.close();exec(compile(code, __file__, '""'""'exec'""'""'))' egg_info --egg-base /tmp/pip-pip-egg-info-hfvd3l_7\n cwd: /tmp/pip-install-au6ulmyp/sqlalchemy/\n Complete output (8 lines):\n Traceback (most recent call last):\n File ""<string>"", line 1, in <module>\n File ""/tmp/pip-install-au6ulmyp/sqlalchemy/setup.py"", line 1, in <module>\n from ez_setup import use_setuptools\n File ""/tmp/pip-install-au6ulmyp/sqlalchemy/ez_setup.py"", line 172\n print ""Setuptools version"",version,""or greater has been installed.""\n ^\n SyntaxError: Missing parentheses in call to 'print'. Did you mean print(""Setuptools version"",version,""or greater has been installed."")?\n ----------------------------------------\nERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.\nWARNING: You are using pip version 20.2; however, version 20.2.2 is available.\nYou should consider upgrading via the '/home/solver/venv/bin/python3 -m pip install --upgrade pip' command.\n","['SyntaxError: Missing parentheses in call to \'print\'. Did you mean print(""Setuptools version"",version,""or greater has been installed."")?']",SyntaxError,YES
19204,networkx,0.34,solver-rhel-8-py36,"Command exited with non-zero status code (1): ERROR: Command errored out with exit status 1:\n command: /home/solver/venv/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '""'""'/tmp/pip-install-cfgu2z_t/networkx/setup.py'""'""'; __file__='""'""'/tmp/pip-install-cfgu2z_t/networkx/setup.py'""'""';f=getattr(tokenize, '""'""'open'""'""', open)(__file__);code=f.read().replace('""'""'\r\n'""'""', '""'""'\n'""'""');f.close();exec(compile(code, __file__, '""'""'exec'""'""'))' egg_info --egg-base /tmp/pip-pip-egg-info-7371y6fk\n cwd: /tmp/pip-install-cfgu2z_t/networkx/\n Complete output (6 lines):\n Traceback (most recent call last):\n File ""<string>"", line 1, in <module>\n File ""/tmp/pip-install-cfgu2z_t/networkx/setup.py"", line 16\n print ""To install, run 'python setup.py install'""\n ^\n SyntaxError: Missing parentheses in call to 'print'. Did you mean print(""To install, run 'python setup.py install'"")?\n ----------------------------------------\nERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.\nWARNING: You are using pip version 20.2; however, version 20.2.1 is available.\nYou should consider upgrading via the '/home/solver/venv/bin/python3 -m pip install --upgrade pip' command.\n","['SyntaxError: Missing parentheses in call to \'print\'. Did you mean print(""To install, run \'python setup.py install\'"")?']",SyntaxError,YES
19216,sqlalchemy,0.3.2,solver-rhel-8-py36,"Command exited with non-zero status code (1): ERROR: Command errored out with exit status 1:\n command: /home/solver/venv/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '""'""'/tmp/pip-install-w18zep_z/sqlalchemy/setup.py'""'""'; __file__='""'""'/tmp/pip-install-w18zep_z/sqlalchemy/setup.py'""'""';f=getattr(tokenize, '""'""'open'""'""', open)(__file__);code=f.read().replace('""'""'\r\n'""'""', '""'""'\n'""'""');f.close();exec(compile(code, __file__, '""'""'exec'""'""'))' egg_info --egg-base /tmp/pip-pip-egg-info-grlr5ggh\n cwd: /tmp/pip-install-w18zep_z/sqlalchemy/\n Complete output (8 lines):\n Traceback (most recent call last):\n File ""<string>"", line 1, in <module>\n File ""/tmp/pip-install-w18zep_z/sqlalchemy/setup.py"", line 1, in <module>\n from ez_setup import use_setuptools\n File ""/tmp/pip-install-w18zep_z/sqlalchemy/ez_setup.py"", line 85\n except pkg_resources.VersionConflict, e:\n ^\n SyntaxError: invalid syntax\n ----------------------------------------\nERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.\nWARNING: You are using pip version 20.2; however, version 20.2.2 is available.\nYou should consider upgrading via the '/home/solver/venv/bin/python3 -m pip install --upgrade pip' command.\n",['SyntaxError: invalid syntax'],SyntaxError,YES
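The `specific_error` column above holds the exception line(s) found in each raw `message`, e.g. `SyntaxError: invalid syntax`. That column is produced by the PreprocessSolverErrorData notebook and its exact method is not shown here. The following is only a hypothetical regex-based sketch of the idea, assuming exception lines look like `<Name>Error: <text>` or `<Name>Exception: <text>` once the message is split into lines (as `split_log` does).

```python
import re
from typing import List

# Assumed format: optional indentation, a capitalised name ending in
# "Error" or "Exception", a colon and a space, then the description.
EXCEPTION_LINE = re.compile(r"^\s*([A-Z]\w*(?:Error|Exception)): (.*)$")


def extract_exception_lines(message: str) -> List[str]:
    """Return every line of `message` that looks like a Python exception line."""
    return [line.strip() for line in message.split("\n") if EXCEPTION_LINE.match(line)]


sample = (
    "Traceback (most recent call last):\n"
    '  File "/tmp/pip-install-xyz/word2vec/setup.py", line 29\n'
    "    print ' '.join(command)\n"
    "          ^\n"
    "  SyntaxError: invalid syntax\n"
    "ERROR: Command errored out with exit status 1: python setup.py egg_info\n"
)
print(extract_exception_lines(sample))  # ['SyntaxError: invalid syntax']
```

Because the pattern is case-sensitive, pip's own `ERROR: ...` lines are not picked up, while names such as `ModuleNotFoundError` or `AttributeError` are.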


### Check errors for a specific package <a id='c0'></a>

In [51]:
package_name = "fbprophet"

# For every cluster, print its number and its size (printed by get_data_from_cluster),
# then the number of records for `package_name`, if there are any.
for cluster_number in stat_df['cluster_name'].values:
    print(cluster_number)
    cluster_df = get_data_from_cluster(error_df, cluster_labels, cluster_number)[['package_name', 'package_version', 'solver', 'message',
                                                        'specific_error', 'CLASS', 'MachineDefinedError?']]

    records = cluster_df[cluster_df['package_name'] == package_name].shape[0]

    if records > 0:
        print("records", records)

6
0
3
0
0
0
2
0
5
0
1
0
7
0
9
0
12
0
14
0
4
0
11
0
13
0
21
0
8
0
19
0
20
0
16
0
10
0
24
0
18
0
17
0
38
0
31
0
34
0
32
0
30
0
29
0
25
0
27
0
26
0
23
0
15
0
36
0
28
0
33
0
22
0
37
0
35
0
39
0
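The loop above builds a DataFrame subset for every cluster just to count one package's records. A one-pass alternative, sketched here with the standard library on toy data (the function name is hypothetical), pairs each cluster label with the package name of the same row and counts the matches:

```python
from collections import Counter
from typing import Hashable, Sequence


def package_counts_per_cluster(labels: Sequence[Hashable],
                               package_names: Sequence[str],
                               package: str) -> Counter:
    """Count how many error records of `package` fall into each cluster label."""
    return Counter(
        label for label, name in zip(labels, package_names) if name == package
    )


# Toy data: one cluster label and one package name per error record.
labels = [0, 2, 0, 1, -1]
packages = ["fbprophet", "numpy", "fbprophet", "fbprophet", "numpy"]
counts = package_counts_per_cluster(labels, packages, "fbprophet")
print(dict(counts))  # {0: 2, 1: 1}
```

Assuming `cluster_labels` is aligned with the rows of `error_df`, the notebook equivalent would be `package_counts_per_cluster(cluster_labels, error_df['package_name'], package_name)`. Clusters without the package simply do not appear in the result.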


### Check error types in a specific cluster <a id='c0'></a>

In [109]:
number_cluster = 0

cluster_df = get_data_from_cluster(error_df, cluster_labels, number_cluster)

# Print every distinct error in the cluster (sorted alphabetically) together with
# its number of occurrences in the whole `error_df`, not only in this cluster.
for un in cluster_df.sort_values(by='specific_error')['specific_error'].unique():
    print(un)
    print("Count: ", error_df[error_df['specific_error'] == un].shape[0])

1345
["SyntaxError: Missing parentheses in call to 'exec'"]
Count:  52
["SyntaxError: Missing parentheses in call to 'print'. Did you mean print('Created:', bat_path)?"]
Count:  10
["SyntaxError: Missing parentheses in call to 'print'. Did you mean print('Unsupported operating system:',os.name)?"]
Count:  5
["SyntaxError: Missing parentheses in call to 'print'. Did you mean print('Windows users please use github installation.')?"]
Count:  47
["SyntaxError: Missing parentheses in call to 'print'. Did you mean print('adding pytz')?"]
Count:  5
["SyntaxError: Missing parentheses in call to 'print'. Did you mean print('found system libevent for', sys.platform)?"]
Count:  6
["SyntaxError: Missing parentheses in call to 'print'. Did you mean print('installing data to', datapath)?"]
Count:  12
["SyntaxError: Missing parentheses in call to 'print'. Did you mean print('send: GET CACHE %s'%uri)?"]
Count:  5
["SyntaxError: Missing parentheses in call to 'print'. Did you mean print(80 * '*')?"]
Co