# Performance of TensorFlow==2.1.0 Software Stack Combinations: Errors

This notebook shows the errors that occur in software stacks at run time when executing the `import` performance indicator (PI).

## Inputs for Dataset created using Amun

All data were gathered using the [Amun Service](https://github.com/thoth-station/amun-api) and the [Performance Indicators](https://github.com/thoth-station/performance) evaluated by the Thoth team.


## TensorFlow builds

TensorFlow builds were created from combinations of the following parameters:

**Software stacks and native dependencies**

The inspections cover combinations of the stacks that can be formed from the dependencies of TensorFlow 2.1.0.


  * `upstream TensorFlow` - `tensorflow==2.1.0` available on PyPI (inspections prefixed with `tf`)

**OS images**

  * `rhel-8` 

**Python Interpreters**

  * `3.6` 
  
**Hardware**

No node pinning was used; inspections ran on whatever hardware was available on OCP. No GPU was used.
The analysis across inspection runs shows which hardware was identified.

The `Number of CPUs` used for each run is selected a priori as an input to Amun: 1

## Performance indicators
Performance Indicators (PI) used for performance analysis:

  * [import](https://github.com/thoth-station/performance/blob/master/tensorflow/import.py)

Each performance indicator was run once per inspection run (`batch size == 1`). The median across inspection runs is reported for further comparison.
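
As a rough illustration of the median aggregation, the following sketch groups results by inspection and takes the median of each group using only the standard library. The inspection identifiers and elapsed times are made-up values, not results from this dataset, and `median_per_inspection` is a hypothetical helper rather than part of the Thoth libraries.

```python
from collections import defaultdict
from statistics import median


def median_per_inspection(runs):
    """Group (inspection_id, elapsed_ms) pairs and return the median per inspection."""
    grouped = defaultdict(list)
    for inspection_id, elapsed_ms in runs:
        grouped[inspection_id].append(elapsed_ms)
    return {inspection_id: median(values) for inspection_id, values in grouped.items()}


# Hypothetical measurements: (inspection identifier, elapsed time in milliseconds).
example_runs = [
    ("inspection-a", 2100.0),
    ("inspection-a", 2300.0),
    ("inspection-a", 2200.0),
    ("inspection-b", 1900.0),
]

medians = median_per_inspection(example_runs)
```

The median is less sensitive than the mean to a single slow run, which matters when the hardware is not pinned.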

## Dataset content

Inspection specification, build logs, job logs, hardware information of the node where the performance indicator was run and the actual inspection job result are included in the dataset.

No build-time errors were observed with the tested stacks.

Some runtime errors were observed with specific stacks.


## Analysis

The analysis shows exactly which package versions failed at run time, so that Thoth can discard the failing packages.
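
The idea of isolating failing versions can be sketched as a plain set difference: collect every (package, version) pair seen in failed stacks and subtract the pairs that also appear in at least one successful stack. This is only an assumption-laden illustration. The stacks below are invented, `versions_only_in_failed` is a hypothetical helper, and the sketch does not claim to reproduce how the Thoth report-processing library computes its comparison.

```python
def versions_only_in_failed(successful_stacks, failed_stacks):
    """Return (package, version) pairs seen in failed stacks but in no successful stack."""
    seen_ok = {pair for stack in successful_stacks for pair in stack.items()}
    seen_failed = {pair for stack in failed_stacks for pair in stack.items()}
    return sorted(seen_failed - seen_ok)


# Hypothetical locked stacks (package -> version); not taken from the dataset.
successful = [
    {"pkg-a": "1.0.0", "pkg-b": "2.0.0"},
    {"pkg-a": "1.1.0", "pkg-b": "2.0.0"},
]
failed = [
    {"pkg-a": "1.0.0", "pkg-b": "3.0.0"},
]

suspects = versions_only_in_failed(successful, failed)
```

A version that appears only in failed stacks is a candidate culprit, not a proven one: a failure can also come from an interaction between two versions that each work elsewhere.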

## Assign environment variables and import libraries

In [None]:
%env THOTH_CEPH_KEY_ID=LLEzCoxu7pvjzO4inoL8
%env THOTH_CEPH_SECRET_KEY=1HnDVoIS2jt3h3xEpgeQlCX5+FeOUH0wOrvWVvZP
%env THOTH_CEPH_BUCKET_PREFIX=thoth
%env THOTH_S3_ENDPOINT_URL=https://s3-openshift-storage.apps.smaug.na.operate-first.cloud
%env THOTH_CEPH_BUCKET=opf-datacatalog
%env THOTH_DEPLOYMENT_NAME=datasets
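
Because `%env` writes to the process environment, a quick check with `os.environ` can confirm that every setting is present before any data is retrieved. The sketch below is optional. `missing_variables` is a hypothetical helper, and the list of names is simply copied from the cell above.

```python
import os

# Variable names taken from the %env cell above.
REQUIRED_VARIABLES = (
    "THOTH_CEPH_KEY_ID",
    "THOTH_CEPH_SECRET_KEY",
    "THOTH_CEPH_BUCKET_PREFIX",
    "THOTH_S3_ENDPOINT_URL",
    "THOTH_CEPH_BUCKET",
    "THOTH_DEPLOYMENT_NAME",
)


def missing_variables(environ=None):
    """Return the required variable names that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARIABLES if not environ.get(name)]
```

Calling `missing_variables()` with no argument inspects the current environment. An empty list means all six settings are defined.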

In [None]:
from thoth.report_processing.components.inspection import AmunInspections
from thoth.report_processing.components.inspection import AmunInspectionsSummary
from thoth.report_processing.components.inspection import AmunInspectionsStatistics
from thoth.report_processing.components.inspection import AmunInspectionsFailedSummary

inspection = AmunInspections()
inspection_runs_summary = AmunInspectionsSummary()
inspection_statistics = AmunInspectionsStatistics()
inspection_failed_summary = AmunInspectionsFailedSummary()

import pandas as pd
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 1000)
pd.set_option('display.width', 1500)
pd.set_option('display.max_colwidth', 400)
pd.options.plotting.backend = "plotly"  # Use plotly instead of the default matplotlib backend

In [None]:
inspections_identifiers = ['tf-dm-six']  # List of identifiers for the analysis

## Retrieve and process data

In [None]:
inspection_runs = inspection.aggregate_thoth_inspections_results(
    inspections_identifiers=inspections_identifiers
)

In [None]:
processed_inspection_runs, failed_inspection_runs = inspection.process_inspection_runs(
    inspection_runs,
)

In [None]:
inspections_df = inspection.create_inspections_dataframe(
    processed_inspection_runs=processed_inspection_runs,
    include_statistics=True
)

In [None]:
inspections_df.head()

# Inspections summary report

In [None]:
report_results, _ = inspection_runs_summary.produce_summary_report(inspections_df=inspections_df)

## Hardware

In [None]:
report_results["hardware"]['platform'].head()

In [None]:
report_results["hardware"]['processor']

In [None]:
report_results["hardware"]['flags']

In [None]:
report_results["hardware"]['ncpus']

In [None]:
report_results["hardware"]['info']

## Operating System

In [None]:
report_results["base_image"]['base_image']

In [None]:
report_results["base_image"]['number_cpus_run']

## Performance Indicator

In [None]:
report_results["pi"]['pi']

## Software Stack

In [None]:
report_results["software_stack"]['requirements_locked'].head()

In [None]:
python_packages_dataframe, python_packages_versions = inspection.create_python_package_df(inspections_df=inspections_df)
python_packages_dataframe.head()

# Failed Inspection Summary

In [None]:
failed_inspections_df = inspection.create_inspections_dataframe(
    processed_inspection_runs=failed_inspection_runs,
)

In [None]:
failed_inspections_df.head()

In [None]:
comparison_df = inspection_failed_summary.show_software_stack_differences(
    inspections_df,
    failed_inspections_df
)
comparison_df

In [None]:
comparison_df[comparison_df['package'].isin(['six', 'urllib3'])]