# Pipeline units in a software stack resolution process

| Info | Data |
| ------:| -----------:|
| **Author** | Fridolin Pokorny <fridolin@redhat.com> |
| **Date** | 27th Oct 2020 |
| **Last change** | 27th Oct 2020 |

![Resolution pipeline](https://github.com/thoth-station/adviser/raw/master/docs/source/_static/pipeline.gif?raw=true)

This Jupyter Notebook demonstrates pipeline units and pipeline configuration in [Thoth's adviser](https://github.com/thoth-station/adviser). The scenario shown resolves ``intel-tensorflow==2.0.1`` instead of ``tensorflow==2.1.0`` based on the pipeline configuration supplied to the resolution process. See the [online documentation of project Thoth](https://thoth-station.ninja/docs/developers/adviser/) for more info.

## Importing required bits and library versions

In [1]:
import yaml
import random
import sys
from pprint import pprint

from thoth.adviser import Resolver
from thoth.adviser import PipelineBuilder
from thoth.adviser import PipelineConfig
from thoth.adviser import RecommendationType
from thoth.adviser import __version__
from thoth.python import Project
from thoth.common import RuntimeEnvironment
from thoth.common import init_logging
from thoth.storages import GraphDatabase
import thoth.adviser.predictors as predictors

init_logging()
print("Adviser version: ", __version__)

2020-10-27 22:33:49,633 1345444 INFO     thoth.common:366: Logging to rsyslog endpoint is turned off


Adviser version:  0.19.0


## Project instantiation

We declare a dependency ``tensorflow==2.1.0`` for an application which runs on Red Hat Enterprise Linux 8 (linux, x86_64). The notebook uses pre-aggregated knowledge stored and exposed locally. See [thoth-station/storages](https://github.com/thoth-station/storages) for more info on how to set up a local database instance.

In [2]:
PIPFILE = """
[[source]]
name = "pypi"
url = "https://pypi.org/simple"
verify_ssl = true

[dev-packages]

[packages]
tensorflow = "==2.1.0"

[requires]
python_version = "3.6"
"""

# The runtime environment configuration can capture various parameters. We use just OS information, Python version and platform info. Information about hardware is unused.
runtime_environment = RuntimeEnvironment.from_dict({
    "hardware": {
        "cpu_family": None,
        "cpu_model": None
    },
    "operating_system": {
        "name": "rhel",
        "version": "8"
    },
    "python_version": "3.6",
    "cuda_version": None,
    "platform": "linux-x86_64"
})
project = Project.from_strings(PIPFILE, runtime_environment=runtime_environment)

## Pipeline configuration

Next, we will create a pipeline configuration we want to use during the software stack resolution process.

In [3]:
_PIPELINE_CONF = """
boots:
- configuration:
    package_name: null
  name: PythonVersionBoot
- configuration:
    package_name: null
  name: RHELVersionBoot
- configuration:
    default_platform: linux-x86_64
  name: PlatformBoot
- configuration:
    package_name: null
  name: FullySpecifiedEnvironment
pseudonyms:
- configuration:
    package_name: tensorflow
    package_version: "2.1.0"
    index_url: "https://pypi.org/simple"
    aliases:
      - package_name: intel-tensorflow
        package_version: "2.1.0"
        index_url: "https://pypi.org/simple"
      - package_name: intel-tensorflow
        package_version: "2.0.1"
        index_url: "https://pypi.org/simple"
  name: AliasPseudonym
sieves:
- configuration:
    package_name: null
    without_error: true
  name: SolvedSieve
steps:
- configuration:
    package_name: "intel-tensorflow"
    package_version: "2.1.0"
    index_url: "https://pypi.org/simple"
    score: -0.2
  name: SetScoreStep
- configuration:
    package_name: "intel-tensorflow"
    package_version: "2.0.1"
    index_url: "https://pypi.org/simple"
    score: 1.0
  name: SetScoreStep
- configuration:
    package_name: "protobuf"
    package_version: "3.11.1"
    index_url: "https://pypi.org/simple"
    score: -0.5
  name: SetScoreStep
strides: []
wraps: []
"""
                           

def get_pipeline_config() -> PipelineConfig:
    """Get pipeline configuration."""
    conf = yaml.safe_load(_PIPELINE_CONF)
    return PipelineBuilder.from_dict(conf)

pipeline_config = get_pipeline_config()

One of the pipeline units registered is called ``AliasPseudonym``. As the name suggests, it's a [pipeline unit of type pseudonym](https://thoth-station.ninja/docs/developers/adviser/pseudonyms.html) which will consider packages as alternatives (pseudonyms). More specifically, it will consider the following two packages:

* intel-tensorflow in version 2.1.0 from PyPI
* intel-tensorflow in version 2.0.1 from PyPI

as pseudonyms to tensorflow 2.1.0 coming from PyPI. One can see this operation as replacing nodes in the dependency graph to generate alternatives: besides tensorflow==2.1.0 from PyPI, the dependency graph will also provide the two alternatives stated. This operation can be done on transitive dependencies as well as on direct ones.

**Note:** Mind the minor and patch versions of the ``intel-tensorflow`` packages.
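
The node replacement described above can be pictured with a small standalone sketch. It does not use adviser's API: the `ALIASES` table and the `expand_with_pseudonyms` helper are hypothetical names introduced only for illustration, and the sketch assumes that a pseudonym simply adds alternatives next to the original package.

```python
from typing import Dict, List, Tuple

# A package is identified by (name, version, index_url), as in the configuration.
PackageTuple = Tuple[str, str, str]

PYPI = "https://pypi.org/simple"

# Hypothetical alias table mirroring the AliasPseudonym configuration.
ALIASES: Dict[PackageTuple, List[PackageTuple]] = {
    ("tensorflow", "2.1.0", PYPI): [
        ("intel-tensorflow", "2.1.0", PYPI),
        ("intel-tensorflow", "2.0.1", PYPI),
    ],
}


def expand_with_pseudonyms(package: PackageTuple) -> List[PackageTuple]:
    """Return the package itself followed by any alternatives registered for it."""
    return [package] + ALIASES.get(package, [])


candidates = expand_with_pseudonyms(("tensorflow", "2.1.0", PYPI))
for name, version, index_url in candidates:
    print(f"{name}=={version} from {index_url}")
```

A package without registered aliases passes through unchanged, so the expansion only widens the search for the packages named in the configuration.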

In [4]:
yaml.safe_dump(pipeline_config.to_dict()["pseudonyms"], sys.stdout)

- configuration:
    aliases:
    - index_url: https://pypi.org/simple
      package_name: intel-tensorflow
      package_version: 2.1.0
    - index_url: https://pypi.org/simple
      package_name: intel-tensorflow
      package_version: 2.0.1
    index_url: https://pypi.org/simple
    package_name: tensorflow
    package_version: 2.1.0
  name: AliasPseudonym
  unit_run: false


Let's move on to the next [pipeline unit which is of type sieve](https://thoth-station.ninja/docs/developers/adviser/sieves.html). The main aim of this pipeline unit is to keep dependencies that are solved using [Thoth's solver](https://github.com/thoth-station/solver), meaning the dependency graph can be fully constructed and the resolution can lead to a valid software stack considering Python packaging rules (version range specifications). Moreover, this pipeline unit will filter out all the packages that have installation errors in the target runtime environment. By doing so, we are sure the resolution pipeline produces software stacks that do not fail during application assembling in the target environment.
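
As a rough illustration of the filtering idea, the sketch below drops candidates that were never solved or that failed to install. It is not adviser's implementation: the package names, the `SOLVER_RECORDS` table, and the `solved_sieve` function are made up for this example, and the real unit obtains this knowledge from the database.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple

PackageVersion = Tuple[str, str]

# Made-up solver records for illustration only: True means the package was
# solved but failed to install, False means it installed cleanly. A package
# missing from the table was never solved.
SOLVER_RECORDS: Dict[PackageVersion, bool] = {
    ("example-lib", "1.0.0"): False,
    ("example-lib", "1.1.0"): True,
}


def solved_sieve(
    candidates: Iterable[PackageVersion],
    records: Dict[PackageVersion, bool],
    without_error: bool = True,
) -> Iterator[PackageVersion]:
    """Yield only candidates that are solved and, optionally, install cleanly."""
    for candidate in candidates:
        install_error: Optional[bool] = records.get(candidate)
        if install_error is None:
            continue  # Not solved: the dependency graph cannot be constructed.
        if without_error and install_error:
            continue  # Solved, but installation failed in the target environment.
        yield candidate


candidates = [
    ("example-lib", "1.0.0"),
    ("example-lib", "1.1.0"),
    ("example-lib", "2.0.0"),
]
kept = list(solved_sieve(candidates, SOLVER_RECORDS))
print(kept)
```

Writing the sieve as a generator keeps the filter lazy, so candidates are only examined as the consumer asks for them.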

In [5]:
yaml.safe_dump(pipeline_config.to_dict()["sieves"], sys.stdout)

- configuration:
    package_name: null
    without_error: true
  name: SolvedSieve
  unit_run: false


Now, let's move on to [pipeline units of type step](https://thoth-station.ninja/docs/developers/adviser/steps.html). These pipeline units were primarily designed to score software packages and thus tell the resolution process how good a resolved software stack is. The scoring can consider various aspects of the software stack, for example known vulnerabilities of packages or performance aspects of the resolved stack.

For simplicity, we assign scores to the packages explicitly without any semantics. The three pipeline units registered will make sure:

* intel-tensorflow in version 2.1.0 from PyPI will be scored -0.2 (negative score)
* intel-tensorflow in version 2.0.1 from PyPI will be scored 1.0 (high positive score)
* protobuf in version 3.11.1 from PyPI will be scored -0.5 (negative score)

The resolver will use these "observations" to come up with the best possible software stack respecting the score assigned.
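
To see why these scores favour ``intel-tensorflow==2.0.1``, here is a minimal sketch that assumes a stack's score is the sum of the scores assigned to its packages. The `SCORES` table copies the configured values; the candidate stacks and the second ``protobuf`` version are illustrative only and are not results produced by the resolver.

```python
from typing import Dict, List, Tuple

PackageVersion = Tuple[str, str]

# Scores copied from the three SetScoreStep units configured in this notebook.
SCORES: Dict[PackageVersion, float] = {
    ("intel-tensorflow", "2.1.0"): -0.2,
    ("intel-tensorflow", "2.0.1"): 1.0,
    ("protobuf", "3.11.1"): -0.5,
}


def score_stack(stack: List[PackageVersion]) -> float:
    """Sum per-package scores; packages without a configured score count as 0.0."""
    return sum(SCORES.get(package, 0.0) for package in stack)


# Illustrative candidate stacks; protobuf 3.13.0 stands in for "any other version".
stacks: List[List[PackageVersion]] = [
    [("tensorflow", "2.1.0"), ("protobuf", "3.13.0")],
    [("intel-tensorflow", "2.1.0"), ("protobuf", "3.13.0")],
    [("intel-tensorflow", "2.0.1"), ("protobuf", "3.13.0")],
    [("intel-tensorflow", "2.0.1"), ("protobuf", "3.11.1")],
]

best = max(stacks, key=score_stack)
print(best, score_stack(best))
```

Under this assumption the stack with ``intel-tensorflow==2.0.1`` and an unpenalized ``protobuf`` ranks first, which is the outcome the configuration is meant to steer towards.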

In [6]:
yaml.safe_dump(pipeline_config.to_dict()["steps"], sys.stdout)

- configuration:
    index_url: https://pypi.org/simple
    multi_package_resolution: false
    package_name: intel-tensorflow
    package_version: 2.1.0
    score: -0.2
  name: SetScoreStep
  unit_run: false
- configuration:
    index_url: https://pypi.org/simple
    multi_package_resolution: false
    package_name: intel-tensorflow
    package_version: 2.0.1
    score: 1.0
  name: SetScoreStep
  unit_run: false
- configuration:
    index_url: https://pypi.org/simple
    multi_package_resolution: false
    package_name: protobuf
    package_version: 3.11.1
    score: -0.5
  name: SetScoreStep
  unit_run: false


![State space](https://thoth-station.ninja/docs/developers/adviser/images/state_space_interpolated.png)

## Resolution process

Let's proceed to the resolution process. We will use the "[Approximating latest](https://thoth-station.ninja/docs/developers/adviser/predictors/latest.html)" predictor, which tries to come up with the most recent packages in the stack, considering their versioning.

In [7]:
%%time

predictor = predictors.ApproximatingLatest(keep_history=False)
resolver = Resolver.get_adviser_instance(
    predictor=predictor,
    project=project,
    recommendation_type=RecommendationType.LATEST,  # Use "latest" recommendation type, has no effect in pipeline units used.
    limit=10000,  # Limit number of software stacks scored.
    count=1,  # We want just one software stack to be shown in the final report.
    beam_width=None,  # No limitation in memory consumption for internal resolver states.
    pipeline_config=pipeline_config,
)

2020-10-27 22:33:49,989 1345444 INFO     alembic.runtime.migration:155: Context impl PostgresqlImpl.
2020-10-27 22:33:49,989 1345444 INFO     alembic.runtime.migration:162: Will assume transactional DDL.


CPU times: user 180 ms, sys: 12.2 ms, total: 192 ms
Wall time: 210 ms
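
The interplay of ``limit`` and ``count`` can be sketched as follows, going only by the comments in the cell above: at most ``limit`` stacks are scored, and the best ``count`` of them are reported. The `top_stacks` helper and its sample data are hypothetical and say nothing about how the resolver walks the state space.

```python
import heapq
from itertools import islice
from typing import Iterable, List, Tuple

ScoredStack = Tuple[float, str]  # (score, label of the stack)


def top_stacks(scored: Iterable[ScoredStack], limit: int, count: int) -> List[ScoredStack]:
    """Consider at most `limit` scored stacks and return the best `count` of them."""
    considered = islice(scored, limit)
    return heapq.nlargest(count, considered, key=lambda item: item[0])


# Made-up stream of scored stacks, in the order they would be produced.
stream = [(0.0, "a"), (1.0, "b"), (-0.2, "c"), (2.0, "d")]

print(top_stacks(stream, limit=3, count=1))
print(top_stacks(stream, limit=4, count=1))
```

With ``limit=3`` the best stack in the made-up stream is never reached, which shows why a small limit trades result quality for a shorter run.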


In [8]:
%%time

random.seed(30)  # Set seed to have reproducible results across runs.
resolver.graph.cache_clear()  # Clear the cache so it does not affect speed in multiple invocations.
report = resolver.resolve(with_devel=False, user_stack_scoring=False)

2020-10-27 22:33:50,053 1345444 INFO     thoth.adviser.resolver:1083: No scoring done on user's stack - see https://thoth-station.ninja/j/user_stack
2020-10-27 22:33:50,054 1345444 INFO     thoth.adviser.resolver:1085: Preparing initial states for the resolution pipeline
2020-10-27 22:33:50,055 1345444 INFO     thoth.adviser.resolver:618: Resolving direct dependencies
2020-10-27 22:33:50,281 1345444 INFO     thoth.adviser.resolver:653: Found direct dependency 'tensorflow' with version specification '==2.1.0'
2020-10-27 22:33:50,461 1345444 INFO     thoth.adviser.resolver:1089: Hold tight, Thoth is computing recommendations for your application...
2020-10-27 22:33:57,043 1345444 INFO     thoth.adviser.resolver:1196: Pipeline reached 1 final states out of 10000 requested in iteration 342 (pipeline pace 0.14 stacks/second); top rated software stack in beam has a score of 1.00; top rated software stack found so far has a score of -0.20
2020-10-27 22:34:06,735 1345444 INFO     thoth.adviser

CPU times: user 16.9 s, sys: 399 ms, total: 17.3 s
Wall time: 24.3 s


The results shown below demonstrate that the resolution process found ``intel-tensorflow==2.0.1`` as an alternative to ``tensorflow==2.1.0``, which was originally stated in the requirements file (Pipfile). Moreover, the resolved software stack does not include the penalized version of ``protobuf``, which would affect the score of the application stack negatively. Both outcomes follow from the pipeline configuration we provided.

The `stack_info` part of the report shows which packages were not considered during the resolution process as they would produce application assembling issues (they cannot be installed into the given runtime environment).

In [9]:
yaml.safe_dump(report.to_dict(), sys.stdout, sort_keys=True, indent=2)

accepted_final_states_count: 10000
discarded_final_states_count: 0
pipeline:
  boots:
  - configuration:
      package_name: null
    name: PythonVersionBoot
    unit_run: true
  - configuration:
      package_name: null
    name: RHELVersionBoot
    unit_run: true
  - configuration:
      default_platform: linux-x86_64
    name: PlatformBoot
    unit_run: true
  - configuration:
      package_name: null
    name: FullySpecifiedEnvironment
    unit_run: true
  pseudonyms:
  - configuration:
      aliases:
      - index_url: https://pypi.org/simple
        package_name: intel-tensorflow
        package_version: 2.1.0
      - index_url: https://pypi.org/simple
        package_name: intel-tensorflow
        package_version: 2.0.1
      index_url: https://pypi.org/simple
      package_name: tensorflow
      package_version: 2.1.0
    name: AliasPseudonym
    unit_run: true
  sieves:
  - configuration:
      package_name: null
      without_error: true
    name: SolvedSieve
    unit_run: tr
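
The report echoes the pipeline configuration and marks each unit with a ``unit_run`` flag. A small helper can list the units that actually ran. The sketch works on a plain dictionary shaped like the output above; `sample_report` is a hand-trimmed stand-in for ``report.to_dict()``, and `units_run` is a hypothetical helper, not part of adviser.

```python
from typing import Any, Dict, List


def units_run(report: Dict[str, Any]) -> Dict[str, List[str]]:
    """Map each pipeline unit type to the names of units flagged with unit_run."""
    return {
        unit_type: [unit["name"] for unit in units if unit.get("unit_run")]
        for unit_type, units in report["pipeline"].items()
    }


# Hand-trimmed stand-in for report.to_dict(); only a few units are listed.
sample_report: Dict[str, Any] = {
    "pipeline": {
        "boots": [
            {"name": "PythonVersionBoot", "configuration": {"package_name": None}, "unit_run": True},
            {"name": "PlatformBoot", "configuration": {"default_platform": "linux-x86_64"}, "unit_run": True},
        ],
        "pseudonyms": [
            {"name": "AliasPseudonym", "configuration": {"package_name": "tensorflow"}, "unit_run": True},
        ],
        "strides": [],
        "wraps": [],
    }
}

print(units_run(sample_report))
```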

To compare with the results obtained above, let's trigger another resolution process, this time without any ``intel-tensorflow`` packages as pseudonyms and without any package scoring. The resolved software stack will hold ``tensorflow==2.1.0`` as required by the application (respecting the Pipfile) and a more recent ``protobuf`` that is not penalized.

In [10]:
_PIPELINE_CONF = """
boots:
- configuration:
    package_name: null
  name: PythonVersionBoot
- configuration:
    package_name: null
  name: RHELVersionBoot
- configuration:
    default_platform: linux-x86_64
  name: PlatformBoot
- configuration:
    package_name: null
  name: FullySpecifiedEnvironment
pseudonyms: []
sieves:
- configuration:
    package_name: null
    without_error: true
  name: SolvedSieve
steps: []
strides: []
wraps: []
"""
                           

def get_pipeline_config() -> PipelineConfig:
    """Get pipeline configuration."""
    conf = yaml.safe_load(_PIPELINE_CONF)
    return PipelineBuilder.from_dict(conf)

pipeline_config = get_pipeline_config()
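
To make the difference between the two configurations explicit, the sketch below compares unit names per unit type. The `FIRST` and `SECOND` dictionaries are transcribed by hand from the two YAML documents in this notebook, and `removed_units` is a hypothetical helper written for this comparison.

```python
from collections import Counter
from typing import Dict, List

UnitNames = Dict[str, List[str]]

BOOTS = ["PythonVersionBoot", "RHELVersionBoot", "PlatformBoot", "FullySpecifiedEnvironment"]

# Unit names transcribed from the first pipeline configuration.
FIRST: UnitNames = {
    "boots": BOOTS,
    "pseudonyms": ["AliasPseudonym"],
    "sieves": ["SolvedSieve"],
    "steps": ["SetScoreStep", "SetScoreStep", "SetScoreStep"],
    "strides": [],
    "wraps": [],
}

# Unit names transcribed from the second pipeline configuration.
SECOND: UnitNames = {
    "boots": BOOTS,
    "pseudonyms": [],
    "sieves": ["SolvedSieve"],
    "steps": [],
    "strides": [],
    "wraps": [],
}


def removed_units(first: UnitNames, second: UnitNames) -> Dict[str, Dict[str, int]]:
    """Count, per unit type, the units present in `first` but missing in `second`."""
    removed: Dict[str, Dict[str, int]] = {}
    for unit_type, names in first.items():
        difference = Counter(names) - Counter(second.get(unit_type, []))
        if difference:
            removed[unit_type] = dict(difference)
    return removed


removed = removed_units(FIRST, SECOND)
print(removed)
```

Only the pseudonym and the three scoring steps differ, so any change in the resolved stack can be attributed to those units.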

In [11]:
%%time

predictor = predictors.ApproximatingLatest(keep_history=False)
resolver = Resolver.get_adviser_instance(
    predictor=predictor,
    project=project,
    recommendation_type=RecommendationType.LATEST,  # Use "latest" recommendation type, has no effect in pipeline units used.
    limit=1,  # Limit number of software stacks scored.
    count=1,  # We want just one software stack to be shown in the final report.
    beam_width=None,  # No limitation in memory consumption for internal resolver states.
    pipeline_config=pipeline_config,
)

2020-10-27 22:34:14,932 1345444 INFO     alembic.runtime.migration:155: Context impl PostgresqlImpl.
2020-10-27 22:34:14,933 1345444 INFO     alembic.runtime.migration:162: Will assume transactional DDL.


CPU times: user 64 ms, sys: 3.95 ms, total: 67.9 ms
Wall time: 73 ms


In [12]:
%%time

random.seed(30)  # Set seed to have reproducible results across runs.
resolver.graph.cache_clear()  # Clear the cache so it does not affect speed in multiple invocations.
report = resolver.resolve(with_devel=False, user_stack_scoring=False)

2020-10-27 22:34:15,003 1345444 INFO     thoth.adviser.resolver:1083: No scoring done on user's stack - see https://thoth-station.ninja/j/user_stack
2020-10-27 22:34:15,004 1345444 INFO     thoth.adviser.resolver:1085: Preparing initial states for the resolution pipeline
2020-10-27 22:34:15,005 1345444 INFO     thoth.adviser.resolver:618: Resolving direct dependencies
2020-10-27 22:34:15,017 1345444 INFO     thoth.adviser.resolver:653: Found direct dependency 'tensorflow' with version specification '==2.1.0'
2020-10-27 22:34:15,023 1345444 INFO     thoth.adviser.resolver:1089: Hold tight, Thoth is computing recommendations for your application...
2020-10-27 22:34:23,089 1345444 INFO     thoth.adviser.resolver:1196: Pipeline reached 1 final states out of 1 requested in iteration 4925 (pipeline pace 0.12 stacks/second); top rated software stack in beam has a score of 0.00; top rated software stack found so far has a score of 0.00
2020-10-27 22:34:23,568 1345444 INFO     thoth.adviser.res

CPU times: user 5.79 s, sys: 158 ms, total: 5.94 s
Wall time: 8.57 s


In [13]:
yaml.safe_dump(report.to_dict(), sys.stdout, sort_keys=True, indent=2)

accepted_final_states_count: 1
discarded_final_states_count: 0
pipeline:
  boots:
  - configuration:
      package_name: null
    name: PythonVersionBoot
    unit_run: true
  - configuration:
      package_name: null
    name: RHELVersionBoot
    unit_run: true
  - configuration:
      default_platform: linux-x86_64
    name: PlatformBoot
    unit_run: true
  - configuration:
      package_name: null
    name: FullySpecifiedEnvironment
    unit_run: true
  pseudonyms: []
  sieves:
  - configuration:
      package_name: null
      without_error: true
    name: SolvedSieve
    unit_run: true
  steps: []
  strides: []
  wraps: []
products:
- advised_manifest_changes: []
  advised_runtime_environment: null
  justification: []
  project:
    requirements:
      dev-packages: {}
      packages:
        tensorflow: ==2.1.0
      requires: &id001
        python_version: '3.6'
      source:
      - name: pypi
        url: https://pypi.org/simple
        verify_ssl: true
      - name: pypi-org
 