Deployment configuration for project Thoth

Thoth-core

Welcome to the Thoth-core project README file!

The main aim of this project is to provide a deployment for Thoth core components. For more information about the Thoth project and its goals, see the Thoth repository.

Installation

The Ansible playbooks require a few roles to be installed, and a vault password.

git clone https://github.com/thoth-station/core
cd core
ansible-galaxy install --role-file=requirements.yaml --roles-path=/etc/ansible/roles --force # to update any existing role
vim playbooks/group_vars/all/vars  # review deployment parameters
ansible-playbook playbooks/provision.yaml

Deprovisioning

oc login <OCP_URL>
cd core
ansible-playbook playbooks/deprovision.yaml --extra-vars THOTH_NAMESPACE=<NAMESPACE>

See operations documentation for more info.

Architecture Overview

The whole deployment is divided into multiple namespaces (or OpenShift projects) - thoth-frontend, thoth-middletier, thoth-backend, inspection and tensorflow build.

https://raw.githubusercontent.com/thoth-station/core/master/doc/architecture.png

Some components are deployed multiple times. They serve the same purpose, but the namespace they operate on is parametrized. For example, the cleanup-job responsible for cleaning the backend namespace is shown in the architecture overview as a rectangle with curly braces denoting the namespace operated by that cleanup-job. The same applies to other components, such as workload-operator.

Frontend Namespace

The thoth-frontend is used as a management namespace. Services running in this namespace usually have a service account assigned for running and managing pods inside the thoth-middletier and thoth-backend namespaces.

A user can interact with the user-facing API, which is the key interaction point for users and bots. The user-facing API specifies its endpoints using the Swagger/OpenAPI specification. See the Thamos repo for a library/CLI for interacting with the user API service (and its documentation), and the user API service repo itself for more info.

Besides the user API, there are periodically run CronJobs and operators that keep the application in sync and operational:

  • cleanup-job - a job responsible for cleaning up resources left in the cluster
  • graph-refresh-job - a job responsible for scheduling analyses of packages that were not yet analyzed
  • graph-sync-job - a job responsible for syncing data in JSON format persisted on Ceph to the Dgraph database
  • package-releases-job - a job responsible for tracking new releases on Python's package index (the public one is PyPI.org)
  • cve-update-job - a job responsible for gathering CVE information about packages
  • workload-operator - an OpenShift operator responsible for scheduling jobs into namespaces; it respects the resources allocated to the namespace in which the jobs run
  • graph-sync-operator - an OpenShift operator responsible for scheduling graph-sync-jobs that sync results of analyzer job runs into the graph database

Middletier Namespace

The middletier namespace is used for analyses and the actual resource-consuming tasks that compute results for Thoth's database. This namespace was separated from the frontend namespace to guarantee application responsiveness. All pods required to compute results for the database are scheduled in this namespace. It has an allocated pool of resources for the unpredictable number of computational pods needed for this purpose (so these pods are not scheduled alongside the running user API, which could make the user API unresponsive).

As some of the analyses performed can execute possibly malicious code, this namespace is guarded using network policy rules to run containers in a fully isolated environment (in addition to namespace separation). A special service, the result API, abstracts away any database operations (which can be dangerous when executing untrusted code). Each analyzer that runs in this namespace serializes its results to a structured (text) JSON format, and these results are submitted to the user API service, which stores them in the Ceph object storage. All results computed by Thoth are first stored in JSON format for later analyses and to make the graph instance fully recoverable and reconfigurable based on previous results.

Currently, the following analyzers run in the middletier namespace:
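The idea of storing every analyzer result as a JSON document first can be illustrated with a minimal sketch. The document layout (a "metadata" and a "result" section) and the function names are assumptions made for illustration only; they are not taken from Thoth's analyzers.

```python
import json
from datetime import datetime, timezone


def serialize_result(analyzer: str, arguments: dict, result: dict) -> str:
    """Wrap an analyzer result in a JSON text document (hypothetical layout)."""
    document = {
        "metadata": {
            "analyzer": analyzer,
            "arguments": arguments,
            "datetime": datetime.now(timezone.utc).isoformat(),
        },
        "result": result,
    }
    # Sorted keys keep the text output stable across runs of the same input.
    return json.dumps(document, sort_keys=True, indent=2)


def deserialize_result(text: str) -> dict:
    """Load a stored document back, e.g. when re-syncing it into a database."""
    return json.loads(text)
```

Because the stored text round-trips through deserialize_result(), a database could in principle be rebuilt from the stored documents alone, which is the recoverability property described above.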

  • package-extract - an analyzer responsible for extracting packages from runtime/buildtime environments (container images)
  • solver - an analyzer run to gather information about dependencies between packages (which packages does the given package depend on? what versions satisfy version ranges?); it also gathers observations such as whether the given package is installable into the given environment and whether it is present on a Python package index
  • dependency-monkey - an analyzer that dynamically constructs package stacks and submits them to Amun for dynamic analysis (can the given stack be installed? what are the runtime observations, e.g. performance index?) (this is currently WIP)

Backend Namespace

The backend part of the application is used for executing code that, based on information gathered by analyzers run in the middletier namespace, computes results for actual Thoth users (bots or humans).

As with the middletier namespace, this namespace has an allocated pool of resources, which in this case serves user requests. Each time a user requests a recommendation, pods are dynamically created in this namespace to compute the results.

As of now, the following analyzers are run to compute recommendations for a user:

  • adviser - a recommendation engine computing stack level recommendations for a user for the given runtime environment
  • provenance-checker - an analyzer that checks the provenance (origin) of packages so that a user uses correct packages from correct package sources (Python indexes) - note that the Python packaging format does not guarantee this - neither does Pipenv nor pip itself! (the implementation now lives alongside adviser)
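To make the provenance idea concrete, here is a minimal sketch of one possible check over a Pipfile.lock-style dictionary. It is not Thoth's provenance-checker; the function name and the reported messages are hypothetical, and the check only compares the index recorded for each package against a set of allowed index URLs.

```python
def check_provenance(lock: dict, allowed_index_urls: set) -> list:
    """Return (package, problem) pairs for packages with a suspicious origin."""
    # Map source names declared in the lock file to their URLs.
    sources = {
        source["name"]: source["url"]
        for source in lock.get("_meta", {}).get("sources", [])
    }
    findings = []
    for section in ("default", "develop"):
        for name, entry in lock.get(section, {}).items():
            index_name = entry.get("index")
            if index_name is None:
                findings.append((name, "no index recorded"))
            elif sources.get(index_name) not in allowed_index_urls:
                findings.append((name, "unexpected index: " + index_name))
            elif not entry.get("hashes"):
                findings.append((name, "no hashes recorded"))
    return findings
```

A lock file that pins a package to an index outside the allowed set, or that records no index or hashes at all, yields one finding per affected package; an empty list means this particular check found nothing to report.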

Amun

Amun is a standalone project within Thoth - its aim is to act as an execution engine. Based on requests coming in from Thoth itself (dependency-monkey jobs), it can build the requested application (create builds and image streams) on the requested runtime environment (a container base image, optionally with additional native packages installed) and execute the supplied test suite to verify whether the given application stack works on the targeted hardware (also part of the dependency-monkey request). The results of the Amun API are "observations" from inspection jobs (build and run inspections). These observations are subsequently synced into the graph database as part of graph-sync-job.

For more information, see the Amun API repository and the autogenerated Amun client.

Thamos

Thamos is a CLI tool created for end users of Thoth. Thamos offers a simple command line interface to consume Thoth's advice (recommendations) and Thoth's provenance checks, both done against data stored in the graph database.

Kebechet

Another consumer of Thoth's data is a bot called Kebechet, which operates directly on repositories hosted on GitHub or GitLab and automatically opens pull requests or issues for users.

TensorFlow build pipeline

The TensorFlow build pipeline was designed and implemented to build and release optimized TensorFlow builds. This pipeline is automatically triggered on new TensorFlow releases via package-releases-job that checks new releases on PyPI.

The TensorFlow build pipeline can be used through its release API, which can trigger a build of TensorFlow wheels in a specific configuration.

Cluster requirements

To create NetworkPolicy objects, the ovs-networkpolicy plugin needs to be enabled (see docs for more details) and the cluster needs to run OpenShift 3.5 or newer, as NetworkPolicy objects were introduced in OpenShift 3.5 as a tech preview.

As of now, NetworkPolicy is not applied, so there are no network restrictions on created pods. This lets pods reach the outside world without any fine-grained control. That is not that critical, as containers running inside pods have restricted execution time, restricted resource requirements and run in a separate namespace.

The implementation of the NetworkPolicy restriction is not ready - ideally, an API call to the Kubernetes master should create a new NetworkPolicy that is applied to the pod created in the preceding API call (using unique label selectors per pod creation).
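The per-pod approach described above could be sketched roughly as follows. This is not an existing Thoth implementation: the label key "thoth-isolation-id" and both function names are hypothetical, and the manifest assumes the networking.k8s.io/v1 API, which is newer than the tech preview mentioned above. The sketch only builds the manifest; submitting it to the cluster is left out. It also covers ingress only, so restricting outbound traffic would need an additional egress rule where the cluster supports it.

```python
import uuid


def new_isolation_id() -> str:
    """Generate a unique label value to attach to one pod at creation time."""
    return uuid.uuid4().hex


def build_isolation_policy(namespace: str, isolation_id: str) -> dict:
    """Build a NetworkPolicy manifest selecting only the pod with this label."""
    return {
        "apiVersion": "networking.k8s.io/v1",  # assumed API version
        "kind": "NetworkPolicy",
        "metadata": {
            "name": "isolate-" + isolation_id,
            "namespace": namespace,
        },
        "spec": {
            # Hypothetical label key; the same label must be set on the pod.
            "podSelector": {"matchLabels": {"thoth-isolation-id": isolation_id}},
            "policyTypes": ["Ingress"],
            # No ingress rules: all inbound traffic to the selected pod is denied.
            "ingress": [],
        },
    }
```

The pod would be created with the generated label first, and the policy built from the same identifier would be submitted in a second call, so each policy matches exactly one pod.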
