
Home

nawkboy edited this page Oct 14, 2014 · 76 revisions

Artifactory Now Supports PyPI

Artifactory recently added support for PyPI repositories. http://www.jfrog.com/confluence/display/RTF/PyPI+Repositories

Many of the continuous delivery challenges in Python build tooling probably still exist.

Project Overview

Defend Against Fruit is focused on providing a pragmatic, continuous deployment-style build system for Python. Current Python build systems do not properly account for the needs of effective continuous deployment. This package extends the Python tooling to add the missing pieces.

Project Goals - Why Should You Care?

With an eye to agile development principles and fast feedback, we want a build system which satisfies the following goals:

  • Goal 1: Every SCM change-set committed should result in a potentially shippable release candidate.
  • Goal 2: When a defect is introduced, we want to immediately detect and isolate the offending SCM change-set. This is true even if the defect was introduced into a library we depend upon.
  • Goal 3: Library management should be so easy as to never impede code changes, even in multi-component architectures.

All of these goals are easily satisfied by effective continuous deployment-style build tooling. Much of what is needed to accomplish this in the Python ecosystem is available using off-the-shelf tooling, but the last few pieces of the puzzle were missing.

The missing pieces were:

Defend Against Fruit extends the Python tooling to add the missing pieces.

Please make this project obsolete

We hope the lessons we have learned, the code, and the design patterns described here will encourage the Python community to improve the current Python build tooling. Nothing would make us happier than to quickly move back to a standard, out-of-the-box solution.

Understand Theory Before Using

A good understanding of continuous deployment will go a very long way in helping you to understand this project. Please take the time to read the theory sections below before diving into the details of our Python-specific implementation.

When you are ready, the nitty-gritty details can be found on the How-To page.

Fast Feedback is Critical

The primary goal of a continuous integration and deployment system is to minimize the time between making a code change and validating the change. The need for quick validation extends from first-level validation by automated unit tests, to validating whether a new feature is as important to the consumer as predicted. Making fast feedback a practical reality requires a significant amount of automation, but that is a (big) implementation detail.

Fast Feedback in the Agile Literature

Just to drive the point home, here are a couple of examples from the Agile literature dealing with the need for fast feedback:

  • From the Agile Manifesto:

    Responding to change over following a plan

  • In Chapter 4 of "The Principles of Product Development Flow: Second Generation Lean Product Development", Donald Reinertsen puts forth his 15th principle:

    V15: The Principle of Iteration Speed: It is usually better to improve iteration speed than defect rate.

    Reinertsen explains that halving the iteration length while maintaining a 2 percent defect rate is 25 times better than halving the defect rate to a 1 percent defect rate.

    Reinertsen is talking about defects in terms of product features, but the concept is just as applicable to the build system feedback cycle. With a significant number of teams, libraries, and subsystems, poor tooling can easily introduce days if not weeks of delay. Good tooling can frequently reduce that delay to a matter of minutes.

Additional Motivations For Project

A bit more detail on our thoughts on continuous deployment and the motivation for building this package can be found on the following Wiki pages.

Role of Build Artifact Repository in Continuous Deployment

Continuous deployment tooling typically introduces a build artifact repository for storing libraries, potential release candidates, and other related build artifacts.

The following simplified diagram shows the relationship of a build artifact repository to the other aspects of your build system infrastructure:

Simplified CI Flow Diagram

As shown in the diagram, any source code changes trigger the continuous integration system. The continuous integration system then executes the build system, which does all the heavy lifting. At the very end, the build system will publish any relevant build artifacts into the build artifact repository.

A more detailed description of an artifact repository can be found in the Build Artifact Repository Defined Wiki page.

For guidance on whether your environment is complex enough to justify the overhead of a build artifact repository, see When to Use A Build Artifact Repository.

Artifact Repository Related Project Goal

The directly related project goal is:

A build artifact repository makes it possible to manage a very large number of libraries with relatively low overhead.

How Changes Cascade Through A Continuous Integration Environment

It is useful to understand how changes to an upstream library are percolated to downstream systems. The diagram below explains the process far better than words alone can.

At this level of detail, there is no significant difference between the tooling needs of continuous deployment and more basic continuous integration. In many ways, a continuous deployment system can be seen as a more specialized and evolved continuous integration system.

Cascading Dependencies Diagram

A few observations that may not be immediately obvious from the diagram:

  • When the build system deployed dependency B to the artifact repository, it also included meta-data detailing B's dependencies. Therefore, the build system for C is able to work out that a specific version of A will also be needed. Not all dependency management solutions support transitive dependencies, but mature solutions such as Ivy, Maven and Gradle do.
  • Dependencies X and Y represent external third-party libraries available from the artifact repository. Assuming transitive dependency support, only A needs to specify that it depends on X and Y.
  • Each CI build configuration has a trigger for source control changes and a trigger for upstream dependency changes. A more evolved system will make use of transitive dependency meta-data to self-configure upstream triggering details.

    Supporting an improved upstream trigger in Python would require better static dependency meta-data than currently exists. The build system must first be capable of generating a directed acyclic graph of dependencies, or at least a viable build order. Once that is in place, a custom plug-in to the CI tool (or similar glue code) can be built that integrates with the change event notification mechanism of the artifact repository.
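As a minimal sketch, assume each artifact publishes static meta-data listing only its direct dependencies, modeled here as a Python dict that mirrors the Cascading Dependencies Diagram (C depends on B, B on A, A on X and Y). The names `DIRECT_DEPS`, `transitive_deps`, `build_order` and `downstream_of` are hypothetical. The standard library's `graphlib` module (Python 3.9+) produces a viable build order and rejects cycles, which is the capability an improved upstream trigger would need:

```python
from graphlib import TopologicalSorter

# Hypothetical static meta-data: each artifact lists only its direct dependencies,
# mirroring the Cascading Dependencies Diagram (C -> B -> A -> {X, Y}).
DIRECT_DEPS: dict[str, set[str]] = {
    "C": {"B"},
    "B": {"A"},
    "A": {"X", "Y"},
    "X": set(),
    "Y": set(),
}


def transitive_deps(artifact: str, graph: dict[str, set[str]]) -> set[str]:
    """Every artifact reachable from `artifact` through declared dependencies."""
    seen: set[str] = set()
    stack = list(graph.get(artifact, ()))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(graph.get(dep, ()))
    return seen


def build_order(graph: dict[str, set[str]]) -> list[str]:
    """Dependencies first, consumers last.

    TopologicalSorter raises graphlib.CycleError if the graph is not a DAG.
    """
    return list(TopologicalSorter(graph).static_order())


def downstream_of(changed: str, graph: dict[str, set[str]]) -> set[str]:
    """Builds a CI plug-in should trigger when `changed` is republished."""
    return {name for name in graph if changed in transitive_deps(name, graph)}
```

With this data, `transitive_deps("C", DIRECT_DEPS)` returns `{'A', 'B', 'X', 'Y'}`, so C's build learns that it needs A, X and Y without declaring them, and `downstream_of("A", DIRECT_DEPS)` returns `{'B', 'C'}`.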

Continuous Integration Related Project Goals

The continuous integration related project goals are:

Goal 2 is the defining characteristic of continuous integration. Goal 3 is addressed by making it possible to rapidly validate any code change, even when multiple libraries are involved.

Continuous integration provides tremendous value, even if every build doesn't result in a release candidate as required by Goal 1.

Artifact Promotion and Advanced Artifact Repository Features

Defend Against Fruit makes use of a promotion-based artifact version scheme to support the needs of automated continuous deployment. If you are not already somewhat familiar with the difference between a SNAPSHOT and a promotion versioning scheme, please see the Comparing Snapshot and Promotion Version Schemes Wiki page.

Key features of a promotion-based artifact version scheme include:

  • Every incremental change in source control results in a uniquely versioned build artifact.
  • As a build artifact is promoted, its downstream visibility is affected.
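To make the contrast with SNAPSHOT versions concrete, here is a hedged sketch that assumes a hypothetical scheme of a fixed release line (`1.4`) plus the CI build number, and uses made-up change-set ids. Publishing is modeled as a dict keyed by version, so a reused version silently replaces the earlier build:

```python
from typing import Callable


def release_candidate_version(release_line: str, build_number: int) -> str:
    """Promotion scheme (hypothetical format): each build gets a never-reused version."""
    return f"{release_line}.{build_number}"


def snapshot_version(release_line: str, build_number: int) -> str:
    """SNAPSHOT scheme: build_number is ignored, so every build shares one version."""
    return f"{release_line}-SNAPSHOT"


def publish_all(change_sets: list[str], version_for: Callable[[int], str]) -> dict[str, str]:
    """Publish one build per change-set; a reused version overwrites the old entry."""
    repo: dict[str, str] = {}
    for build_number, change_set in enumerate(change_sets, start=1):
        repo[version_for(build_number)] = change_set
    return repo


CHANGE_SETS = ["a1f3", "9c2e", "77b0"]  # hypothetical SCM change-set ids

promoted = publish_all(CHANGE_SETS, lambda n: release_candidate_version("1.4", n))
snapshots = publish_all(CHANGE_SETS, lambda n: snapshot_version("1.4", n))
print(promoted)   # {'1.4.1': 'a1f3', '1.4.2': '9c2e', '1.4.3': '77b0'}
print(snapshots)  # {'1.4-SNAPSHOT': '77b0'}
```

Only the promotion-style scheme can still point at the exact change-set behind each candidate after later builds have run.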

Introducing the Library Analogy Diagram

Our design for build artifact management closely mirrors how a library manages physical books. Processes and actors in the physical realm are labeled in black, and their analogous build artifact management counterparts are labeled in red.

Interpretation Hints

  • Reading the Artifactory documentation on Managing Repositories while you study the library analogy diagram will help.
  • Each shelf level in the library analogy corresponds to a separate "local" repository in Artifactory.
  • Each of the drawers in the library card catalog labeled as "Virtual Repo Indexes" corresponds to a separate "virtual" repository in Artifactory.
  • Any co-released set of code-bases will typically consume dependencies via the same "virtual" repository in Artifactory.
  • There is usually a strong correlation between a co-released set of code-bases and a group of engineers who actively collaborate. It may therefore be easier to think of a "virtual" repository as being associated with a given group of engineers.
  • The concept of promoting build artifacts to higher level repositories as they successfully pass through a gauntlet of tests slightly strains the analogy with a physical library.
  • In a continuous delivery presentation by Jez Humble and Martin Fowler, a release candidate is described as a hero of a Greek myth that must progress through a series of tests in order to be victorious. Promotion of a finished book to a higher level shelf in the library analogy is equivalent to a build artifact successfully passing another round of tests on its way to production.

Library Analogy Diagram

Additional Details

  • Virtual repository indices can be amalgamated from any set of other indices.
  • No artifact should be promoted above the promotion level of any of its dependencies. For example, an otherwise production-ready software build which is using pre-production libraries cannot be shipped. If we did so, we might not be able to reproduce the build a decade later.
  • Generally, a given set of code should not leverage build artifacts from lower promotion levels. If this is necessary, pay particular attention to subsystem and organizational boundaries.
  • Although only 3 levels of promotion are shown, every significant milestone a build artifact successfully passes will typically correspond to a separate promotion level.
  • Builds that produce multiple artifacts should ensure all artifacts are published atomically.
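The rule that nothing is promoted above its dependencies can be enforced mechanically. The sketch below is illustrative only: `Artifact`, `can_promote` and `promote_build` are hypothetical names, and levels are plain integers with 1 as the lowest shelf. Checking direct dependencies is sufficient as long as every dependency was itself promoted under the same rule. In the spirit of the atomic-publishing point, a multi-artifact build is promoted all-or-nothing:

```python
from dataclasses import dataclass, field


@dataclass
class Artifact:
    name: str
    level: int  # current promotion level (shelf); 1 is the lowest
    deps: list["Artifact"] = field(default_factory=list)


def can_promote(artifact: Artifact, target_level: int) -> bool:
    """An artifact may not rise above the promotion level of any dependency."""
    return all(dep.level >= target_level for dep in artifact.deps)


def promote_build(artifacts: list[Artifact], target_level: int) -> None:
    """Promote every artifact of one build together, or none of them."""
    blocked = [a.name for a in artifacts if not can_promote(a, target_level)]
    if blocked:
        raise ValueError(f"dependencies below level {target_level}: {blocked}")
    for a in artifacts:
        a.level = target_level
```

For example, an application at level 2 that uses a library still at level 2 cannot be promoted to level 3 until the library has been promoted first.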

Promotion Level and Visibility

Visibility Management Requirement

You may frequently collaborate with your immediate family members while you are still in your pajamas and a bit disheveled. Only once you have cleaned up and dressed presentably will you go out in public.

A similar pattern occurs with artifacts in an artifact repository. Developers working together on closely related code-bases may need one code-base to consume an unpolished library from another code-base. The developers may not be ready to show the same unpolished changes to a broader audience.

Artifact Visibility Guidelines

Artifact visibility and promotion level are closely related concepts in continuous deployment. Although build tooling can be configured to consume any set of artifact repositories, a sensible configuration will usually follow these guidelines:

  • Compilation of libraries within a subsystem boundary is unaffected by the promotion of an artifact within the same boundary.
  • Compilation of libraries in a separate subsystem boundary will not have visibility to an artifact until it reaches a sufficiently high promotion level.
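These two guidelines can be expressed as a small predicate. This is a minimal sketch assuming, hypothetically, that subsystems are identified by name and that level 3 is the first level visible outside a subsystem; it models compilation visibility only, since an integration test job might apply a different rule:

```python
def is_visible(artifact_subsystem: str, artifact_level: int,
               consumer_subsystem: str, public_level: int = 3) -> bool:
    """Can a build in `consumer_subsystem` compile against this artifact?"""
    if artifact_subsystem == consumer_subsystem:
        # Same boundary: promotion does not change visibility.
        return True
    # Different boundary: the artifact must be promoted far enough first.
    return artifact_level >= public_level


# Hypothetical published versions: (subsystem, version, promotion level).
PUBLISHED = [("A", "1.0.5", 3), ("A", "1.0.6", 2), ("A", "1.0.7", 1)]


def visible_versions(consumer_subsystem: str) -> list[str]:
    return [v for sub, v, lvl in PUBLISHED if is_visible(sub, lvl, consumer_subsystem)]
```

Here `visible_versions("A")` sees all three versions of A's library, while `visible_versions("B")` sees only `1.0.5`.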

The specific reference to "compilation" is deliberate. Promotion of an artifact from one promotion level to the next is frequently triggered by the successful execution of an automated high-level integration test. Such an integration test may very well have different artifact visibility constraints than the build tooling responsible for compiling the code.

A subsystem boundary in this case is any tightly related set of co-developed build artifacts (think immediate family members). The subsystem discussed here would likely, but not necessarily, correspond to the code-bases under an individual service of a service oriented architecture. Other names for "subsystem boundary" in this context would be "library cohesion boundary" or "collaboration group boundary".

Strictly speaking, there is a subtle difference between these three terms. Obsessing over the distinction at this point is likely to be more confusing than enlightening.

Simplified Visibility Diagram

The following diagram shows multiple consumers of a build artifact repository, and how different consumers are granted different levels of visibility. The consumers in the lowest purple box are able to see any version of the artifact regardless of artifact promotion level (shelf). The consumers in the outermost green box are only able to see versions of the artifact at promotion level 3. The consumers in the blue box can see versions of the artifact at levels 2 and 3 but not at level 1.

In practice, the consumers shown as stick figures correspond to the build systems of particular code-bases.

Promotion Affects Visibility

Configurable Collaboration Groups

The Simplified Visibility Diagram above shows a rather simplistic view of promotion and visibility. When using a sophisticated artifact repository manager such as Artifactory, one typically creates a variety of "virtual" repositories. A virtual repository amalgamates other repositories to create a new synthetic repository. Without virtual repository support within the repository manager, similar functionality would need to be baked into other parts of your build tooling. Whether the virtual repositories you create follow the artifact visibility guidelines detailed earlier is up to you.
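One way to picture a virtual repository is as an ordered list of member repositories whose indexes are merged. The sketch below is a simplified, hypothetical in-memory model in which plain dicts stand in for local repositories; it assumes members are consulted in their configured order and that the first match wins:

```python
from typing import Optional

# Hypothetical stand-in: a local repository maps artifact file names to contents.
LocalRepo = dict[str, str]


class VirtualRepo:
    """A synthetic repository amalgamating an ordered list of local repositories."""

    def __init__(self, name: str, members: list[LocalRepo]) -> None:
        self.name = name
        self.members = members

    def resolve(self, filename: str) -> Optional[str]:
        # Consult members in configured order; the first match wins.
        for repo in self.members:
            if filename in repo:
                return repo[filename]
        return None

    def index(self) -> set[str]:
        # The virtual index is the union of the member indexes.
        return set().union(*(repo.keys() for repo in self.members))


level1 = {"liba-1.0.7.tar.gz": "built from change-set 77b0"}
level3 = {"liba-1.0.5.tar.gz": "built from change-set a1f3"}
group = VirtualRepo("group-example", [level1, level3])
print(sorted(group.index()))  # ['liba-1.0.5.tar.gz', 'liba-1.0.7.tar.gz']
```

Changing which local repositories a virtual repository includes changes what its consumers can see, without touching the artifacts themselves.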

Even within the constraints of the artifact visibility guidelines, there is a great deal of useful flexibility. Just as you choose who your close friends are, a development team can choose who its close collaborators are. These collaboration boundaries are usually reflected in the virtual repository configurations. The example diagram and subsequent discussion below will provide a bit more clarity.

Collaboration Groups Diagram


Example Scenario Modeled in Diagram

Code Structure

Let us assume we have three development teams, each working on its own subsystem. Each development team has its own set of local repositories. As with the library analogy, we will assume there are only three promotion levels for any artifact regardless of subsystem. Assuming the teams are using Artifactory, each subsystem will need three separate local repositories in Artifactory. For example, subsystem A will have repo-A1-local, repo-A2-local, and repo-A3-local; one "local" repository for each promotion level.

Let us further assume that subsystem A and subsystem C have monolithic builds which produce a single library each (represented by purple and blue balls respectively). On the other hand, subsystem B is composed of two independently built and versioned libraries (represented by the green and red balls). Although the diagram has been simplified to only show a single version of each library per promotion level, in practice there are usually several versions of a library stored at a given promotion level.

Development Team Dynamics

In this example, subsystem A only barely interacts with subsystem B and never directly interacts with subsystem C. The members of Team A barely know the members of the other teams, and therefore don't really trust them.

In contrast, subsystem B and subsystem C are closely related. Developing a feature in subsystem B frequently involves first making a change in subsystem C. It is assumed the developers in Team B and Team C are good friends, and occasionally pair program with each other to make changes that cut across subsystems B and C. The members of Teams B and C are trusted with commit access to each other's source control repositories. Committing to a sister team's source repository is considered culturally acceptable as long as a code review by the owning team is done first.

Virtual Repository Implications

In addition to the nine "local" repositories, three "virtual" repositories have been established. The local repositories amalgamated by each virtual repository are shown as collaboration groups on the diagram. The virtual repository configurations for each subsystem are as follows:

Notice the virtual repository configurations are actually associated with particular code-bases, rather than with the development teams themselves. Developers are migratory; code-bases are not.

  • Virtual Repo: group-X

    Includes:

    • repo-A1-local
    • repo-A2-local
    • repo-A3-local
    • repo-B3-local

    Used By: Build tooling for subsystem A

  • Virtual Repo: group-Y

    Includes:

    • repo-B1-local
    • repo-B2-local
    • repo-B3-local
    • repo-A3-local
    • repo-C3-local

    Used By: Build tooling for subsystem B

  • Virtual Repo: group-Z

    Includes:

    • repo-C1-local
    • repo-C2-local
    • repo-C3-local
    • repo-B3-local
    • repo-B2-local

    Used By: Build tooling for subsystem C
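The three configurations above can also be written down as data and queried. This sketch relies on the example's `repo-<subsystem><level>-local` naming convention; `visible_levels` is a hypothetical helper, not an Artifactory feature:

```python
import re

# The three virtual repositories from the example, as plain data.
VIRTUAL_REPOS: dict[str, list[str]] = {
    "group-X": ["repo-A1-local", "repo-A2-local", "repo-A3-local", "repo-B3-local"],
    "group-Y": ["repo-B1-local", "repo-B2-local", "repo-B3-local",
                "repo-A3-local", "repo-C3-local"],
    "group-Z": ["repo-C1-local", "repo-C2-local", "repo-C3-local",
                "repo-B3-local", "repo-B2-local"],
}

# Naming convention used in the example: repo-<subsystem><level>-local
LOCAL_NAME = re.compile(r"repo-(?P<subsystem>[A-Z])(?P<level>\d+)-local")


def visible_levels(virtual_repo: str, subsystem: str) -> list[int]:
    """Promotion levels of `subsystem` visible to builds using `virtual_repo`."""
    levels = []
    for local in VIRTUAL_REPOS[virtual_repo]:
        match = LOCAL_NAME.fullmatch(local)
        if match and match["subsystem"] == subsystem:
            levels.append(int(match["level"]))
    return sorted(levels)
```

For example, `visible_levels("group-Z", "B")` returns `[2, 3]` while `visible_levels("group-Y", "C")` returns `[3]`, and `visible_levels("group-X", "C")` returns `[]` because subsystem A never consumes subsystem C directly.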

Accumulation of Validation Artifacts

As already discussed, a build artifact becomes worthy of promotion by successfully passing a series of tests at each promotion level. As an artifact is promoted, it is often important to record the results of any tests performed. One solution is to associate an archive of the test results with the build artifact and promote them together going forward.

We anticipate implementing this by adding additional test report artifacts to existing build-infos within Artifactory.

Notice that although an individual build artifact is immutable, the meta-data about the build artifact is not.
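The distinction between an immutable artifact and its mutable meta-data can be modeled with a frozen and a regular dataclass. This is a hypothetical sketch of the idea, not the Artifactory build-info format; `BuildArtifact`, `BuildInfo` and their fields are illustrative names:

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class BuildArtifact:
    """The published artifact: assigning to any field raises an error."""
    name: str
    version: str
    sha256: str


@dataclass
class BuildInfo:
    """Mutable meta-data that travels with the artifact as it is promoted."""
    artifact: BuildArtifact
    level: int = 1
    test_reports: list[tuple[int, str]] = field(default_factory=list)

    def record(self, report_name: str) -> None:
        # Remember which promotion level produced each test report.
        self.test_reports.append((self.level, report_name))

    def promote(self) -> None:
        # Reports recorded so far stay attached after promotion.
        self.level += 1
```

Recording `unit-tests.xml` at level 1, promoting, then recording `integration-tests.xml` leaves both reports attached, tagged with the level at which each was produced.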

Artifact Visibility vs. Attributes

So far, we have been discussing build artifact promotion as a simple linear progression through a series of artifact repositories (shelves). It may occasionally be desirable to concurrently perform a series of validations before changing the visibility of an artifact.

Artifactory allows the placement of arbitrary properties on a build artifact. A custom promotion script could easily require an artifact to have certain property values before being promoted to a higher level repository. For example, a promotion script could require "smoketestA=true" and "smoketestB=true" before allowing a build artifact to be promoted to a given local repository (level of visibility).
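A promotion gate of that kind might look like the following sketch. The property names come from the example above; the `REQUIRED_PROPERTIES` table and helper functions are hypothetical, and each property is simplified to a single string value:

```python
# Hypothetical gate: required property values per target promotion level.
REQUIRED_PROPERTIES: dict[int, dict[str, str]] = {
    2: {"smoketestA": "true", "smoketestB": "true"},
}


def missing_properties(properties: dict[str, str], target_level: int) -> dict[str, str]:
    """Required property values the artifact does not carry yet."""
    required = REQUIRED_PROPERTIES.get(target_level, {})
    return {key: value for key, value in required.items() if properties.get(key) != value}


def may_promote(properties: dict[str, str], target_level: int) -> bool:
    """Promote only once every concurrent validation has recorded success."""
    return not missing_properties(properties, target_level)
```

Each concurrently running validation only needs to set its own property; the gate is evaluated once, before the move to the next repository.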

Artifactory's virtual repository configurations are currently expressed as an amalgamation of other repositories. That implies a change in artifact visibility is equivalent to moving the artifact to another local repository. This view is reinforced by the build promotion call in Artifactory's web service API.
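For illustration, the sketch below prepares (but does not send) a request to Artifactory's build promotion endpoint, `POST /api/build/promote/{buildName}/{buildNumber}`. The base URL, build name and repository names are placeholders, authentication is omitted, and only a few body fields are shown; check the REST API documentation for your Artifactory version before relying on it. Setting `copy` to `false` moves rather than copies the build's artifacts, which is what changes their visibility:

```python
import json
import urllib.parse
import urllib.request


def build_promotion_request(base_url: str, build_name: str, build_number: str,
                            source_repo: str, target_repo: str,
                            dry_run: bool = True) -> urllib.request.Request:
    """Prepare (but do not send) a build promotion call; auth is omitted."""
    path = "/api/build/promote/{}/{}".format(
        urllib.parse.quote(build_name, safe=""),
        urllib.parse.quote(build_number, safe=""),
    )
    body = {
        "status": "promoted-to-" + target_repo,  # free-form status label
        "sourceRepo": source_repo,
        "targetRepo": target_repo,
        "copy": False,     # move: the artifacts leave the lower-level repository
        "dryRun": dry_run,  # default to a dry run while experimenting
    }
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it would be a matter of `urllib.request.urlopen(request)` plus whatever credentials your server requires.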

Supporting different levels of artifact visibility is a valuable feature for any build artifact repository. Doing so by promoting artifacts through a series of local repositories is an implementation detail of Artifactory. For a completely different perspective, consider workflows in Go or a build pipeline using Bamboo.

Artifact Promotion Related Project Goals

Artifact promotion and the advanced artifact repository features described are compatible with every project goal.

A promotion-based artifact version scheme makes Goal 1 possible while still supporting the needs of Goal 2 and Goal 3. A snapshot version scheme can't do this, nor can any other large-scale solution we are aware of.

Polyglot Continuous Deployment

Polyglot Build System with Build Artifact Repository Diagram

If a release candidate involves multiple language ecosystems, then the continuous deployment environment must somehow support every ecosystem involved. At scale, this means somehow providing build artifact repository support for every language ecosystem involved. It also requires a mechanism for automatically orchestrating the various builds.

This is a big task, and time is precious. The challenge is to figure out how to provide an effective solution with a reasonable level of investment. The world pays us for the cool products we make, not for time spent building the tools used to make them.

Avoid Black Holes

Constructing a single, all-encompassing build system is a Sisyphean task. At best, it is a full-time effort better left to companies like GradleWare who make their living selling build tools. Most of us are better off riding on the coattails of existing build system products. If a build solution like Gradle has a mature solution for every language ecosystem required, then by all means use it. Unfortunately, it doesn't always work out that way.

If using a single build tool is not an option, the best remaining choice is a thin orchestration layer above the various ecosystem specific build tools. The lowest cost of ownership is typically achieved by keeping the orchestration layer as thin as possible. A loosely coupled orchestration layer will also make it easier to take advantage of advances in the ecosystem-specific build tools.
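A thin orchestration layer can be little more than a table of native build commands and a loop that invokes them in dependency order. In this hedged sketch the commands in `BUILD_COMMANDS` are placeholders for whatever each ecosystem actually uses, and the runner is injectable so the plan can be exercised without launching real builds:

```python
import shlex
import subprocess
from typing import Callable, Sequence

# Placeholder commands; substitute each ecosystem's real build-and-publish step.
BUILD_COMMANDS: dict[str, str] = {
    "python": "python -m build",
    "java": "gradle publish",
    "javascript": "npm publish",
}

# A runner receives the argument vector and the working directory.
Runner = Callable[[list[str], str], None]


def run_with_subprocess(argv: list[str], cwd: str) -> None:
    """Default runner: stop the whole orchestration if any native build fails."""
    subprocess.run(argv, cwd=cwd, check=True)


def orchestrate(components: Sequence[tuple[str, str]],
                runner: Runner = run_with_subprocess) -> list[str]:
    """Run each component's native build tool, in the order given.

    `components` holds (source directory, ecosystem) pairs and is assumed to be
    sorted already, e.g. by a topological sort of the dependency graph.
    """
    log = []
    for path, ecosystem in components:
        argv = shlex.split(BUILD_COMMANDS[ecosystem])
        runner(argv, path)
        log.append(f"{path}: {' '.join(argv)}")
    return log
```

The orchestrator knows nothing about how each tool resolves or publishes artifacts, so improvements in the ecosystem-specific tools can be adopted by editing a single table entry.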

Choosing an Artifact Repository Manager

Large scale continuous deployment efforts will probably benefit from a build artifact repository. If you're lucky, each language ecosystem will already have a mature continuous deployment-friendly artifact repository solution available. If you're very lucky, you will even be able to install a single low-maintenance solution that serves the needs of every ecosystem. If you're blessed by the gods of software, or you choose your ecosystem wisely, the best solution will be mature, full-featured, and open-source. Obviously, we were not that lucky.

If you have to bolt on a build artifact repository manager, the most mature solutions currently available got their start as Maven repository managers. That makes for the following rather short list:

Applying the following criteria to Python builds:

  • build promotion support
  • "simple" PyPI layout support
  • lowest total cost of ownership
  • reasonable development time

narrows the choice down to Nexus Pro and Artifactory Pro. The open-source offerings might have been made to work, but more precious development time would have been required to supplement missing functionality.

Polyglot Related Project Goals

Defend Against Fruit does not directly address the needs of a polyglot architecture, but it doesn't get in the way either. Some of the Python modules used in creating and publishing build information into Artifactory might be useful in a Python-based orchestration layer.

See Also
