When To Use A Build Artifact Repository

JimVitrano edited this page May 16, 2013 · 5 revisions

An in-house build artifact repository such as Artifactory, Nexus, or CheeseShop takes effort to set up and maintain, so why have one? In simple circumstances, you probably shouldn't bother with the overhead. But as your project and environment grow more complex, an artifact repository becomes critical to maintaining order and agility.

Formal libraries strengthen the case for an artifact repository

Good software design focuses on clean, well-factored, loosely coupled systems. Two common ways to enforce such organization are namespacing and more formal library mechanisms.

Namespace-based organization

If a related set of functionality is only used by a single code-base, something as simple as namespace separation is frequently sufficient. As an example, a developer might place all math-related Python modules into a "math" subdirectory (a real project would pick a name that does not shadow Python's standard math module). Usage code can then pull in the functionality with an import line such as:

from math.money import Money

Namespacing alone does nothing to prevent circular dependencies between the library namespace and the usage code, but otherwise it does a decent job of grouping related functionality.

Library-based organization

When more than one code-base requires access to the same functionality, organizing the common code into its own library has a large number of advantages. These advantages include:

  • Avoiding duplication of code and additional maintenance effort (DRY principle)
  • Avoiding circular dependencies
  • Allowing usage code to selectively specify which version of a library it depends upon
  • Allowing each subsystem referencing the library to maintain an independent release cadence
  • Simplifying cross-team coordination

Library-based organization is typically used in conjunction with namespace-based organization. As an example, a variety of math-related utilities might be grouped together in a versioned Python library such as math_stuff-1.2.tgz. Internally, the package might contain separate namespaces (directories) for math.matrices, math.complexnumbers, and math.simple.
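To make the "selectively specify which version" advantage concrete, here is a minimal sketch of picking a library version from a declared range. The function names, the list of published versions, and the two consumers are hypothetical, and real tools use richer version schemes than plain dotted integers.

```python
def parse_version(text):
    """Turn '1.10.2' into (1, 10, 2) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def select_version(available, minimum, below):
    """Return the highest version v with minimum <= v < below, or None."""
    lo, hi = parse_version(minimum), parse_version(below)
    candidates = [v for v in available if lo <= parse_version(v) < hi]
    return max(candidates, key=parse_version) if candidates else None


# Hypothetical versions of math_stuff published so far.
published = ["1.0", "1.2", "1.10", "2.0"]

# Two consumers on independent release cadences declare different ranges.
billing_pick = select_version(published, "1.0", "2.0")
reports_pick = select_version(published, "2.0", "3.0")
```

Because each consumer states its own range, one can stay on the 1.x line while another moves to 2.0, without either forcing a release on the other.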

Why not use source control features instead of an artifact repository?

If you have experience using complex source control features to weave together an amalgamated code-base, you may question the value of sharing libraries in binary format. For example, many developers working in ClearCase leverage complex config specs to create an amalgamated source view. Similar techniques are possible using Subversion externals or Git submodules.

This approach inevitably results in multiple long-lived branches of the library. The effort of fully validating each change is multiplied by the number of branches the change is merged into. Full validation requires a significant set of activities, typically including unit testing, integration testing, regression testing, load testing, and code review.

Another problem is that this approach doesn't do anything to cut the time required to compile and execute automated tests. For a small code-base this may not be a concern, but with a large enough code-base you can quickly end up with builds that take several hours to compile and test. As has already been discussed, fast feedback is critical.

Aside: Apologies to any reader who won't be able to sleep at night after seeing ClearCase mentioned.

Why not use CI server features to share libraries?

Some continuous integration servers have the ability to pass artifacts from one build configuration down to dependent build configurations. As an example, take a look at TeamCity's build artifact management features. In simple circumstances, this can be an effective, low-overhead approach to sharing libraries. At scale, it starts to break down. Deficiencies of this approach include:

  • Lack of transitive dependency support
  • Dependency specifications are not co-managed with source code
  • Inability to easily depend upon library versions other than the most recent version
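As a rough illustration of what transitive dependency support means, the sketch below walks a dependency graph and returns every library a build needs, dependencies first. The graph contents and the `resolve` helper are hypothetical; real repositories and build tools keep this metadata in their own formats.

```python
# Hypothetical dependency metadata, as it might be published with each library.
DEPENDENCIES = {
    "billing_app": ["money_utils"],
    "money_utils": ["math_stuff"],
    "math_stuff": [],
}


def resolve(name, graph, resolved=None, in_progress=None):
    """Return name plus everything it needs, with dependencies listed first."""
    resolved = [] if resolved is None else resolved
    in_progress = set() if in_progress is None else in_progress
    if name in resolved:
        return resolved
    if name in in_progress:
        raise ValueError("circular dependency involving " + name)
    in_progress.add(name)
    for dep in graph[name]:
        resolve(dep, graph, resolved, in_progress)
    in_progress.discard(name)
    resolved.append(name)
    return resolved


# billing_app only declares money_utils, yet math_stuff is picked up as well.
install_order = resolve("billing_app", DEPENDENCIES)
```

With artifact passing on a CI server, the equivalent of this graph usually has to be wired up by hand in each build configuration.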

If you only have one or two libraries that are co-developed and co-released, using the CI server's build artifact features may be a good choice. For a large-scale code-base, the advantages of a proper build artifact repository will easily compensate for the overhead incurred to maintain one.

An artifact repository makes sharing libraries easy

An artifact repository makes it possible to completely automate the distribution and consumption of shared libraries. A typical sequence for manually updating a library is as follows:

  1. One developer builds and tests the library. (This step may be done by a continuous integration server.)

  2. The same developer drops the library into a shared storage location.

  3. Every developer working on code that consumes the library manually deploys the library to their workstation.

A build system which leverages an artifact repository implements the same activities. It simply does them in a fully automated fashion, picking up the correct available library version for every build.
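The three steps above can be sketched in automated form as follows. This is only an in-memory stand-in: the `InMemoryRepository` class and both build functions are hypothetical, and the rule that a published version cannot be overwritten is an assumption of this sketch. A real repository such as Artifactory or Nexus is accessed over the network through the build tool.

```python
class InMemoryRepository:
    """Stand-in for a real artifact repository; stores artifacts in a dict."""

    def __init__(self):
        self._artifacts = {}  # (name, version) -> artifact bytes

    def publish(self, name, version, payload):
        key = (name, version)
        if key in self._artifacts:
            # Assumption: released versions are immutable in this sketch.
            raise ValueError("already published: %s %s" % key)
        self._artifacts[key] = payload

    def fetch(self, name, version):
        return self._artifacts[(name, version)]


def ci_build_library(repo, version):
    """Steps 1 and 2: build and test the library, then share it."""
    payload = ("math_stuff build " + version).encode()
    repo.publish("math_stuff", version, payload)


def consumer_build(repo, pinned_version):
    """Step 3: each consuming build pulls the version it declares."""
    return repo.fetch("math_stuff", pinned_version)


repo = InMemoryRepository()
ci_build_library(repo, "1.2")
ci_build_library(repo, "1.3")
old_consumer = consumer_build(repo, "1.2")
new_consumer = consumer_build(repo, "1.3")
```

No developer copies files by hand here: publishing happens as part of the library build, and every consuming build fetches exactly the version it asked for.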

Sharing in a fully automated fashion may seem like a trivial technical detail, but in practice the amount of time saved is tremendous. Sophisticated automated library management ensures that it is quick to make an upstream library change and rapidly leverage that change in downstream code. This is all about keeping your feedback loops as tight as possible.