Repository synchronization

Vladimir Kotal edited this page Oct 19, 2022 · 67 revisions

While OpenGrok by itself does not provide a way to synchronize repositories, it ships with a Python script that makes synchronization easy.

opengrok-mirror

The script synchronizes the repositories of projects by running the appropriate commands (e.g. git pull for Git). While it runs perfectly fine standalone, it is meant to be run from within opengrok-sync.

The script accepts the configuration either in JSON or YAML.

The script assumes that OpenGrok is set up with projects (i.e. the indexer is run with the -P option).

When run in batch mode, the script logs the output to a file for each project. It rotates the logs.

It can be used within the opengrok-sync script - see https://github.com/OpenGrok/OpenGrok/wiki/Per-project-management-and-workflow for more details.

Configuration example

In YAML, the configuration file can look like this:

#
# Commands (or paths - for specific repository types only)
#
commands:
  hg: /usr/bin/hg
  svn: /usr/bin/svn
  teamware: /ontools/onnv-tools-i386/teamware/bin
#
# The proxy environment variables will be set for a project's repositories
# if the 'proxy' property is True.
#
proxy:
  http_proxy: proxy.example.com:80
  https_proxy: proxy.example.com:80
  ftp_proxy: proxy.example.com:80
  no_proxy: example.com,foo.example.com
hookdir: /tmp/hooks
# per-project hooks relative to 'hookdir' above
logdir: /tmp/logs
command_timeout: 300
hook_timeout: 1200
# as if opengrok-mirror was run with -I
incoming_check: true

#
# Per project configuration.
#
projects:
  http:
    proxy: true
  opengrok-stable:
    disabled: true
  foo:
    # override the incoming check for this project
    incoming_check: false
  userland:
    proxy: true
    hook_timeout: 3600
    hooks:
      pre: userland-pre.ksh
      post: userland-post.ksh
  opengrok-master:
    ignored_repos:
      - testdata/repositories/*
  jdk.*:
    proxy: true
    hooks:
      post: jdk_post.sh
  dpdk-next-net:
    strip_outgoing: true
  special:
    ignore: true

In the above configuration, the repositories of the userland project will be synchronized with the environment variables from the proxy section set. The scripts specified in its hooks section will also be run before and after all its repositories are synchronized. The hook scripts are run with the current working directory set to that of the project.
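The proxy handling described above can be pictured with a small Python sketch. It is not taken from opengrok-mirror; the function name project_environment and the base_env parameter are hypothetical, and the configuration is assumed to be already loaded into a dictionary.

```python
import os


def project_environment(config, project_name, base_env=None):
    """Return the environment for a project's repository commands."""
    env = dict(os.environ if base_env is None else base_env)
    project = (config.get("projects") or {}).get(project_name) or {}
    if project.get("proxy"):
        # Only projects with 'proxy: true' receive the proxy variables.
        env.update(config.get("proxy") or {})
    return env


# Trimmed-down version of the configuration example above.
config = {
    "proxy": {"http_proxy": "proxy.example.com:80"},
    "projects": {"userland": {"proxy": True}, "foo": {"incoming_check": False}},
}
```

With this sketch, project_environment(config, "userland") contains http_proxy, while the environment for foo is left untouched.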

The opengrok-master project contains an RCS repository that would make the mirroring fail (since opengrok-mirror does not support RCS yet), so the repository is listed in ignored_repos.

Repository commands

Repository commands use an extended syntax. Generally, the tools utilize two commands:

  1. incoming check
  2. repository synchronization

The tools internally implement the logic needed to perform these tasks using the basic repository commands. It is possible to override the repository commands with:

commands:
  git: /usr/local/bin/git
  hg: /usr/bin/hg
  svn: /usr/bin/svn
  # Note: unlike other repository types, Teamware needs a path to the binaries, i.e. directory.
  teamware: /ontools/onnv-tools-i386/teamware/bin

When this basic configuration is not enough, it is possible to override the logic by providing a custom command for each task:

commands:
  git:
    incoming: ['/bin/echo', 'some new changes!']
    sync: ['git', 'pull']

If you override only one of the commands, the tools will use the default internal logic to perform the other one. For the special case when you want to override one of the commands while using the default routine for the other with a different repository command, use the following syntax:

commands:
  git:
    # override repository command
    command: /my/custom/git
    # override incoming check with custom command (/my/custom/git is not called for incoming check)
    incoming: ['/bin/echo', 'some new changes!']
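The three forms of a commands entry shown above (plain path, per-task overrides, and a mix of both) can be normalized into one shape. The following Python sketch is only an illustration of that idea; parse_command_spec and the returned dictionary layout are hypothetical and do not reflect how opengrok-mirror stores the values internally.

```python
def parse_command_spec(spec):
    """Normalize a 'commands' entry into command/incoming/sync parts.

    A value of None means "use the default" for that part.
    """
    if isinstance(spec, str):
        # Simple form: just the path to the repository command.
        return {"command": spec, "incoming": None, "sync": None}
    # Extended form: any of the three keys may be present.
    return {
        "command": spec.get("command"),
        "incoming": spec.get("incoming"),
        "sync": spec.get("sync"),
    }


simple = parse_command_spec("/usr/bin/hg")
mixed = parse_command_spec({
    "command": "/my/custom/git",
    "incoming": ["/bin/echo", "some new changes!"],
})
```

In the mixed case, sync stays None, which corresponds to the default synchronization routine being run with /my/custom/git.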

Custom sync command

The command is run in the repository directory as the cwd and is expected to return:

  • 0 - for successful synchronization
  • non-zero status - for failed synchronization (with possible error output)

Custom incoming check command

The command is run in the repository directory as the cwd and is expected to return:

  • 0 - for successful incoming check and
    • empty stdout for no incoming changes
    • non-empty stdout for incoming changes
  • non-zero status - for failed incoming check (with possible error output)
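The contract above (exit status plus emptiness of stdout) can be sketched in Python as follows. This is an illustration, not the code of opengrok-mirror: check_incoming is a hypothetical helper, and treating whitespace-only output as empty is an assumption.

```python
import subprocess
import sys


def check_incoming(command, cwd=None, timeout=None):
    """Return True if the command reports incoming changes, False if not.

    Raises RuntimeError when the command exits with a non-zero status.
    """
    result = subprocess.run(command, cwd=cwd, capture_output=True,
                            text=True, timeout=timeout)
    if result.returncode != 0:
        raise RuntimeError(f"incoming check failed: {result.stderr.strip()}")
    # Assumption: output consisting only of whitespace counts as empty.
    return bool(result.stdout.strip())


# Stand-ins for real repository commands, using the Python interpreter
# so that the example does not depend on any SCM being installed.
with_changes = [sys.executable, "-c", "print('some new changes!')"]
without_changes = [sys.executable, "-c", "pass"]
failing = [sys.executable, "-c", "import sys; sys.exit(3)"]
```

Calling check_incoming(with_changes) reports incoming changes, check_incoming(without_changes) reports none, and check_incoming(failing) raises an exception.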

URI specifications

Just like opengrok-sync, opengrok-mirror also queries the web application for various properties, so if the web application is not listening on the default host/port, the URI has to be specified using the -U option.

Project matching

Multiple projects can share the same configuration using regular expressions, as demonstrated by the jdk.* pattern in the above configuration. The patterns are matched from the top to the bottom of the configuration file; the first match wins.
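A minimal Python sketch of first-match-wins lookup is shown below. It assumes that a pattern has to match the whole project name (re.fullmatch); the script's actual matching rules may differ. The function find_project_config and the extra jdk8 entry are hypothetical.

```python
import re


def find_project_config(projects, project_name):
    """Return (pattern, properties) of the first matching entry, else None."""
    # Dictionaries preserve insertion order, which mirrors the file order.
    for pattern, properties in projects.items():
        if re.fullmatch(pattern, project_name):
            return pattern, properties
    return None


projects = {
    "userland": {"proxy": True, "hook_timeout": 3600},
    "jdk.*": {"proxy": True, "hooks": {"post": "jdk_post.sh"}},
    # Hypothetical entry: never reached for "jdk8" because "jdk.*" comes first.
    "jdk8": {"disabled": True},
}
```

Here a project named jdk8 picks up the jdk.* entry rather than the later, more specific one, so the order of entries in the file matters.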

Disabling project mirroring

The opengrok-stable project is marked as disabled. This means that the opengrok-mirror script will exit with a special value of 2, which the opengrok-sync script interprets as a signal to avoid any reindex. It is not treated as an error.

Ignoring repositories

Some repositories under the project are not meant to be synchronized (e.g. the remote does not exist anymore or it is a testing repository for tests in that project). opengrok-mirror can ignore them if you list them in ignored_repos. This is a list of paths relative to the matched project (see Project matching) and supports filename glob expansion (see the example).
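The glob matching of ignored_repos entries can be illustrated with Python's fnmatch module. This is a sketch under the assumption that shell-style patterns are compared against the repository path relative to the project; is_ignored is a hypothetical helper and the repository names are made up.

```python
from fnmatch import fnmatchcase


def is_ignored(repo_path, ignored_repos):
    """Return True if the project-relative path matches any ignore pattern."""
    return any(fnmatchcase(repo_path, pattern) for pattern in ignored_repos)


# Pattern taken from the opengrok-master entry in the configuration example.
ignored_repos = ["testdata/repositories/*"]
```

With this list, a path such as testdata/repositories/rcs_test is ignored, while a path outside that directory is still synchronized.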

Ignoring errors

opengrok-mirror returns distinct codes that are interpreted by opengrok-sync. When a repository fails to sync, e.g. because there are uncommitted changes, opengrok-mirror returns 1, which signifies an error, and opengrok-sync terminates the execution. To make it always return 0, the ignore_errors configuration property can be set both per project and at the global configuration level. This setting is handy when using opengrok-sync with a project under development where uncommitted files are a common occurrence.
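The exit codes mentioned on this page (0 for success, 1 for an error, 2 for "skip further processing") could be interpreted by a caller roughly as in the following Python sketch. The function next_step and its return values are hypothetical; this is not the logic of opengrok-sync itself.

```python
def next_step(exit_code):
    """Map an opengrok-mirror exit code to what a caller might do next."""
    if exit_code == 0:
        return "continue"  # proceed with subsequent steps such as reindex
    if exit_code == 2:
        return "skip"      # e.g. disabled project or no incoming changes
    return "abort"         # 1 (or anything unexpected) is treated as an error
```

Treating every code other than 0 and 2 as an error is a conservative choice made for this sketch.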

Ignoring project completely

Sometimes, running opengrok-mirror on a project is undesirable. For that, set the project property ignore to true and opengrok-mirror will skip the project and return success.

Batch mode

In batch mode, log messages are written to a log file under the logdir directory specified in the configuration. The log is rotated on each run, keeping up to the default count (8) of old files or the count specified using the --backupcount option.
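Per-run rotation of this kind can be sketched with the RotatingFileHandler class from Python's standard logging module. The sketch below is an assumption about how such rotation can be done, not a copy of the script; the file name pattern project.log and the helper names are hypothetical.

```python
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler


def open_project_log(logdir, project, backupcount=8):
    """Return a handler writing to <logdir>/<project>.log, rotated per run."""
    logfile = os.path.join(logdir, project + ".log")
    had_previous_log = os.path.exists(logfile)
    # maxBytes=0 disables size-based rollover; rotation is triggered manually.
    handler = RotatingFileHandler(logfile, maxBytes=0, backupCount=backupcount)
    if had_previous_log:
        handler.doRollover()  # the previous run becomes <project>.log.1
    return handler


def simulate_runs(logdir, project, runs):
    """Log one message per simulated run and return the resulting file names."""
    logger = logging.getLogger("mirror-sketch")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    for run in range(runs):
        handler = open_project_log(logdir, project)
        logger.addHandler(handler)
        logger.info("run %d", run)
        logger.removeHandler(handler)
        handler.close()
    return sorted(os.listdir(logdir))
```

After three simulated runs in an empty directory, the directory holds the current log plus two rotated copies; once the backup count is reached, the oldest copy is discarded on the next run.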

Hooks

If pre and post mirroring hooks are specified, they are run before and after project synchronization. If any of the hooks fails, the program terminates immediately. However, if the synchronization (which runs in between the hook scripts) fails, the post hook is executed anyway. This keeps the project in a sane state: the post hook is usually used to extract source archives and apply patches. If the pre hook cleans up the extracted work and the post hook were skipped after a failed synchronization, the project would be left bare.
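The ordering rules for hooks can be summarized in a short Python sketch. The hooks and the synchronization step are modeled as plain callables returning True on success; mirror_project is a hypothetical name and the return values merely imitate the exit codes 0 and 1.

```python
def mirror_project(pre_hook, sync, post_hook):
    """Run optional hooks around a sync step and return 0 or 1."""
    if pre_hook is not None and not pre_hook():
        return 1  # a failed pre hook terminates the run immediately
    sync_ok = sync()
    # The post hook runs even if the synchronization failed, so that work
    # undone by the pre hook (e.g. extracted archives) gets restored.
    if post_hook is not None and not post_hook():
        return 1
    return 0 if sync_ok else 1


calls = []


def recorder(name, ok):
    """Build a callable that records its invocation and returns 'ok'."""
    def step():
        calls.append(name)
        return ok
    return step
```

For example, mirror_project(recorder("pre", True), recorder("sync", False), recorder("post", True)) still invokes all three steps but reports a failure.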

Timeouts

Both repository synchronization commands and hooks can have a timeout. By default there is no timeout unless one is specified in the configuration file. There are global and per-project timeouts, the latter overriding the former. For instance, in the above configuration file, the userland project overrides the global hook timeout (1200 seconds) with 1 hour (3600 seconds) while inheriting the command timeout.
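The override rule can be expressed as a simple lookup, sketched here in Python. The helper effective_timeout is hypothetical and the configuration is assumed to be already loaded into a dictionary; None stands for "no timeout".

```python
def effective_timeout(config, project_name, key):
    """Return the per-project timeout if set, else the global one, else None."""
    project = (config.get("projects") or {}).get(project_name) or {}
    if key in project:
        return project[key]
    return config.get(key)


# Values taken from the configuration example above.
config = {
    "command_timeout": 300,
    "hook_timeout": 1200,
    "projects": {"userland": {"hook_timeout": 3600}, "foo": {}},
}
```

For userland this yields a hook timeout of 3600 and a command timeout of 300, while foo inherits both global values.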

Overriding incoming check

opengrok-mirror can be run with the -I option to check whether there are any incoming changes from the parent repository. If there are no incoming changes, opengrok-mirror exits with a return code of 2. This code is interpreted by the opengrok-sync program in a special way: it skips subsequent processing for the given project, avoiding an unnecessary indexer run.

The incoming_check configuration property can be used to override this setting. It can be set at the global and per-project level, the latter taking precedence.

There is a special case: if the project being mirrored has not been indexed yet, the incoming check will be overridden. This is useful when adding a new project and running opengrok-sync that has opengrok-mirror -I in the configuration.

Strip outgoing changesets

opengrok-mirror can be run with the --strip-outgoing option to check whether there are any outgoing changesets in the repositories of the given project(s) and strip these before the repositories are synchronized. If such changes are found and stripped, the project data (not the source code) is deleted so that the project can be reindexed from scratch.

This is handy when performing synchronization of a repository that often rewrites the history.

The strip_outgoing configuration property can be used to override this setting. It can be set at the global and per-project level.

Disabled project handling

It is possible to configure a command to be executed for disabled projects. As with opengrok-sync, this supports both RESTful API calls and command execution. This makes it possible, for instance, to tag the disabled projects with Messages so that they are annotated in the UI (set the duration to be less than the mirroring/syncing period to avoid duplicate messages).

The disabled command is configured globally and will vary based on project thanks to pattern substitution/append.

Any failures in disabled command processing are logged and do not change the overall result of the mirroring command.

Command examples:

API call:

disabled_command:
  call:
    uri: '%URL%/api/v1/messages'
    method: POST
    data:
      messageLevel: warning
      duration: PT1H
      tags: ['%PROJECT%']
      text: resync + reindex in progress
projects:
  foo:
    disabled: true
  bar:
    disabled: true
    disabled-reason: "bar is not active anymore"

With the above configuration, a Message is sent to the OpenGrok web application and is in turn visible in the user interface for the particular project. For project foo, a plain disabled project message appears. For project bar, the message reads disabled project: bar is not active anymore. For more documentation on the RESTful API of the web application, see https://github.com/oracle/opengrok/wiki/Web-services
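The pattern substitution used in the call above (%URL% and %PROJECT%) can be pictured as a recursive string replacement over the configured structure. The Python sketch below is only an illustration: the substitute helper is hypothetical, and the base URL http://localhost:8080/source is an assumed value, not something the configuration above defines.

```python
def substitute(value, replacements):
    """Recursively replace patterns in strings nested in lists and dicts."""
    if isinstance(value, str):
        for pattern, replacement in replacements.items():
            value = value.replace(pattern, replacement)
        return value
    if isinstance(value, list):
        return [substitute(item, replacements) for item in value]
    if isinstance(value, dict):
        return {key: substitute(item, replacements) for key, item in value.items()}
    return value  # numbers, booleans and None are left untouched


call = {
    "uri": "%URL%/api/v1/messages",
    "method": "POST",
    "data": {"messageLevel": "warning", "tags": ["%PROJECT%"]},
}
# Assumed web application URL and the project currently being processed.
resolved = substitute(call, {"%URL%": "http://localhost:8080/source",
                             "%PROJECT%": "foo"})
```

The resolved structure then carries a concrete URI and the tag foo, which is what makes one global disabled_command definition usable for every disabled project.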

Command execution:

disabled_command:
  command: [cat]
projects:
  foo:
    disabled: true