Plastron
Utility for batch operations on a Fedora 4 repository.
Installation
Requires Python 3.6+
**TODO:** Add end-user installation instructions here once Plastron is available via PyPI/pip.
Installing Python 3 with pyenv (Optional)
If you don't already have a Python 3 environment, or would like to install Plastron into its own isolated environment, a convenient way to do this is to use the pyenv Python version manager. The pyenv virtualenv command also requires the pyenv-virtualenv plugin.
```bash
# install Python 3.7.0
pyenv install 3.7.0
# create a new virtual environment based on 3.7.0 for Plastron
pyenv virtualenv 3.7.0 plastron
# switch to that environment in your current shell
pyenv shell plastron
```
Installation for development
To install Plastron in development mode, do the following:
```bash
git clone git@github.com:umd-lib/plastron.git
cd plastron
pip install -e .
```
Common Options
```
$ plastron --help
usage: plastron [-h] (-r REPO | -V) [-v] [-q]
                {ping,load,list,ls,mkcol,delete,del,rm,extractocr} ...

Batch operation tool for Fedora 4.

optional arguments:
  -h, --help            show this help message and exit
  -r REPO, --repo REPO  Path to repository configuration file.
  -V, --version         Print version and exit.
  -v, --verbose         increase the verbosity of the status output
  -q, --quiet           decrease the verbosity of the status output

commands:
  {ping,load,list,ls,mkcol,delete,del,rm,extractocr}
```
Check version
```
$ plastron --version
2.1.0
```
Commands
All commands require you to specify a repository configuration file using
the -r or --repo option before the command name. For example,
plastron -r path/to/repo.yml ping.
Ping (ping)
```
$ plastron ping --help
usage: plastron ping [-h]

Check connection to the repository

optional arguments:
  -h, --help  show this help message and exit
```
Load (load)
```
$ plastron load --help
usage: plastron load [-h] -b BATCH [-d] [-n] [-l LIMIT] [-% PERCENT]
                     [--noannotations] [--ignore IGNORE] [--wait WAIT]

Load a batch into the repository

optional arguments:
  -h, --help            show this help message and exit
  -d, --dryrun          iterate over the batch without POSTing
  -n, --nobinaries      iterate without uploading binaries
  -l LIMIT, --limit LIMIT
                        limit the load to a specified number of top-level
                        objects
  -% PERCENT, --percent PERCENT
                        load specified percentage of total items
  --noannotations       iterate without loading annotations (e.g. OCR)
  --ignore IGNORE, -i IGNORE
                        file listing items to ignore
  --wait WAIT, -w WAIT  wait n seconds between items

required arguments:
  -b BATCH, --batch BATCH
                        path to batch configuration file
```
List (list, ls)
```
$ plastron list --help
usage: plastron list [-h] [-l] [-R RECURSIVE] [uris [uris ...]]

List objects in the repository

positional arguments:
  uris                  URIs of repository objects to list

optional arguments:
  -h, --help            show this help message and exit
  -l, --long            Display additional information besides the URI
  -R RECURSIVE, --recursive RECURSIVE
                        List additional objects found by traversing the given
                        predicate(s)
```
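The -R/--recursive help text says only that additional objects are found by traversing the given predicate(s). As a minimal sketch of that idea, under loud assumptions: the repository's RDF is modelled as a hypothetical in-memory dict mapping a subject URI to (predicate, object) pairs, predicates are plain strings such as "pcdm:hasMember", and the walk is breadth-first. The traverse helper and the example.org URIs are invented for illustration; this is not Plastron's implementation, and it says nothing about how predicates are spelled on the command line.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical stand-in for repository RDF: subject URI -> [(predicate, object URI), ...]
Graph = Dict[str, List[Tuple[str, str]]]

COLL = "http://example.org/rest/coll"
OBJ = "http://example.org/rest/obj1"
FILE = "http://example.org/rest/obj1/file"

example_graph: Graph = {
    COLL: [("pcdm:hasMember", OBJ)],
    OBJ: [("pcdm:hasFile", FILE), ("dcterms:isPartOf", COLL)],
}


def traverse(graph: Graph, start: Iterable[str], predicates: Set[str]) -> List[str]:
    """Breadth-first walk from the start URIs, following only edges whose predicate is in predicates."""
    found: List[str] = []
    visited: Set[str] = set()
    queue = deque(start)
    while queue:
        uri = queue.popleft()
        if uri in visited:
            continue  # already listed; this also stops cycles
        visited.add(uri)
        found.append(uri)
        for predicate, obj in graph.get(uri, []):
            if predicate in predicates and obj not in visited:
                queue.append(obj)
    return found


print(traverse(example_graph, [COLL], {"pcdm:hasMember"}))
# ['http://example.org/rest/coll', 'http://example.org/rest/obj1']
```

Following only pcdm:hasMember from the collection reaches obj1 but not its file; adding pcdm:hasFile to the set reaches the file as well. The visited set is what keeps a back-link such as dcterms:isPartOf from looping forever when it is included.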
Create Collection (mkcol)
```
$ plastron mkcol --help
usage: plastron mkcol [-h] -n NAME [-b BATCH]

Create a PCDM Collection in the repository

optional arguments:
  -h, --help            show this help message and exit
  -n NAME, --name NAME  Name of the collection.
  -b BATCH, --batch BATCH
                        Path to batch configuration file.
```
Delete (delete, del, rm)
```
$ plastron delete --help
usage: plastron delete [-h] [-R RECURSIVE] [-d] [-f FILE] [uris [uris ...]]

Delete objects from the repository

positional arguments:
  uris                  Repository URIs to be deleted.

optional arguments:
  -h, --help            show this help message and exit
  -R RECURSIVE, --recursive RECURSIVE
                        Delete additional objects found by traversing the
                        given predicate(s)
  -d, --dryrun          Simulate a delete without modifying the repository
  -f FILE, --file FILE  File containing a list of URIs to delete
```
Extract OCR (extractocr)
```
$ plastron extractocr --help
usage: plastron extractocr [-h] [--ignore IGNORE]

Create annotations from OCR data stored in the repository

optional arguments:
  -h, --help            show this help message and exit
  --ignore IGNORE, -i IGNORE
                        file listing items to ignore
```
Configuration
Configuration Templates
Templates for creating the configuration files can be found in config/templates.
Repository Configuration
The repository connection is configured in a YAML file and passed to plastron
with the -r or --repo option. These are the recognized configuration keys:
Required
| Option | Description |
|---|---|
| REST_ENDPOINT | Repository root URL |
| RELPATH | Path within the repository to load objects to |
| LOG_DIR | Directory to write log files to |
Client Certificate Authentication
| Option | Description |
|---|---|
| CLIENT_CERT | PEM-encoded client SSL cert for authentication |
| CLIENT_KEY | PEM-encoded client SSL key for authentication |
Password Authentication
| Option | Description |
|---|---|
| FEDORA_USER | Username for authentication |
| FEDORA_PASSWORD | Password for authentication |
Optional
| Option | Description |
|---|---|
| SERVER_CERT | Path to a PEM-encoded copy of the server's SSL certificate; only needed for servers using self-signed certs |
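Once parsed (for example with PyYAML's yaml.safe_load), the repository configuration file is just a mapping of the keys above to values. As a hedged sketch of how those tables could be checked before connecting, check_repo_config below is a hypothetical helper, not part of Plastron: it reports missing required keys and flags an authentication method that is only half configured. It assumes, without the source saying so, that an empty value counts as missing, and it deliberately does not insist that any authentication method be present.

```python
from typing import Dict, List

# Keys from the "Required" table
REQUIRED_KEYS = ["REST_ENDPOINT", "RELPATH", "LOG_DIR"]
# Keys that only make sense together, per the two authentication tables
AUTH_PAIRS = [("CLIENT_CERT", "CLIENT_KEY"), ("FEDORA_USER", "FEDORA_PASSWORD")]


def check_repo_config(config: Dict[str, str]) -> List[str]:
    """Return a list of problems found in a parsed repository config; empty means none found."""
    problems = [f"missing required key {key}" for key in REQUIRED_KEYS if not config.get(key)]
    for first, second in AUTH_PAIRS:
        # Flag a half-configured authentication method (one key without its partner)
        if bool(config.get(first)) != bool(config.get(second)):
            present, absent = (first, second) if config.get(first) else (second, first)
            problems.append(f"{present} is set but {absent} is not")
    return problems


config = {
    "REST_ENDPOINT": "http://localhost:8080/rest",  # hypothetical values
    "RELPATH": "/pcdm",
    "LOG_DIR": "logs",
    "FEDORA_USER": "fedoraAdmin",
}
print(check_repo_config(config))
# ['FEDORA_USER is set but FEDORA_PASSWORD is not']
```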
Batch Configuration
Required
| Option | Description |
|---|---|
| BATCH_FILE | The "main" file of the batch |
| COLLECTION | URI of the repository collection that the objects will be added to |
| HANDLER | The handler to use |
Optional
| Option | Description | Default |
|---|---|---|
| ROOT_DIR | The directory containing the batch configuration file | |
| DATA_DIR | Where to find the data files for the batch; relative to ROOT_DIR | data |
| LOG_DIR | Where to write the mapfile, skipfile, and other logging info; relative to ROOT_DIR | logs |
| MAPFILE | Where to store the record of completed items in this batch; relative to LOG_DIR | mapfile.csv |
| HANDLER_OPTIONS | Any additional options required by the handler | |
Note: The plastron.load.*.log files are currently written to the repository log directory, not to the batch log directory.
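The relative-path rules in the batch configuration table chain together: DATA_DIR and LOG_DIR are resolved against ROOT_DIR, while MAPFILE is resolved against LOG_DIR. A minimal sketch of that resolution follows, using a hypothetical resolve_batch_paths helper and assuming (the table leaves its default blank) that ROOT_DIR falls back to the directory holding the batch configuration file. It is not Plastron's code, and it ignores HANDLER_OPTIONS and the repository-level log location mentioned in the note.

```python
import os
from typing import Dict

# Defaults taken from the "Optional" batch configuration table
DEFAULTS = {"DATA_DIR": "data", "LOG_DIR": "logs", "MAPFILE": "mapfile.csv"}


def resolve_batch_paths(config: Dict[str, str], config_file: str) -> Dict[str, str]:
    """Resolve DATA_DIR, LOG_DIR and MAPFILE following the documented relative-path rules."""
    # Assumption: ROOT_DIR defaults to the directory that contains the batch config file
    root_dir = config.get("ROOT_DIR") or os.path.dirname(os.path.abspath(config_file))
    data_dir = os.path.join(root_dir, config.get("DATA_DIR", DEFAULTS["DATA_DIR"]))
    log_dir = os.path.join(root_dir, config.get("LOG_DIR", DEFAULTS["LOG_DIR"]))
    # MAPFILE is relative to LOG_DIR, not to ROOT_DIR
    mapfile = os.path.join(log_dir, config.get("MAPFILE", DEFAULTS["MAPFILE"]))
    return {"ROOT_DIR": root_dir, "DATA_DIR": data_dir, "LOG_DIR": log_dir, "MAPFILE": mapfile}


print(resolve_batch_paths({}, "/batches/b1/batch.yml")["MAPFILE"])
# on POSIX systems: /batches/b1/logs/mapfile.csv
```

Because os.path.join discards earlier components when a later one is absolute, an absolute DATA_DIR or LOG_DIR in this sketch would simply be used as-is.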
Extending
Adding Commands
Commands are implemented as modules in the plastron.commands package, named
plastron.commands.{cmd_name}, and each must contain, at a minimum, a class named
Command. This class must have an __init__ method that takes an argparse
subparsers object and creates and configures a subparser to handle its specific
command-line arguments. It must also have a __call__ method that takes a
pcdm.Repository object and an argparse.Namespace object, and executes the
actual command.
For a simple example, here is the ping command, adapted from
plastron/commands/ping.py:

```python
from plastron.exceptions import FailureException


class Command:
    def __init__(self, subparsers):
        parser_ping = subparsers.add_parser(
            'ping',
            description='Check connection to the repository'
        )
        parser_ping.set_defaults(cmd_name='ping')

    def __call__(self, fcrepo, args):
        try:
            fcrepo.test_connection()
        except Exception as e:
            # Report any connection error as a command failure; catching
            # Exception (not a bare except) lets KeyboardInterrupt propagate
            raise FailureException() from e
```

The FailureException is caught by the plastron script and causes it to exit with
a status code of 1. Any KeyboardInterrupt exceptions (for instance, due to the
user pressing Ctrl+C) are also caught by the plastron script and cause
it to exit with a status code of 2.
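The command contract and exit-code behaviour described above can be sketched as a small, self-contained driver. Everything here is hypothetical scaffolding rather than the real plastron script: FailureException is a local stand-in for plastron.exceptions.FailureException, main and PingCommand are invented names, the parser has no -r option, and the fake repository classes only implement test_connection. The command class follows the documented contract: an __init__ that registers a subparser and a __call__ that takes the repository and the parsed arguments.

```python
import argparse
from typing import List


class FailureException(Exception):
    """Local stand-in for plastron.exceptions.FailureException."""


class PingCommand:
    # Follows the documented contract: __init__(subparsers) and __call__(fcrepo, args)
    def __init__(self, subparsers):
        parser = subparsers.add_parser('ping', description='Check connection to the repository')
        parser.set_defaults(cmd_name='ping')

    def __call__(self, fcrepo, args):
        try:
            fcrepo.test_connection()
        except Exception as e:  # not a bare except, so KeyboardInterrupt propagates
            raise FailureException() from e


def main(argv: List[str], fcrepo) -> int:
    """Parse argv, run the selected command, and translate exceptions into exit codes."""
    parser = argparse.ArgumentParser(prog='plastron')
    subparsers = parser.add_subparsers(title='commands')
    commands = {'ping': PingCommand(subparsers)}
    args = parser.parse_args(argv)
    try:
        commands[args.cmd_name](fcrepo, args)
    except FailureException:
        return 1  # the command reported a failure
    except KeyboardInterrupt:
        return 2  # the user pressed Ctrl+C
    return 0


# Fake repositories for trying the driver without a Fedora server
class ReachableRepo:
    def test_connection(self):
        pass


class UnreachableRepo:
    def test_connection(self):
        raise ConnectionError('repository unreachable')


class InterruptedRepo:
    def test_connection(self):
        raise KeyboardInterrupt


print(main(['ping'], ReachableRepo()), main(['ping'], UnreachableRepo()), main(['ping'], InterruptedRepo()))
# 0 1 2
```

Catching Exception rather than using a bare except inside the command matters for this scheme: a bare except would also catch KeyboardInterrupt and turn Ctrl+C into a failure with status 1 instead of 2. A real entry point would pass main's return value to sys.exit().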
License
See the LICENSE file for license rights and limitations (Apache 2.0).