Merge 6116af7 into 3de32a5
weaverba137 committed May 17, 2019
2 parents 3de32a5 + 6116af7 commit 2cee638
Showing 9 changed files with 177 additions and 113 deletions.
7 changes: 6 additions & 1 deletion doc/changes.rst
@@ -7,9 +7,14 @@ Release Notes

*This release drops support for Python 2.*

-* Remove all Python 2 code. (PR `#8`_).
+* Remove all Python 2 code (PR `#8`_).
+* Support fine-grained exclusion in configuration files (PR `#10`_).
+* Avoid commonly-used names for metadata in configuration files (PR `#10`_).
+* Detect newer files on disk that map to older HPSS files (PR `#10`_).
+* Allow top-level directories to contain only files (PR `#10`_).

.. _`#8`: https://github.com/weaverba137/hpsspy/pull/8
+.. _`#10`: https://github.com/weaverba137/hpsspy/pull/10

0.4.2 (2019-01-29)
------------------
39 changes: 30 additions & 9 deletions doc/configuration.rst
@@ -30,12 +30,12 @@ described below.
Metadata
++++++++

-The configuration file should contain a top-level keyword ``"config"``.
+The configuration file should contain a top-level keyword ``"__config__"``.
The value should itself be a :class:`dict`, containing some important
metadata::

{
-"config": {
+"__config__": {
"root": "/global/project/projectdirs/my_project",
"hpss_root": "/nersc/projects/my_project",
"physical_disks": ["my_project"]
@@ -65,17 +65,17 @@ Sections

Inside the root directory, as described above, there may be several top-level
directories. For the purposes of this documentation, these are called
-"sections" or "releases". The terms are interchangable. Each section
+"sections" or "releases". The terms are interchangeable. Each section
has configuration items that describe its structure::

{
-"config": {
+"__config__": {
"root": "/projects/my_project",
"hpss_root": "/hpss/projects/my_project",
"physical_disks": ["my_project"]
},
"data": {
-"exclude": [],
+"__exclude__": [],
"d1": {
"d1/batch/.*$": "d1/batch.tar",
"d1/([^/]+\\.txt)$": "d1/\\1",
@@ -89,14 +89,20 @@ The name of the section is passed on the command-line::

missing_from_hpss config.json data

-This would read the data section above.
+This would read the ``"data"`` section above.

-Each section should have an ``"exclude"`` keyword, whose value is a list
+Each section should have an ``"__exclude__"`` keyword, whose value is a list
of files to be ignored. In the example above, in order to ignore the file
-``/projects/my_project/data/d1/README.html``, the ``"exclude"`` value
+``/projects/my_project/data/d1/README.html``, the ``"__exclude__"`` value
would be ``["d1/README.html"]``. Note that this is relative to the
path ``/projects/my_project/data``, since ``"data"`` is the section being
-processed.
+processed. Generally, this should only be used for a handful of top-level
+files, like README files. For more precise exclusion, see the ``"EXCLUDE"``
+statement below.
+
+In the special case where a section contains only files, and no
+subdirectories, the special pseudo-subdirectory ``"__top__"`` can be
+used to contain the configuration.
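
As an illustration of the layout described above, here is a minimal sketch of how a reader of this configuration might separate a section's ``"__exclude__"`` list from its subdirectory entries. The helper name ``split_section`` is hypothetical and this is not hpsspy's own code; it assumes only the JSON structure shown in the example, with ``"d1/README.html"`` filled in as the excluded file from the text.

```python
import json

# Example configuration from the documentation above. A raw string keeps
# the JSON backslash escapes intact until json.loads() decodes them.
CONFIG_TEXT = r"""
{
    "__config__": {
        "root": "/projects/my_project",
        "hpss_root": "/hpss/projects/my_project",
        "physical_disks": ["my_project"]
    },
    "data": {
        "__exclude__": ["d1/README.html"],
        "d1": {
            "d1/batch/.*$": "d1/batch.tar",
            "d1/([^/]+\\.txt)$": "d1/\\1"
        }
    }
}
"""


def split_section(config, section):
    """Hypothetical helper: return (excluded files, subdirectory maps)."""
    body = config[section]
    exclude = set(body.get("__exclude__", []))
    # Every key other than "__exclude__" is treated as a subdirectory,
    # including the "__top__" pseudo-subdirectory when it is present.
    subdirs = {k: v for k, v in body.items() if k != "__exclude__"}
    return exclude, subdirs


config = json.loads(CONFIG_TEXT)
exclude, subdirs = split_section(config, "data")
print(sorted(exclude))
print(sorted(subdirs))
```

Because the metadata keys are wrapped in double underscores, a real directory named ``exclude`` or ``config`` would no longer collide with them, which appears to be the motivation for the rename in this commit.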

Mapping File Names to HPSS Archives
+++++++++++++++++++++++++++++++++++
@@ -129,6 +135,12 @@ In coding terms we describe a portion of a directory tree hierarchy
using regular expressions to match *files* in that portion. Then we map
files that match that regular expression to tape archive files.

+Finally, it should be noted that the configuration of each section is
+organized by subdirectory in order to speed up the process of mapping files
+to backup files. Instead of looking through every possible configuration
+of files, only the configurations in a subdirectory need to be considered
+when examining files in that subdirectory.
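
The per-subdirectory lookup described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the package's actual implementation: ``SECTION`` and ``map_file`` are hypothetical names, the patterns are the two ``d1`` entries from the example configuration as they look after JSON decoding, and anchoring with ``re.match`` is an assumption about the matching semantics.

```python
import re

# The "d1" patterns from the example configuration, after JSON decoding.
SECTION = {
    "d1": {
        r"d1/batch/.*$": "d1/batch.tar",
        r"d1/([^/]+\.txt)$": r"d1/\1",
    }
}


def map_file(section, path):
    """Hypothetical helper: map a section-relative path to an archive name.

    Only the patterns registered under the file's top-level subdirectory
    are tried, so unrelated subdirectories are never scanned.
    """
    subdir = path.split("/", 1)[0]
    for pattern, archive in section.get(subdir, {}).items():
        match = re.match(pattern, path)
        if match:
            # expand() substitutes back-references such as \1.
            return match.expand(archive)
    return None


print(map_file(SECTION, "d1/batch/run1/log.out"))
print(map_file(SECTION, "d1/notes.txt"))
print(map_file(SECTION, "d2/other.txt"))
```

Keying the lookup on the first path component means the cost of mapping one file depends on the number of patterns in its own subdirectory rather than on the size of the whole section.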

Regular Expression Details
++++++++++++++++++++++++++

@@ -153,6 +165,15 @@ imposes some additional requirements, conventions and idioms:
and that command will be used to construct it.
- Any archive file *not* ending in ``.tar`` will simply be copied to
HPSS as is.
+- The special string ``"EXCLUDE"`` can be used to prevent backups of
+parts of a directory tree that might otherwise be archival. For example,
+``"d1/data/preproc/.*$" : "EXCLUDE"`` would prevent the ``preproc``
+directory from being backed up, even if other parts of ``d1/data``
+were configured for backup.
+- The special string ``"AUTOMATED"`` behaves the same way as ``"EXCLUDE"``,
+but is a human-readable way to denote data sets that are backed up by
+automation independently of :command:`missing_from_hpss`, as opposed
+to not being backed up at all.
- When constructing an archive file, :command:`missing_from_hpss` will
obtain the directory it needs to archive from the name of the *archive*
file, not the regular expression itself. This is because regular
14 changes: 7 additions & 7 deletions hpsspy/data/desi.json
@@ -1,28 +1,28 @@
{
-"config":{
+"__config__":{
"root":"/global/project/projectdirs/desi",
"hpss_root":"/nersc/projects/desi",
"physical_disks":["desi"]
},
"datachallenge":{
-"exclude":[],
+"__exclude__":[],
"dc2":{
"dc2/batch/.*$":"dc2/batch.tar",
"dc2/([^/]+\\.txt)$":"dc2/\\1",
"dc2/templates/[^/]+$":"dc2/templates/templates_files.tar"
}
},
"imaging":{
-"exclude":[]
+"__exclude__":[]
},
"mocks":{
-"exclude":[]
+"__exclude__":[]
},
"release":{
-"exclude":[]
+"__exclude__":[]
},
"spectro":{
-"exclude":[],
+"__exclude__":[],
"data":{
},
"redux":{
@@ -34,6 +34,6 @@
}
},
"target":{
-"exclude":[]
+"__exclude__":[]
}
}
12 changes: 6 additions & 6 deletions hpsspy/data/sdss.json
@@ -1,11 +1,11 @@
{
-"config":{
+"__config__":{
"root":"/clusterfs/riemann/raid006",
"hpss_root":"/nersc/projects/boss",
"physical_disks":["raid006","raid000","raid005","raid007","raid008","raid2008","netapp"]
},
"dr8":{
-"exclude":[
+"__exclude__":[
"env/index.html",
"boss/photoObj/frames/301/index.html",
"sdss/spectro/data/README.html"
@@ -61,7 +61,7 @@
}
},
"dr9":{
-"exclude":[
+"__exclude__":[
"env/index.html"
],
"casload":{
@@ -107,7 +107,7 @@
}
},
"dr10":{
-"exclude":[
+"__exclude__":[
"env/index.html"
],
"casload":{
@@ -151,7 +151,7 @@
}
},
"dr11":{
-"exclude":[
+"__exclude__":[
"env/index.html",
"boss/lss/index.html"
],
@@ -196,7 +196,7 @@
}
},
"dr12":{
-"exclude":[
+"__exclude__":[
"env/index.html",
"boss/lss/index.html"
],
