1 change: 1 addition & 0 deletions .gitignore
@@ -5,6 +5,7 @@ __pycache__/
Pipfile.lock
__pypackages__/
.env
.envcmd
.venv
env/
venv/
175 changes: 173 additions & 2 deletions README.md
@@ -1,2 +1,173 @@
# clarin-submission-python
DSpace python library to allow ingestion of metadata with files to create new submission
## clarin-submission-python
A Python library for DSpace that ingests metadata together with files to create a new submission.

### Environment variables used by the library
Example:
```
AUTHORIZATION_TOKEN = ey....0uYw
DSPACE_API_ENDPOINT = 'https://lindat.mff.cuni.cz/repository/server/api'
DSPACE_COLLECTION_ID = 34352d2c-3296-4448-aeb5-d18f2e179126
SUBMISSION_DEFINITION_NAME = traditional
```
Where:

**AUTHORIZATION_TOKEN** is the persistent token used for authentication. The token can be created by an admin or by the submitter.

**DSPACE_API_ENDPOINT** is the base URL of the DSpace Server API endpoints.

**DSPACE_COLLECTION_ID** is the UUID of the (parent) collection where the submission data will be stored.

**SUBMISSION_DEFINITION_NAME** is the name of the submission definition used by the collection. The submission definition specifies which metadata form is used in submission requests.

The admin provides the values of all these environment variables to the user.
Each script lets you override the environment variables it uses with the corresponding command-line arguments:
- **-t, --token** to override AUTHORIZATION_TOKEN
- **-e, --dspace-api-endpoint** to override DSPACE_API_ENDPOINT
- **-c, --collection-id** to override DSPACE_COLLECTION_ID
- **-d, --submission-definition-name** to override SUBMISSION_DEFINITION_NAME
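
The precedence described above (command-line argument first, then environment variable, then an optional default) can be sketched as follows. `resolve_setting` is a hypothetical helper introduced only for illustration; it is not part of this library, whose scripts inline the same logic with `if`/`elif` branches.

```python
import os


def resolve_setting(cli_value, env_name, default=None, environ=None):
    """Return the CLI value if given, else the environment variable, else the default.

    Hypothetical helper: `environ` can be passed explicitly to make testing easier.
    """
    environ = os.environ if environ is None else environ
    if cli_value:  # a non-empty command-line argument always wins
        return cli_value
    if env_name in environ:
        return environ[env_name]
    return default


# Example: no -c argument was given, so the environment value is used.
fake_env = {"DSPACE_COLLECTION_ID": "34352d2c-3296-4448-aeb5-d18f2e179126"}
collection_id = resolve_setting(None, "DSPACE_COLLECTION_ID", environ=fake_env)
```

A caller would still have to decide what to do when the result is `None`; the template generation script, for instance, exits with an error if no token can be resolved.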

### Create Submission Metadata Template file (CSV format)

```
python generate_submission_metadata_template.py [-h] [-m SUBMISSION_METADATA]
[-t TOKEN]
[-e DSPACE_API_ENDPOINT]
[-d SUBMISSION_DEFINITION_NAME]
[-r RESOURCE_TYPE]

Command-line arguments

options:
-h, --help show this help message and exit
-m, --submission-metadata SUBMISSION_METADATA
Template name for submission metadata, in CSV format
(optional). Default: submission.csv
-t, --token TOKEN Authorization token, or use the AUTHORIZATION_TOKEN
env variable
-e, --dspace-api-endpoint DSPACE_API_ENDPOINT
DSpace API Endpoint, or use the DSPACE_API_ENDPOINT
env variable
-d, --submission-definition-name SUBMISSION_DEFINITION_NAME
Submission Definition Name, or use the
SUBMISSION_DEFINITION_NAME env variable
-r, --resource-type RESOURCE_TYPE
Resource Type (optional), sample values: corpus
(default), lexicalConceptualResource,
languageDescription, toolService
```

Example:
```
python generate_submission_metadata_template.py -m /User/John/Submissions/submission-data.csv -r toolService
```
This command creates a **submission-data.csv** file in the **/User/John/Submissions** directory for submissions with the **toolService** resource type.

### Submission Metadata Template file (CSV format)

The file generated by the previous command can be completed by adding metadata values for the individual keys.

Example:
<pre>
__section__,traditionalpageone
dc.type,toolService
dc.title,<b>DEMO Submission</b>
dc.source.uri,
local.demo.uri,
dc.relation.isreferencedby,
dc.date.issued,
dc.publisher,
dc.contributor.author,<b>John Lizard,Mark Hagues,"Emily, Smith Lion"</b>
local.contact.person,
local.sponsor,
__section__,traditionalpagetwo
dc.description,
dc.language.iso,
dc.subject,
metashare.ResourceInfo#ContentInfo.detailedType,
metashare.ResourceInfo#ResourceComponentType#ToolServiceInfo.languageDependent,
__section__,specialFields
local.submission.note,
dc.relation.replaces,
</pre>

Note that only the following metadata values are defined here: **dc.type**, **dc.title** and **dc.contributor.author**, where
- the **dc.type** (single) value is set automatically when the metadata template file is generated
- the **dc.title** (single) value is defined by the user
- the **dc.contributor.author** (list) values are defined by the user; a value that itself contains a comma, such as **"Emily, Smith Lion"**, is enclosed in double quotes

In this way, the user may define any number of metadata values, each set either as a single value or as a list of values.
The **\_\_section\_\_** lines define the individual submission sections to which the metadata belong.
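
To make the row layout concrete, here is a minimal sketch of how such a file could be read with Python's standard `csv` module. `parse_submission_csv` is a hypothetical function written for this illustration, not the parser used by the library; it assumes that the first column is the metadata key, that every remaining non-empty column is one value, and that rows appearing before the first `__section__` row can be ignored.

```python
import csv
import io


def parse_submission_csv(text):
    """Group metadata rows by section; each key maps to a list of non-empty values."""
    sections = {}
    current = None
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        key = row[0]
        values = [v for v in row[1:] if v.strip()]
        if key == "__section__":
            # A section row starts a new group named by its first value
            current = values[0] if values else ""
            sections[current] = {}
        elif current is not None:
            sections[current][key] = values
    return sections


sample = (
    "__section__,traditionalpageone\n"
    "dc.title,DEMO Submission\n"
    'dc.contributor.author,John Lizard,Mark Hagues,"Emily, Smith Lion"\n'
    "dc.publisher,\n"
)
parsed = parse_submission_csv(sample)
```

Because the `csv` module handles the quoting, the quoted author stays a single value, and a key with no value (such as `dc.publisher` here) maps to an empty list.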

### Submission Metadata + Files Upload (metadata in CSV format)

```
python upload_submission.py [-h] [-m SUBMISSION_METADATA] [-s SUBMISSION_ID]
[-f FILES [FILES ...]] [-t TOKEN]
[-e DSPACE_API_ENDPOINT] [-c COLLECTION_ID]

Command-line arguments

options:
-h, --help show this help message and exit
-m, --submission-metadata SUBMISSION_METADATA
Submission metadata file name, in CSV format
(optional). Default: submission.csv
-s, --submission-id SUBMISSION_ID
submission ID (optional), if provided, metadata +
files will be uploaded to this submission
-f, --files FILES [FILES ...]
Files to upload (optional)
-t, --token TOKEN Authorization token, or use the AUTHORIZATION_TOKEN
env variable
-e, --dspace-api-endpoint DSPACE_API_ENDPOINT
DSpace API, or use the DSPACE_API_ENDPOINT env
variable
-c, --collection-id COLLECTION_ID
DSpace Collection ID, or use the DSPACE_COLLECTION_ID
env variable
```

Example 1:
```
python upload_submission.py -m /User/John/Submissions/submission-data.csv -f articles.zip sample-video.mp4 logo.png
```

In this case, a new submission will be created with the metadata defined in the **submission-data.csv** file, and
3 files (bitstreams), **articles.zip**, **sample-video.mp4** and **logo.png**, will be uploaded to the created submission.

Example 2:
```
python upload_submission.py -s 7401 -m /User/John/Submissions/new-submission-data.csv -f new-video.mp4
```

In this case, the existing submission with ID 7401 will be updated with the metadata defined in the **new-submission-data.csv**
file, and one video file, **new-video.mp4**, will be uploaded to this submission.
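
The source of `upload_submission.py` is not shown here, so the following is only a sketch of an `argparse` setup that would accept the two example invocations above. The option names come from the help text; the use of `nargs="+"` for `--files` and the `default=` values are assumptions.

```python
import argparse


def build_upload_parser():
    """Hypothetical parser mirroring the documented options of upload_submission.py."""
    parser = argparse.ArgumentParser(description="Command-line arguments")
    parser.add_argument("-m", "--submission-metadata", default="submission.csv",
                        help="Submission metadata file name, in CSV format (optional)")
    parser.add_argument("-s", "--submission-id",
                        help="Submission ID (optional); if given, that submission is updated")
    # nargs="+" means: when -f is present, at least one file name must follow it
    parser.add_argument("-f", "--files", nargs="+", default=[],
                        help="Files to upload (optional)")
    return parser


# Example 2 from above: update submission 7401 and add one file
args = build_upload_parser().parse_args(
    ["-s", "7401", "-m", "new-submission-data.csv", "-f", "new-video.mp4"]
)
```

With this setup, omitting `-s` leaves `args.submission_id` as `None`, which a script could use to decide between creating a new submission and updating an existing one.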

### Submission File(s) upload

```
python upload_submission_files.py [-h] -s SUBMISSION_ID -f FILES [FILES ...]
[-t TOKEN] [-e DSPACE_API_ENDPOINT]

Command-line arguments

options:
-h, --help show this help message and exit
-s, --submission-id SUBMISSION_ID
submission ID (required)
-f, --files FILES [FILES ...]
Files to upload (required)
-t, --token TOKEN Authorization token (optional), or use the
AUTHORIZATION_TOKEN env variable
-e, --dspace-api-endpoint DSPACE_API_ENDPOINT
DSpace API Endpoint (optional), or use the
DSPACE_API_ENDPOINT env variable
```


Example:
```
python upload_submission_files.py -s 7401 -f new-logo.png examples.zip
```

In this case, two files (bitstreams), **new-logo.png** and **examples.zip**, will be uploaded to the existing submission with ID 7401.
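
When the upload is driven from another Python program, it is safer to build the command as an argument list than as a single string, so that file names containing spaces survive intact. The helper below, `build_upload_files_command`, is a hypothetical name used only for this sketch; it assumes the script is invoked from the directory that contains `upload_submission_files.py`.

```python
import shlex
import sys


def build_upload_files_command(submission_id, files, python=sys.executable):
    """Return the argv list for upload_submission_files.py (-s and -f are both required)."""
    if not files:
        raise ValueError("at least one file is required")
    return [python, "upload_submission_files.py",
            "-s", str(submission_id), "-f", *files]


cmd = build_upload_files_command(7401, ["new-logo.png", "my examples.zip"], python="python")
# shlex.join renders the list as a shell-safe string, useful for logging
printable = shlex.join(cmd)
# To actually run it: subprocess.run(cmd, check=True)
```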
81 changes: 81 additions & 0 deletions generate_submission_metadata_template.py
@@ -0,0 +1,81 @@
# This software is licenced under the BSD 3-Clause licence
# available at https://opensource.org/licenses/BSD-3-Clause
# and described in the LICENCE file in the root of this project

"""
Python 3 application for Submission Template Generation, using the dspace.py API client library.
"""
import argparse
import os

from rest_client.submission_client import SubmissionClient

# Example environment variables needed for authentication and submission template generation
# (all of these variables can be overridden with command-line arguments)
# AUTHORIZATION_TOKEN=
# DSPACE_API_ENDPOINT=
# SUBMISSION_DEFINITION_NAME=

# Parse command-line arguments
parser = argparse.ArgumentParser(description="Command-line arguments")
parser.add_argument("-m", "--submission-metadata",
                    help="Template name for submission metadata, in CSV format (optional). Default: submission.csv")
parser.add_argument("-t", "--token",
                    help="Authorization token, or use the AUTHORIZATION_TOKEN env variable")
parser.add_argument("-e", "--dspace-api-endpoint",
                    help="DSpace API Endpoint, or use the DSPACE_API_ENDPOINT env variable")
parser.add_argument("-d", "--submission-definition-name",
                    help="Submission Definition Name, or use the SUBMISSION_DEFINITION_NAME env variable")
parser.add_argument("-r", "--resource-type",
                    help="Resource Type (optional), "
                         "sample values: corpus (default), lexicalConceptualResource, languageDescription, toolService")
args = parser.parse_args()

SUBMISSION_METADATA = 'submission.csv'
if args.submission_metadata:
    SUBMISSION_METADATA = args.submission_metadata

AUTHORIZATION_TOKEN = None
if args.token:
    AUTHORIZATION_TOKEN = args.token
elif 'AUTHORIZATION_TOKEN' in os.environ:
    AUTHORIZATION_TOKEN = os.environ['AUTHORIZATION_TOKEN']

if AUTHORIZATION_TOKEN is None:
    print('No authorization token provided!')
    exit(1)

# SUBMISSION_DEFINITION_NAME = traditional
SUBMISSION_DEFINITION_NAME = None
if args.submission_definition_name:
    SUBMISSION_DEFINITION_NAME = args.submission_definition_name
elif 'SUBMISSION_DEFINITION_NAME' in os.environ:
    SUBMISSION_DEFINITION_NAME = os.environ['SUBMISSION_DEFINITION_NAME']

if SUBMISSION_DEFINITION_NAME is None:
    print('No submission definition name provided!')
    exit(1)

API_ENDPOINT = 'http://localhost:8080/server/api'
if args.dspace_api_endpoint:
    API_ENDPOINT = args.dspace_api_endpoint
elif 'DSPACE_API_ENDPOINT' in os.environ:
    API_ENDPOINT = os.environ['DSPACE_API_ENDPOINT']

RESOURCE_TYPE = 'corpus'
if args.resource_type:
    RESOURCE_TYPE = args.resource_type

FILE_TYPE = 'csv'

d = SubmissionClient(api_endpoint=API_ENDPOINT, authorization_token=AUTHORIZATION_TOKEN)

# Authenticate against the DSpace client
authenticated = d.authenticate()
if not authenticated:
    print('Error logging in! Giving up.')
    exit(1)

# for now, only CSV templates are generated
if FILE_TYPE == 'csv':
    d.generate_csv_template(SUBMISSION_METADATA, SUBMISSION_DEFINITION_NAME, RESOURCE_TYPE)