Brian Tingle edited this page May 15, 2019 · 13 revisions

Deep Harvesting Protocol

The media JSON file is used for the "deep harvesting" of object content (images, audio, or video) from a DAMS, such as the UCLDC Nuxeo Shared DAMS, into media servers. Every object in Nuxeo that is harvested into Calisphere will have a -media.json file. Deep harvesting should be decoupled from Nuxeo so that other content sources, such as legacy OAC METS and potentially local campus DAMS, can also be incorporated.

Filename

The filename is derived from the 'id': of the root content division.

File extension

id-media.json
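A minimal Python sketch of the naming rule, assuming the root content division has already been parsed into a dict. The function name media_json_filename is hypothetical and not part of any existing harvester.

```python
def media_json_filename(root_division: dict) -> str:
    """Build the -media.json filename from the root content division's 'id'."""
    object_id = root_division.get("id")
    if not object_id:
        # Without an 'id' there is nothing to key the filename on.
        raise ValueError("root content division must have a non-empty 'id'")
    return f"{object_id}-media.json"


print(media_json_filename({"id": "abc123"}))  # abc123-media.json
```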

Content division

Example content division for a simple image object:

{
  "href": "full path to master image (TIFF)",
  "id": "globally unique identifier (for example, a Nuxeo object id)",
  "metadata": { "title": "title of the object", "...": "..." },
  "dimensions": "1024:1024"
}

The default 'format': is image, but other formats may be specified (video, audio, or file):

{
  "format": "video",
  "href": "path to access transcoded mp4 file",
  "id": "video id on streaming server",
  "metadata": { "title": "vacation video", "...": "..." },
  "dimensions": "1024:1024"
}
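A sketch of how a consumer might check a content division and apply the default format. It assumes that 'id' and 'href' are required on every division (both examples above include them, but the protocol does not say so explicitly) and that the allowed formats are exactly image, video, audio, and file. The names normalize_division, ALLOWED_FORMATS, and REQUIRED_KEYS are hypothetical.

```python
import json

# Assumption: these four formats are the complete set the protocol allows.
ALLOWED_FORMATS = {"image", "video", "audio", "file"}
# Assumption: every content division carries both keys.
REQUIRED_KEYS = ("id", "href")


def normalize_division(division: dict) -> dict:
    """Validate one content division and fill in the default 'format'."""
    missing = [key for key in REQUIRED_KEYS if key not in division]
    if missing:
        raise ValueError(f"content division is missing {missing}")
    normalized = dict(division)  # copy so the caller's dict is not mutated
    normalized.setdefault("format", "image")
    if normalized["format"] not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format: {normalized['format']!r}")
    return normalized


example = json.loads(
    '{"href": "https://example.org/master.tif", "id": "abc123", "dimensions": "1024:1024"}'
)
print(normalize_division(example)["format"])  # image
```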

Objects with complex structure

For a complex object, include a 'structMap': property whose value is an array of content divisions.

{
  "id": "...",
  "href": "...",
  "metadata": { "...": "..." },
  "structMap": [ {}, {} ]
}

Each element of the array is a content division. Nesting of 'structMap': (a child division with its own 'structMap':) is not supported in the first version of the protocol.
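One way a harvester could walk a complex object while enforcing the no-nesting rule is sketched below. The generator name iter_divisions is hypothetical; it yields the root first and then each child, and raises when it reaches a child that has its own 'structMap':.

```python
def iter_divisions(root: dict):
    """Yield the root content division, then each child listed in its 'structMap'.

    Raises ValueError on a child that has its own 'structMap', because nested
    structMaps are not supported in the first version of the protocol.
    """
    yield root
    for position, child in enumerate(root.get("structMap", [])):
        if "structMap" in child:
            raise ValueError(f"nested structMap in child {position} is not supported")
        yield child


complex_object = {
    "id": "parent",
    "href": "parent.tif",
    "structMap": [
        {"id": "page-1", "href": "p1.tif"},
        {"id": "page-2", "href": "p2.tif"},
    ],
}
print([d["id"] for d in iter_divisions(complex_object)])  # ['parent', 'page-1', 'page-2']
```

Because this is a generator, the error surfaces only when iteration reaches the offending child; wrapping the call in list() validates the whole object up front.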

Content harvesting

When -media.json files are submitted for publication in Calisphere, a content harvesting process is run: the content at 'href': is fetched, processed, and placed on a media server. For the default 'format': 'image', 'href': should point to a high-resolution TIFF file, which is converted to JPEG2000 and loaded into a Loris-IIIF cluster. For audio and video, 'href': should refer to an .mp3 or .mp4 file that has been pre-transcoded for streaming from Amazon CloudFront. For 'format': 'file', the file is stored in an S3 bucket using the division's 'id': as the key.
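The per-format dispatch described above can be sketched as a function that only describes the step it would take; fetching, the JPEG2000 conversion, and any CloudFront or S3 calls are deliberately left out. The function name plan_harvest_step and the check on the file suffix are assumptions for illustration, not the actual harvester.

```python
def plan_harvest_step(division: dict) -> str:
    """Describe how one content division would be harvested, based on its 'format'."""
    fmt = division.get("format", "image")
    href, object_id = division["href"], division["id"]
    if fmt == "image":
        # Master TIFF becomes a JPEG2000 served by the Loris-IIIF cluster.
        return f"fetch TIFF {href}, convert to JPEG2000, load into Loris-IIIF as {object_id}"
    if fmt in ("audio", "video"):
        # Assumption: a plain suffix check is enough to spot a non-transcoded file.
        if not href.lower().endswith((".mp3", ".mp4")):
            raise ValueError(f"{fmt} href should point to a pre-transcoded .mp3 or .mp4 file")
        return f"publish {href} for streaming via CloudFront as {object_id}"
    if fmt == "file":
        return f"copy {href} to S3 under key {object_id}"
    raise ValueError(f"unsupported format: {fmt!r}")


print(plan_harvest_step({"id": "v1", "href": "https://example.org/trip.mp4", "format": "video"}))
```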

Indexing

In Solr, add two new fields:

  • structmap_url: the URL of the -media.json file (if the URL can be computed from the Calisphere ID, is this field needed?)
  • structmap_text: all the words from the 'metadata': values in the JSON (not the keys).
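A possible way to build the structmap_text value is sketched below, assuming the text should cover the root division and every 'structMap': child, that non-string values are converted with str(), and that list values are flattened one level. The function name structmap_text simply mirrors the field name and is hypothetical.

```python
def structmap_text(media: dict) -> str:
    """Collect the words from every 'metadata' value (not the keys) in a -media.json object."""
    divisions = [media, *media.get("structMap", [])]
    words = []
    for division in divisions:
        for value in division.get("metadata", {}).values():
            # Assumption: a list value contributes each of its items.
            items = value if isinstance(value, list) else [value]
            for item in items:
                words.extend(str(item).split())
    return " ".join(words)


media = {
    "id": "p",
    "metadata": {"title": "Family photo album"},
    "structMap": [{"id": "c1", "metadata": {"title": "Page one", "date": "1920"}}],
}
print(structmap_text(media))  # Family photo album Page one 1920
```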

Calisphere public display

The Calisphere website (and potentially other users of the Solr API) will use the 'id': properties of content divisions to compute the URLs needed to access the files on the appropriate media server.
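A sketch of that URL computation follows. The three base URLs are hypothetical placeholders, since the actual media server locations are not given here. For images it builds the IIIF Image API info.json URL ({base}/{identifier}/info.json), which a IIIF-compatible server such as Loris serves; the streaming and file URL shapes are assumptions.

```python
from urllib.parse import quote

# Hypothetical base URLs; the real media server locations are not specified.
IIIF_BASE = "https://iiif.example.org/iiif"
STREAMING_BASE = "https://streaming.example.org"
FILES_BASE = "https://files.example.org"


def access_url(division: dict) -> str:
    """Compute an access URL for a content division from its 'id' and 'format'."""
    object_id = quote(division["id"], safe="")  # escape characters such as '/' in ids
    fmt = division.get("format", "image")
    if fmt == "image":
        # IIIF Image API info.json describes the image to viewers.
        return f"{IIIF_BASE}/{object_id}/info.json"
    if fmt in ("audio", "video"):
        return f"{STREAMING_BASE}/{object_id}"
    if fmt == "file":
        # The S3 key is the division's 'id'.
        return f"{FILES_BASE}/{object_id}"
    raise ValueError(f"unsupported format: {fmt!r}")


print(access_url({"id": "abc-123"}))  # https://iiif.example.org/iiif/abc-123/info.json
```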

See also

structMap meeting