# OAI-PMH 
OAI-PMH is the Open Archives Initiative Protocol for Metadata Harvesting. The protocol requires unique identifiers to be URIs, so they look much like URLs.
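As a rough illustration of the URI requirement, the sketch below checks only that an identifier has a scheme followed by some content. The helper name `looks_like_uri` and the identifier `oai:example.org:12345` are made up for this example, and the check is much looser than full URI validation.

```python
from urllib.parse import urlparse


def looks_like_uri(identifier: str) -> bool:
    """Return True if the identifier has a scheme and something after the colon."""
    parsed = urlparse(identifier)
    # A URI needs a scheme such as "oai" or "https", plus a non-empty remainder.
    return bool(parsed.scheme) and len(identifier) > len(parsed.scheme) + 1


# Hypothetical identifiers, used only to exercise the helper.
print(looks_like_uri("oai:example.org:12345"))
print(looks_like_uri("record 12345"))
```

The first identifier passes because it starts with the `oai` scheme; the second has no scheme at all.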

## Model
![OAI-PMH model](./img/OAI-PMH.png)

## Glossary
**Resource** The original object, for example a physical item, that metadata records describe.

**Record** A metadata record about a resource, expressed in a particular metadata schema. The same resource can have records in different schemas.

**Unique Identifier** Name or address for a resource or for a metadata record.

**Repository** Collection of metadata records. In OAI-PMH, the repository is a server that has to respond to requests about its metadata records.

**Harvester** Client program that makes those requests to a repository and collects the metadata returned in the responses.
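The requests a harvester sends are plain HTTP requests whose query string names one of the six OAI-PMH verbs. The sketch below shows one way to assemble such a URL; `build_request_url` is a hypothetical helper written for this note, and the base URL is the figshare endpoint used later in this document.

```python
from urllib.parse import urlencode

# The six request types ("verbs") defined by OAI-PMH 2.0.
OAI_VERBS = {
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
}


def build_request_url(base_url: str, verb: str, **params: str) -> str:
    """Build an OAI-PMH request URL from a base URL, a verb and its arguments."""
    if verb not in OAI_VERBS:
        raise ValueError(f"Unknown OAI-PMH verb: {verb}")
    # urlencode keeps the dict order, so "verb" always comes first.
    query = urlencode({"verb": verb, **params})
    return f"{base_url}?{query}"


url = build_request_url(
    "https://api.figshare.com/v2/oai", "ListRecords", metadataPrefix="oai_dc"
)
print(url)
```

Rejecting unknown verbs on the client side is a design choice of this sketch; a repository would also answer such a request with an error.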

## Getting a record

```python
import requests
from xml.etree import ElementTree
from xml.dom import minidom

# Base URL of the repository, and a ListRecords request for Dublin Core (oai_dc) records
req_list = [
    "https://api.figshare.com/v2/oai",
    "https://api.figshare.com/v2/oai?verb=ListRecords&metadataPrefix=oai_dc&until=2010-08-18T08:33:01Z"
]

# Send the ListRecords request
response = requests.get(req_list[1])

# Parse the XML response, then build an indented string version of it
tree = ElementTree.fromstring(response.content)
xmlstr = minidom.parseString(ElementTree.tostring(tree)).toprettyxml(indent="  ")
```
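Once a `ListRecords` response is parsed, the individual records can be read with namespace-aware lookups. The following is a minimal sketch that runs on a hand-written sample instead of a live response: the sample XML, its identifier and title, and the helper `summarize_records` are all invented for illustration, and real responses carry more fields. The `resumptionToken` element is what a repository may return when a list is incomplete.

```python
from xml.etree import ElementTree

# Namespace URIs used by OAI-PMH 2.0 responses with oai_dc metadata.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hand-written sample response (not real repository output).
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:1</identifier>
        <datestamp>2010-01-01T00:00:00Z</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example title</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>token-123</resumptionToken>
  </ListRecords>
</OAI-PMH>"""


def summarize_records(xml_text):
    """Return ([(identifier, title), ...], resumption_token) for a response."""
    root = ElementTree.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        title = record.findtext(".//dc:title", namespaces=NS)
        records.append((identifier, title))
    # The token is None when the response holds the complete list.
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    return records, token


records, token = summarize_records(SAMPLE)
print(records)
print(token)
```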

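```python
def _pretty_print(current, parent=None, index=-1, depth=0):
    """Indent an element tree in place by rewriting text and tail whitespace."""
    for i, node in enumerate(current):
        _pretty_print(node, current, i, depth + 1)
    if parent is not None:
        if index == 0:
            # First child: the whitespace before it is the parent's text
            parent.text = '\n' + ('\t' * depth)
        else:
            # Later children: the whitespace before them is the previous sibling's tail
            parent[index - 1].tail = '\n' + ('\t' * depth)
        if index == len(parent) - 1:
            # Last child: its tail positions the parent's closing tag
            current.tail = '\n' + ('\t' * (depth - 1))
```

```python
# Re-indent the harvested response in place
_pretty_print(tree)
```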

```python
import xml.etree.ElementTree as ET

# Deliberately badly indented sample document
root = ET.fromstring('''<?xml version='1.0' encoding='utf-8'?>
<root>
    <data version="1"><data>76939</data>
</data><data version="2">
        <data>266720</data><newdata>3569</newdata>
    </data> <!--root[-1].tail-->
    <data version="3"> <!--addElement's text-->
<data>5431</data> <!--newData's tail-->
    </data> <!--addElement's tail-->
</root>
''')
_pretty_print(root)

# Write the re-indented tree to disk and print the file back
tree = ET.ElementTree(root)
tree.write("pretty.xml")
with open("pretty.xml", 'r') as f:
    print(f.read())
```
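On Python 3.9 or newer, the standard library's `ET.indent` does a similar in-place re-indentation, so a hand-written helper may not be needed. The sketch below assumes a shortened version of the sample document and uses tab indentation to mirror the helper above; on older Python versions it simply leaves the tree unindented.

```python
import sys
import xml.etree.ElementTree as ET

# Shortened sample document, for illustration only.
root = ET.fromstring("<root><data version='1'><data>76939</data></data></root>")

# ET.indent was added in Python 3.9; skip it on older interpreters.
if sys.version_info >= (3, 9):
    ET.indent(root, space="\t")

pretty = ET.tostring(root, encoding="unicode")
print(pretty)
```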

## References
- [Understanding the Protocol for Metadata Harvesting of the Open Archives Initiative](https://librarytechnology.org/document/9944)
- [Open Archives Initiative Protocol for Metadata Harvesting](https://www.openarchives.org/pmh/)
- [Video: Metadata MOOC 5-12: Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)](https://www.youtube.com/watch?v=fpz4fzKvVTg&ab_channel=JeffreyPomerantz)