# DOI Negotiation

## Student: Gerardo de Miguel González

## 0. Python Libraries

Libraries we will use:

- requests
- json

::GMG::**References**

- [Requests: HTTP for Humans<sup>TM</sup>](http://docs.python-requests.org/en/master/)
- [Python for Beginners: Using the Requests Library in Python](https://www.pythonforbeginners.com/requests/using-requests-in-python)
- [StackAbuse: The Python Requests Module](https://stackabuse.com/the-python-requests-module/)
- [W3Schools: Python JSON](https://www.w3schools.com/python/python_json.asp)
- [Real Python: Working with JSON data in Python](https://realpython.com/python-json/)
- [The Hitchhiker's Guide to Python: JSON](https://docs.python-guide.org/scenarios/json/)
- [Python Docs: JSON encoder and decoder](https://docs.python.org/3/library/json.html)

In [0]:
import requests
import json

## 1 Introduction

DOIs provide a persistent link to content. They identify many types of work, from journal articles to research data sets. Typically, someone interacting with DOIs will be a researcher, who will resolve DOIs found in scholarly references to content using a DOI resolver. Such researchers may not even realise they are using DOIs and a DOI resolver since they may follow links with embedded DOIs.

Yet DOIs can provide more than a permanent, indirect link to content. DOI registration agencies such as CrossRef, DataCite and mEDRA collect bibliographic metadata about the works they link to. This metadata can be retrieved from a DOI resolver too, using content negotiation to request a particular representation of the metadata.

For some DOIs content negotiation can be used to retrieve different representations of a work. For example, some DataCite DOIs identify data sets that may be available in a number of data formats and container formats.

::GMG::**References**

- [DOI FAQ](http://www.doi.org/faq.html)
- [DOI Registration Agencies](http://www.doi.org/registration_agencies.html) (e.g. [CrossRef](https://www.crossref.org/), [DataCite](https://www.datacite.org/), [mEDRA](https://www.medra.org/))

## 2 Redirection

The DOI resolver at doi.org will normally redirect a user to the resource location of a DOI. For example, the DOI "10.1126/science.169.3946.635" redirects to a landing page describing the article "The Structure of Ordinary Water". Content-negotiated requests to doi.org that ask for a content type other than "text/html" are redirected to a metadata service hosted by the DOI's registration agency. CrossRef, DataCite and mEDRA support content-negotiated DOIs via https://data.crossref.org, https://data.datacite.org and http://data.medra.org respectively.

::GMG::**References**

- [DataCite Content Negotiation](https://www.datacite.org/content.html)
- [CrossCite DOI Content Negotiation](https://crosscite.org/docs.html)
- [mEDRA Content Negotiation](http://data.medra.org/)

### Example of a GET request with text/html

---


<div class="alert alert-warning" role="alert" style="margin: 10px">

       GET "Accept: text/html"
https://doi.org/10.1126/science.169.3946.635<br>

                   |<br>
                   |<br>
                   |<br>
                   V<br>
<br>
       Publisher landing page
https://www.sciencemag.org/content/169/3946/635
</div>

Normal browser requests or explicit requests for text/html redirect to the content's landing page.

---

::GMG::**note**: I tried making this request with a Firefox REST client, [RESTED](https://addons.mozilla.org/en-US/firefox/addon/rested/):

[see screenshot](https://drive.google.com/file/d/16C75-ZVATCfArkYs59Eg76Vq4JCDnDkS/view?usp=sharing)

---

### Example of a GET request with application/rdf+xml



<div class="alert alert-warning" role="alert" style="margin: 10px">

             GET "Accept: application/rdf+xml"
https://doi.org/10.1126/science.169.3946.635<br>

                   |<br>
                   |<br>
                   |<br>
                   V<br>
<br>
CrossRef metadata service
http://data.crossref.org/10.1126/science.169.3946.635
</div>

Requests for a data type redirect to a registration agency's metadata service.
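To see the redirect itself rather than the final document, one can ask `requests` not to follow it and read the `Location` header of the resolver's first response. This is a minimal sketch, assuming doi.org answers with a 3xx status and a `Location` header as the diagrams above suggest; `doi_url` and `first_hop` are hypothetical helper names, not part of `requests` or of doi.org.

```python
import requests

DOI_RESOLVER = "https://doi.org/"


def doi_url(doi):
    """Hypothetical helper: build a resolver URL from a bare DOI, a 'doi:' URI or a doi.org URL."""
    doi = doi.strip()
    for prefix in ("doi:", "https://doi.org/", "http://doi.org/",
                   "https://dx.doi.org/", "http://dx.doi.org/"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return DOI_RESOLVER + doi


def first_hop(doi, accept="text/html"):
    """Hypothetical helper: return (status code, Location header) without following the redirect."""
    r = requests.get(doi_url(doi), headers={"Accept": accept},
                     allow_redirects=False, timeout=10)
    return r.status_code, r.headers.get("Location")
```

For instance, `first_hop("10.1126/science.169.3946.635", "application/rdf+xml")` should point at the registration agency's metadata service, while the default `text/html` should point at the publisher's landing page. The exact hosts may differ from the diagrams, since agencies move their services over time.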

---

::GMG::**note**: my test with RESTED:

[see screenshot](https://drive.google.com/open?id=1itIel2TzZFY3WJqIGPpeP9A07WJfKDmZ)

---


::GMG::**note**: this is the returned document:

```xml
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:j.0="http://purl.org/dc/terms/"
    xmlns:j.1="http://prismstandard.org/namespaces/basic/2.1/"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:j.2="http://purl.org/ontology/bibo/"
    xmlns:j.3="http://xmlns.com/foaf/0.1/">
  <rdf:Description rdf:about="http://dx.doi.org/10.1126/science.169.3946.635">
    <j.1:startingPage>635</j.1:startingPage>
    <owl:sameAs rdf:resource="doi:10.1126/science.169.3946.635"/>
    <owl:sameAs rdf:resource="info:doi/10.1126/science.169.3946.635"/>
    <j.0:identifier>10.1126/science.169.3946.635</j.0:identifier>
    <j.0:publisher>American Association for the Advancement of Science (AAAS)</j.0:publisher>
    <j.0:creator>
      <j.3:Person rdf:about="http://id.crossref.org/contributor/h-s-frank-3new7r2ulpnaj">
        <j.3:name>H. S. Frank</j.3:name>
        <j.3:familyName>Frank</j.3:familyName>
        <j.3:givenName>H. S.</j.3:givenName>
      </j.3:Person>
    </j.0:creator>
    <j.1:doi>10.1126/science.169.3946.635</j.1:doi>
    <j.2:pageEnd>641</j.2:pageEnd>
    <j.2:doi>10.1126/science.169.3946.635</j.2:doi>
    <j.2:volume>169</j.2:volume>
    <j.0:isPartOf>
      <j.2:Journal rdf:about="http://id.crossref.org/issn/0036-8075">
        <j.1:issn>1095-9203</j.1:issn>
        <j.2:issn>1095-9203</j.2:issn>
        <owl:sameAs>urn:issn:1095-9203</owl:sameAs>
        <owl:sameAs>urn:issn:0036-8075</owl:sameAs>
        <j.0:title>Science</j.0:title>
        <j.1:issn>0036-8075</j.1:issn>
        <j.2:issn>0036-8075</j.2:issn>
      </j.2:Journal>
    </j.0:isPartOf>
    <j.0:title>The Structure of Ordinary Water: New data and interpretations are yielding new insights into this fascinating substance</j.0:title>
    <j.1:endingPage>641</j.1:endingPage>
    <j.0:date rdf:datatype="http://www.w3.org/2001/XMLSchema#date"
    >1970-08-14</j.0:date>
    <j.1:volume>169</j.1:volume>
    <j.2:pageStart>635</j.2:pageStart>
  </rdf:Description>
</rdf:RDF>
```

### ::GMG::My own test with a [JSON content type](https://stackoverflow.com/questions/477816/what-is-the-correct-json-content-type) header

[see screenshot](https://drive.google.com/open?id=1jap_LIw4a0onW-d1dvaRaVYURDIEpgqN) from RESTED

Returned document:

```json
{
  "indexed": {
    "date-parts": [
      [
        2018,
        10,
        12
      ]
    ],
    "date-time": "2018-10-12T20:02:40Z",
    "timestamp": 1539374560300
  },
  "reference-count": 0,
  "publisher": "American Association for the Advancement of Science (AAAS)",
  "issue": "3946",
  "content-domain": {
    "domain": [],
    "crossmark-restriction": false
  },
  "published-print": {
    "date-parts": [
      [
        1970,
        8,
        14
      ]
    ]
  },
  "DOI": "10.1126/science.169.3946.635",
  "type": "article-journal",
  "created": {
    "date-parts": [
      [
        2006,
        10,
        5
      ]
    ],
    "date-time": "2006-10-05T12:56:56Z",
    "timestamp": 1160053016000
  },
  "page": "635-641",
  "source": "Crossref",
  "is-referenced-by-count": 57,
  "title": "The Structure of Ordinary Water: New data and interpretations are yielding new insights into this fascinating substance",
  "prefix": "10.1126",
  "volume": "169",
  "author": [
    {
      "given": "H. S.",
      "family": "Frank",
      "sequence": "first",
      "affiliation": []
    }
  ],
  "member": "221",
  "container-title": "Science",
  "original-title": [],
  "language": "en",
  "link": [
    {
      "URL": "https://syndication.highwire.org/content/doi/10.1126/science.169.3946.635",
      "content-type": "unspecified",
      "content-version": "vor",
      "intended-application": "similarity-checking"
    }
  ],
  "deposited": {
    "date-parts": [
      [
        2016,
        12,
        23
      ]
    ],
    "date-time": "2016-12-23T19:54:07Z",
    "timestamp": 1482522847000
  },
  "score": 1,
  "subtitle": [],
  "short-title": [],
  "issued": {
    "date-parts": [
      [
        1970,
        8,
        14
      ]
    ]
  },
  "references-count": 0,
  "journal-issue": {
    "published-print": {
      "date-parts": [
        [
          1970,
          8,
          14
        ]
      ]
    },
    "issue": "3946"
  },
  "URL": "http://dx.doi.org/10.1126/science.169.3946.635",
  "relation": {},
  "ISSN": [
    "0036-8075",
    "1095-9203"
  ],
  "container-title-short": "Science"
}
```

## 3 What is Content Negotiation?

Content negotiation allows a user to request a particular representation of a web resource. DOI resolvers use content negotiation to provide different representations of metadata associated with DOIs.

A content-negotiated request to a DOI resolver is much like a standard HTTP request, except that server-driven negotiation takes place based on the list of acceptable content types the client provides.

### The Accept Header

Making a content-negotiated request requires the HTTP "Accept" header. It lists the content types that are acceptable to the client (those it knows how to parse), each with an optional "quality" (q) value indicating its relative suitability. For example, a client that wishes to receive citeproc JSON if it is available, but which can also handle RDF XML if citeproc JSON is unavailable, would make a request with an Accept header listing both "application/citeproc+json" and "application/rdf+xml", giving RDF XML the lower q value. The cell below uses a simpler header that lists RDF XML alone:

In [0]:
url = "https://doi.org/10.1126/science.169.3946.635" #DOI resolver URL
headers = {'Accept': 'application/rdf+xml;q=0.5'} #Accepted response type; with a single type listed, q=0.5 has no relative effect
r = requests.post(url, headers=headers) #POST with headers
print(r.status_code)
print(r.text)


200
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:j.0="http://purl.org/dc/terms/"
    xmlns:j.1="http://prismstandard.org/namespaces/basic/2.1/"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:j.2="http://purl.org/ontology/bibo/"
    xmlns:j.3="http://xmlns.com/foaf/0.1/">
  <rdf:Description rdf:about="http://dx.doi.org/10.1126/science.169.3946.635">
    <j.1:startingPage>635</j.1:startingPage>
    <owl:sameAs rdf:resource="doi:10.1126/science.169.3946.635"/>
    <owl:sameAs rdf:resource="info:doi/10.1126/science.169.3946.635"/>
    <j.0:identifier>10.1126/science.169.3946.635</j.0:identifier>
    <j.0:publisher>American Association for the Advancement of Science (AAAS)</j.0:publisher>
    <j.0:creator>
      <j.3:Person rdf:about="http://id.crossref.org/contributor/h-s-frank-3new7r2ulpnaj">
        <j.3:name>H. S. Frank</j.3:name>
        <j.3:familyName>Frank</j.3:familyName>
        <j.3:givenName>H. S.</j.3:givenName>
      </j.3:Person>
  

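::GMG::**note**: I made the request in the previous section with RESTED using `GET` instead of `POST`, and it gave the same result. I have checked that in `Python` it is __the same__ (as expected: `requests` follows the doi.org redirect and, like browsers, re-sends a `POST` as a `GET` after a 302 or 303 redirect):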

In [0]:
url = "https://doi.org/10.1126/science.169.3946.635" #DOI resolver URL
headers = {'Accept': 'application/rdf+xml;q=0.5'} #Accepted response type
r = requests.get(url, headers=headers) #GET with headers
print(r.status_code)
print(r.text)


200
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:j.0="http://purl.org/dc/terms/"
    xmlns:j.1="http://prismstandard.org/namespaces/basic/2.1/"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:j.2="http://purl.org/ontology/bibo/"
    xmlns:j.3="http://xmlns.com/foaf/0.1/">
  <rdf:Description rdf:about="http://dx.doi.org/10.1126/science.169.3946.635">
    <j.1:startingPage>635</j.1:startingPage>
    <owl:sameAs rdf:resource="doi:10.1126/science.169.3946.635"/>
    <owl:sameAs rdf:resource="info:doi/10.1126/science.169.3946.635"/>
    <j.0:identifier>10.1126/science.169.3946.635</j.0:identifier>
    <j.0:publisher>American Association for the Advancement of Science (AAAS)</j.0:publisher>
    <j.0:creator>
      <j.3:Person rdf:about="http://id.crossref.org/contributor/h-s-frank-3new7r2ulpnaj">
        <j.3:name>H. S. Frank</j.3:name>
        <j.3:familyName>Frank</j.3:familyName>
        <j.3:givenName>H. S.</j.3:givenName>
      </j.3:Person>
  

### Ejemplo JSON

When exchanging data between a browser and a server, the data can only be text.

JSON is text, and we can convert any JavaScript object into JSON, and send JSON to the server.

We can also convert any JSON received from the server into JavaScript objects.

This way we can work with the data as JavaScript objects, with no complicated parsing and translations.

Python works the same way: the standard library's `json` module turns JSON text into native Python objects (dicts, lists, strings and numbers). Repeating the previous request, we can ask for the answer in JSON format and store it in a variable:

In [0]:
url = "https://doi.org/10.1126/science.169.3946.635" #DOI resolver URL
headers = {'Accept': 'application/vnd.citationstyles.csl+json;q=1.0'} #Accepted response type: CSL (citeproc) JSON
r = requests.post(url, headers=headers) #POST with headers
print("Status code: %s" % r.status_code) #200 means the request was OK and metadata was returned

Status code: 200


::GMG::**note**: in my RESTED request I used a simpler `content type` than the one just used, i.e. `application/json`. We can compare the responses ([1](https://stackoverflow.com/questions/16877422/whats-the-best-way-to-parse-a-json-response-from-the-requests-library),[2](https://stackoverflow.com/questions/23718896/pretty-print-json-in-python-pythonic-way))...

In [0]:
json_data = json.loads(r.text)
print(json.dumps(json_data, indent=2))

{
  "indexed": {
    "date-parts": [
      [
        2018,
        10,
        12
      ]
    ],
    "date-time": "2018-10-12T20:02:40Z",
    "timestamp": 1539374560300
  },
  "reference-count": 0,
  "publisher": "American Association for the Advancement of Science (AAAS)",
  "issue": "3946",
  "content-domain": {
    "domain": [],
    "crossmark-restriction": false
  },
  "published-print": {
    "date-parts": [
      [
        1970,
        8,
        14
      ]
    ]
  },
  "DOI": "10.1126/science.169.3946.635",
  "type": "article-journal",
  "created": {
    "date-parts": [
      [
        2006,
        10,
        5
      ]
    ],
    "date-time": "2006-10-05T12:56:56Z",
    "timestamp": 1160053016000
  },
  "page": "635-641",
  "source": "Crossref",
  "is-referenced-by-count": 57,
  "title": "The Structure of Ordinary Water: New data and interpretations are yielding new insights into this fascinating substance",
  "prefix": "10.1126",
  "volume": "169",
  "author": [
    {
      

### Response Codes

| Code | Meaning |
|------|---------|
| 200 | The request was OK. |
| 204 | The request was OK but there was no metadata available. |
| 404 | The DOI requested doesn't exist. |
| 406 | Can't serve any requested content type. |

Individual metadata services may use additional response codes, but they always use the codes above for the cases described.
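When scripting these requests it can help to translate the status code into the meanings in the table before parsing anything. A minimal sketch; `RESPONSE_MEANINGS`, `describe_status` and `has_metadata` are hypothetical names, and any code outside the table is reported as service-specific, following the note above.

```python
# Hypothetical lookup table; meanings copied from the response-code table above.
RESPONSE_MEANINGS = {
    200: "The request was OK.",
    204: "The request was OK but there was no metadata available.",
    404: "The DOI requested doesn't exist.",
    406: "Can't serve any requested content type.",
}


def describe_status(status_code):
    """Hypothetical helper: map a content-negotiation status code to its documented meaning."""
    return RESPONSE_MEANINGS.get(
        status_code,
        f"{status_code}: service-specific code, see the metadata service's documentation.",
    )


def has_metadata(status_code):
    """Only a 200 carries a metadata body worth parsing; a 204 has none."""
    return status_code == 200
```

After any of the requests in this notebook, `describe_status(r.status_code)` gives a readable explanation, and `has_metadata(r.status_code)` can guard the `json.loads(r.text)` calls.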

If multiple content types specified by the client are supported by a DOI then the content type with the highest "q" value (or, if no "q" values are specified, the one that appears first in the "accept" header) will be returned.
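The selection rule can be sketched from both sides: the client builds an Accept header from (type, q) pairs, and the service keeps the supported type with the highest q, breaking ties by order of appearance. This is a simplified model of the rule described above, not the resolver's actual code: wildcards such as `*/*` and parameters other than `q` are ignored, and `build_accept`, `parse_accept` and `choose_content_type` are hypothetical names.

```python
def build_accept(preferences):
    """Hypothetical helper: build an Accept header from (media type, q) pairs; q=None leaves q out."""
    parts = []
    for media_type, q in preferences:
        parts.append(media_type if q is None else f"{media_type};q={q}")
    return ", ".join(parts)


def parse_accept(header):
    """Split an Accept header into (media type, q) pairs, keeping header order.

    A missing q counts as 1.0; other parameters (e.g. style=apa) are ignored.
    """
    entries = []
    for item in header.split(","):
        fields = [field.strip() for field in item.split(";")]
        media_type, q = fields[0].lower(), 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                q = float(value)
        if media_type:
            entries.append((media_type, q))
    return entries


def choose_content_type(header, supported):
    """Return the supported type with the highest q (earliest wins ties), or None.

    None corresponds to the 406 case. Wildcards such as */* are not handled.
    """
    supported = {s.lower() for s in supported}
    best_key, best_type = None, None
    for position, (media_type, q) in enumerate(parse_accept(header)):
        if q <= 0 or media_type not in supported:
            continue
        key = (-q, position)  # smaller key = higher q, then earlier position
        if best_key is None or key < best_key:
            best_key, best_type = key, media_type
    return best_type


header = build_accept([("application/vnd.citationstyles.csl+json", None),
                       ("application/rdf+xml", 0.5)])
print(header)  # → application/vnd.citationstyles.csl+json, application/rdf+xml;q=0.5
print(choose_content_type(header, {"application/rdf+xml"}))  # → application/rdf+xml
```

The header built here is the two-type example from the Accept header section: a service that only offers RDF XML still answers, because RDF XML is acceptable with q=0.5.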



After asking for a JSON response, if we get a 200 we can parse the received text into a Python object:

In [0]:
data = json.loads(r.text) #data is now a Python dict parsed from the JSON text
print(data)

{'indexed': {'date-parts': [[2018, 10, 12]], 'date-time': '2018-10-12T20:02:40Z', 'timestamp': 1539374560300}, 'reference-count': 0, 'publisher': 'American Association for the Advancement of Science (AAAS)', 'issue': '3946', 'content-domain': {'domain': [], 'crossmark-restriction': False}, 'published-print': {'date-parts': [[1970, 8, 14]]}, 'DOI': '10.1126/science.169.3946.635', 'type': 'article-journal', 'created': {'date-parts': [[2006, 10, 5]], 'date-time': '2006-10-05T12:56:56Z', 'timestamp': 1160053016000}, 'page': '635-641', 'source': 'Crossref', 'is-referenced-by-count': 57, 'title': 'The Structure of Ordinary Water: New data and interpretations are yielding new insights into this fascinating substance', 'prefix': '10.1126', 'volume': '169', 'author': [{'given': 'H. S.', 'family': 'Frank', 'sequence': 'first', 'affiliation': []}], 'member': '221', 'container-title': 'Science', 'original-title': [], 'language': 'en', 'link': [{'URL': 'https://syndication.highwire.org/content/doi/

In [0]:
#::GMG::What we get by converting the JSON text into a Python object
#       is a dictionary
type(data)

dict

In [0]:
#::GMG::The presentation can be improved a bit ...
# https://stackoverflow.com/questions/23718896/pretty-print-json-in-python-pythonic-way
from pprint import pprint
pprint(data)

{'DOI': '10.1126/science.169.3946.635',
 'ISSN': ['0036-8075', '1095-9203'],
 'URL': 'http://dx.doi.org/10.1126/science.169.3946.635',
 'author': [{'affiliation': [],
             'family': 'Frank',
             'given': 'H. S.',
             'sequence': 'first'}],
 'container-title': 'Science',
 'container-title-short': 'Science',
 'content-domain': {'crossmark-restriction': False, 'domain': []},
 'created': {'date-parts': [[2006, 10, 5]],
             'date-time': '2006-10-05T12:56:56Z',
             'timestamp': 1160053016000},
 'deposited': {'date-parts': [[2016, 12, 23]],
               'date-time': '2016-12-23T19:54:07Z',
               'timestamp': 1482522847000},
 'indexed': {'date-parts': [[2018, 10, 12]],
             'date-time': '2018-10-12T20:02:40Z',
             'timestamp': 1539374560300},
 'is-referenced-by-count': 57,
 'issue': '3946',
 'issued': {'date-parts': [[1970, 8, 14]]},
 'journal-issue': {'issue': '3946',
                   'published-print': {'date-parts': [

To list the top-level keys of the JSON object, we can loop over the dict:

In [0]:
for elem in data:
    print(elem)

indexed
reference-count
publisher
issue
content-domain
published-print
DOI
type
created
page
source
is-referenced-by-count
title
prefix
volume
author
member
container-title
original-title
language
link
deposited
score
subtitle
short-title
issued
references-count
journal-issue
URL
relation
ISSN
container-title-short


Values are retrieved by key, just as with any Python dictionary (key-value):

In [0]:
data['URL']

'http://dx.doi.org/10.1126/science.169.3946.635'
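Nested values are reached by chaining keys and list indexes, and `.get()` returns a default instead of raising `KeyError` when a field is missing (this record, for example, has no `abstract`). The sketch below works on a hand-copied subset of the record printed above, named `sample` so it does not overwrite `data`; `summarize` is a hypothetical helper, not a CSL or Crossref API.

```python
# Hand-copied subset of the CSL JSON record printed above.
sample = {
    "DOI": "10.1126/science.169.3946.635",
    "title": ("The Structure of Ordinary Water: New data and interpretations "
              "are yielding new insights into this fascinating substance"),
    "author": [{"given": "H. S.", "family": "Frank",
                "sequence": "first", "affiliation": []}],
    "issued": {"date-parts": [[1970, 8, 14]]},
    "container-title": "Science",
    "volume": "169",
    "page": "635-641",
}

# Chained keys and list indexes reach nested values.
print(sample["author"][0]["family"])         # → Frank
print(sample["issued"]["date-parts"][0][0])  # → 1970


def summarize(record):
    """Hypothetical helper: collect a few citation fields; .get() supplies defaults for missing keys."""
    authors = [f"{a.get('family', '')}, {a.get('given', '')}".strip(", ")
               for a in record.get("author", [])]
    date_parts = record.get("issued", {}).get("date-parts") or [[]]
    year = date_parts[0][0] if date_parts[0] else None
    return {
        "authors": authors,
        "year": year,
        "title": record.get("title", ""),
        "journal": record.get("container-title", ""),
        "abstract": record.get("abstract"),  # absent in this record, so None
    }


print(summarize(sample))
```

In the notebook, `summarize(data)` should work the same way on the full record.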

You can also print both keys and values to see the JSON structure:

::GMG::**note**: because it _is_ a `dict` :)

In [0]:
for elem in data:
    print(elem,":", data[elem])

indexed : {'date-parts': [[2018, 10, 12]], 'date-time': '2018-10-12T20:02:40Z', 'timestamp': 1539374560300}
reference-count : 0
publisher : American Association for the Advancement of Science (AAAS)
issue : 3946
content-domain : {'domain': [], 'crossmark-restriction': False}
published-print : {'date-parts': [[1970, 8, 14]]}
DOI : 10.1126/science.169.3946.635
type : article-journal
created : {'date-parts': [[2006, 10, 5]], 'date-time': '2006-10-05T12:56:56Z', 'timestamp': 1160053016000}
page : 635-641
source : Crossref
is-referenced-by-count : 57
title : The Structure of Ordinary Water: New data and interpretations are yielding new insights into this fascinating substance
prefix : 10.1126
volume : 169
author : [{'given': 'H. S.', 'family': 'Frank', 'sequence': 'first', 'affiliation': []}]
member : 221
container-title : Science
original-title : []
language : en
link : [{'URL': 'https://syndication.highwire.org/content/doi/10.1126/science.169.3946.635', 'content-type': 'unspecified', 'con

## 4 Formatted Citations

[CrossRef](https://www.crossref.org/), [DataCite](https://www.datacite.org/) and [mEDRA](https://www.medra.org/) support *formatted citations* via the `text/x-bibliography` content type. 

::GMG::**note**: see, for example, the [Crossref citation formatting service](https://www.crossref.org/labs/citation-formatting-service/)

These are the output of the [Citation Style Language](https://citationstyles.org/2010/03/22/citation-style-language-1-0/) ([developers](https://citationstyles.org/developers/)) processor, **`citeproc-js`** (see [documentation](https://citeproc-js.readthedocs.io/en/latest/)). 

The content type can take two additional parameters to customise its response format. A "style" can be chosen from the list of style names found in the *CSL style repository* ([Zotero Style Repository](https://www.zotero.org/styles), [Mendeley CSL Editor](https://csl.mendeley.com/about/)), and a "locale" (e.g. `locale=fr-FR`) selects the language of the formatted output. Many styles are supported, including common styles such as [apa](https://www.mendeley.com/guides/apa-citation-guide) and [harvard](https://www.mendeley.com/guides/harvard-citation-guide). The cell below asks for the `bibtex` style:

In [0]:
url = "https://doi.org/10.1126/science.169.3946.635" #DOI solver URL
headers = {'Accept': 'text/x-bibliography; style=bibtex'} #Accepted response type: formatted citation in the bibtex style
r = requests.post(url, headers=headers) #POST with headers
print(r.status_code)
print(r.text)

200
 @article{Frank_1970, title={The Structure of Ordinary Water: New data and interpretations are yielding new insights into this fascinating substance}, volume={169}, ISSN={1095-9203}, url={http://dx.doi.org/10.1126/science.169.3946.635}, DOI={10.1126/science.169.3946.635}, number={3946}, journal={Science}, publisher={American Association for the Advancement of Science (AAAS)}, author={Frank, H. S.}, year={1970}, month={Aug}, pages={635â641}}



::GMG::**note**: the code above uses [BibTeX](http://www.bibtex.org/Using/). The *pages* field does not display correctly in the Jupyter cell ... perhaps because of [special symbols](http://www.bibtex.org/SpecialSymbols/)?
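A likely explanation for the *pages* problem (an assumption, not checked against this exact response): the page range contains an en dash (–), which the server sends as UTF-8, but when a `text/*` response declares no charset, `requests` falls back to ISO-8859-1 when building `r.text`. The dash's three UTF-8 bytes then appear as `â` followed by two invisible control characters. Decoding `r.content` as UTF-8, or setting `r.encoding = 'utf-8'` before reading `r.text`, avoids it. The sketch also builds the `text/x-bibliography` header with the `style` and optional `locale` parameters; `bibliography_accept`, `formatted_citation` and `repair_latin1_mojibake` are hypothetical names, and `formatted_citation` uses `GET` rather than the `POST` of the cell above.

```python
import requests


def bibliography_accept(style, locale=None):
    """Hypothetical helper: Accept header for a formatted citation in a CSL style (and locale)."""
    header = f"text/x-bibliography; style={style}"
    if locale:
        header += f"; locale={locale}"
    return header


def formatted_citation(doi, style="apa", locale=None):
    """Hypothetical helper: fetch a formatted citation, decoding the body explicitly as UTF-8."""
    r = requests.get(f"https://doi.org/{doi}",
                     headers={"Accept": bibliography_accept(style, locale)},
                     timeout=10)
    r.raise_for_status()
    return r.content.decode("utf-8").strip()


def repair_latin1_mojibake(text):
    """Undo UTF-8 bytes that were wrongly decoded as ISO-8859-1 (latin-1)."""
    return text.encode("latin-1").decode("utf-8")
```

With this, `formatted_citation("10.1126/science.169.3946.635", "bibtex")` should show the en dash in `pages`. For a response already read, `repair_latin1_mojibake(r.text)` recovers the text, but only if the whole string was mis-decoded; otherwise it raises a `UnicodeError`.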

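### Zenodo example (DOI)

Let's try a DOI hosted at Zenodo.

::GMG::**note**: I don't quite understand the `content type` used here ...

It is the same CSL (citeproc) JSON type used in section 3; `q=1.0` is the highest quality value, which is also the default when `q` is omitted.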



In [0]:
url = "https://doi.org/10.5281/zenodo.842715" #DOI solver URL
headers = {'Accept': 'application/vnd.citationstyles.csl+json;q=1.0'} #Accepted response type: CSL (citeproc) JSON
r = requests.post(url, headers=headers) #POST with headers
print(r.status_code)
print(r.text)

200
{
  "type": "dataset",
  "id": "https://doi.org/10.5281/zenodo.842715",
  "categories": [
    "Cuerda del Pozo",
    "Reservoir",
    "Freshwater",
    "Water Quality",
    "AMT",
    "beginDate:'2010-01-01'",
    "endDate:'2010-12-31'",
    "location:'CdP'",
    "attributeLabel:'Temp'",
    "attributeLabel:'Press'",
    "attributeLabel:'Cond'",
    "attributeLabel:'Salinity'",
    "attributeLabel:'DO'",
    "attributeLabel:'rawO2'",
    "attributeLabel:'OxySat'",
    "attributeLabel:'ph'",
    "attributeLabel:'redox'",
    "gnd:2010-01-01"
  ],
  "author": [
    {
      "family": "Aguilar",
      "given": "Fernando"
    },
    {
      "family": "Marco",
      "given": "Jesús"
    },
    {
      "family": "Monteoliva",
      "given": "Agustín"
    }
  ],
  "issued": {
    "date-parts": [
      [
        2017,
        8,
        14
      ]
    ]
  },
  "abstract": "AMT data from Cuerda del Pozo Reservoir in 2010. It includes: Temperature, Pressure, Conductivity, Dissolved Oxygen, ra

::GMG::**note**: These are the server's *response* headers as shown in RESTED:

```txt
date: Mon, 03 Dec 2018 19:14:43 GMT
content-type: application/vnd.citationstyles.csl+json; charset=utf-8
status: 200 OK
cache-control: max-age=0, private, must-revalidate
access-control-allow-origin: *
vary: Accept-Encoding, Origin
content-encoding: gzip
access-control-max-age: 1728000
x-request-id: 17d30ca8-fcbe-456c-ab60-be529d35e925
accept: application/vnd.citationstyles.csl+json
access-control-allow-methods: GET, POST, PUT, PATCH, DELETE, OPTIONS, HEAD
etag: W/"c24cdef43f0fadbd8cd938d70b1caa6c"
x-runtime: 0.103430
x-powered-by: Phusion Passenger 5.3.7
server: nginx/1.14.0 + Phusion Passenger 5.3.7
X-Firefox-Spdy: h2
```
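To confirm that the negotiation returned what was asked for, compare the media type in the response's `Content-Type` with the requested one, ignoring parameters such as `charset` and `q`. A minimal sketch; `media_type` and `negotiation_honoured` are hypothetical helpers. With `requests`, `r.headers` is case-insensitive, so `r.headers['Content-Type']` also finds the lowercase `content-type` shown by RESTED.

```python
def media_type(content_type_header):
    """Hypothetical helper: bare, lowercased media type of a header value, parameters removed."""
    return content_type_header.split(";", 1)[0].strip().lower()


def negotiation_honoured(requested, content_type_header):
    """True if the response's media type equals the type that was requested."""
    return media_type(requested) == media_type(content_type_header)
```

For the Zenodo request above, `negotiation_honoured(headers['Accept'], r.headers['Content-Type'])` should be `True`, since the server answered with `application/vnd.citationstyles.csl+json; charset=utf-8`.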

## ::GMG::Exercise 1

Show title and description.

In [0]:
#::GMG::The starting point is the server's plain-text response encoded
#       in JSON, which I load into a native Python dict object
json_zenodo = json.loads(r.text)
print('Type: {}\n\nMetadata: \n{}'
      .format(type(json_zenodo), json.dumps(json_zenodo, indent=2))
     )

Type: <class 'dict'>

Metadata: 
{
  "type": "dataset",
  "id": "https://doi.org/10.5281/zenodo.842715",
  "categories": [
    "Cuerda del Pozo",
    "Reservoir",
    "Freshwater",
    "Water Quality",
    "AMT",
    "beginDate:'2010-01-01'",
    "endDate:'2010-12-31'",
    "location:'CdP'",
    "attributeLabel:'Temp'",
    "attributeLabel:'Press'",
    "attributeLabel:'Cond'",
    "attributeLabel:'Salinity'",
    "attributeLabel:'DO'",
    "attributeLabel:'rawO2'",
    "attributeLabel:'OxySat'",
    "attributeLabel:'ph'",
    "attributeLabel:'redox'",
    "gnd:2010-01-01"
  ],
  "author": [
    {
      "family": "Aguilar",
      "given": "Fernando"
    },
    {
      "family": "Marco",
      "given": "Jes\u00fas"
    },
    {
      "family": "Monteoliva",
      "given": "Agust\u00edn"
    }
  ],
  "issued": {
    "date-parts": [
      [
        2017,
        8,
        14
      ]
    ]
  },
  "abstract": "AMT data from Cuerda del Pozo Reservoir in 2010. It includes: Temperature, Press

In [0]:
#::GMG::I iterate over the dict keys to see what I'm going to single out
# https://stackoverflow.com/questions/17793364/python-iterate-dictionary-by-index
#for elem in data:
#    print(elem)
for key in json_zenodo.keys():
    print('{}: {}'.format(key, json_zenodo[key]))

type: dataset
id: https://doi.org/10.5281/zenodo.842715
categories: ['Cuerda del Pozo', 'Reservoir', 'Freshwater', 'Water Quality', 'AMT', "beginDate:'2010-01-01'", "endDate:'2010-12-31'", "location:'CdP'", "attributeLabel:'Temp'", "attributeLabel:'Press'", "attributeLabel:'Cond'", "attributeLabel:'Salinity'", "attributeLabel:'DO'", "attributeLabel:'rawO2'", "attributeLabel:'OxySat'", "attributeLabel:'ph'", "attributeLabel:'redox'", 'gnd:2010-01-01']
author: [{'family': 'Aguilar', 'given': 'Fernando'}, {'family': 'Marco', 'given': 'Jesús'}, {'family': 'Monteoliva', 'given': 'Agustín'}]
issued: {'date-parts': [[2017, 8, 14]]}
abstract: AMT data from Cuerda del Pozo Reservoir in 2010. It includes: Temperature, Pressure, Conductivity, Dissolved Oxygen, raw O2, Oxygen saturation, ph and redex values.
DOI: 10.5281/zenodo.842715
publisher: Zenodo
title: Amt Cuerda Del Pozo 2010
URL: https://zenodo.org/record/842715


In [0]:
#::GMG::I've been asked to print out 'title' and 'description'.
#       I take the latter to mean 'abstract', as there is no
#       'description' key to be seen :)
print('Dataset:\nTitle: {} \nDescription (a.k.a. Abstract): \n{}'.format(json_zenodo['title'],json_zenodo['abstract']))


Dataset:
Title: Amt Cuerda Del Pozo 2010 
Description (a.k.a. Abstract): 
AMT data from Cuerda del Pozo Reservoir in 2010. It includes: Temperature, Pressure, Conductivity, Dissolved Oxygen, raw O2, Oxygen saturation, ph and redex values.


## 5. Resolving PIDs

### **Handle HTTP GET request**

By default, the [Handle](http://www.handle.net/) proxy redirects you to the URL stored in the handle's URL field:

In [0]:
url = "http://hdl.handle.net/1895.22/1013" #PID resolver URL
r = requests.get(url) #GET
print(r.status_code)
print(r.text)

200
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=windows-1252">
<META NAME="Generator" CONTENT="Microsoft Word 97">
<TITLE>License for PYTHON 1.6.1</TITLE>
</HEAD>
<BODY LINK="#0000ff" VLINK="#800080">

<FONT FACE="Century Schoolbook">
<P>&nbsp;</P>
<P ALIGN="CENTER">PYTHON 1.6.1</P>
<P ALIGN="CENTER">PYTHON 1.6.1 LICENSE AGREEMENT<BR>
</P>
<P>1. This LICENSE AGREEMENT is between the Corporation for National Research Initiatives, having an office at 1895 Preston White Drive, Reston, VA 20191 ("CNRI"), and the Individual or Organization ("Licensee") accessing and otherwise using Python 1.6.1 software in source or binary form and its associated documentation.</P>

<P>2. Subject to the terms and conditions of this License Agreement, CNRI hereby grants Licensee a nonexclusive, royalty-free, world-wide license to reproduce, analyze, test, perform and/or display publicly, prepare derivative works, distribute, and otherwise use Python 1.6.1 alone or in any derivat

::GMG::**note**: [screen capture](https://drive.google.com/open?id=1sPRUL938dXh8q-G84kVxwJ6CQACaSIrY) of Firefox [RESTED](https://addons.mozilla.org/en-US/firefox/addon/rested/).

### Noredirect Query Parameter

The Handle System proxy server supports several query parameters, documented at:

http://www.handle.net/proxy_servlet.html

For example, we can tell the server not to redirect to the URL field and to display the handle's values instead:

In [0]:
url = "http://hdl.handle.net/1895.22/1013?noredirect" #PID URL with ?noredirect
r = requests.get(url) #GET, without redirecting to the URL field
print(r.status_code)
print(r.text)

200
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html><head><title>Handle Proxy</title></head>

<body bgcolor="#ffffff">

<a href="http://www.handle.net">
<img src="/static/images/res_tool.gif" width="270" height="40" alt="Handle.net Logo" border=0></a>

<table width="100%">
<tbody>
<tr><th colspan="4" align="left" bgcolor="#dddddd">Handle Values for: 1895.22/1013</th></tr>
<tr><td align="left" valign="top">Index</td><td align="left" valign="top">Type</td><td align="left" valign="top">Timestamp</td><td align="left" valign="top">Data</td></tr>
<tr bgcolor="#dddddd"><td align="left" valign="top"><b>100</b></td><td align="left" valign="top"><b><a href="http://hdl.handle.net/0.TYPE/HS_ADMIN">HS_ADMIN</a></b></td><td valign="top"><span style='white-space:nowrap'>2015-04-03&nbsp;21:20:10Z</span></td>
<td>handle=200/1; index=300; [create hdl,delete hdl,create derived prefix,delete derived prefix,read val,modify val,del val,add val,modify admin,del admin,add admin,list]</t

::GMG::**note**: [screen capture](https://drive.google.com/open?id=1m-_UzZR8GjEOPm5YqhzQBslGroU4ExW4) of Firefox [RESTED](https://addons.mozilla.org/en-US/firefox/addon/rested/).

### Query Parameters

The proxy server's REST API is [CORS-compliant](https://en.wikipedia.org/wiki/Cross-origin_resource_sharing); [JSONP callbacks](https://en.wikipedia.org/wiki/JSONP) are also supported through a "callback" query parameter.

The presence of the "pretty" query parameter instructs the server to pretty-print the JSON output.

The "auth" query parameter instructs the proxy server to bypass its cache and query a primary handle server directly for the newest handle data.

The "cert" query parameter instructs the proxy server to request an authenticated response from the source handle server. Not generally needed by end users.

The "type" and "index" query parameters allow the resolution response to be restricted to specific types and indexes of interest. "Type" is the key defined by the user to store a metadata term. "Index" is a number associated with that term. Multiple "type" and "index" parameters are allowed, and values are returned that match any of the specified types or indexes.

For example, http://hdl.handle.net/api/handles/4263537/4000?type=URL&type=EMAIL&callback=processResponse yields the response

```JSON
processResponse({
   "responseCode":1,
   "handle":"4263537/4000",
   "values":[
      {
         "index":1,
         "type":"URL",
         "data":{ "format":"string", "value":"http://www.handle.net/index.html" },
         "ttl":86400,
         "timestamp":"2001-11-21T16:21:35Z"
      },
      {
         "index":2,
         "type":"EMAIL",
         "data":{ "format":"string", "value":"hdladmin@cnri.reston.va.us" },
         "ttl":86400,
         "timestamp":"2000-04-10T22:41:46Z"
      }
   ]
});
```
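Building these URLs by hand becomes error-prone once several `type` or `index` values are involved. A minimal sketch with the standard library's `urllib.parse.urlencode`, which emits one `key=value` pair per tuple and so repeats keys naturally; `handle_api_url` and `HANDLE_API` are hypothetical names. With `requests`, passing `params={'type': ['URL', 'EMAIL']}` to `requests.get` produces the same repeated keys.

```python
from urllib.parse import urlencode

HANDLE_API = "http://hdl.handle.net/api/handles/"  # base path of the proxy REST API


def handle_api_url(handle, types=(), indexes=(), callback=None, pretty=False):
    """Hypothetical helper: build a handle proxy REST API URL with optional filters."""
    params = [("type", t) for t in types] + [("index", str(i)) for i in indexes]
    if callback:
        params.append(("callback", callback))
    query = urlencode(params)
    if pretty:
        # "pretty" only needs to be present, so it is added without a value.
        query = f"{query}&pretty" if query else "pretty"
    url = HANDLE_API + handle
    return f"{url}?{query}" if query else url


print(handle_api_url("4263537/4000", types=["URL", "EMAIL"], callback="processResponse"))
```

The printed URL is the one used in the example above.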

::GMG::**note**: [screen capture](https://drive.google.com/open?id=1HR4kwwkXoBov68n8LQQHm5LRyLBBIuzZ) from Firefox [RESTED](https://addons.mozilla.org/en-US/firefox/addon/rested/).

<div class="alert alert-warning" role="alert" style="margin: 10px">
Remember!<br>
If you don't specify the content type you want back (the `Accept` header), the server will act as if it had received a browser request and return HTML
</div>

In [0]:
url = "http://hdl.handle.net/api/handles/4263537/4000?type=URL&type=EMAIL&callback=processResponse" #REST API URL filtered by type, with a JSONP callback
headers = {'Accept': 'application/json'} #Accepted response type
r = requests.get(url, headers=headers) #GET with headers
print(r.text) 


processResponse({"responseCode":1,"handle":"4263537/4000","values":[{"index":1,"type":"URL","data":{"format":"string","value":"http://www.handle.net/index.html"},"ttl":86400,"timestamp":"2015-04-03T21:20:22Z"},{"index":2,"type":"EMAIL","data":{"format":"string","value":"hdladmin@cnri.reston.va.us"},"ttl":86400,"timestamp":"2015-04-03T21:20:22Z"}]});
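Because of the `callback` parameter the response above is JSONP, not plain JSON, so `json.loads(r.text)` would fail on it. A minimal sketch that strips the callback wrapper with a regular expression before parsing; `parse_jsonp` is a hypothetical helper, and in practice it is simpler to drop `callback` from the URL and call `r.json()`.

```python
import json
import re


def parse_jsonp(text):
    """Strip a JSONP wrapper such as 'processResponse({...});' and parse the JSON inside."""
    match = re.match(r"^\s*[\w$.]+\s*\((.*)\)\s*;?\s*$", text, re.DOTALL)
    payload = match.group(1) if match else text  # plain JSON passes through unchanged
    return json.loads(payload)


sample = 'processResponse({"responseCode":1,"handle":"4263537/4000","values":[]});'
data = parse_jsonp(sample)
print(data["handle"])  # 4263537/4000
```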


### ::GMG::REST API

The handle proxy REST API allows programmatic access to handle resolution using HTTP.

Example Request/Response

A REST API request can be made by performing a standard HTTP GET of

```
/api/handles/<handle>
```

The API returns JSON.

For example, 

http://hdl.handle.net/api/handles/1895.22/1013

yields the response

```json
{
  "responseCode": 1,
  "handle": "1895.22/1013",
  "values": [
    {
      "index": 100,
      "type": "HS_ADMIN",
      "data": {
        "format": "admin",
        "value": {
          "handle": "200/1",
          "index": 300,
          "permissions": "111111111111",
          "legacyByteLength": true
        }
      },
      "ttl": 86400,
      "timestamp": "2015-04-03T21:20:10Z"
    },
    {
      "index": 3,
      "type": "URL",
      "data": {
        "format": "string",
        "value": "http://www.handle.net/python_licenses/python1.6.1-2-23-01.html"
      },
      "ttl": 86400,
      "timestamp": "2015-04-03T21:20:10Z"
    }
  ]
}
```

with the following response headers (captured with RESTED):

```
Access-Control-Expose-Headers: Content-Length
Access-Control-Allow-Origin: moz-extension://859432ef-e049-43ef-8ebf-8cdcb4c6224c
Content-Type: application/json;charset=UTF-8
Content-Length: 422
Date: Wed, 05 Dec 2018 08:32:20 GMT
```

In [0]:
#::GMG::We do it in Python too ...
url = "http://hdl.handle.net/api/handles/1895.22/1013"
r = requests.get(url)
print("Status code: %s" % r.status_code)
#print('Status Code: {}\n\nJSON response:\n{}'.format(r.status_code,r.text))
json_handle = json.loads(r.text)
print(json.dumps(json_handle, indent=2))

Status code: 200
{
  "responseCode": 1,
  "handle": "1895.22/1013",
  "values": [
    {
      "index": 100,
      "type": "HS_ADMIN",
      "data": {
        "format": "admin",
        "value": {
          "handle": "200/1",
          "index": 300,
          "permissions": "111111111111",
          "legacyByteLength": true
        }
      },
      "ttl": 86400,
      "timestamp": "2015-04-03T21:20:10Z"
    },
    {
      "index": 3,
      "type": "URL",
      "data": {
        "format": "string",
        "value": "http://www.handle.net/python_licenses/python1.6.1-2-23-01.html"
      },
      "ttl": 86400,
      "timestamp": "2015-04-03T21:20:10Z"
    }
  ]
}
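Besides the HTTP status code, the JSON body carries its own `responseCode`. A minimal sketch that checks it before using `values`; the meanings in `HANDLE_RESPONSE_CODES` (1 success, 2 error, 100 handle not found, 200 values not found) are my reading of the Handle proxy documentation and are an assumption to verify there. `handle_values` is a hypothetical helper.

```python
# Meaning of the body-level responseCode, as I understand the Handle proxy
# documentation (assumption: check against the official reference).
HANDLE_RESPONSE_CODES = {
    1: "success",
    2: "error",
    100: "handle not found",
    200: "values not found",
}


def handle_values(data):
    """Return the 'values' list of a Handle API response or raise if resolution failed."""
    code = data.get("responseCode")
    if code != 1:
        reason = HANDLE_RESPONSE_CODES.get(code, "unknown code")
        raise ValueError(
            "handle {}: responseCode {} ({})".format(data.get("handle"), code, reason)
        )
    return data.get("values", [])


# Shortened version of the response shown above
response = {
    "responseCode": 1,
    "handle": "1895.22/1013",
    "values": [{"index": 100, "type": "HS_ADMIN"}, {"index": 3, "type": "URL"}],
}
print([v["type"] for v in handle_values(response)])  # ['HS_ADMIN', 'URL']
```

When `type` or `index` filters are used, code 200 may simply mean that nothing matched, so raising on it is a deliberately strict choice for this sketch.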


## ::GMG::Exercise 2

### 1: Try to find *TWO* different repositories that manage PIDs or DOIs (e.g figshare.com, DataONE, zenodo, etc.)

I've chosen [Figshare](https://figshare.com/) and [Zenodo](https://zenodo.org) as repositories.

I searched for interesting records to use in this exercise and found two in [Figshare](https://figshare.com/):

1. https://figshare.com/articles/7368278/1/citations/datacite
2. https://figshare.com/articles/7007153/1/citations/datacite

and [another one](https://zenodo.org/search?page=1&size=20&q=cortaderia%20selloana&access_right=open&file_type=xml) from [Zenodo](https://zenodo.org):

3. https://zenodo.org/record/238999


### 2: Check an example file or the documentation to get the identifier from a resource

#### Zenodo 


It has fairly good documentation that makes our task easier:

- [Developers REST API](http://developers.zenodo.org/)
- [ReadTheDocs](https://zenodo.readthedocs.io/en/latest/)


#### Figshare

Figshare documentation is harder to find. On its main web page there is a section about its API:

>**Use the figshare API**
>
> automate your research workflows [check out some examples](https://github.com/search?utf8=%E2%9C%93&q=figshare) of how others are using it.

However, the link is just a GitHub search that yields **71** repository results. I skimmed the list and found these two references:

- [figshare](https://github.com/figshare)/[user_documentation](https://github.com/figshare/user_documentation) ([API feature list](https://github.com/figshare/user_documentation/blob/master/docs/api/index.md))
- [rmcgibbo](https://github.com/rmcgibbo)/[figshare](https://github.com/rmcgibbo/figshare), Command line client for figshare

The *first reference* contains [a documentation file](https://github.com/figshare/user_documentation/blob/master/docs/index.md) with an external link to [the main documentation web page](https://docs.figshare.com/).

### 3: Resolve the identifier using python request

#### Zenodo (REST API)

Zenodo provides the DOI of its records. For example, the DOI of [the scientific article](https://zenodo.org/record/238999) I chose from Zenodo is:

- https://doi.org/10.3897/phytokeys.76.10808

It can also be found in the record's DataCite (XML) export:

- https://zenodo.org/record/238999/export/dcite4

which has the `DOI` in `<relatedIdentifiers>`:

```xml
<relatedIdentifier relatedIdentifierType="DOI" relationType="IsIdenticalTo">10.3897/phytokeys.76.10808</relatedIdentifier>
```
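The DOI can also be read from that XML programmatically. A minimal sketch with the standard-library `xml.etree.ElementTree`; the `XML` string is an illustrative fragment built around the element above (the real export is a full DataCite document, assumed here to use the `kernel-4` namespace), and `related_dois` is a hypothetical helper that ignores namespaces by comparing local tag names.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment modelled on the <relatedIdentifier> element shown above;
# the namespace is an assumption about the real dcite4 export.
XML = """<resource xmlns="http://datacite.org/schema/kernel-4">
  <relatedIdentifiers>
    <relatedIdentifier relatedIdentifierType="DOI" relationType="IsIdenticalTo">10.3897/phytokeys.76.10808</relatedIdentifier>
  </relatedIdentifiers>
</resource>"""


def related_dois(xml_text):
    """Return the text of every relatedIdentifier of type DOI, whatever its namespace."""
    root = ET.fromstring(xml_text)
    return [
        el.text.strip()
        for el in root.iter()
        if el.tag.rsplit("}", 1)[-1] == "relatedIdentifier"  # drop '{namespace}' prefix
        and el.get("relatedIdentifierType") == "DOI"
        and el.text
    ]


print(related_dois(XML))  # ['10.3897/phytokeys.76.10808']
```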


In [16]:
#::GMG::First I've successfully managed to get the DOI from the URL
#       of the resource record using the Zenodo REST API with Python
headers = {"Accept": "application/json"}  # ask for a JSON response
r = requests.get("https://zenodo.org/api/records/238999", headers=headers)
r.json()

{'conceptrecid': '724496',
 'created': '2017-01-11T14:41:40.056743+00:00',
 'doi': '10.3897/phytokeys.76.10808',
 'files': [{'bucket': 'f10053f9-a92c-4e18-bea8-e5306049aa48',
   'checksum': 'md5:56ad51eabdaf0e64091169c234bb9664',
   'key': 'PK_article_10808.pdf',
   'links': {'self': 'https://zenodo.org/api/files/f10053f9-a92c-4e18-bea8-e5306049aa48/PK_article_10808.pdf'},
   'size': 4824689,
   'type': 'pdf'},
  {'bucket': 'f10053f9-a92c-4e18-bea8-e5306049aa48',
   'checksum': 'md5:7aa488cee4bdf327d7ae09e7cd1cd092',
   'key': 'PK_article_10808.xml',
   'links': {'self': 'https://zenodo.org/api/files/f10053f9-a92c-4e18-bea8-e5306049aa48/PK_article_10808.xml'},
   'size': 254837,
   'type': 'xml'}],
 'id': 238999,
 'links': {'badge': 'https://zenodo.org/badge/doi/10.3897/phytokeys.76.10808.svg',
  'bucket': 'https://zenodo.org/api/files/f10053f9-a92c-4e18-bea8-e5306049aa48',
  'doi': 'https://doi.org/10.3897/phytokeys.76.10808',
  'html': 'https://zenodo.org/record/238999',
  'latest': 

In [17]:
#::GMG::Using JSON library and Python Dict iteration I've been able to
#       locate the DOI identifier from the metadata Zenodo record
json_record = json.loads(r.text)
for key in json_record.keys():
    print ('{}: {}'.format(key, json_record[key]))


conceptrecid: 724496
created: 2017-01-11T14:41:40.056743+00:00
doi: 10.3897/phytokeys.76.10808
files: [{'bucket': 'f10053f9-a92c-4e18-bea8-e5306049aa48', 'checksum': 'md5:56ad51eabdaf0e64091169c234bb9664', 'key': 'PK_article_10808.pdf', 'links': {'self': 'https://zenodo.org/api/files/f10053f9-a92c-4e18-bea8-e5306049aa48/PK_article_10808.pdf'}, 'size': 4824689, 'type': 'pdf'}, {'bucket': 'f10053f9-a92c-4e18-bea8-e5306049aa48', 'checksum': 'md5:7aa488cee4bdf327d7ae09e7cd1cd092', 'key': 'PK_article_10808.xml', 'links': {'self': 'https://zenodo.org/api/files/f10053f9-a92c-4e18-bea8-e5306049aa48/PK_article_10808.xml'}, 'size': 254837, 'type': 'xml'}]
id: 238999
links: {'badge': 'https://zenodo.org/badge/doi/10.3897/phytokeys.76.10808.svg', 'bucket': 'https://zenodo.org/api/files/f10053f9-a92c-4e18-bea8-e5306049aa48', 'doi': 'https://doi.org/10.3897/phytokeys.76.10808', 'html': 'https://zenodo.org/record/238999', 'latest': 'https://zenodo.org/api/records/238999', 'latest_html': 'https://zeno

In [18]:
print('Record DOI: {}'.format(json_record['doi']))

Record DOI: 10.3897/phytokeys.76.10808
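The Zenodo record exposes the DOI twice: bare in `doi` and as a resolver URL in `links['doi']`. The Handle API used next expects the bare `10.xxxx/...` form. A minimal sketch that normalizes either form; `normalize_doi` is a hypothetical helper that only recognises the `doi.org` / `dx.doi.org` resolvers and the `doi:` prefix.

```python
import re


def normalize_doi(value):
    """Reduce a DOI given as a resolver URL or 'doi:' string to the bare '10.xxxx/...' form."""
    value = value.strip()
    # Drop a resolver prefix such as https://doi.org/ or http://dx.doi.org/
    value = re.sub(r"^https?://(dx\.)?doi\.org/", "", value, flags=re.IGNORECASE)
    # Drop a 'doi:' scheme prefix
    value = re.sub(r"^doi:\s*", "", value, flags=re.IGNORECASE)
    if not value.startswith("10."):
        raise ValueError("not a DOI: {!r}".format(value))
    return value


print(normalize_doi("https://doi.org/10.3897/phytokeys.76.10808"))  # 10.3897/phytokeys.76.10808
```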


#### Zenodo (handle)

In [19]:
#::GMG::Resolving the Zenodo record DOI with the Handle REST API
url = "http://hdl.handle.net/api/handles/"
doi = json_record['doi']
r = requests.get(url +  doi)
json_handle_zenodo = json.loads(r.text)
for key in json_handle_zenodo.keys():
    print ('{}: {}'.format(key, json_handle_zenodo[key]))

responseCode: 1
handle: 10.3897/phytokeys.76.10808
values: [{'index': 1, 'type': 'URL', 'data': {'format': 'string', 'value': 'http://phytokeys.pensoft.net/articles.php?id=10808'}, 'ttl': 86400, 'timestamp': '2017-01-11T14:38:26Z'}, {'index': 700050, 'type': '700050', 'data': {'format': 'string', 'value': '1484145481'}, 'ttl': 86400, 'timestamp': '2017-01-11T14:38:26Z'}, {'index': 100, 'type': 'HS_ADMIN', 'data': {'format': 'admin', 'value': {'handle': '0.na/10.3897', 'index': 200, 'permissions': '111111110010'}}, 'ttl': 86400, 'timestamp': '2017-01-11T14:38:26Z'}]


In [20]:
#::GMG::Here we get the URL to the online digital publication landing page
print('handle: {} solves to url: {}'.format(json_handle_zenodo['handle'], 
                json_handle_zenodo['values'][0]['data']['value']))

handle: 10.3897/phytokeys.76.10808 solves to url: http://phytokeys.pensoft.net/articles.php?id=10808


#### Zenodo (doi.org)

In [21]:
#::GMG::Resolving the identifier also with the doi.org REST API (content negotiation)
url = "https://dx.doi.org/"
headers = {'Accept': 'application/vnd.citationstyles.csl+json;q=1.0'}
r = requests.get(url + doi, headers=headers)  # content negotiation uses GET; requests follows the redirect
#print(r.status_code)
print(r.text)

{"indexed":{"date-parts":[[2018,4,18]],"date-time":"2018-04-18T02:13:05Z","timestamp":1524017585964},"reference-count":40,"publisher":"Pensoft Publishers","license":[{"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/","start":{"date-parts":[[2017,1,11]],"date-time":"2017-01-11T00:00:00Z","timestamp":1484092800000},"delay-in-days":0,"content-version":"vor"}],"content-domain":{"domain":[],"crossmark-restriction":false},"DOI":"10.3897\/phytokeys.76.10808","type":"article-journal","created":{"date-parts":[[2017,1,11]],"date-time":"2017-01-11T14:38:26Z","timestamp":1484145506000},"page":"39-69","source":"Crossref","is-referenced-by-count":0,"title":"Synoptic taxonomy of Cortaderia Stapf (Danthonioideae, Poaceae)","prefix":"10.3897","volume":"76","author":[{"given":"Daniel","family":"Testoni","sequence":"first","affiliation":[]},{"given":"H. Peter","family":"Linder","sequence":"additional","affiliation":[]}],"member":"2258","published-online":{"date-parts":[[2017,1,11]]},"reference":[
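The metadata format is chosen with the `Accept` header, and DOI content negotiation is normally done with a plain `GET`; the resolver redirects to the registration agency (Crossref for this record, as its `source` field shows). A minimal sketch that only builds the URL and headers for a few formats; the media types in `FORMATS` are the ones I understand the DOI content negotiation service to support (to be checked in its documentation), and `FORMATS` / `negotiation_request` are hypothetical names. `https://doi.org/` is the current resolver hostname; `dx.doi.org`, used in the cells here, also works.

```python
# Media types for DOI content negotiation (assumed; see the content negotiation docs)
FORMATS = {
    "csl": "application/vnd.citationstyles.csl+json",
    "bibtex": "application/x-bibtex",
    "ris": "application/x-research-info-systems",
    "citation": "text/x-bibliography",
}


def negotiation_request(doi, fmt="csl", resolver="https://doi.org/"):
    """Return (url, headers) for a content-negotiation GET on a DOI resolver."""
    return resolver + doi, {"Accept": FORMATS[fmt]}


url, headers = negotiation_request("10.3897/phytokeys.76.10808", "bibtex")
print(url, headers)
# With network access: r = requests.get(url, headers=headers); print(r.text)
```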

In [22]:
json_doi_zenodo = json.loads(r.text)
for key in json_doi_zenodo.keys():
    print ('{}: {}'.format(key, json_doi_zenodo[key]))

indexed: {'date-parts': [[2018, 4, 18]], 'date-time': '2018-04-18T02:13:05Z', 'timestamp': 1524017585964}
reference-count: 40
publisher: Pensoft Publishers
license: [{'URL': 'http://creativecommons.org/licenses/by/4.0/', 'start': {'date-parts': [[2017, 1, 11]], 'date-time': '2017-01-11T00:00:00Z', 'timestamp': 1484092800000}, 'delay-in-days': 0, 'content-version': 'vor'}]
content-domain: {'domain': [], 'crossmark-restriction': False}
DOI: 10.3897/phytokeys.76.10808
type: article-journal
created: {'date-parts': [[2017, 1, 11]], 'date-time': '2017-01-11T14:38:26Z', 'timestamp': 1484145506000}
page: 39-69
source: Crossref
is-referenced-by-count: 0
title: Synoptic taxonomy of Cortaderia Stapf (Danthonioideae, Poaceae)
prefix: 10.3897
volume: 76
author: [{'given': 'Daniel', 'family': 'Testoni', 'sequence': 'first', 'affiliation': []}, {'given': 'H. Peter', 'family': 'Linder', 'sequence': 'additional', 'affiliation': []}]
member: 2258
published-online: {'date-parts': [[2017, 1, 11]]}
referen

In [23]:
#::GMG::Here we get the URL to the actual PDF of the scientific paper
#       Note that with handle we only got the landing page, not the actual document (!)
print('doi: {} solves to \nurl: {}'.format(json_doi_zenodo['DOI'], 
                json_doi_zenodo['link'][0]['URL']))

doi: 10.3897/phytokeys.76.10808 solves to 
url: http://phytokeys.pensoft.net/lib/ajax_srv/article_elements_srv.php?action=download_pdf&item_id=10808
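`json_doi_zenodo['link'][0]` in the cell above relies on the PDF being the first entry of the list. Crossref CSL JSON `link` entries also carry a `content-type` field, so the link can be chosen by type instead of position. A minimal sketch; the `links` list is illustrative (the XML entry is made up, and the PDF entry reuses the URL printed above under the assumption that it is typed `application/pdf`), and `pick_link` is a hypothetical helper.

```python
def pick_link(links, content_type="application/pdf"):
    """Return the URL of the first link with the given content-type, or None."""
    for link in links:
        if link.get("content-type") == content_type:
            return link.get("URL")
    return None


# Illustrative data: the XML entry is made up for the example.
links = [
    {"URL": "http://example.org/article.xml", "content-type": "application/xml"},
    {"URL": "http://phytokeys.pensoft.net/lib/ajax_srv/article_elements_srv.php?action=download_pdf&item_id=10808",
     "content-type": "application/pdf"},
]
print(pick_link(links))
```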


#### Figshare (REST API)



For Figshare I must use [its API](https://docs.figshare.com/#articles_search). The documentation gives an example with the following Python template:

```python
from __future__ import print_function
import time
import swagger_client
from swagger_client.rest import ApiException
from pprint import pprint

# create an instance of the API class
api_instance = swagger_client.ArticlesApi()
articleId = 789 # Long | Article Unique identifier

try: 
    # View article details
    api_response = api_instance.article_details(articleId)
    pprint(api_response)
except ApiException as e:
    print("Exception when calling ArticlesApi->articleDetails: %s\n" % e)
```

or `curl` from command-line *shell*:

```bash
curl -X GET "https://api.figshare.com/v2/articles/{article_id}"
```


In [30]:
#::GMG::I'm going to use only requests and json Python libraries instead of
#       the swagger client ...
#
headers = {"Accept": "application/json"}  # ask for a JSON response
r = requests.get("https://api.figshare.com/v2/articles/7368278",
                 headers=headers)
r.json()

{'authors': [{'full_name': 'David Brandão Nunes',
   'id': 5996879,
   'is_active': False,
   'orcid_id': '',
   'url_name': '_'},
  {'full_name': 'José de Paula Barros Neto',
   'id': 4702705,
   'is_active': False,
   'orcid_id': '',
   'url_name': '_'},
  {'full_name': 'Silvia Maria de Freitas',
   'id': 5996882,
   'is_active': False,
   'orcid_id': '',
   'url_name': '_'}],
 'categories': [{'id': 548,
   'parent_id': 5,
   'title': 'Civil Geotechnical Engineering'},
  {'id': 549, 'parent_id': 5, 'title': 'Construction Engineering'},
  {'id': 1109,
   'parent_id': 1101,
   'title': 'Architecture not elsewhere classified'}],
 'citation': 'Nunes, David Brandão; Barros Neto, José de Paula; Freitas, Silvia Maria de (2019): Multiple linear regression model to evaluate the market value of residential apartments in Fortaleza, CE. SciELO journals. Fileset.',
 'confidential_reason': '',
 'created_date': '2018-11-21T02:51:21Z',
 'custom_fields': [{'name': 'Read full text',
   'value': ['http

In [31]:
json_article_figshare = json.loads(r.text)
for key in json_article_figshare.keys():
    print ('{}: {}'.format(key, json_article_figshare[key]))

status: public
embargo_date: None
citation: Nunes, David Brandão; Barros Neto, José de Paula; Freitas, Silvia Maria de (2019): Multiple linear regression model to evaluate the market value of residential apartments in Fortaleza, CE. SciELO journals. Fileset.
url_private_api: https://api.figshare.com/v2/account/articles/7368278
embargo_reason: 
references: []
funding_list: []
url_public_api: https://api.figshare.com/v2/articles/7368278
id: 7368278
custom_fields: [{'name': 'Read full text', 'value': ['http://www.scielo.br/scielo.php?script=sci_arttext&pid=S1678-86212019000100089&lng=en&tlng=en']}]
size: 1337052
metadata_reason: 
funding: None
figshare_url: https://figshare.com/articles/Multiple_linear_regression_model_to_evaluate_the_market_value_of_residential_apartments_in_Fortaleza_CE/7368278
embargo_type: None
title: Multiple linear regression model to evaluate the market value of residential apartments in Fortaleza, CE
defined_type: 4
is_embargoed: False
version: 1
confidential_reas

In [32]:
#::GMG::I get the DOI
print('Figshare article DOI: {}'.format(json_article_figshare['doi']))

Figshare article DOI: 10.6084/m9.figshare.7368278.v1


#### Figshare (handle)

In [33]:
#::GMG::Resolving the Figshare DOI with the Handle REST API
url = "http://hdl.handle.net/api/handles/"
doi = json_article_figshare['doi']
r = requests.get(url +  doi)
json_handle_figshare = json.loads(r.text)
for key in json_handle_figshare.keys():
    print ('{}: {}'.format(key, json_handle_figshare[key]))

responseCode: 1
handle: 10.6084/m9.figshare.7368278.v1
values: [{'index': 100, 'type': 'HS_ADMIN', 'data': {'format': 'admin', 'value': {'handle': '10.admin/codata', 'index': 300, 'permissions': '111111111111'}}, 'ttl': 86400, 'timestamp': '2018-11-21T02:51:22Z'}, {'index': 1, 'type': 'URL', 'data': {'format': 'string', 'value': 'https://figshare.com/articles/Multiple_linear_regression_model_to_evaluate_the_market_value_of_residential_apartments_in_Fortaleza_CE/7368278/1'}, 'ttl': 86400, 'timestamp': '2018-11-21T02:51:22Z'}]


In [34]:
#::GMG::Then I get the URL
#::note::Here the entry that holds the digital asset URL is the second
#        element of the values list, not the first one as with Zenodo ...
print('handle: {} solves to url: {}'.format(json_handle_figshare['handle'], 
                json_handle_figshare['values'][1]['data']['value']))

handle: 10.6084/m9.figshare.7368278.v1 solves to url: https://figshare.com/articles/Multiple_linear_regression_model_to_evaluate_the_market_value_of_residential_apartments_in_Fortaleza_CE/7368278/1
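As the note in the cell above shows, the position of the `URL` entry in `values` is not fixed (first for the Zenodo DOI, second here). A minimal sketch that looks the entry up by its `type` instead of its list position; `handle_value` is a hypothetical helper and `figshare_values` is a shortened copy of the output above (`ttl` and `timestamp` dropped).

```python
def handle_value(values, value_type="URL"):
    """Return the data value of the first Handle entry of the given type, or None."""
    return next((v["data"]["value"] for v in values if v.get("type") == value_type), None)


# Shortened copy of json_handle_figshare['values'] printed above
figshare_values = [
    {"index": 100, "type": "HS_ADMIN",
     "data": {"format": "admin",
              "value": {"handle": "10.admin/codata", "index": 300, "permissions": "111111111111"}}},
    {"index": 1, "type": "URL",
     "data": {"format": "string",
              "value": "https://figshare.com/articles/Multiple_linear_regression_model_to_evaluate_the_market_value_of_residential_apartments_in_Fortaleza_CE/7368278/1"}},
]
print(handle_value(figshare_values))
```

With the live data the call would be `handle_value(json_handle_figshare['values'])`, and the same function works unchanged for `json_handle_zenodo['values']`.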


#### Figshare (doi.org)

In [38]:
#::GMG::Resolving the DOI now with the doi.org REST API (content negotiation)
url = "https://dx.doi.org/"
headers = {'Accept': 'application/vnd.citationstyles.csl+json;q=1.0'}
r = requests.get(url + doi, headers=headers)  # content negotiation uses GET; requests follows the redirect
#print(r.status_code)
print(r.text)

{
  "type": "article",
  "id": "https://doi.org/10.6084/m9.figshare.7368278.v1",
  "categories": [
    "90501 Civil Geotechnical Engineering",
    "90502 Construction Engineering",
    "120199 Architecture not elsewhere classified"
  ],
  "author": [
    {
      "family": "Nunes",
      "given": "David Brandão"
    },
    {
      "family": "Neto",
      "given": "José De Paula Barros"
    },
    {
      "family": "Freitas",
      "given": "Silvia Maria De"
    }
  ],
  "issued": {
    "date-parts": [
      [
        2019
      ]
    ]
  },
  "abstract": "Abstract The valuation of real estate, which assists in the definition of market value, is an important science with a wide field of action, which includes the collection of taxes, commercial transactions, insurance and judicial expertise. This study presents the construction of a linear regression model to determine the market value (dependent variable) of residential apartments in the city of Fortaleza-CE. The studied database presen

In [36]:
json_doi_figshare = json.loads(r.text)
for key in json_doi_figshare.keys():
    print ('{}: {}'.format(key, json_doi_figshare[key]))

type: article
id: https://doi.org/10.6084/m9.figshare.7368278.v1
categories: ['90501 Civil Geotechnical Engineering', '90502 Construction Engineering', '120199 Architecture not elsewhere classified']
author: [{'family': 'Nunes', 'given': 'David Brandão'}, {'family': 'Neto', 'given': 'José De Paula Barros'}, {'family': 'Freitas', 'given': 'Silvia Maria De'}]
issued: {'date-parts': [[2019]]}
abstract: Abstract The valuation of real estate, which assists in the definition of market value, is an important science with a wide field of action, which includes the collection of taxes, commercial transactions, insurance and judicial expertise. This study presents the construction of a linear regression model to determine the market value (dependent variable) of residential apartments in the city of Fortaleza-CE. The studied database presents 17,493 apartments, divided into 227 plan types in a total of 154 projects launched between the years of 2011 and 2014. The model developed was obtained usi

In [39]:
#::GMG::Here we get the article indirect URL (id) which redirects 
#       to the figshare article's landing page ...
print('doi: {} solves to \nurl: {}'.format(json_doi_figshare['DOI'], 
                json_doi_figshare['id']))

doi: 10.6084/m9.figshare.7368278.v1 solves to 
url: https://doi.org/10.6084/m9.figshare.7368278.v1


### 4: Show relevant information, like author, title, description...

#### Zenodo (REST API)

In [0]:
#::GMG::I use the 'metadata' key of the Zenodo REST API response here
zenodo_json_metadata = json_record['metadata']
zenodo_json_metadata


{'access_right': 'open',
 'access_right_category': 'success',
 'communities': [{'id': 'biosyslit'}],
 'creators': [{'affiliation': '<div>Universidad Nacional del Sur, Bahía Blanca, Argentina</div>',
   'name': 'Testoni, Daniel'},
  {'affiliation': '<div>University of Zurich, Zürich, Switzerland</div>',
   'name': 'Linder, H. Peter'}],
 'description': '<p>Cortaderia (Poaceae; Danthonioideae) is a medium-sized genus of C3 tussock grasses, widespread in the temperate to tropic-alpine regions of South America. It is particularly important in the subalpine and alpine zones of the Andes. We revised the classification of the genus, and recognize 17 species grouped into five informal groups. We describe one new species, Cortaderia echinata H.P.Linder, from Peru. We provide a key to the groups and the species, complete nomenclature for each species including new lectotypes, and notes on the ecology, distribution and diagnostic morphological and anatomical characters.</p>',
 'doi': '10.3897/phyt

In [0]:
#::GMG::I must strip out the HTML tags embedded in the description ... :)
# https://stackoverflow.com/questions/753052/strip-html-from-strings-in-python
import re
print('Title: {}\nDescription: {}\nAuthors: {} and {}'.format(
    json_record['metadata']['title'],
    re.sub('<[^<]+?>', '', json_record['metadata']['description']),
    json_record['metadata']['creators'][0]['name'],
    json_record['metadata']['creators'][1]['name']
    )
)
#::note::I'm curious about the most pythonic way of accessing values in this
#        type of complex nested dictionary ....
#
# https://www.programiz.com/python-programming/nested-dictionary
# https://www.haykranen.nl/2016/02/13/handling-complex-nested-dicts-in-python/
# https://stackoverflow.com/questions/14692690/access-nested-dictionary-items-via-a-list-of-keys

Title: Synoptic taxonomy of Cortaderia Stapf (Danthonioideae, Poaceae)
Description: Cortaderia (Poaceae; Danthonioideae) is a medium-sized genus of C3 tussock grasses, widespread in the temperate to tropic-alpine regions of South America. It is particularly important in the subalpine and alpine zones of the Andes. We revised the classification of the genus, and recognize 17 species grouped into five informal groups. We describe one new species, Cortaderia echinata H.P.Linder, from Peru. We provide a key to the groups and the species, complete nomenclature for each species including new lectotypes, and notes on the ecology, distribution and diagnostic morphological and anatomical characters.
Authors: Testoni, Daniel and Linder, H. Peter
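On the question in the note above: one common idiom, similar to the approach discussed in the Stack Overflow question linked in that cell, is to fold `operator.getitem` over a list of keys with `functools.reduce`. A minimal sketch that also returns a default when any step is missing; `get_in` is a hypothetical helper name and `record` is a trimmed copy of the Zenodo metadata shown earlier.

```python
import operator
from functools import reduce


def get_in(data, path, default=None):
    """Follow a sequence of dict keys / list indexes; return default if any step fails."""
    try:
        return reduce(operator.getitem, path, data)
    except (KeyError, IndexError, TypeError):
        return default


record = {"metadata": {"creators": [{"name": "Testoni, Daniel"}, {"name": "Linder, H. Peter"}]}}
print(get_in(record, ["metadata", "creators", 1, "name"]))  # Linder, H. Peter
print(get_in(record, ["metadata", "creators", 5, "name"], "n/a"))  # n/a
```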


#### Zenodo (doi.org)

In [29]:
#::GMG::The resolved DOI also has metadata ...
print('Title: {}\nDescription: {}\nAuthors: {} and {}'.format(
    json_doi_zenodo['title'],
    '--> ::GMG::No description available',
    json_doi_zenodo['author'][0]['given'] + ' ' +
    json_doi_zenodo['author'][0]['family'],
    json_doi_zenodo['author'][1]['given'] + ' ' +
    json_doi_zenodo['author'][1]['family']    
    )
)

Title: Synoptic taxonomy of Cortaderia Stapf (Danthonioideae, Poaceae)
Description: --> ::GMG::No description available
Authors: Daniel Testoni and H. Peter Linder
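The cell above hard-codes two authors, which only works because this record has exactly two. A minimal sketch that formats any number of CSL JSON authors; `csl_author_names` and `join_names` are hypothetical helpers, and the fallback to a `literal` field for names that are not split into given/family parts is an assumption about the CSL JSON schema.

```python
def csl_author_names(authors):
    """Turn CSL JSON author objects into display names.

    Uses 'given' + 'family' when present and falls back to 'literal'
    (assumed field for names that are not split into parts).
    """
    names = []
    for author in authors:
        personal = " ".join(p for p in (author.get("given"), author.get("family")) if p)
        name = personal or author.get("literal", "")
        if name:
            names.append(name)
    return names


def join_names(names):
    """Join names as 'A', 'A and B' or 'A, B and C'."""
    if len(names) < 2:
        return "".join(names)
    return ", ".join(names[:-1]) + " and " + names[-1]


authors = [  # the 'author' list of the Zenodo DOI record shown earlier
    {"given": "Daniel", "family": "Testoni", "sequence": "first", "affiliation": []},
    {"given": "H. Peter", "family": "Linder", "sequence": "additional", "affiliation": []},
]
print(join_names(csl_author_names(authors)))  # Daniel Testoni and H. Peter Linder
```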


#### Figshare (REST API)

In [42]:
#::GMG::The figshare REST API provides the requested metadata
#  I must strip out the HTML tags embedded in the description ...again! ;)
# https://stackoverflow.com/questions/753052/strip-html-from-strings-in-python
import re
print('Title: {}\nDescription: {}\nAuthors: {}'.format(
    json_article_figshare['title'],
    re.sub('<[^<]+?>', '', json_article_figshare['description']),
    # join every author: this article has three, not two
    ', '.join(a['full_name'] for a in json_article_figshare['authors'])
    )
)

Title: Multiple linear regression model to evaluate the market value of residential apartments in Fortaleza, CE
Description: Abstract The valuation of real estate, which assists in the definition of market value, is an important science with a wide field of action, which includes the collection of taxes, commercial transactions, insurance and judicial expertise. This study presents the construction of a linear regression model to determine the market value (dependent variable) of residential apartments in the city of Fortaleza-CE. The studied database presents 17,493 apartments, divided into 227 plan types in a total of 154 projects launched between the years of 2011 and 2014. The model developed was obtained using Multiple Linear Regression associated with the Ridge Regression technique to solve the existing multicollinearity problem. In the analysis of 30 variables (12 quantitative and 18 dummy type qualitative variables), an equation with 6 variables was reached, which meets the the
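The regular expression `'<[^<]+?>'` used above removes tags but leaves HTML entities such as `&amp;` untouched. A minimal alternative sketch with the standard-library `html.parser`, which decodes entities while collecting text; `strip_html` is a hypothetical helper name.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect only the text content of an HTML fragment."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # turn &amp; etc. into characters
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(fragment):
    """Return the text of an HTML fragment with tags removed and entities decoded."""
    parser = _TextExtractor()
    parser.feed(fragment)
    parser.close()  # flush any text still buffered after the last tag
    return "".join(parser.parts)


print(strip_html("<p>Cortaderia &amp; <i>Poaceae</i></p>"))  # Cortaderia & Poaceae
```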

#### Figshare (doi.org)

In [43]:
#::GMG::I have more metadata available here than in Zenodo (DOI)
print('Title: {}\nDescription: {}\nAuthors: {}'.format(
    json_doi_figshare['title'],
    json_doi_figshare['abstract'],
    # join every author: this record has three, not two
    ', '.join(a['given'] + ' ' + a['family'] for a in json_doi_figshare['author'])
    )
)

Title: Multiple linear regression model to evaluate the market value of residential apartments in Fortaleza, CE
Description: Abstract The valuation of real estate, which assists in the definition of market value, is an important science with a wide field of action, which includes the collection of taxes, commercial transactions, insurance and judicial expertise. This study presents the construction of a linear regression model to determine the market value (dependent variable) of residential apartments in the city of Fortaleza-CE. The studied database presents 17,493 apartments, divided into 227 plan types in a total of 154 projects launched between the years of 2011 and 2014. The model developed was obtained using Multiple Linear Regression associated with the Ridge Regression technique to solve the existing multicollinearity problem. In the analysis of 30 variables (12 quantitative and 18 dummy type qualitative variables), an equation with 6 variables was reached, which meets the the

### Others (pending)

* 5: Which kind of "types" are defined in the PIDs?
* 6: Is there any difference managing the DOIs/PIDs between the different repositories? (textual answer)