# Batch editing publication dates in OJS using the API

This code uses the OJS 3.3 API to batch edit the publication dates of published articles ("publications") in the archive of an OJS journal. Here's the link to the API documentation: https://docs.pkp.sfu.ca/dev/api/ojs/3.3

The journal's back issues were all uploaded at one time, long after print publication. The publication dates in OJS match the date of OJS upload, not of journal publication. This has implications for harvesting, citation output, DOI registration (and cost), and copyright.

The OJS 3.3 API only supports GET requests for issues, not PUT. That means I had to update all the issue publication dates through the web interface.

## Workflow

Schematically:
1. GET the list of all published issues (filter to 2016 and prior)
2. Pull the issue IDs
3. In a loop over the issue IDs:
    1. Use the issue ID to GET the issue data
    2. Pull the issue's publication date
    3. Pull the submission ("sub") and publication ("pub") IDs of the attached articles
    4. In a loop over the articles:
        1. Use the sub and pub IDs to PUT: unpublish the publication
        2. PUT: edit the publication date
        3. PUT: republish the publication

## Managing credentials and endpoints

We need a list of endpoints and the API token. The problem is that the endpoints are generated dynamically from the submission and publication IDs, so I'm not sure yet how that will work in the implementation.

For now I'm using placeholders, and I've got a CSV of credentials that looks like this (two rows: one header row, one row of values), with the columns in the order the code reads them:

* `token`: `SUPER TOP SECRET API TOKEN`
* `issueList_endpoint`: `https://historicalpapers.journals.yorku.ca/index.php/historicalpapers/api/v1/issues`
* `issue_endpoint`: `https://historicalpapers.journals.yorku.ca/index.php/historicalpapers/api/v1/issues/{issueId}`
* `unpub_endpoint`: `https://historicalpapers.journals.yorku.ca/index.php/historicalpapers/api/v1/submissions/{submissionId}/publications/{publicationId}/unpublish`
* `edit_endpoint`: `https://historicalpapers.journals.yorku.ca/index.php/historicalpapers/api/v1/submissions/{submissionId}/publications/{publicationId}`
* `pub_endpoint`: `https://historicalpapers.journals.yorku.ca/index.php/historicalpapers/api/v1/submissions/{submissionId}/publications/{publicationId}/publish`
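On the dynamic-endpoint problem: because the placeholders in these URLs are written as `{submissionId}` and `{publicationId}`, Python's `str.format()` with keyword arguments can fill both in a single call. A minimal sketch; the helper name `fill_endpoint` is hypothetical, and it assumes the URLs contain no other curly braces (`format()` would treat any other `{...}` as a field too).

```python
def fill_endpoint(template: str, submission_id: int, publication_id: int) -> str:
    """Hypothetical helper: fill the {submissionId} and {publicationId}
    placeholders in an endpoint template read from the credentials CSV."""
    # str.format() substitutes named fields; unused keyword arguments are ignored,
    # and a placeholder name we didn't supply raises KeyError.
    return template.format(submissionId=submission_id, publicationId=publication_id)


unpub_template = (
    "https://historicalpapers.journals.yorku.ca/index.php/historicalpapers"
    "/api/v1/submissions/{submissionId}/publications/{publicationId}/unpublish"
)
print(fill_endpoint(unpub_template, 39564, 293))
```

For submission 39564 and publication 293 that yields the `.../submissions/39564/publications/293/unpublish` URL, so the same three templates can be reused for every article in the loop.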

So let's load that up, then assign the token and endpoints to variables.

```python
# import required libraries
import os  # need this to run outside of Jupyter
import pandas as pd  # used to manage CSV files instead of the csv library
import requests  # API calls
import json  # handle the output
import datetime  # to handle the publication date

# set the working directory
my_dir = r'C:\Users\tmrozesws\Documents\Historical Society pub dates'  # wd set here
os.chdir(my_dir)

# open the CSV file with credentials and endpoints
my_keys = pd.read_csv("CSCH_creds.csv")

# assign values by cell position in the CSV (row 0 = the row of values)
key = my_keys.iat[0, 0]  # API token
issueList_endpoint = my_keys.iat[0, 1]  # get list of all issues
issue_endpoint = my_keys.iat[0, 2]  # get issue by ID
unpub_endpoint = my_keys.iat[0, 3]  # unpub by sub & pub ID
edit_endpoint = my_keys.iat[0, 4]  # edit by sub & pub ID
pub_endpoint = my_keys.iat[0, 5]  # repub by sub & pub ID
```

## Get list of all issues

EZ PZ. We've done this before.

```python
# API GET call
# set 'count' to get all published issues (default 20)
issueList_call = requests.get(
    issueList_endpoint,
    params={'apiToken': key, 'isPublished': 'true', 'count': '100'}
)
issueList_call.raise_for_status()  # stop here if the call failed

# decode the JSON output of the call into a Python dict, z
z = issueList_call.json()

# bit of code to write the output to a file to make sure it worked the first time - keep commented out
# json_object = json.dumps(z, indent=4)
# with open("sample_fullissue.json", "w") as outfile:
#     outfile.write(json_object)
```
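Setting `count` to 100 gets everything in one call as long as the journal has fewer published issues than that. If it ever has more, a sketch like this pages through the list with `offset`. It assumes the issues endpoint accepts `count`/`offset` and returns an `itemsMax` total alongside `items`; `fetch_all_items` and `get_page` are hypothetical names, and the page fetcher is passed in so the paging logic can be tested without hitting the server.

```python
from typing import Callable, Dict, List


def fetch_all_items(get_page: Callable[[int, int], Dict], page_size: int = 100) -> List[Dict]:
    """Collect every item from a paged list endpoint.

    get_page(offset, count) must return the decoded JSON for one page,
    assumed to look like {"itemsMax": <total>, "items": [...]}.
    """
    items: List[Dict] = []
    offset = 0
    while True:
        page = get_page(offset, page_size)
        batch = page.get("items", [])
        items.extend(batch)
        offset += len(batch)
        # Stop once we have everything, or if the server returns an empty page.
        if not batch or offset >= page.get("itemsMax", 0):
            return items


# Usage against the real API (not run here):
# def get_page(offset, count):
#     r = requests.get(issueList_endpoint,
#                      params={"apiToken": key, "isPublished": "true",
#                              "count": count, "offset": offset})
#     r.raise_for_status()
#     return r.json()
# z = {"items": fetch_all_items(get_page)}
```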

## Get issue IDs

Now we have to loop through the JSON output to pull out the issue IDs.

The structure is:
```
{
    "items": [
        {
            "id": 2264
        },
        {
            "id": 2275
            ...
```
We'll have to adapt this snippet of code (which uses the built-in `next()` function) from the monthly stats to pull out the IDs:  
`d = next(item for item in c if item["date"] == monthLookup)`  
In that snippet, `c` is the list of items, the equivalent of `z['items']` here.
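For reference, `next()` with a generator expression returns only the *first* item that matches a condition, which is why it needs adapting: here we want *every* ID, not a single lookup. A small comparison on made-up data shaped like the structure above (the `sample` dict and its values are illustrative only).

```python
# Illustrative data shaped like the /issues response (values made up).
sample = {"items": [{"id": 2264, "volume": 1}, {"id": 2275, "volume": 2}]}

# next() stops at the first match: one lookup, like the monthly-stats snippet.
first_match = next(item for item in sample["items"] if item["volume"] == 2)
print(first_match["id"])  # prints 2275

# A list comprehension collects a field from every item instead.
all_ids = [item["id"] for item in sample["items"]]
print(all_ids)  # prints [2264, 2275]
```

Note that `next()` raises `StopIteration` when nothing matches; `next((...), None)` returns `None` instead.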

Aaaaanyway, Kris helped me out with a lot of backing and forthing and futzing about syntax to come up with this:

```python
# create empty list of IDs we're going to iterate through
id_list = []
# go through the API results (z) and iterate through all the items (z['items'])
# each time it goes through a structure in the JSON, that's called item, and we can get 'id' from item
for item in z['items']:
    id_list.append(item['id'])
```

Proof of concept:

```python
print(id_list)
```

```
[2275, 2274, 2273, 2272, 2271, 2269, 2268, 2228, 2229, 2227, 2230, 2231, 2232, 2233, 2234, 2235, 2239, 2240, 2241, 2242, 2243, 2246, 2247, 2248, 2249, 2250, 2251, 2252, 2253, 2267, 2266, 2265, 2264, 2263, 2254, 2262, 2261, 2260, 2259, 2258, 2257, 2256, 2255]
```

Okay, we now have a list of all our issue IDs stored in `id_list`.
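The workflow calls for issues from 2016 and earlier only, but the list call pulls every published issue. A minimal sketch of that filter, assuming each item in `z['items']` carries a `datePublished` string that starts with the year (the full issue record has this field; that the list endpoint includes it too is an assumption). `ids_up_to` and `CUTOFF_YEAR` are hypothetical names.

```python
CUTOFF_YEAR = 2016  # hypothetical constant: keep issues published in or before this year


def ids_up_to(items, cutoff_year=CUTOFF_YEAR):
    """Return the IDs of issues whose datePublished year is <= cutoff_year."""
    ids = []
    for item in items:
        date = item.get("datePublished")  # e.g. "1967-06-02 00:00:00"; None if missing
        if date and int(date[:4]) <= cutoff_year:
            ids.append(item["id"])
    return ids


# Made-up sample items for illustration.
sample_items = [
    {"id": 1, "datePublished": "1967-06-02 00:00:00"},
    {"id": 2, "datePublished": "2019-05-01 00:00:00"},
    {"id": 3, "datePublished": None},
]
print(ids_up_to(sample_items))  # prints [1]
```

Against the real data this would be `id_list = ids_up_to(z['items'])`; issues with no date are skipped rather than guessed at.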

## Use issue ID to GET full issue data

Next, loop through `id_list` and call the API for each ID. We need to stick each issue ID into the API endpoint, which takes the form `/issues/{issueId}`. The URL with the placeholder is stored in the variable `issue_endpoint`. Have to figure out how to do that.

Let's start by swapping the issue ID into the endpoint with `str.replace()`:

```python
for x in id_list:
    # convert the integer to a string
    y = str(x)
    # replace the placeholder text with the issue ID
    ep = issue_endpoint.replace("{issueId}", y)
    # print it to show it worked
    print(ep)
```

Holy shit, that worked!

## Get issue publication date

Now we have to update the loop to include the API call, using the variable `ep` (the endpoint), and pull out the date:

```python
for x in id_list:
    # convert the integer to a string
    y = str(x)
    # replace the placeholder text with the issue ID
    ep = issue_endpoint.replace("{issueId}", y)
    # run the API call
    issue_call = requests.get(ep, params={'apiToken': key})
    issue_call.raise_for_status()  # stop here if the call failed
    # decode the JSON output of the call into a dict, b
    b = issue_call.json()
    # assign the date (in YYYY-MM-DD HH:MM:SS format) to variable c as a string
    c = b["datePublished"]
    # substring to YYYY-MM-DD. The date we want to enter is stored as d
    d = c[0:10]
```
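Slicing `c[0:10]` assumes every `datePublished` comes back as `YYYY-MM-DD HH:MM:SS`. Since `datetime` is already imported, a slightly safer sketch parses the slice, so a malformed or missing date fails loudly instead of being sent to the API. `to_pub_date` is a hypothetical helper name.

```python
import datetime


def to_pub_date(raw):
    """Turn an OJS datePublished value like '1967-06-02 00:00:00' into '1967-06-02'.

    Raises ValueError if the first ten characters aren't a valid ISO date,
    and TypeError if the value is None (e.g. a record with no date).
    """
    return datetime.date.fromisoformat(raw[:10]).isoformat()


print(to_pub_date("1967-06-02 00:00:00"))  # prints 1967-06-02
```

Inside the loop, `d = to_pub_date(b["datePublished"])` would replace the two lines that build `c` and `d`.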

That worked to extract the date, but now we need to nest another loop within that loop to get the sub and pub IDs for each article attached to that issue.

## Get article sub & pub IDs per issue

The structure of the JSON we got from the issue API call (where we also got our date) is:
```
{
    "articles": [
        {
            "id":39740,
            "currentPublicationId": 581
        }
    ]
}
```
The sub ID (`id`) and pub ID (`currentPublicationId`) are at the same level. This is great, because `currentPublicationId` points straight at the current version: if any of these articles had extra versions (publications), picking the right one would really mess things up.

Now to extract those sub and pub IDs from each article.

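```python
# run on its own, this uses b from the last issue fetched in the loop above;
# in the full script it sits inside that loop, indented one level
for article in b["articles"]:
    e = article['id']  # submission ID
    print(e)
    f = article['currentPublicationId']  # publication ID
    print(f)
```

```
39610
339
39609
338
39565
294
39564
293
```
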
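That output looks a little thin here because, run as a separate notebook cell, the loop only sees `b` from the last issue fetched. When I ran it all in Spyder with the loop nested inside the issue loop, it looked like it spat out all the values it needed to.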
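Rather than printing the IDs, the nested loop could collect one `(submissionId, publicationId, date)` tuple per article. That makes it easy to eyeball the whole batch (or save it to CSV) before anything gets unpublished. A sketch with a hypothetical function name, `build_tasks`, run here on made-up issue records shaped like the JSON above.

```python
def build_tasks(issues):
    """Return (submissionId, publicationId, date) for every article in every issue.

    `issues` is a list of full issue records (the decoded JSON from
    GET /issues/{issueId}), each with "datePublished" and "articles".
    """
    tasks = []
    for issue in issues:
        date = issue["datePublished"][:10]  # YYYY-MM-DD, as in the loop above
        for article in issue["articles"]:
            tasks.append((article["id"], article["currentPublicationId"], date))
    return tasks


# Made-up records for illustration only.
sample_issues = [
    {"datePublished": "1967-06-02 00:00:00",
     "articles": [{"id": 101, "currentPublicationId": 11},
                  {"id": 102, "currentPublicationId": 12}]},
    {"datePublished": "1970-05-01 00:00:00",
     "articles": [{"id": 201, "currentPublicationId": 21}]},
]
for task in build_tasks(sample_issues):
    print(task)
```

In the real script, each `b` from the issue loop would be appended to a list and passed in; keeping the "plan" separate from the PUT calls means a bad date shows up before the API is touched.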

Now I'm going to try using the API to unpublish, edit, and republish an article outside of the loop. I'm going to assign the sub and pub IDs (as integers) and the publication date (as a string) to static variables for a single article.

## Unpublish an article

```python
subID = 39564
pubID = 293
pubdate = "1967-06-02"
unpub = "https://historicalpapers.journals.yorku.ca/index.php/historicalpapers/api/v1/submissions/39564/publications/293/unpublish"
```

```python
requests.put(unpub, params={'apiToken': key})
```

```
<Response [200]>
```

That worked! The submission is unpublished. Now we put in the API request to change the publication date (hard-coded here as `1967-06-02`; in the loop it will be `d`).

## Edit the article

```python
eddate = "https://historicalpapers.journals.yorku.ca/index.php/historicalpapers/api/v1/submissions/39564/publications/293"
requests.put(eddate, params={'apiToken': key, 'datePublished': "1967-06-02"})
```

```
<Response [200]>
```

```python
pub = "https://historicalpapers.journals.yorku.ca/index.php/historicalpapers/api/v1/submissions/39564/publications/293/publish"
requests.put(pub, params={'apiToken': key})
```

```
<Response [200]>
```

ARGH. It should work, but the date hasn't changed despite the 200 responses. The Activity Log shows that something happened to the submission, but the original (print) publication date doesn't show up. I've triple-checked the sub and pub IDs; they're both correct.

Am writing to the PKP forum to figure this shit out.
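One thing worth checking (an assumption, not confirmed here): the edit cell sends `datePublished` as a query-string parameter via `params=`. If the 3.3 edit endpoint reads publication fields from the request *body*, as the request-body schema in its API docs suggests, the server would ignore the date and still answer 200. A minimal sketch of the unpublish, edit, publish sequence that sends the new date as JSON with `json=` and checks the date the server reports back. `redate_publication` and its arguments are hypothetical; `session` is meant to be a `requests.Session()` (or anything with the same `put()` method), and the templates are the endpoint URLs from the credentials CSV.

```python
def redate_publication(session, api_token, templates, submission_id, publication_id, new_date):
    """Unpublish one publication, set its datePublished, then republish it.

    session:   a requests.Session() or anything with the same put() method
    templates: dict with "unpub", "edit" and "pub" endpoint URLs containing
               {submissionId} and {publicationId} placeholders
    new_date:  "YYYY-MM-DD"
    Returns the datePublished value reported back by the edit call.
    """
    ids = {"submissionId": submission_id, "publicationId": publication_id}
    auth = {"apiToken": api_token}  # token stays in the query string, as before

    # 1. Unpublish so the publication can be edited.
    session.put(templates["unpub"].format(**ids), params=auth).raise_for_status()
    try:
        # 2. Edit: the new date goes in the JSON request body (assumption:
        #    fields passed as query parameters are ignored by this endpoint).
        r = session.put(templates["edit"].format(**ids), params=auth,
                        json={"datePublished": new_date})
        r.raise_for_status()
        # Assumption: the response body is the edited publication.
        saved = r.json().get("datePublished")
    finally:
        # 3. Republish even if the edit failed, so the article isn't left offline.
        session.put(templates["pub"].format(**ids), params=auth).raise_for_status()

    if not str(saved or "").startswith(new_date):
        raise RuntimeError(f"date not saved for {submission_id}/{publication_id}: {saved!r}")
    return saved


# Usage (not run here):
# import requests
# templates = {"unpub": unpub_endpoint, "edit": edit_endpoint, "pub": pub_endpoint}
# with requests.Session() as s:
#     redate_publication(s, key, templates, 39564, 293, "1967-06-02")
```

If the reported date still doesn't match after this change, the query-string theory is ruled out, and the response body would be a useful detail to include in the forum post.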