# <span style="color:darkblue"> Lecture 5b - APIs and JSON </span>

<font size = "5">

In this lecture we will cover the basics of dictionaries and <br>
interacting with web-based APIs

<font size = "5">

Import packages

In [44]:
import requests      # Requests data from an API
import json          # Handles JSON objects
import jmespath      # Allows you to more easily navigate JSON objects
import pandas as pd  # Handles data frames

# <span style="color:darkblue"> I. Best Coding Practices </span>

<font size = "5">

Splitting a command into multiple lines

<font size = "3">

- Sometimes strings are very long, e.g. URLs
- Best practice is to keep lines of code to a maximum of 80 characters
- It is easier to read code if you scroll vertically <br>
rather than sideways

In [45]:
# You can split commands into multiple lines with a backslash
example_backslash = "This is a very long string " \
                    + "and I would like to break it down into multiple lines"

# You can also split commands into multiple lines by wrapping them in parentheses
example_parenthesis = ("This is a very long string "
                       + "and I would like to break it down into multiple lines")
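
A third option, shown here as a minimal sketch: Python joins adjacent string literals automatically when they sit inside parentheses, so no `+` is needed. The variable names `example_adjacent` and `example_plus` are illustrative and not part of the lecture code.

```python
# Adjacent string literals inside parentheses are joined automatically
example_adjacent = ("This is a very long string "
                    "and I would like to break it down into multiple lines")

# The same string built with "+" for comparison
example_plus = ("This is a very long string "
                + "and I would like to break it down into multiple lines")

# Both approaches produce the same string
print(example_adjacent == example_plus)
```

Note the trailing space inside the first literal. Without it, the two pieces would be glued together with no space between the words.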


<font size = "5">

Examples of Dictionary Formatting

In [46]:
# Define dictionary
dictionary_student = \
{
    "first_name": "John",
    "last_name" : "Smith",
    "location"  : "Atlanta",
    "university": "Emory",
}
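
Once a dictionary is defined, values are looked up by key. The sketch below repeats the definition so that it runs on its own. The key `"major"` is hypothetical and is used only to show what happens when a key is missing.

```python
dictionary_student = {
    "first_name": "John",
    "last_name" : "Smith",
    "location"  : "Atlanta",
    "university": "Emory",
}

# Square brackets return the value stored under a key
print(dictionary_student["first_name"])

# .get() returns a default value instead of raising a KeyError
# when the key does not exist ("major" is a hypothetical key)
print(dictionary_student.get("major", "not recorded"))

# .keys() lists all the keys in the dictionary
print(list(dictionary_student.keys()))
```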


<font size = "5">
Nested dictionaries + Lists

In [47]:
dictionary_student_wgrades =\
{
    "first_name": "John",
    "last_name" : "Smith",
    "location"  : "Atlanta",
    "university": "Emory",
    "grades":{
        "computing":"A",
        "reasoning":"B",
        "applied":"A-"
    }
}

dictionary_liststudents = \
[{
    "first_name": "Ned",
    "last_name" : "Stark",
    "location"  : "Atlanta",
    "university": "Emory",
    
 },
 {
    "first_name": "Cersei",
    "last_name" : "Lannister",
    "location"  : "Decatur",
    "university": "Georgia Tech",
 }
]
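
Nested structures are read by chaining keys and positions, one level at a time. The sketch below uses shortened copies of the two objects above so that it runs on its own. The names `grade_computing` and `second_student_name` are illustrative.

```python
dictionary_student_wgrades = {
    "first_name": "John",
    "grades": {"computing": "A", "reasoning": "B", "applied": "A-"},
}

dictionary_liststudents = [
    {"first_name": "Ned", "last_name": "Stark"},
    {"first_name": "Cersei", "last_name": "Lannister"},
]

# Chain keys to go one level deeper into a nested dictionary
grade_computing = dictionary_student_wgrades["grades"]["computing"]

# Lists are indexed by position (starting at 0), then by key
second_student_name = dictionary_liststudents[1]["first_name"]

print(grade_computing)
print(second_student_name)
```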

<font size = "5">

Convert dictionaries to JSON

In [48]:
# Convert dictionary to JSON
## json.dumps() converts the dictionary to a JSON-formatted string
## json.loads() parses this string back into Python dictionaries and lists

json_student         = json.loads(json.dumps(dictionary_student ))
json_student_wgrades = json.loads(json.dumps(dictionary_student_wgrades))
json_liststudents    = json.loads(json.dumps(dictionary_liststudents))
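
To see what each step returns, the sketch below checks the types along the way. It uses a shortened dictionary so that it runs on its own. The names `json_text` and `json_back` are illustrative.

```python
import json

dictionary_student = {"first_name": "John", "last_name": "Smith"}

json_text = json.dumps(dictionary_student)   # dictionary -> string
json_back = json.loads(json_text)            # string -> dictionary

print(type(json_text))
print(type(json_back))

# The round trip gives back an object equal to the original
print(json_back == dictionary_student)
```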


<font size = "5">

Navigating a JSON file by name

In [49]:
# The search command works down the hierarchy, first "grades", then
# "computing". The "." separates the queries at different levels.

extract_info1 = jmespath.search('grades.computing', json_student_wgrades)
print(extract_info1)

A
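
For comparison, the same lookup can be sketched in plain Python. The helper `get_by_path` below is hypothetical (it is not part of `jmespath`) and only handles dot-separated keys.

```python
def get_by_path(data, path):
    """Follow a dot-separated path of keys, e.g. 'grades.computing'."""
    current = data
    for key in path.split("."):
        current = current[key]   # go one level deeper at each step
    return current


json_student_wgrades = {
    "first_name": "John",
    "grades": {"computing": "A", "reasoning": "B", "applied": "A-"},
}

print(get_by_path(json_student_wgrades, "grades.computing"))
```

One difference worth knowing: `jmespath.search` returns `None` when a key is missing, whereas this helper raises a `KeyError`.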


<font size = "5">

Navigating a JSON list by position

In [50]:
extract_info2 = jmespath.search('[0]', json_liststudents)
print(extract_info2)

extract_info3 = jmespath.search('[0].first_name', json_liststudents)
print(extract_info3)


{'first_name': 'Ned', 'last_name': 'Stark', 'location': 'Atlanta', 'university': 'Emory'}
Ned


<font size = "5">

Extracting elements at all sublevels

In [51]:
# This command extracts the value of "first_name" for all students
# The [*] wildcard selects every element of the list

extract_info3 = jmespath.search('[*].first_name', json_liststudents)
print(extract_info3)


['Ned', 'Cersei']
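
The `[*]` query behaves much like a list comprehension. As a rough plain-Python equivalent, using a shortened copy of the list so that it runs on its own (the name `first_names` is illustrative):

```python
json_liststudents = [
    {"first_name": "Ned", "last_name": "Stark"},
    {"first_name": "Cersei", "last_name": "Lannister"},
]

# Equivalent of '[*].first_name': take "first_name" from every element
first_names = [student["first_name"] for student in json_liststudents]

print(first_names)
```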


# <span style="color:darkblue"> II. Accessing an API </span>

<font size = "5">

About the example

- OpenAlex is one of the largest open access repositories <br>
of bibliographic information
- Contains millions of references and metadata on <br>
books, papers, authors, and citation networks

https://docs.openalex.org

<font size = "5">

Define a URL to access the API

``` https://website.com/option=12345```

- Usually it's a domain followed by a field and its value
- Many search engines use this type of format <br>
for search queries

In [52]:
# Define a new string variable called "url_openalex_api"
# Notice that we used the backslash to split the command, since
# the URL is quite long.

# In this case we are searching for all the works published 
# by an author whose ID is "A5023888391"

url_openalex_api = \
      "https://api.openalex.org/works?filter=author.id:A5023888391"
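
To see the pieces of this URL, one option is to take it apart with the standard library's `urllib.parse`. This is only a sketch for inspection. The lecture itself passes the full string to `requests`, and the name `parts` is illustrative.

```python
from urllib.parse import urlsplit, parse_qs

url_openalex_api = \
      "https://api.openalex.org/works?filter=author.id:A5023888391"

# Split the URL into its components
parts = urlsplit(url_openalex_api)

print(parts.netloc)            # the domain
print(parts.path)              # the resource being requested
print(parse_qs(parts.query))   # the query: each field with its list of values
```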


<font size = "5">

Request data


In [53]:
# Obtain the JSON data of records from the OpenAlex API
# Note: you can break the line using a backslash

search_results = requests.\
    get(url_openalex_api).json()


<font size = "5">

Convert to a JSON file for easier readability

In [54]:
# This command opens the file for writing.
# The folder "json_files" must already exist.
# The code in parentheses of file.write(...) 
#     nicely formats the JSON file, with four-character indentation.

with open('json_files/convert_search_results.json', 'w') as file: 
    file.write(json.dumps(search_results, indent=4))
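
A saved file can be read back with `json.load()`. The sketch below writes and reloads a small object using `json.dump()`, which writes directly to the file. The object `sample_results` is a hypothetical stand-in for the API response, and a temporary folder is used so that the example does not depend on `json_files/`.

```python
import json
import os
import tempfile

# Hypothetical stand-in for the API response
sample_results = {"meta": {"count": 62, "page": 1, "per_page": 25}}

# A temporary folder is created and removed automatically
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "sample_results.json")

    # Write the object to disk with four-character indentation
    with open(path, "w") as file:
        json.dump(sample_results, file, indent=4)

    # Read the file back into a Python dictionary
    with open(path, "r") as file:
        reloaded = json.load(file)

print(reloaded == sample_results)
```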

<font size = "5">

Browsing with the JSON data viewer

<font size = "3">

- Open the ```convert_search_results.json``` file from the folder in VS Code
- Click on VS Code's "View" tab at the top and click on the <br>
 option "Command Palette". This will open a pop-up window. <br>
<br>

<img src="figures/command_palette.png" alt="drawing" width="150"/>
<img src="figures/open_jsonviewer.png" alt="drawing" width="350"/>

<br>

- Search "Open in JSON viewer" and press Enter. This <br>
should open a new tab with the following layout, which has <br>
convenient drop-down menus: <br>
<br>

<img src="figures/json_viewer.png" alt="drawing" width="400"/>

<font size = "5">

Try it yourself! Experiment with the JSON data viewer. <br>
Diagnose how the data is structured and the different levels.

# <span style="color:darkblue"> III. Navigating the nested structure </span>

<font size = "5">

Browse keys in the "first level"


In [63]:
# The "len" command is used to count the number of "first-level" keys
print(len(search_results))

# The "keys" method extracts the names of the "first-level" objects
print(search_results.keys())

# Store keys in a data frame for easy access and visualization
search_results_keys = pd.DataFrame(search_results.keys())


3
dict_keys(['meta', 'results', 'group_by'])
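
Before going deeper, it can help to check what kind of object sits under each key. The sketch below uses `sample_results`, a hypothetical and heavily shortened stand-in for `search_results`. The empty `"group_by"` list is an assumption made for illustration.

```python
# Hypothetical, heavily shortened stand-in for the API response
sample_results = {
    "meta": {"count": 62, "page": 1, "per_page": 25},
    "results": [
        {"id": "https://openalex.org/W2741809807", "publication_year": 2018},
    ],
    "group_by": [],   # assumed to be an empty list for this illustration
}

# Record the type of object stored under each first-level key
level_summary = {key: type(value).__name__
                 for key, value in sample_results.items()}

print(level_summary)
```

Dictionaries are navigated by key and lists by position, so knowing the type tells you which kind of query to write next.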


<font size = "5">

Browse subdictionary within a particular key

In [56]:
search_results["meta"]

{'count': 62,
 'db_response_time_ms': 127,
 'page': 1,
 'per_page': 25,
 'groups_count': None}

<font size = "5">

Convert dictionary to pandas

In [57]:
# This code converts to Pandas

data = pd.DataFrame(search_results["meta"],index = [0])

data

| | count | db_response_time_ms | page | per_page | groups_count |
|---|---|---|---|---|---|
| 0 | 62 | 127 | 1 | 25 | None |


<font size = "5">

Count elements at a particular level

In [58]:
# Count number of elements in list
len(search_results["results"])

25
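
The list has 25 elements even though `meta` reports a `count` of 62, which suggests the response holds only the first page of results. Assuming `count` is the total number of matches and `per_page` is the page size, a quick calculation gives the number of pages needed to see everything. The names `pages_needed` and `results_on_last_page` are illustrative.

```python
import math

# Values copied from search_results["meta"] above
meta = {"count": 62, "page": 1, "per_page": 25}

# Round up, since a partially filled page is still a page
pages_needed = math.ceil(meta["count"] / meta["per_page"])

# Number of results left over for the final page
results_on_last_page = meta["count"] - (pages_needed - 1) * meta["per_page"]

print(pages_needed)
print(results_on_last_page)
```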

<font size = "5">

You can access different sub elements

In [59]:
print(search_results["results"][0])
print(search_results["results"][1])

search_results["results"][0].keys()

{'id': 'https://openalex.org/W2741809807', 'doi': 'https://doi.org/10.7717/peerj.4375', 'title': 'The state of OA: a large-scale analysis of the prevalence and impact of Open Access articles', 'display_name': 'The state of OA: a large-scale analysis of the prevalence and impact of Open Access articles', 'publication_year': 2018, 'publication_date': '2018-02-13', 'ids': {'openalex': 'https://openalex.org/W2741809807', 'doi': 'https://doi.org/10.7717/peerj.4375', 'mag': '2741809807', 'pmid': 'https://pubmed.ncbi.nlm.nih.gov/29456894', 'pmcid': 'https://www.ncbi.nlm.nih.gov/pmc/articles/5815332'}, 'language': 'en', 'primary_location': {'is_oa': True, 'landing_page_url': 'https://doi.org/10.7717/peerj.4375', 'pdf_url': 'https://peerj.com/articles/4375.pdf', 'source': {'id': 'https://openalex.org/S1983995261', 'display_name': 'PeerJ', 'issn_l': '2167-8359', 'issn': ['2167-8359'], 'is_oa': True, 'is_in_doaj': True, 'is_core': True, 'host_organization': 'https://openalex.org/P4310320104', 'ho

dict_keys(['id', 'doi', 'title', 'display_name', 'publication_year', 'publication_date', 'ids', 'language', 'primary_location', 'type', 'type_crossref', 'indexed_in', 'open_access', 'authorships', 'countries_distinct_count', 'institutions_distinct_count', 'corresponding_author_ids', 'corresponding_institution_ids', 'apc_list', 'apc_paid', 'fwci', 'has_fulltext', 'fulltext_origin', 'cited_by_count', 'citation_normalized_percentile', 'cited_by_percentile_year', 'biblio', 'is_retracted', 'is_paratext', 'primary_topic', 'topics', 'keywords', 'concepts', 'mesh', 'locations_count', 'locations', 'best_oa_location', 'sustainable_development_goals', 'grants', 'datasets', 'versions', 'referenced_works_count', 'referenced_works', 'related_works', 'abstract_inverted_index', 'cited_by_api_url', 'counts_by_year', 'updated_date', 'created_date'])

<font size ="5">

Gradually work through the nested structure!

<font size = "5">

Try it yourself!

<font size = "3">

- Navigate the "results" list using the JSON Data Viewer
- Use ```jmespath.search()``` to extract the "title" of all the <br>
search results.


In [60]:
# Write your own code




<font size = "5">

Try it yourself!

<font size = "3">

- Convert ```search_results["results"]``` to a Pandas dataframe
- Are there any columns with conversion issues?

Note: Pandas conversion can run into issues for variables with <br>
several levels of nesting, and more data cleaning may be required <br>
to properly code these cases.
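
One possible cleaning step, sketched here in plain Python, is to flatten nested dictionaries so that every value sits at a single level under a dotted name. The helper `flatten` is hypothetical, and `sample_record` is a shortened record built from values shown in the output above. Lists are left untouched.

```python
def flatten(record, prefix=""):
    """Turn nested dictionaries into one level with dotted key names."""
    flat = {}
    for key, value in record.items():
        name = prefix + key
        if isinstance(value, dict):
            # Recurse into the sub-dictionary, extending the dotted name
            flat.update(flatten(value, prefix=name + "."))
        else:
            flat[name] = value
    return flat


# Shortened record based on the first search result
sample_record = {
    "publication_year": 2018,
    "primary_location": {
        "is_oa": True,
        "source": {"display_name": "PeerJ"},
    },
}

print(flatten(sample_record))
```

Pandas offers `pd.json_normalize()` for a similar purpose, which may be more convenient once the structure of the data is understood.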

In [61]:
# Write your own code




# <span style="color:darkblue"> IV. Hands-on Experience </span>

The Open Library also contains bibliographic information.

You can search for sports books with the API link

http://openlibrary.org/search.json?title=sports

1. Get information from this website using requests
2. Store the result in a JSON file and open it with the data viewer.
3. Find out the main keys
4. How many search results are there?
5. What data can you obtain? Think carefully about <br>
how to convert this into a workable pandas format

In [62]:
# Write your own code







# <span style="color:darkblue"> V. Further Reading </span>

<font size = "5">

Learn more about how to use advanced JSON queries with JMESPath at

https://jmespath.org/tutorial.html

- This can be very useful to navigate the nested structure
- Examples of syntax and advanced search terms

<font size = "5">

Learn more about JSON at <br>

https://www.w3schools.com/js/js_json_intro.asp

<font size = "5">

Learn more about converting nested JSON to a Pandas dataframe at <br>

https://saturncloud.io/blog/how-to-convert-nested-json-to-pandas-dataframe-with-specific-format/


<font size = "5">

Check the following textbook on our reading list, Chapter 7:

https://www.amazon.com/Introduction-Programming-Business-Science-Applications/dp/1544377444

<font size = "5">

There are several online resources for further practice at:

https://study.sagepub.com/researchmethods/statistics/kaefer-intro-to-python