In [1]:
import requests
import pandas as pd
import math
from time import sleep

Here I use the Open Library search API to retrieve information on books written by any author I choose. In this notebook I extract data on Agatha Christie and Stephen King.


The documentation for the API is available at https://openlibrary.org/dev/docs/api/search

In [2]:
# First, explore which keys the API response contains
base_url = 'http://openlibrary.org/search.json'
parameters = {'author': 'Agatha Christie'}
r = requests.get(base_url, params=parameters)
r.json().keys()

dict_keys(['numFound', 'start', 'numFoundExact', 'docs', 'num_found', 'q', 'offset'])

<code>docs</code> is the key that holds the book records returned for an author. <code>numFound</code> gives the total number of entries for that author, which lets me work out how many pages to loop over to gather all the data.
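The number of pages follows from <code>numFound</code> by rounding up a division. Here is a minimal sketch, assuming 100 results per page (the page length observed later in this notebook); the helper name <code>pages_needed</code> is hypothetical and not part of the API.

```python
import math

def pages_needed(num_found, page_size=100):
    """Return how many pages are needed to cover num_found entries."""
    # page_size=100 is an assumption based on the observed page length
    return math.ceil(num_found / page_size)

print(pages_needed(250))  # 3
print(pages_needed(200))  # 2
print(pages_needed(0))    # 0
```

Rounding up matters because a partially filled last page still has to be requested.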

In [3]:
# Define function to get the entries of a specified page as a DataFrame
def get_books(Author, pagenum):
    base_url = 'http://openlibrary.org/search.json'
    parameters = {'author': Author, 'page': pagenum}
    r = requests.get(base_url, params=parameters)
    r.raise_for_status()  # stop on an HTTP error instead of failing later on .json()
    df = pd.DataFrame(r.json()['docs'])
    return df

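# Define function to get the total number of entries for the author
# (despite its name, this returns the entry count, not a page count)
def get_maxpage(Author):
    base_url = 'http://openlibrary.org/search.json'
    parameters = {'author': Author}
    r = requests.get(base_url, params=parameters)
    r.raise_for_status()  # stop on an HTTP error instead of failing later on .json()
    num_found = r.json()['numFound']
    return num_found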

In [12]:
# Define function that uses both previous functions to get all entries on an author
def get_books_info(Author):
    dfs = []  # empty list to store dataframes
    maxpage = math.ceil(get_maxpage(Author)/100)  # 100 since I know the length of a page is 100
    # can check this by simply doing len(get_books('Agatha Christie', 1))
    
    for page_number in range(1,maxpage+1):
        new_df = get_books(Author, page_number)
        dfs.append(new_df)
        sleep(1)
        
    df = pd.concat(dfs,ignore_index = True)
    return df
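
The paging logic can be tried without calling the API. This is only a sketch: <code>collect_pages</code> and <code>fake_fetch</code> are hypothetical names, the entries are made-up placeholders rather than real API data, and the page size of 100 is the same assumption as above.

```python
def collect_pages(fetch_page, total_entries, page_size=100):
    """Call fetch_page(n) for n = 1..last page and return all rows in one list."""
    last_page = -(-total_entries // page_size)  # ceiling division
    rows = []
    for page_number in range(1, last_page + 1):
        rows.extend(fetch_page(page_number))
    return rows

# Stand-in for a real request: 250 placeholder entries served 100 at a time
fake_entries = [{'title': f'Book {i}'} for i in range(250)]

def fake_fetch(page_number, page_size=100):
    start = (page_number - 1) * page_size
    return fake_entries[start:start + page_size]

all_rows = collect_pages(fake_fetch, len(fake_entries))
print(len(all_rows))  # 250
```

Passing the fetch function in as an argument keeps the loop separate from the network call, so the page arithmetic can be checked on its own.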

Now that the functions have been defined, I can use them to get all the entries for any author of my choice.

In [13]:
Agatha_Christie = get_books_info('Agatha Christie')

In [14]:
Agatha_Christie.to_csv("Agatha Christie.csv")

In similar fashion, I can extract all the information for any other author, such as Stephen King.

In [15]:
Stephen_King = get_books_info('Stephen King')

In [16]:
Stephen_King.to_csv("Stephen King.csv")