# CSV + API

In this reboot, we are going to use:

- The [Goodreads books](https://www.kaggle.com/jealousleopard/goodreadsbooks) dataset from Kaggle.
- The [Open Library Books API](https://openlibrary.org/dev/docs/api/books).

The goal of this livecode is to load the data from a CSV file, then loop over the rows to enrich each one with information from the API, such as:

- List of subjects (Science, Humor, Travel, etc.)
- The cover URL of the book
- Other information you'd find useful in the JSON API

First, download the CSV into the local folder:

In [2]:
!curl -L https://gist.githubusercontent.com/ssaunier/351b17f5a7a009808b60aeacd1f4a036/raw/books.csv > books.csv

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 1509k  100 1509k    0     0  3202k      0 --:--:-- --:--:-- --:--:-- 3224k


In [3]:
!ls -lh

total 3040
-rw-r--r--  1 kleinyann  staff   579B Nov 29  2022 README.md
-rw-r--r--  1 kleinyann  staff   3.8K Jul 11 14:11 Recap.ipynb
-rw-r--r--  1 kleinyann  staff   1.5M Jul 11 17:02 books.csv


Then import the usual suspects!

In [3]:
import requests
import pandas as pd
import numpy as np

## Load books from CSV

In [5]:
# YOUR CODE HERE 
# Keep only the columns title, authors, isbn13 and '# num_pages'
books_df = pd.read_csv('books.csv', usecols=['title', 'authors', 'isbn13', '# num_pages'])
# books_df = books_df[['title', 'authors', 'isbn13', '# num_pages']]

# drop a column
# books_df = books_df.drop(columns=['title'])

books_df.head(3)

,title,authors,isbn13,# num_pages
0,Harry Potter and the Half-Blood Prince (Harry ...,J.K. Rowling-Mary GrandPré,9780439785969,652
1,Harry Potter and the Order of the Phoenix (Har...,J.K. Rowling-Mary GrandPré,9780439358071,870
2,Harry Potter and the Sorcerer's Stone (Harry P...,J.K. Rowling-Mary GrandPré,9780439554930,320


In [6]:
books_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13719 entries, 0 to 13718
Data columns (total 4 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   title        13719 non-null  object
 1   authors      13719 non-null  object
 2   isbn13       13719 non-null  int64 
 3   # num_pages  13719 non-null  int64 
dtypes: int64(2), object(2)
memory usage: 428.8+ KB


Let's add a new column filled with empty cells:

In [7]:
# YOUR CODE HERE
# for example "cover_url"
# books_df['cover_url'] = np.nan
books_df['cover_url'] = None

books_df.head(3)

,title,authors,isbn13,# num_pages,cover_url
0,Harry Potter and the Half-Blood Prince (Harry ...,J.K. Rowling-Mary GrandPré,9780439785969,652,
1,Harry Potter and the Order of the Phoenix (Har...,J.K. Rowling-Mary GrandPré,9780439358071,870,
2,Harry Potter and the Sorcerer's Stone (Harry P...,J.K. Rowling-Mary GrandPré,9780439554930,320,
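
The cell above offers two ways to create the empty column, `None` and `np.nan`. A minimal sketch of the difference, on a tiny stand-in DataFrame: `demo_df` and the column `nan_column` are hypothetical names used only for illustration.

```python
import numpy as np
import pandas as pd

# Tiny stand-in for books_df (two ISBNs taken from the table above)
demo_df = pd.DataFrame({"isbn13": [9780439785969, 9780439358071]})

# None produces a generic object column, a natural fit for URL strings
demo_df["cover_url"] = None

# np.nan is a float, so this column starts out as float64
demo_df["nan_column"] = np.nan

print(demo_df.dtypes)
```

Both columns count as missing for `isna()`, so either choice works for checking later whether a row has already been filled.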


## API - Open Library

In [8]:
# YOUR CODE HERE 
# Enrich the DF with the "cover URL" (you can add other data, example: "List of subjects")
# Optional: make it IDEMPOTENT
isbn = 9780439785969
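def fetch_cover(isbn):
    """Return the large cover URL for an ISBN, or None if it cannot be fetched."""
    isbn_formatted = f"ISBN:{isbn}"
    url = "https://openlibrary.org/api/books"
    try:
        response = requests.get(
            url,
            params={
                'bibkeys': isbn_formatted,
                'format': 'json',
                'jscmd': 'data'
            },
            timeout=10  # avoid hanging forever on a stalled connection
        )
        data = response.json()
        cover_url = data[isbn_formatted]["cover"]["large"]
    except (requests.RequestException, ValueError, KeyError):
        # Network failure, invalid JSON, or no large cover for this ISBN
        return None
    return cover_url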

In [9]:
fetch_cover(isbn)

'https://covers.openlibrary.org/b/id/9326654-L.jpg'
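
The same response can also provide the list of subjects mentioned in the goals. The sketch below assumes that, with `jscmd=data`, each book carries a `subjects` list of dictionaries with a `name` key; check a real response before relying on that. The helper `extract_book_info` is a hypothetical name, and the sample dictionary is hand-written (its subject names are made up), not real API output.

```python
def extract_book_info(data, isbn):
    """Pull the cover URL and subject names for one ISBN out of a parsed response."""
    book = data.get(f"ISBN:{isbn}", {})
    cover_url = book.get("cover", {}).get("large")
    # Assumed shape: "subjects" is a list of {"name": ..., "url": ...} dicts
    subjects = [s["name"] for s in book.get("subjects", []) if "name" in s]
    return {"cover_url": cover_url, "subjects": subjects}


# Hand-written sample in the assumed response shape (not real API output)
sample = {
    "ISBN:9780439785969": {
        "cover": {"large": "https://covers.openlibrary.org/b/id/9326654-L.jpg"},
        "subjects": [{"name": "Magic"}, {"name": "Wizards"}],
    }
}

info = extract_book_info(sample, 9780439785969)
print(info)
```

Working on the parsed dictionary rather than on the HTTP call keeps the extraction logic easy to try out without network access.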

In [10]:
%%time
for index, row in books_df.head(50).iterrows():
    # print(row['isbn13'])
    cover_url = fetch_cover(row['isbn13'])
    # print(cover_url)
    books_df.loc[index, 'cover_url'] = cover_url

CPU times: user 1.32 s, sys: 86.2 ms, total: 1.41 s
Wall time: 21.1 s
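
The "Optional: make it IDEMPOTENT" hint means the loop can be re-run without fetching covers that are already present. One possible sketch skips rows whose `cover_url` is already filled. The function name `enrich_covers`, its `fetch` parameter and the fake fetcher are assumptions made for illustration; in the notebook you would pass `fetch_cover` instead.

```python
import pandas as pd


def enrich_covers(df, fetch, limit=50):
    """Fill cover_url for the first `limit` rows, skipping rows already filled.

    Returns the number of times `fetch` was called.
    """
    calls = 0
    for index, row in df.head(limit).iterrows():
        if pd.notna(row["cover_url"]):
            continue  # already enriched on a previous run
        df.loc[index, "cover_url"] = fetch(row["isbn13"])
        calls += 1
    return calls


# Stand-in data and a fake fetcher, so the sketch runs without network access
demo_df = pd.DataFrame({"isbn13": [1, 2, 3], "cover_url": [None, None, None]})


def fake_fetch(isbn):
    return f"https://example.com/{isbn}.jpg"


first_run = enrich_covers(demo_df, fake_fetch)   # fetches all three rows
second_run = enrich_covers(demo_df, fake_fetch)  # nothing left to fetch
print(first_run, second_run)
```

Rows for which the fetcher returns `None` stay empty, so they would be retried on every run; whether that is desirable depends on how often missing covers appear later.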


In [12]:
books_df.head(10)

,title,authors,isbn13,# num_pages,cover_url
0,Harry Potter and the Half-Blood Prince (Harry ...,J.K. Rowling-Mary GrandPré,9780439785969,652,https://covers.openlibrary.org/b/id/9326654-L.jpg
1,Harry Potter and the Order of the Phoenix (Har...,J.K. Rowling-Mary GrandPré,9780439358071,870,https://covers.openlibrary.org/b/id/12025650-L...
2,Harry Potter and the Sorcerer's Stone (Harry P...,J.K. Rowling-Mary GrandPré,9780439554930,320,https://covers.openlibrary.org/b/id/7572543-L.jpg
3,Harry Potter and the Chamber of Secrets (Harry...,J.K. Rowling,9780439554893,352,https://covers.openlibrary.org/b/id/10301720-L...
4,Harry Potter and the Prisoner of Azkaban (Harr...,J.K. Rowling-Mary GrandPré,9780439655484,435,https://covers.openlibrary.org/b/id/8778528-L.jpg
5,Harry Potter Boxed Set Books 1-5 (Harry Potte...,J.K. Rowling-Mary GrandPré,9780439682589,2690,https://covers.openlibrary.org/b/id/278981-L.jpg
6,"Unauthorized Harry Potter Book Seven News: ""Ha...",W. Frederick Zimmerman,9780976540601,152,https://covers.openlibrary.org/b/id/742235-L.jpg
7,Harry Potter Collection (Harry Potter #1-6),J.K. Rowling,9780439827607,3342,https://covers.openlibrary.org/b/id/279436-L.jpg
8,The Ultimate Hitchhiker's Guide: Five Complete...,Douglas Adams,9780517226957,815,https://covers.openlibrary.org/b/id/12617870-L...
9,The Ultimate Hitchhiker's Guide to the Galaxy,Douglas Adams,9780345453747,815,


## Calling the API with multiple ISBNs at a time

In [1]:
# YOUR CODE HERE 
# Hint: https://openlibrary.org/api/books?bibkeys=ISBN:9780980200447,ISBN:9780140328721&jscmd=details&format=json
isbns = [9780439785969, 9780439358071, 9780439554930]

## Solution: Calling the API with multiple ISBNs at a time

In [15]:
isbns = [9780439785969, 9780439358071, 9780439554930] 
[f"ISBN:{isbn}" for isbn in isbns]

['ISBN:9780439785969', 'ISBN:9780439358071', 'ISBN:9780439554930']

In [16]:
",".join([f"ISBN:{isbn}" for isbn in isbns])

'ISBN:9780439785969,ISBN:9780439358071,ISBN:9780439554930'

In [18]:
def fetch_books(isbns):
    """Fetch data for several ISBNs in one request; returns a dict keyed by bibkey."""
    # Define the URL and build bibkeys from the ISBNs
    url = "https://openlibrary.org/api/books"
    bibkeys = ",".join([f"ISBN:{isbn}" for isbn in isbns])
    # Define parameters for the HTTP request
    params = {
        'bibkeys': bibkeys, 'format': 'json', 'jscmd': 'data'
    }
    # Perform the request and parse the JSON body
    response = requests.get(url, params=params)
    return response.json()
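
The timed loop earlier needed 21.1 s for 50 rows, about 0.4 s per request, so one request per row would take roughly an hour and a half for all 13,719 rows. Grouping ISBNs cuts the number of requests, but sending every ISBN in a single call would produce a very long URL. A minimal sketch of splitting the list into batches follows; the helper name `chunked` and the batch size of 2 are illustrative assumptions, not something Open Library prescribes.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` holding at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# The first five ISBNs from the table above
isbns = [9780439785969, 9780439358071, 9780439554930, 9780439554893, 9780439655484]

batches = list(chunked(isbns, 2))
for batch in batches:
    print(batch)
```

Each batch can then be passed to `fetch_books`, ideally with a short pause between calls to stay polite towards the API.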

In [19]:
books_df.set_index("isbn13", inplace=True)

In [20]:
books_df.head()

isbn13,title,authors,# num_pages,cover_url
9780439785969,Harry Potter and the Half-Blood Prince (Harry ...,J.K. Rowling-Mary GrandPré,652,https://covers.openlibrary.org/b/id/9326654-L.jpg
9780439358071,Harry Potter and the Order of the Phoenix (Har...,J.K. Rowling-Mary GrandPré,870,https://covers.openlibrary.org/b/id/12025650-L...
9780439554930,Harry Potter and the Sorcerer's Stone (Harry P...,J.K. Rowling-Mary GrandPré,320,https://covers.openlibrary.org/b/id/7572543-L.jpg
9780439554893,Harry Potter and the Chamber of Secrets (Harry...,J.K. Rowling,352,https://covers.openlibrary.org/b/id/10301720-L...
9780439655484,Harry Potter and the Prisoner of Azkaban (Harr...,J.K. Rowling-Mary GrandPré,435,https://covers.openlibrary.org/b/id/8778528-L.jpg
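
With `isbn13` as the index, each entry returned by `fetch_books` can be written straight into its row with `.loc`. The sketch below is one way to do it, under the assumption that the response is keyed by `ISBN:<number>` and that covers sit under `cover` then `large`, as in `fetch_cover`. The function name `apply_covers`, the stand-in DataFrame and the hand-written response are hypothetical.

```python
import pandas as pd


def apply_covers(df, response):
    """Write each cover URL found in `response` into the row indexed by its ISBN."""
    for bibkey, book in response.items():
        isbn = int(bibkey.split(":")[1])  # "ISBN:9780439785969" -> 9780439785969
        cover_url = book.get("cover", {}).get("large")
        if cover_url is not None and isbn in df.index:
            df.loc[isbn, "cover_url"] = cover_url
    return df


# Stand-in for books_df after set_index("isbn13")
demo_df = pd.DataFrame(
    {"title": ["Book A", "Book B"], "cover_url": [None, None]},
    index=pd.Index([9780439785969, 9780439358071], name="isbn13"),
)

# Hand-written response in the assumed shape (not real API output)
sample_response = {
    "ISBN:9780439785969": {
        "cover": {"large": "https://covers.openlibrary.org/b/id/9326654-L.jpg"}
    },
    "ISBN:9780439358071": {},  # record without a cover
}

apply_covers(demo_df, sample_response)
print(demo_df)
```

In the notebook, the call would look like `apply_covers(books_df, fetch_books(batch))` for each batch of ISBNs.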
