# CSV + API

In this reboot, we are going to use:

- The [Goodreads books](https://www.kaggle.com/jealousleopard/goodreadsbooks) dataset from Kaggle.
- The [Open Library Books API](https://openlibrary.org/dev/docs/api/books).

The goal of this livecode is to load the data from a CSV and loop over the rows to enrich each one with information such as:

- List of subjects (Science, Humor, Travel, etc.)
- The cover URL of the book
- Other information you'd find useful in the JSON API
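Subjects come back from the API as a nested structure rather than plain strings. The sketch below assumes that a `jscmd=data` record stores them as a list of `{"name": ..., "url": ...}` dicts; `subject_names` and `sample_info` are hypothetical names, and the record is hand-written, not real API output.

```python
def subject_names(book_info, limit=5):
    """Return up to `limit` subject names from one book record.

    Assumes subjects are a list of {"name": ..., "url": ...} dicts and
    returns an empty list when the record has no "subjects" key.
    """
    subjects = book_info.get("subjects", [])
    return [subject["name"] for subject in subjects[:limit]]


# Hand-written record shaped like an API response (not real API output)
sample_info = {
    "title": "Example Book",
    "subjects": [
        {"name": "Fiction", "url": "placeholder-url-1"},
        {"name": "Magic", "url": "placeholder-url-2"},
    ],
}

print(subject_names(sample_info))  # ['Fiction', 'Magic']
print(subject_names({}))           # []
```

Joining the names, for example with `", ".join(...)`, gives a value that fits in a single DataFrame cell.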

First, download the CSV into the local folder:

In [9]:
!curl -L https://gist.githubusercontent.com/ssaunier/351b17f5a7a009808b60aeacd1f4a036/raw/books.csv > books.csv

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 1509k  100 1509k    0     0  10.7M      0 --:--:-- --:--:-- --:--:-- 11.2M


In [10]:
!ls -lh

total 3048
-rw-r--r--  1 kleinyann  staff   579B Nov 29  2022 README.md
-rw-r--r--@ 1 kleinyann  staff   4.7K Oct 10 16:52 Recap.ipynb
-rw-r--r--@ 1 kleinyann  staff   1.5M Oct 10 17:01 books.csv


Then import the usual suspects!

In [11]:
import requests
import pandas as pd
import numpy as np

## Load books from CSV

Let's load the books, and be careful with the bad rows!
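A "bad row" here means a line with more fields than the header, which makes `pd.read_csv` raise a `ParserError` by default. The sketch below uses a tiny in-memory CSV (made up for illustration) to show how `on_bad_lines="skip"` (available in pandas 1.3 and later) drops such a line instead.

```python
import io

import pandas as pd

# Made-up CSV: the "Book B" line has one field too many
raw_csv = "title,num_pages\nBook A,100\nBook B,200,extra\nBook C,300\n"

# "skip" silently drops malformed lines; "warn" would also report them
books = pd.read_csv(io.StringIO(raw_csv), on_bad_lines="skip")
print(books)
```

Only `Book A` and `Book C` remain. Using `on_bad_lines="warn"` first is a quick way to see how many lines of the real file would be dropped.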

In [26]:
# YOUR CODE HERE
# Skip malformed lines instead of failing (on_bad_lines requires pandas >= 1.3)
good_csv = pd.read_csv("books.csv", on_bad_lines="skip")
good_csv.head(1)

|   | bookID | title | authors | average_rating | isbn | isbn13 | language_code | # num_pages | ratings_count | text_reviews_count |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Harry Potter and the Half-Blood Prince (Harry ... | J.K. Rowling-Mary GrandPré | 4.56 | 439785960 | 9780439785969 | eng | 652 | 1944099 | 26249 |


Let's keep only the columns `title`, `authors`, `isbn13` and `# num_pages`.
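There are several equivalent ways to keep a subset of columns. This sketch, on a small made-up DataFrame, checks that bracket selection, `.loc[:, cols]` (where `:` means "all rows") and dropping the other columns produce the same result.

```python
import pandas as pd

# Small made-up DataFrame with a few of the CSV's column names
df = pd.DataFrame({
    "bookID": [1, 2],
    "title": ["A", "B"],
    "isbn13": [1, 2],
    "# num_pages": [100, 200],
    "ratings_count": [10, 20],
})

keep = ["title", "isbn13", "# num_pages"]

by_brackets = df[keep]                                  # select the columns to keep
by_loc = df.loc[:, keep]                                # same selection through .loc
by_drop = df.drop(columns=["bookID", "ratings_count"])  # drop everything else

print(by_brackets.equals(by_loc), by_brackets.equals(by_drop))  # True True
```

Selecting is safer when the CSV may gain new columns, because `drop` keeps anything you did not list explicitly.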

In [27]:
# YOUR CODE HERE
# Equivalent ways to select the columns to keep:
# good_csv = good_csv.loc[:, ["title", "authors", "isbn13", "# num_pages"]]
# good_csv = good_csv[["title", "authors", "isbn13", "# num_pages"]]
# Dropping the other columns instead (note: this also keeps ratings_count)
good_csv = good_csv.drop(columns=["bookID", "isbn", "average_rating", "language_code", "text_reviews_count"])
good_csv.head(1)

|   | title | authors | isbn13 | # num_pages | ratings_count |
|---|---|---|---|---|---|
| 0 | Harry Potter and the Half-Blood Prince (Harry ... | J.K. Rowling-Mary GrandPré | 9780439785969 | 652 | 1944099 |


Let's add a new column "cover_url" (with None values in it)

In [37]:
# YOUR CODE HERE
good_csv["cover_url"] = None
good_csv.head(5)

|   | title | authors | isbn13 | # num_pages | ratings_count | cover_url |
|---|---|---|---|---|---|---|
| 0 | Harry Potter and the Half-Blood Prince (Harry ... | J.K. Rowling-Mary GrandPré | 9780439785969 | 652 | 1944099 | |
| 1 | Harry Potter and the Order of the Phoenix (Har... | J.K. Rowling-Mary GrandPré | 9780439358071 | 870 | 1996446 | |
| 2 | Harry Potter and the Sorcerer's Stone (Harry P... | J.K. Rowling-Mary GrandPré | 9780439554930 | 320 | 5629932 | |
| 3 | Harry Potter and the Chamber of Secrets (Harry... | J.K. Rowling | 9780439554893 | 352 | 6267 | |
| 4 | Harry Potter and the Prisoner of Azkaban (Harr... | J.K. Rowling-Mary GrandPré | 9780439655484 | 435 | 2149872 | |


## API - Open Library

Create a function that returns the book info for a given ISBN
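Before writing the function, it can help to see the exact URL that `requests` builds from the `params` dict. This sketch prepares the request without sending it, using `requests.Request(...).prepare()`; the `:` in `ISBN:...` is percent-encoded as `%3A` in the query string.

```python
import requests

url = "https://openlibrary.org/api/books"
params = {"bibkeys": "ISBN:0-7475-3269-9", "format": "json", "jscmd": "data"}

# Build (but do not send) the GET request to inspect its query string
prepared = requests.Request("GET", url, params=params).prepare()
print(prepared.url)
```

Pasting the printed URL into a browser is a quick way to look at the raw JSON before deciding which keys to extract.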

In [33]:
# YOUR CODE HERE
def get_book(isbn):
    url = "https://openlibrary.org/api/books"
    response = requests.get(
        url,
        params={
            "bibkeys": f"ISBN:{isbn}",
            "format": "json",
            "jscmd": "data"
        }
    ).json()
    # Unknown ISBNs are missing from the response, so fall back to an empty dict
    return response.get(f"ISBN:{isbn}", {})

Test it with '0-7475-3269-9'

In [36]:
# YOUR CODE HERE
isbn = '0-7475-3269-9'
book_info = get_book(isbn)
book_info["cover"]["large"]

'https://covers.openlibrary.org/b/id/7355968-L.jpg'

Fetch the cover URL (and/or any other info) and add it to our original book DataFrame
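An alternative to the explicit loop is `Series.apply` with a function that returns only the cover URL rather than the whole record. In this sketch, `fake_responses` and `cover_url` are hypothetical names: the dict is a hand-written stand-in for `get_book` results, so the example runs without any network call.

```python
import pandas as pd

# Hand-written stand-ins for get_book() results (not real API data)
fake_responses = {
    9780439785969: {"cover": {"large": "cover-A.jpg"}},
    9780439358071: {},  # a record without a cover
}


def cover_url(isbn13):
    """Return the large cover URL for one ISBN, or None when there is none."""
    book_info = fake_responses.get(isbn13, {})  # replace with get_book(isbn13) for real calls
    return book_info.get("cover", {}).get("large")


books = pd.DataFrame({"isbn13": [9780439785969, 9780439358071]})
books["cover_url"] = books["isbn13"].apply(cover_url)
print(books)
```

Chaining `.get(..., {})` avoids a `KeyError` when a record has no cover. With real calls this still sends one request per row, so limiting the rows (as with `head(10)`) matters just as much.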

In [47]:
# YOUR CODE HERE
# Only the first 10 rows, to keep the number of API calls small
for index, book_row in good_csv.head(10).iterrows():
    book_info = get_book(book_row["isbn13"])
    # Not every Open Library record includes a cover
    if "cover" in book_info:
        good_csv.loc[index, "cover_url"] = book_info["cover"]["large"]
    else:
        good_csv.loc[index, "cover_url"] = "No url"

good_csv.head(3)

|   | title | authors | isbn13 | # num_pages | ratings_count | cover_url |
|---|---|---|---|---|---|---|
| 0 | Harry Potter and the Half-Blood Prince (Harry ... | J.K. Rowling-Mary GrandPré | 9780439785969 | 652 | 1944099 | https://covers.openlibrary.org/b/id/9326654-L.jpg |
| 1 | Harry Potter and the Order of the Phoenix (Har... | J.K. Rowling-Mary GrandPré | 9780439358071 | 870 | 1996446 | https://covers.openlibrary.org/b/id/12025650-L... |
| 2 | Harry Potter and the Sorcerer's Stone (Harry P... | J.K. Rowling-Mary GrandPré | 9780439554930 | 320 | 5629932 | https://covers.openlibrary.org/b/id/7572543-L.jpg |


## Calling the API with multiple ISBNs at a time

In [7]:
isbns = [9780439785969, 9780439358071, 9780439554930]

Use only one API call!
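A single call works because `bibkeys` accepts several comma-separated keys, and the response is then keyed by each `ISBN:<number>` string. This sketch assumes that response shape; `build_bibkeys`, `covers_by_isbn` and `fake_response` are hypothetical names, and the response is hand-written rather than real API output.

```python
def build_bibkeys(isbns):
    """Join several ISBNs into one comma-separated bibkeys value."""
    return ",".join(f"ISBN:{isbn}" for isbn in isbns)


def covers_by_isbn(response):
    """Map each ISBN in a batched response to its large cover URL (or None).

    Assumes the response is keyed like {"ISBN:<number>": {...book info...}}.
    """
    covers = {}
    for bibkey, book_info in response.items():
        isbn = bibkey.split(":", 1)[1]  # strip the "ISBN:" prefix
        covers[isbn] = book_info.get("cover", {}).get("large")
    return covers


isbns = [9780439785969, 9780439358071, 9780439554930]
print(build_bibkeys(isbns))
# ISBN:9780439785969,ISBN:9780439358071,ISBN:9780439554930

# Hand-written stand-in for one batched response (not real API output)
fake_response = {
    "ISBN:9780439785969": {"cover": {"large": "cover-A.jpg"}},
    "ISBN:9780439358071": {"title": "no cover here"},
}
print(covers_by_isbn(fake_response))
```

ISBNs that the API does not know about are simply absent from the result, so the third ISBN does not appear in the returned dict.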

In [50]:
# YOUR CODE HERE
def fetch_books(isbns):
    url = "https://openlibrary.org/api/books"
    # Join all the ISBNs into one comma-separated bibkeys value
    bibkeys = ",".join([f"ISBN:{isbn}" for isbn in isbns])
    # Define parameters for the HTTP request
    params = {"bibkeys": bibkeys, "format": "json", "jscmd": "data"}
    # Perform a single request for all the books
    response = requests.get(url, params=params).json()
    return response

Set the `isbn13` column as the index
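With `isbn13` as the index, results keyed by ISBN can be assigned in one step: assigning a Series to a column aligns on the index, and rows without a match get `NaN`. In this sketch, `covers` is a hand-written dict of string ISBNs (as parsed from a batched response), so the keys are converted to `int` to match the integer `isbn13` index.

```python
import pandas as pd

books = pd.DataFrame({
    "isbn13": [9780439785969, 9780439358071, 9780439554930],
    "title": ["Half-Blood Prince", "Order of the Phoenix", "Sorcerer's Stone"],
}).set_index("isbn13")

# Hand-written covers keyed by ISBN string (not real API output)
covers = {"9780439785969": "cover-A.jpg", "9780439554930": "cover-C.jpg"}

# Convert the keys to int so they match the index, then align on it
cover_series = pd.Series({int(isbn): url for isbn, url in covers.items()})
books["cover_url"] = cover_series
print(books)
```

The Order of the Phoenix row has no matching key, so its `cover_url` is `NaN` rather than raising an error.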

In [52]:
# YOUR CODE HERE
good_csv.set_index("isbn13", inplace=True)

In [53]:
good_csv

| isbn13 | title | authors | # num_pages | ratings_count | cover_url |
|---|---|---|---|---|---|
| 9780439785969 | Harry Potter and the Half-Blood Prince (Harry ... | J.K. Rowling-Mary GrandPré | 652 | 1944099 | https://covers.openlibrary.org/b/id/9326654-L.jpg |
| 9780439358071 | Harry Potter and the Order of the Phoenix (Har... | J.K. Rowling-Mary GrandPré | 870 | 1996446 | https://covers.openlibrary.org/b/id/12025650-L... |
| 9780439554930 | Harry Potter and the Sorcerer's Stone (Harry P... | J.K. Rowling-Mary GrandPré | 320 | 5629932 | https://covers.openlibrary.org/b/id/7572543-L.jpg |
| 9780439554893 | Harry Potter and the Chamber of Secrets (Harry... | J.K. Rowling | 352 | 6267 | https://covers.openlibrary.org/b/id/10301720-L... |
| 9780439655484 | Harry Potter and the Prisoner of Azkaban (Harr... | J.K. Rowling-Mary GrandPré | 435 | 2149872 | https://covers.openlibrary.org/b/id/8778528-L.jpg |
| ... | ... | ... | ... | ... | ... |
| 9780061186424 | M Is for Magic | Neil Gaiman-Teddy Kristiansen | 260 | 11317 | |
| 9780930289553 | Black Orchid | Neil Gaiman-Dave McKean | 160 | 8710 | |
| 9780061238963 | InterWorld (InterWorld #1) | Neil Gaiman-Michael Reaves | 239 | 14334 | |
| 9780743201117 | The Faeries' Oracle | Brian Froud-Jessica Macbeth | 224 | 1550 | |
