# CSV + API

In this reboot, we are going to use:

- The [Goodreads books](https://www.kaggle.com/jealousleopard/goodreadsbooks) dataset from Kaggle.
- The [Open Library Books API](https://openlibrary.org/dev/docs/api/books)

The goal of this livecode is to load the data from a CSV file, then loop over its rows to enrich each one with information such as:

- List of subjects (Science, Humor, Travel, etc.)
- The cover URL of the book
- Other information you'd find useful in the JSON API

First, download the CSV into the local folder:

In [1]:
!curl -L https://gist.githubusercontent.com/ssaunier/351b17f5a7a009808b60aeacd1f4a036/raw/books.csv > books.csv

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 1509k  100 1509k    0     0  16.4M      0 --:--:-- --:--:-- --:--:-- 17.1M


In [2]:
!ls -lh

total 4272
-rw-r--r--  1 kleinyann  staff   579B Jan 10 00:52 README.md
-rw-r--r--@ 1 kleinyann  staff    17K Jan 16 18:03 Recap.ipynb
-rw-r--r--@ 1 kleinyann  staff   1.5M Jan 16 18:03 books.csv


Then import the usual suspects!

In [3]:
import requests
import pandas as pd
import numpy as np

## Load books from CSV

In [23]:
books_df = pd.read_csv('books.csv')
# books_df = pd.read_csv('books.csv', usecols=['title', 'authors', 'isbn13', '# num_pages'])
books_df.head(3)

Unnamed: 0,bookID,title,authors,average_rating,isbn,isbn13,language_code,# num_pages,ratings_count,text_reviews_count
0,1,Harry Potter and the Half-Blood Prince (Harry ...,J.K. Rowling-Mary GrandPré,4.56,439785960,9780439785969,eng,652,1944099,26249
1,2,Harry Potter and the Order of the Phoenix (Har...,J.K. Rowling-Mary GrandPré,4.49,439358078,9780439358071,eng,870,1996446,27613
2,3,Harry Potter and the Sorcerer's Stone (Harry P...,J.K. Rowling-Mary GrandPré,4.47,439554934,9780439554930,eng,320,5629932,70390


Let's keep only the columns title, authors, isbn13 and # num_pages

In [24]:
# .copy() gives an independent dataframe, so adding columns later does not trigger SettingWithCopyWarning
books_df = books_df[['title','authors','isbn13','# num_pages']].copy()
# Alternative: books_df = books_df.loc[:,["title", "authors", "isbn13", "# num_pages"]]
# Alternative: books_df = books_df.drop(columns=["bookID", "isbn", "average_rating", "language_code", "ratings_count", "text_reviews_count"])
# Note: `~` negates a boolean mask; it cannot be used to exclude columns by name
books_df.head(3)

Unnamed: 0,title,authors,isbn13,# num_pages
0,Harry Potter and the Half-Blood Prince (Harry ...,J.K. Rowling-Mary GrandPré,9780439785969,652
1,Harry Potter and the Order of the Phoenix (Har...,J.K. Rowling-Mary GrandPré,9780439358071,870
2,Harry Potter and the Sorcerer's Stone (Harry P...,J.K. Rowling-Mary GrandPré,9780439554930,320


Let's add a new column "cover_url" (with None values in it)

In [25]:
books_df.loc[:,'cover_url'] = None
# books_df['cover_url'] = None
books_df.head(3)

Unnamed: 0,title,authors,isbn13,# num_pages,cover_url
0,Harry Potter and the Half-Blood Prince (Harry ...,J.K. Rowling-Mary GrandPré,9780439785969,652,
1,Harry Potter and the Order of the Phoenix (Har...,J.K. Rowling-Mary GrandPré,9780439358071,870,
2,Harry Potter and the Sorcerer's Stone (Harry P...,J.K. Rowling-Mary GrandPré,9780439554930,320,


## API - Open Library

Create a function that returns book information for a given ISBN (https://openlibrary.org/dev/docs/api/books)

In [26]:
import requests

Test it with '0-7475-3269-9'

In [27]:
# YOUR CODE HERE
isbn = '0-7475-3269-9'

In [28]:
# url = f'https://openlibrary.org/api/books?bibkeys=ISBN:{isbn}&jscmd=data&format=json'
# response = requests.get(url).json()
# response["ISBN:0-7475-3269-9"]["cover"]["medium"]

Fetch the cover URL (and/or other information) and add it to our original book dataframe

In [29]:
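def fetch_books(isbn):
    """Return the medium cover URL for one ISBN, or "" if it cannot be fetched."""
    try:
        url = 'https://openlibrary.org/api/books'
        response = requests.get(url, params={
            'bibkeys': f"ISBN:{isbn}",
            'jscmd': 'data',
            'format': 'json'
        }).json()
        return response[f"ISBN:{isbn}"]["cover"]["medium"]
    except (requests.RequestException, KeyError, ValueError):
        # Network failure, unknown ISBN, missing cover, or invalid JSON
        return ""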

In [30]:
fetch_books("0-7475-3269-9")


'https://covers.openlibrary.org/b/id/7355968-M.jpg'

In [31]:
books_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13719 entries, 0 to 13718
Data columns (total 5 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   title        13719 non-null  object
 1   authors      13719 non-null  object
 2   isbn13       13719 non-null  int64 
 3   # num_pages  13719 non-null  int64 
 4   cover_url    0 non-null      object
dtypes: int64(2), object(3)
memory usage: 536.0+ KB


In [32]:
# for index, row in books_df.head(10).iterrows():
#     url_data = fetch_books(row["isbn13"])
#     books_df.loc[index,["cover_url"]] = url_data
#     print(url_data)

# Note: .loc slicing is label-inclusive, so :10 covers rows 0 through 10 (11 API calls)
books_df.loc[:10, "cover_url"] = books_df.loc[:10, "isbn13"].apply(fetch_books)

In [33]:
books_df

Unnamed: 0,title,authors,isbn13,# num_pages,cover_url
0,Harry Potter and the Half-Blood Prince (Harry ...,J.K. Rowling-Mary GrandPré,9780439785969,652,https://covers.openlibrary.org/b/id/9326654-M.jpg
1,Harry Potter and the Order of the Phoenix (Har...,J.K. Rowling-Mary GrandPré,9780439358071,870,https://covers.openlibrary.org/b/id/12025650-M...
2,Harry Potter and the Sorcerer's Stone (Harry P...,J.K. Rowling-Mary GrandPré,9780439554930,320,https://covers.openlibrary.org/b/id/7572543-M.jpg
3,Harry Potter and the Chamber of Secrets (Harry...,J.K. Rowling,9780439554893,352,https://covers.openlibrary.org/b/id/10301720-M...
4,Harry Potter and the Prisoner of Azkaban (Harr...,J.K. Rowling-Mary GrandPré,9780439655484,435,https://covers.openlibrary.org/b/id/8778528-M.jpg
...,...,...,...,...,...
13714,M Is for Magic,Neil Gaiman-Teddy Kristiansen,9780061186424,260,
13715,Black Orchid,Neil Gaiman-Dave McKean,9780930289553,160,
13716,InterWorld (InterWorld #1),Neil Gaiman-Michael Reaves,9780061238963,239,
13717,The Faeries' Oracle,Brian Froud-Jessica Macbeth,9780743201117,224,


## Calling the API with multiple ISBNs at a time

In [34]:
isbns = [9780439785969, 9780439358071, 9780439554930] 
[f"ISBN:{isbn}" for isbn in isbns]

['ISBN:9780439785969', 'ISBN:9780439358071', 'ISBN:9780439554930']

Use only one API call!

In [35]:
# YOUR CODE HERE
",".join([f"ISBN:{isbn}" for isbn in isbns])

'ISBN:9780439785969,ISBN:9780439358071,ISBN:9780439554930'
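The dataframe holds 13,719 ISBNs, and putting all of them into a single `bibkeys` value would produce a very long URL. A minimal sketch of splitting the list into batches first, so that each batch becomes one API call. `build_bibkeys_batches` is a hypothetical helper, and the default `batch_size` of 50 is an arbitrary assumption, not a documented Open Library limit.

```python
def build_bibkeys_batches(isbns, batch_size=50):
    """Yield one comma-separated bibkeys string per batch of ISBNs.

    batch_size=50 is an arbitrary choice for illustration.
    """
    for start in range(0, len(isbns), batch_size):
        batch = isbns[start:start + batch_size]
        yield ",".join(f"ISBN:{isbn}" for isbn in batch)


isbns = [9780439785969, 9780439358071, 9780439554930]

# A batch size of 2 splits the three ISBNs into two bibkeys strings
batches = list(build_bibkeys_batches(isbns, batch_size=2))
print(batches)
```

Each string in `batches` can then be passed as the `bibkeys` parameter of one request.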

Set the isbn13 column as the index

In [36]:
# YOUR CODE HERE
def fetch_books(isbns):
    # Define the URL and build bibkeys from ISBN
    url = "https://openlibrary.org/api/books"
    bibkeys = ",".join([f"ISBN:{isbn}" for isbn in isbns])
    # Define parameters for HTTP request
    params = {
        'bibkeys': bibkeys, 'format': 'json', 'jscmd': 'data'
    }
    # Perform request
    response = requests.get(url, params=params).json()
    return response

In [37]:
print(fetch_books(isbns))

{'ISBN:9780439785969': {'url': 'https://openlibrary.org/books/OL24280830M/Harry_Potter_and_the_Half-Blood_Prince', 'key': '/books/OL24280830M', 'title': 'Harry Potter and the Half-Blood Prince', 'authors': [{'url': 'https://openlibrary.org/authors/OL23919A/J._K._Rowling', 'name': 'J. K. Rowling'}], 'identifiers': {'amazon': ['0439785960'], 'goodreads': ['53178655'], 'isbn_10': ['0439785960'], 'isbn_13': ['9780439785969'], 'oclc': ['70666878', '819153929'], 'openlibrary': ['OL24280830M']}, 'classifications': {'lc_classifications': ['PZ7.R79835Halc 2005']}, 'publishers': [{'name': 'Scholastic'}], 'publish_places': [{'name': 'New York, USA'}], 'publish_date': '2006-09', 'subjects': [{'name': 'orphans', 'url': 'https://openlibrary.org/subjects/orphans'}, {'name': 'foster homes', 'url': 'https://openlibrary.org/subjects/foster_homes'}, {'name': 'romans', 'url': 'https://openlibrary.org/subjects/romans'}, {'name': 'magie', 'url': 'https://openlibrary.org/subjects/magie'}, {'name': 'adolescen
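Each value in this response is one book record. Here is a minimal sketch of pulling out the subject names and the medium cover URL from such a record. The sample record is hand-written and trimmed to fields visible in the output above, with the cover URL taken from the dataframe shown earlier. `extract_enrichment` is a hypothetical helper name, and the sketch assumes that `subjects` or `cover` may be missing from a record.

```python
# Hand-written sample, trimmed to a few fields of a `jscmd=data` record
sample_record = {
    "title": "Harry Potter and the Half-Blood Prince",
    "subjects": [
        {"name": "orphans", "url": "https://openlibrary.org/subjects/orphans"},
        {"name": "foster homes", "url": "https://openlibrary.org/subjects/foster_homes"},
    ],
    "cover": {"medium": "https://covers.openlibrary.org/b/id/9326654-M.jpg"},
}


def extract_enrichment(record):
    """Return subject names and the medium cover URL from one book record.

    Missing fields fall back to an empty list and None respectively.
    """
    subjects = [subject["name"] for subject in record.get("subjects", [])]
    cover_url = record.get("cover", {}).get("medium")
    return {"subjects": subjects, "cover_url": cover_url}


enrichment = extract_enrichment(sample_record)
print(enrichment)
```

Using `.get()` with defaults avoids a `KeyError` on records that lack a cover or subjects.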

In [None]:
books_df.set_index("isbn13", inplace=True)

In [None]:
books_df.head()
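With isbn13 as the index, the keys of a batched response can be mapped straight back to rows. A minimal sketch under stated assumptions: `books` is a small stand-in for `books_df`, and `response` is a hand-written stand-in for what `fetch_books(isbns)` returns, so the example runs without network access.

```python
import pandas as pd

# Small stand-in for books_df after set_index("isbn13")
books = pd.DataFrame(
    {
        "title": ["Half-Blood Prince", "Order of the Phoenix"],
        "cover_url": [None, None],
    },
    index=pd.Index([9780439785969, 9780439358071], name="isbn13"),
)

# Hand-written stand-in for a batched API response
response = {
    "ISBN:9780439785969": {
        "cover": {"medium": "https://covers.openlibrary.org/b/id/9326654-M.jpg"}
    },
    "ISBN:9780439358071": {},  # a record without a cover
}

for bibkey, record in response.items():
    # "ISBN:9780439785969" -> 9780439785969, matching the integer index
    isbn13 = int(bibkey.split(":", 1)[1])
    cover_url = record.get("cover", {}).get("medium")
    # Only write when the row exists and a cover was returned
    if cover_url is not None and isbn13 in books.index:
        books.loc[isbn13, "cover_url"] = cover_url

print(books)
```

Looking rows up by index label avoids scanning the isbn13 column for every record in the response.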