# CSV + API

In this reboot, we are going to use:

- The [Goodreads books](https://www.kaggle.com/jealousleopard/goodreadsbooks) dataset from Kaggle.
- The [Open Library Books API](https://openlibrary.org/dev/docs/api/books).

The goal of this livecode is to load the data from a CSV, then loop over the rows to enrich each one with information such as:

- List of subjects (Science, Humor, Travel, etc.)
- The cover URL of the book
- Any other information you find useful in the JSON API response
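With `jscmd=data`, the Books API documentation describes one object per requested key, with fields such as `subjects` (a list of `{"name", "url"}` objects), `cover` (`small`/`medium`/`large` URLs) and `number_of_pages`. A minimal sketch of pulling the fields listed above out of one such object. The `sample` payload is hand-written and hypothetical (placeholder URLs, not real API output), and `extract_enrichment` is a hypothetical helper name.

```python
# Hypothetical, hand-written payload shaped like one entry of a
# jscmd=data response. Placeholder values, not real API output.
sample = {
    "title": "Example Book",
    "subjects": [
        {"name": "Science", "url": "https://example.org/subjects/science"},
        {"name": "Humor", "url": "https://example.org/subjects/humor"},
    ],
    "cover": {
        "small": "https://example.org/cover-S.jpg",
        "medium": "https://example.org/cover-M.jpg",
        "large": "https://example.org/cover-L.jpg",
    },
    "number_of_pages": 100,
}


def extract_enrichment(book_data):
    """Return subjects, medium cover URL and page count from one book object.

    Missing fields give an empty list / None instead of raising KeyError.
    """
    subjects = [subject["name"] for subject in book_data.get("subjects", [])]
    cover_url = book_data.get("cover", {}).get("medium")
    return {
        "subjects": subjects,
        "cover_url": cover_url,
        "num_pages": book_data.get("number_of_pages"),
    }


print(extract_enrichment(sample))
print(extract_enrichment({}))  # a book without any of these fields
```

Using `.get()` with defaults means a book that lacks a field simply yields an empty value, so one incomplete record does not stop the enrichment of the others.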

First, download the CSV into the local folder:

In [None]:
!curl -L https://gist.githubusercontent.com/ssaunier/351b17f5a7a009808b60aeacd1f4a036/raw/books.csv > books.csv

In [1]:
!ls

README.md   Recap.ipynb books.csv


Then import the usual suspects!

In [71]:
import requests
import pandas as pd
import numpy as np

## Load books from CSV

In [72]:
books_df = pd.read_csv('books.csv')
books_df.head()

|   | bookID | title | authors | average_rating | isbn | isbn13 | language_code | # num_pages | ratings_count | text_reviews_count |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Harry Potter and the Half-Blood Prince (Harry ... | J.K. Rowling-Mary GrandPré | 4.56 | 0439785960 | 9780439785969 | eng | 652 | 1944099 | 26249 |
| 1 | 2 | Harry Potter and the Order of the Phoenix (Har... | J.K. Rowling-Mary GrandPré | 4.49 | 0439358078 | 9780439358071 | eng | 870 | 1996446 | 27613 |
| 2 | 3 | Harry Potter and the Sorcerer's Stone (Harry P... | J.K. Rowling-Mary GrandPré | 4.47 | 0439554934 | 9780439554930 | eng | 320 | 5629932 | 70390 |
| 3 | 4 | Harry Potter and the Chamber of Secrets (Harry... | J.K. Rowling | 4.41 | 0439554896 | 9780439554893 | eng | 352 | 6267 | 272 |
| 4 | 5 | Harry Potter and the Prisoner of Azkaban (Harr... | J.K. Rowling-Mary GrandPré | 4.55 | 043965548X | 9780439655484 | eng | 435 | 2149872 | 33964 |
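`read_csv` infers column types. Judging from the leading zeros and the trailing `X` in the output above, `isbn` comes back as text, while an all-digit column such as `isbn13` is most likely parsed as integers. If you would rather keep ISBNs exactly as written (for example to send them to the API unchanged), `read_csv` accepts a `dtype` mapping. A small sketch on an inline CSV with two hypothetical rows, not taken from `books.csv`:

```python
import io

import pandas as pd

# Two hypothetical rows; note the ISBN-10 starting with a zero.
csv_text = """title,isbn,isbn13
Book A,0123456789,9780123456786
Book B,1234567890,9781234567897
"""

# Default inference: both columns are all digits, so they become integers
# and the leading zero of the first ISBN-10 is lost.
inferred = pd.read_csv(io.StringIO(csv_text))
print(inferred.dtypes)

# Forcing str keeps the values exactly as they appear in the file.
as_text = pd.read_csv(io.StringIO(csv_text), dtype={"isbn": str, "isbn13": str})
print(as_text.loc[0, "isbn"])
```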


Let's keep only the `title`, `authors`, `isbn13` and `# num_pages` columns.

In [73]:
# .loc returns a new DataFrame; without reassignment, books_df itself is unchanged
books_df.loc[:, ['title', 'authors', 'isbn13', '# num_pages']]
books_df.head()

|   | bookID | title | authors | average_rating | isbn | isbn13 | language_code | # num_pages | ratings_count | text_reviews_count |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Harry Potter and the Half-Blood Prince (Harry ... | J.K. Rowling-Mary GrandPré | 4.56 | 0439785960 | 9780439785969 | eng | 652 | 1944099 | 26249 |
| 1 | 2 | Harry Potter and the Order of the Phoenix (Har... | J.K. Rowling-Mary GrandPré | 4.49 | 0439358078 | 9780439358071 | eng | 870 | 1996446 | 27613 |
| 2 | 3 | Harry Potter and the Sorcerer's Stone (Harry P... | J.K. Rowling-Mary GrandPré | 4.47 | 0439554934 | 9780439554930 | eng | 320 | 5629932 | 70390 |
| 3 | 4 | Harry Potter and the Chamber of Secrets (Harry... | J.K. Rowling | 4.41 | 0439554896 | 9780439554893 | eng | 352 | 6267 | 272 |
| 4 | 5 | Harry Potter and the Prisoner of Azkaban (Harr... | J.K. Rowling-Mary GrandPré | 4.55 | 043965548X | 9780439655484 | eng | 435 | 2149872 | 33964 |


In [74]:
books_df = books_df[['title', 'authors', 'isbn13', '# num_pages']]
books_df.head()

|   | title | authors | isbn13 | # num_pages |
|---|---|---|---|---|
| 0 | Harry Potter and the Half-Blood Prince (Harry ... | J.K. Rowling-Mary GrandPré | 9780439785969 | 652 |
| 1 | Harry Potter and the Order of the Phoenix (Har... | J.K. Rowling-Mary GrandPré | 9780439358071 | 870 |
| 2 | Harry Potter and the Sorcerer's Stone (Harry P... | J.K. Rowling-Mary GrandPré | 9780439554930 | 320 |
| 3 | Harry Potter and the Chamber of Secrets (Harry... | J.K. Rowling | 9780439554893 | 352 |
| 4 | Harry Potter and the Prisoner of Azkaban (Harr... | J.K. Rowling-Mary GrandPré | 9780439655484 | 435 |
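Selecting with `df[cols]`, selecting with `df.loc[:, cols]` and dropping the unwanted columns with `df.drop(columns=...)` all return a new DataFrame and leave the original untouched; only reassignment keeps the result. A quick check on a tiny hypothetical DataFrame:

```python
import pandas as pd

# Hypothetical three-column frame standing in for books_df.
df = pd.DataFrame({
    "title": ["Book A", "Book B"],
    "isbn13": [9780000000001, 9780000000002],
    "ratings_count": [10, 20],
})
cols = ["title", "isbn13"]

selected = df[cols]                           # plain column selection
with_loc = df.loc[:, cols]                    # label-based selection
dropped = df.drop(columns=["ratings_count"])  # remove what you don't need

# df itself still has all three columns: none of the calls modified it.
print(list(df.columns))
# The three results hold the same columns, values and index.
print(selected.equals(with_loc) and selected.equals(dropped))
```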


In [75]:
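# Alternative to In [74]: drop the unwanted columns instead of selecting the wanted ones.
# It raises KeyError here because In [74] has already removed these columns.
books_df = books_df.drop(['bookID', 'average_rating', 'isbn', 'ratings_count',
                          'text_reviews_count'], axis=1)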
books_df.head()

KeyError: "['bookID' 'average_rating' 'isbn' 'ratings_count' 'text_reviews_count'] not found in axis"
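Re-running a `drop` after the columns are already gone raises exactly this kind of `KeyError`. If you want a cell that is safe to re-run, `DataFrame.drop` accepts `errors='ignore'`, which skips labels that are not present. A sketch on a tiny hypothetical DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"title": ["Book A"], "bookID": [1], "isbn": ["0123456789"]})
to_drop = ["bookID", "isbn"]

df = df.drop(columns=to_drop)
# Running the same drop again would raise KeyError;
# errors="ignore" turns the repeat into a no-op instead.
df = df.drop(columns=to_drop, errors="ignore")
print(list(df.columns))
```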

Let's add a new `cover_url` column, filled with `None` values.

In [76]:
books_df["cover_url"] = None
books_df.head()

|   | title | authors | isbn13 | # num_pages | cover_url |
|---|---|---|---|---|---|
| 0 | Harry Potter and the Half-Blood Prince (Harry ... | J.K. Rowling-Mary GrandPré | 9780439785969 | 652 |  |
| 1 | Harry Potter and the Order of the Phoenix (Har... | J.K. Rowling-Mary GrandPré | 9780439358071 | 870 |  |
| 2 | Harry Potter and the Sorcerer's Stone (Harry P... | J.K. Rowling-Mary GrandPré | 9780439554930 | 320 |  |
| 3 | Harry Potter and the Chamber of Secrets (Harry... | J.K. Rowling | 9780439554893 | 352 |  |
| 4 | Harry Potter and the Prisoner of Azkaban (Harr... | J.K. Rowling-Mary GrandPré | 9780439655484 | 435 |  |


In [77]:
# Alternative: an empty Series is aligned on books_df's index, so every row gets NaN
books_df["cover_url"] = pd.Series(None)

books_df.head(5)



|   | title | authors | isbn13 | # num_pages | cover_url |
|---|---|---|---|---|---|
| 0 | Harry Potter and the Half-Blood Prince (Harry ... | J.K. Rowling-Mary GrandPré | 9780439785969 | 652 |  |
| 1 | Harry Potter and the Order of the Phoenix (Har... | J.K. Rowling-Mary GrandPré | 9780439358071 | 870 |  |
| 2 | Harry Potter and the Sorcerer's Stone (Harry P... | J.K. Rowling-Mary GrandPré | 9780439554930 | 320 |  |
| 3 | Harry Potter and the Chamber of Secrets (Harry... | J.K. Rowling | 9780439554893 | 352 |  |
| 4 | Harry Potter and the Prisoner of Azkaban (Harr... | J.K. Rowling-Mary GrandPré | 9780439655484 | 435 |  |
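Both ways of creating the column look empty, but for different reasons. Assigning a scalar such as `None` broadcasts it to every row, whereas assigning a Series aligns it on the index: any row whose label is missing from the Series gets `NaN`, and an empty Series is missing every label. A small sketch of that alignment rule on hypothetical data (it matters again whenever API results are written back into a DataFrame):

```python
import pandas as pd

df = pd.DataFrame({"title": ["Book A", "Book B", "Book C"]})

df["from_scalar"] = None                     # scalar: broadcast to every row
df["from_empty"] = pd.Series(dtype=object)   # empty Series: no label matches
df["partial"] = pd.Series(["x"], index=[1])  # only row label 1 gets a value

print(df)
```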


## API - Open Library

Create a function that returns information about the book with a given ISBN, using the [Books API](https://openlibrary.org/dev/docs/api/books).

Test it with `'0-7475-3269-9'`.

In [78]:
# YOUR CODE HERE
isbn = '0-7475-3269-9'

In [79]:
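def find_book_info(isbn):
    """Return the medium cover URL for `isbn`, or None if Open Library has none."""
    key = f'ISBN:{isbn}'
    response = requests.get('https://openlibrary.org/api/books',
                            params={'bibkeys': key, 'format': 'json',
                                    'jscmd': 'data'},
                            timeout=10).json()
    try:
        return response[key]['cover']['medium']
    except KeyError:
        # Unknown ISBN (key absent from the response) or book without a cover
        return None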
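The single-ISBN function lets `requests` build the query string from the `params` dict. To see the exact URL that would be sent, without touching the network, you can prepare the request yourself with `requests.Request(...).prepare()`. In the printed URL the `:` may appear percent-encoded (`%3A`); decoding the query back with `urllib.parse` shows the original `bibkeys` value. A minimal sketch:

```python
from urllib.parse import parse_qs, urlsplit

import requests

isbn = "0-7475-3269-9"
# Build (but do not send) the same GET request as the single-ISBN function.
prepared = requests.Request(
    "GET",
    "https://openlibrary.org/api/books",
    params={"bibkeys": f"ISBN:{isbn}", "format": "json", "jscmd": "data"},
).prepare()

print(prepared.url)
# Decode the query string into a dict of lists to check each parameter.
query = parse_qs(urlsplit(prepared.url).query)
print(query["bibkeys"])
```

Pasting the printed URL into a browser is a quick way to look at the raw JSON the function parses.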

Fetch the cover URL (and/or other information) and add it to our original book DataFrame.

In [80]:
find_book_info(isbn)

'https://covers.openlibrary.org/b/id/7355968-M.jpg'

In [81]:
# Only the first 2 rows, to keep the number of API calls small
for index, book in books_df.head(2).iterrows():
    cover = find_book_info(book['isbn13'])
    books_df.loc[index, 'cover_url'] = cover
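On the full CSV, this loop sends one request per row, back to back. A sketch of the same loop with a pause between calls. The 0.5-second `delay` is an assumed polite value, not a documented Open Library limit, and `fill_covers` is a hypothetical helper that takes the lookup function as a parameter: pass the single-ISBN `find_book_info` for real calls, or a stub (as below) to run it offline.

```python
import time

import pandas as pd


def fill_covers(df, lookup, delay=0.5):
    """Set df['cover_url'] to lookup(isbn13) row by row, sleeping `delay` s between calls.

    `lookup` is any callable mapping an ISBN to a URL or None.
    delay=0.5 is an assumption, not an Open Library requirement.
    """
    if "cover_url" not in df.columns:
        df["cover_url"] = None
    for index, isbn in df["isbn13"].items():
        df.at[index, "cover_url"] = lookup(isbn)  # .at: fast scalar setter
        time.sleep(delay)
    return df


# Offline usage: a hypothetical stub dict lookup instead of the real API.
stub = {101: "https://example.org/a.jpg"}.get
books = pd.DataFrame({"isbn13": [101, 102]})
print(fill_covers(books, stub, delay=0))
```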

In [82]:
books_df.head()

|   | title | authors | isbn13 | # num_pages | cover_url |
|---|---|---|---|---|---|
| 0 | Harry Potter and the Half-Blood Prince (Harry ... | J.K. Rowling-Mary GrandPré | 9780439785969 | 652 | https://covers.openlibrary.org/b/id/9326654-M.jpg |
| 1 | Harry Potter and the Order of the Phoenix (Har... | J.K. Rowling-Mary GrandPré | 9780439358071 | 870 | https://covers.openlibrary.org/b/id/12025650-M... |
| 2 | Harry Potter and the Sorcerer's Stone (Harry P... | J.K. Rowling-Mary GrandPré | 9780439554930 | 320 |  |
| 3 | Harry Potter and the Chamber of Secrets (Harry... | J.K. Rowling | 9780439554893 | 352 |  |
| 4 | Harry Potter and the Prisoner of Azkaban (Harr... | J.K. Rowling-Mary GrandPré | 9780439655484 | 435 |  |


In [83]:
# .loc slicing includes the end label, so :1 targets rows 0 and 1, matching head(2)
books_df.loc[:1, 'cover_url2'] = books_df['isbn13'].head(2).apply(find_book_info)

In [84]:
books_df.head()

|   | title | authors | isbn13 | # num_pages | cover_url | cover_url2 |
|---|---|---|---|---|---|---|
| 0 | Harry Potter and the Half-Blood Prince (Harry ... | J.K. Rowling-Mary GrandPré | 9780439785969 | 652 | https://covers.openlibrary.org/b/id/9326654-M.jpg | https://covers.openlibrary.org/b/id/9326654-M.jpg |
| 1 | Harry Potter and the Order of the Phoenix (Har... | J.K. Rowling-Mary GrandPré | 9780439358071 | 870 | https://covers.openlibrary.org/b/id/12025650-M... | https://covers.openlibrary.org/b/id/12025650-M... |
| 2 | Harry Potter and the Sorcerer's Stone (Harry P... | J.K. Rowling-Mary GrandPré | 9780439554930 | 320 |  |  |
| 3 | Harry Potter and the Chamber of Secrets (Harry... | J.K. Rowling | 9780439554893 | 352 |  |  |
| 4 | Harry Potter and the Prisoner of Azkaban (Harr... | J.K. Rowling-Mary GrandPré | 9780439655484 | 435 |  |  |
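Label slicing with `.loc` includes the end label, unlike `.iloc` and plain Python slices, so `.loc[:2]` covers three rows on a default integer index. Combined with index alignment, a Series shorter than the targeted rows leaves the extra rows empty instead of raising an error. A small sketch on hypothetical data:

```python
import pandas as pd

df = pd.DataFrame({"isbn13": [101, 102, 103, 104]})  # hypothetical values

print(len(df.loc[:2]))   # labels 0, 1, 2: end label included
print(len(df.iloc[:2]))  # positions 0, 1: end position excluded

# Writing a 2-value Series into a 3-row .loc slice: values align on the
# index, so label 2 (and label 3, outside the slice) end up as NaN.
df.loc[:2, "doubled"] = (df["isbn13"] * 2).head(2)
print(df)
```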


## Calling the API with multiple ISBNs at a time

In [85]:
# https://openlibrary.org/api/books?bibkeys=ISBN:9780439785969,ISBN:9780439358071,ISBN:9780439554930
isbns = [9780439785969, 9780439358071, 9780439554930] 

Use only one API call!

In [86]:
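# YOUR CODE HERE
# Named differently so it doesn't overwrite the single-ISBN find_book_info
def find_books_info(isbns):
    """Return the medium cover URL of each ISBN in `isbns`, using a single API call.

    Unknown ISBNs or books without a cover yield None for that ISBN only.
    """
    # Equivalent loop:
    # keys = []
    # for isbn in isbns:
    #     keys.append(f'ISBN:{isbn}')
    keys = [f'ISBN:{isbn}' for isbn in isbns]

    key_string = ','.join(keys)
    response = requests.get('https://openlibrary.org/api/books',
                            params={'bibkeys': key_string, 'format': 'json',
                                    'jscmd': 'data'},
                            timeout=10).json()
    # Equivalent loop (without the None fallback):
    # result = []
    # for key in keys:
    #     result.append(response[key]['cover']['medium'])
    return [response.get(key, {}).get('cover', {}).get('medium') for key in keys]

In [87]:
print(find_books_info(isbns))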

['https://covers.openlibrary.org/b/id/9326654-M.jpg', 'https://covers.openlibrary.org/b/id/12025650-M.jpg', 'https://covers.openlibrary.org/b/id/7572543-M.jpg']
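One call per ISBN does not scale to the whole CSV, while a single call with every ISBN would produce a very long URL. A common middle ground is to send the ISBNs in fixed-size batches, one request per batch. A minimal sketch, assuming a batch size of 50 (an arbitrary choice, not a documented Open Library limit); `batched_bibkeys` is a hypothetical helper that only builds the `bibkeys` strings, each of which can then be passed to the API exactly like the single-call function above.

```python
def batched_bibkeys(isbns, batch_size=50):
    """Yield comma-separated 'ISBN:...' strings holding at most batch_size keys each.

    batch_size=50 is an assumption, not an Open Library limit.
    """
    keys = [f"ISBN:{isbn}" for isbn in isbns]
    for start in range(0, len(keys), batch_size):
        yield ",".join(keys[start:start + batch_size])


isbns = [9780439785969, 9780439358071, 9780439554930]
print(list(batched_bibkeys(isbns, batch_size=2)))
```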


Set the `isbn13` column as the index.

In [88]:
# YOUR CODE HERE
books_df = books_df.set_index('isbn13')

In [90]:
books_df.head()

| isbn13 | title | authors | # num_pages | cover_url | cover_url2 |
|---|---|---|---|---|---|
| 9780439785969 | Harry Potter and the Half-Blood Prince (Harry ... | J.K. Rowling-Mary GrandPré | 652 | https://covers.openlibrary.org/b/id/9326654-M.jpg | https://covers.openlibrary.org/b/id/9326654-M.jpg |
| 9780439358071 | Harry Potter and the Order of the Phoenix (Har... | J.K. Rowling-Mary GrandPré | 870 | https://covers.openlibrary.org/b/id/12025650-M... | https://covers.openlibrary.org/b/id/12025650-M... |
| 9780439554930 | Harry Potter and the Sorcerer's Stone (Harry P... | J.K. Rowling-Mary GrandPré | 320 |  |  |
| 9780439554893 | Harry Potter and the Chamber of Secrets (Harry... | J.K. Rowling | 352 |  |  |
| 9780439655484 | Harry Potter and the Prisoner of Azkaban (Harr... | J.K. Rowling-Mary GrandPré | 435 |  |  |
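With `isbn13` as the index, results keyed by ISBN can be written back in one assignment instead of a row-by-row loop, because pandas aligns a Series on the index. A sketch, assuming `isbn13` was parsed as integers (the pandas default for an all-digit column) and that the batch response has been reduced to a `{'ISBN:<isbn13>': cover_url}` dict. The DataFrame is a hypothetical three-row stand-in for `books_df`, and the two cover URLs are the ones printed by In [87]; the third ISBN is deliberately left out of the dict to show what happens to unmatched rows.

```python
import pandas as pd

# Hypothetical stand-in for books_df after set_index('isbn13').
df = pd.DataFrame(
    {"title": ["Half-Blood Prince", "Order of the Phoenix", "Chamber of Secrets"]},
    index=pd.Index([9780439785969, 9780439358071, 9780439554893], name="isbn13"),
)

# Covers keyed like the API response; 9780439554893 deliberately omitted.
covers = {
    "ISBN:9780439785969": "https://covers.openlibrary.org/b/id/9326654-M.jpg",
    "ISBN:9780439358071": "https://covers.openlibrary.org/b/id/12025650-M.jpg",
}

# Strip the 'ISBN:' prefix and convert to int so keys match the index dtype.
by_isbn = pd.Series({int(key.split(":", 1)[1]): url for key, url in covers.items()})

# Assignment aligns on the isbn13 index; rows without a match get NaN.
df["cover_url"] = by_isbn
print(df)
```

If the index dtype and the key type differ (e.g. int index, str keys), nothing matches and the whole column comes out as NaN, which is why the keys are converted first.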
