## Scenario: Sorting and Searching in Python

Imagine looking for a specific book in a large library. The endless rows of shelves can be overwhelming, and finding one title can feel like searching for a needle in a haystack.

We'll learn how to use Python to make searching a digital library easier and more efficient. First we'll sort the book data so it follows a predictable order, just like arranging books on a shelf. Then we'll explore search algorithms, which act like a digital librarian that quickly finds the exact book you're looking for.

In [1]:
import pandas as pd

book_catalog_df = pd.read_csv('book_catalog_10.csv')

book_catalog = book_catalog_df.to_dict(orient='records')

print(book_catalog)

[{'title': '1984', 'author': 'George Orwell', 'publication_year': 1949}, {'title': 'Pride and Prejudice', 'author': 'Jane Austen', 'publication_year': 1813}, {'title': 'The Catcher in the Rye', 'author': 'J.D. Salinger', 'publication_year': 1951}, {'title': 'The Great Gatsby', 'author': 'F. Scott Fitzgerald', 'publication_year': 1925}, {'title': 'To Kill a Mockingbird', 'author': 'Harper Lee', 'publication_year': 1960}, {'title': 'The Lord of the Rings', 'author': 'J.R.R. Tolkien', 'publication_year': 1954}, {'title': "Harry Potter and the Philosopher's Stone", 'author': 'J.K. Rowling', 'publication_year': 1997}, {'title': "The Hitchhiker's Guide to the Galaxy", 'author': 'Douglas Adams', 'publication_year': 1979}, {'title': 'The Da Vinci Code', 'author': 'Dan Brown', 'publication_year': 2003}, {'title': 'The Hunger Games', 'author': 'Suzanne Collins', 'publication_year': 2008}]


In [3]:
def get_title(book):
    return book['title']

def sort_catalog_by_title(catalog):
    """Sort the catalog in place by title (plain, case-sensitive string order)."""
    catalog.sort(key=get_title)

sort_catalog_by_title(book_catalog)

for book in book_catalog:
    print(f"Title: {book['title']}, Author: {book['author']}, Publication Year: {book['publication_year']}")
    

Title: 1984, Author: George Orwell, Publication Year: 1949
Title: Harry Potter and the Philosopher's Stone, Author: J.K. Rowling, Publication Year: 1997
Title: Pride and Prejudice, Author: Jane Austen, Publication Year: 1813
Title: The Catcher in the Rye, Author: J.D. Salinger, Publication Year: 1951
Title: The Da Vinci Code, Author: Dan Brown, Publication Year: 2003
Title: The Great Gatsby, Author: F. Scott Fitzgerald, Publication Year: 1925
Title: The Hitchhiker's Guide to the Galaxy, Author: Douglas Adams, Publication Year: 1979
Title: The Hunger Games, Author: Suzanne Collins, Publication Year: 2008
Title: The Lord of the Rings, Author: J.R.R. Tolkien, Publication Year: 1954
Title: To Kill a Mockingbird, Author: Harper Lee, Publication Year: 1960
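
As a minimal sketch, the same `key` idea extends to other fields and to case-insensitive ordering. The three-book `sample_catalog` below is a hypothetical stand-in (one title is deliberately lowercased to show the difference), and `sorted()` is used instead of `list.sort()` so the original list is left unchanged.

```python
from operator import itemgetter

# Hypothetical sample so the block runs on its own; 'dune' is lowercased on purpose.
sample_catalog = [
    {'title': 'The Hobbit', 'author': 'J.R.R. Tolkien', 'publication_year': 1937},
    {'title': 'Emma', 'author': 'Jane Austen', 'publication_year': 1815},
    {'title': 'dune', 'author': 'Frank Herbert', 'publication_year': 1965},
]

# sorted() returns a new list and leaves sample_catalog untouched.
by_year = sorted(sample_catalog, key=itemgetter('publication_year'))

# Lowercasing the key makes the ordering case-insensitive.
by_title = sorted(sample_catalog, key=lambda book: book['title'].lower())

# Plain string comparison places uppercase letters before lowercase ones.
by_title_case_sensitive = sorted(sample_catalog, key=itemgetter('title'))

print([book['title'] for book in by_year])                  # ['Emma', 'The Hobbit', 'dune']
print([book['title'] for book in by_title])                 # ['dune', 'Emma', 'The Hobbit']
print([book['title'] for book in by_title_case_sensitive])  # ['Emma', 'The Hobbit', 'dune']
```

The case-sensitive and case-insensitive orders differ here, which matters later: a binary search that compares lowercased titles needs a catalog sorted in that same lowercased order.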


In [5]:
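def search_books(catalog, query):
    """Linear search: return every book whose title or author contains the query (case-insensitive)."""
    query = query.lower()  # lowercase once instead of on every iteration
    results = []
    for book in catalog:
        if query in book['title'].lower() or query in book['author'].lower():
            results.append(book)
    return results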

In [7]:
query = "The Great Gatsby"
search_results = search_books(book_catalog, query)

if search_results:
    print("Search results:")
    for book in search_results:
        print(f"Title: {book['title']}, Author: {book['author']}, Publication Year: {book['publication_year']}")
else:
    print("No books found matching your query.")

Search results:
Title: The Great Gatsby, Author: F. Scott Fitzgerald, Publication Year: 1925


In [9]:
query = "Dune"
search_results = search_books(book_catalog, query)

if search_results:
    print("Search results:")
    for book in search_results:
        print(f"Title: {book['title']}, Author: {book['author']}, Publication Year: {book['publication_year']}")
else:
    print("No books found matching your query.")

No books found matching your query.


In [11]:
query = "J.K. Rowling"
search_results = search_books(book_catalog, query)

if search_results:
    print("Search results:")
    for book in search_results:
        print(f"Title: {book['title']}, Author: {book['author']}, Publication Year: {book['publication_year']}")
else:
    print("No books found matching your query.")

Search results:
Title: Harry Potter and the Philosopher's Stone, Author: J.K. Rowling, Publication Year: 1997


In [21]:
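def binary_search_books(catalog, query):
    """Binary search for an exact title match (case-insensitive).

    Assumes catalog is sorted by title in an order consistent with
    comparing lowercased titles. Returns the matching book or None.
    """
    query = query.lower()
    low = 0
    high = len(catalog) - 1

    while low <= high:
        mid = (low + high) // 2
        title = catalog[mid]['title'].lower()
        if title == query:
            return catalog[mid]
        elif title < query:
            low = mid + 1  # target can only be in the upper half
        else:
            high = mid - 1  # target can only be in the lower half

    return None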
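
For comparison, here is a hedged sketch of the same exact-title lookup built on the standard library's `bisect` module rather than a hand-written loop. The name `bisect_search_books` and the three-book `sample_catalog` are illustrative assumptions, not part of the notebook, and the sketch assumes the catalog is already sorted by lowercased title.

```python
from bisect import bisect_left

def bisect_search_books(catalog, query):
    """Exact-title lookup; assumes catalog is sorted by lowercased title."""
    # bisect needs a plain sorted list of keys. Building it is O(n),
    # so in practice it would be built once and reused across queries.
    keys = [book['title'].lower() for book in catalog]
    target = query.lower()
    index = bisect_left(keys, target)  # leftmost position where target could go
    if index < len(keys) and keys[index] == target:
        return catalog[index]
    return None

# Hypothetical sample, already sorted by lowercased title.
sample_catalog = [
    {'title': '1984', 'author': 'George Orwell', 'publication_year': 1949},
    {'title': 'Pride and Prejudice', 'author': 'Jane Austen', 'publication_year': 1813},
    {'title': 'The Great Gatsby', 'author': 'F. Scott Fitzgerald', 'publication_year': 1925},
]

print(bisect_search_books(sample_catalog, 'the great gatsby'))  # the Gatsby record
print(bisect_search_books(sample_catalog, 'Dune'))              # None
```

Because `bisect_left` returns the leftmost insertion point, this variant finds the first of several records with the same title, whereas the hand-written loop may return any one of them.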

In [17]:
import time

big_book_catalog_df = pd.read_csv('big_book_catalog.csv', low_memory=False)

big_book_catalog_df['title'] = big_book_catalog_df['title'].fillna('').astype(str)
big_book_catalog_df['author'] = big_book_catalog_df['author'].fillna('').astype(str)

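# Sort by title so the same catalog can also be used for binary search.
# Note: sort_values compares strings case-sensitively, whereas
# binary_search_books compares lowercased titles.
sorted_df = big_book_catalog_df.sort_values(by=['title'])
big_book_catalog = sorted_df.to_dict(orient='records')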

query = "The Great Gatsby"

# perf_counter() is a high-resolution clock intended for timing short intervals.
start_time = time.perf_counter()

search_results = search_books(big_book_catalog, query)

end_time = time.perf_counter()

elapsed_time_linear = end_time - start_time

print(f"Linear search took {elapsed_time_linear:.5f} seconds.")

if search_results:
    print("Search results:")
    for book in search_results:
        print(f"Title: {book['title']}, Author: {book['author']}, Publication Year: {book['publication_year']}")
else:
    print("No books found matching your query.")
    

Linear search took 0.07834 seconds.
Search results:
Title: Der Grobe Gatsby/the Great Gatsby, Author: F. Scott Fitzgerald, Publication Year: 1994
Title: F. Scott Fitzgerald's The Great Gatsby: A Literary Reference, Author: Matthew Bruccoli, Publication Year: 2002
Title: F. Scott Fitzgerald's the Great Gatsby, Author: F. Scott Fitzgerald, Publication Year: 1976
Title: Fitzgerald's The Great Gatsby (Cliffs Notes), Author: Cliffs Notes, Publication Year: 2000
Title: Fitzgerald's The Great Gatsby (Cliffs Notes), Author: P. Northman, Publication Year: 1976
Title: Notes on The Great Gatsby: Notes (York Notes), Author: Tang Soo Ping, Publication Year: 1980
Title: The GREAT GATSBY (A Scribner Classic), Author: F. Scott Fitzgerald, Publication Year: 1992
Title: The GREAT GATSBY (A Scribner Classic), Author: F. Scott Fitzgerald, Publication Year: 1920
Title: The GREAT GATSBY (Great Gatsby Hre), Author: F. Scott Fitzgerald, Publication Year: 1981
Title: The GREAT GATSBY (Scribner Classic), Author

In [25]:
import time

big_book_catalog_df = pd.read_csv('big_book_catalog.csv', low_memory=False)

big_book_catalog_df['title'] = big_book_catalog_df['title'].fillna('').astype(str)
big_book_catalog_df['author'] = big_book_catalog_df['author'].fillna('').astype(str)

sorted_df = big_book_catalog_df.sort_values(by=['title'])
big_book_catalog = sorted_df.to_dict(orient='records')

query = "The Great Gatsby"

# perf_counter() is a high-resolution clock intended for timing short intervals.
start_time = time.perf_counter()

search_results = binary_search_books(big_book_catalog, query)

end_time = time.perf_counter()

elapsed_time_binary = end_time - start_time

print(f"Binary search took {elapsed_time_binary:.5f} seconds.")

if search_results:
    print("Binary Search Result for The Great Gatsby:")
    print(f"Title: {search_results['title']}, Author: {search_results['author']}, Publication Year: {search_results['publication_year']}")
else:
    print("Binary Search: Book not found.")

Binary search took 0.00010 seconds.
Binary Search Result for The Great Gatsby:
Title: The Great Gatsby, Author: F. Scott Fitzgerald, Publication Year: 1995
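
The gap between the two timings follows from how many items each approach can inspect in the worst case. The sketch below compares those bounds for a few illustrative catalog sizes; the sizes and the helper names `max_linear_steps` and `max_binary_steps` are assumptions for illustration, since the notebook does not state how many rows `big_book_catalog.csv` contains.

```python
def max_linear_steps(n):
    """Worst case for linear search: every one of the n items is checked."""
    return n

def max_binary_steps(n):
    """Worst case for binary search: floor(log2(n)) + 1 loop iterations."""
    # For n > 0, int.bit_length() equals floor(log2(n)) + 1 exactly.
    return n.bit_length()

# Illustrative catalog sizes (assumed, not taken from the CSV files).
for n in (10, 1_000, 1_000_000):
    print(f"{n:>9,} books: linear <= {max_linear_steps(n):,} checks, "
          f"binary <= {max_binary_steps(n)} checks")
```

The two timed calls are also not doing identical work: `search_books` performs a substring match against both title and author and collects every match, while `binary_search_books` returns a single record whose title matches exactly. That is why the linear search lists many Gatsby-related titles and the binary search prints only one.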
