## Mini Project 1

## Recommendation System

## 120 Points.

### Implementation

<font color="blue">**Task 1: Reading Data**</font>

1. <font color="red">[10 pts]</font> Write a function <font color="brown">read_ratings_data(f)</font> that takes in a ratings file name, and returns a dictionary. (Note: the parameter is a file name string such as "ratings.csv", NOT a file pointer.) The dictionary should have ISBN as key, and the list of all ratings for it as value.
For example:  book_ratings_dict = { '034545104X': [9, 8, 7], '0486282406': [10, 9, 8] }
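
The grouping this task asks for can be sketched with `collections.defaultdict(list)`. The sketch assumes the same `ISBN` and `Rating` column names that `read_ratings_data` reads, and it takes an in-memory `io.StringIO` object instead of a file name so it runs without a file. The helper name `group_ratings` is hypothetical.

```python
import csv
import io
from collections import defaultdict


def group_ratings(csvfile):
    """Group integer ratings by ISBN from any file-like CSV object.

    Hypothetical helper; assumes 'ISBN' and 'Rating' header columns.
    """
    ratings = defaultdict(list)
    for row in csv.DictReader(csvfile):
        # A new ISBN starts as an empty list, so append always works
        ratings[row['ISBN']].append(int(row['Rating']))
    # Return a plain dict so unknown ISBNs raise KeyError again
    return dict(ratings)


sample = io.StringIO(
    "ISBN,Rating\n"
    "034545104X,9\n"
    "0486282406,10\n"
    "034545104X,8\n"
)
print(group_ratings(sample))
# {'034545104X': [9, 8], '0486282406': [10]}
```

With a real file, the same loop would run on the object returned by `open(f, newline='')`, which is how the `csv` module documentation recommends opening CSV files.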

In [30]:
import csv

def read_ratings_data(f):
    '''
    IN: f (str) - filename
    OUT: book_ratings_dict (dict{str: list[int]}) - dictionary of ratings
    '''

    # Set up dictionary to store ratings
    book_ratings_dict = {}

    # Set up csv reader
    with open(f, 'r') as csvfile:
        reader = csv.DictReader(csvfile)

        # Read in data
        for row in reader:
            isbn = row['ISBN']
            rating = int(row['Rating'])
            
            # Add rating to dictionary
            if isbn in book_ratings_dict:
                book_ratings_dict[isbn].append(rating)
            else:
                book_ratings_dict[isbn] = [rating]             

    # Return dictionary
    return book_ratings_dict



2. <font color="red">[10 pts]</font> Write a function <font color="brown">read_book_author(f)</font> that takes in a books.csv file name and returns a dictionary. The dictionary should have a one-to-one mapping from ISBN to author.
For example:   book_author_dict = { '0195153448': 'Mark P. O. Morford', '0373037430': 'Rebecca Winters' }

Note: Some books may have multiple authors. In this case, treat the entire author string as a single combined author.

In [11]:
def read_book_author(f):
    '''
    IN: f (str) - filename
    OUT: book_author_dict (dict{str: str}) - dictionary of book authors
    '''

    # Set up dictionary to store authors
    book_author_dict = {}

    # Set up csv reader
    with open(f, 'r') as csvfile:
        reader = csv.DictReader(csvfile)

        # Read in data
        for row in reader:
            isbn = row['ISBN']
            author = row['Author']

            #-----
            # Reject rows whose author is missing (None) or blank before storing them
            if not author or not author.strip():
                raise ValueError(f"Missing author for ISBN: {isbn}")
            #-----

            # Add author to dictionary
            book_author_dict[isbn] = author

    # Return dictionary
    return book_author_dict


#-----
# Create a CSV with missing authors to induce a crash
malformed_authors_csv = """ISBN,Author
034545104X,
0486282406,Author 2
"""
with open('malformed_authors.csv', 'w') as f:
    f.write(malformed_authors_csv)

# Run the function and catch the crash
try:
    result = read_book_author('malformed_authors.csv')
    print(result)
except Exception as e:
    print("Crash with missing author data (read_book_author):", e)
#-----

Crash with missing author data (read_book_author): Missing author for ISBN: 034545104X
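
The missing-author check in `read_book_author` treats both `None` and blank strings as missing. That matters because `csv.DictReader` reports an empty trailing field as `''`, but fills a field that is absent from a short row with its `restval` argument (default `None`). The snippet below illustrates this on made-up rows.

```python
import csv
import io

rows = list(csv.DictReader(io.StringIO(
    "ISBN,Author\n"
    "034545104X,\n"       # trailing comma: Author is an empty string
    "0486282406\n"        # short row: Author is filled with restval (None)
    "0195153448,   \n"    # whitespace only: non-empty but blank
)))

for row in rows:
    author = row['Author']
    # Same test as in read_book_author: None, '' and whitespace all count as missing
    missing = not author or not author.strip()
    print(repr(row['ISBN']), repr(author), missing)

# '034545104X' '' True
# '0486282406' None True
# '0195153448' '   ' True
```

Checking `not author` first also short-circuits before `.strip()` is called on `None`.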


<font color="blue">**Task 2: Processing Data**</font>

1. <font color="red">[8 pts]</font> Author Dictionary

    Write a function <font color="brown">create_author_dict</font> that takes as a parameter a book-to-author dictionary, of the kind created in Task 1.2. The function should return another dictionary in which each author is mapped to a list of all the books by that author.

    For example:   { 'Author 1': ['034545104X', '0385333498'], 'Author 2': ['0142000663'] }

In [12]:
def create_author_dict(book_author_dict):
    '''
    IN: book_author_dict (dict{str: str}) - dictionary of book authors
    OUT: author_to_books_dict (dict{str: list[str]}) - dictionary of authors and their books
    '''

    # Set up dictionary to store authors and their books
    author_to_books_dict = {}

    # Populate dictionary
    for ISBN, author in book_author_dict.items():
        if author in author_to_books_dict:
            author_to_books_dict[author].append(ISBN)
        else:
            author_to_books_dict[author] = [ISBN]
        

    # Return dictionary
    return author_to_books_dict


#-----
# Pass None to the function to induce a crash
try:
    result = create_author_dict(None)
    print(result)
except Exception as e:
    print("Crash with None input (create_author_dict):", e)

# Pass an invalid dictionary structure
invalid_dict = [("034545104X", "Author 1")]  # This is a list, not a dict

try:
    result = create_author_dict(invalid_dict)
    print(result)
except Exception as e:
    print("Crash with invalid structure (create_author_dict):", e)
#-----

Crash with None input (create_author_dict): 'NoneType' object has no attribute 'items'
Crash with invalid structure (create_author_dict): 'list' object has no attribute 'items'
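
The inversion can also be written with `dict.setdefault`, and checking the input type up front turns the `AttributeError` seen in the crash tests into an explicit `TypeError`. This is a sketch; the name `invert_book_authors` and the type check are assumptions, not part of the task.

```python
def invert_book_authors(book_author_dict):
    """Map each author to the list of ISBNs attributed to them (hypothetical helper)."""
    if not isinstance(book_author_dict, dict):
        raise TypeError(f"expected a dict, got {type(book_author_dict).__name__}")
    author_to_books = {}
    for isbn, author in book_author_dict.items():
        # setdefault inserts an empty list the first time an author is seen
        author_to_books.setdefault(author, []).append(isbn)
    return author_to_books


books = {'034545104X': 'Author 1', '0385333498': 'Author 1', '0142000663': 'Author 2'}
print(invert_book_authors(books))
# {'Author 1': ['034545104X', '0385333498'], 'Author 2': ['0142000663']}

try:
    invert_book_authors([("034545104X", "Author 1")])
except TypeError as e:
    print(e)  # expected a dict, got list
```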


2. <font color="red">[8 pts]</font> Average Rating

    Write a function <font color="brown">calculate_average_rating</font> that takes as a parameter a ratings dictionary, of the kind created in Task 1.1. It should return a dictionary where the book ISBN is mapped to its average rating computed from the ratings list.

    For example:   {'034545104X': 4.0, '0375803482': 7.0 }

In [13]:
def calculate_average_rating(book_ratings_dict):
    '''
    IN: book_ratings_dict (dict{str: list[int]}) - dictionary of ratings
    OUT: book_to_average_dict (dict{str: float}) - dictionary of average ratings
    '''

    # Set up dictionary to store average ratings
    book_to_average_dict = {}

    # Calculate average rating for each book
    for ISBN, ratings in book_ratings_dict.items():
        if ratings: # Check if the ratings list is not empty
            average_rating = sum(ratings) / len(ratings)
            book_to_average_dict[ISBN] = round(average_rating, 2) # Rounds to 2 decimal places

    # Return dictionary
    return book_to_average_dict


#-----
# Pass a dictionary with non-integer ratings to induce a crash
invalid_ratings_dict = {
    '034545104X': [5, 'invalid', 4],  # Non-integer rating
    '0486282406': [None, 3, 2]        # None value in ratings
}

try:
    result = calculate_average_rating(invalid_ratings_dict)
    print(result)
except Exception as e:
    print("Crash with invalid ratings (calculate_average_rating):", e)
#-----

Crash with invalid ratings (calculate_average_rating): unsupported operand type(s) for +: 'int' and 'str'
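
For comparison, the averaging step can be written as a dict comprehension over `statistics.fmean` (Python 3.8+). This sketch also checks that every rating is an `int`, so bad data such as the `'invalid'` and `None` entries in the test above fails with a message naming the ISBN. That validation, like the helper name `average_ratings`, is an assumption rather than part of the task.

```python
from statistics import fmean


def average_ratings(book_ratings_dict, ndigits=2):
    """Return {isbn: mean rating rounded to ndigits}, skipping ISBNs with no ratings."""
    for isbn, ratings in book_ratings_dict.items():
        # Fail early with a message that names the offending ISBN
        bad = [r for r in ratings if not isinstance(r, int)]
        if bad:
            raise TypeError(f"non-integer ratings for ISBN {isbn}: {bad!r}")
    return {
        isbn: round(fmean(ratings), ndigits)
        for isbn, ratings in book_ratings_dict.items()
        if ratings  # fmean raises StatisticsError on an empty list
    }


print(average_ratings({'034545104X': [3, 4, 5], '0375803482': [7], '0000000000': []}))
# {'034545104X': 4.0, '0375803482': 7.0}
```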


<font color="blue">**Task 3: Recommendation**</font>

1. <font color="red">[10 pts]</font> Popularity Based

    In services such as Kindle and Goodreads, you often see recommendations with headings such as “Popular Books” or “Trending Top 10”.

    Write a function <font color="brown">get_popular_books</font> that takes as parameters a dictionary of book-to-average rating (as created in Task 2.2) and an integer n (default 10). The function should return a dictionary (book: average, the same structure as the input dictionary) of the top n books based on average rating. If there are fewer than n books, it should return all books in ranked order of average ratings from highest to lowest.



In [18]:
def get_popular_books(book_to_average_dict, n=10):
    '''
    IN: book_to_average_dict (dict{str: float}) - dictionary of average ratings
        n (int) - number of books to return
    OUT: popular_books_dict (dict{str: float}) - dictionary of top n books
    '''

    # Sort books by average rating
    sorted_books = sorted(book_to_average_dict.items(), key=lambda x: x[1], reverse=True)

    # Get the top n books or all books if there are fewer than n
    popular_books = sorted_books[:n]

    # Convert list of tuples back to dictionary
    popular_books_dict = dict(popular_books)

    # Return top n books
    return popular_books_dict


#-----
# Test Case: a negative n does not crash; sorted_books[:n] drops the last |n| books, so this returns {}
try:
    result = get_popular_books({'034545104X': 4.5, '0486282406': 3.0}, n=-10)
    print(result)
except Exception as e:
    print("Crash with negative n (get_popular_books):", e)
  
#-----

# Test Case: Passing non-numeric values in the dictionary
try:
    result = get_popular_books({'034545104X': 'not_a_number', '0486282406': 3.0})
    print(result)
except Exception as e:
    print("Crash with non-numeric values (get_popular_books):", e)
#----

{}
Crash with non-numeric values (get_popular_books): '<' not supported between instances of 'str' and 'float'
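
A variant built on `heapq.nlargest` avoids sorting the whole dictionary when n is small, and it can reject a negative `n` instead of silently returning fewer books, as the slicing version does. The function name `top_n_books` and the `ValueError` policy are assumptions for illustration.

```python
import heapq


def top_n_books(book_to_average_dict, n=10):
    """Return the n highest-rated books as {isbn: average}, best first."""
    if not isinstance(n, int) or n < 0:
        # A negative slice bound would silently drop books, so reject it explicitly
        raise ValueError(f"n must be a non-negative integer, got {n!r}")
    # nlargest is documented as equivalent to sorted(..., reverse=True)[:n]
    best = heapq.nlargest(n, book_to_average_dict.items(), key=lambda item: item[1])
    # dicts preserve insertion order (Python 3.7+), so the ranking survives
    return dict(best)


averages = {'034545104X': 4.5, '0486282406': 3.0, '0195153448': 4.8}
print(top_n_books(averages, n=2))
# {'0195153448': 4.8, '034545104X': 4.5}
```

Because `nlargest` is equivalent to a stable descending sort, books with equal averages keep their original dictionary order.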


2. <font color="red">[10 pts]</font> Threshold Rating

    Write a function <font color="brown">filter_books</font> that takes as parameters a dictionary of book-to-average rating (the same as for the popularity-based function above) and a threshold rating with a default value of 3. The function should filter books based on the threshold rating and return a dictionary with the same structure as the input.
    For example, if the threshold rating is 3.5, the returned dictionary should have only those books from the input whose average rating is equal to or greater than 3.5.

In [19]:
def filter_books(book_to_average_dict, threshold=3.0):
    '''
    IN: book_to_average_dict (dict{str: float}) - dictionary of average ratings
        threshold (float) - minimum rating to keep
    OUT: filtered_books_dict (dict{str: float}) - dictionary of books at or above threshold
    '''

    # Filter books above threshold
    filtered_books_dict = {isbn: avg_rating for isbn, avg_rating in book_to_average_dict.items() if avg_rating >= threshold}
    # Return filtered books
    return filtered_books_dict


#-----
# Crash-inducing test case
# Pass a string instead of a numeric threshold
try:
    result = filter_books({'034545104X': 4.5, '0486282406': 3.0}, threshold="high")
    print(result)
except Exception as e:
    print("Crash with invalid threshold type (filter_books):", e)
#-----

Crash with invalid threshold type (filter_books): '>=' not supported between instances of 'float' and 'str'
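
If a clearer error than the `'>='` comparison failure is wanted, the threshold can be validated before filtering. This sketch uses `numbers.Real` to accept `int` and `float` thresholds; the helper name `books_at_or_above` and the `TypeError` message are assumptions. Note that the comparison is inclusive, matching "equal to or greater than" in the task.

```python
from numbers import Real


def books_at_or_above(book_to_average_dict, threshold=3.0):
    """Keep books whose average rating is >= threshold (hypothetical helper)."""
    # bool is a subclass of int, so exclude it explicitly
    if not isinstance(threshold, Real) or isinstance(threshold, bool):
        raise TypeError(f"threshold must be a number, got {type(threshold).__name__}")
    return {isbn: avg for isbn, avg in book_to_average_dict.items() if avg >= threshold}


averages = {'034545104X': 4.5, '0486282406': 3.0, '0195153448': 2.9}
print(books_at_or_above(averages, 3.0))
# {'034545104X': 4.5, '0486282406': 3.0}
```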


3. <font color="red">[10 pts]</font> Popularity + Author Based

    In most recommendation systems, the creator of a movie, song, or book plays an important role. Features such as popularity and author (creator) are often combined to present recommendations to a user.

    Write a function <font color="brown">get_popular_by_author</font> that, given an author, an author-to-books dictionary (as created in Task 2.1), a dictionary of book-to-average rating (as created in Task 2.2), and an integer n (default 5), returns the top n most popular books by that author based on the average ratings. The return value should be a dictionary of book-to-average rating of the books that make the cut. If there are fewer than n books, it should return all books in ranked order of average ratings from highest to lowest.

    Note: some books in the `author_to_books_dict` dictionary may not appear in the `book_to_average_dict` dictionary. You should ignore such books.
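
In short, the task is filter-then-rank: keep only the author's ISBNs that have an average, sort them by average in descending order, and cut the list to n. A sketch of that pipeline, using the hypothetical name `top_books_by_author` and made-up data:

```python
def top_books_by_author(author, author_to_books_dict, book_to_average_dict, n=5):
    """Top-n {isbn: average} for one author, ignoring books without an average."""
    # .get(author, []) treats an unknown author like an author with no books
    rated = [(isbn, book_to_average_dict[isbn])
             for isbn in author_to_books_dict.get(author, [])
             if isbn in book_to_average_dict]
    rated.sort(key=lambda item: item[1], reverse=True)
    return dict(rated[:n])


author_books = {'Author 1': ['034545104X', '0385333498', '9999999999']}
averages = {'034545104X': 4.0, '0385333498': 8.5}
print(top_books_by_author('Author 1', author_books, averages))
# {'0385333498': 8.5, '034545104X': 4.0}
```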

In [20]:
def get_popular_by_author(author, author_to_books_dict, book_to_average_dict, n=5):
    '''
    IN: author (str) - author name
        author_to_books_dict (dict{str: list[str]}) - dictionary of authors and their books
        book_to_average_dict (dict{str: float}) - dictionary of average ratings
        n (int) - number of books to return
    OUT: popular_books_by_author_dict (dict{str: float}) - dictionary of top n books by author
    '''

    # Get books by author
    if author not in author_to_books_dict:
        return {} #Return empty dictionary if author not found
    
    author_books = author_to_books_dict[author]

    # Filter only those books that have average rating in book_to_average_dict
    books_with_ratings = {isbn: book_to_average_dict[isbn] for isbn in author_books if isbn in book_to_average_dict}

    # Sort books by average rating
    sorted_books = sorted(books_with_ratings.items(), key=lambda x: x[1], reverse=True)

    # Get the top n books or all books if there are fewer than n
    popular_books_by_author = sorted_books[:n]

    # Convert the list of tuples back to a dictionary
    popular_books_by_author_dict = dict(popular_books_by_author)

    # Return top n books
    return popular_books_by_author_dict

#----
# Pass an author that does not appear in author_to_books_dict
try:
    result = get_popular_by_author('Unknown Author', {'Author 1': ['034545104X']}, {'034545104X': 4.5})
    print(result)
except Exception as e:
    print("Crash with non-existent author (get_popular_by_author):", e)

# Pass an author whose books have no entry in book_to_average_dict
try:
    result = get_popular_by_author('Author 1', {'Author 1': ['9999999999']}, {'034545104X': 4.5})
    print(result)
except Exception as e:
    print("Crash with no rated books (get_popular_by_author):", e)
#------

{}
{}


4. <font color="red">[10 pts]</font> Author Rating

    One important analysis for content platforms is determining ratings by author.

    Write a function <font color="brown">get_author_rating</font> that takes the same parameters as <font color="brown">get_popular_by_author</font> above, except for n, and returns the average rating of the books by the given author.

In [21]:
def get_author_rating(author, author_to_books_dict, book_to_average_dict):
    '''
    IN: author (str) - author name
        author_to_books_dict (dict{str: list[str]}) - dictionary of authors and their books
        book_to_average_dict (dict{str: float}) - dictionary of average ratings
    OUT: author_rating (float) - average rating for author's books
    '''

    # Get books by author
    if author not in author_to_books_dict:
        return None # Return None if author not found
    
    author_books = author_to_books_dict[author]

    # Get average ratings for books by the author
    all_ratings = []
    for isbn in author_books:
        if isbn in book_to_average_dict: # Check if the book has an average rating
            all_ratings.append(book_to_average_dict[isbn])

    # If none of the author's books has an average rating, return None
    if not all_ratings:
        return None

    # Return average rating
    author_rating = sum(all_ratings) / len(all_ratings)
    return round(author_rating, 2)


#-----
# Pass an author that does not appear in author_to_books_dict
try:
    result = get_author_rating('Unknown Author', {'Author 1': ['034545104X']}, {'034545104X': 4.5})
    print(result)
except Exception as e:
    print("Crash with non-existent author (get_author_rating):", e)

# Pass a non-numeric average rating to induce a crash
try:
    result = get_author_rating('Author 1', {'Author 1': ['034545104X']}, {'034545104X': 'high'})
    print(result)
except Exception as e:
    print("Crash with non-numeric average rating (get_author_rating):", e)
#-----

None
Crash with non-numeric average rating (get_author_rating): unsupported operand type(s) for +: 'int' and 'str'


5. <font color="red">[10 pts]</font> Author Popularity

    Write a function <font color="brown">author_popularity</font> that takes as parameters an author-to-books dictionary (as created in Task 2.1), a book-to-average rating dictionary (as created in Task 2.2), and n (default 5), and returns the top n rated authors as a dictionary of author-to-average rating. If there are fewer than n authors, it should return all authors in ranked order of average ratings from highest to lowest.
    Hint: Use the above get_author_rating function as a helper.

In [24]:
def author_popularity(author_to_books_dict, book_to_average_dict, n=5):
    '''
    IN: author_to_books_dict (dict{str: list[str]}) - dictionary of authors and their books
        book_to_average_dict (dict{str: float}) - dictionary of average ratings
        n (int) - number of authors to return
    OUT: popular_authors_dict (dict{str: float}) - dictionary of top n authors
    '''

    # Calculate average rating for each author
    author_ratings = {}
    for author in author_to_books_dict:
        # Get the average rating for the author using get_author_rating
        avg_rating = get_author_rating(author, author_to_books_dict, book_to_average_dict)
        if avg_rating is not None:
            author_ratings[author] = avg_rating

    # Sort authors by average rating in descending order
    sorted_authors = sorted(author_ratings.items(), key=lambda x: x[1], reverse=True)

    # Get the top n authors or all authors if there are fewer than n 
    popular_authors = sorted_authors[:n]
    
    # Convert list of tuples back to a dictionary
    popular_authors_dict = dict(popular_authors)

    # Return top n authors
    return popular_authors_dict


#-----


#-----

{'Author 1': 4.0}
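
As in `get_author_rating`, an author's rating here is the mean of the per-book averages, so each book counts once regardless of how many ratings it received (a mean weighted by rating count would be a different design choice). The sketch below, with the hypothetical names `author_average` and `top_authors`, shows the helper-plus-ranking structure that the hint suggests.

```python
def author_average(books, book_to_average_dict):
    """Mean of the per-book averages for the given ISBNs, or None if none are rated."""
    scores = [book_to_average_dict[isbn] for isbn in books if isbn in book_to_average_dict]
    return round(sum(scores) / len(scores), 2) if scores else None


def top_authors(author_to_books_dict, book_to_average_dict, n=5):
    """Rank authors by author_average and keep the top n as {author: rating}."""
    ratings = {}
    for author, books in author_to_books_dict.items():
        avg = author_average(books, book_to_average_dict)
        if avg is not None:  # authors with no rated books are left out
            ratings[author] = avg
    ranked = sorted(ratings.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:n])


author_books = {
    'Author 1': ['034545104X', '0486282406'],
    'Author 2': ['0195153448'],
    'Author 3': ['0000000000'],  # no average available
}
averages = {'034545104X': 4.5, '0486282406': 3.5, '0195153448': 5.0}
print(top_authors(author_books, averages))
# {'Author 2': 5.0, 'Author 1': 4.0}
```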


<font color="blue">**Task 4: User Focused**</font>

1. <font color="red">[10 pts]</font> Read the ratings file to return a user-to-books dictionary that maps user ID to a list of the books they rated, along with the rating they gave. Write a function named <font color="brown">read_user_ratings</font> for this, with the ratings file as the parameter.
For example: { 1: [('034545104X', 5), ('0385333498', 4)], 2: [('0142000663', 3)] } 

In [25]:
def read_user_ratings(f):
    '''
    IN: f (str) - filename
    OUT: user_to_book_ratings_dict (dict{str: list[tuple(str, int)]}) - dictionary of user ratings
    '''

    # Set up dictionary to store user ratings
    user_to_book_ratings_dict = {}

    # Open the file and read its contents
    with open(f, 'r') as csvfile:
        reader = csv.DictReader(csvfile)

        # Loop through each row in the CSV file
        for row in reader:
            user_id = row['UserID']
            isbn = row['ISBN']
            rating = int(row['Rating'])

            # Add the rating to the user-to-book dictionary (inside the loop, once per row)
            if user_id in user_to_book_ratings_dict:
                user_to_book_ratings_dict[user_id].append((isbn, rating))
            else:
                user_to_book_ratings_dict[user_id] = [(isbn, rating)]

    # Return the dictionary
    return user_to_book_ratings_dict


#----
# Crash-inducing test for read_user_ratings
invalid_user_ratings_csv = """UserID,ISBN,Rating
1,034545104X,5
2,INVALIDISBN,NotANumber
"""

with open('invalid_user_ratings.csv', 'w') as f:
    f.write(invalid_user_ratings_csv)

try:
    result = read_user_ratings('invalid_user_ratings.csv')
    print(result)
except Exception as e:
    print("Crash with malformed user ratings (read_user_ratings):", e)
#------

Crash with malformed user ratings (read_user_ratings): invalid literal for int() with base 10: 'NotANumber'
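
The same per-user grouping can be sketched with `defaultdict(list)`, reading from an in-memory CSV. It assumes the `UserID`, `ISBN`, and `Rating` column names used in `read_user_ratings`, and the helper name `group_user_ratings` is hypothetical. User IDs stay strings here, as in `read_user_ratings`; the task's example shows integer keys, so wrapping `row['UserID']` in `int()` is an option if every ID in the data is numeric.

```python
import csv
import io
from collections import defaultdict


def group_user_ratings(csvfile):
    """Map each UserID to a list of (ISBN, rating) tuples, in file order."""
    by_user = defaultdict(list)
    for row in csv.DictReader(csvfile):
        # The append runs once per row because it sits inside the loop body
        by_user[row['UserID']].append((row['ISBN'], int(row['Rating'])))
    return dict(by_user)


sample = io.StringIO(
    "UserID,ISBN,Rating\n"
    "1,034545104X,5\n"
    "1,0385333498,4\n"
    "2,0142000663,3\n"
)
print(group_user_ratings(sample))
# {'1': [('034545104X', 5), ('0385333498', 4)], '2': [('0142000663', 3)]}
```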


2. <font color="red">[10 pts]</font> Write a function <font color="brown">get_user_top_author</font> that takes as parameters a user ID, the user-to-books dictionary (as created in Task 4.1 above), and the book information dictionary (as created in Task 1.2), and returns the top author that the user likes based on the user's ratings. Here, the top author for the user is determined by averaging the user's ratings of each author's books and picking the author with the highest average. If multiple authors tie for the highest average rating, return any one of them (arbitrarily) as the top author.

Notes: 
- Some books in the `user_to_book_ratings_dict` dictionary may not appear in the `book_author_dict` dictionary. You should ignore such books. 
- If none of the books rated by the user are present in the `book_author_dict` dictionary, return `None`.

In [26]:
def get_user_top_author(user_ID, user_to_book_ratings_dict, book_author_dict):
    '''
    IN: user_ID - user ID (same type as the keys of user_to_book_ratings_dict)
        user_to_book_ratings_dict (dict{str: list[tuple(str, int)]}) - dictionary of user ratings
        book_author_dict (dict{str: str}) - dictionary of book authors
    OUT: top_author (str) - author with highest average ratings by user
    '''
    # Check if the user exists in the user_to_book_ratings_dict
    if user_ID not in user_to_book_ratings_dict:
        return None
    
    # Dictionary to store total ratings and count of books for each author
    author_ratings = {}

    # Loop through each book and rating for the user
    for isbn, rating in user_to_book_ratings_dict[user_ID]:
        # Check if the book has an author in the book_author_dict
        if isbn in book_author_dict:
            author = book_author_dict[isbn]
            # Add the rating to the author's total
            if author in author_ratings:
                author_ratings[author]['total_rating'] += rating
                author_ratings[author]['book_count'] += 1
            else:
                author_ratings[author] = {'total_rating': rating, 'book_count': 1}
    
    # If no authors were found for the user's books, return None
    if not author_ratings:
        return None
    
    # Calculate the average rating for each author and find the top author
    top_author = None
    highest_avg_rating = float('-inf')  # start below any rating so all-zero ratings still pick an author

    for author, data in author_ratings.items():
        avg_rating = data['total_rating'] / data['book_count']
        if avg_rating > highest_avg_rating:
            highest_avg_rating = avg_rating
            top_author = author

    return top_author



#-----
#----

None
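
The per-author averaging can also be finished with `max()` and a key function. Because `max()` compares the averages directly, there is no starting value to choose, and an author is still returned when every rating the user gave is 0. The name `user_top_author` is hypothetical, and ties resolve to whichever author `max()` meets first, which the task permits.

```python
def user_top_author(user_id, user_to_book_ratings_dict, book_author_dict):
    """Author with the highest mean rating from this user, or None (hypothetical helper)."""
    totals = {}  # author -> [sum of ratings, number of rated books]
    for isbn, rating in user_to_book_ratings_dict.get(user_id, []):
        author = book_author_dict.get(isbn)
        if author is None:
            continue  # books without author information are ignored
        entry = totals.setdefault(author, [0, 0])
        entry[0] += rating
        entry[1] += 1
    if not totals:
        return None
    # Pick the author whose average (sum / count) is largest
    return max(totals, key=lambda author: totals[author][0] / totals[author][1])


user_books = {1: [('034545104X', 5), ('0385333498', 4), ('0142000663', 9), ('9999999999', 10)]}
authors = {'034545104X': 'Author 1', '0385333498': 'Author 1', '0142000663': 'Author 2'}
print(user_top_author(1, user_books, authors))
# Author 2
```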


3. <font color="red">[10 pts]</font> Recommend the 3 most popular (highest average rating) books by the user's top author that the user has not yet rated. Write a function <font color="brown">recommend_books</font> for this that takes as parameters a user ID, the user-to-books dictionary (as created in Task 4.1 above), the book-to-author dictionary (as created in Task 1.2), and the book-to-average rating dictionary (as created in Task 2.2). The function should return a dictionary of book-to-average rating. If fewer than 3 books make the cut, return all the books that make the cut in ranked order of average ratings from highest to lowest.

In [27]:
RECOMMEND_NUM = 3
def recommend_books(user_ID, user_to_book_ratings_dict, book_author_dict, book_to_average_dict):
    '''
    IN: user_ID - user ID (same type as the keys of user_to_book_ratings_dict)
        user_to_book_ratings_dict (dict{str: list[tuple(str, int)]}) - dictionary of user ratings
        book_author_dict (dict{str: str}) - dictionary of book authors
        book_to_average_dict (dict{str: float}) - dictionary of average ratings
    OUT: recommended_books_dict (dict{str: float}) - dictionary of recommended books
    '''
    # Get the user's top author
    top_author = get_user_top_author(user_ID, user_to_book_ratings_dict, book_author_dict)

    # If there is no top author, return an empty dictionary
    if top_author is None:
        return {}
    
    # Get the list of books by the top author
    author_books = [isbn for isbn, author in book_author_dict.items() if author == top_author]

    # Get the books the user has already rated (the user exists, since top_author is not None)
    rated_books = {isbn for isbn, _ in user_to_book_ratings_dict[user_ID]}

    # Filter books by the top author that the user has not yet rated
    unrated_books = [isbn for isbn in author_books if isbn not in rated_books and isbn in book_to_average_dict]

    # Sort the unrated books by their average ratings in descending order
    sorted_unrated_books = sorted(unrated_books, key=lambda isbn: book_to_average_dict[isbn], reverse=True)

    # Get the top RECOMMEND_NUM books (or fewer if there are not enough)
    top_books = sorted_unrated_books[:RECOMMEND_NUM]

    # Create a dictionary of recommended books with their average ratings
    recommended_books_dict = {isbn: book_to_average_dict[isbn] for isbn in top_books}

    return recommended_books_dict


# Edge-case test for recommend_books: the top author's only book has no average rating
user_ratings_dict = {
    1: [('034545104X', 5)],
    2: [('InvalidISBN', 3)]
}

book_author_dict = {
    '034545104X': 'Author 1',
    'InvalidISBN': 'Author 2'
}

book_avg_dict = {
    '034545104X': 4.5,
    # Missing average rating for 'InvalidISBN'
}

try:
    result = recommend_books(2, user_ratings_dict, book_author_dict, book_avg_dict)
    print(result)
except Exception as e:
    print("Crash with missing book average rating (recommend_books):", e)


{}
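
The step that matters most in `recommend_books` is collecting the ISBNs the user has already rated from their own `(isbn, rating)` tuples, then excluding them from the top author's catalogue before ranking. The sketch below isolates that step: it takes the top author as an argument instead of computing it, and its name `recommend_unrated` and parameter `k` are hypothetical.

```python
def recommend_unrated(user_ratings, top_author, book_author_dict, book_to_average_dict, k=3):
    """Up to k {isbn: average} by top_author that the user has not rated, best first."""
    # Set of ISBNs this user has already rated, built from their (isbn, rating) tuples
    rated = {isbn for isbn, _ in user_ratings}
    candidates = [isbn for isbn, author in book_author_dict.items()
                  if author == top_author and isbn not in rated and isbn in book_to_average_dict]
    candidates.sort(key=lambda isbn: book_to_average_dict[isbn], reverse=True)
    return {isbn: book_to_average_dict[isbn] for isbn in candidates[:k]}


ratings_of_user = [('034545104X', 5)]
authors = {'034545104X': 'Author 1', '0385333498': 'Author 1',
           '0142000663': 'Author 1', '0486282406': 'Author 2'}
averages = {'034545104X': 4.5, '0385333498': 7.0, '0142000663': 8.0, '0486282406': 9.0}
print(recommend_unrated(ratings_of_user, 'Author 1', authors, averages))
# {'0142000663': 8.0, '0385333498': 7.0}
```

Using a set for the rated ISBNs keeps each membership test constant-time, which helps when a user has rated many books.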
