# **CSC-40076: Summative Assessment 1 - Programming Task to Assess Basic Design and Programming Skills 1**
### Assignment 1

Completed by: **Ajantha Wirasinghe**

Keele Student number: **24027813**

Date: 31/03/2024

-----------

**The Problem**
A small college lending library run by volunteers has donated its books to the case study library. They used a simple but effective spreadsheet-based cataloguing and loan system, so their books and data files are available for some experimental work. We have been given three files: books.csv, bookloans.csv and membership.csv. We are ignoring the membership data for this assignment. The library has a standard loan period of 14 days, so a borrowed book returned after 19 days is 5 days late.
The text file **books.csv** contains a subset list of their books in CSV format. It has a header row. The CSV files are encoded as UTF-8. You may assume it is free from errors, i.e. no records are flawed.

The text file **bookloans.csv** contains data on borrowed books in CSV format. It does not have a header row. Be warned that this file has errors. Each row holds a book_number, member_number, date_of_loan and date_of_return separated by commas. Only transactions of books borrowed from our library and returned during the year of the transaction are valid. A record is created when a member borrows a book, so the records are in order of the date of loan. Assume that all the books borrowed and returned were in 2023.

##### **The year of the transaction is 2023.**

Some observations already made are:
- Each book has only a single copy for a loan, so if it is on loan, it cannot be borrowed. In other words, it can only be borrowed if it is on the shelf.
- A date_of_return of 0 (zero) means that the book has not been returned yet so is not on the shelf and cannot be borrowed.
- All loans are returned within the year or not returned at all.
- The file contains thousands of valid loan transactions but many invalid ones.
- The date_of_loan is a single float number representing the date of the loan in Microsoft Excel Epoch Format that is stored for our processing as an integer or a string that evaluates to an integer. We are ignoring fractions of a day, so integers are fine. It represents the number of days elapsed since 1/1/1900.
- In the year 2023, the 1st of January 2023 had an Excel epoch date of 44927 and the 31st of December 2023 had an Excel epoch date 45291.
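To make the epoch arithmetic concrete, here is a minimal sketch. It assumes that, for dates after February 1900, Excel serials count days from 30 December 1899 (this base reproduces 44927 for 1 January 2023). `serial_to_date`, `days_late` and `EXCEL_BASE` are hypothetical names used only for illustration.

```python
import datetime

# Assumption: for dates after February 1900, Excel serials count days from 1899-12-30
EXCEL_BASE = datetime.date(1899, 12, 30)
LOAN_PERIOD = 14  # standard loan period in days

def serial_to_date(serial):
    """Convert an Excel epoch serial (int or integer string) to a date."""
    return EXCEL_BASE + datetime.timedelta(days=int(serial))

def days_late(date_of_loan, date_of_return):
    """Days beyond the standard loan period; 0 if returned on time."""
    days_borrowed = int(date_of_return) - int(date_of_loan)
    return max(0, days_borrowed - LOAN_PERIOD)

print(serial_to_date(44927))         # 2023-01-01
print(serial_to_date(45291))         # 2023-12-31
print(days_late(44927, 44927 + 19))  # 5
```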

  
There is a separate notebook containing functions that may be useful that relate to Excel epoch dates and the more usual ‘YYYY-MM-DD’ format. It is possible to do the tasks without the ‘YYYY-MM-DD’ format, but you may wish to convert dates for your own convenience in understanding the data.

Statistics received from the library tell us some facts about the files provided by them. These are listed below for your information.

**books.csv**
- the highest book number in the collection is 120.
- the highest member ID number of a borrower is 200.
  
**bookloans.csv**
- The number of invalid loans (not returned) is 51.
- The maximum days borrowed: 20.
- The minimum days borrowed and returned: 1.
- The lowest book number borrowed: 1.
- The highest book number borrowed: 140.
- The lowest member number of borrowers: 1.
- The highest member id number of a borrower: 200.

  
We need to compute the lost data from the files.

## <span style='color:blue'>**Task 1**</span>
--------------

This task requires reading books.csv and bookloans.csv into nested lists or dictionaries and ensuring you have a list of valid book loans. It requires processing to produce a printed report. You need to consider the validity of the data used in your report or your statistics will be wrong.

A report of **books borrowed in 2023** is required, listing the returned book number, book title, author, and the number of days borrowed. This should be a valid list excluding the false books and non-returned loans. We want to identify and show the least popular and most popular books.

What is the best measure for the popularity of a book in a lending library, the number of times it is borrowed in a year, or the number of days for which it was borrowed in a year?

That is a design issue.

- **I.** The number of times a book is borrowed in 2023 provides a measure of its overall popularity and demand among library members. This measure can determine which books:
  
   - a. we purchase additional copies of,
   - b. we feature in promotions or displays, and
   - c. we recommend to members looking for recommendations.

- **II.** The number of days for which a book is borrowed in a year provides a measure of its sustained popularity and engagement among library members. This measure can be useful in determining which books to keep in the library's collection over time, as books that are consistently checked out for longer periods may be more valuable to the library's members.

To measure the number of times a book is borrowed, you can simply count the number of times each book appears on the loan list in 2023. This measure is relatively easy to obtain as it only requires tallying the number of times each book is checked out for a loan.

To measure the number of days for which a book is borrowed, you need to consult the record of the dates on which each book is checked out and returned, and then calculate the number of days between those dates, i.e. the period of the loan. This measure arguably requires more effort, as it involves tracking individual loans and calculating the length of time for each loan. The file bookloans.csv is a transaction file in order of the date a book is borrowed. We wish you to process that file to compute the **number of days for which each book was borrowed.**
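The two measures can rank books differently. The sketch below computes both from a small hypothetical sample of valid loans; the helper names and sample values are assumptions for illustration, with field names mirroring the loan dictionaries used later in this notebook.

```python
from collections import Counter, defaultdict

# Hypothetical sample of valid loans (Excel epoch serials, illustration only)
sample_loans = [
    {'book_number': '16', 'date_of_loan': 44927, 'date_of_return': 44929},
    {'book_number': '31', 'date_of_loan': 44927, 'date_of_return': 44947},
    {'book_number': '16', 'date_of_loan': 44950, 'date_of_return': 44953},
]

def times_borrowed(loans):
    """Measure I: how many loans each book appears in."""
    return Counter(loan['book_number'] for loan in loans)

def days_borrowed(loans):
    """Measure II: total days on loan per book (return serial minus loan serial)."""
    totals = defaultdict(int)
    for loan in loans:
        totals[loan['book_number']] += loan['date_of_return'] - loan['date_of_loan']
    return dict(totals)

print(times_borrowed(sample_loans))  # Counter({'16': 2, '31': 1})
print(days_borrowed(sample_loans))   # {'16': 5, '31': 20}
```

Here book 16 leads on times borrowed (2 loans against 1), while book 31 leads on days borrowed (20 days against 5), which is why the choice of measure matters.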

Popularity is measured by the **number of days for which each book was borrowed** in 2023.

You may disagree with this. That is okay. You are welcome to argue your counter-case under the quality section of the assessment. But for this task, popularity is the number of days for which each book is borrowed.

*Note: If you are doing a reassessment, this measure may be different from the presentation that you studied and were assessed for this assignment. This measure supersedes the previous requirement.*

We require the listing of books sorted by **reverse order** of their popularity.
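One way to produce the required ordering is sketched below, under two assumptions: "reverse order" is read as descending order of days borrowed (most popular first), and ties are broken by ascending book number. `rank_by_popularity` and the sample totals are hypothetical.

```python
# Hypothetical per-book totals of days borrowed (illustration only)
days_by_book = {'16': 5, '31': 20, '57': 12}

def rank_by_popularity(days_by_book):
    """Return (book_number, days) pairs, most days first.

    Assumptions: 'reverse order' means descending order of days borrowed,
    and ties are broken by ascending (numeric) book number.
    """
    return sorted(days_by_book.items(), key=lambda item: (-item[1], int(item[0])))

ranking = rank_by_popularity(days_by_book)
print(ranking)                        # [('31', 20), ('57', 12), ('16', 5)]
print('Most popular:', ranking[0])    # Most popular: ('31', 20)
print('Least popular:', ranking[-1])  # Least popular: ('16', 5)
```

Books in books.csv with no valid loans would not appear in such a dictionary; seeding it with 0 for every catalogue book would let them show up as the least popular.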

Write code to accomplish this. Use functions where appropriate and comment on your code often to show your assessor what you are doing and why.


-----------------------------

## **Solution:**
To start with, I validate the provided statistics:

-------------------


**books.csv**
- the highest book number in the collection is 120.
- the highest member ID number of a borrower is 200.

**bookloans.csv**
- The number of invalid loans (not returned) is 51.
- The maximum days borrowed: 20.
- The minimum days borrowed and returned: 1.
- The lowest book number borrowed: 1.
- The highest book number borrowed: 140.
- The lowest member number of borrowers: 1.
- The highest member id number of a borrower: 200.



----------------------------
### **Important**:
**I have opted not to use the formal structure of declaring all functions at the top of the notebook and then calling them from a main body. Instead, I define a function (or functions) for each operation or task and call it immediately to generate the required results. As this is an educational assignment, this structure lets me take one task at a time and solve it.**

----------------


- The functions below cross-validate the provided statistics and provide the necessary evidence of function testing, of loading the CSV files for analysis, and of checking and cleaning the data.
- The functions that convert Excel date formats and print lists and dictionaries include exception handling.
- All the functions include the necessary docstrings and comments.

_______________
#### **Reading csv files**
_____________


In [1]:
import csv

# Step 1: Read the data from books.csv and bookloans.csv files

def read_books(filename):
    """
    Reads the books.csv file and creates a dictionary with book IDs as keys and book details as values.
    Args:
        filename (str): The filename of books.csv, stored in the same location as the working directory.
    Returns:
        dict: A dictionary containing book details.
    """
    book_data = {}
    with open(filename, 'r', encoding='utf-8-sig') as file:
        reader = csv.DictReader(file)
        for row in reader:
            book_id = row['Number']
            book_title = row['Title']
            book_author = row['Author']
            book_genre = row['Genre']
            book_subgenre = row['SubGenre']
            book_publisher = row['Publisher']
            book_data[book_id] = {'title': book_title, 'author': book_author, 'genre': book_genre, 'subgenre': book_subgenre, 'publisher': book_publisher}
    return book_data




def read_loans(filename):
    """
    Reads the bookloans.csv file and creates a list of dictionaries with loan details.
    Args:
        filename (str): The filename of bookloans.csv, stored in the same location as the working directory.
    Returns:
        list: A list of dictionaries containing loan details.
    """
    loan_data = []
    with open(filename, 'r', encoding='utf-8-sig') as file:
        reader = csv.reader(file)
        for row in reader:
            book_number, member_number, date_of_loan, date_of_return = row
            loan_data.append({'book_number': book_number, 'member_number': member_number,
                              'date_of_loan': int(date_of_loan), 'date_of_return': int(date_of_return)})
    return loan_data



In [2]:
book_data = read_books('books.csv')
loan_data = read_loans('bookloans.csv')
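`read_loans` above assumes every row has exactly four fields that convert with `int()`; a malformed row would raise an exception and stop the whole read. A more tolerant variant is sketched below as an assumption, not as the notebook's method: `read_loans_tolerant` is a hypothetical name, and it takes any iterable of CSV lines so that it can be tried on an in-memory sample instead of bookloans.csv.

```python
import csv
import io

def read_loans_tolerant(lines):
    """Parse loan rows, separating well-formed rows from malformed ones.

    Hypothetical variant of read_loans: `lines` is any iterable of CSV text
    lines (e.g. an open file or io.StringIO).
    """
    loans, rejected = [], []
    for row in csv.reader(lines):
        if len(row) != 4:  # wrong number of fields (including blank lines)
            rejected.append(row)
            continue
        try:
            values = [int(field) for field in row]
        except ValueError:  # a field that does not evaluate to an integer
            rejected.append(row)
            continue
        book, member, loan_date, return_date = values
        # Book and member numbers are kept as strings, as in read_loans
        loans.append({'book_number': str(book), 'member_number': str(member),
                      'date_of_loan': loan_date, 'date_of_return': return_date})
    return loans, rejected

sample = io.StringIO("16,126,44927,44944\n31,192,44927,x\n57,199\n")
loans, rejected = read_loans_tolerant(sample)
print(len(loans), len(rejected))  # 1 2
```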

_________________

**Statistics received from the library tell us some facts about the files provided by them. We are going to calculate them to verify the results:**

_______________________

#### **books.csv**

____________
### **1. The highest book number:**

The highest book number in the collection can be taken as the number of records in the dataset (excluding the header). This relies on 'Number' being the primary key and, since the library states that books.csv has no errors, on the assumption that the numbers run from 1 with no gaps; a primary key on its own does not rule out gaps.

With that assumption, the number of records equals the highest book number, so we can use the len() function to calculate it.

In [3]:
print("The highest book number in the collection is: {}".format(len(book_data)))

The highest book number in the collection is: 120
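Rather than relying on the no-gaps assumption, the highest book number and any gaps can be checked directly from the keys of `book_data`. A minimal sketch, where `check_book_numbers` and the miniature catalogue are hypothetical:

```python
def check_book_numbers(book_data):
    """Return (highest book number, sorted list of numbers missing below it).

    book_data is a dict keyed by the 'Number' column as strings, as read_books builds it.
    """
    numbers = {int(key) for key in book_data}
    highest = max(numbers)
    missing = sorted(set(range(1, highest + 1)) - numbers)
    return highest, missing

# Hypothetical miniature catalogue with book 3 absent
sample_books = {'1': {}, '2': {}, '4': {}}
print(check_book_numbers(sample_books))  # (4, [3])
```

If the len() assumption holds for books.csv, `check_book_numbers(book_data)` should return 120 and an empty list.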


_________________
### **2. The highest member ID number of a borrower:**

***Note:*** *In the task this statistic is listed under books.csv, but member numbers appear only in the bookloans.csv dataset; books.csv contains only data about books.*

In [4]:
def highest_member_id(loan_data):
    """
    Finds the highest member ID of borrowers in the loan data.
    Args:
        loan_data (list): A list of dictionaries containing loan details.
    Returns:
        int: The highest member ID of borrowers.
    """
    max_member_id = 0  # Initialize with 0, as member IDs are positive integers
    for loan in loan_data:
        member_id = int(loan['member_number'])
        if member_id > max_member_id:
            max_member_id = member_id
    return max_member_id


highest_id = highest_member_id(loan_data)
print("The highest member ID of borrowers is:", highest_id)


The highest member ID of borrowers is: 200
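The same result can be obtained with the built-in `max()` over a generator expression. The sketch also shows why the `int()` conversion matters: member numbers are read as strings, and strings compare character by character. `highest_member_id_builtin` and the sample rows are hypothetical.

```python
def highest_member_id_builtin(loan_data):
    """Same result as highest_member_id, using max() over a generator expression."""
    return max(int(loan['member_number']) for loan in loan_data)

# Hypothetical sample rows
sample = [{'member_number': '126'}, {'member_number': '200'}, {'member_number': '9'}]

print(highest_member_id_builtin(sample))                # 200
# Without int(), '9' > '200' because '9' > '2' as characters
print(max(loan['member_number'] for loan in sample))    # 9
```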


__________________


#### **Statistics provided by the Library** *bookloans.csv*
1. The number of invalid loans (not returned) is 51.
2. The maximum days borrowed: 20.
3. The minimum days borrowed and returned: 1.
4. The lowest book number borrowed: 1.
5. The highest book number borrowed: 140.
6. The lowest member number of borrowers: 1.
7. The highest member id number of a borrower: 200.

_____________________

In the following steps, I cross-validate these statistics. If they match, we can reasonably assume the datasets have been processed correctly; this is also an opportunity to test my functions.

### **1. The number of invalid loans (not returned)**

We split the dataset into valid and invalid records using the criterion "Only transactions of books borrowed from our library and returned during the year of the transaction are valid." While iterating over the records, I store each one in the appropriate list.

In [5]:
# Filter out invalid loans (not returned during the year 2023) and 
# save as "invalid_loans" and the valid data saved in "valid_loans"

def filter_valid_loans(loan_data):
    """
    Filters out invalid loans (not returned during the year 2023).
    Args:
        loan_data (list): A list of dictionaries containing loan details.
    Returns:
        tuple: (valid_loans, invalid_loans), two lists of loan dictionaries.
    """
    valid_loans = []
    invalid_loans = []
    
    for loan in loan_data:
        if loan['date_of_return'] > 0 and 44927 <= loan['date_of_loan'] <= 45291:
            valid_loans.append(loan)
        else:
            invalid_loans.append(loan)
            
    return valid_loans, invalid_loans

Valid loans are those with a positive (non-zero) return date and a loan date between 01/01/2023 and 31/12/2023, i.e. Excel epoch dates 44927 to 45291. I continue to work in Excel epoch dates because calculations with integers are easier, and convert to YYYY-MM-DD only where necessary.

Any invalid records are added to a separate list, invalid_loans.
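`filter_valid_loans` tests only that the return date is non-zero and the loan date falls in 2023, which is what reproduces the 51 invalid loans quoted by the library. The task also asks for false books to be excluded from the report. A stricter predicate is sketched below as an assumption; it would change the counts above, so it is not what produced them. `is_valid_loan`, `YEAR_START`, `YEAR_END` and the sample data are hypothetical.

```python
YEAR_START, YEAR_END = 44927, 45291  # Excel serials for 2023-01-01 and 2023-12-31

def is_valid_loan(loan, catalogue):
    """Hypothetical stricter check on one loan dictionary.

    Valid if the book exists in the catalogue (book_data keyed by book number),
    both dates fall in 2023 (so a 0 return date fails), and the return is not
    before the loan. Assumption: a same-day return is accepted.
    """
    loan_day, return_day = loan['date_of_loan'], loan['date_of_return']
    return (loan['book_number'] in catalogue
            and YEAR_START <= loan_day <= YEAR_END
            and YEAR_START <= return_day <= YEAR_END
            and return_day >= loan_day)

catalogue = {'16': {}, '31': {}}
sample_loans = [
    {'book_number': '16', 'date_of_loan': 44927, 'date_of_return': 44944},   # valid
    {'book_number': '31', 'date_of_loan': 44930, 'date_of_return': 0},       # not returned
    {'book_number': '140', 'date_of_loan': 44940, 'date_of_return': 44950},  # not in catalogue
]
print([is_valid_loan(loan, catalogue) for loan in sample_loans])  # [True, False, False]
```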

In [6]:
# Unpack both lists returned by the function so they are available in the main code (no global variables needed).
valid_loans, invalid_loans = filter_valid_loans(loan_data)

In [7]:
number_invalid_loans = len(invalid_loans)
print('1. The number of invalid loans (not returned) is ', number_invalid_loans)

1. The number of invalid loans (not returned) is  51


***The result verified the provided value*** 

__________________


### **2. The maximum days borrowed:**
The following function calculates the maximum number of days borrowed and collects every loan with that duration in the max_borrowed_details list. At the moment I only need the number of days borrowed, but each entry also records the book number and member number for future reference. I print the maximum number of days borrowed and the number of such occurrences. If needed for analysis, the full list can be printed with print(calculate_max_borrowed_days_with_each_details(valid_loans)).

In [8]:
def calculate_max_borrowed_days_with_each_details(valid_loans):
    """
    Calculates the maximum number of days a book was borrowed in 2023 along with book and member details.
    Args:
        valid_loans (list): A list of dictionaries containing valid loan details.
    Returns:
        list: A list containing tuples of maximum borrowed days, book numbers, and member IDs for all occurrences.
    """
    max_borrowed_days = 0  # Initialize max borrowed days to 0
    max_borrowed_details = []  # Initialize list to store details of the book(s) and member(s) with max borrowed days
    
    for loan in valid_loans:  # Iterate over each valid loan
        # Calculate the number of days the book was borrowed by subtracting the loan date from the return date
        days_borrowed = loan['date_of_return'] - loan['date_of_loan']
        
        # If the current loan's borrowed days exceed the current maximum, update max borrowed days and associated book number and member ID
        if days_borrowed > max_borrowed_days:
            max_borrowed_days = days_borrowed
            max_borrowed_details = [(days_borrowed, loan['book_number'], loan['member_number'])]
        elif days_borrowed == max_borrowed_days:
            # If there are multiple occurrences of maximum borrowed days, append details to the list
            max_borrowed_details.append((days_borrowed, loan['book_number'], loan['member_number']))
    
    return max_borrowed_details
    
max_details = calculate_max_borrowed_days_with_each_details(valid_loans)  # compute once, reuse below
print("The max number of days borrowed: {}".format(max_details[0][0]))
print("(Number of similar occurrences of max_days_borrowed: {})".format(len(max_details)))
#print(max_details)

The max number of days borrowed: 20
(Number of similar occurrences of max_days_borrowed: 98)


***The result verified the provided statistical value***
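An alternative to the single-pass loop is to compute every loan duration once, then use the built-in `max()` followed by a filter to collect ties; swapping `max` for `min` gives the minimum in the same way. `loans_with_max_days` and the sample loans are hypothetical.

```python
def loans_with_max_days(valid_loans):
    """Return (max_days, loans with that duration) using max() and then a filter."""
    durations = [loan['date_of_return'] - loan['date_of_loan'] for loan in valid_loans]
    max_days = max(durations)
    ties = [loan for loan, days in zip(valid_loans, durations) if days == max_days]
    return max_days, ties

# Hypothetical sample of valid loans
sample = [
    {'book_number': '16', 'member_number': '126', 'date_of_loan': 44927, 'date_of_return': 44947},
    {'book_number': '31', 'member_number': '192', 'date_of_loan': 44927, 'date_of_return': 44930},
    {'book_number': '57', 'member_number': '199', 'date_of_loan': 44930, 'date_of_return': 44950},
]
max_days, ties = loans_with_max_days(sample)
print(max_days, len(ties))  # 20 2
```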
_____________________


### **3. The minimum days borrowed and returned**

In [9]:
def calculate_min_borrowed_days_with_each_details(valid_loans):
    """
    Calculates the minimum number of days a book was borrowed in 2023 along with book and member details.
    Args:
        valid_loans (list): A list of dictionaries containing valid loan details.
    Returns:
        list: A list containing tuples of minimum borrowed days, book numbers, and member IDs for all occurrences.
    """
    min_borrowed_days = 365  # Initialize to 365, as no valid loan within 2023 can last longer than a year
    min_borrowed_details = []  # Initialize list to store details of the book(s) and member(s) with min borrowed days
    
    for loan in valid_loans:  # Iterate over each valid loan
        # Calculate the number of days the book was borrowed by subtracting the loan date from the return date
        days_borrowed = loan['date_of_return'] - loan['date_of_loan']
       
        # If the current loan's borrowed days less than the current minimum, update min borrowed days and associated book number and member ID
        if days_borrowed < min_borrowed_days:
            min_borrowed_days = days_borrowed
            min_borrowed_details = [(days_borrowed, loan['book_number'], loan['member_number'])]
        elif days_borrowed == min_borrowed_days:
            # If there are multiple occurrences of minimum borrowed days, append details to the list
            min_borrowed_details.append((days_borrowed, loan['book_number'], loan['member_number']))
    
    return min_borrowed_details
    
min_details = calculate_min_borrowed_days_with_each_details(valid_loans)  # compute once, reuse below
print("The min number of days borrowed: {}".format(min_details[0][0]))
print("(The number of similar occurrences: {})".format(len(min_details)))
#print(min_details)

The min number of days borrowed: 1
(The number of similar occurrences: 109)


_____________________

***NOTE*** The function *calculate_min_borrowed_days_with_each_details* returns a list of tuples, one for each loan with the minimum number of days borrowed. Each tuple contains:

- the number of days borrowed (days_borrowed), accessed with [0] within the tuple;
- the book number (book_number), accessed with [1] within the tuple;
- the member number (member_number), accessed with [2] within the tuple.

  *In this case I am only validating the minimum number of days borrowed, so I print the first element of the first tuple, [0][0].*

  

***The result verified the provided statistical value***
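Index chains such as [0][0] are easy to misread. A `collections.namedtuple` with the same field order keeps positional access working while also allowing access by name; `LoanDuration` and the sample records are hypothetical.

```python
from collections import namedtuple

# Hypothetical record type mirroring the (days_borrowed, book_number, member_number) tuples
LoanDuration = namedtuple('LoanDuration', ['days_borrowed', 'book_number', 'member_number'])

details = [LoanDuration(1, '16', '126'), LoanDuration(1, '31', '192')]
print(details[0].days_borrowed)  # 1
print(details[0][0])             # 1 (positional indexing still works)
print(details[1].book_number)    # 31
```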
________


### **4. The lowest book number borrowed:**

________________

***Note:*** To initialize min_book_number, I use positive infinity (float('inf')). It is guaranteed to be greater than any numeric book number in the dataset, so the first book number examined always replaces it, and any real book number compared with it is taken as the new minimum.
During the iteration, whenever a book number smaller than the current min_book_number is found, min_book_number is updated accordingly.
________________


In [10]:
def lowest_book_number_borrowed(valid_loans):
    """
    Find the lowest book number in the valid loans.
    Args:
        valid_loans (list): A list of dictionaries containing loan details.
    Returns:
        list: (book_number, member_number) tuples for every loan of the lowest book number.
    """
    
    # Initialize min_book_number with positive infinity
    min_book_number = float('inf')
    min_book_number_details = []
    
    for loan in valid_loans:
        book_number = int(loan['book_number'])  # Convert book number to integer for comparison
        if book_number < min_book_number:
            min_book_number = book_number
            min_book_number_details = [(book_number, loan['member_number'])]
        elif book_number == min_book_number:
            min_book_number_details.append((book_number, loan['member_number']))
            
    return min_book_number_details

lowest_book_loans = lowest_book_number_borrowed(valid_loans)  # compute once, reuse below
print("The lowest book number borrowed: {}".format(lowest_book_loans[0][0]))
print("(The number of loans of the lowest book number: {})".format(len(lowest_book_loans)))



The lowest book number borrowed: 1
(The number of loans of the lowest book number: 22)



***NOTE*** lowest_book_number_borrowed(valid_loans) returns a list of tuples, where each tuple contains a book number and a member number.

- [0] accesses the first tuple in the list.
- [0][0] accesses the first element (book number) of the first tuple.


***The result verified the provided statistical value***
___________


### **5. The highest book number borrowed:**

In [11]:
def highest_book_number_borrowed(valid_loans):
    """
    Find the highest book number borrowed in the valid loans.
    Args:
        valid_loans (list): A list of dictionaries containing loan details.
    Returns:
        list: (book_number, member_number) tuples for every loan of the highest book number.
    """
    
    # Initialize highest_book_number with 0 (as the book number is an integer and 1 is the lowest possible book number)
    highest_book_number = 0
    highest_book_number_details = []
    
    for loan in valid_loans:
        book_number = int(loan['book_number'])  # Convert book number to integer for comparison
        if book_number > highest_book_number:
            highest_book_number = book_number
            highest_book_number_details = [(book_number, loan['member_number'])]
        elif book_number == highest_book_number:
            highest_book_number_details.append((book_number, loan['member_number']))
            
    return highest_book_number_details

highest_book_loans = highest_book_number_borrowed(valid_loans)  # compute once, reuse below
print("The highest book number borrowed: {}".format(highest_book_loans[0][0]))
print("(The number of similar occurrences: {})".format(len(highest_book_loans)))

The highest book number borrowed: 140
(The number of similar occurrences: 6)


***The result verified the provided statistical value***
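Since the highest book number borrowed (140) exceeds the highest book number in the collection (120), some loans refer to books that are not in books.csv, i.e. the false books the report must exclude. A set difference lists them. `unknown_book_numbers` and the sample data are hypothetical, and the catalogue is assumed to hold books 1 to 120 as stated for books.csv.

```python
def unknown_book_numbers(loans, catalogue):
    """Book numbers that appear in loans but not in the catalogue, sorted numerically."""
    return sorted({loan['book_number'] for loan in loans} - set(catalogue), key=int)

# Assumption: books 1..120, keyed by string as read_books does
catalogue = {str(n): {} for n in range(1, 121)}
# Hypothetical loans
loans = [{'book_number': '16'}, {'book_number': '140'},
         {'book_number': '121'}, {'book_number': '140'}]
print(unknown_book_numbers(loans, catalogue))  # ['121', '140']
```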
__________


### **6. The lowest member number of borrowers:**

In [12]:
def lowest_member_id(loan_data):
    """
    Finds the lowest member ID of borrowers in the loan data.
    Args:
        loan_data (list): A list of dictionaries containing loan details.
    Returns:
        int: The lowest member ID of borrowers.
    """
    
    # Initialize with positive infinity so the first member ID examined always replaces it
    min_member_id = float('inf')
    for loan in loan_data:
        member_id = int(loan['member_number'])
        if member_id < min_member_id:
            min_member_id = member_id
    return min_member_id


lowest_id = lowest_member_id(loan_data)
print("The lowest member number of borrowers is:", lowest_id)


The lowest member number of borrowers is: 1


***The result verified the provided statistical value***
___________


### **7. The highest member ID number of a borrower:**


In [13]:
highest_id = highest_member_id(loan_data)
print("The highest member ID of borrowers is:", highest_id)

The highest member ID of borrowers is: 200


***The result verified the provided statistical value***
___________


___________


### **Computing the lost data:**


Below I print the invalid_loans dataset that was filtered out earlier. It contains the loans whose return date is zero, i.e. books that were not returned within the year. For easier reading, I convert the Excel epoch dates into YYYY-MM-DD format; the same function can be used to convert the dates in loan_data as well.

In [14]:


import datetime

def excel_to_date(excel_date):
    """
    Converts an Excel Epoch Format date (the number of days counted from the Excel start date)
    into a string in "YYYY-MM-DD" format. A value of 0 (book not returned) is returned unchanged.
    """
    try:
        if int(excel_date) == 0:
            return 0
        excel_epoch_start = datetime.datetime(1899, 12, 30)  # base that maps 44927 to 2023-01-01
        py_date = excel_epoch_start + datetime.timedelta(days=int(excel_date))
        return py_date.strftime('%Y-%m-%d')
    except (ValueError, OverflowError):
        return "Invalid Date"

def loans_converted(loan_data):
    """
    Converts the Excel epoch dates of every loan into "YYYY-MM-DD" strings. It does not filter
    anything: a date_of_return of 0 (book not returned within the year) stays 0.
    Args:
        loan_data (list): A list of dictionaries containing loan details.
    Returns:
        list: A list of dictionaries of loans, containing converted dates.
    """
    loans_converted_dates = []
    
    for row in loan_data:
        try:
            book_number = row['book_number']
            member_number = row['member_number']
            date_of_loan = int(row['date_of_loan']) #converting strings to integer
            date_of_return = int(row['date_of_return']) #converting strings to integer

            # A date_of_return of 0 means the book has not been returned, so keep it as 0;
            # otherwise convert it (excel_to_date would also return 0 for a 0 input).
            date_of_return_converted = 0 if date_of_return == 0 else excel_to_date(date_of_return)
            date_of_loan_converted = excel_to_date(date_of_loan)
            #converted dates will be stored in a separate list of dictionaries
            loans_converted_dates.append({'book_number': book_number, 'member_number': member_number, 'date_of_loan': date_of_loan_converted, 'date_of_return': date_of_return_converted})
        except KeyError as e:
            print(f"KeyError: Missing key {e} in the loan data.")
        except ValueError as e:
            print(f"ValueError: {e}")
                
    return loans_converted_dates
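For checking results in the other direction, the inverse conversion from a YYYY-MM-DD string back to an Excel epoch serial is sketched below. It assumes the same 1899-12-30 base as `excel_to_date`; `date_to_excel` is a hypothetical helper, and `datetime.date.fromisoformat` requires Python 3.7 or later.

```python
import datetime

EXCEL_BASE = datetime.date(1899, 12, 30)  # same base as excel_to_date

def date_to_excel(iso_date):
    """Convert a 'YYYY-MM-DD' string back to an Excel epoch serial (hypothetical helper)."""
    day = datetime.date.fromisoformat(iso_date)
    return (day - EXCEL_BASE).days

print(date_to_excel('2023-01-01'))  # 44927
print(date_to_excel('2023-12-31'))  # 45291
```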


#### **Custom functions for printing lists, dictionaries and tuples with various options:**

In [15]:
# I have created additional functions to print lists, dictionaries and tuples in a meaningful way.

# They use try/except error handling as well.

# I use these functions where needed to avoid printing screenfuls of large datasets.

def printing_dict_10_lines(my_dict):
    """
    This function will print only the first 10 lines of the dictionary.
    Args:
        my_dict: A dictionary
    Returns:
        print out only the first 10 lines of the dictionary.
    """
    try:
        if not isinstance(my_dict, dict):
            raise TypeError("Input is not a dictionary.")
        keys = list(my_dict.keys())[:10]
        for key in keys:
            print(key, my_dict[key])
    except TypeError as e:
        print(f"TypeError: {e}")
    except Exception as e:
        print(f"An error occurred: {e}")
        
def printing_dict_bottom_10_lines(my_dict):
    """
    This function will print only the bottom 10 lines of the dictionary.
    Args:
        my_dict: A dictionary
    Returns:
        print out only the bottom 10 lines of the dictionary.
    """
    try:
        if not isinstance(my_dict, dict):
            raise TypeError("Input is not a dictionary.")
        keys = list(my_dict.keys())[-10:]  # last 10 keys; [-10:-1] would skip the final one
        for key in keys:
            print(key, my_dict[key])
    except TypeError as e:
        print(f"TypeError: {e}")
    except Exception as e:
        print(f"An error occurred: {e}")
        
def printing_list_10_lines(my_list):
    """
    This function will print only the first 10 lines of the list.
    Args:
        my_list: A List
    Returns:
        print out only the first 10 lines of the list.
    """   
    try:
        if not isinstance(my_list, list):
            raise TypeError("Input is not a list.")
        first_10_items = my_list[:10]
        for item in first_10_items:
            print(item)
    except TypeError as e:
        print(f"TypeError: {e}")
    except Exception as e:
        print(f"An error occurred: {e}")

def printing_list_bottom_10_items(my_list):
    """
    This function will print only the bottom 10 items of the list.
    Args:
        my_list: A list
    Returns:
        print out only the bottom 10 items of the list.
    """
    try:
        if not isinstance(my_list, list):
            raise TypeError("Input is not a list.")
        for item in my_list[-10:]:
            print(item)
    except TypeError as e:
        print(f"TypeError: {e}")
    except Exception as e:
        print(f"An error occurred: {e}")


def printing_list(my_list):
    """
    This function prints the number of rows of the list requested by the user. The prompt shows the
    maximum, which is the size of the list, and the minimum, which is 1. Any other number exits
    without printing the table. The items are expected to be dictionaries with 'book_number',
    'title', 'author' and 'borrowed_days' keys.
    Args:
        my_list: A list
    Returns:
        None. Prints the first n rows of the list as a table.
    """   
    n = 0
    try:
        if not isinstance(my_list, list):
            raise TypeError("Input is not a list.")
        n = int(input(f"Enter the number of rows to print (min=1, max={len(my_list)}); any value out of range exits: "))
        print("\n")
        if n > 0:
            if n > len(my_list):
                print("Too high number")
                return
            else:
                # To print like in columns with specified left alignment values in placeholders
                print("{:<15} {:<40} {:<30} {:<15}".format('Book Number', 'Title', 'Author', 'Borrowed Days'))
                first_n_items = my_list[:n]
                for item in first_n_items:
                    # formatted output to print with gaps to visually available as a table
                    print("{:<15} {:<40} {:<30} {:<15}".format(item['book_number'], item['title'], item['author'], item['borrowed_days']))
    except ValueError:
        print("Invalid input. Please enter a valid number next time.")
    except TypeError as e:
        print(f"TypeError: {e}")
    except Exception as e:
        print(f"An error occurred: {e}")
    return

def printing_tuple(my_tuple, num_lines):
    """
    This function prints out the given number of lines from the tuple.
    Args:
        my_tuple: A tuple
        num_lines: Number of lines to print
    Returns:
        None
    """
    try:
        if not isinstance(my_tuple, tuple):
            raise TypeError("Input is not a tuple.")
        
        print("Tuple elements:")
        for i, item in enumerate(my_tuple):
            if i >= num_lines:
                break
            print(item)
    
    except TypeError as e:
        print(f"TypeError: {e}")
    except Exception as e:
        print(f"An error occurred: {e}")
   


### **The lost data (invalid_loans)**

In [16]:
print("Invalid dataset First 10 lines")
print("==========================")
print("Size of the dataset: ", len(loans_converted(invalid_loans)), "\n")
printing_list_10_lines(loans_converted(invalid_loans))


Invalid dataset First 10 lines
Size of the dataset:  51 

{'book_number': '85', 'member_number': '119', 'date_of_loan': '2023-01-09', 'date_of_return': 0}
{'book_number': '131', 'member_number': '186', 'date_of_loan': '2023-01-10', 'date_of_return': 0}
{'book_number': '58', 'member_number': '24', 'date_of_loan': '2023-01-23', 'date_of_return': 0}
{'book_number': '105', 'member_number': '30', 'date_of_loan': '2023-01-24', 'date_of_return': 0}
{'book_number': '89', 'member_number': '86', 'date_of_loan': '2023-02-15', 'date_of_return': 0}
{'book_number': '43', 'member_number': '150', 'date_of_loan': '2023-02-20', 'date_of_return': 0}
{'book_number': '132', 'member_number': '199', 'date_of_loan': '2023-03-02', 'date_of_return': 0}
{'book_number': '41', 'member_number': '110', 'date_of_loan': '2023-03-09', 'date_of_return': 0}
{'book_number': '56', 'member_number': '194', 'date_of_loan': '2023-03-15', 'date_of_return': 0}
{'book_number': '92', 'member_number': '149', 'date_of_loan': '2023-0

#### **Function testing**

 - **Testing by passing a list to the dictionary-printing function**

In [17]:
# To validate the error handling, pass a list to the dictionary-printing function.
# It should trigger the TypeError handling and print the appropriate message.

printing_dict_10_lines(loans_converted(invalid_loans))

TypeError: Input is not a dictionary.


 - **Testing by printing the first 10 lines of loan_data, which also includes books not yet returned**

In [18]:
print("Loan dataset first 10 lines")
print("==========================")
print("Size of the dataset: ", len(loans_converted(loan_data)), "\n")
printing_list_10_lines(loans_converted(loan_data))

Loan dataset first 10 lines
Size of the dataset:  2209 

{'book_number': '16', 'member_number': '126', 'date_of_loan': '2023-01-01', 'date_of_return': '2023-01-18'}
{'book_number': '31', 'member_number': '192', 'date_of_loan': '2023-01-01', 'date_of_return': '2023-01-19'}
{'book_number': '57', 'member_number': '199', 'date_of_loan': '2023-01-01', 'date_of_return': '2023-01-21'}
{'book_number': '100', 'member_number': '140', 'date_of_loan': '2023-01-01', 'date_of_return': '2023-01-15'}
{'book_number': '100', 'member_number': '39', 'date_of_loan': '2023-01-01', 'date_of_return': '2023-01-08'}
{'book_number': '114', 'member_number': '196', 'date_of_loan': '2023-01-01', 'date_of_return': '2023-01-12'}
{'book_number': '114', 'member_number': '171', 'date_of_loan': '2023-01-01', 'date_of_return': '2023-01-20'}
{'book_number': '138', 'member_number': '109', 'date_of_loan': '2023-01-01', 'date_of_return': '2023-01-04'}
{'book_number': '10', 'member_number': '150', 'date_of_loan': '2023-01-02',

__________________


 - **Testing by printing the bottom 10 lines of loan_data, which also includes books not yet returned**

In [19]:
print("Loan dataset bottom 10 lines")
print("==========================")
print("Size of the dataset: ", len(loan_data), "\n")
printing_list_bottom_10_items(loans_converted(loan_data))

Loan dataset bottom 10 lines
Size of the dataset:  2209 

{'book_number': '53', 'member_number': '150', 'date_of_loan': '2023-12-29', 'date_of_return': '2023-12-30'}
{'book_number': '83', 'member_number': '143', 'date_of_loan': '2023-12-29', 'date_of_return': '2023-12-31'}
{'book_number': '105', 'member_number': '152', 'date_of_loan': '2023-12-29', 'date_of_return': '2023-12-30'}
{'book_number': '115', 'member_number': '13', 'date_of_loan': '2023-12-29', 'date_of_return': '2023-12-30'}
{'book_number': '129', 'member_number': '75', 'date_of_loan': '2023-12-29', 'date_of_return': '2023-12-31'}
{'book_number': '49', 'member_number': '117', 'date_of_loan': '2023-12-30', 'date_of_return': '2023-12-31'}
{'book_number': '62', 'member_number': '130', 'date_of_loan': '2023-12-30', 'date_of_return': '2023-12-31'}
{'book_number': '63', 'member_number': '19', 'date_of_loan': '2023-12-30', 'date_of_return': '2023-12-31'}
{'book_number': '94', 'member_number': '178', 'date_of_loan': '2023-12-30', '

### **Clean data of book loans (valid_loans):**

In [20]:
print("Valid dataset First 10 lines")
print("==========================")
print("Size of the dataset: ", len(valid_loans), "\n")
printing_list_10_lines(loans_converted(valid_loans))

Valid dataset First 10 lines
Size of the dataset:  2158 

{'book_number': '16', 'member_number': '126', 'date_of_loan': '2023-01-01', 'date_of_return': '2023-01-18'}
{'book_number': '31', 'member_number': '192', 'date_of_loan': '2023-01-01', 'date_of_return': '2023-01-19'}
{'book_number': '57', 'member_number': '199', 'date_of_loan': '2023-01-01', 'date_of_return': '2023-01-21'}
{'book_number': '100', 'member_number': '140', 'date_of_loan': '2023-01-01', 'date_of_return': '2023-01-15'}
{'book_number': '100', 'member_number': '39', 'date_of_loan': '2023-01-01', 'date_of_return': '2023-01-08'}
{'book_number': '114', 'member_number': '196', 'date_of_loan': '2023-01-01', 'date_of_return': '2023-01-12'}
{'book_number': '114', 'member_number': '171', 'date_of_loan': '2023-01-01', 'date_of_return': '2023-01-20'}
{'book_number': '138', 'member_number': '109', 'date_of_loan': '2023-01-01', 'date_of_return': '2023-01-04'}
{'book_number': '10', 'member_number': '150', 'date_of_loan': '2023-01-02'

### <span style='color:blue'>**Task 1: Solutions**</span>

### **1.** Report "Books Borrowed in 2023"
I have included every loan, including those from borrowers who did not return their books on time, because I treat each such record as a single loan.

In [21]:
def generate_borrowed_books_report(loan_data, book_data):
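    """
    Generates a report of books borrowed in 2023.
    Args:
        loan_data (list): A list of dictionaries containing loan details.
        book_data (dict): A dictionary containing book details, keyed by book number.
    Returns:
        tuple: (report rows as a list of dictionaries,
                list of book numbers not found in book_data,
                list of book numbers skipped because borrowed days were zero or negative)
    """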

    borrowed_books_report = []
    not_found_book_numbers = []
    invalid_borrowed_days = []
    
    for loan in loan_data:
        book_number = loan['book_number']
        if book_number not in book_data:
            # Skip the record if the book number is not found in book_data,
            # and record the missing book number in a separate list
            not_found_book_numbers.append(book_number)
            continue
        book_title = book_data[book_number]['title']
        book_author = book_data[book_number]['author']
        borrowed_days = loan['date_of_return'] - loan['date_of_loan']
        
        if borrowed_days <= 0: 
            # Skip loans with zero or negative borrowed days (for example, books not yet
            # returned, where date_of_return is 0) and record the book number in a separate list
            invalid_borrowed_days.append(book_number)
            continue
        borrowed_books_report.append({
            'book_number': book_number,
            'title': book_title,
            'author': book_author,
            'borrowed_days': borrowed_days
        })
    if len(invalid_borrowed_days) == 0:
        invalid_borrowed_days.append('No data is found')
    return borrowed_books_report, not_found_book_numbers, invalid_borrowed_days
    
borrowed_books_report, not_found_book_numbers, invalid_borrowed_days = generate_borrowed_books_report(loan_data, book_data)

print("A). The following list of book numbers were skipped from the final list of borrowers,\n   because the book numbers were not found in books details:\n\n", "(The number of repetition is the repeated entries of the same book)\n", not_found_book_numbers, "\n")
print("B). The following book numbers were skipped, as not returned or invalid borrowed days:\n", invalid_borrowed_days, "\n")
print("\n")
print("C). Books Borrowed in 2023 (unsorted data):\n")

printing_list(borrowed_books_report) # user defined function

A). The following list of book numbers were skipped from the final list of borrowers,
   because the book numbers were not found in books details:

 (The number of repetition is the repeated entries of the same book)
 ['138', '140', '130', '130', '128', '134', '137', '134', '136', '131', '122', '132', '128', '124', '124', '122', '133', '132', '132', '132', '122', '125', '134', '138', '139', '124', '128', '136', '131', '129', '138', '138', '129', '130', '134', '140', '134', '131', '138', '133', '134', '135', '136', '132', '129', '132', '132', '134', '137', '129', '123', '130', '135', '126', '131', '130', '134', '136', '134', '122', '128', '131', '124', '129', '133', '129', '133', '132', '122', '126', '126', '137', '138', '131', '122', '132', '133', '126', '127', '126', '134', '122', '126', '134', '132', '135', '130', '128', '122', '133', '126', '134', '128', '133', '128', '131', '122', '132', '122', '139', '127', '131', '124', '131', '130', '129', '138', '122', '131', '138', '134', '122

Enter number of rows needed to print (max=1914 and min=1:), out of range will exit the request  10




Book Number     Title                                    Author                         Borrowed Days  
16              The Trial                                Frank Kafka                    17             
31              The Wealth Of Nations                    Adam Smith                     18             
57              Textbook Of Economic Theory              Alfred Stonier                 20             
100             Doctor In The Nude                       Richard Gordon                 14             
100             Doctor In The Nude                       Richard Gordon                 7              
114             Selected Short Stories                   Unknown                        11             
114             Selected Short Stories                   Unknown                        19             
10              How To Think Like Sherlock Holmes        Maria Konnikova                13             
12              Slaughterhouse Five                      Kurt 

### **2. The most popular books**

Identifying the most popular book is open to debate.

Since the library has only one copy of each book, it is tricky to decide whether the most popular book is the one borrowed for the most days or the one borrowed the most times.

I have considered the following factors:

**Most days borrowed:**
- A high number of days could come from a single member borrowing the book and returning it after a long period, in which case it is debatable whether they even read it; they may simply have forgotten it or taken a long time to read it. Such cases belong in a separate category (recovering the item from the member, or replacing the book if one person holds it for too long) and should be excluded from any statistical review.
- Alternatively, the book could be a highly valuable textbook or educational book that many people want. We cannot tell without looking at the genre and value of the item, and a meaningful statistical evaluation needs historical data for the book.
- If we use 'genre' to categorise books and identify these cases in more detail, we can make better assumptions, as there is no reason for someone to spend a year reading a novel.

**Most times borrowed**
- I consider the most reasonable measure to be the highest loan count, with the length of the borrowed period as the second criterion. This indicates demand and popularity.
- More people being willing to borrow a book indicates that it is more popular.
- We should consider 'genre' here too: a textbook cannot be read that quickly, but a novel can. So if a textbook shows a high number of loans but each is returned after a short period, it could be a poor book.
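The ranking described above (loan count first, total borrowed days as the tie-breaker) can be expressed with a tuple sort key. This is a minimal sketch, assuming each record is a dictionary with the same `loan_count` and `borrowed_days` keys that `get_borrowed_days` produces; `rank_by_loans_then_days` is a hypothetical name, and the sample figures are copied from the loan-count output further down this notebook.

```python
def rank_by_loans_then_days(books):
    """Sort book records by loan_count, then by borrowed_days, both descending."""
    # A tuple key compares loan_count first; borrowed_days only breaks ties.
    return sorted(books, key=lambda b: (b['loan_count'], b['borrowed_days']), reverse=True)


# Sample records (values copied from the sorted loan_count output in this notebook)
sample = [
    {'book_number': '85', 'borrowed_days': 261, 'loan_count': 27},
    {'book_number': '38', 'borrowed_days': 258, 'loan_count': 27},
    {'book_number': '119', 'borrowed_days': 272, 'loan_count': 27},
    {'book_number': '61', 'borrowed_days': 234, 'loan_count': 26},
    {'book_number': '72', 'borrowed_days': 290, 'loan_count': 26},
]

ranked = rank_by_loans_then_days(sample)
print([b['book_number'] for b in ranked])
```

With this key, the three books tied on 27 loans are ordered by borrowed days, so book 119 would rank ahead of book 85; sorting on `loan_count` alone leaves ties in their original insertion order.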

**Conclusion**
- The current statistics, with only one copy of each book, are not enough.
- Several years of statistical data are needed to reach any meaningful conclusion when the library holds only a single copy of each book.
- In both cases we have to consider 'genre' to come to a decision.
- It would be even more accurate to score all three elements (days_borrowed, number_times_borrowed, genre) together. Categorising, standardising and scoring them would give a better picture of popularity.
- Weighing these factors, I lean towards **the number of times** a book was loaned out for the popularity ranking.
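One way to combine the numeric factors into a single score is to standardise each one to the range 0 to 1 (min-max scaling) and take a weighted sum. This is a sketch under stated assumptions: the weights 0.7 and 0.3 are hypothetical, chosen only to lean towards loan count as concluded above; genre is left out because it would need its own scoring rule; and `popularity_scores` is an illustrative name. The sample values are taken from this notebook's outputs for books 85, 43 and 13.

```python
def popularity_scores(books, loan_weight=0.7, days_weight=0.3):
    """Return {book_number: score} from min-max scaled loan_count and borrowed_days.

    books must be a non-empty list of dicts with 'book_number', 'loan_count'
    and 'borrowed_days'. The default weights are illustrative only.
    """
    def scale(values):
        lo, hi = min(values), max(values)
        if hi == lo:  # all values equal: avoid dividing by zero
            return [0.0] * len(values)
        return [(v - lo) / (hi - lo) for v in values]

    loan_scores = scale([b['loan_count'] for b in books])
    day_scores = scale([b['borrowed_days'] for b in books])
    # Weighted sum of the two scaled factors for each book
    return {
        b['book_number']: loan_weight * ls + days_weight * ds
        for b, ls, ds in zip(books, loan_scores, day_scores)
    }


sample = [
    {'book_number': '85', 'borrowed_days': 261, 'loan_count': 27},
    {'book_number': '43', 'borrowed_days': 301, 'loan_count': 24},
    {'book_number': '13', 'borrowed_days': 3, 'loan_count': 1},
]
scores = popularity_scores(sample)
for number, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(number, round(score, 3))
```

Min-max scaling is only one standardisation choice; z-scores or ranks would also work, and each would weight outliers differently.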
  
**Note:** The assessment would be more accurate, and simpler, if we had data on reservation requests for each book, since a reservation shows that a book is in demand.

So the library should introduce a ***reservation request*** method and use it as the primary measure of popularity, ahead of both methods above.

___________________

- I have included the measures for both popularity methods in the following function. While iterating over the loans, it calculates both "days borrowed" and "number of times loaned out" and stores them in the same record for each book, which avoids iterating twice.


In [22]:
def get_borrowed_days(valid_loans, book_data):
    """
    Calculates the number of days each book was borrowed.
    Args:
        valid_loans (list): A list of dictionaries containing valid loan details.
        book_data (dict): A dictionary containing book details with book numbers as keys.
    Returns:
        dict: A dictionary containing the number of days each book was borrowed and other details.
        list: A list of book numbers for which data was not found.
    """
    loan_out_details = {}
    error_books = set()  # Use a set to store book numbers for which data was not found
    
    for loan in valid_loans:
        book_number = loan['book_number']
        
        # Check if book_number exists in book_data
        if book_number in book_data:
            # Get book details
            book_title = book_data[book_number]['title']
            book_author = book_data[book_number]['author']
            date_of_loan = loan['date_of_loan']
            date_of_return = loan['date_of_return']
            
            # Calculate the number of days the book was borrowed
            days_borrowed = date_of_return - date_of_loan
            
            # Create a unique identifier for each book using book number, title, and author
            book_key = (book_number, book_title, book_author)
            
            # Check if the book_key exists in loan_out_details
            if book_key not in loan_out_details:
                # Add book details to loan_out_details
                loan_out_details[book_key] = {
                    'book_number': book_number,
                    'title': book_title,
                    'author': book_author,
                    'borrowed_days': days_borrowed,  # Initialize with current borrowed days
                    'loan_count': 1
                }
            else:
                # Update borrowed_days and loan_count
                loan_out_details[book_key]['borrowed_days'] += days_borrowed
                loan_out_details[book_key]['loan_count'] += 1
        else:
            # Add the book number to error_books instead of printing an error for every loan.
            # A set stores each element only once, so each missing book number is reported once.
            error_books.add(book_number)
            
    return loan_out_details, list(error_books)



- Since bookloans.csv contains errors, book numbers and their related titles could be inconsistent, i.e. the same title could appear under different book numbers, or the same book number under different titles. To avoid such discrepancies, I use *(book_number, book_title, book_author)* as the unique key. With that combination, the data can be retrieved and filtered most accurately.
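The same single-pass aggregation under a composite key can be written more compactly with `collections.defaultdict`. This is a minimal sketch, assuming (as `get_borrowed_days` does) that `date_of_loan` and `date_of_return` are integer day serials and that `book_data` maps book numbers to dictionaries with `title` and `author`; `aggregate_loans` and the sample data are hypothetical, with the two loans of book 100 mirroring the 14-day and 7-day loans in the Task 1 report.

```python
from collections import defaultdict


def aggregate_loans(loans, book_data):
    """Total borrowed days and loan count per (book_number, title, author) key.

    Returns (totals, missing) where missing holds book numbers absent from book_data.
    """
    # Each new key starts with zero days and zero loans
    totals = defaultdict(lambda: {'borrowed_days': 0, 'loan_count': 0})
    missing = set()
    for loan in loans:
        number = loan['book_number']
        book = book_data.get(number)
        if book is None:
            missing.add(number)
            continue
        key = (number, book['title'], book['author'])
        # Dates are assumed to be integer day serials, so subtraction gives days
        totals[key]['borrowed_days'] += loan['date_of_return'] - loan['date_of_loan']
        totals[key]['loan_count'] += 1
    return dict(totals), missing


sample_books = {
    '16': {'title': 'The Trial', 'author': 'Frank Kafka'},
    '100': {'title': 'Doctor In The Nude', 'author': 'Richard Gordon'},
}
sample_loans = [
    {'book_number': '100', 'date_of_loan': 44927, 'date_of_return': 44941},
    {'book_number': '100', 'date_of_loan': 44927, 'date_of_return': 44934},
    {'book_number': '16', 'date_of_loan': 44927, 'date_of_return': 44944},
    {'book_number': '130', 'date_of_loan': 44927, 'date_of_return': 44930},
]
totals, missing = aggregate_loans(sample_loans, sample_books)
print(totals)
print(missing)
```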

In [23]:
# A separate function sorts loan_out_details by loan_count.

def sort_loan_out_details_by_loan_count(loan_out_details):
    """
    Sorts the loan_out_details dictionary based on loan_count in descending order.
    Args:
        loan_out_details (dict): A dictionary keyed by (book_number, title, author) tuples,
                                 with loan details as values.
    Returns:
        list: A list of sorted book details dictionaries.
    """
    # Sort the loan_out_details based on loan_count in descending order
    sorted_books = sorted(loan_out_details.values(), key=lambda x: x['loan_count'], reverse=True)
    return sorted_books



In [24]:
loan_out_details, error_books = get_borrowed_days(valid_loans, book_data)
print("The following Books with book_number not found in book_data. Skipping...\n")
print(error_books,"\n")
print()
sorted_loan_out_details = sort_loan_out_details_by_loan_count(loan_out_details)

print("First 10 lines of sorted list, sorted by: loan_count\n")
printing_list_10_lines(sorted_loan_out_details)
print("\n")
print("Bottom 10 lines of the sorted list, sorted by: loan_count\n")
printing_list_bottom_10_items(sorted_loan_out_details)

The following Books with book_number not found in book_data. Skipping...

['130', '133', '121', '126', '134', '138', '129', '137', '140', '128', '125', '135', '139', '123', '132', '136', '127', '124', '131', '122'] 


First 10 lines of sorted list, sorted by: loan_count

{'book_number': '85', 'title': 'The Great Indian Novel', 'author': 'Shashi Tharoor', 'borrowed_days': 261, 'loan_count': 27}
{'book_number': '38', 'title': 'False Impressions', 'author': 'Jeffery Archer', 'borrowed_days': 258, 'loan_count': 27}
{'book_number': '119', 'title': 'Karl Marx Biography', 'author': 'Unknown', 'borrowed_days': 272, 'loan_count': 27}
{'book_number': '61', 'title': 'A Modern Approach Computer Vision', 'author': 'David Forsyth', 'borrowed_days': 234, 'loan_count': 26}
{'book_number': '72', 'title': 'A Prisoner Of Birth', 'author': 'Jeffery Archer', 'borrowed_days': 290, 'loan_count': 26}
{'book_number': '48', 'title': 'Journal Of A Novel', 'author': 'John Steinbeck', 'borrowed_days': 254, 'loan_c

##### **Most popular books - based on number of times loaned out**

In [25]:

most_popular_book = sorted_loan_out_details[0]
least_popular_book = sorted_loan_out_details[-1]

print("Most Popular Book")
print(f"Book Number: {most_popular_book['book_number']}")
print(f"Title: {most_popular_book['title']}")
print(f"Author: {most_popular_book['author']}")
print(f"Borrowed Days: {most_popular_book['borrowed_days']}")
print(f"Loan Count: {most_popular_book['loan_count']}")
print()

print("Least Popular Book")
print(f"Book Number: {least_popular_book['book_number']}")
print(f"Title: {least_popular_book['title']}")
print(f"Author: {least_popular_book['author']}")
print(f"Borrowed Days: {least_popular_book['borrowed_days']}")
print(f"Loan Count: {least_popular_book['loan_count']}")


Most Popular Book
Book Number: 85
Title: The Great Indian Novel
Author: Shashi Tharoor
Borrowed Days: 261
Loan Count: 27

Least Popular Book
Book Number: 13
Title: Birth Of A Theorem
Author: Cedric Villani
Borrowed Days: 3
Loan Count: 1


##### **Most popular books - based on number of days borrowed**

In [26]:
# loan_out_details is a dictionary keyed by (book_number, title, author), with loan details as values

# Sort the books based on borrowed days
sorted_books_by_borrowed_days = sorted(loan_out_details.values(), key=lambda x: x['borrowed_days'], reverse=True)

print("Sorted list of dictionaries by borrowed days.\nThe top entry has the most borrowed days.\nThe bottom entry has the fewest borrowed days.\n")
# Print the sorted list
for book in sorted_books_by_borrowed_days:
    print(book)


Sorted list of dictionaries by borrowed days.
The top entry has the most borrowed days.
The bottom entry has the fewest borrowed days.

{'book_number': '43', 'title': 'Tales Of Mystery And Imagination', 'author': 'Edgar Allen Poe', 'borrowed_days': 301, 'loan_count': 24}
{'book_number': '54', 'title': 'The Complete Mastermind', 'author': 'BBC', 'borrowed_days': 299, 'loan_count': 23}
{'book_number': '72', 'title': 'A Prisoner Of Birth', 'author': 'Jeffery Archer', 'borrowed_days': 290, 'loan_count': 26}
{'book_number': '47', 'title': 'Asami Asami', 'author': 'P L Deshpande', 'borrowed_days': 281, 'loan_count': 23}
{'book_number': '114', 'title': 'Selected Short Stories', 'author': 'Unknown', 'borrowed_days': 276, 'loan_count': 24}
{'book_number': '119', 'title': 'Karl Marx Biography', 'author': 'Unknown', 'borrowed_days': 272, 'loan_count': 27}
{'book_number': '98', 'title': 'Burning Bright', 'author': 'John Steinbeck', 'borrowed_days': 267, 'loan_count': 24}
{'book_number': '103', 'title':

In [27]:
# loan_out_details is a dictionary keyed by (book_number, title, author), with loan details as values

# Sort the books based on borrowed days
sorted_books_by_borrowed_days = sorted(loan_out_details.values(), key=lambda x: x['borrowed_days'], reverse=True)

# Retrieve the most borrowed days book
most_borrowed_book = sorted_books_by_borrowed_days[0]

# Retrieve the least borrowed days book
least_borrowed_book = sorted_books_by_borrowed_days[-1]

print("Based on above sorted List\n\nMost borrowed days book:")
print(most_borrowed_book)

print("\nLeast borrowed days book:")
print(least_borrowed_book)


Based on above sorted List

Most borrowed days book:
{'book_number': '43', 'title': 'Tales Of Mystery And Imagination', 'author': 'Edgar Allen Poe', 'borrowed_days': 301, 'loan_count': 24}

Least borrowed days book:
{'book_number': '13', 'title': 'Birth Of A Theorem', 'author': 'Cedric Villani', 'borrowed_days': 3, 'loan_count': 1}


----------------


### <span style='color:blue'>**Task 2**</span>

The library is keen to know the interests of its readers to influence purchasing decisions. The books have different genres. You cannot assume all genres were borrowed; we are only interested in those that were. Is this sensible do you think? Or would a genre not borrowed at all be significant? Bear such issues in mind for the quality section. We decided to ignore sub-genres this year.

- Write code to produce a popularity report of all genres of books borrowed in 2023 and how many books are in that genre.
- Write code to sort the report. Default or reverse order is okay.
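The `genre_popularity_report` function below counts loans per genre; the brief also asks how many books are in each genre and for the report to be sorted. A minimal sketch of one interpretation is shown here: it reports loans, distinct books borrowed, and catalogue size per genre, keeps genres with no loans at all (so an unborrowed genre stays visible), and sorts by loan count. It assumes every `book_data` entry has a `genre` key; `genre_report` and the sample data are hypothetical.

```python
from collections import Counter


def genre_report(loans, book_data):
    """Return rows of (genre, loans, books_borrowed, books_in_catalogue), most loans first.

    Genres with no loans are kept with a count of 0.
    """
    # Number of catalogue titles in each genre (includes genres nobody borrowed)
    catalogue = Counter(book['genre'] for book in book_data.values())
    loan_counts = Counter()
    borrowed_books = {}  # genre -> set of distinct book numbers that were borrowed
    for loan in loans:
        book = book_data.get(loan['book_number'])
        if book is None:  # skip loans for book numbers missing from book_data
            continue
        genre = book['genre']
        loan_counts[genre] += 1
        borrowed_books.setdefault(genre, set()).add(loan['book_number'])
    rows = [
        (genre, loan_counts[genre], len(borrowed_books.get(genre, ())), catalogue[genre])
        for genre in catalogue
    ]
    # Most loans first; ties are broken alphabetically by genre name
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows


books = {
    '1': {'genre': 'Fiction'},
    '2': {'genre': 'Fiction'},
    '3': {'genre': 'Science'},
    '4': {'genre': 'Poetry'},
}
loans = [{'book_number': n} for n in ['1', '1', '2', '3', '99']]
for row in genre_report(loans, books):
    print(row)
```

Passing `reverse=True` semantics is built into the negated key here; dropping the minus sign gives the default (ascending) order, which the brief also accepts.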

##### **Books summary based on genre:**

In [28]:
def genre_popularity_report(valid_loans, book_data):
    """
    Generates a popularity report of book genres borrowed in 2023.
    Args:
        valid_loans (list): A list of dictionaries containing valid loan details.
        book_data (dict): A dictionary containing book details with book numbers as keys.
    Returns:
        tuple: (dict of the number of loans per genre, set of book numbers not found in book_data)
    """
    genre_count = {}
    error_books = set()

    for loan in valid_loans:
        book_number = loan['book_number']
        
        try:
            genre = book_data[book_number]['genre']
            
            # Count the occurrence of each genre
            if genre in genre_count:
                genre_count[genre] += 1
            else:
                genre_count[genre] = 1
        except KeyError:
            error_books.add(book_number)
            
    return genre_count, error_books



In [29]:
genre_count, error_books = genre_popularity_report(valid_loans, book_data)
print("The following Book numbers are not found in book data.\n")
print(error_books)
print()
print("Summary of number of borrows for each genre\n")
print(genre_count)
print()
max_genre, max_count = max(genre_count.items(), key=lambda item: item[1])
min_genre, min_count = min(genre_count.items(), key=lambda item: item[1])
print(f"Most popular genre: {max_genre},\nCount: {max_count}\n")
print(f"Least popular genre: {min_genre},\nCount: {min_count}")


The following Book numbers are not found in book data.

{'130', '133', '121', '126', '134', '138', '129', '137', '140', '128', '125', '135', '139', '123', '132', '136', '127', '124', '131', '122'}

Summary of number of borrows for each genre

{'Fiction': 701, 'Science': 150, 'Tech': 386, 'Non-fiction': 577, 'Philosophy': 100}

Most popular genre: Fiction,
Count: 701

Least popular genre: Philosophy,
Count: 100


- Printing a dictionary containing details of books belonging to the most popular genre. The function printing_dict_10_lines is used to print the first 10 lines of the dictionary, which is a helpful way to display the data.

In [30]:
def list_of_most_popular_genre_books(book_data, most_popular_genre):
    """
    Creates a dictionary containing details of books belonging to the most popular genre.

    Args:
        book_data (dict): A dictionary containing book details with book numbers as keys.
        most_popular_genre (str): The most popular genre for which to retrieve books.

    Returns:
        dict: A dictionary containing details of books belonging to the most popular genre.
              Keys are book numbers and values are dictionaries containing book details.
              Book details include title, author, and genre.
    """
    most_popular_genre_books = {}
    
    for book in book_data:
        # to avoid a KeyError if a book in book_data doesn't have a 'genre' key
        if 'genre' in book_data[book] and most_popular_genre == book_data[book]['genre']:
            most_popular_genre_books[book] = {
                'title' : book_data[book]['title'],
                'author': book_data[book]['author'],
                'genre' : book_data[book]['genre'],
                
            }
    return most_popular_genre_books

most_popular_genre = max_genre  # 'Fiction', the most popular genre found above
most_popular_genre_books = list_of_most_popular_genre_books(book_data, most_popular_genre)
print(f"List of Dictionary containing most popular genre: '{most_popular_genre}'\n")


printing_dict_10_lines(most_popular_genre_books)

# To print the whole report, uncomment the line below.
#print(most_popular_genre_books)
        


List of Dictionary containing most popular genre: 'Fiction'

12 {'title': 'Slaughterhouse Five', 'author': 'Kurt Vonnegut', 'genre': 'Fiction'}
16 {'title': 'The Trial', 'author': 'Frank Kafka', 'genre': 'Fiction'}
19 {'title': 'The New Machiavelli', 'author': 'H. G. Wells', 'genre': 'Fiction'}
28 {'title': 'The Outsider', 'author': 'Albert Camus', 'genre': 'Fiction'}
29 {'title': 'The Complete Sherlock Holmes Vol I', 'author': 'Arthur Conan Doyle', 'genre': 'Fiction'}
30 {'title': 'The Complete Sherlock Holmes Vol Ii', 'author': 'Arthur Conan Doyle', 'genre': 'Fiction'}
32 {'title': 'The Pillars Of The Earth', 'author': 'Ken Follett', 'genre': 'Fiction'}
36 {'title': 'A Farewell To Arms', 'author': 'Ernest Hemingway', 'genre': 'Fiction'}
37 {'title': 'The Veteran', 'author': 'Frederick Forsyth', 'genre': 'Fiction'}
38 {'title': 'False Impressions', 'author': 'Jeffery Archer', 'genre': 'Fiction'}
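The loop in `list_of_most_popular_genre_books` can also be written as a dictionary comprehension. A sketch assuming the same `book_data` layout; `books_in_genre` is a hypothetical name and the sample books are placeholders. Using `.get('genre')` keeps the original guard against entries with no genre.

```python
def books_in_genre(book_data, genre):
    """Return {book_number: {'title', 'author', 'genre'}} for books in the given genre."""
    # .get() skips entries without a 'genre' key instead of raising KeyError
    return {
        number: {'title': book['title'], 'author': book['author'], 'genre': book['genre']}
        for number, book in book_data.items()
        if book.get('genre') == genre
    }


sample_books = {
    '12': {'title': 'Book A', 'author': 'Author A', 'genre': 'Fiction'},
    '13': {'title': 'Book B', 'author': 'Author B', 'genre': 'Science'},
    '14': {'title': 'Book C', 'author': 'Author C'},  # no genre recorded
}
print(books_in_genre(sample_books, 'Fiction'))
```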


------------


### <span style='color:blue'>**Task 3**</span>

Statistics on loans and late loans are lost.

For the library books, we need you to calculate the:

- Number of loans.
- Number of days borrowed.
- Number of loans returned late.
- Number of days late.
- Average number of days a book was borrowed.
- The percentage proportion of books returned late.
- The average late period of books returned late.
  
A book borrowed and returned on the same day was borrowed for one day. If a book is returned after 19 days, it is 5 days late.
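The two rules above can be written as small helpers. This is a minimal sketch, assuming dates are stored as Excel serial day numbers (as the `EXCEL_EPOCH` conversion in the statistics cell implies) and reading "borrowed for one day" as a minimum of one day rather than inclusive counting; with that reading, a loan from 1 to 18 January is 17 days, which matches the Task 1 report. The helper names are hypothetical.

```python
from datetime import date, timedelta

EXCEL_EPOCH = date(1899, 12, 30)  # day 0 of the Excel serial date system
LOAN_PERIOD = 14  # standard loan period in days


def serial_to_date(serial):
    """Convert an Excel serial day number to a date."""
    return EXCEL_EPOCH + timedelta(days=serial)


def days_borrowed(loan_serial, return_serial):
    """Days on loan; a same-day return counts as one day."""
    return max(return_serial - loan_serial, 1)


def days_late(borrowed):
    """Days beyond the loan period (0 if returned on time)."""
    return max(borrowed - LOAN_PERIOD, 0)


print(serial_to_date(44927))                    # 2023-01-01
print(days_borrowed(44927, 44927))              # same-day return counts as 1 day
print(days_late(days_borrowed(44927, 44946)))   # 19 days borrowed, so 5 days late
```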

The report should show:
- **a) The average number of days that a book was borrowed.**
        Assume that if books were borrowed for 20,000 days and the number of loans was 2,000, the average number of days that a book was borrowed was 10 days.
- **b) The percentage proportion of books returned late.**
        Assume that if the number of loans returned late was 500 when the number of loans was 2,000, the proportion of books returned late was 25%.
- **c) The average late period of books returned late.** i.e., a book returned late is on average this many days after the grace period.
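The three report figures are simple ratios: (a) total days borrowed divided by the number of loans, (b) late loans divided by all loans, times 100, and (c) total days late divided by the number of late loans only. A minimal sketch follows; `loan_summary` is a hypothetical name, and because the brief gives no worked example for (c), the 2,500 days-late figure in the first call is made up for illustration.

```python
def loan_summary(total_days, loans, late_loans, total_days_late):
    """Return (average days borrowed, % returned late, average days late of late loans)."""
    average_days = total_days / loans if loans else 0.0
    percent_late = late_loans / loans * 100 if loans else 0.0
    # The average late period divides by the number of LATE loans, not all loans
    average_late = total_days_late / late_loans if late_loans else 0.0
    return average_days, percent_late, average_late


# Brief's worked examples for (a) and (b); 2,500 days late is a hypothetical figure
print(loan_summary(20000, 2000, 500, 2500))
```

Feeding in this notebook's own totals (35027 days, 2209 loans, 664 late loans, 14322 days late) gives 16 days, 30.1% and 22 days after rounding, matching the statistics reported below.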
  
   

As the brief does not specify whether to use only loans returned within the year, I have treated the loans as a whole, including books not yet returned.
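The `statistics_on_loans` function below counts unreturned loans up to `datetime.today()`, so its totals change depending on when the notebook is run. If the statistics should instead cover only returned loans, a sketch could skip records with `date_of_return == 0` and apply the one-day minimum for same-day returns. This is an alternative interpretation, not the method used for the figures reported below; `loan_statistics` and the sample records are hypothetical, and integer Excel serial dates are assumed.

```python
LOAN_PERIOD = 14  # standard loan period in days


def loan_statistics(loans):
    """Compute totals for returned loans only.

    Each loan is a dict with integer Excel serial dates; date_of_return == 0
    means "not returned" and the loan is skipped, so results do not depend on
    the date the code is run.
    Returns (number_loans, days_borrowed, loans_late, days_late).
    """
    number_loans = days_total = loans_late = days_late_total = 0
    for loan in loans:
        if loan['date_of_return'] == 0:
            continue
        days = max(loan['date_of_return'] - loan['date_of_loan'], 1)  # same day = 1 day
        number_loans += 1
        days_total += days
        if days > LOAN_PERIOD:
            loans_late += 1
            days_late_total += days - LOAN_PERIOD
    return number_loans, days_total, loans_late, days_late_total


sample = [
    {'date_of_loan': 44927, 'date_of_return': 44944},  # 17 days, 3 days late
    {'date_of_loan': 44927, 'date_of_return': 44927},  # same-day return, 1 day
    {'date_of_loan': 44930, 'date_of_return': 0},      # not returned, skipped
]
print(loan_statistics(sample))
```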

In [31]:
print("Number of Loans: ", len(loan_data))

Number of Loans:  2209


--------------------------

#### **Statistics on loans and late loans:**

In [32]:
from datetime import datetime, timedelta

# Define the Excel epoch date
EXCEL_EPOCH = datetime(1899, 12, 30)

def excel_date_to_datetime(excel_date):
    """Converts an Excel date to a Python datetime object."""
    return EXCEL_EPOCH + timedelta(days=excel_date)

def statistics_on_loans(loan_data):
    """
    Calculates Number of loans, Number of days borrowed, Number of loans returned late, Number of days late.
    Args:
        loan_data (list): A list of dictionaries containing loan details.
    Returns:
        tuple: Number of loans, Number of days borrowed, Number of loans returned late, Number of days late.
    """
    # Initialize variables to store statistics
    number_loans = 0
    number_days_borrowed = 0
    number_loans_returned_late = 0
    number_days_late = 0
    
    # Iterate over each loan data
    for loan in loan_data:
        number_loans += 1
        
        # Convert Excel serial dates to datetime objects
        loan_date = excel_date_to_datetime(loan['date_of_loan'])
        return_date = excel_date_to_datetime(loan['date_of_return'])

        # Calculate the number of days the book was borrowed
        if loan['date_of_return'] != 0:  # Book has been returned
            days_borrowed = (return_date - loan_date).days
        else:  # Book not returned yet: count days up to the date this code is run
            days_borrowed = (datetime.today() - loan_date).days
            
        number_days_borrowed += days_borrowed

        # Calculate the number of days late if applicable
        if days_borrowed > 14:  # Book is returned after 14 days
            number_loans_returned_late += 1
            number_days_late += (days_borrowed - 14)

    return number_loans, number_days_borrowed, number_loans_returned_late, number_days_late

number_loans, number_days_borrowed, number_loans_returned_late, number_days_late = statistics_on_loans(loan_data)

def average_number_days_book_borrowed(number_days_borrowed, number_loans):
    """Calculates the average number of days that a book was borrowed."""
    average_days_borrowed = number_days_borrowed / number_loans
    return average_days_borrowed

def percentage_books_returned_late(number_loans_returned_late, number_loans):
    """Calculates the percentage proportion of books returned late."""
    percentage_late_returns = number_loans_returned_late / number_loans * 100
    return percentage_late_returns

def average_late_returned_books(number_days_late, number_loans_returned_late):
    """Calculates the average late period, in days, of books returned late."""
    average_late_returns = number_days_late / number_loans_returned_late
    return average_late_returns

average_days_borrowed = average_number_days_book_borrowed(number_days_borrowed, number_loans)
percentage_late_returns = percentage_books_returned_late(number_loans_returned_late, number_loans)
average_late_returns = average_late_returned_books(number_days_late, number_loans_returned_late)


In [33]:
print("Statistics on loans and late loans")
print("===============================================================")
print(f"Number of loans: {'': <42}{number_loans}")
print(f"Number of days borrowed: {'': <33}{number_days_borrowed}")
print(f"Number of loans returned late: {'': <29}{number_loans_returned_late}")
print(f"Number of days late: {'': <37}{number_days_late}")
print(f"Number of books: {'': <42}{number_loans}")
print(f"The average number of days that a book was borrowed: {'': <8}{average_days_borrowed:.0f}")
print(f"The percentage proportion of books returned late: {'': <8}{percentage_late_returns:.1f}%")
print(f"The average late period of books returned late, in days: {'': <4}{average_late_returns:.0f}")


Statistics on loans and late loans
Number of loans:                                           2209
Number of days borrowed:                                  35027
Number of loans returned late:                              664
Number of days late:                                      14322
Number of books:                                           2209
The average number of days that a book was borrowed:         16
The percentage proportion of books returned late:         30.1%
The average late period of books returned late, in days:     22
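The report above aligns its values by padding an empty string with a hand-tuned width on each line. The following alternative sketch is not the notebook's own code. It uses format-spec widths so that every label is left-aligned in a fixed-width column and every value is right-aligned. `format_report` and the widths 58 and 7 are illustrative assumptions, and only a few of the reported rows are shown.

```python
def format_report(title, rows, label_width=58, value_width=7):
    """Return report lines: title, underline, then label/value rows of equal width."""
    total_width = label_width + value_width
    lines = [title, "=" * total_width]
    for label, value in rows:
        # '<' left-aligns the label, '>' right-aligns the value in its column
        lines.append(f"{label:<{label_width}}{value:>{value_width}}")
    return lines


rows = [
    ("Number of loans:", "2209"),
    ("Number of loans returned late:", "664"),
    ("The percentage proportion of books returned late:", f"{664 / 2209 * 100:.1f}%"),
    ("The average late period of books returned late, in days:", f"{14322 / 664:.0f}"),
]
for line in format_report("Statistics on loans and late loans", rows):
    print(line)
```

The numbers are formatted to strings first so that a suffix such as `%` stays inside the right-aligned value column.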


--------------------

### **Conclusion:**
I tried to cover most aspects of the assessment and treated the work as a learning process. I also experimented with different ways of presenting the results, although some of these are outside the scope of the assessment. Working with dictionaries was especially valuable and interesting, as they are very powerful for data handling, as are tuples, although both were challenging to handle at first. I learned some new techniques through trial and error. I used user-defined functions wherever possible, which made the calculations clearer, allowed reuse, made troubleshooting much easier, and kept the main body clean. Functions also let me modify individual calculations frequently without affecting the overall result or getting confused, because I only had to focus on a small part of the project at a time. Most of the functions reached their current form after several rounds of modification.

Assignments like this are very valuable for learning, because the most difficult part is applying the learning materials. Going forward, I believe such case studies will remain very valuable, and it would be great if we were allowed to add this kind of work to our portfolios. The biggest benefit I gained from this assessment was the habit of using predefined functions wherever possible and keeping the main body clean, which I expect to be useful in both research and practical applications.

In completing the assignment, I followed these practices:

- Used functions for most of the operations.
- Used docstrings in every function, which documented each operation.
- Used comments in most places.
- Used Jupyter Markdown cells to provide additional information, chapter divisions, task references, etc. This information will help me revisit the assignment later without any other documentation, use it in a personal portfolio, or let anyone else understand how it was done.
- Wherever a requirement called for it, displayed data sheets, calculations and reports on screen to verify the functions.
  
**Data Quality:** 
  
***Note:*** Given the limited time available for completing, testing and troubleshooting this assessment, some requirements may have been unintentionally missed, or some calculations or assumptions may be wrong. I would be glad to correct them if required.
Thank you.  

------------

Completed by: **Ajantha Wirasinghe**  &emsp; Keele Student number: **24027813** &emsp;  Date: **31/03/2024**