<div align="center">
    <img src="project_icon.jpg" alt="Debugging code" width="50%">
</div>


In today’s fast-paced, data-driven world, companies rely heavily on accurate analyses of sales data to make informed decisions. However, the quality of the insights derived from such analyses depend significantly on the integrity of the underlying code.

You have been given some starting code with two functions: one that extracts and flattens JSON data into a structured format and the other that transforms electricity sales data by cleaning, filtering, and extracting relevant features. The company plans to use your revised code to improve the accuracy of sales analytics.

Your task is to identify potential errors in the functions and the underlying data that might result in logic and runtime errors, such as missing values, incorrect data types, or incompatible values (e.g., negatives). Enhance the custom functions provided by implementing exceptions to catch data quality issues and edge cases.

The data is available here ("sales_data_sample.csv") for the analyses. This data has 25 columns, but only two columns are analyzed, namely, `quantity_ordered` and `price_each`. A sample of the data is shown below.


In [46]:
# Import libraries
import pandas as pd

# Load data
sales_df = pd.read_csv("data/sales_data_sample.csv")
sales_df.head()

Unnamed: 0,order_number,quantity_ordered,price_each,order_line_number,sales,order_date,status,qtr_id,month_id,year_id,product_line,msrp,product_code,customer_name,phone,address_line1,address_line2,city,state,postal_code,country,territory,contact_last_name,contact_first_name,deal_size
0,10107,30,95.7,2,2871.0,2/24/2003 0:00,Shipped,1,2,2003,Motorcycles,95,S10_1678,Land of Toys Inc.,2125557818,897 Long Airport Avenue,,NYC,NY,10022.0,USA,,Yu,Kwai,Small
1,10121,34,81.35,5,2765.9,5/7/2003 0:00,Shipped,2,5,2003,Motorcycles,95,S10_1678,Reims Collectables,26.47.1555,59 rue de l'Abbaye,,Reims,,51100.0,France,EMEA,Henriot,Paul,Small
2,10134,-41,94.74,2,3884.34,7/1/2003 0:00,Shipped,3,7,2003,Motorcycles,95,S10_1678,Lyon Souveniers,+33 1 46 62 7555,27 rue du Colonel Pierre Avia,,Paris,,75508.0,France,EMEA,Da Cunha,Daniel,Medium
3,10145,45,,6,3746.7,8/25/2003 0:00,Shipped,3,8,2003,Motorcycles,95,S10_1678,Toys4GrownUps.com,6265557265,78934 Hillside Dr.,,Pasadena,CA,90003.0,USA,,Young,Julie,Medium
4,10159,49,100.0,14,5205.27,10/10/2003 0:00,Shipped,4,10,2003,Motorcycles,95,S10_1678,Corporate Gift Ideas Co.,6505551386,7734 Strong St.,,San Francisco,CA,,USA,,Brown,Julie,Medium


In [47]:
# Identify errors and add exceptions to the `get_quantity_ordered_sum()` function
def get_quantity_ordered_sum(sales_quantity_ordered):
    """Calculates the total sum on the 'quantity_ordered' column.

    Args:
        sales_quantity_ordered (pd.core.series.Series): The pandas Series for the 'quantity_ordered' column.

    Returns:
        total_quantity_ordered (int): The total sum of the 'quantity_ordered' column.
    """

    total_quantity_ordered = 0
    try:
        for quantity in sales_quantity_ordered:
            total_quantity_ordered += abs(quantity)
    except TypeError:
            raise TypeError("quantity cannot be negative")
    return total_quantity_ordered

In [48]:
# Identify errors and add exceptions to the `get_price_each_average()` function
def get_price_each_average(sales_price_each, num_places=2):
    """Calculates the average on the 'price_each' column
       using pandas built in methods and rounds to the desired number of places.

    Args:
        sales_price_each (pd.core.series.Series): The pandas Series for the 'price_each' column.
        num_of_places (int): The number of decimal places to round.

    Returns:
        average_price_each (float): The average of the 'price_each' column.
    """
    try:
        total_of_price_each = sales_price_each.sum()
        len_of_price_each = len(sales_price_each)
        average_price_each = round(
            total_of_price_each / len_of_price_each, num_places
        )    
    except TypeError:
        sales_price_each = pd.to_numeric(sales_price_each, errors='coerce')
        sales_price_each.fillna(sales_price_each.mean())
        total_of_price_each = sales_price_each.sum()
        len_of_price_each = len(sales_price_each)
        average_price_each = round(
            total_of_price_each / len_of_price_each, 2
        )        
    return average_price_each

In [49]:
#  Add as many cells as you require 
get_quantity_ordered_sum(sales_df['quantity_ordered'])

99067

In [50]:
sales_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2823 entries, 0 to 2822
Data columns (total 25 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   order_number        2823 non-null   int64  
 1   quantity_ordered    2823 non-null   int64  
 2   price_each          2823 non-null   object 
 3   order_line_number   2823 non-null   int64  
 4   sales               2823 non-null   float64
 5   order_date          2823 non-null   object 
 6   status              2823 non-null   object 
 7   qtr_id              2823 non-null   int64  
 8   month_id            2823 non-null   int64  
 9   year_id             2823 non-null   int64  
 10  product_line        2823 non-null   object 
 11  msrp                2823 non-null   int64  
 12  product_code        2823 non-null   object 
 13  customer_name       2823 non-null   object 
 14  phone               2823 non-null   object 
 15  address_line1       2823 non-null   object 
 16  addres

In [51]:
sales_df.describe()

Unnamed: 0,order_number,quantity_ordered,order_line_number,sales,qtr_id,month_id,year_id,msrp
count,2823.0,2823.0,2823.0,2823.0,2823.0,2823.0,2823.0,2823.0
mean,10258.725115,35.063762,6.466171,3553.889072,2.717676,7.092455,2003.81509,100.715551
std,92.085478,9.845521,4.225841,1841.865106,1.203878,3.656633,0.69967,40.187912
min,10100.0,-41.0,1.0,482.13,1.0,1.0,2003.0,33.0
25%,10180.0,27.0,3.0,2203.43,2.0,4.0,2003.0,68.0
50%,10262.0,35.0,6.0,3184.8,3.0,8.0,2004.0,99.0
75%,10333.5,43.0,9.0,4508.0,4.0,11.0,2004.0,124.0
max,10425.0,97.0,18.0,14082.8,4.0,12.0,2005.0,214.0


In [52]:
total_quantity_ordered = get_quantity_ordered_sum(sales_df['quantity_ordered'])

In [53]:
average_price_each = get_price_each_average(sales_df['price_each'])

In [54]:
print("Total Quantity Ordered is",total_quantity_ordered)

Total Quantity Ordered is 99067


In [55]:
print("Average Price Each is",average_price_each)

Average Price Each is 83.63
