# Iris ecommerce store project - Final Report

## Project Goal
The goal of this project is to analyze sales data in order to understand
revenue trends, identify top-performing products, and evaluate customer
contribution to total revenue.

## Datasets
The analysis is based on:
- Iris orders dataset
- Iris order items dataset
- Iris customers dataset
  
**Note**: These datasets were cleaned in the `data_cleaning` notebook.

## Time Period
The data covers sales from January 2016 to December 2018.

## Key Findings

This section highlights the most important results from the analysis,
including top-performing products, revenue trends over time,
and customer contribution to total revenue.


### Top 10 products by revenue

In [11]:
import pandas as pd 
top_products_by_revenue = pd.read_csv("../output/Key_analysis_findings/top_products_by_revenue.csv")
top_products_by_revenue.head(10)

Unnamed: 0,product_id,product_revenue
0,bb50f2e236e5eea0100680137654686c,63885.0
1,6cdd53843498f92890544667809f1595,54730.2
2,d6160fb7873f184099d9bc95e30376af,48899.34
3,d1c427060a0f73f6b889a5c7c61f2ac4,47214.51
4,99a4788cb24856965c36a24e339b6058,43025.56
5,3dd2a17168ec895c781a9191c1e95ad7,41082.6
6,25c38557cf793876c5abdd5931f922db,38907.32
7,5f504b3a1c75b73d6151be81eb05bdc9,37733.9
8,53b36df67ebb7c41585e8d54d6772e08,37683.42
9,aca2eb7d00ea1a7b8ebd4e68314663af,37608.9


<img src="../output/charts/top_products_by_revenue.png" style="width:100%; height:auto;">

> As the chart and the table show, the top product by revenue has the **ID** *bb50f2e236e5eea0100680137654686c*, with a revenue of 63885.0.
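
A ranking like the one above can be built by summing item prices per product. The following is a minimal sketch on made-up rows, not the project's actual code: the `price` column name and the sample values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical order items; the real dataset's columns may differ.
order_items = pd.DataFrame({
    "product_id": ["a", "b", "a", "c", "b", "a"],
    "price": [10.0, 25.0, 10.0, 5.0, 25.0, 10.0],
})

# Sum revenue per product and sort from highest to lowest.
product_revenue = (
    order_items.groupby("product_id")["price"]
    .sum()
    .sort_values(ascending=False)
    .rename("product_revenue")
    .reset_index()
)

# The first row is the top product by revenue.
top_product = product_revenue.iloc[0]
print(product_revenue.head(10))
```

Saving such a frame with `to_csv(..., index=False)` would also avoid the extra `Unnamed: 0` column that appears when the file is read back.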

### Revenue trend over time 

<img src="../output/charts/revenue_over_time.png" style="width:100%; height:auto;">

> As the chart shows, the most noticeable revenue trend spans the months from 2017-10 to 2018-07.

### Total orders per year/month

#### Per month 

In [12]:
total_orders_per_month = pd.read_csv("../output/Key_analysis_findings/orders_per_month.csv")
total_orders_per_month

Unnamed: 0,month,orders_count
0,8,10784
1,5,10563
2,7,10312
3,3,9888
4,6,9408
5,4,9339
6,2,8490
7,1,8065
8,11,7535
9,12,5667


<img src="../output/charts/orders_per_month.png" style="width:100%; height:auto;">

> As the chart and the table show, the month with the highest number of orders is **August (8)** with 10784 orders, closely followed by **May (5)** with 10563.
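
The monthly counts above pool the same calendar month across all years. A minimal sketch of that kind of aggregation is shown below; the column name `order_purchase_timestamp` and the sample rows are assumptions, not taken from the project's notebooks.

```python
import pandas as pd

# Hypothetical orders; timestamps and column names are illustrative.
orders = pd.DataFrame({
    "order_id": ["o1", "o2", "o3", "o4", "o5"],
    "order_purchase_timestamp": pd.to_datetime([
        "2017-08-03", "2018-08-15", "2018-05-20", "2017-05-02", "2018-08-30",
    ]),
})

# Group by calendar month (1-12), ignoring the year, and count orders.
orders_per_month = (
    orders.groupby(orders["order_purchase_timestamp"].dt.month)["order_id"]
    .count()
    .sort_values(ascending=False)
    .rename_axis("month")
    .reset_index(name="orders_count")
)
print(orders_per_month)
```

Because the year is dropped, a month that occurs in more years of the data has more chances to accumulate orders, which is worth keeping in mind when comparing months.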

#### Per year 

In [13]:
total_orders_per_year = pd.read_csv("../output/Key_analysis_findings/orders_per_year.csv")
total_orders_per_year

Unnamed: 0,year,orders_count
0,2018,53929
1,2017,45029
2,2016,323


<img src="../output/charts/orders_per_year.png" style="width:100%; height:auto;">

> As the chart and the table show, the year with the highest number of orders is 2018 (53929), ahead of 2017 (45029), while 2016 has only 323 orders.

### Top customers by total spend with contribution (%)

In [14]:
top_customers = pd.read_csv("../output/Key_analysis_findings/top_customers.csv")
top_customers.head(10)

Unnamed: 0,customer_id,total_spent,contribution_pct
0,1617b1357756262bfa56ab541c47bc16,13440.0,0.098896
1,ec5b2ba62e574342386871631fafd3fc,7160.0,0.052686
2,c6e2731c5b391845f6800c97401a43a9,6735.0,0.049559
3,f48d464a0baaea338cb25f816991ab1f,6729.0,0.049514
4,3fd6777bbce08a352fddd04e4a7cc8f6,6499.0,0.047822
5,05455dfa7cd02f13d132aa7a6a9729c6,5934.6,0.043669
6,df55c14d1476a9a3467f131269c2477f,4799.0,0.035313
7,24bbf5fd2f2e1b359ee7de94defc4a15,4690.0,0.034511
8,e0a2412720e9ea4f26c1ac985f6a7358,4599.9,0.033848
9,3d979689f636322c62418b6346b1c6d2,4590.0,0.033775


<img src="../output/charts/top_customers.png" style="width:100%; height:auto;">

> As the chart and the table show, the customer with the highest total spend has the **ID** *1617b1357756262bfa56ab541c47bc16*, accounting for about 0.099% of total revenue.
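
The contribution column can be read as each customer's spend divided by total revenue. The sketch below illustrates that calculation on made-up numbers, assuming `contribution_pct` is expressed in percent; it is not the project's actual code.

```python
import pandas as pd

# Hypothetical per-customer totals.
customer_spend = pd.DataFrame({
    "customer_id": ["c1", "c2", "c3", "c4"],
    "total_spent": [500.0, 300.0, 150.0, 50.0],
})

# Share of total revenue per customer, in percent.
total_revenue = customer_spend["total_spent"].sum()
customer_spend["contribution_pct"] = (
    customer_spend["total_spent"] / total_revenue * 100
)

top_customers = customer_spend.sort_values("total_spent", ascending=False)
print(top_customers.head(10))
```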

### Cities/States with the most customers

#### Per city 

In [15]:
top_cities = pd.read_csv("../output/Key_analysis_findings/top_cities.csv")
top_cities.head(10)

Unnamed: 0,customer_city,customers
0,sao paulo,14967
1,rio de janeiro,6610
2,belo horizonte,2671
3,brasilia,2067
4,curitiba,1463
5,campinas,1396
6,porto alegre,1325
7,salvador,1209
8,guarulhos,1151
9,sao bernardo do campo,907


<img src="../output/charts/customers_city_distribution.png" style="width:100%; height:auto;">

> As the output shows, the city with the largest number of customers is **sao paulo** (14967).

#### Per state

In [16]:
top_states = pd.read_csv("../output/Key_analysis_findings/top_states.csv")
top_states.head(10)

Unnamed: 0,customer_state,customers
0,SP,40255
1,RJ,12370
2,MG,11249
3,RS,5271
4,PR,4877
5,SC,3531
6,BA,3276
7,DF,2073
8,ES,1962
9,GO,1952


<img src="../output/charts/customers_state_distribution.png" style="width:100%; height:auto;">

> As the output shows, the state with the largest number of customers is **SP** (40255).

### Cities/States with the most orders

#### Per city 

In [19]:
top_cities_by_orders = pd.read_csv("../output/Key_analysis_findings/order_count_city.csv")
top_cities_by_orders.head(10)

Unnamed: 0,customer_city,orders
0,sao paulo,15511
1,rio de janeiro,6870
2,belo horizonte,2768
3,brasilia,2128
4,curitiba,1519
5,campinas,1440
6,porto alegre,1377
7,salvador,1245
8,guarulhos,1187
9,sao bernardo do campo,937


<img src="../output/charts/order_city_count.png" style="width:100%; height:auto;">

> As the output shows, the city with the most orders is **sao paulo** (15511).

#### Per state

In [23]:
top_states_by_orders = pd.read_csv("../output/Key_analysis_findings/order_count_state.csv")
top_states_by_orders.head(10)

Unnamed: 0,customer_state,orders
0,SP,41667
1,RJ,12832
2,MG,11619
3,RS,5456
4,PR,5038
5,SC,3631
6,BA,3378
7,DF,2137
8,ES,2031
9,GO,2018


<img src="../output/charts/order_state_count.png" style="width:100%; height:auto;">

> As the output shows, the state with the most orders is **SP** (41667).

### Top sellers by revenue

In [26]:
top_sellers = pd.read_csv("../output/Key_analysis_findings/top_sellers.csv")
top_sellers.head(10)

Unnamed: 0,seller_id,seller_revenue
0,4869f7a5dfa277a7dca6462dcf3b52b2,229472.63
1,53243585a1d6dc2643021fd1853d8905,222776.05
2,4a3ca9315b744ce9f8e9374361493884,200472.92
3,fa1c13f2614d7b5c4749cbc52fecda94,194042.03
4,7c67e1448b00f6e969d365cea6b010ab,187923.89
5,7e93a43ef30c4f03f38b393420bc753a,176431.87
6,da8622b14eb17ae2831f4ac5b9dab84a,160236.57
7,7a67c85e85bb2ce8582c35f2203ad736,141505.56
8,1025f0e2d44d7041d6cf58b6550e0bfa,138968.55
9,955fee9216a65b617aa5c0531780ce60,135171.7


<img src="../output/charts/top_sellers_by_revenue.png" style="width:100%; height:auto;">

> As the output shows, the seller with the highest revenue has the **ID** *4869f7a5dfa277a7dca6462dcf3b52b2*, with a revenue of 229472.63.

### Top sellers by the average delivery delay

In [29]:
top_sellers_by_delay = pd.read_csv("../output/Key_analysis_findings/c.csv")
top_sellers_by_delay.head(10)

Unnamed: 0,seller_id,avg_delay
0,df683dfda87bf71ac3fc63063fba369d,167.0
1,8e670472e453ba34a379331513d6aab1,35.0
2,8629a7efec1aab257e58cda559f03ba7,33.0
3,391bbd13b6452244774beff1824006ed,24.0
4,be1e9e378700cecaa4ebf71433d7915c,23.5
5,8fec2e460530482132c436cfb5439925,22.0
6,586a871d4f1221763fddb6ceefdeb95e,22.0
7,2a50b7ee5aebecc6fd0ff9784a4747d6,17.0
8,20d53aad4fe5ee93a64f8839609d3586,17.0
9,a154d7316f158bb42e6fa18bbe3afd3a,16.5


<img src="../output/charts/top_sellers_by_delay.png" style="width:100%; height:auto;">

> As the output shows, the seller with the highest average delay has the **ID** *df683dfda87bf71ac3fc63063fba369d*, averaging 167.0 days.
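
An average delay per seller can be computed as the mean difference between the actual and the estimated delivery date. The following is a sketch under stated assumptions: the date column names and the sample rows are hypothetical, and the extra `orders` count is added here only to show how many deliveries stand behind each average.

```python
import pandas as pd

# Hypothetical deliveries; column names are illustrative.
deliveries = pd.DataFrame({
    "seller_id": ["s1", "s1", "s2", "s3", "s3"],
    "order_estimated_delivery_date": pd.to_datetime(
        ["2018-01-10", "2018-02-10", "2018-03-01", "2018-04-01", "2018-04-05"]
    ),
    "order_delivered_customer_date": pd.to_datetime(
        ["2018-01-12", "2018-02-14", "2018-03-31", "2018-03-30", "2018-04-06"]
    ),
})

# Positive values mean the order arrived after the estimate.
deliveries["delay_days"] = (
    deliveries["order_delivered_customer_date"]
    - deliveries["order_estimated_delivery_date"]
).dt.days

# Mean delay and number of deliveries per seller, worst first.
seller_delay = (
    deliveries.groupby("seller_id")["delay_days"]
    .agg(avg_delay="mean", orders="count")
    .sort_values("avg_delay", ascending=False)
    .reset_index()
)
print(seller_delay)
```

Reporting the count next to the mean helps tell a consistently late seller apart from one whose average rests on a single very late delivery.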

### On-time vs late deliveries

In [32]:
ontime_late = pd.read_csv("../output/Key_analysis_findings/On_time_VS_Late_deliveries.csv")
ontime_late

Unnamed: 0,delivery_status,deliveries_count
0,On-time,91454
1,Late,7827


<img src="../output/charts/on-time_vs_late_deliveries.png" style="width:100%; height:auto;">

> As the output shows, the majority of deliveries arrive on time (91454 of 99281, about 92.1%), while a smaller portion (7827, about 7.9%) arrive late.
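
The counts in the table above can be turned into shares of all deliveries. This short sketch rebuilds the two rows by hand instead of reading the CSV; the `share_pct` column name is introduced here for illustration.

```python
import pandas as pd

# Counts copied from the table above.
ontime_late = pd.DataFrame({
    "delivery_status": ["On-time", "Late"],
    "deliveries_count": [91454, 7827],
})

# Share of each status in all deliveries, in percent.
total = ontime_late["deliveries_count"].sum()
ontime_late["share_pct"] = (
    ontime_late["deliveries_count"] / total * 100
).round(1)
print(ontime_late)
```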

### Least ordered products

In [44]:
least_ordered_products = pd.read_csv("../output/Key_analysis_findings/least_ordered_products.csv")
least_ordered_products.head(10)

Unnamed: 0,product_id,orders_count
0,001c5d71ac6ad696d22315953758fa04,1
1,001b237c0e9bb435f2e54071129237e9,1
2,ffdde3d63e889c9a9f9ec30d82a4c815,1
3,002c6dab60557c48cfd6c2222ef7fd76,1
4,ffd246249e3225c13f40b5b91dcaa65a,1
5,ffd259a48b9b073c942884d0f3659566,1
6,ffd63ee42a5c8cc5a15a1c8e2aa50011,1
7,ffcfaba393e8ef71937c6e8421bc2868,1
8,004154251837f6ac124ad4374b3a8148,1
9,0042f1a9a7e0edd1400c6cd0fda065f8,1


In [47]:
ordered_once = pd.read_csv("../output/Key_analysis_findings/ordered_once.txt")
ordered_once

Unnamed: 0,18117


> As the output shows, 18117 products were ordered only once.
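
A count of products ordered only once can be derived by counting distinct orders per product. The sketch below shows one way to do this on made-up rows; the column names and values are assumptions, not the project's actual code.

```python
import pandas as pd

# Hypothetical order items.
order_items = pd.DataFrame({
    "order_id": ["o1", "o1", "o2", "o3", "o4"],
    "product_id": ["p1", "p2", "p1", "p3", "p4"],
})

# Number of distinct orders each product appears in.
orders_per_product = order_items.groupby("product_id")["order_id"].nunique()

# Products that appear in exactly one order.
ordered_once = int((orders_per_product == 1).sum())
print(ordered_once)
```

Using `nunique` on `order_id` counts orders rather than item rows, so a product bought several times within a single order still counts as ordered once.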