# 1. Introduction
This project analyzes supply chain data to identify key trends, risks, and opportunities for business improvement. The dataset is sourced from Kaggle (DataCo Smart Supply Chain for Big Data Analysis: https://www.kaggle.com/datasets/shashwatwork/dataco-smart-supply-chain-for-big-data-analysis?select=DescriptionDataCoSupplyChain.csv). Using this data, we assess delivery performance, profitability, and customer behavior. The goal is to discover the critical strengths and weaknesses that affect overall revenue and efficiency.

Through visual analytics and statistical evaluation, this report highlights actionable findings. The analysis aims to support decision-makers in prioritizing improvements that will increase profit and optimize operations.


# Data Preparation
## Import CSV Files
### Import Libraries

In [None]:
%pip install python-dotenv
%pip install seaborn


# Standard library
import os

# Third-party
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import psycopg2
import seaborn as sns
from dotenv import load_dotenv
from psycopg2 import sql
from sqlalchemy import create_engine, text

### Load Environment Variables from .env

In [None]:
# Load environment variables
load_dotenv(override=True)

# Read database and application settings from the environment
db_host = os.getenv('DB_HOST')
db_name = os.getenv('DB_NAME')
db_user = os.getenv('DB_USER')
db_password = os.getenv('DB_PASSWORD')
db_port = os.getenv('DB_PORT')
database_url = os.getenv("DATABASE_URL")
secret_key = os.getenv("SECRET_KEY")
debug_mode = os.getenv("DEBUG")

# file path
supply_chain_file_path = "../resources/DataCoSupplyChainDataset.csv"
access_log_file_path = "../resources/tokenized_access_logs.csv"

# Confirm the required database settings are present without printing secrets
required_vars = ["DB_HOST", "DB_NAME", "DB_USER", "DB_PASSWORD", "DB_PORT"]
missing_vars = [name for name in required_vars if not os.getenv(name)]
if missing_vars:
    print(f"✗ Missing environment variables: {', '.join(missing_vars)}")
else:
    print("✓ Environment variables loaded")
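
The database settings loaded in this cell are later assembled into a PostgreSQL connection URL. As a minimal sketch, assuming the password may contain characters such as `@` or `/`, the credentials can be percent-encoded before they are placed in the URL. The helper name `build_db_url` and the sample values are hypothetical and are not part of the original notebook.

```python
from urllib.parse import quote_plus


def build_db_url(user, password, host, port, name):
    """Assemble a PostgreSQL URL, percent-encoding the credentials.

    Hypothetical helper for illustration only.
    """
    return (
        f"postgresql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{name}"
    )


# Placeholder values, not real credentials
example_url = build_db_url("analyst", "p@ss/word", "localhost", "5432", "final_project")
print(example_url)
```

Without the encoding step, an `@` inside the password would be read as the separator between the credentials and the host.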

## Create Tables and Import Data Using Python

In [None]:
conn_params = {
    'host':     db_host,
    'database': db_name,
    'user':     db_user,
    'password': db_password,
    'port':     db_port
}

# Initialize so the finally block is safe even if the connection fails
conn = None
cursor = None

try:
    conn = psycopg2.connect(**conn_params)
    conn.autocommit = True  # CREATE DATABASE cannot run inside a transaction
    cursor = conn.cursor()
    cursor.execute("CREATE DATABASE final_project;")
    print("Database created successfully!")

except psycopg2.errors.DuplicateDatabase:
    print("Database already exists")

except Exception as e:
    print(f"Error: {e}")

finally:
    # Close only what was actually opened
    if cursor is not None:
        cursor.close()
    if conn is not None:
        conn.close()

### Create Tables from Your CSV Files

In [None]:
# Connect to your project database
conn_params['database'] = os.getenv('DB_NAME')

# Initialize so the except/finally blocks are safe even if the connection fails
conn = None
cursor = None

try:
    conn = psycopg2.connect(**conn_params)
    cursor = conn.cursor()
    
    # Create table with proper data types
    create_table_query = """
    CREATE TABLE IF NOT EXISTS supply_chain_df (
        type VARCHAR(50),
        days_for_shipping_real INTEGER,
        days_for_shipment_scheduled INTEGER,
        benefit_per_order NUMERIC(10,2),
        sales_per_customer NUMERIC(10,2),
        delivery_status VARCHAR(50),
        late_delivery_risk INTEGER,
        category_id INTEGER,
        category_name VARCHAR(100),
        customer_city VARCHAR(100),
        customer_country VARCHAR(100),
        customer_email VARCHAR(150),
        customer_fname VARCHAR(100),
        customer_id INTEGER,
        customer_lname VARCHAR(100),
        customer_password VARCHAR(100),
        customer_segment VARCHAR(50),
        customer_state VARCHAR(100),
        customer_street VARCHAR(200),
        customer_zipcode VARCHAR(20),
        department_id INTEGER,
        department_name VARCHAR(100),
        latitude NUMERIC(10,6),
        longitude NUMERIC(10,6),
        market VARCHAR(50),
        order_city VARCHAR(100),
        order_country VARCHAR(100),
        order_customer_id INTEGER,
        order_date DATE,
        order_id INTEGER PRIMARY KEY,
        order_item_cardprod_id INTEGER,
        order_item_discount NUMERIC(10,2),
        order_item_discount_rate NUMERIC(5,4),
        order_item_id INTEGER,
        order_item_product_price NUMERIC(10,2),
        order_item_profit_ratio NUMERIC(5,4),
        order_item_quantity INTEGER,
        sales NUMERIC(10,2),
        order_item_total NUMERIC(10,2),
        order_profit_per_order NUMERIC(10,2),
        order_region VARCHAR(50),
        order_state VARCHAR(100),
        order_status VARCHAR(50),
        order_zipcode VARCHAR(20),
        product_card_id INTEGER,
        product_category_id INTEGER,
        product_description TEXT,
        product_image VARCHAR(200),
        product_name VARCHAR(200),
        product_price NUMERIC(10,2),
        product_status INTEGER,
        shipping_date DATE,
        shipping_mode VARCHAR(50)
    );
    """
    
    cursor.execute(create_table_query)
    conn.commit()
    print("✓ Table created successfully!")
    
except Exception as e:
    print(f"Error: {e}")
    # Roll back only if a connection was established
    if conn is not None:
        conn.rollback()
    
finally:
    # Close only what was actually opened
    if cursor is not None:
        cursor.close()
    if conn is not None:
        conn.close()


### Import DataCoSupplyChainDataset


In [None]:

db_url = f"postgresql://{db_user}:{db_password}@{db_host}:{db_port}/{db_name}"
engine = create_engine(db_url)

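supply_chain_df = pd.read_csv('resources/DataCoSupplyChainDataset.csv')

# Normalize headers: lowercase, underscores for spaces, no parentheses.
# regex=False treats '(' and ')' as literal characters on every pandas version.
supply_chain_df.columns = (
    supply_chain_df.columns.str.lower()
    .str.replace(' ', '_', regex=False)
    .str.replace('(', '', regex=False)
    .str.replace(')', '', regex=False)
)

# Note: if_exists='replace' drops any existing supply_chain_df table and
# recreates it with column types inferred by pandas.
supply_chain_df.to_sql('supply_chain_df', engine, if_exists='replace', index=False)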

print(f"✓ Successfully imported {len(supply_chain_df)} rows!")
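
To make the header clean-up in the import cell concrete, here is a small sketch of the same chain of string operations applied to a handful of headers. The header names are illustrative stand-ins written in the style of the raw CSV (an assumption); they are not read from the file.

```python
import pandas as pd

# Illustrative headers in the style of the raw CSV (assumed, not read from the file)
raw_columns = pd.Index(
    ["Type", "Days for shipping (real)", "Late_delivery_risk", "Order Item Id"]
)

# Same steps as the import cell: lowercase, spaces to underscores, drop parentheses
clean_columns = (
    raw_columns.str.lower()
    .str.replace(" ", "_", regex=False)
    .str.replace("(", "", regex=False)
    .str.replace(")", "", regex=False)
)

print(list(clean_columns))
```

The cleaned names are valid unquoted PostgreSQL identifiers, which keeps later SQL queries free of quoted column names.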

### Import tokenized_access_logs

In [None]:
# Initialize so the except/finally blocks are safe even if the connection fails
conn = None
cursor = None

try:
    conn = psycopg2.connect(**conn_params)
    cursor = conn.cursor()
    
    # CREATE TABLE SQL statement
    create_table_sql = """
    CREATE TABLE IF NOT EXISTS access_log_df (
        product VARCHAR(200),
        category VARCHAR(100),
        date DATE,
        month VARCHAR(20),
        hour TIME,
        department VARCHAR(100),
        ip VARCHAR(50),
        url TEXT
    );
    """
    
    cursor.execute(create_table_sql)
    conn.commit()
    print("✓ Table created successfully!")
    
except Exception as e:
    print(f"Error: {e}")
    # Roll back only if a connection was established
    if conn is not None:
        conn.rollback()
finally:
    # Close only what was actually opened
    if cursor is not None:
        cursor.close()
    if conn is not None:
        conn.close()


In [None]:

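access_log_df = pd.read_csv('resources/tokenized_access_logs.csv')

# Normalize headers; regex=False treats '(' and ')' as literal characters
access_log_df.columns = (
    access_log_df.columns.str.lower()
    .str.replace(' ', '_', regex=False)
    .str.replace('(', '', regex=False)
    .str.replace(')', '', regex=False)
)

# Note: if_exists='replace' drops any existing access_log_df table first
access_log_df.to_sql('access_log_df', engine, if_exists='replace', index=False)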

print(f"✓ Successfully imported {len(access_log_df)} rows!")

In [None]:
# Read the supply chain table back from PostgreSQL into a DataFrame
conn = psycopg2.connect(**conn_params)
cur = conn.cursor()
query = "SELECT * FROM supply_chain_df;"
supply_chain_df = pd.read_sql_query(query, conn)

# Lowercase every string column so later grouping is case-insensitive
for col in supply_chain_df.select_dtypes(include=['object']).columns:
    supply_chain_df[col] = supply_chain_df[col].str.lower()

supply_chain_df.head()

In [None]:
access_log_df = pd.read_sql_query("SELECT * FROM access_log_df", conn)

for col in access_log_df.select_dtypes(include=['object']).columns:
    access_log_df[col] = access_log_df[col].str.lower()

access_log_df.head()

In [None]:
pd.set_option('display.max_rows', None, 'display.max_columns', None, 'display.max_colwidth', 200)

# 2. Data Analysis
- Delivery Performance Analysis: Investigate shipping risks and delivery times.

- Financial Performance Analysis: Review profit and loss by payment type and category.

- Customer & Geographic Analysis: Examine results by customer segments and regions.

- Product & Category Performance: Compare top products and categories.

- Website Traffic Analysis: Analyze site visits and peak hours.

- Web Traffic vs. Sales Conversion: Identify departments with high traffic and low sales.

- Executive Summary & Recommendations: Summarize insights and propose actions.
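
As a rough sketch of how the first two analyses in this list can be computed, the example below groups a tiny hand-made table by shipping mode and by category. The column names follow the `supply_chain_df` schema created earlier; the rows are invented for illustration and are not results from the dataset.

```python
import pandas as pd

# Toy rows; values are invented for illustration only
orders = pd.DataFrame({
    "shipping_mode": ["standard class", "standard class", "first class", "first class", "same day"],
    "late_delivery_risk": [0, 1, 1, 1, 0],
    "category_name": ["cleats", "cleats", "fishing", "fishing", "cleats"],
    "benefit_per_order": [20.0, -30.0, 12.5, 7.5, 3.0],
})

# Delivery performance: share of orders flagged as late, per shipping mode
late_rate_by_mode = (
    orders.groupby("shipping_mode")["late_delivery_risk"]
    .mean()
    .sort_values(ascending=False)
)

# Financial performance: total profit per category, lowest first
profit_by_category = (
    orders.groupby("category_name")["benefit_per_order"].sum().sort_values()
)
loss_making = profit_by_category[profit_by_category < 0]

print(late_rate_by_mode)
print(loss_making)
```

Because `late_delivery_risk` is a 0/1 flag, its mean per group is the late-delivery rate for that group.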



# 3. Insights & Interpretation
## Key findings, trends, and interpretations
- Delivery delays are more frequent in specific shipping modes, affecting customer satisfaction and revenue.

- Loss-making product categories were consistently identified and require targeted fixes.

- Certain customer segments and regions show stronger sales performance and higher profitability.

- Website traffic patterns highlight peak hours, but some departments fail to convert visits into sales efficiently.
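
The peak-hour finding can be reproduced with a simple count of log rows per hour. This is a minimal sketch on invented rows; it assumes the `hour` column holds integer hours from 0 to 23, which may differ from how the real access log stores time.

```python
import pandas as pd

# Toy access log; hours are assumed to be integers 0-23 for this sketch
access_log = pd.DataFrame({
    "department": ["fan shop", "fan shop", "apparel", "golf", "fan shop", "apparel"],
    "hour": [20, 21, 20, 9, 20, 21],
})

# Count visits per hour and sort so the busiest hour comes first
visits_per_hour = access_log.groupby("hour").size().sort_values(ascending=False)
peak_hour = visits_per_hour.index[0]

print(visits_per_hour)
print(f"Peak hour: {peak_hour}")
```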

#### The biggest opportunities for revenue recovery lie in the following areas:

**Improving sales conversion**:   
Many website visitors do not complete purchases. Targeting departments and processes with high traffic but low conversion rates can unlock substantial new revenue.

**Fixing late deliveries**:    
Orders with high delivery risk result in customer dissatisfaction and lost sales. Optimizing shipping and logistics processes will directly boost results.

**Boosting top product categories**:    
The most profitable categories still have room for growth. Focused marketing and operations improvements here can yield significant gains.

These trends highlight where changes can make the most impact. 
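
One hedged way to quantify the conversion gap is to compare visits and orders per department. The sketch below uses invented rows and the column names `department` (access log) and `department_name` (orders) from the tables created earlier. Since the two tables are not linked by session or customer, orders per visit is only a rough proxy for conversion.

```python
import pandas as pd

# Toy data; department names and counts are invented for illustration
visits = pd.DataFrame({"department": ["fan shop"] * 6 + ["apparel"] * 3 + ["golf"] * 1})
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4, 5],
    "department_name": ["fan shop", "fan shop", "apparel", "apparel", "golf"],
})

# Visits and distinct orders per department
visit_counts = visits.groupby("department").size().rename("visits")
order_counts = orders.groupby("department_name")["order_id"].nunique().rename("orders")

# Align the two counts on department and compute the proxy ratio
conversion = pd.concat([visit_counts, order_counts], axis=1).fillna(0)
conversion["orders_per_visit"] = conversion["orders"] / conversion["visits"]
conversion = conversion.sort_values("orders_per_visit")

print(conversion)
```

Departments at the top of the sorted table attract traffic but produce relatively few orders, which makes them the first candidates for conversion work.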



## Main strengths and weaknesses 
### Strengths

- Strong profitable categories that can be leveraged further.

- Good market presence across multiple profitable markets.

- High web traffic, indicating a strong online presence.

- Identifiable peak hours that operations can be optimized around.

### Weaknesses

- High late delivery risk impacts sales and reputation.

- Several product categories are unprofitable (loss-making).

- Conversion gaps lead to missed potential revenue.





# 4. Recommendations

Based on the findings, we recommend focusing on the following actions to improve business results:

- Address late delivery risks by optimizing shipping methods and logistics processes.

- Eliminate or fix unprofitable payment types and product categories to improve profitability.

- Enhance sales conversion rates by targeting departments and segments with high traffic but low sales.

- Leverage strongest customer segments and regions for marketing and expansion efforts.

- Monitor website traffic and sales alignment to ensure growth opportunities are not missed.

Implementing these actions can help unlock significant revenue opportunities and strengthen overall business performance.



# 5. Conclusion

Our analysis provides a clear overview of supply chain and sales performance, revealing both strengths and areas for improvement. It shows that the largest business improvements are likely to come from better sales conversion, stronger delivery performance, and growth in the top product categories. By prioritizing these high-potential areas, the company can increase profitability and achieve sustainable growth. Continuous monitoring of these key issues, and action on them, will be essential for future success.


