# **Project Name**    - FedEx Logistics Performance Analysis

Project Type - EDA

Contribution - Individual








# **Project Summary -**

In today’s fast-paced global economy, efficient logistics and supply chain management play a crucial role in ensuring business success. FedEx Logistics, a global leader in transportation and supply chain solutions, operates across multiple regions, handling shipments for diverse industries with varying delivery requirements. This project focuses on performing Exploratory Data Analysis (EDA) on a FedEx Logistics dataset to uncover meaningful insights related to shipment performance, delivery timelines, cost efficiency, and operational bottlenecks.

The dataset provides comprehensive information on purchase orders (POs), shipment modes, vendor agreements (INCO terms), delivery schedules, freight costs, and product-specific attributes such as item descriptions and dosage forms. Given the rapid growth of eCommerce and cross-border trade, optimizing these logistics operations is essential for maintaining competitiveness, reducing costs, and improving customer satisfaction. The primary objective of this analysis is to understand how shipments are managed, identify inefficiencies, and highlight opportunities for process optimization.

The EDA process began with data cleaning and preprocessing, including handling missing values, correcting inconsistent entries, and standardizing categorical variables such as shipment mode, country, and vendor terms. This step ensured data accuracy and reliability before conducting further analysis. Descriptive statistics were used to summarize key numerical variables like freight cost, shipment weight, and delivery lead times, providing a high-level overview of the logistics performance.

Univariate and bivariate analyses were then performed to examine patterns and relationships within the data. Shipment modes (air, sea, and ground) were analyzed to compare their impact on delivery timelines and freight costs. The analysis revealed that air shipments generally had shorter delivery times but significantly higher costs, while sea shipments were more cost-effective but prone to longer lead times. This trade-off highlights the importance of selecting shipment methods based on urgency and cost constraints.

Further analysis of INCO terms and vendor agreements showed their influence on freight responsibility and cost distribution. Certain vendor terms were associated with higher freight expenses for FedEx, indicating potential areas for renegotiation or process improvement. Country- and region-wise analysis helped identify geographical bottlenecks, with some regions consistently experiencing delays due to longer customs clearance times or infrastructure limitations.

Delivery performance was also evaluated by comparing planned delivery dates versus actual delivery dates, enabling the identification of delayed shipments. Visualization techniques such as bar charts, box plots, and heatmaps were used extensively to communicate trends, outliers, and correlations in an intuitive manner. These visual insights made it easier to pinpoint high-risk shipment categories and regions requiring operational attention.

Overall, this EDA project demonstrates how data-driven analysis can support strategic decision-making in logistics management. By identifying cost drivers, delay patterns, and inefficiencies, FedEx Logistics can take proactive steps to streamline supply chain operations, optimize shipment planning, and improve vendor coordination. The insights gained from this analysis not only enhance operational efficiency but also contribute to improved customer satisfaction through timely and reliable deliveries. This project highlights the critical role of EDA in transforming raw logistics data into actionable business intelligence.     

# **GitHub Link -**

https://github.com/tinapawar22/Fedex_logistics_EDA

# **Problem Statement**


FedEx Logistics needs to analyze its global shipment data to identify delays, cost inefficiencies, and operational bottlenecks across regions, shipment methods, vendors, and delivery terms in order to optimize supply chain performance, reduce freight costs, and improve delivery timelines and customer satisfaction.



#### **Define Your Business Objective?**

Answer Here.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure the following format is answered for each chart.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
from google.colab import files
from google.colab import drive
drive.mount('/content/drive')


### Dataset Loading

In [None]:
# Load Dataset
file_path = '/content/drive/MyDrive/SCMS_Delivery_History_Dataset.csv'
df = pd.read_csv(file_path)




### Dataset First View

In [None]:
# Dataset First Look
#First 5 rows
df.head()


In [None]:
#last 5 rows
df.tail()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
df.shape

### Dataset Information

In [None]:
# Dataset Info
#column names
df.columns

In [None]:
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
# Number of duplicated rows
df.duplicated().sum()


In [None]:
#view duplicated rows
df[df.duplicated()]


#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
df.isnull().sum()


In [None]:
#percentage of missing values
(df.isnull().sum() / len(df)) * 100

In [None]:
# Visualizing the missing values
#barplot for null values


missing = df.isnull().sum()
missing = missing[missing > 0]

plt.figure(figsize=(10,5))
missing.plot(kind='bar')
plt.title("Missing Values Count per Column")
plt.ylabel("Count")
plt.show()




### What did you know about your dataset?

The dataset contains missing values primarily in dosage, shipment mode, and line item insurance columns, indicating partial data recording that requires preprocessing before further analysis.
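
The count and percentage views above can be combined into one table, which may make it easier to decide which columns need preprocessing. The following is a minimal sketch on a hypothetical toy frame (`toy` and its values are illustrative and not taken from the real dataset):

```python
import pandas as pd

# Hypothetical toy frame standing in for the shipment data
toy = pd.DataFrame({
    "Shipment Mode": ["Air", None, "Truck", "Ocean"],
    "Dosage": ["10mg", None, None, "200mg"],
    "Line Item Quantity": [100, 250, 30, 75],
})

# Count and percentage of missing values, side by side
missing_summary = pd.DataFrame({
    "missing_count": toy.isnull().sum(),
    "missing_pct": toy.isnull().mean() * 100,
})

# Keep only the columns that actually have gaps, worst first
missing_summary = missing_summary[missing_summary["missing_count"] > 0]
missing_summary = missing_summary.sort_values("missing_pct", ascending=False)
print(missing_summary)
```

Sorting by percentage puts the columns that need the most attention at the top.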

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
columns = df.columns
columns



In [None]:
# Dataset Describe
a = df.describe()
a

In [None]:
#Describe all columns including categorical
# (named desc_all to avoid shadowing Python's built-in all())
desc_all = df.describe(include='all')
desc_all

In [None]:
#Describe only Numerical columns
num = df.describe(include='number')
num


### Variables Description

The dataset contains 33 variables comprising a mix of categorical, numerical, and date-related features that describe purchase orders, shipment details, delivery timelines, vendor agreements, and cost components. These variables collectively enable analysis of logistics performance, freight costs, delivery efficiency, and operational bottlenecks across regions and shipment modes.

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
unique = df.nunique()
unique


## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.

#Handling missing values

df['Shipment Mode'] = df['Shipment Mode'].fillna('Unknown')
df['Line Item Insurance (USD)'] = df['Line Item Insurance (USD)'].fillna(0)
df['Dosage'] = df['Dosage'].fillna('Not Specified')

#check null values
df.isnull().sum()

In [None]:
#Data Type conversions(dates)
date_cols = [
    'PQ First Sent to Client Date',
    'PO Sent to Vendor Date',
    'Scheduled Delivery Date',
    'Delivered to Client Date',
    'Delivery Recorded Date'
]

for col in date_cols:
    df[col] = pd.to_datetime(df[col], errors='coerce')

#Validate conversions
df[date_cols].isnull().sum()




In [None]:
# lead time analysis
df['Lead Time (Days)'] = (
    df['Delivered to Client Date'] - df['PO Sent to Vendor Date']
).dt.days




In [None]:
#Missing date flags
df['PO_Date_Missing_Flag'] = df['PO Sent to Vendor Date'].isnull().astype(int)
df['PQ_Date_Missing_Flag'] = df['PQ First Sent to Client Date'].isnull().astype(int)


In [None]:
#Check negative delays
df[df['Lead Time (Days)'] < 0].shape


In [None]:
#Set negative lead time to NaN
df.loc[df['Lead Time (Days)'] < 0, 'Lead Time (Days)'] = np.nan

#Validation step
df['Lead Time (Days)'].describe()



In [None]:
#remove duplicate records
df.drop_duplicates(inplace=True)


In [None]:
#Feature engineering
df['Delivery Delay (Days)'] = (
    df['Delivered to Client Date'] - df['Scheduled Delivery Date']
).dt.days





In [None]:
#Categorical Standardization
df['Shipment Mode'] = df['Shipment Mode'].str.strip().str.title()
df['Country'] = df['Country'].str.strip().str.title()


In [None]:
#Check data type
df['Freight Cost (USD)'].dtype

#Convert freight cost to numeric
df['Freight Cost (USD)'] = pd.to_numeric(
    df['Freight Cost (USD)'],
    errors='coerce'
)
#Validate conversion
df['Freight Cost (USD)'].describe()

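#Outlier handling: IQR bounds (assumes the common 1.5 * IQR rule)
q1 = df['Freight Cost (USD)'].quantile(0.25)
q3 = df['Freight Cost (USD)'].quantile(0.75)
iqr = q3 - q1
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr

#Count freight costs outside the bounds (rows are kept, not dropped)
((df['Freight Cost (USD)'] < lower_bound) | (df['Freight Cost (USD)'] > upper_bound)).sum()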





### What all manipulations have you done and insights you found?

Data wrangling was performed to enhance data quality and ensure reliable analysis. Missing values were handled appropriately, date columns were converted to datetime format with robust error handling, and duplicate or inconsistent records were addressed. Derived features such as lead time and delivery delay indicators were created to evaluate shipment efficiency and process performance.
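
As one possible way to use the delivery delay feature, the sketch below turns it into a late flag and a late rate per shipment mode. The toy rows, the `Is Late` column, and the "delay greater than zero days means late" threshold are assumptions made for illustration only:

```python
import pandas as pd

# Hypothetical toy rows; column names mirror the dataset, values are made up
toy = pd.DataFrame({
    "Shipment Mode": ["Air", "Air", "Truck", "Ocean"],
    "Scheduled Delivery Date": ["2014-01-10", "2014-02-01", "2014-03-05", "2014-04-20"],
    "Delivered to Client Date": ["2014-01-10", "2014-02-04", "2014-03-01", "2014-05-02"],
})

# Parse dates; unparseable entries become NaT
for col in ["Scheduled Delivery Date", "Delivered to Client Date"]:
    toy[col] = pd.to_datetime(toy[col], errors="coerce")

# Positive values mean the shipment arrived after the scheduled date
toy["Delivery Delay (Days)"] = (
    toy["Delivered to Client Date"] - toy["Scheduled Delivery Date"]
).dt.days

# Assumed definition: any positive delay counts as late
toy["Is Late"] = toy["Delivery Delay (Days)"] > 0

# Share of late shipments per mode
late_rate = toy.groupby("Shipment Mode")["Is Late"].mean()
print(late_rate)
```

A stricter or looser threshold (for example, more than a few days late) could be substituted depending on how delays are defined operationally.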

Negative lead time values caused by data inconsistencies were identified and treated as invalid to maintain analytical accuracy. Missing date flags for PO and PQ dates were introduced to highlight gaps in process tracking and assess their potential impact on delivery performance. Additionally, freight cost values were converted from object to numeric format due to mixed entries, with invalid values coerced to missing values, enabling accurate statistical and outlier analysis.
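
The coercion and invalidation steps described above can be sketched on a small example. The toy values and the `Freight_Cost_Missing_Flag` column are hypothetical; the flag simply follows the same pattern as the PO and PQ date flags:

```python
import pandas as pd

# Hypothetical toy data with mixed text/numeric freight entries
toy = pd.DataFrame({
    "Freight Cost (USD)": ["780.34", "Freight Included", "4521.5", "See other line"],
    "Lead Time (Days)": [45, -3, 120, 10],
})

# Non-numeric text is coerced to NaN instead of raising an error
toy["Freight Cost (USD)"] = pd.to_numeric(toy["Freight Cost (USD)"], errors="coerce")

# Record which rows lost their value during coercion
toy["Freight_Cost_Missing_Flag"] = toy["Freight Cost (USD)"].isnull().astype(int)

# Keep non-negative lead times; negative ones become NaN
toy["Lead Time (Days)"] = toy["Lead Time (Days)"].where(toy["Lead Time (Days)"] >= 0)

print(toy)
```

Keeping a flag alongside the coerced column preserves the information that a value was present but unusable, which a plain NaN would hide.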

Overall, these steps ensured a clean, consistent, and analysis-ready dataset while uncovering key process gaps and cost variability in logistics operations.





## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

Univariate Analysis

#### Chart - 1

In [None]:
# Chart - 1 visualization code
# Count plot - Categorical
plt.figure(figsize=(8,4))
sns.countplot(data=df, x='Country')
plt.xticks(rotation=90)
plt.title('Orders by Country')
plt.show()

##### 1. Why did you pick the specific chart?



* Bar plots are well suited to categorical data like Country.
* They clearly show which categories have the highest or lowest counts.
* They are easy to interpret visually for trends, outliers, and imbalances.

##### 2. What is/are the insight(s) found from the chart?

Countries with the highest number of orders appear to be Zambia, Vietnam, Nigeria, and Côte d’Ivoire.

Several countries have very low order counts, close to zero.

There’s a significant imbalance in order distribution across countries.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Inventory and logistics optimization:

Focus more resources (stock, shipment routes) in countries with high order volume.

Market strategy:

Countries with low orders could be targeted for marketing campaigns or sales expansion.

Risk management:

High dependency on a few countries might be risky; diversification strategies can be considered.
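
The dependency on a few countries can be quantified as a concentration share. This is a minimal sketch with hypothetical country labels and counts, not figures from the dataset:

```python
import pandas as pd

# Hypothetical order records: one entry per order, labelled by country
orders = pd.Series(
    ["Country A"] * 5 + ["Country B"] * 3 + ["Country C"] + ["Country D"],
    name="Country",
)

# Share of orders per country, largest first
share = orders.value_counts(normalize=True)

# Combined share of the two largest countries
top2_share = share.head(2).sum()
print(share)
print(f"Top-2 share of orders: {top2_share:.0%}")
```

A high top-N share would support the diversification point above; the choice of N is a judgment call.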

#### Chart - 2

In [None]:
# Chart - 2 visualization code
 #Count/Barplot - Another Categorical
sns.countplot(data=df, x='Shipment Mode')
plt.title('Orders by Shipment Mode')
plt.show()

##### 1. Why did you pick the specific chart?



*   Bar/Count plots are well suited to categorical data like Shipment Mode.
*   They clearly show which categories have the highest or lowest counts.



##### 2. What is/are the insight(s) found from the chart?



*   Air is the most used shipment mode, followed by Truck.
*   Ocean is the least used mode.
*   There is a noticeable “Unknown” category, indicating some orders don’t have shipment mode recorded.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Resource allocation: Focus on Air and Truck shipments as they dominate orders.

Data quality improvement: Investigate and clean the “Unknown” records to avoid misinformed decisions.

Process optimization: Ensuring shipment mode is recorded accurately can improve logistics planning and reporting.

#### Chart - 3

In [None]:
# Chart - 3 visualization code
#Histogram - Numerical
sns.histplot(data=df, x='Line Item Value', kde=True, bins=30)
plt.title('Distribution of Line Item Value')
plt.show()

##### 1. Why did you pick the specific chart?

It helps in understanding the frequency of different line item values, detecting skewness, spotting outliers, and identifying the range where most transactions occur.

##### 2. What is/are the insight(s) found from the chart?

The distribution is highly right-skewed (long tail on the right).

Most line items have a value closer to 0, with relatively few very high-value items.

Extreme outliers exist (line item values above 2 million), which could potentially distort averages or financial reporting.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Pricing Strategy: Knowing most items are low-value can guide discount strategies or promotions for smaller orders.

Risk Management: High-value outliers may need special handling, approval, or monitoring to reduce financial risk.

Inventory & Procurement: Helps in planning stock levels for items with frequently lower values versus occasional high-value items.

Revenue Forecasting: Recognizing the skewed distribution prevents overestimation of average order value if high-value items dominate raw calculations.
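
To illustrate why a skewed distribution can distort averages, the sketch below compares the mean and median on hypothetical values with a single extreme entry:

```python
import pandas as pd

# Hypothetical line item values with one very large order
values = pd.Series([100, 120, 150, 200, 250, 300, 50000], name="Line Item Value")

mean_value = values.mean()      # pulled upward by the extreme entry
median_value = values.median()  # robust to the extreme entry
skewness = values.skew()        # positive for a right-skewed distribution

print(f"mean={mean_value:.2f}, median={median_value:.2f}, skew={skewness:.2f}")
```

When the mean sits far above the median, as here, the median is usually the safer summary of a "typical" order.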

#### Chart - 4

In [None]:
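# Convert weight to numeric; non-numeric entries become NaN
df['Weight (Kilograms)'] = pd.to_numeric(df['Weight (Kilograms)'], errors='coerce')
df['Weight (Kilograms)'].describe()

# Restrict to the 30-200 kg range so the boxplot stays readable
df_filtered = df[(df['Weight (Kilograms)'] >= 30) & (df['Weight (Kilograms)'] <= 200)]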

# Chart - 4 visualization code
sns.boxplot(x=df_filtered['Weight (Kilograms)'])
plt.title("Boxplot of Weight (Filtered)")
plt.show()


##### 1. Why did you pick the specific chart?

A boxplot is ideal for summarizing the distribution of a numerical variable.

##### 2. What is/are the insight(s) found from the chart?

Median weight appears around 100 kg (middle line in the box).

Interquartile range (IQR) is roughly 50–140 kg, showing that most weights fall in this range.

Symmetry: The box looks relatively symmetric, indicating balanced distribution without heavy skew.

No extreme outliers appear, which is expected because the data was filtered to the 30–200 kg range before plotting.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Helps understand the typical weight range of shipments.

Useful for planning logistics, packaging, and carrier capacity based on realistic weight ranges.

Filtering out extreme weights keeps the view free of outlier distortion, although the conclusions then apply only to the 30–200 kg subset.

#### Chart - 5

In [None]:
# Chart - 5 visualization code
# Box plot - Numerical
sns.boxplot(data=df, x='Freight Cost (USD)')
plt.title('Freight Cost Distribution')
plt.show()

##### 1. Why did you pick the specific chart?

A boxplot is ideal for summarizing the distribution of a numerical variable.

##### 2. What is/are the insight(s) found from the chart?

The distribution is heavily right-skewed, meaning most freight costs are on the lower end, but there are a few very high costs (outliers).

The thick central line (the median) is much closer to the lower end.

The box (IQR) is compressed toward the lower values, indicating that the middle 50% of freight costs are relatively low.

There are several points far above the upper whisker — these are extreme freight costs, possibly large shipments or special cases.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Operational Costs: Most shipments cost significantly less than the extreme values, which may indicate standard freight is well-controlled, but a few shipments are driving up average costs.

Business Impact: Investigating outliers could help in negotiating better freight rates or identifying inefficiencies in high-cost shipments.
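
One way to check how much a few shipments drive up average costs is to flag outliers and measure their share of total spend. The sketch below uses hypothetical costs and assumes the common 1.5 × IQR rule:

```python
import pandas as pd

# Hypothetical freight costs with one extreme shipment
cost = pd.Series([100, 200, 300, 400, 500, 600, 700, 800, 10000],
                 name="Freight Cost (USD)")

# IQR bounds (1.5 * IQR is an assumed, conventional multiplier)
q1 = cost.quantile(0.25)
q3 = cost.quantile(0.75)
iqr = q3 - q1
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr

is_outlier = (cost < lower_bound) | (cost > upper_bound)

# How much of total spend comes from flagged shipments
outlier_share = cost[is_outlier].sum() / cost.sum()

# Average cost with and without the flagged shipments
mean_all = cost.mean()
mean_without = cost[~is_outlier].mean()

print(f"outliers={is_outlier.sum()}, share of spend={outlier_share:.1%}")
print(f"mean with outliers={mean_all:.2f}, without={mean_without:.2f}")
```

Flagged shipments are kept here, not removed, since high-cost shipments may be legitimate and worth investigating individually.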

#### Chart - 6

In [None]:
# Chart - 6 visualization code


product_counts = df['Product Group'].value_counts().reset_index()
product_counts.columns = ['Product Group', 'Count']

fig = px.pie(
    product_counts,
    names='Product Group',
    values='Count',
    hole=0.45,
    title='Product Group Share',
)

fig.update_traces(
    textinfo='percent+label',
    pull=[0.05 if i == 0 else 0 for i in range(len(product_counts))]
)

fig.update_layout(
    legend_title_text='Product Group',
    title_x=0.5
)

fig.show()




##### 1. Why did you pick the specific chart?

A pie chart is ideal when you want to show proportions of categories relative to the whole.

Here, you want to visualize how each Product Group contributes to the total count.

##### 2. What is/are the insight(s) found from the chart?

ARV dominates the product mix with 82.8%, indicating it is the most common or popular product group.

HRDT is the second most significant with 16.7%, but much smaller than ARV.

Other product groups (like AM, MPT, etc.) are negligible, each less than 1%, suggesting low contribution or low demand.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Sales forecasting: Knowing which products dominate helps in planning for production, supply chain, and promotions.

Revenue optimization: Enhancing focus on high-share products could increase profitability and efficiency.

In [None]:
#Plotly Histogram - Numerical
fig = px.histogram(df, x='Line Item Quantity', nbins=50, title='Line Item Quantity Distribution')
fig.show()

Bivariate Analysis

#### Chart - 7

In [None]:
# Chart - 7 visualization code
#Scatter plot
sns.scatterplot(data=df, x='Line Item Quantity', y='Line Item Value', hue='Shipment Mode')
plt.title('Quantity vs Value by Shipment Mode')
plt.show()

# Box plot - Categorical vs Numerical
sns.boxplot(data=df, x='Shipment Mode', y='Line Item Value')
plt.title('Value by Shipment Mode')
plt.show()

##### 1. Why did you pick the specific chart?

Boxplot: To compare the distribution, median, and outliers of shipment values across different shipment modes.

Scatterplot: To visualize the relationship between shipment quantity and value while comparing patterns across different shipment modes.

##### 2. What is/are the insight(s) found from the chart?

Scatterplot:
Higher quantities generally lead to higher values, with Truck and Air Charter shipments showing the strongest positive correlation.




Boxplot:
Air Charter and Ocean shipments have higher median values and greater variability, indicating higher-cost and high-risk shipments.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Scatterplot:
Yes, it helps identify cost-intensive shipment modes and supports data-driven decisions on mode selection and volume optimization.



Boxplot:
Yes, it enables better cost control and risk management by highlighting shipment modes with high value dispersion.

#### Chart - 8

In [None]:
# Chart - 8 visualization code
#Violin plot - Categorical vs Numerical
sns.violinplot(data=df, x='Managed By', y='Freight Cost (USD)')
plt.xticks(rotation=45)
plt.show()

# Strip plot - Categorical vs Numerical

fig = px.strip(
    df,
    x='Country',
    y='Weight (Kilograms)',
    title='Shipment Weight Distribution by Country'
)

fig.update_layout(
    xaxis_title='Country',
    yaxis_title='Weight (Kilograms)',
    xaxis_tickangle=90,
    title_x=0.5
)

fig.show()

##### 1. Why did you pick the specific chart?

violin plot:
A violin plot was chosen to visualize the full distribution, density, and outliers of freight costs across managing entities.


strip plot:
A strip plot was chosen to show individual shipment weights for each country, making the spread and any extreme values visible.

##### 2. What is/are the insight(s) found from the chart?

violin plot:
PMO-US shows a wide cost distribution with extreme high-value outliers, while field offices handle relatively lower and more stable freight costs.

strip plot:
A few countries account for extremely high shipment weights, while most countries handle lighter and more consistent shipments.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

violin plot:
Yes, it highlights cost concentration and variability by managing unit, enabling better freight cost control and oversight.


strip plot:
Yes, it supports country-level capacity planning and logistics optimization by focusing on high-volume shipment regions.

#### Chart - 9

In [None]:
# Chart - 9 visualization code
 #Matplotlib scatter with color - Numerical vs Numerical
plt.scatter(df['Pack Price'], df['Unit Price'], c=df['Line Item Quantity'], cmap='viridis')
plt.colorbar(label='Line Item Quantity')
plt.xlabel('Pack Price')
plt.ylabel('Unit Price')
plt.title('Pack vs Unit Price Colored by Quantity')
plt.show()

# Plotly Scatter - Numerical vs Numerical
fig = px.scatter(df, x='Weight (Kilograms)', y='Freight Cost (USD)', color='Shipment Mode', size='Line Item Value', title='Weight vs Freight Cost')
fig.show()

##### 1. Why did you pick the specific chart?

Scatter Plot: Pack Price vs Unit Price




A scatter plot was chosen to analyze the relationship between pack price and unit price while observing quantity intensity through color encoding.


Scatter Plot: Weight vs Freight Cost




This chart was selected to examine how freight cost scales with shipment weight across different shipment modes.

##### 2. What is/are the insight(s) found from the chart?

Scatter Plot: Pack Price vs Unit Price


Most high-quantity items cluster at low unit prices, while unusually high unit prices appear only for low-quantity orders, indicating pricing inefficiencies or exceptions.


Scatter Plot: Weight vs Freight Cost

Freight cost generally increases with weight, with Air and Air Charter modes showing significantly higher costs compared to Truck and Ocean shipments.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Scatter Plot: Pack Price vs Unit Price

Yes, it helps identify pricing anomalies and supports bulk pricing optimization and cost-control strategies.



Scatter Plot: Weight vs Freight Cost


Yes, it enables smarter shipment-mode selection and cost-efficient logistics planning based on weight and urgency.
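
A cost-per-kilogram figure per shipment mode is one way to put numbers on the weight versus freight cost comparison. The sketch below uses hypothetical totals; the `Cost per Kg (USD)` column is introduced here for illustration:

```python
import pandas as pd

# Hypothetical shipments; values are illustrative, not from the dataset
toy = pd.DataFrame({
    "Shipment Mode": ["Air", "Air", "Ocean", "Ocean", "Truck"],
    "Weight (Kilograms)": [100, 300, 1000, 3000, 500],
    "Freight Cost (USD)": [1000, 2000, 1500, 2500, 1000],
})

# Total cost and total weight per mode
totals = toy.groupby("Shipment Mode")[["Freight Cost (USD)", "Weight (Kilograms)"]].sum()

# Ratio of totals, so heavy shipments weigh more than light ones
totals["Cost per Kg (USD)"] = totals["Freight Cost (USD)"] / totals["Weight (Kilograms)"]

print(totals.sort_values("Cost per Kg (USD)", ascending=False))
```

Dividing totals instead of averaging per-shipment ratios avoids letting very light shipments dominate the figure.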

#### Chart - 10

In [None]:
# Chart - 10 visualization code
#Pair plot - Numerical + Categorical hue
sns.pairplot(df[['Line Item Quantity','Line Item Value','Weight (Kilograms)','Freight Cost (USD)','Shipment Mode']], hue='Shipment Mode')
plt.show()

##### 1. Why did you pick the specific chart?

A scatter matrix was chosen to simultaneously analyze pairwise relationships and distributions among multiple numerical variables across shipment modes.

##### 2. What is/are the insight(s) found from the chart?

Line item quantity, value, weight, and freight cost show positive relationships, with Truck and Air Charter modes driving higher volumes and costs, while Air shows more variability.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, it helps identify key cost drivers and shipment-mode patterns, enabling better pricing, logistics planning, and cost optimization decisions.

Multivariate Analysis

#### Chart - 11

In [None]:
# Chart - 11 visualization code

# Top 6 countries
top_countries = df['Country'].value_counts().nlargest(6).index
filtered_df = df[df['Country'].isin(top_countries)]

fig = px.histogram(
    filtered_df,
    x='Line Item Value',
    facet_col='Shipment Mode',
    facet_row='Country',
    title='Line Item Value Distribution (Top Countries)',
    nbins=30
)

fig.update_layout(height=900, title_x=0.5)
fig.show()

##### 1. Why did you pick the specific chart?

This histogram helps visualize the distribution and skewness of Line Item Value across shipment modes and top countries, making comparison easy.

##### 2. What is/are the insight(s) found from the chart?

Most shipments have low line item values with a strong right skew, while a few high-value shipments significantly impact overall value, varying by shipment mode and country.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, identifying high-value shipment patterns helps optimize logistics strategy and focus on profitable shipment modes, while outliers may indicate cost or risk areas needing control.

#### Chart - 12

In [None]:
# Chart - 12 visualization code


# Keep only the columns needed for the plot
num_cols = ['Line Item Quantity', 'Line Item Value', 'Freight Cost (USD)', 'Weight (Kilograms)']
df_clean = df[num_cols + ['Shipment Mode']].copy()

# Coerce to numeric (invalid entries become NaN), then drop incomplete rows
df_clean[num_cols] = df_clean[num_cols].apply(pd.to_numeric, errors='coerce')
df_clean = df_clean.dropna()

# 3D scatter plot
fig = px.scatter_3d(df_clean,
                    x='Line Item Quantity',
                    y='Line Item Value',
                    z='Freight Cost (USD)',
                    color='Shipment Mode',
                    size='Weight (Kilograms)',
                    title='3D Scatter: Quantity, Value & Freight')
fig.show()


##### 1. Why did you pick the specific chart?

This scatter plot helps analyze the relationship between Line Item Quantity, Line Item Value, and Freight Cost while comparing shipment modes simultaneously.

##### 2. What is/are the insight(s) found from the chart?

Freight cost generally increases with higher quantities and values, but air and air charter shipments show significantly higher costs even at lower quantities.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, it highlights cost-intensive shipment modes, enabling better mode selection and cost optimization, while excessive air usage may negatively impact profitability if not justified.

#### Chart - 13

In [None]:
# Chart - 13 visualization code
sns.catplot(data=df, x='Shipment Mode', y='Line Item Value', hue='Country', kind='bar', height=5, aspect=2)
plt.show()

##### 1. Why did you pick the specific chart?

This bar chart compares Line Item Value across different shipment modes and countries, making it easy to identify high-value shipment modes.

##### 2. What is/are the insight(s) found from the chart?

Ocean shipments show the highest average line item values, while air and truck shipments mostly handle lower-value transactions across countries.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, focusing on ocean shipments for bulk, high-value orders can improve cost efficiency, while excessive reliance on air may reduce margins if not strategically used.

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
plt.figure(figsize=(10,6))
sns.heatmap(df[['Line Item Quantity','Line Item Value','Weight (Kilograms)','Freight Cost (USD)','Pack Price','Unit Price']].corr(), annot=True, cmap='coolwarm')
plt.title('Correlation Heatmap')
plt.show()

##### 1. Why did you pick the specific chart?

A correlation heatmap clearly shows the strength and direction of relationships between key numerical variables in one view.

##### 2. What is/are the insight(s) found from the chart?

Line Item Quantity is strongly correlated with Line Item Value, while Freight Cost shows a moderate positive correlation with quantity and value; unit and pack prices have weak correlations.
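
Because the numerical variables are heavily right-skewed, it may be worth checking a rank-based correlation alongside the default Pearson coefficient. The sketch below uses hypothetical values to show how a single extreme row can inflate Pearson while Spearman stays low:

```python
import pandas as pd

# Hypothetical data: four small, unrelated rows plus one extreme row
toy = pd.DataFrame({
    "Line Item Quantity": [10, 20, 30, 40, 5000],
    "Line Item Value": [45, 15, 40, 18, 90000],
})

# Pearson measures linear association and is sensitive to extreme values
pearson = toy.corr(method="pearson").loc["Line Item Quantity", "Line Item Value"]

# Spearman works on ranks, so one extreme row has limited influence
spearman = toy.corr(method="spearman").loc["Line Item Quantity", "Line Item Value"]

print(f"Pearson={pearson:.3f}, Spearman={spearman:.3f}")
```

If the two coefficients differ widely on the real data, the strong correlations in the heatmap may be driven largely by a few large orders.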

#### Chart - 15 - Country vs Shipment Mode Heatmap

In [None]:
# Pivot heatmap visualization code
pivot = df.pivot_table(index='Country', columns='Shipment Mode', values='Line Item Value', aggfunc='mean')
plt.figure(figsize=(8,6))
sns.heatmap(pivot, annot=True, fmt=".1f", cmap='YlGnBu')
plt.title('Average Line Item Value by Country & Shipment Mode')
plt.show()

##### 1. Why did you pick the specific chart?

This heatmap effectively compares average Line Item Value across countries and shipment modes in a single view

##### 2. What is/are the insight(s) found from the chart?

Ocean and air charter modes show higher average line item values in several countries, while air and unknown modes generally have lower values.
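
Averages in a pivot can be misleading when a cell is backed by only a few shipments. The sketch below pairs the mean pivot with a count pivot and masks thin cells; the toy data and the `min_shipments` threshold are assumptions for illustration:

```python
import pandas as pd

# Hypothetical shipments; values are illustrative
toy = pd.DataFrame({
    "Country": ["A", "A", "A", "B", "B"],
    "Shipment Mode": ["Air", "Air", "Ocean", "Air", "Ocean"],
    "Line Item Value": [100.0, 300.0, 5000.0, 200.0, 1000.0],
})

# Average value per country and shipment mode
mean_pivot = toy.pivot_table(index="Country", columns="Shipment Mode",
                             values="Line Item Value", aggfunc="mean")

# Number of shipments behind each average
count_pivot = toy.pivot_table(index="Country", columns="Shipment Mode",
                              values="Line Item Value", aggfunc="count")

# Assumed threshold: hide averages based on fewer than 2 shipments
min_shipments = 2
reliable = mean_pivot.where(count_pivot >= min_shipments)

print(reliable)
```

The masked table could be passed to `sns.heatmap` in place of the raw pivot, so that empty cells indicate insufficient data.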

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

1. Increase revenue by focusing on high-value markets & products

Explanation: Ocean shipments carry higher average line item values, and Zambia, Vietnam, Nigeria, and Côte d’Ivoire account for the most orders. The ARV product group accounts for 82.8% of line items.

Business Impact: Target marketing and sales efforts on these high-volume countries and products; focus promotions and stock on ARV products to maximize revenue.

2. Optimize shipping & reduce logistics costs

Explanation: Freight cost increases with weight and quantity, especially for Air and Air Charter modes. Ocean shipments are more cost-efficient for high-value bulk orders. Outliers in freight cost indicate potential inefficiencies.

Business Impact: Prioritize cost-effective shipment modes for bulk orders, negotiate better rates, and reduce excessive reliance on high-cost modes like Air for low-value shipments.

3. Improve pricing strategy & encourage bulk orders

Explanation: Most high-quantity items cluster at low unit prices; high unit prices occur only for small orders. Line Item Quantity is strongly correlated with Line Item Value.

Business Impact: Introduce tiered pricing or volume discounts to encourage bulk orders, correct pricing anomalies, and maximize profitability.

4. Enhance operational efficiency via managing-office performance insights

Explanation: Freight cost patterns differ across managing offices: PMO-US shows a wide cost distribution with extreme outliers, while field offices handle lower and more stable freight costs. Most shipments have controlled costs, but outliers exist.

Business Impact: Identify practices from offices with stable costs for replication, provide targeted training, and investigate outliers to improve operational efficiency and cost control.

5. Enable predictive planning using correlated variables

Explanation: Line Item Quantity is strongly correlated with Line Item Value, and Freight Cost shows a moderate positive correlation with both. These relationships are predictable enough to inform forecasts.

Business Impact: Use correlations for demand forecasting, inventory planning, and predictive logistics models to reduce stockouts, overstock, and optimize resource allocation.



# **Conclusion**



Analysis of the logistics dataset reveals that a few countries and product groups drive most orders, while shipment modes and freight costs vary significantly, impacting profitability. Pricing and quantity patterns suggest opportunities for tiered pricing and bulk discounts. Operational differences across managing offices and the correlations among key variables highlight chances for process optimization and predictive planning. Leveraging these insights can maximize revenue, reduce costs, and improve strategic decision-making across marketing, logistics, and inventory management.





### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***