# **Project Name**    - FedEx Delivery Performance Analysis using EDA



##### **Project Type**    - Exploratory Data Analysis (EDA)


---


##### **Contribution**    - Individual


# **Project Summary -**

This project, titled "FedEx", focuses on performing Exploratory Data Analysis (EDA) on SCMS delivery history data using Google Colab. The primary goal of the analysis was to understand the key patterns and relationships between different variables that influence the shipment and delivery performance of FedEx. By leveraging Python libraries such as Pandas, Seaborn, Matplotlib, NumPy, and Plotly, a total of 15 interactive and static visualizations were created to uncover business insights and support data-driven decision-making.

The dataset includes fields such as delivery dates, shipment modes, freight costs, product classifications, weight, vendor details, and financial metrics like line item value and insurance. Initially, the data underwent cleaning and preprocessing, including the conversion of relevant columns to numeric formats and handling of missing or inconsistent values (e.g., "Weight Captured Separately"). This step ensured that the dataset was ready for meaningful statistical analysis and visualization.

Among the visualizations created, a Pair Plot played a crucial role in identifying correlations and patterns among multiple numeric variables such as Line Item Quantity, Value, Pack Price, Unit Price, Freight Cost, and Weight. The pair plot helped visually assess which variables move together and identify potential outliers. For instance, a positive correlation was observed between Line Item Quantity and Line Item Value, which confirms that larger shipments typically bring higher value. Similarly, Freight Cost increased with Weight, highlighting the direct relationship between package weight and shipping cost.

Another significant visualization was the Correlation Heatmap, which numerically and visually represented the strength of relationships between variables. This allowed easy identification of strongly and weakly correlated features, aiding in understanding operational dependencies.

Visual storytelling using bar charts, line charts, pie charts, and distribution plots further revealed trends such as the most used shipment modes, most frequent vendors, top product categories, and delivery timelines. Time-series visualizations also revealed seasonal or time-bound patterns in delivery delays and costs.

These insights carry positive business implications. For example, by identifying products or vendors associated with high freight costs or delays, the logistics team can take targeted actions such as negotiating better terms, optimizing shipment routes, or reconsidering vendor relationships. Additionally, understanding how pack and unit prices relate supports more accurate cost estimates and helps protect profitability.

On the other hand, certain findings highlighted areas of potential negative growth, such as high freight costs for low-value shipments or inconsistent insurance costs. These trends suggest inefficiencies that could reduce margins if left unaddressed. The analysis thus encourages proactive intervention to minimize waste and maximize operational efficiency.

In conclusion, this project successfully demonstrates the power of EDA and data visualization in extracting actionable insights from complex delivery data. It not only supports better logistical planning and vendor management but also drives strategic decisions to enhance overall business performance. Through thoughtful storytelling and visual experimentation, this analysis provides a strong foundation for continuous improvement in FedEx's delivery operations.



# **GitHub Link -**

*Provide* your GitHub Link here.

# **Problem Statement**


The project aims to analyze FedEx delivery history data to uncover patterns, inefficiencies, and relationships between shipment variables such as cost, weight, delivery timelines, and vendor performance. The goal is to use Exploratory Data Analysis (EDA) and data visualization techniques to derive actionable insights that can help improve logistics efficiency, reduce operational costs, and enhance delivery performance.

#### **Define Your Business Objective?**

The objective of this project is to analyze SCMS FedEx delivery history data to identify key factors impacting shipment efficiency, delivery performance, and cost-effectiveness. By leveraging data visualization and storytelling through EDA, the goal is to uncover actionable insights that support better decision-making in areas such as vendor selection, shipment planning, cost control, and overall logistics optimization.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure every chart answers the following questions.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
*   Will the gained insights help create a positive business impact? Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following the "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
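# Data handling
import pandas as pd
import numpy as np
# Static plotting (pyplot is aliased as mplt throughout this notebook)
import seaborn as sns
import matplotlib.pyplot as mplt
import matplotlib as mpl
# Interactive plotting (note: plt refers to the plotly package here, not matplotlib)
import plotly.express as px
import plotly as plt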

In [None]:
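# Mount Google Drive so the CSV stored there can be read from Colab
from google.colab import drive
drive.mount('/content/drive')

# Load the SCMS delivery history dataset
file_path = '/content/drive/MyDrive/EDA Project/SCMS_Delivery_History_Dataset.csv'
df = pd.read_csv(file_path)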

### Dataset First View

In [None]:
df.head()

### Dataset Rows & Columns count

In [None]:
df.shape

### Dataset Information

In [None]:
df.info()

#### Duplicate Values

In [None]:
df.duplicated().sum()

#### Missing Values/Null Values

In [None]:
df.isnull().sum()

In [None]:
df.isnull().sum().plot(kind='bar')

### What did you know about your dataset?

The dataset contains detailed historical records of FedEx shipments managed under SCMS, covering various aspects of the delivery lifecycle. It includes over 30 columns representing different shipment attributes such as:

Identification & Order Details: ID, Project Code, PQ #, PO / SO #, ASN/DN #

Dates: PQ First Sent to Client Date, PO Sent to Vendor Date, Scheduled Delivery Date, Delivered to Client Date, Delivery Recorded Date

Logistics & Handling: Country, Managed By, Fulfill Via, Vendor INCO Term, Shipment Mode

Product Info: Product Group, Sub Classification, Item Description, Molecule/Test Type, Brand, Dosage, Dosage Form

Financial & Quantity Metrics: Line Item Quantity, Line Item Value, Pack Price, Unit Price, Freight Cost (USD), Line Item Insurance (USD)

Weight & Site Info: Weight (Kilograms), Manufacturing Site

From initial exploration, I observed that the dataset is a mix of categorical, date, and numerical data, making it suitable for diverse analysis like trend tracking, cost correlation, vendor evaluation, and shipment delay insights. Some fields contained missing or inconsistent data (e.g., non-numeric values in the weight column), which were handled during the cleaning process.

This dataset offers rich information to study relationships between delivery cost, quantity, weight, and timelines, enabling insights into logistics efficiency, vendor performance, and operational bottlenecks.

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df.columns

In [None]:
# Dataset Describe
df.describe()

### Variables Description

1. ID

Unique identifier for each shipment record.

2. Project Code

A code associated with specific projects within the shipment process.

3. PQ #

Price Quote (PQ) number, identifying the price quote prepared for the client before the order is placed.

4. PO / SO #

Purchase Order (PO) or Sales Order (SO) number, which uniquely identifies the order.

5. ASN/DN #

Advanced Shipping Notice (ASN) or Delivery Note (DN) number, tracking shipments and deliveries.

6. Country

The country where the product is being delivered.

7. Managed By

The department or entity responsible for managing the shipment.

8. Fulfill Via

How the order was fulfilled: shipped directly from the vendor ("Direct Drop") or dispatched from a regional distribution center ("From RDC").

9. Vendor INCO Term

International Commercial Terms used by the vendor, indicating shipping and payment conditions.

10. Shipment Mode

Mode of shipment (e.g., Air, Truck, Air Charter, Ocean), affecting delivery time and cost.

11. PQ First Sent to Client Date

The date when the price quote (PQ) was first sent to the client.

12. PO Sent to Vendor Date

The date when the purchase order was sent to the vendor.

13. Scheduled Delivery Date

The date on which the delivery was originally scheduled.

14. Delivered to Client Date

The actual date when the shipment was delivered to the client.

15. Delivery Recorded Date

The date when the delivery was recorded in the system.

16. Product Group

Classification of the product group for shipment, helping identify the product category.

17. Sub Classification

A further classification within the product group, providing more specific categorization.

18. Vendor

The supplier or vendor responsible for the goods being shipped.

19. Item Description

A brief description of the item being shipped.

20. Molecule/Test Type

The active molecule (for medicines) or the type of test (for diagnostic kits).

21. Brand

The brand of the product being shipped.

22. Dosage

The dosage of a pharmaceutical item, if applicable.

23. Dosage Form

The form of the dosage (e.g., tablet, liquid) for pharmaceutical items.

24. Unit of Measure (Per Pack)

The unit of measurement for the product quantity per pack.

25. Line Item Quantity

The quantity of items in a specific line item of the shipment.

26. Line Item Value

The total value of the line item, i.e., the Line Item Quantity (counted in packs) multiplied by the Pack Price.

27. Pack Price

Price per pack of the item.

28. Unit Price

The price of a single unit of the item (the Pack Price divided by the Unit of Measure (Per Pack)).

29. Manufacturing Site

The location where the product was manufactured.

30. First Line Designation

A Yes/No flag marking the line that carries the aggregated freight cost and weight for all items on the same ASN/DN.

31. Weight (Kilograms)

The total weight of the shipment in kilograms.

32. Freight Cost (USD)

The freight (shipping) cost in USD, recorded for the shipment (ASN/DN) as a whole rather than for each item.

33. Line Item Insurance (USD)

The insurance cost for the line item, in USD.
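The price-related variables above are arithmetically linked, which makes a quick consistency check possible before trusting them in charts. The sketch below assumes (this notebook does not confirm it) that Line Item Quantity is counted in packs and that Unit of Measure (Per Pack) is the number of units in one pack. The three rows are hypothetical, and the 1% tolerance is an arbitrary choice.

```python
import pandas as pd

# Hypothetical toy rows; in the notebook this would be df1 with the same column names
sample = pd.DataFrame({
    'Line Item Quantity': [100, 20, 50],
    'Pack Price': [2.50, 30.00, 3.00],
    'Unit of Measure (Per Pack)': [10, 60, 30],
    'Line Item Value': [250.0, 600.0, 165.0],
    'Unit Price': [0.25, 0.50, 0.10],
})

# Expected value if quantity is counted in packs (assumption)
expected_value = sample['Line Item Quantity'] * sample['Pack Price']
# Expected unit price if a pack holds 'Unit of Measure (Per Pack)' units (assumption)
expected_unit = sample['Pack Price'] / sample['Unit of Measure (Per Pack)']

# Flag rows whose recorded figures deviate by more than 1% from the expected ones
sample['value_mismatch'] = (sample['Line Item Value'] - expected_value).abs() > 0.01 * expected_value
sample['unit_price_mismatch'] = (sample['Unit Price'] - expected_unit).abs() > 0.01 * expected_unit
print(sample[['value_mismatch', 'unit_price_mismatch']])
```

Flagged rows are candidates for data-entry errors or for a different pack definition, not proof of either.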



### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
df.nunique()

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
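# Work on a copy so the raw data in df stays untouched
df1 = df.copy()

# Parse the five date columns; unparseable entries become NaT
date_columns = ['PQ First Sent to Client Date', 'PO Sent to Vendor Date',
                'Scheduled Delivery Date', 'Delivered to Client Date', 'Delivery Recorded Date']
df1[date_columns] = df1[date_columns].apply(pd.to_datetime, errors='coerce')

# Inspect the resulting dtypes and summary statistics
df1.info()
df1.describe()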

### What all manipulations have you done and insights you found?

#### 1. **Data Manipulations**

1. **Data Cleaning:**

    * Parsed the five date columns to datetime; unparseable entries became NaT.

    * Converted non-numeric values (e.g., "Weight Captured Separately") to NaN and handled them.

    * Converted relevant columns to numeric data types (e.g., Line Item Value, Freight Cost).

    * Dropped rows with missing values for critical variables like Freight Cost, Weight, and Unit Price.

2. **Feature Engineering:**

    * Calculated additional columns, such as shipment delay, by subtracting Scheduled Delivery Date from Delivered to Client Date.
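The cleaning and feature-engineering steps listed above can be sketched on a small hypothetical sample. Column names follow the dataset. The rows and the freight placeholder text ("Freight Included in Commodity Cost") are illustrative assumptions, not the exact notebook logic.

```python
import pandas as pd

# Hypothetical sample mirroring a few SCMS columns; real values come from df1
raw = pd.DataFrame({
    'Weight (Kilograms)': ['120', 'Weight Captured Separately', '45', '60'],
    'Freight Cost (USD)': ['850.5', '300', 'Freight Included in Commodity Cost', '410'],
    'Unit Price': [0.25, 0.10, 0.40, 0.50],
    'Scheduled Delivery Date': ['2014-03-01', '2014-03-05', '2014-04-10', '2014-05-10'],
    'Delivered to Client Date': ['2014-03-04', '2014-03-02', '2014-04-10', '2014-05-07'],
})

clean = raw.copy()
# 1. Coerce text placeholders to NaN so the columns become numeric
for col in ['Weight (Kilograms)', 'Freight Cost (USD)']:
    clean[col] = pd.to_numeric(clean[col], errors='coerce')

# 2. Drop rows missing any of the critical numeric fields
clean = clean.dropna(subset=['Freight Cost (USD)', 'Weight (Kilograms)', 'Unit Price'])

# 3. Feature engineering: delay in days (positive = late, negative = early)
for col in ['Scheduled Delivery Date', 'Delivered to Client Date']:
    clean[col] = pd.to_datetime(clean[col], errors='coerce')
clean['Delay (Days)'] = (clean['Delivered to Client Date'] - clean['Scheduled Delivery Date']).dt.days
print(clean)
```

Dropping rows is a trade-off. It keeps numeric charts honest but shrinks the sample, so category counts (e.g., shipment modes) may be better computed before step 2.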

#### 2. **Insights Found**

1. **Positive Correlations:**

    * Line Item Quantity and Line Item Value: higher quantities led to higher shipment values.

    * Weight and Freight Cost: a strong relationship was found, with heavier shipments incurring higher shipping costs.

2. **Outliers:** Some shipments had exceptionally high freight costs relative to their value or weight, suggesting inefficiencies or potential data errors.

3. **Operational Inefficiencies:** Products with low value but high shipping costs highlighted potential areas for optimizing freight charges and vendor negotiations.

4. **Vendor/Shipment Mode Insights:** Certain vendors and shipment modes were associated with higher costs, which could help in renegotiating contracts or improving logistics planning.
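One way to make the "high freight cost relative to value" outliers concrete is to compute a freight-to-value ratio for each shipment. The sketch below uses hypothetical rows. The threshold of 1.0 (freight costing more than the goods) is an arbitrary cut-off. If freight is recorded once per shipment rather than per line item, line values would need to be summed per shipment first.

```python
import pandas as pd

# Hypothetical shipments; in the notebook these columns would come from df1 after numeric coercion
ship = pd.DataFrame({
    'Vendor': ['V1', 'V2', 'V3', 'V4'],
    'Line Item Value': [10000.0, 500.0, 8000.0, 200.0],
    'Freight Cost (USD)': [1200.0, 900.0, 640.0, 50.0],
})

# Freight cost as a share of the goods' value
ship['freight_to_value'] = ship['Freight Cost (USD)'] / ship['Line Item Value']

# Flag shipments where freight exceeds the value of the goods (threshold is an assumption)
ship['costly_freight'] = ship['freight_to_value'] > 1.0
print(ship.sort_values('freight_to_value', ascending=False))
```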



## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1 - Shipment Mode Distribution

In [None]:
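# Chart - 1 visualization code
# Count deliveries per shipment mode, ordered from most to least frequent
sns.countplot(data=df1, x='Shipment Mode', order=df1['Shipment Mode'].value_counts().index)
mplt.title('Shipment Mode Distribution')
mplt.xlabel('Shipment Mode')
mplt.ylabel('Number of Deliveries')
mplt.xticks(rotation=45)
mplt.show()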

##### 1. Why did you pick the specific chart?


Type of Data: Shipment Mode is a categorical variable (like "Air", "Truck", "Ocean", etc.).

Goal: To understand the volume of deliveries by each mode.

Best Visual Choice: A bar chart (countplot) is perfect for showing how frequently each category appears in a dataset.

Countplots are simple yet powerful when analyzing categorical features — they quickly show dominance or underperformance of certain categories.




##### 2. What is/are the insight(s) found from the chart?

Air shipments dominate the delivery operations — most of the deliveries are made via air.

Truck, Air Charter, and Ocean shipments are underutilized.

There might be dependency risk due to over-reliance on a single mode.
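The countplot can be backed by relative frequencies to put a number on how strongly Air dominates. The sketch below uses a hypothetical sample of shipment modes. Treating a share above 50% as over-reliance is an assumption, not a rule from the dataset.

```python
import pandas as pd

# Hypothetical Shipment Mode column; in the notebook use df1['Shipment Mode']
modes = pd.Series(['Air', 'Air', 'Truck', 'Air', 'Ocean', 'Air', 'Air Charter', 'Air'])

# Share of deliveries per mode, in percent
shares = modes.value_counts(normalize=True).mul(100).round(1)
print(shares)

# Simple dependency check: does one mode carry more than half of all deliveries?
top_mode, top_share = shares.index[0], shares.iloc[0]
print(f"{top_mode} carries {top_share}% of deliveries; over-reliance: {top_share > 50}")
```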

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes — here's how:
**Positive Business Impact:**
*   The business can evaluate cost vs speed trade-offs — for example, air might be faster but much more expensive.
*   Knowing that Truck and Ocean are used less often, logistics managers could evaluate shifting non-urgent deliveries to these typically cheaper modes.
*   Can help in risk mitigation planning if air shipment gets disrupted (e.g., weather issues, regulation).

Yes — the same insight can signal potential negative impact:

**Negative Insight:**
Overdependence on air shipment may lead to:

*   High freight costs, hurting profit margins.
*   Logistical bottlenecks if air networks face disruption.
*   Environmental concerns (air has a larger carbon footprint).


#### Chart - 2 - Top 10 Vendors by Delivery Count

In [None]:
# Chart - 2 visualization code
# Ten vendors with the most delivery records
top_vendors = df1['Vendor'].value_counts().nlargest(10)
sns.barplot(x=top_vendors.values, y=top_vendors.index)
mplt.title('Top 10 Vendors by Deliveries')
mplt.xlabel('Number of Deliveries')
mplt.ylabel('Vendor')
mplt.show()

##### 1. Why did you pick the specific chart?

The bar chart for Top 10 Vendors by Delivery Count is chosen because:

*   It highlights the most active vendors in terms of shipment frequency.
*   It gives a clear, ranked comparison of delivery volume.
*   Bar charts are ideal for categorical data like vendor names where quantity comparison is needed.


##### 2. What is/are the insight(s) found from the chart?

From the top-vendor ranking:

*   It identifies which vendors contribute the most to the delivery pipeline.
*   It shows distribution concentration—whether most deliveries are dependent on a few vendors or evenly distributed.
*   If one vendor has a significantly higher count, it may indicate over-reliance.

**Example Insight:**
If Vendor A accounts for 30% of deliveries alone, and others are far behind, it shows potential risk if Vendor A fails or experiences delays.
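Simple concentration measures can quantify over-reliance:

* the top vendor's share,
* the combined share of the top three vendors, and
* the Herfindahl-Hirschman index (HHI): the sum of squared shares, ranging from 1/n for an even split across n vendors up to 1 for a single vendor.

The HHI is a standard concentration measure borrowed from market analysis; the original analysis does not use it. The vendor counts below are hypothetical.

```python
import pandas as pd

# Hypothetical delivery counts per vendor; in the notebook use df1['Vendor'].value_counts()
counts = pd.Series({'Vendor A': 30, 'Vendor B': 25, 'Vendor C': 20, 'Vendor D': 15, 'Vendor E': 10})

shares = counts / counts.sum()
top1_share = shares.max()                # share of the single largest vendor
top3_share = shares.nlargest(3).sum()    # combined share of the three largest vendors
hhi = (shares ** 2).sum()                # sum of squared shares

print(f"Top vendor: {top1_share:.0%}, top 3: {top3_share:.0%}, HHI: {hhi:.3f}")
```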

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes — and here's how:

**Positive Impact:**
*   Helps supply chain managers monitor top-performing vendors.
*   Allows better resource allocation — like dedicating more logistical support to top vendors.
*   Facilitates negotiation leverage with high-volume vendors for discounts or better terms.
*   Enables performance tracking: high delivery count can be matched with delivery quality (timeliness, cost-efficiency).

**Potentially yes — if mismanaged:**
*   Over-dependence on 1-2 vendors → If they face disruption (strikes, stockouts), deliveries collapse.
*   Some vendors might be overutilized but underperforming (e.g., high delivery count but poor on-time delivery or high freight cost).
*   Smaller vendors might be overlooked, though they may offer better value or performance.

#### Chart - 3 - Delivery Timeliness: Scheduled vs Actual Delivery

In [None]:
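# Chart - 3 visualization code
# Delay in days: positive = delivered late, negative = delivered early
df1['Delay (Days)'] = (df1['Delivered to Client Date'] - df1['Scheduled Delivery Date']).dt.days
sns.histplot(df1['Delay (Days)'].dropna(), kde=True)
# Zoom in on the bulk of the distribution; the y-limit may cut off the tallest bars
mplt.ylim(0, 180)
mplt.xlim(-150, 150)
mplt.title('Delivery Delay Distribution')
mplt.xlabel('Delay (Days)')
mplt.show()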

##### 1. Why did you pick the specific chart?

Reason for choosing this chart:
*   To measure and visualize how timely FedEx deliveries are.
*   Histograms are ideal for spotting the distribution of delays, e.g., early, on-time, or late deliveries.
*   It reveals patterns and outliers that can't be easily seen from raw data.

##### 2. What is/are the insight(s) found from the chart?

After plotting the histogram, we observed the following patterns:
*   Peak around 0 to 2 days: Many deliveries happen on time or just slightly delayed.
*   Tail extending beyond 10+ days: Some deliveries are delayed significantly.
*   Negative values present: Some deliveries are happening before the scheduled date (early delivery).

These insights show:
*   Delivery performance trends
*   Reliability issues in certain cases
*   Potential for identifying vendors or shipment modes causing delays
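The early / on-time / late reading of the histogram can be turned into counts by bucketing the delay column. The sketch below assumes that exactly 0 days counts as on time and that "significantly delayed" means more than 7 days. Both cut-offs and the sample values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical delays in days (delivered minus scheduled); in the notebook use df1['Delay (Days)']
delay = pd.Series([-5, 0, 1, 2, 14, np.nan, -1, 30])

# Bucket each delivery; comparisons with NaN are False, so missing delays fall to 'Unknown'
status = pd.Series(
    np.select([delay < 0, delay == 0, delay > 0], ['Early', 'On time', 'Late'], default='Unknown'),
    index=delay.index,
)
print(status.value_counts())

# Share of known deliveries that were more than 7 days late (threshold is an assumption)
known = delay.dropna()
print(f"Late by more than a week: {(known > 7).mean():.1%}")
```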

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes — here's how:

*   **Process Optimization:** Knowing how often delays occur helps operations teams investigate root causes (e.g., late PO issuance, vendor inefficiencies).
*   **Vendor Evaluation:** Identify underperforming vendors that consistently cause delays.
*   **Customer Satisfaction:** Timely delivery directly affects customer trust and retention, so improving this metric helps the business grow.
*   **Resource Planning:** If delays are common during certain months or with specific shipment modes, logistics teams can adjust accordingly.

Yes — if the histogram reveals a significant portion of deliveries delayed by more than a week, it could indicate:

*   Inefficiencies in supply chain
*   Poor vendor or shipment mode performance
*   Risk to client satisfaction, possibly causing churn or contract loss

#### Chart - 4 - Freight Cost by Shipment Mode (Boxplot)

In [None]:
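# Chart - 4 visualization code
# 'Freight Cost (USD)' can hold text notes instead of amounts, so coerce it to numeric
# (non-numeric entries become NaN and are left out of the boxplot)
df1['Freight Cost (USD)'] = pd.to_numeric(df1['Freight Cost (USD)'], errors='coerce')
sns.boxplot(data=df1, x='Shipment Mode', y='Freight Cost (USD)')
mplt.title('Freight Cost Distribution by Shipment Mode')
mplt.xticks(rotation=45)
mplt.show()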

##### 1. Why did you pick the specific chart?

**A boxplot is ideal for visualizing:**
*   Distribution of values across different categories
*   Median, interquartile range (IQR), and outliers
*   Comparisons between groups

**In this case:**


*   Freight Cost (USD) is a continuous variable.
*   Shipment Mode (e.g., Air, Truck, Air Charter, Ocean) is categorical.

So, a boxplot is perfect to show how freight costs vary depending on the shipment mode.
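By default, Seaborn's `boxplot` draws its whiskers at 1.5 times the IQR beyond the quartiles. The medians, IQRs, and outlier counts behind the chart can therefore be computed directly with the same rule. The sketch below uses hypothetical freight costs, and `box_stats` is a hypothetical helper name. With the real data, `Freight Cost (USD)` would first need `pd.to_numeric(..., errors='coerce')`.

```python
import pandas as pd

# Hypothetical freight costs; in the notebook use df1 with a numeric 'Freight Cost (USD)'
freight = pd.DataFrame({
    'Shipment Mode': ['Air'] * 6 + ['Ocean'] * 5,
    'Freight Cost (USD)': [900, 1100, 1000, 1200, 950, 9000, 400, 420, 380, 410, 395],
})

def box_stats(costs: pd.Series) -> pd.Series:
    """Median, IQR and number of points lying more than 1.5 * IQR outside the quartiles."""
    q1, q3 = costs.quantile(0.25), costs.quantile(0.75)
    iqr = q3 - q1
    outliers = ((costs < q1 - 1.5 * iqr) | (costs > q3 + 1.5 * iqr)).sum()
    return pd.Series({'median': costs.median(), 'iqr': iqr, 'outliers': outliers})

# One row per shipment mode, one column per statistic
stats = freight.groupby('Shipment Mode')['Freight Cost (USD)'].apply(box_stats).unstack()
print(stats)
```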

##### 2. What is/are the insight(s) found from the chart?

From this boxplot, you might observe insights like:

*   Air shipment has a higher median freight cost and more variability.
*   Ocean shipments may have lower freight costs and more consistent values (less variance).
*   Outliers in certain shipment modes could indicate inefficient logistics or one-off events.

**Example Insight:**

“Air freight costs are significantly higher and more variable than Ocean or Truck costs.”

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Positive Business Impact**

*   Helps logistics teams choose cost-efficient shipment modes.
*   Aids in forecasting budgets by understanding cost ranges.
*   Encourages switching from high-cost modes (e.g., Air) to more economical ones (e.g., Ocean) for non-urgent deliveries.
*   Can help in vendor negotiations by highlighting cost fluctuations per mode.

**Negative Insight / Growth Hindrance**


*   High or unpredictable freight costs in one shipment mode (like Air) could eat into profit margins.
*   If the business heavily relies on expensive modes, it indicates a lack of planning or urgent scheduling—a sign of weak supply chain management.
*   Outliers might point to operational inefficiencies or hidden costs (e.g., emergency shipments).

#### Chart - 5 - Pack Price vs Unit Price

In [None]:
# Chart - 5 visualization code
# Each point is one line item: its pack price against the price of a single unit
sns.scatterplot(data=df1, x='Pack Price', y='Unit Price')
mplt.title('Pack Price vs Unit Price')
mplt.show()

##### 1. Why did you pick the specific chart?

**The scatter plot is ideal when you want to:**

*   Compare two continuous numeric variables
*   Understand if there’s a relationship or pattern
*   Detect positive/negative correlation, or lack of one

**In this case:**

*   Pack Price: Price of a complete package
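*   Unit Price: Price per single unit inside the package

Plotting one against the other shows whether unit prices rise in step with pack prices. It also exposes line items that break the pattern.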

##### 2. What is/are the insight(s) found from the chart?

**Insight Example:**

If the scatter plot shows a strong positive linear relationship:
*   It means that as Pack Price increases, Unit Price also increases proportionally.
*   This indicates pricing consistency between what vendors charge for packs and individual units.
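The strength of the relationship can be checked numerically rather than by eye:

* Pearson's r measures linear association.
* Spearman's rho works on ranks, so it is less sensitive to skewed prices and extreme values. Here it is computed as the Pearson correlation of the ranks, which is how Spearman's rho is defined.

Assuming Unit Price is derived from Pack Price and pack size, a tight relationship is expected only when pack sizes are similar. The prices below are hypothetical.

```python
import pandas as pd

# Hypothetical prices; in the notebook use df1[['Pack Price', 'Unit Price']].dropna()
prices = pd.DataFrame({
    'Pack Price': [2.0, 4.0, 6.0, 30.0, 60.0, 8.0],
    'Unit Price': [0.20, 0.40, 0.60, 0.50, 1.00, 0.80],
})

# Pearson: linear association on the raw values
pearson = prices['Pack Price'].corr(prices['Unit Price'])

# Spearman: Pearson correlation of the ranks (ties get average ranks)
ranks = prices.rank()
spearman = ranks['Pack Price'].corr(ranks['Unit Price'])

print(f"Pearson r = {pearson:.2f}, Spearman rho = {spearman:.2f}")
```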

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, here's how:

**Positive Business Impact:**

*   Identifying a consistent correlation shows that pricing models are reliable.
*   It can help in forecasting, setting price expectations, and ensuring vendor compliance.
*   Can support procurement negotiations by showing expected ratios between unit and pack costs.

**Negative Growth Insight (if present):**


*   If some vendors break the trend (e.g., high unit prices but cheap pack prices), this could lead to:
 1.   Overpayment
 2.   Budget inefficiencies
 3.   Trust issues in procurement

*   This insight would justify reviewing vendor contracts, recalculating margins, or flagging outliers for action.

#### Chart - 6 - Dosage Form Distribution

In [None]:
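# Chart - 6 visualization code
# Horizontal countplot of the ten most frequent dosage forms, most common first
sns.countplot(data=df1, y='Dosage Form', order=df1['Dosage Form'].value_counts().index[:10])
mplt.title('Top 10 Dosage Forms')
mplt.xlabel('Number of Deliveries')
mplt.show()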

##### 1. Why did you pick the specific chart?

*   A bar chart (using countplot) is ideal for categorical data like 'Dosage Form', showing how frequently each type appears.
*   It clearly reveals the most commonly shipped dosage types (e.g., Tablets, Capsules, Injections), helping us understand product mix.
*   The horizontal layout improves readability, especially for long category names.

##### 2. What is/are the insight(s) found from the chart?

*   You can identify the top 3 dosage forms (say, Tablets, Injections, Capsules).
*   Some dosage types are rarely shipped — possibly outdated, specialized, or low-demand.
*   Heavy concentration on a few dosage types might indicate inventory skew.
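Pooling the long tail of rare dosage forms into a single bucket makes both observations easier to read. It shows the top categories' share and how much volume the rare forms carry together. In the sketch below, `top_n_with_other` is a hypothetical helper, the cut-off of three categories is arbitrary, and the values are made up.

```python
import pandas as pd

# Hypothetical Dosage Form values; in the notebook use df1['Dosage Form']
forms = pd.Series(['Tablet'] * 6 + ['Capsule'] * 3 + ['Oral solution'] * 2 + ['Injection', 'Gel'])

def top_n_with_other(values: pd.Series, n: int) -> pd.Series:
    """Keep the n most frequent categories and pool the rest under 'Other'."""
    top = values.value_counts().nlargest(n).index
    return values.where(values.isin(top), 'Other')

grouped = top_n_with_other(forms, n=3)
print(grouped.value_counts(normalize=True).round(3))
```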

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes!

**Positive Impact:**

*   **Inventory Optimization:** Focus on stocking the most shipped dosage forms can reduce storage cost and avoid delays.
*   **Vendor Prioritization:** Partner more with vendors supplying high-demand forms.
*   **Forecasting:** Improves demand forecasting for procurement and shipment planning.
*   **Customer Satisfaction:** Ensures faster delivery for commonly ordered forms.

**Potential Negative Insight:**

If certain dosage forms (e.g., liquid or topical forms) are underrepresented, it could mean:

*   Lost business opportunity in those segments.
*   Distribution challenges like shipping restrictions (e.g., temperature-sensitive items).
*   Limited vendor base, affecting product availability and customer satisfaction.

**Justification:**

If a competitor offers a broader dosage range with better availability, customers might shift — resulting in revenue loss.

#### Chart - 7 - Time Series: Deliveries Over Time

In [None]:
# Chart - 7 visualization code
# Assign each delivery to its calendar month (missing dates stay NaT)
df1['Delivery Month'] = df1['Delivered to Client Date'].dt.to_period('M')
# Count deliveries per month in chronological order (value_counts skips NaT)
monthly = df1['Delivery Month'].value_counts().sort_index()
monthly.plot(kind='line', title='Monthly Deliveries')
mplt.xlabel('Month')
mplt.ylabel('Number of Deliveries')
mplt.xticks(rotation=45)
mplt.show()

##### 1. Why did you pick the specific chart?

*   A time series line chart is the most suitable chart when analyzing how a metric (like number of deliveries) changes over time.
*   Here, we track deliveries per month to:
 1.   Identify trends, seasonal patterns, or sudden drops/spikes.
 2.   Understand operational consistency or delays in specific time periods.

##### 2. What is/are the insight(s) found from the chart?

After plotting the monthly delivery trend, some possible insights could be:

*   **Growth Periods:** Certain months may show a rise in deliveries, indicating high demand seasons or efficient operations.
*   **Drop in Deliveries:** Sudden drops could hint at vendor delays, shipment issues, or global events (e.g., COVID-19 disruptions).
*   **Seasonality:** If the trend repeats across years or quarters, it reveals predictable seasonal behavior (e.g., pharma demand in winters).
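Seasonality is easier to judge by averaging each calendar month across years than by reading one long line. The sketch below uses hypothetical delivery dates. With the real data, missing dates (NaT) are dropped by `groupby` automatically.

```python
import pandas as pd

# Hypothetical delivery dates over two years; in the notebook use df1['Delivered to Client Date']
dates = pd.to_datetime(pd.Series([
    '2013-01-15', '2013-01-20', '2013-06-03', '2013-12-11', '2013-12-19',
    '2014-01-08', '2014-01-25', '2014-06-14', '2014-12-02', '2014-12-30',
]))

# Deliveries per (year, calendar month)
by_year_month = dates.groupby([dates.dt.year, dates.dt.month]).size()

# Average deliveries for each calendar month across years;
# a month that is high every year hints at seasonality
seasonal = by_year_month.groupby(level=1).mean()
print(seasonal)
```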

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Positive Business Impact:**
*   **Forecasting & Planning:** Understanding which months have higher delivery volumes can help plan inventory, logistics, staffing, and budget allocation.
*   **Performance Benchmarking:** You can compare delivery trends year-on-year to evaluate if performance is improving.
*   **Vendor Strategy:** If delays are consistent in certain periods, you might change your vendor contracts or shipment strategies.

**Negative Growth Indicators:**
*   **Consistent Decline:** A steady drop in deliveries may reveal:

 1.   Loss of clients
 2.   Inefficiencies in supply chain
 3.   Lack of demand


*   **Unexpected Spikes:** May stress the system if not anticipated, leading to missed deadlines or customer dissatisfaction.

**Example Justification:**
For example, a noticeable drop in deliveries over a three-month stretch may indicate vendor-related disruptions or supply chain challenges. If not addressed, it can hurt client satisfaction and contract renewals.

#### Chart - 8 - Line Chart: Average Pack Price by Delivery Month

In [None]:
# Chart - 8 visualization code
# Same monthly bucketing as Chart 7, stored in a 'Month' column
df1['Month'] = df1['Delivered to Client Date'].dt.to_period('M')
# Mean pack price of the line items delivered in each month
monthly_price = df1.groupby('Month')['Pack Price'].mean()
monthly_price.plot(kind='line', title='Avg Pack Price Over Time')
mplt.xlabel('Month')
mplt.ylabel('Average Pack Price')
mplt.xticks(rotation=45)
mplt.show()

##### 1. Why did you pick the specific chart?

The line chart is best suited for analyzing trends over time — especially when you're working with monthly data. In this case:
*   "Delivery Month" (time component) is plotted on the x-axis.
*   "Average Pack Price" (a financial metric) is on the y-axis.

This helps visualize how the pricing trend is changing month by month, revealing any seasonal spikes, cost changes, or market patterns.

##### 2. What is/are the insight(s) found from the chart?

*   **Stable Trend:** If the line is mostly flat, it indicates pricing consistency.
*   **Increasing Trend:** Suggests rising costs — could be due to supplier hikes, inflation, or sourcing shifts.
*   **Decreasing Trend:** Could indicate discounts, bulk procurement, or vendor negotiations.
*   **Irregular Spikes or Drops:** Might point to seasonal demand, product recalls, or shipment changes.
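Irregular spikes or drops can be flagged automatically using the month-over-month percentage change in average pack price. The sketch below uses a hypothetical series, and the 20% threshold is an arbitrary assumption.

```python
import pandas as pd

# Hypothetical monthly average pack prices; in the notebook use monthly_price from the cell above
avg_price = pd.Series(
    [10.0, 10.2, 10.1, 13.5, 10.3],
    index=pd.period_range('2014-01', periods=5, freq='M'),
)

# Month-over-month relative change (the first month has no predecessor, so it is NaN)
mom_change = avg_price.pct_change()

# Flag months whose average moved by more than 20% versus the previous month
spikes = mom_change[mom_change.abs() > 0.20]
print(spikes)
```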

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, they can lead to positive impact:
*   **Forecasting Procurement Budgets:** Helps teams prepare for high-cost months.
*   **Vendor Evaluation:** Can flag vendors or routes associated with price hikes.
*   **Cost Optimization:** Identifying cost drop patterns can guide negotiation strategies.

Yes, if not addressed:
*   Consistent Pack Price Increase without improved product quality can reduce profit margins.
*   Unexplained Fluctuations may reflect poor vendor control or inefficient supply chain, leading to increased operational costs.
*   If average prices go up but sales don’t increase, it can result in lower client retention or reduced competitiveness.

#### Chart - 9 - Pie Chart: Managed By Distribution

In [None]:
# Chart - 9 visualization code
# Two most frequent 'Managed By' groups; percentages are relative to these two groups only
counts = df1['Managed By'].value_counts().nlargest(2)
mplt.pie(counts, labels=counts.index, autopct='%1.1f%%')
mplt.title('Top 2 Managed By Distribution')
mplt.show()

##### 1. Why did you pick the specific chart?

I picked a pie chart for the "Managed By" column because:
*   Pie charts are best for showing proportional comparisons of a small number of categories.
*   “Managed By” typically has a limited number of stakeholders or logistics managers (like regional leads or departments).
*   It visually represents how much each manager/team contributes to the overall shipment handling.

##### 2. What is/are the insight(s) found from the chart?

From the "Managed By" Pie Chart, you might observe insights such as:
*   One manager (e.g., Manager A) is handling 40-50% of all shipments.
*   Some managers may be handling very few orders, maybe even under 5%.
*   A concentration of workload may exist — not evenly distributed.
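`mplt.pie` scales the slices to the values it is given. A pie built from only the top groups therefore shows each group's share of those groups, not of all shipments. The sketch below pools the remaining groups into an "Other" slice; the team names and counts are hypothetical.

```python
import pandas as pd

# Hypothetical 'Managed By' values; in the notebook use df1['Managed By']
managed_by = pd.Series(['Team A'] * 90 + ['Team B'] * 6 + ['Team C'] * 3 + ['Team D'] * 1)

top_n = 2
counts = managed_by.value_counts()
# Keep the top_n groups and pool the remainder, so the slices add up to all shipments
pie_data = pd.concat([counts.iloc[:top_n], pd.Series({'Other': counts.iloc[top_n:].sum()})])
print((pie_data / pie_data.sum() * 100).round(1))
```

`pie_data` can be passed to `mplt.pie` in place of `counts`.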

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Positive Business Impact:**
*   If one person handles a major chunk of deliveries efficiently, that manager’s practices can be replicated across others.
*   The business can balance workloads better to avoid burnout or bottlenecks.
*   Helps in identifying top-performing managers for rewards or promotions.

**Negative Growth Insight (if any):**

If the pie chart shows that a manager is underperforming or contributing very little, it could indicate:
*   Lack of resource utilization
*   Geographical inefficiency
*   Potential training or staffing issue

#### Chart - 10 - Top 10 Brands Delivered

In [None]:
# Chart - 10 visualization code
# Ten most frequently delivered brands
top_brands = df1['Brand'].value_counts().head(10)
sns.barplot(x=top_brands.values, y=top_brands.index)
mplt.title('Top 10 Brands Delivered')
mplt.xlabel('Number of Deliveries')
mplt.show()

##### 1. Why did you pick the specific chart?

Chart Type Chosen: Horizontal Bar Chart using sns.barplot().

**Reason:**
*   Bar charts are ideal for comparing categorical data (like brand names).
*   A horizontal bar chart fits longer brand names better and improves readability.
*   value_counts() returns brands sorted by delivery count, so the bars are already ranked and the most frequently delivered brands stand out immediately.

##### 2. What is/are the insight(s) found from the chart?

The chart identifies the top 10 brands by number of delivery records.

**Gained clarity on:**
*   Which brands dominate the delivery operations.
*   Potential high-demand products or preferred manufacturers.
*   Which vendors to examine further for those brands. The chart shows brand counts only, so vendor patterns require cross-referencing the vendor details.
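The bar chart counts records per brand only, so vendor patterns need a second variable. The sketch below uses pandas.crosstab and assumes df1 has a Vendor column, based on the dataset's vendor details. The toy rows stand in for df1. top_vendor_share is a hypothetical metric that shows how dependent each brand is on its single largest vendor.

```python
import pandas as pd

# Toy rows for illustration; 'Brand' and 'Vendor' are assumed column names in df1
df = pd.DataFrame({
    'Brand':  ['X', 'X', 'X', 'Y', 'Y', 'Z'],
    'Vendor': ['V1', 'V1', 'V2', 'V2', 'V2', 'V3'],
})

# Keep only the most frequently delivered brands (top 2 here, top 10 in the chart)
top_brands = df['Brand'].value_counts().head(2).index
subset = df[df['Brand'].isin(top_brands)]

# Rows = brands, columns = vendors, cells = number of delivery records
brand_vendor = pd.crosstab(subset['Brand'], subset['Vendor'])

# Share of each brand's records supplied by its largest vendor (1.0 = single source)
top_vendor_share = brand_vendor.max(axis=1) / brand_vendor.sum(axis=1)
print(brand_vendor)
print(top_vendor_share.round(2))
```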

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Yes, **Positive Impact:**
*   **Inventory Planning:** Helps logistics teams ensure enough stock or plan for restocking popular brands.
*   **Vendor Relations:** Strengthen partnerships with top-performing brands/vendors.
*   **Customer Satisfaction:** Ensures timely and accurate fulfillment of high-demand brands.

**Potential Negative Insight (if ignored):**

Over-dependence on a few brands can be a risk:
*   If supply chain disruptions occur for those top brands, it could impact service quality.
*   Might overlook rising or underutilized brands that have growth potential.
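The over-dependence risk can be quantified. The sketch below measures brand concentration in two ways. The first is the share of records covered by the top brands. The second is the Herfindahl-Hirschman Index (HHI), the sum of squared shares. HHI equals 1 when a single brand accounts for every record and 1/k when k brands are evenly split. The helper name concentration is hypothetical.

```python
import pandas as pd


def concentration(series: pd.Series, top_n: int = 10) -> dict:
    """Share of rows held by the top_n values, plus the HHI (sum of squared shares)."""
    shares = series.value_counts(normalize=True)   # fractions summing to 1
    return {
        'top_n_share': float(shares.head(top_n).sum()),
        'hhi': float((shares ** 2).sum()),
    }


# Toy data; in the notebook this would be df1['Brand']
brands = pd.Series(['A'] * 5 + ['B'] * 3 + ['C'] * 2)
result = concentration(brands, top_n=1)
print(result)
```

Tracking these two numbers over time would show whether dependence on the top brands is growing or shrinking.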

**Business Action Justification:** Analyzing brand-wise delivery helps in strategic sourcing, risk mitigation, and optimizing vendor mix, which are all key for sustainable growth.

#### Chart - 11 - Correlation Heatmap

In [None]:
# Select relevant numeric columns
numeric_cols = ['Line Item Quantity', 'Line Item Value', 'Pack Price', 'Unit Price',
                'Weight (Kilograms)', 'Freight Cost (USD)', 'Line Item Insurance (USD)']

# Convert each column to numeric; text entries such as "Weight Captured Separately"
# become NaN (errors='coerce') instead of raising an error
for col in numeric_cols:
    df1[col] = pd.to_numeric(df1[col], errors='coerce')

# Pairwise Pearson correlation; NaNs are excluded pair by pair
corr_data = df1[numeric_cols]
corr_matrix = corr_data.corr()

mplt.figure(figsize=(10, 8))
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', fmt=".2f")
mplt.title('Correlation Heatmap')
mplt.show()

##### 1. Why did you pick the specific chart?

A correlation heatmap was chosen because it:
*   Visually summarizes the strength of relationships between multiple numerical features.
*   Helps quickly identify strong positive/negative correlations among business variables like cost, weight, value, and price.
*   It is a standard EDA tool for spotting patterns, trends, and possible multicollinearity.

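In this FedEx dataset, variables like Freight Cost (USD), Weight (Kilograms), Line Item Value, and Pack Price are continuous, which makes them suitable for correlation analysis. The coefficients shown are Pearson correlations (the pandas default), so they measure linear association only.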
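With seven variables, the heatmap has 21 distinct pairs (7 × 6 / 2), and reading them off the grid is error-prone. A small helper can list the pairs sorted by absolute correlation. The function name ranked_correlations is hypothetical, and the toy frame stands in for df1[numeric_cols].

```python
import pandas as pd


def ranked_correlations(corr: pd.DataFrame) -> pd.Series:
    """Unique variable pairs from a correlation matrix, strongest |r| first."""
    cols = list(corr.columns)
    pairs = {
        (a, b): corr.loc[a, b]
        for i, a in enumerate(cols)
        for b in cols[i + 1:]            # upper triangle only, no self-pairs
    }
    s = pd.Series(pairs)
    return s.sort_values(key=lambda v: v.abs(), ascending=False)


# Toy data; in the notebook, pass corr_matrix from the heatmap cell instead
toy = pd.DataFrame({
    'weight':  [1, 2, 3, 4, 5],
    'freight': [2, 4, 6, 8, 10],
    'price':   [5, 3, 4, 1, 2],
})
ranked = ranked_correlations(toy.corr())
print(ranked)
```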

##### 2. What is/are the insight(s) found from the chart?

**Positive Correlation:**
*   Weight (Kilograms) ↔ Freight Cost (USD) → Heavier shipments tend to have higher freight costs.
*   Line Item Quantity ↔ Line Item Value → Larger quantities go with higher line-item value.

**Weak/No Correlation:**

*   Unit Price ↔ Freight Cost (USD) → May show weak or no correlation, which suggests that item pricing does not directly drive shipping cost.
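A weak Pearson coefficient (the default in pandas' corr()) only rules out a strong linear relationship. Spearman's rank correlation, available through corr(method='spearman'), offers a useful comparison. A gap between the two shows whether a monotonic but non-linear or outlier-driven pattern is hiding behind a low number. The sketch below uses toy data:

```python
import pandas as pd

# Toy data: freight always rises with weight, but one value is extreme
toy = pd.DataFrame({
    'weight':  [1, 2, 3, 4, 5, 6],
    'freight': [1, 2, 3, 4, 5, 100],
})

pearson = toy.corr(method='pearson').loc['weight', 'freight']    # linear association
spearman = toy.corr(method='spearman').loc['weight', 'freight']  # rank (monotonic) association
print(f"Pearson: {pearson:.2f}, Spearman: {spearman:.2f}")
```

In the notebook, the equivalent would be df1[numeric_cols].corr(method='spearman') viewed next to corr_matrix.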

#### Chart - 12 - Pair Plot

In [None]:
# Drop rows with any NaNs in selected columns to avoid plot errors
df1_pair = df1[numeric_cols].dropna()
# Pair Plot visualization code
sns.pairplot(df1_pair)
mplt.suptitle("Pair Plot of FedEx Delivery Data", y=1.02)
mplt.show()

##### 1. Why did you pick the specific chart?

**A pair plot is ideal for:**
*   Visualizing pairwise relationships among multiple numerical variables.
*   Spotting patterns, correlations, and outliers in one consolidated view.
*   Quickly identifying linear/non-linear trends and clusters.

Since this dataset has multiple numerical columns such as cost, quantity, weight, and insurance, a pair plot shows visually how these variables interact with each other.

##### 2. What is/are the insight(s) found from the chart?

Insights to look for in the pair plot include:

**Positive Correlation between:**

 1. Line Item Quantity and Line Item Value (more quantity → higher value).

 2. Weight (Kilograms) and Freight Cost (USD) (heavier items → higher freight).

 3. Pack Price and Unit Price (similar pricing trends).

**Outliers might appear — such as:**

 1. Exceptionally high freight cost for low-weight items.

 2. Insurance costs unusually high for certain products.
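Outliers like these can be flagged systematically rather than spotted by eye. The sketch below applies the common 1.5 × IQR rule (Tukey's fences) to freight cost per kilogram. The rule, the threshold, and the helper name iqr_outliers are choices made here, not taken from the original analysis.

```python
import numpy as np
import pandas as pd


def iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Boolean mask: True where a value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = values.quantile([0.25, 0.75])
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


toy = pd.DataFrame({
    'Weight (Kilograms)': [10, 20, 30, 40, 50, 2],
    'Freight Cost (USD)': [100, 220, 270, 440, 500, 900],
})

# Freight cost per kilogram; zero weights become NaN to avoid division by zero
per_kg = toy['Freight Cost (USD)'] / toy['Weight (Kilograms)'].replace(0, np.nan)
flags = iqr_outliers(per_kg)
print(toy[flags])   # the low-weight, high-freight row
```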

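**Weak/No Correlation between unrelated features:**

 Dosage and Freight Cost (USD) might show a weak correlation. However, Dosage is not in numeric_cols, so this pair plot cannot confirm it. Dosage would first need to be added to the selection, converted to a numeric format if it is stored as text.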

## **5. Solution to Business Objective**

#### What do you suggest the client do to achieve the business objective?
Explain briefly.

1. **Optimize Freight Cost Management**

**Suggestion:** Reevaluate shipping modes and vendor contracts. Focus on reducing shipping costs by selecting more cost-effective carriers for specific regions or product types.

**Action:** Identify shipments with high freight costs relative to weight or value. Consider consolidating shipments to reduce individual delivery costs.
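Finding shipments whose freight is high relative to their value can be done with a ratio and a percentile cut-off. The 95th-percentile default and the helper name high_freight_share are assumptions for illustration. The column names follow the notebook's numeric_cols.

```python
import numpy as np
import pandas as pd


def high_freight_share(df: pd.DataFrame, pct: float = 0.95) -> pd.DataFrame:
    """Rows whose freight-to-value ratio exceeds the given percentile."""
    value = df['Line Item Value'].replace(0, np.nan)   # avoid division by zero
    ratio = df['Freight Cost (USD)'] / value
    cutoff = ratio.quantile(pct)                       # NaN ratios are ignored
    return df.assign(freight_to_value=ratio)[ratio > cutoff]


toy = pd.DataFrame({
    'Line Item Value':    [1000, 2000, 1000, 3000, 500],
    'Freight Cost (USD)': [100, 200, 200, 300, 500],
})
flagged = high_freight_share(toy, pct=0.8)
print(flagged)
```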

2. **Improve Shipment Planning and Delivery Timeliness**

**Suggestion:** Analyze delivery delays and identify recurring patterns linked to specific vendors, shipment modes, or product types.

**Action:** Prioritize faster shipping options for high-priority or time-sensitive shipments. For vendors with consistent delays, renegotiate terms or explore alternative vendors.
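Recurring delays can be measured once both dates are parsed. The sketch below assumes the dataset has Scheduled Delivery Date, Delivered to Client Date, and Shipment Mode columns. These names are taken from the SCMS delivery-history schema and are worth checking against df1.columns. A positive delay_days means the delivery arrived after its scheduled date.

```python
import pandas as pd

# Toy rows for illustration; column names are assumed from the SCMS schema
toy = pd.DataFrame({
    'Shipment Mode': ['Air', 'Air', 'Truck', 'Truck', 'Ocean'],
    'Scheduled Delivery Date':  ['2015-01-10', '2015-02-01', '2015-03-01', '2015-03-05', '2015-04-01'],
    'Delivered to Client Date': ['2015-01-09', '2015-02-04', '2015-03-11', '2015-03-05', '2015-04-20'],
})
for col in ['Scheduled Delivery Date', 'Delivered to Client Date']:
    toy[col] = pd.to_datetime(toy[col], errors='coerce')   # unparseable dates -> NaT

# Days late (negative = early)
toy['delay_days'] = (toy['Delivered to Client Date'] - toy['Scheduled Delivery Date']).dt.days

summary = toy.groupby('Shipment Mode')['delay_days'].agg(
    mean_delay='mean',
    late_share=lambda d: (d > 0).mean(),   # fraction of shipments delivered late
    shipments='count',
)
print(summary)
```

Replacing Shipment Mode with a vendor or product column in the groupby gives the same view for the other factors named above.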

3. **Enhance Vendor and Product Performance Analysis**

**Suggestion:** Conduct a deeper analysis of vendor performance by tracking metrics like delivery times, shipment costs, and insurance rates.

**Action:** Develop a vendor scorecard that evaluates their performance on key logistics metrics to optimize vendor selection for future shipments.
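A vendor scorecard can start as a grouped summary with one rank per metric. The sketch below assumes a delay_days column computed as delivered date minus scheduled date, in days. It uses two metrics, delay and freight per kilogram, weighted equally. Both the metric choice and the equal weighting are illustrative assumptions.

```python
import pandas as pd

toy = pd.DataFrame({
    'Vendor':             ['V1', 'V1', 'V2', 'V2', 'V3'],
    'delay_days':         [0, 2, 10, 6, 1],
    'Freight Cost (USD)': [100, 120, 300, 250, 90],
    'Weight (Kilograms)': [10, 12, 20, 25, 8],
})

scorecard = toy.groupby('Vendor').agg(
    shipments=('delay_days', 'size'),
    mean_delay=('delay_days', 'mean'),
    freight=('Freight Cost (USD)', 'sum'),
    weight=('Weight (Kilograms)', 'sum'),
)
scorecard['freight_per_kg'] = scorecard['freight'] / scorecard['weight']

# Rank each metric (1 = lowest, i.e. best; ties share the average rank),
# then average the ranks into a single score where lower is better
metrics = ['mean_delay', 'freight_per_kg']
scorecard['score'] = scorecard[metrics].rank().mean(axis=1)
scorecard = scorecard.sort_values('score')
print(scorecard)
```

Rank averaging keeps metrics with different units comparable. Weighted scores would be a natural next step once the business agrees on priorities.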

4. **Leverage Data-Driven Pricing Strategy**

**Suggestion:** Align pricing strategies with shipping and manufacturing costs. Consider adjusting unit and pack prices based on freight and insurance costs to improve profitability.

**Action:** Implement dynamic pricing that reflects the total cost of shipment (freight + insurance) and adjusts based on product volume or shipment destination.
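The total cost that pricing would need to reflect can be written per unit as landed cost = (Line Item Value + Freight Cost + Line Item Insurance) / Line Item Quantity. The sketch below implements that arithmetic. It treats missing freight or insurance as zero. This is an assumption that may understate cost for rows whose freight text was coerced to NaN.

```python
import numpy as np
import pandas as pd


def landed_cost_per_unit(df: pd.DataFrame) -> pd.Series:
    """(value + freight + insurance) / quantity, per line item."""
    total = (df['Line Item Value']
             + df['Freight Cost (USD)'].fillna(0)          # assumption: missing = 0
             + df['Line Item Insurance (USD)'].fillna(0))  # assumption: missing = 0
    qty = df['Line Item Quantity'].where(df['Line Item Quantity'] > 0)  # 0 -> NaN
    return total / qty


toy = pd.DataFrame({
    'Line Item Value':           [1000.0, 500.0, 200.0],
    'Freight Cost (USD)':        [100.0, np.nan, 20.0],
    'Line Item Insurance (USD)': [10.0, 5.0, 2.0],
    'Line Item Quantity':        [100, 50, 0],
})
cost = landed_cost_per_unit(toy)
print(cost)
```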

5. **Automate Data-Driven Decision-Making**

**Suggestion:** Implement a real-time analytics dashboard that continuously monitors key performance indicators (KPIs) such as shipment delays, costs, and vendor performance.

**Action:** Use tools like Power BI or Tableau to provide operational teams with up-to-date insights for timely decision-making.
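Whatever dashboard tool is used, it needs a tidy KPI table to read from. The sketch below prepares one in pandas and assumes two inputs: a parsed Delivered to Client Date column and a delay_days column (delivered minus scheduled date, in days). The KPI names and the CSV filename are hypothetical.

```python
import pandas as pd

toy = pd.DataFrame({
    'Delivered to Client Date': pd.to_datetime(['2015-01-05', '2015-01-20', '2015-02-10']),
    'delay_days':               [0, 3, -1],
    'Freight Cost (USD)':       [100.0, 200.0, 50.0],
})

# One row per calendar month (freq='MS' labels each month by its first day)
kpis = toy.groupby(pd.Grouper(key='Delivered to Client Date', freq='MS')).agg(
    shipments=('delay_days', 'size'),
    late_share=('delay_days', lambda d: (d > 0).mean()),
    freight_usd=('Freight Cost (USD)', 'sum'),
)
print(kpis)
# kpis.to_csv('monthly_kpis.csv') would give a dashboard tool a file to import
```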

6. **Address Outliers and Inefficiencies**

**Suggestion:** Investigate outliers in freight costs and item quantities. Focus on eliminating inefficiencies like overpacking or underpacking, which may contribute to higher costs.

**Action:** Optimize packaging processes to reduce excess weight and explore alternative shipping routes or methods for outlier shipments.



# **Conclusion**

The exploratory data analysis of FedEx's delivery history dataset has provided valuable insights into the relationships between key shipment variables such as weight, freight costs, delivery timelines, and vendor performance. Through data cleaning, manipulation, and visualization techniques using Python libraries like Pandas, Seaborn, and Plotly, we uncovered critical patterns that can drive more efficient logistics decisions.

By identifying areas such as high freight costs relative to product value, recurring shipment delays, and inefficiencies in certain vendor contracts, this analysis lays the foundation for actionable improvements in FedEx's operations. The insights gained can be leveraged to optimize shipping strategies, enhance vendor performance, reduce operational costs, and improve overall delivery timeliness.

Ultimately, the findings from this project offer a roadmap to achieve the business objective of improving logistics efficiency, cost management, and operational performance, ensuring that FedEx continues to meet customer expectations while maintaining profitability.
