## Introduction

### Project Overview

This project demonstrates a sample ETL (Extract, Transform, Load) process in Python using pandas. It extracts data from three file formats (CSV, Excel, and JSON), transforms it into a clean and consistent form, and loads the result into a single consolidated dataset. All three formats are common for data interchange.

### Objectives

- **Extract**: Load data from multiple sources: a CSV file containing sales information, an Excel file with customer details, and a JSON file with product data.
- **Transform**: Merge the three sources on their shared keys (`CustomerID` and `ProductID`) and fill missing values to create a unified dataset.
- **Load**: Save the transformed data into a new CSV file, which can be used for further analysis or reporting.

### Data Sources

1. **Sales Data (CSV)**: Contains transaction records with order details.
2. **Customer Data (Excel)**: Provides information about customers including their IDs and locations.
3. **Product Data (JSON)**: Includes product IDs and names for mapping sales to products.


import pandas as pd
import json

# Extract
# Load CSV
sales_df = pd.read_csv('sales_data.csv')

# Load Excel
customer_df = pd.read_excel('customer_data.xlsx')

# Load JSON
with open('product_data.js', 'r') as f:
    product_data = json.load(f)
product_df = pd.DataFrame(product_data)

# Transform
# Merge sales data with customer data
sales_customer_df = pd.merge(sales_df, customer_df, on='CustomerID', how='left')

# Merge sales data with product data
sales_full_df = pd.merge(sales_customer_df, product_df, on='ProductID', how='left')

# Handle missing values by filling them with the placeholder 'Unknown'
sales_full_df = sales_full_df.fillna('Unknown')


# Load
# Save the combined data to a new CSV file, without the index column
sales_full_df.to_csv('combined_data.csv', index=False)

print('ETL process completed successfully!')


ETL process completed successfully!


## Explanation

### Data Extraction

The ETL process begins with the extraction of data from various sources:
- **CSV**: Sales data is loaded into a pandas DataFrame with `pd.read_csv`.
- **Excel**: Customer data is read into a second DataFrame with `pd.read_excel`.
- **JSON**: Product data is parsed with `json.load` and converted into a DataFrame.
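
A quick check after extraction can catch a missing key column before the merge step fails. The following is a minimal sketch, not part of the original notebook: it uses in-memory text in place of the real files, assumes the JSON holds a list of records, and the `OrderID` and `ProductName` columns and the `require_columns` helper are hypothetical names chosen for illustration.

```python
import io
import json

import pandas as pd

# Hypothetical stand-ins for the contents of the sales CSV and product JSON files.
csv_text = "OrderID,CustomerID,ProductID\n1,C1,P1\n2,C2,P2\n"
json_text = (
    '[{"ProductID": "P1", "ProductName": "Widget"},'
    ' {"ProductID": "P2", "ProductName": "Gadget"}]'
)

# Same calls as the notebook, but reading from memory instead of disk.
sales_df = pd.read_csv(io.StringIO(csv_text))
product_df = pd.DataFrame(json.loads(json_text))  # assumes a list of records


def require_columns(df, required, name):
    """Raise ValueError if any required column is absent (hypothetical helper)."""
    missing = [col for col in required if col not in df.columns]
    if missing:
        raise ValueError(f"{name} is missing columns: {missing}")


# The merge keys used later must exist in the extracted frames.
require_columns(sales_df, ["CustomerID", "ProductID"], "sales data")
require_columns(product_df, ["ProductID"], "product data")
```

Failing early with a clear message is usually easier to debug than a `KeyError` raised from inside `pd.merge`.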

### Data Transformation

The transformation phase involves:
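- **Merging Data**: Sales data is combined with customer and product information through left joins on the common keys (`CustomerID` and `ProductID`), so every sales record is kept even when no matching customer or product exists.
- **Handling Missing Values**: Any missing values, including those left by unmatched joins, are filled with the placeholder `'Unknown'` so that gaps are marked explicitly for subsequent analysis.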
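
The merge and fill steps can be tried on a few rows of made-up data. This is a sketch under stated assumptions rather than the notebook's own code: the `OrderID`, `Quantity`, `Location`, and `ProductName` columns are hypothetical, and it deliberately differs from the notebook in two ways. It passes `validate="many_to_one"` so that duplicate keys in the lookup tables raise an error, and it fills only the text columns so that numeric columns keep their numeric type.

```python
import pandas as pd

# Hypothetical sample data; only CustomerID and ProductID come from the notebook.
sales_df = pd.DataFrame({
    "OrderID": [1, 2, 3],
    "CustomerID": ["C1", "C2", "C9"],  # C9 has no matching customer record
    "ProductID": ["P1", "P2", "P1"],
    "Quantity": [2, 1, 5],
})
customer_df = pd.DataFrame({
    "CustomerID": ["C1", "C2"],
    "Location": ["Berlin", "Lima"],
})
product_df = pd.DataFrame({
    "ProductID": ["P1", "P2"],
    "ProductName": ["Widget", "Gadget"],
})

# Left joins keep every sales row; validate raises MergeError if a key is
# duplicated in the right-hand lookup table, which would multiply sales rows.
merged = sales_df.merge(customer_df, on="CustomerID", how="left", validate="many_to_one")
merged = merged.merge(product_df, on="ProductID", how="left", validate="many_to_one")

# Fill only the text columns, leaving numeric columns such as Quantity untouched.
text_cols = ["Location", "ProductName"]
merged[text_cols] = merged[text_cols].fillna("Unknown")

print(merged)
```

Filling every column with a string, as the notebook does, is simpler but turns any numeric column that contains a gap into mixed text and numbers, which can complicate later aggregation.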

### Data Loading

The final stage is to load the transformed data:
- **Combining**: The cleaned and merged data is held in a single DataFrame (`sales_full_df`).
- **Saving**: That DataFrame is written to `combined_data.csv` without the index column (`index=False`), ready for further analysis or reporting.
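
One way to confirm that the saved file matches the DataFrame is to read it back and compare. The sketch below is illustrative only: it writes to an in-memory buffer instead of `combined_data.csv`, and the `OrderID` and `Location` columns are hypothetical.

```python
import io

import pandas as pd

# Hypothetical stand-in for the merged and cleaned DataFrame.
combined_df = pd.DataFrame({
    "OrderID": [1, 2],
    "Location": ["Berlin", "Unknown"],
})

# Write to an in-memory buffer; a file path would work the same way.
buffer = io.StringIO()
combined_df.to_csv(buffer, index=False)

# Read the CSV back and check that shape and column names survived the round trip.
buffer.seek(0)
reloaded = pd.read_csv(buffer)

# With index=False no extra unnamed index column appears in the output.
assert list(reloaded.columns) == list(combined_df.columns)
assert len(reloaded) == len(combined_df)
```

Omitting `index=False` would add the row index as an unnamed first column, which then shows up as `Unnamed: 0` when the file is read back.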
