<a href="https://colab.research.google.com/github/zpsy-hub/IoT-Data-Generation-for-Smart-Logistic/blob/main/MO_IT148_Homework_IoT_Data_Simulation_.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Smart Logistic Tracking Data Generator

This script generates a synthetic dataset for smart logistic tracking applications.
It creates a dataset with information about packages, their locations, status,
and environmental conditions during transit.

<br>

**Data Description:**

The dataset includes the following columns:

* **timestamp:** The date and time of the record.
* **package_id:** A unique identifier for each package.
* **location:** The geographical location of the package (latitude, longitude).
* **status:** The current status of the package (Processing, In Transit, Delayed, Out for Delivery, or Delivered).
* **temperature:** The temperature surrounding the package during transit (in Celsius).
* **humidity:** The humidity level surrounding the package during transit (in percentage).

<br>

**Use Cases:**

This dataset can be used for various purposes, including:

* **Real-time Package Tracking:** Monitor the location and status of packages in real time.
* **Delivery Optimization:** Analyze delivery routes and times to identify areas for improvement.
* **Condition Monitoring:** Track the temperature and humidity experienced by packages during transit to ensure product quality.
* **Predictive Analytics:** Build models to predict potential delays or delivery issues based on historical data.
* **Supply Chain Management:** Gain insights into the movement of goods and optimize inventory levels.


---



# **Smart Logistic Tracking Data Generator**
This script generates a synthetic dataset for smart logistic tracking applications. It creates realistic data about packages, their locations, status, and environmental conditions during transit.


### **Data Description**
The dataset includes the following columns:

- timestamp: The date and time of the record
- package_id: A unique identifier for each package
- origin: The city where the package originated from
- destination: The city where the package is being delivered to
- location: The current geographical location of the package (latitude, longitude)
- closest_city: The major city closest to the package's current location
- status: The current status of the package (Processing, In Transit, Delayed, Out for Delivery, Delivered)
- temperature: The temperature surrounding the package (in Celsius), calculated based on:
  - Geographic location (the climate range of the closest city)
  - Time of day (cooler at night, warmer midday)
  - Season (adjusted for northern/southern hemisphere)
- humidity: The humidity level (percentage), calculated based on:
  - Current temperature (typically lower humidity at higher temperatures)
  - Seasonal factors (indirectly, through the season-adjusted temperature)
- shock: Shock/impact value (in g-force), drawn from a wider range, and so typically higher, for packages with "Delayed" status
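
The temperature and humidity rules listed above can be sketched as two small deterministic helpers. This is a simplified illustration, not the generator's exact code: it leaves out the random draws, and the names `seasonal_adjustment` and `humidity_adjustment` are assumptions introduced here. The ±5 °C seasonal shift and the 10-point humidity boost mirror the constants used in the generator code.

```python
def seasonal_adjustment(latitude: float, month: int) -> int:
    """Return the seasonal temperature shift in °C for a latitude and month.

    Dec-Feb is treated as winter north of the equator and summer south of it;
    Jun-Aug is the reverse. Spring and autumn months get no shift.
    """
    northern = latitude > 0
    if month in (12, 1, 2):
        return -5 if northern else 5
    if month in (6, 7, 8):
        return 5 if northern else -5
    return 0


def humidity_adjustment(temperature: float, temp_min: float, temp_max: float) -> float:
    """Return extra humidity (percentage points) added when it is cooler.

    The temperature is mapped to a 0-1 position within the city's range and
    clamped, so the result runs from 10 (at or below temp_min) down to 0
    (at or above temp_max).
    """
    position = (temperature - temp_min) / (temp_max - temp_min)
    position = max(0.0, min(1.0, position))
    return (1 - position) * 10


# New York in January vs. Sydney in January
print(seasonal_adjustment(40.71, 1))    # -5
print(seasonal_adjustment(-33.87, 1))   # 5
# Midpoint of a (-5, 25) °C range
print(humidity_adjustment(10, -5, 25))  # 5.0
```

Keeping these pieces free of randomness makes the hemisphere flip and the clamping easy to check in isolation before noise is layered on top.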

### Realistic Features
The script includes several features to make the data realistic:

- Geographic Accuracy:
  - Uses real-world coordinates for major cities
  - Simulates package journeys between origin and destination cities
  - Calculates intermediate points by linear interpolation between origin and destination, plus a small random offset


- Climate Modeling:
  - Temperature varies by city, season, time of day, and hemisphere
  - Humidity correlates with location and temperature
  - Reasonable variation in environmental values


- Logical Status Progression:
  - Package status correlates with journey progress
  - Newly shipped packages are "Processing"
  - Packages nearing destination are "Out for Delivery" or "Delivered"
  - Packages in mid-journey have a low (10%) random chance of being marked "Delayed"
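
As a rough sketch of the route and status logic described above, the helpers below interpolate a position between two coordinate pairs and map journey progress to a status. The function names and the use of `numpy.random.default_rng` are assumptions for illustration. The thresholds (0.15 and 0.9) and the probabilities follow the generator code, and the straight line in latitude/longitude space is a simplification, not a true great-circle route.

```python
import numpy as np


def interpolate_position(origin, destination, progress):
    """Linearly interpolate (lat, lon) between origin and destination.

    progress is 0.0 at the origin and 1.0 at the destination.
    """
    lat = origin[0] + (destination[0] - origin[0]) * progress
    lon = origin[1] + (destination[1] - origin[1]) * progress
    return lat, lon


def status_for_progress(progress, rng):
    """Pick a status consistent with how far along the journey is."""
    if progress < 0.15:
        return "Processing"
    if progress > 0.9:
        return str(rng.choice(["Delivered", "Out for Delivery"], p=[0.8, 0.2]))
    return str(rng.choice(["In Transit", "Delayed"], p=[0.9, 0.1]))


rng = np.random.default_rng(0)
london, singapore = (51.51, -0.13), (1.35, 103.82)
print(interpolate_position(london, singapore, 0.5))  # roughly halfway point
print(status_for_progress(0.05, rng))                # Processing
print(status_for_progress(0.50, rng))                # In Transit or Delayed
```

Passing the random generator in as an argument keeps the status choice reproducible and easy to test.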



### Use Cases
This dataset can be used for various purposes, including:

- Real-time Package Tracking: Monitor the location and status of packages in real time
- Delivery Optimization: Analyze delivery routes and times to identify areas for improvement
- Condition Monitoring: Track temperature, humidity, and shock values experienced during transit
- Predictive Analytics: Build models to predict potential delays based on historical data
- Supply Chain Management: Gain insights into the movement of goods for better planning
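
For instance, condition monitoring and delay analysis could start from a few pandas filters. The sketch below uses a tiny hand-written sample with the same column names as the generated dataset. The rows and the thresholds (`MAX_TEMP_C`, `MAX_SHOCK_G`) are hypothetical values chosen for illustration, not recommendations.

```python
import pandas as pd

# Hand-written sample rows (illustrative only, not generator output)
sample = pd.DataFrame(
    {
        "package_id": ["PKG10001", "PKG10002", "PKG10003", "PKG10004"],
        "status": ["In Transit", "Delayed", "Delivered", "In Transit"],
        "temperature": [22.5, 36.2, 18.0, 31.4],
        "humidity": [65.0, 80.1, 55.3, 72.8],
        "shock": [0.12, 1.75, 0.30, 0.05],
    }
)

MAX_TEMP_C = 30.0   # hypothetical temperature limit for sensitive goods
MAX_SHOCK_G = 1.0   # hypothetical shock limit

# Records where either limit was exceeded
excursions = sample[
    (sample["temperature"] > MAX_TEMP_C) | (sample["shock"] > MAX_SHOCK_G)
]
print(excursions[["package_id", "status", "temperature", "shock"]])

# Share of records per status, e.g. to see how common "Delayed" is
status_share = sample["status"].value_counts(normalize=True)
print(status_share)
```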

When run, the script will generate 100 records, save them to CSV and JSON formats, and display the first 20 rows for inspection.
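
Once the files exist, the CSV can be loaded back for analysis. The sketch below reads from an in-memory string instead of `logistics_data.csv` so that it runs on its own. The two sample rows are made up, and splitting `location` into `latitude` and `longitude` columns is an optional convenience step that the script itself does not perform.

```python
import io

import pandas as pd

# Stand-in for logistics_data.csv (made-up rows with the same columns)
csv_text = '''timestamp,package_id,origin,destination,location,closest_city,status,temperature,humidity,shock
2025-01-01 08:00:00,PKG10001,London,Singapore,"26.43, 51.845",Dubai,In Transit,24.1,58.3,0.21
2025-01-01 09:30:00,PKG10002,Tokyo,Seoul,"36.7, 132.9",Tokyo,Delayed,8.4,77.0,1.32
'''

loaded = pd.read_csv(io.StringIO(csv_text), parse_dates=["timestamp"])

# "location" is stored as a "lat, lon" string; split it into numeric columns
loaded[["latitude", "longitude"]] = (
    loaded["location"].str.split(", ", expand=True).astype(float)
)

print(loaded.dtypes)
print(loaded[["package_id", "latitude", "longitude"]])
```

With the real file, replacing `io.StringIO(csv_text)` with the path `"logistics_data.csv"` should be the only change needed.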

In [5]:
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
import math

# Set seed for reproducibility (timestamps still depend on the time the script is run)
np.random.seed(42)

num_records = 100  # Adjust this number as needed

# 20 major cities with their approximate coordinates and climate data
# Format: (latitude, longitude, avg_temp_range, avg_humidity_range)
cities = {
    "New York": (40.71, -74.01, (-5, 25), (60, 80)),
    "Los Angeles": (34.05, -118.24, (10, 30), (50, 70)),
    "Chicago": (41.88, -87.63, (-10, 25), (55, 75)),
    "Singapore": (1.35, 103.82, (25, 35), (70, 90)),
    "London": (51.51, -0.13, (0, 20), (65, 85)),
    "Tokyo": (35.69, 139.69, (0, 30), (60, 80)),
    "Sydney": (-33.87, 151.21, (10, 25), (60, 75)),
    "Moscow": (55.75, 37.62, (-15, 20), (55, 75)),
    "Cairo": (30.04, 31.24, (15, 35), (40, 60)),
    "Mumbai": (19.08, 72.88, (20, 35), (65, 90)),
    "Berlin": (52.52, 13.41, (-5, 25), (60, 80)),
    "Mexico City": (19.43, -99.13, (10, 25), (40, 70)),
    "Paris": (48.85, 2.35, (0, 25), (70, 85)),
    "Dubai": (25.20, 55.27, (20, 45), (50, 70)),
    "Bangkok": (13.75, 100.50, (25, 35), (70, 90)),
    "Toronto": (43.65, -79.38, (-10, 25), (60, 80)),
    "Seoul": (37.57, 126.98, (-5, 30), (60, 75)),
    "São Paulo": (-23.55, -46.63, (15, 30), (70, 85)),
    "Cape Town": (-33.92, 18.42, (10, 25), (60, 80)),
    "Manila": (14.60, 120.98, (25, 35), (70, 85))
}

city_names = list(cities.keys())

# Function to calculate temperature based on location, date, and random variation
def calculate_temperature(city, timestamp):
    # Get the base temperature range for the city
    temp_range = cities[city][2]

    # Adjust for season (simplified - northern/southern hemisphere)
    month = timestamp.month
    latitude = cities[city][0]

    # Northern hemisphere: winter = colder, summer = warmer
    # Southern hemisphere: opposite
    seasonal_adjustment = 0
    if latitude > 0:  # Northern hemisphere
        # Winter (Dec-Feb)
        if month in [12, 1, 2]:
            seasonal_adjustment = -5
        # Summer (Jun-Aug)
        elif month in [6, 7, 8]:
            seasonal_adjustment = 5
    else:  # Southern hemisphere
        # Winter (Jun-Aug)
        if month in [6, 7, 8]:
            seasonal_adjustment = -5
        # Summer (Dec-Feb)
        elif month in [12, 1, 2]:
            seasonal_adjustment = 5

    # Calculate the adjusted temperature range
    adjusted_min = temp_range[0] + seasonal_adjustment
    adjusted_max = temp_range[1] + seasonal_adjustment

    # Random variation within the adjusted range
    temperature = round(np.random.uniform(adjusted_min, adjusted_max), 1)

    # Add some daily variation
    hour = timestamp.hour
    if 0 <= hour < 6:  # Late night/early morning
        temperature -= np.random.uniform(1, 3)
    elif 10 <= hour < 16:  # Mid-day
        temperature += np.random.uniform(1, 3)

    return round(temperature, 1)

# Function to calculate humidity based on location and temperature
def calculate_humidity(city, temperature, timestamp):
    # Get the base humidity range for the city
    humidity_range = cities[city][3]

    # Adjust humidity based on temperature (higher temp = lower humidity, typically)
    temp_min, temp_max = cities[city][2]
    temp_factor = max(0, min(1, (temperature - temp_min) / (temp_max - temp_min)))
    humidity_adjustment = (1 - temp_factor) * 10  # Up to 10 percentage points more humid when cooler

    # Calculate humidity with some random variation
    base_humidity = np.random.uniform(humidity_range[0], humidity_range[1])
    humidity = round(base_humidity + humidity_adjustment, 1)

    # Ensure humidity stays in valid range (0-100%)
    humidity = max(0, min(100, humidity))

    return humidity

# Generate data for smart logistics tracking
data = []

# Define possible shipment statuses
statuses = ["In Transit", "Delivered", "Processing", "Out for Delivery", "Delayed"]

# Current time as reference point
current_time = datetime.now()

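seen_package_ids = set()  # IDs issued so far, so each package_id stays unique

for _ in range(num_records):
    # Pick origin and destination cities
    origin_city = np.random.choice(city_names)
    destination_city = np.random.choice([city for city in city_names if city != origin_city])

    # Generate a random timestamp within the last 7 days
    random_minutes = np.random.randint(0, 7 * 24 * 60)
    timestamp = current_time - timedelta(minutes=random_minutes)

    # Generate a package ID, redrawing on the rare collision so IDs are unique
    package_id = f"PKG{np.random.randint(10000, 99999)}"
    while package_id in seen_package_ids:
        package_id = f"PKG{np.random.randint(10000, 99999)}"
    seen_package_ids.add(package_id)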

    # Simulate package journey - determine current location
    journey_progress = np.random.random()  # 0 to 1 representing progress from origin to destination

    # Determine status based on journey progress
    if journey_progress < 0.15:
        status = "Processing"
    elif journey_progress > 0.9:
        status = np.random.choice(["Delivered", "Out for Delivery"], p=[0.8, 0.2])
    else:
        status = np.random.choice(["In Transit", "Delayed"], p=[0.9, 0.1])

    # Calculate current location coordinates based on journey progress
    origin_lat, origin_lon = cities[origin_city][0], cities[origin_city][1]
    dest_lat, dest_lon = cities[destination_city][0], cities[destination_city][1]

    current_lat = origin_lat + (dest_lat - origin_lat) * journey_progress
    current_lon = origin_lon + (dest_lon - origin_lon) * journey_progress

    # Add some random variation to the route
    current_lat += np.random.uniform(-1, 1)
    current_lon += np.random.uniform(-1, 1)

    # Round coordinates
    current_lat = round(current_lat, 4)
    current_lon = round(current_lon, 4)

    # Determine which city's climate we're closest to
    # (straight-line distance in degrees; a rough approximation, not geodesic distance)
    closest_city = min(cities.keys(),
                       key=lambda city: math.sqrt((current_lat - cities[city][0])**2 +
                                                  (current_lon - cities[city][1])**2))

    # Calculate environmental data based on location and time
    temperature = calculate_temperature(closest_city, timestamp)
    humidity = calculate_humidity(closest_city, temperature, timestamp)

    # Generate shock sensor reading in g (wider range, so typically higher, if status is "Delayed")
    max_shock = 2.0 if status == "Delayed" else 0.5
    shock_value = round(np.random.uniform(0, max_shock), 2)

    # Create the record
    record = {
        "timestamp": timestamp,
        "package_id": package_id,
        "origin": origin_city,
        "destination": destination_city,
        "location": f"{current_lat}, {current_lon}",
        "closest_city": closest_city,
        "status": status,
        "temperature": temperature,
        "humidity": humidity,
        "shock": shock_value
    }
    data.append(record)

# Convert to DataFrame
df = pd.DataFrame(data)

# Sort by timestamp
df = df.sort_values('timestamp')

# Save dataset
df.to_csv("logistics_data.csv", index=False)
df.to_json("logistics_data.json", orient="records")

# Display the first 20 rows of the dataset (sorted by timestamp)
print("First 20 rows of Smart Logistics IoT Data:")
with pd.option_context('display.max_rows', 20, 'display.max_columns', None, 'display.width', 1000):
    print(df.head(20))

First 20 rows of Smart Logistics IoT Data:
                    timestamp package_id       origin  destination            location closest_city      status  temperature  humidity  shock
70 2025-04-29 12:08:30.336512   PKG85046    São Paulo       Moscow    33.3012, 13.5857        Cairo  In Transit         26.0      51.6   0.20
16 2025-04-29 13:17:30.336512   PKG59811        Tokyo        Seoul   36.7251, 136.1661        Tokyo  In Transit         11.7      84.9   0.20
27 2025-04-29 14:36:30.336512   PKG80313    Cape Town        Seoul    24.1418, 107.766      Bangkok  In Transit         34.1      83.4   0.04
50 2025-04-29 15:23:30.336512   PKG90642        Tokyo    São Paulo     3.6758, 36.9013        Cairo  In Transit         21.8      49.2   0.48
35 2025-04-29 16:06:30.336512   PKG86797       Moscow       Berlin    52.3556, 18.2952       Berlin  In Transit         13.9      81.2   0.37
45 2025-04-29 22:21:30.336512   PKG40354       London    Singapore    3.1486, 101.0205    Singapore   Del