Skip to content

data migration

windula_k edited this page Nov 3, 2023 · 6 revisions

Database design and implementation

Database design

ER Diagram

Separate python Job was written to conduct Data Migration

Initial preparation

Column Data

    * Temperature (C), Apparent Temperature (C) columns round-off to 1 decimal place
    * Visibility (km)   round-off to 1 decimal place
    * Formatted Date modified to this format '2008-07-12 17:00:00'

Following reasons can identified for this modification

    * Storage Space: More decimal places require more storage space. While this might not be significant for a small number of records, it can become substantial with large datasets, leading to increased storage costs and potentially slower backup and restore times.

    * Memory Usage: When these values are loaded into memory, they will consume more space, which can lead to more memory being used overall by the database system. This can be particularly impactful if the dataset is large and is frequently accessed, as it can lead to increased memory pressure.

    * Index Size and Performance: Indexes on columns with many decimal places will also be larger. This can slow down index creation and maintenance (like insert, update, and delete operations), as well as potentially slow down query performance due to the larger index size leading to less efficient I/O operations.

    * Calculation Overhead: Mathematical operations on numbers with many decimal places can take longer to compute. This is because the underlying algorithms that handle arithmetic operations have to process more digits, which can increase CPU usage and slow down query performance.

Running the migration application

    dbUrl=$url
    dbUser=$user
    dbPassword=$password
    database=$database_name
    port=$port number

populate these values in the main.py

    python main.py

Normalization

weather_data table

    CREATE TABLE weather_analytics_data_db.weather_data (
    weather_data_id int NOT NULL AUTO_INCREMENT,
    observed_at datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
    summary varchar(30) default null,
    precip_type ENUM ('rain', 'snow', 'sleet', 'hail') DEFAULT null,
    temperature_celsius decimal default 0.0,
    apprnt_temperature_celsius decimal default 0.0,
    humidity decimal default 0.0,
    wind_speed_km_h decimal default 0.0,
    wind_bearing_degrees int default 0,
    visibility_km decimal default 0.0,
    cloud_cover tinyint(1) default 0,
    pressure_millibar decimal default 0.0,
    daily_summary varchar(150) default null,
    primary key (weather_data_id),
    unique key weather_id_observed_at (weather_data_id,observed_at)
    )

weather_summary table

    CREATE TABLE weather_analytics_data_db.weather_summary (
    weather_summary_id int NOT NULL AUTO_INCREMENT,
    summary varchar(30) default null,
    precip_type ENUM ('rain', 'snow', 'sleet', 'hail') DEFAULT null,
    daily_summary varchar(150) default null,
    primary key (weather_summary_id),
    unique key weather_summary_id_precip_type (summary,precip_type,daily_summary)
    )