# Energy Measurement Evaluation

This notebook evaluates power consumption and energy trends from experiment results.  
The data is collected from multiple nodes and analyzed for insights into power usage, voltage, and energy consumption.

## Specify the Result Folder

Before loading data, enter the path to your experiment result folder.  
If you leave the prompt empty, the default folder is used; you can also enter a bare folder name (resolved relative to the base result folder) or any path containing `/`.
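
The three cases handled by the input cell can be summarized in a small helper. This is only a sketch: `resolve_result_folder` is a hypothetical name that is not part of the notebook, and the folder names passed to it are placeholders.

```python
import os

def resolve_result_folder(user_input: str, base: str, default: str) -> str:
    """Mirror the three cases of the input cell (hypothetical helper)."""
    user_input = user_input.strip()
    if not user_input:
        return default                      # empty input: fall back to the default
    if "/" in user_input:
        return user_input                   # looks like a path: use it unchanged
    return os.path.join(base, user_input)   # bare name: resolve below the base folder

base = "/srv/testbed/results/warmuth/default/"
default = os.path.join(base, "2025-03-18_12-00-36_925581")

print(resolve_result_folder("", base, default))
print(resolve_result_folder("my-run", base, default))              # placeholder name
print(resolve_result_folder("/tmp/other-results", base, default))  # placeholder path
```

Keeping the resolution in one function makes it easy to check each case without typing into the prompt.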

In [None]:
import os
from IPython.display import display, HTML
import json
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

base_result_folder = "/srv/testbed/results/warmuth/default/"
default_result_folder = os.path.join(base_result_folder, "2025-03-18_12-00-36_925581")

user_input = input(f"Enter result folder path (leave empty to use the default) [{default_result_folder}]: ").strip()

if not user_input:
    RESULT_FOLDER = default_result_folder
elif "/" in user_input:
    RESULT_FOLDER = user_input
else:
    RESULT_FOLDER = os.path.join(base_result_folder, user_input)

if not os.path.exists(RESULT_FOLDER):
    raise FileNotFoundError(f"Result folder not found: {RESULT_FOLDER}")

# print() avoids the surrounding quotes that display() adds to plain strings
print(f"Using result folder: {RESULT_FOLDER}")

### Creator Information

The following table presents details about the experiment's creator, extracted from the **RO-Crate metadata**.

- **Name:** The name of the creator.
- **ORCID:** A unique researcher identifier, linked to the official ORCID profile.
- **Affiliation:** The institution the creator is affiliated with.
- **Affiliation ROR:** A **Research Organization Registry (ROR) ID**, used for standard identification of research institutions.
- **Affiliation URL:** A direct link to the institution’s website.
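
As a rough illustration of the metadata shape the next cell expects, the sketch below resolves a creator's affiliation through an `@id` lookup. All entity values (name, ORCID, ROR ID, URL) are placeholders, and `creators_with_affiliation` is a hypothetical helper rather than the notebook's own function.

```python
def creators_with_affiliation(graph):
    """Return one row per creator, with the affiliation looked up by @id."""
    by_id = {entity["@id"]: entity for entity in graph if "@id" in entity}
    rows = []
    for entity in graph:
        if entity.get("@type") == "Person" and "creator" in entity.get("keywords", []):
            affiliation = entity.get("affiliation")
            aff_id = affiliation.get("@id") if isinstance(affiliation, dict) else None
            org = by_id.get(aff_id, {})
            rows.append({
                "Creator Name": entity.get("name", "Unknown"),
                "ORCID": entity.get("@id", "Unknown"),
                "Affiliation Name": org.get("name", "Unknown"),
                "Affiliation ROR": org.get("@id", "Unknown"),
                "Affiliation URL": org.get("url", "Unknown"),
            })
    return rows

# Placeholder graph; real crates contain many more entities.
graph = [
    {"@id": "https://orcid.org/0000-0000-0000-0000", "@type": "Person",
     "name": "Jane Doe", "keywords": ["creator"],
     "affiliation": {"@id": "https://ror.org/00example"}},
    {"@id": "https://ror.org/00example", "@type": "Organization",
     "name": "Example University", "url": "https://example.org"},
]

rows = creators_with_affiliation(graph)
print(rows[0]["Creator Name"], "/", rows[0]["Affiliation Name"])
```

Building the `by_id` index once avoids scanning the whole graph again for every creator.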

## TODO
- Present the creator/author relationship more clearly, similar to how it is shown in a publication


In [None]:
def load_creator_info():
    """
    Extracts all creators from the RO-Crate metadata JSON file.
    Retrieves each creator's name, ORCID, and affiliation details.
    """
    rocrate_path = os.path.join(RESULT_FOLDER, "ro-crate-metadata.json")
    if not os.path.exists(rocrate_path):
        raise FileNotFoundError(f"RO-Crate metadata file not found: {rocrate_path}")

    with open(rocrate_path, "r") as f:
        metadata = json.load(f)

    creators = []

    for item in metadata.get("@graph", []):
        if item.get("@type") == "Person" and "creator" in item.get("keywords", []):
            creator_info = {
                "Creator Name": item.get("name", "Unknown"),
                "ORCID": item.get("@id", "Unknown"),
                "Affiliation Name": "Unknown",
                "Affiliation ROR": "Unknown",
                "Affiliation URL": "Unknown"
            }

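            # Find affiliation (only a single {"@id": ...} reference is resolved)
            affiliation = item.get("affiliation")
            affiliation_id = affiliation.get("@id") if isinstance(affiliation, dict) else None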
            if affiliation_id:
                for org in metadata.get("@graph", []):
                    if org.get("@id") == affiliation_id:
                        creator_info["Affiliation Name"] = org.get("name", "Unknown")
                        creator_info["Affiliation ROR"] = org.get("@id", "Unknown")
                        creator_info["Affiliation URL"] = org.get("url", "Unknown")
                        break

            creators.append(creator_info)

    return creators

# Load creator data (handling multiple entries)
creator_data = load_creator_info()
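# Fixing the columns keeps the cell working even when no creator is found
creator_df = pd.DataFrame(creator_data, columns=[
    "Creator Name", "ORCID", "Affiliation Name", "Affiliation ROR", "Affiliation URL"
])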

# Convert links to clickable HTML
def make_link(text, url):
    return f'<a href="{url}" target="_blank">{text}</a>' if url != "Unknown" else "Unknown"

creator_df["ORCID"] = creator_df["ORCID"].apply(lambda x: make_link("ORCID Profile", x))
creator_df["Affiliation ROR"] = creator_df["Affiliation ROR"].apply(lambda x: make_link("ROR ID", x))
creator_df["Affiliation URL"] = creator_df["Affiliation URL"].apply(lambda x: make_link("University Website", x))

# Generate an HTML-styled table
html_table = creator_df.to_html(escape=False, index=False)
styled_table = f"""
<style>
    table {{ width: 80%; border-collapse: collapse; margin: 20px 0; }}
    th, td {{ padding: 8px 12px; border: 1px solid #ddd; text-align: left; }}
    th {{ background-color: #f4f4f4; font-weight: bold; }}
</style>
{html_table}
"""

display(HTML(styled_table))

## Node Information & Topology Visualization

Each experiment setup includes metadata about the participating nodes.  
This section extracts details such as:
- Node names
- Links to the testbed
- Fully Qualified Domain Names (FQDN)
- Hardware details (CPU, memory, NICs), if available
- Topology information, if available

If a **topology visualization** is referenced in the RO-Crate metadata and the file exists, the table below links to it.
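
The sketch below shows, under assumptions, what a node entity might look like and how the table fields are derived from it. The host name, file path, and result folder are placeholders, `node_rows` is a hypothetical helper, and the existence check on the referenced files is left out for brevity.

```python
import os

def node_rows(graph, result_folder):
    """Collect name, FQDN, testbed label, and topology path for each node entity."""
    rows = []
    for item in graph:
        if "node" not in item.get("keywords", []):
            continue
        topology = item.get("visualizedTopology")
        topology_path = None
        if isinstance(topology, dict) and "@id" in topology:
            topology_path = os.path.join(result_folder, topology["@id"])
        fqdn = item.get("fqdn", "Unknown")
        labels = fqdn.split(".")
        rows.append({
            "name": item.get("name", "Unknown"),
            "fqdn": fqdn,
            # The second label of the FQDN is used as the testbed name
            "testbed": labels[1].capitalize() if len(labels) > 1 else "Unknown",
            "topology": topology_path,
        })
    return rows

# Placeholder entities; only the keys read by the notebook are shown.
graph = [
    {"name": "node1", "fqdn": "node1.baltikum.example.org", "keywords": ["node"],
     "visualizedTopology": {"@id": "topology/node1.pdf"}},
    {"name": "not-a-node", "keywords": ["creator"]},
]

rows = node_rows(graph, "/tmp/results")
print(rows)
```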

In [None]:
import os
import json
import pandas as pd
from IPython.display import display, HTML

def load_rocrate_metadata():
    """
    Load and parse the RO-Crate metadata JSON file.
    Extract node information and locate paths for hardware details and topology PDFs.
    """
    rocrate_path = os.path.join(RESULT_FOLDER, "ro-crate-metadata.json")
    if not os.path.exists(rocrate_path):
        raise FileNotFoundError(f"RO-Crate metadata file not found: {rocrate_path}")

    with open(rocrate_path, "r") as f:
        metadata = json.load(f)

    nodes_info = []

    for item in metadata.get("@graph", []):
        if "keywords" in item and "node" in item["keywords"]:
            node_name = item.get("name", "Unknown")
            fqdn = item.get("fqdn", "Unknown")

            topology_pdf_path = None
            if isinstance(item.get("visualizedTopology", {}), dict) and "@id" in item["visualizedTopology"]:
                topology_pdf_path = os.path.join(RESULT_FOLDER, item["visualizedTopology"]["@id"])
                if not os.path.exists(topology_pdf_path):
                    topology_pdf_path = None

            hardware_json_path = None
            if isinstance(item.get("hardware", {}), dict) and "@id" in item["hardware"]:
                hardware_json_path = os.path.join(RESULT_FOLDER, item["hardware"]["@id"])
                if not os.path.exists(hardware_json_path):
                    hardware_json_path = None

            nodes_info.append({
                "name": node_name if isinstance(node_name, str) else "Unknown",
                "fqdn": fqdn if isinstance(fqdn, str) else "Unknown",
                "topology_pdf": topology_pdf_path if topology_pdf_path else "None",
                "hardware_json": hardware_json_path if hardware_json_path else "None"
            })

    return nodes_info

def extract_hardware_info(hardware_json_path):
    """
    Extract processor, network, and memory information from the hardware.json file.
    Returns a dictionary with processor details, NIC models, and installed memory.
    """
    if not hardware_json_path or not os.path.exists(hardware_json_path):
        return {
            "cpu_model": "Unknown", "cpu_cores": "Unknown", "cpu_threads": "Unknown",
            "memory": "Unknown", "nic_models": "Unknown"
        }

    try:
        with open(hardware_json_path, "r") as f:
            hardware_data = json.load(f)

        cpu_data = hardware_data.get("processor", [{}])[0]
        cpu_model = cpu_data.get("model", "Unknown")
        cpu_cores = cpu_data.get("cores", "Unknown")
        cpu_threads = cpu_data.get("threads", "Unknown")

        nic_models = []
        if isinstance(hardware_data.get("network"), list):
            for nic in hardware_data["network"]:
                if isinstance(nic, dict) and "model" in nic:
                    nic_models.append(nic["model"])

        memory = hardware_data.get("memory", {})
        memory_val = memory.get("installed_capacity_human_val")
        memory_unit = memory.get("installed_capacity_human_unit", "")
        # Only build a memory string when a capacity value is actually present
        memory_str = f"{memory_val} {memory_unit}".strip() if isinstance(memory_val, (int, float, str)) else "Unknown"

        return {
            "cpu_model": cpu_model if isinstance(cpu_model, str) else "Unknown",
            "cpu_cores": cpu_cores if isinstance(cpu_cores, int) else "Unknown",
            "cpu_threads": cpu_threads if isinstance(cpu_threads, int) else "Unknown",
            "memory": f"RAM: {memory_str}" if memory_str != "Unknown" else "Unknown",
            "nic_models": "<br>".join(nic_models) if nic_models else "No NICs detected",
        }

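    # IndexError covers an empty processor list, AttributeError an unexpected JSON layout
    except (json.JSONDecodeError, KeyError, TypeError, IndexError, AttributeError):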
        return {
            "cpu_model": "Unknown", "cpu_cores": "Unknown", "cpu_threads": "Unknown",
            "memory": "Unknown", "nic_models": "Unknown"
        }

# Load node metadata and hardware details
nodes_info = load_rocrate_metadata()
nodes_df = pd.DataFrame(nodes_info)
hardware_details = [extract_hardware_info(node["hardware_json"]) for node in nodes_info]
hardware_df = pd.DataFrame(hardware_details)
nodes_df = pd.concat([nodes_df, hardware_df], axis=1)

# Remove "hardware_json" column (no longer needed)
nodes_df.drop(columns=["hardware_json"], inplace=True)

# Extract testbed name from FQDN
def extract_testbed(fqdn):
    parts = fqdn.split(".")
    if len(parts) > 1:
        return parts[1].capitalize()  # Second label of the FQDN (the part after the host name)
    return "Unknown"

nodes_df["Testbed"] = nodes_df["fqdn"].apply(extract_testbed)

# Map known testbeds to their websites
testbed_urls = {
    "Baltikum": "https://kaunas.net.cit.tum.de/",
    "Blockchain": "https://coinbase.net.cit.tum.de/"
}

# Convert testbed names to clickable links
def make_testbed_link(testbed):
    url = testbed_urls.get(testbed)
    # Fall back to the plain name when no website is known for this testbed
    return f'<a href="{url}" target="_blank">{testbed}</a>' if url else testbed

nodes_df["Testbed"] = nodes_df["Testbed"].apply(make_testbed_link)

# Convert topology path to clickable links
def make_clickable(path):
    return f'<a href="{path}" target="_blank">Open PDF</a>' if path != "None" else "No topology available"

nodes_df["topology_pdf"] = nodes_df["topology_pdf"].apply(make_clickable)

# Rename columns for better readability
nodes_df.rename(columns={
    "name": "Name",
    "fqdn": "FQDN",
    "topology_pdf": "Topology",
    "cpu_model": "CPU",
    "cpu_cores": "Cores",
    "cpu_threads": "Threads",
    "memory": "Memory",
    "nic_models": "NICs",
    "Testbed": "Testbed"
}, inplace=True)

# Define column order
nodes_df = nodes_df[["Name", "FQDN", "Testbed", "Topology", "CPU", "Cores", "Threads", "Memory", "NICs"]]

# Generate an HTML-styled table
html_table = nodes_df.to_html(escape=False)
styled_table = f"""
<style>
    table {{ width: 90%; border-collapse: collapse; margin: 20px 0; }}
    th, td {{ padding: 8px 12px; border: 1px solid #ddd; text-align: left; }}
    th {{ background-color: #f4f4f4; font-weight: bold; }}
</style>
{html_table}
"""

display(HTML(styled_table))

## Loading and Previewing Data

The energy measurement data is stored in CSV format, with each node having its own folder inside the `energy` directory.

The dataset includes:
- **Timestamp** (`timestamp`): Time when the measurement was recorded.
- **Current** (`current_mA`): Measured current in milliamps (mA).
- **Voltage** (`voltage_V`): Measured voltage in volts (V).
- **Power Consumption** (`power_active_W`): Active power in watts (W).
- **Energy Counter** (`energy_counter_Wh`): Cumulative energy usage in watt-hours (Wh).

Below, we load the data, filter outliers with the IQR rule, and display a preview.
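
Two small parsing steps in the loader are easy to get wrong: the compact timestamp format and the run number taken from the file name. The sketch below reproduces both with the standard library only. The sample timestamp and file name are made up for illustration, and `parse_timestamp` and `run_id` are hypothetical helper names.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y%m%d%H%M%S%f"  # same format string as in the loader

def parse_timestamp(raw: str) -> datetime:
    """Parse a compact timestamp such as 20250318120036925581."""
    return datetime.strptime(raw, TIMESTAMP_FORMAT)

def run_id(filename: str) -> str:
    """Extract the run number from a file name ending in _runXX.csv."""
    return filename.split("_run")[-1].split(".")[0]

ts = parse_timestamp("20250318120036925581")  # made-up sample value
print(ts.isoformat())
print(run_id("node1_run03.csv"))              # made-up file name
```

Note that the run number stays a string, so `"03"` and `"3"` would be treated as different runs.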

In [None]:
sns.set_theme(style="whitegrid")
plt.rcParams.update({"axes.titlesize": 14, "axes.labelsize": 12})

def load_energy_data():
    """
    Load all CSV files from the energy folder inside the result folder.
    Each node has its own subfolder containing multiple _runXX.csv files.
    """
    energy_folder = os.path.join(RESULT_FOLDER, "energy")
    if not os.path.exists(energy_folder):
        raise FileNotFoundError(f"Energy folder not found: {energy_folder}")

    all_data = []

    for node in os.listdir(energy_folder):
        node_path = os.path.join(energy_folder, node)
        if os.path.isdir(node_path):
            for file in os.listdir(node_path):
                if file.endswith(".csv") and "_run" in file:
                    file_path = os.path.join(node_path, file)
                    print(file_path)
                    df = pd.read_csv(file_path)

                    df["timestamp"] = pd.to_datetime(df["timestamp"], format="%Y%m%d%H%M%S%f")
                    df["node"] = node
                    df["run"] = file.split("_run")[-1].split(".")[0]  # Extract run number
                    all_data.append(df)

    if not all_data:
        raise ValueError("No valid CSV files found in the energy folder.")

    return pd.concat(all_data, ignore_index=True)

df = load_energy_data()

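def remove_outliers(df):
    """
    Drops rows in which any numeric column lies outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
    Columns are filtered one after another, so the quartiles of later columns
    are computed on data already reduced by earlier columns.
    """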
    numeric_cols = df.select_dtypes(include=[np.number]).columns

    for col in numeric_cols:
        Q1 = df[col].quantile(0.25)
        Q3 = df[col].quantile(0.75)
        IQR = Q3 - Q1

        lower_bound = Q1 - 1.5 * IQR
        upper_bound = Q3 + 1.5 * IQR

        df = df[(df[col] >= lower_bound) & (df[col] <= upper_bound)]

    return df

df = remove_outliers(df)
display(df.head())

## Data Overview

After loading the data, we analyze its structure using summary statistics.  
This helps in identifying potential issues such as missing values, anomalies, or trends.
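
The statistics below are computed after `remove_outliers` has filtered the data. As a minimal sketch of that rule for a single column, the example uses `statistics.quantiles` with `method="inclusive"`, which is intended to mirror the linear interpolation pandas applies by default. The readings are made up, and `iqr_filter` is a hypothetical helper.

```python
from statistics import quantiles

def iqr_filter(values, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lower <= v <= upper]

power_w = [10, 11, 12, 13, 50]  # made-up readings in watts
print(iqr_filter(power_w))      # the 50 W spike is dropped
```

Because the fences depend on the spread of the data, a column with almost constant values has a very small IQR, and even modest deviations are then removed.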

In [None]:
temp_df = df.drop(columns=['node', 'run'])
display(temp_df.describe(exclude=[np.datetime64]))

## Power Consumption Over Time

The following plot shows the power consumption trends over time for different nodes.  
This helps us observe variations in power usage and detect potential anomalies.

In [None]:
plt.figure(figsize=(12, 6))
sns.lineplot(data=df, x="timestamp", y="power_active_W", hue="node", style="run", linewidth=2)
plt.xlabel("Timestamp")
plt.ylabel("Power Consumption (W)")
plt.title("Power Consumption Over Time")
plt.xticks(rotation=45)
plt.legend(title="Node / Run", bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
plt.show()

## Cumulative Energy Consumption

The energy counter represents the cumulative energy consumed over time.  
This plot provides insights into the total energy usage per node and how it changes over the experiment duration.
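
Because the counter is cumulative, the energy used during an interval is the difference between two readings. The sketch below shows that difference and the unit conversions involved; the readings are made up, and `energy_used_wh` is a hypothetical helper that assumes the counter does not reset within the interval.

```python
WH_TO_MWH = 1000.0   # 1 Wh = 1000 mWh
WH_TO_JOULE = 3600.0  # 1 Wh = 3600 J

def energy_used_wh(counter_wh):
    """Energy consumed between the first and last reading of a cumulative counter."""
    return counter_wh[-1] - counter_wh[0]

readings_wh = [12.0, 12.5, 13.25]  # made-up counter values
used_wh = energy_used_wh(readings_wh)

print(used_wh, "Wh")
print(used_wh * WH_TO_MWH, "mWh")
print(used_wh * WH_TO_JOULE, "J")
```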

In [None]:
df["energy_counter_mWh"] = df["energy_counter_Wh"] * 1000

plt.figure(figsize=(12, 6))
sns.lineplot(data=df, x="timestamp", y="energy_counter_mWh", hue="node", style="run", linewidth=2)

plt.xlabel("Timestamp")
plt.ylabel("Cumulative Energy (mWh)")
plt.title("Cumulative Energy Consumption in mWh")
plt.xticks(rotation=45)
plt.legend(title="Node / Run", bbox_to_anchor=(1.05, 1), loc='upper left')

plt.tight_layout()
plt.show()

## Current and Voltage Trends

To better understand the electrical characteristics, we visualize:
- **Current (mA) over time** to see how the current draw fluctuates.
- **Voltage (V) over time** to ensure stability across measurements.
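
The voltage curve in the next cell is smoothed with a rolling mean over five samples, where shorter windows are allowed at the start (`min_periods=1`). A plain-Python sketch of that smoothing, with made-up values and a hypothetical `rolling_mean` helper, looks like this:

```python
def rolling_mean(values, window=5):
    """Trailing mean over up to `window` samples, starting with shorter windows."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

voltage = [1, 2, 3, 4, 5, 6]  # made-up values chosen for easy arithmetic
print(rolling_mean(voltage))
```

The window only looks backwards, so a smoothed value lags slightly behind sudden changes in the raw signal.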

In [None]:
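# Smooth per node and run so the window never spans two different series
df["voltage_V_smoothed"] = df.groupby(["node", "run"])["voltage_V"].transform(
    lambda s: s.rolling(window=5, min_periods=1).mean()
)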
first_timestamps = df.groupby("run")["timestamp"].min()

fig, axes = plt.subplots(2, 1, figsize=(12, 12), sharex=True)

sns.lineplot(data=df, x="timestamp", y="current_mA", hue="node", style="run", linewidth=2, ax=axes[0])
axes[0].set_ylabel("Current (mA)")
axes[0].set_title("Current Trend Over Time")
axes[0].legend(title="Node / Run", bbox_to_anchor=(1.05, 1), loc='upper left')

sns.lineplot(data=df, x="timestamp", y="voltage_V_smoothed", hue="node", style="run", linewidth=2, ax=axes[1])
axes[1].set_ylabel("Voltage (V)")
axes[1].set_title("Voltage Trend Over Time (Smoothed)")
axes[1].set_xlabel("Timestamp")
axes[1].legend(title="Node / Run", bbox_to_anchor=(1.05, 1), loc='upper left')

axes[1].set_ylim(df["voltage_V"].min() - 1, df["voltage_V"].max() + 1)

for run, ts in first_timestamps.items():
    for ax in axes:
        ax.axvline(x=ts, color="black", linestyle="dashed", alpha=0.7)
        # y is given in axes coordinates so the label stays inside the plot on every axis
        ax.text(ts, 0.95, f"Run {run}", rotation=90, verticalalignment="top", fontsize=10, color="black", transform=ax.get_xaxis_transform())

plt.xticks(rotation=45)
plt.subplots_adjust(hspace=0.3, bottom=0.15)
plt.show()

### Energy Consumption Rate Over Time

This plot shows the **rate at which energy is consumed over time**, that is, the average power between consecutive energy counter readings.  
Instead of cumulative energy, this visualization helps identify **periods of high workload**.  
A higher energy rate means that the system was **actively consuming more power**,  
which may indicate high CPU load or network traffic.
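
Energy per unit time is power, so the rate between two counter readings can be expressed in milliwatts: multiply the difference in Wh by 1000 (Wh to mWh) and by 3600 (per second to per hour), then divide by the elapsed seconds. The sketch below is a hypothetical helper with made-up readings. Treating a negative step as a counter reset is an assumption.

```python
def average_power_mw(e_prev_wh, e_next_wh, dt_s):
    """Average power in mW between two counter readings taken dt_s seconds apart."""
    delta_wh = e_next_wh - e_prev_wh
    if delta_wh < 0 or dt_s <= 0:
        return None  # assumed counter reset or unusable time step
    return delta_wh * 1000.0 * 3600.0 / dt_s

# 0.5 Wh consumed within one hour corresponds to an average of 500 mW
print(average_power_mw(1.0, 1.5, 3600))
# A falling counter is reported as missing rather than as negative power
print(average_power_mw(2.0, 0.1, 1.0))
```

The difference must be taken between consecutive readings of the same node and run; subtracting readings from different series would give meaningless rates.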


TODO -> fix

In [None]:
df = df.sort_values("timestamp").reset_index(drop=True)

plt.figure(figsize=(12, 6))
plt.plot(df["timestamp"], df["energy_counter_Wh"], marker="o", linestyle="-")
plt.xlabel("Timestamp")
plt.ylabel("Energy Counter (Wh)")
plt.title("Energy Counter Over Time (Raw Data)")
plt.xticks(rotation=45)
plt.show()

# Differences are taken per node and run so that samples from different
# series are never subtracted from each other.
grouped = df.groupby(["node", "run"])
df["energy_diff"] = grouped["energy_counter_Wh"].diff()
# A negative step means the counter was reset; treat it as missing.
df["energy_corrected_diff"] = df["energy_diff"].where(df["energy_diff"] >= 0)
elapsed_s = grouped["timestamp"].diff().dt.total_seconds().replace(0, np.nan)
# Wh per second -> mW: x1000 (Wh to mWh) and x3600 (per second to per hour).
df["energy_rate_mW"] = df["energy_corrected_diff"] * 1000 * 3600 / elapsed_s

plt.figure(figsize=(12, 6))
sns.lineplot(data=df, x="timestamp", y="energy_rate_mW", hue="node", style="run", linewidth=2)
plt.xlabel("Timestamp")
plt.ylabel("Average Power (mW)")
plt.title("Rate of Energy Consumption Over Time (Corrected)")
plt.xticks(rotation=45)
plt.legend(title="Node / Run", bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
plt.show()

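# Total Energy Consumption per Node
# Note: the maximum counter value equals the total only if the counter
# starts at zero and is never reset during the experiment.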
df_grouped = df.groupby("node")["energy_counter_Wh"].max()
plt.figure(figsize=(8, 5))
df_grouped.plot(kind="bar", color="skyblue")
plt.ylabel("Total Energy (Wh)")
plt.title("Total Energy Consumption Per Node")
plt.xticks(rotation=45)
plt.show()

## Summary & Findings

Based on the visualizations and statistical analysis, we can derive the following insights:

- The power consumption varies across different nodes and runs.
- The cumulative energy consumption follows an increasing trend over time.
- Voltage and current appear stable with minor fluctuations.

Further analysis could involve:
- Identifying periods of peak energy usage.
- Comparing nodes to find efficiency variations.
- Investigating external factors influencing power consumption.