# Thermal Neural Networks (PyTorch example)

This Jupyter notebook shows how to apply a thermal neural network (TNN) to an example data set.
This example is more concise than the TF2 example and standalone (it does not use the "tf2utils" package), but in turn it lacks a few conveniences (e.g., a validation set, early stopping, and plotting helpers).

The data set can be downloaded from [Kaggle](https://www.kaggle.com/wkirgsn/electric-motor-temperature).
It should be placed in `data/input/`.

In [0]:
%pip install numpy pandas matplotlib torch scipy tqdm plotly
from pathlib import Path
import numpy as np
import pandas as pd
from tqdm import tqdm
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.nn import Parameter as TorchParam
from torch import Tensor
from typing import List, Tuple

from scipy.stats import norm
import plotly.graph_objs as go
from plotly.subplots import make_subplots
import plotly.io as pio




## Data setup

In [0]:
MeasurementData_Merged = pd.read_csv("MeasurementData_eACL_Merged.csv")
# MeasurementData_Merged = MeasurementData_Merged.rename(columns={'Rotor Temperature TelemetriesExternTemp.rp_rbe_Cif_10ms_PIExternTemp1.rbe_Cif': 'ExternTemp.rp_rbe_Cif_10ms_PIExternTemp1.rbe_Cif', 'Stator Temperature TelemetriesI_EM_tTMotWinDeLay2Cl03': 'I_EM_tTMotWinDeLay2Cl03'})
# MeasurementData_Merged.to_pickle('MeasurementData_Merged.pkl')
# MeasurementData_Merged['pm'] = MeasurementData_Merged[['ExternTemp.rp_rbe_Cif_10ms_PIExternTemp1.rbe_Cif', 'ExternTemp.rp_rbe_Cif_10ms_PIExternTemp10.rbe_Cif', 'ExternTemp.rp_rbe_Cif_10ms_PIExternTemp5.rbe_Cif','ExternTemp.rp_rbe_Cif_10ms_PIExternTemp7.rbe_Cif']].mean(axis=1)
MeasurementData_Merged['stator_winding'] = MeasurementData_Merged[['tStatorHotPha1', 'tStatorHotPha2','tStatorHotPha3']].max(axis=1)
# voltage magnitude from the d- and q-axis components
MeasurementData_Merged['Us'] = np.sqrt(MeasurementData_Merged['uDaFundaFild']**2 + MeasurementData_Merged['uQaFundaFild']**2)
MeasurementData_Merged['Is'] = np.sqrt(MeasurementData_Merged['iDaFild']**2 + MeasurementData_Merged['iQaFild']**2)

# input_cols = ['nEmFild',
#             'tqEmFild',
#             'iDaFild',
#             'iQaFild',
#             'Is',
#             'uDaFundaFild',
#             'uQaFundaFild',
#             'Us',
#             'tCooltIvtrOut',
#             'tOilGbxSnsr',
#             'vfCoolt'
#             ]
# input_cols = ['nEmFild',
#             'tqEmFild',
#             'iDaFild',
#             'iQaFild',
#             'Is',
#             'uDaFundaFild',
#             'uQaFundaFild',
#             'Us',
#             'tCooltIvtrOut',
#             'vfCoolt'
#             ]
input_cols = ['nEmFild',
            'tqEmFild',
            'iDaFild',
            'iQaFild',
            'uDaFundaFild',
            'uQaFundaFild',
            'tCooltIvtrOut',
            'vfCoolt'
            ]

In [0]:
# path_to_csv = Path().cwd() / "data" / "input" / "measures_v2.csv"
data = MeasurementData_Merged.copy()
# data = data[data['pm'] <= 200]
# data = data[data['stator_winding'] <= 150]
# target_cols = ['pm', 'stator_winding']
target_cols = ['pm', 'stator_winding','tOilGbxSnsr']
# target_cols = ['pm']
# target_cols = ['stator_winding']
# temperature_cols = target_cols + ['tOilGbxSnsr','tCooltIvtrOut']
temperature_cols = target_cols + ['tCooltIvtrOut']
# test_profiles = [240, 280, 283, 290]
train_profiles = [p for p in data.profile_id.unique()]
test_profiles = train_profiles[-86:]  # last 86 profiles for testing
train_profiles = [p for p in data.profile_id.unique() if p not in test_profiles]
profile_sizes = data.groupby("profile_id").agg("size")

# normalize
non_temperature_cols = [c for c in data if c in input_cols and c not in temperature_cols]
data.loc[:, temperature_cols] /= 200  # deg C
data.loc[:, non_temperature_cols] /= data.loc[:, non_temperature_cols].abs().max(axis=0)

# # extra feats (FE)
# if {"i_d", "i_q", "u_d", "u_q"}.issubset(set(data.columns.tolist())):
#     extra_feats = {
#         "i_s": lambda x: np.sqrt((x["i_d"] ** 2 + x["i_q"] ** 2)),
#         "u_s": lambda x: np.sqrt((x["u_d"] ** 2 + x["u_q"] ** 2)),
#     }
# data = data.assign(**extra_feats)
# input_cols = [c for c in data.columns if c not in target_cols]
# device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
# overwrite. We recommend CPU over GPU here, as that runs faster with pytorch on this data set
device = torch.device("cpu")
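
The scaling used above can be checked on a toy array. The following is a minimal sketch with made-up values (the column layout and the numbers are illustrative assumptions, not taken from the measurement data): temperatures are divided by a fixed 200 °C, while all other inputs are divided by their column-wise maximum absolute value.

```python
import numpy as np

TEMP_SCALE = 200.0  # deg C, same constant as in the normalization step

# Toy data (made-up values): two non-temperature columns and one temperature column
non_temps = np.array([[1500.0, -40.0],
                      [-3000.0, 80.0],
                      [750.0, 20.0]])
temps = np.array([[60.0], [100.0], [150.0]])

# Max-abs scaling maps each non-temperature column into [-1, 1]
non_temps_scaled = non_temps / np.abs(non_temps).max(axis=0)

# All temperatures share one fixed scale, so differences between them stay comparable
temps_scaled = temps / TEMP_SCALE

# Model outputs are mapped back to deg C by multiplying with the same constant
temps_restored = temps_scaled * TEMP_SCALE
```

Using one common scale for all temperatures matters here because the model later works with differences between temperatures.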

In [0]:
# Rearrange features
# input_cols = [c for c in data.columns if c not in target_cols + ["profile_id"]]
data = data.loc[:, input_cols + ["profile_id"] + target_cols].dropna()

def generate_tensor(profiles_list):
    """Returns profiles of the data set in a coherent 3D tensor with
    time-major shape (T, B, F) where
    T : Maximum profile length
    B : Batch size = Number of profiles
    F : Number of features (input and target columns).

    Also returns a sample_weights tensor of shape (T, B), which is zero at post-padded positions
    for use in the cost function (i.e., it acts as a masking tensor)."""
    tensor = np.full(
        (profile_sizes[profiles_list].max(), len(profiles_list), data.shape[1] - 1),
        np.nan,
    )
    for i, (pid, df) in enumerate(
        data.loc[data.profile_id.isin(profiles_list), :].groupby("profile_id")
    ):
        assert pid in profiles_list, f"PID is not in {profiles_list}!"
        tensor[: len(df), i, :] = df.drop(columns="profile_id").to_numpy()
    sample_weights = 1 - np.isnan(tensor[:, :, 0])
    tensor = np.nan_to_num(tensor).astype(np.float32)
    tensor = torch.from_numpy(tensor).to(device)
    sample_weights = torch.from_numpy(sample_weights).to(device)
    return tensor, sample_weights


train_tensor, train_sample_weights = generate_tensor(train_profiles)
test_tensor, test_sample_weights = generate_tensor(test_profiles)
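
The padding and masking logic can be illustrated on two toy profiles of different length. This is a minimal NumPy-only sketch; `pad_profiles` is a hypothetical helper that mirrors the idea of `generate_tensor` without the DataFrame handling, and it assumes the first feature of the real samples contains no NaNs (the notebook drops NaN rows beforehand).

```python
import numpy as np

def pad_profiles(profiles):
    """Stack variable-length (T_i, F) arrays into a (T_max, B, F) array plus a (T_max, B) mask."""
    t_max = max(len(p) for p in profiles)
    n_feats = profiles[0].shape[1]
    tensor = np.full((t_max, len(profiles), n_feats), np.nan)
    for i, p in enumerate(profiles):
        tensor[: len(p), i, :] = p  # shorter profiles keep NaN at the end
    mask = 1 - np.isnan(tensor[:, :, 0])  # 1 for real samples, 0 for padding
    return np.nan_to_num(tensor).astype(np.float32), mask

# Two toy profiles with 3 and 5 time steps and 2 features each
profiles = [np.ones((3, 2)), 2 * np.ones((5, 2))]
tensor, mask = pad_profiles(profiles)
print(tensor.shape)  # (5, 2, 2)
print(mask[:, 0])    # [1 1 1 0 0]
```

The mask is what later keeps the padded zeros from contributing to the loss.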

## Model declaration

In [0]:
class DiffEqLayer(nn.Module):
    """This class is a container for the computation logic in each step.
    This layer could be used for any 'cell', also RNNs, LSTMs or GRUs."""

    def __init__(self, cell, *cell_args):
        super().__init__()
        self.cell = cell(*cell_args)

    def forward(self, input: Tensor, state: Tensor) -> Tuple[Tensor, Tensor]:
        inputs = input.unbind(0)
        outputs = torch.jit.annotate(List[Tensor], [])
        for i in range(len(inputs)):
            out, state = self.cell(inputs[i], state)
            outputs += [out]
        return torch.stack(outputs), state


class TNNCell(nn.Module):
    """The main TNN logic. Here, the sub-NNs are initialized as well as the constant learnable
    thermal capacitances. The forward function houses the LPTN ODE discretized with the explicit Euler method
    """

    def __init__(self):
        super().__init__()
        self.sample_time = 0.5  # in s
        self.output_size = len(target_cols)
        self.caps = TorchParam(torch.Tensor(self.output_size).to(device))
        nn.init.normal_(
            self.caps, mean=-9.2, std=0.5
        )  # hand-picked init mean, might be application-dependent
        n_temps = len(temperature_cols)  # number of temperatures (targets and input)
        n_conds = int(0.5 * n_temps * (n_temps - 1))  # number of thermal conductances
        # conductance net sub-NN
        self.conductance_net = nn.Sequential(
            nn.Linear(len(input_cols) + self.output_size, n_conds), 
            nn.ReLU(),         # Activation function added by Yuping
            nn.Linear(n_conds, n_conds), # Additional Layer added by Yuping
            nn.Sigmoid()
        )
        # populate adjacency matrix. It is used for indexing the conductance sub-NN output
        self.adj_mat = np.zeros((n_temps, n_temps), dtype=int)
        adj_idx_arr = np.ones_like(self.adj_mat)
        triu_idx = np.triu_indices(n_temps, 1)
        adj_idx_arr = adj_idx_arr[triu_idx].ravel()
        self.adj_mat[triu_idx] = np.cumsum(adj_idx_arr) - 1
        self.adj_mat += self.adj_mat.T
        self.adj_mat = torch.from_numpy(self.adj_mat[: self.output_size, :]).type(
            torch.int64
        )  # crop
        self.n_temps = n_temps

        # power loss sub-NN
        self.ploss = nn.Sequential(
            nn.Linear(len(input_cols) + self.output_size, 16),
            nn.ReLU(),         # Activation function added by Yuping
            nn.Linear(16, 16), # Additional Layer added by Yuping
            nn.Tanh(),
            nn.Linear(16, self.output_size),
        )

        self.temp_idcs = [i for i, x in enumerate(input_cols) if x in temperature_cols]
        self.nontemp_idcs = [
            i
            for i, x in enumerate(input_cols)
            if x not in temperature_cols + ["profile_id"]
        ]

    def forward(self, inp: Tensor, hidden: Tensor) -> Tuple[Tensor, Tensor]:
        prev_out = hidden
        temps = torch.cat([prev_out, inp[:, self.temp_idcs]], dim=1)
        sub_nn_inp = torch.cat([inp, prev_out], dim=1)
        conducts = torch.abs(self.conductance_net(sub_nn_inp))
        power_loss = torch.abs(self.ploss(sub_nn_inp))
        temp_diffs = torch.sum(
            (temps.unsqueeze(1) - prev_out.unsqueeze(-1)) * conducts[:, self.adj_mat],
            dim=-1,
        )
        out = prev_out + self.sample_time * torch.exp(self.caps) * (
            temp_diffs + power_loss
        )
        return prev_out, torch.clip(out, -1, 5)
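
To make the adjacency indexing and the explicit Euler update in `TNNCell` more tangible, here is a minimal NumPy sketch for four temperature nodes (three targets plus one measured temperature). All numeric values are made up for illustration, and the sub-NN outputs are replaced by constants.

```python
import numpy as np

n_temps, n_targets = 4, 3  # e.g. 3 target temperatures plus 1 measured coolant temperature

# Enumerate the upper triangle, then mirror it so that (i, j) and (j, i) share one conductance
adj = np.zeros((n_temps, n_temps), dtype=int)
triu = np.triu_indices(n_temps, 1)
adj[triu] = np.arange(len(triu[0]))
adj += adj.T
adj = adj[:n_targets]  # only the target nodes are integrated
print(adj)
# [[0 0 1 2]
#  [0 0 3 4]
#  [1 3 0 5]]

# One explicit Euler step with made-up values (batch size 1)
conducts = np.full((1, 6), 0.1)              # stand-in for the conductance sub-NN output
temps = np.array([[0.5, 0.4, 0.3, 0.2]])     # normalized targets followed by the coolant input
prev = temps[:, :n_targets]
power_loss = np.zeros((1, n_targets))        # stand-in for the power loss sub-NN output
inv_caps = np.exp(np.full(n_targets, -9.2))  # exp(caps) plays the role of inverse capacitances
dt = 0.5  # sample time in s

# Heat flow into node i: sum_j g_ij * (T_j - T_i); the diagonal term vanishes since T_i - T_i = 0
flow = np.sum((temps[:, None, :] - prev[:, :, None]) * conducts[:, adj], axis=-1)
nxt = prev + dt * inv_caps * (flow + power_loss)
```

With zero power loss, the hottest node cools down and the coldest target node heats up, as expected from pure heat exchange.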

## Training

In [0]:
# Model save path
mdl_path = Path.cwd() / 'data' / 'models'
mdl_path.mkdir(exist_ok=True, parents=True)

for k in range(10):  # train 10 models, each starting from a different random initialization
    model = torch.jit.script(DiffEqLayer(TNNCell).to(device))
    loss_func = nn.MSELoss(reduction="none")
    opt = optim.Adam(model.parameters(), lr=1e-3)
    n_epochs = 500 #100
    tbptt_size = 100 #512

    n_batches = np.ceil(train_tensor.shape[0] / tbptt_size).astype(int)
    with tqdm(desc="Training", total=n_epochs) as pbar:
        for epoch in range(n_epochs):
            # first state is ground truth temperature data
            hidden = train_tensor[0, :, -len(target_cols) :]

            # propagate batch-wise through data set
            for i in range(n_batches):
                model.zero_grad()
                output, hidden = model(
                    train_tensor[
                        i * tbptt_size : (i + 1) * tbptt_size, :, : len(input_cols)
                    ],
                    hidden.detach(),
                )
                loss = loss_func(
                    output,
                    train_tensor[
                        i * tbptt_size : (i + 1) * tbptt_size, :, -len(target_cols) :
                    ],
                    # output[:,:,-2],     # adapted by Yuping to isolate the channel of stator winding temperature
                    # train_tensor[
                    #     i * tbptt_size : (i + 1) * tbptt_size, :, -2    # adapted by Yuping to isolate the channel of stator winding temperature
                    # ],
                )
                # sample_weighting
                # loss = loss[:,:,None]  # add by Yuping to increase the dimension of loss from 2 to 3
                loss = (
                    (
                        loss
                        * train_sample_weights[
                            i * tbptt_size : (i + 1) * tbptt_size, :, None
                        ]
                        / train_sample_weights[
                            i * tbptt_size : (i + 1) * tbptt_size, :
                        ].sum()
                    )
                    .sum()
                    .mean()
                )
                loss.backward()
                opt.step()

            # reduce learning rate
            if epoch == int(n_epochs*0.75):
                for group in opt.param_groups:
                    group["lr"] *= 0.5
            pbar.update()
            pbar.set_postfix_str(f"loss: {loss.item():.2e}")
    # model saving
    suffix = f"_{k}"  # suffix format, e.g. _0, _1, ...
    mdl_file_name = f"tnn_jit_torch_RTM_STM_Oil_Joint_eACL{suffix}.pt"
    mdl_file_path = mdl_path / mdl_file_name

    # Save the model (with exception handling)
    try:
        model.save(str(mdl_file_path))  # TorchScript (JIT) save
        print(f"Model saved to: {mdl_file_path}")
    except Exception as e:
        print(f"Error saving model (iteration {k}): {e}")
    

Training:   0%|          | 1/500 [00:38<5:19:17, 38.39s/it, loss: 6.28e-02]
Training:   0%|          | 2/500 [01:18<5:28:43, 39.61s/it, loss: 1.55e-02]
Training:   1%|          | 3/500 [01:57<5:24:58, 39.23s/it, loss: 2.76e-03]


Hybridizing lumped-parameter thermal networks (LPTNs) with neural networks achieves an accuracy that neither pure LPTNs nor pure black-box ML models reach.
Note that the visualized performance stems from training a TNN from scratch once. All neural networks are initialized randomly when their training by gradient descent begins.
Repeating the experiment with different random initializations can therefore yield better performance, since some runs are likely to converge to better local minima.
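
When several runs are compared, each model needs a score that ignores padded samples. The following is a minimal sketch, assuming predictions and targets are available as NumPy arrays of shape (T, B, n_targets) together with a mask of shape (T, B). `masked_mse` is a hypothetical helper that is not part of the notebook, and the run scores are made-up numbers for illustration only.

```python
import numpy as np

def masked_mse(pred, target, mask):
    """MSE per target over real (non-padded) samples; mask is 1 for real, 0 for padding."""
    sq_err = (pred - target) ** 2 * mask[:, :, None]
    return sq_err.sum(axis=(0, 1)) / mask.sum()

# Toy example: 3 time steps, 2 profiles, 1 target; the second profile has one padded step
target = np.zeros((3, 2, 1))
pred = np.full((3, 2, 1), 0.1)
mask = np.array([[1, 1], [1, 1], [1, 0]])
score = masked_mse(pred, target, mask)  # the padded step does not contribute

# Made-up scores of repeated runs, keyed by run index; the run with the lowest score is kept
run_scores = {0: 3.1e-3, 1: 2.4e-3, 2: 2.9e-3}
best_run = min(run_scores, key=run_scores.get)
```

Scoring the runs on held-out profiles rather than on the training profiles avoids selecting a model that merely fits the training data best.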