<a href="https://colab.research.google.com/github/uss111kr/SHM-and-PedM/blob/main/Uma_Shankar_Singh_Guided_Project_SHM_and_PredM.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Structural Health Monitoring and Predictive Maintenance

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


**Structural Health Monitoring (SHM)** is the process of collecting and analyzing data on the condition of a structure, usually through autonomous monitoring systems.

Topics in SHM:
1. Signal Processing
2. Time Series Analysis
3. Feature Extraction

**Predictive Maintenance (PredM)** is a maintenance strategy that uses data (usually obtained through SHM) to make informed decisions about when to carry out maintenance, so as to optimize resource usage and minimize the risk of failure. This contrasts with other maintenance philosophies such as reactive maintenance and scheduled maintenance.

Topics in PredM:
1. Anomaly Detection
2. Remaining Useful Life Prediction

### The Vänersborg Bridge

The Vänersborg Bridge is a steel bascule bridge in Sweden. A monitoring system was installed on the bridge, and measurements started in 2021. [Read More](https://www.sciencedirect.com/science/article/pii/S2352340923008004?via%3Dihub)

![Vänersborg Bridge](https://usercontent.one/wp/www.iotbridge.se/wp-content/uploads/2024/05/Vanersborg-bridge.jpg?media=1641573162)

A subset of the collected data is publicly available for research purposes. It can be obtained from [this link](https://zenodo.org/records/8300495).

*[Learn about the application of SHM & PredM in this post.](https://www.iotbridge.se/application-vanersborg/)*



### About the dataset


The data is a collection of time series sampled at 200 Hz during opening events. The time series contain data for:
1. Acceleration at 5 points (channels 18 to 22)
2. Strain at 16 points (channels 4 to 19)
3. Inclination (channel 25)
4. Weather parameters - wind direction, wind speed and air temperature (channels 28 to 30)

The data for each opening event is stored in a separate CSV file. Each file is reduced to a single record of summary values in the time and frequency domains. The 64 files therefore yield a processed dataset of 64 records.

For each measured quantity, the processed data contains `min`, `max`, `rms` and `fmax` values, where `fmax` is the peak frequency above 0.5 Hz obtained through an FFT.
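The preprocessing code itself is not part of this notebook, so the following is only a minimal sketch of how one time series might be reduced to these four values. The function name `summarize_signal`, its parameters, and the synthetic test signal are illustrative assumptions, not the dataset authors' implementation.

```python
import numpy as np

def summarize_signal(signal, fs=200.0, f_min=0.5):
    """Reduce one time series to min, max, rms and fmax (hypothetical helper)."""
    x = np.asarray(signal, dtype=float)
    # One-sided amplitude spectrum and the matching frequency axis in Hz
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    # Only look at bins above f_min, which skips the DC offset and slow drift
    band = freqs > f_min
    fmax = freqs[band][np.argmax(spectrum[band])]
    return {
        "min": x.min(),
        "max": x.max(),
        "rms": np.sqrt(np.mean(x ** 2)),
        "fmax": fmax,
    }

# Synthetic example (not bridge data): a 2 Hz oscillation on a constant offset,
# sampled at 200 Hz for 10 seconds
t = np.arange(2000) / 200.0
features = summarize_signal(450.0 + 5.0 * np.sin(2 * np.pi * 2.0 * t))
print(features)
```

Because `rms` is computed from squared values, it is always non-negative, even for a signal that is negative throughout.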

### Task 1: Load the Merged Dataset

In [None]:
import pandas as pd

csv_path = '/content/drive/MyDrive/Python/shm_data-20260112T030325Z-3-001/shm_data/merged.csv'

df = pd.read_csv(csv_path)

df.head(3)

| | timestamp | duration | strain_1_max | strain_1_min | strain_1_rms | strain_1_fmax | strain_2_max | strain_2_min | strain_2_rms | strain_2_fmax | ... | wind_direction_rms | wind_direction_fmax | wind_speed_max | wind_speed_min | wind_speed_rms | wind_speed_fmax | air_temperature_max | air_temperature_min | air_temperature_rms | air_temperature_fmax |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2023-02-20 03:17:23 | 637.0 | 485.66 | 429.48 | 443.849044 | 2.092244 | 208.19 | 143.38 | 157.148308 | 0.797344 | ... | 228.275647 | 0.47872 | 5.6 | 2.7 | 3.836093 | 0.48186 | 4.1 | 3.8 | 3.951086 | 0.048657 |
| 1 | 2023-02-20 03:10:53 | 249.0 | 485.68 | 428.55 | 459.065696 | 2.094955 | 206.43 | 143.28 | 172.359422 | 0.794638 | ... | 219.67429 | 0.07224 | 6.3 | 2.8 | 4.856191 | 0.020067 | 3.8 | 3.7 | 3.772498 | 0.01204 |
| 2 | 2023-02-21 07:54:28 | 438.0 | 493.6 | 430.54 | 453.660872 | 2.094862 | 224.2 | 150.53 | 172.685065 | 0.798695 | ... | 217.624322 | 0.025102 | 6.8 | 0.7 | 2.571939 | 0.518011 | 4.5 | 4.2 | 4.305831 | 0.073024 |


### Task 2: Anomaly Detection

1. Train an `IsolationForest` model for anomaly detection.
2. Use the `decision_function` method of the trained model to calculate an `anomaly_score` for each data point.
3. Find the `timestamp` of the lowest score and mark it as the "failure event".

In [None]:
from sklearn.ensemble import IsolationForest

# numpy.where returns the indices of the array elements
# that satisfy a condition
from numpy import where

# Filter out the timestamp column
filtered_df = df.loc[:, df.columns[1:]]

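# Train the model using the "fit" method.
# Note: no random_state is passed, so the scores (and the flagged rows)
# can change between runs; pass random_state=<int> to make them repeatable.
model = IsolationForest().fit(filtered_df)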

# Calculate anomaly scores using "decision_function" method
scores = model.decision_function(filtered_df)
df['anomaly_score'] = scores

# Get predictions (1 for inliers, -1 for outliers)
predictions = model.predict(filtered_df)

anomaly_indices = where(predictions == -1)

# Display anomalous rows using the array of indices
display(df.iloc[anomaly_indices[0]])

sorted_df = df.sort_values(by = 'anomaly_score')

display(sorted_df[['timestamp', 'anomaly_score']].head(5))

# Find the index of the lowest anomaly score
min_anomaly_score_index = sorted_df.index[0]

| | timestamp | duration | strain_1_max | strain_1_min | strain_1_rms | strain_1_fmax | strain_2_max | strain_2_min | strain_2_rms | strain_2_fmax | ... | wind_direction_fmax | wind_speed_max | wind_speed_min | wind_speed_rms | wind_speed_fmax | air_temperature_max | air_temperature_min | air_temperature_rms | air_temperature_fmax | anomaly_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | 2023-02-22 09:24:10 | 252.0 | 476.78 | 417.05 | 450.258183 | 48.221265 | 207.49 | 139.47 | 168.967088 | 0.789667 | ... | 0.476181 | 4.0 | 1.4 | 2.667053 | 0.523799 | -0.3 | -0.4 | 0.369762 | 0.003968 | -0.002073 |
| 7 | 2023-02-21 08:13:01 | 247.0 | 494.3 | 432.37 | 467.278947 | 0.797555 | 227.15 | 155.03 | 186.978787 | 0.797555 | ... | 0.008097 | 5.3 | 0.7 | 2.970854 | 0.481772 | 4.4 | 4.2 | 4.231006 | 0.064776 | -0.038714 |
| 53 | 2023-03-09 21:53:28 | 427.0 | 470.3 | 411.22 | 432.859463 | 50.128763 | 177.12 | 109.4 | 129.73316 | 0.798333 | ... | 0.051505 | 0.6 | 0.1 | 0.213706 | 0.009365 | -6.3 | -6.3 | 6.3 | 0.028094 | -0.041571 |
| 55 | 2023-03-08 06:25:19 | 921.0 | 476.19 | 404.71 | 421.101886 | 49.789932 | 184.27 | 114.42 | 126.303322 | 0.794676 | ... | 0.023884 | 2.5 | 0.1 | 1.020857 | 0.032569 | -7.5 | -8.4 | 7.992406 | 0.009771 | -0.040689 |
| 59 | 2023-03-09 23:45:25 | 1103.0 | 466.56 | 404.64 | 417.762618 | 47.38507 | 178.7 | 108.42 | 120.9958 | 0.796715 | ... | 0.006345 | 1.9 | 0.1 | 0.595903 | 0.142303 | -6.4 | -7.2 | 6.901127 | 0.049851 | -0.113807 |


| | timestamp | anomaly_score |
|---|---|---|
| 59 | 2023-03-09 23:45:25 | -0.113807 |
| 53 | 2023-03-09 21:53:28 | -0.041571 |
| 55 | 2023-03-08 06:25:19 | -0.040689 |
| 7 | 2023-02-21 08:13:01 | -0.038714 |
| 3 | 2023-02-22 09:24:10 | -0.002073 |
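All five rows flagged by `predict` have a negative `anomaly_score`. This is expected: in scikit-learn's `IsolationForest`, `predict` returns -1 for exactly the samples whose `decision_function` value is below zero. The sketch below illustrates this on synthetic data (not the bridge measurements); the array `X`, the seed, and the position of the injected outlier are assumptions made purely for the illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# 63 ordinary points plus one far-away point appended as the last row
X = np.vstack([rng.normal(0.0, 1.0, size=(63, 4)), np.full((1, 4), 8.0)])

# random_state is fixed here only so the illustration is repeatable
model = IsolationForest(random_state=0).fit(X)
scores = model.decision_function(X)
labels = model.predict(X)

# predict() flags exactly the samples with a negative decision_function score
same_rows = np.array_equal(labels == -1, scores < 0)
print(same_rows)

# The most anomalous sample is the one with the lowest score
most_anomalous = int(np.argmin(scores))
print(most_anomalous)
```

The same reasoning is why the row with the lowest score is taken as the single most anomalous opening event.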


### Task 3: Remaining Useful Life (RUL)

1. Find the timestamp with the lowest `anomaly_score` and store it as `failure_timestamp`.
2. Store all records from before the failure in `pre_failure_df`.
3. For each record in `pre_failure_df`, estimate the remaining useful life (RUL) as `failure_timestamp - timestamp`.

In [None]:
failure_timestamp = df.loc[min_anomaly_score_index,'timestamp']

print(f"The timestamp of the predicted failure event is: {failure_timestamp}")

The timestamp of the predicted failure event is: 2023-03-09 23:45:25


In [None]:
# convert the timestamps to "datetime" objects for comparison
df['timestamp'] = pd.to_datetime(df['timestamp'])
failure_timestamp = pd.to_datetime(failure_timestamp)

# Filter data to include only observations before the failure event
pre_failure_df = df[df['timestamp'] < failure_timestamp].copy()

# Remaining Useful Life (RUL) for each pre-failure data point
pre_failure_df['RUL'] = (failure_timestamp - pre_failure_df['timestamp']).dt.total_seconds() / (24 * 3600)

display(pre_failure_df[['timestamp', 'RUL']].head())

# Answer the first question: What is the "maximum" useful life recorded?
max_rul = pre_failure_df['RUL'].max()
print(f"The maximum useful life recorded is: {max_rul:.2f} days")

| | timestamp | RUL |
|---|---|---|
| 0 | 2023-02-20 03:17:23 | 17.852801 |
| 1 | 2023-02-20 03:10:53 | 17.857315 |
| 2 | 2023-02-21 07:54:28 | 16.660382 |
| 3 | 2023-02-22 09:24:10 | 15.59809 |
| 4 | 2023-02-20 20:35:06 | 17.132164 |


The maximum useful life recorded is: 17.86 days
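As a quick sanity check, the RUL of the first record can be reproduced by hand from the two timestamps printed above. This is only a worked example of the same seconds-to-days conversion; the variable names are illustrative.

```python
import pandas as pd

failure = pd.Timestamp("2023-03-09 23:45:25")
first_record = pd.Timestamp("2023-02-20 03:17:23")

# Timedelta -> seconds -> days, the same conversion used for the RUL column
rul_days = (failure - first_record).total_seconds() / (24 * 3600)
print(round(rul_days, 6))  # 17.852801
```

The gap is 17 full days plus 20 h 28 min 2 s, which matches the value shown for index 0.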
