### Before you use the code

You need to install the Python package `pykalman`:\
`pip install pykalman`

Don't run the algorithm on a large dataset in a single pass. It is
computationally intensive, and applying it directly to a long series can
take a very long time. Instead, partition the data into small chunks, each
with valid values on both sides of the gap, run the chunks individually,
and then merge the results back.
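
The partition-and-merge idea can be sketched roughly as follows. This is a minimal illustration, not part of the original notebook: the names `gap_windows`, `fill_by_window`, `pad`, and `mean_fill` are hypothetical, and `mean_fill` is only a cheap stand-in for the Kalman smoothing step so that the example runs without `pykalman`.

```python
import math


def gap_windows(values, pad=24):
    """Return (start, stop) index pairs, one per cluster of gaps.

    Each window covers the missing samples plus up to `pad` samples on
    either side, so the gap-filling step sees valid data around the gap.
    Windows that touch or overlap are merged into one.
    """
    n = len(values)
    windows = []
    for i, v in enumerate(values):
        if not math.isnan(v):
            continue
        start, stop = max(0, i - pad), min(n, i + pad + 1)
        if windows and start <= windows[-1][1]:
            # Extend the previous window instead of opening a new one
            windows[-1] = (windows[-1][0], max(windows[-1][1], stop))
        else:
            windows.append((start, stop))
    return windows


def fill_by_window(values, fill_fn, pad=24):
    """Run fill_fn on each window and copy back only the missing samples."""
    filled = list(values)
    for start, stop in gap_windows(values, pad):
        chunk = values[start:stop]
        chunk_filled = fill_fn(chunk)
        for offset, v in enumerate(chunk):
            if math.isnan(v):
                filled[start + offset] = chunk_filled[offset]
    return filled


def mean_fill(chunk):
    """Stand-in for the Kalman step (assumption): fill with the chunk mean."""
    valid = [v for v in chunk if not math.isnan(v)]
    mean = sum(valid) / len(valid)
    return [mean if math.isnan(v) else v for v in chunk]


nan = float("nan")
series = [1.0, 2.0, nan, 4.0, 5.0, 6.0, 7.0, 8.0, nan, nan, 11.0, 12.0]
windows = gap_windows(series, pad=2)
filled = fill_by_window(series, mean_fill, pad=2)
print(windows)
print(filled)
```

In practice, `fill_fn` would wrap the Kalman smoothing code from the cell below, and `pad` would be tuned so that each chunk stays small enough to run quickly while still containing enough valid observations on both sides of the gap.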

In [14]:
import numpy as np
import pandas as pd
from pykalman import KalmanFilter

# Read the data into a pandas dataframe
df = pd.read_csv('data_gaps.csv')

# Extract the 'HourlyAltimeterSetting' column and convert to numpy array
Qobs = df['HourlyAltimeterSetting'].to_numpy()

# Mask the NaN values
Qobs_masked = np.ma.masked_invalid(Qobs)

print('Before filling the gaps: \n', Qobs_masked)

# Define the Kalman Filter parameters for a 1-dimensional time series
transition_matrices = [1]  # State transition matrix (1D)
observation_matrices = [1]  # Observation matrix (1D)

# Estimate the process and observation covariances
# Use a small value if the variance is zero or can't be computed
# (for example, when every value is masked)
obs_var = np.ma.var(Qobs_masked)
if np.ma.is_masked(obs_var) or obs_var == 0:
    observation_covariance = 1e-3  # Small value to prevent errors
else:
    observation_covariance = float(obs_var)

transition_covariance = observation_covariance  # Assuming same as observation covariance

# Initialize the Kalman Filter
kf = KalmanFilter(
    transition_matrices=transition_matrices,
    observation_matrices=observation_matrices,
    transition_covariance=transition_covariance,
    observation_covariance=observation_covariance,
    initial_state_mean=Qobs_masked[~Qobs_masked.mask][0],  # First valid observation
    initial_state_covariance=1
)

# Use EM algorithm to estimate parameters
kf = kf.em(Qobs_masked, n_iter=10)

# Perform Kalman smoothing
smoothed_state_means, smoothed_state_covariances = kf.smooth(Qobs_masked)

# Replace the missing values with the smoothed estimates
gap_mask = np.isnan(Qobs)  # True where the original series has a gap
Qobs_filled = Qobs.copy()
Qobs_filled[gap_mask] = smoothed_state_means[gap_mask, 0]

# print('Predicted values: \n', smoothed_state_means)
print('Data after filling the gaps: \n', Qobs_filled)


Before filling the gaps: 
 [30.0 30.0 30.02 30.01 30.01 30.02 30.02 30.02 30.04 30.06 30.08 30.08
 30.1 30.1 -- 30.06 30.03 30.03 30.0 30.0 29.98 29.99 30.0 29.99 30.02
 30.02 30.02 30.02 30.01 30.0 30.02 30.01 30.01 29.99 29.99 29.99 29.99
 30.0 30.01 -- -- 29.99 29.99 29.96 29.96 29.98 29.98 29.98 29.98 30.02
 29.99 29.97 29.96 29.96 29.95 29.94 29.95 29.94 29.94 29.93 29.93 29.92
 29.93 29.9 29.91 29.92 29.91 29.92 29.89 29.88 29.87 29.83 29.8 -- 29.79
 29.76 29.72 29.7 29.69 29.69 29.69 29.7 29.71 29.73 29.73 29.75 29.77
 29.78 -- 29.78 29.79 29.8 29.82 29.83 29.86 29.87 29.89 29.93 29.95 29.97
 29.98 29.97 29.97 29.95 29.95 29.96 -- 30.01 30.03 30.04 30.04 30.04
 30.04 30.05 30.04 -- 30.04 30.04 30.04 30.04 30.03 30.03 30.03 30.05
 30.04 30.04 30.04 -- 30.05 30.05 30.05 30.06 30.06 30.06 30.06 30.06
 30.06 30.06 30.07 30.08 30.08 30.07 30.03 29.99 29.95 29.93 -- 29.92
 29.91 29.93 29.93 29.93 29.92 29.91 29.9 -- 29.89]
Data after filling the gaps: 
 [30.         30.         30.02 