# Prediction on Production of Oil Well with AttentionCNN-LSTM

Authors: S Pan, J Wang, W Zhou

Published in: Journal of Physics: Conference Series (Volume 2030, Paper 012038), 2021. Presented at the ICEECT 2021 conference.

## 1. The Problem


Oil well production prediction is crucial for efficient resource management in the petroleum industry. Traditional methods like curve analysis and mathematical modeling are limited in accuracy due to the complexity of external factors affecting production. Machine learning techniques, such as ARIMA, BP neural networks, and SVR, have been used but suffer from limitations like data stability requirements, poor scalability, and susceptibility to local minima. Deep learning approaches, including CNNs and LSTMs, offer better predictive power, but individual models struggle with stability in long-term sequence forecasting. The Attention-CNN-LSTM model is proposed to address these challenges.

## 2. Related work

In the early stage of oilfield development, curve analysis and mathematical modeling methods are widely used.

The authors state that "The traditional machine learning methods generally require, that all data should be put into the memory during training". I disagree with this. Some models do need the whole dataset up front, but many machine learning models can be trained incrementally on partial batches of the data (partial fitting) and still perform well.
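As a minimal sketch of what partial fitting looks like in practice, the example below trains scikit-learn's `SGDRegressor` with `partial_fit`, one small chunk at a time. The data generator, the coefficients `true_w` and `true_b`, and the chunk sizes are all made up for illustration; they stand in for data that would be read piece by piece from disk.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0, 0.5])  # hypothetical ground-truth coefficients
true_b = 3.0                         # hypothetical ground-truth intercept

def stream_chunks(n_chunks=200, chunk_size=100):
    """Yield small (X, y) chunks so the full dataset never sits in memory at once."""
    for _ in range(n_chunks):
        X = rng.normal(size=(chunk_size, 3))
        y = X @ true_w + true_b + rng.normal(scale=0.1, size=chunk_size)
        yield X, y

model = SGDRegressor(random_state=0)
for X_chunk, y_chunk in stream_chunks():
    # Each call updates the weights using only the current chunk.
    model.partial_fit(X_chunk, y_chunk)

print("coefficients:", model.coef_)
print("intercept:", model.intercept_)
```

Not every estimator supports this; in scikit-learn only the ones that expose `partial_fit` do, so the point holds for many models rather than all of them.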

Currently, LSTMs are used for oil well production prediction and have achieved good results. However, due to the harsh underground production environment, oil production data usually contains multiple noise components and forms a nonlinear, non-stationary time series. This is why the paper combines a CNN, an LSTM, and an attention mechanism to construct a production prediction model. I partially disagree with this as well: an LSTM alone can handle nonlinear data, because its gating mechanism allows it to capture complex dependencies.
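For reference, the gating mechanism mentioned above can be written out as follows. This is the standard LSTM cell formulation, not necessarily the exact variant used in the paper; $W$, $U$, and $b$ denote learnable weights and biases, $\sigma$ is the logistic sigmoid, and $\odot$ is element-wise multiplication.

$$
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$

The forget gate $f_t$ and input gate $i_t$ decide how much of the old cell state is kept and how much new information is written, which is what lets the cell model nonlinear, long-range dependencies.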

## 3. Methodology

$$\{\hat{y}_t\}_{t=T+1}^{T+\Delta} = F\left(\{x_t\}_{t=1}^{T}, \{y_t\}_{t=1}^{T} \right)$$

The production prediction of an oil well uses the time series of input features x and the actual oil well production y over the past T steps as inputs to construct a model F that predicts y for the next Δ steps.
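The formulation above can be sketched as a sliding-window transformation. The helper name `make_windows` and the toy data are assumptions made for illustration; the paper does not specify how the windows are built.

```python
import numpy as np

def make_windows(x, y, T, delta):
    """Build (inputs, targets) pairs for multi-step forecasting.

    x: array of shape (n_steps, n_features) with the auxiliary series
    y: array of shape (n_steps,) with the production values
    Each input holds T past steps of [x, y]; each target holds the next delta values of y.
    """
    features = np.column_stack([x, y])  # (n_steps, n_features + 1)
    inputs, targets = [], []
    for start in range(len(y) - T - delta + 1):
        inputs.append(features[start:start + T])
        targets.append(y[start + T:start + T + delta])
    return np.array(inputs), np.array(targets)

# Toy data: 50 time steps, 3 auxiliary features, y is simply 0, 1, 2, ...
n_steps, n_features = 50, 3
x = np.random.default_rng(0).normal(size=(n_steps, n_features))
y = np.arange(n_steps, dtype=float)

inputs, targets = make_windows(x, y, T=10, delta=2)
print(inputs.shape, targets.shape)  # (39, 10, 4) (39, 2)
```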

The model consists of:

- CNN

The input data is first passed to the CNN layer, which abstracts the original oil production data and expresses it at a higher level. The CNN processes the features of the original data, mines the correlations between the multi-dimensional inputs, and removes noise.

- LSTM

The CNN output is then passed on to the LSTM layers, which model the temporal dependencies in the sequence.

- Attention

The attention mechanism extracts the salient features in the sub-sequences of a long time sequence. It is applied to compute a weighted sum over the hidden-state vectors output by the LSTM.
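The weighted sum over LSTM hidden states can be sketched in NumPy as below. The additive scoring function and the parameter names `W`, `b`, and `v` are assumptions for illustration; this summary does not restate the paper's exact scoring function, and the parameters here are random rather than learned.

```python
import numpy as np

def attention_pool(H, W, b, v):
    """Additive attention over a sequence of hidden states.

    H: (T, d) hidden states, one row per time step
    W: (d, k), b: (k,), v: (k,) would be learned in a real model
    Returns the attention weights (T,) and the context vector (d,).
    """
    scores = np.tanh(H @ W + b) @ v   # one scalar score per time step
    scores = scores - scores.max()    # stabilise the softmax numerically
    weights = np.exp(scores) / np.exp(scores).sum()
    context = weights @ H             # weighted sum of the hidden states
    return weights, context

rng = np.random.default_rng(0)
T, d, k = 10, 50, 16
H = rng.normal(size=(T, d))           # stand-in for LSTM outputs
weights, context = attention_pool(
    H, rng.normal(size=(d, k)), np.zeros(k), rng.normal(size=k)
)
print(weights.sum(), context.shape)
```

Time steps with larger weights contribute more to `context`, which is how the mechanism emphasises the salient parts of the sequence.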

Finally, we end up with the following structure, Attention-CNN-LSTM:

<img src="./attention_cnn_lstm.png" alt="drawing" width="1000"/>
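To make the CNN stage of this structure more concrete, here is a small NumPy sketch of a 1D convolution followed by max pooling, tracing how the sequence length changes. The kernel size of 3, 64 filters, and pool size of 2 are taken from the experiment later in this notebook, not from the paper, and the weights are random.

```python
import numpy as np

def conv1d_valid(x, kernels, bias):
    """x: (T, c_in); kernels: (k, c_in, c_out). Returns (T - k + 1, c_out) after ReLU."""
    k = kernels.shape[0]
    out = np.stack([
        np.tensordot(x[t:t + k], kernels, axes=([0, 1], [0, 1])) + bias
        for t in range(x.shape[0] - k + 1)
    ])
    return np.maximum(out, 0.0)

def max_pool1d(x, pool_size):
    """Non-overlapping max pooling along the time axis; a trailing remainder is dropped."""
    steps = x.shape[0] // pool_size
    return x[:steps * pool_size].reshape(steps, pool_size, -1).max(axis=1)

rng = np.random.default_rng(0)
window = rng.normal(size=(10, 1))             # 10 time steps, 1 feature
kernels = rng.normal(size=(3, 1, 64)) * 0.1   # kernel size 3, 64 filters
conv_out = conv1d_valid(window, kernels, np.zeros(64))
pooled = max_pool1d(conv_out, pool_size=2)
print(conv_out.shape, pooled.shape)  # (8, 64) (4, 64)
```

The LSTM therefore sees a shorter sequence (4 steps instead of 10) of richer feature vectors (64 channels instead of 1).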

## 4. Training

The model is trained on data from an oilfield in southern China, covering the T1 and T2 wells.

The model is evaluated with three metrics: RMSE, MAE, and MAPE.
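For reference, the three metrics can be computed as in the sketch below. The sample values are made up for illustration and are not taken from the paper.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error: penalises large errors more strongly."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error, in the same units as the target."""
    return float(np.mean(np.abs(y_true - y_pred)))

def mape(y_true, y_pred):
    """Mean absolute percentage error; undefined if y_true contains zeros."""
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)

y_true = np.array([100.0, 200.0, 400.0])  # hypothetical production values
y_pred = np.array([110.0, 190.0, 380.0])  # hypothetical predictions

print(f"RMSE: {rmse(y_true, y_pred):.3f}")   # RMSE: 14.142
print(f"MAE:  {mae(y_true, y_pred):.3f}")    # MAE:  13.333
print(f"MAPE: {mape(y_true, y_pred):.3f}%")  # MAPE: 6.667%
```

RMSE and MAE are scale-dependent, while MAPE is relative, so MAPE is the easiest of the three to compare across wells with different production levels.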

These are the results reported by the authors:

<img src="./results_comparison.png" alt="drawing" width="1000"/>

The proposed model appears to perform much better than all the other models on the T1 and T2 datasets.

## 5. Conclusion

Attention-CNN-LSTM is more suitable than the compared models for predicting time series data such as oil well production.

The model seems to extract high-dimensional features correctly with the CNN, while the attention and LSTM layers help it avoid gradient explosion and pick out the important features.

# Experimentation

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import LSTM, Conv1D, MaxPooling1D, Dense, Dropout, Attention, Input
from tensorflow.keras.optimizers import Adam
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVR
from sklearn.neural_network import MLPRegressor
import matplotlib.pyplot as plt

# Generate a noisy nonlinear time series: sine wave + log trend + Gaussian noise
def generate_nonlinear_data(seq_length=6000):
    t = np.linspace(0, 100, seq_length)
    y = np.sin(t) + np.log(t + 1) + np.random.normal(scale=0.2, size=seq_length)
    return t, y

# Turn the series into (window of look_back values, next value) pairs
def create_dataset(data, look_back=10):
    X, y = [], []
    for i in range(len(data) - look_back):
        X.append(data[i:i + look_back])
        y.append(data[i + look_back])
    return np.array(X), np.array(y)

# Generate the series and scale it to [0, 1]
t, series = generate_nonlinear_data()
scaler = MinMaxScaler()
series_scaled = scaler.fit_transform(series.reshape(-1, 1)).flatten()

look_back = 10
X, y = create_dataset(series_scaled, look_back)
X = X.reshape(X.shape[0], look_back, 1)  # (samples, time steps, features)

# Chronological 80/20 split, so the test set is the most recent data
split = int(len(X) * 0.8)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# Baseline: LSTM -> self-attention -> LSTM -> dense head
def build_attention_lstm_model():
    inputs = Input(shape=(look_back, 1))
    lstm_out = LSTM(50, return_sequences=True)(inputs)
    attention_out = Attention()([lstm_out, lstm_out])  # query and value are both the LSTM output
    lstm_out2 = LSTM(50)(attention_out)
    dense_out = Dense(25, activation='relu')(lstm_out2)
    outputs = Dense(1)(dense_out)
    model = Model(inputs, outputs)
    model.compile(loss='mse', optimizer=Adam(learning_rate=0.001))
    return model

# Same architecture with a Conv1D + max pooling front end
def build_attention_cnn_lstm_model():
    inputs = Input(shape=(look_back, 1))
    cnn_out = Conv1D(filters=64, kernel_size=3, activation='relu')(inputs)
    cnn_out = MaxPooling1D(pool_size=2)(cnn_out)
    lstm_out = LSTM(50, return_sequences=True)(cnn_out)
    attention_out = Attention()([lstm_out, lstm_out])
    lstm_out2 = LSTM(50)(attention_out)
    dense_out = Dense(25, activation='relu')(lstm_out2)
    outputs = Dense(1)(dense_out)
    model = Model(inputs, outputs)
    model.compile(loss='mse', optimizer=Adam(learning_rate=0.001))
    return model

# Train both models with identical settings; the test split doubles as validation data
attention_lstm_model = build_attention_lstm_model()
history_attention_lstm = attention_lstm_model.fit(X_train, y_train, epochs=100, batch_size=16, validation_data=(X_test, y_test), verbose=1)
mse_attention_lstm = attention_lstm_model.evaluate(X_test, y_test)

attention_cnn_lstm_model = build_attention_cnn_lstm_model()
history_attention_cnn_lstm = attention_cnn_lstm_model.fit(X_train, y_train, epochs=100, batch_size=16, validation_data=(X_test, y_test), verbose=1)
mse_attention_cnn_lstm = attention_cnn_lstm_model.evaluate(X_test, y_test)

print(f"Attention-LSTM MSE: {mse_attention_lstm:.4f}")
print(f"Attention-CNN-LSTM MSE: {mse_attention_cnn_lstm:.4f}")
```


Epoch 1/100
[1m300/300[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 7ms/step - loss: 0.0289 - val_loss: 0.0025
Epoch 2/100
[1m300/300[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 7ms/step - loss: 0.0014 - val_loss: 0.0024
Epoch 3/100
[1m300/300[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 6ms/step - loss: 0.0016 - val_loss: 0.0017
Epoch 4/100
[1m300/300[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 7ms/step - loss: 0.0015 - val_loss: 0.0023
Epoch 5/100
[1m300/300[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 7ms/step - loss: 0.0015 - val_loss: 0.0014
Epoch 6/100
[1m300/300[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 7ms/step - loss: 0.0014 - val_loss: 0.0014
Epoch 7/100
[1m300/300[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 7ms/step - loss: 0.0015 - val_loss: 0.0021
Epoch 8/100
[1m300/300[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 8ms/step - loss: 0.0016 - val_loss: 0.0013
Epoch 9/100
[1m300/300[0m [32

In our test setup, the Attention-CNN-LSTM performs a bit better than the Attention-LSTM.
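One way to put the two MSE values in context is to compare them with a naive persistence baseline that predicts the next value as the last value in the window. The sketch below rebuilds the same kind of synthetic series and the same 80/20 split. The fixed seed is an assumption, since the run above was not seeded, so the number it prints will not correspond exactly to that run.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumed seed; the original run was unseeded
t = np.linspace(0, 100, 6000)
series = np.sin(t) + np.log(t + 1) + rng.normal(scale=0.2, size=t.size)

# Min-max scale to [0, 1], matching the preprocessing of the experiment
scaled = (series - series.min()) / (series.max() - series.min())

look_back = 10
targets = scaled[look_back:]           # value to predict for each window
last_seen = scaled[look_back - 1:-1]   # last value inside each window

split = int(len(targets) * 0.8)        # same chronological 80/20 split
baseline_mse = float(np.mean((targets[split:] - last_seen[split:]) ** 2))
print(f"Persistence baseline MSE (scaled units): {baseline_mse:.4f}")
```

If a model's test MSE is not clearly below this baseline, the difference between the two architectures on this synthetic series should be read with caution.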