# Prediction on Production of Oil Well with AttentionCNN-LSTM

Authors: S Pan, J Wang, W Zhou

Published in: Journal of Physics: Conference Series (Volume 2030, Paper 012038), 2021. Presented at the ICEECT 2021 conference.

The paper proposes Attention-CNN-LSTM, a deep learning model for predicting oil well production. A CNN first extracts features from the multi-dimensional production data, LSTM layers then model the temporal dependencies, and an attention mechanism weights the LSTM hidden states so that the model focuses on the most informative parts of the sequence. On data from two wells in an oilfield in southern China, the authors report that it outperforms the compared models.

## 1. The Problem


Oil well production prediction is crucial for efficient resource management in the petroleum industry. Traditional methods like curve analysis and mathematical modeling are limited in accuracy due to the complexity of external factors affecting production. Machine learning techniques, such as ARIMA, BP neural networks, and SVR, have been used but suffer from limitations like data stability requirements, poor scalability, and susceptibility to local minima. Deep learning approaches, including CNNs and LSTMs, offer better predictive power, but individual models struggle with stability in long-term sequence forecasting. The Attention-CNN-LSTM model is proposed to address these challenges.

## 2. Related work

In the early stage of oilfield development, the curve analysis method and mathematical modeling methods are widely used. 

The authors state that "The traditional machine learning methods generally require, that all data should be put into the memory during training". I disagree with them on this point. Some models do need most of the data in memory up front to learn good weights, but many others can be trained incrementally, fitting the data chunk by chunk (partial fitting), and still perform well.
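To illustrate the point, here is a minimal sketch of chunk-by-chunk training, assuming a plain linear regression trained with mini-batch gradient descent in NumPy. The data generator `chunk_stream` and the helper `partial_fit` are hypothetical names for illustration; in practice, scikit-learn estimators such as `SGDRegressor` expose a `partial_fit` method for the same purpose. Only one chunk is held in memory at a time, yet the weights should still end up close to the true ones.

```python
import numpy as np

def chunk_stream(n_chunks, chunk_size, true_w, true_b, seed=0):
    """Hypothetical data source: yields (X, y) chunks one at a time,
    as if a large file were read piece by piece."""
    rng = np.random.default_rng(seed)
    for _ in range(n_chunks):
        X = rng.normal(size=(chunk_size, true_w.size))
        y = X @ true_w + true_b + 0.01 * rng.normal(size=chunk_size)
        yield X, y

def partial_fit(w, b, X, y, lr=0.05):
    """One gradient step of least-squares linear regression on a single chunk."""
    err = X @ w + b - y            # residuals for this chunk only
    grad_w = X.T @ err / len(y)    # gradient of half the mean squared error w.r.t. w
    grad_b = err.mean()            # ... and w.r.t. the bias
    return w - lr * grad_w, b - lr * grad_b

true_w, true_b = np.array([2.0, -1.0, 0.5]), 0.3
w, b = np.zeros(3), 0.0
for epoch in range(5):             # several passes, re-reading the same stream
    for X, y in chunk_stream(200, 64, true_w, true_b):
        w, b = partial_fit(w, b, X, y)   # only one chunk is in memory at a time
print(np.round(w, 2), round(b, 2))
```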

Currently, LSTMs are used to predict oil well production and have achieved good results. However, because of the harsh underground production environment, oil production data usually contain multiple noise components and form nonlinear, non-stationary time series. This is why the paper combines CNN, LSTM and an attention mechanism into one production prediction model. I partially disagree with this reasoning, since an LSTM alone can already handle nonlinear data: its gating mechanism allows it to capture complex dependencies.
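For reference, this is a minimal NumPy sketch of one step of a standard LSTM cell, showing the gates I am referring to. The gate ordering, the sizes and the random weights are my own assumptions for illustration, not the paper's configuration. The forget gate `f` and the input gate `i` decide how much of the old cell state is kept and how much new, nonlinearly transformed input is written, which is what lets the cell model nonlinear dependencies over time.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One step of a standard LSTM cell.

    W: (4H, D) input weights, U: (4H, H) recurrent weights, b: (4H,) bias,
    stacked in the (assumed) order [input gate, forget gate, candidate, output gate].
    """
    H = h_prev.size
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0:H])            # input gate: how much new information to write
    f = sigmoid(z[H:2 * H])        # forget gate: how much of the old cell state to keep
    g = np.tanh(z[2 * H:3 * H])    # candidate values (nonlinear transform of the input)
    o = sigmoid(z[3 * H:4 * H])    # output gate: how much of the cell state to expose
    c = f * c_prev + i * g         # additive cell-state update
    h = o * np.tanh(c)             # new hidden state
    return h, c

rng = np.random.default_rng(0)
D, H, T = 3, 4, 10                 # input features, hidden units, time steps (illustrative)
W = rng.normal(scale=0.5, size=(4 * H, D))
U = rng.normal(scale=0.5, size=(4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x_t in rng.normal(size=(T, D)):
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h.shape)                     # (4,)
```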

## 3. Methodology

$$\{\hat{y}_t\}_{t=T+1}^{T+\Delta} = F\left(\{x_t\}_{t=1}^{T}, \{y_t\}_{t=1}^{T} \right)$$

The production prediction of an oil well takes the time series of input features $x_t$ and the actual oil well production $y_t$ over the first $T$ steps as inputs, and constructs a model $F$ that predicts $y$ for the next $\Delta$ steps.
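A minimal sketch of how training samples for $F$ could be built with a sliding window, assuming the features $x_t$ and the production $y_t$ are NumPy arrays aligned in time. The function name `make_windows` and the choice of stacking $y_t$ as an extra input column are my assumptions; the paper does not necessarily preprocess the data this way.

```python
import numpy as np

def make_windows(x, y, T, delta):
    """Turn aligned series into (input window, future target) pairs.

    x: (N, d) input features x_t, y: (N,) actual production y_t.
    Returns inputs of shape (M, T, d + 1) and targets of shape (M, delta),
    where M = N - T - delta + 1.
    """
    series = np.column_stack([x, y])          # each row holds x_t and y_t
    n_samples = len(y) - T - delta + 1
    if n_samples <= 0:
        raise ValueError("series too short for this window and horizon")
    inputs = np.stack([series[s:s + T] for s in range(n_samples)])
    targets = np.stack([y[s + T:s + T + delta] for s in range(n_samples)])
    return inputs, targets

# Illustrative data: 10 time steps, 2 input features
x = np.arange(20, dtype=float).reshape(10, 2)
y = np.arange(10, dtype=float)
inputs, targets = make_windows(x, y, T=4, delta=2)
print(inputs.shape, targets.shape)            # (5, 4, 3) (5, 2)
```

Each sample sees $y_1, \dots, y_T$ as part of its input and is asked to predict $y_{T+1}, \dots, y_{T+\Delta}$, which matches the formulation above.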

The model to be constructed consists of:

- CNN

The input data is first passed to the CNN layer, which abstracts and represents the original oil production data at a higher level. The CNN processes the features of the original data, mines the correlations between the multi-dimensional inputs and removes noise.
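A minimal NumPy sketch of what such a convolutional layer computes on a multivariate series, assuming a single "valid" 1D convolution over time followed by ReLU. The kernel width and the number of filters are hypothetical. Each filter spans all input features over a short window, which is how the layer can pick up correlations between the dimensions; a filter with equal weights $1/K$ behaves like a moving average, one simple way a learned filter can smooth out noise.

```python
import numpy as np

def conv1d_relu(series, kernels, bias):
    """'Valid' 1D convolution over time followed by ReLU.

    series: (T, C_in) multivariate series, kernels: (C_out, K, C_in), bias: (C_out,).
    Like deep-learning libraries, this computes a cross-correlation (no kernel flip).
    Returns an array of shape (T - K + 1, C_out).
    """
    c_out, k, _ = kernels.shape
    n_out = series.shape[0] - k + 1
    out = np.empty((n_out, c_out))
    for t in range(n_out):
        window = series[t:t + k]                       # (K, C_in): K steps, all features
        # each filter combines every feature over the window
        out[t] = np.tensordot(kernels, window, axes=([1, 2], [0, 1])) + bias
    return np.maximum(out, 0.0)                        # ReLU

rng = np.random.default_rng(0)
series = rng.normal(size=(12, 3))                      # 12 time steps, 3 features (illustrative)
kernels = rng.normal(size=(8, 3, 3))                   # 8 hypothetical filters of width 3
features = conv1d_relu(series, kernels, np.zeros(8))
print(features.shape)                                  # (10, 8)

# A filter with equal weights 1/3 on feature 0 is a 3-step moving average of that feature
ma = np.zeros((1, 3, 3))
ma[0, :, 0] = 1.0 / 3.0
smoothed = conv1d_relu(series, ma, np.zeros(1))        # ReLU clips negative averages to 0
```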

- LSTM

The CNN output is then passed on to the LSTM layers, which model the temporal dependencies in the sequence.

- Attention

The attention mechanism extracts the salient features in the sub-sequences of the long time series and computes a weighted sum of the hidden-layer vectors output by the LSTM.
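A minimal sketch of that weighted summation over the LSTM outputs, assuming additive (Bahdanau-style) scoring with hypothetical parameters `W` and `v`; the exact scoring function used in the paper may differ. Each hidden state receives a score, a softmax turns the scores into weights that sum to 1, and the context vector is the weighted sum of the hidden states.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                    # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def attention_pool(hs, W, v):
    """Additive attention over LSTM outputs.

    hs: (T, H) hidden states, W: (A, H) projection, v: (A,) scoring vector.
    Returns the context vector (H,) and the attention weights (T,).
    """
    scores = np.tanh(hs @ W.T) @ v     # one relevance score per time step
    alpha = softmax(scores)            # non-negative weights that sum to 1
    context = alpha @ hs               # weighted sum of the hidden states
    return context, alpha

rng = np.random.default_rng(0)
hs = rng.normal(size=(10, 4))          # e.g. 10 LSTM outputs with 4 hidden units
W = rng.normal(size=(6, 4))            # hypothetical attention parameters
v = rng.normal(size=6)
context, alpha = attention_pool(hs, W, v)
print(context.shape, round(float(alpha.sum()), 6))
```

If all scores are equal (for example when `v` is all zeros), the weights become uniform and the context reduces to the mean hidden state; training `W` and `v` is what lets the model emphasise the salient time steps instead.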

Together, these components give the following structure, Attention-CNN-LSTM:

<img src="./attention_cnn_lstm.png" alt="drawing" width="1000"/>

## 4. Training

The model is trained on data from an oilfield in southern China, covering the T1 and T2 wells.

The model is evaluated with three metrics: RMSE, MAE and MAPE.
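For completeness, the three metrics in NumPy, using their standard definitions with MAPE reported in percent. The paper may scale or aggregate them slightly differently, and the values below are illustrative, not data from the paper. Note that MAPE is undefined when any true value is zero.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (undefined if any y_true is 0)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)

# Illustrative production values, not data from the paper
y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 190.0, 400.0]
print(rmse(y_true, y_pred), mae(y_true, y_pred), mape(y_true, y_pred))
```

For this example, RMSE ≈ 8.16, MAE ≈ 6.67 and MAPE ≈ 5.0%. RMSE penalises large errors more heavily than MAE, while MAPE makes errors comparable across wells with different production levels.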

These are the results reported by the authors:

<img src="./results_comparison.png" alt="drawing" width="1000"/>

The proposed model appears to outperform all the compared models on both the T1 and T2 datasets.

## 5. Conclusion

Attention-CNN-LSTM is more suitable for predicting the time series data such as oil well production than the compared models.

The model seems to correctly extract high-dimensional features with the CNN, while the LSTM and attention layers help it avoid gradient explosion and focus on the important features.

# Experimentation

The authors have made their code freely available on their GitHub page. I have put it in the folder `mrnn_mlstm_experiment`. To run it, follow the instructions in the README file; essentially, run the following command:

`python .\train.py --dataset '{dataset}' --algorithm '{the algorithm}' --epochs 50`

I chose to run it on the 'tree7' dataset and on DJI (Dow Jones Index). These are the results for MLSTM:

tree7:

- RMSE: 0.2990356773379933
- MAE: 0.23424398632834936

DJI:



LSTM results for comparison:

tree7:


DJI:



TODO: show some charts and validate them against the results reported in the article.
