@wanchen from Sep. 2018 to Jan. 2019
work with ...
aims:
- forecast the system health score multiple steps ahead
- predict the main factors over the forecasting horizon
Details:
(NOTE: given a framework of our model)
First, we show the target time series as follows.
0.1.1 score (2018-09-29 to 2018-11-24): AlignBuild and ShowData in BuildData.py.
0.1.3 58 KPIs: ShowFigs in BuildData.py.
Here, we try to find the meanings behind the 58 KPIs.
['2180200', '2180501', '2180507', '2180508', '2180509', '2180513',
'2180514', '2184301', '2184302', '2184303', '2184305', '2184306',
'2184312', '2184313', '2184314', '2184315', '2184316', '2184317',
'2184318', '2184319', '2184322', '2184323', '2189000', '2189001',
'2189002', '2189003', '2189004', '2189006', '2189008', '2189010',
'2189016', '2189018', '2189026', '2189030', '2189044', '2189046',
'2189055', '2189058', '2189092', '2189100', '2189106', '2189108',
'2189112', '2189118', '2189119', '2189121', '2189123', '2189144',
'2189147', '2189159', '2189999', '2190054', '3000003', '3000005',
'3000006', '3000007', '3000008', '3000200']
The full list of KPI meanings can be found in data/data-1544602176927.csv. In total, there are 276 different kinds of KPIs.
...
3000003,CPU used,CPU使用率
3000004,Mem used,内存使用率
3000005,root fss usage,根目录文件系统使用率
3000006,IO Latency,Io读写延时 RXawaite
3000007,Mem Free (kb),物理内存剩余量(kb)
3000008,Ioawait,WIO占CPU百分比
...
We refer to the KPIs as KPIs-1, KPIs-2, ..., KPIs-276, although the data set of any given DBID does not contain all 276 KPIs (see fetchKPIs276 in BuildData.py).
Scores and KPIs are nominally sampled every 3 minutes, but timestamp shifts and missing values occur from time to time. We call the handling of this discontinuity the interpolation & alignment task.
We use interp1d from scipy.
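import numpy as np
from scipy.interpolate import interp1d

# interp1d(x, y, kind) returns a function that interpolates y over x
t_raw = [1, 4, 8, 12]
y_raw = [0.9, 0.8, 0.85, 0.7]  # illustrative values, one per entry of t_raw
kind = 'linear'                # e.g. 'linear', 'nearest', 'cubic'
t_new = np.linspace(1, 12, 3)  # new sample points inside [t_raw[0], t_raw[-1]]
func = interp1d(t_raw, y_raw, kind)
y_new = func(t_new)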
NOTES:
- The format of one item is timestamp + score + KPIs, so we can check the timestamps for discontinuities.
- First, we delete every item whose score is 0.0, which means that the system could not be reached; its KPIs are NULL at the same time.
- Second, (move a timestamp like 2017-11-27T12:15:06 to 2017-11-27T12:15:00). Actually, we can ignore this step.
- Third, if two consecutive timestamps satisfy t2-t1>(3+1)minutes, we place a separator between them, so that they fall into different subsequences.
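The steps in the notes above can be sketched roughly as follows. This is only an illustration, not the code in BuildData.py: the function name `split_and_interpolate`, the use of timestamps in seconds, and the choice of linear interpolation are all assumptions made for the example.

```python
import numpy as np
from scipy.interpolate import interp1d

STEP = 180     # nominal sampling period: 3 minutes, in seconds
MAX_GAP = 240  # (3 + 1) minutes: a larger gap starts a new subsequence


def split_and_interpolate(t, score, step=STEP, max_gap=MAX_GAP):
    """Hypothetical helper. t: sorted timestamps in seconds, score: health scores.

    Returns a list of (t_grid, score_grid) pairs, one per subsequence.
    """
    t = np.asarray(t, dtype=np.float64)
    score = np.asarray(score, dtype=np.float64)

    # First: drop items whose score is 0.0 (system could not be reached).
    keep = score != 0.0
    t, score = t[keep], score[keep]

    # Third: cut wherever two consecutive timestamps are more than max_gap apart.
    cuts = np.where(np.diff(t) > max_gap)[0] + 1

    pieces = []
    for t_seg, s_seg in zip(np.split(t, cuts), np.split(score, cuts)):
        if len(t_seg) < 2:
            continue  # a single point cannot be interpolated
        # Regular 3-minute grid that stays inside the observed range.
        n = int(np.floor((t_seg[-1] - t_seg[0]) / step)) + 1
        t_grid = t_seg[0] + step * np.arange(n)
        f = interp1d(t_seg, s_seg, kind="linear")
        pieces.append((t_grid, f(t_grid)))
    return pieces
```

Note that in this sketch, deleting a 0.0-score item can itself open a gap larger than 4 minutes, in which case the data is split at that point.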
0.2.1 score: AlignBuild_v2 and ShowData_v2 in BuildData.py.
0.2.2 58 KPIs
The processing details are as follows.
- fetch raw data from data/210100063/all.csv
- interpolation & alignment on score and 58 KPIs
- results are saved as data/210100063/all_interp.csv (we save the data as np.float32, which makes the file bigger)
This folder is mainly for Multi-Step Forecasting.
x-axis: forecasting horizon, 10 steps
y-axis: corresponding RMSE
curves: different methods, i.e. the baselines (AR, ARIMA, ..., DeepSeqMO) and our method
- our method -> given_data_1128/210100063/score_forecast.npz (data from genCorData.pickle; the following methods will train on SimpleScoreData.npz)
- AR
- ARIMA
- DeepMO
- ...
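For the RMSE-versus-horizon curves described above, one possible way to compute one RMSE value per forecasting step is sketched below. The helper names `make_windows`, `persistence_forecast` and `rmse_per_step` are hypothetical, and the persistence forecast (repeating the last observed value) is only a stand-in so the example runs; it is not one of the methods listed above.

```python
import numpy as np


def make_windows(series, n_input, horizon=10):
    """Slice a 1-D series into (input window, next `horizon` values) pairs."""
    series = np.asarray(series, dtype=np.float64)
    n = len(series) - n_input - horizon + 1
    if n <= 0:
        raise ValueError("series is too short for the requested windows")
    X = np.stack([series[i:i + n_input] for i in range(n)])
    Y = np.stack([series[i + n_input:i + n_input + horizon] for i in range(n)])
    return X, Y


def persistence_forecast(X, horizon=10):
    """Stand-in forecaster: repeat the last observed value for every step."""
    return np.repeat(X[:, -1:], horizon, axis=1)


def rmse_per_step(y_true, y_pred):
    """Both arrays have shape (n_samples, horizon); returns one RMSE per step."""
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    return np.sqrt(np.mean((y_true - y_pred) ** 2, axis=0))
```

Each method then contributes one curve: its `rmse_per_step` output plotted against steps 1 to 10.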
x-axis:
y-axis:
This folder is mainly for Pattern & KPIs Analysis. The patterns found by segmentation correspond to the System Condition.
(NOTE: Segmentation & Pattern are performed after interpolation & alignment.)
Attention: For DBID=210100063, we initially use the data between 2018-11-12T15:51:00 and 2018-11-27T09:21:00 (before that there is a long period without connection). Most importantly, note that our interpolation & alignment method adds one kind of pattern to the raw data (according to the figure 2018-11-26T17:33:02).
score: The analysis details are shown in analyst/readme.md.
58 KPIs: The analysis details are shown in analyst/readme.md.
To find how various KPIs affect the System Condition, we frame the problem as Rule Learning, which can mine rules of the following form.
The n-th KPI's i-th Pattern and ... and the m-th KPI's j-th Pattern might cause the System's k-th Pattern.
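As an illustration of how a candidate rule of this form could be scored, the sketch below computes the support and confidence of one rule over segments labelled with patterns. Support and confidence are the usual association-rule measures; this README does not say which rule learner or measure is actually used, so the function `rule_stats` and its data layout are assumptions.

```python
import numpy as np


def rule_stats(kpi_patterns, system_pattern, antecedent, consequent):
    """Score the rule "antecedent -> consequent" over labelled segments.

    kpi_patterns:   dict, KPI id -> sequence of pattern labels (one per segment)
    system_pattern: sequence of system pattern labels (one per segment)
    antecedent:     dict, KPI id -> required pattern label
    consequent:     required system pattern label
    Returns (support, confidence).
    """
    system_pattern = np.asarray(system_pattern)
    # Segments where every KPI in the antecedent shows its required pattern.
    match = np.ones(len(system_pattern), dtype=bool)
    for kpi, label in antecedent.items():
        match &= np.asarray(kpi_patterns[kpi]) == label
    # Segments where the antecedent holds and the system pattern also matches.
    both = match & (system_pattern == consequent)
    support = both.mean() if len(both) else 0.0
    confidence = both.sum() / match.sum() if match.any() else 0.0
    return float(support), float(confidence)
```

For example, `rule_stats(patterns, system, {'3000003': 1, '3000008': 2}, 3)` would score the rule "KPI 3000003 in pattern 1 and KPI 3000008 in pattern 2 might cause system pattern 3", with the pattern labels themselves being made up for the example.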