#About

This notebook shows how to use the datapop library through a few simple examples.

In [1]:
%matplotlib inline
import pandas
import numpy
import matplotlib.pyplot as plt

#Data

In [2]:
import pandas
# Load the input data (dataset popularity records) from a CSV file.
data = pandas.read_csv('data/inputs/popularity-910days.csv')

#ReplicationPlacementStrategy

The class **ReplicationPlacementStrategy** generates data replication recommendations.

pandas.DataFrame **data**: data for the analysis.

int **min_replicas**: minimum number of replicas per dataset. Default: 1

int **max_replicas**: maximum number of replicas per dataset. Default: 7

In [5]:
from datapop import ReplicationPlacementStrategy
rps = ReplicationPlacementStrategy(data=data, min_replicas=1, max_replicas=7)

Recommends which datasets should have their number of replicas decreased, and in which order, to **save N TB** of disk space. Each step removes exactly one replica. Datasets with the lowest metric value are reduced first.

int **n_tb**: number of TB to save. If **None**, the number of replicas is reduced for all datasets. Default: None

In [8]:
report = rps.save_n_tb(n_tb=10)
report.head()

| Unnamed: 0 | Name | Probability | roc_auc | precision0 | Prediction | rmse | Nb_Replicas | LFNSize | Metric | DecreaseReplicas |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | /MC/2011/Beam3500GeV-2011-MagDown-Nu2-Pythia8/... | 0.000395 | 0.868972 | 0.877842 | 0 | 0 | 2.961538 | 0.066009 | 0 | 1 |
| 0 | /MC/2011/Beam3500GeV-2011-MagDown-Nu2-Pythia8/... | 0.000395 | 0.868972 | 0.877842 | 0 | 0 | 3.0 | 0.070195 | 0 | 1 |
| 0 | /MC/2011/Beam3500GeV-2011-MagDown-Nu2-Pythia8/... | 0.000395 | 0.868972 | 0.877842 | 0 | 0 | 2.0 | 0.070195 | 0 | 1 |
| 0 | /MC/2011/Beam3500GeV-2011-MagDown-Nu2-Pythia8/... | 0.000395 | 0.868972 | 0.877842 | 0 | 0 | 2.962963 | 0.084499 | 0 | 1 |
| 0 | /MC/2011/Beam3500GeV-2011-MagDown-Nu2-Pythia8/... | 0.000395 | 0.868972 | 0.877842 | 0 | 0 | 3.0 | 0.083705 | 0 | 1 |
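
The ordering rule described for saving disk space can be illustrated with a minimal sketch. This is not the datapop implementation: the function name `sketch_save_n_tb` and the toy data are hypothetical, the sketch assumes `LFNSize` is the size of one replica in TB, and it makes a single pass in which each dataset loses at most one replica.

```python
import pandas as pd


def sketch_save_n_tb(report, n_tb=None, min_replicas=1):
    """Greedy single pass: drop one replica per dataset, lowest Metric first."""
    # Only datasets with more than `min_replicas` copies can lose one.
    candidates = report[report["Nb_Replicas"] > min_replicas]
    candidates = candidates.sort_values("Metric", kind="stable")

    selected, saved = [], 0.0
    for idx, row in candidates.iterrows():
        # Stop once the target is reached; n_tb=None means "take everything".
        if n_tb is not None and saved >= n_tb:
            break
        selected.append(idx)
        saved += row["LFNSize"]  # assumption: size of one replica, in TB

    out = candidates.loc[selected].copy()
    out["DecreaseReplicas"] = 1
    return out, saved


# Hypothetical toy data, not taken from the notebook's input file.
toy = pd.DataFrame({
    "Name": ["a", "b", "c", "d"],
    "Nb_Replicas": [3, 1, 2, 4],
    "LFNSize": [2.0, 5.0, 1.0, 4.0],
    "Metric": [0.5, 0.0, 0.1, 0.9],
})
plan, saved = sketch_save_n_tb(toy, n_tb=3)
print(plan[["Name", "Metric", "DecreaseReplicas"]])
print(saved)
```

Dataset `b` is skipped because it is already at the minimum of one replica, so the plan starts with the next-lowest metric.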


Recommends which datasets should have their number of replicas increased, and in which order, to **fill N TB** of disk space. Each step adds exactly one replica. Datasets with the highest metric value are increased first.

int **n_tb**: number of TB to fill. If **None**, the number of replicas is increased for all datasets. Default: None

In [9]:
report = rps.fill_n_tb(n_tb=100)
report.head()

| Unnamed: 0 | Name | Probability | roc_auc | precision0 | Prediction | rmse | Nb_Replicas | LFNSize | Metric | IncreaseReplicas |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | /LHCb/Collision12/Beam4000GeV-VeloClosed-MagDo... | 0.875755 | 0.872937 | 0.881308 | 90368.930288 | 38158.044273 | 3.985338 | 29.072362 | 22675.349064 | 1 |
| 0 | /LHCb/Collision12/Beam4000GeV-VeloClosed-MagUp... | 0.885385 | 0.872937 | 0.881308 | 86111.981671 | 31571.521761 | 4.008043 | 28.696464 | 21484.794866 | 1 |
| 0 | /LHCb/Collision12/Beam4000GeV-VeloClosed-MagDo... | 0.875755 | 0.872937 | 0.881308 | 90368.930288 | 38158.044273 | 4.985338 | 29.072362 | 18126.941501 | 1 |
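
In the rows shown above, `Metric` matches `Prediction / Nb_Replicas`, and the same dataset can reappear with one more replica and a lower metric. The sketch below imitates that step-by-step behaviour with a priority queue. It is an illustration only: the formula is inferred from the displayed rows rather than from the library's documentation, `sketch_fill_n_tb` and the toy data are hypothetical, `LFNSize` is assumed to be the size of one replica in TB, and only a numeric `n_tb` is handled.

```python
import heapq


def sketch_fill_n_tb(datasets, n_tb, max_replicas=7):
    """Add one replica per step to the dataset with the highest metric.

    datasets: list of dicts with Name, Prediction, Nb_Replicas (> 0), LFNSize.
    """
    # heapq is a min-heap, so the metric is negated to pop the largest first.
    heap = [(-d["Prediction"] / d["Nb_Replicas"], i, d["Nb_Replicas"])
            for i, d in enumerate(datasets)
            if d["Nb_Replicas"] < max_replicas]
    heapq.heapify(heap)

    steps, filled = [], 0.0
    while heap and filled < n_tb:
        neg_metric, i, replicas = heapq.heappop(heap)
        d = datasets[i]
        steps.append((d["Name"], replicas, -neg_metric))
        filled += d["LFNSize"]  # assumption: one replica costs LFNSize TB
        replicas += 1
        if replicas < max_replicas:
            # Re-queue the dataset with its metric lowered by the new replica.
            heapq.heappush(heap, (-d["Prediction"] / replicas, i, replicas))
    return steps, filled


# Hypothetical toy data.
datasets = [
    {"Name": "x", "Prediction": 120.0, "Nb_Replicas": 2, "LFNSize": 10.0},
    {"Name": "y", "Prediction": 105.0, "Nb_Replicas": 3, "LFNSize": 5.0},
]
steps, filled = sketch_fill_n_tb(datasets, n_tb=25)
for name, replicas_before, metric in steps:
    print(name, replicas_before, metric)
```

Here `x` receives two replicas before `y` receives one, because its metric stays above that of `y` until it has four replicas.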


Recommends which datasets can be removed from disk, and in which order, to **clean N TB** of disk space. The datasets with the lowest probability of being accessed are removed first.

int **n_tb**: number of TB to clean. If **None**, all datasets are removed. Default: None

In [11]:
report = rps.clean_n_tb(n_tb=10)
report.head()

| Unnamed: 0 | Name | Probability | roc_auc | precision0 | Prediction | rmse | Nb_Replicas | LFNSize |
|---|---|---|---|---|---|---|---|---|
| 5 | /LHCb/Collision10/Beam3500GeV-VeloClosed-MagDo... | 0.019623 | 0.870184 | 0.880889 | 0.0 | 0.0 | 3.993421 | 0.066366 |
| 6 | /LHCb/Collision10/Beam3500GeV-VeloClosed-MagDo... | 0.019887 | 0.870184 | 0.880889 | 0.0 | 0.0 | 3.994186 | 0.295318 |
| 1 | /LHCb/Collision10/Beam3500GeV-VeloClosed-MagDo... | 0.054753 | 0.870184 | 0.880889 | 0.0 | 0.25641 | 0.004592 | 2.402856 |
| 7 | /LHCb/Collision10/Beam3500GeV-VeloClosed-MagDo... | 0.068039 | 0.870184 | 0.880889 | 0.0 | 0.0 | 0.997183 | 1.370105 |
| 2 | /LHCb/Collision10/Beam3500GeV-VeloClosed-MagDo... | 0.077262 | 0.870184 | 0.880889 | 5.4e-05 | 18.919273 | 0.001443 | 0.085333 |
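
The removal order can be sketched with a sort and a running total. As before, this is a hedged illustration and not the library's code: `sketch_clean_n_tb`, the `FreedTB` column, and the toy data are hypothetical, and the sketch assumes that removing a dataset frees `LFNSize` TB for each of its replicas.

```python
import pandas as pd


def sketch_clean_n_tb(report, n_tb=None):
    """Select datasets to remove, lowest access probability first."""
    ordered = report.sort_values("Probability", kind="stable").copy()
    # Assumption: removing a dataset frees LFNSize TB per replica.
    ordered["FreedTB"] = (ordered["Nb_Replicas"] * ordered["LFNSize"]).cumsum()
    if n_tb is None:
        return ordered  # no target given: every dataset is listed
    # Keep rows up to and including the one where the total first reaches n_tb.
    reached_before = ordered["FreedTB"].shift(fill_value=0.0) >= n_tb
    return ordered[~reached_before]


# Hypothetical toy data.
toy = pd.DataFrame({
    "Name": ["p", "q", "r", "s"],
    "Probability": [0.30, 0.02, 0.10, 0.90],
    "Nb_Replicas": [2, 1, 3, 2],
    "LFNSize": [1.0, 4.0, 2.0, 5.0],
})
plan = sketch_clean_n_tb(toy, n_tb=8)
print(plan[["Name", "Probability", "FreedTB"]])
```

With a target of 8 TB, the two least likely datasets are selected and the running total ends slightly above the target, since whole datasets are removed.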


Combines the long-term prediction report and the short-term forecast report into a single report.

pandas.DataFrame **data**: data for the analysis.

In [12]:
report = rps.get_combine_report(data)
report.head()

| Unnamed: 0 | Name | Probability | roc_auc | precision0 | Prediction | rmse | Nb_Replicas | LFNSize |
|---|---|---|---|---|---|---|---|---|
| 0 | /LHCb/Collision10/Beam3500GeV-VeloClosed-MagDo... | 0.587157 | 0.870572 | 0.879532 | 6.660679 | 153.429956 | 2.0 | 0.3179 |
| 1 | /LHCb/Collision10/Beam3500GeV-VeloClosed-MagDo... | 0.03941 | 0.870572 | 0.879532 | 0.0 | 0.25641 | 0.004592 | 2.402856 |
| 2 | /LHCb/Collision10/Beam3500GeV-VeloClosed-MagDo... | 0.084264 | 0.870572 | 0.879532 | 5.4e-05 | 18.919273 | 0.001443 | 0.085333 |
| 3 | /LHCb/Collision10/Beam3500GeV-VeloClosed-MagDo... | 0.317376 | 0.870572 | 0.879532 | 10.6752 | 22.378141 | 3.973568 | 0.649204 |
| 4 | /LHCb/Collision10/Beam3500GeV-VeloClosed-MagDo... | 0.338774 | 0.870572 | 0.879532 | 0.0 | 0.0 | 3.984375 | 0.803981 |
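
One way to picture a combined report is as a join of two per-dataset tables on `Name`. The sketch below assumes, without confirmation from the library, that `Probability` and `roc_auc` come from the long-term prediction and that `Prediction` and `rmse` come from the short-term forecast. The two input frames and their values are hypothetical.

```python
import pandas as pd

# Hypothetical long-term report: access probability per dataset.
long_term = pd.DataFrame({
    "Name": ["a", "b"],
    "Probability": [0.59, 0.04],
    "roc_auc": [0.87, 0.87],
})

# Hypothetical short-term report: forecast usage per dataset.
short_term = pd.DataFrame({
    "Name": ["b", "a"],
    "Prediction": [0.0, 6.66],
    "rmse": [0.26, 153.43],
})

# Inner join on the dataset name; validate guards against duplicated names.
combined = long_term.merge(short_term, on="Name", how="inner",
                           validate="one_to_one")
print(combined)
```

The row order of the inputs does not matter, because rows are matched by `Name` rather than by position.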
