#About
This notebook shows how to use the client wrapper for the data popularity API service. The service runs in a Docker container.

For background on the service, read **howto_01_datapop.ipynb** and **howto_02_datapopserv.ipynb** first.

#Docker pull & run
Before you begin, start the Docker container that provides the data popularity API service.

In a terminal, run:
1. ####sudo docker pull hushchynmikhail/dp_api
2. ####sudo docker run -d -p 5000:5000 hushchynmikhail/dp_api python DataPopularity/data_popularity_api/dp_api.py
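
Because the container is started in detached mode (`-d`), it can be useful to confirm that port 5000 is accepting connections before creating the client. The helper below is a minimal sketch using only the Python standard library. `wait_for_port` is a hypothetical name that is not part of the data popularity package, and it only checks that the TCP port is open, not that the API itself is healthy.

```python
import socket
import time


def wait_for_port(host="localhost", port=5000, timeout=30.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError if nothing is listening yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Calling `wait_for_port()` with the defaults polls `localhost:5000`, the address published by the `docker run` command above, for up to 30 seconds.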

#Init DataPopularityClient

In [2]:
from datapopclient import DataPopularityClient

dpaw = DataPopularityClient(service_url='http://localhost:5000')

#Upload data

In [3]:
data_path = 'Data/popularity-728days.csv'
dpaw.upload(data_path=data_path)

1
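
Since `upload` takes a file path, a quick local check of the CSV can catch a wrong working directory or an empty file before anything is sent to the service. The sketch below uses only the standard library. `describe_csv` is a hypothetical helper, not part of the client, and it makes no assumption about which columns the service expects.

```python
import csv
from pathlib import Path


def describe_csv(path):
    """Return (column_names, number_of_data_rows) for a CSV file."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"No such file: {path}")
    with path.open(newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])       # first row is treated as the header
        n_rows = sum(1 for _ in reader)  # remaining rows are data rows
    return header, n_rows
```

For example, `describe_csv('Data/popularity-728days.csv')` would return the header and row count of the file uploaded above.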

#Run algorithm
The following method runs DataPopularityEstimator and DataIntensityPredictor on the uploaded data.

In [4]:
%%time
dpaw.run_algorithm(nb_of_weeks=104)

CPU times: user 8.08 ms, sys: 13.3 ms, total: 21.4 ms
Wall time: 7min 31s


#Get data popularity

In [5]:
%%time
popularity = dpaw.get_data_popularity()

CPU times: user 21.7 ms, sys: 11.2 ms, total: 32.9 ms
Wall time: 41.6 ms


In [6]:
# irow has been removed from pandas; iloc is the positional equivalent
popularity.iloc[:5]

Unnamed: 0.1,Unnamed: 0,Name,Popularity,Label
0,0,/MC/2011/Beam3500GeV-2011-MagDown-Nu2-EmNoCuts...,0.041713,0
1,1,/MC/2011/Beam3500GeV-2011-MagDown-Nu2-EmNoCuts...,0.351177,1
2,2,/MC/Upgrade/Beam7000GeV-Upgrade-MagUp-Nu3.8-25...,0.819466,1
3,3,/MC/2012/Beam4000GeV-2012-MagDown-Nu2.5-Pythia...,0.000157,0
4,4,/MC/Dev/Beam4000GeV-2012-MagUp-Fix1-UniformHea...,0.015228,1
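
The returned table can be explored with ordinary pandas operations, assuming (as the positional indexing call suggests) that it is a pandas DataFrame. The sketch below rebuilds the five rows shown above by hand. The dataset names are placeholders because the real names are truncated in the output, and the cut of 0.5 is an arbitrary value chosen for illustration. How the `Popularity` score should be interpreted is not stated in this notebook, so the example only selects and sorts rows.

```python
import pandas as pd

# Stand-in for the frame returned by get_data_popularity();
# names are placeholders, numbers are copied from the output above.
popularity = pd.DataFrame({
    "Name": ["dataset_a", "dataset_b", "dataset_c", "dataset_d", "dataset_e"],
    "Popularity": [0.041713, 0.351177, 0.819466, 0.000157, 0.015228],
    "Label": [0, 1, 1, 0, 1],
})

threshold = 0.5  # arbitrary cut, for illustration only

# Rows whose score exceeds the cut.
above = popularity[popularity["Popularity"] > threshold]

# All rows ordered from highest to lowest score.
ranked = popularity.sort_values("Popularity", ascending=False)

print(above["Name"].tolist())
print(ranked["Name"].head(3).tolist())
```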


#Get predicted data intensity

In [7]:
%%time
prediction = dpaw.get_data_intensity_prediction()

CPU times: user 32.8 ms, sys: 4.2 ms, total: 37 ms
Wall time: 40.8 ms


In [8]:
# irow has been removed from pandas; iloc is the positional equivalent
prediction.iloc[:5]

Unnamed: 0.1,Unnamed: 0,Name,Intensity,Std_error
0,0,/LHCb/Collision10/Beam3500GeV-VeloClosed-MagDo...,6.24164,29.241197
1,1,/LHCb/Collision10/Beam3500GeV-VeloClosed-MagDo...,32.69593,105.950344
2,2,/LHCb/Collision10/Beam3500GeV-VeloClosed-MagDo...,0.0,2e-06
3,3,/LHCb/Collision10/Beam3500GeV-VeloClosed-MagDo...,0.180068,0.0
4,4,/LHCb/Collision10/Beam3500GeV-VeloClosed-MagDo...,2.080052e-15,0.0


#Get optimization report
This method runs DataPlacementOptimizer and returns the report produced after the loss function has been optimized.

In [9]:
%%time
opti_report = dpaw.get_opti_report(q=None, set_replicas='auto', c_disk=100, c_tape=1, c_miss=2000,
                alpha=1, max_replicas=4)

CPU times: user 34.7 ms, sys: 4 ms, total: 38.7 ms
Wall time: 6.02 s


In [10]:
# irow has been removed from pandas; iloc is the positional equivalent
opti_report.iloc[:5]

Unnamed: 0.1,Unnamed: 0,Name,OnDisk,NbReplicas
0,4543,/LHCb/Collision10/Beam3500GeV-VeloClosed-MagDo...,1,3
1,290,/LHCb/Collision10/Beam3500GeV-VeloClosed-MagDo...,1,4
2,186,/LHCb/Collision10/Beam3500GeV-VeloClosed-MagDo...,0,1
3,2325,/LHCb/Collision10/Beam3500GeV-VeloClosed-MagDo...,0,1
4,8202,/LHCb/Collision10/Beam3500GeV-VeloClosed-MagDo...,1,1
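
A placement report like this one is easy to summarize, again assuming the result is a pandas DataFrame with the `OnDisk` and `NbReplicas` columns shown above. The sketch below uses the five displayed rows as stand-in data (dataset names are omitted because they are truncated in the output) and counts how many datasets are kept on disk and how many replicas those datasets have in total.

```python
import pandas as pd

# Stand-in for the frame returned by get_opti_report();
# values are copied from the five rows shown above.
opti_report = pd.DataFrame({
    "OnDisk": [1, 1, 0, 0, 1],
    "NbReplicas": [3, 4, 1, 1, 1],
})

# Datasets flagged as staying on disk.
on_disk = opti_report[opti_report["OnDisk"] == 1]
n_on_disk = len(on_disk)
replicas_on_disk = int(on_disk["NbReplicas"].sum())

# Number of datasets per OnDisk value.
counts = opti_report["OnDisk"].value_counts().to_dict()

print(n_on_disk, replicas_on_disk, counts)
```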


#Get report
This method returns a combined report for each dataset, with the popularity, intensity, size, and placement columns shown in the output below.

In [11]:
%%time
report = dpaw.get_report(q=None, set_replicas='auto', c_disk=100, c_tape=1, c_miss=2000,
                alpha=1, max_replicas=4, pop_cut=0.5)

CPU times: user 35.1 ms, sys: 4.9 ms, total: 40 ms
Wall time: 663 ms


In [12]:
# irow has been removed from pandas; iloc is the positional equivalent
report.iloc[:5]

Unnamed: 0.2,Unnamed: 0,Unnamed: 0.1,Name,Popularity,Label,Intensity,LFNSize,OnDisk,NbReplicas,Missing
0,4543,4543,/LHCb/Collision10/Beam3500GeV-VeloClosed-MagDo...,0.00661,0,6.24164,0.3179,1,3,0
1,290,290,/LHCb/Collision10/Beam3500GeV-VeloClosed-MagDo...,0.109195,0,32.695929,0.649204,1,4,0
2,186,186,/LHCb/Collision10/Beam3500GeV-VeloClosed-MagDo...,0.394192,1,0.0,1.370105,1,1,0
3,2325,2325,/LHCb/Collision10/Beam3500GeV-VeloClosed-MagDo...,0.728414,1,0.0,0.09529,0,1,0
4,8202,3018,/LHCb/Collision10/Beam3500GeV-VeloClosed-MagDo...,0.000157,0,0.180068,0.803981,1,1,0
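
One way to inspect the combined report is to cross-tabulate the `OnDisk` flag against whether `Popularity` exceeds the `pop_cut` value passed to `get_report`. The sketch below does this for the five rows shown above, assuming the report is a pandas DataFrame. `AboveCut` is a helper column introduced only for this example, and the result describes these five rows, not the behavior of the service in general.

```python
import pandas as pd

# Stand-in for the frame returned by get_report();
# values are copied from the five rows shown above.
report = pd.DataFrame({
    "Popularity": [0.00661, 0.109195, 0.394192, 0.728414, 0.000157],
    "OnDisk": [1, 1, 1, 0, 1],
    "NbReplicas": [3, 4, 1, 1, 1],
})

pop_cut = 0.5  # same value as passed to get_report above

# Helper column: does the popularity value exceed the cut?
report["AboveCut"] = report["Popularity"] > pop_cut

# Rows: AboveCut (False/True); columns: OnDisk (0/1); cells: dataset counts.
table = pd.crosstab(report["AboveCut"], report["OnDisk"])

print(table)
```

In this small sample, the four rows at or below the cut are all on disk and the single row above it is not.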
