Py-tracy evaluates the inter-arrivals times between logs applying different filters based on the user input.
The logs comes from the Google's traces clusterdata-2011-2 (available on Google Storage) for job_events
and task_events
.
Py-tracy was created furing the course of Computer Systems Performance Evaluation @ Politecnico of Milan.
This repository contains the code used to analyse the data provided by Google.
For this project we used
IPython
Spark
Bokeh
I wrote a tutorial on how to link IPython with Spark on OS X and Linux.
Up to now the code is available only as ipynb
, so you need to open the source file with IPython.
The traces contain data starting from the 1st of May 2011 19 EDT until the 30th of May 2011.
Important: many parameters may be set to filter the traces, in any case (except time parameters) you can type None
and no filter will be applied.
j_eval_day(None, 2, 3600)
will evaluate traces with any event_type
.
Open the inter_arrival_jobs
file and run all the cells till the Usage section.
After that you can call the function to start the computation.
# j_eval_day(event_type, day, granularity)
j_eval_day(0, 2, 3600)
event_type
: the type of event, e.g. 0 correspond to SUBMITday
: day you wish to evaluate, e.g. 2 correspond to the 2nd of May from 0:00 until 23:59granularity
: interval at which you want to perfom the evaluation (expressed in seconds!), e.g. 3600 will evaluate the model at intervals of 1 hour
# j_eval_days(event_type, init_day, finish_day, granularity)
j_eval_days(0, 2, 5, 3600)
event_type
: the type of event, e.g. 0 correspond to SUBMITinit_day
: day you wish to start the evaluation, e.g. 2 correspond to the 2nd of May 0:00finish_day
: day you wish to end the evaluation, e.g. 5 correspond to the 5th of May 23:59granularity
: interval at which you want to perfom the evaluation (expressed in seconds!), e.g. 3600 will evaluate the model at intervals of 1 hour
If you wish you could also select periods of time which are not day or days
# j_eval_time_window(event_type, init_time, finish_time, granularity)
j_eval_time_window(0, 600, 86400, 3600)
event_type
: the type of event, e.g. 0 correspond to SUBMITinit_time
: time you wish to start the evaluation (in seconds!), e.g. 600 correspond to the 600 seconds after the start of the collection of the traces (May 2nd 19:00 ETD)finish_time
: time you wish to end the evaluation (in seconds!)granularity
: interval at which you want to perfom the evaluation (expressed in seconds!), e.g. 3600 will evaluate the model at intervals of 1 hour
Open the inter_arrival_tasks
file and run all the cells till the Usage section.
After that you can call the function to start the computation.
# t_eval_day(event_type, machine_ID, job_ID, day, granularity)
t_eval_day(0, 2, 3600)
event_type
: the type of event, e.g. 0 correspond to SUBMITmachine_ID
: ID of the machine running the taskjob_ID
: ID of the jobs the task belongs today
: day you wish to evaluate, e.g. 2 correspond to the 2nd of May from 0:00 until 23:59granularity
: interval at which you want to perfom the evaluation (expressed in seconds!), e.g. 3600 will evaluate the model at intervals of 1 hour
# t_eval_days(event_type, machine_ID, job_ID, init_day, finish_day, granularity)
t_eval_days(0, 1, 1, 2, 5, 3600)
event_type
: the type of event, e.g. 0 correspond to SUBMITmachine_ID
: ID of the machine running the taskjob_ID
: ID of the jobs the task belongs toinit_day
: day you wish to start the evaluation, e.g. 2 correspond to the 2nd of May 0:00finish_day
: day you wish to end the evaluation, e.g. 5 correspond to the 5th of May 23:59granularity
: interval at which you want to perfom the evaluation (expressed in seconds!), e.g. 3600 will evaluate the model at intervals of 1 hour
If you wish you could also select periods of time which are not day or days
# t_eval_time_window(event_type, machine_ID, job_ID, init_time, finish_time, granularity)
t_eval_time_window(0, 1, 1, 600, 86400, 3600)
event_type
: the type of event, e.g. 0 correspond to SUBMITmachine_ID
: ID of the machine running the taskjob_ID
: ID of the jobs the task belongs toinit_time
: time you wish to start the evaluation (in seconds!), e.g. 600 correspond to the 600 seconds after the start of the collection of the traces (May 2nd 19:00 ETD)finish_time
: time you wish to end the evaluation (in seconds!)granularity
: interval at which you want to perfom the evaluation (expressed in seconds!), e.g. 3600 will evaluate the model at intervals of 1 hour
Py-tracy outputs a csv file containing the processed traces in the format described below, and it also outputs some graphs showing the computed inter-arrival values.
Those graphs represents:
- Mean
- Variance
- Median
- Standart Variation
init_time,finish_time,timestamp_1,timestamp_2,inter_time
init_time
andfinish_time
: those are time values considered with respect to the granularity, for example if we consider a granularity of 1 hour (3600 seconds) over the 2nd of May, the values will be0:00 am - 1:00 am
,1:00 am - 2:00 am
and so on (in seconds).timestamp_1
andtimestamp_2
: those represent 2 subsequent logs in the traces, used to evaluate the inter-arrival rate.inter time
: the computed inter-arrival time
The name of the csv in written in this format
jobs_intervals_event-type_init-time_finish-time_@granularity.csv
init_time,finish_time,timestamp_1,timestamp_2,inter_time
init_time
andfinish_time
: those are time values considered with respect to the granularity, for example if we consider a granularity of 1 hour (3600 seconds) over the 2nd of May, the values will be0:00 am - 1:00 am
,1:00 am - 2:00 am
and so on (in seconds).timestamp_1
andtimestamp_2
: those represent 2 subsequent logs in the traces, used to evaluate the inter-arrival rate.inter time
: the computed inter-arrival time
The name of the csv in written in this format
tasks_intervals_event-type_machine-ID_job-ID_init-time_finish-time_@granularity.csv