The sitestat tool is designed to collect statistics from various CMS sites. The underlying process follows these steps:
- Fetch all site names from SiteDB
- Loop over a specific time range, e.g. the last 3 months
- Create dates for that range
- Use the popularity API (DSStatInTimeWindow) to get summary statistics. The API returns various information about dataset usage on sites.
- Organize the data into bins by number of accesses
- For every bin, collect the dataset names
- Call DBS APIs to get dataset statistics via the blocksummaries API
- Sum up the file_size values, which gives the total size used by a specific site
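The binning and summing steps above can be sketched in Python as follows. This is a minimal illustration and not the tool's actual implementation: the function names, the sample dataset names, and the sizes are hypothetical, and the input dictionaries stand in for what the popularity and DBS lookups would return. The sketch also assumes that the last bin is open-ended.

```python
from bisect import bisect_right
from collections import defaultdict


def bin_datasets(naccesses, bins):
    """Group dataset names by access-count bin.

    naccesses: dict mapping dataset name -> number of accesses
    bins: sorted list of lower bin edges, e.g. [0, 1, 2, 3, 4];
          the last bin is open-ended (an assumption of this sketch)
    """
    groups = defaultdict(list)
    for dataset, nacc in naccesses.items():
        # Pick the largest bin edge that does not exceed nacc.
        idx = bisect_right(bins, nacc) - 1
        groups[bins[max(idx, 0)]].append(dataset)
    return dict(groups)


def size_per_bin(groups, file_sizes):
    """Sum file_size (bytes) over the datasets collected in each bin."""
    return {b: sum(file_sizes.get(d, 0) for d in names)
            for b, names in groups.items()}


# Hypothetical input: access counts as a popularity query might report them,
# and file_size values as a block-summary lookup might return them.
naccesses = {"/A/B/AOD": 0, "/C/D/AOD": 3, "/E/F/RAW": 12}
file_sizes = {"/A/B/AOD": 100, "/C/D/AOD": 250, "/E/F/RAW": 400}

groups = bin_datasets(naccesses, [0, 1, 2, 3, 4])
totals = size_per_bin(groups, file_sizes)
```

Here `totals` maps each bin edge to the summed size of the datasets that fall into it, so adding up its values gives the total size for the site.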
Here is an example of sitestat tool usage:
Usage of ./sitestat:
  -bins string
        Comma separated list of bin values, e.g. 0,1,2,3,4 for naccesses or 0,10,100 for tot cpu metrics
  -blkinfo
        Use block information for finding statistics, by default use dataset info
  -breakdown string
        Breakdown report into more details (tier, dataset)
  -chunkSize int
        chunkSize for processing URLs (default 100)
  -dbsinfo
        Use DBS to collect dataset information, default use PhEDEx
  -format string
        Output format type, txt or json (default "txt")
  -metric string
        Popularity DB metric (NACC, TOTCPU, NUSERS) (default "NACC")
  -pbrdb string
        Name of PBR db (see PhedexReplicaMonitoring project)
  -phgroup string
        Phedex group name (default "AnalysisOps")
  -profile
        profile code
  -site string
        CMS site name, use T1, T2, T3 to specify all Tier sites
  -tier string
        Look-up specific data-tier
  -trange string
        Specify time interval in YYYYMMDD format, e.g 20150101-20150201 or use short notations 1d, 1m, 1y for one day, month, year, respectively (default "1d")
  -verbose int
        Verbose level, support 0,1,2
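The -trange option accepts either an explicit YYYYMMDD-YYYYMMDD interval or a short notation such as 1d, 1m or 1y. The Python sketch below shows one way such a value could be turned into a list of dates. It is not taken from the sitestat source: the function names are hypothetical, and treating a month as 30 days and a year as 365 days is an assumption made only for this example.

```python
from datetime import date, datetime, timedelta

# Days per unit. Treating a month as 30 days and a year as 365 days
# is an assumption of this sketch, not necessarily what sitestat does.
UNIT_DAYS = {"d": 1, "m": 30, "y": 365}


def parse_trange(trange, today=None):
    """Return (start, end) dates for a -trange style value."""
    today = today or date.today()
    if "-" in trange:
        # Explicit interval, e.g. 20150101-20150201
        start, end = (datetime.strptime(part, "%Y%m%d").date()
                      for part in trange.split("-"))
        return start, end
    # Short notation, e.g. 1d, 3m, 1y
    count, unit = int(trange[:-1]), trange[-1]
    return today - timedelta(days=count * UNIT_DAYS[unit]), today


def dates_in_range(start, end):
    """List every date from start to end (inclusive) as YYYYMMDD strings."""
    ndays = (end - start).days
    return [(start + timedelta(days=i)).strftime("%Y%m%d")
            for i in range(ndays + 1)]
```

For example, `dates_in_range(*parse_trange("20150201-20150205"))` yields the five dates from 20150201 through 20150205.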
In all examples below we use T2_XX_Abc as a site name.
# list site statistics for last month
sitestat -site T2_XX_Abc -trange 1m
# list site statistics for specific time range
sitestat -site T2_XX_Abc -trange 20150201-20150205
# list site statistics for last 3 months
sitestat -site T2_XX_Abc -trange 3m
# list site statistics for last month and only count AOD data-tier
sitestat -site T2_XX_Abc -trange 1m -tier AOD
# list site statistics for last month with breakdown for all data-tiers
sitestat -site T2_XX_Abc -trange 1m -breakdown tier
# list site statistics for last month with breakdown for all datasets
sitestat -site T2_XX_Abc -trange 1m -breakdown dataset
# list site statistics for last month with breakdown for all data-tiers and look for NUSERS metric
sitestat -site T2_XX_Abc -trange 1m -metric NUSERS -breakdown tier
# by default sitestat relies on PhEDEx data-service to collect
# dataset information on site, but we may use DBS instead
sitestat -site T2_XX_Abc -trange 1m -dbsinfo
# return information in json data format
sitestat -site T2_XX_Abc -trange 1m -format json
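The JSON output makes it possible to drive sitestat from a script. The Python sketch below builds a command line from the flags listed in the usage text and decodes the result. The helper names are hypothetical, and the sketch assumes that the sitestat executable is on the PATH and that it prints a single JSON document to standard output when -format json is given. It makes no assumption about the structure of that document.

```python
import json
import subprocess


def sitestat_cmd(site, trange="1d", metric=None, breakdown=None, tier=None):
    """Build an argument list using only flags shown in the usage text."""
    cmd = ["sitestat", "-site", site, "-trange", trange, "-format", "json"]
    for flag, value in (("-metric", metric),
                        ("-breakdown", breakdown),
                        ("-tier", tier)):
        if value:
            cmd += [flag, value]
    return cmd


def run_sitestat(site, **options):
    """Run sitestat and decode its JSON output (structure not assumed here)."""
    result = subprocess.run(sitestat_cmd(site, **options),
                            capture_output=True, text=True, check=True)
    return json.loads(result.stdout)
```

A call such as `run_sitestat("T2_XX_Abc", trange="1m", breakdown="tier")` would then correspond to the tier breakdown example above with JSON output enabled.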
The tools directory contains scripts for working with PhedexReplicaMonitoring, which makes it possible to obtain weighted dataset sizes on sites from the PhEDEx DB by running the pbr script from the PhedexReplicaMonitoring repository.
- The pbr_avg.sh script can be used to submit a Spark job to calculate the average size of datasets.
- The pbr_db.py script can be used to convert the HDFS output from pbr_avg.sh into an SQLite DB. The latter can be used by the sitestat tool.
- plot.R is an R script that produces a plot of size vs. bins (number of accesses).