cDep is a tool for discovering configuration dependencies both within and across software components. It analyzes the Java bytecode of the target programs (supporting code written in both Java and Scala) and outputs specific types of dependencies together with the corresponding interdependent configuration parameters.
cDep currently supports the following components: HDFS, MapReduce, YARN, Hadoop Common, Hadoop Tools, HBase, Alluxio, ZooKeeper, and Spark (see the `-a` option below).
This repository contains all the artifacts (including all the code and datasets) of the paper:
- Understanding and Discovering Software Configuration Dependencies in Cloud and Datacenter Systems
  Qingrong Chen, Teng Wang, Owolabi Legunsen, Shanshan Li, and Tianyin Xu
  In Proceedings of the ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2020), November 8-13, 2020, Sacramento, CA.
We provide a Docker container image with a pre-built cDep that you can interact with directly. The image is hosted on Docker Hub and will be downloaded automatically when you run the commands below.
To run the Docker image, there is one CLI option:

- `-a <arg>`: where `<arg>` is a comma-separated list of elements in `hdfs`, `mapreduce`, `yarn`, `hadoop_common`, `hadoop_tools`, `hbase`, `alluxio`, `zookeeper`, and `spark`.
An example running command is as follows:
$ git clone https://github.com/xlab-uiuc/cdep-fse.git
$ cd cdep-fse
$ ./dockerrun.sh -a hdfs,mapreduce
The results will be stored at `/tmp/output/cDep_result.csv`.
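To take a quick look at the results once the run finishes (plain shell, nothing cDep-specific):

$ head /tmp/output/cDep_result.csv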
The analysis can take tens of minutes, so be patient.
We also provide the Dockerfile, with which you can build the Docker image locally and run the program.
To build the Docker image:
$ git clone https://github.com/xlab-uiuc/cdep-fse.git
$ cd cdep-fse
$ docker build -t cdep/cdep:1.0 .
Then the running command is the same as above. An example running command is:
$ ./dockerrun.sh -a hdfs,mapreduce
We built cDep using Java(TM) SE Runtime Environment (build 12.0.2+10) and Apache Maven 3.6.1; we have not tested other Java versions.
First, clone the repository:
$ git clone https://github.com/xlab-uiuc/cdep-fse.git
$ cd cdep-fse
Second, build cDep (we use Maven as the build tool for cDep):
$ mvn compile
After compiling, `cDep.class` should be generated at `target/classes/cdep/cDep.class`.
Third, use the script `run.sh`. One example running command is as follows:
$ ./run.sh -a hdfs,mapreduce
All the results in the paper, including both the study dataset and the cDep results, can be reproduced.
The cDep results can be reproduced by running cDep on all supported components, which could take several hours:
$ ./dockerrun.sh -a hdfs,mapreduce,yarn,hadoop_common,hadoop_tools,hbase,alluxio,zookeeper,spark
`cDep_result.csv` is in the format of:

["parameter A","parameter B","dependency type","java class","java method","jimple stmt"]

Each row means that `parameter A` and `parameter B` have a dependency of the given `dependency type`, and that the dependency relation is identified in the `jimple stmt` of a certain `java method` in a certain `java class`.
The following shows an example of a dependency cDep extracts from MapReduce:
(
'mapreduce.output.fileoutputformat.compress',
'mapreduce.output.fileoutputformat.compress.type',
'control dependency',
'org.apache.hadoop.mapred.MapFileOutputFormat',
'<org.apache.hadoop.mapred.MapFileOutputFormat:org.apache.hadoop.mapred.RecordWriter getRecordWriter(org.apache.hadoop.fs.FileSystem,org.apache.hadoop.mapred.JobConf,java.lang.String,org.apache.hadoop.util.Progressable)>',
'if $z0 == 0 goto $r7 = new org.apache.hadoop.io.MapFile$Writer'
)
The two parameters, `mapreduce.output.fileoutputformat.compress` and `mapreduce.output.fileoutputformat.compress.type`, have a control dependency, and that relation is found in the class `org.apache.hadoop.mapred.MapFileOutputFormat`.
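If you want to consume the CSV programmatically, the following is a minimal Java sketch (our illustration, not part of cDep). It assumes each field is double-quoted with no escaped quotes inside, as in the format above; a full CSV parser is safer in general. The class name `PrintDeps` is ours:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class PrintDeps {
  public static void main(String[] args) throws IOException {
    // Path where dockerrun.sh stores the results.
    Files.lines(Paths.get("/tmp/output/cDep_result.csv")).forEach(line -> {
      // Strip the surrounding quotes/brackets, then split on the "," between quoted fields.
      String[] f = line.replaceAll("^\\[?\"|\"\\]?$", "").split("\",\"");
      if (f.length == 6) {
        System.out.printf("%s <-> %s [%s] in %s%n", f[0], f[1], f[2], f[3]);
      }
    });
  }
}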
We also release all the datasets included in the paper under the `dataset` directory.
It contains the following five files:

- `hadoop_intra.csv`: Intra-component dependencies in each individual component of the Hadoop-based stack;
- `hadoop_inter.csv`: Inter-component dependencies across components of the Hadoop-based stack;
- `openstack_intra.csv`: Intra-component dependencies in each individual component of OpenStack;
- `openstack_inter.csv`: Inter-component dependencies across components of OpenStack;
- `one_off_dep.csv`: One-off dependencies described in Section 4.3 of the paper.
All the data sheets are in CSV format, with the first row describing the meaning of each column.
They provide detailed labels for the analysis results presented in our study.
The dependency cases found by cDep are available under the `cDep_result` directory.
It contains the following two files:

- `intra.csv`: Intra-component dependencies in each individual component of the Hadoop-based stack;
- `inter.csv`: Inter-component dependencies across components of the Hadoop-based stack.
These data sheets are also in CSV format, with the first row describing the meaning of each column.
The following diagram shows the end-to-end workflow of cDep:
The source code of cDep is placed under the `src/main/java` directory.
It contains the following main modules:

- `configinterface`: implements the configuration interface methods that read configuration values in the different projects (see the sketch after this list);
- `dataflow`: implements inter-procedural and intra-procedural taint tracking;
- `handlingdep`: implements the methods that capture the different types of configuration dependencies;
- `utility`: implements utility methods.
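To make this concrete, here is a minimal sketch (our illustration, not cDep's actual code) of how a Soot BodyTransformer could flag Jimple statements that call Hadoop's `Configuration` getters; such configuration reads are the taint sources that a module like `dataflow` starts from. The class name `ConfigReadFinder` is ours:

import java.util.Map;
import soot.Body;
import soot.BodyTransformer;
import soot.SootMethod;
import soot.Unit;
import soot.jimple.InvokeExpr;
import soot.jimple.Stmt;

// Illustration only: report every Jimple statement that invokes a
// Configuration getter, i.e., a configuration read (a taint source).
public class ConfigReadFinder extends BodyTransformer {
  @Override
  protected void internalTransform(Body body, String phase, Map<String, String> options) {
    for (Unit unit : body.getUnits()) {
      Stmt stmt = (Stmt) unit;
      if (!stmt.containsInvokeExpr()) {
        continue;
      }
      InvokeExpr invoke = stmt.getInvokeExpr();
      SootMethod callee = invoke.getMethod();
      // Treat Configuration.get/getBoolean/getInt/... as configuration reads.
      if ("org.apache.hadoop.conf.Configuration".equals(callee.getDeclaringClass().getName())
          && callee.getName().startsWith("get")) {
        System.out.println("Config read in " + body.getMethod().getSignature() + ": " + stmt);
      }
    }
  }
}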
Below we show some configuration dependency cases (found by cDep) and explain why the parameters depend on each other.
If the first parameter is set to `true`, the second parameter takes effect.

- `fs.client.resolve.topology.enabled`
- `net.topology.node.switch.mapping.impl`

Code snippet:
private void initTopologyResolution(Configuration config) {
  // Reads fs.client.resolve.topology.enabled.
  topologyResolutionEnabled = config.getBoolean(
      FS_CLIENT_TOPOLOGY_RESOLUTION_ENABLED,
      FS_CLIENT_TOPOLOGY_RESOLUTION_ENABLED_DEFAULT);
  if (!topologyResolutionEnabled) {
    return;  // net.topology.node.switch.mapping.impl is never read.
  }
  // The mapping implementation is read only when the flag above is true.
  DNSToSwitchMapping dnsToSwitchMapping = ReflectionUtils.newInstance(
      config.getClass(
          CommonConfigurationKeys.NET_TOPOLOGY_NODE_SWITCH_MAPPING_IMPL_KEY,
          ScriptBasedMapping.class, DNSToSwitchMapping.class),
      config);
}
If the first parameter is not `null`, then the second parameter has to be `kerberos` to enable authentication.

- `hbase.thrift.security.qop`
- `hadoop.security.authentication`

Code snippet:
if (qop != null) {
  ...
  if (!securityEnabled) {
    throw new IOException("Thrift server must run in secure mode to support authentication");
  }
}
(`qop` stores the value of the first parameter, while `securityEnabled` takes its value from the second parameter.)
The value of the second parameter overwrites the first parameter.

- `hadoop.security.service.user.name.key`
- `mapreduce.jobhistory.principal`

Code snippet:
private Configuration addSecurityConfiguration(Configuration conf) {
  ...
  // The value of mapreduce.jobhistory.principal overwrites
  // hadoop.security.service.user.name.key.
  conf.set(CommonConfigurationKeys.HADOOP_SECURITY_SERVICE_USER_NAME_KEY,
      conf.get(JHAdminConfig.MR_HISTORY_PRINCIPAL, ""));
  return conf;
}
If the value of the first parameter is not set, the second parameter serves as its default value.

- `yarn.resourcemanager.fail-fast`
- `yarn.fail-fast`

Code snippet:
public static boolean shouldRMFailFast(Configuration conf) {
  // Falls back to yarn.fail-fast when yarn.resourcemanager.fail-fast is unset.
  return conf.getBoolean(YarnConfiguration.RM_FAIL_FAST,
      conf.getBoolean(YarnConfiguration.YARN_FAIL_FAST,
          YarnConfiguration.DEFAULT_YARN_FAIL_FAST));
}
The first and second parameters work together to determine the FTP server address (host and port).

- `fs.ftp.host`
- `fs.ftp.host.port`

Code snippet:
private FTPClient connect() throws IOException {
  FTPClient client = null;
  Configuration conf = getConf();
  String host = conf.get(FS_FTP_HOST);
  int port = conf.getInt(FS_FTP_HOST_PORT, FTP.DEFAULT_PORT);
  ...  // (the elided code instantiates client, among other things)
  client.connect(host, port);
}