
Configuration dependency analysis for cloud software


cDep: Configuration Dependency Analysis

cDep is a tool for discovering configuration dependencies both within and across software components. It analyzes the Java bytecode of the target programs, so it supports both Java and Scala code, and outputs the specific types of dependencies together with the corresponding interdependent configuration parameters.

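cDep currently supports the following components (the same names accepted by the -a option): hdfs, mapreduce, yarn, hadoop_common, hadoop_tools, hbase, alluxio, zookeeper, spark.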

The repository contains all the artifacts (including all the code and datasets) of the paper

1. Building and Running cDep

1.1 Docker Container Image

We provide a Docker container image with a pre-built cDep, which you can run directly.

The cDep Docker image is hosted on Docker Hub and is downloaded automatically when you run the command below.

Running the Docker image takes one CLI option:

  • -a <arg>: <arg> is a comma-separated list of the components to analyze, chosen from hdfs, mapreduce, yarn, hadoop_common, hadoop_tools, hbase, alluxio, zookeeper, spark
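The -a value is a plain comma-separated list. As a rough illustration only, the hypothetical AppListParser class below (not part of cDep; its name and validation behavior are assumptions) splits such a value and checks each entry against the supported components listed above:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper (not cDep's code): validates an -a value such as "hdfs,mapreduce".
final class AppListParser {
    // Component names accepted by -a, copied from this README.
    static final Set<String> SUPPORTED = Collections.unmodifiableSet(new LinkedHashSet<>(Arrays.asList(
            "hdfs", "mapreduce", "yarn", "hadoop_common", "hadoop_tools",
            "hbase", "alluxio", "zookeeper", "spark")));

    // Splits the value on commas, trims each name, and rejects unknown components.
    static List<String> parse(String arg) {
        List<String> apps = new ArrayList<>();
        for (String raw : arg.split(",")) {
            String app = raw.trim();
            if (app.isEmpty()) {
                continue; // ignore empty entries such as the one in "hdfs,,yarn"
            }
            if (!SUPPORTED.contains(app)) {
                throw new IllegalArgumentException("unsupported component: " + app);
            }
            apps.add(app);
        }
        return apps;
    }

    public static void main(String[] args) {
        System.out.println(parse("hdfs,mapreduce")); // prints [hdfs, mapreduce]
    }
}
```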

An example command is as follows:

$ git clone
$ cd cdep-fse
$ ./ -a hdfs,mapreduce

The results will be stored at /tmp/output/cDep_result.csv.

The analysis may take several tens of minutes, so please be patient.

1.2 Build Docker Image Locally

We also provide the Dockerfile, so you can build the Docker image locally and run cDep with it.

To build the Docker image:

$ git clone
$ cd cdep-fse
$ docker build -t cdep/cdep:1.0 .

The run command is then the same as above, for example:

$ ./ -a hdfs,mapreduce

1.3 Building cDep in Your Own Environment

We built cDep with Java(TM) SE Runtime Environment (build 12.0.2+10) and Apache Maven 3.6.1; we have not tested it on other Java versions.

First, clone the repository:

$ git clone
$ cd cdep-fse

Second, build cDep with Maven, which is the build tool cDep uses:

$ mvn compile

After compiling, cDep.class should be generated at target/classes/cdep/cDep.class.

Third, run cDep using the provided script. An example command is as follows:

$ ./ -a hdfs,mapreduce

2. Reproducibility

All the results in the paper, including both the study dataset and the cDep results, can be reproduced.

The cDep results can be reproduced by running cDep on all supported components, which may take several hours:

$ ./ -a hdfs,mapreduce,yarn,hadoop_common,hadoop_tools,hbase,alluxio,zookeeper,spark

Each row of cDep_result.csv has the format: ["parameter A","parameter B","dependency type","java class","java method","jimple stmt"]

Each row means that parameter A and parameter B have a dependency of the given type, and that cDep identified this dependency at the given Jimple statement in the given Java method and class.

The following shows an example of a dependency cDep extracts from MapReduce:

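  ['mapreduce.output.fileoutputformat.compress',
  'mapreduce.output.fileoutputformat.compress.type',
  'control dependency',
  'org.apache.hadoop.mapred.MapFileOutputFormat',
  '<org.apache.hadoop.mapred.MapFileOutputFormat:org.apache.hadoop.mapred.RecordWriter getRecordWriter(org.apache.hadoop.fs.FileSystem,org.apache.hadoop.mapred.JobConf,java.lang.String,org.apache.hadoop.util.Progressable)>',
  'if $z0 == 0 goto $r7 = new$Writer']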

The two parameters, mapreduce.output.fileoutputformat.compress and mapreduce.output.fileoutputformat.compress.type, have a control dependency, which cDep finds in class org.apache.hadoop.mapred.MapFileOutputFormat.
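Because the java method field contains commas (for example, the parameter list of getRecordWriter), a row cannot simply be split on every comma. The sketch below reads one row into the six columns described above. It assumes cDep_result.csv uses standard RFC 4180 quoting with double quotes; the class CdepResultRow is hypothetical and not part of cDep.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reader for one row of cDep_result.csv. Assumes RFC 4180-style
// quoting (fields in double quotes, "" for a literal quote), which matters
// because method signatures contain commas.
final class CdepResultRow {
    final String paramA;
    final String paramB;
    final String dependencyType;
    final String javaClass;
    final String javaMethod;
    final String jimpleStmt;

    private CdepResultRow(List<String> f) {
        paramA = f.get(0);
        paramB = f.get(1);
        dependencyType = f.get(2);
        javaClass = f.get(3);
        javaMethod = f.get(4);
        jimpleStmt = f.get(5);
    }

    // Splits a CSV line on commas that are not inside a quoted field.
    static List<String> splitCsv(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                if (inQuotes && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    cur.append('"'); // "" inside quotes is an escaped quote
                    i++;
                } else {
                    inQuotes = !inQuotes; // opening or closing quote
                }
            } else if (c == ',' && !inQuotes) {
                fields.add(cur.toString()); // field boundary
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        fields.add(cur.toString());
        return fields;
    }

    // Parses a row, expecting exactly the six columns of cDep_result.csv.
    static CdepResultRow parse(String line) {
        List<String> f = splitCsv(line);
        if (f.size() != 6) {
            throw new IllegalArgumentException("expected 6 fields, got " + f.size());
        }
        return new CdepResultRow(f);
    }

    public static void main(String[] args) {
        String line = "\"mapreduce.output.fileoutputformat.compress\","
                + "\"mapreduce.output.fileoutputformat.compress.type\","
                + "\"control dependency\",\"org.apache.hadoop.mapred.MapFileOutputFormat\","
                + "\"<org.apache.hadoop.mapred.MapFileOutputFormat:org.apache.hadoop.mapred.RecordWriter getRecordWriter("
                + "org.apache.hadoop.fs.FileSystem,org.apache.hadoop.mapred.JobConf,"
                + "java.lang.String,org.apache.hadoop.util.Progressable)>\","
                + "\"if $z0 == 0 goto $r7 = new$Writer\"";
        CdepResultRow row = parse(line);
        System.out.println(row.paramA + " -> " + row.paramB + " (" + row.dependencyType + ")");
    }
}
```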

3. Datasets

We also release all the datasets used in the paper under the dataset directory.

3.1 Configuration Dependency Dataset

It contains the following five files:

  • hadoop_intra.csv : Intra-component dependencies in each individual component of the Hadoop-based stack;
  • hadoop_inter.csv : Inter-component dependencies across components of the Hadoop-based stack;
  • openstack_intra.csv : Intra-component dependencies in each individual component of OpenStack;
  • openstack_inter.csv : Inter-component dependencies across components of OpenStack;
  • one_off_dep.csv : One-off dependencies described in Section 4.3.

All the data sheets are CSV files, with the first row describing the meaning of each column.

The data sheets provide detailed labels of the analysis results presented in our study.

3.2 cDep Findings

The dependency cases found by cDep are in cDep_result, which contains the following two files:

  • intra.csv : Intra-component dependencies in each individual component of the Hadoop-based stack;
  • inter.csv : Inter-component dependencies across components of the Hadoop-based stack.

All the data sheets are CSV files, with the first row describing the meaning of each column.

4. Code Structure

The following graph shows the end-to-end workflow of cDep:

The source code of cDep is under the src/main/java directory.

It contains the following main modules:

  • configinterface implements the configuration interface methods to read configuration values in different projects;
  • dataflow implements interprocedural and intraprocedural taint tracking;
  • handlingdep implements the methods to capture different types of configuration dependencies;
  • utility implements utility methods.
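To give a feel for what taint tracking means here, the following toy sketch propagates "taint" (the set of configuration parameters a value derives from) through straight-line assignments. The mini IR, the class TaintSketch, and its methods are all hypothetical; cDep's dataflow module operates on Jimple statements and also covers the interprocedural case, which this sketch omits. The key strings in main are assumed to be the values behind FS_FTP_HOST and FS_FTP_HOST_PORT from Section 5.5.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy illustration of intraprocedural taint tracking over a hypothetical
// straight-line IR. It is not Soot/Jimple and not cDep's dataflow module.
final class TaintSketch {
    // A statement "lhs = read(param)" or "lhs = f(operands...)".
    static final class Stmt {
        final String lhs;
        final String configParam;    // non-null only for configuration reads
        final List<String> operands; // variables used on the right-hand side

        private Stmt(String lhs, String configParam, List<String> operands) {
            this.lhs = lhs;
            this.configParam = configParam;
            this.operands = operands;
        }

        static Stmt read(String lhs, String param) {
            return new Stmt(lhs, param, Collections.emptyList());
        }

        static Stmt assign(String lhs, String... operands) {
            return new Stmt(lhs, null, Arrays.asList(operands));
        }
    }

    // One forward pass: a variable's taint is the set of parameters its value derives from.
    static Map<String, Set<String>> propagate(List<Stmt> body) {
        Map<String, Set<String>> taint = new HashMap<>();
        for (Stmt s : body) {
            Set<String> t = new TreeSet<>();
            if (s.configParam != null) {
                t.add(s.configParam); // source: value read from a configuration parameter
            }
            for (String op : s.operands) {
                t.addAll(taint.getOrDefault(op, Collections.emptySet())); // propagate through uses
            }
            taint.put(s.lhs, t); // strong update: the assignment replaces the old taint
        }
        return taint;
    }

    public static void main(String[] args) {
        List<Stmt> body = Arrays.asList(
                Stmt.read("host", "fs.ftp.host"),
                Stmt.read("port", "fs.ftp.host.port"),
                Stmt.assign("addr", "host", "port"));
        // addr carries both parameters, which hints at a behavioral dependency
        System.out.println(propagate(body).get("addr")); // prints [fs.ftp.host, fs.ftp.host.port]
    }
}
```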

5. Verification and Validation

We show some configuration dependency cases found by cDep and explain why the parameters in each case depend on each other.

5.1 Control Dependency

The second parameter takes effect only if the first parameter is true.

  1. fs.client.resolve.topology.enabled
  2. net.topology.node.switch.mapping.impl

Code snippets:

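private void initTopologyResolution(Configuration config) {
    // Reads the first parameter, fs.client.resolve.topology.enabled
    topologyResolutionEnabled = config.getBoolean(...);
    if (!topologyResolutionEnabled) {
        return; // the second parameter is never read
    }
    // Reads the second parameter, net.topology.node.switch.mapping.impl
    DNSToSwitchMapping dnsToSwitchMapping = ReflectionUtils.newInstance(
        config.getClass(
            CommonConfigurationKeys.NET_TOPOLOGY_NODE_SWITCH_MAPPING_IMPL_KEY,
            ScriptBasedMapping.class, DNSToSwitchMapping.class),
        config);
    ...
}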
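The same control dependency can be reproduced in a self-contained form. This is a hypothetical analogue, not Hadoop code: java.util.Properties stands in for Hadoop's Configuration, the "false" default for the first parameter is an assumption, and com.example.MyMapping is a made-up class name.

```java
import java.util.Properties;

// Hypothetical analogue of the Section 5.1 snippet; java.util.Properties
// replaces Hadoop's Configuration. The "false" default is an assumption.
final class ControlDepSketch {
    // Returns the switch-mapping class that would be instantiated, or null when
    // topology resolution is disabled and the second parameter is never read.
    static String mappingImpl(Properties conf) {
        boolean enabled = Boolean.parseBoolean(
                conf.getProperty("fs.client.resolve.topology.enabled", "false"));
        if (!enabled) {
            return null; // control dependency: the second parameter has no effect here
        }
        return conf.getProperty("net.topology.node.switch.mapping.impl",
                "org.apache.hadoop.net.ScriptBasedMapping");
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("net.topology.node.switch.mapping.impl", "com.example.MyMapping");
        System.out.println(mappingImpl(conf)); // prints null: the setting is silently ignored
        conf.setProperty("fs.client.resolve.topology.enabled", "true");
        System.out.println(mappingImpl(conf)); // prints com.example.MyMapping
    }
}
```

The first call shows why this kind of dependency matters in practice: a user who sets only the second parameter sees no error, yet the setting is ignored.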

5.2 Value Relationship Dependency

If the first parameter is set (not null), the second parameter must be kerberos; otherwise an IOException is thrown, because authentication requires secure mode.


Code snippets:

if (qop != null) {
    if (!securityEnabled) {
        throw new IOException("Thrift server must run in secure mode to support authentication");
    }
    ...
}

(qop stores the value of the first parameter, while securityEnabled is derived from the second parameter.)
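A self-contained analogue of this check, assuming java.util.Properties in place of the real configuration class. The key names FIRST and SECOND are hypothetical placeholders rather than the actual parameter names, and the exact, case-sensitive comparison with "kerberos" is an assumption of this sketch.

```java
import java.util.Properties;

// Hypothetical analogue of Section 5.2. FIRST and SECOND are placeholder keys,
// not the real parameter names.
final class ValueRelationSketch {
    static final String FIRST = "example.first.qop";              // hypothetical key
    static final String SECOND = "example.second.authentication"; // hypothetical key

    // Rejects configurations where the first parameter is set but the second is not "kerberos".
    static void validate(Properties conf) {
        String qop = conf.getProperty(FIRST);
        boolean securityEnabled = "kerberos".equals(conf.getProperty(SECOND));
        if (qop != null && !securityEnabled) {
            throw new IllegalStateException(
                    "Thrift server must run in secure mode to support authentication");
        }
    }
}
```

Unlike the control dependency above, a violated value relationship fails loudly, so the constraint is on which value combinations are legal rather than on whether a parameter is read.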

5.3 Overwrite Dependency

The second parameter overwrites the first parameter.

  2. mapreduce.jobhistory.principal

Code snippets:

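private Configuration addSecurityConfiguration(Configuration conf) {
    // The first parameter is set to the value of the second one,
    // overwriting any value the user configured for it
    conf.set(...,
             conf.get(JHAdminConfig.MR_HISTORY_PRINCIPAL, ""));
    return conf;
}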
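A hypothetical analogue with java.util.Properties. Since the first parameter's key is not given here, the sketch takes it as an argument (firstKey); it also copies the configuration instead of mutating it, which is a choice of this sketch rather than a claim about Hadoop's behavior.

```java
import java.util.Properties;

// Hypothetical analogue of Section 5.3 with java.util.Properties. The first
// parameter's key is passed in as firstKey because it is not named here.
final class OverwriteSketch {
    static final String JHS_PRINCIPAL = "mapreduce.jobhistory.principal";

    // Returns a copy of conf in which firstKey holds the value of the second parameter.
    static Properties addSecurityConfiguration(Properties conf, String firstKey) {
        Properties copy = new Properties();
        copy.putAll(conf); // keep the caller's configuration unchanged
        // Whatever the user set for firstKey is overwritten by the second parameter
        copy.setProperty(firstKey, copy.getProperty(JHS_PRINCIPAL, ""));
        return copy;
    }
}
```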

5.4 Default Value Dependency

If the first parameter is not set, the value of the second parameter serves as its default value.


Code snippets:

public static boolean shouldRMFailFast(Configuration conf) {
    // If RM_FAIL_FAST is unset, its default is taken from the second parameter
    return conf.getBoolean(YarnConfiguration.RM_FAIL_FAST, ...);
}
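The fallback chain behind a default value dependency can be sketched with java.util.Properties. The method name getBooleanWithFallback and the third, built-in default at the end of the chain are assumptions for illustration; the key names are supplied by the caller.

```java
import java.util.Properties;

// Hypothetical analogue of Section 5.4: the first parameter falls back to the
// second, which in turn falls back to a built-in default.
final class DefaultValueSketch {
    static boolean getBooleanWithFallback(Properties conf, String firstKey,
                                          String secondKey, boolean builtInDefault) {
        // The second parameter's value (or the built-in default) becomes the first one's default
        String fallback = conf.getProperty(secondKey, Boolean.toString(builtInDefault));
        return Boolean.parseBoolean(conf.getProperty(firstKey, fallback));
    }
}
```

With only the second parameter set to false, the result is false even though the first parameter was never configured, which is exactly the implicit coupling this dependency type describes.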

5.5 Behavioral Dependency

The first and second parameters work together to determine the address (host and port) of the FTP server to connect to.


Code snippets:

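private FTPClient connect() throws IOException {
    FTPClient client = null;
    Configuration conf = getConf();
    String host = conf.get(FS_FTP_HOST);                         // first parameter
    int port = conf.getInt(FS_FTP_HOST_PORT, FTP.DEFAULT_PORT);  // second parameter
    client = new FTPClient();  // must be created before connect() is called
    client.connect(host, port);  // host and port are used together
    ...
}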
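A self-contained sketch of why the two values are only meaningful as a pair. It assumes the keys fs.ftp.host and fs.ftp.host.port (the strings behind FS_FTP_HOST and FS_FTP_HOST_PORT) and uses 21, the standard FTP control port, as the default; the class BehavioralSketch is hypothetical. InetSocketAddress.createUnresolved is used so the example performs no DNS lookup.

```java
import java.net.InetSocketAddress;
import java.util.Properties;

// Hypothetical analogue of Section 5.5: host and port together form one address.
// The key strings are assumed to be the values of FS_FTP_HOST and FS_FTP_HOST_PORT.
final class BehavioralSketch {
    static final int DEFAULT_FTP_PORT = 21; // standard FTP control port

    // Combines the two parameters into one (unresolved) server address.
    static InetSocketAddress ftpAddress(Properties conf) {
        String host = conf.getProperty("fs.ftp.host");
        if (host == null) {
            throw new IllegalArgumentException("fs.ftp.host is not set");
        }
        String port = conf.getProperty("fs.ftp.host.port");
        int p = port == null ? DEFAULT_FTP_PORT : Integer.parseInt(port.trim());
        return InetSocketAddress.createUnresolved(host, p);
    }
}
```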

