tonellotto/HashToMin

HashToMin

A Hadoop MapReduce implementation of the HashToMin algorithm for finding the connected components of a graph. The input file describes the graph either as a list of edges or as adjacency lists, one per node: each line represents either an edge or an already formed cluster within the graph G. Vertex identifiers on a line must be separated by a space or a tab. The output file contains one connected component per line: the node that serves as the component's label, followed by a tab and all the nodes of the cluster separated by spaces. Sample input files can be found in the inputfiles folder.
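To make the algorithm and the file formats concrete, here is a minimal single-process sketch of Hash-to-Min in plain Java. It is not the code in this repository and does not use Hadoop: the class name HashToMinSketch and its methods are hypothetical, vertex identifiers are assumed to be integers, every line is assumed to list mutually connected vertices, and the output is assumed to repeat the label among the component's nodes. Each round, every vertex sends its whole cluster to the cluster's minimum vertex and sends that minimum to every member of the cluster; rounds repeat until nothing changes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.stream.Collectors;

public class HashToMinSketch {

    // Build the initial clusters: every vertex on a line is added to the
    // cluster of every vertex on that line (assumption: integer ids).
    static Map<Long, TreeSet<Long>> parse(List<String> lines) {
        Map<Long, TreeSet<Long>> clusters = new TreeMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) continue;
            TreeSet<Long> ids = new TreeSet<>();
            for (String token : trimmed.split("[ \\t]+")) {
                ids.add(Long.parseLong(token));
            }
            for (Long v : ids) {
                clusters.computeIfAbsent(v, k -> new TreeSet<>()).addAll(ids);
            }
        }
        return clusters;
    }

    // One Hash-to-Min round: each vertex sends its cluster to the cluster's
    // minimum, and sends the minimum to every member of the cluster.
    static Map<Long, TreeSet<Long>> round(Map<Long, TreeSet<Long>> clusters) {
        Map<Long, TreeSet<Long>> next = new TreeMap<>();
        for (TreeSet<Long> cluster : clusters.values()) {
            Long min = cluster.first();
            next.computeIfAbsent(min, k -> new TreeSet<>()).addAll(cluster);
            for (Long u : cluster) {
                next.computeIfAbsent(u, k -> new TreeSet<>()).add(min);
            }
        }
        return next;
    }

    // Iterate to a fixed point, then emit "label<TAB>node node ..." lines.
    static List<String> connectedComponents(List<String> lines) {
        Map<Long, TreeSet<Long>> clusters = parse(lines);
        while (true) {
            Map<Long, TreeSet<Long>> next = round(clusters);
            if (next.equals(clusters)) break;
            clusters = next;
        }
        List<String> output = new ArrayList<>();
        for (Map.Entry<Long, TreeSet<Long>> e : clusters.entrySet()) {
            // At convergence only the minimum vertex holds the full component.
            if (e.getKey().equals(e.getValue().first())) {
                String nodes = e.getValue().stream()
                        .map(String::valueOf)
                        .collect(Collectors.joining(" "));
                output.add(e.getKey() + "\t" + nodes);
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> input = List.of("1 2", "2\t3", "4 5", "6 7 8", "8 4");
        connectedComponents(input).forEach(System.out::println);
    }
}
```

On the five sample lines in main, the sketch reports two components, labeled 1 and 4. In the MapReduce setting each round would correspond to one job, with the messages grouped by destination vertex in the shuffle phase.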

Usage is simple: instantiate the class

public ConnectedComponents (String input,
                            String output, 
                            int reduceTasksNumber,
                            boolean verifyResult,
                            boolean secondarySort) 

where:

  • input and output specify the input and output file paths,
  • reduceTasksNumber specifies the number of reducers used by every job except the Export procedure (which must output a single file),
  • verifyResult, when set to true, runs the CountNodes and Verifier jobs,
  • secondarySort selects which version of the algorithm to use: HashToMinSecondarySort runs when it is true.

Then call the run() method on the new object.

Alternatively, run the jar on an input file by issuing the command

hadoop jar ./target/HashToMin-1.0.jar <input> <output> <numberOfReducers>

from the project folder.
