-add: version 1.0 of the README

1 parent e47c623 commit 54688f33330b87e8bbad64c9d865e3b26b8580bc @yas4891 committed Oct 14, 2012
Showing with 47 additions and 1 deletion.
@@ -1,4 +1,50 @@
-MUTEX helps you prevent software plagiarism among students
+MUTEX helps you prevent software plagiarism among students.
+It was designed for real-time comparison of source code submissions handed in by university students.
+It is used at the University of the German Armed Forces (
+How it works
+Given a new piece of source code (named 'NEW_SRC'), MUTEX will load the set of previously handed in source codes ('OLD_SET').
+It will then parse NEW_SRC and each element of OLD_SET into separate token sequences.
+Comparing NEW_SRC against each element of OLD_SET, it will calculate the similarity between the two token sequences. After that it will determine the
+maximum similarity between NEW_SRC and *any* element of OLD_SET. If this maximum similarity is above a pre-defined threshold (default: 50 percent), NEW_SRC will
+be considered a rip-off. MUTEX will store NEW_SRC in the database for further use and print the maximum similarity, along with an identifier for the corresponding element of OLD_SET, to stdout.
+1. Load solution
+2. Build solution as "Release"
+3. Run "GSTConsole/bin/Release/mutex.exe"
+Greedy-String-Tiling algorithm
+To determine the similarity between two sequences A and B, MUTEX uses an algorithm known as Greedy String Tiling (GST).
+You can read more about that at
+MUTEX implements two versions of the GST algorithm:
+1. The original algorithm, with an average complexity of O(n^2) and a worst-case complexity of O(n^3), is implemented in GSTLibrary/tile/GSTAlgorithm.cs
+2. An optimized version, with an average complexity of O(n), is implemented in GSTLibrary/tile/HashingGSTAlgorithm.cs
+Using the optimized version, HashingGSTAlgorithm, is recommended.
+What are all those folders for?
+If time permits, I will try to clean up this mess.
+Until then, here is a list of the different projects:
+- GreedyStringTiling (top-level directory): a small graphical demo application used during presentations; nothing useful in here
+- ThirdPartyLibs: a collection of the libraries needed to compile/run MUTEX
+- DataRepository: handles storing/retrieving values from the database
+- Tokenizer: contains the language-agnostic elements needed for parsing source code
+- CTokenizer: implements a specialised grammar for the C programming language with the aim to better detect plagiarism
+- GSTLibrary: the "good" stuff - almost everything that relates to the GST algorithm can be found in this project
+- GSTAppLogic: ties all the ends (data storage, tokenizer, GST algorithm) together into a functional piece of software
+- GSTConsole: a basic command line UI for MUTEX
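
The workflow and the basic (non-hashing) Greedy String Tiling idea described in the README above can be sketched roughly as follows. This is an illustrative Python sketch, not the project's C# code: the names `tokenize`, `greedy_string_tiling`, `similarity` and `check_submission` are hypothetical, the regex tokenizer is a crude stand-in for the real Tokenizer/CTokenizer projects, the minimum match length of 3 is an assumed value, and the similarity formula (twice the covered tokens divided by the combined length) is a common choice for GST-based tools that MUTEX may or may not use. Database storage is left out.

```python
import re


def tokenize(source):
    """Hypothetical stand-in tokenizer: words, numbers, single symbols."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", source)


def greedy_string_tiling(a, b, min_match=3):
    """Return tiles (start_a, start_b, length) found by basic GST.

    min_match=3 is an assumed default, not a value taken from MUTEX.
    """
    marked_a = [False] * len(a)
    marked_b = [False] * len(b)
    tiles = []
    while True:
        max_match = min_match
        matches = []
        # Phase 1: collect the longest common runs of unmarked tokens.
        for i in range(len(a)):
            if marked_a[i]:
                continue
            for j in range(len(b)):
                if marked_b[j]:
                    continue
                k = 0
                while (i + k < len(a) and j + k < len(b)
                       and a[i + k] == b[j + k]
                       and not marked_a[i + k] and not marked_b[j + k]):
                    k += 1
                if k == max_match:
                    matches.append((i, j, k))
                elif k > max_match:
                    matches = [(i, j, k)]
                    max_match = k
        # Phase 2: turn non-overlapping maximal matches into tiles.
        for i, j, k in matches:
            if any(marked_a[i:i + k]) or any(marked_b[j:j + k]):
                continue  # occluded by a tile placed earlier in this round
            for offset in range(k):
                marked_a[i + offset] = True
                marked_b[j + offset] = True
            tiles.append((i, j, k))
        if max_match <= min_match:
            break
    return tiles


def similarity(a, b, min_match=3):
    """Similarity in percent; the formula is an assumption (see lead-in)."""
    if not a and not b:
        return 0.0
    covered = sum(length for _, _, length in greedy_string_tiling(a, b, min_match))
    return 100.0 * 2 * covered / (len(a) + len(b))


def check_submission(new_src, old_set, threshold=50.0):
    """Compare new_src with every old source (dict of id -> source text).

    Returns (max_similarity, id_of_best_match, flagged).
    """
    new_tokens = tokenize(new_src)
    best_sim, best_id = 0.0, None
    for source_id, old_src in old_set.items():
        sim = similarity(new_tokens, tokenize(old_src))
        if sim > best_sim:
            best_sim, best_id = sim, source_id
    return best_sim, best_id, best_sim > threshold
```

A caller would pass the new source text and a dictionary of earlier submissions to `check_submission`, then print the returned similarity and identifier. The nested scan in phase 1 is what makes this plain variant expensive; a hashing-based variant, such as the one the README recommends, avoids comparing every pair of positions.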
