You will find a lot of comments explaining the reasoning behind the implementation.
With Maven installed, compile, run, and test the project:
mvn compile
mvn exec:java -Dexec.mainClass=searcher.Searcher -Dexec.args="filesystem"
mvn test -Dtest=searcher.TestSuite
We open the file tree rooted at the specified directory or path:
.cli-searcher
+-- filesystem
| +-- a.txt
| +-- b.json
| +-- subdirectory
| +-- c.txt
Run the CLI on /filesystem:
java -jar Rank.jar Ranker /filesystem
We obtain a list of files:
files = [a.txt, b.json, c.txt]
We then run a query with a text expression and compute how similar each file is to it:
Ranker> some text to find and compare among files
a.txt:100%
b.json:90%
c.txt:0%
Ranker> new text to find and compare among files
a.txt:80%
b.json:70%
c.txt:65%
Ranker> :quit
- Read the filesystem and collect all documents
- Read the user's input, the search text (text_search) that similarity is measured against
- Vectorize each document by applying a word embedding
- Apply a similarity algorithm between the search text and each document
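
The four steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the class name RankerSketch and all of its methods are hypothetical, plain term-frequency counts stand in for the word embedding, and cosine similarity stands in for the unspecified similarity algorithm. Files are assumed to be UTF-8 text.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch; names and algorithms are assumptions, not the project's API.
class RankerSketch {

    // Step 1: recursively collect every regular file under root, sorted by path.
    static List<Path> listFiles(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
    }

    // Step 3 (assumed): term-frequency vector, lowercase word -> occurrence count.
    static Map<String, Integer> vectorize(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Euclidean length of a term-frequency vector.
    static double norm(Map<String, Integer> v) {
        double sum = 0;
        for (int c : v.values()) {
            sum += (double) c * c;
        }
        return Math.sqrt(sum);
    }

    // Step 4 (assumed): cosine similarity in [0, 1]; 0 when either vector is empty.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0);
        }
        double na = norm(a);
        double nb = norm(b);
        return (na == 0 || nb == 0) ? 0.0 : dot / (na * nb);
    }

    // Steps 1 to 4 together: score each file against the query, one "name:NN%" line per file.
    static List<String> rank(Path root, String query) throws IOException {
        Map<String, Integer> queryVector = vectorize(query);
        List<String> lines = new ArrayList<>();
        for (Path file : listFiles(root)) {
            String content = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            long percent = Math.round(100 * cosine(queryVector, vectorize(content)));
            lines.add(file.getFileName() + ":" + percent + "%");
        }
        return lines;
    }
}
```

Step 2, reading the query, would wrap rank in a loop that reads lines from standard input until the user types :quit. Term counts only match exact words, so a real embedding would be needed to score related but differently worded text.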