Vector_Similarity

  • Python and Java implementations of TS-SS, the similarity measure proposed in "A Hybrid Geometric Approach for Measuring Similarity Level Among Documents and Document Clustering".
  • This README also summarizes that paper.
  • I recommend TS-SS over cosine similarity or Euclidean distance.

The reasons are...

Cosine drawbacks

cosine_drawback
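The figure for this section is not reproduced here. A commonly cited limitation of cosine similarity, and presumably the one illustrated, is that it depends only on the angle between two vectors and ignores their magnitudes. A minimal sketch, assuming NumPy; `cosine_similarity` is an illustrative helper, not a function from this repository:

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


short_doc = [1.0, 2.0]
long_doc = [10.0, 20.0]  # same direction, ten times the magnitude

# The two vectors differ greatly in length, yet cosine treats them as identical:
# the result is 1.0 up to floating-point rounding.
sim = cosine_similarity(short_doc, long_doc)
print(sim)
```

Under this reading, two documents with the same term proportions but very different lengths are indistinguishable to cosine.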

Euclidean drawbacks

euclidean drawback
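The figure for this section is not reproduced here. A limitation usually raised against Euclidean distance, and presumably the one pictured, is that it ignores the angle between vectors: two vectors can sit at the same distance from a reference while pointing in different directions. A small sketch, assuming NumPy; the helper names and sample vectors are illustrative only:

```python
import numpy as np


def euclidean_distance(a, b):
    """Straight-line distance between vectors a and b (hypothetical helper)."""
    return float(np.linalg.norm(np.asarray(a, dtype=float) - np.asarray(b, dtype=float)))


def angle_degrees(a, b):
    """Angle between vectors a and b in degrees (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    # Clip to guard against rounding pushing the cosine just outside [-1, 1].
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))


reference = [1.0, 0.0]
same_direction = [2.0, 0.0]   # longer, but points the same way as the reference
other_direction = [1.0, 1.0]  # rotated away from the reference

# Both candidates are the same Euclidean distance from the reference...
d1 = euclidean_distance(reference, same_direction)
d2 = euclidean_distance(reference, other_direction)

# ...but their angles to the reference differ.
a1 = angle_degrees(reference, same_direction)
a2 = angle_degrees(reference, other_direction)
print(d1, d2, a1, a2)
```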

Triangle's Area Similarity (TS)

TS
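The formula image is not reproduced here. As I understand the paper, TS is the area of the triangle spanned by the two vectors, computed with an angle that is offset by 10 degrees so that parallel vectors still produce a non-zero area:

$$\theta' = \arccos\!\left(\frac{A \cdot B}{\lVert A\rVert\,\lVert B\rVert}\right) + 10^\circ, \qquad TS(A, B) = \frac{\lVert A\rVert\,\lVert B\rVert\,\sin\theta'}{2}$$

A minimal Python sketch of that reading, assuming NumPy. The function names are illustrative and are not taken from the code in this repository:

```python
import math

import numpy as np


def theta_degrees(a, b):
    """Angle between a and b in degrees, plus the 10 degree offset (assumed from the paper)."""
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    # Clip to guard against rounding pushing the cosine just outside [-1, 1].
    return math.degrees(math.acos(float(np.clip(cos, -1.0, 1.0)))) + 10.0


def triangle_similarity(a, b):
    """Area of the triangle formed by a and b using the offset angle."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    theta = math.radians(theta_degrees(a, b))  # math.sin expects radians
    return float(np.linalg.norm(a) * np.linalg.norm(b) * math.sin(theta) / 2.0)


print(triangle_similarity([1.0, 0.0], [0.0, 1.0]))
```

Note the unit handling: the offset is stated in degrees, while `math.sin` takes radians, so the angle is converted exactly once before the sine is evaluated.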

Sector's Area Similarity (SS)

SS
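The formula image is not reproduced here. As I understand the paper, SS is the area of a circular sector whose radius is the Euclidean distance (ED) plus the magnitude difference (MD) of the two vectors, and whose angle is the same offset angle $\theta'$ (in degrees) used for TS:

$$MD(A, B) = \bigl|\,\lVert A\rVert - \lVert B\rVert\,\bigr|, \qquad SS(A, B) = \pi\,\bigl(ED(A, B) + MD(A, B)\bigr)^2\,\frac{\theta'}{360}$$

A minimal Python sketch of that reading, assuming NumPy. The function names are illustrative and are not taken from the code in this repository:

```python
import math

import numpy as np


def theta_degrees(a, b):
    """Angle between a and b in degrees, plus the 10 degree offset (assumed from the paper)."""
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    # Clip to guard against rounding pushing the cosine just outside [-1, 1].
    return math.degrees(math.acos(float(np.clip(cos, -1.0, 1.0)))) + 10.0


def sector_similarity(a, b):
    """Area of the sector with radius ED + MD and angle theta' (in degrees)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    ed = np.linalg.norm(a - b)                       # Euclidean distance
    md = abs(np.linalg.norm(a) - np.linalg.norm(b))  # magnitude difference
    return float(math.pi * (ed + md) ** 2 * theta_degrees(a, b) / 360.0)


print(sector_similarity([1.0, 0.0], [0.0, 1.0]))
```

Because the angle is divided by 360, it must stay in degrees here; converting it to radians first would shrink the result.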

TS-SS

TS_SS
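The formula image is not reproduced here. As I understand the paper, the combined measure is simply the product of the two areas, $TS\text{-}SS(A, B) = TS(A, B) \times SS(A, B)$. The sketch below puts the pieces together in one self-contained function, assuming NumPy; `ts_ss` is an illustrative name, not the API of the code in this repository:

```python
import math

import numpy as np


def ts_ss(a, b):
    """Product of triangle area (TS) and sector area (SS) for vectors a and b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    norm_a = np.linalg.norm(a)
    norm_b = np.linalg.norm(b)

    # Offset angle in degrees: arccos of the cosine similarity plus 10 degrees.
    cos = float(np.clip(np.dot(a, b) / (norm_a * norm_b), -1.0, 1.0))
    theta_deg = math.degrees(math.acos(cos)) + 10.0

    # TS: triangle area (sine needs radians).
    ts = norm_a * norm_b * math.sin(math.radians(theta_deg)) / 2.0

    # SS: sector area with radius ED + MD (angle stays in degrees).
    ed = np.linalg.norm(a - b)
    md = abs(norm_a - norm_b)
    ss = math.pi * (ed + md) ** 2 * theta_deg / 360.0

    return float(ts * ss)


identical = ts_ss([1.0, 2.0], [1.0, 2.0])   # 0.0: ED and MD are both zero
parallel = ts_ss([1.0, 2.0], [10.0, 20.0])  # positive: magnitudes differ
print(identical, parallel)
```

Under this reading, smaller values mean the vectors are closer, and unlike cosine the measure separates parallel vectors of different lengths.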

Results

results

Conclusion

  • On the largest dataset, TS-SS outperforms cosine similarity by a significant margin, while on the other datasets it outperforms cosine only slightly.

  • The significantly better result of TS-SS on the largest dataset therefore supports the robustness and reliability of the model for big data and real-world data, where the variety of documents and texts is high.

Reference

[1] "A Hybrid Geometric Approach for Measuring Similarity Level Among Documents and Document Clustering"
