An implementation of a methodology to cluster dynamic URLs using word embeddings.

url-clusterer/white-paper

URL Clusterer - White Paper

Description

A prototype implementation of a methodology to cluster the dynamic URLs of a website. Two repositories in this organization implement it:

  • LinkGraphExtractor: Crawls a given website and stores its URLs in Neo4j.
  • URLClusterer: Clusters the URLs it takes as input by running an Apache Spark pipeline over them.
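
To make the idea of grouping dynamic URLs concrete, here is a minimal, standard-library-only sketch. It is not the URLClusterer pipeline: the actual project uses word embeddings in an Apache Spark pipeline, whereas this illustration swaps in plain token-count vectors, cosine similarity, and a greedy single-pass grouping. The function names (`tokenize`, `cosine`, `cluster`), the `threshold` parameter, and the example URLs are all hypothetical.

```python
import math
import re
from collections import Counter
from urllib.parse import urlparse


def tokenize(url):
    """Split a URL's path and query string into lowercase alphanumeric tokens."""
    parts = urlparse(url)
    raw = (parts.path + " " + parts.query).lower()
    return [t for t in re.split(r"[^a-z0-9]+", raw) if t]


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_sq = sum(v * v for v in a.values()) * sum(v * v for v in b.values())
    return dot / math.sqrt(norm_sq) if norm_sq else 0.0


def cluster(urls, threshold=0.4):
    """Greedy grouping: a URL joins the first cluster whose seed is similar enough.

    Illustrative stand-in only; the real project clusters embedding vectors.
    """
    clusters = []  # list of (seed_vector, member_urls)
    for url in urls:
        vec = Counter(tokenize(url))
        for seed, members in clusters:
            if cosine(seed, vec) >= threshold:
                members.append(url)
                break
        else:
            # No existing cluster was close enough, so this URL seeds a new one.
            clusters.append((vec, [url]))
    return [members for _, members in clusters]
```

Under these assumptions, calling `cluster` on `https://example.com/product/123`, `https://example.com/product/456`, and `https://example.com/blog/2020/hello-world` puts the two product pages in one group and the blog post in another, because the product URLs share the `product` token.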

We also wrote a paper on this study, published in the proceedings of the 2020 IEEE International Conference on Big Data.

Credits
