uci-dsp-lab/dns_forum

About The Project

Multiprocessing crawlers were built to collect data from different forums at polite request rates. Each crawler consists of three classes:

  1. Crawler class: stores the browser headers, interacts with our database server, sends requests to and receives responses from the corresponding forum website, and handles failed requests.
  2. Refiner class: called by the Crawler class to extract useful information from the raw HTML returned by forum websites.
  3. Parser class: called by the Crawler class to lemmatize posts and replies and generate tokens.
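
The division of work among the three classes can be sketched as follows. This is a minimal illustration and not the project's actual code: the method names (`crawl`, `refine`, `tokenize`), the injected `fetch` callable, the `delay` and `max_retries` parameters, and the default header value are all assumptions. The database write and the multiprocessing layer are left out, and the lemmatizer is passed in as a function because the README does not say which library was used.

```python
import re
import time
from html.parser import HTMLParser


class Refiner(HTMLParser):
    """Extracts visible text from raw HTML (stand-in for the real extraction rules)."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Keep only non-empty text nodes.
        if data.strip():
            self.chunks.append(data.strip())

    def refine(self, html):
        self.chunks = []
        self.reset()
        self.feed(html)
        self.close()
        return " ".join(self.chunks)


class Parser:
    """Turns text into tokens; `lemmatize` is a hypothetical word -> lemma function."""

    def __init__(self, lemmatize=lambda word: word):
        self.lemmatize = lemmatize

    def tokenize(self, text):
        return [self.lemmatize(w) for w in re.findall(r"[a-z]+", text.lower())]


class Crawler:
    """Fetches a page politely, retries on failure, and delegates to Refiner and Parser."""

    def __init__(self, fetch, headers=None, delay=1.0, max_retries=3):
        self.fetch = fetch  # assumed callable(url, headers) -> HTML string, raises on failure
        self.headers = headers or {"User-Agent": "example-crawler"}  # placeholder header
        self.delay = delay
        self.max_retries = max_retries
        self.refiner = Refiner()
        self.parser = Parser()
        self.failed = []  # URLs that never succeeded

    def crawl(self, url):
        for _ in range(self.max_retries):
            try:
                html = self.fetch(url, self.headers)
            except Exception:
                time.sleep(self.delay)  # wait before retrying a failed request
                continue
            text = self.refiner.refine(html)
            time.sleep(self.delay)  # polite pause before the next request
            # A real crawler would store this record in the database here.
            return {"url": url, "text": text, "tokens": self.parser.tokenize(text)}
        self.failed.append(url)
        return None
```

Passing `fetch` in from outside keeps the sketch independent of any particular HTTP library and lets the retry logic be exercised without network access.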

TF-IDF vectors were generated for each dataset, and the documents were then grouped using the DBSCAN and K-means clustering algorithms.
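
To make the TF-IDF step concrete, here is a small self-contained sketch using only the standard library. It is an illustration under assumptions, not the project's pipeline: the function names, the smoothed IDF formula, the toy token lists, and the deterministic choice of initial centroids are all made up for the example. Only K-means is shown; DBSCAN is omitted for brevity, and in practice a library implementation would likely be used for both.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (vocabulary, L2-normalized TF-IDF vectors) for a list of token lists."""
    vocab = sorted({t for d in docs for t in d})
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(t for d in docs for t in set(d))
    # Smoothed IDF, so terms present in every document keep a nonzero weight.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for d in docs:
        tf = Counter(d)
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors


def kmeans(vectors, k, iterations=10):
    """Plain k-means; the first k vectors serve as initial centroids (deterministic)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest centroid.
        for i, v in enumerate(vectors):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])),
            )
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy token lists standing in for lemmatized forum posts (hypothetical data).
docs = [
    ["dns", "server", "timeout"],
    ["router", "firmware", "update"],
    ["dns", "resolver", "timeout"],
    ["router", "firmware", "reset"],
]
vocab, vectors = tfidf(docs)
labels = kmeans(vectors, k=2)
print(labels)
```

Posts that share vocabulary end up with the same label, which is the behavior the clustering step relies on.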

Built With

Data Storage Example

MongoDB

Classification Example

[Image: classification example]

Link to Our Dataset
