Multiprocessing crawlers were built to collect data from different forums at polite request rates. Each crawler consists of three classes:
- Crawler class: stores the browser headers, interacts with our database server, sends requests to and receives responses from the corresponding forum website, and handles failed requests.
- Refiner class: called by the Crawler class to extract useful information from the raw HTML content returned by forum websites.
- Parser class: called by the Crawler class to lemmatize posts and replies to generate tokens.
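The division of work among the three classes can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the method names (`refine`, `parse`, `crawl`, `crawl_all`), the injected `fetch` callable, the placeholder User-Agent string, and the `delay_seconds` and `max_retries` parameters are all hypothetical. Database access and the multiprocessing setup are left out, and lowercased word splitting stands in for real lemmatization.

```python
import re
import time
from html.parser import HTMLParser


class Refiner(HTMLParser):
    """Pulls the text out of raw HTML (hypothetical stand-in for the real Refiner)."""

    def __init__(self):
        super().__init__()
        self._chunks = []

    def handle_data(self, data):
        # Keep only non-empty text nodes.
        if data.strip():
            self._chunks.append(data.strip())

    def refine(self, raw_html):
        self._chunks = []
        self.reset()
        self.feed(raw_html)
        self.close()
        return " ".join(self._chunks)


class Parser:
    """Turns text into tokens. Lowercasing is a placeholder for lemmatization."""

    def parse(self, text):
        return re.findall(r"[a-z0-9']+", text.lower())


class Crawler:
    """Fetches pages politely, retries failures, and delegates to Refiner and Parser."""

    def __init__(self, fetch, delay_seconds=1.0, max_retries=3):
        # Placeholder headers; the real browser headers are not given in this document.
        self.headers = {"User-Agent": "example-forum-crawler/0.1"}
        self.fetch = fetch  # assumed callable: fetch(url, headers) -> HTML string
        self.delay_seconds = delay_seconds
        self.max_retries = max_retries
        self.refiner = Refiner()
        self.parser = Parser()

    def crawl(self, url):
        for attempt in range(1, self.max_retries + 1):
            try:
                raw_html = self.fetch(url, self.headers)
                break
            except OSError:
                if attempt == self.max_retries:
                    raise  # give up after the last attempt
                time.sleep(self.delay_seconds)  # wait before retrying
        text = self.refiner.refine(raw_html)
        return self.parser.parse(text)

    def crawl_all(self, urls):
        results = {}
        for url in urls:
            results[url] = self.crawl(url)
            time.sleep(self.delay_seconds)  # polite pause between requests
        return results
```

Passing `fetch` in as a callable keeps the sketch free of any particular HTTP library and makes the retry path easy to exercise without network access.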
TF-IDF vectors were computed for each dataset and then clustered using DBSCAN and K-means.
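To make the TF-IDF step concrete, here is a small standard-library sketch that turns token lists into TF-IDF vectors. It assumes one common weighting, term frequency `count / document length` multiplied by `log(N / document frequency)`; the project may have used a different variant or a library implementation, and the function name `tfidf` is hypothetical. The resulting vectors are the kind of input a DBSCAN or K-means implementation would then cluster; that step is not shown here.

```python
import math
from collections import Counter


def tfidf(documents):
    """Compute TF-IDF vectors for a list of token lists.

    Returns (vocabulary, vectors), where vectors[i][j] is the weight of
    vocabulary[j] in documents[i].
    """
    vocabulary = sorted({token for doc in documents for token in doc})
    n_docs = len(documents)
    # Document frequency: in how many documents each token appears.
    doc_freq = Counter(token for doc in documents for token in set(doc))
    idf = {t: math.log(n_docs / doc_freq[t]) for t in vocabulary}

    vectors = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for empty documents
        vectors.append([counts[t] / total * idf[t] for t in vocabulary])
    return vocabulary, vectors
```

With this weighting, a token that appears in every document gets a weight of zero, so only terms that distinguish documents contribute to the distances the clustering algorithms rely on.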