An asynchronous concurrent pipeline for classifying Common Crawl based on fastText's pipeline.
Updated Apr 21, 2021 - Go
A Go crawler restricted to topic-relevant, curated URLs. Includes token frequency analysis and n-gram detection.
Katya, or The Liberated Corpus: a text corpus that lets you request and scrape any web resource.