Corpus creator for Chinese Wikipedia
Downloads and imports Wikipedia page histories to a git repository
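For illustration, a minimal sketch of the underlying idea (not this project's actual code): write each revision to a file and commit it with the revision's timestamp, so `git log` replays the page history. The `revisions` iterable and paths are hypothetical, and a configured git identity is assumed.

```python
import subprocess
from pathlib import Path

def import_history(repo_dir, page_title, revisions):
    """revisions: iterable of (iso_timestamp, wikitext), oldest first."""
    repo = Path(repo_dir)
    repo.mkdir(parents=True, exist_ok=True)
    subprocess.run(["git", "init", "-q"], cwd=repo, check=True)
    article = repo / f"{page_title}.wiki"
    for timestamp, text in revisions:
        article.write_text(text, encoding="utf-8")
        subprocess.run(["git", "add", article.name], cwd=repo, check=True)
        # --date sets the author date so history mirrors the on-wiki timeline
        subprocess.run(
            ["git", "commit", "-q", "-m", f"{page_title} @ {timestamp}",
             "--date", timestamp],
            cwd=repo, check=True,
        )
```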
Extracting useful metadata from Wikipedia dumps in any language.
Python package for working with MediaWiki XML content dumps
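As a hedged sketch of what working with such dumps involves (standard library only; any real package's API will differ), a pages-articles dump can be stream-parsed so multi-gigabyte files never sit in memory. The namespace URI below matches one schema version and varies between dumps:

```python
import bz2
import xml.etree.ElementTree as ET

NS = "{http://www.mediawiki.org/xml/export-0.10/}"  # check your dump's <mediawiki> tag

def iter_pages(dump_path):
    """Yield (title, wikitext) pairs from a pages-articles .xml.bz2 dump."""
    with bz2.open(dump_path, "rb") as f:
        for _, elem in ET.iterparse(f):
            if elem.tag == NS + "page":
                title = elem.findtext(NS + "title")
                text = elem.findtext(NS + "revision/" + NS + "text") or ""
                yield title, text
                elem.clear()  # discard processed elements to limit memory use
```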
Collects a multimodal dataset of Wikipedia articles and their images
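A toy version of that collection step, assuming you already have each article's wikitext (the regex and the localized prefixes are illustrative, not this project's code):

```python
import re

# File links look like [[File:Example.jpg|thumb|...]]; zhwiki also uses localized prefixes
IMAGE_LINK = re.compile(r"\[\[(?:File|Image|文件|檔案):([^|\]]+)", re.IGNORECASE)

def article_record(title, wikitext):
    """Pair an article's text with the image files it embeds."""
    return {"title": title, "text": wikitext,
            "images": [name.strip() for name in IMAGE_LINK.findall(wikitext)]}
```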
A Python toolkit to generate a tokenized dump of Wikipedia for NLP
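In the same spirit, a minimal sketch (not the toolkit's API): strip the most common markup, then emit regex tokens. Note that `\w+` lumps CJK runs together, so a real pipeline would plug in a proper segmenter:

```python
import re

MARKUP = re.compile(r"\[\[(?:[^|\]]*\|)?([^\]]*)\]\]|'{2,}|<[^>]+>")
TOKEN = re.compile(r"\w+|[^\w\s]")

def tokenize(wikitext):
    plain = MARKUP.sub(r"\1", wikitext)  # keep link text, drop the brackets
    return TOKEN.findall(plain)
```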
A library that assists in traversing and downloading from Wikimedia Data Dumps and their mirrors.
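The underlying task looks roughly like this (the mirror list is illustrative; the `<wiki>/latest/<file>` URL layout is the standard one on dumps.wikimedia.org):

```python
import urllib.request

MIRRORS = [
    "https://dumps.wikimedia.org",
    # ...plus any mirror that carries the same directory layout
]

def download_dump(wiki, filename, dest):
    """Try each mirror in turn, e.g. download_dump('zhwiki', 'zhwiki-latest-pages-articles.xml.bz2', ...)."""
    last_error = None
    for base in MIRRORS:
        url = f"{base}/{wiki}/latest/{filename}"
        try:
            urllib.request.urlretrieve(url, dest)
            return url
        except OSError as err:  # URLError subclasses OSError; try the next mirror
            last_error = err
    raise last_error
```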
Research for a master's degree: operation projizz-I/O
Contains code to build a search engine by creating an index and performing search over Wikipedia data.
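The core of any such engine is an inverted index; a toy version (hypothetical helper names, regex tokenization only):

```python
import re
from collections import defaultdict

def build_index(docs):
    """docs: {doc_id: text}. Returns {term: set of doc_ids}."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in re.findall(r"\w+", text.lower()):
            index[term].add(doc_id)
    return index

def search(index, query):
    """AND-query: documents containing every query term."""
    postings = [index.get(t, set()) for t in re.findall(r"\w+", query.lower())]
    return set.intersection(*postings) if postings else set()
```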
Wikicompiler is a fully extensible Python library that compiles and evaluates text from Wikipedia dumps. You can extract text, do text analysis, or even evaluate the AST (Abstract Syntax Tree) yourself.
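To make the compile-then-evaluate idea concrete, here is a toy AST pipeline (deliberately tiny, and not Wikicompiler's real grammar): bold runs become nodes, and evaluation walks the tree:

```python
import re

TOKENS = re.compile(r"'''(.+?)'''|([^']+|')", re.S)

def parse(wikitext):
    """Build a flat AST of ('bold', text) and ('text', text) nodes."""
    return [("bold", b) if b else ("text", p) for b, p in TOKENS.findall(wikitext)]

def evaluate(ast):
    """One possible evaluator: upper-case whatever was bold."""
    return "".join(v.upper() if kind == "bold" else v for kind, v in ast)

print(evaluate(parse("plain and '''bold''' text")))  # plain and BOLD text
```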
WikiBank is a new partially annotated resource for the multilingual frame-semantic parsing task.
A search system based on the Wikipedia dump dataset.
Converts dumped Chinese Wikipedia XML to human-readable documents in Markdown and txt.
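A few of those conversions can be shown with plain regexes (illustrative only; a real converter needs many more rules):

```python
import re

def wikitext_to_markdown(text):
    # Headings: longest '=' fences first so the '==' rule doesn't eat '===='.
    text = re.sub(r"^====(.+?)====\s*$", r"#### \1", text, flags=re.M)
    text = re.sub(r"^===(.+?)===\s*$", r"### \1", text, flags=re.M)
    text = re.sub(r"^==(.+?)==\s*$", r"## \1", text, flags=re.M)
    text = re.sub(r"'''(.+?)'''", r"**\1**", text)            # bold
    text = re.sub(r"''(.+?)''", r"*\1*", text)                # italics
    text = re.sub(r"\[\[[^|\]]+\|([^\]]+)\]\]", r"\1", text)  # piped link -> label
    text = re.sub(r"\[\[([^\]]+)\]\]", r"\1", text)           # plain link -> title
    return text
```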
Framework for the extraction of features from Wikipedia XML dumps.
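Features in this setting are typically simple per-page statistics; a hypothetical extractor in that spirit:

```python
import re

def extract_features(wikitext):
    """A few cheap per-page signals computed straight from raw wikitext."""
    return {
        "n_links": len(re.findall(r"\[\[", wikitext)),
        "n_templates": len(re.findall(r"\{\{", wikitext)),
        "n_headings": len(re.findall(r"^==+", wikitext, flags=re.M)),
        "n_chars": len(wikitext),
    }
```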
Chat with local Wikipedia embeddings 📚
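The retrieval half of such a chat loop reduces to nearest-neighbour search over precomputed vectors; a minimal sketch, assuming an `(n, d)` embedding matrix and a query vector from whatever model you use (both assumptions, not this repo's API):

```python
import numpy as np

def top_k_passages(query_vec, passage_matrix, passages, k=3):
    """Rank passages by cosine similarity to the query embedding."""
    q = query_vec / np.linalg.norm(query_vec)
    m = passage_matrix / np.linalg.norm(passage_matrix, axis=1, keepdims=True)
    scores = m @ q                       # cosine similarity per passage
    best = np.argsort(scores)[::-1][:k]
    return [(passages[i], float(scores[i])) for i in best]
```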
Identifies acronyms in a text file and disambiguates possible expansions
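One common heuristic for this (not necessarily the tool's own) is to match an all-caps token in parentheses against the initials of the words just before it:

```python
import re

def find_acronyms(text):
    """Map acronyms like '(NLP)' to a preceding phrase whose initials spell them."""
    expansions = {}
    for match in re.finditer(r"\(([A-Z]{2,})\)", text):
        acronym = match.group(1)
        words = text[:match.start()].split()[-len(acronym):]
        if [w[0].upper() for w in words] == list(acronym):
            expansions[acronym] = " ".join(words)
    return expansions

print(find_acronyms("We study natural language processing (NLP)."))
# {'NLP': 'natural language processing'}
```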