Corpus creator for Chinese Wikipedia
Downloads and imports Wikipedia page histories to a git repository
Extracts useful metadata from Wikipedia dumps in any language.
Python package for working with MediaWiki XML content dumps
Collects a multimodal dataset of Wikipedia articles and their images
A Python toolkit to generate a tokenized dump of Wikipedia for NLP
A library that assists in traversing and downloading from Wikimedia Data Dumps and their mirrors.
Research for a master's degree: operation projizz-I/O
Converts dumped Wikipedia XML (Chinese) to human-readable documents in Markdown and txt.
Contains code to build a search engine by creating an index and performing search over Wikipedia data.
Chat with local Wikipedia embeddings 📚
Network Visualizer for the 'Geschichten aus der Geschichte' Podcast
Wikicompiler is a fully extensible Python library that compiles and evaluates text from Wikipedia dumps. You can extract text, run text analysis, or even evaluate the AST (Abstract Syntax Tree) yourself.
WikiBank is a new partially annotated resource for the multilingual frame-semantic parsing task.
A search system based on the Wikipedia dump dataset.
Framework for the extraction of features from Wikipedia XML dumps.
Identifies acronyms in a text file and disambiguates possible expansions
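A recurring first step for many of the tools above (the MediaWiki XML packages, tokenizer toolkits, and converters) is streaming pages out of a Wikipedia XML dump without loading the whole file into memory. The following is a minimal sketch using only the Python standard library; it assumes Python 3.8+ (for the {*} namespace wildcard in ElementTree) and a standard pages-articles dump. The iter_pages helper and the filename are placeholders for illustration, not taken from any repository listed here.

import xml.etree.ElementTree as ET

def iter_pages(dump_path):
    # Stream the dump element by element instead of parsing it whole.
    for event, elem in ET.iterparse(dump_path, events=("end",)):
        # Dump elements carry an XML namespace, e.g. "{http://www.mediawiki.org/...}page";
        # keep only the local tag name when matching.
        if elem.tag.rsplit("}", 1)[-1] == "page":
            title = elem.findtext("./{*}title")
            text = elem.findtext("./{*}revision/{*}text")
            yield title, text
            elem.clear()  # release the finished page subtree to bound memory

# Example: print the first page title from a (placeholder) Chinese dump file.
for title, text in iter_pages("zhwiki-latest-pages-articles.xml"):
    print(title)
    break

Keeping the parse incremental matters because full pages-articles dumps run to tens of gigabytes uncompressed; the clear() call is what keeps memory roughly constant regardless of dump size.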