Wiktionary dump file parser and multilingual data extractor
Updated Jun 21, 2024 - Python
This repository contains a Python script for parsing an XML dump of the Italian Wiktionary (Wikizionario). It also contains the parsed dictionary as a JSON file, plus a scraper for ONLI (an Italian database of neologisms) with the scraped data in a CSV file.
Selected data processing scripts, including a language-agnostic multilingual Wiktionary parser
Simple and memory-efficient word extractor for Wiktionary
Prototype of an interface to use Wiktionary translations
Parses the Russian Wiktionary HTML dumps into JSON and generates ereader dictionaries
A scraper which extracts data from the German Wiktionary HTML dump.
Extract hyphenation from Italian Wiktionary
A library for parsing the French Wiktionary
🇫🇷 Source code for frenchhomophones website. [inactive]
Extraction of Russian word forms and their segmentation from the Russian Wiktionary
Code for the paper: Wikinflection: Massive semi-supervised generation of multilingual inflectional corpus from Wiktionary (Metheniti and Neumann, 2018)