Some Faroese language statistics taken from the fo.wikipedia.org content dump
A complete search-engine experience built on top of a 75 GB Wikipedia corpus, with sub-second latency for searches. Results contain wiki pages ranked by TF-IDF relevance for the given search word(s). From optimized code to the k-way merge sort algorithm, this project addresses latency, indexing, and big-data challenges.
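As a rough illustration of the TF-IDF ranking such an engine relies on, here is a minimal Python sketch; the function name, the token-list document format, and the scoring details are assumptions for illustration, not code from the project.

```python
import math
from collections import Counter

def tf_idf_rank(query_terms, documents):
    """Rank documents by summed TF-IDF score for the query terms.

    `documents` is a list of token lists; all names here are
    illustrative, not taken from the project.
    """
    n_docs = len(documents)
    # Document frequency: how many documents contain each term.
    df = Counter()
    for doc in documents:
        df.update(set(doc))

    scores = []
    for doc in documents:
        tf = Counter(doc)
        doc_len = len(doc) or 1  # guard against empty documents
        score = 0.0
        for term in query_terms:
            if df[term]:
                idf = math.log(n_docs / df[term])
                score += (tf[term] / doc_len) * idf
        scores.append(score)
    # Return document indices, best match first.
    return sorted(range(n_docs), key=scores.__getitem__, reverse=True)
```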
Command-line tool to extract plain text from Wikipedia database dumps
A search engine built on a 75 GB Wikipedia dump. It creates an index file and returns search results in real time.
Generates a JSON file with F1 driver stats for a given year, based on its Wikipedia page
Generates a tag cloud from a MediaWiki XML content dump
Uses the word2vec method proposed by Google to train models (word vectors) for use in any word2vec application.
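A minimal training sketch using the gensim library's Word2Vec class (an assumption; the repository may use Google's original C tool instead). The toy corpus, file name, and hyperparameters are purely illustrative.

```python
from gensim.models import Word2Vec

# Tokenized sentences would normally come from the dump's plain text;
# this toy corpus is purely illustrative.
sentences = [
    ["wikipedia", "is", "a", "free", "encyclopedia"],
    ["word2vec", "learns", "dense", "word", "vectors"],
]

# Hyperparameters are common defaults, not values from the project.
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, workers=4)
model.save("wiki.w2v")  # reusable in other word2vec applications
print(model.wv.most_similar("wikipedia", topn=3))
```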
A simple SAX parser for large Wikipedia dump files
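For context, a SAX handler streams the dump element by element instead of loading it into memory, which is what makes multi-gigabyte files tractable. This sketch, which only prints page titles, uses Python's standard xml.sax module; the handler name and dump filename are placeholders.

```python
import xml.sax

class PageTitleHandler(xml.sax.ContentHandler):
    """Stream a MediaWiki dump and print each <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.buffer = []

    def startElement(self, name, attrs):
        if name == "title":
            self.in_title = True
            self.buffer = []

    def characters(self, content):
        if self.in_title:
            self.buffer.append(content)

    def endElement(self, name):
        if name == "title":
            print("".join(self.buffer))
            self.in_title = False

# The dump filename below is a placeholder.
xml.sax.parse("fowiki-latest-pages-articles.xml", PageTitleHandler())
```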
This project uses random walks on a network, together with the power law, to rank the most-visited pages. The main motive was to discover which pages are most likely to be visited at any point in time and which carry the most traffic.
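A minimal sketch of the core idea, assuming the link graph has already been extracted as an adjacency dict; the toy graph, step count, and restart policy are illustrative choices, not the project's own.

```python
import random
from collections import Counter

def random_walk_visits(graph, steps=100_000, seed=0):
    """Count node visits along one long random walk over a link graph.

    `graph` maps a page to the pages it links to. Visit counts of such
    walks tend to follow a power law, so the most-visited nodes
    approximate the highest-traffic pages.
    """
    rng = random.Random(seed)
    pages = list(graph)
    node = rng.choice(pages)
    visits = Counter()
    for _ in range(steps):
        visits[node] += 1
        neighbors = graph.get(node)
        # Jump to a random page on dead ends (no outgoing links).
        node = rng.choice(neighbors) if neighbors else rng.choice(pages)
    return visits.most_common(10)

# Toy link graph; a real one would be extracted from the dump.
print(random_walk_visits({"A": ["B", "C"], "B": ["A"], "C": ["A"]}, steps=1000))
```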
Python implementation of inverted-index creation and a search engine designed for a Wikipedia dump
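A bare-bones sketch of inverted-index construction with an AND-style lookup, assuming documents are already tokenized; the names and document format are illustrative, not taken from the repository.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map every term to the sorted ids of documents containing it.

    `docs` maps a document id to its token list.
    """
    index = defaultdict(set)
    for doc_id, tokens in docs.items():
        for token in tokens:
            index[token].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def search(index, *terms):
    """Return ids of documents containing every query term (AND query)."""
    postings = [set(index.get(t, ())) for t in terms]
    return sorted(set.intersection(*postings)) if postings else []

docs = {1: ["faroese", "language"], 2: ["language", "statistics"]}
index = build_inverted_index(docs)
print(search(index, "language"))             # -> [1, 2]
print(search(index, "language", "faroese"))  # -> [1]
```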
Framework for the extraction of features from Wikipedia XML dumps.
Visualize/explore word2vec datasets with pygame
Wikipedia importer tool for Apache Sling and Adobe AEM
Chat with local Wikipedia embeddings 📚
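The retrieval step behind such a chat tool is typically a nearest-neighbor search over passage embeddings. Here is a minimal cosine-similarity sketch with NumPy, assuming the embeddings are already computed; the embedding model itself and all names are out of scope and illustrative.

```python
import numpy as np

def top_k_passages(query_vec, passage_vecs, passages, k=3):
    """Return the k passages whose embeddings are most cosine-similar
    to the query embedding."""
    q = query_vec / np.linalg.norm(query_vec)
    p = passage_vecs / np.linalg.norm(passage_vecs, axis=1, keepdims=True)
    sims = p @ q                      # cosine similarity per passage
    best = np.argsort(sims)[::-1][:k]
    return [(passages[i], float(sims[i])) for i in best]

# Random vectors stand in for real embeddings in this demo.
rng = np.random.default_rng(0)
passages = ["page one", "page two", "page three"]
vecs = rng.normal(size=(3, 8))
print(top_k_passages(vecs[0], vecs, passages, k=2))
```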
Identifies acronyms in a text file and disambiguates possible expansions