WikiSearch

An Information Retrieval project that extracts information from a wiki dump and deploys search on it.

Query Example

This project generates a sorted index for the specified dump and uses compression techniques to reduce its size. Given a dump, it creates the inverted index file and the title-docID mapping file in the Index/ folder, a tree of indexers over the inverted index in the Split/ folder, and a tree over the title-docID mapping file in the Title/ folder.
For a corpus of ~40GB, an index of ~14GB was created along with a mapping file of ~320MB.
The index file was then split into smaller chunks in a B-tree manner, so that a query reads only the block it needs instead of the whole file.
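The chunked lookup can be sketched as follows. This is a minimal, single-level illustration held in memory, not the project's actual code: the names `split_index` and `lookup`, the `term postings` line format, and the chunk size are all assumptions made for the example. A real B-tree style layout would write the chunks to files under Split/ and could add further levels above the list of first terms.

```python
import bisect

def split_index(sorted_lines, chunk_size):
    """Split sorted 'term postings' lines into fixed-size chunks.

    Returns the chunks plus the first term of each chunk, which acts as a
    one-level directory over the chunks (hypothetical layout).
    """
    chunks = [sorted_lines[i:i + chunk_size]
              for i in range(0, len(sorted_lines), chunk_size)]
    first_terms = [chunk[0].split(" ", 1)[0] for chunk in chunks]
    return chunks, first_terms

def lookup(term, chunks, first_terms):
    """Return the postings string for term, or None if it is absent."""
    # Pick the rightmost chunk whose first term is <= the query term.
    pos = bisect.bisect_right(first_terms, term) - 1
    if pos < 0:
        return None
    # Only this one chunk is scanned; the others are never touched.
    for line in chunks[pos]:
        key, postings = line.split(" ", 1)
        if key == term:
            return postings
    return None

lines = ["apple 1", "banana 2", "cherry 3", "date 4", "elder 5"]
chunks, first_terms = split_index(lines, chunk_size=2)
result = lookup("date", chunks, first_terms)
```

Because the directory of first terms is sorted, finding the right chunk is a binary search, and only that chunk has to be read to answer the query.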
Each posting list entry contains the count of the term in the title, body, infobox, references, external links and categories, using these field codes:
T - Title
B - Body
I - Infobox
C - Categories
L - External Links
R - References
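
To make the field codes concrete, here is a small sketch of how one posting could be written and read back. The exact on-disk encoding is not given above, so the `d<docID>` prefix, the tag order, the rule of omitting zero counts, and the helper names `encode_posting` and `decode_posting` are all assumptions for illustration only.

```python
import re

# Field tags from the list above; the order used here is an assumption.
FIELD_TAGS = "TBICLR"

def encode_posting(doc_id, counts):
    """Build a posting such as 'd12T1B5C2' (hypothetical format).

    Fields with a zero count are left out to keep the posting short.
    """
    fields = "".join(f"{tag}{counts[tag]}"
                     for tag in FIELD_TAGS if counts.get(tag, 0) > 0)
    return f"d{doc_id}{fields}"

def decode_posting(posting):
    """Parse a posting back into (doc_id, {tag: count})."""
    match = re.fullmatch(r"d(\d+)((?:[TBICLR]\d+)*)", posting)
    if match is None:
        raise ValueError(f"malformed posting: {posting!r}")
    counts = {tag: int(n)
              for tag, n in re.findall(r"([TBICLR])(\d+)", match.group(2))}
    return int(match.group(1)), counts

encoded = encode_posting(12, {"T": 1, "B": 5, "C": 2})
decoded = decode_posting(encoded)
```

Keeping per-field counts in the posting lets a search weight a match in the title differently from a match in the body or references without touching the original documents.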
