Apache Lucene open-source search software
Updated Jul 19, 2024 (Java)
Apache Solr open-source search software
Anserini is a Lucene toolkit for reproducible information retrieval research
Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
My solutions to the Natural Language Processing course taught by Dan Jurafsky and Chris Manning in Winter 2012.
Burp Extender plugin that generates a sitemap of a website using Wayback Machine
The Cognitive Foundry is an open-source Java library for building intelligent systems using machine learning
Search Quality Evaluation Tool for Apache Solr & Elasticsearch search-based infrastructures
Flexible classic and NeurAl Retrieval Toolkit
Persian Analyzer For Elasticsearch.
Dice Solr plugins from Simon Hughes (Dice.com)
Source code accompanying the book "Deep Learning for Search"
Hardened fork of the RankLib learning-to-rank library
Various utilities regarding Levenshtein transducers. (Java)
Search Engine projects
A simple tutorial of Lucene for LIS 501 Introduction to Text Mining students at the University of Wisconsin-Madison (Fall 2021).
Serritor is an open source web crawler framework built upon Selenium and written in Java. It can be used to crawl dynamic web pages that require JavaScript to render data.
Creates an inverted index of the words occurring in a large set of documents extracted from web pages, using Hadoop MapReduce and Google Dataproc
Multilingual automatic text summarizer using a statistical, extractive approach