A codebase to support a pure JSON search engine requiring no backend for any XHTML5 document collection
Updated Apr 23, 2024 - HTML
PySpark phonetic and string matching algorithms
Implements Rocchio query expansion, similar to the "related searches" feature of popular search engines, but based on relevant documents selected by the end user
A natural language processing project on SMS data that predicts whether a message is spam or ham, comparing the accuracy of several ML algorithms (multinomial naive Bayes, logistic regression, SVM, decision trees) and using data cleaning and processing techniques such as PorterStemmer, CountVectorizer, TF-IDF Vectorizer, and WordNetLemmatizer. It is implemented usi…
Implementation of BM25+, an extension of BM25, compared against the baseline BM25 model
Implementation of a Vector Space Retrieval Model using TF-IDF and cosine similarity on the Cranfield document corpus
Performs tokenization, stemming, lemmatization, index creation, index compression and ranked retrieval of Cranfield documents
Snowball version of the Porter stemmer for the Lithuanian language.
Hate speech detection model built with Count Vectorizer and an XGBoost classifier, reaching an accuracy of up to 0.9471; it classifies tweets as hate or non-hate.
An efficient implementation of the German porter-stemming algorithm in Golang.
Small code snippets written in Python covering fundamental concepts in NLP used in all major NLP projects.
The MOOC Recommender System utilizes NLP techniques for course recommendations in Massive Open Online Courses (MOOCs). It processes raw data, leveraging Tokenization, Porter Stemming, Cosine Similarity, etc., to extract tags from course descriptions, summaries, syllabuses, instructors, and subjects.
Collection of stemming algorithms in Rust
A Search Engine based on the principle of TF-IDF and comparing documents in a vector space using Cosine Similarity
macOS desktop application for processing Google Takeout export files
🔝 HW1 of Intelligent Information Retrieval MSc Course ECE@UT
Classification of tweets as positive or negative using classifiers such as SVM, logistic regression, and naive Bayes. Includes an implementation of the Porter stemmer algorithm.
Crawls news and information websites and predicts how likely their content is to go viral.