Given a collection of documents, this project tokenizes and stems all the words in the collection. The implementation is in Java.
Updated Feb 16, 2017 · Java
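As a rough illustration of what tokenization and stemming involve, here is a minimal sketch that lowercases each document, splits it on runs of non-alphanumeric characters, and strips a few common English suffixes. The class name SimpleTokenStemmer and its suffix list are hypothetical. The stemmer is a naive suffix stripper used as a stand-in. It is not the Porter stemmer and not necessarily what the project above uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Minimal sketch of tokenization + stemming for a document collection.
// The stemmer is a naive suffix stripper chosen for illustration; it is NOT
// the Porter stemmer or any specific library's algorithm.
class SimpleTokenStemmer {

    // Hypothetical suffix list, checked in order ("es" before "s").
    private static final String[] SUFFIXES = {"ing", "ed", "es", "s"};
    // Keep at least this many characters so short words like "is" survive.
    private static final int MIN_STEM_LENGTH = 3;

    // Lowercase the text and split on runs of anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t); // split() can yield a leading empty string
        }
        return tokens;
    }

    // Strip the first matching suffix, provided what remains is long enough.
    static String stem(String word) {
        for (String suffix : SUFFIXES) {
            if (word.endsWith(suffix) && word.length() - suffix.length() >= MIN_STEM_LENGTH) {
                return word.substring(0, word.length() - suffix.length());
            }
        }
        return word;
    }

    // Tokenize and stem every document in the collection.
    static List<List<String>> process(List<String> documents) {
        List<List<String>> result = new ArrayList<>();
        for (String doc : documents) {
            List<String> stems = new ArrayList<>();
            for (String token : tokenize(doc)) stems.add(stem(token));
            result.add(stems);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("Searching the indexed documents!", "Parsers and lexers");
        System.out.println(process(docs)); // [[search, the, index, document], [parser, and, lexer]]
    }
}
```

A real stemmer handles many more cases (for example "studies" → "studi" in Porter). The minimum-stem-length guard is the simplest way to stop the stripper from mangling short words.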
A grammar describes the syntax of a programming language and may be defined in Backus-Naur form (BNF). A lexer performs lexical analysis, turning text into tokens. A parser takes tokens and builds a data structure such as an abstract syntax tree (AST). The parser is concerned with context: does the sequence of tokens fit the grammar? A compiler is built around a lexer and a parser for a specific grammar, plus later stages (such as code generation) that turn the AST into output.
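To make the lexer → tokens → parser → AST pipeline concrete, here is a minimal Java sketch for a small arithmetic grammar written in EBNF-style notation. The grammar and the class TinyExprParser are hypothetical and do not come from any project listed here. The lexer produces a flat token list. A recursive-descent parser, with one method per grammar rule, checks that the tokens fit the grammar and builds an AST that can then be evaluated.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch: lexer -> token list -> recursive-descent parser -> AST.
// Grammar (EBNF-style, hypothetical, for illustration only):
//   expr   ::= term (("+" | "-") term)*
//   term   ::= factor (("*" | "/") factor)*
//   factor ::= NUMBER | "(" expr ")"
class TinyExprParser {
    // Lexer: turns raw text into a flat list of tokens.
    static List<String> lex(String src) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            if (Character.isDigit(c)) {
                int start = i;
                while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
                tokens.add(src.substring(start, i));
            } else if ("+-*/()".indexOf(c) >= 0) {
                tokens.add(String.valueOf(c));
                i++;
            } else {
                throw new IllegalArgumentException("Unexpected character: " + c);
            }
        }
        return tokens;
    }

    // AST node types: integer literals and binary operations.
    interface Node { long eval(); }
    static final class Num implements Node {
        final long value;
        Num(long value) { this.value = value; }
        public long eval() { return value; }
        @Override public String toString() { return Long.toString(value); }
    }
    static final class BinOp implements Node {
        final char op; final Node left, right;
        BinOp(char op, Node left, Node right) { this.op = op; this.left = left; this.right = right; }
        public long eval() {
            long a = left.eval(), b = right.eval();
            switch (op) {
                case '+': return a + b;
                case '-': return a - b;
                case '*': return a * b;
                default:  return a / b; // '/' is integer division
            }
        }
        @Override public String toString() { return "(" + op + " " + left + " " + right + ")"; }
    }

    // Parser: one method per grammar rule; pos indexes the next unread token.
    private final List<String> tokens;
    private int pos = 0;
    private TinyExprParser(List<String> tokens) { this.tokens = tokens; }
    static Node parse(String src) {
        TinyExprParser p = new TinyExprParser(lex(src));
        Node ast = p.expr();
        if (p.pos != p.tokens.size()) throw new IllegalArgumentException("Trailing input");
        return ast;
    }
    private String peek() { return pos < tokens.size() ? tokens.get(pos) : ""; } // "" = end of input
    private Node expr() {
        Node node = term();
        while (peek().equals("+") || peek().equals("-")) {
            char op = tokens.get(pos++).charAt(0);
            node = new BinOp(op, node, term()); // loop gives left associativity
        }
        return node;
    }
    private Node term() {
        Node node = factor();
        while (peek().equals("*") || peek().equals("/")) {
            char op = tokens.get(pos++).charAt(0);
            node = new BinOp(op, node, factor());
        }
        return node;
    }
    private Node factor() {
        String tok = peek();
        pos++;
        if (tok.equals("(")) {
            Node inner = expr();
            if (!peek().equals(")")) throw new IllegalArgumentException("Expected ')'");
            pos++;
            return inner;
        }
        if (!tok.isEmpty() && Character.isDigit(tok.charAt(0))) return new Num(Long.parseLong(tok));
        throw new IllegalArgumentException("Unexpected token: '" + tok + "'");
    }
    public static void main(String[] args) {
        Node ast = parse("2 * (3 + 4) - 5");
        System.out.println(ast + " = " + ast.eval()); // (- (* 2 (+ 3 4)) 5) = 9
    }
}
```

Giving each grammar rule its own method keeps the parser visibly aligned with the grammar. Operator precedence comes from the rule nesting (expr → term → factor), and left associativity comes from the while loops. A tool such as JFlex would generate the lexer half from a specification instead of hand-writing it.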
A library for mentions on Android
An interpreter for a small imperative language.
Exploring Information Retrieval | Indexer Creation
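In information retrieval, indexer creation usually means building an inverted index that maps each term to the documents containing it. Here is a minimal sketch under simplifying assumptions: a document's ID is its position in the input list, and terms come from lowercasing and splitting on non-alphanumeric characters, with no stemming or stop-word removal. InvertedIndexBuilder is a hypothetical name, not taken from the listed project.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Minimal sketch of inverted-index creation: term -> sorted set of document IDs.
// Assumptions (for illustration only): a document's ID is its position in the
// input list, and terms come from lowercasing and splitting on non-alphanumerics.
class InvertedIndexBuilder {

    static Map<String, TreeSet<Integer>> build(List<String> docs) {
        // TreeMap keeps the vocabulary sorted; TreeSet keeps each posting list sorted.
        Map<String, TreeSet<Integer>> index = new TreeMap<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            String text = docs.get(docId).toLowerCase(Locale.ROOT);
            for (String term : text.split("[^\\p{L}\\p{N}]+")) {
                if (term.isEmpty()) continue; // split() may produce a leading empty string
                index.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("Lexer and parser", "Parser generator", "Stemming and tokenization");
        Map<String, TreeSet<Integer>> index = build(docs);
        System.out.println(index);
        // {and=[0, 2], generator=[1], lexer=[0], parser=[0, 1], stemming=[2], tokenization=[2]}
        System.out.println(index.get("parser")); // [0, 1]
    }
}
```

Keeping posting lists sorted makes a Boolean AND query a matter of intersecting them, for example with retainAll on a copy.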
Cloned from https://github.com/phuonglh/vn.vitk, with added code to process an input directory. Example command: bin/spark-submit /pathTo/vn.vitk-3.0.jar -i /pathTo/DataFolder/
A new programming language, CORE, and its interpreter
This is a Scheme scanner (lexer) implemented with JFlex
Cool interpreted programming language
A simple search engine that has been implemented for the Information Retrieval course
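A simple search engine typically ranks matching documents by a weighting such as TF-IDF. The sketch below assumes the plain scoring score(d, q) = Σ tf(t, d) · log(N / df(t)) over the query terms t, where N is the number of documents. The course project may use a different weighting or normalization, and TfIdfSearch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Minimal TF-IDF ranking sketch. Scoring scheme (an assumption, not the course
// project's documented method): score(d, q) = sum over query terms t of
// tf(t, d) * log(N / df(t)), where N is the number of documents.
class TfIdfSearch {
    private final List<Map<String, Integer>> termCounts = new ArrayList<>(); // tf per document
    private final Map<String, Integer> docFreq = new HashMap<>();            // df per term

    TfIdfSearch(List<String> docs) {
        for (String doc : docs) {
            Map<String, Integer> counts = new HashMap<>();
            for (String t : tokenize(doc)) counts.merge(t, 1, Integer::sum);
            termCounts.add(counts);
            for (String t : counts.keySet()) docFreq.merge(t, 1, Integer::sum); // once per doc
        }
    }

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    double score(int docId, String query) {
        double s = 0.0;
        for (String t : tokenize(query)) {
            Integer df = docFreq.get(t);
            if (df == null) continue; // term never seen: contributes nothing
            int tf = termCounts.get(docId).getOrDefault(t, 0);
            s += tf * Math.log((double) termCounts.size() / df);
        }
        return s;
    }

    // Returns IDs of documents with a positive score, highest score first.
    List<Integer> search(String query) {
        double[] scores = new double[termCounts.size()];
        List<Integer> hits = new ArrayList<>();
        for (int d = 0; d < scores.length; d++) {
            scores[d] = score(d, query);
            if (scores[d] > 0) hits.add(d);
        }
        // List.sort is stable, so ties keep ascending document-ID order.
        hits.sort((a, b) -> Double.compare(scores[b], scores[a]));
        return hits;
    }

    public static void main(String[] args) {
        TfIdfSearch engine = new TfIdfSearch(Arrays.asList(
                "Java lexer and parser", "Java stemming and tokenization", "Parser parser generator"));
        System.out.println(engine.search("parser")); // [2, 0]
    }
}
```

With this weighting, a term that appears in every document gets log(1) = 0 and therefore never affects the ranking. That is the intended effect of the IDF factor.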
A Processing Toolbox for Persian Texts
Research on text manipulation (tokenizing, chunking, parsing)
An interpreter for a (very) simple functional programming language.
Implements the 'bad apple' programming language in nearly 450 lines
An interpreter for a somewhat adapted dialect of Lisp. I worked on this program for a school project.
ntc-vntok is a tokenizer library for the Vietnamese language