A collection of corpora for named entity recognition (NER) and entity recognition tasks. These annotated datasets cover a variety of languages, domains and entity types.
Updated Jun 25, 2024 - Python
Natural language processing (NLP) is a field of computer science that studies how computers and humans interact through natural language. In 1950, Alan Turing published an article that proposed a measure of machine intelligence, now called the Turing test. More modern techniques, such as deep learning, have produced strong results in language modeling, parsing, and many other natural-language tasks.
Portuguese pre-trained BERT models
This repository contains code and datasets related to entity/knowledge papers from the VERT (Versatile Entity Recognition & disambiguation Toolkit) project, by the Knowledge Computing group at Microsoft Research Asia (MSRA).
A lexicon (dictionary) for Sudachi, a Japanese morphological analyzer
TriggerNER: Learning with Entity Triggers as Explanations for Named Entity Recognition (ACL 2020)
A Python module that fetches the entry page for a word or phrase from the Online Indonesian Dictionary (https://kbbi.kemdikbud.go.id).
Natural Language Processing (NLP), covering topics such as tokenization, part-of-speech (POS) tagging, machine translation, named entity recognition (NER), classification, and sentiment analysis.
A Python library for selecting text features. It provides a filter method, a genetic algorithm, and TextFeatureSelectionEnsemble to help improve text classification and other machine learning models.
Interface for reading the Paraphrase Database (PPDB)
A collection of natural language processing notebooks.
A Python package for round-trip translation (also known as back-translation)
Debiasing word embeddings
A Typed Event-Focused Lexical Inference Benchmark for Evaluating Natural Language Inference
ThamizhiPOSt: a neural POS tagger for Tamil
Basic Universal Dependencies Part-of-Speech Tagger for Tibetan
Assignment solutions for CS224N: Natural Language Processing with Deep Learning - Stanford / Winter 2023
An extensive dataset of Arabic written in Latin script.
Data for testing the Tibetan Lucene analyzers
Created by Alan Turing