A data package containing lexicons and dictionaries for text analysis
Latest commit e05c61e by Tyler Rinker, Dec 28, 2018: Updated cliche scrape


lexicon

Project Status: Active. The project has reached a stable, usable state and is being actively developed.

Description

lexicon is a collection of lexical hash tables, dictionaries, and word lists. The prefix of a data set's name indicates its data type:

| Prefix | Meaning |
|---|---|
| `key_` | A data.frame with a lookup and return value |
| `hash_` | A keyed data.table hash table |
| `freq_` | A data.table of terms with frequencies |
| `profanity_` | A profane words vector |
| `pos_` | A part of speech vector |
| `pos_df_` | A part of speech data.frame |
| `sw_` | A stopword vector |
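
As a rough illustration of how these prefixes map onto R data types, the sketch below builds toy stand-ins instead of loading the package data. The object names (`toy_key`, `toy_hash`, `toy_sw`), their column names, and their values are all assumptions made up for this example. The real data sets may use different column names, so inspect them with `str()` before relying on any particular layout.

```r
library(data.table)

# key_ style (hypothetical data): a plain data.frame with a lookup column
# and a return column, queried with match()
toy_key <- data.frame(
    lookup = c("can't", "won't"),
    value  = c("cannot", "will not"),
    stringsAsFactors = FALSE
)
expanded <- toy_key$value[match("won't", toy_key$lookup)]

# hash_ style (hypothetical data): a data.table keyed on the lookup column,
# so joins on that column use binary search
toy_hash <- data.table(x = c("good", "bad", "awful"), y = c(0.75, -0.75, -1))
setkey(toy_hash, x)
scores <- toy_hash[.(c("bad", "unknown")), y]  # unmatched terms come back as NA

# sw_ style (hypothetical data): a character vector, typically used with
# %in% to filter tokens
toy_sw <- c("the", "a", "of")
tokens <- c("the", "lexicon", "of", "words")
kept <- tokens[!tokens %in% toy_sw]
```

The keyed join is the main reason to prefer a `hash_` table over a `key_` data.frame when looking up many terms at once.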

Data

| Data | Description |
|---|---|
| `cliches` | Common Cliches |
| `common_names` | First Names (U.S.) |
| `constraining_loughran_mcdonald` | Loughran-McDonald Constraining Words |
| `emojis_sentiment` | Emoji Sentiment Data |
| `freq_first_names` | Frequent U.S. First Names |
| `freq_last_names` | Frequent U.S. Last Names |
| `function_words` | Function Words |
| `grady_augmented` | Augmented List of Grady Ward's English Words and Mark Kantrowitz's Names List |
| `hash_emojis` | Emoji Description Lookup Table |
| `hash_emojis_identifier` | Emoji Identifier Lookup Table |
| `hash_emoticons` | Emoticons |
| `hash_grady_pos` | Grady Ward's Moby Parts of Speech |
| `hash_internet_slang` | List of Internet Slang and Corresponding Meanings |
| `hash_lemmas` | Lemmatization List |
| `hash_nrc_emotions` | NRC Emotion Table |
| `hash_sentiment_emojis` | Emoji Sentiment Polarity Lookup Table |
| `hash_sentiment_huliu` | Hu Liu Polarity Lookup Table |
| `hash_sentiment_jockers` | Jockers Sentiment Polarity Table |
| `hash_sentiment_jockers_rinker` | Combined Jockers & Rinker Polarity Lookup Table |
| `hash_sentiment_loughran_mcdonald` | Loughran-McDonald Polarity Table |
| `hash_sentiment_nrc` | NRC Sentiment Polarity Table |
| `hash_sentiment_senticnet` | Augmented SenticNet Polarity Table |
| `hash_sentiment_sentiword` | Augmented Sentiword Polarity Table |
| `hash_sentiment_slangsd` | SlangSD Sentiment Polarity Table |
| `hash_sentiment_socal_google` | SO-CAL Google Polarity Table |
| `hash_valence_shifters` | Valence Shifters |
| `key_contractions` | Contraction Conversions |
| `key_corporate_social_responsibility` | Nadra Pencle and Irina Malaescu's Corporate Social Responsibility Dictionary |
| `key_grade` | Grades Data Set |
| `key_rating` | Ratings Data Set |
| `key_regressive_imagery` | Colin Martindale's English Regressive Imagery Dictionary |
| `key_sentiment_jockers` | Jockers Sentiment Data Set |
| `modal_loughran_mcdonald` | Loughran-McDonald Modal List |
| `nrc_emotions` | NRC Emotions |
| `pos_action_verb` | Action Word List |
| `pos_df_irregular_nouns` | Irregular Nouns Word Dataframe |
| `pos_df_pronouns` | Pronouns |
| `pos_interjections` | Interjections |
| `pos_preposition` | Preposition Words |
| `profanity_alvarez` | Alejandro U. Alvarez's List of Profane Words |
| `profanity_arr_bad` | Stack Overflow user2592414's List of Profane Words |
| `profanity_banned` | bannedwordlist.com's List of Profane Words |
| `profanity_racist` | Titus Wormer's List of Racist Words |
| `profanity_zac_anger` | Zac Anger's List of Profane Words |
| `sw_dolch` | Leveled Dolch List of 220 Common Words |
| `sw_fry_100` | Fry's 100 Most Commonly Used English Words |
| `sw_fry_1000` | Fry's 1000 Most Commonly Used English Words |
| `sw_fry_200` | Fry's 200 Most Commonly Used English Words |
| `sw_fry_25` | Fry's 25 Most Commonly Used English Words |
| `sw_jockers` | Matthew Jockers's Expanded Topic Modeling Stopword List |
| `sw_loughran_mcdonald_long` | Loughran-McDonald Long Stopword List |
| `sw_loughran_mcdonald_short` | Loughran-McDonald Short Stopword List |
| `sw_lucene` | Lucene Stopword List |
| `sw_mallet` | MALLET Stopword List |
| `sw_python` | Python Stopword List |
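
A listing like the one above can also be pulled from an installed copy of the package with base R's `data(package = )`. The following is a minimal sketch, assuming lexicon is already installed. The helper `prefix_of()` is a hypothetical name introduced here, and it simply takes the text before the first underscore, which will not correspond to a documented prefix for every data set (for example `cliches` or `common_names`).

```r
# Hypothetical helper: the text before the first underscore in a data set name
prefix_of <- function(x) sub("_.*$", "", x)

# Only query the package if it is installed
if (requireNamespace("lexicon", quietly = TRUE)) {
    # data(package = ) returns a matrix with the data set names (Item)
    # and their titles (Title)
    listing <- as.data.frame(
        data(package = "lexicon")$results[, c("Item", "Title"), drop = FALSE],
        stringsAsFactors = FALSE
    )

    # Count the data sets under each leading name segment
    listing$prefix <- prefix_of(listing$Item)
    print(table(listing$prefix))

    # Show only the stopword lists
    print(listing[listing$prefix == "sw", c("Item", "Title")])
}
```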

Installation

To install the development version of lexicon, download the zipball or tarball, decompress it, and run R CMD INSTALL on it. Alternatively, use the pacman package to install the development version:

    if (!require("pacman")) install.packages("pacman")
    pacman::p_load_gh("trinker/lexicon")
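
After installing, a quick check along these lines can confirm that the package and its data are available. This is only a sketch: the variable name `lexicon_installed` is made up for the example, and `sw_fry_25` is used simply because it is one of the smaller data sets listed above.

```r
# TRUE if the package can be found in the current library paths
lexicon_installed <- requireNamespace("lexicon", quietly = TRUE)

if (lexicon_installed) {
    # Report the installed version and peek at one of the stopword vectors
    print(utils::packageVersion("lexicon"))
    print(head(lexicon::sw_fry_25))
} else {
    message("lexicon is not installed yet; see the installation steps above.")
}
```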

Contact

You are welcome to: