# Stop Words
Words like "a" and "the" appear so frequently that they don't require tagging as thoroughly as nouns, verbs and modifiers. We call these *stop words*, and they can be filtered out of the text before it is processed. spaCy ships with a built-in set of English stop words (307 in the version used here; the exact count varies between spaCy releases).

In [1]:
# Perform standard imports:
import spacy
nlp = spacy.load('en_core_web_sm')

In [2]:
# Print the set of spaCy's default stop words (remember that sets are unordered):
print(nlp.Defaults.stop_words)

{'mostly', 'but', 'hereby', 'become', 'elsewhere', 'with', 'through', 'name', 'never', 're', 'perhaps', 'former', 'regarding', 'much', 'which', 'what', 'below', 'sixty', 'on', 'side', 'him', 'everywhere', 'either', 'how', 'indeed', 'eleven', 'off', 'anyhow', 'always', 'amount', 'sometime', 'various', 'back', 'first', 'their', 'and', 'moreover', 'she', 'cannot', 'too', 'they', 'take', 'during', 'then', 'now', 'all', 'against', 'full', 'top', 'must', 'wherever', 'thereupon', 'do', 'ever', 'any', 'becomes', 'have', 'into', 'namely', 'seems', 'her', 'out', 'over', 'neither', 'per', 'when', 'than', 'by', 'them', 'none', 'yourselves', 'somehow', 'us', 'though', 'hers', 'became', 'therefore', 'since', 'etc', 'formerly', 'others', 'as', 'a', 'five', 'latter', 'sometimes', 'bottom', 'hence', 'beyond', 'without', 'only', 'hereupon', 'could', 'such', 'throughout', 'say', 'why', 'your', 'can', 'around', 'his', 'six', 'part', 'four', 'therein', 'down', 'does', 'because', 'due', 'nine', 'some', 'ca'

In [3]:
len(nlp.Defaults.stop_words)

307

## To see if a word is a stop word

In [4]:
nlp.vocab['myself'].is_stop

True

In [5]:
nlp.vocab['mystery'].is_stop

False
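
The introduction notes that stop words can be filtered out of a text. A minimal sketch of that filtering step is shown below, using the `is_stop` attribute on each token. The sample sentence and the names `nlp_blank` and `content_words` are made up for illustration, and the sketch assumes a blank English pipeline (`spacy.blank('en')`) is sufficient, since `is_stop` is a lexical attribute that should not depend on a trained model.

```python
import spacy

# Assumption: a blank English pipeline (tokenizer only) is enough here,
# because stop-word flags come from the language defaults.
nlp_blank = spacy.blank('en')

# Hypothetical sample text
doc = nlp_blank("the mystery of the missing keys")

# Keep only the tokens that are not flagged as stop words
content_words = [token.text for token in doc if not token.is_stop]
print(content_words)
```

The same list comprehension should work on a `Doc` produced by the `nlp` object loaded earlier.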

## To add a stop word
There may be times when you wish to add a stop word to the default set. Perhaps you decide that `'btw'` (common shorthand for "by the way") should be considered a stop word.

In [6]:
# Add the word to the set of stop words. Use lowercase!
nlp.Defaults.stop_words.add('btw')

# Set the is_stop flag on the lexeme
nlp.vocab['btw'].is_stop = True

In [7]:
len(nlp.Defaults.stop_words)

308

In [8]:
nlp.vocab['btw'].is_stop

True

When adding stop words, always use lowercase. The entries in the default set are all lowercase, and each lexeme in the vocab is keyed by its exact text, so `'BTW'` and `'btw'` are separate lexemes.

## To remove a stop word
Alternatively, you may decide that `'beyond'` should not be considered a stop word.

In [9]:
# Remove the word from the set of stop words
nlp.Defaults.stop_words.remove('beyond')

# Clear the is_stop flag on the lexeme
nlp.vocab['beyond'].is_stop = False

In [10]:
len(nlp.Defaults.stop_words)

307

In [11]:
nlp.vocab['beyond'].is_stop

False
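
Adding and removing a stop word each take two steps: updating the default set and updating the flag on the lexeme. One way to keep the two in sync is to wrap them in small helper functions, sketched below. The helpers `add_stop_words` and `remove_stop_words`, the pipeline name `nlp_demo`, and the example words `'fyi'` and `'imho'` are hypothetical and not part of spaCy; the sketch again assumes a blank English pipeline is enough to demonstrate the idea.

```python
import spacy

def add_stop_words(nlp, words):
    """Add each word (lowercased) to the default set and flag its lexeme."""
    for word in words:
        word = word.lower()
        nlp.Defaults.stop_words.add(word)
        nlp.vocab[word].is_stop = True

def remove_stop_words(nlp, words):
    """Remove each word (lowercased) from the default set and clear its flag."""
    for word in words:
        word = word.lower()
        # discard() does nothing if the word is absent, unlike remove()
        nlp.Defaults.stop_words.discard(word)
        nlp.vocab[word].is_stop = False

# Assumption: a blank English pipeline shares the same default stop words
nlp_demo = spacy.blank('en')

add_stop_words(nlp_demo, ['FYI', 'imho'])
remove_stop_words(nlp_demo, ['imho'])

print(nlp_demo.vocab['fyi'].is_stop)
print(nlp_demo.vocab['imho'].is_stop)
```

Using `discard()` instead of `remove()` is a design choice: it avoids a `KeyError` when a word was never in the set, at the cost of silently ignoring typos.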