# Speech and Language Processing
### An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition
Daniel Jurafsky and James H. Martin. 2000. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition (1st. ed.). Prentice Hall PTR, USA.

### Normalization

Normalization is the process of transforming data into a standard format so that it can be compared and analyzed consistently. In natural language processing (NLP), it means converting text into a canonical form. Common techniques include lowercasing, stop word removal, stemming, and lemmatization. The goal is to reduce surface variability in the text, so that different written forms of the same word are treated alike.

Lowercasing (also called case folding) converts every letter in a text to lowercase. This reduces the number of distinct surface forms of a word in a corpus: "Dog", "DOG", and "dog" all become "dog". Stop word removal is another common technique, which drops very frequent function words such as "the", "a", and "an". These words usually carry little content on their own, and removing them shrinks the vocabulary and therefore the dimensionality of the data. Both steps discard information, though: case can distinguish "US" from "us", and dropping a word like "not" can change the meaning of a sentence.
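Stop word removal is not implemented in the cells below, so here is a minimal sketch of the idea. The `STOP_WORDS` set and the `remove_stop_words` helper are hypothetical names made up for this illustration, and the stop list is deliberately tiny; real lists are much longer and depend on the task.

```python
# Hypothetical, deliberately tiny stop list (an assumption for illustration).
STOP_WORDS = {"the", "a", "an", "of", "to", "is"}

def remove_stop_words(text, stop_words=STOP_WORDS):
    """Lowercase the text, split on whitespace, and drop stop words."""
    tokens = text.lower().split()
    return [tok for tok in tokens if tok not in stop_words]

print(remove_stop_words("The Dog chased a cat to THE end of an alley"))
# ['dog', 'chased', 'cat', 'end', 'alley']
```

Lowercasing happens before the lookup, so "The" and "THE" are both matched by the single entry "the".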

Stemming and lemmatization are two further normalization techniques. Stemming strips affixes using heuristic rules, so the result is not always a real word: the Porter stemmer, for example, maps "ponies" to "poni". Lemmatization maps a word to its dictionary form, or lemma, using a vocabulary and morphological analysis, so "ponies" becomes "pony" and "was" becomes "be". Both techniques collapse the inflected variants of a word into a single form, which makes text easier to compare and analyze.

Normalization is an important preprocessing step in NLP. By reducing the variability of text data, it can improve the accuracy of text classification, sentiment analysis, and other tasks, although how much it helps depends on the task and the model.
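The difference between the two can be sketched in a few lines of plain Python. This is a toy illustration, not the Porter algorithm or a real lemmatizer: the suffix list, the `LEMMAS` table, and the function names `crude_stem` and `lookup_lemma` are all assumptions made for this example.

```python
# Toy suffix list, checked in order; a real stemmer has many more rules.
SUFFIXES = ("ing", "ies", "ed", "es", "s")

def crude_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# Hypothetical hand-written lemma table; a real lemmatizer uses a full lexicon.
LEMMAS = {"studies": "study", "was": "be", "mice": "mouse"}

def lookup_lemma(word):
    """Return the dictionary form if the word is in the table, else the word."""
    return LEMMAS.get(word, word)

for w in ["studies", "jumped", "mice"]:
    print(w, crude_stem(w), lookup_lemma(w))
# studies stud study
# jumped jump jumped
# mice mice mouse
```

The rule-based stemmer handles any word with a listed suffix but produces non-words such as "stud" and misses irregular forms such as "mice". The lookup lemmatizer returns real words but only for entries it knows about.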

<img src="https://devopedia.org/images/article/293/1027.1608556695.png">

In [22]:
import re
from textblob import TextBlob

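def lowercase(text):
    # Case folding: map every character to its lowercase form.
    return text.lower()

def remove_special_characters(text):
    # Keep only ASCII letters and whitespace. Digits and punctuation
    # are removed too, not just symbols such as @ and $.
    pattern = r'[^a-zA-Z\s]'
    return re.sub(pattern, '', text)

def spelling_correction(text):
    # TextBlob.correct() returns a new TextBlob with likely
    # misspellings replaced; convert it back to a plain string.
    blob = TextBlob(text)
    return str(blob.correct())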


In [16]:
text = """ User: I am unhappy. @ELIZA: DO YOU THINK COMING HERE WILL HELP YOU NOT TO BE UNHAPP $3Y. User: I need some help, that much seems certain. ELIZA: WHAT WOULD IT MEAN TO YOU IF YOU GOT SOME HELP"""

In [17]:
lowercase(text)

' user: i am unhappy. @eliza: do you think coming here will help you not to be unhapp $3y. user: i need some help, that much seems certain. eliza: what would it mean to you if you got some help'

In [18]:
remove_special_characters(text)

' User I am unhappy ELIZA DO YOU THINK COMING HERE WILL HELP YOU NOT TO BE UNHAPP Y User I need some help that much seems certain ELIZA WHAT WOULD IT MEAN TO YOU IF YOU GOT SOME HELP'

In [33]:
spelling_correction('do yyou think comin here will help you not to be unhappy')

'do you think coming here will help you not to be unhappy'
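The steps shown above are usually chained into a single function. The following is a minimal sketch under a few assumptions: the name `normalize` is made up for this example, whitespace collapsing is an extra step that the cells above do not perform, and spelling correction is left out so that the sketch depends only on the standard library.

```python
import re

def normalize(text):
    """Lowercase, drop non-letter characters, and collapse runs of whitespace."""
    text = text.lower()
    text = re.sub(r'[^a-z\s]', '', text)       # keep ASCII letters and whitespace
    return re.sub(r'\s+', ' ', text).strip()   # single spaces, no leading/trailing space

print(normalize(" User: I am unhappy. @ELIZA: DO YOU THINK   COMING HERE WILL HELP"))
# user i am unhappy eliza do you think coming here will help
```

Lowercasing first lets the character pattern cover only `a-z`. The order of the other steps matters as well: removing characters can leave extra spaces behind, so whitespace is collapsed last.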