# Speech and Language Processing
### An introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition
Daniel Jurafsky and James H. Martin. 2000. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition (1st. ed.). Prentice Hall PTR, USA.

### Regular Expressions

Regular expressions, commonly abbreviated as regex, are a standard tool in Natural Language Processing (NLP) for identifying patterns in text data. A regular expression is a sequence of characters that defines a search pattern, which can then be used to match and manipulate text strings. This lets NLP applications locate specific words, phrases, or character patterns in text from a single compact specification.

Regular expressions can be used for a wide range of NLP tasks, including tokenization, text normalization, and information extraction. For example, a regular expression can be used to identify all occurrences of email addresses, phone numbers, or dates in a text document. This is done by defining a pattern that matches the desired structure of these entities, such as the format of an email address or phone number.
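As a rough illustration of the information-extraction use described above, the following sketch pulls email addresses and dates out of a short string. The sample sentence, the names `EMAIL_RE` and `DATE_RE`, and both patterns are assumptions made for this example; the patterns are deliberately simplified and do not cover every valid email address or date format.

```python
import re

# Hypothetical sample text, invented for this example.
sample = (
    "Contact alice@example.com or bob.smith@mail.example.org "
    "before 2024-03-15 or on 2024-04-01."
)

# Simplified email pattern: a local part, "@", then a domain with at least one dot.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")

# ISO-style dates only (YYYY-MM-DD); other date formats are not matched.
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")

emails = EMAIL_RE.findall(sample)
dates = DATE_RE.findall(sample)

print(emails)  # ['alice@example.com', 'bob.smith@mail.example.org']
print(dates)   # ['2024-03-15', '2024-04-01']
```

Because the group in `EMAIL_RE` is non-capturing (`(?:...)`), `findall` returns the full matched strings rather than only the last domain component.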

Regular expressions are also useful for text preprocessing tasks such as removing stopwords, punctuation, or other irrelevant characters from text data. For example, a single substitution can strip all punctuation marks from a document, leaving only the words and the whitespace between them.
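The preprocessing steps just described might look like the sketch below. The three-word stopword list is a hypothetical stand-in for a real one, and the variable names are chosen only for this example. Note that `\w` also matches the underscore, so underscores survive the punctuation step.

```python
import re

sentence = "I need some help, that much seems certain."

# Remove every character that is neither a word character nor whitespace.
no_punct = re.sub(r"[^\w\s]", "", sentence)

# Hypothetical, tiny stopword list used only for illustration.
stopwords = ["i", "that", "some"]

# \b anchors keep the pattern from matching inside longer words.
stopword_re = re.compile(r"\b(?:" + "|".join(stopwords) + r")\b", re.IGNORECASE)
no_stop = stopword_re.sub("", no_punct)

# Collapse the runs of spaces left behind by the removals.
cleaned = re.sub(r"\s+", " ", no_stop).strip()

print(no_punct)  # I need some help that much seems certain
print(cleaned)   # need help much seems certain
```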

Although regular expressions are powerful, they can be complex and difficult to master. Writing a good pattern requires a solid understanding of both the language being processed and the specific patterns being searched for: a pattern that is too loose produces false matches, while one that is too strict misses valid ones. For this reason, regular expressions are often combined with other NLP techniques, such as machine learning algorithms or larger rule-based systems, to improve accuracy and efficiency.

In summary, regular expressions are a valuable tool in NLP for identifying patterns in text data, most commonly in tokenization, text normalization, and information extraction. The cells below apply a few basic patterns to a short ELIZA-style dialogue.

In [1]:
import re

In [2]:
text = """ 
        User: I am unhappy. 
        ELIZA: DO YOU THINK COMING HERE WILL HELP YOU NOT TO BE UNHAPPY
        User: I need some help, that much seems certain.
        ELIZA: WHAT WOULD IT MEAN TO YOU IF YOU GOT SOME HELP
        User: Perhaps I could learn to get along with my mother.
        ELIZA: TELL ME MORE ABOUT YOUR FAMILY
        User: My mother takes care of me.
        ELIZA: WHO ELSE IN YOU FAMILY TAKES CARE OF YOU
        User: My father.
        ELIZA: YOUR FATHER
        User: You are like my father in some ways.
"""

In [5]:
# Find every literal occurrence of "ELIZA" (one per ELIZA turn).
re.findall("ELIZA", text)

['ELIZA', 'ELIZA', 'ELIZA', 'ELIZA', 'ELIZA']

In [6]:
# Find every literal occurrence of "User" (matching is case-sensitive).
re.findall("User", text)

['User', 'User', 'User', 'User', 'User', 'User']

In [9]:
# Delete every colon; re.sub returns a new string and leaves `text` unchanged.
re.sub(":", "", text)

' \n        User I am unhappy. \n        ELIZA DO YOU THINK COMING HERE WILL HELP YOU NOT TO BE UNHAPPY\n        User I need some help, that much seems certain.\n        ELIZA WHAT WOULD IT MEAN TO YOU IF YOU GOT SOME HELP\n        User Perhaps I could learn to get along with my mother.\n        ELIZA TELL ME MORE ABOUT YOUR FAMILY\n        User My mother takes care of me.\n        ELIZA WHO ELSE IN YOU FAMILY TAKES CARE OF YOU\n        User My father.\n        ELIZA YOUR FATHER\n        User You are like my father in some ways.\n'
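Building on the cells above, capture groups can split each line of the dialogue into a speaker and an utterance instead of only counting names or deleting colons. This is a minimal sketch: `dialogue` is a shortened copy of the transcript so that the block runs on its own, and `TURN_RE` and `turns` are names chosen for this example.

```python
import re

# Shortened copy of the transcript so the example is self-contained.
dialogue = """
        User: I am unhappy.
        ELIZA: DO YOU THINK COMING HERE WILL HELP YOU NOT TO BE UNHAPPY
        User: My father.
        ELIZA: YOUR FATHER
"""

# With re.MULTILINE, ^ and $ match at the start and end of every line.
# Group 1 captures the speaker label, group 2 the utterance without
# leading or trailing spaces.
TURN_RE = re.compile(r"^[ \t]*(\w+):[ \t]*(.*\S)[ \t]*$", re.MULTILINE)

# findall returns one (speaker, utterance) tuple per matching line.
turns = TURN_RE.findall(dialogue)

for speaker, utterance in turns:
    print(f"{speaker!r} said {utterance!r}")
```

When a pattern contains more than one group, `re.findall` returns a list of tuples, which makes it convenient for this kind of lightweight structured extraction.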