# Text Cleaning with Regular Expressions

In this exercise, you will learn how to use regular expressions to clean and preprocess text data. You will be given a sample text with various types of unwanted elements, and you will write functions to remove them.

### The goals of this exercise are:
1. Understand the basics of regular expressions
2. Learn how to apply regular expressions to clean text data
3. Practice implementing text cleaning functions

### Instructions:
1. Examine the `sample_text` provided below.
2. Implement the functions to clean the text according to the specified requirements.
3. Fill in the blank patterns (`r""`) in the functions to complete the solutions.
4. Test your functions by running the provided code.
5. Compare your solutions to the completed function bodies at the end.

Hint: Try your patterns interactively at [RegExr](https://regexr.com/) before putting them into the functions.
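As a quick warm-up, the sketch below shows the call shape of `re.sub`, which takes the pattern first, then the replacement, then the input string. The `demo` string and the digit pattern are made up for illustration and are not part of the exercise data.

```python
import re

# Hypothetical warm-up string, not part of the exercise data.
demo = "Order 66 shipped on day 3"

# re.sub(pattern, repl, string): replace every run of digits with "#".
masked = re.sub(r"\d+", "#", demo)
print(masked)  # Order # shipped on day #

# re.findall returns the matched substrings, which is handy for checking
# what a pattern actually picks up before using it in re.sub.
found = re.findall(r"\d+", demo)
print(found)  # ['66', '3']
```

Checking a pattern with `re.findall` first, then switching to `re.sub`, tends to make mistakes easier to spot.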

In [2]:
import re

sample_text = """
This is some sample text with various unwanted elements:

- URLs: https://www.example.com, http://abc.xyz, ftp://server.com
- Email addresses: info@example.com, john@doe.org
- Phone numbers: 123-456-7890, (987) 654-3210
- Hashtags: #cleantext #regex #python
- Mentions: @johndoe @janesmith
- Numbers: 123 45.67 -8.9
- Special characters: $%^&*()_+{}|:"<>?
- Emojis: 😀 🙂 👍 💯
"""

In [3]:

def remove_emails(text):
    """
    Remove email addresses from the given text.
    """
    return re.sub(r"\S+@\S+", "", text)

def remove_hashtags(text):
    """
    Remove hashtags from the given text.
    """
    return re.sub(r"#\w+", "", text)

def remove_mentions(text):
    """
    Remove mentions from the given text.
    """
    return re.sub(r"@\w+", "", text)
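To see what the three patterns above actually match, it can help to run them through `re.findall` on single lines of the sample. The sketch below does that; the variable names are chosen here for illustration only.

```python
import re

line = "- Email addresses: info@example.com, john@doe.org"

# \S+@\S+ matches any run of non-whitespace around "@", so the comma
# that follows the first address becomes part of the match.
email_matches = re.findall(r"\S+@\S+", line)
print(email_matches)  # ['info@example.com,', 'john@doe.org']

hashtag_matches = re.findall(r"#\w+", "- Hashtags: #cleantext #regex #python")
print(hashtag_matches)  # ['#cleantext', '#regex', '#python']

# @\w+ also matches the "@example" part of an email address, so the
# order in which the cleaners run affects the result.
mention_matches = re.findall(r"@\w+", line)
print(mention_matches)  # ['@example', '@doe']
```

The email pattern is deliberately loose: it removes the trailing comma together with the address, which is usually acceptable for cleaning but would be too greedy for validation.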


In [4]:
def remove_urls(text):
    """
    Remove URLs from the given text.
    """
    # TODO: fill in the pattern. Argument order is re.sub(pattern, repl, string).
    return re.sub(r"", "", text)

def remove_phone_numbers(text):
    """
    Remove phone numbers from the given text.
    """
    # TODO: fill in the pattern. Argument order is re.sub(pattern, repl, string).
    return re.sub(r"", "", text)


def remove_numbers(text):
    """
    Remove numbers from the given text.
    """
    # TODO: fill in the pattern. Argument order is re.sub(pattern, repl, string).
    return re.sub(r"", "", text)

def remove_special_characters(text):
    """
    Remove special characters from the given text.
    """
    # TODO: fill in the pattern. Argument order is re.sub(pattern, repl, string).
    return re.sub(r"", "", text)

def remove_emojis(text):
    """
    Remove emojis from the given text.
    """
    # TODO: fill in the pattern. Argument order is re.sub(pattern, repl, string).
    return re.sub(r"", "", text)


In [6]:

def clean_text(text):
    """
    Apply all the cleaning functions to the given text.
    """
    text = remove_urls(text)
    text = remove_emails(text)
    text = remove_phone_numbers(text)
    text = remove_hashtags(text)
    text = remove_mentions(text)
    text = remove_numbers(text)
    text = remove_special_characters(text)
    text = remove_emojis(text)
    return text

# Test the cleaning functions
cleaned_text = clean_text(sample_text)
print("Original text:")
print(sample_text)
print("\nCleaned text:")
print(cleaned_text)

Original text:

This is some sample text with various unwanted elements:

- URLs: https://www.example.com, http://abc.xyz, ftp://server.com
- Email addresses: info@example.com, john@doe.org
- Phone numbers: 123-456-7890, (987) 654-3210
- Hashtags: #cleantext #regex #python
- Mentions: @johndoe @janesmith
- Numbers: 123 45.67 -8.9
- Special characters: $%^&*()_+{}|:"<>?
- Emojis: 😀 🙂 👍 💯


Cleaned text:

This is some sample text with various unwanted elements:

- URLs: https://www.example.com, http://abc.xyz, ftp://server.com
- Email addresses:  
- Phone numbers: 123-456-7890, (987) 654-3210
- Hashtags:   
- Mentions:  
- Numbers: 123 45.67 -8.9
- Special characters: $%^&*()_+{}|:"<>?
- Emojis: 😀 🙂 👍 💯
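In the output above only emails, hashtags, and mentions are gone, because a function whose pattern is still blank returns its input unchanged. One way to see which cleaners are doing any work is a small report like the sketch below. `report_changes` is a hypothetical helper, not part of the exercise, and the two cleaners are redefined here only to keep the example self-contained.

```python
import re

def remove_hashtags(text):
    return re.sub(r"#\w+", "", text)

def remove_urls(text):
    # Still a blank: an empty pattern with an empty replacement changes nothing.
    return re.sub(r"", "", text)

def report_changes(text, cleaners):
    """Hypothetical helper: map each cleaner's name to whether it changed the text."""
    return {fn.__name__: fn(text) != text for fn in cleaners}

sample = "Visit https://www.example.com #regex"
report = report_changes(sample, [remove_urls, remove_hashtags])
print(report)  # {'remove_urls': False, 'remove_hashtags': True}
```

A `False` entry points at a function that still needs its pattern filled in, or at a pattern that matches nothing in the given text.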



In [None]:
# Reset the text
sample_text = """
This is some sample text with various unwanted elements:

- URLs: https://www.example.com, http://abc.xyz, ftp://server.com
- Email addresses: info@example.com, john@doe.org
- Phone numbers: 123-456-7890, (987) 654-3210
- Hashtags: #cleantext #regex #python
- Mentions: @johndoe @janesmith
- Numbers: 123 45.67 -8.9
- Special characters: $%^&*()_+{}|:"<>?
- Emojis: 😀 🙂 👍 💯
"""
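For comparison after attempting the blanks yourself, here is one possible set of patterns. This is a sketch under stated assumptions and not the exercise's reference solution: the URL pattern only covers the `http`, `https`, and `ftp` schemes, the phone pattern only covers the two formats in the sample, "special characters" is taken to mean ASCII punctuation as listed in `string.punctuation`, and the emoji ranges are a rough subset of the Unicode emoji blocks.

```python
import re
import string

def remove_urls(text):
    # Scheme, "://", then everything up to whitespace or a comma.
    return re.sub(r"(?:https?|ftp)://[^\s,]+", "", text)

def remove_phone_numbers(text):
    # Covers 123-456-7890 and (987) 654-3210; other formats are not handled.
    return re.sub(r"\(?\d{3}\)?[-\s]?\d{3}-\d{4}", "", text)

def remove_numbers(text):
    # Optional minus sign, digits, optional decimal part.
    return re.sub(r"-?\d+(?:\.\d+)?", "", text)

def remove_special_characters(text):
    # Assumption: "special" means ASCII punctuation (this includes "_").
    return re.sub("[" + re.escape(string.punctuation) + "]", "", text)

def remove_emojis(text):
    # Assumption: a rough subset of emoji code point ranges, not a full list.
    return re.sub(r"[\U0001F300-\U0001FAFF\u2600-\u27BF]", "", text)

print(remove_special_characters('a$%^&*()_+{}|:"<>?b'))  # ab
```

Order matters when these are chained: `remove_numbers` would also eat the digits of a phone number, so phone numbers should be removed first, and removing punctuation before URLs or emails would break the patterns that rely on `://` or `@`.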