# Lab 1: Text Basics & Regular Expressions (Enhanced)

## Introduction
Welcome to the first lab of the NLP series! In this lab, we cover a fundamental building block of Natural Language Processing: **text manipulation**.

### Objectives
1.  Master Python string operations.
2.  Understand and use Regular Expressions (`re`).
3.  Visualize word frequency distributions with a bar chart.

## Part 1: Python String Operations

In [None]:
text = "   Hello, NLP World! This is a GREAT day to learn.   "
print(f"Original: '{text}'")
print(f"Stripped: '{text.strip()}'")
print(f"Lower: '{text.strip().lower()}'")
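Beyond `strip()` and `lower()`, a few other built-in string methods come up constantly when cleaning text. The cell below is a minimal sketch that reuses the same sample string; the variable names (`cleaned`, `tokens`, `no_punct`, `rejoined`) are chosen here for illustration only.

```python
# Same sample string as in the cell above.
text = "   Hello, NLP World! This is a GREAT day to learn.   "

cleaned = text.strip().lower()

# split() with no arguments splits on any run of whitespace.
tokens = cleaned.split()
print(tokens)

# replace() substitutes every occurrence of a substring.
# Chaining it is fine for a handful of characters; regex scales better.
no_punct = cleaned.replace(",", "").replace("!", "").replace(".", "")
print(no_punct)

# join() is the counterpart of split(): it glues tokens back together.
rejoined = "-".join(no_punct.split())
print(rejoined)

# Membership and prefix checks return booleans.
print("nlp" in cleaned)
print(cleaned.startswith("hello"))
```

Note that `split()` leaves punctuation attached to tokens (for example `'hello,'`), which is one reason regular expressions are introduced next.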

## Part 2: Interactive Regex Tester
Let's make this interactive! Use the widgets below to test your regex patterns in real time.

In [None]:
%pip install ipywidgets

In [None]:
import re
import ipywidgets as widgets
from IPython.display import display

def test_regex(pattern, text):
    try:
        matches = re.findall(pattern, text)
        print(f"Found {len(matches)} matches: {matches}")
    except re.error as e:
        print(f"Invalid Regex: {e}")

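# Raw strings (r'...') keep backslashes literal, so '\d' is not treated as an invalid escape sequence.
pattern_widget = widgets.Text(value=r'\w+', description='Regex Pattern:', placeholder=r'e.g., \d+')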
text_widget = widgets.Textarea(value='Contact us at test@example.com or 123-456-7890.', description='Test Text:')

# Re-runs test_regex whenever either widget changes; display() renders it explicitly.
display(widgets.interactive(test_regex, pattern=pattern_widget, text=text_widget))
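If the widgets are not available in your environment, the same ideas can be tried in a plain cell. The sketch below runs two patterns against the default test text from the widget. Both patterns are deliberately simple assumptions for this lab (real email addresses and phone formats vary far more), and the names `email_pattern`, `phone_pattern`, and `masked` are illustrative.

```python
import re

text = "Contact us at test@example.com or 123-456-7890."

# Simplified email pattern: local part, '@', then a dotted domain.
# The (?:...) group is non-capturing, so findall returns the whole match.
email_pattern = r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"

# Phone numbers in the exact form ddd-ddd-dddd only.
phone_pattern = r"\d{3}-\d{3}-\d{4}"

emails = re.findall(email_pattern, text)
phones = re.findall(phone_pattern, text)
print(f"Emails: {emails}")
print(f"Phones: {phones}")

# re.sub replaces every match; here it masks the phone number.
masked = re.sub(phone_pattern, "<PHONE>", text)
print(masked)
```

Try pasting either pattern into the tester above and editing the text to see which variations stop matching.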

## Part 3: Visualizing Word Frequency
Let's visualize the most common words in a sample text.

In [None]:
%pip install matplotlib

In [None]:
import re  # re-imported so this cell also runs on its own
import matplotlib.pyplot as plt
from collections import Counter

sample_text = """
Natural language processing (NLP) is a subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language, in particular how to program computers to process and analyze large amounts of natural language data.
"""

words = re.findall(r'\w+', sample_text.lower())
counts = Counter(words)

common = counts.most_common(10)
labels, values = zip(*common)

plt.figure(figsize=(10, 5))
plt.bar(labels, values)
plt.title("Top 10 Words in NLP Text")
plt.xlabel("Word")
plt.ylabel("Count")
plt.show()
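In a chart like this, function words such as "and", "of", and "to" tend to rank alongside the content words. One common follow-up is to drop such stop words before counting. The sketch below assumes a small hand-picked `STOP_WORDS` set and a hypothetical helper `top_words`; neither comes from a library, and a real project would likely use a fuller list.

```python
import re
from collections import Counter

# Hypothetical, hand-picked stop-word list for illustration only.
STOP_WORDS = {"a", "and", "how", "in", "is", "of", "the", "to", "with"}

def top_words(text, n=10, stop_words=STOP_WORDS):
    """Return the n most common lowercase tokens that are not stop words."""
    tokens = re.findall(r"\w+", text.lower())
    kept = [t for t in tokens if t not in stop_words]
    return Counter(kept).most_common(n)

demo = "Computers process language, and language data helps computers analyze language."
print(top_words(demo, n=2))
```

Passing `sample_text` to `top_words` and plotting the result with the same `plt.bar` call gives a chart focused on content words.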