# Topic 02: String Operations

## Overview
Strings are one of Python's most frequently used data types. This notebook covers string creation, indexing and slicing, the core string methods, formatting, encoding, and basic regular expressions.

### What You'll Learn:
- String creation and representation
- String methods and operations
- String formatting techniques
- String slicing and indexing
- Regular expressions basics

---

## 1. String Creation and Types

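String literals can be written with single, double, or triple quotes. Single and double quotes produce identical strings, and triple quotes let a literal span multiple lines: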

In [1]:
# Different ways to create strings
single_quotes = 'Hello, World!'
double_quotes = "Hello, World!"
triple_single = '''This is a
multi-line string
using triple single quotes'''
triple_double = """This is a
multi-line string
using triple double quotes"""

print(f"Single quotes: {single_quotes}")
print(f"Double quotes: {double_quotes}")
print(f"Are they equal? {single_quotes == double_quotes}")
print(f"\nTriple single quotes:\n{triple_single}")
print(f"\nTriple double quotes:\n{triple_double}")

Single quotes: Hello, World!
Double quotes: Hello, World!
Are they equal? True

Triple single quotes:
This is a
multi-line string
using triple single quotes

Triple double quotes:
This is a
multi-line string
using triple double quotes


In [2]:
# Literal prefixes: raw (r), bytes (b), and formatted (f). Note that b"..." creates a bytes object, not a str
raw_string = r"Raw string: \n \t \\ no escape sequences"
byte_string = b"Byte string"
unicode_string = "Unicode: ñ, é, 中文, العربية, 🐍, 🚀"
formatted_string = f"Formatted string with variable: {len(unicode_string)}"

print(f"Raw string: {raw_string}")
print(f"Byte string: {byte_string} (type: {type(byte_string)})")
print(f"Unicode string: {unicode_string}")
print(f"Formatted string: {formatted_string}")

# Escape sequences
escape_examples = "Line 1\nLine 2\tTabbed\\Backslash\"Quote"
print(f"\nEscape sequences:\n{escape_examples}")

Raw string: Raw string: \n \t \\ no escape sequences
Byte string: b'Byte string' (type: <class 'bytes'>)
Unicode string: Unicode: ñ, é, 中文, العربية, 🐍, 🚀
Formatted string: Formatted string with variable: 32

Escape sequences:
Line 1
Line 2	Tabbed\Backslash"Quote


## 2. String Indexing and Slicing

Strings are sequences of characters, so they support zero-based indexing and slicing. Negative indices count from the end:

In [3]:
# String indexing
text = "Python Programming"
print(f"Original string: '{text}'")
print(f"Length: {len(text)}")
print(f"\nIndexing:")
print(f"First character (index 0): '{text[0]}'")
print(f"Last character (index -1): '{text[-1]}'")
print(f"Second character: '{text[1]}'")
print(f"Second from end: '{text[-2]}'")

# Display index positions
print(f"\nIndex positions:")
for i, char in enumerate(text):
    print(f"Index {i:2d}: '{char}'")

Original string: 'Python Programming'
Length: 18

Indexing:
First character (index 0): 'P'
Last character (index -1): 'g'
Second character: 'y'
Second from end: 'n'

Index positions:
Index  0: 'P'
Index  1: 'y'
Index  2: 't'
Index  3: 'h'
Index  4: 'o'
Index  5: 'n'
Index  6: ' '
Index  7: 'P'
Index  8: 'r'
Index  9: 'o'
Index 10: 'g'
Index 11: 'r'
Index 12: 'a'
Index 13: 'm'
Index 14: 'm'
Index 15: 'i'
Index 16: 'n'
Index 17: 'g'


In [4]:
# String slicing [start:end:step]
text = "Python Programming"
print(f"Original: '{text}'")
print(f"\nSlicing examples:")
print(f"text[:6] (first 6): '{text[:6]}'")
print(f"text[7:] (from index 7): '{text[7:]}'")
print(f"text[7:18] (index 7 to 17): '{text[7:18]}'")
print(f"text[-11:] (last 11): '{text[-11:]}'")
print(f"text[::2] (every 2nd): '{text[::2]}'")
print(f"text[::-1] (reverse): '{text[::-1]}'")
print(f"text[1::3] (start at 1, every 3rd): '{text[1::3]}'")

Original: 'Python Programming'

Slicing examples:
text[:6] (first 6): 'Python'
text[7:] (from index 7): 'Programming'
text[7:18] (index 7 to 17): 'Programming'
text[-11:] (last 11): 'Programming'
text[::2] (every 2nd): 'Pto rgamn'
text[::-1] (reverse): 'gnimmargorP nohtyP'
text[1::3] (start at 1, every 3rd): 'yoPgmn'
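
Slicing is more forgiving than indexing when a position is out of range. The short sketch below reuses the `text` value from the cells above; the names `clamped`, `empty`, and `first_word` are illustrative only.

```python
text = "Python Programming"

# Indexing past the end raises IndexError...
try:
    text[100]
except IndexError as exc:
    index_error = str(exc)
    print(f"text[100] raised IndexError: {index_error}")

# ...but slice bounds are clamped to the valid range instead of raising.
clamped = text[7:100]
empty = text[100:]
print(repr(clamped))   # 'Programming'
print(repr(empty))     # ''

# A slice can also be stored in a slice() object and reused.
first_word = slice(0, 6)
print(text[first_word])  # Python
```

This is why `text[:n]` is a safe way to take "up to n characters" without checking the length first.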


## 3. String Methods - Case Operations

These methods change or test the case of a string. Each returns a new value and leaves the original string unchanged:

In [5]:
# Case manipulation methods
text = "  Hello Python World  "
print(f"Original: '{text}'")
print(f"\nCase operations:")
print(f"upper(): '{text.upper()}'")
print(f"lower(): '{text.lower()}'")
print(f"title(): '{text.title()}'")
# capitalize() uppercases only the first character (a space here) and lowercases the rest
print(f"capitalize(): '{text.capitalize()}'")
print(f"swapcase(): '{text.swapcase()}'")

# Case checking methods
test_strings = ['HELLO', 'hello', 'Hello', 'Hello World', '123', 'Hello123']
print(f"\nCase checking:")
for s in test_strings:
    print(f"'{s}': upper={s.isupper()}, lower={s.islower()}, title={s.istitle()}")

Original: '  Hello Python World  '

Case operations:
upper(): '  HELLO PYTHON WORLD  '
lower(): '  hello python world  '
title(): '  Hello Python World  '
capitalize(): '  hello python world  '
swapcase(): '  hELLO pYTHON wORLD  '

Case checking:
'HELLO': upper=True, lower=False, title=False
'hello': upper=False, lower=True, title=False
'Hello': upper=False, lower=False, title=True
'Hello World': upper=False, lower=False, title=True
'123': upper=False, lower=False, title=False
'Hello123': upper=False, lower=False, title=True
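
Two case-related pitfalls are worth a quick illustration: `lower()` is not always enough for case-insensitive comparison, and `title()` treats apostrophes as word boundaries. The sample strings below are assumptions chosen for the demonstration.

```python
a, b = "Straße", "STRASSE"

# lower() keeps the German sharp s, so the two strings still differ.
lower_equal = a.lower() == b.lower()
# casefold() applies more aggressive folding ('ß' becomes 'ss').
casefold_equal = a.casefold() == b.casefold()
print(f"lower() equal: {lower_equal}")        # False
print(f"casefold() equal: {casefold_equal}")  # True

# title() capitalizes the letter after an apostrophe as well.
titled = "they're here".title()
print(titled)  # They'Re Here
```

For caseless matching of user input, `casefold()` is generally the safer choice.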


## 4. String Methods - Whitespace and Cleaning

`strip()`, `lstrip()`, and `rstrip()` remove leading and/or trailing characters (whitespace by default):

In [6]:
# Whitespace methods
messy_text = "  \t  Hello Python World  \n  "
print(f"Original: {repr(messy_text)}")
print(f"\nWhitespace methods:")
print(f"strip(): {repr(messy_text.strip())}")
print(f"lstrip(): {repr(messy_text.lstrip())}")
print(f"rstrip(): {repr(messy_text.rstrip())}")

# Custom character stripping: the argument is a set of characters to remove, not a prefix or suffix
text_with_chars = "...***Hello World***..."
print(f"\nCustom stripping:")
print(f"Original: '{text_with_chars}'")
print(f"strip('.*'): '{text_with_chars.strip('.*')}'")
print(f"lstrip('.*'): '{text_with_chars.lstrip('.*')}'")
print(f"rstrip('.*'): '{text_with_chars.rstrip('.*')}'")

Original: '  \t  Hello Python World  \n  '

Whitespace methods:
strip(): 'Hello Python World'
lstrip(): 'Hello Python World  \n  '
rstrip(): '  \t  Hello Python World'

Custom stripping:
Original: '...***Hello World***...'
strip('.*'): 'Hello World'
lstrip('.*'): 'Hello World***...'
rstrip('.*'): '...***Hello World'
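
Because the argument to `strip()` is treated as a set of characters, it can remove more than intended when used to drop a fixed prefix or suffix. The sketch below contrasts it with `removeprefix()` and `removesuffix()`, which assume Python 3.9 or later; the file and host names are made up for illustration.

```python
filename = "report.txt"
host = "www.wikipedia.org"

# rstrip('.txt') strips any trailing '.', 't', or 'x', including the 't' of 'report'.
stripped_name = filename.rstrip(".txt")
# removesuffix() removes the exact suffix once, if present (Python 3.9+).
clean_name = filename.removesuffix(".txt")
print(stripped_name)  # repor
print(clean_name)     # report

# The same issue on the left side.
stripped_host = host.lstrip("w.")
clean_host = host.removeprefix("www.")
print(stripped_host)  # ikipedia.org
print(clean_host)     # wikipedia.org
```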


## 5. String Methods - Search and Replace

Finding and replacing content in strings:

In [7]:
# Search methods
text = "Python is awesome. Python is powerful. Python is versatile."
search_term = "Python"

print(f"Text: {text}")
print(f"Search term: '{search_term}'")
print(f"\nSearch methods:")
print(f"find('{search_term}'): {text.find(search_term)}")
print(f"rfind('{search_term}'): {text.rfind(search_term)}")
print(f"index('{search_term}'): {text.index(search_term)}")
print(f"count('{search_term}'): {text.count(search_term)}")

# Boolean search methods
print(f"\nBoolean search:")
print(f"startswith('Python'): {text.startswith('Python')}")
print(f"endswith('versatile.'): {text.endswith('versatile.')}")
print(f"'awesome' in text: {'awesome' in text}")
print(f"'java' in text: {'java' in text}")

Text: Python is awesome. Python is powerful. Python is versatile.
Search term: 'Python'

Search methods:
find('Python'): 0
rfind('Python'): 39
index('Python'): 0
count('Python'): 3

Boolean search:
startswith('Python'): True
endswith('versatile.'): True
'awesome' in text: True
'java' in text: False
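
`find()` and `index()` return the same position when the substring exists, but they differ when it does not. A minimal sketch, using a shortened sample string:

```python
text = "Python is awesome."

# find() signals "not found" with -1.
missing_pos = text.find("Java")
print(missing_pos)  # -1

# index() raises ValueError instead.
try:
    text.index("Java")
except ValueError as exc:
    index_message = str(exc)
    print(f"index() raised ValueError: {index_message}")

# Pitfall: a match at position 0 is falsy, so test membership with `in`.
found_at_start = text.find("Python")
print(bool(found_at_start))   # False, even though the text starts with "Python"
print("Python" in text)       # True
```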


In [8]:
# Replace methods
text = "I love Java. Java is great. Java programming is fun."
print(f"Original: {text}")
print(f"\nReplace methods:")
print(f"replace('Java', 'Python'): {text.replace('Java', 'Python')}")
print(f"replace('Java', 'Python', 2): {text.replace('Java', 'Python', 2)}")

# Multiple replacements
def multiple_replace(text, replacements):
    """Replace multiple substrings"""
    for old, new in replacements.items():
        text = text.replace(old, new)
    return text

replacements = {'Java': 'Python', 'great': 'awesome', 'fun': 'exciting'}
result = multiple_replace(text, replacements)
print(f"\nMultiple replacements: {result}")

Original: I love Java. Java is great. Java programming is fun.

Replace methods:
replace('Java', 'Python'): I love Python. Python is great. Python programming is fun.
replace('Java', 'Python', 2): I love Python. Python is great. Java programming is fun.

Multiple replacements: I love Python. Python is awesome. Python programming is exciting.
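
The loop-based `multiple_replace` above applies replacements one after another, so a later replacement can rewrite the output of an earlier one. One possible alternative, sketched here with a hypothetical helper named `multiple_replace_single_pass`, uses a single regular-expression pass so each part of the input is replaced at most once.

```python
import re

def multiple_replace_single_pass(text, replacements):
    """Replace all keys of `replacements` in one left-to-right pass."""
    # Longer keys first so that overlapping keys prefer the longest match.
    keys = sorted(replacements, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: replacements[m.group(0)], text)

swap = {"cat": "dog", "dog": "cat"}

# Sequential replace: 'cat' -> 'dog', then every 'dog' -> 'cat'.
sequential = "cat chases dog"
for old, new in swap.items():
    sequential = sequential.replace(old, new)
print(sequential)  # cat chases cat

single_pass = multiple_replace_single_pass("cat chases dog", swap)
print(single_pass)  # dog chases cat
```

For replacement tables whose keys and values do not overlap, the simpler loop gives the same result.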


## 6. String Methods - Split and Join

Breaking strings apart and putting them back together:

In [9]:
# Split methods
sentence = "Python is a powerful programming language"
csv_data = "apple,banana,cherry,date,elderberry"
multiline = "Line 1\nLine 2\nLine 3\nLine 4"

print(f"Sentence: {sentence}")
print(f"split(): {sentence.split()}")
print(f"split(' ', 2): {sentence.split(' ', 2)}")

print(f"\nCSV data: {csv_data}")
print(f"split(','): {csv_data.split(',')}")

print(f"\nMultiline: {repr(multiline)}")
print(f"splitlines(): {multiline.splitlines()}")

# Partition methods
email = "user@example.com"
print(f"\nEmail: {email}")
print(f"partition('@'): {email.partition('@')}")
print(f"rpartition('.'): {email.rpartition('.')}")

Sentence: Python is a powerful programming language
split(): ['Python', 'is', 'a', 'powerful', 'programming', 'language']
split(' ', 2): ['Python', 'is', 'a powerful programming language']

CSV data: apple,banana,cherry,date,elderberry
split(','): ['apple', 'banana', 'cherry', 'date', 'elderberry']

Multiline: 'Line 1\nLine 2\nLine 3\nLine 4'
splitlines(): ['Line 1', 'Line 2', 'Line 3', 'Line 4']

Email: user@example.com
partition('@'): ('user', '@', 'example.com')
rpartition('.'): ('user@example', '.', 'com')
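
`split()` with no argument and `split(' ')` behave differently on repeated or surrounding whitespace. The sketch below uses an assumed sample string to show the difference, along with `rsplit()` for splitting from the right.

```python
messy = "  Python   is  fun "

# No argument: split on runs of whitespace and drop empty strings.
by_whitespace = messy.split()
# Explicit ' ': every single space is a separator, so empty strings appear.
by_space = messy.split(" ")
print(by_whitespace)  # ['Python', 'is', 'fun']
print(by_space)       # ['', '', 'Python', '', '', 'is', '', 'fun', '']

# rsplit() with maxsplit splits from the right, handy for paths.
path_parts = "a/b/c.txt".rsplit("/", 1)
print(path_parts)     # ['a/b', 'c.txt']
```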


In [10]:
# Join methods
words = ['Python', 'is', 'awesome']
fruits = ['apple', 'banana', 'cherry']
numbers = [1, 2, 3, 4, 5]

print(f"Words list: {words}")
print(f"' '.join(words): '{' '.join(words)}'")
print(f"'-'.join(words): '{'-'.join(words)}'")
print(f"''.join(words): '{''.join(words)}'")

print(f"\nFruits: {fruits}")
print(f"', '.join(fruits): '{', '.join(fruits)}'")
print(f"' and '.join(fruits): '{' and '.join(fruits)}'")

# Join numbers (need to convert to strings first)
print(f"\nNumbers: {numbers}")
number_strings = [str(n) for n in numbers]  # equivalent to list(map(str, numbers))
print(f"'-'.join(map(str, numbers)): '{'-'.join(map(str, numbers))}'")

Words list: ['Python', 'is', 'awesome']
' '.join(words): 'Python is awesome'
'-'.join(words): 'Python-is-awesome'
''.join(words): 'Pythonisawesome'

Fruits: ['apple', 'banana', 'cherry']
', '.join(fruits): 'apple, banana, cherry'
' and '.join(fruits): 'apple and banana and cherry'

Numbers: [1, 2, 3, 4, 5]
'-'.join(map(str, numbers)): '1-2-3-4-5'


## 7. String Validation Methods

Methods to check string content and characteristics:

In [11]:
# String validation methods
test_strings = [
    '123',           # digits
    'abc',           # alpha
    'abc123',        # alphanumeric
    'ABC',           # uppercase
    'abc',           # lowercase
    'Hello World',   # title case
    '   ',           # whitespace
    'hello_world',   # identifier
    '123abc',        # mixed
    '',              # empty
]

print(f"String validation methods:")
print(f"{'String':<12} {'isdigit':<8} {'isalpha':<8} {'isalnum':<8} {'isspace':<8} {'isidentifier':<12}")
print("-" * 60)

for s in test_strings:
    display_s = repr(s) if len(s) < 8 else repr(s[:8] + '...')
    print(f"{display_s:<12} {str(s.isdigit()):<8} {str(s.isalpha()):<8} {str(s.isalnum()):<8} {str(s.isspace()):<8} {str(s.isidentifier()):<12}")

String validation methods:
String       isdigit  isalpha  isalnum  isspace  isidentifier
------------------------------------------------------------
'123'        True     False    True     False    False       
'abc'        False    True     True     False    True        
'abc123'     False    False    True     False    True        
'ABC'        False    True     True     False    True        
'abc'        False    True     True     False    True        
'Hello Wo...' False    False    False    False    False       
'   '        False    False    False    True     False       
'hello_wo...' False    False    False    False    True        
'123abc'     False    False    True     False    False       
''           False    False    False    False    False       


In [12]:
# More validation methods
test_cases = {
    'isprintable': ['hello', 'hello\n', 'hello\tworld', '🐍'],
    'isdecimal': ['123', '123.45', '½', '²'],
    'isnumeric': ['123', '½', '²', 'Ⅴ', 'hello'],
    'isascii': ['hello', 'café', '🐍', 'naïve']
}

for method, strings in test_cases.items():
    print(f"\n{method}() examples:")
    for s in strings:
        result = getattr(s, method)()
        print(f"  {repr(s):<15} -> {result}")


isprintable() examples:
  'hello'         -> True
  'hello\n'       -> False
  'hello\tworld'  -> False
  '🐍'             -> True

isdecimal() examples:
  '123'           -> True
  '123.45'        -> False
  '½'             -> False
  '²'             -> False

isnumeric() examples:
  '123'           -> True
  '½'             -> True
  '²'             -> True
  'Ⅴ'             -> True
  'hello'         -> False

isascii() examples:
  'hello'         -> True
  'café'          -> False
  '🐍'             -> False
  'naïve'         -> False
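
The `is*()` methods describe character classes, not whether a string can be converted to a number. As a rough sketch, the hypothetical helper `parse_int` below shows the usual alternative: attempt the conversion and handle the failure.

```python
def parse_int(s):
    """Return int(s), or None if s is not a valid integer literal."""
    try:
        return int(s)
    except ValueError:
        return None

# isdigit() accepts the superscript two, but int() does not.
print("²".isdigit(), parse_int("²"))    # True None
# isdigit() rejects a sign, but int() accepts it.
print("-5".isdigit(), parse_int("-5"))  # False -5
print(parse_int("123"))                 # 123
```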


## 8. String Formatting - Multiple Techniques

Python offers several ways to format strings:

In [13]:
# Variables for formatting examples
name = "Alice"
age = 25
height = 5.75
balance = 1234.56
is_student = True

print("String Formatting Techniques:")
print("=" * 40)

# 1. f-strings (Python 3.6+) - Recommended
print("\n1. f-strings (recommended):")
print(f"Name: {name}")
print(f"Age: {age} years old")
print(f"Height: {height} feet")
print(f"Balance: ${balance:.2f}")
print(f"Student status: {is_student}")
print(f"Summary: {name} is {age} years old and {'is' if is_student else 'is not'} a student")

String Formatting Techniques:
========================================

1. f-strings (recommended):
Name: Alice
Age: 25 years old
Height: 5.75 feet
Balance: $1234.56
Student status: True
Summary: Alice is 25 years old and is a student


In [14]:
# 2. .format() method
print("\n2. .format() method:")
print("Name: {}".format(name))
print("Name: {0}, Age: {1}".format(name, age))
print("Age: {1}, Name: {0}".format(name, age))  # Different order
print("Name: {name}, Age: {age}".format(name=name, age=age))
print("Balance: ${:.2f}".format(balance))
print("Height: {:.1f} feet".format(height))


2. .format() method:
Name: Alice
Name: Alice, Age: 25
Age: 25, Name: Alice
Name: Alice, Age: 25
Balance: $1234.56
Height: 5.8 feet


In [15]:
# 3. % formatting (old style, but still useful)
print("\n3. % formatting (old style):")
print("Name: %s" % name)
print("Name: %s, Age: %d" % (name, age))
print("Balance: $%.2f" % balance)
print("Height: %.1f feet" % height)
print("Student: %s" % is_student)


3. % formatting (old style):
Name: Alice
Name: Alice, Age: 25
Balance: $1234.56
Height: 5.8 feet
Student: True


## 9. Advanced String Formatting

Detailed formatting options and alignment:

In [16]:
# Advanced f-string formatting
name = "Bob"
score = 95.67
count = 1234
percentage = 0.875

print("Advanced f-string formatting:")
print(f"\nAlignment and width:")
print(f"Left aligned: '{name:<20}'")
print(f"Right aligned: '{name:>20}'")
print(f"Center aligned: '{name:^20}'")
print(f"Center with fill: '{name:*^20}'")

print(f"\nNumber formatting:")
print(f"Score with 2 decimals: {score:.2f}")
print(f"Count with comma separator: {count:,}")
print(f"Percentage: {percentage:.1%}")
print(f"Scientific notation: {count:.2e}")
print(f"Binary: {count:b}")
print(f"Hexadecimal: {count:x}")
print(f"Octal: {count:o}")

Advanced f-string formatting:

Alignment and width:
Left aligned: 'Bob                 '
Right aligned: '                 Bob'
Center aligned: '        Bob         '
Center with fill: '********Bob*********'

Number formatting:
Score with 2 decimals: 95.67
Count with comma separator: 1,234
Percentage: 87.5%
Scientific notation: 1.23e+03
Binary: 10011010010
Hexadecimal: 4d2
Octal: 2322
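
Format specifications can themselves contain replacement fields, which is useful when width or precision is only known at run time. The sketch below also shows the `=` debugging specifier (assumes Python 3.8 or later) and the `!r` conversion; all variable names are illustrative.

```python
value = 3.14159
width, precision = 10, 3
name = "Bob"

# Nested fields: width and precision come from variables.
padded = f"{value:{width}.{precision}f}"
print(repr(padded))   # '     3.142'

# '=' prints the expression text together with its value (Python 3.8+).
debug = f"{value=}"
print(debug)          # value=3.14159

# '+' forces a sign; ',' and '.2f' can be combined.
signed = f"{42:+d}"
grouped = f"{1234567.891:,.2f}"
print(signed)         # +42
print(grouped)        # 1,234,567.89

# !r applies repr() before formatting.
quoted = f"{name!r}"
print(quoted)         # 'Bob'
```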


In [17]:
# Padding and zero-filling
numbers = [5, 42, 123, 1000]
print("Number padding examples:")
for num in numbers:
    print(f"Number: {num:5d} | Zero-padded: {num:05d} | Right-aligned: {num:>8}")

# Date and time formatting
from datetime import datetime
now = datetime.now()

print(f"\nDate and time formatting:")
print(f"Default: {now}")
print(f"Date only: {now:%Y-%m-%d}")
print(f"Time only: {now:%H:%M:%S}")
print(f"Full format: {now:%A, %B %d, %Y at %I:%M %p}")

Number padding examples:
Number:     5 | Zero-padded: 00005 | Right-aligned:        5
Number:    42 | Zero-padded: 00042 | Right-aligned:       42
Number:   123 | Zero-padded: 00123 | Right-aligned:      123
Number:  1000 | Zero-padded: 01000 | Right-aligned:     1000

Date and time formatting:
Default: 2025-08-01 17:36:22.933830
Date only: 2025-08-01
Time only: 17:36:22
Full format: Friday, August 01, 2025 at 05:36 PM


## 10. String Encoding and Decoding

A `str` holds Unicode code points. `encode()` converts it to `bytes` in a given encoding, and `decode()` reverses that:

In [18]:
# String encoding and decoding
text = "Hello, 世界! 🌍"
print(f"Original text: {text}")
print(f"Type: {type(text)}")

# Encode to bytes
utf8_bytes = text.encode('utf-8')
ascii_bytes = text.encode('ascii', errors='ignore')  # Ignore non-ASCII
latin1_bytes = text.encode('latin-1', errors='replace')  # Replace with ?

print(f"\nEncoded to bytes:")
print(f"UTF-8: {utf8_bytes}")
print(f"ASCII (ignored): {ascii_bytes}")
print(f"Latin-1 (replaced): {latin1_bytes}")

# Decode back to string
decoded_utf8 = utf8_bytes.decode('utf-8')
decoded_ascii = ascii_bytes.decode('ascii')

print(f"\nDecoded back:")
print(f"From UTF-8: {decoded_utf8}")
print(f"From ASCII: {decoded_ascii}")

Original text: Hello, 世界! 🌍
Type: <class 'str'>

Encoded to bytes:
UTF-8: b'Hello, \xe4\xb8\x96\xe7\x95\x8c! \xf0\x9f\x8c\x8d'
ASCII (ignored): b'Hello, ! '
Latin-1 (replaced): b'Hello, ??! ?'

Decoded back:
From UTF-8: Hello, 世界! 🌍
From ASCII: Hello, ! 
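
Two details are easy to miss here: the character count and the byte count of a string differ for non-ASCII text, and encoding fails by default (`errors='strict'`) rather than silently dropping characters. A small sketch using the same sample text:

```python
text = "Hello, 世界! 🌍"

# len() counts code points for str and bytes for bytes.
char_count = len(text)
byte_count = len(text.encode("utf-8"))
print(char_count, byte_count)  # 12 19

# The default error handler is 'strict', which raises on unencodable characters.
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    failed_at = exc.start
    print(f"ASCII encoding failed at index {failed_at}")  # index 7

# Decoding with the wrong codec may not raise at all; it can yield garbled text.
mojibake = text.encode("utf-8").decode("latin-1")
print(len(mojibake))  # 19: one character per byte
```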


## 11. Regular Expressions Basics

Introduction to pattern matching with regex:

In [19]:
import re

# Basic regex patterns
text = "My phone number is 123-456-7890 and my email is user@example.com"
print(f"Text: {text}")

# Find phone number
phone_pattern = r'\d{3}-\d{3}-\d{4}'
phone_match = re.search(phone_pattern, text)
if phone_match:
    print(f"Phone found: {phone_match.group()}")

# Find email
email_pattern = r'\w+@\w+\.\w+'
email_match = re.search(email_pattern, text)
if email_match:
    print(f"Email found: {email_match.group()}")

# Find all numbers
numbers = re.findall(r'\d+', text)
print(f"All numbers: {numbers}")

Text: My phone number is 123-456-7890 and my email is user@example.com
Phone found: 123-456-7890
Email found: user@example.com
All numbers: ['123', '456', '7890']


In [20]:
# Practical regex examples
test_strings = [
    "Valid email: user@domain.com",
    "Invalid email: userdomaincom",
    "Phone: (555) 123-4567",
    "Phone: 555.123.4567",
    "Phone: 5551234567",
]

# Email validation pattern
email_pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'  # no '|' inside [...]: it would match a literal pipe

# Phone validation pattern (flexible)
phone_pattern = r'\(?\d{3}\)?[-.)\s]?\d{3}[-.)\s]?\d{4}'

print("Pattern matching examples:")
for text in test_strings:
    print(f"\nText: {text}")
    
    email_found = re.search(email_pattern, text)
    phone_found = re.search(phone_pattern, text)
    
    if email_found:
        print(f"  Email found: {email_found.group()}")
    if phone_found:
        print(f"  Phone found: {phone_found.group()}")
    if not email_found and not phone_found:
        print("  No patterns found")

Pattern matching examples:

Text: Valid email: user@domain.com
  Email found: user@domain.com

Text: Invalid email: userdomaincom
  No patterns found

Text: Phone: (555) 123-4567
  Phone found: (555) 123-4567

Text: Phone: 555.123.4567
  Phone found: 555.123.4567

Text: Phone: 5551234567
  Phone found: 5551234567
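
Beyond `re.search()` and `re.findall()`, a few other `re` features come up often: precompiled patterns, named groups, substitution, and whole-string matching. The following is a minimal sketch; the pattern and group names (`area`, `prefix`, `line`) are assumptions made for this example.

```python
import re

# Compile once, reuse many times; (?P<name>...) defines a named group.
phone_re = re.compile(r"(?P<area>\d{3})-(?P<prefix>\d{3})-(?P<line>\d{4})")

message = "Call 123-456-7890 today"
match = phone_re.search(message)
if match:
    area = match.group("area")
    parts = match.groupdict()
    print(area)    # 123
    print(parts)   # {'area': '123', 'prefix': '456', 'line': '7890'}

# sub() replaces matches; \g<line> refers back to the named group.
masked = phone_re.sub(r"XXX-XXX-\g<line>", message)
print(masked)      # Call XXX-XXX-7890 today

# fullmatch() succeeds only if the entire string matches the pattern.
is_exact = phone_re.fullmatch("123-456-7890") is not None
has_extra = phone_re.fullmatch("123-456-7890 ext") is not None
print(is_exact, has_extra)  # True False
```

For validation, `fullmatch()` is usually a better fit than `search()`, which accepts a match anywhere in the string.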


## 12. String Performance and Best Practices

Tips for efficient string operations:

In [21]:
import time

# String concatenation performance comparison
def test_concatenation(n=1000):
    """Compare different string concatenation methods"""
    
    # Method 1: += operator (may copy the string on each step; CPython can often optimize this in place, but that is not guaranteed)
    start = time.time()
    result1 = ""
    for i in range(n):
        result1 += f"item{i} "
    time1 = time.time() - start
    
    # Method 2: join() method (efficient)
    start = time.time()
    items = [f"item{i} " for i in range(n)]
    result2 = "".join(items)
    time2 = time.time() - start
    
    # Method 3: f-string with join (most readable)
    start = time.time()
    result3 = " ".join(f"item{i}" for i in range(n)) + " "
    time3 = time.time() - start
    
    print(f"Concatenation performance test ({n} items):")
    print(f"  += operator: {time1:.4f} seconds")
    print(f"  join() method: {time2:.4f} seconds")
    print(f"  f-string + join: {time3:.4f} seconds")
    print(f"  Results equal: {result1 == result2 == result3}")

test_concatenation(1000)

Concatenation performance test (1000 items):
  += operator: 0.0000 seconds
  join() method: 0.0010 seconds
  f-string + join: 0.0000 seconds
  Results equal: True
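
Timings of `0.0000` seconds suggest the workload is too small for a single `time.time()` measurement to resolve. One common alternative is the standard-library `timeit` module, which repeats the call many times. The sketch below is a rough comparison under that approach; the helper names `concat_plus` and `concat_join` are made up here, and the numbers will vary by machine and Python version.

```python
import timeit

def concat_plus(n):
    """Build the string with repeated += concatenation."""
    result = ""
    for i in range(n):
        result += f"item{i} "
    return result

def concat_join(n):
    """Build the same string with a single join() call."""
    return "".join(f"item{i} " for i in range(n))

n = 1000
runs = 200

# timeit.timeit() returns the total time in seconds for `runs` calls.
t_plus = timeit.timeit(lambda: concat_plus(n), number=runs)
t_join = timeit.timeit(lambda: concat_join(n), number=runs)

print(f"+= operator: {t_plus / runs * 1e6:.1f} microseconds per call")
print(f"join():      {t_join / runs * 1e6:.1f} microseconds per call")
print(f"Results equal: {concat_plus(n) == concat_join(n)}")
```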


In [22]:
# String best practices
print("String Best Practices:")
print("=" * 30)

# 1. Use f-strings for readability
name, age = "Alice", 25
# Good
good_format = f"Hello, {name}! You are {age} years old."
# Less readable
old_format = "Hello, {}! You are {} years old.".format(name, age)
print(f"1. F-strings are more readable:")
print(f"   Good: {good_format}")
print(f"   Okay: {old_format}")

# 2. Use join() for multiple concatenations
words = ['Python', 'is', 'awesome', 'and', 'powerful']
# Good
good_join = " ".join(words)
# Inefficient
bad_concat = ""
for word in words:
    bad_concat += word + " "
bad_concat = bad_concat.strip()

print(f"\n2. Use join() for multiple strings:")
print(f"   Good: {good_join}")
print(f"   Works but inefficient: {bad_concat}")

# 3. Use raw strings for regex patterns
import re
# Good - raw string
pattern_good = r'\d+\.\d+'
# Bad - need to escape backslashes
pattern_bad = '\\d+\\.\\d+'
print(f"\n3. Raw strings for regex:")
print(f"   Good: {repr(pattern_good)}")
print(f"   Bad: {repr(pattern_bad)}")
print(f"   Both work: {pattern_good == pattern_bad}")

String Best Practices:
==============================
1. F-strings are more readable:
   Good: Hello, Alice! You are 25 years old.
   Okay: Hello, Alice! You are 25 years old.

2. Use join() for multiple strings:
   Good: Python is awesome and powerful
   Works but inefficient: Python is awesome and powerful

3. Raw strings for regex:
   Good: '\\d+\\.\\d+'
   Bad: '\\d+\\.\\d+'
   Both work: True


## 13. Practice Exercises

Let's practice string operations:

In [23]:
# Exercise 1: Text analyzer
def analyze_text(text):
    """Analyze text and return statistics"""
    words = text.split()
    sentences = text.split('.')
    
    stats = {
        'characters': len(text),
        'characters_no_spaces': len(text.replace(' ', '')),
        'words': len(words),
        'sentences': len([s for s in sentences if s.strip()]),
        'avg_word_length': sum(len(word.strip('.,!?;:')) for word in words) / len(words) if words else 0,
        'longest_word': max(words, key=len) if words else '',
        'most_common_char': max(text.lower(), key=text.lower().count) if text else ''
    }
    return stats

sample_text = "Python is a powerful programming language. It is easy to learn and versatile."
analysis = analyze_text(sample_text)

print(f"Text Analysis for: '{sample_text}'")
print("-" * 50)
for key, value in analysis.items():
    if isinstance(value, float):
        print(f"{key.replace('_', ' ').title()}: {value:.2f}")
    else:
        print(f"{key.replace('_', ' ').title()}: {value}")

Text Analysis for: 'Python is a powerful programming language. It is easy to learn and versatile.'
--------------------------------------------------
Characters: 77
Characters No Spaces: 65
Words: 13
Sentences: 2
Avg Word Length: 4.85
Longest Word: programming
Most Common Char:  
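
In the output above, the most common character is a space, which is rarely what a text analysis wants. As one possible refinement, the hypothetical helper below counts letters only, using `collections.Counter` so the text is scanned once instead of calling `count()` for every character.

```python
from collections import Counter

def most_common_letter(text):
    """Return the most frequent alphabetic character (case-insensitive), or ''."""
    letters = [c for c in text.lower() if c.isalpha()]
    if not letters:
        return ""
    # most_common(1) returns a list with one (character, count) pair.
    return Counter(letters).most_common(1)[0][0]

sample_text = "Python is a powerful programming language. It is easy to learn and versatile."
top_letter = most_common_letter(sample_text)
print(f"Most common letter: {top_letter}")  # a
```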


In [24]:
# Exercise 2: String validator
def validate_input(text, rules):
    """Validate text against multiple rules"""
    results = {}
    
    for rule_name, rule_func in rules.items():
        try:
            results[rule_name] = rule_func(text)
        except Exception as e:
            results[rule_name] = f"Error: {e}"
    
    return results

# Define validation rules
validation_rules = {
    'not_empty': lambda x: len(x.strip()) > 0,
    'min_length': lambda x: len(x) >= 3,
    'max_length': lambda x: len(x) <= 50,
    'has_letter': lambda x: any(c.isalpha() for c in x),
    'has_digit': lambda x: any(c.isdigit() for c in x),
    'no_special_chars': lambda x: x.replace(' ', '').isalnum(),
    'starts_with_letter': lambda x: x[0].isalpha() if x else False,
}

# Test different inputs
test_inputs = [
    "Hello123",
    "Hi",
    "   ",
    "123456",
    "Hello@World",
    "9StartWithNumber"
]

print("Input Validation Results:")
print("=" * 60)
for test_input in test_inputs:
    print(f"\nInput: '{test_input}'")
    results = validate_input(test_input, validation_rules)
    for rule, result in results.items():
        status = "✓" if result is True else "✗"  # an "Error: ..." string is truthy, so compare with True explicitly
        print(f"  {status} {rule.replace('_', ' ').title()}: {result}")

Input Validation Results:
============================================================

Input: 'Hello123'
  ✓ Not Empty: True
  ✓ Min Length: True
  ✓ Max Length: True
  ✓ Has Letter: True
  ✓ Has Digit: True
  ✓ No Special Chars: True
  ✓ Starts With Letter: True

Input: 'Hi'
  ✓ Not Empty: True
  ✗ Min Length: False
  ✓ Max Length: True
  ✓ Has Letter: True
  ✗ Has Digit: False
  ✓ No Special Chars: True
  ✓ Starts With Letter: True

Input: '   '
  ✗ Not Empty: False
  ✓ Min Length: True
  ✓ Max Length: True
  ✗ Has Letter: False
  ✗ Has Digit: False
  ✗ No Special Chars: False
  ✗ Starts With Letter: False

Input: '123456'
  ✓ Not Empty: True
  ✓ Min Length: True
  ✓ Max Length: True
  ✗ Has Letter: False
  ✓ Has Digit: True
  ✓ No Special Chars: True
  ✗ Starts With Letter: False

Input: 'Hello@World'
  ✓ Not Empty: True
  ✓ Min Length: True
  ✓ Max Length: True
  ✓ Has Letter: True
  ✗ Has Digit: False
  ✗ No Special Chars: False
  ✓ Starts With Letter: True

Input: '9StartWithNumber'
  ✓ Not Empty: True
  ✓ Min Length: True
  ✓ Max Length: True
  ✓ Has Letter: True
  ✓ Has Digit: True
  ✓ No Special Chars: True
  ✗ Starts With Letter: False


In [25]:
# Exercise 3: Text formatter
def format_text_table(data, headers):
    """Format data as a text table"""
    # Calculate column widths
    col_widths = []
    for i, header in enumerate(headers):
        max_width = len(header)
        for row in data:
            max_width = max(max_width, len(str(row[i])))
        col_widths.append(max_width + 2)  # Add padding
    
    # Create table
    separator = '+' + '+'.join('-' * width for width in col_widths) + '+'
    
    # Header
    table = [separator]
    header_row = '|' + '|'.join(f' {header:<{col_widths[i]-1}}' for i, header in enumerate(headers)) + '|'
    table.append(header_row)
    table.append(separator)
    
    # Data rows
    for row in data:
        data_row = '|' + '|'.join(f' {str(row[i]):<{col_widths[i]-1}}' for i in range(len(row))) + '|'
        table.append(data_row)
    
    table.append(separator)
    return '\n'.join(table)

# Test the table formatter
student_data = [
    ['Alice', 20, 'Computer Science', 3.8],
    ['Bob', 19, 'Mathematics', 3.6],
    ['Charlie', 21, 'Physics', 3.9],
    ['Diana', 22, 'Chemistry', 3.7]
]

headers = ['Name', 'Age', 'Major', 'GPA']
formatted_table = format_text_table(student_data, headers)

print("Formatted Student Table:")
print(formatted_table)

Formatted Student Table:
+---------+-----+------------------+-----+
| Name    | Age | Major            | GPA |
+---------+-----+------------------+-----+
| Alice   | 20  | Computer Science | 3.8 |
| Bob     | 19  | Mathematics      | 3.6 |
| Charlie | 21  | Physics          | 3.9 |
| Diana   | 22  | Chemistry        | 3.7 |
+---------+-----+------------------+-----+


## Summary

In this notebook, you learned about:

✅ **String Creation**: Different ways to create and represent strings  
✅ **Indexing & Slicing**: Accessing parts of strings  
✅ **String Methods**: Case, whitespace, search, replace, split, join  
✅ **String Validation**: Methods to check string content  
✅ **String Formatting**: f-strings, .format(), % formatting  
✅ **Encoding/Decoding**: Working with different text encodings  
✅ **Regular Expressions**: Basic pattern matching  
✅ **Performance**: Best practices for string operations  

### Key Takeaways:
1. Strings are immutable - operations return new strings
2. Use f-strings for readable string formatting
3. Use join() for efficient string concatenation
4. Raw strings (r"") are useful for regex patterns
5. Python 3 strings are Unicode; convert to and from bytes explicitly with encode() and decode()
6. Many string methods return boolean values for validation
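
The first takeaway, immutability, can be checked directly. A brief sketch with illustrative variable names:

```python
s = "python"

# Item assignment is not supported on str.
try:
    s[0] = "P"
except TypeError as exc:
    assignment_failed = True
    print(f"TypeError: {exc}")

# Build a new string instead; the original is left untouched.
capitalized = "P" + s[1:]
shouted = s.upper()
print(capitalized)  # Python
print(shouted)      # PYTHON
print(s)            # python
```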

### Next Topic: 03_numbers_and_math.ipynb
Learn about numeric types, mathematical operations, and the math module.