# Topic 02: String Operations

## Overview
Strings are among Python's most frequently used data types. This notebook covers string creation, indexing, the built-in string methods, formatting techniques, encodings, and basic regular expressions.

### What You'll Learn:
- String creation and representation
- String methods and operations
- String formatting techniques
- String slicing and indexing
- Regular expressions basics

---

## 1. String Creation and Types

Python supports various ways to create strings:

In [None]:
# Different ways to create strings
single_quotes = 'Hello, World!'
double_quotes = "Hello, World!"
triple_single = '''This is a
multi-line string
using triple single quotes'''
triple_double = """This is a
multi-line string
using triple double quotes"""

print(f"Single quotes: {single_quotes}")
print(f"Double quotes: {double_quotes}")
print(f"Are they equal? {single_quotes == double_quotes}")
print(f"\nTriple single quotes:\n{triple_single}")
print(f"\nTriple double quotes:\n{triple_double}")

In [None]:
# Special string types
raw_string = r"Raw string: \n \t \\ no escape sequences"
byte_string = b"Byte string"
unicode_string = "Unicode: ñ, é, 中文, العربية, 🐍, 🚀"
formatted_string = f"Formatted string with variable: {len(unicode_string)}"

print(f"Raw string: {raw_string}")
print(f"Byte string: {byte_string} (type: {type(byte_string)})")
print(f"Unicode string: {unicode_string}")
print(f"Formatted string: {formatted_string}")

# Escape sequences
escape_examples = "Line 1\nLine 2\tTabbed\\Backslash\"Quote"
print(f"\nEscape sequences:\n{escape_examples}")

## 2. String Indexing and Slicing

Strings are sequences, so they support indexing and slicing:

In [None]:
# String indexing
text = "Python Programming"
print(f"Original string: '{text}'")
print(f"Length: {len(text)}")
print(f"\nIndexing:")
print(f"First character (index 0): '{text[0]}'")
print(f"Last character (index -1): '{text[-1]}'")
print(f"Second character: '{text[1]}'")
print(f"Second from end: '{text[-2]}'")

# Display index positions
print(f"\nIndex positions:")
for i, char in enumerate(text):
    print(f"Index {i:2d}: '{char}'")

In [None]:
# String slicing [start:end:step]
text = "Python Programming"
print(f"Original: '{text}'")
print(f"\nSlicing examples:")
print(f"text[:6] (first 6): '{text[:6]}'")
print(f"text[7:] (from index 7): '{text[7:]}'")
print(f"text[7:18] (index 7 to 17): '{text[7:18]}'")
print(f"text[-11:] (last 11): '{text[-11:]}'")
print(f"text[::2] (every 2nd): '{text[::2]}'")
print(f"text[::-1] (reverse): '{text[::-1]}'")
print(f"text[1::3] (start at 1, every 3rd): '{text[1::3]}'")
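
Slices and single-index lookups treat out-of-range positions differently, which the examples above do not show. The following is a minimal sketch; the helper name `safe_char` is hypothetical and exists only for illustration.

```python
text = "Python Programming"

# Slices clamp out-of-range bounds instead of raising.
clamped = text[7:100]      # behaves like text[7:]
empty = text[50:60]        # a start past the end gives an empty string

def safe_char(s, i, default=""):
    """Hypothetical helper: return s[i], or default when i is out of range."""
    try:
        return s[i]        # indexing past the end raises IndexError
    except IndexError:
        return default

# A slice object gives a reusable, named range.
first_word = slice(0, 6)

print(repr(clamped))
print(repr(empty))
print(repr(safe_char(text, 99, default="?")))
print(repr(text[first_word]))
```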

## 3. String Methods - Case Operations

These methods change or test the letter case of a string. Each one returns a new string or a boolean and leaves the original unchanged:

In [None]:
# Case manipulation methods
text = "  Hello Python World  "
print(f"Original: '{text}'")
print(f"\nCase operations:")
print(f"upper(): '{text.upper()}'")
print(f"lower(): '{text.lower()}'")
print(f"title(): '{text.title()}'")
print(f"capitalize(): '{text.capitalize()}'")
print(f"swapcase(): '{text.swapcase()}'")

# Case checking methods
test_strings = ['HELLO', 'hello', 'Hello', 'Hello World', '123', 'Hello123']
print(f"\nCase checking:")
for s in test_strings:
    print(f"'{s}': upper={s.isupper()}, lower={s.islower()}, title={s.istitle()}")
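
For case-insensitive comparison, `lower()` is not always enough, because some characters only fold correctly with `casefold()`. A short sketch, where `same_text` is a hypothetical helper name:

```python
def same_text(a, b):
    """Hypothetical helper: caseless comparison using casefold()."""
    return a.casefold() == b.casefold()

# The German sharp s stays as-is under lower() but folds to "ss" under casefold().
lower_equal = "Straße".lower() == "STRASSE".lower()
folded_equal = same_text("Straße", "STRASSE")

print(f"lower() comparison:    {lower_equal}")
print(f"casefold() comparison: {folded_equal}")
```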

## 4. String Methods - Whitespace and Cleaning

Methods for cleaning and processing strings:

In [None]:
# Whitespace methods
messy_text = "  \t  Hello Python World  \n  "
print(f"Original: {repr(messy_text)}")
print(f"\nWhitespace methods:")
print(f"strip(): {repr(messy_text.strip())}")
print(f"lstrip(): {repr(messy_text.lstrip())}")
print(f"rstrip(): {repr(messy_text.rstrip())}")

# Custom character stripping
text_with_chars = "...***Hello World***..."
print(f"\nCustom stripping:")
print(f"Original: '{text_with_chars}'")
print(f"strip('.*'): '{text_with_chars.strip('.*')}'")
print(f"lstrip('.*'): '{text_with_chars.lstrip('.*')}'")
print(f"rstrip('.*'): '{text_with_chars.rstrip('.*')}'")
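
A common pitfall with custom stripping is that the argument is a set of characters, not a prefix or suffix. The sketch below shows the difference; `remove_prefix` is a hypothetical helper (on Python 3.9 and later, the built-in `str.removeprefix()` does the same job).

```python
filename = "tests_setup.py"

# lstrip() keeps removing leading characters while they are in {'t', 'e', 's', '_'},
# so it also eats the start of "setup".
over_stripped = filename.lstrip("tests_")

def remove_prefix(s, prefix):
    """Hypothetical helper: drop prefix once if present."""
    return s[len(prefix):] if prefix and s.startswith(prefix) else s

exact = remove_prefix(filename, "tests_")

print(over_stripped)  # up.py
print(exact)          # setup.py
```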

## 5. String Methods - Search and Replace

Finding and replacing content in strings:

In [None]:
# Search methods
text = "Python is awesome. Python is powerful. Python is versatile."
search_term = "Python"

print(f"Text: {text}")
print(f"Search term: '{search_term}'")
print(f"\nSearch methods:")
print(f"find('{search_term}'): {text.find(search_term)}")
print(f"rfind('{search_term}'): {text.rfind(search_term)}")
print(f"index('{search_term}'): {text.index(search_term)}")
print(f"count('{search_term}'): {text.count(search_term)}")

# Boolean search methods
print(f"\nBoolean search:")
print(f"startswith('Python'): {text.startswith('Python')}")
print(f"endswith('versatile.'): {text.endswith('versatile.')}")
print(f"'awesome' in text: {'awesome' in text}")
print(f"'java' in text: {'java' in text}")
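
`find()` and `index()` return the same position on success but differ when the substring is missing. The sketch below illustrates that difference and one way to collect every match position; `find_all` is a hypothetical helper, not a built-in.

```python
text = "Python is awesome. Python is powerful."

# find() signals "not found" with -1; index() raises ValueError instead.
missing_pos = text.find("Java")

try:
    text.index("Java")
    index_raised = False
except ValueError:
    index_raised = True

# Pitfall: -1 is a valid index, so an unchecked find() result used in a slice
# silently returns the last character rather than failing.
wrong_slice = text[missing_pos:]

def find_all(haystack, needle):
    """Hypothetical helper: start positions of every non-overlapping match."""
    if not needle:
        return []
    positions = []
    start = haystack.find(needle)
    while start != -1:
        positions.append(start)
        start = haystack.find(needle, start + len(needle))
    return positions

print(missing_pos, index_raised, repr(wrong_slice))
print(find_all(text, "Python"))
```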

In [None]:
# Replace methods
text = "I love Java. Java is great. Java programming is fun."
print(f"Original: {text}")
print(f"\nReplace methods:")
print(f"replace('Java', 'Python'): {text.replace('Java', 'Python')}")
print(f"replace('Java', 'Python', 2): {text.replace('Java', 'Python', 2)}")

# Multiple replacements
def multiple_replace(text, replacements):
    """Replace multiple substrings"""
    for old, new in replacements.items():
        text = text.replace(old, new)
    return text

replacements = {'Java': 'Python', 'great': 'awesome', 'fun': 'exciting'}
result = multiple_replace(text, replacements)
print(f"\nMultiple replacements: {result}")
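
Chained `replace()` calls are order-dependent: a later replacement can rewrite the output of an earlier one. One possible alternative is a single pass with `re.sub`. This is a sketch under the assumption that all keys are non-empty literal strings; `replace_once` is a hypothetical name.

```python
import re

def replace_once(text, replacements):
    """Hypothetical single-pass variant: each position is replaced at most once."""
    if not replacements:
        return text
    # Longest keys first so overlapping keys prefer the longer match.
    keys = sorted(replacements, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: replacements[m.group(0)], text)

swap = {"cat": "dog", "dog": "cat"}
sentence = "cat chases dog"

# Chained replace: the second step undoes part of the first.
chained = sentence
for old, new in swap.items():
    chained = chained.replace(old, new)

single_pass = replace_once(sentence, swap)

print(chained)      # cat chases cat
print(single_pass)  # dog chases cat
```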

## 6. String Methods - Split and Join

Breaking strings apart and putting them back together:

In [None]:
# Split methods
sentence = "Python is a powerful programming language"
csv_data = "apple,banana,cherry,date,elderberry"
multiline = "Line 1\nLine 2\nLine 3\nLine 4"

print(f"Sentence: {sentence}")
print(f"split(): {sentence.split()}")
print(f"split(' ', 2): {sentence.split(' ', 2)}")

print(f"\nCSV data: {csv_data}")
print(f"split(','): {csv_data.split(',')}")

print(f"\nMultiline: {repr(multiline)}")
print(f"splitlines(): {multiline.splitlines()}")

# Partition methods
email = "user@example.com"
print(f"\nEmail: {email}")
print(f"partition('@'): {email.partition('@')}")
print(f"rpartition('.'): {email.rpartition('.')}")
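
`split()` with no argument and `split(' ')` behave differently on repeated whitespace, and `partition()` always returns three items even when the separator is missing. A short illustration with made-up sample strings:

```python
spaced = "  a  b   c "

# No argument: runs of whitespace collapse and the ends are trimmed.
no_arg = spaced.split()
# Explicit separator: every single space is a boundary, so empty strings appear.
with_space = spaced.split(" ")

# rsplit() with maxsplit splits from the right, handy for file extensions.
stem, ext = "archive.tar.gz".rsplit(".", 1)

# partition() never raises: a missing separator yields two empty strings.
missing = "no-at-sign".partition("@")

print(no_arg)
print(with_space)
print(stem, ext)
print(missing)
```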

In [None]:
# Join methods
words = ['Python', 'is', 'awesome']
fruits = ['apple', 'banana', 'cherry']
numbers = [1, 2, 3, 4, 5]

print(f"Words list: {words}")
print(f"' '.join(words): '{' '.join(words)}'")
print(f"'-'.join(words): '{'-'.join(words)}'")
print(f"''.join(words): '{''.join(words)}'")

print(f"\nFruits: {fruits}")
print(f"', '.join(fruits): '{', '.join(fruits)}'")
print(f"' and '.join(fruits): '{' and '.join(fruits)}'")

# Join numbers (join() only accepts strings, so convert first)
print(f"\nNumbers: {numbers}")
number_strings = [str(n) for n in numbers]
print(f"'-'.join(number_strings): '{'-'.join(number_strings)}'")
print(f"'-'.join(map(str, numbers)): '{'-'.join(map(str, numbers))}'")

## 7. String Validation Methods

Methods to check string content and characteristics:

In [None]:
# String validation methods
test_strings = [
    '123',           # digits
    'abc',           # alpha
    'abc123',        # alphanumeric
    'ABC',           # uppercase
    'abc',           # lowercase
    'Hello World',   # title case
    '   ',           # whitespace
    'hello_world',   # identifier
    '123abc',        # mixed
    '',              # empty
]

print(f"String validation methods:")
print(f"{'String':<12} {'isdigit':<8} {'isalpha':<8} {'isalnum':<8} {'isspace':<8} {'isidentifier':<12}")
print("-" * 60)

for s in test_strings:
    display_s = repr(s) if len(s) <= 10 else repr(s[:7] + '...')  # keep within the 12-character column
    print(f"{display_s:<12} {str(s.isdigit()):<8} {str(s.isalpha()):<8} {str(s.isalnum()):<8} {str(s.isspace()):<8} {str(s.isidentifier()):<12}")

In [None]:
# More validation methods
test_cases = {
    'isprintable': ['hello', 'hello\n', 'hello\tworld', '🐍'],
    'isdecimal': ['123', '123.45', '½', '²'],
    'isnumeric': ['123', '½', '²', 'Ⅴ', 'hello'],
    'isascii': ['hello', 'café', '🐍', 'naïve']
}

for method, strings in test_cases.items():
    print(f"\n{method}() examples:")
    for s in strings:
        result = getattr(s, method)()
        print(f"  {repr(s):<15} -> {result}")
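
The validation methods describe character classes, not whether a conversion will succeed. As a sketch, the hypothetical helper `to_int` below shows that `isdigit()` neither guarantees nor is required for `int()` to work:

```python
def to_int(text, default=None):
    """Hypothetical helper: parse text as a base-10 integer, or return default."""
    try:
        return int(text)
    except ValueError:
        return default

# '-7' and ' 15 ' fail isdigit() but parse fine; '²' passes isdigit() but does not parse.
samples = ["42", "-7", " 15 ", "²", "3.0", ""]
for s in samples:
    print(f"{s!r:<8} isdigit={s.isdigit()!s:<6} to_int={to_int(s)}")
```

When the goal is a number, attempting the conversion and catching `ValueError` is usually more reliable than pre-checking with an `is*()` method.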

## 8. String Formatting - Multiple Techniques

Python offers several ways to format strings:

In [None]:
# Variables for formatting examples
name = "Alice"
age = 25
height = 5.75
balance = 1234.56
is_student = True

print("String Formatting Techniques:")
print("=" * 40)

# 1. f-strings (Python 3.6+) - Recommended
print("\n1. f-strings (recommended):")
print(f"Name: {name}")
print(f"Age: {age} years old")
print(f"Height: {height} feet")
print(f"Balance: ${balance:.2f}")
print(f"Student status: {is_student}")
print(f"Summary: {name} is {age} years old and {'is' if is_student else 'is not'} a student")

In [None]:
# 2. .format() method
print("\n2. .format() method:")
print("Name: {}".format(name))
print("Name: {0}, Age: {1}".format(name, age))
print("Age: {1}, Name: {0}".format(name, age))  # Different order
print("Name: {name}, Age: {age}".format(name=name, age=age))
print("Balance: ${:.2f}".format(balance))
print("Height: {:.1f} feet".format(height))

In [None]:
# 3. % formatting (old style, but still useful)
print("\n3. % formatting (old style):")
print("Name: %s" % name)
print("Name: %s, Age: %d" % (name, age))
print("Balance: $%.2f" % balance)
print("Height: %.1f feet" % height)
print("Student: %s" % is_student)

## 9. Advanced String Formatting

Detailed formatting options and alignment:

In [None]:
# Advanced f-string formatting
name = "Bob"
score = 95.67
count = 1234
percentage = 0.875

print("Advanced f-string formatting:")
print(f"\nAlignment and width:")
print(f"Left aligned: '{name:<20}'")
print(f"Right aligned: '{name:>20}'")
print(f"Center aligned: '{name:^20}'")
print(f"Center with fill: '{name:*^20}'")

print(f"\nNumber formatting:")
print(f"Score with 2 decimals: {score:.2f}")
print(f"Count with thousands separator: {count:,}")
print(f"Percentage: {percentage:.1%}")
print(f"Scientific notation: {count:.2e}")
print(f"Binary: {count:b}")
print(f"Hexadecimal: {count:x}")
print(f"Octal: {count:o}")

In [None]:
# Padding and zero-filling
numbers = [5, 42, 123, 1000]
print("Number padding examples:")
for num in numbers:
    print(f"Number: {num:5d} | Zero-padded: {num:05d} | Right-aligned: {num:>8}")

# Date and time formatting
from datetime import datetime
now = datetime.now()

print(f"\nDate and time formatting:")
print(f"Default: {now}")
print(f"Date only: {now:%Y-%m-%d}")
print(f"Time only: {now:%H:%M:%S}")
print(f"Full format: {now:%A, %B %d, %Y at %I:%M %p}")
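
Width and precision do not have to be hard-coded: a format spec can contain nested replacement fields. The sketch below uses invented sample data to compute a column width at run time, and also shows the `!r` conversion.

```python
rows = [("Widget", 4.5), ("Gadget", 12.25), ("Thingamajig", 100.0)]

# Nested fields: {name_width} and {precision} are filled in before formatting.
name_width = max(len(name) for name, _ in rows)
precision = 2

lines = [f"{name:<{name_width}} | {price:>8.{precision}f}" for name, price in rows]
for line in lines:
    print(line)

# !r applies repr() before formatting, which makes stray whitespace visible.
label = " padded "
debug_line = f"label={label!r}"
print(debug_line)
```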

## 10. String Encoding and Decoding

Working with different text encodings:

In [None]:
# String encoding and decoding
text = "Hello, 世界! 🌍"
print(f"Original text: {text}")
print(f"Type: {type(text)}")

# Encode to bytes
utf8_bytes = text.encode('utf-8')
ascii_bytes = text.encode('ascii', errors='ignore')  # Ignore non-ASCII
latin1_bytes = text.encode('latin-1', errors='replace')  # Replace with ?

print(f"\nEncoded to bytes:")
print(f"UTF-8: {utf8_bytes}")
print(f"ASCII (ignored): {ascii_bytes}")
print(f"Latin-1 (replaced): {latin1_bytes}")

# Decode back to string
decoded_utf8 = utf8_bytes.decode('utf-8')
decoded_ascii = ascii_bytes.decode('ascii')

print(f"\nDecoded back:")
print(f"From UTF-8: {decoded_utf8}")
print(f"From ASCII: {decoded_ascii}")
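
Decoding with the wrong codec is a frequent source of bugs: it either raises or silently produces wrong characters. A minimal sketch with a made-up sample string:

```python
text = "café 🌍"
utf8_bytes = text.encode("utf-8")

# Character count and byte count differ for non-ASCII text.
char_count, byte_count = len(text), len(utf8_bytes)

# A strict decode with the wrong codec raises UnicodeDecodeError.
try:
    utf8_bytes.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# latin-1 maps every byte to some character, so it never fails,
# but the result is mojibake rather than the original text.
mojibake = utf8_bytes.decode("latin-1")

# errors="replace" substitutes U+FFFD for each undecodable byte.
lossy = utf8_bytes.decode("ascii", errors="replace")

print(char_count, byte_count)
print(ascii_failed)
print(repr(mojibake))
print(repr(lossy))
```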

## 11. Regular Expressions Basics

Introduction to pattern matching with regex:

In [None]:
import re

# Basic regex patterns
text = "My phone number is 123-456-7890 and my email is user@example.com"
print(f"Text: {text}")

# Find phone number
phone_pattern = r'\d{3}-\d{3}-\d{4}'
phone_match = re.search(phone_pattern, text)
if phone_match:
    print(f"Phone found: {phone_match.group()}")

# Find email
email_pattern = r'\w+@\w+\.\w+'
email_match = re.search(email_pattern, text)
if email_match:
    print(f"Email found: {email_match.group()}")

# Find all numbers
numbers = re.findall(r'\d+', text)
print(f"All numbers: {numbers}")

In [None]:
# Practical regex examples
test_strings = [
    "Valid email: user@domain.com",
    "Invalid email: userdomaincom",
    "Phone: (555) 123-4567",
    "Phone: 555.123.4567",
    "Phone: 5551234567",
]

# Email validation pattern
email_pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'

# Phone validation pattern (flexible)
phone_pattern = r'\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}'

print("Pattern matching examples:")
for text in test_strings:
    print(f"\nText: {text}")
    
    email_found = re.search(email_pattern, text)
    phone_found = re.search(phone_pattern, text)
    
    if email_found:
        print(f"  Email found: {email_found.group()}")
    if phone_found:
        print(f"  Phone found: {phone_found.group()}")
    if not email_found and not phone_found:
        print("  No patterns found")
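
`re.search()` finds a pattern anywhere in the text, which suits extraction. For validating a whole input, `fullmatch()` with a compiled pattern and named groups is one option. The pattern below is a simplified assumption (10-digit, US-style numbers), and `normalize_phone` is a hypothetical helper.

```python
import re

# Named groups make the captured parts self-describing.
PHONE_RE = re.compile(
    r"\(?(?P<area>\d{3})\)?[-.\s]?(?P<prefix>\d{3})[-.\s]?(?P<line>\d{4})"
)

def normalize_phone(raw):
    """Hypothetical helper: return 'AAA-PPP-LLLL', or None if raw does not match."""
    match = PHONE_RE.fullmatch(raw.strip())  # the entire string must match
    if match is None:
        return None
    return "{area}-{prefix}-{line}".format(**match.groupdict())

for raw in ["(555) 123-4567", "555.123.4567", "5551234567", "555-1234"]:
    print(f"{raw!r:<18} -> {normalize_phone(raw)}")
```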

## 12. String Performance and Best Practices

Tips for efficient string operations:

In [None]:
import time

# String concatenation performance comparison
def test_concatenation(n=1000):
    """Compare different string concatenation methods"""
    
    # Method 1: += operator (inefficient for many operations)
    # perf_counter() is a monotonic, high-resolution clock meant for timing
    start = time.perf_counter()
    result1 = ""
    for i in range(n):
        result1 += f"item{i} "
    time1 = time.perf_counter() - start
    
    # Method 2: join() method (efficient)
    start = time.perf_counter()
    items = [f"item{i} " for i in range(n)]
    result2 = "".join(items)
    time2 = time.perf_counter() - start
    
    # Method 3: f-string with join on a generator expression (most readable)
    start = time.perf_counter()
    result3 = " ".join(f"item{i}" for i in range(n)) + " "
    time3 = time.perf_counter() - start
    
    print(f"Concatenation performance test ({n} items):")
    print(f"  += operator: {time1:.4f} seconds")
    print(f"  join() method: {time2:.4f} seconds")
    print(f"  f-string + join: {time3:.4f} seconds")
    print(f"  Results equal: {result1 == result2 == result3}")

test_concatenation(1000)
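
A single timed run is noisy, and the gap between `+=` and `join()` varies by interpreter and input size. The standard-library `timeit` module repeats the measurement; the sketch below is one way to use it, with `concat_plus` and `concat_join` as hypothetical function names.

```python
import timeit

def concat_plus(n):
    """Build the string with repeated += concatenation."""
    result = ""
    for i in range(n):
        result += f"item{i} "
    return result

def concat_join(n):
    """Build the same string with a single join() call."""
    return "".join(f"item{i} " for i in range(n))

n = 1000
# repeat() runs several timing rounds; the minimum is the least noisy estimate.
plus_time = min(timeit.repeat(lambda: concat_plus(n), number=50, repeat=3))
join_time = min(timeit.repeat(lambda: concat_join(n), number=50, repeat=3))

print(f"+=  : {plus_time:.4f} s for 50 runs")
print(f"join: {join_time:.4f} s for 50 runs")
```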

In [None]:
# String best practices
print("String Best Practices:")
print("=" * 30)

# 1. Use f-strings for readability
name, age = "Alice", 25
# Good
good_format = f"Hello, {name}! You are {age} years old."
# Less readable
old_format = "Hello, {}! You are {} years old.".format(name, age)
print(f"1. F-strings are more readable:")
print(f"   Good: {good_format}")
print(f"   Okay: {old_format}")

# 2. Use join() for multiple concatenations
words = ['Python', 'is', 'awesome', 'and', 'powerful']
# Good
good_join = " ".join(words)
# Inefficient
bad_concat = ""
for word in words:
    bad_concat += word + " "
bad_concat = bad_concat.strip()

print(f"\n2. Use join() for multiple strings:")
print(f"   Good: {good_join}")
print(f"   Works but inefficient: {bad_concat}")

# 3. Use raw strings for regex patterns
import re
# Good - raw string
pattern_good = r'\d+\.\d+'
# Bad - need to escape backslashes
pattern_bad = '\\d+\\.\\d+'
print(f"\n3. Raw strings for regex:")
print(f"   Good: {repr(pattern_good)}")
print(f"   Bad: {repr(pattern_bad)}")
print(f"   Both work: {pattern_good == pattern_bad}")

## 13. Practice Exercises

Let's practice string operations:

In [None]:
# Exercise 1: Text analyzer
def analyze_text(text):
    """Analyze text and return statistics"""
    words = text.split()
    sentences = text.split('.')
    
    stats = {
        'characters': len(text),
        'characters_no_spaces': len(text.replace(' ', '')),
        'words': len(words),
        'sentences': len([s for s in sentences if s.strip()]),
        'avg_word_length': sum(len(word.strip('.,!?;:')) for word in words) / len(words) if words else 0,
        'longest_word': max(words, key=len) if words else '',
        'most_common_char': max((c for c in text.lower() if c.isalpha()), key=text.lower().count, default='')  # letters only, so spaces do not win
    }
    return stats

sample_text = "Python is a powerful programming language. It is easy to learn and versatile."
analysis = analyze_text(sample_text)

print(f"Text Analysis for: '{sample_text}'")
print("-" * 50)
for key, value in analysis.items():
    if isinstance(value, float):
        print(f"{key.replace('_', ' ').title()}: {value:.2f}")
    else:
        print(f"{key.replace('_', ' ').title()}: {value}")

In [None]:
# Exercise 2: String validator
def validate_input(text, rules):
    """Validate text against multiple rules"""
    results = {}
    
    for rule_name, rule_func in rules.items():
        try:
            results[rule_name] = rule_func(text)
        except Exception as e:
            results[rule_name] = f"Error: {e}"
    
    return results

# Define validation rules
validation_rules = {
    'not_empty': lambda x: len(x.strip()) > 0,
    'min_length': lambda x: len(x) >= 3,
    'max_length': lambda x: len(x) <= 50,
    'has_letter': lambda x: any(c.isalpha() for c in x),
    'has_digit': lambda x: any(c.isdigit() for c in x),
    'no_special_chars': lambda x: x.replace(' ', '').isalnum(),
    'starts_with_letter': lambda x: x[0].isalpha() if x else False,
}

# Test different inputs
test_inputs = [
    "Hello123",
    "Hi",
    "   ",
    "123456",
    "Hello@World",
    "9StartWithNumber"
]

print("Input Validation Results:")
print("=" * 60)
for test_input in test_inputs:
    print(f"\nInput: '{test_input}'")
    results = validate_input(test_input, validation_rules)
    for rule, result in results.items():
        status = "✓" if result is True else "✗"  # an "Error: ..." string must not count as a pass
        print(f"  {status} {rule.replace('_', ' ').title()}: {result}")

In [None]:
# Exercise 3: Text formatter
def format_text_table(data, headers):
    """Format data as a text table"""
    # Calculate column widths
    col_widths = []
    for i, header in enumerate(headers):
        max_width = len(header)
        for row in data:
            max_width = max(max_width, len(str(row[i])))
        col_widths.append(max_width + 2)  # Add padding
    
    # Create table
    separator = '+' + '+'.join('-' * width for width in col_widths) + '+'
    
    # Header
    table = [separator]
    header_row = '|' + '|'.join(f' {header:<{col_widths[i]-1}}' for i, header in enumerate(headers)) + '|'
    table.append(header_row)
    table.append(separator)
    
    # Data rows
    for row in data:
        data_row = '|' + '|'.join(f' {str(row[i]):<{col_widths[i]-1}}' for i in range(len(row))) + '|'
        table.append(data_row)
    
    table.append(separator)
    return '\n'.join(table)

# Test the table formatter
student_data = [
    ['Alice', 20, 'Computer Science', 3.8],
    ['Bob', 19, 'Mathematics', 3.6],
    ['Charlie', 21, 'Physics', 3.9],
    ['Diana', 22, 'Chemistry', 3.7]
]

headers = ['Name', 'Age', 'Major', 'GPA']
formatted_table = format_text_table(student_data, headers)

print("Formatted Student Table:")
print(formatted_table)

## Summary

In this notebook, you learned about:

✅ **String Creation**: Different ways to create and represent strings  
✅ **Indexing & Slicing**: Accessing parts of strings  
✅ **String Methods**: Case, whitespace, search, replace, split, join  
✅ **String Validation**: Methods to check string content  
✅ **String Formatting**: f-strings, .format(), % formatting  
✅ **Encoding/Decoding**: Working with different text encodings  
✅ **Regular Expressions**: Basic pattern matching  
✅ **Performance**: Best practices for string operations  

### Key Takeaways:
1. Strings are immutable - operations return new strings
2. Use f-strings for readable string formatting
3. Use join() for efficient string concatenation
4. Raw strings (r"") are useful for regex patterns
5. Python 3 strings are Unicode; convert to and from bytes explicitly with encode() and decode()
6. Many string methods return boolean values for validation
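
The first takeaway, immutability, can be checked directly. A minimal sketch:

```python
greeting = "hello"

# Methods return a new string; the original object is unchanged.
shouted = greeting.upper()

# Item assignment is not supported on str and raises TypeError.
try:
    greeting[0] = "J"
    assignment_failed = False
except TypeError:
    assignment_failed = True

# To "change" a character, build a new string from slices.
jello = "J" + greeting[1:]

print(greeting, shouted, jello, assignment_failed)
```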

### Next Topic: 03_numbers_and_math.ipynb
Learn about numeric types, mathematical operations, and the math module.