# String
---
---

## What Are Strings?

A string is a sequence of characters (e.g., letters, digits, symbols) stored in memory, often used to represent text. In Python, strings are implemented as a sequence of Unicode characters, making them versatile for handling various languages and symbols.

### Key characteristics:

- Sequence: Strings are ordered, so characters can be accessed by index (e.g., s[0] for the first character).
- Contiguous Memory: In Python, strings are stored as a contiguous block of memory (like arrays), enabling O(1) access to characters by index.
- Length: The number of characters in a string (accessed via len(s)).
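
A minimal sketch of these characteristics, using an arbitrary sample string chosen for illustration. It shows that indexing and `len` work on characters (code points) rather than bytes:

```python
s = "hi, 世界"  # sample text mixing ASCII and CJK characters

first = s[0]      # first character
last = s[-1]      # negative indices count from the end
length = len(s)   # number of characters, not bytes

print(first, last, length)        # h 界 6
print(len(s.encode("utf-8")))     # 10: the UTF-8 byte length differs
```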

### Immutability in Python

In Python, strings are immutable: once a string is created, its contents cannot be changed in place. Any change requires creating a new string.

- **Why Immutable?**

    - Safety: Prevents accidental modifications, making strings reliable for keys in dictionaries or other immutable-dependent structures.
    - Memory Efficiency: Immutable strings can be reused (interned) by Python, saving memory for identical strings.
    - Thread Safety: Immutability simplifies concurrent programming, as strings can’t change unexpectedly.
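
As a rough illustration of the first two points, the sketch below uses strings as dictionary keys and calls `sys.intern` to share one object between equal strings. The sample words and the `data_key` value are made up for the example.

```python
import sys

# Strings are hashable, so they can serve as dictionary keys.
counts = {}
for word in ["apple", "pear", "apple"]:
    counts[word] = counts.get(word, 0) + 1
print(counts)  # {'apple': 2, 'pear': 1}

# A mutable list cannot be a key.
try:
    {["a", "b"]: 1}
except TypeError as exc:
    print("TypeError:", exc)

# sys.intern returns a single shared object for equal strings.
a = sys.intern("".join(["data", "_", "key"]))
b = sys.intern("".join(["data", "_", "key"]))
print(a == b, a is b)  # True True
```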

- What Immutability Means:

    - Operations like concatenation, slicing, or replacing characters create new strings rather than modifying the original.
    - **Example**

In [2]:
s = "hello"
# s[0] = "H"  # Error: TypeError: 'str' object does not support item assignment
s = "H" + s[1:]  # Creates new string: "Hello"
print(s)  # Output: Hello

Hello


    - Here, s[0] = "H" fails because strings are immutable. Instead, we create a new string.

- Memory Implications:

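    - Conceptually, each modification (e.g., concatenation) allocates a new contiguous block of memory for the new string.
    - Frequent modifications (e.g., in a loop) can be inefficient due to repeated allocation and copying. CPython can sometimes extend a string in place for s += t, but that optimization is implementation- and version-dependent, so it should not be relied on.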
    - Example

In [4]:
s = ""
for i in range(1000):
    s += "a"  # Builds a new string each iteration: O(n²) total copying in the worst case
print(s)

aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

    - Better approach: Use a list for modifications, then join:

In [6]:
chars = []
for i in range(1000):
    chars.append("a")  # O(1) per append (amortized)
s = "".join(chars)  # O(n) to join
s

'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
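
The same list-then-join pattern also covers edits in the middle of a string. The helper below, `mask_vowels`, is a hypothetical name used only for illustration:

```python
def mask_vowels(s):
    """Return a copy of s with every vowel replaced by '*'."""
    chars = list(s)                  # O(n): mutable copy of the characters
    for i, ch in enumerate(chars):
        if ch.lower() in "aeiou":
            chars[i] = "*"           # O(1): lists support item assignment
    return "".join(chars)            # O(n): build the final string once

print(mask_vowels("Hello World"))  # H*ll* W*rld
```

The original string is never modified. All edits happen on the list, and only one new string is created at the end.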

- Python-Specific Details:

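    - Strings are objects of the str class and hold Unicode code points. CPython 3.3+ does not store them as UTF-8: each string uses a fixed width of 1, 2, or 4 bytes per character, chosen by the widest character it contains (PEP 393).
    - Accessing a character (s[i]) is O(1) because storage is contiguous and fixed-width, like arrays.
    - Immutability means operations like replace or upper return new strings rather than changing the original.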


- Connection to DSA:

    - Immutability affects algorithm design. For example, string manipulations often require extra space (O(n) for a new string of length n).
    - Algorithms like substring search or pattern matching (e.g., KMP) treat strings as read-only sequences.
    - In DSA problems, strings are often processed as arrays of characters, leveraging array-like indexing.

In [7]:
s = "hello"
print(s[0])  # O(1): Output: h
s = s.upper()  # Creates new string: HELLO
print(s)  # Output: HELLO
original = "hello"
id_before = id(original)
s = original + " world"  # Builds a new string; original is untouched
print(id(s) != id_before)  # True: s is a different object

h
HELLO
True
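
To make the "strings as arrays of characters" point concrete, here is a small sketch that maps letters to indices of a fixed-size count array. It assumes the input contains only lowercase letters 'a' to 'z'. The function name `first_unique_index` is illustrative.

```python
def first_unique_index(s):
    """Return the index of the first non-repeating letter, or -1.

    Assumes s contains only lowercase letters 'a'..'z'.
    """
    counts = [0] * 26                        # one slot per letter
    for ch in s:
        counts[ord(ch) - ord("a")] += 1      # map a letter to an array index
    for i in range(len(s)):                  # read-only pass using s[i]
        if counts[ord(s[i]) - ord("a")] == 1:
            return i
    return -1

print(first_unique_index("swiss"))  # 1 ('w' is the first letter that occurs once)
print(first_unique_index("aabb"))   # -1
```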


---
---
## Task 2: Common String Operations

Strings support a variety of operations critical for DSA problems, such as concatenation, slicing, and more. Below are common operations, their Python implementations, and complexities.

### 1. Concatenation

- Description: Combining two or more strings into one.

- Python: Use + or join().

- Example:

In [1]:
s1 = "hello"
s2 = "world"
s3 = s1 + " " + s2  # New string: "hello world"
print(s3)  # Output: hello world
# Better for multiple strings:
s4 = "".join([s1, " ", s2])  # Same result

hello world


- **Time Complexity:**

    - **+:** O(n), where n is the total length of the resulting string (due to copying into a new memory block).

    - **join():** O(n), where n is the total length of all strings. More efficient for multiple concatenations.

    - Looping with **+** (e.g., **s += "a"** in a loop) is O(n²) for n iterations, as each creates a new string.


- **Space Complexity:** O(n) auxiliary for the new string.

- **DSA Note:** Avoid repeated concatenation in loops; use join() or lists for efficiency.
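
The O(n²) figure can be checked with simple arithmetic. The sketch below counts characters written under the assumption that every `+=` copies the whole intermediate string. This is a worst-case model, and a particular interpreter may do better.

```python
def chars_copied_by_repeated_concat(n):
    """Count characters written when appending one character n times,
    assuming each += rebuilds the entire string."""
    total = 0
    length = 0
    for _ in range(n):
        length += 1
        total += length   # the new string of this length is written in full
    return total

n = 1000
print(chars_copied_by_repeated_concat(n))  # 500500, i.e. n * (n + 1) / 2
print(n)                                    # join writes each character once: 1000
```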

### 2. Slicing

- Description: Extracting a substring using start:end:step indices.

- Python: s[start:end:step].

- Example:

In [2]:
s = "hello world"
print(s[0:5])  # Output: hello
print(s[::-1])  # Output: dlrow olleh (reverse)

hello
dlrow olleh


- **Time Complexity:** O(k), where k is the length of the slice (due to copying characters to a new string).

- **Space Complexity:** O(k) auxiliary for the new substring.

- **DSA Note:** Slicing is common in problems like palindrome checks or substring searches.
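
As an illustration of the palindrome case, the sketch below compares a slicing-based check with a two-pointer version that avoids the O(n) copy. Both function names are made up for this example.

```python
def is_palindrome_slice(s):
    return s == s[::-1]           # O(n) time, O(n) extra space for the reversed copy

def is_palindrome_two_pointers(s):
    left, right = 0, len(s) - 1
    while left < right:
        if s[left] != s[right]:   # O(1) character access
            return False
        left += 1
        right -= 1
    return True                   # O(n) time, O(1) extra space

print(is_palindrome_slice("racecar"))         # True
print(is_palindrome_two_pointers("racecar"))  # True
print(is_palindrome_two_pointers("hello"))    # False
```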

### 3. Length

- Description: Get the number of characters.
- Python: len(s).
- Example:

In [4]:
s = "hello"
print(len(s))  # Output: 5

5


- Time Complexity: O(1). Python stores the length internally.
- Space Complexity: O(1) auxiliary.
- DSA Note: Used in all string algorithms to determine bounds.

### 4. Character Access

- Description: Access a character at a specific index.
- Python: s[i].
- Example:



In [7]:
s = "hello"
print(s[2])  # Output: l

l


- Time Complexity: O(1). Contiguous memory allows direct address calculation.
- Space Complexity: O(1) auxiliary.
- DSA Note: Like array access, critical for algorithms like pattern matching.

### 5. Replace

- Description: Replace occurrences of a substring with another.
- Python: s.replace(old, new).
- Example:

In [9]:
s = "hello world"
print(s.replace("world", "Python"))  # Output: hello Python

hello Python


- Time Complexity: O(n) for scanning and creating a new string (n is string length).
- Space Complexity: O(n) auxiliary for the new string.
- DSA Note: Useful in text processing but costly for frequent changes due to immutability.
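
A short sketch of two related tools: the optional count argument of `replace`, and `str.translate` for several single-character substitutions in one pass. The sample string is arbitrary.

```python
s = "a-b-c-d"
print(s.replace("-", "+"))      # a+b+c+d
print(s.replace("-", "+", 2))   # a+b+c-d (only the first two occurrences)
print(s)                        # a-b-c-d (the original is unchanged)

# Several single-character substitutions in a single pass
table = str.maketrans({"-": " ", "a": "A"})
print(s.translate(table))       # A b c d
```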

### 6. Splitting and Joining

- Description: Split a string into a list of substrings; join a list into a string.
- Python: s.split(delimiter) and "delimiter".join(list).
- Example:

In [11]:
s = "a,b,c"
words = s.split(",")  # ['a', 'b', 'c']
new_s = "-".join(words)  # 'a-b-c'

- Time Complexity:
    - split: O(n) to scan and create substrings.
    - join: O(n) for total length of strings.


- Space Complexity: O(n) auxiliary for the list or new string.
- DSA Note: Common in parsing or tokenizing input for algorithms.
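
A typical tokenizing use, sketched with a hypothetical `reverse_words` helper. It also shows how `split()` with no argument differs from splitting on an explicit delimiter:

```python
def reverse_words(sentence):
    words = sentence.split()           # no argument: split on runs of whitespace
    return " ".join(reversed(words))   # one O(n) join at the end

print(reverse_words("  the sky   is blue "))  # blue is sky the
print("a,,b".split(","))                      # ['a', '', 'b'] (empty field is kept)
```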

---
---

## Task 3: Substring Search Basics
Substring search involves finding a smaller string (the pattern) within a larger string (the text) or determining if it exists. This is a core DSA problem with applications in text processing, search engines, and bioinformatics.

### Key Concepts

- Problem: Given a text T (length n) and pattern P (length m), find the starting index of P in T (or all occurrences) or return -1 if not found.
- Naive Approach: Check each position in the text to see if the pattern matches.
- Advanced Algorithms: KMP, Rabin-Karp, or Boyer-Moore improve efficiency for large texts.
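
For reference, Python's built-in `str.find` follows the same contract (the starting index, or -1 if absent). The sketch below uses it as a baseline. `find_all` is an illustrative helper that assumes a non-empty pattern.

```python
text = "hello world"
print(text.find("world"))   # 6
print(text.find("xyz"))     # -1
print("world" in text)      # True

def find_all(text, pattern):
    """Return every starting index of pattern in text (overlaps included)."""
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start)
        start = text.find(pattern, start + 1)  # resume one past the last hit
    return positions

print(find_all("abababa", "aba"))  # [0, 2, 4]
```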

### Naive Substring Search

- How It Works:

    - Iterate through each possible starting position in the text (0 to n-m).
    - For each position, check if the pattern matches by comparing characters.


### Python Implementation:

In [15]:
def naive_substring_search(text, pattern):
    n, m = len(text), len(pattern)
    if m > n:
        return -1
    for i in range(n - m + 1):
        j = 0
        while j < m and text[i + j] == pattern[j]:
            j += 1
        if j == m:
            return i
    return -1


text = "hello world"
pattern = 'world'
print(naive_substring_search(text, pattern))

6


- Time Complexity: O(n * m).
    - For each of the n-m+1 positions, compare up to m characters.
    - Worst case: Nearly every position matches almost the whole pattern before failing (e.g., text="aaaaaaaaab", pattern="aaab").

- Space Complexity: O(1) auxiliary. Only uses a few variables.

- Why Inefficient?: Repeated comparisons and no reuse of prior matches.
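
To see the worst case in numbers, here is an instrumented variant of the naive search that also counts character comparisons. The function name and the test input are chosen for illustration.

```python
def naive_search_with_count(text, pattern):
    """Naive search returning (index, number of character comparisons)."""
    n, m = len(text), len(pattern)
    comparisons = 0
    for i in range(n - m + 1):
        j = 0
        while j < m:
            comparisons += 1
            if text[i + j] != pattern[j]:
                break
            j += 1
        if j == m:
            return i, comparisons
    return -1, comparisons

# n = 10, m = 4: each of the 7 start positions costs 4 comparisons
print(naive_search_with_count("a" * 9 + "b", "aaab"))  # (6, 28)
```

Here the count equals (n - m + 1) * m, which is the O(n * m) bound in action.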

### Better Approach: KMP Algorithm (Introduction)
The Knuth-Morris-Pratt (KMP) algorithm improves efficiency by avoiding redundant comparisons using a prefix table (or failure function) to skip positions.

- How It Works:

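    - Preprocess the pattern to build a table that stores, for each prefix of the pattern, the length of its longest proper prefix that is also a suffix. On a mismatch, this tells the search how far the pattern can shift without re-examining text characters.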
    - Use the table to slide the pattern efficiently during the search.


- Python Implementation (Simplified):

In [16]:
def build_kmp_table(pattern):
    m = len(pattern)
    table = [0] * m
    j = 0
    for i in range(1, m):
        while j > 0 and pattern[i] != pattern[j]:
            j = table[j - 1]
        if pattern[i] == pattern[j]:
            j += 1
        table[i] = j
    return table

def kmp_search(text, pattern):
    n, m = len(text), len(pattern)
    if m == 0:
        return 0
    table = build_kmp_table(pattern)
    j = 0  # Pattern index
    for i in range(n):
        while j > 0 and text[i] != pattern[j]:
            j = table[j - 1]
        if text[i] == pattern[j]:
            j += 1
        if j == m:
            return i - m + 1  # Found at index i-m+1
    return -1

# Example
text = "hello world"
pattern = "world"
print(kmp_search(text, pattern))  # Output: 6

6


- Time Complexity:

    - Table construction: O(m).
    - Search: O(n).
    - Total: O(n + m), much better than O(n * m).

- Space Complexity: O(m) auxiliary for the prefix table.
- Why Efficient?: The table allows skipping redundant comparisons by leveraging pattern structure.
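
A worked example may help make the table concrete. The sketch below repeats the table construction so that it runs on its own, prints the table for a sample pattern, and extends the search to report every occurrence. `build_prefix_table` and `kmp_find_all` are illustrative names, and returning an empty list for an empty pattern is an assumption of this sketch.

```python
def build_prefix_table(pattern):
    """table[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of it."""
    table = [0] * len(pattern)
    j = 0
    for i in range(1, len(pattern)):
        while j > 0 and pattern[i] != pattern[j]:
            j = table[j - 1]
        if pattern[i] == pattern[j]:
            j += 1
        table[i] = j
    return table

def kmp_find_all(text, pattern):
    """Return all starting indices of pattern in text (overlaps included)."""
    if not pattern:
        return []
    table = build_prefix_table(pattern)
    matches = []
    j = 0
    for i, ch in enumerate(text):
        while j > 0 and ch != pattern[j]:
            j = table[j - 1]
        if ch == pattern[j]:
            j += 1
        if j == len(pattern):
            matches.append(i - j + 1)
            j = table[j - 1]   # fall back so overlapping matches are found
    return matches

print(build_prefix_table("ababaca"))   # [0, 0, 1, 2, 3, 0, 1]
print(kmp_find_all("abababa", "aba"))  # [0, 2, 4]
```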

### Connection to DSA and Arrays

- Strings as Arrays: Strings are treated like arrays of characters in substring search, with O(1) access to text[i] or pattern[j] due to contiguous memory.
- Sliding Window: Substring search is related to sliding window techniques (Task 3 from Day 2). The naive method slides a window of size m over the text; KMP optimizes this sliding.
- Immutability: String immutability ensures the text and pattern remain unchanged, simplifying algorithms but requiring new strings for modifications (e.g., preprocessing).
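
The sliding-window view can be illustrated with a Rabin-Karp style rolling hash, where the window's hash is updated in O(1) as it slides. This is a minimal sketch and not a tuned implementation. The `base` and `mod` defaults are arbitrary illustrative choices, and every hash match is verified by a direct comparison to rule out collisions.

```python
def rabin_karp(text, pattern, base=256, mod=1_000_000_007):
    """Return the first index of pattern in text, or -1."""
    n, m = len(text), len(pattern)
    if m == 0:
        return 0
    if m > n:
        return -1
    high = pow(base, m - 1, mod)   # weight of the window's leading character
    p_hash = w_hash = 0
    for k in range(m):             # hash the pattern and the first window
        p_hash = (p_hash * base + ord(pattern[k])) % mod
        w_hash = (w_hash * base + ord(text[k])) % mod
    for i in range(n - m + 1):
        # Compare characters only when the hashes agree
        if w_hash == p_hash and text[i:i + m] == pattern:
            return i
        if i < n - m:
            # Slide the window: drop text[i], append text[i + m]
            w_hash = ((w_hash - ord(text[i]) * high) * base + ord(text[i + m])) % mod
    return -1

print(rabin_karp("hello world", "world"))  # 6
print(rabin_karp("hello world", "xyz"))    # -1
```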