# 30. Substring with Concatenation of All Words

## Topic Alignment
- **Role Relevance**: Mirrors validating that a window contains all tokens from a feature bundle.
- **Scenario**: Useful for log scanning where every required keyword must appear exactly once in arbitrary order.

## Metadata Summary
- Source: [LeetCode - Substring with Concatenation of All Words](https://leetcode.com/problems/substring-with-concatenation-of-all-words/)
- Tags: `String`, `Sliding Window`, `Hash Table`
- Difficulty: Hard
- Recommended Priority: High

## Problem Statement
You are given a string `s` and an array of strings `words`. All the strings in `words` are of the same length. Return all starting indices of substrings in `s` that are a concatenation of each word in `words` exactly once in any order, without any intervening characters.
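
To make the definition concrete, here is a minimal sketch that checks a single candidate window by cutting it into word-sized chunks and comparing the chunks against `words` as multisets. The helper name `is_concatenation` is hypothetical and used only for illustration; it is not part of the solution further down.

```python
from collections import Counter
from typing import List


def is_concatenation(window: str, words: List[str]) -> bool:
    """Check whether `window` is some ordering of all `words` joined together."""
    word_len = len(words[0])
    if len(window) != word_len * len(words):
        return False
    # Cut the window into consecutive word-sized chunks.
    chunks = [window[i:i + word_len] for i in range(0, len(window), word_len)]
    # Order does not matter, multiplicity does: compare as multisets.
    return Counter(chunks) == Counter(words)


s = "barfoothefoobarman"
words = ["foo", "bar"]
print(is_concatenation(s[0:6], words))   # "barfoo" -> True
print(is_concatenation(s[3:9], words))   # "foothe" -> False
print(is_concatenation(s[9:15], words))  # "foobar" -> True
```

For this input the expected answer to the full problem is `[0, 9]`, the two starting indices whose windows pass the check.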

## Constraints
- `1 <= s.length <= 10^4`
- `1 <= words.length <= 5000`
- `1 <= words[i].length <= 30`
- `s` and `words[i]` consist of lowercase English letters

## Progressive Hints
- Hint 1: Each valid window has total length `len(words) * word_len`.
- Hint 2: Use a frequency map for `words` and scan `s` with multiple offsets modulo `word_len`.
- Hint 3: Maintain counts of words in the current window and shrink when any word appears too many times.
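
As a rough illustration of Hint 2, the sketch below lists the word-sized chunks seen from each starting offset. The function name `chunks_by_offset` is an assumption made for this example. Every possible chunk start in `s` falls into exactly one offset class, which is why scanning `word_len` offsets covers all candidates.

```python
from typing import List


def chunks_by_offset(s: str, word_len: int) -> List[List[str]]:
    """For each offset in [0, word_len), list the full word-sized chunks of s."""
    return [
        [s[i:i + word_len] for i in range(offset, len(s) - word_len + 1, word_len)]
        for offset in range(word_len)
    ]


for offset, chunks in enumerate(chunks_by_offset("barfoothefoo", 3)):
    print(offset, chunks)
# 0 ['bar', 'foo', 'the', 'foo']
# 1 ['arf', 'oot', 'hef']
# 2 ['rfo', 'oth', 'efo']
```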

## Solution Overview
Slide a window in increments of `word_len` for each possible starting offset. Use a hash map to count how many instances of each word appear in the window and adjust the window boundaries to maintain valid counts.

## Detailed Explanation
1. Build `required` as a frequency map of `words`. Let `word_len` be the length of each word and `window_len = word_len * len(words)`.
2. For each offset in `[0, word_len)`, use two pointers `left` and `right` that move in steps of `word_len`.
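3. Extract the word at `right`. If it is not in `required`, clear `seen`, reset `matched` to 0, and restart the window at `right + word_len`. Otherwise increment its count in `seen`, and increment `matched` when that count reaches the required count.
4. While the newly added word exceeds its required count, move `left` forward by `word_len`, decrementing the count of each word that leaves the window and decrementing `matched` if that word had exactly met its requirement.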
5. When `matched` equals the number of distinct words and the window size equals `window_len`, record `left` as a valid start.
6. Continue scanning all offsets.

## Complexity Trade-off Table
Here `n = len(s)`, `k = len(words)`, and `L` is the shared word length.

| Approach | Time Complexity | Space Complexity | Notes |
| --- | --- | --- | --- |
| Check every start index naively | O(n * k * L) | O(k * L) | Recounts all `k` chunks of the window at every index. |
| Sliding window by word length | O(n * L) | O(k * L) | Reuses counts between adjacent windows; effectively O(n) because `L <= 30`. |
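
For comparison, the naive row of the table could look roughly like the sketch below. The name `find_substring_naive` is hypothetical. It recounts every window from scratch, which makes it a convenient reference for cross-checking a faster implementation on small inputs.

```python
from collections import Counter
from typing import List


def find_substring_naive(s: str, words: List[str]) -> List[int]:
    """Baseline: recount the chunks of every candidate window from scratch."""
    if not s or not words:
        return []
    word_len = len(words[0])
    window_len = word_len * len(words)
    required = Counter(words)
    result: List[int] = []
    # An empty range (s shorter than one window) yields no results.
    for start in range(len(s) - window_len + 1):
        seen = Counter(
            s[i:i + word_len] for i in range(start, start + window_len, word_len)
        )
        if seen == required:
            result.append(start)
    return result


print(find_substring_naive("barfoothefoobarman", ["foo", "bar"]))  # [0, 9]
```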

```python
from collections import Counter
from typing import List


def findSubstring(s: str, words: List[str]) -> List[int]:
    """Return starting indices where words concatenate exactly once each."""
    if not s or not words:
        return []

    word_len = len(words[0])
    total_words = len(words)
    window_len = word_len * total_words
    required = Counter(words)
    result: List[int] = []

    # Scan once per alignment so every chunk boundary is covered.
    for offset in range(word_len):
        left = offset
        seen = Counter()
        matched = 0  # Number of distinct words whose count equals the requirement.

        for right in range(offset, len(s) - word_len + 1, word_len):
            word = s[right:right + word_len]
            if word in required:
                seen[word] += 1
                if seen[word] == required[word]:
                    matched += 1
                # Too many copies of `word`: shrink from the left until it fits.
                while seen[word] > required[word]:
                    left_word = s[left:left + word_len]
                    if seen[left_word] == required[left_word]:
                        matched -= 1
                    seen[left_word] -= 1
                    left += word_len
                if matched == len(required) and right - left + word_len == window_len:
                    result.append(left)
            else:
                # A chunk outside `words` cannot be part of any valid window.
                seen.clear()
                matched = 0
                left = right + word_len
    return result
```


## Complexity Analysis
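- Time Complexity: `O(n * L)` where `n` is the length of `s` and `L` is the word length. Across all offsets, each word-sized chunk enters the window once and leaves at most once, and each slice and hash costs `O(L)`. Because `L <= 30`, this is effectively `O(n)`.
- Space Complexity: `O(m)` counter entries where `m` is the number of distinct words (`O(m * L)` characters counting the keys themselves).
- Bottleneck: Slicing and hashing each word chunk for the counter updates; manageable because the word length is fixed and small.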
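
To see why the scan is linear in the number of chunks, the sketch below counts how many times a chunk is extracted at `right` over all offsets. It reuses the loop bounds `range(offset, n - word_len + 1, word_len)` from the solution, and the helper name `count_chunk_reads` is an assumption for this example. Whenever `n >= word_len` the total is `n - word_len + 1`, because each possible chunk start belongs to exactly one offset.

```python
def count_chunk_reads(n: int, word_len: int) -> int:
    """Count chunk extractions at `right`, summed over all starting offsets."""
    return sum(
        len(range(offset, n - word_len + 1, word_len))
        for offset in range(word_len)
    )


for n, word_len in [(18, 3), (24, 4), (10, 7)]:
    # The last two printed numbers agree for every pair above.
    print(n, word_len, count_chunk_reads(n, word_len), n - word_len + 1)
```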

## Edge Cases & Pitfalls
- If `s` is shorter than the concatenation length, return an empty list.
- Words may repeat in `words`; ensure counts are respected.
- Resets are required whenever a non-matching word appears.
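
The repeated-word pitfall is easy to reproduce. In the small sketch below, with made-up sample data, a set comparison accepts a window that uses `"bar"` twice and `"foo"` once, while a `Counter` comparison correctly rejects it.

```python
from collections import Counter

words = ["foo", "foo", "bar"]
chunks = ["foo", "bar", "bar"]  # chunks of the window "foobarbar"

print(set(chunks) == set(words))          # True: a set forgets multiplicity
print(Counter(chunks) == Counter(words))  # False: the counts differ
```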

## Follow-up Variants
- Allow wildcards within words by normalizing tokens before counting.
- Return the substrings themselves for debugging pipelines.
- Handle variable word lengths by tokenizing `s` first (increases complexity).
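
The second variant needs only a thin wrapper around the index list. As a minimal sketch, assuming the starting indices have already been computed and using the hypothetical helper name `substrings_at`:

```python
from typing import List


def substrings_at(s: str, starts: List[int], window_len: int) -> List[str]:
    """Map each starting index to the window-length substring it begins."""
    return [s[i:i + window_len] for i in starts]


print(substrings_at("barfoothefoobarman", [0, 9], 6))  # ['barfoo', 'foobar']
```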

## Takeaways
- Sliding windows with step sizes aligned to token boundaries avoid redundant work.
- Hash maps balance inclusions and excess counts succinctly.
- The technique reflects log parsing tasks where orderless token bundles must be detected.

## Similar Problems
| Problem ID | Problem Title | Technique |
| --- | --- | --- |
| 438 | Find All Anagrams in a String | Fixed-size window counts |
| 567 | Permutation in String | Sliding window permutation detection |
| 76 | Minimum Window Substring | Variable window covering |