# 49. Group Anagrams

## Topic Alignment
- **Role Relevance**: Clustering tokens by normalized form mirrors feature bucketing for text preprocessing in ML pipelines.
- **Scenario**: Helps deduplicate query terms that differ only in letter order, or flag likely transposition typos, during data-cleaning stages.

## Metadata Summary
- Source: [LeetCode - Group Anagrams](https://leetcode.com/problems/group-anagrams/)
- Tags: `Array`, `Hash Table`, `String`
- Difficulty: Medium
- Recommended Priority: High

## Problem Statement
Given an array of strings `strs`, group the anagrams together. You can return the answer in any order.

An anagram is a word or phrase formed by rearranging the letters of another word or phrase, using all the original letters exactly once.
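The definition can be checked for a single pair of words by comparing character multisets. The sketch below assumes Python's `collections.Counter`; `is_anagram` is a hypothetical helper name and is not part of the LeetCode signature.

```python
from collections import Counter


def is_anagram(a: str, b: str) -> bool:
    """Return True if b uses exactly the letters of a, each the same number of times."""
    # Equal character multisets mean every original letter is used exactly once.
    return Counter(a) == Counter(b)


print(is_anagram("listen", "silent"))  # True
print(is_anagram("aab", "abb"))  # False: the 'a' and 'b' counts differ
```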

## Progressive Hints
- Hint 1: Sorting each word provides a canonical representation for grouping.
- Hint 2: Alternatively, use character frequency counts as the hash key to avoid sorting costs.
- Hint 3: Store groups in a dictionary keyed by the canonical form and append words as you traverse the list.

## Solution Overview
Transform each word into a canonical signature that represents its letter composition (either sorted string or frequency tuple), then group words by that signature using a hash map.
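The two signatures mentioned in the overview can be sketched as plain functions. The names `sorted_signature` and `count_signature` are hypothetical, and the count version assumes inputs contain only lowercase `a`-`z`, as in the LeetCode constraints.

```python
from typing import Tuple


def sorted_signature(word: str) -> str:
    # Anagrams contain the same letters, so sorting them yields an identical string.
    return "".join(sorted(word))


def count_signature(word: str) -> Tuple[int, ...]:
    # Assumes lowercase English letters 'a'..'z' only.
    counts = [0] * 26
    for char in word:
        counts[ord(char) - ord("a")] += 1
    return tuple(counts)  # Tuples are hashable; lists are not.


print(sorted_signature("eat") == sorted_signature("tea"))  # True
print(count_signature("eat") == count_signature("tan"))  # False
```

Either function can serve as the dictionary key; the sorted form works for any characters, while the count form avoids sorting.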

## Detailed Explanation
1. Initialize a dictionary that maps each signature to the list of words sharing it.
2. For each string, compute a length-26 tuple of letter counts (one slot per lowercase letter `a` to `z`).
3. Use the tuple as the key; append the original string to the map entry.
4. After processing all words, return the dictionary values as grouped anagrams.

The frequency-tuple approach avoids sorting each string and builds every key in O(L + 26) time, which is O(L) because the alphabet size is fixed.

## Complexity Trade-off Table
| Approach | Time Complexity | Space Complexity | Notes |
| --- | --- | --- | --- |
| Sort each string | O(n * L log L) | O(n * L) | Simple but sorting costs dominate for long strings. |
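| Frequency tuple key | O(n * L) | O(n * L) | Avoids sorting when inputs are limited to lowercase `a`-`z`; each key holds 26 counts regardless of word length, so short words can use more key memory than with sorting. |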
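The notebook only implements the frequency-tuple row, so here is a sketch of the sort-based row for comparison. `group_anagrams_sorted` is a hypothetical name; the structure mirrors the tuple version, with `"".join(sorted(word))` as the key.

```python
from collections import defaultdict
from typing import List


def group_anagrams_sorted(strs: List[str]) -> List[List[str]]:
    """Group anagrams by their sorted-letter string: O(n * L log L) time."""
    groups = defaultdict(list)
    for word in strs:
        key = "".join(sorted(word))  # Canonical form: letters in sorted order.
        groups[key].append(word)
    return list(groups.values())


print(group_anagrams_sorted(["eat", "tea", "tan", "ate", "nat", "bat"]))
# [['eat', 'tea', 'ate'], ['tan', 'nat'], ['bat']]
```

Because it never indexes into a fixed array, this version also works unchanged for uppercase letters, digits, or non-ASCII characters.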

```python
from collections import defaultdict
from typing import List


def groupAnagrams(strs: List[str]) -> List[List[str]]:
    """Group words that are anagrams using character frequency signatures."""
    groups = defaultdict(list)
    for word in strs:
        counts = [0] * 26  # One slot per letter; assumes only 'a'..'z' appear.
        for char in word:
            counts[ord(char) - ord('a')] += 1  # Count each letter.
        signature = tuple(counts)  # Lists are unhashable, so convert to a tuple key.
        groups[signature].append(word)
    return list(groups.values())
```


## Complexity Analysis
- Time Complexity: `O(n * L)` where `n` is the number of words and `L` is the average word length.
- Space Complexity: `O(n * L)` to store grouped words and their signatures.
- Bottleneck: Building and hashing the frequency tuples dominates runtime.

## Edge Cases & Pitfalls
- Handle empty strings; they form a valid anagram group.
- Ensure the signature accounts for all 26 letters even if absent.
- Lists are unhashable and cannot be dictionary keys at all; convert the count list to a tuple before using it as a key.
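The first and last pitfalls can be demonstrated directly. This sketch repeats the frequency-tuple function under the hypothetical snake_case name `group_anagrams` so the block runs on its own.

```python
from collections import defaultdict
from typing import List


def group_anagrams(strs: List[str]) -> List[List[str]]:
    # Same frequency-tuple approach as the notebook cell, assuming 'a'..'z' input.
    groups = defaultdict(list)
    for word in strs:
        counts = [0] * 26
        for char in word:
            counts[ord(char) - ord("a")] += 1
        groups[tuple(counts)].append(word)
    return list(groups.values())


# Empty strings all map to the all-zero signature, so they form one group.
print(group_anagrams(["", "", "a"]))  # [['', ''], ['a']]

# Using the raw list as a key fails before any grouping happens.
try:
    bad = {[0] * 26: "word"}
except TypeError as exc:
    print("TypeError:", exc)
```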

## Follow-up Variants
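- Support Unicode input by replacing the fixed 26-slot array with a sparse signature, such as a sorted tuple of `(character, count)` pairs.
- Report the number of groups, or the size of each group, instead of the grouped words themselves (e.g., for analytics).
- Process a stream of words and emit a stable group ID for each word as it arrives; in a distributed setting, partitioning by signature (for example, with consistent hashing) keeps each group on a single worker.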
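A minimal single-process sketch of the Unicode, group-count, and streaming follow-ups, assuming input strings are already Unicode-normalized (for example to NFC) so that visually identical characters compare equal. `unicode_signature` and `StreamingGrouper` are hypothetical names, and the sketch does not implement distributed partitioning.

```python
from collections import Counter
from typing import Dict, Hashable, List, Tuple


def unicode_signature(word: str) -> Tuple[Tuple[str, int], ...]:
    # Sparse signature: only characters that occur, sorted so the order is canonical.
    # Assumes the input is already Unicode-normalized.
    return tuple(sorted(Counter(word).items()))


class StreamingGrouper:
    """Hypothetical helper that assigns a stable group ID to each word as it arrives."""

    def __init__(self) -> None:
        self._ids: Dict[Hashable, int] = {}
        self.sizes: List[int] = []  # sizes[i] = number of words seen in group i

    def add(self, word: str) -> int:
        key = unicode_signature(word)
        if key not in self._ids:
            self._ids[key] = len(self._ids)  # Next unused ID.
            self.sizes.append(0)
        group_id = self._ids[key]
        self.sizes[group_id] += 1
        return group_id


grouper = StreamingGrouper()
print([grouper.add(w) for w in ["niño", "oñin", "eat", "tea"]])  # [0, 0, 1, 1]
print(grouper.sizes)  # [2, 2]
print(len(grouper.sizes))  # 2 groups so far
```

The ID is simply the order in which a signature was first seen, so it stays stable for the lifetime of one grouper.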

## Takeaways
- Canonical representations let hash tables cluster equivalent data efficiently.
- Frequency vectors are a common trick in NLP preprocessing for ML pipelines.
- Separating signature generation from grouping keeps the solution modular.
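To illustrate the modularity point, the grouping loop can take the signature as a parameter. `group_by_signature` and `sorted_key` are hypothetical names used only for this sketch.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List


def group_by_signature(
    items: Iterable[str], signature: Callable[[str], Hashable]
) -> List[List[str]]:
    """Cluster items whose signatures are equal; the grouping logic never changes."""
    groups: Dict[Hashable, List[str]] = defaultdict(list)
    for item in items:
        groups[signature(item)].append(item)
    return list(groups.values())


def sorted_key(word: str) -> str:
    # Anagram signature: the word's letters in sorted order.
    return "".join(sorted(word))


words = ["eat", "tea", "tan", "ate", "nat", "bat"]
print(group_by_signature(words, sorted_key))
# [['eat', 'tea', 'ate'], ['tan', 'nat'], ['bat']]
print(group_by_signature(words, len))  # Same loop, different notion of equivalence.
# [['eat', 'tea', 'tan', 'ate', 'nat', 'bat']]
```

Swapping in a frequency-tuple or Unicode-aware signature only changes the function passed in, not the grouping loop.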

## Similar Problems
| Problem ID | Problem Title | Technique |
| --- | --- | --- |
| 242 | Valid Anagram | Frequency counting |
| 451 | Sort Characters By Frequency | Hash map frequencies with sorting |
| 187 | Repeated DNA Sequences | Hashing fixed-length strings |