## Problem Statement

Given a string `paragraph` and a list of strings `banned`, return the **most frequent word** in the paragraph that is **not banned**.

It is guaranteed that:
- At least one word is not banned.
- The answer is unique.

The comparison is **case-insensitive**, and the returned word must be in **lowercase**.
All punctuation should be ignored.



## Example 1

**Input:**
```text
paragraph = "Bob hit a ball, the hit BALL flew far after it was hit."
banned = ["hit"]
```
**Output:**

`"ball"`

**Explanation:**

- "hit" appears 3 times, but it is banned.

- "ball" appears 2 times, which is the highest frequency among non-banned words.

- Matching is case-insensitive, so "BALL" counts as another occurrence of "ball".

- Punctuation is ignored (e.g., "ball," becomes "ball").
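A quick way to check these counts is to tokenize and tally Example 1 directly. This is a minimal sketch that assumes Python's standard `re` module and `collections.Counter`; it only reproduces the numbers in the explanation, not the full solution.

```python
import re
from collections import Counter

paragraph = "Bob hit a ball, the hit BALL flew far after it was hit."
banned = {"hit"}

# Lowercase, then keep runs of letters; "ball," and "hit." lose their punctuation.
words = re.findall(r"[a-z]+", paragraph.lower())
counts = Counter(words)

print(counts["hit"])   # 3 (banned)
print(counts["ball"])  # 2

# Highest count among the non-banned words.
allowed = {w: c for w, c in counts.items() if w not in banned}
print(max(allowed, key=allowed.get))  # ball
```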

## Example 2

**Input:**
```text
paragraph = "a."
banned = []
```
**Output:**

`"a"`

## Constraints

- 1 ≤ paragraph.length ≤ 1000

- paragraph consists of English letters, spaces ' ', or one of the symbols: "!?',;."

- 0 ≤ banned.length ≤ 100

- 1 ≤ banned[i].length ≤ 10

- banned[i] consists only of lowercase English letters
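Because the paragraph may contain only letters, spaces, and the symbols `!?',;.`, lowercasing and then matching `[a-z]+` is enough to split it into words: every allowed non-letter acts as a separator. One consequence worth noting is that an apostrophe splits a word in two. The helper name `tokenize` below is hypothetical, used only for illustration.

```python
import re

def tokenize(paragraph):
    # Hypothetical helper: after lowercasing, every character outside a-z
    # is treated as a separator, so spaces and "!?',;." all split words.
    return re.findall(r"[a-z]+", paragraph.lower())

print(tokenize("a."))            # ['a']
print(tokenize("Bob's ball!?"))  # ['bob', 's', 'ball']
print(tokenize("x,y;z"))         # ['x', 'y', 'z']
```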

## Approach

To find the most frequent word in a paragraph that is not in the banned list, we normalize the text, extract valid words, count their frequencies, and track the word with the highest occurrence.

### Key Idea

- Word comparison is **case-insensitive**, so convert the paragraph to lowercase.
- Punctuation should be ignored, so extract only runs of letters as words.
- Use a **set** for banned words to allow fast lookups.
- Use a **dictionary** to count word frequencies and track the maximum as we iterate.
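The same ideas can also be expressed with the standard library's `collections.Counter`, a dictionary subclass built for counting. This is a swapped-in variant rather than the approach described here: it counts everything first and picks the maximum afterward instead of tracking it during the loop. The function name `most_common_word_counter` is hypothetical.

```python
import re
from collections import Counter

def most_common_word_counter(paragraph, banned):
    ban = set(banned)  # set gives average O(1) membership checks
    words = re.findall(r"[a-z]+", paragraph.lower())
    # Counter maps each non-banned word to its number of occurrences.
    freq = Counter(w for w in words if w not in ban)
    # most_common(1) returns a one-element list: [(word, count)].
    return freq.most_common(1)[0][0]
```

For example, `most_common_word_counter("Bob hit a ball, the hit BALL flew far after it was hit.", ["hit"])` returns `"ball"`. Since the answer is guaranteed to be unique, tie ordering in `most_common` does not affect the result.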



### Algorithm

1. Convert the list of banned words into a set for fast membership checks.
2. Convert the paragraph to lowercase and extract all valid words using a regular expression (`[a-z]+`).
3. Initialize:
   - A dictionary `freq` to store the frequency of each word.
   - Variables `bestWord` and `bestCount` to track the most frequent non-banned word.
4. Iterate through each word:
   - If the word is in the banned set, skip it.
   - Otherwise, increment its frequency in the dictionary.
   - If the updated frequency is greater than the current maximum, update `bestWord` and `bestCount`.
5. After processing all words, return `bestWord`.
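To see steps 3 to 5 in action, the sketch below runs the algorithm on Example 1 and records the running leader after each word. `trace_best` is a hypothetical helper; `best_word` and `best_count` play the roles of `bestWord` and `bestCount`.

```python
import re

def trace_best(paragraph, banned):
    # Hypothetical tracing helper: runs the algorithm and records
    # (word, best_word, best_count) after each word is processed.
    ban = set(banned)
    freq = {}
    best_word, best_count = "", 0
    steps = []
    for w in re.findall(r"[a-z]+", paragraph.lower()):
        if w not in ban:
            freq[w] = freq.get(w, 0) + 1
            if freq[w] > best_count:  # strictly greater: ties keep the earlier leader
                best_word, best_count = w, freq[w]
        steps.append((w, best_word, best_count))
    return steps

for step in trace_best("Bob hit a ball, the hit BALL flew far after it was hit.", ["hit"]):
    print(step)
```

The leader stays `bob` with count 1 until the second occurrence of `ball` raises that word's count to 2; banned `hit` never changes the state.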



### Correctness

- Lowercasing ensures case-insensitive comparison.
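- The regular expression `[a-z]+` keeps only maximal runs of letters, so punctuation and spaces act as separators.
- The frequency dictionary correctly counts how many times each word appears.
- Skipping banned words ensures they are not considered.
- Tracking the maximum during counting is correct because `bestCount` always equals the largest count seen so far: each count grows by one at a time, so a word can only overtake the current leader by exceeding `bestCount`, which is exactly when the update fires. After the last word, `bestCount` is the overall maximum and `bestWord` is the unique word that holds it.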
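The last bullet rests on an invariant: after each non-banned word is counted, `bestCount` equals the largest value in `freq`, and `bestWord` holds that count. A small randomized check can build confidence in this; it is a sketch, not a proof. The checker name and the tiny vocabulary are assumptions chosen to force many repeated words.

```python
import random

def running_best_matches_max(words, ban):
    # Hypothetical checker: after every counted word, best_count must equal
    # the maximum count in freq, and best_word must actually have that count.
    freq = {}
    best_word, best_count = "", 0
    for w in words:
        if w in ban:
            continue
        freq[w] = freq.get(w, 0) + 1
        if freq[w] > best_count:
            best_word, best_count = w, freq[w]
        if best_count != max(freq.values()) or freq[best_word] != best_count:
            return False
    return True

random.seed(0)
vocab = ["a", "b", "c", "d"]  # small vocabulary so words repeat often
for _ in range(1000):
    words = [random.choice(vocab) for _ in range(random.randint(1, 30))]
    ban = set(random.sample(vocab, random.randint(0, 2)))
    assert running_best_matches_max(words, ban)
print("invariant held on all trials")
```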



### Time Complexity

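- Building the banned set takes **O(m)** time, where `m` is the total length of the banned words.
- Lowercasing and extracting words take **O(n)** time, where `n` is the length of the paragraph.
- Counting words and tracking the maximum also take **O(n)** expected time, since each dictionary update hashes one word and the word lengths sum to at most `n`.
- Overall time complexity: **O(n + m)**.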
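The linear bound for extraction holds because each character is examined once. The hand-written scanner below makes that explicit. It is an alternative to `re.findall`, not part of the original solution, and the name `scan_words` is hypothetical. It assumes the input alphabet from the constraints (English letters plus separators), since `str.isalpha` would also accept non-ASCII letters.

```python
def scan_words(paragraph):
    # Hypothetical single-pass tokenizer: visits each character once,
    # so it runs in O(n) time for a paragraph of length n.
    words = []
    current = []
    for ch in paragraph:
        if ch.isalpha():
            current.append(ch.lower())
        elif current:
            # Any non-letter ends the word being built.
            words.append("".join(current))
            current = []
    if current:  # flush a word that runs to the end of the string
        words.append("".join(current))
    return words
```

On the Example 1 paragraph, `scan_words` produces the same list as `re.findall(r"[a-z]+", paragraph.lower())`.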



### Space Complexity

- The list of extracted words and the frequency dictionary together store **O(n)** characters, and the banned set stores **O(m)**.
- Space complexity: **O(n + m)** in the worst case.


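```python
import re


class Solution(object):
    def mostCommonWord(self, paragraph, banned):
        """
        :type paragraph: str
        :type banned: List[str]
        :rtype: str
        """
        ban = set(banned)  # fast membership checks
        # Lowercase first, then keep only runs of letters (drops punctuation).
        words = re.findall(r"[a-z]+", paragraph.lower())

        freq = {}
        bestWord = ""
        bestCount = 0

        for w in words:
            if w in ban:
                continue
            freq[w] = freq.get(w, 0) + 1
            # Update the running best as soon as a word overtakes it.
            if freq[w] > bestCount:
                bestWord = w
                bestCount = freq[w]
        return bestWord

    def mostCommonWordConcise(self, paragraph, banned):
        # Similar, more concise solution (method name chosen for illustration):
        # count every non-banned word first, then take the key with the largest count.
        ban = set(banned)
        words = re.findall(r"[a-z]+", paragraph.lower())

        freq = {}
        for w in words:
            if w in ban:
                continue
            freq[w] = freq.get(w, 0) + 1
        return max(freq, key=freq.get)
```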


## Rubber Duck Explanation

First, I turn all the text into lowercase so that `"Ball"` and `"ball"` are treated the same.
Then I pull out only the runs of letters, so punctuation like commas and periods is dropped.

Next, I put all banned words into a set so I can quickly check if a word is forbidden.

I go through each word one by one:
- If the word is banned, I skip it.
- Otherwise, I count how many times it appears.
- Every time I update a count, I check if this word is now the most frequent one so far.

By the end, I have tracked which word appeared the most times among the non-banned words, and I return that word as the answer.

