# 3085. Minimum Deletions to Make String K-Special

# Medium

You are given a string word and an integer k.

We consider word to be k-special if |freq(word[i]) - freq(word[j])| <= k for all indices i and j in the string.

Here, freq(x) denotes the frequency of the character x in word, and |y| denotes the absolute value of y.

Return the minimum number of characters you need to delete to make word k-special.

# Example 1:

```
Input: word = "aabcaba", k = 0

Output: 3

Explanation: We can make word 0-special by deleting 2 occurrences of "a" and 1 occurrence of "c". Therefore, word becomes equal to "baba" where freq('a') == freq('b') == 2.
```

# Example 2:

```
Input: word = "dabdcbdcdcd", k = 2

Output: 2

Explanation: We can make word 2-special by deleting 1 occurrence of "a" and 1 occurrence of "d". Therefore, word becomes equal to "bdcbdcdcd" where freq('b') == 2, freq('c') == 3, and freq('d') == 4.
```

# Example 3:

```
Input: word = "aaabaaa", k = 2

Output: 1

Explanation: We can make word 2-special by deleting 1 occurrence of "b". Therefore, word becomes equal to "aaaaaa" where each letter's frequency is now uniformly 6.
```

# Constraints:

- 1 <= word.length <= 10^5
- 0 <= k <= 10^5
- word consists only of lowercase English letters.


```python
from collections import Counter

def min_deletions_to_k_special(word: str, k: int) -> int:
    freq = Counter(word)
    freq_values = sorted(freq.values())
    n = len(freq_values)
    min_deletions = float('inf')

    # prefix_sum[i] holds the sum of the i smallest frequencies
    prefix_sum = [0]
    for val in freq_values:
        prefix_sum.append(prefix_sum[-1] + val)

    for i in range(n):
        target_freq = freq_values[i]
        # we want all values > target_freq + k to be reduced
        # so we find the first value greater than target_freq + k
        j = i
        while j < n and freq_values[j] <= target_freq + k:
            j += 1

        # The i smallest frequencies are deleted entirely. For a duplicate of
        # target_freq this overcounts, but the first occurrence of the same
        # value already gives the correct cost, so the minimum is unaffected.
        deletions_left = prefix_sum[i]
        deletions_right = prefix_sum[n] - prefix_sum[j] - (target_freq + k) * (n - j)
        total_deletions = deletions_left + max(0, deletions_right)
        min_deletions = min(min_deletions, total_deletions)

    return min_deletions

print(min_deletions_to_k_special("aabcaba", 0))      # Output: 3
print(min_deletions_to_k_special("dabdcbdcdcd", 2))  # Output: 2
print(min_deletions_to_k_special("aaabaaa", 2))      # Output: 1

# Edge cases
print(min_deletions_to_k_special("a", 0))            # Output: 0 (Only one character)
print(min_deletions_to_k_special("abcde", 4))        # Output: 0 (All have freq 1, diff is 1-1=0 ≤ k)
print(min_deletions_to_k_special("aaaaa", 0))        # Output: 0 (Only one distinct character)
print(min_deletions_to_k_special("aabbccddeeff", 1)) # Output: 0 (All frequencies equal)
print(min_deletions_to_k_special("zzxyyx", 0))       # Output: 0 (z, x, y each appear twice)
```

```python
from collections import Counter

class Solution:
    def minimumDeletions(self, word: str, k: int) -> int:
        freq = Counter(word)
        min_deletions = float('inf')

        # Try each existing frequency as the minimum kept frequency
        for base in set(freq.values()):
            deletions = 0
            for f in freq.values():
                if f < base:
                    # Too rare: remove the character entirely
                    deletions += f
                elif f > base + k:
                    # Too frequent: trim down to base + k
                    deletions += f - (base + k)
            min_deletions = min(min_deletions, deletions)

        return min_deletions

sol = Solution()

# Provided examples
print(sol.minimumDeletions("aabcaba", 0))        # ➤ 3
print(sol.minimumDeletions("dabdcbdcdcd", 2))    # ➤ 2
print(sol.minimumDeletions("aaabaaa", 2))        # ➤ 1

# Edge cases
print(sol.minimumDeletions("a", 0))              # ➤ 0 (Single character)
print(sol.minimumDeletions("abc", 2))            # ➤ 0 (All frequencies 1, diff ≤ k)
print(sol.minimumDeletions("aaaaa", 0))          # ➤ 0 (Uniform frequency)
print(sol.minimumDeletions("aabbcc", 0))         # ➤ 0 (All frequencies equal)
print(sol.minimumDeletions("aabbcc", 1))         # ➤ 0 (Still within k)
print(sol.minimumDeletions("aabbccc", 0))        # ➤ 1 (Need to delete one 'c')
print(sol.minimumDeletions("zzxyyx", 0))         # ➤ 0 (z, x, y each appear twice)
print(sol.minimumDeletions("abcabcabcabc", 0))   # ➤ 0 (Perfectly balanced)
print(sol.minimumDeletions("abcabcabcabc", 1))   # ➤ 0 (Still balanced within k)
```

The problem asks for the minimum number of character deletions to make a string `k-special`. A string is `k-special` if the absolute difference between the frequencies of any two characters in the string is at most `k`.

Let's denote the frequency of a character `c` as `freq(c)`. The condition is `|freq(char1) - freq(char2)| <= k` for all characters `char1`, `char2` present in the modified string. This implies that if `min_freq` is the minimum frequency of any character and `max_freq` is the maximum frequency of any character in the modified string, then `max_freq - min_freq <= k`.

The core idea is to determine a target range `[min_f, max_f]` for frequencies such that `max_f - min_f <= k`, and then calculate the minimum deletions needed to make all remaining characters' frequencies fall within this range.
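
As a small illustration of the definition, the sketch below checks whether a string is already `k-special` by comparing its largest and smallest character frequencies. The helper name `is_k_special` is hypothetical and is not part of the problem statement; treating the empty string as `k-special` is an assumption made here for convenience.

```python
from collections import Counter

def is_k_special(word: str, k: int) -> bool:
    """Return True if all character frequencies in word differ by at most k."""
    counts = list(Counter(word).values())
    # Assumption: an empty string has no pair of characters to compare,
    # so the condition holds vacuously.
    if not counts:
        return True
    return max(counts) - min(counts) <= k

print(is_k_special("aabcaba", 0))    # frequencies a:4, b:2, c:1
print(is_k_special("baba", 0))       # frequencies a:2, b:2
print(is_k_special("bdcbdcdcd", 2))  # frequencies b:2, c:3, d:4
```

Only the extreme frequencies matter, which is why the rest of the analysis reasons about a frequency window rather than about pairs of characters.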

### Understanding the Constraints and Problem Structure

- `word.length` up to $10^5$.
- `k` up to $10^5$.
- Only lowercase English letters (26 distinct characters).

Since there are only 26 distinct characters, there are at most 26 frequencies to reason about, each at most $N=10^5$. This suggests that, after a single $O(N)$ counting pass, the solution's complexity should depend more on the number of distinct characters (26) or the range of frequencies than on `N` directly.

Let's first calculate the initial frequencies of all characters in the given `word`. We can use a dictionary or a `collections.Counter` for this.

`freq_map = {'a': 4, 'b': 2, 'c': 1}` (e.g., for `word = "aabcaba"`)

Then, extract the _non-zero_ frequencies into a list and sort them.
`frequencies = [1, 2, 4]`

Now, suppose we decide that the minimum frequency a character _must_ have in the `k-special` string is `min_f`.
If a character `c` has `freq(c) < min_f`, it _must_ be entirely deleted. The deletions from this character are `freq(c)`.
If a character `c` has `freq(c) > min_f + k`, its frequency is too high. It _must_ be reduced to `min_f + k`. The deletions from this character are `freq(c) - (min_f + k)`.
Characters with `min_f <= freq(c) <= min_f + k` are fine and require no deletions for that character.
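
The three rules above can be sketched as a single cost function for a fixed `min_f`. The function name `deletions_for_min_f` and its signature are illustrative assumptions, not something defined elsewhere in this write-up.

```python
from typing import Iterable

def deletions_for_min_f(freqs: Iterable[int], min_f: int, k: int) -> int:
    """Deletions needed so every surviving frequency lies in [min_f, min_f + k]."""
    cap = min_f + k
    total = 0
    for f in freqs:
        if f < min_f:
            total += f        # too rare: the character is removed entirely
        elif f > cap:
            total += f - cap  # too frequent: trimmed down to the cap
        # otherwise the character already fits and costs nothing
    return total

# "aabcaba" has frequencies a:4, b:2, c:1
print(deletions_for_min_f([4, 2, 1], min_f=2, k=0))
```

The remaining question is only which values of `min_f` are worth passing to such a function.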

The problem is to find the optimal `min_f`. What values can `min_f` take?
In principle `min_f` can be any integer from `0` to `N`, since the maximum frequency is `N`. That range is wider than necessary: only the frequencies already present in `word` need to be tried.
Suppose `min_f` is _not_ an existing frequency, and let `f2` be the smallest existing frequency above it. Raising `min_f` to `f2` deletes exactly the same characters (those with frequency below `f2`), while the cap rises from `min_f + k` to `f2 + k`, so characters above the cap are trimmed no more than before. The cost therefore never increases.
For example, if frequencies are `[10, 20, 30]` and we pick `min_f = 15`, we delete the character with frequency `10` entirely and cap the others at `15 + k`. If we picked `min_f = 20`, we would delete the same character but cap the others at `20 + k`, which is at least as good.
If `min_f` exceeds every existing frequency, the whole string is deleted, which is never better than choosing the largest existing frequency.

A crucial insight, then: an optimal `min_f` can always be found among the existing frequencies in the `word`. Trying `min_f = 0` as well is harmless but never strictly better than the smallest existing frequency, because neither choice deletes a character entirely and the smaller value only lowers the cap.
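
In symbols, a sketch of why existing frequencies suffice (the notation $D(m)$, $f_c$ and $f^{+}$ is introduced here for illustration only): let $f_c$ be the frequency of character $c$ and define the cost of choosing minimum $m$ as

$$
D(m) = \sum_{c:\, f_c < m} f_c \;+\; \sum_{c:\, f_c > m + k} \bigl(f_c - (m + k)\bigr).
$$

If $m$ is not an existing frequency and $f^{+}$ is the smallest existing frequency with $f^{+} > m$, then $\{c : f_c < m\} = \{c : f_c < f^{+}\}$, so the first sum is unchanged, and

$$
\max\bigl(0,\, f_c - (f^{+} + k)\bigr) \le \max\bigl(0,\, f_c - (m + k)\bigr) \quad \text{for every } c,
$$

so $D(f^{+}) \le D(m)$. Moving $m$ up to the next existing frequency therefore never costs more.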

Let `freqs` be the sorted list of initial character frequencies.
`freqs = [f_1, f_2, ..., f_m]`, where `m <= 26`.

### Approach 1: Iterate Through All Possible `min_f` values (Brute Force on `min_f`)

**Idea:**

1.  Calculate frequencies of all characters in `word`.
2.  Store non-zero frequencies in a list `freq_counts`.
3.  Iterate `min_f` from `0` to `max(freq_counts)` (or `N`).
4.  For each `min_f`:
    a. Calculate `max_f_allowed = min_f + k`.
    b. Initialize `deletions_for_this_min_f = 0`.
    c. For each `char_freq` in `freq_counts`:
       i. If `char_freq < min_f`: `deletions_for_this_min_f += char_freq` (delete entirely).
       ii. If `char_freq > max_f_allowed`: `deletions_for_this_min_f += (char_freq - max_f_allowed)` (reduce frequency).
    d. Update `min_total_deletions = min(min_total_deletions, deletions_for_this_min_f)`.

**Complexity Analysis:**

- Step 1 & 2: $O(N + |\Sigma| \log |\Sigma|)$ where $|\Sigma|=26$.
- Step 3: The loop for `min_f` runs up to `N` times.
- Step 4.c: Inner loop runs up to 26 times.
- Total complexity: $O(N + N \cdot |\Sigma|) = O(N \cdot |\Sigma|)$.
- For $N=10^5, |\Sigma|=26$, this is $10^5 \cdot 26 \approx 2.6 \cdot 10^6$, which is feasible.
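
A minimal sketch of this brute-force scan follows, assuming a non-empty `word` as the constraints guarantee. The function name `min_deletions_bruteforce` is hypothetical.

```python
from collections import Counter

def min_deletions_bruteforce(word: str, k: int) -> int:
    """Try every integer min_f from 0 up to the largest frequency."""
    freq_counts = list(Counter(word).values())
    best = len(word)  # deleting the whole string is always possible
    for min_f in range(max(freq_counts) + 1):
        max_f_allowed = min_f + k
        deletions = 0
        for char_freq in freq_counts:
            if char_freq < min_f:
                deletions += char_freq                  # delete entirely
            elif char_freq > max_f_allowed:
                deletions += char_freq - max_f_allowed  # reduce frequency
        best = min(best, deletions)
    return best

print(min_deletions_bruteforce("aabcaba", 0))      # 3
print(min_deletions_bruteforce("dabdcbdcdcd", 2))  # 2
print(min_deletions_bruteforce("aaabaaa", 2))      # 1
```

The outer loop here is the only part whose length grows with `N`, which is what the refinement of the candidate set removes.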

Let's refine the range of `min_f`. We don't need to check all values from `0` to `N`. An optimal `min_f` is always one of the unique frequencies present in the string `word`. The candidate `0` is included below as well; it is a harmless extra, since `min_f = 0` deletes no character entirely but caps every frequency at `k`, which is never better than using the smallest existing frequency. So, we can iterate `min_f` over `[0] + sorted(list(set(freq_counts)))`. This means `min_f` can take at most 27 distinct values.

**Refined Approach 1.1: Iterate `min_f` over unique frequencies + 0**

**Idea:**

1.  Calculate initial frequencies of all characters using a `Counter`.
2.  Get the list of unique non-zero frequencies: `unique_freqs = sorted(list(set(freq_map.values())))`.
3.  Initialize `min_total_deletions = N` (or `sum(freq_map.values())`).
4.  Consider each `f_min` in `[0] + unique_freqs`:
    a. `deletions_for_this_f_min = 0`.
    b. `f_max_allowed = f_min + k`.
    c. For each `freq` (value) in `freq_map.values()`:
       i. If `freq < f_min`: `deletions_for_this_f_min += freq`.
       ii. Else if `freq > f_max_allowed`: `deletions_for_this_f_min += (freq - f_max_allowed)`.
    d. Update `min_total_deletions = min(min_total_deletions, deletions_for_this_f_min)`.
5.  Return `min_total_deletions`.

**Complexity:** $O(N)$ for `Counter`, then $O(|\Sigma| \cdot |\Sigma|)$ for the nested loops (at most $27 \cdot 26$ iterations). The total is therefore $O(N)$, dominated by the counting pass.

### Approach 2: Sliding Window / Two Pointers on Sorted Frequencies

This approach is sometimes used when iterating over all possible minimums (or maximums) is still too slow, but here, $|\Sigma|$ is small, so Approach 1.1 is already optimal. However, let's explore it for completeness, as it could be useful if $|\Sigma|$ were large but the frequency values were bounded.

**Idea:**

1.  Calculate `freq_map`.

2.  Extract `unique_freqs = sorted(list(set(freq_map.values())))`.

3.  The problem is to select a range `[L, R]` such that `R - L <= k` and minimize deletions.
    `L` will correspond to our `min_f`, and `R` will correspond to `min_f + k`.
    The frequencies `f` that fall within `[L, R]` are kept.
    Frequencies `f < L` are deleted entirely (cost `f`).
    Frequencies `f > R` are reduced to `R` (cost `f - R`).

4.  We can iterate `L` through all values in `unique_freqs`.
    For a chosen `L`, `R = L + k`.
    Calculate deletions:

    - Sum of `f` over all character frequencies `f < L`, counting duplicates and not only the unique values. (Prefix sum concept)
    - Sum of `(f - R)` over all character frequencies `f > R`, again counting duplicates. (Suffix sum concept)

This is essentially the same as Approach 1.1, but it suggests pre-calculating prefix/suffix sums of frequencies or counts to speed up the calculation of deletions within the loop. Since $|\Sigma|$ is so small, the direct iteration is fine.

Let's try to optimize `deletions_for_this_f_min` calculation using sorted frequencies.
Let `freq_counts` be the sorted list of all individual frequencies (e.g., `[1, 2, 4]` for `aabcaba`), not just the unique ones.
`freq_counts = sorted(list(freq_map.values()))`

**Precomputation:**

- `total_sum_freq = sum(freq_counts)`
- `prefix_sum_freq[i]` = sum of first `i` frequencies.

Consider `min_f` as `freq_counts[i]` for some `i`.
`max_f_allowed = freq_counts[i] + k`.

When `min_f` is fixed as `freq_counts[i]`:

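- Characters with `freq < freq_counts[i]` must be deleted. If `i` is the first index holding this value, there are `i` such characters and their sum is `prefix_sum_freq[i]`. For a later duplicate index the prefix sum overcounts, but the first occurrence of the same value already yields the correct cost, so the overall minimum is unaffected.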
- Characters with `freq > freq_counts[i] + k` must be truncated.
  Find `j` such that `freq_counts[j]` is the first frequency `> freq_counts[i] + k`.
  Sum of `(f - (freq_counts[i] + k))` for `f` from `freq_counts[j]` to `freq_counts[m-1]`.
  This can be calculated as `(sum(freq_counts[j:]) - (m - j) * (freq_counts[i] + k))`.

This version would involve:

1.  Get all frequencies: `freqs = sorted(list(freq_map.values()))`.
2.  Compute prefix sums on `freqs`.
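3.  Iterate `f_min` over the values in `freqs` (scanning every integer from `0` to `max(freqs)` also works but is unnecessary).
4.  Use `bisect_left` to find the first index whose frequency is `>= f_min`, and `bisect_right` to find the first index whose frequency is `> f_min + k`.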
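
One way these steps could look in code is sketched below. It assumes the candidates are the existing frequencies only, and it uses `bisect_right` for the upper bound so that frequencies exactly equal to `f_min + k` are kept. The function name `min_deletions_prefix_sums` is hypothetical.

```python
from bisect import bisect_left, bisect_right
from collections import Counter
from itertools import accumulate

def min_deletions_prefix_sums(word: str, k: int) -> int:
    freqs = sorted(Counter(word).values())
    m = len(freqs)
    # prefix[i] is the sum of the i smallest frequencies
    prefix = [0] + list(accumulate(freqs))
    best = prefix[m]  # deleting everything is always possible
    for f_min in set(freqs):
        cap = f_min + k
        lo = bisect_left(freqs, f_min)  # freqs[:lo] are all < f_min
        hi = bisect_right(freqs, cap)   # freqs[hi:] are all > cap
        removed_low = prefix[lo]
        trimmed_high = (prefix[m] - prefix[hi]) - (m - hi) * cap
        best = min(best, removed_low + trimmed_high)
    return best

print(min_deletions_prefix_sums("aabcaba", 0))      # 3
print(min_deletions_prefix_sums("dabdcbdcdcd", 2))  # 2
```

With at most 26 frequencies the binary searches bring no measurable gain here; the sketch mainly shows how the per-candidate cost reduces to two lookups.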

**Example:** `word = "aabcaba"`, `k = 0`
Frequencies: `a:4, b:2, c:1`
`freq_map = {'a': 4, 'b': 2, 'c': 1}`
`freqs = [1, 2, 4]` (sorted frequencies)

Possible `f_min` values: `[0, 1, 2, 4]`

**Case 1: `f_min = 0`, `f_max_allowed = 0 + 0 = 0`**

- `freq = 1 ('c')`: `1 > 0`. Delete `1 - 0 = 1`.
- `freq = 2 ('b')`: `2 > 0`. Delete `2 - 0 = 2`.
- `freq = 4 ('a')`: `4 > 0`. Delete `4 - 0 = 4`.
- Deletions = `1 + 2 + 4 = 7`.

**Case 2: `f_min = 1`, `f_max_allowed = 1 + 0 = 1`**

- `freq = 1 ('c')`: `1 == 1`. No deletions.
- `freq = 2 ('b')`: `2 > 1`. Delete `2 - 1 = 1`.
- `freq = 4 ('a')`: `4 > 1`. Delete `4 - 1 = 3`.
- Deletions = `0 + 1 + 3 = 4`.

**Case 3: `f_min = 2`, `f_max_allowed = 2 + 0 = 2`**

- `freq = 1 ('c')`: `1 < 2`. Delete `1`.
- `freq = 2 ('b')`: `2 == 2`. No deletions.
- `freq = 4 ('a')`: `4 > 2`. Delete `4 - 2 = 2`.
- Deletions = `1 + 0 + 2 = 3`.

**Case 4: `f_min = 4`, `f_max_allowed = 4 + 0 = 4`**

- `freq = 1 ('c')`: `1 < 4`. Delete `1`.
- `freq = 2 ('b')`: `2 < 4`. Delete `2`.
- `freq = 4 ('a')`: `4 == 4`. No deletions.
- Deletions = `1 + 2 + 0 = 3`.

Minimum deletions = `3`. This matches Example 1.
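
The four cases can be reproduced with a short script. This is only a sketch for checking the hand computation; the helper name `deletions_per_candidate` is hypothetical.

```python
from collections import Counter

def deletions_per_candidate(word: str, k: int) -> dict:
    """Map each candidate f_min (0 plus the existing frequencies) to its cost."""
    freqs = list(Counter(word).values())
    table = {}
    for f_min in [0] + sorted(set(freqs)):
        cap = f_min + k
        # Frequencies below f_min are removed; those above cap are trimmed.
        table[f_min] = sum(f if f < f_min else max(0, f - cap) for f in freqs)
    return table

print(deletions_per_candidate("aabcaba", 0))  # {0: 7, 1: 4, 2: 3, 4: 3}
```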

The refined approach 1.1 seems robust and efficient enough.

### Upper Bound on the Answer

The maximum possible number of deletions is `N` (delete the entire string), so initializing `min_total_deletions = N` is safe.

### Implementation Details

Using `collections.Counter` is convenient for frequency counting.
Iterating through `freq_map.values()` ensures we only process characters present in the string.
The `[0] + unique_freqs` strategy for `f_min` also tries a target minimum of 0. This candidate is optional: `f_min = 0` never beats the smallest existing frequency, because both keep every character and the latter allows a higher cap.

Let's write down the final algorithm steps clearly:

**Algorithm: Iterating `min_f` (Optimal Approach)**

1.  **Count Frequencies:**

    - Use `collections.Counter` to get frequencies of all characters in `word`. Let this be `freq_map`.
    - `from collections import Counter`

2.  **Extract Unique Frequencies:**

    - Create a list `unique_freqs` containing all _unique_ frequency counts from `freq_map.values()`.
    - Sort `unique_freqs` in ascending order.
    - Add `0` to the beginning of `unique_freqs` to also try a target minimum frequency of 0 (an optional candidate: no character is deleted entirely and every frequency is capped at `k`).
    - `unique_freqs = sorted(list(set(freq_map.values())))`
    - `unique_freqs.insert(0, 0)`

3.  **Initialize Minimum Deletions:**

    - `min_deletions = float('inf')` (or `len(word)`)

4.  **Iterate through possible `f_min` values:**

    - For each `f_min` in `unique_freqs`:
      - Calculate `f_max_allowed = f_min + k`.
      - Initialize `current_deletions = 0`.
      - **Calculate Deletions for Current `f_min`:**
        - For each `char_freq` in `freq_map.values()` (the actual frequencies, not unique ones):
          - If `char_freq < f_min`:
            - `current_deletions += char_freq` (delete this character entirely).
          - Else if `char_freq > f_max_allowed`:
            - `current_deletions += (char_freq - f_max_allowed)` (reduce frequency to `f_max_allowed`).
      - **Update Global Minimum:**
        - `min_deletions = min(min_deletions, current_deletions)`

5.  **Return:**

    - `min_deletions`

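This scan tries at most 27 candidate values of `f_min` against at most 26 frequencies, so after the $O(N)$ counting pass the remaining work is bounded by a constant.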

```python
from collections import Counter
import math

class Solution:
    def minimumDeletions(self, word: str, k: int) -> int:
        # Step 1: Count frequencies of all characters
        freq_map = Counter(word)

        # Step 2: Extract unique frequencies and sort them
        unique_freqs = sorted(list(set(freq_map.values())))

        # Also try f_min = 0. Counter values for a non-empty word are >= 1,
        # so 0 is normally absent from unique_freqs. With f_min = 0 no
        # character is deleted entirely and every frequency is capped at k;
        # this never beats the smallest existing frequency, so the extra
        # candidate is harmless.
        if 0 not in unique_freqs:
            unique_freqs.insert(0, 0) # Ensure 0 is always an option for f_min

        min_total_deletions = math.inf

        # Step 3: Iterate through all possible values for the minimum frequency (f_min)
        # The optimal f_min must be one of the existing frequencies or 0.
        for f_min in unique_freqs:
            f_max_allowed = f_min + k
            current_deletions = 0

            # Step 4: Calculate deletions required for the current f_min
            for char_freq in freq_map.values():
                if char_freq < f_min:
                    # If character's frequency is less than f_min,
                    # we must delete all occurrences of this character.
                    current_deletions += char_freq
                elif char_freq > f_max_allowed:
                    # If character's frequency is greater than f_max_allowed,
                    # we must delete (char_freq - f_max_allowed) occurrences.
                    current_deletions += (char_freq - f_max_allowed)
                # If f_min <= char_freq <= f_max_allowed, no deletions needed for this char.

            # Step 5: Update the overall minimum deletions found so far
            min_total_deletions = min(min_total_deletions, current_deletions)

        return min_total_deletions

```
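
As a sanity check on the claim that the scan over existing frequencies is optimal, the sketch below compares it with an exhaustive search on a few short words. Both helper names (`fast`, `exhaustive`) are hypothetical, and `fast` is a compact restatement of the scan rather than the class above. The exhaustive search enumerates every way of keeping between `0` and `f` copies of each distinct character, so it is only practical for tiny inputs.

```python
from collections import Counter
from itertools import product

def fast(word: str, k: int) -> int:
    """Scan existing frequencies as the minimum kept frequency."""
    freqs = list(Counter(word).values())
    return min(
        sum(f if f < base else max(0, f - base - k) for f in freqs)
        for base in set(freqs)
    )

def exhaustive(word: str, k: int) -> int:
    """Try every combination of kept counts; exponential, for tiny inputs only."""
    freqs = list(Counter(word).values())
    best = len(word)
    for kept in product(*(range(f + 1) for f in freqs)):
        remaining = [x for x in kept if x > 0]
        # Characters reduced to zero no longer appear in the string.
        if not remaining or max(remaining) - min(remaining) <= k:
            best = min(best, sum(freqs) - sum(kept))
    return best

words = ["a", "aab", "aabcaba", "zzxyyx", "aaabaaa", "aabbccc", "dabdcbdcdcd"]
for word, k in product(words, range(4)):
    assert fast(word, k) == exhaustive(word, k), (word, k)
print("all checks passed")
```

Agreement on these inputs is evidence rather than proof; the exchange argument about raising `min_f` to the next existing frequency is what justifies the scan in general.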
