# 1338. Reduce Array Size to the Half


## Topic Alignment
- MLE Connection: Pruning least informative features mirrors cutting dataset size via frequency-based heuristics.
- Hash Table Role: Count element frequencies and bucket them by occurrence to greedily remove the largest contributors.
- Interview Angle: Combines frequency maps with greedy selection using sorting or buckets.


## Metadata Summary
- Source: https://leetcode.com/problems/reduce-array-size-to-the-half/
- Tags: Array, Hash Table, Greedy
- Difficulty: Medium
- Recommended Review Priority: High


## Problem Statement
Given an array of integers arr, you can choose a set of integers and remove all their occurrences from the array. Return the minimum size of the set such that removing those integers makes the size of the array at most half of its original size.
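
To pin down what the statement asks for, here is a minimal brute-force sketch that tries every subset of distinct values in increasing size. The function name `min_set_size_bruteforce` is hypothetical, and the exponential running time makes it suitable only for checking tiny inputs by hand, not as a submission.

```python
from itertools import combinations
from typing import List


def min_set_size_bruteforce(arr: List[int]) -> int:
    """Exhaustive check: smallest set whose removal leaves at most half of arr."""
    distinct = sorted(set(arr))
    for size in range(1, len(distinct) + 1):
        for combo in combinations(distinct, size):
            banned = set(combo)
            remaining = sum(1 for x in arr if x not in banned)
            # "At most half" written without division to avoid rounding issues.
            if remaining * 2 <= len(arr):
                return size
    return 0  # only reached for an empty array


print(min_set_size_bruteforce([3, 3, 3, 3, 5, 5, 5, 2, 2, 7]))  # 2
```

Removing only the value 3 leaves six of the ten elements, which is still more than half, so one value is not enough; removing 3 and 5 leaves three elements, so the answer is 2.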


## Progressive Hints
- Hint 1: Count how often each distinct value appears.
- Hint 2: Removing the most frequent values first reduces the array size fastest.
- Hint 3: Sort counts or use bucket sort to accumulate removals until at least half the elements are covered.


## Solution Overview
Build a frequency map, sort counts in descending order, and greedily remove the highest frequencies until at least half the array is eliminated. Count how many distinct values were removed.


## Detailed Explanation
1. Use `collections.Counter` to compute `freq[value]` for every distinct value in `arr`.
2. Sort the frequency values in descending order.
3. Iterate through the sorted counts, maintaining `removed` as the total number of elements removed and `chosen` as the number of values selected.
4. Stop when `removed` reaches at least half the original array size and return `chosen`.

A bucket sort variant, which indexes buckets by occurrence count (never more than n), achieves linear time, but sorting the counts is fast enough for n up to 10^5.
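
The bucket variant mentioned above could look roughly like the sketch below. It assumes buckets are indexed by occurrence count, so `buckets[c]` holds how many distinct values appear exactly `c` times; the function name `min_set_size_bucket` is hypothetical and not part of the reference solution.

```python
from collections import Counter
from typing import List


def min_set_size_bucket(arr: List[int]) -> int:
    """Greedy removal using count-indexed buckets instead of sorting."""
    n = len(arr)
    # buckets[c] = number of distinct values that occur exactly c times
    buckets = [0] * (n + 1)
    for count in Counter(arr).values():
        buckets[count] += 1

    target = (n + 1) // 2  # elements that must be removed
    removed = 0
    chosen = 0
    # Walk from the largest possible count down to 1.
    for count in range(n, 0, -1):
        while buckets[count] > 0 and removed < target:
            removed += count
            chosen += 1
            buckets[count] -= 1
        if removed >= target:
            break
    return chosen


print(min_set_size_bucket([3, 3, 3, 3, 5, 5, 5, 2, 2, 7]))  # 2
```

No count can exceed n, so the bucket array has n + 1 slots and the whole pass is O(n) in time and space.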


## Complexity Trade-off Table
| Approach | Time Complexity | Space Complexity |
| --- | --- | --- |
| Sort frequencies | O(n + k log k) for k distinct values; O(n log n) worst case | O(n) |
| Bucket sort frequencies | O(n) | O(n) |


## Reference Implementation


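```python
from collections import Counter
from typing import List


class Solution:
    def minSetSize(self, arr: List[int]) -> int:
        # Count occurrences, then order the counts from largest to smallest.
        freq = Counter(arr)
        counts = sorted(freq.values(), reverse=True)

        removed = 0  # total elements removed so far
        chosen = 0   # distinct values selected so far
        target = (len(arr) + 1) // 2  # ceiling of half the length

        for count in counts:
            removed += count
            chosen += 1
            if removed >= target:
                return chosen

        # Only reached when arr is empty.
        return chosen
```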


## Complexity Analysis
- Time Complexity: O(n) to build the frequency map plus O(k log k) to sort the k distinct counts, which is O(n log n) in the worst case.
- Space Complexity: O(n) to store the frequency map and sorted counts.
- Bottlenecks: Sorting dominates; the greedy accumulation is linear.


## Edge Cases & Pitfalls
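- Compute the removal target with ceiling division, `(len(arr) + 1) // 2`. LeetCode guarantees an even length, so this matters only when the solution is reused on odd-length inputs, where floor division would stop one element short.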
- A single dominant value may satisfy the condition alone; the loop must return 1 in that case.
- Bucket sort is optional; it helps most when n is large and there are many distinct values, because the comparison sort costs O(k log k) for k distinct values.
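
The pitfalls above can be exercised with a few quick checks. This is a minimal sketch using a standalone helper, hypothetically named `min_set_size`, that mirrors the sort-based approach; the odd-length input is a generalization beyond the even lengths the problem guarantees.

```python
from collections import Counter
from typing import List


def min_set_size(arr: List[int]) -> int:
    """Sort-based greedy, written as a free function for quick testing."""
    target = (len(arr) + 1) // 2  # ceiling division
    removed = 0
    counts = sorted(Counter(arr).values(), reverse=True)
    for chosen, count in enumerate(counts, start=1):
        removed += count
        if removed >= target:
            return chosen
    return 0  # empty input


# Odd length: 3 of 5 elements must go; floor division would stop at 2.
assert min_set_size([1, 1, 2, 2, 3]) == 2
# A single dominant value covers half on its own.
assert min_set_size([9, 9, 9, 9, 1, 2]) == 1
# All values distinct: half of the values must be chosen.
assert min_set_size([1, 2, 3, 4, 5, 6]) == 3
```

In the odd-length case a floor-division target of 2 would return 1 and leave three of five elements, which is more than half.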


## Follow-up Variants
- Output the actual set of values removed instead of just its size.
- Use bucket sort to achieve O(n) time when arr length is extremely large.
- Consider weighted deletions where removing a value has an associated cost.
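
For the first variant, one possible sketch collects the values themselves while accumulating counts. It relies on `Counter.most_common()`, which returns `(value, count)` pairs ordered from most to least frequent; the function name `min_removal_set` is hypothetical.

```python
from collections import Counter
from typing import List, Set


def min_removal_set(arr: List[int]) -> Set[int]:
    """Return one minimum-size set of values whose removal halves arr."""
    target = (len(arr) + 1) // 2
    removed = 0
    chosen: Set[int] = set()
    for value, count in Counter(arr).most_common():
        if removed >= target:
            break
        chosen.add(value)
        removed += count
    return chosen


print(min_removal_set([3, 3, 3, 3, 5, 5, 5, 2, 2, 7]))  # {3, 5}
```

When several values share the same count, more than one minimum set can exist; this sketch returns whichever one `most_common()` orders first, and only the size of the set is uniquely determined.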


## Takeaways
- Frequency counting plus greedy selection is a standard reduction strategy.
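- Sorting frequencies and taking the largest first is optimal here, not merely a heuristic: every chosen value costs one slot in the set, so the largest counts always remove the most elements per slot.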
- Careful threshold calculations prevent off-by-one errors when halving sizes.


## Similar Problems
| Problem ID | Problem Title | Technique |
| --- | --- | --- |
| 347 | Top K Frequent Elements | Frequency sorting |
| 451 | Sort Characters by Frequency | Bucket counts |
| 2085 | Count Common Words With One Occurrence | Hash counting |
