# 287. Find the Duplicate Number


## Topic Alignment
- **Role Relevance**: Detect repeated user ids in batched ingestion without modifying the source snapshot.
- **Scenario**: Audit telemetry uploads by revealing the repeated label using only read-only access.


## Metadata Summary
- Source: [Find the Duplicate Number](https://leetcode.com/problems/find-the-duplicate-number/)
- Tags: `Array`, `Binary Search`, `Counting`
- Difficulty: Medium
- Recommended Priority: High


## Problem Statement
Given an array `nums` of `n + 1` integers, each in the range `[1, n]` inclusive, exactly one value is repeated, though it may appear more than twice. Return that value without modifying the input array and using only constant extra space.



## Progressive Hints
- Binary search on the value range `[1, n]` rather than on indices.
- Count how many numbers are `<= mid`; a count larger than `mid` indicates the duplicate sits in the lower half.
- Maintain the lowest feasible candidate until the window closes.


## Solution Overview
By the pigeonhole principle, if more than `mid` of the numbers are `<= mid`, some value in `[1, mid]` must repeat; otherwise the duplicate lies in `[mid + 1, n]`. Binary search on the value range and shrink it with these counting queries.
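To make the pigeonhole argument concrete, the sketch below evaluates the counting query for every candidate value on one sample input. The helper name `count_at_most` is an illustrative assumption, not part of the original notebook.

```python
from typing import List


def count_at_most(nums: List[int], mid: int) -> int:
    """Count how many entries of nums are <= mid (one counting query)."""
    return sum(num <= mid for num in nums)


nums = [1, 3, 4, 2, 2]  # n = 4, the duplicate is 2
n = len(nums) - 1

# For each candidate value, ask whether the range [1, mid] holds more
# entries than it has distinct values.
overfull = [count_at_most(nums, mid) > mid for mid in range(1, n + 1)]
print(overfull)  # [False, True, True, True]
```

The answers switch from `False` to `True` exactly once, at the duplicate itself, which is what allows a binary search over values instead of a linear scan over candidates.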


## Detailed Explanation
1. Set `left = 1`, `right = len(nums) - 1`.
2. While `left < right`, compute `mid` and count values `<= mid`.
3. If the count exceeds `mid`, there are more numbers than slots in the lower half, so discard the upper half by setting `right = mid`.
4. Otherwise discard the lower half by moving `left = mid + 1`.
5. When the loop ends, `left == right`, and that value is the duplicate.
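The steps above can be followed on a small input by recording the window at each pass. This is a minimal sketch; `trace_search` is a hypothetical helper that mirrors the steps and adds bookkeeping only for illustration.

```python
from typing import List, Tuple


def trace_search(nums: List[int]) -> Tuple[int, List[Tuple[int, int, int, int]]]:
    """Run the value-range search, recording (left, right, mid, count) per pass."""
    left, right = 1, len(nums) - 1
    steps = []
    while left < right:
        mid = left + (right - left) // 2
        count = sum(num <= mid for num in nums)
        steps.append((left, right, mid, count))
        if count > mid:
            right = mid      # duplicate is in [left, mid]
        else:
            left = mid + 1   # duplicate is in [mid + 1, right]
    return left, steps


answer, steps = trace_search([3, 1, 3, 4, 2])
for left, right, mid, count in steps:
    print(f"window=[{left}, {right}] mid={mid} count={count}")
print("duplicate:", answer)
```

For `[3, 1, 3, 4, 2]` the first pass finds only two values `<= 2`, so the window moves to `[3, 4]`; the second pass finds four values `<= 3`, which closes the window on `3`.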


## Complexity Trade-off Table
| Approach | Time Complexity | Space Complexity | Notes |
| --- | --- | --- | --- |
| Value-range binary search | O(n log n) | O(1) | Works on read-only arrays |
| Floyd cycle detection | O(n) | O(1) | Faster; treats `i -> nums[i]` as a linked list with a cycle |



## Reference Implementation


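```python
from typing import List


def find_duplicate(nums: List[int]) -> int:
    """Return the repeated value in nums without modifying the input."""
    # Search over candidate values 1..n, where n = len(nums) - 1.
    left, right = 1, len(nums) - 1
    while left < right:
        mid = left + (right - left) // 2
        # One O(n) pass: how many entries fall in the value range [1, mid]?
        count = sum(num <= mid for num in nums)
        if count > mid:
            # More entries than distinct values, so the duplicate is <= mid.
            right = mid
        else:
            left = mid + 1
    return left
```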


## Validation


```python
cases = [
    (([1, 3, 4, 2, 2],), 2),
    (([3, 1, 3, 4, 2],), 3),
    (([1, 1],), 1),
    (([1, 1, 2],), 1),
]
for args, expected in cases:
    result = find_duplicate(*args)
    assert result == expected, f"find_duplicate{args} -> {result}, expected {expected}"
```


## Complexity Analysis
- Time Complexity: `O(n log n)`, because the binary search runs `O(log n)` iterations and each one scans the whole array in `O(n)`.
- Space Complexity: `O(1)` extra memory.
- Bottleneck: Counting pass for each binary search iteration.



## Edge Cases & Pitfalls
- Minimum length arrays with two identical numbers.
- Arrays where the duplicate appears many times.
- Arrays already sorted or randomly ordered; order does not affect the result because the search runs over values, not positions.
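These cases can be exercised with a small randomized cross-check against a counting-based answer. This is a sketch under assumptions: `make_case` is a hypothetical generator, and `find_duplicate` is repeated here only so the snippet runs on its own.

```python
import random
from collections import Counter
from typing import List


def find_duplicate(nums: List[int]) -> int:
    # Same value-range binary search as the reference implementation.
    left, right = 1, len(nums) - 1
    while left < right:
        mid = left + (right - left) // 2
        if sum(num <= mid for num in nums) > mid:
            right = mid
        else:
            left = mid + 1
    return left


def make_case(n: int, rng: random.Random) -> List[int]:
    """Build n + 1 values in [1, n] in which exactly one value repeats."""
    dup = rng.randint(1, n)
    copies = rng.randint(2, n + 1)  # the duplicate may appear many times
    others = [v for v in range(1, n + 1) if v != dup]
    nums = [dup] * copies + rng.sample(others, n + 1 - copies)
    rng.shuffle(nums)
    return nums


rng = random.Random(0)  # fixed seed so runs are repeatable
for _ in range(200):
    nums = make_case(rng.randint(1, 30), rng)
    expected = Counter(nums).most_common(1)[0][0]  # the only repeated value
    assert find_duplicate(nums) == expected
    assert find_duplicate(sorted(nums)) == expected  # order is irrelevant
```

The generator covers the minimum-length case (`n = 1`) and inputs made up entirely of the duplicate (`copies = n + 1`).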



## Follow-up Variants
- Contrast with Floyd's tortoise-and-hare solution, which reaches `O(n)` time while keeping `O(1)` space.
- Discuss using a bitset when extra space is allowed, or in-place marking when the array can be modified.
- Extend to finding all duplicates when more than one value repeats.
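As a sketch of the bitset and multiple-duplicate variants, the snippet below uses a Python integer as a bitset. It assumes the constant-space constraint is relaxed (the bitset needs `O(n)` bits); `find_all_duplicates` is an illustrative name, not a function from the original notebook.

```python
from typing import List


def find_all_duplicates(nums: List[int]) -> List[int]:
    """Return each repeated value once, in the order its first repeat appears."""
    seen = 0      # bit v is set once value v has been observed
    reported = 0  # bit v is set once value v has been added to the result
    result = []
    for num in nums:
        bit = 1 << num
        if seen & bit and not reported & bit:
            result.append(num)
            reported |= bit
        seen |= bit
    return result


print(find_all_duplicates([4, 3, 2, 7, 8, 2, 3, 1]))  # [2, 3]
```

This leaves the input untouched but gives up constant space, whereas the in-place marking used in problem 442 keeps constant space at the cost of modifying the array.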



## Takeaways
- Midpoint checks can be applied to the value domain, not just index ranges.
- Counting comparisons are a reliable way to exploit the pigeonhole principle.



## Similar Problems
| Problem ID | Problem Title | Technique |
| --- | --- | --- |
| 442 | Find All Duplicates in an Array | In-place marking |
| 448 | Find All Numbers Disappeared in an Array | Counting via indices |

