# **Sets**

A Set in Python is used to store a collection of items with the following properties.

- No duplicate elements. If you try to insert an item that is already present, the set is left unchanged and the existing item is kept.

- An unordered collection. When we access all items, they are accessed without any specific order and we cannot access items using indexes as we do in lists.

- Uses hashing internally, which makes search, insert, and delete operations efficient (O(1) on average). This is a major advantage over a list for problems that rely on these operations.

- Mutable, meaning we can add or remove elements after the set is created. However, an individual element cannot be changed in place; to "change" one, remove it and add the new value.
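
The properties above can be checked with a short sketch. The set name `colors` and its values are made up for illustration.

```python
# Duplicates are dropped when the set is built.
colors = {"red", "green", "red"}
print(len(colors))          # 2

# Adding an element that is already present leaves the set unchanged.
colors.add("green")
print(len(colors))          # 2

# Sets do not support indexing.
try:
    colors[0]
except TypeError as err:
    print("TypeError:", err)

# Membership tests go through the hash table rather than scanning every item.
print("red" in colors)      # True
```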

In [36]:
# typecasting list to set  
s = set(["a", "b", "c"])
print(s)

# Adding element to the set
s.add("d")
print(s)

{'c', 'a', 'b'}
{'d', 'c', 'a', 'b'}


Uniqueness and Immutability of Elements in a Python Set

Python sets cannot have duplicate values. While you cannot modify the individual elements directly, you can still add or remove elements from the set.

In [37]:
# Python program to demonstrate that
# a set cannot have duplicate values 
# and we cannot change its items

# a set cannot have duplicate values
s = {"Geeks", "for", "Geeks"}
print(s)

# values of a set cannot be changed by index;
# uncommenting the next line raises TypeError
# s[1] = "Hello"
# print(s)

s.add("Hello")
print(s)

{'for', 'Geeks'}
{'for', 'Geeks', 'Hello'}


In [38]:
s.remove("for")

In [39]:
s

{'Geeks', 'Hello'}

Heterogeneous Element with Python Set

Python sets can store heterogeneous elements, i.e., a single set can hold a mixture of strings, integers, floats, booleans, and other hashable types.

In [40]:
# Python example demonstrate that a set
# can store heterogeneous elements
s = {"Geeks", "for", 10, 52.7, True}
print(s)

{True, 'for', 'Geeks', 52.7, 10}


### Python Frozen Sets

Frozen sets in Python are immutable objects that only support methods and operators that produce a result without affecting the frozen set or sets to which they are applied. They are created with the built-in frozenset() constructor.

While elements can be added to or removed from a normal set at any time, the contents of a frozen set remain the same after creation.

If no argument is passed, frozenset() returns an empty frozenset.
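
A minimal sketch of the points above. The names `empty`, `bigger`, `groups`, and `nested` are illustrative only.

```python
# With no argument, frozenset() returns an empty frozen set.
empty = frozenset()
print(empty)                 # frozenset()

fs = frozenset(["e", "f", "g"])

# Mutating methods such as add() do not exist on a frozenset.
try:
    fs.add("h")
except AttributeError as err:
    print("AttributeError:", err)

# Operators return a new frozenset and leave the original untouched.
bigger = fs | {"h"}
print(len(fs), len(bigger))  # 3 4

# Because a frozenset is hashable, it can be a dictionary key
# or an element of another set.
groups = {fs: "first group"}
nested = {fs, frozenset(["x"])}
print(groups[frozenset(["g", "f", "e"])])  # first group
```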

In [41]:
# Python program to demonstrate differences
# between normal and frozen set

# Same as {"a", "b","c"}
s = set(["a", "b","c"])

print("Normal Set")
print(s)

# A frozen set
fs = frozenset(["e", "f", "g"])

print("\nFrozen Set")
print(fs)

# Uncommenting below line would cause error as
# we are trying to add element to a frozen set
# fs.add("h")
s.add("h")
print("Normal set after adding element", s)

Normal Set
{'c', 'a', 'b'}

Frozen Set
frozenset({'g', 'e', 'f'})
Normal set after adding element {'c', 'a', 'b', 'h'}


**Internal working of Set**

A Python set is built on a data structure known as a hash table.

🔹 What is Hashing?

**Hashing** is a process that takes some data (like a number, string, or object) and runs it through a function (called a **hash function**) to produce a **fixed-size integer** (called a **hash value** or **hash code**).

Think of it like this:

> 🔑 *Hashing is a way to turn an object into a compact digital fingerprint: a number that stands in for that object. Two different objects can occasionally share the same fingerprint.*

---

🔹 Why is Hashing Used in Sets?

Hashing lets Python quickly **find out where to store or look for** an element in a set without scanning every element.

Instead of checking each element one by one, Python:

1. Hashes the element to get a number.
2. Uses that number to jump directly to a **specific slot** in memory.
3. Checks if the element is already there (or nearby if there was a collision).

This is why operations like `in`, `add`, and `remove` are so fast — average **O(1) time**.
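
One way to see the difference is to count equality comparisons instead of measuring time. The sketch below uses a hypothetical wrapper class, `Counted`, that records every `==` call. It assumes CPython's behaviour, where a set compares two elements only when their hashes match.

```python
class Counted:
    """Integer wrapper that counts how often == is evaluated."""
    eq_calls = 0

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return hash(self.value)

    def __eq__(self, other):
        Counted.eq_calls += 1
        return isinstance(other, Counted) and self.value == other.value


items = [Counted(i) for i in range(1000)]
as_list = items
as_set = set(items)

target = Counted(999)   # equal to the last item, but a different object

Counted.eq_calls = 0
found_in_list = target in as_list
list_eq_calls = Counted.eq_calls

Counted.eq_calls = 0
found_in_set = target in as_set
set_eq_calls = Counted.eq_calls

print("list:", found_in_list, list_eq_calls, "comparisons")
print("set: ", found_in_set, set_eq_calls, "comparisons")
```

The list has to compare against every element before it reaches the match, while the set needs only a handful of comparisons at most.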

---

🔹 Hash Function in Python

In Python, every object that goes into a set or is used as a dictionary key must be **hashable**: it must implement the `__hash__()` method (together with `__eq__()`), and its hash value must not change during its lifetime.

Examples:

```python
hash(42)        # Output: 42
hash("apple")   # Output: some integer; string hashes change between interpreter runs
```

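Hash values are spread out so that distinct objects rarely share one, but they are not guaranteed to be unique. Objects that compare equal always have the same hash.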
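
A few quick checks of how `hash()` behaves. The specific integer results noted in the comments are CPython behaviour and should be treated as implementation details.

```python
# Small integers hash to themselves in CPython.
print(hash(42))

# Objects that compare equal must have equal hashes.
equal_hashes = hash(1) == hash(1.0) == hash(True)
print(equal_hashes)                   # True

# Immutable containers of hashable items are hashable.
tuple_hash = hash((1, 2, 3))
print(isinstance(tuple_hash, int))    # True

# Mutable built-ins such as list are not hashable.
list_is_hashable = True
try:
    hash([1, 2, 3])
except TypeError as err:
    list_is_hashable = False
    print("TypeError:", err)

# Different objects can share a hash value (a collision).
# In CPython, hash(-1) and hash(-2) are both -2.
print(hash(-1) == hash(-2))
```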

---

🔹 Real-World Analogy

Imagine you’re putting books into 100 lockers.

* Instead of checking each locker to find the right one, you run the book title through a **hash function** that tells you to put it in **Locker #17**.
* When you want to check if a book is already there, you hash the title again and go straight to **Locker #17** to check.

---

🔹 Summary

| Term           | Meaning                                                           |
| -------------- | ----------------------------------------------------------------- |
| Hash           | A number generated from data using a hash function                |
| Hash Function  | Function that takes data and returns a hash                       |
| Hash Table     | Data structure that stores items using their hash as a lookup key |
| Hash Collision | When two items produce the same hash                              |



In Python, hash functions are mostly abstracted from you. That means you usually don’t need to know the exact algorithm used under the hood. Python handles it internally when you call the built-in hash() function.

In [42]:
# generating hash value of an object
hash(89)

89

In [43]:
hash("42")

-6226758453999805989

In [44]:
hash("Ujjwal")

301850336790104621

🔁 Step-by-Step: How Sets Work Internally in Python

1. **Each Element is Hashed**

When you add a value to a set (e.g. `myset.add("apple")`):

* Python calls `hash("apple")` → returns a number (say `123456789`).
* This **hash value** is like a digital fingerprint of `"apple"`.

---

2. **Hash Value → Index in an Array**

* Internally, Python maintains an **array (table)** to store all set elements.
* The hash value is **reduced to a valid index** using **modulo**:

```python
index = hash("apple") % size_of_table
```

If the internal table has 8 slots:

```python
index = 123456789 % 8   # 5
```

So `"apple"` is intended to be stored at index 5.
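
The index calculation can be tried directly with integers. This is a simplified model, not CPython's actual code: `TABLE_SIZE` and `slot_for` are hypothetical names, and the results rely on small non-negative integers hashing to themselves, as they do in CPython.

```python
TABLE_SIZE = 8  # assumed table size, for illustration only


def slot_for(value, table_size=TABLE_SIZE):
    """Return the slot a value would start at in this simplified model."""
    return hash(value) % table_size


for value in (5, 13, 21, 123456789):
    print(value, "->", slot_for(value))
```

All four values land on slot 5, which is exactly the situation the next step has to deal with.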

---

3. **Collision Handling: Open Addressing**

* Suppose two elements hash to the same index.
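* Python uses **open addressing**: instead of keeping a separate list per slot, it probes other slots in the same array until it finds a free one. The simplest scheme, used in the example here, tries the next slot; CPython's actual probe sequence is more elaborate.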

Example:

* `"apple"` hashes to index 5.
* `"banana"` also hashes to index 5.
* So `"banana"` is stored at **index 6**, the next available slot.

This process is called **probing**.
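
A toy version of linear probing, written as a sketch under the simplified model above. `probe_insert` is a hypothetical helper, not part of Python, and the table is a plain list of 8 slots.

```python
def probe_insert(table, key):
    """Insert key into table with linear probing; return the slot used."""
    size = len(table)
    start = hash(key) % size
    for step in range(size):
        index = (start + step) % size
        # Take the first free slot, or stop if the key is already stored.
        if table[index] is None or table[index] == key:
            table[index] = key
            return index
    raise RuntimeError("table is full")


table = [None] * 8
slots = [probe_insert(table, key) for key in (5, 13, 21)]
print(slots)   # [5, 6, 7]
print(table)   # [None, None, None, None, None, 5, 13, 21]
```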

---

4. **Checking Membership (`in`)**

When you do:

```python
"apple" in myset
```

Python:

1. Hashes `"apple"` to get an index.
2. Jumps directly to that index.
3. Checks if the slot holds `"apple"` (using `==`).
4. If not (due to a collision), it probes nearby slots until it either finds `"apple"` or an empty slot (which means it's not present).

This is why **lookup is O(1) on average**, even for large sets.
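
The lookup steps can be sketched the same way. `probe_contains` is a hypothetical helper that also reports how many slots it inspected. The table is written out by hand in the state that linear-probing inserts of 5, 13, and 21 would leave it in.

```python
def probe_contains(table, key):
    """Return (found, probes) for key using linear probing."""
    size = len(table)
    start = hash(key) % size
    for step in range(size):
        index = (start + step) % size
        if table[index] is None:
            # An empty slot ends the search: the key is not present.
            return False, step + 1
        if table[index] == key:
            return True, step + 1
    return False, size


table = [None, None, None, None, None, 5, 13, 21]

print(probe_contains(table, 21))  # (True, 3)
print(probe_contains(table, 29))  # (False, 4)
print(probe_contains(table, 2))   # (False, 1)
```

Even in the worst case shown here only a few slots are inspected, regardless of how many elements the table holds elsewhere.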

---

5. **No Duplicates**

Before inserting, Python checks if an element already exists using its hash and `__eq__()` method. If found, it won't insert it again.
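
The role of `__hash__()` and `__eq__()` in duplicate detection shows up clearly with a user-defined class. `Point` below is a hypothetical example class.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Equal points must produce equal hashes.
        return hash((self.x, self.y))


first = Point(1, 2)
second = Point(1, 2)      # equal to first, but a separate object

points = {first}
points.add(second)        # treated as a duplicate and not inserted

print(len(points))                    # 1
print(next(iter(points)) is first)    # True: the original element is kept
```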

---

6. **Resizing**

* If too many elements are added (and the table is too full), Python allocates a larger internal array. The exact growth factor is an implementation detail.
* All existing elements are **reinserted** into the new, larger array at positions computed from their hash values.

This keeps operations fast.
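
Resizing can be observed indirectly through `sys.getsizeof()`, which reports the memory used by the set object. The sketch below assumes CPython; the exact byte counts and the points at which they jump differ between versions, so none are shown.

```python
import sys

s = set()
initial_size = sys.getsizeof(s)
last_size = initial_size
print(len(s), last_size)

for i in range(100):
    s.add(i)
    size = sys.getsizeof(s)
    if size != last_size:
        # The reported size jumps only when the internal table is reallocated.
        print(len(s), size)
        last_size = size

final_size = last_size
```

The size stays flat for many insertions and then jumps, which matches the description of occasional resizing rather than growth on every `add()`.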

---

## ✅ Recap

| Concept        | What Happens                                |
| -------------- | ------------------------------------------- |
| **Hashing**    | `hash(obj)` gives an integer                |
| **Indexing**   | `index = hash(obj) % table_size`            |
| **Collision**  | Resolved by probing (looking for next free) |
| **Membership** | Hash, then probe to check presence          |
| **Resize**     | Happens when too full; elements rehashed    |



In [45]:
# A Python program to
# demonstrate adding elements
# in a set

# Creating a Set
people = {"Jay", "Idrish", "Archi"}

print("People:", end = " ")
print(people)

# This will add Daxit
# in the set
people.add("Daxit")

# Adding elements to the
# set in a loop
for i in range(1, 6):
    people.add(i)

print("\nSet after adding element:", end = " ")
print(people)

People: {'Idrish', 'Archi', 'Jay'}

Set after adding element: {1, 2, 'Archi', 3, 4, 5, 'Daxit', 'Idrish', 'Jay'}


**Union operation on Python Sets**

Two sets can be merged using the union() method or the | operator. The elements of both hash tables are traversed and combined into a new set, and duplicates are dropped along the way. The average time complexity is O(len(s1) + len(s2)), where s1 and s2 are the two sets being merged.

In [46]:
# Python Program to
# demonstrate union of
# two sets

people = {"Jay", "Idrish", "Archil"}
vampires = {"Karan", "Arjun"}
dracula = {"Deepanshu", "Raju"}

# Union using union()
# function
population = people.union(vampires)

print("Union using union() function")
print(population)

# Union using "|"
# operator
population = people|dracula

print("\nUnion using '|' operator")
print(population)

Union using union() function
{'Archil', 'Arjun', 'Karan', 'Idrish', 'Jay'}

Union using '|' operator
{'Archil', 'Raju', 'Idrish', 'Deepanshu', 'Jay'}
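
The method and the operator differ in one practical detail, sketched here with made-up values: `union()` accepts any iterable, while `|` requires both operands to be sets.

```python
a = {1, 2, 3}

# union() accepts any iterable, and more than one of them.
merged = a.union([3, 4], (5,))
print(sorted(merged))            # [1, 2, 3, 4, 5]

# The | operator requires both operands to be sets.
try:
    a | [3, 4]
except TypeError as err:
    print("TypeError:", err)

# Neither form changes the original; |= (or update()) merges in place.
print(sorted(a))                 # [1, 2, 3]
a |= {4}
print(sorted(a))                 # [1, 2, 3, 4]
```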


**Intersection operation on Python Sets**

This can be done with the intersection() method or the & operator, which select the elements common to both sets. Internally, the smaller hash table is iterated and each of its elements is looked up in the other. The average time complexity is O(min(len(s1), len(s2))), where s1 and s2 are the two sets being intersected.

In [47]:
# Python program to
# demonstrate intersection
# of two sets

set1 = set()
set2 = set()

for i in range(5):
    set1.add(i)

for i in range(3,9):
    set2.add(i)

print("Set 1:", end = " ")
print(set1)
print("Set 2:", end = " ")  
print(set2)
# Intersection using
# intersection() function
set3 = set1.intersection(set2)

print("\nIntersection using intersection() function")
print(set3)

# Intersection using
# "&" operator
set3 = set1 & set2

print("\nIntersection using '&' operator")
print(set3)

Set 1: {0, 1, 2, 3, 4}
Set 2: {3, 4, 5, 6, 7, 8}

Intersection using intersection() function
{3, 4}

Intersection using '&' operator
{3, 4}
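
The O(min(len(s1), len(s2))) bound follows from iterating over the smaller set only. The function `intersect` below is a hypothetical pure-Python sketch of that idea, not the built-in implementation.

```python
def intersect(s1, s2):
    """Intersection that iterates over the smaller set only."""
    small, large = (s1, s2) if len(s1) <= len(s2) else (s2, s1)
    # One average O(1) lookup in the larger set per element of the smaller.
    return {item for item in small if item in large}


set1 = set(range(5))
set2 = set(range(3, 9))

result = intersect(set1, set2)
print(sorted(result))            # [3, 4]
print(result == set1 & set2)     # True
```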


**Finding Differences of Sets in Python**

The difference of two sets contains the elements of the first set that are not in the second. It is computed with the difference() method or the - operator. The average time complexity of finding the difference s1 - s2 is O(len(s1)).

In [48]:
# Python program to
# demonstrate difference
# of two sets

set1 = set()
set2 = set()

for i in range(5):
    set1.add(i)

for i in range(3,9):
    set2.add(i)

print("Set 1:", end = " ")
print(set1)
print("Set 2:", end = " ")  
print(set2)

# Difference of two sets
# using difference() function
set3 = set1.difference(set2)

print("\nDifference of two sets using difference() function")
print(set3)

# Difference of two sets
# using '-' operator
set3 = set1 - set2

print("\nDifference of two sets using '-' operator")
print(set3)

Set 1: {0, 1, 2, 3, 4}
Set 2: {3, 4, 5, 6, 7, 8}

Difference of two sets using difference() function
{0, 1, 2}

Difference of two sets using '-' operator
{0, 1, 2}
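
A sketch of why the cost depends only on the first set: each element of s1 is checked once against s2. The function `difference` is a hypothetical pure-Python equivalent, not the built-in implementation. The example also shows that the order of the operands matters.

```python
def difference(s1, s2):
    """Elements of s1 that are not in s2; one lookup per element of s1."""
    return {item for item in s1 if item not in s2}


set1 = set(range(5))
set2 = set(range(3, 9))

left = difference(set1, set2)
right = difference(set2, set1)

print(sorted(left))              # [0, 1, 2]
print(sorted(right))             # [5, 6, 7, 8]
print(left == set1 - set2)       # True
```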


**Clearing Python Sets**

The clear() method empties the whole set in place.

In [49]:
# Python program to
# demonstrate clearing
# of set

set1 = {1,2,3,4,5,6}

print("Initial set")
print(set1)

# This method will remove
# all the elements of the set
set1.clear()

print("\nSet after using clear() function")
print(set1)

Initial set
{1, 2, 3, 4, 5, 6}

Set after using clear() function
set()
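
"In place" matters when more than one name refers to the same set. A small sketch with illustrative variable names, contrasting `clear()` with rebinding a name to a new empty set:

```python
# clear() empties the one shared set object, so every reference sees it.
shared = {1, 2, 3}
alias_of_shared = shared
shared.clear()
print(alias_of_shared)           # set()

# Rebinding only points the name at a new object; the old set is untouched.
rebound = {1, 2, 3}
alias_of_rebound = rebound
rebound = set()
print(sorted(alias_of_rebound))  # [1, 2, 3]
```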


However, there are two major pitfalls in Python sets:

- The set doesn’t maintain elements in any particular order.
- Only hashable objects (typically instances of immutable types) can be added to a Python set.
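
The second pitfall is easy to trigger by accident. A short sketch with made-up values:

```python
s = set()

# Tuples and frozensets are hashable, so they can be elements.
s.add((1, 2))
s.add(frozenset({3, 4}))

# Lists, sets and dictionaries are not hashable and are rejected.
rejected = []
for candidate in ([1, 2], {3, 4}, {"k": "v"}):
    try:
        s.add(candidate)
    except TypeError as err:
        rejected.append(type(candidate).__name__)
        print("TypeError:", err)

# A tuple is only hashable if everything inside it is hashable too.
try:
    s.add((1, [2, 3]))
except TypeError as err:
    rejected.append("tuple containing a list")
    print("TypeError:", err)

print(len(s))       # 2
print(rejected)     # ['list', 'set', 'dict', 'tuple containing a list']
```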