# Sets in Python — Beginner Friendly Guide

This notebook explains the Python `set` data structure in a clear, example-driven way for absolute beginners.

## 1. Introduction to Sets

Definition:
- A set is an unordered collection of unique elements in Python.

Purpose:
- Use sets when you need uniqueness (no duplicates), fast membership tests, or mathematical set operations (union, intersection, difference).

## 2. Creating Sets

Syntax:
- Literal form: {1, 2, 3}
- Built-in constructor: set(iterable)

Notes:
- An empty pair of braces {} creates an empty dictionary, not a set. Use `set()` to create an empty set.
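
The note above is easy to verify with `type()`. This is a minimal sketch; the variable names are illustrative only.

```python
# {} is an empty dict, not an empty set
empty_braces = {}
empty_set = set()

print(type(empty_braces))  # <class 'dict'>
print(type(empty_set))     # <class 'set'>

# An empty set prints as set(), because {} already means "empty dict"
print(empty_set)           # set()
```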

In [None]:
# Examples: creating sets
# Empty set
empty_set = set()

# Literal with values
fruit_set = {"apple", "banana", "apple", "orange"}  # duplicates removed

# From iterable (list)
nums = set([1, 2, 2, 3])

# Set comprehension
squares = {x*x for x in range(5)}

empty_set, fruit_set, nums, squares

## 3. Set Characteristics

- Unordered: no indexing, order not preserved.
- Unique elements: duplicates are removed automatically.
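- Mutable: you can add and remove elements. Because a set can change, it is unhashable and cannot itself be an element of another set or a dict key.
- Elements must be hashable, which in practice means immutable values such as numbers, strings, and tuples that contain only hashable items.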
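
To make the rule about elements concrete, the sketch below tries a few candidates. Python enforces the rule through hashing, so mutable built-ins such as lists are rejected with a `TypeError`. The names `ok`, `bad`, and `nested` are made up for this example.

```python
# Numbers, strings, and tuples of hashable items are accepted
ok = {1, "two", (3, 4)}

# A list is mutable and unhashable, so it cannot be an element
try:
    bad = {[1, 2]}
except TypeError as err:
    list_error = str(err)
    print("list element:", list_error)

# A tuple that contains a list is unhashable too
try:
    bad = {(1, [2, 3])}
except TypeError as err:
    print("tuple holding a list:", err)

# A set cannot contain another set, but it can contain a frozenset
nested = {frozenset({1, 2}), frozenset({3})}
print(frozenset({1, 2}) in nested)  # True
```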

## 4. Common Set Operations (brief)

- Membership test: `x in my_set`
- Add: `my_set.add(x)`
- Remove: `my_set.remove(x)` (raises KeyError if missing)
- Discard: `my_set.discard(x)` (no error if missing)
- Length: `len(my_set)`
- Iterate: `for x in my_set:`

In [None]:
# Examples: add, remove, membership
s = {1, 2, 3}
print('start', s)

s.add(4)
print('after add 4:', s)

s.discard(2)
print('after discard 2:', s)

# remove raises an error if missing - shown with try/except
try:
    s.remove(10)
except KeyError:
    print('remove(10) raised KeyError; use discard() to avoid this')

print('3 in s?', 3 in s)
len(s)

## 5. Mathematical Set Operations (overview)

- Union: all elements present in either set. `A | B` or `A.union(B)`
- Intersection: elements present in both sets. `A & B` or `A.intersection(B)`
- Difference: elements in A but not in B. `A - B` or `A.difference(B)`
- Symmetric difference: elements in either A or B but not both. `A ^ B` or `A.symmetric_difference(B)`
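
One practical difference between the two spellings: the methods accept any iterable, while the operators expect a set (or frozenset) on both sides. The sketch below illustrates this; `other` is just an example list.

```python
A = {1, 2, 3}
other = [3, 4, 5]  # a list, not a set

# Methods accept any iterable
print(A.union(other) == {1, 2, 3, 4, 5})  # True
print(A.intersection(other) == {3})       # True

# Operators need a set (or frozenset) on both sides
try:
    A | other
except TypeError as err:
    print("A | list fails:", err)

# Convert first if you prefer the operator form
print(A | set(other) == {1, 2, 3, 4, 5})  # True
```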

In [None]:
# Examples: union, intersection, difference, symmetric difference
A = {1, 2, 3, 4}
B = {3, 4, 5}

print('A:', A)
print('B:', B)
print('A | B (union):', A | B)
print('A.union(B):', A.union(B))
print('A & B (intersection):', A & B)
print('A.intersection(B):', A.intersection(B))
print('A - B (difference):', A - B)
print('A.difference(B):', A.difference(B))
print('A ^ B (sym diff):', A ^ B)
print('A.symmetric_difference(B):', A.symmetric_difference(B))

## 6. Set Methods (quick reference)

- add(elem): add element
- remove(elem): remove element, KeyError if missing
- discard(elem): remove element if present
- pop(): remove and return an arbitrary element
- clear(): remove all elements
- union()/|, intersection()/&, difference()/- , symmetric_difference()/^
- issubset(other), issuperset(other)
- isdisjoint(other)
- copy()
- frozenset(iterable): not a set method but a separate built-in type, an immutable (and therefore hashable) version of a set

### Examples: Set Methods (hands-on)

Below are short examples showing how each common set method behaves. Run the next cell to see the outputs, then try modifying the values to experiment.

In [None]:
# Demonstrations of set methods
s = {10, 20, 30}
print('Start:', s)

# add
s.add(40)
print('After add(40):', s)

# discard (no error if missing)
s.discard(99)
print('After discard(99) (no error):', s)

# remove (raises KeyError if missing) with safe pattern
if 20 in s:
    s.remove(20)
    print('After remove(20):', s)
else:
    print('20 not present')

# pop (removes an arbitrary element)
elem = s.pop()
print('pop() removed:', elem, 'remaining:', s)

# copy() vs. alias: copy() makes a new, independent set; assignment only adds a second name
s2 = s.copy()
s_alias = s
s2.add(999)
s_alias.add(888)  # affects original because s_alias is same object
print('Original (has 888 from the alias, but not 999 from the copy):', s)
print('s2 (copy) after adding 999:', s2)

# issubset / issuperset / isdisjoint
A = {1, 2}
B = {1, 2, 3}
print('A.issubset(B):', A.issubset(B))
print('B.issuperset(A):', B.issuperset(A))
print('A.isdisjoint({5,6}):', A.isdisjoint({5,6}))

# frozenset (immutable set) - can be used as dict key
fs = frozenset([1,2,3])
d = {fs: 'value for frozen set'}
print('frozenset as key example:', d)

# clear() to empty a set
s2.clear()
print('s2 after clear():', s2)

# Note: sets require hashable elements; tuples are allowed, lists are not.
try:
    bad = {[1,2], 3}
except TypeError as e:
    print('Cannot create set with list inside:', e)


## 7. Applications of Sets

- Removing duplicates from lists (data cleaning)
- Fast membership testing (e.g., lookups)
- Mathematical operations on groups of items (tags, permissions)
- Graph algorithms (maintaining visited nodes)
- Counting distinct items (unique users, unique words)
- Building sets of features in ML preprocessing

### Examples: Applications of Sets

These short examples show real use-cases where sets simplify tasks: deduping, membership lookups, merging tags, visited nodes for graph traversal, counting unique words, and simple feature extraction for ML.

In [None]:
# Application examples

# 1. Remove duplicates from a list (data cleaning)
raw = ['apple', 'banana', 'apple', 'orange']
clean = list(dict.fromkeys(raw))  # preserves order
clean_using_set = list(set(raw))   # does not preserve order

# 2. Fast membership testing (e.g., blocked users)
blocked = set(['bad_user', 'spammer'])
user = 'alice'
print(user, 'blocked?', user in blocked)

# 3. Merge tags/permissions (union) and find common tags (intersection)
post1_tags = {'python', 'data', 'tips'}
post2_tags = {'python', 'tutorial', 'beginner'}
all_tags = post1_tags | post2_tags
common_tags = post1_tags & post2_tags
print('Merged tags:', all_tags)
print('Common tags:', common_tags)

# 4. Visited nodes example (graph traversal: simplified iterative DFS using a stack)
graph = {1:[2,3], 2:[4], 3:[], 4:[]}
visited = set()
stack = [1]
while stack:
    node = stack.pop()
    if node in visited:
        continue
    visited.add(node)
    for n in graph[node]:
        stack.append(n)
print('Visited nodes:', visited)

# 5. Count unique words in text (simple analytics)
text = 'hello world hello'
unique_word_count = len(set(text.split()))
print('Unique words:', unique_word_count)

# 6. Simple feature extraction: unique characters in a string (toy example)
sample = 'abracadabra'
features = set(sample)
print('Feature set (unique chars):', features)

# Return values to show in notebook output (where appropriate)
clean, clean_using_set, all_tags, common_tags, visited, unique_word_count, features

## 8. Limitations of Sets

- Unordered: no indexing or slicing.
- Cannot contain mutable elements (e.g., lists, dicts) because elements must be hashable.
- Sets themselves are mutable and therefore cannot be used as dict keys — use `frozenset` if you need an immutable set.
- No guaranteed order; if you need a stable order, use a list or sorted view (`sorted(my_set)`).
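
The limitations above, and their usual workarounds, can be seen in a few lines. This is only a sketch; `colors` and `lookup` are example names.

```python
colors = {"red", "green", "blue"}

# Indexing is not supported
try:
    colors[0]
except TypeError as err:
    print("indexing fails:", err)

# sorted() returns a new list with a predictable order
ordered = sorted(colors)
print(ordered)      # ['blue', 'green', 'red']
print(ordered[0])   # blue

# A set cannot be a dict key, but a frozenset can
try:
    lookup = {colors: "palette"}
except TypeError as err:
    print("set as key fails:", err)

lookup = {frozenset(colors): "palette"}
print(lookup[frozenset({"blue", "red", "green"})])  # palette
```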

## 9. Tips for Beginners

- To create an empty set, use `set()`, not `{}`.
- Use `discard()` if you want to remove an item without risking an error.
- If you need ordering or indexing, build a list: `sorted(s)` returns a new sorted list directly, so no `list()` call is needed.
- Use `frozenset` when you need an immutable set (e.g., as dictionary keys).
- Use set comprehensions for concise creation: `{x for x in iterable if condition}`.
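
As an illustration of the last tip, the comprehension below filters and normalises in one step. The word list and the length threshold are arbitrary choices for this example.

```python
words = ["Apple", "banana", "apple", "Cherry", "kiwi", "BANANA"]

# Lowercase every word and keep only those longer than four letters;
# duplicates such as "Apple"/"apple" collapse into one element
long_words = {w.lower() for w in words if len(w) > 4}

# sorted() is used only to print in a predictable order
print(sorted(long_words))  # ['apple', 'banana', 'cherry']
```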

## 10. Practical Examples (exercises)

1. Remove duplicates from a list.
2. Find common elements between two lists.
3. Check whether one list is a subset of another.
4. Count unique words in a sentence.

Try to solve them, then run the next cell to see example solutions.

In [None]:
# Solutions to practical examples
# 1. Remove duplicates
lst = [1,2,2,3,4,4]
unique_lst = list(set(lst))  # note: original order is not guaranteed

# 2. Common elements
a = [1,2,3,4]
b = [3,4,5]
common = list(set(a) & set(b))

# 3. Subset check
small = [1,2]
big = [1,2,3]
is_subset = set(small).issubset(big)

# 4. Count unique words
sentence = 'this is a test this is only a test'
unique_words = len(set(sentence.split()))

unique_lst, common, is_subset, unique_words

## 11. Important information about Sets

- Complexity: average O(1) for add, remove, and membership tests, because sets are built on hash tables. The worst case is O(n), which happens only when many elements collide in the hash table.
- Use `frozenset` when you need a set that is itself hashable, for example as a dictionary key or as an element of another set.
- Sets are best when you need uniqueness and fast membership checks, not when order matters.
- Converting between list and set is common: `list(set(iterable))` — remember this loses order.
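
For a rough feel of the membership-test difference, the sketch below times the same lookup in a list and in a set using the standard `timeit` module. The size `n` and the repeat count are arbitrary, and the exact timings depend on the machine, so treat the printed numbers as indicative only.

```python
import timeit

n = 100_000
as_list = list(range(n))
as_set = set(as_list)
target = n - 1  # last element: the list has to scan all the way to it

# Both containers give the same answer
print(target in as_list, target in as_set)  # True True

# Time 200 lookups in each container
list_time = timeit.timeit(lambda: target in as_list, number=200)
set_time = timeit.timeit(lambda: target in as_set, number=200)
print(f"list: {list_time:.4f}s  set: {set_time:.6f}s")
```

On typical machines the set lookup is far faster, and the gap grows with `n`, since the list scans its elements one by one while the set uses a hash lookup.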



## 12. Summary & Next steps

- Sets are simple, powerful, and useful for uniqueness and fast membership tests.
- Practice: try solving small data cleaning tasks using sets (remove duplicates, merge unique tags, etc.).
- Next: learn about `frozenset`, and then explore dictionaries and how hashing underlies both sets and dicts.

References:
- Official Python docs: https://docs.python.org/3/library/stdtypes.html#set
- Tutorials and exercises: search for "Python set comprehension" and "set operations Python".