# Chapter 3. Built-In Data Structures, Functions, and Files

This chapter covers the fundamental elements of the Python language that are used throughout the book. Python's built-in data manipulation tools complement libraries such as pandas and NumPy, which add more advanced computational functionality for larger datasets. The chapter covers the core data structures (tuples, lists, dictionaries, and sets), writing reusable functions, and the basics of working with file objects and the local file system.

# 3.1 Data Structures and Sequences
Python's data structures are simple but powerful, and mastering them is a large part of becoming a proficient Python programmer. We begin with tuples, lists, and dictionaries, three of the most frequently used built-in types (tuples and lists are sequences; dictionaries are mappings).

### Tuple
A tuple is a fixed-length, immutable sequence of Python objects, which, once assigned, cannot be changed. The simplest method to create a tuple is by specifying a comma-separated sequence of values enclosed within parentheses:

```python
tup = (4, 5, 6)
```

In this example, the tuple `tup` is defined with the values 4, 5, and 6. Once created, the contents of the tuple remain constant throughout its existence.

In [1]:
tup = (4, 5, 6)
tup

(4, 5, 6)

In many contexts, the parentheses can be omitted, so the same tuple can also be written without them:

```python
tup = 4, 5, 6
```

This syntax is equivalent to the previous example and is a concise alternative when creating tuples with a simple sequence of values.

In [2]:
tup = 4, 5, 6
tup

(4, 5, 6)

You can convert any sequence or iterator to a tuple by calling the `tuple` function. For example:

```python
result1 = tuple([4, 0, 2])  # Converts a list to a tuple
result2 = tuple('string')  # Converts a string to a tuple
```

In the first example, the `tuple` function is used to convert the list `[4, 0, 2]` into a tuple. In the second example, the characters of the string 'string' are converted into a tuple. This flexibility allows you to easily create tuples from various iterable objects.

In [3]:
tuple([4, 0, 2])

(4, 0, 2)

In [4]:
tup = tuple('string')
tup

('s', 't', 'r', 'i', 'n', 'g')

Elements of a tuple can be accessed with square brackets (`[]`), as with most other sequence types. As in C, C++, Java, and many other languages, sequences are 0-indexed in Python. Here's an example of accessing the first element of a tuple:

```python
tup = (4, 5, 6)
first_element = tup[0]
```

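In this case, `first_element` is assigned the value 4, because indexing starts from 0 in Python. In the notebook cell below, `tup` still holds `tuple('string')` from the previous cell, so `tup[0]` returns `'s'`.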

In [5]:
tup[0]

's'

When tuples are defined within more complicated expressions, it is often necessary to enclose the values in parentheses. Here's an example of creating a tuple of tuples and accessing its elements:

```python
nested_tup = (4, 5, 6), (7, 8)

# Accessing the entire tuple
complete_tuple = nested_tup

# Accessing the first tuple within the nested structure
first_tuple = nested_tup[0]

# Accessing the second tuple within the nested structure
second_tuple = nested_tup[1]
```

In this case, `complete_tuple` is the entire tuple of tuples, `first_tuple` corresponds to the tuple (4, 5, 6), and `second_tuple` corresponds to the tuple (7, 8). The use of parentheses aids in creating and referencing nested structures.

In [6]:
nested_tup = (4, 5, 6), (7, 8)
nested_tup



((4, 5, 6), (7, 8))

In [7]:
nested_tup[0]

(4, 5, 6)

In [8]:
nested_tup[1][1]

8

Tuples in Python are immutable, meaning that once created, you cannot modify the objects stored in each slot. The following example demonstrates an attempt to modify an element within a tuple, which is not allowed:

```python
tup = tuple(['foo', [1, 2], True])

# This operation is not allowed and will result in an error
tup[2] = False
```

This would raise a `TypeError` since tuples do not support item assignment after creation. The immutability of tuples ensures their integrity and consistency throughout their existence. If you need a data structure with mutable elements, a list might be a more suitable choice.

In [9]:
tup = tuple(['foo', [1, 2], True])
tup[2] = False

TypeError: 'tuple' object does not support item assignment

If an object contained within a tuple is mutable, for example, a list, it can be modified in place. The following example illustrates this concept:

```python
tup = ('foo', [1, 2], True)

# Modifying the mutable object (list) within the tuple
tup[1].append(3)
```

In this case, the `append` method is used to modify the list `[1, 2]` within the tuple. As a result, the updated tuple is `('foo', [1, 2, 3], True)`. While the tuple itself remains immutable, its elements, if mutable, can be modified in place.

In [None]:
tup[1].append(3)   # Modifying the mutable object (list) within the tuple
tup

('foo', [1, 2, 3], True)

#### Concatenate tuples
You can combine or concatenate tuples by using the `+` operator, resulting in the creation of longer tuples. Here's an example along with an explanation:

```python
# Concatenating three tuples to form a longer tuple
result = (4, None, 'foo') + (6, 0) + ('bar',)
```

The `+` operator is employed to concatenate the tuples `(4, None, 'foo')`, `(6, 0)`, and `('bar',)`. The resulting tuple, assigned to the variable `result`, is `(4, None, 'foo', 6, 0, 'bar')`. This operation allows for the seamless combination of individual tuples into a single, longer tuple.

In [None]:
(4, None, 'foo') + (6, 0) + ('bar',)   # Concatenating three tuples to form a longer tuple

(4, None, 'foo', 6, 0, 'bar')

Multiplying a tuple by an integer, akin to lists, results in concatenating the tuple with itself multiple times. In the provided example (`('foo', 'bar') * 4`), it signifies that the original tuple `('foo', 'bar')` is repeated four times, generating a new tuple: `('foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar')`. This multiplication operation provides a concise way to replicate and extend tuples according to the specified multiplier.

In [None]:
('foo', 'bar') * 4

('foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar')

It's important to note that when multiplying a tuple by an integer, the objects themselves are not duplicated; instead, only references to the existing objects are replicated. This means that the elements within the multiplied tuples point to the same underlying objects as the original tuple. Any modification to the referenced objects will be reflected across all instances in the multiplied tuple. Understanding this behavior is crucial for handling mutable objects within tuples and avoiding unexpected consequences.

In the following example, when the original tuple is multiplied by 2, the elements within the multiplied tuple still reference the same list object as the original tuple. Therefore, when we modify the list inside the original tuple, the change is reflected in the multiplied tuple as well.

In [None]:
# Original tuple with a list as one of its elements
original_tuple = (1, 2, [3, 4])
original_tuple

(1, 2, [3, 4])

In [None]:
# Multiplying the tuple by an integer
multiplied_tuple = original_tuple * 2
multiplied_tuple

(1, 2, [3, 4], 1, 2, [3, 4])

In [None]:
# Modifying the list inside the original tuple
original_tuple[2].append(5)
original_tuple

(1, 2, [3, 4, 5])

In [None]:
multiplied_tuple

(1, 2, [3, 4, 5], 1, 2, [3, 4, 5])
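One way to confirm that repetition copies references rather than objects is to compare identities with the `is` operator. The following is a minimal, self-contained sketch; the names `base` and `repeated` are illustrative and do not appear in the original notebook.

```python
# Illustrative names; not from the original notebook.
base = (1, 2, [3, 4])
repeated = base * 2

# Both list slots in the repeated tuple refer to the very same list object.
same_object = repeated[2] is repeated[5] is base[2]
print(same_object)

# Mutating that list through any one reference is visible through all of them.
repeated[5].append(5)
print(base)
print(repeated)
```

If independent copies of the inner list are needed, they have to be created explicitly, for example by building each element with `list(...)`.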

#### Unpacking tuples
If you assign to a tuple-like expression of variables, Python attempts to unpack the value on the right-hand side of the equals sign.

In the following example, the values `(4, 5, 6)` are unpacked into the variables `a`, `b`, and `c`. After this operation, the value of `b` will be 5. This unpacking feature provides a concise and expressive way to assign multiple variables simultaneously based on the contents of a tuple.

In [None]:
tup = (4, 5, 6)
a, b, c = tup
b

5

Even sequences containing nested tuples can be unpacked in Python: In the following example, the tuple `(6, 7)` within the original tuple is unpacked into the variables `c` and `d`. Consequently, the value of `d` will be 7 after the unpacking operation. This capability allows for flexible and hierarchical unpacking of values from nested structures within tuples.

In [None]:
tup = 4, 5, (6, 7)
a, b, (c, d) = tup
d

7

Using this functionality, you can easily swap the values of two variables, a task that in many other languages requires a temporary variable. In Python, the swap can be written in one line:

```python
a, b = b, a
```

This elegant one-liner takes advantage of tuple packing and unpacking to swap the values of `a` and `b` without the necessity of an auxiliary variable. The right-hand side creates a tuple `(b, a)`, and the variables on the left-hand side are then unpacked accordingly.

The cells below run the swap on concrete values, starting from `a = 1` and `b = 2`:

```python
a, b = 1, 2
b, a = a, b
```

After the swap, `a` is 2 and `b` is 1.

In [None]:
a, b = 1, 2
a

1

In [None]:
b

2

In [None]:
b, a = a, b
a

2

In [None]:
b

1

A common and powerful use of variable unpacking in Python is when iterating over sequences of tuples or lists. The following example illustrates this well:

In this case, each iteration unpacks the tuple `(1, 2, 3)`, `(4, 5, 6)`, and `(7, 8, 9)` into the variables `a`, `b`, and `c` respectively. This type of iterable unpacking simplifies the code when working with structured data, making it more readable and expressive.

In [None]:
seq = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
for a, b, c in seq:
    print(f'a={a}, b={b}, c={c}')

a=1, b=2, c=3
a=4, b=5, c=6
a=7, b=8, c=9


Another common and powerful application of variable unpacking is when returning multiple values from a function.
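As a small sketch of this pattern, the function below returns three values as a single tuple, which the caller unpacks in one statement. The function name `min_max_mean` and its inputs are hypothetical and chosen only for illustration.

```python
def min_max_mean(values):
    """Return three summary statistics as a single tuple (hypothetical helper)."""
    return min(values), max(values), sum(values) / len(values)

# The returned tuple is unpacked into three names in one statement.
lowest, highest, average = min_max_mean([2, 4, 9])
print(lowest, highest, average)  # 2 9 5.0
```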

Additionally, the concept of "plucking" elements from the beginning of a tuple is facilitated by a special syntax: `*rest`. This syntax is not only applicable in unpacking tuples but is also used in function signatures to capture an arbitrarily long list of positional arguments.

Here's an example illustrating the use of `*rest` to capture the remaining elements after extracting `a` and `b` from the tuple:

After this operation, `a` will be 1, `b` will be 2, and `rest` will be a list containing the remaining elements `[3, 4, 5]`. This provides a flexible way to handle variable-length structures and is commonly used in functions where the number of arguments may vary.

In [None]:
values = 1, 2, 3, 4, 5
a, b, *rest = values
a

1

In [None]:
b

2

In [None]:
rest

[3, 4, 5]
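The same star syntax in a function signature can be sketched as follows. The function `describe` is hypothetical; note one difference from unpacking assignment: extra positional arguments are collected into a tuple, whereas `*rest` in an assignment produces a list.

```python
def describe(first, *others):
    """Hypothetical function: `others` collects any extra positional arguments."""
    return first, others

head, tail = describe(1, 2, 3, 4, 5)
print(head)  # 1
print(tail)  # (2, 3, 4, 5)
```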

In situations where you want to discard the remaining elements, it's common to use a placeholder variable, and as a convention, many Python programmers opt for the underscore (`_`) for this purpose. The underscore indicates that the variable is intentionally unused and serves as a visual cue to readers that the value is disregarded.

Here's an example illustrating the convention of using underscore for the unwanted variables:

In this case, `a` will be 1, `b` will be 2, and the underscore `_` signifies that the remaining elements are intentionally ignored. This practice enhances code clarity and informs others that the specific values are not relevant to the current context.

In [None]:
a, b, *_ = values

In [None]:
_

[3, 4, 5]

Because the size and contents of a tuple cannot be modified, it has very few instance methods. One particularly useful method, which is also available for lists, is `count`, which returns the number of occurrences of a value.

In the following example, `a.count(2)` returns 4, because the value 2 appears four times in the tuple `a`.

In [None]:
a = (1, 2, 2, 2, 3, 4, 2)
a.count(2)

4

### List
In contrast to tuples, lists are variable length and their contents can be modified in place; in other words, lists are mutable. Lists can be defined using square brackets `[]` or the `list` type function.

Lists, being mutable, allow for modifications to their elements after creation. In the following example, a list `a_list` is created directly using square brackets, and another list `b_list` is generated by converting a tuple `tup` using the `list` function. 

In [None]:
a_list = [2, 3, 7, None]
a_list


[2, 3, 7, None]

In [None]:
tup = ("foo", "bar", "baz")


In [None]:
b_list = list(tup)
b_list


['foo', 'bar', 'baz']

The ability to modify individual elements in the list is demonstrated by changing the value at index 1 in `b_list` from 'bar' to 'peekaboo'.

In [None]:
b_list[1] = "peekaboo"
b_list

['foo', 'peekaboo', 'baz']

Lists and tuples share semantic similarities, although tuples, being immutable, cannot be modified. They can often be used interchangeably in many functions. The `list` built-in function is commonly employed in data processing to materialize an iterator or generator expression:

In the following example, the `range` object returned by `range(10)` is materialized into a list using the `list` function. A `range` is a lazy sequence rather than a generator expression: it produces its values on demand, so converting it to a list is useful when you need the concrete values for further manipulation or analysis.

In [None]:
gen = range(10)  # Creating a lazy range object
gen


range(0, 10)

In [None]:
list(gen)  # Using the list function to materialize the range into a list

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

#### Adding and removing elements

Elements can be added to the end of a list using the `append` method. 

In the following case, the `append` method is used to add the string "dwarf" to the end of the list `b_list`. The result is an updated list containing the additional element. The `append` method is a convenient way to extend lists dynamically.

In [None]:
b_list.append("dwarf")
b_list

['foo', 'peekaboo', 'baz', 'dwarf']

By utilizing the `insert` method, you can add an element at a particular position within a list. 

In the following instance, the `insert` method is employed to insert the string "red" at index 1 in the list `b_list`. The resulting list reflects the addition of the element at the specified position. The `insert` method allows for precise placement of elements within a list.

In [None]:
b_list.insert(1, "red")    # Inserting the element "red" at index 1 in the list
b_list

['foo', 'red', 'peekaboo', 'baz', 'dwarf']

The insertion index is normally between 0 and the length of the list, inclusive. Python does not raise an `IndexError` for an index outside this range: an index larger than the length of the list simply appends the element, and negative indices count back from the end.
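A quick way to see what `insert` does with indexes at or beyond the ends of a list is to try it. In the sketch below (the list `letters` is illustrative), every call succeeds without raising, and out-of-range positions are clamped to the nearest end of the list.

```python
letters = ["a", "b", "c"]   # illustrative list

letters.insert(100, "z")    # index far past the end: behaves like append
print(letters)              # ['a', 'b', 'c', 'z']

letters.insert(-100, "start")  # index far before the start: inserted at the front
print(letters)              # ['start', 'a', 'b', 'c', 'z']

letters.insert(len(letters), "end")  # index equal to the length: placed last
print(letters)              # ['start', 'a', 'b', 'c', 'z', 'end']
```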

The counterpart to the `insert` operation is `pop`, which removes and returns the element at a particular index. In the following cell, `pop(2)` removes the element at index 2 of `b_list` and returns its value, 'peekaboo'. Because the cell ends with `b_list`, the notebook displays the updated list, `['foo', 'red', 'baz', 'dwarf']`, rather than the returned value.

The `pop` method provides a way to selectively remove elements from a list based on their index while simultaneously obtaining the removed value.

In [None]:
b_list.pop(2)   # Using pop to remove and return the element at index 2
b_list

['foo', 'red', 'baz', 'dwarf']

Elements can be eliminated based on their value using the `remove` method, which identifies the first occurrence of the specified value and removes it from the list. 

In the following scenario, the `remove` method is applied to eliminate the first occurrence of the value "foo" from the list `b_list`. After this operation, the list is updated, and its contents become `['red', 'baz', 'dwarf', 'foo']`. The `remove` method is useful when you want to delete a specific value from a list without considering its index.

In [None]:
b_list.append("foo")   # Adding "foo" to the list
b_list

['foo', 'red', 'baz', 'dwarf', 'foo']

In [None]:
b_list.remove("foo")   # Removing the first occurrence of "foo" from the list
b_list

['red', 'baz', 'dwarf', 'foo']

If performance is not a concern, you can use a list as a set-like container by combining `append` and `remove`, although Python has actual set objects (discussed later). To check whether a list contains a value, use the `in` keyword.

In the following case, the `in` keyword is utilized to determine if the value "dwarf" exists in the list `b_list`. The result is a boolean indicating whether the specified value is present in the list.

It's important to note that while this approach may offer set-like functionality, Python provides dedicated set objects for more efficient and optimized set operations, especially in scenarios involving larger datasets.

In [None]:
"dwarf" in b_list   # Checking if "dwarf" is present in the list

True

The `not` keyword can be employed to negate the result obtained from using the `in` keyword. 

In the following instance, the `not in` expression is used to evaluate whether the value "dwarf" is not present in the list `b_list`. The result is a boolean indicating whether the specified value is absent in the list. In this particular case, the output is `False`, suggesting that "dwarf" is indeed present in the list.

Verifying whether a list contains a specific value is considerably slower compared to dictionaries and sets (to be discussed shortly). This is because Python performs a linear scan across the values of the list, resulting in a time complexity proportional to the size of the list. In contrast, dictionaries and sets, which are based on hash tables, can execute such checks in constant time, providing faster performance for membership tests.

In [None]:
"dwarf" not in b_list   # Checking if "dwarf" is not present in the list

False
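The difference between list and set membership tests can be sketched as follows. The names `names` and `name_set` are illustrative; the results are the same either way, and only the lookup strategy differs.

```python
# Illustrative data; `names` is not from the original notebook.
names = ["foo", "red", "baz", "dwarf"]

# List membership: Python compares against each element in turn.
in_list = "dwarf" in names
print(in_list)                 # True

# Converting once to a set makes each later membership test a hash lookup.
name_set = set(names)
in_set = "dwarf" in name_set
print(in_set)                  # True
print("peekaboo" not in name_set)  # True
```

The conversion itself scans the whole list once, so it tends to pay off only when many membership tests are made against the same collection.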

#### Concatenating and combining lists
Similar to tuples, combining lists with the `+` operator concatenates them. If you already have a list defined, the `extend` method allows you to append multiple elements to it. It's important to note that concatenating lists with `+` creates a new list, copying the objects over, making it relatively expensive. In contrast, using `extend` to append elements to an existing list is generally more efficient, especially when building up a large list. Therefore, when dealing with lists of lists, using `extend` in a loop is faster than the alternative concatenative approach.

In [None]:
[4, None, "foo"] + [7, 8, (2, 3)]

[4, None, 'foo', 7, 8, (2, 3)]

In [None]:
x = [4, None, "foo"]
x.extend([7, 8, (2, 3)])
x

[4, None, 'foo', 7, 8, (2, 3)]
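The two approaches to flattening a list of lists can be sketched side by side. The input `list_of_lists` is illustrative; both loops produce the same result, but the second builds a new list on every iteration.

```python
list_of_lists = [[1, 2], [3, 4], [5, 6]]   # illustrative input

# Preferred: grow one list in place.
everything = []
for chunk in list_of_lists:
    everything.extend(chunk)

# Concatenative alternative: creates a brand-new list on every iteration.
everything_concat = []
for chunk in list_of_lists:
    everything_concat = everything_concat + chunk

print(everything)  # [1, 2, 3, 4, 5, 6]
```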

#### Sort
You can sort a list in place, without creating a new object, by calling its `sort` method. For instance, if you have a list `a` containing numeric elements, calling `a.sort()` rearranges its elements in ascending order.

In [None]:
a = [7, 2, 5, 1, 3]
a.sort()
a

[1, 2, 3, 5, 7]

To sort the list `a` in descending order, you can use the `sort` method with the `reverse` parameter set to `True`. Here's the modified code:

In [None]:
a.sort(reverse=True)
a

[7, 5, 3, 2, 1]

The `sort` method has a few options that occasionally come in handy. One is the ability to pass a secondary sort key, that is, a function that produces the value used to order the elements. For example, you can sort a collection of strings by their lengths. In the cell below, the list `b` is sorted using the `len` function as the key, giving a list ordered by string length.

The built-in `sorted` function, discussed later, makes a sorted copy of a general sequence and leaves the original unchanged.

In [None]:
b = ["saw", "small", "He", "foxes", "six"]
b.sort(key=len)
b

['He', 'saw', 'six', 'small', 'foxes']
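For comparison with the in-place `sort` method, here is a brief sketch of the built-in `sorted` function. The variable names are illustrative; `sorted` accepts the same `key` argument and works on any iterable.

```python
words = ["saw", "small", "He", "foxes", "six"]

# sorted returns a new list and leaves the input untouched.
by_length = sorted(words, key=len)
print(by_length)  # ['He', 'saw', 'six', 'small', 'foxes']
print(words)      # ['saw', 'small', 'He', 'foxes', 'six']

# It accepts any iterable, for example a string.
letters = sorted("horse race")
print(letters)    # [' ', 'a', 'c', 'e', 'e', 'h', 'o', 'r', 'r', 's']
```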

#### Slicing
You can use slice notation to extract specific sections from various sequence types. The basic format involves using start:stop within square brackets ([]), where "start" is the starting index and "stop" is the index up to which the elements will be included.

For instance, consider the sequence:
```python
seq = [7, 2, 3, 7, 5, 6, 0, 1]
```

If you apply slice notation `seq[1:5]`, it will retrieve elements from index 1 to index 4 (5 is excluded), resulting in the output `[2, 3, 7, 5]`.


In [None]:
seq = [7, 2, 3, 7, 5, 6, 0, 1]
seq

[7, 2, 3, 7, 5, 6, 0, 1]

You can assign new values to a specific slice within a sequence. In the given example:

```python
seq[3:5] = [6, 3]
```

It means replacing the elements in the sequence `seq` from index 3 to index 4 with the values `[6, 3]`. After this assignment, the sequence is modified, and the output of `seq` becomes `[7, 2, 3, 6, 3, 6, 0, 1]`.

In [None]:
seq[1:5]

[2, 3, 7, 5]

The slice notation in Python follows the rule that while the element at the start index is included, the stop index is not included. Consequently, the number of elements in the result is determined by subtracting the start index from the stop index.
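The length rule can be checked directly. This is a small sketch on the original, unmodified sequence; the names `start`, `stop`, and `piece` are illustrative.

```python
seq = [7, 2, 3, 7, 5, 6, 0, 1]

start, stop = 1, 5
piece = seq[start:stop]
print(piece)                        # [2, 3, 7, 5]
print(len(piece) == stop - start)   # True

# The rule assumes both indexes fall inside the sequence;
# a stop past the end is silently truncated.
tail = seq[6:100]
print(tail)                         # [0, 1]
```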

If either the start or stop is omitted, it defaults to the beginning of the sequence for the start and the end of the sequence for the stop. For instance:

```python
seq[:5]
```
This retrieves elements from the start of the sequence up to (but not including) index 5, resulting in the output `[7, 2, 3, 6, 3]`.

```python
seq[3:]
```
This fetches elements from index 3 to the end of the sequence, yielding the output `[6, 3, 6, 0, 1]`.

In [None]:
seq[3:5] = [6, 3]
seq

[7, 2, 3, 6, 3, 6, 0, 1]

In [None]:
seq[:5]

[7, 2, 3, 6, 3]

In [None]:
seq[3:]

[6, 3, 6, 0, 1]

Negative indices in Python slice notation indicate slicing the sequence relative to the end. 

![list](images/list1.jpg)

In the provided examples:

```python
seq[-4:]
```
This extracts elements from the fourth-to-last index to the end of the sequence, resulting in the output `[3, 6, 0, 1]`.

```python
seq[-6:-2]
```
This retrieves elements from the sixth-to-last index up to (but not including) the second-to-last index, yielding the output `[3, 6, 3, 6]`.

In [None]:
seq[-4:]

[3, 6, 0, 1]

In [None]:
seq[-6:-2]

[3, 6, 3, 6]

Slicing semantics take some getting used to, especially if you are coming from R or MATLAB. Figure 3.1 (above) illustrates slicing with positive and negative integers. In the figure, the indices are shown at the "bin edges" to make clear where a slice selection starts and stops.

Additionally, a step value can be employed after a second colon to, for example, select every other element. In the given example:

```python
seq[::2]
```

This retrieves elements with a step of 2, meaning every second element is selected. The output is `[7, 3, 3, 0]`.

In [None]:
seq[::2]

[7, 3, 3, 0]

A clever use of the step value is to pass -1, which has the useful effect of reversing a list or tuple. In the provided example:

```python
seq[::-1]
```

This constructs a reversed version of the sequence, as it iterates through the elements with a step of -1. Consequently, the output is `[1, 0, 6, 3, 6, 3, 2, 7]`.

In [None]:
seq[::-1]

[1, 0, 6, 3, 6, 3, 2, 7]

In [None]:
seq

[7, 2, 3, 6, 3, 6, 0, 1]

### Dictionary

The dictionary, or `dict`, is a crucial built-in data structure in Python. In some other programming languages, dictionaries are referred to as hash maps or associative arrays. A dictionary in Python stores a collection of key-value pairs, where both the key and the value are Python objects. Each key is associated with a corresponding value, enabling convenient retrieval, insertion, modification, or deletion of values based on specific keys.

One common method for creating a dictionary is by using curly braces `{}` and colons to separate keys and values. Here are a few examples:

```python
empty_dict = {}
d1 = {"a": "some value", "b": [1, 2, 3, 4]}
```

The dictionary `d1` contains key-value pairs where the key "a" is associated with the value "some value," and the key "b" is associated with the list `[1, 2, 3, 4]`. The output of `d1` is `{'a': 'some value', 'b': [1, 2, 3, 4]}`.

In [None]:
empty_dict = {}
empty_dict

{}

In [None]:
d1 = {"a": "some value", "b": [1, 2, 3, 4]}
d1

{'a': 'some value', 'b': [1, 2, 3, 4]}

The syntax for accessing, inserting, or setting elements in a dictionary is similar to that used for lists or tuples. In the provided examples:

```python
d1[7] = "an integer"
```

This line inserts a new key-value pair into the dictionary `d1`, associating the key 7 with the value "an integer." After this operation, the dictionary becomes `{'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer'}`.

```python
d1["b"]
```

This syntax allows you to access the value associated with the key "b" in the dictionary. In this case, the output is `[1, 2, 3, 4]`.

In [None]:
d1[7] = "an integer"
d1


{'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer'}

In [None]:
d1["b"]

[1, 2, 3, 4]

You can determine if a dictionary contains a specific key using the same syntax employed for checking the presence of a value in a list or tuple. In the given example:

```python
"b" in d1
```

This expression checks if the key "b" is present in the dictionary `d1`. The output is `True`, indicating that the key "b" is indeed present in the dictionary.

In [None]:
"b" in d1

True

You have two methods for deleting values from a dictionary: using the `del` keyword or the `pop` method, which not only deletes the key but also returns the corresponding value. Here's an illustration:

```python
# Initial dictionary
d1 = {'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer'}

# Adding new key-value pairs
d1[5] = 'some value'
d1['dummy'] = 'another value'

# After additions
# {'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer', 5: 'some value', 'dummy': 'another value'}

# Deleting a key-value pair using del
del d1[5]

# After deletion using del
# {'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer', 'dummy': 'another value'}

# Deleting a key-value pair using pop
ret = d1.pop('dummy')

# Returned value from pop
# 'another value'

# After deletion using pop
# {'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer'}
```

In summary, `del d1[5]` removes the key-value pair with the key 5, and `ret = d1.pop('dummy')` removes the key-value pair with the key 'dummy' while also assigning the value 'another value' to the variable `ret`.

In [None]:
d1[5] = "some value"
d1


{'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer', 5: 'some value'}

In [None]:
d1["dummy"] = "another value"
d1


{'a': 'some value',
 'b': [1, 2, 3, 4],
 7: 'an integer',
 5: 'some value',
 'dummy': 'another value'}

In [None]:
del d1[5]
d1


{'a': 'some value',
 'b': [1, 2, 3, 4],
 7: 'an integer',
 'dummy': 'another value'}

In [None]:
ret = d1.pop("dummy")
ret


'another value'

In [None]:
d1

{'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer'}

The `keys()` and `values()` methods return view objects over the dictionary's keys and values, respectively, which you can iterate over or pass to `list`. Keys are ordered by insertion, and both methods produce their results in the same respective order. Here's an example:

```python
# Given dictionary
d1 = {'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer'}

# Obtaining a list of keys
list(d1.keys())
# Output: ['a', 'b', 7]

# Obtaining a list of values
list(d1.values())
# Output: ['some value', [1, 2, 3, 4], 'an integer']
```

In this case, `list(d1.keys())` generates a list of keys in the dictionary, and `list(d1.values())` produces a list of corresponding values. The order of elements in these lists is based on the order in which the keys were originally inserted into the dictionary.

In [None]:
list(d1.keys())    # Obtaining a list of keys


['a', 'b', 7]

In [None]:
list(d1.values())    # Obtaining a list of values

['some value', [1, 2, 3, 4], 'an integer']
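In Python 3, the object returned by `keys()` is a dynamic view of the dictionary rather than a snapshot. The sketch below, with an illustrative dictionary `d`, contrasts a view with a list copy taken at a point in time.

```python
d = {"a": 1, "b": 2}      # illustrative dictionary

keys_view = d.keys()      # a dynamic view, not a snapshot
keys_snapshot = list(d)   # a list copy of the keys at this moment

d["c"] = 3

print(list(keys_view))    # ['a', 'b', 'c']
print(keys_snapshot)      # ['a', 'b']
```

This is one reason to wrap the result in `list` when you need a stable copy of the keys.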

If you need to iterate over both the keys and values of a dictionary simultaneously, you can use the `items()` method, which returns a view of the key-value pairs as 2-tuples. Here's an example:

```python
# Given dictionary
d1 = {'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer'}

# Obtaining a list of key-value pairs as 2-tuples
list(d1.items())
# Output: [('a', 'some value'), ('b', [1, 2, 3, 4]), (7, 'an integer')]
```

In this case, `list(d1.items())` generates a list of 2-tuples where each tuple contains a key-value pair from the dictionary. The order of these tuples corresponds to the order in which the keys were originally inserted into the dictionary.

In [None]:
list(d1.items())    # Obtaining a list of key-value pairs as 2-tuples

[('a', 'some value'), ('b', [1, 2, 3, 4]), (7, 'an integer')]
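In practice, `items()` is most often used directly in a `for` loop, combined with tuple unpacking. A minimal sketch, using the same dictionary contents as above and an illustrative list `lines` to collect the output:

```python
d1 = {"a": "some value", "b": [1, 2, 3, 4], 7: "an integer"}

lines = []
for key, value in d1.items():
    # Each (key, value) 2-tuple is unpacked directly in the loop header.
    lines.append(f"{key!r} -> {value!r}")

print("\n".join(lines))
```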

You can merge the contents of one dictionary into another using the `update` method. In the provided example:

```python
# Given dictionary
d1 = {'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer'}

# Updating d1 with the contents of another dictionary
d1.update({"b": "foo", "c": 12})

# Resulting dictionary after update
# {'a': 'some value', 'b': 'foo', 7: 'an integer', 'c': 12}
```

The `update` method takes a dictionary as an argument and incorporates its key-value pairs into the calling dictionary (`d1` in this case). If a key from the provided dictionary already exists in the calling dictionary, its corresponding value is updated. If a new key is present, it is added to the calling dictionary.

In [None]:
d1.update({"b": "foo", "c": 12})
d1

{'a': 'some value', 'b': 'foo', 7: 'an integer', 'c': 12}

Note that `update` modifies the dictionary in place, so the old values of any keys that also appear in the data passed to `update` are discarded. In this example, the value for "b" in `d1` is replaced by "foo", and the new key "c" is added with the value 12.
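When the original dictionary must be preserved, one alternative (not part of the original text) is to build a new dictionary by unpacking both inputs. This is a sketch; the names `extra` and `merged` are illustrative.

```python
d1 = {"a": "some value", "b": [1, 2, 3, 4], 7: "an integer"}
extra = {"b": "foo", "c": 12}

# Unpack both dictionaries into a new one; later keys win on conflict.
merged = {**d1, **extra}

print(merged)      # {'a': 'some value', 'b': 'foo', 7: 'an integer', 'c': 12}
print(d1["b"])     # [1, 2, 3, 4]  (d1 itself is unchanged)
```

On Python 3.9 and later, `d1 | extra` produces the same merged dictionary.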

#### Creating dictionaries from sequences

It is a common scenario to have two sequences that need to be paired up element-wise into a dictionary. Initially, one might use code similar to the following:

```python
mapping = {}
for key, value in zip(key_list, value_list):
    mapping[key] = value
```

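This code uses the `zip` function to iterate over corresponding elements of `key_list` and `value_list`, storing each pair in the `mapping` dictionary so that elements of `key_list` become keys and elements of `value_list` become values. Here `key_list` and `value_list` are placeholders for two sequences you already have; running the snippet without defining them raises the `NameError` shown in the cell below.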

In [None]:
mapping = {}
for key, value in zip(key_list, value_list):
    mapping[key] = value

NameError: name 'key_list' is not defined
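A runnable variant of the loop needs concrete inputs. In this sketch, the contents of `key_list` and `value_list` are made up purely for illustration; any two sequences of equal length would do.

```python
# Illustrative inputs; not from the original notebook.
key_list = ["a", "b", "c"]
value_list = [1, 2, 3]

mapping = {}
for key, value in zip(key_list, value_list):
    mapping[key] = value

print(mapping)  # {'a': 1, 'b': 2, 'c': 3}
```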

Because a dictionary can be viewed as a collection of 2-tuples, the `dict` function provides a convenient way to create a dictionary from a list of such tuples. In the provided example:

```python
tuples = zip(range(5), reversed(range(5)))
mapping = dict(tuples)
```

The `zip` function pairs up corresponding elements of `range(5)` and `reversed(range(5))`, producing an iterator of 2-tuples (which is why the notebook displays a `zip` object rather than a list). The `dict` function then consumes these pairs to build the dictionary `mapping`, with keys taken from `range(5)` and values from the reversed range: `{0: 4, 1: 3, 2: 2, 3: 1, 4: 0}`.

In [None]:
tuples = zip(range(5), reversed(range(5)))
tuples


<zip at 0x7f379823ba40>

In [None]:
mapping = dict(tuples)
mapping

{0: 4, 1: 3, 2: 2, 3: 1, 4: 0}

#### Default values

The `get` method retrieves the value associated with a key and accepts an optional default value to return when the key is not found. This can simplify code:

```python
value = some_dict.get(key, default_value)
```

This line of code retrieves the value for the specified key from `some_dict`. If the key is not present, it returns `default_value` instead. This eliminates the need for an explicit if-else block to check for key existence, making the code more concise and readable.

value = some_dict.get(key, default_value)
value
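A minimal sketch of the explicit if-else block that `get` makes unnecessary, using hypothetical values for `some_dict`, `key`, and `default_value`:

```python
# Hypothetical data for illustration
some_dict = {"a": 1, "b": 2}
key = "z"
default_value = 0

# Explicit check that get() makes unnecessary
if key in some_dict:
    value_explicit = some_dict[key]
else:
    value_explicit = default_value

# Equivalent one-liner
value_get = some_dict.get(key, default_value)

print(value_explicit, value_get)
```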

The `get` method in Python returns `None` by default if the specified key is not present in the dictionary. On the other hand, the `pop` method will raise an exception when attempting to retrieve a non-existent key. When assigning values in a dictionary, these values can be of various types, including collections like lists.

For instance, suppose you have the list of words `["apple", "bat", "bar", "atom", "book"]` and want to categorize them by their first letters as a dictionary of lists:

```python
words = ["apple", "bat", "bar", "atom", "book"]
by_letter = {}

for word in words:
    letter = word[0]
    if letter not in by_letter:
        by_letter[letter] = [word]
    else:
        by_letter[letter].append(word)
```

After this process, the `by_letter` dictionary would be {'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']}. This demonstrates how values in a dictionary can be structured, with keys representing categories and corresponding values being lists containing elements associated with those categories.

In [None]:
words = ["apple", "bat", "bar", "atom", "book"]
by_letter = {}

for word in words:
    letter = word[0]
    if letter not in by_letter:
        by_letter[letter] = [word]
    else:
        by_letter[letter].append(word)

by_letter

{'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']}

The `setdefault` dictionary method in Python provides a concise way to simplify the workflow of the previous example. The for loop can be rewritten using `setdefault` as follows:

```python
by_letter = {}

for word in words:
    letter = word[0]
    by_letter.setdefault(letter, []).append(word)
```

This achieves the same result as the previous code snippet but in a more compact form. The `setdefault` method checks if the specified key (in this case, the variable `letter`) exists in the dictionary. If it does, it returns the corresponding value; otherwise, it sets the key to the default value provided (an empty list `[]` in this case) and then appends the current word to that list. This eliminates the need for explicit conditional statements to handle the creation of lists for new keys.

In [None]:
by_letter = {}
for word in words:
    letter = word[0]
    by_letter.setdefault(letter, []).append(word)
by_letter

{'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']}

The built-in `collections` module in Python includes a convenient class called `defaultdict`, which further simplifies the process. By using `defaultdict`, you can create a dictionary with default values assigned to each slot. In this case, you pass the `list` type to `defaultdict`, indicating that each key will have an associated list as its default value:

```python
from collections import defaultdict

by_letter = defaultdict(list)

for word in words:
    by_letter[word[0]].append(word)
```

This eliminates the need for explicit initialization of empty lists for each key, as `defaultdict` automatically handles it. The resulting `by_letter` dictionary will have the same structure as before: {'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']}. Using `defaultdict` is a concise and efficient way to achieve the desired dictionary structure.

In [None]:
from collections import defaultdict
by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)
by_letter

defaultdict(list, {'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']})
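The same pattern works with other default factories. As a sketch, assuming we want to count the words per first letter rather than collect them, `int` can serve as the factory because `int()` returns `0`. The name `counts` is introduced here for illustration only.

```python
from collections import defaultdict

words = ["apple", "bat", "bar", "atom", "book"]

# int() returns 0, so every missing key starts at zero
counts = defaultdict(int)
for word in words:
    counts[word[0]] += 1

# Convert to a plain dict for display
print(dict(counts))
```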

#### Valid dictionary key types

While the values of a dictionary can be any Python object, the keys generally have to be immutable objects such as scalar types (int, float, string) or tuples (all of whose elements must be immutable, too). The technical term for this requirement is *hashability*: an object can be used as a dictionary key only if it can be hashed.

You can use the `hash` function to check the hashability of an object. For example:

```python
hash("string")               # Output: an integer that varies between interpreter sessions
hash((1, 2, (2, 3)))         # Output: -9209053662355515447
```

In these cases, the objects (string and tuple) are hashable and can be used as keys in a dictionary. However, attempting to hash an object that contains a mutable element, such as a list, will result in a `TypeError`:

```python
hash((1, 2, [2, 3]))          # TypeError: unhashable type: 'list'
```

This error occurs because lists are mutable, and mutable objects are not hashable. Therefore, it's important to use immutable objects as keys when working with dictionaries in Python.

In [None]:
hash("string")


987602855173242659

In [None]:
hash((1, 2, (2, 3)))


-9209053662355515447

In [None]:
hash((1, 2, [2, 3])) # fails because lists are mutable

TypeError: unhashable type: 'list'

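The hash values you see will likely differ from those shown here. String hashes are randomized per interpreter session by default (unless the `PYTHONHASHSEED` environment variable is set), and hash values in general can vary between Python versions and platforms.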
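Since the specific hash value rarely matters, a check for hashability only needs to know whether `hash` raises a `TypeError`. The helper `is_hashable` below is a hypothetical name introduced for this sketch, not a built-in function.

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

print(is_hashable("string"))         # a string
print(is_hashable((1, 2, (2, 3))))   # a tuple of hashable elements
print(is_hashable((1, 2, [2, 3])))   # a tuple containing a list
```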

If you need to use a list as a key in a dictionary, one approach is to convert the list into a tuple. Tuples are hashable as long as their elements are hashable. Here's an example:

```python
d = {}
d[tuple([1, 2, 3])] = 5
```

Here the list `[1, 2, 3]` is converted to the tuple `(1, 2, 3)` before being used as a key in the dictionary `d`, which makes the key hashable. The resulting dictionary is `{(1, 2, 3): 5}`. Converting to a tuple is the usual workaround when the natural key would be a list.

In [None]:
d = {}
d[tuple([1, 2, 3])] = 5
d

{(1, 2, 3): 5}

### Set

A set in Python is an unordered collection of unique elements. There are two ways to create a set: using the `set` function or using a set literal with curly braces.

Using the `set` function:

```python
set([2, 2, 2, 1, 3, 3])    # Output: {1, 2, 3}
```

Using a set literal:

```python
{2, 2, 2, 1, 3, 3}         # Output: {1, 2, 3}
```

In both cases, the resulting set contains only unique elements, and the order of elements is not guaranteed since sets are unordered collections. The duplicate values are automatically removed, and you get a set with distinct elements.

In [None]:
set([2, 2, 2, 1, 3, 3])


In [None]:
{2, 2, 2, 1, 3, 3}

Sets are a built-in type that supports mathematical set operations such as union, intersection, difference, and symmetric difference. To illustrate, let's consider two example sets:

```python
a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8}
```

The union of these sets represents the set of distinct elements occurring in either set. This can be calculated using either the `union` method or the `|` binary operator:

```python
a.union(b)   # Output: {1, 2, 3, 4, 5, 6, 7, 8}
a | b        # Output: {1, 2, 3, 4, 5, 6, 7, 8}
```

The intersection, on the other hand, contains the elements that occur in both sets. This can be achieved using the `intersection` method or the `&` operator:

```python
a.intersection(b)   # Output: {3, 4, 5}
a & b               # Output: {3, 4, 5}
```

In summary, sets provide both methods and operators for performing these operations.

In [None]:
a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8}

In [None]:
a.union(b)


In [None]:
a | b

In [None]:
a.intersection(b)


In [None]:
a & b
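The difference and symmetric difference mentioned earlier follow the same method-or-operator pattern. A short sketch using the same two sets, with `only_in_a` and `in_one_only` as illustrative names:

```python
a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8}

# Elements of a that are not in b
only_in_a = a - b                # same as a.difference(b)

# Elements that appear in exactly one of the two sets
in_one_only = a ^ b              # same as a.symmetric_difference(b)

print(only_in_a)
print(in_one_only)
```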

Refer to Table 3.1 for a comprehensive list of commonly used set methods in Python along with alternative syntax and descriptions:


Table 3.1: Python Set Operations

Function | Alternative Syntax | Description
---------|--------------------|------------
`a.add(x)` | N/A | Add element x to set a
`a.clear()` | N/A | Reset set a to an empty state, discarding all of its elements
`a.remove(x)` | N/A | Remove element x from set a
`a.pop()` | N/A | Remove an arbitrary element from set a, raising KeyError if the set is empty
`a.union(b)` | `a \| b` | All of the unique elements in a and b
`a.update(b)` | `a \|= b` | Set the contents of a to be the union of the elements in a and b
`a.intersection(b)` | `a & b` | All of the elements in both a and b
`a.intersection_update(b)` | `a &= b` | Set the contents of a to be the intersection of the elements in a and b
`a.difference(b)` | `a - b` | The elements in a that are not in b
`a.difference_update(b)` | `a -= b` | Set a to the elements in a that are not in b
`a.symmetric_difference(b)` | `a ^ b` | All of the elements in either a or b but not both
`a.symmetric_difference_update(b)` | `a ^= b` | Set a to contain the elements in either a or b but not both
`a.issubset(b)` | `a <= b` | True if the elements of a are all contained in b
`a.issuperset(b)` | `a >= b` | True if the elements of b are all contained in a
`a.isdisjoint(b)` | N/A | True if a and b have no elements in common


This table serves as a quick reference for utilizing these set methods in Python.

**Note:** 
If you pass an input that is not a set to methods such as `union` and `intersection`, Python converts the input to a set before performing the operation. The binary operators such as `|` and `&` are stricter: both operands must already be sets, otherwise Python raises a `TypeError`.
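The difference between the methods and the operators can be seen in a short sketch. The names `union_from_list` and `operator_worked` are illustrative only.

```python
a = {1, 2, 3, 4, 5}

# Methods accept any iterable, such as a list
union_from_list = a.union([5, 6, 7])

# Operators require both operands to be sets
try:
    a | [5, 6, 7]
    operator_worked = True
except TypeError:
    operator_worked = False

print(union_from_list)
print(operator_worked)
```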

All logical set operations in Python have corresponding in-place counterparts, allowing you to replace the contents of the set on the left side of the operation with the result. This can be particularly advantageous for very large sets, as it may offer improved efficiency. Here's an example:

```python
c = a.copy()
c |= b
# The contents of set c are now the union of sets a and b

d = a.copy()
d &= b
# The contents of set d are now the intersection of sets a and b
```

In this way, using the in-place counterparts, such as `|=` for union and `&=` for intersection, enables you to directly modify the existing set, potentially saving resources and time for extensive sets.

In [None]:
c = a.copy()
c |= b
c


In [None]:
d = a.copy()
d &= b
d

Similar to dictionary keys, set elements in Python generally must be immutable and hashable. Hashable means that calling the `hash` function on a value should not raise an exception. To accommodate elements that are list-like or other mutable sequences in a set, you can convert them to tuples. Here's an example:

```python
my_data = [1, 2, 3, 4]
my_set = {tuple(my_data)}
# The list-like elements are converted to a tuple before being added to the set

print(my_set)
# Output: {(1, 2, 3, 4)}
```

By converting the mutable sequence `my_data` to an immutable tuple, you can store it in a set without any issues, ensuring that the set remains consistent with the requirement of having hashable and immutable elements.

In [None]:
my_data = [1, 2, 3, 4]
my_set = {tuple(my_data)}
my_set

You can verify whether a set is a subset of (contained in) or a superset of (contains all elements of) another set in Python. Here's an example:

```python
a_set = {1, 2, 3, 4, 5}

# Check if {1, 2, 3} is a subset of a_set
subset_check = {1, 2, 3}.issubset(a_set)
print(subset_check)
# Output: True

# Check if a_set is a superset of {1, 2, 3}
superset_check = a_set.issuperset({1, 2, 3})
print(superset_check)
# Output: True
```

In this example, the `issubset` method is used to determine if the set `{1, 2, 3}` is a subset of `a_set`, and the `issuperset` method is used to check if `a_set` is a superset of `{1, 2, 3}`. Both operations return `True` in this case.

In [None]:
a_set = {1, 2, 3, 4, 5}
{1, 2, 3}.issubset(a_set)

In [None]:
a_set.issuperset({1, 2, 3})

Sets in Python are considered equal if and only if their contents are equal. The order of elements does not affect the equality of sets. Here's an example:

```python
# Check if {1, 2, 3} is equal to {3, 2, 1}
equality_check = {1, 2, 3} == {3, 2, 1}
print(equality_check)
# Output: True
```

In this case, the sets `{1, 2, 3}` and `{3, 2, 1}` are considered equal because they contain the same elements, regardless of the order in which the elements are specified.

In [None]:
{1, 2, 3} == {3, 2, 1}

True

### Built-In Sequence Functions

#### enumerate

Python provides several useful sequence functions, and one of them is `enumerate`. When iterating over a sequence, it's often necessary to keep track of the index of the current item. While a manual approach involves maintaining an index variable, Python simplifies this task with the `enumerate` function. Here's a comparison:

**Manual Approach:**
```python
index = 0
for value in collection:
    # do something with value
    index += 1
```

**Using enumerate:**
```python
for index, value in enumerate(collection):
    # do something with value
```

The `enumerate` function yields a sequence of `(index, value)` tuples, one for each item in the collection. This provides a more concise and readable way to iterate over a sequence while keeping track of the index.
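As a concrete sketch, assuming a small hypothetical list named `collection`, `enumerate` can be used to record the position of each value. The optional `start` argument changes the first index.

```python
# Hypothetical collection for illustration
collection = ["foo", "bar", "baz"]

# Map each value to its position in the collection
position = {}
for index, value in enumerate(collection):
    position[value] = index

# The optional start argument shifts the first index
numbered = list(enumerate(collection, start=1))

print(position)
print(numbered)
```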

#### sorted
The `sorted` function in Python is a versatile tool for obtaining a new sorted list from the elements of any sequence. Here are a couple of examples:

```python
# Sorting a list of numbers
sorted_list1 = sorted([7, 1, 2, 6, 0, 3, 2])
# Output: [0, 1, 2, 2, 3, 6, 7]

# Sorting a string
sorted_list2 = sorted("horse race")
# Output: [' ', 'a', 'c', 'e', 'e', 'h', 'o', 'r', 'r', 's']
```

The `sorted` function works on various types of sequences, including lists and strings. It returns a new sorted list without modifying the original sequence. It's worth noting that the `sorted` function accepts the same arguments as the `sort` method on lists, making it convenient to use in different scenarios.

In [None]:
sorted([7, 1, 2, 6, 0, 3, 2])


In [None]:
sorted("horse race")
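Because `sorted` accepts the same arguments as the list `sort` method, the `key` and `reverse` options are available as well. A brief sketch with a hypothetical list of words:

```python
# Hypothetical data for illustration
words = ["saw", "small", "He", "foxes", "six"]

# Sort by length; items of equal length keep their original order
by_length = sorted(words, key=len)

# Sort numbers in descending order
descending = sorted([7, 1, 2, 6, 0, 3, 2], reverse=True)

print(by_length)
print(descending)
print(words)  # the original list is unchanged
```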


#### `zip`

The `zip` function pairs up the elements of several sequences, such as lists or tuples, producing an iterator of tuples that can be materialized with `list`. Here's an example:

```python
# Two sequences to zip
seq1 = ["foo", "bar", "baz"]
seq2 = ["one", "two", "three"]

# Using zip to create pairs
zipped = zip(seq1, seq2)

# Converting the result to a list of tuples
result_list = list(zipped)

print(result_list)
# Output: [('foo', 'one'), ('bar', 'two'), ('baz', 'three')]
```

In this example, `zip(seq1, seq2)` pairs up the elements from `seq1` and `seq2`, so each tuple contains corresponding elements from the two sequences. Because `zip` returns an iterator, the `list(zipped)` call is needed to materialize the result as a list of tuples.

In [2]:
seq1 = ["foo", "bar", "baz"]
seq2 = ["one", "two", "three"]
zipped = zip(seq1, seq2)
list(zipped)

[('foo', 'one'), ('bar', 'two'), ('baz', 'three')]

The `zip` function in Python can handle an arbitrary number of sequences, and the number of elements it produces is determined by the shortest sequence. Here's an example:

```python
# Another sequence to zip
seq3 = [False, True]

# Using zip with three sequences
zipped_result = list(zip(seq1, seq2, seq3))

print(zipped_result)
# Output: [('foo', 'one', False), ('bar', 'two', True)]
```

In this case, `zip(seq1, seq2, seq3)` pairs up the elements from `seq1`, `seq2`, and `seq3`. Since `seq3` is shorter than the other sequences, only two tuples are produced, each containing corresponding elements from all three sequences. Wrapping the `zip` call in `list` materializes the result as a list of tuples.

In [3]:
seq3 = [False, True]
list(zip(seq1, seq2, seq3))

[('foo', 'one', False), ('bar', 'two', True)]

A common and powerful use of `zip` is to simultaneously iterate over multiple sequences. This is often combined with `enumerate` to also keep track of the index. Here's an example:

```python
# Two sequences to zip
seq1 = ["foo", "bar", "baz"]
seq2 = ["one", "two", "three"]

# Simultaneously iterating over sequences with zip and enumerate
for index, (a, b) in enumerate(zip(seq1, seq2)):
    print(f"{index}: {a}, {b}")
```

Output:
```
0: foo, one
1: bar, two
2: baz, three
```

In this example, `enumerate(zip(seq1, seq2))` pairs up elements from `seq1` and `seq2`, and `enumerate` is used to get both the index and the tuple containing elements from both sequences. This allows for a clean and concise way to iterate over multiple sequences simultaneously.

In [None]:
for index, (a, b) in enumerate(zip(seq1, seq2)):
    print(f"{index}: {a}, {b}")
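`zip` can also be applied in the other direction, turning a list of rows into a group of columns. The sketch below assumes a hypothetical list of `(first, last)` name pairs called `pitchers`.

```python
# Hypothetical list of (first, last) pairs
pitchers = [("Nolan", "Ryan"), ("Roger", "Clemens"), ("Curt", "Schilling")]

# The * unpacks the list so that each pair becomes a separate argument to zip
first_names, last_names = zip(*pitchers)

print(first_names)
print(last_names)
```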


#### `reversed`

The `reversed` function in Python is used to iterate over the elements of a sequence in reverse order. Here's an example:

```python
# Using reversed with a range
reversed_result = list(reversed(range(10)))

print(reversed_result)
# Output: [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
```

In this case, `reversed(range(10))` generates the elements of the `range(10)` sequence in reverse order, and the `list()` call is used to materialize them into a list.



In [None]:
list(reversed(range(10)))

Keep in mind that `reversed` returns a lazy iterator rather than a list, so the reversed sequence is not created until it is materialized, either with `list` or in a `for` loop. This can save memory when dealing with large sequences.
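One practical consequence of this lazy behavior is that the object returned by `reversed` can be consumed only once. A minimal sketch, with `rev`, `first_pass`, and `second_pass` as illustrative names:

```python
rev = reversed([1, 2, 3])

# Materialize the reversed elements into a list
first_pass = list(rev)

# The iterator is now exhausted, so a second pass yields nothing
second_pass = list(rev)

print(first_pass)
print(second_pass)
```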

### List, Set, and Dictionary Comprehensions

List comprehensions in Python are a popular and convenient language feature. They provide a concise way to create a new list by filtering elements from an existing collection and applying a transformation to those elements, all in a single expression. The basic syntax is:

```python
[expr for value in collection if condition]
```

This is essentially equivalent to the following for loop:

```python
result = []
for value in collection:
    if condition:
        result.append(expr)
```

The filter condition is optional; without it, the expression is applied to every element. For example, given a list of strings, we can filter out strings with a length of 2 or less and convert the remaining ones to uppercase:

```python
strings = ["a", "as", "bat", "car", "dove", "python"]
[x.upper() for x in strings if len(x) > 2]
```

The output would be:

```python
['BAT', 'CAR', 'DOVE', 'PYTHON']
```

This list comprehension creates a new list containing the uppercase versions of strings with lengths greater than 2 from the original list.

In [5]:
strings = ["a", "as", "bat", "car", "dove", "python"]
[x.upper() for x in strings if len(x) > 2]

['BAT', 'CAR', 'DOVE', 'PYTHON']

Set and dictionary comprehensions are extensions of list comprehensions, allowing you to create sets and dictionaries in a similar concise manner.

For a dictionary comprehension, the syntax is as follows:

```python
dict_comp = {key_expr: value_expr for value in collection if condition}
```

In this expression, `key_expr` and `value_expr` represent the expressions for the key and value, respectively. The comprehension iterates over the elements in the collection, and the key-value pairs are included in the dictionary if they satisfy the specified condition.

A set comprehension is akin to a list comprehension, but with curly braces `{}` instead of square brackets `[]`. The syntax is as follows:

```python
set_comp = {expr for value in collection if condition}
```

In this expression, `expr` represents the expression to be included in the set for each element in the collection that satisfies the given condition. It results in a set containing the unique values produced by the expression for the qualifying elements.

Set comprehensions provide a concise way to create sets by applying an expression to elements from a collection. If you have a list of strings, for instance, and want a set containing the lengths of those strings, a set comprehension does this in a single expression:

```python
unique_lengths = {len(x) for x in strings}
```

In this case, `unique_lengths` will be a set containing the unique lengths of the strings in the original collection. This not only simplifies the code but also enhances readability by expressing the intention more clearly.

In [6]:
unique_lengths = {len(x) for x in strings}
unique_lengths

{1, 2, 3, 4, 6}

The `map` function provides another way to achieve the same result more functionally. For example:

```python
set(map(len, strings))
```

This code uses `map` to apply the `len` function to each element in the `strings` collection, and then the resulting lengths are used to create a set. Both the set comprehension and the `map` function approach accomplish the same task of obtaining a set with unique lengths of strings, providing flexibility in coding styles.

In [7]:
set(map(len, strings))

{1, 2, 3, 4, 6}

As an illustration of a straightforward dictionary comprehension, we can generate a mapping that associates each string with its corresponding index in the list. The code for creating this mapping is as follows:

```python
loc_mapping = {value: index for index, value in enumerate(strings)}
```

The resulting `loc_mapping` dictionary would look like this:

```python
{'a': 0, 'as': 1, 'bat': 2, 'car': 3, 'dove': 4, 'python': 5}
```

This dictionary relates each string in the list to its position (index) within the list.

In [8]:
loc_mapping = {value: index for index, value in enumerate(strings)}
loc_mapping

{'a': 0, 'as': 1, 'bat': 2, 'car': 3, 'dove': 4, 'python': 5}

#### Nested list comprehensions

Consider a scenario where we have a list of lists containing both English and Spanish names:

```python
all_data = [["John", "Emily", "Michael", "Mary", "Steven"],
            ["Maria", "Juan", "Javier", "Natalia", "Pilar"]]
```

Now, let's say we want to create a single list that includes all names with two or more occurrences of the letter 'a'. To achieve this, we can use a straightforward for loop:

```python
names_of_interest = []

for names in all_data:
    enough_as = [name for name in names if name.count("a") >= 2]
    names_of_interest.extend(enough_as)
```

The resulting `names_of_interest` list will contain names that satisfy the condition of having at least two occurrences of the letter 'a'.

In [11]:
all_data = [["John", "Emily", "Michael", "Mary", "Steven"],
            ["Maria", "Juan", "Javier", "Natalia", "Pilar"]]

names_of_interest = []
for names in all_data:
    enough_as = [name for name in names if name.count("a") >= 2]
    names_of_interest.extend(enough_as)
names_of_interest

['Maria', 'Natalia']

The given code can be condensed into a single nested list comprehension:

```python
result = [name for names in all_data for name in names if name.count("a") >= 2]
```

This concise expression achieves the same outcome as the previous for loop, producing the list of names with two or more occurrences of the letter 'a'.

In [12]:
result = [name for names in all_data for name in names if name.count("a") >= 2]
result

['Maria', 'Natalia']

Initially, comprehending nested list comprehensions may pose a challenge. The 'for' clauses in the list comprehension are structured based on the nesting order, and any filtering condition is placed at the end as usual. To illustrate, consider the following example where we transform a list of tuples of integers into a flat list of integers:

```python
some_tuples = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
flattened = [x for tup in some_tuples for x in tup]
```

In this case, the resulting 'flattened' list would be `[1, 2, 3, 4, 5, 6, 7, 8, 9]`. 

In [None]:
some_tuples = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
flattened = [x for tup in some_tuples for x in tup]
flattened

It's crucial to note that the order of the 'for' expressions remains the same as if you were to use nested 'for' loops instead of a list comprehension:

```python
flattened = []
for tup in some_tuples:
    for x in tup:
        flattened.append(x)
```

In [None]:
flattened = []

for tup in some_tuples:
    for x in tup:
        flattened.append(x)

It's possible to have multiple levels of nesting, but if the nesting exceeds two or three levels, it's advisable to evaluate whether it enhances or hinders code readability. It's important to distinguish the demonstrated syntax from a list comprehension within a list comprehension, which is also valid:

```python
[[x for x in tup] for tup in some_tuples]
```

This expression results in `[[1, 2, 3], [4, 5, 6], [7, 8, 9]]`.

In [None]:
[[x for x in tup] for tup in some_tuples]

## 3.2 Functions

In Python, functions serve as the fundamental and crucial means of organizing and reusing code. As a general guideline, when you foresee the necessity to replicate the same or closely related code multiple times, it is advisable to create a reusable function. Additionally, functions contribute to enhancing code readability by assigning a name to a set of Python statements.

The `def` keyword is employed to declare functions in Python. A function comprises a code block, and it may include the optional use of the `return` keyword. Here's an example:

```python
def my_function(x, y):
    return x + y
```

In this case, the `my_function` is defined with parameters `x` and `y`, and it returns the sum of these two parameters.

In [None]:
def my_function(x, y):
    return x + y

When execution reaches a `return` statement, the value of the expression following `return` is sent back to the context where the function was called. For example:

```python
my_function(1, 2)
# Output: 3

result = my_function(1, 2)
print(result)
# Output: 3
```

The function `my_function` is called with arguments `1` and `2`, and since it contains a `return x + y` statement, it returns the sum of `1` and `2`, which is `3`. This value is then assigned to the variable `result` in the second call, and when `result` is printed, it outputs `3`.

In [None]:
my_function(1, 2)   # returns 3
result = my_function(1, 2)
result              # 3

In Python, having multiple `return` statements in a function is acceptable. If Python reaches the end of a function without encountering a `return` statement, it automatically returns `None`. For example:

```python
def function_without_return(x):
    print(x)

result = function_without_return("hello!")
print(result)
```

The function `function_without_return` prints the value of `x`, but it doesn't have a `return` statement. When this function is called with the argument `"hello!"`, it prints `"hello!"` and returns `None`, which is then assigned to the variable `result`. Printing `result` outputs `None`.

In [None]:
def function_without_return(x):
    print(x)

result = function_without_return("hello!")
print(result)

hello!
None
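The earlier remark that a function may contain several `return` statements can be illustrated with a small sketch. The function `describe_sign` is a hypothetical example; execution leaves the function at whichever `return` is reached first.

```python
def describe_sign(x):
    """Hypothetical helper: return a label for the sign of x."""
    if x > 0:
        return "positive"
    if x < 0:
        return "negative"
    return "zero"

print(describe_sign(3))
print(describe_sign(-2))
print(describe_sign(0))
```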


In Python, functions can have both positional arguments and keyword arguments. Keyword arguments are often employed to provide default values or make certain parameters optional.

Here's an example function, `my_function2`, with a positional argument `x`, a positional argument `y`, and an optional keyword argument `z` with a default value of `1.5`:

```python
def my_function2(x, y, z=1.5):
    if z > 1:
        return z * (x + y)
    else:
        return z / (x + y)
```

In this function, if `z` is not provided when calling the function, it defaults to `1.5`. However, all positional arguments (`x` and `y` in this case) must be specified when calling the function.

In [None]:
def my_function2(x, y, z=1.5):
    if z > 1:
        return z * (x + y)
    else:
        return z / (x + y)

In Python, when calling a function, values can be passed to keyword arguments with or without explicitly mentioning the keyword, although using the keyword is recommended for clarity.

For example:

```python
my_function2(5, 6, z=0.7)
# Output: 0.06363636363636363

my_function2(3.14, 7, 3.5)
# Output: 35.49

my_function2(10, 20)
# Output: 45.0
```

Keyword arguments must follow the positional arguments (if any), but the keyword arguments themselves can be specified in any order. This frees you from having to remember the order in which the function's arguments were declared; you only need to remember their names.

In [None]:
my_function2(5, 6, z=0.7)


In [None]:
my_function2(3.14, 7, 3.5)


In [None]:
my_function2(10, 20)
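Positional arguments can also be passed by keyword, in which case their order no longer matters. The sketch below repeats the definition of `my_function2` so that it runs on its own; `first` and `second` are illustrative names.

```python
def my_function2(x, y, z=1.5):
    if z > 1:
        return z * (x + y)
    else:
        return z / (x + y)

# All arguments passed by keyword, in two different orders
first = my_function2(x=5, y=6, z=7)
second = my_function2(z=7, y=6, x=5)

print(first, second)
```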

#### Namespaces, Scope, and Local Functions

In Python, functions have access to variables within their own local namespace, as well as variables in higher scopes, including the global scope. The term "namespace" is often used to describe the context in which variables are defined.

Consider the following function:

```python
def func():
    a = []
    for i in range(5):
        a.append(i)
```

The variable `a` is assigned within the function and belongs to the local namespace of that function. The local namespace is created when the function is called, populated with function arguments and local variables, and destroyed when the function execution is complete.

The local namespace has its own scope: assigning to a name inside a function creates or rebinds a local variable and does not affect a variable of the same name outside the function, unless the name is declared with the `global` or `nonlocal` keyword. Reading or mutating an object defined in an enclosing scope, however, requires no declaration. Suppose we instead define `a` as an empty list outside the function:

```python
a = []
```

Then, we have a function named 'func' as follows:

```python
def func():
    for i in range(5):
        a.append(i)
```

Each call to `func` modifies the list `a`. For instance:

```python
func()
# Now, the content of 'a' is [0, 1, 2, 3, 4]

func()
# After the second call to 'func,' the content of 'a' becomes [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
```

In summary, each call to `func` appends the values 0 through 4 to the existing contents of the list `a`.

In [None]:
a = []
def func():
    for i in range(5):
        a.append(i)

In [None]:
func()
a    # the content of 'a' is [0, 1, 2, 3, 4]
 

In [None]:
func()
a    # After the second call to 'func,' the content of 'a' becomes [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
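By contrast, assigning to `a` inside a function, rather than calling a method on it, creates a new local variable and leaves the outer list untouched. A minimal sketch, with `rebind_a` and `local_result` as hypothetical names:

```python
a = []

def rebind_a():
    # Assignment creates a new local variable named a;
    # the module-level list is not modified
    a = [1, 2, 3]
    return a

local_result = rebind_a()

print(local_result)
print(a)
```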

Assigning variables outside of a function's scope is possible, but it requires explicit declaration using either the `global` or `nonlocal` keywords. Here's an example:

```python
a = None

def bind_a_variable():
    global a
    a = []

bind_a_variable()
print(a)
# Output: []
```

The `global` keyword lets a function rebind a variable in the global scope, whereas `nonlocal` lets a function rebind a variable in an enclosing, non-global scope. Since `nonlocal` is used less often, refer to the Python documentation for the details.

However, it's worth noting that the use of the `global` keyword is generally discouraged. Global variables are often used to maintain state in a system, and excessive use may indicate a need for object-oriented programming (using classes) for better code organization and maintainability.

In [None]:
a = None
def bind_a_variable():
    global a
    a = []
bind_a_variable()
print(a)

### Returning Multiple Values

Python makes it easy to return multiple values from a function using a straightforward syntax. Here's an example to illustrate:

```python
def f():
    a = 5
    b = 6
    c = 7
    return a, b, c

a, b, c = f()
```

In this example, the function `f()` assigns values to variables `a`, `b`, and `c`, and then returns them as a tuple. The assignment `a, b, c = f()` allows for the unpacking of the tuple returned by the function, assigning each value to its corresponding variable. This concise syntax enhances code readability and simplifies the process of working with multiple return values.

In [13]:
def f():
    a = 5
    b = 6
    c = 7
    return a, b, c

a, b, c = f()

In data analysis and various scientific applications, the practice of returning multiple values from a function and unpacking them is common. In essence, the function is returning a single object, typically a tuple, which is then unpacked into individual variables. Instead of the explicit unpacking as shown earlier:

```python
a, b, c = f()
```

You could achieve the same result by assigning the returned tuple to a single variable:

```python
return_value = f()
```

In this case, `return_value` would be a 3-tuple containing the three returned variables.

In [14]:
return_value = f()
return_value

(5, 6, 7)

As an alternative approach, you might consider returning a dictionary instead of a tuple:

```python
def f():
    a = 5
    b = 6
    c = 7
    return {"a": a, "b": b, "c": c}
```

This technique can be appealing depending on the specific requirements of your task. It allows you to associate each value with a meaningful key, making the returned data more self-descriptive and potentially improving code readability, especially when dealing with a larger set of variables.

In [None]:
def f():
    a = 5
    b = 6
    c = 7
    return {"a": a, "b": b, "c": c}
f()

{'a': 5, 'b': 6, 'c': 7}

### Functions Are Objects

Given that Python functions are treated as objects, it becomes feasible to express constructs that might be challenging in other programming languages. Consider a scenario where data cleaning is required for a list of strings, such as the one provided below:

```python
states = ["   Alabama ", "Georgia!", "Georgia", "georgia", "FlOrIda",
          "south   carolina##", "West virginia?"]
```

In [17]:
states = ["   Alabama ", "Georgia!", "Georgia", "georgia", "FlOrIda",
          "south   carolina##", "West virginia?"]

In situations involving user-submitted survey data, messy outcomes are common. To prepare this list of strings for analysis, various operations like stripping whitespace, removing punctuation symbols, and ensuring standardized capitalization are essential. An effective approach involves leveraging built-in string methods and the `re` standard library module for regular expressions. The following code demonstrates how to achieve this:

In [15]:
import re

def clean_strings(strings):
    result = []
    for value in strings:
        value = value.strip()
        value = re.sub("[!#?]", "", value)
        value = value.title()
        result.append(value)
    return result

Applying the `clean_strings` function to the `states` list produces the desired uniform and cleaned output:

In [18]:
clean_strings(states)    # Output: ['Alabama', 'Georgia', 'Georgia', 'Georgia', 'Florida', 'South   Carolina', 'West Virginia']

['Alabama',
 'Georgia',
 'Georgia',
 'Georgia',
 'Florida',
 'South   Carolina',
 'West Virginia']

An alternative and potentially more versatile approach involves creating a list of operations to be applied to a specific set of strings. Here's an implementation:

```python
import re

def remove_punctuation(value):
    return re.sub("[!#?]", "", value)

clean_ops = [str.strip, remove_punctuation, str.title]

def clean_strings(strings, ops):
    result = []
    for value in strings:
        for func in ops:
            value = func(value)
        result.append(value)
    return result
```

In [None]:
def remove_punctuation(value):
    return re.sub("[!#?]", "", value)

clean_ops = [str.strip, remove_punctuation, str.title]

def clean_strings(strings, ops):
    result = []
    for value in strings:
        for func in ops:
            value = func(value)
        result.append(value)
    return result

With this approach, the `clean_ops` list specifies the operations to be performed, including stripping whitespace, removing punctuation, and ensuring proper capitalization. The `clean_strings` function then iterates through the list of strings, applying each operation sequentially. This results in a more functional and adaptable pattern, allowing easy modification of string transformations at a high level. Moreover, the `clean_strings` function becomes more reusable and generic, providing a flexible solution for various scenarios:

In [None]:
clean_strings(states, clean_ops)   # Output: ['Alabama', 'Georgia', 'Georgia', 'Georgia', 'Florida', 'South   Carolina', 'West Virginia']
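To illustrate how easily the list of operations can be adapted, the sketch below adds one more step that squeezes runs of whitespace, so that `'South   Carolina'` becomes `'South Carolina'`. The helper `collapse_whitespace` and the name `extended_ops` are assumptions made for this example:

```python
import re

states = ["   Alabama ", "Georgia!", "Georgia", "georgia", "FlOrIda",
          "south   carolina##", "West virginia?"]

def remove_punctuation(value):
    return re.sub("[!#?]", "", value)

def collapse_whitespace(value):
    # Hypothetical extra step: replace runs of whitespace with one space
    return re.sub(r"\s+", " ", value)

def clean_strings(strings, ops):
    result = []
    for value in strings:
        for func in ops:
            value = func(value)
        result.append(value)
    return result

# Adding a step only requires changing the list, not clean_strings itself
extended_ops = [str.strip, remove_punctuation, collapse_whitespace, str.title]
cleaned = clean_strings(states, extended_ops)
print(cleaned)
```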

The built-in `map` function applies a function to every item of an iterable (a list, for example) and produces the results lazily as an iterator. Here, `map` applies the `remove_punctuation` function to each element in the `states` list:


In [None]:
for x in map(remove_punctuation, states):
    print(x)

   Alabama 
Georgia
Georgia
georgia
FlOrIda
south   carolina
West virginia


It's worth noting that `map` can be an alternative to list comprehensions when you want to apply a function to each element of a sequence without any filtering. This can lead to more concise and readable code in certain situations.
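The equivalence between `map` and a list comprehension can be checked directly. This sketch uses a shortened version of the `states` list; the variable names `via_map` and `via_comprehension` are illustrative:

```python
import re

def remove_punctuation(value):
    return re.sub("[!#?]", "", value)

states = ["Georgia!", "south   carolina##", "West virginia?"]

# map returns a lazy iterator, so wrap it in list() to materialize the results
via_map = list(map(remove_punctuation, states))
via_comprehension = [remove_punctuation(s) for s in states]

print(via_map == via_comprehension)  # True
```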

### Anonymous (Lambda) Functions
Python supports anonymous functions, often referred to as lambda functions. Lambda functions are a concise way to define small functions consisting of a single expression without explicitly naming them. The following example shows a regular function and its equivalent lambda function:


In [None]:
def short_function(x):
    return x * 2


# Equivalent lambda function
equiv_anon = lambda x: x * 2

Both `short_function` and `equiv_anon` double the input value `x`. The lambda keyword is used to declare an anonymous function, and in this case, it takes one argument `x` and returns `x * 2`. Lambda functions are particularly handy when you need a simple function for a short-lived purpose, such as passing it as an argument to higher-order functions like `map` or `filter`.

This pattern comes up often in data analysis, where many functions accept other functions as arguments. For example:

In [None]:
def apply_to_list(some_list, f):
    return [f(x) for x in some_list]

ints = [4, 0, 1, 5, 6]
apply_to_list(ints, lambda x: x * 2)   # Output: [8, 0, 2, 10, 12]

Here, the `apply_to_list` function takes a list (`some_list`) and a function (`f`) as arguments and applies the function to each element of the list using a list comprehension. The use of a lambda function allows for a concise way to define the transformation to be applied.

While the example could be achieved with a list comprehension directly (`[x * 2 for x in ints]`), using `apply_to_list` with a lambda function demonstrates the flexibility and readability that lambda functions can bring, especially when the transformation logic is more complex or when you want to reuse a particular transformation in multiple places.

Another example uses a lambda function to sort a collection of strings by the number of distinct letters in each string:

In [None]:
strings = ["foo", "card", "bar", "aaaa", "abab"]
strings.sort(key=lambda x: len(set(x)))    # Sorting based on the number of distinct letters using a lambda function
strings   # Output: ['aaaa', 'foo', 'abab', 'bar', 'card']

In this case, the `sort` method is utilized with the `key` parameter, which takes a function to determine the sorting criterion. The lambda function `lambda x: len(set(x))` calculates the number of distinct letters in each string by converting the string to a set and then measuring the length of that set. As a result, the strings are sorted based on the count of unique letters in ascending order.

### Generators

Python provides support for iteration in many objects, such as lists or file lines, through the iterator protocol. This protocol offers a generic way to enable iteration on objects. For instance, when iterating over a dictionary, it iterates through its keys, as shown in the following example:

In [None]:
some_dict = {"a": 1, "b": 2, "c": 3}
for key in some_dict:
    print(key)

a
b
c


The code above iterates through the keys of the dictionary (`"a"`, `"b"`, `"c"`), printing each key. When the loop is initiated with `for key in some_dict`, the Python interpreter internally creates an iterator for `some_dict` using the `iter()` function:

In [None]:
dict_iterator = iter(some_dict)
dict_iterator

<dict_keyiterator at 0x7f03b054e250>

The `dict_iterator` is then used for iteration, and it represents the dictionary's keys through the iterator protocol.

An iterator in Python is an object that, when employed in contexts like a for loop, provides objects to the Python interpreter. Many methods that anticipate a list or list-like entity can also handle any iterable object. This encompasses built-in functions like min, max, and sum, as well as type constructors such as list and tuple. For example, by applying the list constructor to the previously created `dict_iterator`, the iterator's contents, representing the keys of the dictionary, are converted into a list:

In [None]:
list(dict_iterator)

['a', 'b', 'c']

The resulting output would be `['a', 'b', 'c']`, demonstrating that the iterator has been successfully converted into a list containing the keys of the dictionary.
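The iterator protocol can also be driven by hand with the built-in `next` function. The sketch below (variable names are illustrative) shows that an iterator hands out each item once and raises `StopIteration` when nothing is left:

```python
some_dict = {"a": 1, "b": 2, "c": 3}
it = iter(some_dict)

first = next(it)      # 'a'
remaining = list(it)  # consumes what is left: ['b', 'c']

# The iterator is now exhausted; another next() raises StopIteration
try:
    next(it)
    exhausted = False
except StopIteration:
    exhausted = True

print(first, remaining, exhausted)  # a ['b', 'c'] True
```

A `for` loop performs the same steps internally and stops quietly when `StopIteration` is raised.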

A generator in Python provides a convenient way, akin to writing a regular function, to create a new iterable object. Unlike a normal function, which runs to completion and returns a single result, a generator can yield a sequence of values, pausing after each one and resuming when the next value is requested. To construct a generator, the `yield` keyword is employed instead of `return` within a function.

For example, consider the generator function `squares`:

In [None]:
def squares(n=10):
    print(f"Generating squares from 1 to {n ** 2}")
    for i in range(1, n + 1):
        yield i ** 2

Upon calling the generator, no code is immediately executed:

In [None]:
gen = squares()
gen

<generator object squares at 0x7f03b82070d0>

The output shows that a generator object has been created. Execution of the generator's code only begins when elements are requested from it:

In [None]:
for x in gen:
    print(x, end=" ")

Generating squares from 1 to 100
1 4 9 16 25 36 49 64 81 100 

It's worth noting that generators output one element at a time, as opposed to generating an entire list all at once. This characteristic can be beneficial in terms of memory usage for programs.
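The pause-and-resume behavior can be made visible by requesting values one at a time with `next`. This is a minimal sketch using a variant of `squares` without the `print` call:

```python
def squares(n=10):
    for i in range(1, n + 1):
        yield i ** 2

gen = squares(3)

a = next(gen)     # runs the body up to the first yield: 1
b = next(gen)     # resumes after the first yield: 4
rest = list(gen)  # whatever has not been produced yet: [9]

print(a, b, rest)  # 1 4 [9]
```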

### Generator Expressions

An alternative method to create a generator is by utilizing a generator expression, which is akin to list, dictionary, and set comprehensions. To construct a generator expression, enclose the comprehension within parentheses instead of using brackets. For instance:

In [None]:
gen = (x ** 2 for x in range(100))
gen

<generator object <genexpr> at 0x7f03b8207680>

The equivalent more explicit generator can be defined as follows:

In [None]:
def _make_gen():
    for x in range(100):
        yield x ** 2

gen = _make_gen()

Generator expressions can be employed in lieu of list comprehensions in certain function arguments. Examples include using them with the `sum` function:

In [None]:
sum(x ** 2 for x in range(100))

And with dictionary creation:

In [None]:
dict((i, i ** 2) for i in range(5))

Depending on the number of elements generated by the comprehension expression, the generator version may exhibit meaningful performance advantages in some cases.
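One way to see the difference is to compare the size of a list with the size of the corresponding generator object. The sketch below uses `sys.getsizeof`, which reports only the container's own footprint, and the exact numbers depend on the Python implementation, so treat it as a rough indication rather than a benchmark:

```python
import sys

n = 100_000
as_list = [x ** 2 for x in range(n)]  # all values exist at once
as_gen = (x ** 2 for x in range(n))   # values are produced on demand

list_size = sys.getsizeof(as_list)
gen_size = sys.getsizeof(as_gen)

total_from_list = sum(as_list)
total_from_gen = sum(as_gen)

print(gen_size < list_size)               # True
print(total_from_list == total_from_gen)  # True
```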

### The itertools Module

The `itertools` module in the standard library offers a variety of generators for common data algorithms. One such generator is `groupby`, which takes a sequence and a function, grouping consecutive elements in the sequence based on the return value of the function. Here's an illustrative example:

In [None]:
import itertools
def first_letter(x):
    return x[0]

names = ["Alan", "Adam", "Wes", "Will", "Albert", "Steven"]

for letter, names in itertools.groupby(names, first_letter):
    print(letter, list(names)) # names is a generator

A ['Alan', 'Adam']
W ['Wes', 'Will']
A ['Albert']
S ['Steven']


In this example, the function `first_letter` extracts the first letter of each name. The `groupby` generator then groups the names based on their first letter. Each iteration of the loop produces a tuple where the first element is the key (in this case, the first letter) and the second element is a generator of the corresponding grouped items. Note that `groupby` groups only consecutive elements with the same key, which is why `A` appears twice in the output above. To collect all elements that share a key into a single group, sort the input by the key function first. See Table 3.2 for a list of a few other itertools functions.


##### Table 3.2: Some Useful `itertools` Functions

| Function                       | Description                                                                                             |
|--------------------------------|---------------------------------------------------------------------------------------------------------|
| `chain(*iterables)`             | Generates a sequence by chaining iterators together. Once elements from the first iterator are exhausted, elements from the next iterator are returned, and so on. |
| `combinations(iterable, k)`    | Generates a sequence of all possible k-tuples of elements in the iterable, ignoring order and without replacement (see also the companion function `combinations_with_replacement`). |
| `permutations(iterable, k)`    | Generates a sequence of all possible k-tuples of elements in the iterable, respecting order.             |
| `groupby(iterable[, keyfunc])` | Generates `(key, sub-iterator)` for each run of consecutive elements that share the same key.           |
| `product(*iterables, repeat=1)` | Generates the Cartesian product of the input iterables as tuples, similar to a nested `for` loop.          |

For more details on these and other `itertools` functions, refer to the official Python documentation for this module.
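The functions in Table 3.2 are easiest to understand on tiny inputs. The sketch below first sorts the names so that `groupby` yields one group per letter, then runs the other functions on short sequences; all variable names are illustrative:

```python
import itertools

names = ["Alan", "Adam", "Wes", "Will", "Albert", "Steven"]

def first_letter(x):
    return x[0]

# Sorting by the same key first puts every key into exactly one group
grouped = {
    letter: list(group)
    for letter, group in itertools.groupby(sorted(names, key=first_letter),
                                           first_letter)
}
print(grouped)  # {'A': ['Alan', 'Adam', 'Albert'], 'S': ['Steven'], 'W': ['Wes', 'Will']}

chained = list(itertools.chain([1, 2], [3]))     # [1, 2, 3]
combos = list(itertools.combinations("abc", 2))  # [('a', 'b'), ('a', 'c'), ('b', 'c')]
perms = list(itertools.permutations("ab", 2))    # [('a', 'b'), ('b', 'a')]
pairs = list(itertools.product([0, 1], "xy"))    # [(0, 'x'), (0, 'y'), (1, 'x'), (1, 'y')]
```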

### Errors and Exception Handling

Effectively managing Python errors or exceptions is a crucial aspect of developing resilient programs. When working with data analysis applications, it's common for functions to operate successfully only on specific types of input. For instance, consider Python's `float` function, which can successfully convert a string to a floating-point number, as demonstrated here:

In [None]:
float("1.2345")   # Outputs: 1.2345

However, when the input is inappropriate, such as a non-numeric string, the `float` function raises a `ValueError`:

In [None]:
float("something")   # Raises: ValueError: could not convert string to float: 'something'

ValueError: could not convert string to float: 'something'

In this example, attempting to convert the string "something" to a float results in a `ValueError` with a message indicating the failure to convert the given string to a float. Handling such exceptions in a graceful manner is vital for ensuring that your program can respond appropriately to unexpected input, preventing crashes and enhancing overall robustness.

Consider a scenario where we desire a modified version of the float function that handles errors gracefully by returning the input argument in case of failure. This can be achieved by creating a function that wraps the call to float within a try/except block. The function is defined as follows:

In [None]:
def attempt_float(x):
    try:
        return float(x)
    except:
        return x

The try block attempts to convert the input argument `x` to a float using the `float(x)` operation. If successful, the result is returned. However, if an exception occurs during the conversion (e.g., if `x` is not a valid numeric representation), the except block is triggered, and the function returns the original input argument `x`.

Here are some examples of using this function:

In [None]:
attempt_float("1.2345")

In [None]:
attempt_float("something")

In the first example, where the input is a valid numeric string, the function successfully converts it to a float. In the second example, where the input is not a valid numeric representation, the function gracefully handles the exception and returns the original input string.

You might observe that the `float` function can raise exceptions other than `ValueError`. For instance, attempting to convert a tuple to a float raises a `TypeError`:

In [None]:
float((1, 2))

TypeError: float() argument must be a string or a real number, not 'tuple'

In situations where you want to suppress only the `ValueError` exception, as a `TypeError` might indicate a genuine bug in your program (e.g., the input was not a string or numeric value), you can specify the exception type after the `except` keyword:

In [None]:
def attempt_float(x):
    try:
        return float(x)
    except ValueError:
        return x

With this modification, if a `ValueError` occurs during the conversion, the function gracefully handles it and returns the original input. However, if a `TypeError` or any other exception occurs, it will propagate up the call stack, potentially revealing a bug in the program:

In [None]:
attempt_float((1, 2))

TypeError: float() argument must be a string or a real number, not 'tuple'

This way, you can selectively handle specific exceptions and let others propagate for debugging purposes.
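A typical use is converting a column of raw strings in which some entries are not numeric. The sketch below applies the `ValueError`-only version of `attempt_float` to a small made-up list and confirms that a `TypeError` still propagates:

```python
def attempt_float(x):
    try:
        return float(x)
    except ValueError:
        return x

raw = ["1.5", "n/a", "3", ""]  # illustrative data
converted = [attempt_float(v) for v in raw]
print(converted)  # [1.5, 'n/a', 3.0, '']

# None is neither a string nor a number, so float() raises TypeError,
# which attempt_float deliberately does not catch
try:
    attempt_float(None)
    propagated = False
except TypeError:
    propagated = True

print(propagated)  # True
```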

You can handle multiple types of exceptions by specifying a tuple of exception types in the except clause, enclosed in parentheses. For example:

In [None]:
def attempt_float(x):
    try:
        return float(x)
    except (TypeError, ValueError):
        return x

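In situations where you want certain code to run regardless of whether the try block succeeds or not, you can use the finally clause. In the following snippets, `path` is a file path and `write_to_file` stands for any function that writes to the open file. For instance: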

In [None]:
f = open(path, mode="w")

try:
    write_to_file(f)
finally:
    f.close()


This ensures that the file object `f` is always closed, regardless of whether an exception occurs or not. Additionally, you can include code that should only execute if the try block succeeds by using the else clause:

In [None]:
f = open(path, mode="w")

try:
    write_to_file(f)
except:
    print("Failed")
else:
    print("Succeeded")
finally:
    f.close()


Succeeded


In this example, the "Succeeded" message will be printed only if no exceptions are raised in the try block, and the file will still be closed in the finally block.
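The order in which the clauses run can be traced with a self-contained version of this pattern. In the sketch below, `write_to_file` is a hypothetical stand-in for the writer used in the text, the file lives in a temporary directory, and the `events` list records which clauses ran:

```python
import os
import tempfile

def write_to_file(handle):
    # Hypothetical stand-in for the writer used in the text
    handle.write("hello\n")

path = os.path.join(tempfile.mkdtemp(), "demo.txt")
events = []

f = open(path, mode="w", encoding="utf-8")
try:
    write_to_file(f)
except OSError:
    events.append("failed")
else:
    events.append("succeeded")  # runs only if the try block raised nothing
finally:
    f.close()                   # runs in every case
    events.append("closed")

print(events)  # ['succeeded', 'closed']
```

This sketch catches `OSError` rather than using a bare `except`, so that unrelated errors are not silently swallowed.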

## 3.3 Files and the Operating System

The majority of the content in this book employs advanced tools such as `pandas.read_csv` for importing data files from disk into Python data structures. Nevertheless, it is crucial to grasp the fundamentals of file manipulation in Python. Thankfully, this process is relatively simple, contributing to Python's popularity for tasks involving text and file handling.

To access a file for reading or writing purposes, utilize the built-in `open` function along with a relative or absolute file path, and you may include an optional file encoding as well:


In [None]:
# Specify the file path
path = "examples/segismundo.txt"

# Open the file with UTF-8 encoding
f = open(path, encoding="utf-8")

In this example, I explicitly specify `encoding="utf-8"` as a recommended practice because the default Unicode encoding for reading files can vary across different platforms.

By default, the file is opened in read-only mode, denoted by "r". Following this, you can treat the file object `f` like a list and iterate over its lines using a `for` loop:

In [None]:
# Iterate over the lines in the file
for line in f:
    print(line)

This loop allows you to access and print each line within the file.

The lines retrieved from the file maintain their end-of-line (EOL) markers. To obtain a list of lines without these markers, it is common to use code like the following:

In [None]:
# Create a list of lines without end-of-line markers
lines = [x.rstrip() for x in open(path, encoding="utf-8")]
lines

['Sueña el rico en su riqueza,',
 'que más cuidados le ofrece;',
 '',
 'sueña el pobre que padece',
 'su miseria y su pobreza;',
 '',
 'sueña el que a medrar empieza,',
 'sueña el que afana y pretende,',
 'sueña el que agravia y ofende,',
 '',
 'y en el mundo, en conclusión,',
 'todos sueñan lo que son,',
 'aunque ninguno lo entiende.']

In this code, a list comprehension iterates over each line in the file, applying the `rstrip()` method to remove trailing whitespace, including the EOL markers. The resulting `lines` list contains the file content without these markers.

When utilizing the `open` function to create file objects, it is advisable to close the file once you have completed your operations on it. Closing the file is important as it releases its resources back to the operating system, preventing potential issues. Here's an example demonstrating the recommended practice of closing a file:

In [None]:
# Close the file when finished
f.close()

A convenient way to streamline the process of cleaning up open files is by using the `with` statement. This ensures that the file is automatically closed when exiting the `with` block. Here's an example:

In [None]:
with open(path, encoding="utf-8") as f:    # Using the 'with' statement to open and automatically close the file
    lines = [x.rstrip() for x in f]

In this context, the `with` statement takes care of closing the file (`f`) when the code block is exited. This approach is particularly useful for preventing resource leaks and is considered a best practice.
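The automatic closing can be verified through the file object's `closed` attribute. The following sketch writes a small temporary file (the path and contents are made up for this example) and checks the attribute inside and after the block:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "lines.txt")

with open(path, mode="w", encoding="utf-8") as f:
    f.write("first\nsecond\n")

with open(path, encoding="utf-8") as f:
    lines = [x.rstrip() for x in f]
    open_inside = not f.closed  # still open inside the block

closed_after = f.closed         # closed as soon as the block exits

print(lines)                      # ['first', 'second']
print(open_inside, closed_after)  # True True
```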

It's important to note that failing to close files may not pose issues in small programs or scripts, but it becomes crucial in larger programs dealing with numerous files.

If the file had been opened with the "w" mode (e.g., `f = open(path, "w")`), a new file at the specified path would have been created, potentially overwriting any existing file. Additionally, there is the "x" file mode, which creates a writable file but fails if the file path already exists. Refer to Table 3.3 for a comprehensive list of valid file read/write modes.

##### Table 3.3: Python File Modes

| Mode | Description |
| ---- | ----------- |
| `r`  | Read-only mode |
| `w`  | Write-only mode; creates a new file (erasing the data for any file with the same name) |
| `x`  | Write-only mode; creates a new file but fails if the file path already exists |
| `a`  | Append to existing file (creates the file if it does not already exist) |
| `r+` | Read and write |
| `b`  | Add to mode for binary files (i.e., "rb" or "wb") |
| `t`  | Text mode for files (automatically decoding bytes to Unicode); this is the default if not specified |

These file modes are used as arguments when opening a file with the `open` function in Python. Each mode serves a specific purpose, such as reading, writing, appending, or handling binary data. It's crucial to choose the appropriate mode based on the intended file operation.
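The differences between the `x`, `a`, and `w` modes show up clearly when they are applied to the same file in turn. This is a minimal sketch using a temporary file; the path and contents are assumptions for illustration:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "modes.txt")

# "x" creates the file because the path does not exist yet
with open(path, "x", encoding="utf-8") as f:
    f.write("one\n")

# A second "x" fails because the file is now there
try:
    open(path, "x", encoding="utf-8")
    x_failed = False
except FileExistsError:
    x_failed = True

# "a" keeps the existing content and adds to the end
with open(path, "a", encoding="utf-8") as f:
    f.write("two\n")
with open(path, encoding="utf-8") as f:
    after_append = f.read()  # 'one\ntwo\n'

# "w" discards the existing content
with open(path, "w", encoding="utf-8") as f:
    f.write("three\n")
with open(path, encoding="utf-8") as f:
    after_write = f.read()   # 'three\n'
```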


When dealing with readable files, some commonly used methods include `read`, `seek`, and `tell`. The `read` method retrieves a specific number of characters from the file. The definition of a "character" is dependent on the file encoding or, in the case of binary mode, raw bytes. Here's an illustration using both text and binary modes:

For text mode:

In [None]:
f1 = open(path)   # Open the file in text mode
f1.read(10)   # Read the first 10 characters


For binary mode:

In [None]:
f2 = open(path, mode="rb")  #  Open the file in binary mode
f2.read(10)     # Read the first 10 bytes

In the binary mode example, the `b` prefix before the string indicates that the result is a bytes object. The characters are represented in their raw byte form, reflecting the file's content in a binary context. The specific encoding is crucial in text mode, as it influences the interpretation of characters from the file.

The `read` method not only retrieves data from a file but also advances the file object's position by the number of bytes read. The current position of the file object can be obtained using the `tell` method. Here are examples illustrating this behavior in both text and binary modes:

In [None]:
f1.tell()    # Get the current position after reading from the file opened in text mode


In [None]:
f2.tell()   # Get the current position after reading from the file opened in binary mode

In the text mode example, even though we read 10 characters, the position is 11. This is because it took that many bytes to decode 10 characters using the default encoding, which is UTF-8 in this case.

To check the default encoding, you can use the `sys` module:

In [None]:
import sys    # Import the sys module
sys.getdefaultencoding()   # Get the default encoding

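`sys.getdefaultencoding()` reports the encoding Python uses when converting between `str` and `bytes`, which is UTF-8 in Python 3. The encoding that `open` falls back on when none is given is chosen from the platform's locale settings instead (see `locale.getpreferredencoding`), so it is not guaranteed to be UTF-8 on every system.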

To ensure consistent behavior across different platforms, it is advisable to specify an encoding, such as `encoding="utf-8"` (a widely used encoding), when opening files.

The `seek` method allows you to change the file position to the indicated byte in the file. Here's an example:

In [None]:
f1.seek(3)   # Change the file position to byte 3
f1.read(1)   # Read 1 character from the new position


After using `seek(3)` to set the file position to byte 3, the subsequent `read(1)` operation retrieves the character 'ñ' at that position. Finally, the `tell` method confirms the updated file position:

In [None]:
f1.tell()    # Get the current position after seeking

The `tell` method indicates that the file position is now at byte 5 after the `seek(3)` operation. This combination of `seek` and `tell` allows for precise navigation within a file. 
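The byte positions discussed above can be reproduced without the example file. The sketch below writes the opening words of the poem to a temporary file and uses binary mode for `seek` and `tell`, where positions are plain byte offsets; the temporary path is an assumption made for this example:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "wb") as f:
    f.write("Sueña el rico".encode("utf-8"))

with open(path, "rb") as f:
    head = f.read(10)                    # b'Sue\xc3\xb1a el '
    pos_after_read = f.tell()            # 10
    f.seek(3)                            # 'ñ' starts at byte 3
    n_tilde = f.read(2).decode("utf-8")  # it occupies two bytes
    pos_after_seek = f.tell()            # 5

with open(path, encoding="utf-8") as f:
    text_head = f.read(10)               # 10 characters: 'Sueña el r'

print(head, pos_after_read, n_tilde, pos_after_seek, text_head)
```

Ten bytes cover only nine characters here, because `ñ` is encoded as two bytes in UTF-8.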

Closing files is a crucial step to ensure proper resource management. Here's how you can close the files in your examples:

In [None]:
f1.close()   # Close the file opened in text mode
f2.close()   # Close the file opened in binary mode

To write text to a file, you can use the `write` or `writelines` methods of the file object. Below is an example that creates a new version of "examples/segismundo.txt" with no blank lines:

In [None]:
path = 'examples/segismundo.txt'   # Input file path

# Write non-empty lines to a new file
with open("tmp.txt", mode="w") as handle:   
    handle.writelines(x for x in open(path) if len(x) > 1)

# Read the contents of the new file
with open("tmp.txt") as f:
    lines = f.readlines()

lines

['Sueña el rico en su riqueza,\n',
 'que más cuidados le ofrece;\n',
 'sueña el pobre que padece\n',
 'su miseria y su pobreza;\n',
 'sueña el que a medrar empieza,\n',
 'sueña el que afana y pretende,\n',
 'sueña el que agravia y ofende,\n',
 'y en el mundo, en conclusión,\n',
 'todos sueñan lo que son,\n',
 'aunque ninguno lo entiende.']

In this example, the `writelines` method is used to write only the non-empty lines from the original file to a new file. The resulting content of the new file is then read and displayed. 

The `os.remove` function is used to delete or remove a file. In our case, it's being used to remove the "tmp.txt" file. Here's the code:

In [None]:
import os
os.remove("tmp.txt")   # Remove the "tmp.txt" file

This code snippet will delete the file named "tmp.txt" from the current working directory. Make sure to use it judiciously, as it permanently deletes the specified file.

#### Table 3.4: Important Python File Methods or Attributes

| Method/Attribute | Description |
| ---------------- | ----------- |
| `read([size])` | Return data from the file as bytes or string depending on the file mode, with an optional size argument indicating the number of bytes or string characters to read. |
| `readable()` | Return `True` if the file supports read operations. |
| `readlines([size])` | Return a list of lines in the file, with an optional size argument. |
| `write(string)` | Write the passed string to the file. |
| `writable()` | Return `True` if the file supports write operations. |
| `writelines(strings)` | Write the passed sequence of strings to the file. |
| `close()` | Close the file object. |
| `flush()` | Flush the internal I/O buffer to disk. |
| `seek(pos)` | Move to the indicated file position (integer). |
| `seekable()` | Return `True` if the file object supports seeking and thus random access (some file-like objects do not). |
| `tell()` | Return the current file position as an integer. |
| `closed` | `True` if the file is closed. |
| `encoding` | The encoding used to interpret bytes in the file as Unicode (typically UTF-8). |

These methods and attributes provide essential functionalities for reading from and writing to files, as well as managing the file object's state and properties. Understanding and using these methods appropriately is crucial for effective file handling in Python.

### Bytes and Unicode with Files

In Python, the default behavior for file handling, whether for readable or writable files, is text mode. This implies that the intention is to work with Python strings, which are Unicode. This is in contrast to binary mode, which can be obtained by appending 'b' to the file mode.

Here's an example revisiting a file (which contains non-ASCII characters with UTF-8 encoding) from the previous section:

In [None]:
# Open the file in text mode and read 10 characters
with open(path) as f:
    chars = f.read(10)

chars
len(chars)    # Display the length of the characters

In this example, the file is opened in text mode (`open(path)`), and the `read(10)` method is used to read 10 characters. The resulting `chars` variable contains a string of length 10. This behavior is expected in text mode, where characters are decoded using the specified or default encoding (UTF-8 in this case).

When a file is opened in binary mode (`"rb"`), Python reads the exact number of bytes specified by the `read` method, without decoding them. This is especially relevant when dealing with variable-length encodings such as UTF-8. In binary mode, the `read` method retrieves raw bytes.

Here's a recap of the example:

In [None]:
# Open the file in binary mode and read 10 bytes
with open(path, mode="rb") as f:
    data = f.read(10)

data

b'Sue\xc3\xb1a el '

In binary mode, the `read(10)` operation retrieves exactly 10 bytes of raw binary data from the file, and the `b` prefix before the string indicates that the result is a bytes object. This mode is useful when working with non-text files or when you want to handle the encoding explicitly in your code.

When working with binary data, decoding is necessary to interpret the bytes as text. However, decoding may result in errors if the bytes do not represent a valid encoding sequence. Here's an example illustrating this:

In [None]:
data.decode("utf-8")   # Decode the binary data using UTF-8


'Sueña el '

In this case, decoding the entire `data` using UTF-8 works correctly because the bytes represent a valid UTF-8 encoding.

However, attempting to decode only a portion of the bytes may result in a `UnicodeDecodeError` if the decoding process encounters an incomplete or invalid sequence:

In [None]:
data[:4].decode("utf-8")   # Try to decode only the first 4 bytes

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc3 in position 3: unexpected end of data

This error occurs because the first 4 bytes represent an incomplete UTF-8 character.
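When truncated input cannot be avoided, the error can either be caught or softened with the `errors` argument of `bytes.decode`. The sketch below rebuilds the same ten bytes from a string literal so that it runs on its own:

```python
data = "Sueña el ".encode("utf-8")  # b'Sue\xc3\xb1a el '

# Strict decoding (the default) rejects the incomplete character
try:
    data[:4].decode("utf-8")
    failed = False
except UnicodeDecodeError:
    failed = True

# errors="replace" substitutes U+FFFD for the undecodable byte instead
lenient = data[:4].decode("utf-8", errors="replace")

print(failed)         # True
print(repr(lenient))  # 'Sue\ufffd'
```

Replacement hides data loss, so it is best reserved for cases such as logging or previews where an approximate result is acceptable.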

Text mode, combined with the encoding option of `open`, provides a convenient way to convert between different text encodings. Here's an example converting from one encoding (UTF-8) to another (ISO-8859-1):

In [None]:
sink_path = "sink.txt"

# Open the source file in text mode
with open(path) as source:
    # Open the destination file in text mode with a different encoding
    with open(sink_path, "x", encoding="iso-8859-1") as sink:
        # Write the content of the source file to the destination file
        sink.write(source.read())

# Read and print the first 10 characters from the destination file with the new encoding
with open(sink_path, encoding="iso-8859-1") as f:
    print(f.read(10))

Sueña el r


This example demonstrates converting the content of a file from UTF-8 encoding to ISO-8859-1 encoding using the `open` function with different encoding options.

In [None]:
os.remove(sink_path)
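To see what such a conversion changes on disk, the following sketch converts a one-word file in a temporary directory and compares the file sizes. The paths and the sample text are assumptions made for this example:

```python
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, "source.txt")
dst = os.path.join(tmp_dir, "sink.txt")

with open(src, "w", encoding="utf-8") as f:
    f.write("Sueña")

# Decode as UTF-8 while reading, encode as ISO-8859-1 while writing
with open(src, encoding="utf-8") as source, \
        open(dst, "x", encoding="iso-8859-1") as sink:
    sink.write(source.read())

utf8_size = os.path.getsize(src)    # 6: 'ñ' takes two bytes in UTF-8
latin1_size = os.path.getsize(dst)  # 5: 'ñ' takes one byte in ISO-8859-1

with open(dst, encoding="iso-8859-1") as f:
    round_trip = f.read()

print(utf8_size, latin1_size, round_trip)  # 6 5 Sueña
```

The text is identical after the round trip; only its byte representation differs.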

Be cautious when using `seek` on a file opened in text mode. If the file position falls in the middle of the bytes that define a Unicode character, subsequent reads will raise an error.

For example:

In [None]:
f = open(path, encoding='utf-8')   # Open the file in text mode with UTF-8 encoding
f.read(5)   # Read 5 characters from the file

In [None]:
f.seek(4)   # Move the file position to byte 4


In [None]:
f.read(1)   # Attempt to read 1 character from the new position


In [None]:
f.close()   # Close the file

The `seek(4)` operation positioned the file pointer in the middle of the UTF-8-encoded character 'ñ', resulting in a `UnicodeDecodeError` when attempting to read a single character.

This highlights the importance of being mindful of the file's encoding and the potential consequences of manipulating the file position, especially when dealing with Unicode characters. When working with text data that includes non-ASCII characters, understanding Python's Unicode functionality becomes crucial for robust and error-free data analysis.
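The failure can be reproduced on a small temporary file. The sketch below reflects CPython's behavior: the Python documentation only guarantees text-mode `seek` for offsets of zero or values returned by `tell`, so arbitrary offsets like these should be treated as an illustration rather than a supported technique:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("Sueña el rico")

# Byte 4 is the second byte of 'ñ', so decoding cannot start there
with open(path, encoding="utf-8") as f:
    f.seek(4)
    try:
        f.read(1)
        error = None
    except UnicodeDecodeError as exc:
        error = exc

# Byte 3 is the first byte of 'ñ', so the read succeeds
with open(path, encoding="utf-8") as f:
    f.seek(3)
    ok = f.read(1)

print(type(error).__name__, ok)  # UnicodeDecodeError ñ
```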

For more in-depth information, Python's official documentation on Unicode is a valuable resource.