#Data Analysis with Python
Estimated time needed: 60 minutes

##Objectives
After completing this lab you will be able to:

*   Define tuples, lists, dictionaries, and sets
*   Create reusable Python functions
*   Understand the mechanics of Python file objects and how to interact with the local hard drive

##Table of Contents
1.   Data Structures and Sequences
        * Tuple
        * List
        * Dictionary
        * Set
        * Built-In Sequence Functions
        * List, Set, and Dictionary Comprehensions
        * Functions
        * Files and the Operating System
        * Conclusion

###1.   Data Structures and Sequences

####Tuple
A tuple is a fixed-length, immutable sequence of Python objects which, once assigned, cannot be changed. The easiest way to create one is with a comma-separated sequence of values wrapped in parentheses:

In [1]:
tup = (4, 5, 6)  # or tup = 4, 5, 6

tup

(4, 5, 6)

You can convert any sequence or iterator to a tuple by invoking tuple:

In [2]:
tuple([4, 0, 2])

(4, 0, 2)

In [3]:
tup = tuple('string')

tup

('s', 't', 'r', 'i', 'n', 'g')

Elements can be accessed with square brackets [] as with most other sequence types. As in C, C++, Java, and many other languages, sequences are 0-indexed in Python:

In [4]:
tup[0]

's'

When you're defining tuples within more complicated expressions, it’s often necessary to enclose the values in parentheses, as in this example of creating a tuple of tuples:

In [5]:
nested_tup = (4, 5, 6), (7, 8)

nested_tup

((4, 5, 6), (7, 8))

In [6]:
nested_tup[0]

(4, 5, 6)

In [7]:
nested_tup[1]

(7, 8)

While the objects stored in a tuple may be mutable themselves, once the tuple is created it's not possible to change which object is stored in each slot.

If an object inside a tuple is mutable, such as a list, you can modify it in place:

In [8]:
tup = tuple(['foo', [1, 2], True])

tup[1].append(3)
tup

('foo', [1, 2, 3], True)
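To make the distinction concrete, here is a minimal sketch (the names `record` and `error_message` are illustrative and not part of the lab). It shows that rebinding a slot raises a `TypeError`, while mutating the list stored in a slot succeeds:

```python
# Illustrative names; not part of the original lab.
record = tuple(['foo', [1, 2], True])

# Rebinding a slot is not allowed: tuples do not support item assignment.
try:
    record[2] = False
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Mutating the list stored in slot 1 is allowed, because the tuple still
# references the same list object.
record[1].append(3)

print(error_message)
print(record)
```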

You can concatenate tuples using the **+** operator to produce longer tuples:

In [9]:
(4, None, 'foo') + (6, 0) + ('bar',)

(4, None, 'foo', 6, 0, 'bar')

Multiplying a tuple by an integer, as with lists, has the effect of concatenating that many copies of the tuple:

In [10]:
('foo', 'bar') * 4

('foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar')

Note that the objects themselves are not copied, only the references to them.
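A small sketch of what sharing references means in practice, assuming a hypothetical mutable element named `inner`. Because every slot of the repeated tuple points at the same list, a change to that list is visible through all of them:

```python
inner = [1, 2]            # hypothetical mutable element
repeated = (inner,) * 3   # three references to the same list, not three copies

inner.append(3)           # the change is visible through every slot

print(repeated)                     # ([1, 2, 3], [1, 2, 3], [1, 2, 3])
print(repeated[0] is repeated[2])   # True
```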

#####Unpacking Tuples

If you try to assign to a tuple-like expression of variables, Python will attempt to unpack the value on the right-hand side of the equals sign:



In [11]:
tup = (4,5,6)

a, b, c = tup
b

5
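Two other uses of unpacking may be worth seeing, sketched here with illustrative names (`x`, `y`, `nested`, and so on): swapping two variables without a temporary, and unpacking a nested tuple by mirroring its shape on the left-hand side.

```python
# Swap two names without a temporary variable.
x, y = 1, 2
x, y = y, x

# Unpack a nested tuple by mirroring its shape on the left-hand side.
nested = 4, 5, (6, 7)
p, q, (r, s) = nested

print(x, y)   # 2 1
print(r, s)   # 6 7
```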

A common use of variable unpacking is iterating over sequences of tuples or lists:

In [12]:
seq = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]

for a, b, c in seq:
    print(f'a={a}, b={b}, c={c}')

a=1, b=2, c=3
a=4, b=5, c=6
a=7, b=8, c=9


There are some situations where you may want to "pluck" a few elements from the beginning of a tuple. There is a special syntax that can do this, **\*rest**, which is also used in function signatures to capture an arbitrarily long list of positional arguments:

In [13]:
values = 1, 2, 3, 4, 5

a, b, *rest = values
rest

[3, 4, 5]
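The same star syntax in a function signature can be sketched as follows. The function `summarize` is hypothetical and exists only for illustration. Note one difference: in an assignment the starred name receives a list, whereas inside a function it receives a tuple.

```python
def summarize(first, second, *rest):
    """Return the first two arguments, the count of the rest, and rest's type name."""
    return first, second, len(rest), type(rest).__name__

values = 1, 2, 3, 4, 5
result = summarize(*values)  # the call-site * unpacks the tuple into arguments

print(result)  # (1, 2, 3, 'tuple')
```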

Sometimes the rest portion is something you want to discard; there is nothing special about the name rest. As a matter of convention, many Python programmers use the underscore (_) for unwanted variables:

In [14]:
values = 1, 2, 3, 4, 5

a, b, *_ = values
_

[3, 4, 5]

#####Tuple Methods


Since the size and contents of a tuple cannot be modified, it is very light on instance methods. A particularly useful one (also available on lists) is **count**, which counts the number of occurrences of a value:

In [15]:
a = (1, 2, 2, 2, 3, 4, 2)

a.count(2)

4

####List

In contrast with tuples, lists are variable length and their contents can be modified in place. Lists are **mutable**. You can define them using square brackets [] or using the list type function:

In [16]:
a_list = [2, 3, 7, None]

tup = ("foo", "bar", "baz")

b_list = list(tup)
b_list

['foo', 'bar', 'baz']

In [17]:
b_list[1] = "peekaboo"
b_list

['foo', 'peekaboo', 'baz']

#####Adding and Removing Elements

Elements can be appended to the end of the list with the **append** method:

In [18]:
b_list.append('dwarf')
b_list

['foo', 'peekaboo', 'baz', 'dwarf']

Using **insert** you can insert an element at a specific location in the list:

In [19]:
b_list.insert(1, 'red')
b_list

['foo', 'red', 'peekaboo', 'baz', 'dwarf']



---

Inserting an element into a list with **insert** is **computationally expensive** compared to **append**, because references to all subsequent elements have to be shifted internally to make room for the new element. If you need to insert elements at both the beginning and end of a sequence, consider using [collections.deque](https://www.geeksforgeeks.org/deque-in-python/), a double-ended queue in the Python Standard Library that is optimized for this purpose.

---
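As a rough illustration of the note above, the following sketch uses `collections.deque` with made-up contents. `appendleft` and `popleft` work at the front of the queue without shifting the remaining elements:

```python
from collections import deque

queue = deque(['foo', 'bar'])  # hypothetical starting contents
queue.appendleft('start')      # insert at the beginning without shifting elements
queue.append('end')            # append at the end, as with a list
first = queue.popleft()        # remove and return the element at the beginning

print(first)        # start
print(list(queue))  # ['foo', 'bar', 'end']
```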







The inverse operation to insert is **pop**, which removes and returns an element at a particular index:



In [22]:
b_list.pop(2)
b_list

['foo', 'red', 'baz', 'dwarf']

Elements can be removed by value with **remove**, which locates the first such value and removes it from the list:

In [25]:
b_list.remove('red')
b_list

['foo', 'baz', 'dwarf']

Check if a list contains a value using the **in** keyword:

In [26]:
'dwarf' in b_list

True

The keyword **not** can be used to negate **in**:

In [27]:
'dwarf' not in b_list

False



---
Checking whether a **list** contains a value is a lot **slower than** doing so with **dictionaries and sets** (to be introduced shortly), as Python makes a linear scan across the values of the list, whereas it can check the others (based on hash tables) in constant time.


---
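One way to observe the difference described in the note is a quick timing sketch with the standard `timeit` module. The collection size and repetition count below are arbitrary choices for illustration, and the measured times will vary from machine to machine:

```python
import timeit

n = 100_000                  # arbitrary size chosen for illustration
as_list = list(range(n))
as_set = set(as_list)
target = n - 1               # worst case for the list: the last element

# Time 100 membership checks against each container.
list_time = timeit.timeit(lambda: target in as_list, number=100)
set_time = timeit.timeit(lambda: target in as_set, number=100)

print(f'list: {list_time:.6f}s  set: {set_time:.6f}s')
```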





#####Concatenating and Combining Lists

Similar to tuples, adding two lists together with **+** concatenates them:

In [28]:
[4, None, "foo"] + [7, 8, (2, 3)]

[4, None, 'foo', 7, 8, (2, 3)]

If you have a list **already defined**, you can append multiple elements to it using the **extend** method:

In [29]:
x = [4, None, "foo"]

x.extend([7, 8, (2, 3)])
x

[4, None, 'foo', 7, 8, (2, 3)]

Note that list concatenation by addition is a **comparatively expensive operation** since a new list must be created and the objects copied over. Using extend to append elements to an existing list, especially if you are building up a large list, is usually preferable. Thus:

```
everything = []
for chunk in list_of_lists:
    everything.extend(chunk)
```

is **faster** than the concatenative alternative:

```
everything = []
for chunk in list_of_lists:
    everything = everything + chunk
```
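The two loops produce the same contents but differ in how they get there. The sketch below uses a small made-up `list_of_lists` and extra names (`original_extended`, `original_concatenated`) to show that **extend** keeps modifying one list object, while **+** builds a new list on every iteration:

```python
list_of_lists = [[1, 2], [3], [4, 5, 6]]  # hypothetical sample data

# extend mutates the existing list, so the name keeps pointing at one object.
extended = []
original_extended = extended
for chunk in list_of_lists:
    extended.extend(chunk)

# + builds a brand-new list on every iteration and rebinds the name to it.
concatenated = []
original_concatenated = concatenated
for chunk in list_of_lists:
    concatenated = concatenated + chunk

print(extended == concatenated)               # True
print(extended is original_extended)          # True
print(concatenated is original_concatenated)  # False
```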

#####Sorting

You can sort a list in place (without creating a new object) by calling its **sort** method:

In [30]:
a = [7, 2, 5, 1, 3]

a.sort()
a

[1, 2, 3, 5, 7]

**sort** has a few options that will occasionally come in handy. One is the ability to pass a sort key, that is, a function that produces a value to use to **sort the objects**. For example, we could sort a collection of strings by their lengths:

In [31]:
b = ["saw", "small", "He", "foxes", "six"]

b.sort(key=len)
b

['He', 'saw', 'six', 'small', 'foxes']
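Two further variations, sketched with an illustrative list named `words`: the `reverse` option of **sort** flips the order, and any one-argument function (here `str.lower`) can serve as the key.

```python
words = ["saw", "small", "He", "foxes", "six"]

# Sort by length, longest first; equal-length items keep their original order.
words.sort(key=len, reverse=True)
by_length_desc = list(words)   # copy the current order before sorting again
print(by_length_desc)  # ['small', 'foxes', 'saw', 'six', 'He']

# Sort alphabetically while ignoring case.
words.sort(key=str.lower)
print(words)  # ['foxes', 'He', 'saw', 'six', 'small']
```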

#####Slicing

You can select sections of most sequence types by using slice notation, which in its basic form consists of **start:stop** passed to the indexing operator **[]**:

In [33]:
seq = [7, 2, 3, 7, 5, 6, 0, 1]

seq[1:5]

[2, 3, 7, 5]

While the element at the **start** index is **included**, the **stop** index is **not included**, so that the number of elements in the result is stop - start.

Either the start or stop can be **omitted**, in which case they default to the start of the sequence and the end of the sequence, respectively:

In [34]:
seq[:5]

[7, 2, 3, 7, 5]

In [35]:
seq[3:]

[7, 5, 6, 0, 1]

***Negative indices*** slice the sequence relative to the end:

In [36]:
seq[-4:]

[5, 6, 0, 1]

In [40]:
seq[-6:-2]

[3, 7, 5, 6]
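A short sketch of how negative slice bounds map to positions, using the same `seq` values (the helper names `n`, `same_tail`, `same_middle`, and `length` are illustrative). A negative index `i` refers to position `len(seq) + i`, so the stop - start rule still holds once both bounds are converted:

```python
seq = [7, 2, 3, 7, 5, 6, 0, 1]
n = len(seq)

# A negative index i refers to position n + i.
same_tail = seq[-4:] == seq[n - 4:]
same_middle = seq[-6:-2] == seq[n - 6:n - 2]

# The result length is stop - start once both are expressed as positions:
# (n - 2) - (n - 6) = 4.
length = len(seq[-6:-2])

print(same_tail, same_middle, length)  # True True 4
```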

A **step** can also be used after a second colon to, say, take every fourth element:

In [53]:
seq[::4]

[7, 5]

A clever use of this is to pass **-1**, which has the useful effect of reversing a list or tuple:

In [55]:
seq[::-1]

[1, 0, 6, 5, 7, 3, 2, 7]
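As a brief sketch with illustrative names, note that a slice with a negative step returns a new object of the same type and leaves the original untouched. Other negative steps walk backward in larger strides:

```python
seq = [7, 2, 3, 7, 5, 6, 0, 1]
tup = (4, 5, 6)

reversed_seq = seq[::-1]          # new list; seq itself is unchanged
reversed_tup = tup[::-1]          # slicing a tuple gives a tuple
every_other_backward = seq[::-2]  # start at the end, step back two at a time

print(reversed_tup)          # (6, 5, 4)
print(every_other_backward)  # [1, 6, 7, 2]
print(seq)                   # [7, 2, 3, 7, 5, 6, 0, 1]
```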

####Dictionary