CHAPTER 3: BUILT-IN DATA STRUCTURES, FUNCTIONS, AND FILES

Tuples: (a fixed-length, immutable sequence of Python objects which, once assigned, cannot be changed.)

In [1]:
tuple([4, 0, 2])
tup = tuple('string')
print(tup)

('s', 't', 'r', 'i', 'n', 'g')


In [2]:
tup[0]

's'

In [3]:
#Parentheses can be omitted as below, but are often needed (and clearer) as expressions become more complicated
nested_tup = (4, 5, 6), (7, 8)
nested_tup

((4, 5, 6), (7, 8))

In [4]:
nested_tup[0]

(4, 5, 6)

In [5]:
nested_tup[1]

(7, 8)

In [6]:
tup = tuple(['foo', [1, 2], True])
tup[2] = False
#This raises an error because a tuple's slots cannot be reassigned, in this case the boolean value True

TypeError: 'tuple' object does not support item assignment

In [7]:
#To replace True with False, create a new tuple. One way is to convert the tuple to a list,
#make the change, and convert back (the result is still a new tuple, not a modified original):

temp_list = list(tup)
temp_list[2] = False

new_tup = tuple(temp_list)
print(new_tup)

('foo', [1, 2], False)
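
A related point worth a quick sketch: immutability applies to the tuple's slots, not to the objects stored in them. Assuming the same example tuple as above, a mutable object inside a tuple (here the list) can still be modified in place:

```python
# The tuple's slots are fixed, but the list stored in slot 1 is itself mutable.
tup = tuple(['foo', [1, 2], True])
tup[1].append(3)   # changes the inner list in place; the tuple still holds the same list object
print(tup)         # ('foo', [1, 2, 3], True)
```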


In [9]:
#Can concatenate tuples with operators
(4, None, 'foo') + (6, 0) + ('bar',)

(4, None, 'foo', 6, 0, 'bar')

In [10]:
#Or multiply
('foo', 'bar') * 4

('foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar')

Unpacking tuples: NOTE nested tuples can also be unpacked

In [14]:
tup = (4, 5, 6)
a, b, c = tup

print(b)

5
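
The note above says nested tuples can be unpacked too. A minimal sketch of that, plus the common swap idiom, using made-up values and the illustrative names x and y:

```python
# Nested unpacking: the pattern on the left mirrors the shape of the tuple.
tup = 4, 5, (6, 7)
a, b, (c, d) = tup
print(d)           # 7

# Unpacking also gives a swap without a temporary variable.
x, y = 1, 2
y, x = x, y
print(x, y)        # 2 1
```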


In [None]:
#Variable unpacking is also common when iterating over a sequence of tuples or lists

In [15]:
seq = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
for a, b, c in seq:
    print(f'a={a}, b={b}, c={c}')

a=1, b=2, c=3
a=4, b=5, c=6
a=7, b=8, c=9


Using the *rest syntax: this is not a function but part of unpacking. It lets you take a few elements from the beginning of a tuple and collect the remainder in a list; the same star syntax is used in function signatures to capture an arbitrarily long list of positional arguments.

In [16]:
values = 1, 2, 3, 4, 5
a, b, *rest = values
a

1

In [17]:
b

2

In [18]:
rest

[3, 4, 5]

In [19]:
#Counting the number of occurrences of a value, in this case occurrences of the int 2
a = (1, 2, 2, 2, 3, 4, 2)
a.count(2)

4

Lists: NOTE lists are mutable!
Can be defined using square brackets [] or using the list type function.

In [21]:
tup = ("foo", "bar", "baz")
b_list = list(tup)
b_list

['foo', 'bar', 'baz']

In [22]:
b_list[1] = "peekaboo"
b_list

['foo', 'peekaboo', 'baz']

In [23]:
#The list function is often used to materialize an iterator or generator expression:
gen = range(10)
gen

range(0, 10)

In [25]:
list(gen)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Adding or removing elements:

Elements can be appended to the end of the list using append.
Elements can be inserted at a specific location in a list using insert (the insertion index should be between 0 and the length of the list, inclusive)

The inverse to insert is pop, which removes and returns an element at a particular index
Elements can be removed by value with remove, which locates the first such value and removes it from the list.


In [31]:
#Remember b_list
b_list

['foo', 'peekaboo', 'baz']

In [32]:
b_list.append("dwarf")
b_list

['foo', 'peekaboo', 'baz', 'dwarf']

In [33]:
b_list.insert(1, "red")
b_list

['foo', 'red', 'peekaboo', 'baz', 'dwarf']

In [34]:
b_list.pop(2)
b_list

['foo', 'red', 'baz', 'dwarf']

In [35]:
b_list.append("foo")
b_list

['foo', 'red', 'baz', 'dwarf', 'foo']

In [36]:
b_list.remove("foo")
b_list

['red', 'baz', 'dwarf', 'foo']
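
Two details the cells above do not show directly: pop returns the element it removes, and remove raises a ValueError when the value is not present. A minimal sketch, starting from a fresh copy of the list so it runs on its own:

```python
b_list = ['foo', 'red', 'peekaboo', 'baz', 'dwarf']

# pop removes the element at the index AND returns it.
removed = b_list.pop(2)
print(removed)     # peekaboo

# remove raises ValueError if the value is missing, so guard it when unsure.
try:
    b_list.remove("missing")
except ValueError as err:
    print(f"remove failed: {err}")
```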

In [37]:
#Checking whether a list contains a value, using the in keyword
"dwarf" in b_list

True

In [38]:
"dwarf" not in b_list

False

To combine lists, either use + (which creates a new list) or use the extend method (which appends to the existing list in place):

In [39]:
[4, None, "foo"] + [7, 8, (2, 3)]

[4, None, 'foo', 7, 8, (2, 3)]

In [40]:
x = [4, None, "foo"]
x.extend([7, 8, (2, 3)])
x

[4, None, 'foo', 7, 8, (2, 3)]
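
When building up one large list from many pieces, extend is usually the better choice, since + creates a new list and copies the elements each time. A sketch with made-up data (the names list_of_lists and everything are illustrative):

```python
list_of_lists = [[1, 2], [3, 4], [5]]

everything = []
for chunk in list_of_lists:
    everything.extend(chunk)        # appends in place, no new list created
    # everything = everything + chunk would build a fresh list on every pass

print(everything)   # [1, 2, 3, 4, 5]
```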

Can also sort a list in place (without creating a new object) using the sort method:

In [41]:
a = [7, 2, 5, 1, 3]
a.sort()
a

[1, 2, 3, 5, 7]

In [42]:
b = ["saw", "small", "He", "foxes", "six"]
b.sort(key=len)
b
#The sort key is a function that produces a value used to order the objects, e.g. here sorting by string length

['He', 'saw', 'six', 'small', 'foxes']

Slicing: (use square brackets with start:stop notation; the start index is included, the stop index is not)

In [54]:
seq = [7, 2, 3, 7, 5, 6, 0, 1]
seq[1:5]

[2, 3, 7, 5]

In [55]:
seq[:5]


[7, 2, 3, 7, 5]

In [56]:
seq[3:]

[7, 5, 6, 0, 1]

In [57]:
#Can also use negative indices to slice the sequence relative to the end
seq[-4:]

[5, 6, 0, 1]

In [59]:
#A step can also be given after a second colon, for example to take every other element
#Original sequence = 7, 2, 3, 7, 5, 6, 0, 1

seq[::2]

[7, 3, 5, 0]

In [60]:
seq[::-1]
#Reverses a list or tuple

[1, 0, 6, 5, 7, 3, 2, 7]
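
Since the stop index is excluded, a slice seq[start:stop] holds stop - start elements. Slices of a list can also be assigned to. A short sketch on the same sequence:

```python
seq = [7, 2, 3, 7, 5, 6, 0, 1]

# The number of elements in a slice is stop - start.
print(len(seq[1:5]))   # 4

# Assigning to a slice replaces that section of the list in place.
seq[3:5] = [6, 3]
print(seq)             # [7, 2, 3, 6, 3, 6, 0, 1]
```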

Dictionary:

(A dictionary stores a collection of key-value pairs, where key and value are Python objects. Each key is associated with a value so that a value can be conveniently retrieved, inserted, modified, or deleted given a particular key.)

(Commonly created using {} with colons to separate keys and values)

In [61]:
empty_dict = {}
d1 = {"a": "some value", "b": [1, 2, 3, 4]}
d1

{'a': 'some value', 'b': [1, 2, 3, 4]}

In [62]:
d1[7] = "an integer"
d1

{'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer'}

In [63]:
d1["b"]

[1, 2, 3, 4]

In [64]:
"b" in d1
#Check whether the dictionary contains a key (in tests keys, not values)

True
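
The definition above mentions deleting and modifying values as well. A sketch of those operations using standard dict methods, on a fresh copy of d1 so it runs on its own (the key 5 and the "default" fallback are made up for illustration):

```python
d1 = {"a": "some value", "b": [1, 2, 3, 4], 7: "an integer"}

d1[5] = "some value"
del d1[5]                  # delete a key with the del keyword
ret = d1.pop(7)            # pop deletes the key and returns its value
print(ret)                 # an integer

d1.update({"b": "foo", "c": 12})   # merge another dict in place, overwriting existing keys
print(d1)                  # {'a': 'some value', 'b': 'foo', 'c': 12}

print(d1.get("missing", "default"))   # default (get avoids a KeyError)
print(list(d1.keys()))     # ['a', 'b', 'c']
print(list(d1.values()))   # ['some value', 'foo', 12]
```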

Set: An unordered collection of unique elements. Can be created in two ways: via the set function or via a set literal with curly braces

Note: See notepad on all set operations and practice work

In [65]:
set([2, 2, 2, 1, 3, 3])

{1, 2, 3}

In [66]:
a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8}
a.union(b)

{1, 2, 3, 4, 5, 6, 7, 8}

In [67]:
c = a.copy()
c |= b
c
#The in-place form |= updates the existing set instead of creating a new one, which can be more efficient for very large sets

{1, 2, 3, 4, 5, 6, 7, 8}

In [68]:
d = a.copy()
d &= b
d

{3, 4, 5}
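
A few more of the standard set operations, sketched on the same two sets (the variable names only_in_a and in_one_only are illustrative):

```python
a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8}

only_in_a = a - b          # difference: the set {1, 2}
in_one_only = a ^ b        # symmetric difference: the set {1, 2, 6, 7, 8}

print({1, 2, 3}.issubset(a))     # True
print(a.issuperset({1, 2, 3}))   # True
print(a.isdisjoint({9, 10}))     # True (no elements in common)
```

The operator forms (-, ^, |, &) and the method forms (difference, symmetric_difference, union, intersection) give the same results.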

Built In Sequence Functions:

1) enumerate -> returns a sequence of (i, value) tuples
2) sorted -> returns a new sorted list from the elements of any sequence
3) zip -> "pairs" up the elements of a number of lists, tuples, or other sequences; returns an iterator of tuples (wrap in list to see them)
4) reversed -> iterates over the elements of a sequence in reverse order; also returns an iterator
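
A quick sketch of the four functions listed above, with made-up example data:

```python
some_list = ["foo", "bar", "baz"]

# enumerate: track the index while iterating
mapping = {}
for i, value in enumerate(some_list):
    mapping[value] = i
print(mapping)                          # {'foo': 0, 'bar': 1, 'baz': 2}

# sorted: returns a new list and leaves the input untouched
print(sorted([7, 1, 2, 6, 0, 3, 2]))    # [0, 1, 2, 2, 3, 6, 7]

# zip: pairs up elements; it is lazy, so wrap it in list() to see the pairs
seq2 = ["one", "two", "three"]
print(list(zip(some_list, seq2)))       # [('foo', 'one'), ('bar', 'two'), ('baz', 'three')]

# reversed: also lazy
print(list(reversed(range(5))))         # [4, 3, 2, 1, 0]
```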

In [69]:
all_data = [["John", "Emily", "Michael", "Mary", "Steven"],
            ["Maria", "Juan", "Javier", "Natalia", "Pilar"]]

names_of_interest = []
for names in all_data:
    enough_as = [name for name in names if name.count("a") >= 2]
    names_of_interest.extend(enough_as)

names_of_interest

#I.e. for each name, count how many times "a" appears and keep the name if it has two or more occurrences

['Maria', 'Natalia']

In [70]:
#More concise way to get the same result, using a single nested list comprehension:
result = [name for names in all_data for name in names
          if name.count("a") >= 2]
result

['Maria', 'Natalia']

Functions: i.e. declared with the def keyword

In [71]:
def my_function2(x, y, z=1.5):
    if z > 1:
        return z * (x + y)
    else:
        return z / (x + y)
    
my_function2(5, 6, z=0.7)

0.06363636363636363

In [72]:
my_function2(3.14, 7, 3.5)

35.49

In [73]:
my_function2(10, 20)
#z defaults to 1.5

45.0

In [74]:
#Cleaning data

states = ["   Alabama ", "Georgia!", "Georgia", "georgia", "FlOrIda",
          "south   carolina##", "West virginia?"]

import re #Used for pattern matching

def clean_strings(strings):
    result = []
    for value in strings:
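        value = value.strip() #Removes leading and trailing whitespace (internal spaces are kept)
        value = re.sub("[!#?]", "", value) #Removes the characters !, # and ?
        value = value.title() #Capitalises the first letter of each word and lowercases the rest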
        result.append(value) #After cleaned, added to the result list
    return result

clean_strings(states)

['Alabama',
 'Georgia',
 'Georgia',
 'Georgia',
 'Florida',
 'South   Carolina',
 'West Virginia']
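
An alternative sketch of the same cleaning, treating the operations as a list of functions to apply in order. The names remove_punctuation, clean_ops and clean_strings_with_ops are made up for illustration; the behaviour is meant to match clean_strings above:

```python
import re

def remove_punctuation(value):
    # Same pattern as above: strips the characters !, # and ?
    return re.sub("[!#?]", "", value)

# Functions are objects, so the cleaning steps can live in a list.
clean_ops = [str.strip, remove_punctuation, str.title]

def clean_strings_with_ops(strings, ops):
    result = []
    for value in strings:
        for func in ops:          # apply each step in order
            value = func(value)
        result.append(value)
    return result

states = ["   Alabama ", "Georgia!", "Georgia", "georgia", "FlOrIda",
          "south   carolina##", "West virginia?"]
cleaned = clean_strings_with_ops(states, clean_ops)
print(cleaned)
```

The advantage of this shape is that adding or reordering cleaning steps only means editing the list, not the loop.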

Lambda Functions: i.e. anonymous functions, written with the lambda keyword

In [75]:
def short_function(x):
    return x * 2 #Ie doubles input value

equiv_anon = lambda x: x * 2 #Takes argument x and returns x * 2, e.g. equiv_anon(5) returns 10

def apply_to_list(some_list, f):
    return [f(x) for x in some_list] #Applies f to each element and collects the results in a new list

ints = [4, 0, 1, 5, 6]
apply_to_list(ints, lambda x: x * 2)

[8, 0, 2, 10, 12]
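
Lambdas are also handy as sort keys. A sketch with made-up strings, sorting by the number of distinct letters in each one:

```python
strings = ["foo", "card", "bar", "aaaa", "abab"]

# len(set(x)) counts the distinct characters in x; ties keep their original order.
strings.sort(key=lambda x: len(set(x)))
print(strings)   # ['aaaa', 'foo', 'abab', 'bar', 'card']
```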

In [84]:
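#Generator function: yield hands back one value at a time, and the body only starts running once iteration begins
def squares(n=10):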
    print(f"Generating squares from 1 to {n ** 2}")
    for i in range(1, n + 1):
        yield i ** 2

gen = squares()
 
for x in gen:
    print(x, end=" ")

Generating squares from 1 to 100
1 4 9 16 25 36 49 64 81 100 
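
Two further points about generators, sketched with a simplified variant of squares (no print, so this cell stands on its own): values are produced on demand, and a generator is exhausted after one pass. A generator expression is a compact way to get the same lazy behaviour:

```python
def squares(n=10):
    for i in range(1, n + 1):
        yield i ** 2

gen = squares(3)
print(next(gen))    # 1 (only the first value has been computed so far)
print(list(gen))    # [4, 9] (the remaining values)
print(list(gen))    # [] (the generator is now exhausted)

# Generator expression: like a list comprehension, but with parentheses and lazy.
total = sum(x ** 2 for x in range(1, 11))
print(total)        # 385
```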

NOTE: itertools module functions:

chain, combinations, permutations, groupby, product 
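
A short sketch of each function named above, with made-up inputs. Note that groupby only groups consecutive elements with the same key, so the input usually needs to be sorted by that key first (it is deliberately not sorted here, to show the effect):

```python
import itertools

print(list(itertools.chain([1, 2], [3])))          # [1, 2, 3]
print(list(itertools.combinations("abc", 2)))      # [('a', 'b'), ('a', 'c'), ('b', 'c')]
print(len(list(itertools.permutations("abc", 2)))) # 6 (order matters)
print(list(itertools.product([0, 1], "xy")))       # [(0, 'x'), (0, 'y'), (1, 'x'), (1, 'y')]

# groupby: group consecutive names by first letter
names = ["Alan", "Adam", "Wes", "Will", "Albert", "Steven"]
groups = [(letter, list(group))
          for letter, group in itertools.groupby(names, lambda name: name[0])]
print(groups)
# [('A', ['Alan', 'Adam']), ('W', ['Wes', 'Will']), ('A', ['Albert']), ('S', ['Steven'])]
```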