**Learning Python -- The Programming Language for Artificial Intelligence and Data Science**

**Lecture 7: Tuples and Dictionaries**

**By Allen Y. Yang, PhD**

(c) Copyright Intelligent Racing Inc., 2021-2024. All rights reserved. Materials may NOT be distributed or used for any commercial purposes.


# Keywords

* **tuple**: The built-in name of the tuple data type in Python. A tuple is an immutable type.
* **dict**: The built-in name of the dictionary data type in Python. A dictionary is a mutable type. Each element in a dictionary is a (key, value) pair.
* **Histogram**: A data structure that represents a list of unique elements and their frequency of occurrence within some data.

# Tuples

A tuple variable stores an ordered sequence of values. The following examples show the three ways a tuple object can be assigned:
    1. Using a pair of parentheses: t = ('A', 'B')
    2. A sequence of values separated by commas without the parentheses: t = 'a', 'b', 'c', 'd', 'e'
    3. Using tuple() function: t = tuple('abcde')

In [None]:
t = 'a', 'b', 'c', 'd', 'e'
print(t == tuple('abcde') )

t = ('A', 'B') + t[2:]
print(t)

print(() == tuple())

In many ways, tuples are similar to lists in Python. In the above example, we see that elements in a tuple can be addressed using square brackets just like in a list. Concatenating two tuples into a new tuple variable is denoted using the "+" operator. In the last example, an empty tuple is denoted using *()* or *tuple()*.

A tuple may contain just one element. However, the way to declare a single-element tuple must be different from the way to declare a single variable. See the examples below:

In [None]:
t = 'a',  # Single-element tuple
s = ('a') # a string
print(type(t), type(s))

t = 0,    # Single-element tuple
i = (0)   # an integer
print(type(t), type(i))

Unlike lists, however, tuples are immutable. Once created, tuple elements cannot be changed without forcing Python to create a new object. The following statement raises a runtime error (a TypeError):

In [None]:
t = 'a', 'b'
t[1] = 'c'
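
The two points above can be checked directly. The following is a minimal sketch using only built-ins (the variable names *error_message* and *old_id* are illustrative): it catches the error instead of letting it stop the cell, and uses *id()* to show that "changing" a tuple really builds a new object.

```python
t = 'a', 'b'
try:
    t[1] = 'c'                    # item assignment is not supported by tuples
except TypeError as err:
    error_message = str(err)
    print('TypeError:', error_message)

old_id = id(t)
t = t[:1] + ('c',)                # builds a new tuple and rebinds the name t
print(t)
print('Same object as before?', id(t) == old_id)
```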

However, if tuple elements are themselves mutable, then the values of the elements can be changed without changing the tuple object itself. See the following examples:

In [None]:
# Negative example: when the elements are not mutable
string = 'A'; number = 1
t = string, number
print('Initial value: ', t)
number = 2
print('Values are immutable: ', t)

# Positive example: when the elements are mutable
l = list('abc')
t = l, l
print('Initial value: ', t)
l[2] = 'a'
print('Mutable elements: ', t)

In Python, a sequence of values can be compared to another sequence. The comparison is performed element-wise from left to right, with two rules to keep in mind:
    1. Short-circuit: When evaluating "<" or ">", the result is decided by the first position where the two elements differ, and all later positions are ignored. If one sequence runs out before any difference is found, the shorter sequence is considered smaller.
    2. Compared elements must support ordering: When two elements at the same position have to be ordered, their types must be comparable with each other. Ordering a string against an integer, for example, raises "TypeError".

In [None]:
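# String comparison with short-circuit
print('abc'>'a')                    # True: 'a' is a prefix of 'abc'

# Tuple comparison with short-circuit
print( (1, 2, 'c') > (1, 2))        # True: the longer tuple is greater
print( (1, 2, 'c') > (1, 3, 3))     # False: decided at 2 < 3

# List comparison
print([1, 2, 'c'] > [1, 2, 'a'])    # True: 'c' > 'a'
print([1, 2, 'c'] > [1, 2, 3])      # Raises TypeError: str and int cannot be ordered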
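
Because the last comparison in the cell above stops execution with an error, it can be easier to explore these rules with the error caught. The sketch below is one way to do that; the names *decided_early*, *longer_wins*, *mixed_numbers*, and *result* are illustrative only.

```python
# Decided at the second position (2 < 3); 'c' and 3 are never compared
decided_early = (1, 2, 'c') > (1, 3, 3)
print(decided_early)

# If one sequence is a prefix of the other, the longer one is greater
longer_wins = (1, 2, 'c') > (1, 2)
print(longer_wins)

# Ordering a str against an int raises TypeError
try:
    result = [1, 2, 'c'] > [1, 2, 3]
except TypeError as err:
    result = None
    print('TypeError:', err)

# Different numeric types, such as int and float, can be ordered
mixed_numbers = (1, 2.5) < (1, 3)
print(mixed_numbers)
```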

Another scenario where tuples are used is in defining input and output values of functions. Let us see some examples below:

In [1]:
email = 'allenyang@berkeley.edu'
name, address = email.split('@')       # split() returns a list, which is unpacked into two names
print(name, address)

poem = 'Roses are red, Violets are blue, Sugar is sweet'
a, b = poem.split(',', 1)
print(a, b)

a, b, c = poem.split(',')
print(c)

allenyang berkeley.edu
Roses are red  Violets are blue, Sugar is sweet
 Sugar is sweet


In the above sample code, the built-in string method *split()* returns a list, and the assignment unpacks that list into the variables on the left-hand side in the same way a tuple would be unpacked. For example, if we split an email address at the pattern "@", the substring before "@" is assigned to *name* and the substring after "@" is assigned to *address*. In the second example, *split()* may return more than two elements if the split pattern appears in multiple locations of the string. The optional second argument limits the number of splits, and the number of names on the left must match the number of elements returned.
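
Strictly speaking, *split()* returns a list, and multiple assignment unpacks any sequence, whether it is a list or a tuple. The short sketch below checks the return type and shows what happens when the number of names does not match the number of elements; the starred name *rest* is an illustrative use of extended unpacking, which the lecture has not introduced.

```python
email = 'allenyang@berkeley.edu'
parts = email.split('@')
print(type(parts), parts)             # split() returns a list

name, address = parts                 # unpacking works on any sequence
pair = (name, address)                # pack the two names into a tuple
print(type(pair), pair)

poem = 'Roses are red, Violets are blue, Sugar is sweet'
try:
    a, b = poem.split(',')            # three elements, only two names
except ValueError as err:
    unpack_error = str(err)
    print('ValueError:', unpack_error)

first, *rest = poem.split(',')        # rest collects the remaining elements in a list
print(first, rest)
```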

Tuples can be used to hold return values from a function. They can also be used to hold input argument values of variable length. Let us see another simple example below:

In [5]:
print(max(2, 3))
print(max(2, 3, 4))

def my_max(*args):
    print('Input length: ', len(args))
    print('args = ', args)
    return(max(args))

print(my_max(1))

3
4
Input length:  1
args =  (1,)
1


In the above sample code, we see that some Python functions accept input arguments of variable length, such as *max()* and *print()*. In addition, using tuples, we can define our own functions that accept arguments of variable length. In the definition of *my_max()*, we use a reserved symbol "*" to indicate that args should be treated as a tuple to hold multiple input values of variable length. In Python, the "*" symbol used in this scenario is called the **packing operator**.
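
The same "*" symbol can also be used at the call site, where it spreads the elements of an existing sequence over separate arguments. The sketch below combines both directions with a tuple return value; the function *min_and_max()* is a hypothetical helper written for this illustration, not part of the lecture code.

```python
def min_and_max(*args):
    '''Return the smallest and the largest argument as a tuple.'''
    return min(args), max(args)       # the comma builds a two-element tuple

values = [4, 7, 9, 1]
low, high = min_and_max(*values)      # "*" here unpacks the list into four arguments
print(low, high)

single = min_and_max(3)               # one argument is packed as the tuple (3,)
print(single)
```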

# Dictionaries

A dictionary can be viewed as a generalization of the list type. In a list, each element is indexed by a unique integer in ascending order. In a dictionary, each element is a **(key, value)** pair, and the value of an entry is addressed by its corresponding unique key.

Let us see a few examples below:

In [None]:
D = {1: 'one', 2: 'two', 3: 'three'}
print(len(D))
print(D[2])

English_to_Chinese={}     # Define an empty dictionary
English_to_Chinese['one'] = '一'
English_to_Chinese['two'] = '二'
English_to_Chinese['three'] = '三'

del(English_to_Chinese['two'])
print(English_to_Chinese)

In the first example above, the dictionary *D* has three entries; for instance, the first entry has the (key, value) pair (1, 'one'). The entry value corresponding to the key 2 is retrieved by *D[2]*.

The second example demonstrates more powerful ways to retrieve a dictionary's entry values. The key can be other variable types such as strings. We can create a dictionary for translation purposes, where a unique English word is used as the key, and its corresponding entry value is its translation in another language such as Chinese. 
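
Looking up a key that is not in the dictionary raises a KeyError. The sketch below shows three common ways to handle a missing key, using a smaller version of the translation dictionary; the fallback string 'not available' and the name *missing_key* are illustrative choices.

```python
English_to_Chinese = {'one': '一', 'three': '三'}

# 1. Test membership first with the in operator
has_two = 'two' in English_to_Chinese
print(has_two)

# 2. Catch the KeyError raised by square-bracket access
try:
    English_to_Chinese['two']
except KeyError as err:
    missing_key = err.args[0]
    print('Missing key:', missing_key)

# 3. Use get() with a default value instead of raising an error
fallback = English_to_Chinese.get('two', 'not available')
print(fallback)
print(English_to_Chinese.get('one', 'not available'))
```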

Here, we should note that any immutable built-in value, such as a number, a string, or a tuple of such values, can be used as a key. However, not every variable type can be used as a key; mutable types such as lists cannot. Let us consider the counterexample below:

In [None]:
English_to_Chinese={} 
English_to_Chinese[[1, 2, 3]] = 'variable'

If you run the above code, Python will raise a runtime error: "TypeError: unhashable type: 'list'". It shows that list type variables cannot be used as valid keys. The concept of **hashable** mentioned in the error message will be discussed in more detail in the next class.
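
A tuple is a common substitute when a key needs to hold several values. The sketch below, with the illustrative dictionary name *points*, shows that a tuple of integers works as a key, while a tuple that contains a list is rejected with the same error, because the list inside it is still unhashable.

```python
points = {}
points[(0, 0)] = 'origin'             # a tuple of integers is a valid key
print(points[(0, 0)])

try:
    points[(0, [1, 2])] = 'invalid'   # the list inside the tuple is unhashable
except TypeError as err:
    message = str(err)
    print('TypeError:', message)

print(points)                         # the failed assignment added nothing
```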

One of the widely used applications for dictionaries is the construction of histograms. A histogram is a representation of the frequency of distinct data values. A histogram can be conveniently stored in a dictionary:

    * Key: distinct data values
    * Value: occurrence frequency (absolute count or relative percentage)
    
Let us see the sample code below:

In [2]:
# Build a character histogram

histogram = dict()
text = 'We can know only that we know nothing. \
    And that is the highest degree of human wisdom.' # From War and Peace

for c in text:
    if c.isalpha():    # Keep alphabetic characters only
        c = c.lower()  # Treat uppercase and lowercase as the same letter
        if c in histogram:
            histogram[c] += 1
        else:
            histogram[c] = 1

for key in histogram:
    print(key, end = ' ')
print()
for key in histogram:
    print(histogram[key], end = ' ')

w e c a n k o l y t h i g d s r f u m 
5 7 1 5 8 2 6 1 1 7 7 4 3 3 3 1 1 1 2 

In the above sample code, a *for* loop enumerates all the characters in the *text* variable. The goal is to count the occurrences of each distinct English letter. The histogram counts only alphabetic characters and ignores the difference between uppercase and lowercase. Therefore, the algorithm uses the string methods *c.isalpha()* and *c.lower()*.

The expression *c in histogram* uses the *in* operator to check whether the value of *c* is already a key in the dictionary. If yes, the algorithm increments its count by one; otherwise, it creates a new (key, value) pair with the count initialized to 1.

The final output of the alphabet histogram from the *text* quote indicates there are five occurrences of letter "w" or "W", seven occurrences of letter "e" or "E", etc.
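
The if/else update is a common enough pattern that Python offers shorter alternatives. The sketch below is not the lecture's method; it shows the same histogram built with *dict.get()* and with *collections.Counter* from the standard library, and checks that both give the same counts. The name *counter* is illustrative.

```python
from collections import Counter

text = 'We can know only that we know nothing. And that is the highest degree of human wisdom.'

# Variant 1: get() returns 0 for a letter that has not been seen yet
histogram = {}
for c in text.lower():
    if c.isalpha():
        histogram[c] = histogram.get(c, 0) + 1

# Variant 2: Counter is a dict subclass designed for counting
counter = Counter(c for c in text.lower() if c.isalpha())

print(histogram == dict(counter))     # both variants agree
print(histogram['w'], histogram['e'])
```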

Next, we consider a problem of inverting a dictionary (key, value) pair. That is, creating a new dictionary whereby the existing dictionary's values are keys in the new dictionary, and vice versa. 

Let us first see the solution code:

In [None]:
# Build a histogram dictionary
histogram = dict()
text = 'We can know only that we know nothing. And that is the highest degree of human wisdom'

for c in text:
    if c.isalpha():
        c = c.lower()
        if c in histogram:
            histogram[c] += 1
        else:
            histogram[c] = 1

def invert_dictionary(input_dictionary):
    '''
    Invert the mapping between keys and values of a dictionary
    Parameters
    Input:  input_dictionary    - a dict type
    Output: result              - dict result
    '''

    if type(input_dictionary)!=dict:
        raise TypeError('Argument must be dict type.')

    result = dict()
    for key in input_dictionary:
        value = input_dictionary[key]
        if value not in result:
            result[value] = [key]
        else:
            result[value].append(key)

    return result

# print out the histogram
for key in histogram:
    print(key, end = ' ')
print()
for key in histogram:
    print(histogram[key], end = ' ')
print()
# Call invert_dictionary
print("Invert the histogram ...")
inverse = invert_dictionary(histogram)
print(inverse)


The function *invert_dictionary()* creates and returns a new dictionary, whose keys are the value entries of the input dictionary and whose values are the keys of the input dictionary. In line 25, the *for* loop enumerates all the unique entries of *input_dictionary* with their key assigned to the variable *key*. Then the corresponding *value* entry is retrieved in line 26. In lines 27 to 30, the *result* dictionary accepts *value* as its key.

Specifically, since *value* is retrieved from *input_dictionary*, it may not be unique. In the above case, multiple letters may have the same occurrence count. Therefore, those keys from *input_dictionary* need to be collected into a list, which is stored in the new dictionary under their shared occurrence count as the key.

Python supports very efficient search to determine if a key is in a dictionary or not using the operator *in*, such as in line 27 we have: *if value not in result:*
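
The "check membership, then create or append" step can also be written with the dictionary method *setdefault()*, which returns the existing value for a key or stores and returns a default. The sketch below is an alternative formulation, not the lecture's solution; the function name *invert_with_setdefault* and the small test dictionary are made up for illustration.

```python
def invert_with_setdefault(input_dictionary):
    '''Invert a dictionary, collecting keys that share a value into a list.'''
    result = {}
    for key, value in input_dictionary.items():
        # Create an empty list the first time a value is seen, then append
        result.setdefault(value, []).append(key)
    return result

small = {'a': 1, 'b': 2, 'c': 1}
inverse = invert_with_setdefault(small)
print(inverse)                        # {1: ['a', 'c'], 2: ['b']}
```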

To demonstrate how efficient the key search implemented in Python is, let us evaluate another sample code below:

In [2]:
# IMPORTANT: to run this script, you need to enable Internet in Kaggle. Default Kaggle kernels do not have Internet access
# Enabling Internet is on the right side of the Settings menu. 
# If you run this Jupyter Notebook on your own computer, then make sure your computer has Internet access

from urllib.request import urlopen
import random
import time

Dictionary10 = dict()
Dictionary1000 = dict()
DictionaryTotal = dict()
file_url = "http://www.nasdaqtrader.com/dynamic/symdir/nasdaqlisted.txt"        

# Open the URL in a with block so the connection is closed even if an error occurs
print('Reading text file from url ... ', end = ' ')

# Create three dictionaries of different lengths
count = 0
with urlopen(file_url) as file:
    for line in file:
        decoded_line = line.decode("utf-8")
        count += 1
        ticker, info = decoded_line.split('|',1)
        if count<=10:
            Dictionary10[ticker] = info
        if count<=1000:
            Dictionary1000[ticker] = info
        DictionaryTotal[ticker] = info

print('done')

# Create 1M queries to time the performance of three dictionaries
print('Generating 1M random tickers ... ', end = ' ')
trial_total = 1000000
TICKER_LETTER = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
search_list = []
for index in range(trial_total):
    new_random_ticker = ''
    for letter_index in range(random.randint(1,5)):
        new_random_ticker = new_random_ticker + (random.choice(TICKER_LETTER))

    search_list.append(new_random_ticker)
print('done')

# Test speed for query Dictionary10
begin_time = time.time()
for index in range(trial_total):
    query_result = search_list[index] in Dictionary10
elapsed_time = time.time() - begin_time
print("Searching a size-{0} dictionary 1M times takes: {1}s".format(len(Dictionary10),
    elapsed_time))

# Test speed for query Dictionary1000
begin_time = time.time()
for index in range(trial_total):
    query_result = search_list[index] in Dictionary1000
elapsed_time = time.time() - begin_time
print("Searching a size-{0} dictionary 1M times takes: {1}s".format(len(Dictionary1000),
    elapsed_time))

# Test speed for query DictionaryTotal
begin_time = time.time()
for index in range(trial_total):
    query_result = search_list[index] in DictionaryTotal
elapsed_time = time.time() - begin_time
print("Searching a size-{0} dictionary 1M times takes: {1}s".format(len(DictionaryTotal),
    elapsed_time))

Reading text file from url ...  done
Generating 1M random tickers ...  done
Searching a size-10 dictionary 1M times takes: 0.18244504928588867s
Searching a size-1000 dictionary 1M times takes: 0.14960026741027832s
Searching a size-5203 dictionary 1M times takes: 0.1539783477783203s


In this sample code, we use a new Python function *urlopen()* from the imported module *urllib.request* to retrieve a text file from the Internet. The file "nasdaqlisted.txt" is a public file that lists all the companies currently listed on the United States Nasdaq stock exchange. Every public company has a unique ticker symbol for trading, such as AAPL for Apple and GOOGL for Google's parent company Alphabet Inc.

In the code, we split each line of the text file to retrieve its ticker string as the key and the rest of the line as the value of one dictionary entry. For comparison purposes, we construct three dictionaries of different sizes: 10 entries, 1000 entries, and all entries in the file (5203 in the run shown above).

Then we query each dictionary 1 million times to check whether a random ticker string exists as a key, and we record the total elapsed time. We see that regardless of the size of the dictionary, the 1 million random key queries cost roughly the same amount of time, which is somewhat counter-intuitive.

The constant-time key search algorithm in Python is related to the concept of "hashable" that we have mentioned above. We will explain it in more detail in the next lecture. For now, the good news is that regardless of the size of a dictionary, the search for its keys seems to be quite efficient.
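
To see why this result is notable, it helps to run the same membership test against a list, which has to be scanned element by element. The sketch below runs offline on synthetic string keys rather than the Nasdaq file; the helper *time_membership()*, the size *n*, and the query set are assumptions made for this illustration. Timings vary from machine to machine, so no expected numbers are shown, but the list search is normally far slower.

```python
import time

def time_membership(container, queries):
    '''Count how many queries are found in container and time the search.'''
    start = time.perf_counter()
    hits = 0
    for q in queries:
        if q in container:
            hits += 1
    return hits, time.perf_counter() - start

n = 20000
keys = [str(i) for i in range(n)]
as_list = keys                        # membership test scans the list
as_dict = dict.fromkeys(keys)         # membership test uses the key lookup

# 200 queries that exist as keys and 200 that do not
queries = [str(i) for i in range(n - 200, n + 200)]

list_hits, list_time = time_membership(as_list, queries)
dict_hits, dict_time = time_membership(as_dict, queries)

print('list: {0} hits in {1:.4f}s'.format(list_hits, list_time))
print('dict: {0} hits in {1:.4f}s'.format(dict_hits, dict_time))
```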

Finally, the sample code below demonstrates how to convert between list, tuple, and dictionary type variables in Python. The code uses the cast functions *list()* and *dict()*. The function *zip()* groups the elements of two or more sequences into tuples position by position, and stops at the end of the shortest sequence.

In [None]:
# From list to zip into tuples
list1 = [1,  2,  3, 4]
list2 = ['one', 'two', 'three', 'four']
list3 = ['uno', 'due', 'tre']
for pair in zip(list1, list2, list3): print(pair)
# (1, 'one', 'uno')
# (2, 'two', 'due')
# (3, 'three', 'tre')

print("List of tuples: ", list(zip(list1, list2, list3)))

# From tuple to dictionary
tuple_list = [(1, 'one'), (2, 'two'), (3, 'three')]
D = dict(tuple_list)
print("Dictionary: ", D)
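
The conversion also works in the other direction. As a small sketch with illustrative variable names, *dict(zip(...))* builds a dictionary directly from two lists, and the dictionary method *items()* gives back the (key, value) pairs, which can be cast to a list or a tuple.

```python
keys = ['one', 'two', 'three']
values = ['uno', 'due', 'tre']

# Two lists -> dictionary
english_to_italian = dict(zip(keys, values))
print(english_to_italian)

# Dictionary -> list of (key, value) tuples -> tuple of tuples
pairs = list(english_to_italian.items())
as_tuple = tuple(pairs)
print(pairs)
print(as_tuple)

# Swapping the zip arguments builds the reverse mapping (values must be unique)
italian_to_english = dict(zip(values, keys))
print(italian_to_english['due'])
```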

# Exercises

1. Following the example of my_max() function in the lecture, code another function my_min(), which returns the minimal value from a variable number of input arguments.

2. Use type() function to verify the variable type of two assignment values: ('a'), ('a', ). Please discuss the difference.

3. Use id() function to code an example, which demonstrates that dictionary is a mutable variable type.

4. Create a simple English-to-Italian dictionary, with the following five (key, value) pairs:
> ("one", "uno"), ("two", "due"), ("three", "tre"), ("four", "quattro"), ("five", "cinque")

    Please code a program that performs the following two functions (can be implemented separately or jointly):
    1. When a user inputs an English word from the dictionary keys, output the corresponding value to translate English to Italian.
    2. When a user inputs an Italian word from the dictionary values, output the corresponding key to translate Italian to English.
    3. In any of the above cases, if the corresponding (key, value) is not found, display a message: Translation is not available.
    

5. Although a tuple is an immutable type, sometimes Python may successfully modify values inside a tuple, such as in the code below. Explain why.

In [None]:
# If the tuple contains mutable values such as lists, those values can be modified in place; the tuple itself still refers to the same objects.

In [36]:
words = ("one", "uno"), ("two", "due"), ("three", "tre"), ("four", "quattro"), ("five", "cinque")
dic = dict(words)

def translate(word):
    if word in dic:
        return dic[word]
    for english, italian in dic.items():
        if word == italian:
            return english
    return 'Translation is not available'
translate('')

'Translation is not available'

In [9]:
dic = {}
print(id(dic))
dic[1] = 'hi'
print(id(dic))

4604681408
4604681408


In [47]:
s = ('a')
se = ('a',)
print(type(s), type(se))
  

<class 'str'> <class 'tuple'>


In [4]:
def my_min(*args):
    return(min(args))

my_min(4,7,9)

4

In [None]:
T = (0.11, [30, 35], 20, [40,45], 50)
T[1][0] = 3
print(T)

# Challenges

1. Take the same text input:
>text = 'We can know only that we know nothing. And that is the highest degree of human wisdom.' # From War and Peace

    Now use a dictionary type to construct a histogram. Each key in the dictionary is a unique word in the text string, treating uppercase and lowercase letters as the same, and the corresponding value is the number of occurrences in the text. For example, "we" is a valid key from the text, and its value in the histogram is 2. Similarly, the histogram value for "that" is also 2. Hint: Use string.split() to split a string into words separated by spaces. Use string.lower() to convert all words to lowercase, and remember to ignore symbols in the text, as they are not counted as words.

In [44]:
text = 'We can know only that we know nothing. And that is the highest degree of human wisdom.'
dic = {}
for w in text.split():
    w = w.strip('.,;:!?').lower()   # Drop surrounding punctuation and ignore case
    if w.isalpha():                 # isalpha is a method, so it needs parentheses
        if w in dic:
            dic[w] += 1
        else:
            dic[w] = 1
print(dic)

{'we': 2, 'can': 1, 'know': 2, 'only': 1, 'that': 2, 'nothing': 1, 'and': 1, 'is': 1, 'the': 1, 'highest': 1, 'degree': 1, 'of': 1, 'human': 1, 'wisdom': 1}
