**Learning Python -- The Programming Language for Artificial Intelligence and Data Science**

**Lecture 4: Lists**

**By Allen Y. Yang, PhD**

(c) Copyright Intelligent Racing Inc., 2021-2024. All rights reserved. Materials may NOT be distributed or used for any commercial purposes.

# Keywords

* **list**: The name of the built-in list data type in Python.
* **Method**: In object-oriented programming languages such as Python, a method is a built-in function encapsulated together with the variable data.
* **Immutable data type**: A variable of an immutable data type cannot modify the stored data value at the referenced data address. Assigning a new value to the same variable leads to the variable pointing to a new memory address with the new value.
* **Mutable data type**: A variable of a mutable data type can modify the stored data value while keeping the same data address.

In [None]:
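# Two variables assigned the same integer value: do they share one object?
# The answer depends on the Python implementation (see the mutability section below)
a = 4990
b = 4990
print(id(a)==id(b))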

# Definition and Indexing

Just as a string stores a sequence of characters, the *list* type in Python stores an ordered sequence of elements. Most notably, the elements of a list can be of different types. Let us see some examples below:

In [None]:
l1 = []
l2 = ['a', 'b', 5]
l3 = [l1, l2, 5]
print(l3)

In the above code block, a list is defined by enumerating its elements within a pair of square brackets. A pair of empty brackets defines an empty list (of length zero). The second list variable *l2* is formed from elements of mixed types, namely two strings and one integer.

More interestingly, a list may contain other lists as its elements. In the third example, the first element of *l3* is l1, the second element is l2, and the third element is an integer.
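To check the structure described above, a short sketch can print the length of each list with the built-in *len()* and the type of each element of *l3* with *type()*. The lists are redefined exactly as in the cell above so that the example runs on its own.

```python
l1 = []
l2 = ['a', 'b', 5]
l3 = [l1, l2, 5]

# len() counts top-level elements only; a nested list counts as one element
print(len(l1), len(l2), len(l3))   # 0 3 3

# Each element keeps its own type
for element in l3:
    print(element, type(element).__name__)
# [] list
# ['a', 'b', 5] list
# 5 int
```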

Next, we see how to address the elements in a list.

In [None]:
L = [['a', 'b'], 5, 'c']
print(L[0])
print(L[0][1])
print(L[-1])

Similar to the string type, elements in a list can be addressed by their indexes. In the first example above, the first element in L has offset zero, so *L[0]* retrieves the first element, which itself is a list.

Since *L[0]* is a list, we can recursively address the elements in this list using the same square brackets. In the second example, *L[0][1]* references the second element of *L[0]*, which is 'b'.

The third example shows that a list can also be indexed from the end toward the beginning with negative index values. For example, the last element of *L* can also be addressed by index -1.
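Negative indexes count from the end, so for a list of length n, L[-k] is the same element as L[n-k]. The sketch below, using the same list L as the cell above, also applies the start:stop:step slicing syntax to lists, as with strings; an index outside the valid range raises an IndexError.

```python
L = [['a', 'b'], 5, 'c']
n = len(L)

# L[-k] and L[n - k] refer to the same element
print(L[-1], L[n - 1])     # c c
print(L[-3])               # ['a', 'b']
print(L[0][-1])            # b  (negative index inside the nested list)

# Slicing works on lists as on strings and returns a new list
print(L[1:])               # [5, 'c']
print(L[::-1])             # ['c', 5, ['a', 'b']]

# Valid indexes run from -n to n-1; anything else is an error
try:
    L[3]
except IndexError as err:
    print('IndexError:', err)   # IndexError: list index out of range
```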

In [2]:
"""
In this sample code, we reverse a list in two ways and compare the time each method takes.
The code shows a standard way to measure how fast an algorithm runs, using the time module.
"""

import time

test_list = list('Python'*1000000)      # Using "*" to duplicate a string 1,000,000 times

# Method 1: Slicing operator
tic = time.time()                   # By convention, tic records the start time
reverse_list = test_list[::-1]
toc = time.time()                   # toc records the end time; both are in seconds
elapsed_time = toc - tic
print('Slicing method: Elapsed time: %.2f seconds' % elapsed_time )


# Method 2: Built-in object method
tic = time.time()                   
test_list.reverse()  # a list type is a class in Python, which contains its own built-in functions
toc = time.time()                   
elapsed_time = toc - tic
print('Built-in class method: Elapsed time: {time:.2f} seconds'.format(time = elapsed_time) )

Slicing method: Elapsed time: 0.02 seconds
Built-in class method: Elapsed time: 0.00 seconds


One important measure of an algorithm's quality is how fast it runs compared with other algorithms that achieve the same result. In the above code block, we introduce a standard procedure to measure elapsed time with the module function *time.time()*. The function returns a float value in seconds, so its fractional part gives sub-second precision. It reads the computer's system clock and reports the time elapsed since a fixed reference point called the epoch, which for Python is January 1, 1970, 00:00:00 (UTC).

If we record *time.time()* right before the algorithm runs as *tic*, and again right after it finishes as *toc*, then the difference *toc - tic* is the elapsed time. This is a very useful measure of algorithm speed that the reader should learn to use fluently.
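*time.time()* follows the system clock, which can be adjusted while a program runs. For measuring short intervals, the standard library also offers *time.perf_counter()*, a high-resolution clock intended for this purpose, and the *timeit* module, which repeats a statement several times and reports the total. The sketch below re-times the two reversal methods with both; the list size and the repeat count of 10 are arbitrary choices, and the printed times will differ from machine to machine.

```python
import time
import timeit

test_list = list('Python' * 100000)   # smaller than in the lecture, to keep the repeats quick

# perf_counter() is only meaningful as a difference between two calls
tic = time.perf_counter()
reverse_list = test_list[::-1]
toc = time.perf_counter()
print('Slicing (perf_counter): %.4f seconds' % (toc - tic))

# timeit runs each callable `number` times and returns the total time in seconds
slice_time = timeit.timeit(lambda: test_list[::-1], number=10)
method_time = timeit.timeit(lambda: test_list.reverse(), number=10)   # 10 reversals restore the order
print('Slicing, 10 runs:   {:.4f} seconds'.format(slice_time))
print('reverse(), 10 runs: {:.4f} seconds'.format(method_time))
```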

Next, let us look at how the elapsed time is reported by the *print()* function. When we write the string to be printed, we do not yet know the value of *elapsed_time*; that value has to be formatted at runtime (that is, when the code is actually executed by the computer's processor). Python provides two basic ways to combine a predefined constant string with dynamic arguments at runtime:

1. The older way, inherited from earlier versions of Python, uses the "%" symbol to mark where an argument determined at runtime is inserted into a string. An example is the first *print()* call in the code above (for the slicing method). Here ".2f" means displaying the argument as a floating-point number with 2 digits after the decimal point.
2. The newer way, common in Python 3 code, is to call the string method *format()*, as in the second *print()* call (for the built-in method). Python inserts the arguments passed to *format()* into the string at the positions marked by braces {}. In this style the displayed text can still be formatted by user specifications, such as ".2f" in the example.
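The two styles above, together with the f-string syntax added in Python 3.6, can be compared side by side. In each line below, the same value is inserted with two digits after the decimal point; the value 0.0234 is an arbitrary example.

```python
elapsed_time = 0.0234   # an arbitrary example value

old_style = 'Elapsed time: %.2f seconds' % elapsed_time
format_style = 'Elapsed time: {time:.2f} seconds'.format(time=elapsed_time)
f_string = f'Elapsed time: {elapsed_time:.2f} seconds'   # requires Python 3.6+

print(old_style)      # Elapsed time: 0.02 seconds
print(format_style)   # Elapsed time: 0.02 seconds
print(f_string)       # Elapsed time: 0.02 seconds
```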

The concept of variable types owning built-in functions is a benefit of so-called object-oriented programming (OOP). We will talk about OOP in the last part of the course, but we have already noted that all Python variable types are actually classes. A class in OOP languages contains data *and* methods that apply directly to the data. (In OOP, the built-in functions of a class are often called **methods** to distinguish them from regular functions that are not tied to a class object.) In other words, a class prepares not only memory storage for the data, but also a set of functions properly coded to process that data.

In the above sample code, we see that the list type also has a built-in method that performs exactly the task of reversing the list. Thanks to this method, the whole operation is a single method call, *test_list.reverse()*.

Finally, the two methods differ in how they use memory: *list.reverse()* reverses the list in place, while slicing creates a new list, which requires additional memory proportional to the length of the list. For more details, please be sure to watch the lecture video.
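The in-place behavior has a visible consequence: *reverse()* changes the list it is called on and returns None, whereas slicing leaves the original untouched and returns a new list. A small list makes the difference easy to see.

```python
numbers = [1, 2, 3, 4]

# Slicing builds a new, reversed list; the original is unchanged
backwards = numbers[::-1]
print(numbers, backwards)   # [1, 2, 3, 4] [4, 3, 2, 1]

# reverse() modifies numbers itself and returns None
result = numbers.reverse()
print(numbers, result)      # [4, 3, 2, 1] None
```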

# List Methods

In Python and many other object-oriented programming languages, a method refers to a built-in function encapsulated together with the variable data, when the variable is an object. In Python, all variable types are objects, so every variable comes with its own set of encapsulated methods. In this section, let us consider several useful methods built into the list type.

* clear(): Removes all elements, leaving the list empty.
* insert(pos, element): Inserts *element* at position *pos*. If *pos* is greater than or equal to the length of the list, the element is added after the current last element.
* pop(pos): Removes and returns the element at position *pos*. The argument may be omitted, in which case the last element is removed.
* append(element): Adds *element* to the end of the list, increasing its length by one.
* extend(new_list): Adds all elements of *new_list* to the end of the current list.
* sort(): Sorts the list elements in place, in ascending order by default.
* count(element): Counts the number of occurrences of a specific element value.

In [None]:
source = list('notebook')
# Add an element at position-0
source.insert(0, 'a')
print(source)

# pop position-0, add a list element
# Note here append creates a nested list
source.pop(0)
source.append(list('notebook'))
print(source)

# pop last position then merge two lists
source.pop()
source.extend(list('notebook'))
print(source)

# built-in sort method
source.sort()
print(source)
print(source.count('o'))
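The cell above does not show *clear()*, the value returned by *pop()*, or what *insert()* does with a position past the end. The following sketch covers those cases on a small list of integers; it also uses the keyword argument reverse=True of *sort()* to sort in descending order.

```python
nums = [3, 1, 2]

nums.insert(100, 4)        # position beyond the end: 4 goes after the last element
print(nums)                # [3, 1, 2, 4]

last = nums.pop()          # no argument: removes and returns the last element
first = nums.pop(0)        # removes and returns the element at position 0
print(last, first, nums)   # 4 3 [1, 2]

nums.sort(reverse=True)    # sorts in place, largest first
print(nums)                # [2, 1]

nums.extend([2, 2])
print(nums.count(2))       # 3

nums.clear()
print(nums, len(nums))     # [] 0
```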

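The following example shows how to convert a string to a list with *list()*, and a list back to a string with the string method *join()*. Note that *str(L)* does not undo *list()*: it returns the printed representation of the list as a string.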

In [6]:
string = "Python"
L = list(string)
print(L)

S1 = "".join(L)
S2 = ",".join(L)
print(S1)
print(S2)

list_string = str(L)
print(list_string)

['P', 'y', 't', 'h', 'o', 'n']
Python
P,y,t,h,o,n
['P', 'y', 't', 'h', 'o', 'n']
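Two details of this conversion are worth checking. First, *join()* only accepts elements that are strings, so a list of numbers has to be converted element by element first. Second, the string method *split()* goes in the opposite direction, cutting a string into a list at a separator. The sketch below shows both; the list of integers is an illustrative example.

```python
L = list("Python")

S2 = ",".join(L)
print(S2.split(","))          # ['P', 'y', 't', 'h', 'o', 'n']

# str(L) is the printed form of the list, not the original word
print(len(str(L)))            # 30

digits = [1, 2, 3]
try:
    "".join(digits)           # join() requires string elements
except TypeError as err:
    print('TypeError:', err)

# Convert each element to a string first
print("".join([str(d) for d in digits]))   # 123
```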


# Mutable and Immutable Variable Types

The list type also differs from most of the variable types we have learned so far in one important way: a list is mutable, while int, float, bool, and string are immutable. In the rest of this lecture, let us familiarize ourselves with this distinction.

In Python, we have said that each variable refers to an object stored in computer memory, and the object's class encapsulates both data and methods applied to the data. The use of class methods will become clearer when we formally introduce OOP and classes. For the discussion of mutable versus immutable types, we shall focus on the object's data.

In particular, after an object is created and its memory allocated by Python, an **immutable** type does not permit the data to be updated later. For a **mutable** type, the data can be modified while the object, and its memory allocation, is preserved.

To illustrate this distinction, let us introduce the Python function *id()*. The function takes a variable as its input and returns an integer that identifies the object the variable refers to. The Python documentation guarantees that this ID is unique and constant for the object during its lifetime: two objects that exist at the same time never share an ID, although a new object may reuse the ID of an object that no longer exists. However, it is not guaranteed that the ID number is a memory address; the exact choice is left to the Python implementation. (In CPython, the standard implementation, it is the object's memory address.)

In the following code block, we examine what happens to a variable's ID when its value changes, for three different variable types:

In [None]:
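# int is immutable: assigning a new value makes a refer to a new object
a = 10
print(id(a))
a = 32759680
print(id(a))
# list is mutable: pop() changes the list in place and its ID is unchanged
b = ['i','m','m','u','t','a','b','l','e']
print(b, id(b))
b.pop(0)
b.pop(0)
print(b, id(b))
# str is immutable: the item assignment below raises
# TypeError: 'str' object does not support item assignment
b = "immutable"
b[0] = "a"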

The above code block examines the mutability of three variable types when the code changes their values. First, the *int* type is immutable. Therefore, assigning a new value to variable *a* makes it refer to a different integer object, and its ID changes. Second, the *list* type is mutable. Therefore, even after we pop the first two elements, changing the list from ['i','m','m','u','t','a','b','l','e'] to ['m','u','t','a','b','l','e'], we can check that the ID of *b* remains the same. Finally, the *string* type is immutable. Therefore, any attempt to change an individual character of a string raises a runtime error (a TypeError).
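Because a list is mutable, a single element can also be replaced by assignment, and the list keeps its ID. A string has no such operation; to "change" a character, a new string is built, for example by concatenating a slice, and a variable is bound to that new object while the original string stays as it was.

```python
chars = list('immutable')
before = id(chars)
chars[0] = 'a'                 # item assignment is allowed on lists
print(''.join(chars), id(chars) == before)   # ammutable True

word = 'immutable'
new_word = 'a' + word[1:]      # builds a new string object
print(word, new_word)          # immutable ammutable
```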

In [None]:
# Two small integers of the same value
a = 10
b = 10
print('small int: ', id(a), id(b), id(a)==id(b))

# Two large integers of the same value
a = 32759680
b = 32759680
print('large int: ', id(a), id(b), id(a)==id(b))

# Two floats of the same value
a = 10.1
b = 10.1
print('float: ', id(a), id(b), id(a)==id(b))

# Two short strings of the same value
a = 'immutable'
b = 'immutable'
print('short str: ', id(a), id(b), id(a)==id(b))

# Two long strings of the same value
a = 'immutable'*1000
b = 'immutable'*1000
print('long str: ', id(a), id(b), id(a)==id(b))

# Two lists of the same value
a = ['m', 'u', 't', 'a', 'b', 'l', 'e']
b = ['m', 'u', 't', 'a', 'b', 'l', 'e']
print('list: ', id(a), id(b), id(a)==id(b))

# Forcing one variable to be equal to the other
b = a
print('identified lists: ', id(a), id(b), id(a)==id(b))
a.clear()
print(a, b)

# Creating duplicate objects that have different IDs
# Then modifying one variable will not be reflected on the second
b = a.copy()
print('copied lists: ', id(a), id(b), id(a)==id(b))
a.insert(0, 'new')
print(a, b)

Now let us consider a more subtle question. In the previous examples, when two variables had different values or types, Python created two distinct objects, so their IDs and memory addresses were different. But *when two variables have the same type and the same value, do they reference the same object or different objects in memory?*

The answer is not straightforward. Please carefully review the output of the code block above:
1. When two variables hold the same small integer, Python points both variables to the same object in memory, so their IDs are identical. (CPython, for example, creates the integers from -5 to 256 in advance and reuses them.)
2. However, this conclusion cannot be extended to large integers and floating-point numbers. Looking up an existing object for every new value would cost time, so Python does not do it for an arbitrary range of integers and floats. Hence in the second and third examples, two variables with identical large integer values, or identical float values, have different IDs. (Even this is not guaranteed: equal constants that appear in the same compiled block of code, such as a single script, may end up sharing one object.)
3. The same argument applies to the immutable string type. In the next two examples, two short strings with the same value "immutable" receive the same ID, while two long strings with the same value receive different IDs.
4. Next, we consider two mutable list variables with the same value. Since mutable objects can be modified after creation, Python creates a separate list object for each list literal even when the initial values are the same, so the two lists have different IDs and are stored at different memory addresses.
5. Nevertheless, in all of the above situations, we can make two variable names refer to the same object by assigning one variable to the other, as in *b = a*. In that case, regardless of mutability, the two variables return the same ID. Furthermore, since both variables reference the same object, modifying a mutable object through the first variable is also reflected in the second variable.
6. Finally, for mutable types, we can use the built-in method *copy()* to create a new object that holds a copy of the original data. The two variables initially have the same value, just as in the case above, but they refer to two objects at different memory addresses, so later modifying one variable does not affect the object referenced by the other.
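One caveat about *copy()*: it makes a shallow copy. The new list is a separate object, but its elements are the same objects as in the original, so a nested list inside it is still shared. The standard-library function *copy.deepcopy()* copies nested lists as well. The sketch below, with an illustrative nested list, shows the difference.

```python
import copy

a = [[1, 2], 'x']
shallow = a.copy()         # new outer list, same inner objects
deep = copy.deepcopy(a)    # new outer list and new inner list

a[0].append(3)             # modify the nested list through a

print(shallow)             # [[1, 2, 3], 'x']  the nested list is shared
print(deep)                # [[1, 2], 'x']     the nested list was copied
print(id(a[0]) == id(shallow[0]), id(a[0]) == id(deep[0]))   # True False
```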

The reader may wonder what ultimately determines whether Python stores two equal values as one object in memory. We caution that, except when two variables are explicitly identified as in *b = a*, practitioners should not assume that two objects of the same value are identical in memory. Whether to reuse an object is purely an implementation optimization, and it is up to the designers of the Python implementation. In other words, if we want two variables to be the same object, we should explicitly write *b = a*; otherwise, our code should not rely in any way on two variables being the same object.
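In practice, this advice means comparing values with the == operator and reserving the *is* operator for the cases where identity really matters. The *is* operator tests whether two names refer to the same object, which for objects that currently exist is the same as comparing their *id()* values. The lists below are the same as in the earlier code block.

```python
a = ['m', 'u', 't', 'a', 'b', 'l', 'e']
b = ['m', 'u', 't', 'a', 'b', 'l', 'e']
c = a

print(a == b, a is b)   # True False  same value, different objects
print(a == c, a is c)   # True True   c refers to the same object as a
print((a is c) == (id(a) == id(c)))   # True
```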

# Summary

* A list type variable stores an ordered sequence of variables, whose types can be different.
* A list element can be addressed by using a pair of square brackets: l[index]. Elements in a list of lists can be addressed recursively as: l[index0][index1]
* list is a class with built-in methods: clear(), insert(), pop(), append(), extend(), sort(), count().
* list is a mutable variable type. The other variable types introduced earlier, including int, float, string, and bool, are immutable.
* The id() function can conveniently be used to compare the identities of objects (in CPython, their memory addresses). In particular, if two variables share the same ID, they refer to the same object in memory; if a variable keeps the same ID after its value is modified in place, then it is of a mutable type.
* list() converts its input argument, such as a string, to a list.

# Exercises

1. Given a list variable: a = [2, 9, 1, "2", "s", 6], write a single line of code to sort the variable using the built-in sort() method. Explain the result.

2. Write code to type cast a string "immutable" into a list type variable called input_list. Then remove the first two elements of input_list using the built-in method pop(). Finally, type cast the resulting input_list back to a string and print it.

3. Continue with the above program. Now, starting from the string "mutable", type cast the variable to a list, then add two new elements "m" and "i", one at a time, to the head of the list. Finally, type cast the resulting list back to a string and print it. Hint: The result should be "immutable".

4. *sorted()* is a useful built-in function that can take either a list or a string as input. Write code to demonstrate the output of the function when the input argument is a list and when it is a string.

5. Debug

In [None]:
a = [2, 9, 1, "2", "s", 6]
a.sort() #sorts alphabetically

In [17]:
st = 'asdf'
s = ['h','t','y']
print(st.sort())
print(s.sort())
#strings cannot be sorted

AttributeError: 'str' object has no attribute 'sort'

In [15]:
input_list = list('immutable')
input_list.pop(0)
input_list.pop(0)
input_list = "".join(input_list)
print(input_list)
input_list = list(input_list)
input_list.insert(0, 'm')
input_list.insert(0, 'i')
input_list = "".join(input_list)
print(input_list)


mutable
immutable


In [None]:
a = "a test string"

a_list = list(a)
print(a_list.sort())

# Please use the following sorted() approach to sort the string again
# a_result = sorted(a)

6. Debug

In [27]:
# Please concatenate two strings from the elements of two lists
List = ['abcde']
List = List + list('fghij')
print(List)
List = "".join(List)
print(List)

['abcde', 'f', 'g', 'h', 'i', 'j']
abcdefghij


# Challenges

1. Create a list of lists with the variable name L. The first element of L is the list [1, 2], and the second element of L is the list ['a', 'b']. Then, swap the elements L[0][1] and L[1][0]. Note that the swap should be coded as a generic operation so that it does not hard-code the given values of L.

2. Modify the above program: Use id() function to print out the ID of variable L before and after the swap operation. Observe that the ID numbers remain the same, and discuss the reason why.

3. Further modify the above program: Use id() function to print out the ID values of elements L[0][1] and L[1][0] before and after the swap. Observe the change on the ID numbers and discuss the reason why.

In [25]:
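L = [[1,2], ['a','b']]
print(id(L))
# This builds a brand-new list (it swaps L[0] and L[1], not L[0][1] and L[1][0]),
# so L now refers to a different object and its ID changes
L = [L[1], L[0]]
print(id(L))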


4394398592
4466139456
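By contrast, the swap described in Challenge 1 can be done in place with tuple assignment, which evaluates the whole right-hand side first and then assigns both targets. Since only elements inside the nested lists are rebound, L itself stays the same object. The sketch below is one possible approach, not the only one.

```python
L = [[1, 2], ['a', 'b']]
id_before = id(L)

# Swap L[0][1] and L[1][0] without hard-coding their values
L[0][1], L[1][0] = L[1][0], L[0][1]

print(L)                      # [[1, 'a'], [2, 'b']]
print(id(L) == id_before)     # True
```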
