# CHAPTER 2. Data Structures Used in Algorithms

# 1 Exploring data structures in Python

In any language, data structures are used to store and manipulate complex data. In Python, data structures are storage containers for managing, organizing, and searching data efficiently. They are used to store a group of data elements, called a collection, that need to be stored and processed together. In Python, there are five data structures that can be used to store collections: lists, tuples, dictionaries, sets, and DataFrames (the last of which is provided by the pandas package).


## 1.1 Lists
In Python, a list is the main data structure used to store a mutable sequence of elements. The elements stored in a list need not be of the same type.

To create a list, the data elements need to be enclosed in [ ] and separated by commas. For example, the following code creates a list of four data elements of different types:

In [1]:
list_a = ["John", 33,"Toronto", True]

In [2]:
list_a

['John', 33, 'Toronto', True]

In [3]:
type(list_a)

list

In [4]:
bin_colors=['Red','Green','Blue','Yellow']
bin_colors[1]

'Green'
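The section above describes lists as mutable. The following minimal sketch illustrates what that means. It creates its own copy of `bin_colors` so that it does not depend on earlier cells. It replaces an element in place by index, and then grows and shrinks the list with the standard `append` and `remove` methods:

```python
# A fresh copy of the list used in the examples above
bin_colors = ['Red', 'Green', 'Blue', 'Yellow']

# Lists are mutable: an element can be replaced in place by its index
bin_colors[1] = 'Orange'
print(bin_colors)       # ['Red', 'Orange', 'Blue', 'Yellow']

# append() adds an element to the end of the list
bin_colors.append('Purple')

# remove() deletes the first element equal to the given value
bin_colors.remove('Red')

print(bin_colors)       # ['Orange', 'Blue', 'Yellow', 'Purple']
print(len(bin_colors))  # 4
```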

### 1.1.2 Slicing
Retrieving a subset of the elements of a list by specifying a range of indexes is called slicing. The following code creates slices of the list:

In [5]:
bin_colors[0:2]

['Red', 'Green']

In [6]:
bin_colors[1]

'Green'

In [7]:
bin_colors[2:]

['Blue', 'Yellow']

In [8]:
bin_colors[:2]

['Red', 'Green']
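In a slice `start:stop`, the element at `start` is included and the element at `stop` is excluded. Either bound may be omitted. Python slices also accept an optional third value, the step.

The short sketch below shows these rules. It also demonstrates that a slice is a new list, so changing the slice does not change the original:

```python
bin_colors = ['Red', 'Green', 'Blue', 'Yellow']

# start is inclusive, stop is exclusive: only indexes 1 and 2
middle = bin_colors[1:3]            # ['Green', 'Blue']

# A step of 2 takes every second element
every_other = bin_colors[::2]       # ['Red', 'Blue']

# A negative step walks the list backwards
reversed_colors = bin_colors[::-1]  # ['Yellow', 'Blue', 'Green', 'Red']

# Slicing returns a new list, so modifying the slice leaves the original unchanged
middle[0] = 'Black'
print(middle)      # ['Black', 'Blue']
print(bin_colors)  # ['Red', 'Green', 'Blue', 'Yellow']
```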

### 1.1.3 Negative Indices
Python also supports negative indices, which count from the end of the list: -1 refers to the last element, -2 to the second-to-last, and so on. This is demonstrated in the following code:

In [9]:
bin_colors[:-1]

['Red', 'Green', 'Blue']

In [10]:
bin_colors[:-2]

['Red', 'Green']

In [11]:
bin_colors[-2:-1]

['Blue']
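The examples above use negative indices only inside slices. Used on its own, a negative index `-k` selects the same element as the positive index `len(list) - k`.

The sketch below checks that equivalence for every valid `k`:

```python
bin_colors = ['Red', 'Green', 'Blue', 'Yellow']

# -1 refers to the last element, -2 to the second-to-last, and so on
last = bin_colors[-1]         # 'Yellow'
second_last = bin_colors[-2]  # 'Blue'

# A negative index -k is equivalent to the positive index len(list) - k
n = len(bin_colors)
for k in range(1, n + 1):
    assert bin_colors[-k] == bin_colors[n - k]

# Omitting the stop bound after a negative start gives the last two elements
print(bin_colors[-2:])  # ['Blue', 'Yellow']
```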

### 1.1.4 Nesting
An element of a list can be of a simple data type or a complex data type, such as another list. This allows lists to be nested, which is an important capability for iterative and recursive algorithms.

In [12]:
a = [1,2,[100,200,300],6]

In [13]:
max(a[2])

300

In [14]:
a[2][1]

200
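The text notes that nesting matters for recursive algorithms. As an illustrative sketch, a recursive function can total every number in a list, however deeply its sublists are nested. The function name `nested_sum` is hypothetical and not part of the original material.

```python
def nested_sum(items):
    """Recursively add up all numbers in a possibly nested list (hypothetical helper)."""
    total = 0
    for element in items:
        if isinstance(element, list):
            # Recursive case: the element is itself a list
            total += nested_sum(element)
        else:
            # Base case: the element is a plain number
            total += element
    return total

a = [1, 2, [100, 200, 300], 6]
print(nested_sum(a))       # 609

deeper = [1, [2, [3, [4]]]]
print(nested_sum(deeper))  # 10
```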

### 1.1.5 Iteration

In [15]:
for aColor in bin_colors:
    print(aColor+ " Square")

Red Square
Green Square
Blue Square
Yellow Square


In [16]:
numbers = [1,2,3]
letters = ['a','b','c']
combined = zip(numbers, letters)
combined_list = list(combined)
combined_list

[(1, 'a'), (2, 'b'), (3, 'c')]
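Two related built-ins are often useful when iterating:

- `enumerate` yields each element together with its index.
- `zip` stops as soon as the shortest of its inputs is exhausted, so unmatched trailing elements are silently dropped.

A brief sketch:

```python
bin_colors = ['Red', 'Green', 'Blue', 'Yellow']

# enumerate() pairs each element with its position, starting at 0
for index, color in enumerate(bin_colors):
    print(index, color)
# 0 Red
# 1 Green
# 2 Blue
# 3 Yellow

# zip() stops at the shortest input, so the extra letter 'd' is dropped
numbers = [1, 2, 3]
letters = ['a', 'b', 'c', 'd']
pairs = list(zip(numbers, letters))
print(pairs)  # [(1, 'a'), (2, 'b'), (3, 'c')]
```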

## Tuples
The second data structure that can be used to store a collection is the tuple. In contrast to lists, tuples are immutable (read-only) data structures. Tuples consist of several elements surrounded by ( ).

Like the elements of a list, the elements within a tuple can be of different types, including complex data types. A tuple can therefore contain another tuple, which provides a way to create nested data structures. This capability is especially useful in iterative and recursive algorithms.

The following code demonstrates how to create a tuple and access its elements:

In [17]:
bin_colors=('Red','Green','Blue','Yellow')
print(f"The second element of the tuple is {bin_colors[1]}")

The second element of the tuple is Green


In [18]:
print(f"The elements from the third element onwards are {bin_colors[2:]}")

The elements from the third element onwards are ('Blue', 'Yellow')


Let us define a nested tuple:

In [19]:
nested_tuple = (1,2,(100,200,300),6)
print(f"The maximum value of the inner tuple is {max(nested_tuple[2])}")

The maximum value of the inner tuple is 300
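Because tuples are immutable, trying to assign to one of their elements raises a `TypeError`. The usual way to get a "modified" tuple is to build a new one, for example by concatenating slices.

A minimal sketch:

```python
bin_colors = ('Red', 'Green', 'Blue', 'Yellow')

# Tuples are immutable: assigning to an element raises a TypeError
try:
    bin_colors[1] = 'Orange'
except TypeError as error:
    print(f"Cannot modify a tuple: {error}")

# To "change" a tuple, build a new one from slices of the old one
updated = bin_colors[:1] + ('Orange',) + bin_colors[2:]
print(updated)     # ('Red', 'Orange', 'Blue', 'Yellow')
print(bin_colors)  # unchanged: ('Red', 'Green', 'Blue', 'Yellow')
```

Note the trailing comma in `('Orange',)`. Without it, the parentheses would simply group a string rather than create a one-element tuple.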


## 1.2 Dictionary
A dictionary stores data as key-value pairs. To create a simple dictionary that assigns colors to various variables, the key-value pairs need to be enclosed in { }, with a colon separating each key from its value. For example, the following code creates a simple dictionary consisting of three key-value pairs:

In [20]:
bin_colors = {
  "manual_color": "Yellow",
  "approved_color": "Green",
  "refused_color": "Red"
}

In [21]:
print(bin_colors)

{'manual_color': 'Yellow', 'approved_color': 'Green', 'refused_color': 'Red'}


The value for a key can be retrieved either with the get method or with square brackets:

In [22]:
bin_colors.get('approved_color')

'Green'

In [23]:
bin_colors['approved_color']

'Green'

Dictionaries are mutable, so the value for an existing key can be updated by assignment:

In [24]:
bin_colors['approved_color']="Purple"

In [25]:
print(bin_colors)

{'manual_color': 'Yellow', 'approved_color': 'Purple', 'refused_color': 'Red'}
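The get method and square brackets behave the same for an existing key, but they differ when the key is missing:

- Square brackets raise a `KeyError`.
- `get` returns `None`, or a default value if you supply one.

The sketch below shows both behaviors, along with the `in` operator and iteration over key-value pairs with `items()`. The key `'pending_color'` is a hypothetical missing key chosen for illustration.

```python
bin_colors = {
    "manual_color": "Yellow",
    "approved_color": "Green",
    "refused_color": "Red",
}

# get() returns None, or the supplied default, for a missing key
print(bin_colors.get('pending_color'))          # None
print(bin_colors.get('pending_color', 'Grey'))  # Grey

# Square brackets raise a KeyError for a missing key
try:
    bin_colors['pending_color']
except KeyError:
    print("'pending_color' is not a key")

# The in operator checks whether a key exists
print('approved_color' in bin_colors)  # True

# items() yields (key, value) pairs in insertion order
for key, value in bin_colors.items():
    print(f"{key} -> {value}")
```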


## 1.3 Set
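A set is an unordered collection of unique elements, which can be of different types. The elements are enclosed in { }. Note that an empty pair of braces {} creates an empty dictionary, not a set; an empty set is created with set(). For example, have a look at the following code block: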

In [26]:
green = {'grass', 'leaves'}
print(green)

{'leaves', 'grass'}


In [27]:
yellow = {'dandelions', 'fire hydrant', 'leaves'}
red = {'fire hydrant', 'blood', 'rose', 'leaves'}

In [28]:
print(f"The union of yellow and red sets is {yellow|red}")

The union of yellow and red sets is {'dandelions', 'leaves', 'rose', 'fire hydrant', 'blood'}


In [29]:
print(f"The intersection of yellow and red is {yellow&red}")

The intersection of yellow and red is {'fire hydrant', 'leaves'}
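Besides union (`|`) and intersection (`&`), Python sets support difference (`-`) and symmetric difference (`^`). Because a set keeps only unique elements, converting a list to a set is a common way to remove duplicates.

Sets are unordered, so the printed order of multi-element sets may vary between runs.

```python
yellow = {'dandelions', 'fire hydrant', 'leaves'}
red = {'fire hydrant', 'blood', 'rose', 'leaves'}

# Difference: elements in yellow that are not in red
only_yellow = yellow - red
print(only_yellow)     # {'dandelions'}

# Symmetric difference: elements in exactly one of the two sets
in_exactly_one = yellow ^ red
print(in_exactly_one)  # 'dandelions', 'blood' and 'rose', in arbitrary order

# Membership testing
print('rose' in red)   # True

# Building a set from a list removes duplicates
unique = set(['leaves', 'grass', 'leaves'])
print(len(unique))     # 2
```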


## 1.4 DataFrame
A DataFrame is a data structure, available in Python's pandas package, that is used to store tabular data. It is one of the most important data structures for algorithms and is used to process traditional structured data.

A simple DataFrame can be created by using the following code:

In [30]:
import pandas as pd
df = pd.DataFrame([
    ['1', 'Fares', 32, True],
    ['2', 'Elena', 23, False],
    ['3', 'Steven', 40, True]])
df.columns = ['id', 'name', 'age', 'decision']
print(df)

  id    name  age  decision
0  1   Fares   32      True
1  2   Elena   23     False
2  3  Steven   40      True


### 1.4.1 Column Selection

The name and age columns can be selected by passing a list of column names:

In [31]:
df[['name','age']]

| | name | age |
|---|---|---|
| 0 | Fares | 32 |
| 1 | Elena | 23 |
| 2 | Steven | 40 |


The position of a column in a DataFrame is deterministic, so a column can also be retrieved by its position with iloc. For example, the fourth column (position 3, since positions start at 0) can be retrieved as follows:

In [32]:
df.iloc[:,3]

0     True
1    False
2     True
Name: decision, dtype: bool

### 1.4.2 Row Selection

Each row in a DataFrame corresponds to a data point in our problem space. We perform row selection when we want to create a subset of the data elements in our problem space. This subset can be created by using one of two methods:

- By specifying their position
- By specifying a filter

A subset of rows can be retrieved by its position as follows:

In [33]:
df.iloc[1:3,:]

| | id | name | age | decision |
|---|---|---|---|---|
| 1 | 2 | Elena | 23 | False |
| 2 | 3 | Steven | 40 | True |


To create a subset by specifying a filter, we use one or more columns to define the selection criterion. For example, the following code selects the rows in which age is greater than 30:

In [34]:
df[df.age>30]

| | id | name | age | decision |
|---|---|---|---|---|
| 0 | 1 | Fares | 32 | True |
| 2 | 3 | Steven | 40 | True |


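Multiple conditions can be combined with & (and) and | (or). Each condition must be wrapped in parentheses, because & and | bind more tightly than comparison operators such as < and ==:

In [35]:
df[(df.age<35)&(df.decision==True)]

| | id | name | age | decision |
|---|---|---|---|---|
| 0 | 1 | Fares | 32 | True |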
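Row filtering and column selection can also be combined in a single step with `.loc`, which takes a row selector followed by a column selector. A boolean column such as `decision` can serve directly as the row filter.

The sketch below is a minimal example, not the only way to do this. It rebuilds the same `df` so it runs on its own:

```python
import pandas as pd

df = pd.DataFrame([
    ['1', 'Fares', 32, True],
    ['2', 'Elena', 23, False],
    ['3', 'Steven', 40, True]])
df.columns = ['id', 'name', 'age', 'decision']

# Rows where age > 30, restricted to the name and age columns
older = df.loc[df.age > 30, ['name', 'age']]
print(older)

# A boolean column can be used directly as a row filter
approved_names = df.loc[df.decision, 'name'].tolist()
print(approved_names)  # ['Fares', 'Steven']
```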


# 2 Stack
A stack is a linear data structure that stores a one-dimensional list. It stores items in a Last-In, First-Out (LIFO) manner, which is the same as First-In, Last-Out (FILO). The defining characteristic of a stack is the way elements are added to and removed from it: a new element is added at one end, and elements are removed from that same end only.

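The following are the operations related to a stack:

- isEmpty: Returns True if the stack is empty.
- push: Adds a new element to the top of the stack.
- pop: Removes and returns the element added most recently.
- peek: Returns the element added most recently without removing it.
- size: Returns the number of elements in the stack.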

Let us create a class named Stack in Python that defines all the operations related to a stack. The code of this class is as follows:


In [36]:
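class Stack:
    def __init__(self):
        # A Python list holds the elements; the end of the list is the top
        self.items = []
    def isEmpty(self):
        return self.items == []
    def push(self, item):
        # Add an element to the top of the stack
        self.items.append(item)
    def pop(self):
        # Remove and return the top element (raises IndexError if empty)
        return self.items.pop()
    def peek(self):
        # Return the top element without removing it
        return self.items[-1]
    def size(self):
        return len(self.items)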

### Populate the stack

The following code pushes four elements onto the stack:

In [37]:
stack=Stack()
stack.push('Red')
stack.push('Green')
stack.push("Blue")
stack.push("Yellow")

### Pop

In [38]:
stack.pop()

'Yellow'

In [39]:
stack.isEmpty()

False

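A plain Python list can also be used as a stack: append adds an element to the end of the list (the top of the stack), just like push.

In [40]: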
colors = ['Red']

In [41]:
colors.append('Green')
colors.append('Yellow')
colors.append('Blue')

In [42]:
colors

['Red', 'Green', 'Yellow', 'Blue']
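A classic application of a stack is checking whether the brackets in an expression are balanced. Each opening bracket is pushed onto the stack. Each closing bracket must match the most recent unmatched opening bracket, which is the element on top of the stack.

The sketch below uses a plain Python list as the stack. The function name `is_balanced` is hypothetical.

```python
def is_balanced(expression):
    """Return True if every bracket in expression is properly matched (hypothetical helper)."""
    pairs = {')': '(', ']': '[', '}': '{'}
    stack = []  # a list used as a stack: append() pushes, pop() pops
    for char in expression:
        if char in '([{':
            stack.append(char)  # push an opening bracket
        elif char in pairs:
            # A closing bracket needs its matching opening bracket on top
            if not stack or stack.pop() != pairs[char]:
                return False
    # Any brackets left on the stack were never closed
    return not stack

print(is_balanced("{[(1 + 2) * 3]}"))  # True
print(is_balanced("(1 + 2]"))          # False
print(is_balanced("((1)"))             # False
```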

# 3 Queue
Like a stack, a queue stores n elements in a single-dimensional structure. The elements are added and removed in First-In, First-Out (FIFO) order. One end of the queue is called the rear and the other end is called the front. Elements are added at the rear, an operation called enqueue, and removed from the front, an operation called dequeue.

A queue can be implemented by using the following code:

In [43]:
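class Queue:
    def __init__(self):
        # Index 0 of the list is the rear; the end of the list is the front
        self.items = []
    def isEmpty(self):
        return self.items == []
    def enqueue(self, item):
        # Add a new element at the rear
        self.items.insert(0, item)
    def dequeue(self):
        # Remove and return the element at the front
        return self.items.pop()
    def size(self):
        return len(self.items)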

In [44]:
queue = Queue()

In [45]:
queue.enqueue("Red")

In [46]:
queue.enqueue('Green')

In [47]:
queue.enqueue('Blue')

In [48]:
queue.enqueue('Yellow')

In [49]:
print(f"Size of queue is {queue.size()}")

Size of queue is 4


In [50]:
print(queue.dequeue())

Red
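In the Queue class above, enqueue calls `list.insert(0, item)`. That call shifts every existing element, so its cost grows with the length of the queue. Python's standard library provides `collections.deque`, whose `append` and `popleft` run in constant time at either end.

The following minimal sketch builds the same queue interface on a deque. The class name `DequeQueue` is hypothetical.

```python
from collections import deque

class DequeQueue:
    """FIFO queue backed by collections.deque (hypothetical class name)."""

    def __init__(self):
        self.items = deque()

    def isEmpty(self):
        return len(self.items) == 0

    def enqueue(self, item):
        # Add at the rear (right end of the deque)
        self.items.append(item)

    def dequeue(self):
        # Remove from the front (left end); raises IndexError if empty
        return self.items.popleft()

    def size(self):
        return len(self.items)

queue = DequeQueue()
for color in ['Red', 'Green', 'Blue', 'Yellow']:
    queue.enqueue(color)

print(queue.size())     # 4
print(queue.dequeue())  # Red
print(queue.size())     # 3
```

The observable FIFO behavior matches the list-based version. Only the cost of adding and removing elements changes.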
