# 4.   Writing Structured Programs

## 4.1   Back to the Basics

- Assignment

- Equality

- Conditionals

## 4.2   Sequences

- Operating on Sequence Types

- Combining Different Sequence Types

- Generator Expressions

## 4.3   Questions of Style

- Python Coding Style

- Procedural vs Declarative Style

- Some Legitimate Uses for Counters

## 4.4   Functions: The Foundation of Structured Programming

- Function Inputs and Outputs

- Parameter Passing

- Variable Scope

- Checking Parameter Types

- Functional Decomposition

- Documenting Functions

## 4.5   Doing More with Functions

- Functions as Arguments

- Accumulative Functions

- Higher-Order Functions

- Named Arguments

## 4.6   Program Development

- Structure of a Python Module

- Multi-Module Programs

- Sources of Error

- Debugging Techniques

- Defensive Programming

The topics above are outside the scope of this course; you can study them on your own:

- See <a href="https://www.nltk.org/book/ch04.html" target="_blank">Chapter 4</a> of *Natural Language Processing with Python*

- I strongly recommend the related lectures in my *programming basics* course: <a href="https://zhangjianzhang.github.io/programming_basics/" target="_blank">Course Website</a>


## 4.7   Algorithm Design

### 4.7.1 Divide-and-Conquer

We attack a problem of size n by **dividing it into two problems of size n/2**, solving these problems, and **combining their results into a solution of the original problem**.
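
As a minimal sketch of this pattern (an illustration only; the function name `dc_max` is hypothetical and not part of the course code), the largest item of a non-empty list can be found by splitting the list in half, solving each half recursively, and combining the two partial answers with a single comparison:

```python
def dc_max(values):
    """Return the largest item of a non-empty list by divide-and-conquer."""
    # Base case: a one-element list is its own maximum
    if len(values) == 1:
        return values[0]
    # Divide: split the problem into two halves
    mid = len(values) // 2
    left_best = dc_max(values[:mid])
    right_best = dc_max(values[mid:])
    # Combine: the overall maximum is the larger of the two partial answers
    return left_best if left_best >= right_best else right_best


print(dc_max([11, 18, 10, 9, 6]))  # 18
```

In practice the built-in `max` is the right tool for this job; the sketch only shows the divide, solve, and combine steps in their simplest form.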

<font size=2 style="color:#2ECC71">**Example**</font>

**Merge Sort**

Merge sort is a classic application of the divide-and-conquer strategy.

Reference: 

- https://www.cnblogs.com/pythonbao/p/10800699.html

- https://www.runoob.com/python3/python-merge-sort.html

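1. Top-down: recursively **split the list in half** until it cannot be split any further, that is, until each sublist contains a single element.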

<div align=center>
<img width="1000" height="750" src="https://raw.githubusercontent.com/zhangjianzhang/text_mining/master/files/codes/lecture_4/split.png">
<br>
<center><em><strong>Recursively splitting the list</strong></em></center>
</div>

In [1]:
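def merge_sort(arr):
    # Base case: a list with zero or one element is already sorted
    # (testing `== 1` would recurse forever on an empty list)
    if len(arr) <= 1:
        return arr
    else:
        length = len(arr)
        # Divide: split the list into two halves
        left_arr = arr[:length//2]
        right_arr = arr[length//2:]
        print('split',arr,'--->',left_arr, right_arr)
        # Solve each half recursively, then merge the two sorted halves
        # (sort_list is defined in the next cell)
        return sort_list(merge_sort(left_arr), merge_sort(right_arr))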

2. Bottom-up: recursively **sort** and **merge** the sublists.

<div align=center>
<img width="1000" height="750" src="https://raw.githubusercontent.com/zhangjianzhang/text_mining/master/files/codes/lecture_4/merge.png">
<br>
<center><em><strong>Recursively sorting and merging the sublists</strong></em></center>
</div>

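The **sorted sublists** `[11, 18]` and `[6, 9, 10]` are merged as follows:

1. Compare the first elements: $11 > 6$, so 6 is appended to the result list, which becomes `[6]`.

2. The two sublists are now `[11, 18]` and `[9, 10]`.

3. Compare the first elements: $11 > 9$, so 9 is appended to the result list, which becomes `[6, 9]`.

4. The two sublists are now `[11, 18]` and `[10]`.

5. Compare the first elements: $11 > 10$, so 10 is appended to the result list, which becomes `[6, 9, 10]`.

6. The two sublists are now `[11, 18]` and `[]`.

7. The first sublist is already sorted and the second is empty, so the first sublist is appended to the result list, which becomes `[6, 9, 10, 11, 18]`.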

In [2]:
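from copy import deepcopy

def sort_list(left, right):
    # Merge two already-sorted lists into one sorted list

    # Keep copies for the log message, because pop(0) below empties the inputs
    l, r = deepcopy(left), deepcopy(right)

    result = []
    # Repeatedly move the smaller of the two first elements to the result
    while len(left) > 0 and len(right) > 0:
        # `<=` takes from the left list on ties, which keeps equal items in order
        if left[0] <= right[0]:
            result.append(left.pop(0))
        else:
            result.append(right.pop(0))

    # At most one list still has items, and they are already sorted
    result += left
    result += right

    print('merge',l, r,'--->',result)

    return result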
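
The merge step above removes items with `pop(0)`, which empties its input lists and, on a Python list, shifts every remaining element each time it is called. A possible alternative, sketched here under the hypothetical name `merge_sorted` (not part of the original notebook), walks both lists with two indices and leaves the inputs untouched:

```python
def merge_sorted(left, right):
    """Merge two already-sorted lists into a new sorted list, leaving both inputs unchanged."""
    result = []
    i = j = 0  # current positions in left and right
    while i < len(left) and j < len(right):
        # Take from the left list on ties so equal items keep their order
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    # At most one of these slices is non-empty, and it is already sorted
    result.extend(left[i:])
    result.extend(right[j:])
    return result


print(merge_sorted([11, 18], [6, 9, 10]))  # [6, 9, 10, 11, 18]
```

Because the inputs are not modified, no copies are needed just to print or reuse them afterwards.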

In [3]:
arr = [11, 18, 10, 9, 6]

In [4]:
merge_sort(arr)

split [11, 18, 10, 9, 6] ---> [11, 18] [10, 9, 6]
split [11, 18] ---> [11] [18]
merge [11] [18] ---> [11, 18]
split [10, 9, 6] ---> [10] [9, 6]
split [9, 6] ---> [9] [6]
merge [9] [6] ---> [6, 9]
merge [10] [6, 9] ---> [6, 9, 10]
merge [11, 18] [6, 9, 10] ---> [6, 9, 10, 11, 18]


[6, 9, 10, 11, 18]

In [5]:
arr = [1,4,2]

In [6]:
merge_sort(arr)

split [1, 4, 2] ---> [1] [4, 2]
split [4, 2] ---> [4] [2]
merge [4] [2] ---> [2, 4]
merge [1] [2, 4] ---> [1, 2, 4]


[1, 2, 4]

### 4.7.2 Recursion

<font size=2 style="color:#2ECC71">**Example**</font>

Let's count the size of the hypernym hierarchy rooted at a given synset s, that is, the number of nodes in the WordNet subtree whose root is s.

<div align=center>
<img width="550" height="350" src="https://raw.githubusercontent.com/zhangjianzhang/text_mining/master/files/codes/lecture_4/tree_noeds.png">
<br>
<center><em><strong>Counting the nodes of a tree</strong></em></center>
</div>

In [7]:
# Store the tree shown in the figure above as nested dictionaries
n1 = {
        'n2':{},
        'n3':{
            'n4':{
                'n6':{},
                'n7':{},
                'n8':{}
            },
            'n5':{}
        }
    }

There are two basic cases:

1. A leaf node (no children), e.g. `'n5':{}`

2. A non-leaf node (at least one child), e.g. `'n4':{'n6':{}, 'n7':{}, 'n8':{}}`

In [8]:
# First recursive version
def tree_nodes_count(tree_dict):
    return 1 + sum(tree_nodes_count(v) for v in tree_dict.values())

In [9]:
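# Second recursive version: the first version written out explicitly
def tree_nodes_count_alt(tree_dict):
    # Case 1: leaf node (an empty dict has no children)
    if len(tree_dict) == 0:
        return 1
    # Case 2: non-leaf node, count the node itself plus all of its subtrees
    else:
        result = 1
        for v in tree_dict.values():
            result += tree_nodes_count_alt(v)
        return result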

In [10]:
tree_nodes_count(n1)

8

In [11]:
tree_nodes_count_alt(n1)

8

In [12]:
n1

{'n2': {}, 'n3': {'n4': {'n6': {}, 'n7': {}, 'n8': {}}, 'n5': {}}}

In [13]:
tree_nodes_count({'n2': {}, 'n3':{}})

3

In [14]:
tree_nodes_count_alt({'n2': {}, 'n3':{}})

3
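
Python limits how deep recursive calls may nest (see `sys.getrecursionlimit()`), so for very deep trees it can help to replace the recursion with an explicit stack. The following is a minimal sketch under that assumption; the name `tree_nodes_count_stack` is hypothetical, and the nested-dictionary layout is the one used above:

```python
def tree_nodes_count_stack(tree_dict):
    """Count the nodes of a nested-dict tree with an explicit stack instead of recursion."""
    count = 0
    stack = [tree_dict]  # each entry is the children dict of one node not yet counted
    while stack:
        node = stack.pop()
        count += 1                   # count this node
        stack.extend(node.values())  # schedule all of its children
    return count


tree = {
    'n2': {},
    'n3': {
        'n4': {'n6': {}, 'n7': {}, 'n8': {}},
        'n5': {},
    },
}
print(tree_nodes_count_stack(tree))  # 8
```

Every node is pushed once and popped once, so the result matches the recursive versions while the call depth stays constant.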

In [15]:
# Count by recursive traversal
def size1(s):
    return 1 + sum(size1(child) for child in s.hyponyms())

In [16]:
# Count layer by layer
def size2(s):
    layer = [s] # The first layer is the synset itself
    total = 0
    # it computes the next layer by finding the hyponyms of everything in the last layer
    while layer:
        total += len(layer)
        layer = [h for c in layer for h in c.hyponyms()]
    return total

In [17]:
from nltk.corpus import wordnet as wn

In [18]:
dog = wn.synset('dog.n.01')

In [19]:
size1(dog)

190

In [20]:
size2(dog)

190
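
To check that the two counting strategies agree without downloading WordNet, they can be run on a small stand-in object. The class `FakeSynset` below is purely hypothetical: it is not an NLTK class and only mimics the `hyponyms()` method that `size1` and `size2` rely on. The tree it builds is the eight-node tree from the figure above.

```python
class FakeSynset:
    """Hypothetical stand-in for a synset; it only provides hyponyms()."""

    def __init__(self, name, children=()):
        self.name = name
        self._children = list(children)

    def hyponyms(self):
        return self._children


def size1(s):
    # Recursive count: this node plus the sizes of all of its subtrees
    return 1 + sum(size1(child) for child in s.hyponyms())


def size2(s):
    # Layer-by-layer count: add up the size of each layer until one is empty
    layer = [s]
    total = 0
    while layer:
        total += len(layer)
        layer = [h for c in layer for h in c.hyponyms()]
    return total


root = FakeSynset('n1', [
    FakeSynset('n2'),
    FakeSynset('n3', [
        FakeSynset('n4', [FakeSynset('n6'), FakeSynset('n7'), FakeSynset('n8')]),
        FakeSynset('n5'),
    ]),
])
print(size1(root), size2(root))  # 8 8
```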

<font size=2 style="color:#2ECC71">**Example**</font>

A **letter trie** is a data structure that can be used for indexing a lexicon.
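
One way to picture a letter trie is as nested dictionaries keyed by single letters, where the path from the root spells out a word. The sketch below is an illustration under assumptions, not a definitive implementation: the function names `trie_insert` and `trie_lookup`, the use of a `'value'` key to hold the entry stored for a complete word, and the sample French-English entries are all choices made for this example.

```python
def trie_insert(trie, word, value):
    """Store value under word, creating one nested dict per letter as needed."""
    node = trie
    for letter in word:
        node = node.setdefault(letter, {})
    # Assumption: the multi-character key 'value' cannot clash with a single letter
    node['value'] = value


def trie_lookup(trie, word):
    """Return the value stored for word, or None if the word is not in the trie."""
    node = trie
    for letter in word:
        if letter not in node:
            return None
        node = node[letter]
    return node.get('value')


trie = {}
trie_insert(trie, 'chat', 'cat')
trie_insert(trie, 'chien', 'dog')
print(trie_lookup(trie, 'chat'))  # cat
print(trie_lookup(trie, 'ch'))    # None
```

Words that share a prefix, such as `chat` and `chien`, share the dictionaries for `c` and `h`, which is what makes the structure compact for a lexicon.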