# Today's Coding Topics
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/xiangshiyin/data-programming-with-python/blob/main/2023-summmer/2023-06-12/notebook/concept_and_code_demo.ipynb)

* Recap of previous lecture
* `for` loops
* `functions` in Python
* Files and I/O
* Python libraries and how to use them
* Hands-on practices


# Recap of previous lecture

## More on `string` type


In [None]:
# split string by certain delimiters
x = 'a.b.c'
x.split('.')

In [None]:
x = 'a.b.c'
x.split('b')

In [None]:
x = 'adbdc'
# x.split('d')
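x.strip('d')  # strip() only trims the ends: 'adbdc' neither starts nor ends with 'd', so it is returned unchanged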

In [None]:
# strip(chars) removes the given characters from both ends; with no argument, it removes leading and trailing whitespace
x = 'd  abc   '
x1 = x.strip('d')
# x1
type(x1)

In [None]:
x = '  abc   '
x.lstrip()

In [None]:
x = '  abc   '
x.rstrip()

In [None]:
# replace

x = 'abcd'
x.replace('d', '1')

## Loops

### `while` loops

* Example: print text in an iterative way

  * Requirements:
    * Print the given text sentence by sentence.
    * Customize the number of sentences to print by user input

In [None]:
text = "The Dursleys had everything they wanted, but they also had a secret, and their greatest fear was that somebody would discover it. They didn't think they could bear it if anyone found out about the Potters. Mrs. Potter was Mrs. Dursley's sister, but they hadn't met for several years; in fact, Mrs. Dursley pretended she didn't have a sister, because her sister and her good-for-nothing husband were as unDursleyish as it was possible to be. The Dursleys shuddered to think what the neighbors would say if the Potters arrived in the street. The Dursleys knew that the Potters had a small son, too, but they had never even seen him. This boy was another good reason for keeping the Potters away; they didn't want Dudley mixing with a child like that."

## Collect number of sentences from keyboard input

## Create a loop to print the text sentence by sentence
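One possible way to fill in the two steps above, as a minimal sketch. Splitting on every period would break "Mrs. Potter" into two pieces, so the hypothetical helper `split_sentences` skips a small, hand-picked list of abbreviations; it is a heuristic, not a full sentence tokenizer. To keep it testable, the sentence count `n` is hard-coded where the notebook would read it with `input()`, and a shortened stand-in passage is used.

```python
def split_sentences(text, abbreviations=('Mr.', 'Mrs.', 'Dr.')):
    """Naive sentence splitter (hypothetical helper): a sentence ends at '.', '!'
    or '?' followed by a space or the end of the text, unless the word ending
    there is in `abbreviations` (a hand-picked, incomplete list)."""
    sentences = []
    start = 0
    i = 0
    while i < len(text):
        at_end = i + 1 == len(text) or text[i + 1] == ' '
        if text[i] in '.!?' and at_end:
            candidate = text[start:i + 1]
            if candidate.split()[-1] not in abbreviations:
                sentences.append(candidate.strip())
                start = i + 1
        i += 1
    # keep any trailing text that does not end with punctuation
    if text[start:].strip():
        sentences.append(text[start:].strip())
    return sentences


# A shortened stand-in for the passage above
sample_text = ("Mrs. Potter was Mrs. Dursley's sister. They hadn't met for several years. "
               "The Dursleys knew that the Potters had a small son, too.")

n = 2  # in the notebook: n = int(input('How many sentences? '))
sentences = split_sentences(sample_text)

count = 0
while count < n and count < len(sentences):
    print(sentences[count])
    count += 1
```

The `count < len(sentences)` check keeps the loop from running past the last sentence when the user asks for more sentences than the text contains.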


### `for` loops

* **Example 1**:  Iterate through a range

* **Example 2**: Iterate through list

* **Example 3**: Iterate through a dictionary
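Minimal sketches of the three `for` loop examples, using hypothetical sample data:

```python
# Example 1: iterate through a range; range(3) yields 0, 1, 2
squares = []
for i in range(3):
    squares.append(i ** 2)
print(squares)  # [0, 1, 4]

# Example 2: iterate through a list (hypothetical sample data)
fruits = ['apple', 'banana', 'cherry']
for fruit in fruits:
    print(fruit)

# enumerate() yields (index, item) pairs when you also need the position
for idx, fruit in enumerate(fruits):
    print(idx, fruit)

# Example 3: iterate through a dictionary (hypothetical sample data)
scores = {'alice': 90, 'bob': 85}
for name in scores:                 # looping over a dict yields its keys
    print(name)
for name, score in scores.items():  # .items() yields (key, value) pairs
    print(name, score)
```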

### `break` and `continue` statements
* `break` exits the innermost loop immediately
* `continue` skips the rest of the loop body for the current iteration and moves on to the next iteration

**Example 1**: print the even numbers in a range
* Requirements:
    * Define the range from keyboard input
    * Print the even numbers within the range one at a time

In [None]:
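numrange = int(input('Input the upper bound of the range: '))  # int() is safer than eval() for user input
for num in range(numrange):
    if num % 2 == 1:
        continue  # skip odd numbers and go to the next iteration
    print(num)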

In [None]:
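## How do we achieve the same goal with the break statement?
numrange = int(input('Input the upper bound of the range: '))
num = 0
while numrange>0:  # effectively an infinite loop; break ends it
    if num >= numrange:
        break  # exit the loop once we reach the upper bound
    if num % 2 == 0:
        print(num)
    num += 1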

In [None]:
## Both code examples above seem like overkill: range() can step by 2 directly
print(list(range(0, numrange, 2)))

**Example 2**: Reverse a list
* Requirements:
    * For any input list, reverse the sequence in-place
* Example:
    * Input: [1,2,3,4]
    * Output: [4,3,2,1]
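A minimal sketch of the in-place reversal: swap the first and last elements, then the second and second-to-last, and so on until the middle. The function name `reverse_in_place` is hypothetical:

```python
def reverse_in_place(items):
    """Reverse `items` in place by swapping mirrored pairs of elements."""
    n = len(items)
    for i in range(n // 2):   # only the first half needs to be visited
        j = n - 1 - i         # index of the mirrored element
        items[i], items[j] = items[j], items[i]
    # returns None, like list.reverse(), to signal that the list was modified in place


nums = [1, 2, 3, 4]
reverse_in_place(nums)
print(nums)  # [4, 3, 2, 1]
```

The built-in `nums.reverse()` does the same thing in place, while `nums[::-1]` returns a new reversed list and leaves the original unchanged.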

### List Comprehensions
List comprehensions are a convenient and widely used Python language feature. They let you build a new list by filtering the elements of a collection and transforming the elements that pass the filter, all in one concise expression. They take the basic form:

`[expr for value in collection if condition]`

This is equivalent to the following for loop:

```python
result = []
for value in collection:
    if condition:
        result.append(expr)
```
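A concrete instance of the pattern, squaring only the even numbers of a hypothetical list, with the equivalent loop for comparison:

```python
nums = [1, 2, 3, 4, 5, 6]

# comprehension: [expr for value in collection if condition]
squares_of_evens = [n ** 2 for n in nums if n % 2 == 0]

# the equivalent for loop
result = []
for n in nums:
    if n % 2 == 0:
        result.append(n ** 2)

print(squares_of_evens)            # [4, 16, 36]
print(result == squares_of_evens)  # True
```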

# Functions

## Create a function

In [None]:
## simple hello world
def hello():
    print('Hello World')

## call the function
hello()

In [None]:
## two sum
def twoSum(num1,num2):
    return num1 + num2

## call the function
twoSum(1,2)

In [None]:
x = twoSum(1,2)
print(x)

### Practice: Calculate the factorial of an integer
In mathematics, the factorial of a positive integer n, denoted by n!, is the product of all positive integers less than or equal to n:

$n!=n\times(n-1)\times(n-2)\times(n-3)\times...\times3\times2\times1$

For example,
$5!=5\times4\times3\times2\times1=120$

In [None]:
## The lazy way
import math
math.factorial(5)

In [None]:
def fact(n):
    result = 1
    for i in range(2, n + 1):  # multiply 2 * 3 * ... * n
        result *= i
    return result

x = fact(5)
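The definition can also be written recursively, since $n! = n \times (n-1)!$ with $0! = 1! = 1$. A sketch (the name `fact_recursive` is hypothetical); keep in mind that CPython's default recursion limit (about 1000 frames) makes the loop version or `math.factorial()` a better fit for large `n`:

```python
def fact_recursive(n):
    """Recursive factorial based on n! = n * (n-1)!, with 0! = 1! = 1."""
    if n < 0:
        raise ValueError('factorial is not defined for negative integers')
    if n <= 1:
        return 1  # base case stops the recursion
    return n * fact_recursive(n - 1)


print(fact_recursive(5))  # 120
```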


### Practice: Sum up all integers within a range

In [None]:
def sumAll(n):
    total = 0
    for i in range(n):  # 0, 1, ..., n-1
        total += i
    return total

sumAll(5) # 0 + 1 + 2 + 3 + 4
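Two shorter alternatives, as a sketch: the built-in `sum()` over the `range`, and the closed-form formula $0 + 1 + \dots + (n-1) = \frac{n(n-1)}{2}$. Both function names are hypothetical:

```python
def sum_all_builtin(n):
    # sum() adds up every value produced by range(n): 0, 1, ..., n-1
    return sum(range(n))


def sum_all_formula(n):
    # closed form: n * (n - 1) is always even, so integer division is exact
    return n * (n - 1) // 2


print(sum_all_builtin(5), sum_all_formula(5))  # 10 10
```

The formula does a constant amount of work no matter how large `n` is, while the loop and `sum()` versions visit every integer.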

## Lambda Expression
* Lambda expressions (sometimes called lambda forms) are often used to create anonymous functions. The expression `lambda <parameters>: <expression>` yields a function object. 
* You can also name it like `func = lambda <parameters>: <expression>`. The named object behaves like a function object defined with:
```python
def func(parameters):
    return expression
```
* It can take any number of arguments, but its body is limited to a single expression
* Note that a lambda can't contain statements or annotations

In [None]:
f = lambda a: a**(1/2)
f(5)

In [None]:
f = lambda a,b: (a**2+b**2)**(1/2) ## length of the hypotenuse, from the Pythagorean theorem
f(3,4)
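A common use of a lambda is as the `key` function for `sorted()`, where a throwaway one-liner is all you need. A small sketch with hypothetical words:

```python
words = ['banana', 'Apple', 'cherry', 'fig']

# sort by length; sorted() is stable, so ties keep their original order
by_length = sorted(words, key=lambda w: len(w))
print(by_length)  # ['fig', 'Apple', 'banana', 'cherry']

# case-insensitive alphabetical sort
by_name = sorted(words, key=lambda w: w.lower())
print(by_name)  # ['Apple', 'banana', 'cherry', 'fig']
```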

## Built-In Functions
[[Official Documentation](https://docs.python.org/3/library/functions.html)]
* We have actually used many built-in functions already: `enumerate()`, `min()`, `max()`, `int()`, `float()`, `complex()`, `eval()`, `input()`, `list()`, `type()`, etc.
* Here are more ...

**any(), all()**
* `any()` takes an `iterable`<sup>[*](#footnote1)</sup> and returns `True` if at least one item is truthy (i.e. `bool(item)` is `True`)
* `all()` takes an `iterable` and returns `True` only if every item is truthy

<a name="footnote1">*</a> `iterable` is an object that you can loop over. Sequences are a very common type of `iterable`.

In [None]:
any([False, False, False, False])

In [None]:
all([True, True, True, False])
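Both functions accept any iterable, so they pair naturally with a generator expression; their behavior on an empty iterable is a common surprise. A small sketch with hypothetical numbers:

```python
nums = [3, 8, 12, 5]

# any()/all() accept any iterable, including a generator expression
print(any(n > 10 for n in nums))      # True  (12 > 10)
print(all(n > 2 for n in nums))       # True
print(all(n % 2 == 0 for n in nums))  # False (3 and 5 are odd)

# truthiness: 0, '' and None all count as False
print(any([0, '', None]))  # False

# edge cases with an empty iterable
print(any([]))  # False: there is no truthy item
print(all([]))  # True: there is no falsy item
```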

**map()**
* `map(func, iter)` applies `func` to every item of the iterable `iter` and returns an `iterator`<sup>[*](#footnote2)</sup> over the results

<a name="footnote2">*</a> An `iterator` is an object representing a stream of data. You can use the built-in `next()` function to retrieve its items one by one, and you can also loop over it. Unlike a list, an iterator does not display its items when you print it; you need the `list()` function to see the complete set. An iterator can also be consumed only once: after all its items have been retrieved, it is empty.

Offline reading: **Iteration, Iterables, Iterators, and Looping** [Doc #3](https://towardsdatascience.com/python-basics-iteration-and-looping-6ca63b30835c)

In [None]:
outputs = map(lambda x: x**2, [1,2,3])
print(outputs)
i = 0
while i<3:
    print(next(outputs))
    i+=1

In [None]:
## outputs was exhausted by the next() calls above, so this returns an empty list
list(outputs)

In [None]:
outputs = map(lambda x: x**2, [1,2,3])
list(outputs)

**zip()**
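* `zip(*iterables)` takes any number of iterables and returns an iterator of tuples, where the i-th tuple holds the i-th element from each iterable; it stops as soon as the shortest iterable is exhausted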

In [None]:
## Print the full names of the past 4 presidents of the United States
first_name = ['Donald','Barack','George','Bill']
last_name = ['Trump','Obama','Bush','Clinton']

In [None]:
['{} {}'.format(first,last) for first,last in zip(first_name,last_name)]
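A few more `zip()` behaviors, sketched on the same name lists: it stops at the shortest input, `dict(zip(...))` builds a mapping, and an f-string can replace `str.format()` in the comprehension above.

```python
first_name = ['Donald', 'Barack', 'George', 'Bill']
last_name = ['Trump', 'Obama', 'Bush', 'Clinton']

pairs = list(zip(first_name, last_name))
print(pairs[0])  # ('Donald', 'Trump')

# zip() stops at the shortest iterable
print(list(zip([1, 2, 3], ['a', 'b'])))  # [(1, 'a'), (2, 'b')]

# dict(zip(keys, values)) builds a mapping from last name to first name
name_map = dict(zip(last_name, first_name))
print(name_map['Obama'])  # Barack

# f-string version of the comprehension
full_names = [f'{first} {last}' for first, last in zip(first_name, last_name)]
print(full_names)
```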

**abs(), round()**

* `abs(num)` returns the absolute value of the input `num`
* `round(num, precision)` rounds `num` to `precision` digits after the decimal point; if `precision` is omitted, it returns the nearest integer

In [None]:
## abs()
abs(-12)

In [None]:
## round()
round(3.567,2)
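Two details of `round()` that often surprise beginners, sketched below: exact ties are rounded to the nearest even number ("banker's rounding"), and some decimal values cannot be stored exactly as floats, so `round(2.675, 2)` gives `2.67` (this example comes from the Python documentation). `abs()` on a complex number returns its magnitude.

```python
# ties round to the nearest even number
print(round(2.5))   # 2
print(round(3.5))   # 4
print(round(-2.5))  # -2

# without a precision argument the result is an int; with one, it stays a float
print(type(round(3.567)), type(round(3.567, 2)))  # <class 'int'> <class 'float'>

# 2.675 is stored as a float slightly below 2.675, so it rounds down
print(round(2.675, 2))  # 2.67

# abs() of a complex number is its magnitude: sqrt(3**2 + 4**2)
print(abs(3 + 4j))  # 5.0
```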

# Files and I/O in Python

## Read from and write to files

* Major tool/function: `open(file, mode='r')` (https://docs.python.org/3/library/functions.html#open)
* The default mode is 'r' (open for reading text, synonym of 'rt'). The available modes:

| Character | Meaning                                                         |
|-----------|-----------------------------------------------------------------|
| 'r'       | open for reading (default)                                      |
| 'w'       | open for writing, truncating the file first                     |
| 'x'       | open for exclusive creation, failing if the file already exists |
| 'a'       | open for writing, appending to the end of the file if it exists |
| 'b'       | binary mode                                                     |
| 't'       | text mode (default)                                             |
| '+'       | open for updating (reading and writing)                         |

**read**

In [None]:
## Read from a file
var = 'test-read.txt'
fr = open(var,'r') # create one file handle
# fr
# fr.readlines()
# for line in fr.readlines():
#     print(line)
# fr.close()


lines = fr.readlines()

fr.close()

In [None]:
lines

In [None]:
## fr was closed in an earlier cell, so this raises ValueError: I/O operation on closed file
fr.readlines()

In [None]:
## Read from a file
fr = open('test-read.txt','r') # create one file handle
fr.readlines()

In [None]:
## A more convenient way: the with statement closes the file handle automatically, even if an error occurs

with open('test-read.txt','r') as fr:
    for line in fr.readlines():
        print(line)

In [None]:
with open('test-read.txt','r') as fr:
    for line in fr:
        print(line)

**write**

In [None]:
## open file in 'w' mode
fw =  open('test-read-1.txt','w')
fw.write('this is a test')
fw.close()

In [None]:
with open('test-read.txt','r') as fr:
    for line in fr:
        print(line)

In [None]:
## Write to a file
with open('test-read-2.txt','w') as fw:
    for i in range(1,6):
        fw.write('this is line {}\n'.format(i))

In [None]:
## opening in 'w' mode truncates the file, so this leaves test-read.txt empty
fw = open('test-read.txt','w')
fw.close()

In [None]:
## Write new content to a file
with open('test-read.txt','w') as fw:
    for i in range(6,11):
        fw.write('this is line {}\n'.format(i))

In [None]:
## Append to an existing file: 'a' mode writes after the existing content instead of truncating
with open('test-read.txt','a') as fw:
    for i in range(11,16):
        fw.write('this is line {}\n'.format(i))

In [None]:
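## Read and write: 'a+' appends on write and also allows reading
with open('test-read.txt','a+') as fr:
    fr.write('this is a new line\n')
    fr.seek(0)  # after writing, the position is at the end of the file, so move back to the start
    for line in fr:
        print(line)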

In [None]:
with open('test-read.txt','r+') as fr:
    for line in fr:
        print(line)

## Manipulating the file system

In [None]:
## check if a path exists
import os

In [None]:
## Check if a directory exists; replace the path with one on your own machine
os.path.exists('/Users/xyin/abc')

In [None]:
os.path.exists('/Users/xyin/Documents')

In [None]:
## Check if a file exists
os.path.exists('/Users/xiangshiyin/Documents/Teaching/data-programming-with-python/2023-summmer/2023-06-12/README.md')

In [None]:
## check if it's a file or a directory
os.path.isfile('/Users/xyin/Documents')

In [None]:
os.path.isdir('/Users/xiangshiyin/Documents/Teaching/data-programming-with-python/2023-summmer/2023-06-12/README.md')

In [None]:
## resolve the canonical absolute path (following symbolic links). Replace the file name with one that exists in your working directory
os.path.realpath('concept_and_code_demo.ipynb')

In [None]:
os.path.abspath('concept_and_code_demo.ipynb')

In [None]:
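## find the parent directory of a file
## dirname() only inspects the string, so convert to an absolute path first
## (os.path.dirname('concept_and_code_demo.ipynb') on its own returns '')
os.path.dirname(os.path.abspath('concept_and_code_demo.ipynb'))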

In [None]:
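## Join directories and files
## avoid a leading '/' on later parts: an absolute component discards everything before it,
## e.g. os.path.join('/Users/', '/xyin', 'Documents') gives '/xyin/Documents' on macOS/Linux
os.path.join('/Users', 'xyin', 'Documents')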

In [None]:
## How do we create an empty file? Any guess?
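One possible answer, as a sketch: opening a file in `'w'` mode and closing it immediately creates an empty file (and empties it if it already exists), while `pathlib.Path.touch()` creates the file if needed without truncating it. The file name `testfile.txt` matches the one used by the rename cells below.

```python
import os
from pathlib import Path

# Option 1: open in 'w' mode and close right away; truncates the file if it exists
with open('testfile.txt', 'w'):
    pass

# Option 2: touch() creates the file if missing, otherwise only updates its timestamp
Path('testfile.txt').touch()

print(os.path.exists('testfile.txt'), os.path.getsize('testfile.txt'))  # True 0
```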


In [None]:
## Create a directory
os.mkdir('testdir')

In [None]:
## Rename a file
os.rename('testfile.txt','testfile2.txt')

In [None]:
os.rename('testfile2.txt','testfile.txt')

## Example

### Read the whole book

In [None]:
## read the whole book
with open('hp.txt', 'r') as file:
    content = file.readlines()

In [None]:
content[:10]

In [None]:
for c in content[:10]:
    print(c)

### Can we do a little cleaning of the book?

In [None]:
## common punctuation characters

import string
string.punctuation

In [None]:
## most straightforward way

x = 'I love soccer, do you?'
x.replace(',', '')

In [None]:
len(string.punctuation)

In [None]:
for p in string.punctuation:
    x = x.replace(p, '')

In [None]:
x

In [None]:
## remove all common punctuation characters
text = 'I love soccer, do you?'
translator = str.maketrans('', '', string.punctuation)
clean_text = text.translate(translator)

clean_text
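`str.maketrans()` can do more than delete: its first two arguments map characters one-to-one, and the third lists characters to delete. Deleting every punctuation mark glues hyphenated words together, so here is a sketch that maps `-` to a space instead (the sample string is hypothetical):

```python
import string

sample = 'well-known, state-of-the-art!'

# deleting all punctuation glues hyphenated words together
deleted = sample.translate(str.maketrans('', '', string.punctuation))
print(deleted)  # wellknown stateoftheart

# maketrans(from, to, delete): map '-' to ' ', and delete ',' and '!'
table = str.maketrans('-', ' ', ',!')
replaced = sample.translate(table)
print(replaced)  # well known state of the art
```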

In [None]:
## remove punctuation from the book
with open('hp.txt', 'r') as file:
    content = file.read()

In [None]:
type(content)

In [None]:
content[:200]

In [None]:
translator = str.maketrans('', '', string.punctuation)
content2 = content.translate(translator)

content2[:200]

In [None]:
content_copy = content
content_copy[:200]

In [None]:
for p in string.punctuation:
    content_copy = content_copy.replace(p, '')
content_copy[:200]

In [None]:
## convert the content to lower case

content3 = content2.lower()

In [None]:
content3[:200]

### What are the frequently used words?

In [None]:
words = content3.split()

In [None]:
words[:10]

In [None]:
len(words)

In [None]:
## get word frequency
counter = {}

for w in words:
    if w not in counter:
        counter[w] = 1
    else:
        counter[w] += 1
# print(counter)

In [None]:
counter['harry']

In [None]:
counter['snape']

In [None]:
counter['dumbledore']

In [None]:
## rank the words by frequency
words_ranked = sorted(counter.keys(), key=lambda x: -1 * counter[x])

In [None]:
words_ranked[:20]
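The counting and ranking steps above can also be done with `collections.Counter` from the standard library. A small sketch on a hypothetical list of words, using new names so the notebook's `counter` and `words` are not overwritten:

```python
from collections import Counter

sample_words = ['the', 'boy', 'the', 'owl', 'the', 'boy']  # small stand-in for the book's word list

word_counts = Counter(sample_words)  # counts every item in one pass
print(word_counts['the'])            # 3
print(word_counts['dragon'])         # 0: missing keys count as zero instead of raising KeyError
print(word_counts.most_common(2))    # [('the', 3), ('boy', 2)]
```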

In [None]:
## Do all the words make sense?


### Remove the stop words

In [None]:
# pip install nltk
import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords

In [None]:
### get the stop words
stop_words = set(stopwords.words('english'))

In [None]:
len(stop_words)

In [None]:
stop_words_ch = set(stopwords.words('chinese'))
len(stop_words_ch)

In [None]:
### filter out the stop words
words_ranked2 = []
for w in words_ranked:
    if w not in stop_words:
        words_ranked2.append(w)
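The stop-word filter above follows the `[expr for value in collection if condition]` pattern from the list comprehensions section. A sketch on hypothetical stand-in data:

```python
stop_words_demo = {'the', 'and', 'of'}              # hypothetical stand-in for NLTK's stop words
ranked_demo = ['the', 'harry', 'and', 'ron', 'of']  # hypothetical ranked words

# same filter as the loop, as a list comprehension (the ranking order is preserved)
filtered = [w for w in ranked_demo if w not in stop_words_demo]
print(filtered)  # ['harry', 'ron']
```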

In [None]:
## display the "cleaned" version
len(words_ranked2)

In [None]:
len(words_ranked)

In [None]:
len(words_ranked2) / len(words_ranked)

In [None]:
for w in words_ranked2[:50]:
    print(w, counter[w])

# Python Libraries - A Brief Introduction

## What is a library?
* A library is a collection of files/scripts that contains pre-written functions, constants, etc.
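* Using a library lets us reuse existing, tested code instead of writing everything from scratch, which makes our programs easier to write and understand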

## Import the library
* Use `import <libname>` to load the complete library into memory
* Use `<libname>.<modulename>` to call the modules within the loaded library (Python uses `.` to reference modules and attributes of given library or object, we'll cover more on this in the next lecture)
* You can also define an alias to the library like `import <libname> as abc`, and call the library module with `abc.<modulename>`

In [None]:
import string

# string.ascii_lowercase
string.ascii_uppercase

* You can also import specific names (submodules, functions, constants, etc.) from a library with `from <libname> import <name>`
* In this case, you can use `<name>` directly, without the `<libname>.` prefix

In [None]:
from string import ascii_lowercase,ascii_uppercase
ascii_lowercase # , ascii_uppercase

In [None]:
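## wildcard import: loads every public name, which can silently overwrite names you already defined
from string import *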
ascii_lowercase #, ascii_uppercase, digits

## Get library documentation
* You can always use `help()` function to pull the corresponding documentation of certain modules or submodules
* Jupyter notebook and other common IDEs (integrated development environments) also have features or plugins that help you access the documentation of libraries and modules

In [None]:
help(string)

## Sample library
* `math`: a collection of mathematical functions
    * Official documentation: https://docs.python.org/3.8/library/math.html

In [None]:
import math
math.ceil(4.6)

In [None]:
math.floor(4.6)

In [None]:
math.gcd(8,6) # math.gcd(a,b) returns the greatest common divisor of the integers a and b

In [None]:
math.exp(1) # math.exp(x) returns e raised to the power x, where e ≈ 2.718281

`math.log(x[, base])`
* With one argument, return the natural logarithm of x (to base e).
* With two arguments, return the logarithm of x to the given base, calculated as log(x)/log(base).

In [None]:
math.log(100, 10)

In [None]:
math.log(1)
# math.log(4,2)
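`math.log(x, base)` computes `log(x)/log(base)`, so the division can introduce a tiny floating-point error. A small sketch comparing it with the dedicated `math.log10()` and `math.log2()` functions:

```python
import math

# the two-argument form divides two natural logs and can pick up rounding error
print(math.log(1000, 10))  # may print 2.9999999999999996 instead of 3.0

# the dedicated functions are generally more accurate
print(math.log10(1000))    # 3.0
print(math.log2(8))        # 3.0
```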

In [None]:
4 ** (1/2)

In [None]:
math.sqrt(4)

## Library import in depth
### A simple Python package
Assume we have a package with the following file distribution
```md
└── sample_package
    ├── sample.py
    └── subpackage
        └── subsample.py
```
The content of `sample.py` is like
```python
x = 123
y = 234

def hello():
    print('Hello World')
```

The content of `subsample.py`
```python
xx = 1
yy = 2
```

### Things might be more complicated
![](../pics/library_tree.png)

***You could***
* `import` the whole library, by `import a`
* `import` a module (python script), by `import a.aa`
* `import` an object (variable, function, class, etc.) defined in a module, by `from a.aa import aaa` (`import a.aa.aaa` only works if `aaa` is itself a module or subpackage, not a variable, function, or class)


**However**, with a plain `import` statement you have to reference the imported object by the full dotted path you wrote, e.g. `a.aa.aaa`. **Sometimes, this could be quite inconvenient** because the path can be long when the library has a deeply nested file structure

**There are two ways** to solve the problem:
* `from a import aa` (use the `from` statement to reference the complicated folder relationships)
* `import a.aa as aa` (create an alias)
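To try these import styles without the notebook's `sample_package` folder, the sketch below writes a copy of the same layout into a temporary directory under the hypothetical name `demo_package` and adds that directory to `sys.path`. It relies on implicit namespace packages (Python 3.3+), which is why no `__init__.py` files are needed, matching the tree shown earlier.

```python
import importlib
import os
import sys
import tempfile

# build a throwaway copy of the sample_package layout in a temporary directory
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, 'demo_package', 'subpackage'))
with open(os.path.join(root, 'demo_package', 'sample.py'), 'w') as f:
    f.write("x = 123\ny = 234\n\ndef hello():\n    print('Hello World')\n")
with open(os.path.join(root, 'demo_package', 'subpackage', 'subsample.py'), 'w') as f:
    f.write("xx = 1\nyy = 2\n")

sys.path.insert(0, root)       # make the temporary directory importable
importlib.invalidate_caches()  # let the import system notice the freshly written files

# 1) full dotted path: every use repeats the whole path
import demo_package.subpackage.subsample
print(demo_package.subpackage.subsample.xx)  # 1

# 2) `from ... import ...`: bind just the name you need
from demo_package.subpackage import subsample
print(subsample.yy)  # 2

# 3) alias with `as`
import demo_package.sample as smp
smp.hello()  # Hello World
```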

In [None]:
%%sh

tree sample_package

In [None]:
from sample_package.sample import hello
hello()

In [None]:
from sample_package.subpackage.subsample import xx

In [None]:
xx