<h1>Tutorial: Fancy Tools for Exploring Data Science with Python</h1>

*To open in Colab, click the badge below!*

<a href="https://colab.research.google.com/github/teboozas/python_tutorial_for_data_science/blob/master/Eng/Tutorial_Ch2_6(useful functionalities).ipynb" target="_parent"><img align="left" src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 2. Python & Object-Oriented Programming(OOP)

## 2.6 Useful Functionalities for Data Science

### Handling Sequence Type Objects (Indexing, Slicing, Duplication)

**Sequence type**

A sequence type object holds its elements in a fixed order and gives each one a numerical index. `list`, `tuple`, and objects created with the `array()` or `range()` functions are sequence types. Strings are sequences as well.

Handling sequence type objects is extremely important, because most of the objects used to implement data science methods are ordered compound data types. **Indexing**, **slicing**, and **duplication** are the main techniques for handling them.

**Indexing**

* Indexing retrieves a single element stored in a sequence type object. Write the index in square brackets right after the variable name.
* Indices in Python start at `0`, so `[0]` retrieves the first element of the sequence.

    > `my_list = [1,2,3,4,5]`
    >
    > `print(my_list[0])`
    >
    > `1`
    >
    > `print(my_list[3])`
    >
    > `4`
* A negative index counts backwards from the end of the sequence, so `[-1]` is the last element.
    > `print(my_list[-3])`
    >
    > `3`

* Indexing can also be used to change a value in a sequence type object, provided the object is mutable (like a `list`).
    > `my_list[3] = 7`
    >
    > `print(my_list[3])`
    >
    > `7`

**Slicing**

* Slicing extends indexing to retrieve multiple elements at once. The range and step are specified with colons (`:`), like:
    > `my_list2 = [1,2,3,4,5,6,7,8,9,10]`
    >
    > `print(my_list2[0:5:2])`
    >
    > `[1,3,5]`
* For a positive step, slicing with the expression `[a:b:c]` selects the elements whose index $i$ satisfies $a \le i < b$, moving in steps of $c$.
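
A few more slicing patterns can be sketched as follows. The variable names (`nums`, `first_three`, and so on) are illustrative and not part of the tutorial's own cells. Omitted bounds default to the ends of the sequence, and a negative step walks backwards.

```python
# Illustrative list; all names here are assumptions made for this sketch
nums = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

first_three = nums[:3]      # start omitted: begins at index 0
from_eighth = nums[7:]      # stop omitted: runs to the end
every_third = nums[::3]     # both omitted: whole list with step 3
reversed_nums = nums[::-1]  # negative step: reversed copy

print(first_three)    # [1, 2, 3]
print(from_eighth)    # [8, 9, 10]
print(every_third)    # [1, 4, 7, 10]
print(reversed_nums)  # [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
```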


    

**Duplication of sequence**

* A sequence can be duplicated with slicing. Working on a copy is strongly recommended whenever the original object must stay unchanged.

* To make a copy of an object, assign the result of an empty slice (`[:]`) to a new variable.
    > `new_variable = sequence_name[:]`
* Note that a simple assignment is not duplication: both names refer to the same object, so the original can be changed through the assigned variable.

In [6]:
# example of assigned variable and duplicated object
A = [1,2,3,4,5]

# 'B' is another name for the same object as 'A'
B = A

# 'C' is a duplication of object 'A'
C = A[:]

print(A)
print(B)
print(C)

[1, 2, 3, 4, 5]
[1, 2, 3, 4, 5]
[1, 2, 3, 4, 5]


In [4]:
# changes made through 'B' also appear in 'A', but not in the copy 'C'
B[0] = 7
print(A)
print(B)
print(C)

[7, 2, 3, 4, 5]
[7, 2, 3, 4, 5]
[1, 2, 3, 4, 5]


In [5]:
# changes to the copy 'C' do not affect 'A' or 'B'
C[0] = 9
print(A)
print(B)
print(C)

[7, 2, 3, 4, 5]
[7, 2, 3, 4, 5]
[9, 2, 3, 4, 5]
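
One caveat worth noting: slicing with `[:]` makes a *shallow* copy, so nested mutable objects are still shared between the original and the copy. The sketch below uses the standard-library `copy.deepcopy()` function and illustrative variable names to show the difference.

```python
import copy

# Illustrative nested list (names are assumptions for this sketch)
nested = [[1, 2], [3, 4]]

shallow = nested[:]           # new outer list, but the same inner lists
deep = copy.deepcopy(nested)  # inner lists are copied as well

nested[0][0] = 99             # mutate an inner list through the original

print(shallow[0][0])  # 99: the shallow copy shares the inner list
print(deep[0][0])     # 1: the deep copy is unaffected
```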


**`len()` function**

* The `len()` function returns the length of a sequence type object, which is useful for controlling loops.

In [7]:
len(A)

5
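
As a small sketch of how `len()` can control a loop, the example below visits every position of a list by index and updates it in place. The list `scores` and its values are made up for illustration.

```python
# Hypothetical data for this sketch
scores = [70, 85, 90]

# range(len(scores)) yields the valid indices 0, 1, 2
for i in range(len(scores)):
    scores[i] = scores[i] + 5

print(scores)  # [75, 90, 95]
```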

### Handling 'Iterable'

**Iterable**

An iterable is an object that can be looped over, for example in a `for` statement. **All sequence type objects are iterable**, and so are other containers such as dictionaries (`dict`) and sets (`set`), even though they are not sequences.
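
One way to check whether an object is iterable is to pass it to the built-in `iter()` function, which raises `TypeError` for non-iterables. The helper `is_iterable` below is a hypothetical function written for this sketch, not part of Python itself.

```python
def is_iterable(obj):
    """Return True if iter() accepts obj (hypothetical helper)."""
    try:
        iter(obj)
        return True
    except TypeError:
        return False

print(is_iterable([1, 2, 3]))  # True  (list)
print(is_iterable({'a': 1}))   # True  (dict, iterates over its keys)
print(is_iterable({1, 2, 3}))  # True  (set)
print(is_iterable("text"))     # True  (str)
print(is_iterable(42))         # False (int)
```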

`range()` **function**

* The `range()` function generates an iterable sequence of integers.
* Its arguments (start, stop, step) work like those of slicing, but they are separated by commas (`,`) instead of colons.
* Other sequence data types can be built from a `range()` object like:
    > `my_list3 = list(range(1,10,2))`
    >
    > `print(my_list3)`
    >
    > `[1,3,5,7,9]`
* An object made by the `range()` function can be used directly in a `for` loop.

In [8]:
for number in range(1,10,2):
    print(number)

1
3
5
7
9


`enumerate()` **function**

* The `enumerate()` function wraps an iterable and yields pairs of index and value.
* This is useful when a loop needs the position of each element as well as the element itself.

In [10]:
for rank, animal in enumerate(['human','tiger','eagle']):
    print(rank, animal)

0 human
1 tiger
2 eagle
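
`enumerate()` also accepts an optional `start` argument, which is convenient when the count should begin at 1 rather than 0. A minimal sketch, with illustrative variable names:

```python
animals = ['human', 'tiger', 'eagle']

# start=1 makes the counter begin at 1 instead of the default 0
ranked = list(enumerate(animals, start=1))

print(ranked)  # [(1, 'human'), (2, 'tiger'), (3, 'eagle')]
```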


`zip()` **function**

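* The `zip()` function combines multiple iterables into a single iterable of tuples.
* The iterables should normally have the same length. If they differ, `zip()` does not raise an error but stops at the shortest one.
* This is useful for handling multiple iterables in one loop.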

In [12]:
a = [1,2,3]
b = ['first','second','third']

for number, order in zip(a,b):
    print(number, order)

1 first
2 second
3 third
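
When the inputs differ in length, `zip()` stops at the shortest one and silently drops the rest. If every element must be kept, the standard-library `itertools.zip_longest()` pads the shorter input instead. The sketch below uses made-up lists to show both behaviours.

```python
from itertools import zip_longest

# Illustrative inputs of unequal length
numbers = [1, 2, 3]
labels = ['first', 'second']

truncated = list(zip(numbers, labels))
padded = list(zip_longest(numbers, labels, fillvalue=None))

print(truncated)  # [(1, 'first'), (2, 'second')]
print(padded)     # [(1, 'first'), (2, 'second'), (3, None)]
```

On Python 3.10 or later, `zip(numbers, labels, strict=True)` raises `ValueError` on a length mismatch instead of truncating.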


### Iterator and Generator

**Iterator**

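* An iterator is an object that returns the elements of an iterable one at a time. Every iterator is itself iterable.
* The built-in `iter()` function creates an iterator from an iterable, and `next()` fetches the next element from it. When no elements remain, `next()` raises `StopIteration`.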
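
The interplay of `iter()` and `next()` can be sketched as below. The list `letters` and the other names are illustrative only.

```python
letters = ['a', 'b', 'c']

it = iter(letters)  # build an iterator from the iterable list
first = next(it)    # 'a'
second = next(it)   # 'b'
third = next(it)    # 'c'

# A further call to next() signals exhaustion with StopIteration
try:
    next(it)
    exhausted = False
except StopIteration:
    exhausted = True

print(first, second, third, exhausted)  # a b c True

# An iterator is itself iterable: iter() on it returns the same object
print(iter(it) is it)  # True
```

A `for` loop performs these same steps internally and stops cleanly when `StopIteration` is raised.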

**Generator**

* A generator is an object produced by a function that contains a `yield` statement (a generator function).
* A generator is a kind of iterator that computes its values lazily, one at a time. Thus, a generator is a special case of an iterator.
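
A minimal sketch of a generator, assuming a hypothetical function `countdown` written only for this example. Each `yield` hands one value to the caller and pauses the function until the next value is requested.

```python
def countdown(start):
    """Yield start, start - 1, ..., 1 (hypothetical example function)."""
    while start > 0:
        yield start
        start -= 1

gen = countdown(3)
first_value = next(gen)  # runs the function up to the first yield
remaining = list(gen)    # consumes whatever is left

print(first_value)  # 3
print(remaining)    # [2, 1]

# A generator expression builds a generator without a def statement
squares = (n * n for n in range(4))
total = sum(squares)
print(total)  # 14
```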

※ *The details of iterators and generators are beyond the scope of this tutorial, so only their definitions and the relationship between them are introduced.*

※ *Just keep in mind **the importance of iterable objects** in data science with Python.*

**※ Relationship of iterable, iterator, and generator**

<img src = "https://mingrammer.com/images/2017-01-25-iter-vs-gen-relationships.png" width = 700>
    
(Source: https://mingrammer.com)

### Comprehension

* Comprehension syntax generates a container object (compound data type) from a loop and an optional conditional expression.

* Container objects include `list`, `dictionary`, `set`, etc.

* Comprehension syntax that generates a `list` type object is called a **list comprehension**.

In [13]:
# example of list comprehension to generate even numbers less than 5
even_numbers = [x for x in range(5) if x % 2 == 0]
print(even_numbers)

[0, 2, 4]
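
The same syntax works for dictionaries and sets when curly braces are used instead of square brackets. The sketch below uses a made-up word list; the names are illustrative.

```python
# Hypothetical input data for this sketch
words = ['data', 'science', 'python', 'data']

# dict comprehension: key-value pairs separated by a colon
word_lengths = {w: len(w) for w in words}

# set comprehension: duplicates are removed automatically
unique_lengths = {len(w) for w in words}

print(word_lengths)            # {'data': 4, 'science': 7, 'python': 6}
print(sorted(unique_lengths))  # [4, 6, 7]
```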


### `args` and `kwargs`

`args` and `kwargs` are conventional parameter names (not reserved keywords) used to let a function accept an arbitrary number of arguments.

* `kwargs` stands for keyword arguments, and `args` for positional arguments passed without keywords. The syntax for both is:
> `def my_function(*args, **kwargs)` 
* Inside the function, `args` is a `tuple` of the positional arguments and `kwargs` is a `dictionary` of the keyword arguments.

※ Note : at the call site, `*` and `**` are called **unpacking operators**. They unpack the elements of an iterable (such as a list or tuple) or of a dictionary, respectively.

In [29]:
# example of 'args' syntax
def player_names(*Names):
    for name in Names:
        print(name)

player = ['Son','Messi','Salah']

# single asterisk is used to unpack entities in list 'player'
player_names(*player)

Son
Messi
Salah


In [37]:
# example of 'kwargs' syntax
# note that 'items()' method is used to return both keys and values
def player_info(**Info):
    for name, number in Info.items():
        print(name, number)

info = {'Son':7, 'Messi':10, 'Salah':10}

# double asterisk is used to unpack dictionary 'info'
player_info(**info)

Son 7
Messi 10
Salah 10
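
Both forms can be combined in one function. The sketch below, built around a hypothetical function `describe`, shows the types Python assigns to `args` and `kwargs` inside the function body.

```python
def describe(*args, **kwargs):
    """Report how the arguments were collected (hypothetical example)."""
    return type(args).__name__, type(kwargs).__name__, len(args), sorted(kwargs)

# Two positional arguments and two keyword arguments
result = describe('Son', 'Messi', number=7, captain=True)

print(result)  # ('tuple', 'dict', 2, ['captain', 'number'])
```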


### Generating Boolean

* A boolean variable can be assigned `True` or `False` directly. In addition, other values and the results of operations are interpreted as booleans wherever a truth value is needed.
* Examples of values that are evaluated as `False` are listed below:
> `0` (integer)
>
> `0.0` (floating-point number)
>
> `[]` (list)
>
> `""` (string)

In [42]:
# example of generating boolean
# note that the 'not' operator negates the truth value of its operand
# if a variable 'a' is evaluated as 'False', 'not a' returns 'True'

# integer
a = 0
print(not a)

# floating number
b = 0.0
print(not b)

# list
c = []
print(not c)

# string
d = ""
print(not d)

True
True
True
True
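
The built-in `bool()` function shows the truth value of any object directly. The sketch below contrasts the "falsy" values listed above with a few non-empty or non-zero values. The list `values` and the helper `summarize` are made up for illustration.

```python
# First five entries are evaluated as False, the rest as True
values = [0, 0.0, [], "", None, 1, -2.5, [0], "False"]

truthiness = [bool(v) for v in values]
print(truthiness)
# [False, False, False, False, False, True, True, True, True]

def summarize(items):
    """Use truthiness to test for an empty container (hypothetical helper)."""
    if not items:
        return "empty"
    return "has " + str(len(items)) + " item(s)"

print(summarize([]))         # empty
print(summarize([1, 2, 3]))  # has 3 item(s)
```

Note that the non-empty string `"False"` and the list `[0]` are both evaluated as `True`, because only emptiness matters, not the contents.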


### `OrderedDict` data type

* `OrderedDict` is a data type in the `collections` module that remembers the order in which key-value pairs were inserted. It can be imported by:
> `from collections import OrderedDict`
* An `OrderedDict` is iterable and can be used in loop operations.
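
A minimal sketch of `OrderedDict` in a loop, reusing the player numbers from the `kwargs` example above as sample data. `move_to_end()` is an `OrderedDict` method that reorders an existing key. Since Python 3.7 the built-in `dict` also preserves insertion order, but it does not offer this reordering method.

```python
from collections import OrderedDict

# Sample data taken from the earlier 'kwargs' example
players = OrderedDict()
players['Son'] = 7
players['Messi'] = 10
players['Salah'] = 10

# Iteration follows insertion order
for name, number in players.items():
    print(name, number)

original_order = list(players)
players.move_to_end('Son')  # move an existing key to the last position
new_order = list(players)

print(original_order)  # ['Son', 'Messi', 'Salah']
print(new_order)       # ['Messi', 'Salah', 'Son']
```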

## Reference (Python & object-oriented programming(OOP))

* [Python official documentation (Python version 3.6.9)](https://docs.python.org/3.6/index.html) - the official Python documentation, including the tutorial (*also available in Korean*).
* ['Jump to Python' WikiDocs (Korean)](https://wikidocs.net/book/1) - well-known Python material in Korean, freely accessible online via WikiDocs.
* [A Whirlwind Tour of Python](https://jakevdp.github.io/WhirlwindTourOfPython/) - introductory Python text written by the author of 'Python Data Science Handbook' (who also appears in the opening video of the Colab introduction).
* Many open-source lectures are also available online (on [Coursera](https://www.coursera.org/specializations/python), [Edwith](https://www.edwith.org/sogang_python), [OpenTutorials](https://opentutorials.org/course/1750), etc.)