## Python for Data Science

@author: Sangkyun Lee  (sangkyun@korea.ac.kr)

___Python Basics___

<hr>

### Introduction

- Python is an interpreted language (like R, Matlab, ...)
  - C/C++, Java are compiled languages
- (Very) popular for AI, ML, Bioinformatics, Data Analytics, ...
- Easy to learn

---

### Variables

- Variables can be used __without specifying the size or type__ (unlike in Java or C)
- Python is case-sensitive. 

### Assignment

In [1]:
x = 3 # integer
y = 3. # floating point number
z = "Hello!" # string
Z = "Wonderful!" # another string, stored in a different variable: uppercase Z
print(x)
print(y)
print(z)
print(Z)

3
3.0
Hello!
Wonderful!


You can do operations on numeric values as well as strings.

In [2]:
sum_ = x + y # int + float = float
print(sum_)

6.0


In [3]:
v = "World!"
sum_string = z + " " + v # concatenate strings
print(sum_string)

Hello! World!


Print with formatting using `%`

In [4]:
print("The sum of x and y is %.2f"%sum_) # %f for floating point number

The sum of x and y is 6.00


In [5]:
print("The string `sum_string` is '%s'"%sum_string) # %s for string

The string `sum_string` is 'Hello! World!'
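
The same two messages can also be written with f-strings (available since Python 3.6), which this notebook does not otherwise use. A minimal sketch, with `sum_` and `sum_string` redefined so that it runs on its own:

```python
# Values redefined here so the example is self-contained.
sum_ = 3 + 3.0
sum_string = "Hello!" + " " + "World!"

# An f-string evaluates the expression inside {} and applies the format after ':'.
msg_float = f"The sum of x and y is {sum_:.2f}"
msg_str = f"The string `sum_string` is '{sum_string}'"

print(msg_float)
print(msg_str)
```

Both lines should match the `%`-formatted output shown above.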


#### Naming convention

There are two commonly used styles in programming:

1. __camelCase__
2. __snake_case__ or __lower_case_with_underscore__

Be consistent across one project.

Variable, function, and class names must start with a letter or an underscore (\_). Digits are allowed after the first character.

In [6]:
myStringHere = 'my string'
myStringHere

'my string'

In [7]:
x = 3 # valid
x_3 = "xyz" # valid

In [8]:
3_x = "456" # invalid. Numbers cannot be in the first position.

SyntaxError: invalid token (<ipython-input-8-520aa7218b05>, line 1)
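
One way to test a candidate name without triggering a `SyntaxError` is the built-in `str.isidentifier()` method. The sketch below also rejects reserved words via the standard `keyword` module, which is an extra rule not covered above; the candidate names are made up for illustration.

```python
import keyword

# Hypothetical candidate names, chosen only for illustration.
candidates = ["x", "x_3", "_tmp", "3_x", "my-name", "class"]

results = {}
for name in candidates:
    # isidentifier() checks the letter/underscore/digit rule;
    # iskeyword() flags reserved words such as 'class'.
    results[name] = name.isidentifier() and not keyword.iskeyword(name)

for name in candidates:
    print(name, results[name])
```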

### Wildcard variable ( _ )
`_` is a valid variable name in Python. By convention, it holds temporary or throwaway values.

In [9]:
_ = 3
print(_)

3


###  Indexing

To initialize a string variable, you can use either double or single quotes.

In [10]:
store_name = "ABCDE"

- A string is a sequence of characters
- Indices and bracket notations can be used to access specific ranges of characters.

In [11]:
name_13 = store_name[1:4] # [start, end), end is exclusive; Python index starts from 0 (NOT from 1)
print(name_13)

BCD


In [12]:
last_letter = store_name[-1] # -1 means the last element
print(last_letter)

E


In [13]:
print(store_name[1:])

BCDE


In [14]:
print(store_name[:3])

ABC


In [15]:
print(store_name[-4:-2])

BC
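
Negative indices can be read as offsets from the end of the string: index `-k` refers to position `len(s) - k`. A small sketch that checks this for the slice above:

```python
store_name = "ABCDE"
n = len(store_name)  # 5

# store_name[-1] and store_name[n - 1] are the same character.
print(store_name[-1] == store_name[n - 1])

# store_name[-4:-2] is the same slice as store_name[n - 4:n - 2], i.e. [1:3].
neg_slice = store_name[-4:-2]
pos_slice = store_name[n - 4:n - 2]
print(neg_slice, pos_slice)
```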


---

### Primitive (Basic) Data Types

#### Numbers

Numbers without a fractional part are ___integers___. In Python, their type is `int`.

In [16]:
x = 3
type(x)

int

Numbers with a fractional part are floating point numbers. In Python, their type is `float`.

In [17]:
y = 3.0
type(y)

float

We can apply arithmetic to these numbers. However, one thing we need to be careful about is ___type conversion___. See the example below.

In [18]:
z = 2 * x
type(z)

int

In [19]:
z = y + x
type(z)

float

#### Text/Characters/Strings

In Python, we use the `str` type for storing letters, words, and any other characters, as shown earlier in the Assignment section.

In [20]:
my_word = "see you"
type(my_word)

str

Unlike numbers, a `str` is a sequence, so we can index, slice, and iterate over its individual characters:

In [21]:
my_word[0], my_word[2:6]

('s', 'e yo')

We can also use `+` to _concatenate_ different strings 

In [22]:
my_word + ' tomorrow'

'see you tomorrow'

#### Boolean

Boolean type comes in handy when we need to check conditions. For example:

In [23]:
my_error = 1.6
compare_result = my_error < 0.1
compare_result, type(compare_result)

(False, bool)

There are two and only two valid Boolean values: `True` and `False`. We can also think of them as `1` and `0`, respectively.

In [24]:
my_error > 0

True

When we use Boolean values for arithmetic operations, they will become `1` or `0` automatically

In [25]:
(my_error>0) + 2

3
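
The automatic conversion works because `bool` is a subclass of `int` in Python. A short sketch, reusing the `my_error` value from above:

```python
my_error = 1.6
is_small = my_error < 0.1     # False
is_positive = my_error > 0    # True

# Explicit conversion to int gives 0 and 1.
print(int(is_small), int(is_positive))

# Adding booleans counts how many of them are True.
true_count = is_small + is_positive
print(true_count)

# bool is a subclass of int, which is why arithmetic works.
print(isinstance(True, int))
```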

#### Type Conversion

Since variables in Python are dynamically typed, we need to be careful about type conversion.

When two variables share the same data type, there is not much to be worried about:

In [26]:
s1 = "no problem. "
s2 = "talk to you later"
s1 + s2

'no problem. talk to you later'

But be careful when mixing variables of different types:

In [27]:
a = 3 # recall that this is an ____?
b = 2.7 # how about this?
c = a + b # what is the type of `c`?

To make things work between string and numbers, we can explicitly convert numbers into `str`:

In [28]:
s1 + 3

TypeError: must be str, not int

In [29]:
s1 + str(3)

'no problem. 3'
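
The conversion also works in the other direction: `int()` and `float()` turn suitable strings into numbers. A minimal sketch with made-up input strings:

```python
count_text = "3"     # hypothetical text input
price_text = "2.7"   # hypothetical text input

count = int(count_text)     # str -> int
price = float(price_text)   # str -> float
total = count * price       # int * float gives a float

print(type(count), type(price), type(total))

# int() rejects text that is not an integer literal.
try:
    int(price_text)
    conversion_failed = False
except ValueError:
    conversion_failed = True
print(conversion_failed)
```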

---

### Data Structures

- _Containers_ that hold collections of values, such as the primitive types above

#### List

In [30]:
a_list = [1, 2, 3] # commas separate the elements

Initialize a list with square brackets. A list can store values of any type, including a mix of different types.
- note that we use [___string formatting___](https://pyformat.info/) to display strings
- `%i` is a placeholder for `int`
- `%s` for `str`

In [31]:
print("Length of a_list is: %i"%(len(a_list)))
print("The 3rd element of a_list is: %s" %(a_list[2])) # Remember Python starts with 0
print("The last element of a_list is: %s" %(a_list[-1])) # -1 means the end
print("The sum of a_list is %.2f"%(sum(a_list)))

Length of a_list is: 3
The 3rd element of a_list is: 3
The last element of a_list is: 3
The sum of a_list is 6.00


We can put different types in a list

In [32]:
b_list = [20, True, "good", "good"] 
b_list

[20, True, 'good', 'good']

Update a list: __pop__, __remove__, __append__, __extend__

In [33]:
print(a_list)
print("Pop %i out of a_list" % a_list.pop(1)) # pop removes and returns the value at the given index
print(a_list)

[1, 2, 3]
Pop 2 out of a_list
[1, 3]


In [34]:
print("Remove the string good from b_list:")
b_list.remove("good") # remove a specific value (the first one in the list)
print(b_list)

Remove the string good from b_list:
[20, True, 'good']


In [35]:
a_list.append(10)
print("After appending a new value, a_list is now: %s" % (str(a_list)))

After appending a new value, a_list is now: [1, 3, 10]


Merge `a_list` and `b_list` with `extend`:

In [36]:
a_list.extend(b_list)
print("Merging a_list and b_list: %s" % (str(a_list)))

Merging a_list and b_list: [1, 3, 10, 20, True, 'good']


We can also use `+` to concatenate two lists

In [37]:
a_list + b_list 

[1, 3, 10, 20, True, 'good', 20, True, 'good']
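
The difference between `+` and `extend` is worth spelling out: `+` builds a new list and leaves its operands untouched, whereas `extend` changes the list in place and returns `None`. A small sketch starting from fresh copies of the two lists:

```python
a_list = [1, 3, 10]
b_list = [20, True, "good"]

combined = a_list + b_list       # new list; a_list is unchanged
print(a_list)
print(combined)

result = a_list.extend(b_list)   # modifies a_list in place
print(result)                    # extend returns None
print(a_list)
```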

Use `*` to repeat lists.

In [38]:
[1,2]*3

[1, 2, 1, 2, 1, 2]

Use the `in` operator to check if something is inside a list.

In [39]:
2 in [1, 2, 3]

True

Use `split` to break a string into a list of tokens.

In [40]:
name = 'Korea University Hospital at Ansan'
tokens = name.split(' ')
print(tokens[0])
print(tokens[-1])

Korea
Ansan
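
The counterpart of `split` is `join`, which glues a list of strings back together with a chosen separator. A brief sketch using the same string:

```python
name = 'Korea University Hospital at Ansan'
tokens = name.split(' ')
print(len(tokens))

# Joining with the original separator restores the string.
rejoined = ' '.join(tokens)
print(rejoined == name)

# Any other separator works as well.
dashed = '-'.join(tokens)
print(dashed)
```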


#### Tuple (Like a list, but its elements cannot be changed)

In [41]:
a_tuple = (1, 2, 3, 10)
print(a_tuple)
print("First element of a_tuple: %i"%a_tuple[0])
type(a_tuple)

(1, 2, 3, 10)
First element of a_tuple: 1


tuple

You cannot change the values of a_tuple

In [42]:
a_tuple[0] = 5

TypeError: 'tuple' object does not support item assignment

To create a single-value tuple, add a trailing comma:

In [43]:
a_tuple = (1) # this creates an int, not a tuple
print(type(a_tuple))
b_tuple = (1,) # this would create a tuple type, take note of the comma.
print(type(b_tuple))

<class 'int'>
<class 'tuple'>


#### Dictionary: key-value pairs

Initialize a dict with curly brackets `{}`

In [44]:
d = {} # empty dictionary
d['Sangkyun Lee'] = "sangkyun@korea.ac.kr" # add a key-value 
d['Jaesung Kim'] = "jkim@korea.ac.kr"
print(d)

{'Sangkyun Lee': 'sangkyun@korea.ac.kr', 'Jaesung Kim': 'jkim@korea.ac.kr'}


Iterate over all of the keys:

In [45]:
for i in d:
    print(i)

Sangkyun Lee
Jaesung Kim


Iterate over all of the values:

In [46]:
for i in d.values():
    print(i)

sangkyun@korea.ac.kr
jkim@korea.ac.kr


Iterate over all of the items in the dictionary:

In [47]:
for name, email in d.items():
    print(name)
    print(email)

Sangkyun Lee
sangkyun@korea.ac.kr
Jaesung Kim
jkim@korea.ac.kr
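
Besides iteration, the usual way to read a dictionary is to look up a value by its key. The sketch below rebuilds the same dictionary and uses a made-up key, `'Unknown Person'`, to show what happens when a key is absent.

```python
d = {'Sangkyun Lee': 'sangkyun@korea.ac.kr', 'Jaesung Kim': 'jkim@korea.ac.kr'}

# Look up a value by its key.
email = d['Jaesung Kim']
print(email)

# `in` tests whether a key is present (it does not search the values).
missing_key = 'Unknown Person'  # hypothetical key, not in the dictionary
has_key = missing_key in d
print(has_key)

# d[missing_key] would raise KeyError; get() returns a default instead.
fallback = d.get(missing_key, 'no email on record')
print(fallback)
```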


You can unpack a sequence into different variables:

In [48]:
x = ('Sangkyun', 'Lee', 'sangkyun@korea.ac.kr')
fname, lname, email = x
print(lname)
print(email)

Lee
sangkyun@korea.ac.kr


String formatting with `str.format`:

In [49]:
sales_record = {
'price': 3.24,
'num_items': 4,
'person': 'Chris'}

sales_statement = '{} bought {} item(s) at a price of {} each for a total of {}'

print(sales_statement.format(sales_record['person'],
                             sales_record['num_items'],
                             sales_record['price'],
                             sales_record['num_items']*sales_record['price']))

Chris bought 4 item(s) at a price of 3.24 each for a total of 12.96
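
`str.format` also accepts named fields and format specifications such as `:.2f`, which can make longer templates easier to read. A sketch using the same sales record (the field names in the template are chosen for this example):

```python
sales_record = {'price': 3.24, 'num_items': 4, 'person': 'Chris'}
total = sales_record['num_items'] * sales_record['price']

# Named fields are filled from keyword arguments; ':.2f' rounds to two decimals.
template = ('{person} bought {num_items} item(s) at a price of '
            '{price:.2f} each for a total of {total:.2f}')

formatted = template.format(person=sales_record['person'],
                            num_items=sales_record['num_items'],
                            price=sales_record['price'],
                            total=total)
print(formatted)
```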


---

### Control Logic

The following examples cover comparisons, `if-else` statements, `for` loops, and `while` loops.

#### Comparison

Python's comparison syntax is close to the usual hand-written notation:

1. Larger (or equal): `>` (`>=`)
2. Smaller (or equal): `<` (`<=`)
3. Equal to: `==` (__Notice that there are two equal signs__)
4. Not equal to: `!=`

In [50]:
3 == 5 

False

In [51]:
72 >= 2

True

In [52]:
store_name

'ABCDE'

In [53]:
store_name == "HyVee" # Will return a boolean value True or False

False

IMPORTANT: Equality comparisons between floating point numbers are tricky, because most decimal fractions cannot be represented exactly in binary.

In [54]:
print(2.2 * 3.0)
2.2 * 3.0 == 6.6

6.6000000000000005


False

In [55]:
3.3 * 2.0 == 6.6

True
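
A common way around this is to compare floats with a tolerance instead of `==`. The sketch below uses `math.isclose` from the standard library with its default relative tolerance, plus a hand-written check whose threshold `1e-9` is an arbitrary choice for illustration.

```python
import math

product = 2.2 * 3.0

# Exact comparison fails because of rounding in the last digits.
exact_equal = product == 6.6
print(exact_equal)

# math.isclose compares within a relative tolerance (default 1e-09).
close_enough = math.isclose(product, 6.6)
print(close_enough)

# The same idea written by hand with an absolute tolerance.
manual_check = abs(product - 6.6) < 1e-9
print(manual_check)
```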

#### If-Else

In [56]:
sum_ = -1

In [57]:
if sum_ == 0:
    print("sum_ is 0")
elif sum_ < 0:
    print("sum_ is less than 0")
else:
    print("sum_ is above 0 and its value is " + str(sum_)) # Cast sum_ into string type.

sum_ is less than 0


Note that you do not have to use `if-else` or `if-elif-...-else`. You can use `if` without other clauses following that.

In [58]:
if sum_ > 5:
    print('sum_ is above 5')

Comparing strings is similar:

In [59]:
store_name = 'Walmart'

In [60]:
if 'Wal' in store_name:
    print("The store is likely to be the Walmart.")
else:
    print("The store is not likely to be the Walmart.")

The store is likely to be the Walmart.


#### For loop: Iterating through a sequence

In [61]:
for letter in store_name:
    print(letter)

W
a
l
m
a
r
t


`range()` is a function that creates integer sequences:
- `range(0, 1000)` and `range(1000)` both produce 0, 1, ..., 999

In [62]:
range(1,10)

range(1, 10)

Convert a range to a list:

In [63]:
list(range(1,10))

[1, 2, 3, 4, 5, 6, 7, 8, 9]

In [64]:
for index in range(len(store_name)): # length of a sequence
    print("The %ith letter in store_name is: %s" % (index, store_name[index]))

The 0th letter in store_name is: W
The 1th letter in store_name is: a
The 2th letter in store_name is: l
The 3th letter in store_name is: m
The 4th letter in store_name is: a
The 5th letter in store_name is: r
The 6th letter in store_name is: t
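
When both the index and the element are needed, the built-in `enumerate` avoids the `range(len(...))` pattern. A minimal sketch over the same string:

```python
store_name = 'Walmart'

pairs = []
# enumerate yields (index, element) pairs, starting from index 0.
for index, letter in enumerate(store_name):
    pairs.append((index, letter))
    print("Index %i holds the letter %s" % (index, letter))
```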


#### While loop: Repeat until the condition no longer holds

Use `for` when you know __the exact number of iterations__; use `while` when you __do not (e.g., checking convergence)__.

In [72]:
x = 1

In [73]:
while x < 10:
    print(x)
    x += 1  # same as x = x+1

print(0)

1
2
3
4
5
6
7
8
9
0
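
To illustrate the convergence case, here is a sketch of a `while` loop whose number of iterations is not known in advance. It approximates the square root of 2 by repeated averaging (the Babylonian method); the starting guess and the tolerance are arbitrary choices for this example.

```python
target = 2.0
estimate = 1.0        # initial guess (arbitrary)
tolerance = 1e-10     # stopping threshold (arbitrary)
iterations = 0

# Keep refining until estimate squared is close enough to the target.
while abs(estimate * estimate - target) > tolerance:
    estimate = 0.5 * (estimate + target / estimate)
    iterations += 1

print(estimate)
print(iterations)
```

The loop stops as soon as the condition becomes false, however many steps that takes.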


#### `break` and `continue`

`break` exits the loop immediately. Any code after the `break` inside the loop body will NOT be executed.

In [67]:
store_name = 'Walmart'

In [76]:
index = 0
while True:
    print(store_name[index])
    index += 1 
    if store_name[index] == "a":
        print("We've found 'a'")
        break 
        print("Hello!") # This will NOT be run

W
We've found 'a'
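
The `while True` loop above relies on the letter being present: if it were not, `index` would run past the end of the string and raise an `IndexError`. One possible bounded variant, sketched with a `for` loop and a made-up target letter `'z'`:

```python
store_name = 'Walmart'
target = 'z'      # hypothetical letter that does not occur in store_name
found_at = -1     # -1 marks "not found" in this sketch

for index in range(len(store_name)):
    if store_name[index] == target:
        found_at = index
        break     # stop at the first match

if found_at == -1:
    print("'%s' was not found" % target)
else:
    print("Found '%s' at index %i" % (target, found_at))
```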


`continue` skips to the next iteration of the loop. It will __break__ the current iteration and __continue__ to the next.

In [77]:
for letter in store_name:
    if letter == "a":
        continue
    else:
        print(letter)

W
l
m
r
t
