# Visualization and Modern Data Science

> Getting Started with Python

Kuo, Yao-Jen <yaojenkuo@ntu.edu.tw> from [DATAINPOINT](https://www.datainpoint.com/)

## The Python Trivia

## What is Python: the Monty Python British surreal comedy troupe

![](https://media.giphy.com/media/ezR4SY7GQQ6fC/giphy.gif)

Source: <https://giphy.com/>

## Python: the programming language

> Python is a clear and powerful object-oriented programming language.

Source: <https://www.python.org/>

## Python: the gigantic snake and its relatives

- Python: the programming language itself
- Anaconda: the data science total solution
- Reticulate: the Python and R interface package

## Python is a general-purpose language that is extremely powerful and popular across many applications

- Automation
- Databases
- Analytics
- Graphical user interfaces
- Machine learning
- Web frameworks
- Web scraping
- ...etc.

## How does Python do that?

Simply put: third-party libraries.

## What is a third-party library?

A third-party library is any library whose latest code is maintained and hosted neither by ourselves nor by the official organization, say [Python.org](https://www.python.org/).

## So which third-party libraries make Python the go-to choice for data science?

- NumPy
- Pandas
- Matplotlib
- Scikit-Learn
- TensorFlow
- PyTorch
- ...etc.

## Python has grown rapidly in the developer community over the last 10 years

[Stack Overflow Trends](https://insights.stackoverflow.com/trends?tags=java%2Cc%2Cc%2B%2B%2Cpython%2Cc%23%2Cvb.net%2Cjavascript%2Cassembly%2Cphp%2Cperl%2Cruby%2Cswift%2Cr%2Cobjective-c)

![Imgur](https://i.imgur.com/ilA2rqs.png)

## We love it SO MUCH that someone even wrote a book about it!

![Imgur](https://i.imgur.com/Vs13bJj.jpg?1)

Source: Google Search

## How to use [Stack Overflow](https://stackoverflow.com/)?

- The first post is the question itself.
- The second post, if marked with a green check, is the answer accepted by the asker.
- The third post is the answer up-voted by others.

## Setting up a Python Development Environment

## Given that Python is a general-purpose programming language, there are various ways to set up a Python development environment

## It is a mix-and-match challenge

- The operating system.
- The Python interpreter.
- The Integrated Development Environment.
- The package/environment manager.

## Even among data analysts, everyone has their own favorite flavor

![](https://media.giphy.com/media/cCEt1ShfzOa3u/giphy.gif)

Source: <https://giphy.com/>

## The best practice so far for a programming analyst is Python interpreter + Jupyter

Also known as the "Notebook-based" solution or the "Jupyter ecosystem".

## When you are more familiar with Python, learn advanced topics then choose your own flavor

- Path variables
- Library management
- Virtual environment
- Deployment

## What is a program?

> A program is a sequence of instructions that specifies how to perform a computation. The computation might be something mathematical, such as solving a system of equations, something symbolic, such as searching for and replacing text in a document, or something graphical, like processing an image.

Source: [Think Julia: How to Think Like a Computer Scientist](https://benlauwens.github.io/ThinkJulia.jl/latest/book.html)

## Critical elements of writing/running a Python program

- Text editor
- Terminal
- Python interpreter
- Integrated development environment (IDE)

## A tour of planet Jupyter

- <https://lab.datainpoint.com>
- `/tree` for classic Jupyter Notebook.
- `/lab` for Jupyter Lab.

## At the homepage of your Jupyter Notebook Server

- New > Text File
- New > Terminal
- New > Terminal > Type `python --version` then hit Enter.
- New > Notebook

## A few Python programs to try first

- Hello world!
- Hello John Doe!
- The Zen of Python

## Hello world!

In [1]:
print("Hello, world!")

Hello, world!


## Hello John Doe!

"John Doe" (for males) and "Jane Doe" (for females) are multiple-use names that are used when the true name of a person is unknown or is being intentionally concealed.

Source: <https://en.wikipedia.org/wiki/John_Doe>

In [2]:
john_doe = input("Please input your name:")
print("Hello, {}!".format(john_doe))

Please input your name:Darth Vadar
Hello, Darth Vadar!


## Zen of Python

Long time Pythoneer [Tim Peters](https://en.wikipedia.org/wiki/Tim_Peters_(software_engineer)) succinctly channels the BDFL's guiding principles for Python's design into 20 aphorisms, only 19 of which have been written down.

In [3]:
import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!


## My favorite one would be:

> Now is better than never.

How about yours?

## Functions

## What are `print` and `input` in the previous examples?

`print` and `input` are so-called **built-in** functions in Python.

## What is a function?

> A function is a named sequence of statements that performs a computation, either mathematical, symbolic, or graphical. When we define a function, we specify the name and the sequence of statements. Later, we can call the function by name.

## How do we analyze a function?

- function name.
- inputs and arguments, if any.
- sequence of statements.
- outputs, if any.

## Take bubble tea shop for instance

![Imgur](https://i.imgur.com/6gpJebm.jpg?1)

Source: Google Search
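
One way to read the bubble tea shop analogy is as a function: the order details are the inputs, the preparation steps are the sequence of statements, and the finished drink is the output. The sketch below is only an illustration of those four parts. The name `make_bubble_tea` and its parameters are hypothetical, and the `def` syntax it relies on is covered later in this section.

```python
def make_bubble_tea(tea, sugar_level, ice_level):  # function name and inputs
    """
    Hypothetical example: describe a bubble tea order as a str.
    """
    # sequence of statements
    drink = "{} with {} sugar and {} ice".format(tea, sugar_level, ice_level)
    # output
    return drink

order = make_bubble_tea("Black milk tea", "half", "less")
print(order)  # Black milk tea with half sugar and less ice
```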

## What is a built-in function?

> A pre-defined function that we can call by name without defining it ourselves.

## How many built-in functions are available for us?

- `print`
- `input`
- `help`
- `type`
- ...etc.

Source: <https://docs.python.org/3/library/functions.html>
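
The list above is only a sample. As a rough sketch, the standard `builtins` module can be inspected to see every name that is available without an import. Note that this also counts built-in exceptions and constants, not only functions, and the total varies across Python versions.

```python
import builtins

# Public names that are available without importing anything.
builtin_names = [name for name in dir(builtins) if not name.startswith("_")]

print(len(builtin_names))        # The exact number depends on the Python version.
print("print" in builtin_names)  # True
print("input" in builtin_names)  # True
print("type" in builtin_names)   # True
```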

## Get HELP with `help`

In [4]:
help(print)

Help on built-in function print in module builtins:

print(...)
    print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)
    
    Prints the values to a stream, or to sys.stdout by default.
    Optional keyword arguments:
    file:  a file-like object (stream); defaults to the current sys.stdout.
    sep:   string inserted between values, default a space.
    end:   string appended after the last value, default a newline.
    flush: whether to forcibly flush the stream.



In [5]:
help(input)

Help on method raw_input in module ipykernel.kernelbase:

raw_input(prompt='') method of ipykernel.ipkernel.IPythonKernel instance
    Forward raw_input to frontends
    
    Raises
    ------
    StdinNotImplentedError if active frontend doesn't support stdin.



In [6]:
help(type)

Help on class type in module builtins:

class type(object)
 |  type(object_or_name, bases, dict)
 |  type(object) -> the object's type
 |  type(name, bases, dict) -> a new type
 |  
 |  Methods defined here:
 |  
 |  __call__(self, /, *args, **kwargs)
 |      Call self as a function.
 |  
 |  __delattr__(self, name, /)
 |      Implement delattr(self, name).
 |  
 |  __dir__(...)
 |      __dir__() -> list
 |      specialized __dir__ implementation for types
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __init__(self, /, *args, **kwargs)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |  
 |  __instancecheck__(...)
 |      __instancecheck__() -> bool
 |      check if an object is an instance
 |  
 |  __new__(*args, **kwargs)
 |      Create and return a new object.  See help(type) for accurate signature.
 |  
 |  __prepare__(...)
 |      __prepare__() -> dict
 |      used to create the namespace for the class statement
 |  
 

## We can also `help` on `help`

In [7]:
help(help)

Help on _Helper in module _sitebuiltins object:

class _Helper(builtins.object)
 |  Define the builtin 'help'.
 |  
 |  This is a wrapper around pydoc.help that provides a helpful message
 |  when 'help' is typed at the Python interactive prompt.
 |  
 |  Calling help() at the Python prompt starts an interactive help session.
 |  Calling help(thing) prints help for the python object 'thing'.
 |  
 |  Methods defined here:
 |  
 |  __call__(self, *args, **kwds)
 |      Call self as a function.
 |  
 |  __repr__(self)
 |      Return repr(self).
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |  
 |  __dict__
 |      dictionary for instance variables (if defined)
 |  
 |  __weakref__
 |      list of weak references to the object (if defined)



## Besides built-in functions or library-powered functions, we often need to define our own functions

- `def` the name of our function.
- `return` the output of our function.
- Indentation marks the body of our function.

## The layout of a self-defined function

```python
def function_name(INPUTS, ARGUMENTS, ...):
    # body of function_name
    """
    docstring: print documentation when help is called
    """
    # sequence of statements
    return OUTPUTS
```

In [8]:
def power(x, n):
    """
    Equivalent to x raised to the power of n.
    """
    return x**n

help(power)

Help on function power in module __main__:

power(x, n)
    Equivalent to x raised to the power of n.



## Call the function by name after defining it

In [9]:
power(5, 2)

25

## The effects of the `return` keyword

- Returns the desired output of a function.
- Ends the execution of the function.

## The `return` keyword returns the desired output of a function

In [10]:
def power(x, n):
    """
    Equivalent to x raised to the power of n.
    """
    x**n

power(5, 2)  # Nothing is displayed: without return, the function returns None.

## The `return` keyword ends the execution of a function

In [11]:
def power(x, n):
    """
    Equivalent to x raised to the power of n.
    """
    print(x)
    print(n)
    return x**n

power(5, 2)

5
2


25

## Code written after the `return` statement is not executed, even though it still resides in the indented block

In [12]:
def power(x, n):
    """
    Equivalent to x raised to the power of n.
    """
    return x**n
    print(x)
    print(n)

power(5, 2)

25

## Arithmetic Operators in Python

## Symbols that represent computations

- `+`, `-`, `*`, `/` are quite straightforward
- `**` for exponentiation
- `%` for remainder
- `//` for floor division
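
A quick sketch of each operator on the same pair of numbers may help. The variable names `a` and `b` are only placeholders chosen for this illustration.

```python
a = 7
b = 2

print(a + b)   # 9
print(a - b)   # 5
print(a * b)   # 14
print(a / b)   # 3.5, division always gives a float
print(a ** b)  # 49
print(a % b)   # 1, the remainder of 7 divided by 2
print(a // b)  # 3, the quotient rounded down
```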

## When an expression contains more than one operator, the order of evaluation depends on the operator precedence

1. Parentheses have the highest precedence.
2. Exponentiation has the next highest precedence.
3. Multiplication and Division have higher precedence than Addition and Subtraction.
4. Operators with the same precedence are evaluated from left to right, except exponentiation, which is evaluated from right to left.
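
The rules above can be checked with a few small expressions. This is just an illustrative sketch; the last line shows that chained exponentiation is the one case that groups from right to left.

```python
print(2 + 3 * 4)    # 14, multiplication before addition
print((2 + 3) * 4)  # 20, parentheses first
print(2 * 3 ** 2)   # 18, exponentiation before multiplication
print(8 / 4 * 2)    # 4.0, same precedence, so left to right
print(2 ** 3 ** 2)  # 512, evaluated as 2 ** (3 ** 2)
```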

## Variables

## One of the most powerful features of a programming language is the ability to manipulate variables

A variable is a name that refers to a value.

```python
variable_name = some_values
```
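
As a minimal sketch of the pattern above, the names below (`price_per_cup`, `cups`, `total`) are hypothetical. The example shows that a variable can be used in later computations and can be re-assigned to refer to a new value.

```python
price_per_cup = 50
cups = 3
total = price_per_cup * cups
print(total)  # 150

cups = 4                      # The name cups now refers to a new value.
total = price_per_cup * cups  # total must be recomputed to reflect the change.
print(total)  # 200
```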

## Choose names for our variables: don'ts

- Do not reuse the names of built-in functions
- Cannot use [keywords](https://docs.python.org/3/reference/lexical_analysis.html#keywords)
- Cannot start with a digit
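
These rules can be checked before picking a name. The sketch below uses the standard `keyword` module and the `str.isidentifier` method; the candidate names are made up for illustration.

```python
import keyword

print(keyword.iskeyword("return"))      # True, so it cannot be a variable name
print(keyword.iskeyword("power"))       # False
print("2nd_place".isidentifier())       # False, it starts with a digit
print("second_place".isidentifier())    # True
# print is not a keyword, so Python allows it as a variable name,
# but doing so would hide the built-in function.
print(keyword.iskeyword("print"))       # False
```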

## If you accidentally replace a built-in function with a variable, use `del` to release the name

In [13]:
print = 5566
print("Hello, world!")

TypeError: 'int' object is not callable

In [14]:
del print
print("Hello, world!")

Hello, world!


## Choose names for our variables: dos

- Use a lowercase single letter, word, or words.
- Separate words with underscores to improve readability (so-called snake case).
- Be meaningful.

## Using `#` to write comments in our program

A comment can appear on a line by itself or at the end of a line.

In [15]:
# A function to convert fahrenheit to celsius.
def convert_fahrenheit_to_celsius(f):
    c = (f - 32) * 5/9  # The formula to convert fahrenheit to celsius.
    return c

print(convert_fahrenheit_to_celsius(32))   # Call the function by name after defining it.
print(convert_fahrenheit_to_celsius(212))  # Call the function by name after defining it.

0.0
100.0


## Everything from `#` to the end of the line is ignored when executed

We can use [pythontutor.com](http://www.pythontutor.com/visualize.html#mode=edit) to explore the execution of our code.

## The scope of a variable

> In computer programming, the scope of a name binding, an association of a name to an entity, such as a variable, is the region of a computer program where the binding is valid.

Source: <https://en.wikipedia.org/wiki/Scope_(computer_science)>

## Simply put, as long as we have self-defined functions, the programming environment is split into 2 environments:

- Global
- Local

## A variable declared within the indented block of a function is a local variable; it is only valid inside the `def` block

In [16]:
def convert_fahrenheit_to_celsius(f):
    c = (f - 32) * 5/9  # The formula to convert fahrenheit to celsius.
    return c

print(c)

NameError: name 'c' is not defined

## A variable declared outside the indented block of a function is a global variable; it is valid everywhere

In [17]:
c = 0
def convert_fahrenheit_to_celsius(f):
    c = (f - 32) * 5/9  # The formula to convert fahrenheit to celsius.
    return c

print(c)  # global c is 0
print(convert_fahrenheit_to_celsius(212))  # local c is 100.0

0
100.0


## Although global variables look quite convenient, it is HIGHLY recommended NOT to use them locally, say inside the indented block of a function
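
A small sketch of the difference, using hypothetical function names: the first function quietly depends on a global variable, while the second receives everything it needs as inputs, which makes it easier to read and reuse.

```python
freezing_point_f = 32  # A global variable.

def convert_with_global(f):
    # Discouraged: depends on a name defined outside the function.
    return (f - freezing_point_f) * 5/9

def convert_with_argument(f, freezing_point):
    # Preferred: every value the function needs arrives as an input.
    return (f - freezing_point) * 5/9

print(convert_with_global(212))                      # 100.0
print(convert_with_argument(212, freezing_point_f))  # 100.0
```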

## Common Data Types

## Values belong to different types; we commonly use

- `int` and `float` for computing.
- `str` for text.
- `bool` for conditionals.
- `None` for missing values.

## Use the `type` function to check the type of a certain value/variable

In [18]:
print(type(5566))
print(type(42.195))
print(type("Hello, world!"))
print(type(True))
print(type(False))
print(type(None))

<class 'int'>
<class 'float'>
<class 'str'>
<class 'bool'>
<class 'bool'>
<class 'NoneType'>


## How to form a `str`?

Use paired `'`, `"`, or `"""` to enclose characters strung together.

In [19]:
str_with_single_quotes = 'Hello, world!'
str_with_double_quotes = "Hello, world!"
str_with_triple_double_quotes = """Hello, world!"""
print(type(str_with_single_quotes))
print(type(str_with_double_quotes))
print(type(str_with_triple_double_quotes))

<class 'str'>
<class 'str'>
<class 'str'>


## If we have single/double quotes in `str` values

In [20]:
mcd = 'I'm lovin' it!'

SyntaxError: invalid syntax (<ipython-input-20-85a683c7c2bf>, line 1)

## Use `\` to escape the quotes, or enclose the `str` in paired `"` or `"""`

In [21]:
mcd = 'I\'m lovin\' it!'
mcd = "I'm lovin' it!"
mcd = """I'm lovin' it!"""

## We've seen arithmetic operators for numeric values

How about those for `str` and `bool`?

## `str` type takes `+` and `*`

- `+` for concatenation
- `*` for repetition

In [22]:
mcd = "I'm lovin' it!"
print(mcd)
print(mcd + mcd)
print(mcd * 3)

I'm lovin' it!
I'm lovin' it!I'm lovin' it!
I'm lovin' it!I'm lovin' it!I'm lovin' it!


## Format our `str` printings

- The `sprintf` way
- The `.format()` way
- The `f-string` way

## The `sprintf` way: uses `%` for string print with format

In [23]:
my_name = "John Doe"
print("Hello, %s!" % (my_name))

Hello, John Doe!


## The `.format()` way: uses `{}` for string print with format

In [24]:
my_name = "John Doe"
print("Hello, {}!".format(my_name))

Hello, John Doe!


## The `f-string` way: uses `{}` for string print with format

In [25]:
my_name = "John Doe"
print(f"Hello, {my_name}!")

Hello, John Doe!


## I myself am more of a `.format()` guy

Besides positional order, it also accepts indexes and keywords.

In [26]:
print("{} {} is my favorite Friends character.".format("Phoebe", "Buffay")) # format with order
print("{1} {0} is my favorite Friends character.".format("Buffay", "Phoebe")) # format with index
# format with key-value
print("{first_name} {last_name} is my favorite Friends character.".format(last_name="Buffay", first_name="Phoebe"))

Phoebe Buffay is my favorite Friends character.
Phoebe Buffay is my favorite Friends character.
Phoebe Buffay is my favorite Friends character.


## How to form a `bool`?

- Use keywords `True` and `False` directly
- Use relational operators
- Use logical operators

## Use keywords `True` and `False` directly

In [27]:
print(True)
print(type(True))
print(False)
print(type(False))

True
<class 'bool'>
False
<class 'bool'>


## Use relational operators

We have `==`, `!=`, `>`, `<`, `>=`, `<=` as common relational operators to compare values, plus `in` and `not in` to test membership.

In [28]:
print(5566 == 5566.0)
print(5566 != 5566.0)
print('56' in '5566')

True
False
True


## Use logical operators

- We have `and`, `or`, `not` as common logical operators to manipulate `bool` type values.
- We get `True` only if both sides of `and` are `True`.
- We get `False` only if both sides of `or` are `False`.

In [29]:
print(True and True)  # get True only when both sides are True
print(True and False)
print(False and False)
print(True or True)
print(True or False)
print(False or False) # get a False only when both sides are False
# use of not is quite straightforward
print(not True)
print(not False)

True
False
False
True
True
False
False
True


## `bool` is quite useful in programs for conditional execution, iteration, and filtering data
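
The three uses can be previewed with a brief sketch. The variable names and values here are made up for illustration, and `if`, `while`, and list comprehensions are used only as a preview, not as a full explanation.

```python
temperature = 30

# Conditional execution: the indented line runs only when the bool is True.
if temperature > 28:
    print("It is hot today.")

# Iteration: the loop keeps running while the bool is True.
countdown = 3
while countdown > 0:
    print(countdown)
    countdown = countdown - 1

# Filtering data: keep only the values for which the bool is True.
scores = [55, 66, 77, 88]
passed = [score for score in scores if score >= 60]
print(passed)  # [66, 77, 88]
```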

## Python has a special type, `NoneType`, with a single value, `None`

- This is used to represent null values or nothingness
- It is not the same as `False`, or an empty string `''` or 0
- It can be used when we need to create a variable but don’t have an initial value for it

In [30]:
a_none_type = None
print(type(a_none_type))
print(a_none_type == False)
print(a_none_type == '')
print(a_none_type == 0)
print(a_none_type == None)

<class 'NoneType'>
False
False
False
True
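
When checking specifically for `None`, the identity operator `is` is the idiom recommended by PEP 8, because there is only one `None` object. A minimal sketch with a hypothetical variable name:

```python
user_name = None                 # No value yet.
was_missing = user_name is None
print(was_missing)               # True

user_name = "John Doe"           # A real value is assigned later.
print(user_name is None)         # False
print(user_name is not None)     # True
```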


## If we do not specify `return` when defining a function, the function returns `None`

In [31]:
def convert_fahrenheit_to_celsius(f):
    c = (f - 32) * 5/9  # The formula to convert fahrenheit to celsius.
    print(c)  # Instead of return c, we just print c.

function_output = convert_fahrenheit_to_celsius(212)
print(function_output)
print(type(function_output))

100.0
None
<class 'NoneType'>


## Data types can be dynamically converted using functions

- `int()` for converting to `int`
- `float()` for converting to `float`
- `str()` for converting to `str`
- `bool()` for converting to `bool`

## Upcasting (to a supertype) is always allowed

In [32]:
print(int(True))
print(float(1))
print(str(1.0))

1
1.0
1.0


## Downcasting (to a subtype), however, needs a type check and our attention

In [33]:
print(float('1.0'))
print(int('1'))
print(bool('False'))  # True, because any non-empty str converts to True.

1.0
1
True
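
The surprising last result comes from how `bool()` treats a `str`: only the empty string converts to `False`, and the text itself is never inspected. The sketch below shows one possible way to handle this; `parse_bool` is a hypothetical helper written for this example, not a built-in function. It also shows that `int()` rejects a `str` such as `"1.0"`, so a two-step conversion is needed.

```python
print(bool(""))       # False, only the empty str converts to False
print(bool("False"))  # True, the text itself is not inspected

def parse_bool(text):
    """
    Hypothetical helper: treat the text "true" (in any letter case) as True.
    """
    return text.strip().lower() == "true"

print(parse_bool("False"))  # False
print(parse_bool("True"))   # True

try:
    int("1.0")  # Raises ValueError: "1.0" is not a valid int literal.
except ValueError:
    print(int(float("1.0")))  # 1, convert to float first, then to int
```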
