<a href="https://colab.research.google.com/github/w4bo/handsOnDataPipelines/blob/main/materials/00-PythonFundamentals.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Python was created by Guido van Rossum in the early 90s. It’s basically executable pseudocode.

- Major versions of Python: 2 (support ended in January 2020) and 3
- We use Python 3; see also: https://learnxinyminutes.com/docs/python/

Cross-platform interpreted language
- Available on the main operating systems (Linux, Mac, Windows, etc.)
- CPython is the reference implementation; alternative implementations exist (e.g., PyPy, Jython)
- Can be integrated with other languages (C, C++, Java, etc.)

Multi-paradigm
- Imperative, object-oriented
- Syntax can be easily extended

Emphasis on ease of reading and writing
- "There should be one—and preferably only one—obvious way to do it"

In [1]:
"Hello, world!"  # This is a comment


'Hello, world!'

Why Python? 
- Easy to learn
- Used for multiple purposes (scripting, data science, etc.)
- Used for prototyping and rapid development cycles
- Rising popularity
    - Includes a standard library 
    - Wide availability of external libraries
    - E.g., machine learning, deep learning

![image](https://user-images.githubusercontent.com/18005592/200129836-6094deb1-e486-4b5e-88de-19a165513f90.png)

Python's features make it suitable for data analysis
- Usable interactively, in scripts, and in complete programs

Several libraries make Python a complete data analysis environment
- Python is increasingly used as a replacement for R and other ad-hoc software
- E.g., NumPy for representing data as vectors and matrices
- E.g., Pandas for manipulating and transforming tabular data
- E.g., scikit-learn (sklearn) for applying machine learning and data mining algorithms
- E.g., Matplotlib for data visualization

By default, a Python statement occupies a single line

In [2]:
# Python has a print function
print("Hello, world!")

# By default the print function also prints out a newline at the end.
# Use the optional argument end to change the end string.
print("Hello, World", end="!")  # => Hello, World!


Hello, world!
Hello, World!

Write several statements on the same line by separating them with `;`

In [3]:
print("Hello, ", end=""); print("world!")

Hello, world!


A statement can continue on the next line
- Explicit: the line ends with `\`
- Implicit: the line contains unclosed brackets


In [4]:
print("Hello, "
      + "world!")


Hello, world!
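
The cell above relies on implicit continuation. As a small sketch of both forms side by side (the variable names `total` and `animals` are only illustrative):

```python
# Explicit continuation: the backslash must be the last character on the line
total = 1 + 2 + \
        3 + 4

# Implicit continuation: an unclosed bracket keeps the statement going
animals = [
    "cat",
    "dog",
]

print(total)    # => 10
print(animals)  # => ['cat', 'dog']
```

Implicit continuation inside brackets is usually preferred, since a stray space after a backslash breaks the statement.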


Doing some math

In [5]:
# Math is what you would expect
1 + 1   # => 2
8 - 1   # => 7
10 * 2  # => 20
35 / 5  # => 7.0
# Modulo operation
7 % 3   # => 1
# Exponentiation (x**y, x to the yth power)
2 ** 3  # => 8


8

Boolean variables

In [6]:
# negate with not
not True   # => False
not False  # => True
# Boolean Operators
# Note "and" and "or" are case-sensitive
True and False  # => False
False or True   # => True

# True and False are actually 1 and 0 but with different keywords
True + True  # => 2
True * 8    # => 8
False - 5   # => -5

# Equality is ==
1 == 1  # => True
2 == 1  # => False

# Inequality is !=
1 != 1  # => False
2 != 1  # => True

# More comparisons
1 < 10  # => True
1 > 10  # => False
2 <= 2  # => True
2 >= 2  # => True


True

In other languages, code blocks (`if`, `for`, etc.) are usually delimited by specific symbols (e.g., `{` and `}`)

Java
    
    for (int i = 0; i < 5; i++) {
        if (i < 3) {
            System.out.println(i + "");
        }
    }

In Python, blocks are delimited by indentation, which also improves readability
- Each line introducing a block (e.g., `if`) ends with `:`
- Lines within the same block are indented with the same number of spaces
- An empty block contains the keyword `pass`

In [7]:
for i in range(0, 5):
    if i < 3:
        print(i)
    else:
        pass


0
1
2


Everything is an object: numbers, lists, functions, etc.
- Unlike Java, where `int` and `float` are primitive types, not objects

Objects have attributes and methods accessed via the "." syntax
- `object.attribute`
- `object.method()`
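
A minimal sketch of the "." syntax on built-in objects (the variable names are illustrative; the methods and attributes shown are standard ones on `str`, `int`, and `float`):

```python
number = 5
word = "cat"

print(word.upper())         # => CAT   (method of a str object)
print(number.bit_length())  # => 3     (method of an int object; 5 is 0b101)
print((2.5).is_integer())   # => False (floats are objects too)
print(number.real)          # => 5     (attribute of an int object)
```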

The object type determines the available attributes and operations
- Object types are known only at execution time (dynamic typing)
- Unlike Java, where object types are known at compile time (i.e., before execution)
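
As a rough illustration of types being resolved at execution time, the same name can refer to objects of different types, and an unsupported operation is reported only when that line actually runs (the name `value` is illustrative):

```python
value = 5
print(type(value))  # => <class 'int'>

value = "five"      # the same name now refers to a str object
print(type(value))  # => <class 'str'>

# The mismatch below is detected only when this line is executed
try:
    value + 1
except TypeError as error:
    print("TypeError:", error)
```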

The object `None` (with type `NoneType`) represents an absence of value
- Similar to `null` in Java (which, however, is not an object)
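
A short sketch of how `None` typically shows up in practice (the function `do_nothing` is a hypothetical example):

```python
result = None
print(type(result))    # => <class 'NoneType'>
print(result is None)  # => True (compare with "is", not "==")

# A function without a return statement returns None
def do_nothing():
    pass

print(do_nothing() is None)  # => True
```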

Python provides several built-in collection types
- Lists (`["cat", "cat", "dog"]`)
- Sets (`{"cat", "dog"}`)
- Dictionaries (`{"key": "value"}`)

Collections
- Can contain objects with heterogeneous types (e.g., the list `[1, "cat"]`)
- Can be nested (e.g., `["cat", ["cat", ["dog"]]]`)
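
For instance, a heterogeneous list and a nested list can be built and indexed as follows (a minimal sketch; the names `mixed` and `nested` are illustrative):

```python
mixed = [1, "cat", 3.5, None]       # elements of different types
nested = ["cat", ["cat", ["dog"]]]  # lists inside lists

print(len(mixed))       # => 4
print(nested[1])        # => ['cat', ['dog']]
print(nested[1][1][0])  # => dog
```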

Objects can be mutable or immutable
- Mutable: it is possible to add, remove, or replace elements (e.g., lists, sets, dictionaries)
- Immutable: it is not possible to modify the object after creation (e.g., numbers, booleans, strings)

Strings (`str`) are immutable sequences of characters
- A character is a string of length 1; there is no `char` type
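
The difference between mutable and immutable objects can be sketched like this (names are illustrative): replacing a list element works, while assigning to a string position raises a `TypeError`, so a new string has to be built instead.

```python
pets = ["cat", "dog"]
pets[0] = "mouse"  # lists are mutable: replacing an element works
print(pets)        # => ['mouse', 'dog']

name = "cat"
try:
    name[0] = "b"  # strings are immutable: item assignment fails
except TypeError:
    print("strings cannot be modified in place")

# Build a new string instead of modifying the old one
new_name = "b" + name[1:]
print(new_name)      # => bat
print(len(name[0]))  # => 1 (a single character is a str of length 1)
```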

In [8]:
# There are no declarations, only assignments.
# Convention is to use lower_case_with_underscores
some_var = 5

# Lists
li = []         # empty list
li.append(1)    # li is now [1]
li.append(2)    # li is now [1, 2]
li.append(4)    # li is now [1, 2, 4]
li.append(3)    # li is now [1, 2, 4, 3]
li[0]           # => 1
li[1:3]         # Slice from index 1 up to (but excluding) index 3 => [2, 4]
# Examine the length with "len()"
len(li)  # => 4

# You can start with a prefilled list
other_li = [1, 2, 3, 4]

# Dictionaries store mappings from keys to values
empty_dict = {}
# Here is a prefilled dictionary
filled_dict = {"one": 1, "two": 2, "three": 3}

# Sets
empty_set = set()
# Initialize a set with a bunch of values. Yeah, it looks a bit like a dict. Sorry.
some_set = {1, 1, 2, 2, 3, 4}  # some_set is now {1, 2, 3, 4}
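
The cell above only creates dictionaries and sets. As a hedged sketch of a few common operations on them (the values are the same as above, redefined here so the example is self-contained):

```python
filled_dict = {"one": 1, "two": 2, "three": 3}
print(filled_dict["one"])       # => 1
filled_dict["four"] = 4         # add a new key/value pair
print("four" in filled_dict)    # => True
print(filled_dict.get("five"))  # => None (get() avoids a KeyError)

some_set = {1, 2, 3, 4}
some_set.add(5)       # sets are mutable
some_set.add(1)       # adding a duplicate changes nothing
print(len(some_set))  # => 5
print(2 in some_set)  # => True
```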


Control flow

In [9]:
# Let's just make a variable
some_var = 5

# Here is an if statement. Indentation is significant in Python!
if some_var > 10:
    print("some_var is totally bigger than 10.")
elif some_var < 10:    # This elif clause is optional.
    print("some_var is smaller than 10.")
else:                  # This is optional too.
    print("some_var is indeed 10.")


"""
For loops iterate over lists
prints:
    dog is a mammal
    cat is a mammal
    mouse is a mammal
"""
for animal in ["dog", "cat", "mouse"]:
    # You can use format() to interpolate formatted strings
    print("{} is a mammal".format(animal))


some_var is smaller than 10.
dog is a mammal
cat is a mammal
mouse is a mammal
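
As an alternative to `format()`, Python 3.6 and later also support f-strings. The sketch below uses one inside a `for` loop and then loops over a `range` (the names `sentences` and `total` are illustrative):

```python
animals = ["dog", "cat", "mouse"]
sentences = []
for animal in animals:
    # f-strings interpolate the expression between braces
    sentences.append(f"{animal} is a mammal")
print(sentences[0])  # => dog is a mammal

# range(4) yields 0, 1, 2, 3
total = 0
for i in range(4):
    total += i
print(total)  # => 6
```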


Defining some functions

In [13]:
# Use "def" to create new functions
def add(x, y):
    return x + y  # Return values with a return statement


# Calling functions with parameters
add(5, 6)  # => returns 11


11
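
Functions can also take default values and be called with keyword arguments. A minimal sketch, using a hypothetical function `add_with_default` that is not part of the original notebook:

```python
def add_with_default(x, y=10):
    """Return the sum of x and y (y defaults to 10)."""
    return x + y

print(add_with_default(5, 6))      # => 11
print(add_with_default(5))         # => 15 (y takes its default value)
print(add_with_default(y=1, x=2))  # => 3  (keyword arguments can come in any order)
```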

Importing some modules

In [11]:
# You can import modules
import numpy as np
import pandas as pd
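
The same import forms work for any module. The sketch below uses only standard-library modules (`math` and `statistics`, chosen here so the example runs without extra installations) to show a plain import, an import of a single name, and an import with an alias:

```python
import math               # import the whole module
from math import sqrt     # import a single name from a module
import statistics as stats  # import with an alias, like "numpy as np"

print(math.floor(3.7))          # => 3
print(sqrt(16))                 # => 4.0
print(stats.median([1, 3, 2]))  # => 2
```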


Hands on!

In [12]:
# initialize a variable `foo` with value 10

# initialize a variable `bar` with value 5

# if `foo` is higher than `bar` then print "Higher", else print "Lower"
