# Lecture 2: Basics of Python



## Course aims
* To teach you to think like a programmer
    * Basic concepts
    * Google, StackOverflow



* To get you above the *I can do anything* threshold, or (more probably) close to it
    * Steep part of the learning curve
    * Requires a substantial investment



* There is data EVERYWHERE!
    * IES


##  Course Outline

During the course you will download data describing IES employees, merge it with student evaluations and analyze it.

The course is roughly divided into 3 blocks:
1. **Data collection** - *Developing a parser of the IES website and transforming the data into a well-structured format.* Besides web scraping, you will learn Python fundamentals, programming essentials and modern data-handling techniques.
2. **Data processing** - *Merging the data with student evaluations.* This block is dedicated to advanced data-science techniques in the Python ecosystem, especially Pandas, Matplotlib and NumPy.
3. **Data analysis** - *Work on your project.* The last block helps you gain hands-on experience and apply the knowledge from the first two blocks in your own project.


## The Final Project
**Description:**
* Students work in teams of 2
* The task is to download any data from an API or directly from the web. The data should be processed and visualized in a Jupyter Notebook. The project is to be submitted as a GitHub repository
* The selection of the data is entirely up to the students. 

**Deadlines:**

* March 27th: Project Topic First Submission
* April 10th: Project Topic Final Submission
* May 31st: Project Submission (TBC)

**Evaluation Criteria:** 
1. The project uses data correctly downloaded from a public API or website.
2. The data are cleaned appropriately.
3. The data are visualized.
4. The project is submitted as a public GitHub repository.
5. All team members collaborated on the GitHub repository (history is public in public repositories).
6. The code is readable, commented and appropriately structured.
7. The project contains one ready-to-run method for downloading the data.
8. The project is submitted as a Jupyter notebook.



# Intro
**Q: What is the difference between a coder and a programmer?**

**Q: What is the difference between a coder's code and a programmer's code?**

**Q: What is elegant code?**

# Python
> "There should be one—and preferably only one—obvious way to do it"

**Q: What does a programming language do?**

```
def hello():
    s = 'Hello World!'
    return s
    
print(hello())
```


## Python as a language
* General purpose
* Open-source
* Interpreted (not compiled)
    * Not type-safe: types are checked only when the code runs
    * Slower than compiled languages
* Object-oriented
* Easily extendable
* Convenient for beginners, yet very powerful
* Huge community!


![image.png](attachment:image.png)

# Syntax
* very clean
* readable
* (almost) no special characters


## 0-index
* first element `l[0]` 
* last element `l[len(l)-1]` or `l[-1]`
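
A minimal sketch of zero-based indexing. The list `l` and its contents are made up purely for illustration:

```python
# Hypothetical example list; any sequence behaves the same way.
l = ['a', 'b', 'c', 'd']

first = l[0]              # index 0 is the first element
last = l[len(l) - 1]      # the highest valid index is len(l) - 1
also_last = l[-1]         # negative indices count from the end

print(first, last, also_last)
```

Asking for `l[len(l)]` would raise an `IndexError`, which is one of the most common beginner mistakes.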




## Line structure

* Line breaks and indentation are part of the syntax!
* They separate the content of functions, classes, loops, conditions, etc. This is basically the most loved feature of Python

### Indentation
* both `space` and `tab` are accepted, but they must not be mixed within one block.
* to keep things simple, ALWAYS indent with `tab`!

An indented block is ALWAYS PRECEDED BY "`:`"

```
for i in range(10):
    print(i)
```

```
if i == 0:
    print('i is zero')
else:
    print('i is not zero')
```

If you need to break a long line without changing the meaning of the code, use line joins:

### Explicit line joins
```
if 1900 < year < 2100 and 1 <= month <= 12 \
   and 1 <= day <= 31 and 0 <= hour < 24 \
   and 0 <= minute < 60 and 0 <= second < 60:
   pass
```

### Implicit line joins
Expressions in parentheses `()`, square brackets `[]` or curly braces `{}` can be split over more than one physical line without using backslashes. For example:

```
month_names = ['January', 'February', 'March',      
               'April',   'May',      'June',       
               'July',    'August', 'September',  
               'October', 'November', 'December']

```
## Keywords and built-in functions
Do not use these as variable names!

In [None]:
import keyword
print(keyword.kwlist)
print(dir(__builtins__))

# Built-in Data Types
## Numerical

Python 3 has three built-in numerical types: integers, floats and complex numbers. (Python 2 also had a separate long integer type; in Python 3 integers have unlimited precision.)

We will only consider integers and floats, as you will most probably use only these two.

In [None]:
integer = 4
print('Integer: ',integer,type(integer))
floatn = 4.0
print('Float: ',floatn,type(floatn))

**Q: What if floats and integers are combined?**

**Q: Why not use only floats?**


### Python standard operators

* `+` (addition)
* `-` (subtraction)
* `*` (multiplication)
* `/` (division)

but also:

* `**` (exponent)
* `%` (modulus)
* `//` (floor division)
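
A short sketch of the operators listed above. The values `a` and `b` are arbitrary, chosen only for illustration:

```python
a = 7
b = 2

print(a + b)    # addition
print(a - b)    # subtraction
print(a * b)    # multiplication
print(a / b)    # true division always returns a float
print(a ** b)   # exponent
print(a % b)    # modulus (remainder after division)
print(a // b)   # floor division (rounds down to a whole number)

# Mixing an integer with a float gives a float.
mixed = a + 2.0
print(mixed, type(mixed))
```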


## Binary variables
`False` ... `0`

`True` ... `1`

What happens if:
```
b = True
print(b) 
print(not b) 
print(b + 1)
print(not b -1)
print(not 10)
```

### NO NEED TO REMEMBER THESE RELATIONS!!

There is only one, but very important, implication:
> These nuances matter a lot!

> ALWAYS PROCEED STEP BY STEP!

"I am still looking forward to the moment when I write a larger chunk of code and it works on the first run."

## Comparisons

![image.png](attachment:image.png)
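
As a sketch, assuming the table above covers the usual comparison operators, each of them returns `True` or `False`. The values of `x` and `y` are arbitrary:

```python
x = 5
y = 10

print(x == y)   # equal
print(x != y)   # not equal
print(x < y)    # less than
print(x <= y)   # less than or equal
print(x > y)    # greater than
print(x >= y)   # greater than or equal

# Comparisons can be chained, as in the explicit line-join example earlier.
in_range = 0 < x < y
print(in_range)
```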

## Assignments
![image.png](attachment:image.png)

In [None]:
x = 5

## Control flows

### For loop
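
A minimal sketch of the two most common forms of the `for` loop. The list `names` is a hypothetical example:

```python
names = ['Gregor', 'Baruník', 'Janda']   # hypothetical list

# Iterate directly over the elements of a sequence.
for name in names:
    print(name)

# range(n) yields 0, 1, ..., n - 1, which is handy for counting.
total = 0
for i in range(5):
    total += i   # adds 0 + 1 + 2 + 3 + 4
print(total)
```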

### If statement
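
A small sketch of `if` / `elif` / `else`. The helper `describe_number` is hypothetical and serves only as an illustration:

```python
def describe_number(n):
    """Hypothetical helper: classify a number with if / elif / else."""
    if n < 0:
        return 'negative'
    elif n == 0:
        return 'zero'
    else:
        return 'positive'

for n in [-3, 0, 7]:
    print(n, 'is', describe_number(n))
```

Only the first branch whose condition is true is executed; `else` catches everything that is left.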

# Sequences

* All of them can be iterated over (technically, they are objects that implement `__iter__()`, which returns an iterator with a `__next__()` method) - you can make your own iterables!
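
To illustrate making your own iterable, here is a minimal sketch. The class `Countdown` is a hypothetical example, not something used elsewhere in the course:

```python
class Countdown:
    """Hypothetical iterable that counts down from `start` to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself from __iter__().
        return self

    def __next__(self):
        # Signal the end of the iteration with StopIteration.
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

for number in Countdown(3):
    print(number)
```

The `for` loop calls `__iter__()` once and then `__next__()` repeatedly until `StopIteration` is raised.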

### Strings
* An immutable sequence of characters, written in matching quotes:
```
"'"
```
or 
```
'"'
```
or
```
""" this is 
multi-line
string"""
```


In [None]:
s = 'Institute of Economic Studies, Faculty of Social Sciences, Charles University in Prague'
s

You can subset using `[]`:

In [None]:
s[31:38]

`split` a string by another string:

In [None]:
for part in s.split(', '):
    print(part)

The `format` method is very useful for working with strings!

In [None]:
s1 = '<NAME>'
s2 = '<REASON>'

'The best teacher on IES is {}. He is cool because {}'.format(s1,s2)

You can `replace` parts of a string with another string:
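
A minimal sketch of `str.replace`, using a shortened, illustrative version of the string from above:

```python
s = 'Institute of Economic Studies'

# replace() returns a new string; the original is left unchanged.
shorter = s.replace('Institute of Economic Studies', 'IES')
print(shorter)

# Every occurrence of the first argument is replaced.
dashed = s.replace(' ', '-')
print(dashed)
print(s)
```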

The `strip()` method is useful for removing whitespace from the beginning and end of a string:

In [None]:
'   IES FSV UK    '.strip()

## List

In [None]:
l = ['Gregor','Horváth','Baruník','Bauer','Havránek','Janda','Janský']
for el in l:
    print(el)

### Subsetting sequences:
first element:

last element:

third to fifth element

first three elements
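
The four cases above can be sketched as follows. The list `l` is repeated here only to keep the snippet self-contained:

```python
l = ['Gregor', 'Horváth', 'Baruník', 'Bauer', 'Havránek', 'Janda', 'Janský']

first = l[0]              # first element
last = l[-1]              # last element
third_to_fifth = l[2:5]   # start index is included, stop index is excluded
first_three = l[:3]       # an omitted start means "from the beginning"

print(first)
print(last)
print(third_to_fifth)
print(first_three)
```

The same syntax works for strings and tuples.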

Lists can combine objects of any type! Be careful and think in advance!

In [None]:
l + ['Kukačka']

In [None]:
l.append('Kukačka') # Beware: this is an in-place operation (TRY TO RUN SEVERAL TIMES)
l

### Testing membership

In [None]:
'Kukačka' in l

### List of lists
can represent a table:

In [None]:
tbl = [['Name','role','age'],['Martin Gregor','director of IES',18],['Jozef Baruník','econometric guru',15]]
tbl

### List comprehension

A convenient way to write simple for loops over lists in one line

Example:

```
l = []
for empl in tbl:
    if empl[0] != 'Name':
        l.append('{} ({})'.format(empl[0],empl[1]))
l
```
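
The loop above can be rewritten as a list comprehension. This is a sketch; `tbl` is repeated so that the snippet runs on its own:

```python
tbl = [['Name', 'role', 'age'],
       ['Martin Gregor', 'director of IES', 18],
       ['Jozef Baruník', 'econometric guru', 15]]

# General shape: [expression for item in iterable if condition]
l = ['{} ({})'.format(empl[0], empl[1]) for empl in tbl if empl[0] != 'Name']
print(l)
```

The result is the same list as the one built by the loop, just written in one line.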

### Tuples
 * immutable lists
 * constants and sets of information
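
A minimal sketch of what immutability means in practice, using the same coordinates as the cell below:

```python
coord = (50.082, 14.431)

# Tuples support indexing and unpacking just like lists.
latitude, longitude = coord
print(coord[0], latitude, longitude)

# Assigning to an element raises TypeError because tuples are immutable.
try:
    coord[0] = 0.0
except TypeError as e:
    print('Cannot modify a tuple:', e)
```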

In [None]:
coord = (50.082,14.431)

## Dictionaries
* extremely important datatype for more complex data structures
* key-value pairs
* values can be any object - strings, numbers, but also other dictionaries and lists
* not sorted by key (since Python 3.7 a dictionary keeps the order in which the keys were inserted)
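
A short sketch of the most common dictionary operations. The record `teacher` and all its values are hypothetical:

```python
# Hypothetical record; the keys mirror those used in this lecture.
teacher = {'name': 'Jane Doe', 'role': 'lecturer', 'courses': ['X001', 'X002']}

print(teacher['name'])                # look up a value by its key
teacher['age'] = 40                   # add (or overwrite) a key
print(teacher.get('office'))          # get() returns None for a missing key
print(teacher.get('office', 'n/a'))   # ... or a default that you supply

# Iterate over key-value pairs.
for key, value in teacher.items():
    print(key, '->', value)

# Values can be nested lists or dictionaries.
print(len(teacher['courses']))
```

Using `teacher['office']` instead of `get()` would raise a `KeyError`.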

In [None]:
greg = {'name':'Martin Gregor','role':'director of IES','age':18,'courses':['JEM013','JEB064']}
greg

In [None]:
bar = {'name':'Jozef Baruník','role':'econometric guru','age':15,'courses':['JEM005','JEM116','JED414','JED415','JED412','JEM059','JEM061']}
list_of_dicts = [greg,bar]
list_of_dicts

#  Functions

**Q: What is a function?**

* A function is the basic Lego brick of a program.

* Most often it will `return something`. The better your code, the more useful `something` is.

* In good code, all of the functionality is hidden in various functions (and classes).

```
def functionName(input1,input2):
    result = input1 + input2   # do something with the inputs ...
    return result
```

In [None]:
def describeEmployee(d):
    '''
    retrieves important information about the teacher from the input dictionary and returns it as a string
    
    Input: dictionary with the name, role and age and courses keys
    Output: String with information
    '''
    result = '{}, {}, is at least {} years old. '.format(d['name'],d['role'],d['age'])
    
    result += 'He teaches {} courses. '.format(len(d['courses']))
    
    if len(d['courses']) > 5:
        result += 'Probably he is a teaching-superhero!'
    return result

In [None]:
for el in list_of_dicts:
    info = describeEmployee(el)
    print(info)

### How to write functions?

* A function should do just one thing!
* Use wrapping functions!

**Example**

We need to aggregate data and plot it.

We will write three functions:

```
def aggregateData(param1,param2,data):
    # ... perform aggregation ...
    return aggregData
    
def plotAggregatedData(aggregData,plotParam1,plotParam2):
    # ... perform plotting ...
    return plot
    
def plotAndAggregateData(data,param1,param2,plotParam1,plotParam2):
    # aggregate data
    aggregData = aggregateData(param1,param2,data)
    # plot the aggregated data
    plot = plotAggregatedData(aggregData,plotParam1,plotParam2)
    return (plot,aggregData)
```
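
The skeleton above can be turned into a runnable sketch. Everything specific here is an assumption made for illustration: the aggregation simply counts records per key, the "plot" is a text bar chart standing in for a real plotting call, and the parameters `key` and `symbol` are hypothetical:

```python
def aggregateData(data, key):
    """Count how many records share each value of `key`."""
    counts = {}
    for record in data:
        counts[record[key]] = counts.get(record[key], 0) + 1
    return counts

def plotAggregatedData(aggregData, symbol='#'):
    """'Plot' the counts as text bars (a stand-in for a real plotting call)."""
    lines = ['{:<10} {}'.format(label, symbol * count)
             for label, count in aggregData.items()]
    return '\n'.join(lines)

def plotAndAggregateData(data, key, symbol='#'):
    """Wrapper: each step stays in its own single-purpose function."""
    aggregData = aggregateData(data, key)
    plot = plotAggregatedData(aggregData, symbol)
    return (plot, aggregData)

# Hypothetical input data.
data = [{'role': 'teacher'}, {'role': 'student'}, {'role': 'teacher'}]
plot, aggregData = plotAndAggregateData(data, 'role')
print(plot)
print(aggregData)
```

Because aggregation and plotting are separate, either one can be tested or reused on its own, while the wrapper keeps the common case to a single call.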


### Default parameter values

If interested, see `*args` and `**kwargs` - passing a variable number of arguments as a tuple or a dictionary
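
A minimal sketch of default parameter values, followed by `*args` and `**kwargs`. The functions `greet` and `summarize` are hypothetical examples:

```python
def greet(name, greeting='Hello', punctuation='!'):
    """Hypothetical function with two default parameter values."""
    return '{} {}{}'.format(greeting, name, punctuation)

print(greet('IES'))                     # both defaults are used
print(greet('IES', 'Welcome to'))       # positional override of `greeting`
print(greet('IES', punctuation='?'))    # keyword override of `punctuation`

def summarize(*args, **kwargs):
    """Extra positional arguments arrive as a tuple, keyword ones as a dict."""
    return len(args), sorted(kwargs)

print(summarize(1, 2, 3, unit='cm', scale=2))
```

Parameters with defaults must come after the parameters without them in the function definition.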

# Basic programming principles

A program is a set of instructions combined with data.

> *“Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away.”* (Antoine de Saint Exupery)

1. Set naming standards and keep to them - e.g. the data object for teachers will be a dictionary with keys 'name', 'age' and 'role'.
2. Separate algorithmic logic from the data - functions should be *general*, yet targeted at a specific purpose.
3. A programmer is lazy
4. Plan before you start! Be sure you know how you will proceed before the actual coding
5. The art of programming is essentially the art of googling.

Your variable names as well as function names should really describe their content - do not use `var1` etc. 

Names like `data` are OK when you know you have only one source of data. But the more complex the program is, the more you need descriptive names.


## Scope of variables

Is my variable visible outside of the function? 

![image.png](attachment:image.png)
### Global variables

In [None]:
a = 5
def doSomethingGlobal():
    global a
    a = a + 2
doSomethingGlobal()
a

### Local variables

In [None]:
a = 5 
def doSomethingLocal(a):
    return a + 2
a = doSomethingLocal(a)
a

### Many types are assigned by reference!!!

In [None]:
x = ['Horváth','Baruník','Gregor']
y = x

y[1] = 'Kukačka'
x
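
If you need an independent list rather than a second name for the same one, make a copy. A minimal sketch, where the name `z` is introduced only for illustration:

```python
x = ['Horváth', 'Baruník', 'Gregor']

y = x          # y is another name for the same list object
z = x.copy()   # z is a new list with the same elements (shallow copy)

y[1] = 'Kukačka'

print(x)       # changed, because x and y refer to one object
print(z)       # unchanged
print(y is x, z is x)
```

For nested structures such as lists of lists, `copy.deepcopy` from the standard library copies the inner objects as well.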

# Error Handling

* GOOGLE!!!!!
* The computer is always right; it is you who did not understand the computer

* If an error is produced, do not panic! Read it!

* Worse problems arise when there are no errors, yet the results are unexpected - the mistake is then in a wrong understanding of the logic => DEBUGGING!!!

Most common errors:
1. Incorrect indentation (IndentationError, a kind of SyntaxError) - Jupyter indents automatically, but still watch out for it!
2. Incorrect values - the program expects different values (ValueError)
3. Non-existing key in dictionary (KeyError)
4. Assignment operator `=` used instead of the comparison `==` (SyntaxError)
5. Variable not found (NameError) - remember Python is case-sensitive!
6. Zero-indexing!!!! (IndexError when you ask for `l[len(l)]`)



In [None]:
for i in range(5,-5,-1):
    print(5/i)

### Try and except

In [None]:
for i in range(5,-6,-1):
    try:
        print(5/i)
    except Exception as e:
        print(e)
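
The cell above catches every `Exception`. As a sketch, you can also catch specific error types from the list of common errors; the variables `d`, `l` and `caught` are hypothetical:

```python
d = {'name': 'Martin Gregor'}
l = [1, 2, 3]
caught = []

try:
    int('abc')        # the text is not a valid number
except ValueError as e:
    caught.append('ValueError')
    print('ValueError:', e)

try:
    d['age']          # the key does not exist
except KeyError as e:
    caught.append('KeyError')
    print('KeyError:', e)

try:
    Caught            # wrong capitalization of `caught`
except NameError as e:
    caught.append('NameError')
    print('NameError:', e)

try:
    l[len(l)]         # the last valid index is len(l) - 1
except IndexError as e:
    caught.append('IndexError')
    print('IndexError:', e)

print(caught)
```

Catching a specific type means that any other, unexpected error is still reported instead of being silently swallowed.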