# 1 Python Language Basics

## Python Installation

[See the videos for the video-demos of these steps]
1.	We will install the Anaconda distribution to get Python. 
    Why: Anaconda is popular because it bundles many of the tools used in data science and machine learning into a single install.
 

2.	Search for “anaconda”, or go to https://www.anaconda.com/distribution/


3.	Check whether your operating system is 64-bit or 32-bit, and install the matching 64-bit or 32-bit version of Python 3.8.


4.	To update the installation, run:
                    
                    conda update conda
                    conda update --all


5.	From your command prompt run: 

            'jupyter notebook' or 
            'jupyter lab' [you will see it running if installed correctly]
    
    * If you have an Internet connection, you can run Jupyter online without installing Anaconda locally. 
            Go to https://jupyter.org/try and then select ‘Try JupyterLab’.
        

6.	Install “jupyter-matplotlib” (the `ipympl` package). Built on the Jupyter interactive widgets framework, jupyter-matplotlib enables the interactive features of matplotlib in Jupyter Notebook and JupyterLab. For more, see https://github.com/matplotlib/jupyter-matplotlib. Run the following command:
                
                conda install -c conda-forge ipympl
                

7.	You can now use Python in three different ways:
        * **Interactive mode**:  
                From the command prompt, run: ‘ipython’
        * **Jupyter Notebook mode**: 
                From the command prompt, run: ‘jupyter notebook’ or ‘jupyter lab’
        * **Script mode**: 
                Put your code in a file, say ‘file1.py’, and from the command prompt, run: ‘ipython file1.py’ or ‘python file1.py’



### References:
    [1] Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython, by Wes McKinney, 2nd ed., O'Reilly, 2017.
    [2] Intro to Python for Computer Science and Data Science: Learning to Program with AI, Big Data and The Cloud, by Paul Deitel and Harvey Deitel, 2019/2020, ISBN-10: 0-13-540467-3.

## Language Basics [1]

Here, we will have an overview of essential Python programming concepts and language mechanics. 

## Language Semantics
**Indentation, not braces**: Python uses whitespace (tabs or spaces) to structure code. 
Consider a for loop from a sorting algorithm:
```python
for x in array:
    if x < pivot:
        less.append(x)
    else:
        greater.append(x)
```
A colon denotes the start of an indented code block.

*Four spaces* is the default indentation adopted by the vast majority of Python programmers.
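Because `array`, `pivot`, `less`, and `greater` are not defined in the fragment above, here is a runnable version with made-up sample values (the data is purely illustrative). The indentation alone decides which statements belong to the loop, the `if`, and the `else`:

```python
# Made-up inputs for the partition step shown above
array = [7, 2, 9, 4, 1, 8]
pivot = 5

less = []
greater = []

for x in array:           # the colon opens the loop body
    if x < pivot:         # a nested block is indented one more level
        less.append(x)
    else:
        greater.append(x)

print(less)     # [2, 4, 1]
print(greater)  # [7, 9, 8]
```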

#### Everything is an object
* An important characteristic of the Python language is the consistency of its object model. 
* Every number, string, data structure, function, class, module, and so on that exists in the Python interpreter is a Python object.
* Each object has an associated type (e.g., string or function) and internal data. 
* In practice this makes the language very flexible, as even functions can be treated like any other object.
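As a small illustration of functions being ordinary objects, the sketch below binds a function to a second name and stores it in a list alongside the built-ins `abs` and `str`. The function `square` is a hypothetical example:

```python
def square(n):
    """Return n multiplied by itself (a hypothetical example function)."""
    return n * n

# A function can be bound to another name, just like any other object
alias = square

# ...and stored in a data structure next to built-in functions
operations = [square, abs, str]

results = []
for op in operations:
    results.append(op(-3))

print(type(square))  # <class 'function'>
print(alias(4))      # 16
print(results)       # [9, 3, '-3']
```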


#### Comments
Any text preceded by the hash mark (pound sign) # is ignored by the Python interpreter and considered as a comment.

For example:

        print("Hello World!")  # The rest of the text is a comment.

#### Function and object method calls
You call functions using parentheses and passing zero or more arguments, optionally assigning the returned value to a variable:

```
result = f(x, y, z)
g()
```

Almost every object in Python has attached functions, known as *methods*, that have access to the object’s internal contents. You can call them using the following syntax:

```
obj.some_method(x, y, z)
```
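Filling in the placeholders above with hypothetical definitions of `f` and `g` (they exist only for this example), and using two real `str` methods, `upper` and `replace`, gives a runnable version of both call styles:

```python
def f(x, y, z):
    # Hypothetical function: combine three arguments
    return x + y * z

def g():
    # Hypothetical function with no arguments and no return statement
    print('g was called')

result = f(1, 2, 3)   # 1 + 2 * 3 = 7
g()                   # prints: g was called

text = 'hello world'
shout = text.upper()                      # 'HELLO WORLD'
swapped = text.replace('world', 'there')  # 'hello there'
print(result, shout, swapped)
```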

#### Variables and argument passing
When assigning a variable (or name) in Python, you are creating a *reference* to the object on the righthand side of the equals sign. In practical terms, consider a list of integers:

In [6]:
a = [1, 2, 3, 4]

In [7]:
print(a)

[1, 2, 3, 4]


In [9]:
b = a

In Python, a and b actually now refer to the same object, the original list [1, 2, 3, 4] (see Fig. 1 below). You can prove this to yourself by appending an element to a and then examining b:

<img src="./python_language_basics/image/reference.png" width="400" border="1">

**Fig. 1**: Two references for the same object.

In [10]:
a.append(5)
b

[1, 2, 3, 4, 5]
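The same reference semantics apply to argument passing, which the heading mentions but the example above does not show. When you pass an object to a function, the parameter becomes a new name for that same object, so mutating it inside the function is visible to the caller, while assigning a new object to the parameter is not. The function names below are hypothetical:

```python
def append_element(some_list, element):
    # some_list is a new local name bound to the caller's list object
    some_list.append(element)

def rebind(some_list):
    # Assigning to the parameter only rebinds the local name;
    # the caller's variable still refers to the original list
    some_list = [0]
    return some_list

data = [1, 2, 3]
append_element(data, 4)
print(data)   # [1, 2, 3, 4]  (the mutation is visible to the caller)

other = [1, 2, 3]
rebind(other)
print(other)  # [1, 2, 3]     (the rebinding is not)
```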

#### Dynamic references, strong types
In contrast with many compiled languages, such as Java and C++, object *references* in Python have no type associated with them. There is no problem with the following: 

In [11]:
a = 5
type(a)

int

In [12]:
a = 'foo'
type(a)

str

In [13]:
a = 4.5
b = 2
# String formatting, to be discussed later
print('a is {0}, b is {1}'.format(type(a), type(b)))
a / b

a is <class 'float'>, b is <class 'int'>


2.25
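The `a / b` cell above shows one of the few implicit conversions Python performs (an `int` combined with a `float`). The “strong types” part of the heading refers to the opposite case: objects keep a definite type, and mixing incompatible ones, such as a string and an integer, raises a `TypeError` instead of guessing. The exception is caught here only so the snippet runs to completion:

```python
try:
    '5' + 5
except TypeError as exc:
    # The exact message wording may vary between Python versions
    error_message = str(exc)
    print('TypeError:', error_message)

# Converting explicitly states which meaning of + you want
joined = '5' + str(5)  # string concatenation
total = int('5') + 5   # integer addition
print(joined, total)   # 55 10
```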

In [14]:
#You can check that an object is an instance of a particular type using the isinstance function:
a = 5
isinstance(a, int)

True

In [15]:
a = 'foo'

In [16]:
a

'foo'

In [17]:
a.split("o")

['f', '', '']


#### Attributes and methods
Objects in Python typically have both attributes (other Python objects stored “inside” the object) and methods (functions associated with an object that can have access to the object’s internal data). Both of them are accessed via the syntax obj.attribute_name:

```python
In [1]: a = 'foo'

In [2]: a.<Press Tab>
a.capitalize  a.format      a.isupper     a.rindex      a.strip
a.casefold    a.index       a.join        a.rjust       a.swapcase
a.center      a.isalnum     a.ljust       a.rpartition  a.title
a.count       a.isalpha     a.lower       a.rsplit      a.translate
a.encode      a.isdigit     a.lstrip      a.rstrip      a.upper
a.endswith    a.islower     a.partition   a.split       a.zfill
a.expandtabs  a.isspace     a.replace     a.splitlines
a.find        a.istitle     a.rfind       a.startswith
```
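The same dot syntax gives access to both kinds of members. Complex numbers make a compact illustration: `real` and `imag` are attributes (plain data), while `conjugate` is a method that must be called with parentheses:

```python
z = 3 + 4j

print(z.real)         # 3.0     attribute: no parentheses
print(z.imag)         # 4.0
print(z.conjugate())  # (3-4j)  method: called with parentheses

# Referencing a method without calling it just gives the method object
method = z.conjugate
print(callable(method))  # True
```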

#### Imports
In Python a module is simply a file with the .py extension containing Python code. Suppose that we had the following module:

```python
# some_module.py
PI = 3.14159

def f(x):
    return x + 2

def g(a, b):
    return a + b
```

If we wanted to access the variables and functions defined in some_module.py, from another file in the same directory we could do:

```python
import some_module
result = some_module.f(5)
pi = some_module.PI
```

Or equivalently:

```python
from some_module import f, g, PI
result = g(5, PI)
```

By using the **as** keyword you can give imports different variable names:

```python
import some_module as sm
from some_module import PI as pi, g as gf

r1 = sm.f(pi)
r2 = gf(6, pi)
```
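The same three import forms work with any module, including those in the standard library. In this sketch the built-in `math` module stands in for the hypothetical `some_module`:

```python
import math
import math as m
from math import sqrt, pi as PI_VALUE

print(math.sqrt(16))      # 4.0
print(sqrt(16))           # 4.0, the same function imported by name
print(m.floor(PI_VALUE))  # 3

# A module is loaded only once; both names refer to the same module object
print(m is math)          # True
```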

#### Binary operators and comparisons

In [18]:
5 - 7

-2

In [19]:
12 + 21.5

33.5

In [20]:
5 <= 2

False

See Table 1 below for all of the available binary operators:

**Table 1**: Binary operators
<img src="./python_language_basics/image/Table1_bin_op.png" width="650" border="1">
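Since Table 1 is an image, here is a quick runnable check of several common binary operators on arbitrary sample values. On integers, `&`, `|`, and `^` operate bit by bit; on booleans they behave as logical AND, OR, and XOR:

```python
a, b = 7, 2

print(a + b, a - b, a * b)  # 9 5 14
print(a / b)                # 3.5  true division
print(a // b)               # 3    floor division
print(a ** b)               # 49   a raised to the power b

# Bitwise on integers: 7 is 0b111 and 2 is 0b010
print(a & b, a | b, a ^ b)  # 2 7 5

# Logical on booleans
print(True & False, True | False, True ^ True)  # False True False

print(a == b, a != b, a >= b)  # False True True
```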



To check if two references refer to the same object, use the `is` keyword. `is not` is also perfectly valid if you want to check that two objects are not the same:

In [21]:
a = [1, 2, 3]
b = a
c = list(a)

In [22]:
print(c)

[1, 2, 3]


In [23]:
a is b

True

In [31]:
a is not c

True

Since `list` always creates a new Python list (i.e., a copy), we can be sure that c is distinct from a. Comparing with `is` is not the same as the `==` operator, because in this case we have:

In [None]:
a == c

True

A very common use of `is` and `is not` is to check if a variable is None, since there is only one instance of None:

In [29]:
a = None
a is None

True

#### Mutable and immutable objects
Most objects in Python, such as

* lists,
* dicts,
* NumPy arrays, and
* most user-defined types (classes),

are **mutable**. This means that these objects, or the values they contain, can be modified:

In [3]:
a_list = ['foo', 2.4, [4, 5]]
print(a_list)

['foo', 2.4, [4, 5]]


In [33]:
a_list[2] = (3, 4)
a_list

['foo', 2.4, (3, 4)]

**Important**: Others, like strings and tuples, are immutable:

In [1]:
a_tuple = (3, 5, (4, 5))
a_tuple

(3, 5, (4, 5))

In [4]:
a_tuple[1] = 4

TypeError: 'tuple' object does not support item assignment
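An uncaught `TypeError` stops a program, so here is a sketch that catches it and then shows the usual workaround: since a tuple cannot be changed in place, build a new tuple from pieces of the old one (slicing is discussed under Strings below):

```python
a_tuple = (3, 5, (4, 5))

try:
    a_tuple[1] = 4
except TypeError as exc:
    print('TypeError:', exc)

# To "change" an immutable object, create a new one instead
new_tuple = a_tuple[:1] + (4,) + a_tuple[2:]
print(new_tuple)  # (3, 4, (4, 5))
print(a_tuple)    # (3, 5, (4, 5))  the original is unchanged
```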

### Scalar Types
Python, along with its standard library, has a small set of built-in types for handling numerical data, strings, boolean (True or False) values, and dates and times. These “single value” types are sometimes called *scalar types* or simply *scalars*. See Table 2 (below) for a list of the main scalar types. 

**Table 2**: Standard Python scalar types
<img src="./python_language_basics/image/Table2_std_scalar.png" width="500" border="1">


Date and time handling will be discussed separately, as these are provided by the datetime module in the standard library.

#### Numeric types
The primary Python types for numbers are `int` and `float`. An `int` can store arbitrarily large numbers:

In [6]:
ival = 17239871
ival ** 6

26254519291092456596965462913230729701102721

In [8]:
fval = 7.243
fval2 = 6.78e-5

Dividing with `/` always produces a floating-point number, even when both operands are integers:

In [5]:
3 / 2

1.5

Use the floor division operator `//` to get integer division that rounds down (toward negative infinity); for positive operands this simply drops the fractional part:

In [9]:
3 // 2

1
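A few more divisions make the difference between `/` and `//` concrete, including the negative case, where rounding down is not the same as dropping the fractional part:

```python
true_div = 7 / 2        # 3.5
whole = 4 / 2           # 2.0: `/` returns a float even for a whole result
floor_pos = 7 // 2      # 3
floor_neg = -7 // 2     # -4: rounds down, toward negative infinity
floor_float = 7.5 // 2  # 3.0: floor division involving a float gives a float
print(true_div, whole, floor_pos, floor_neg, floor_float)
```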

#### Strings
Many people use Python for its powerful and flexible built-in string processing capabilities. You can write *string literals* using either single quotes ' or double quotes ":


```python
a = 'one way of writing a string'
b = "another way"
```

For multiline strings with line breaks, you can use triple quotes, either `'''` or `"""`:

In [None]:
c = """
This is a longer string that
spans multiple lines
"""

It may surprise you that this string `c` actually contains four lines of text; the line breaks after the opening `"""` and after each line are included in the string. We can count the newline characters with the `count` method on `c`:

In [None]:
c.count('\n')

3

As we already mentioned, Python strings are immutable; you cannot modify a string:

In [10]:
a = 'this is a string'
a[10] = 'f'

TypeError: 'str' object does not support item assignment

In [11]:
b = a.replace('string', 'longer string')
b

'this is a longer string'

After this operation, the variable `a` is unmodified:

In [None]:
a

'this is a string'

Many Python objects can be converted to a string using the `str` function:

In [12]:
a = 5.6
s = str(a)
print(s+' abc')

5.6 abc


Strings are a sequence of Unicode characters and therefore can be treated like other sequences, such as lists and tuples:

In [13]:
s = 'python'
list(s)

['p', 'y', 't', 'h', 'o', 'n']

In [14]:
s[:3]

'pyt'

The syntax `s[:3]` is called *slicing* and is implemented for many kinds of Python sequences.
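A few more slices of the same string show the general `start:stop:step` form; the stop index is always excluded, and negative values count from the end:

```python
s = 'python'

print(s[1:4])   # yth: indexes 1, 2 and 3
print(s[3:])    # hon: from index 3 to the end
print(s[-2:])   # on: the last two characters
print(s[::2])   # pto: every second character
print(s[::-1])  # nohtyp: a step of -1 reverses the sequence

numbers = [10, 20, 30, 40]
print(numbers[1:3])  # [20, 30]: the same syntax works on lists
```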

The backslash character `\` is an *escape character*, meaning that it is used to specify special characters like the newline `\n` or Unicode characters. To write a string literal with backslashes, you need to escape them:

In [15]:
s = '12\\34'
print(s)

12\34
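Checking lengths confirms that each escape sequence stands for a single character, even though it takes two or more characters to type. The accented string is just an arbitrary example of a Unicode escape:

```python
s = '12\\34'
print(len(s))        # 5: the escape sequence \\ is one backslash character

newline = '\n'
print(len(newline))  # 1: \n is one newline character

accented = 'caf\u00e9'  # \u followed by four hex digits is a Unicode escape
print(accented)         # café
```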


Adding two strings together concatenates them and produces a new string:

In [16]:
a = 'this is the first half '
b = 'and this is the second half'
a + b

'this is the first half and this is the second half'

String templating or formatting is another important topic. String objects have a format method that can be used to substitute formatted arguments into the string, producing a new string:

In [18]:
template = '{0:.2f} {1:s} are worth US${2:d}'
template

'{0:.2f} {1:s} are worth US${2:d}'

In the above string,
* `{0:.2f}` means to format the first argument as a floating-point number with two decimal places.
* `{1:s}` means to format the second argument as a string.
* `{2:d}` means to format the third argument as an exact integer.

To substitute arguments for these format parameters, we pass a sequence of arguments to the format method:

In [19]:
template.format(4.5560, 'Argentine Pesos', 1)

'4.56 Argentine Pesos are worth US$1'
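The template can be reused with other arguments, and the format codes are enforced: `{2:d}` accepts integers only, so passing a float such as `1.0` raises a `ValueError`. The currency name and amounts below are made up:

```python
template = '{0:.2f} {1:s} are worth US${2:d}'

# Reuse the same template with other (made-up) arguments
print(template.format(2 / 3, 'Example Dollars', 5))
# 0.67 Example Dollars are worth US$5

# {2:d} rejects a float, even one with no fractional part
rejected = False
try:
    template.format(4.5560, 'Argentine Pesos', 1.0)
except ValueError as exc:
    rejected = True
    print('ValueError:', exc)
```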

#### Booleans
The two boolean values in Python are written as `True` and `False`. Comparisons and other conditional expressions evaluate to either `True` or `False`. Boolean values are combined with the `and` and `or` keywords:

In [20]:
True and True

True

In [21]:
False or True

True

#### Type casting
The str, bool, int, and float types are also functions that can be used to cast values to those types:

In [22]:
s = '3.14159'
fval = float(s)
type(fval)

float

In [23]:
int(fval)

3

In [24]:
bool(fval)

True

In [25]:
bool(0)

False
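A few more casts are worth knowing because their results can be surprising: `int` truncates toward zero rather than rounding, `bool` treats empty values as False and any non-empty string as True, and `int` refuses a string that looks like a float (convert it with `float` first):

```python
print(int('42'))      # 42: a string holding an integer can be parsed
print(int(-3.9))      # -3: int() truncates toward zero
print(bool(''))       # False: the empty string
print(bool('False'))  # True: any non-empty string, whatever it says
print(bool([]))       # False: an empty list

parse_failed = False
try:
    int('3.14159')    # int() does not parse a string with a decimal point
except ValueError:
    parse_failed = True
print(parse_failed)   # True
```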

#### None
`None` is the Python null value type. If a function does not explicitly return a value, it implicitly returns None:

In [26]:
a = None
a is None

True

In [28]:
b = 5
b is not None

True

`None` is also a common default value for function arguments:

```python
def add_and_maybe_multiply(a, b, c=None):
    result = a + b

    if c is not None:
        result = result * c

    return result
```
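Calling the function above shows both behaviors described in this subsection: the `None` default for `c`, and the implicit `None` returned by a function without a `return` statement (`greet` is a hypothetical helper). Testing `c is not None`, rather than just `if c:`, keeps `c=0` from being mistaken for a missing argument:

```python
def add_and_maybe_multiply(a, b, c=None):
    result = a + b
    if c is not None:
        result = result * c
    return result

def greet(name):
    # Hypothetical function with no return statement
    print('Hello,', name)

print(add_and_maybe_multiply(2, 3))     # 5: c keeps its default, None
print(add_and_maybe_multiply(2, 3, 4))  # 20
print(add_and_maybe_multiply(2, 3, 0))  # 0: 0 is treated as a real value

returned = greet('Ada')                 # prints: Hello, Ada
print(returned is None)                 # True
```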

While a technical point, it’s worth bearing in mind that `None` is not only a reserved keyword but also a unique instance of `NoneType`:

In [29]:
type(None)

NoneType

#### Dates and times
The built-in Python datetime module provides `datetime`, `date`, and `time` types. The `datetime` type, as you may imagine, combines the information stored in `date` and `time` and is the most commonly used:

In [30]:
from datetime import datetime, date, time
dt = datetime(2011, 10, 29, 20, 30, 21, 50)
dt.year

2011

In [31]:
dt.month

10

In [32]:
dt.day

29

In [33]:
dt.hour

20

In [34]:
dt.minute

30

In [35]:
dt.second

21

Given a `datetime` instance, you can extract the equivalent `date` and `time` objects by calling methods on the `datetime` of the same name:

In [38]:
dt

datetime.datetime(2011, 10, 29, 20, 30, 21, 50)

In [39]:
dt.date()

datetime.date(2011, 10, 29)

In [40]:
dt.time()

datetime.time(20, 30, 21, 50)

The `strftime` method formats a `datetime` as a string:

In [41]:
dt.strftime('%m/%d/%Y %H:%M')

'10/29/2011 20:30'

Strings can be converted (parsed) into `datetime` objects with the `strptime` function:

In [42]:
datetime.strptime('20091031', '%Y%m%d')

datetime.datetime(2009, 10, 31, 0, 0)
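Because `strftime` and `strptime` use the same format codes, a string produced with one format can be parsed back with that format. In this sketch the microseconds are left out of `dt`, since the format string does not include them and they would otherwise be lost in the round trip:

```python
from datetime import datetime

fmt = '%Y-%m-%d %H:%M:%S'
dt = datetime(2011, 10, 29, 20, 30, 21)

text = dt.strftime(fmt)
print(text)            # 2011-10-29 20:30:21

# Parsing with the same format string recovers the original value
parsed = datetime.strptime(text, fmt)
print(parsed == dt)    # True
```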

See Table 3 below for a full list of format specifications:

**Table 3**: Datetime format specification (ISO C89 compatible)
<img src="./python_language_basics/image/Table3_datetime.png" width="650" border="1">


When you are aggregating or otherwise grouping time series data, it will occasionally be useful to replace time fields of a series of `datetimes`—for example, replacing the minute and second fields with zero:

In [43]:
dt.replace(minute=0, second=0)

datetime.datetime(2011, 10, 29, 20, 0, 0, 50)

Since `datetime.datetime` is an immutable type, methods like these always produce new objects.

The difference of two `datetime` objects produces a `datetime.timedelta` type:

In [44]:
print(dt)
dt2 = datetime(2011, 11, 15, 22, 30)
print(dt2)
delta = dt2 - dt
delta

2011-10-29 20:30:21.000050
2011-11-15 22:30:00


datetime.timedelta(days=17, seconds=7178, microseconds=999950)

In [45]:
type(delta)

datetime.timedelta

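The output `datetime.timedelta(days=17, seconds=7178, microseconds=999950)` indicates that the timedelta encodes an offset of 17 days, 7,178 seconds, and 999,950 microseconds (just under 17 days and 7,179 seconds).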

Adding a `timedelta` to a `datetime` produces a new shifted datetime:

In [46]:
dt

datetime.datetime(2011, 10, 29, 20, 30, 21, 50)

In [47]:
dt + delta

datetime.datetime(2011, 11, 15, 22, 30)
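You can also construct a `timedelta` directly rather than by subtraction, and convert any `timedelta` to a plain number of seconds with its `total_seconds` method. The dates here are arbitrary:

```python
from datetime import datetime, timedelta

start = datetime(2011, 10, 29, 20, 30)
one_week_later = start + timedelta(weeks=1)
print(one_week_later)       # 2011-11-05 20:30:00

gap = one_week_later - start
print(gap)                  # 7 days, 0:00:00
print(gap.total_seconds())  # 604800.0
```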

### Control Flow
Python has several built-in keywords for conditional logic, loops, and other standard *control flow* concepts found in other programming languages.

#### if, elif, and else
`if` checks a condition that, if True, evaluates the code in the block that follows:

```python
x = -3  # an example value
if x < 0:
    print("It's negative")
```

An `if` statement can optionally be followed by one or more `elif` blocks and a catch-all `else` block, which runs if all of the conditions are False:

```python
x = 3  # an example value
if x < 0:
    print("It's negative")
elif x == 0:
    print('Equal to zero')
elif 0 < x < 5:
    print('Positive but smaller than 5')
else:
    print('Positive and larger than or equal to 5')
```

If any of the conditions is True, no further `elif` or `else` blocks will be reached. With a compound condition using `and` or `or`, conditions are evaluated left to right and will **short-circuit**:

In [48]:
a = 5; b = 7
c = 8; d = 4
if a < b or c > d:
    print('Made it')

Made it


In this example, the comparison c > d never gets evaluated because the first comparison was True.
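One way to observe short-circuiting directly is to route each comparison through a helper that records when it runs. The helper `check` and its labels are hypothetical, added only for this demonstration:

```python
calls = []

def check(label, value):
    # Hypothetical helper: record that this operand was evaluated
    calls.append(label)
    return value

if check('a < b', True) or check('c > d', True):
    print('Made it')
print(calls)  # ['a < b']: the right-hand side of `or` never ran

calls.clear()
if check('first', False) and check('second', True):
    print('not reached')
print(calls)  # ['first']: `and` stops at the first False operand
```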

It is also possible to chain comparisons:

In [49]:
4 > 3 > 2 > 1 # Try: 4 > 3 > 1 > 2, what is the output? 

True

#### for loops
`for` loops are for iterating over a collection (like a list or tuple) or an iterator. The standard syntax for a `for` loop is:

```python
for value in collection:
    # do something with value
```

You can advance a `for` loop to the next iteration, skipping the remainder of the block, using the `continue` keyword. Consider this code, which sums up integers in a `list` and skips `None` values:

In [50]:
sequence = [1, 2, None, 4, None, 5]
total = 0
for value in sequence:
    if value is None:
        continue
    total += value

print(total)

12


A `for` loop can be exited altogether with the `break` keyword. This code sums elements of the list until a 5 is reached:

In [51]:
sequence = [1, 2, 0, 4, 6, 5, 2, 1]
total_until_5 = 0
for value in sequence:
    if value == 5:
        break
    total_until_5 += value
    
print(total_until_5)

13


The `break` keyword only terminates the innermost `for` loop; any outer `for` loops will continue to run:

In [52]:
for i in range(4):
    for j in range(4):
        if j > i:
            break
        print((i, j))

(0, 0)
(1, 0)
(1, 1)
(2, 0)
(2, 1)
(2, 2)
(3, 0)
(3, 1)
(3, 2)
(3, 3)


If the elements in the collection or iterator are sequences (tuples or lists, say), they can be conveniently *unpacked* into variables in the `for` loop statement:

```python
for a, b, c in iterator:
    # do something
```
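A concrete version of the unpacking loop above, with a made-up list of 3-tuples standing in for `iterator`:

```python
rows = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]

totals = []
for a, b, c in rows:
    # Each 3-tuple is unpacked into the loop variables a, b, c
    totals.append(a + b + c)

print(totals)  # [6, 15, 24]
```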

#### while loops
A `while` loop specifies a condition and a block of code that is to be executed until the condition evaluates to `False` or the loop is explicitly ended with `break`:

In [53]:
x = 256
total = 0
while x > 0:
    if total > 500:
        break
    total += x
    x = x // 2

print(total)

504


#### pass
`pass` is the “no-op” statement in Python. It can be used in blocks where no action is to be taken (or as a placeholder for code not yet implemented); it is only required because Python uses whitespace to delimit blocks:

```python
if x < 0:
    print('negative!')
elif x == 0:
    # TODO: put something smart here
    pass
else:
    print('positive!')
```
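`pass` is equally handy as a placeholder body for a function or class you plan to fill in later. Both names below are hypothetical stubs:

```python
def not_implemented_yet():
    # Placeholder: `pass` gives the function a syntactically valid body
    pass

class Placeholder:
    # A class body also needs at least one statement
    pass

print(not_implemented_yet())                   # None: the body does nothing
print(isinstance(Placeholder(), Placeholder))  # True
```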

#### range
The `range` function returns a `range` object, an iterable that produces a sequence of evenly spaced integers on demand:

In [54]:
range(10)

range(0, 10)

In [55]:
list(range(10))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

A start, end, and step (which may be negative) can all be given:

In [56]:
list(range(0, 20, 2))

[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

In [57]:
list(range(5, 0, -1))

[5, 4, 3, 2, 1]

As you can see, `range` produces integers up to **but not including the endpoint**. A common use of `range` is for iterating through sequences by index:

In [58]:
seq = [3, 4, 5, 6]
for i in range(len(seq)):
    val = seq[i]
    print(val)

3
4
5
6


While you can use functions like `list` to store all the integers generated by `range` in some other data structure, often the default iterator form will be what you want. This snippet sums all numbers from 0 to 99,999 that are multiples of 3 or 5:

In [59]:
# Named total rather than sum, to avoid shadowing the built-in sum function
total = 0
for i in range(100000):
    # % is the modulo operator
    if i % 3 == 0 or i % 5 == 0:
        total += i

print(total)

2333316668


While the range generated can be arbitrarily large, the memory use at any given time may be very small.
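To see why memory use stays small, note that a `range` object stores only its start, stop, and step and computes elements when asked. The sketch below creates a range of a trillion integers (the size is arbitrary) without materializing them:

```python
# A range stores only start, stop, and step, not every integer
big = range(10**12)

print(big[-1])              # 999999999999, computed from start/stop/step
print((10**12 - 2) in big)  # True; CPython checks integer membership arithmetically
print(big[:3])              # range(0, 3): slicing a range gives another range

evens = range(0, 20, 2)
print(len(evens), evens[3])  # 10 6
```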

#### Ternary expressions
A *ternary expression* in Python allows you to combine an `if-else` block that produces a value into a single line or expression. The syntax for this in Python is:

   value = *true-expr* if condition else *false-expr*


Here, *true-expr* and *false-expr* can be any Python expressions. It has the same effect as the more verbose:

```python
if condition:
    value = true-expr
else:
    value = false-expr
```

This is a more concrete example:

In [60]:
x=5
'Non-negative' if x >= 0 else 'Negative'

'Non-negative'
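Wrapping the expression in a function makes it easy to try several inputs. As with `if`/`else`, only the branch that is selected gets evaluated, which the second part relies on to avoid indexing an empty list. The function name `describe` is hypothetical:

```python
def describe(x):
    # A conditional (ternary) expression evaluates to one of two values
    return 'Non-negative' if x >= 0 else 'Negative'

print(describe(5))   # Non-negative
print(describe(-3))  # Negative

# values[0] is never evaluated here, so no IndexError is raised
values = []
first = values[0] if values else None
print(first)         # None
```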