# Introduction to Python for Scientific Computing

## Tue Vu, PhD
AI & ML Research Scientist
Advanced Research Computing & Data Science, SMU OIT

## Outline

1. Introduction to Python
2. Variables, types, operators, and expressions
3. Input / output
4. Control structures
5. Functions
6. Modules, import, and the Python ecosystem
7. NumPy
8. Pandas
9. Using conda environments on M2
10. Plotting with Matplotlib & Seaborn

## 1. Introduction to Python
### History
Python was created by Guido van Rossum in the Netherlands, starting in 1989; it was named after the BBC TV show **Monty Python's Flying Circus**.

![image.png](attachment:47a580f2-c0c6-4719-9546-5a35f3f81bd6.png)

### Data Science Trends based on 4 years of Kaggle surveys
Kaggle is a well-known online competition platform for data scientists, founded in Melbourne, Australia. It was acquired by Google in 2017. Kaggle enables data scientists and other developers to run machine learning contests, write and share code, and host datasets.

#### Programming Languages Used
78% of respondents reported using Python at work across the 4 years of surveys.

![image.png](attachment:9df4cda9-b399-4cd3-835d-46b136d2f8f0.png)[link](https://towardsdatascience.com/data-science-trends-based-on-4-years-of-kaggle-surveys-60878d68551f)



### Platform for Python
#### Anaconda 
https://www.anaconda.com
- A distribution of Python & R for scientific computing, available for Windows, Linux, and macOS.
- Founded in 2012 by Peter Wang and Travis Oliphant
- It also includes a GUI, Anaconda Navigator, as an alternative to the command-line interface (CLI)
   
![image.png](attachment:7acb8928-84f5-40e8-a8d6-dd5414904eca.png)


##### Jupyter Notebook
- One of the most popular interactive development environments for Python
- Browser-based notebooks (integration of Python interpreter, Markdown & graphics)
- Built into Anaconda Navigator and also available on the SMU HPC system M2 via Open OnDemand (hpc.smu.edu)


#### Python Hello world

In [None]:
print("hello world")

#### Running program in Jupyter Notebook

- A notebook cell can be one of 3 types: Markdown, Code, or Raw, selected from the dropdown menu
- A cell is **executed** via ```Shift+Enter```
- Any line starting with ```%``` or ```%%``` is a "magic" command in IPython/Jupyter, not part of the Python language
- Any line starting with ```!``` is passed to the system shell

#### Exercise
Create a file ```helloworld.py``` (for example, containing ```print("hello world")```) and run it from a notebook cell with either of:

    !python helloworld.py
    %run helloworld.py
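If you prefer to create the file from Python itself, here is a minimal sketch, assuming the current directory is writable. It writes ```helloworld.py``` with ```open()``` and runs it in a separate process with the standard-library ```subprocess``` module, which is similar to what ```!python helloworld.py``` does:

```python
import subprocess
import sys

# Write a one-line Python program to helloworld.py in the current directory
with open("helloworld.py", "w") as f:
    f.write('print("hello world")\n')

# Run it with the same Python interpreter that runs this notebook and capture its output
result = subprocess.run([sys.executable, "helloworld.py"], capture_output=True, text=True)
print(result.stdout)
```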

## 2. Variables, types, operators, and expressions
### Assigning Variables
To assign a value to a variable in Python, we use the ```=``` sign

In [None]:
str1 = "This is a string"
five = 5
one = five - 4
five_square = 5**2

print(str1)
print(five)
print(one + one)
print(five_square)

### Data types
Python provides a number of built-in data types.

- int (integer)
- float (floating point)
- bool (True or False)
- complex (complex numbers), e.g., 3 + 4j
- str (string), e.g., "message", '3.14159'


In [None]:
print(type(five))
print(type(five*1.1))
print(type(five<6))
print(type(complex(five,4)))
print(type("five"))


### Collections of data: lists, dicts, sets, tuples
#### Lists
- Lists are used to store multiple items in a single variable
- In Python, indexing starts at 0
- Lists are written with square brackets ```[]```, with items separated by commas ```,```


In [None]:
list1 = [five,4,one]
print(list1)
print(list1[0])
print(type(list1))
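A few more list operations, as a short sketch: negative indices, slicing, and changing a list in place (the values repeat ```list1``` from above, with ```five``` and ```one``` written out as 5 and 1):

```python
list1 = [5, 4, 1]

first = list1[0]      # index 0 is the first item
last = list1[-1]      # negative indices count from the end
middle = list1[1:3]   # slicing: from index 1 up to (not including) index 3; makes a copy
list1.append(10)      # lists are mutable: add an item at the end
list1[1] = 40         # ...or replace an existing item

print(first, last, middle, list1, len(list1))
```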


#### Dictionary
- Used to store data values in key:value pairs.
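- A dictionary is a collection which is ordered (since Python 3.7, dictionaries keep their insertion order), changeable, and does not allow duplicate keys.
- Dictionaries are written with curly brackets ```{}``` containing ```key: value``` pairs separated by commas: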
    

In [None]:
dict_state = {'California':39613493,'Texas':29730311,'Florida':21944577,'New York':19299981,'Pennsylvania':12804123,
             'Illinois':12569321,'Ohio':11714618,'Georgia':10830007,'North Carolina':10701022}
print(dict_state)
print(dict_state["Texas"])
print(type(dict_state))


In [None]:
for name, population in dict_state.items():
    print(name,population)
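Common dictionary operations are sketched below on a small, hypothetical ```stock``` dictionary: adding and overwriting entries, testing membership, reading with a default value, and deleting a key:

```python
stock = {'apples': 3, 'pears': 5}   # hypothetical inventory

stock['plums'] = 2                  # add a new key:value pair
stock['apples'] = 7                 # keys are unique: assigning to an existing key overwrites its value
has_pears = 'pears' in stock        # membership tests look at the keys
kiwis = stock.get('kiwis', 0)       # .get() returns a default instead of raising KeyError
del stock['pears']                  # remove a key and its value

print(has_pears, kiwis, stock)
```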

#### Set
- Sets are used to store multiple items in a single variable.
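- A set is an unordered, unindexed collection of unique items; duplicates are removed automatically. The items themselves cannot be changed, but items can be added to or removed from a set.
- Sets are written with curly brackets ```{}```; note that ```{}``` on its own creates an empty dictionary, so use ```set()``` for an empty set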


In [None]:
set1 = {five,one,one,five}
set2 = {1, 6, 7, 8,9}
print(set1)
print(type(set1))
print(set1.union(set2))
print(set1-set2)
print(set1.intersection(set2))


In [None]:
for s in set2:
    print(s)
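A typical use of a set is removing duplicates from a list. The sketch below, on a hypothetical ```readings``` list, also shows adding and removing items and the difference between ```set()``` and ```{}```:

```python
readings = [3, 1, 3, 2, 1, 3]   # hypothetical data with duplicates

unique = set(readings)   # duplicates are dropped -> {1, 2, 3}
unique.add(7)            # add an item
unique.discard(1)        # remove an item (no error if it is absent)

print(unique)
print(2 in unique)             # membership test
print(sorted(unique))          # sets are unordered; sort to get a predictable order
print(type(set()), type({}))   # set() is an empty set, {} is an empty dict
```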

#### Tuples
- Tuples are used to store multiple items in a single variable.
- A tuple is a collection which is ordered and unchangeable.
- Tuples are written with parentheses ```()```


In [None]:
tup1 = (five,one)
print(tup1[1])
print(type(tup1))
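Two things that follow from tuples being ordered and unchangeable, shown on a hypothetical ```point```: the items can be unpacked into separate names, and trying to change an item raises a ```TypeError```:

```python
point = (3.0, 4.0)

x, y = point        # tuple unpacking: assign each item to its own name
print(x, y)

try:
    point[0] = 10.0            # tuples are immutable, so item assignment fails
    changed = True
except TypeError as err:
    changed = False
    print("TypeError:", err)
```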

### Operators


| Operator    |    Operation    |    Example 1   | Example 2 |
|--------|-----------|-------------------|-----------------|
| + | Addition | 1 + 1 &rarr; 2 |1 + 1.0 &rarr; 2.0 |
| - | Subtraction | 5 - 3 &rarr; 2 |5 - 3.0 &rarr; 2.0  |
| * | Multiplication | 5 * 2 &rarr; 10 |5 * 2.0 &rarr; 10.0   |
| / | Division | 4 / 2 &rarr; 2.0 |4.0 / 2 &rarr; 2.0  |
| ** | Exponentiation (power) | 2 ** 3 &rarr; 8 |2.0 ** 3 &rarr; 8.0 |
| % | Modulo (remainder) | 5 % 2 &rarr; 1 |5.0 % 2 &rarr; 1.0 |
| // | Floor division | 5 // 2 &rarr; 2 | 5.0 // 2 &rarr; 2.0 |



#### Operator overloading
Operators have been "overloaded" to perform other meaningful operations on non-numeric types, e.g.,

```'ABC' + '123'``` &rarr; ```'ABC123'```   # string concatenation

```[1,2,3] + [4,5,6]``` &rarr; ```[1,2,3,4,5,6]```   # list concatenation

```'ABC' * 3``` &rarr; ```'ABCABCABC'```  # string repetition

```[1,2,3] * 3``` &rarr; ```[1, 2, 3, 1, 2, 3, 1, 2, 3]```   # list repetition

```set1 - set2```    # return all elements in set1 not in set2
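The examples above can be checked directly in a cell. Here ```set1``` and ```set2``` are redefined with the values from the earlier set cell (```five``` is 5 and ```one``` is 1):

```python
set1 = {5, 1}
set2 = {1, 6, 7, 8, 9}

print('ABC' + '123')          # string concatenation
print([1, 2, 3] + [4, 5, 6])  # list concatenation
print('ABC' * 3)              # string repetition
print([1, 2, 3] * 3)          # list repetition
print(set1 - set2)            # set difference: elements of set1 not in set2
```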

### Relational and logical operators

| Operator    |    Operation    |    Example |
|--------|-----------|-------------------|
| == | Equality test | 1 == 1 |
| != | Not equal test | 1 != 2 |
| < | Less than | 1 < 2 |
| > | Greater than | 2 > 1 |
| <= | Less than or equal to | 2 <= 2 |
| >= | Greater than or equal to | 3 >= 2 |
| not | Negation | not False |
| and | Logical and | (1 == 1) and (3 > 2) |
| or  | Logical or | (2 < 5) or (3 < 1) |
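Relational operators produce ```bool``` values that logical operators can combine. A sketch with hypothetical weather values:

```python
temperature = 72.5   # hypothetical reading
raining = False

is_warm = temperature >= 70                                    # relational -> True
is_comfortable = is_warm and not raining and temperature < 85  # and / not -> True
stay_inside = raining or temperature > 95                      # or -> False

print(is_warm, is_comfortable, stay_inside)
```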



## 3. Input/Output
### Screen output
- ```print()```: the most common way to write output to the screen

In [None]:
print("My number is", five)        # print two objects of different types (print separates them with a space)
print("My number is " + str(five))  # print a concatenated string
print("My numbers are {} and {}".format(five,one))  # print a formatted string
print("The sum of {} and {} is {}".format(five,one,five+one))  # print a formatted string
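Since Python 3.6, f-strings (formatted string literals) offer a more compact alternative to ```.format()```: the expressions are written directly inside the braces, optionally followed by a format specification after a colon. A short sketch:

```python
five, one = 5, 1

message = f"The sum of {five} and {one} is {five + one}"
print(message)

ratio = 2 / 3
rounded = f"{ratio:.3f}"            # format spec: 3 digits after the decimal point
print("ratio =", rounded)

aligned = f"{'name':<8}|{42:>6}|"   # '<8' left-aligns in 8 characters, '>6' right-aligns in 6
print(aligned)
```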



### Screen input
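The built-in ```input()``` reads a line typed by the user (optionally after displaying a prompt) and always returns it as a string; convert it with ```int()``` or ```float()``` when you need a number: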

In [None]:
x = input("Enter a number:")
print(x)
print(type(x))

In [None]:
x = float(input("Enter a number:"))
print(x)
print(type(x))

### Read and write from/to file
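The built-in ```open()``` function returns a file object. Its second argument is the mode: ```'r'``` to read (the default), ```'w'``` to write (creating the file or overwriting it), and ```'a'``` to append.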

In [None]:
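# Open a new file for writing ('w' creates the file, or overwrites it if it exists)
with open('textfile.txt', 'w') as f:
    for n in range(five):
        f.write(str(n) + '\n')
# the with block closes the file automatically, even if an error occurs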

In [None]:
# Open the file for reading
with open('textfile.txt', 'r') as f:
    for line in f:
        print(line, end='')   # each line already ends with '\n', so don't print another
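A slightly longer sketch combining the three modes: write numbers, append one more with mode ```'a'```, then read everything back and convert each line to an ```int```. The file name ```numbers.txt``` is arbitrary:

```python
# Write five numbers, one per line; mode 'w' creates or overwrites the file
with open('numbers.txt', 'w') as f:
    for n in range(5):
        f.write(str(n) + '\n')

# Mode 'a' appends to the end of the existing file
with open('numbers.txt', 'a') as f:
    f.write('10\n')

# Read the lines back and convert them to integers
values = []
with open('numbers.txt') as f:
    for line in f:
        values.append(int(line))   # int() ignores the trailing newline

print(values)
print(sum(values))
```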

## 4. Control structures
- ```if, elif, else```
- ```for loop```
- ```while loop```
- ```break``` and ```continue```

Note: indentation is significant in Python; the indented lines following a line that ends with ```:``` form the block that belongs to it.

#### if:
    if condition:
        statementA
        

In [None]:
a = 5
if (a>3):
    print("a is bigger than 3")


#### if/else

    if condition:
        statementA
    else:
        statementB
    statementC



In [None]:
a = 2
if (a>3):
    print("a is bigger than 3")
else:
    print("a is NOT bigger than 3")

#### if/elif/else

    if conditionA:
        statementA
    elif conditionB:
        statementB
    else:
        statementC
    statementD

In [None]:
a =3
if (a>3):
    print("a is bigger than 3")
elif (a==3):
    print("a equals 3")
else:
    print("a is less than 3")


#### For loop
    for item in sequence:
        do_task
    
```range()``` is a built-in function that constructs an iterable object producing a sequence of integers, which is very useful for loops based on simple counting. The stop value is excluded: ```range(1, 5)``` produces 1, 2, 3, 4.

In [None]:
for i in range(1,5):
    print(i)
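The three forms of ```range()``` (stop only; start and stop; start, stop, and step) are sketched below, together with a ```for``` loop over a plain list, since any sequence can be iterated:

```python
print(list(range(5)))          # stop only -> 0, 1, 2, 3, 4
print(list(range(1, 5)))       # start and stop -> 1, 2, 3, 4 (stop is excluded)
print(list(range(0, 10, 3)))   # start, stop, step -> 0, 3, 6, 9
print(list(range(5, 0, -1)))   # a negative step counts down -> 5, 4, 3, 2, 1

# A for loop can iterate over any sequence, not only range()
total = 0
for value in [2.5, 4.0, 1.5]:
    total += value
print(total)
```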

#### While loop
    while condition:
        do_something
        update the variable used in the condition


In [None]:
a=1
while (a<5):
    print(a)
    a+=1

#### break / continue

* **break** - jump out of a loop completely and go to the next statement
* **continue** - skip the rest of the statements in the loop and return to the beginning of the loop for the next iteration


In [None]:
for val in "hello world":
    if val == "o":
        break
    print(val)

In [None]:
for val in "hello world":
    if val == "o":
        continue
    print(val)
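The two statements are often used together. In this sketch, ```continue``` skips even numbers and ```break``` stops the loop once the running sum of odd numbers exceeds 20:

```python
total = 0
for n in range(1, 100):
    if n % 2 == 0:
        continue      # skip the rest of this iteration for even numbers
    total += n
    if total > 20:
        break         # leave the loop entirely
print(n, total)       # 1 + 3 + 5 + 7 + 9 = 25, reached at n = 9
```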

## 5. Functions

A function is a named block of code organized to carry out a specific task. Writing good functions is an important skill: it makes code easier to reuse, test, and read, which improves the productivity of data scientists and the quality of data science solutions. In this guide, you will learn the basics of writing a function and the types of functions, which will enable you to perform analytical tasks more efficiently.

**Syntax:**

    def name_of_function(arguments):
        # function body, indented consistently
        return result  # result is returned when this statement is encountered

* the keyword ```def``` indicates a function definition
* neither the argument types nor the return type are declared
* the argument variables are local to the function
* if there is no return statement, the function returns the object ```None```
* any variables defined within the function body are not accessible outside the function
* a function is called using parentheses ```()```, e.g., ```callme()```


In [None]:
def callme():
    print('Hello')

callme()

In [None]:
def f(a,b,c):
    x = a + b//6 + 300%c
    return x

f(10,52,31)

In [None]:
def f(a,b,c=20):
    x = a + b//6 + 300%c
    return x

f(10,52)
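In the last cell, ```c=20``` gives the third argument a default value, so callers may omit it. A small sketch of the ways such a function can be called, including keyword arguments:

```python
def f(a, b, c=20):
    # c has a default value of 20
    x = a + b // 6 + 300 % c
    return x

print(f(10, 52))         # c uses its default: 10 + 8 + 0
print(f(10, 52, 31))     # override the default by position: 10 + 8 + 21
print(f(10, 52, c=31))   # ...or by keyword
print(f(b=52, a=10))     # keyword arguments can be given in any order
```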

In [None]:
def square_sum(a,b):
    sq = a**2
    total = sq+b
    return sq,total
    
square_sum(49,50)

## 6. Modules and import


Everything we have done so far has used built-in data types and functions.  But a huge part of the power of Python is the availability of external packages and libraries that we can **import** into our program to have access to their functionality.

In Python terminology, the object that we import is a **module**, and objects in that module can be accessed with the dot operator (as we do with other objects).



In [None]:
import string
print(string.ascii_letters)


In [None]:
from math import pi
print(pi)
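The two cells above show the two basic import forms. A module can also be imported under a shorter alias (the form used for ```numpy as np``` later), and several names can be imported at once. A sketch with the standard ```math``` module:

```python
import math                 # whole module: access names with the dot operator
import math as m            # the same module under a shorter alias
from math import sqrt, pi   # import selected names directly

print(math.sqrt(16), m.sqrt(16), sqrt(16))
print(round(pi, 4))
```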

#### Importing your own modules
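You can also import a module you have written yourself. If a file ```myfile.py``` contains your functions, you can use them with:

    import myfile
    myfile.function1()

Note: ```myfile.py``` must be in a directory listed in Python's module search path, ```sys.path```. A notebook's own directory is normally included, as is the directory of a script you run.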
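A self-contained sketch of the same idea. In practice you would simply save the file next to your notebook; here a hypothetical module ```mytools.py``` with one function, ```double```, is written to a temporary directory, which is then added to ```sys.path``` so that ```import``` can find it:

```python
import importlib
import os
import sys
import tempfile

# Create the hypothetical module file mytools.py in a temporary directory
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "mytools.py"), "w") as f:
    f.write("def double(x):\n"
            "    return 2 * x\n")

# Put the directory on the module search path
sys.path.insert(0, module_dir)
importlib.invalidate_caches()   # make sure the import system notices the new file

import mytools
print(mytools.double(21))
```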

## 7. NumPy

- [NumPy](https://numpy.org/) stands for Numerical Python
- NumPy is an open-source Python library for scientific computing that provides fast multidimensional arrays and matrices, along with a large collection of functions to operate on them.
- Nearly every scientist working in Python draws on the power of NumPy.
- NumPy brings the computational power of languages like C and Fortran to Python, a language much easier to learn and use. With this power comes simplicity: a solution in NumPy is often clear and elegant.
- To use NumPy, import it into your program (it is included in Anaconda)

In [None]:
import numpy as np # imports numpy, but calls it np

mylist = [1, 2, 3, 4]
myarray = np.array(mylist)
print(myarray)
print(type(myarray))

#### Random numbers



In [None]:
r = np.random.random(20)
print(r)

#### Operations

In [None]:
print(r.sum())
print(r.mean())
print(np.max(r))
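A few more reductions and element-wise operations, shown on a small fixed array so the results are predictable:

```python
import numpy as np

a = np.array([3.0, 1.0, 4.0, 1.0, 5.0])

print(a.min(), a.max())   # smallest and largest values
print(a.argmax())         # position of the largest value
print(a.std())            # standard deviation (population, ddof=0 by default)
print(np.cumsum(a))       # running total
print(a * 2 + 1)          # arithmetic applies element by element, no loop needed
```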

#### Numpy Multidimensional array (matrix)

In [None]:
m1 = np.array([[1,2,3,4], [4,5,6,7], [7,8,9,10]])
print(m1)
print(m1[1,2])
print(m1[2:,:2])
print(m1.shape, m1.dtype)
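The indexing used above takes ```[row, column]```, both zero-based, and each part can be a slice. A sketch annotating the same ```m1```, plus whole rows, whole columns, and a boolean mask:

```python
import numpy as np

m1 = np.array([[1, 2, 3, 4], [4, 5, 6, 7], [7, 8, 9, 10]])

print(m1[1, 2])     # row 1, column 2 -> 6
print(m1[0])        # a whole row -> [1 2 3 4]
print(m1[:, 1])     # a whole column -> [2 5 8]
print(m1[2:, :2])   # rows from 2 onward, columns 0 and 1 -> [[7 8]]
print(m1[m1 > 6])   # boolean mask: all elements greater than 6 -> [ 7  7  8  9 10]
```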

In [None]:
m2 = np.random.rand(3,4)

In [None]:
print(m1+m2)
print(m1*m2)
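Note that ```m1*m2``` multiplies element by element (both arrays have shape (3, 4)); it is not matrix multiplication. The sketch below contrasts the two on small 2x2 arrays and shows broadcasting, where a 1-D array is combined with every row of a 2-D array:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[10, 20], [30, 40]])

print(A * B)     # element-wise product -> [[10 40] [90 160]]
print(A @ B)     # matrix product, same as np.matmul(A, B) -> [[70 100] [150 220]]

# Broadcasting: the 1-D array is added to each row of A
row = np.array([100, 200])
print(A + row)   # -> [[101 202] [103 204]]
```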


#### Other popular NumPy functions


In [None]:
x = np.linspace(0., 1., 11)
x

In [None]:
y = np.arange(0,40,2)
y

In [None]:
print(np.zeros([4,5]))
print(np.ones([4,5]))

In [None]:
np.eye(5)
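A detail that often trips up beginners: ```np.linspace(start, stop, num)``` includes the end point, while ```np.arange(start, stop, step)``` excludes it, just like ```range()```. A quick check using the same calls as above:

```python
import numpy as np

x = np.linspace(0., 1., 11)   # 11 evenly spaced points; the end point 1.0 is included
y = np.arange(0, 40, 2)       # start, stop, step; the stop value 40 is excluded

print(len(x), x[-1])          # 11 1.0
print(len(y), y[-1])          # 20 38
print(np.zeros([4, 5]).shape, np.eye(5).trace())   # (4, 5) 5.0
```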

#### For loops with NumPy arrays

In [None]:
ny = y.reshape(4,5)
ny

In [None]:
for row in ny:
    print(row)

In [None]:
for i, row in enumerate(ny):   # enumerate yields (index, item) pairs
    print('index:', i)
    print('row:', row)

In [None]:
for i,j in zip(m1,m2):
    print(i,'*',j,'=',i*j)

## 8. Pandas

- The name [Pandas](https://pandas.pydata.org/) is derived from “panel data” and is also a play on “Python data analysis”.

- Pandas is an open-source Python library specializing in data structures and operations for manipulating numerical tables and time series.

- Pandas is a fast, powerful, flexible, and easy-to-use open-source data analysis and manipulation tool, built on top of the Python programming language.

- To use pandas, install it (it is already included in Anaconda) and import it into your program:

        pip install pandas      # in a terminal
        import pandas as pd     # in Python
    

In [None]:
import pandas as pd # imports pandas, but calls it pd

#### Pandas Series
A Pandas Series is a one-dimensional labeled array: a NumPy array of values plus an index of labels. When a Series is displayed, the index appears in the left column and the values in the right column.

In [None]:
s1 = pd.Series([5,4,3,2,1,0])
s1

To retrieve the index and the values:

In [None]:
print(s1.index)
print(s1.values)
print(type(s1.index))
print(type(s1.values))

The index can be changed to make the Series more meaningful:

In [None]:
s1.index=['Tiger','Cow','Polar Bear','Mustang','Lion','Dragon']
s1

Alternatively, a new Series can be created with an explicit index:



In [None]:
s2 = pd.Series([1,2,3,4,5,6],index=s1.index)
s2

Arithmetic between two Series matches values by index label, so Series that share the same index can be combined element-wise, e.g., multiplied:

In [None]:
s3 = s1*s2
s3
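When the two indexes do not match, the result contains every label from both, and labels present in only one Series get ```NaN```. The ```add``` method with ```fill_value``` treats a missing label as that value instead. A sketch with two small hypothetical Series:

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=['x', 'y', 'z'])
b = pd.Series([10, 20, 30], index=['z', 'y', 'w'])

summed = a + b                    # matched by label, not by position; 'w' and 'x' become NaN
print(summed)

filled = a.add(b, fill_value=0)   # a missing label counts as 0
print(filled)
```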

To access elements of a Series:



In [None]:
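print(s3["Mustang"])       # using the index label
print(s3.loc["Mustang"])   # using the index label with .loc
print(s3.iloc[3])          # using the integer position
print(s3[s3>5])            # using a boolean filter
# s3[3] (an integer position inside []) is deprecated in recent pandas; use .iloc instead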

Insert a new element under a new index label (the Series is converted to float so that it can hold NaN):

In [None]:
s3['Nothing'] = np.nan
s3

Detect null (missing) values:

In [None]:
s3.isnull()

Working with null values:

In [None]:
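# Replace null values with a constant (returns a new Series; s3 itself is unchanged)
print(s3.fillna(1))

# Drop null values (also returns a new Series)
print(s3.dropna())

# Or replace the null values in s3 itself using boolean indexing
s3[s3.isnull()] = 1
s3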

Apply a function to every element of a Series:

In [None]:
s3.apply(np.log10)   # log10(0) is -inf, and NumPy emits a divide-by-zero RuntimeWarning

For short, one-off functions, we can use an anonymous ```lambda``` function instead of defining one with ```def```. For example, to add 10 to every value that is not greater than 7:

In [None]:
s3.apply(lambda x: x if x>7 else x+10)
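The same rule can be written as a named function, or without ```apply``` at all using ```Series.where```, which keeps values where the condition is true and substitutes the second argument elsewhere; this vectorized form is typically faster on large Series. A sketch on a small hypothetical Series (```add_ten_if_small``` is a hypothetical helper name):

```python
import pandas as pd

s = pd.Series([5.0, 8.0, 9.0, 2.0], index=['Tiger', 'Cow', 'Polar Bear', 'Mustang'])

def add_ten_if_small(x):
    # same rule as the lambda: add 10 unless the value is greater than 7
    return x if x > 7 else x + 10

r1 = s.apply(add_ten_if_small)
r2 = s.where(s > 7, s + 10)   # keep values where s > 7, otherwise use s + 10
print(r1)
print(r1.equals(r2))
```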

#### Pandas DataFrames
- Tabular data as you would find in a spreadsheet or csv-formatted file
- Each column is a Series with its own data type (dtype)
- Row and column labels (df.index and df.columns)
- Rows and columns can be indexed (accessed) by labels or position
- Follows similar logic as NumPy: axis=0 (rows) and axis=1 (columns)

In [None]:
dates = ['2019-06-01', '2019-06-02', '2019-06-03', '2019-06-04', '2019-06-05', '2019-06-06',
         '2019-06-07', '2019-06-08', '2019-06-09', '2019-06-10']
observers = ['Bob', 'Carol', 'Ted', 'Alice', 'Bob', 'Alice', 'Ted', 'Alice', 'Bob', 'Carol']
temperatures = np.round(70 + 10.*(np.random.random(10) - 0.5), 1)   # random values around 65 to 75
rainfall = [0., 0.12, 0.11, 0., 0.51, 0.43, 0.02, 0., np.nan, 0.32]


In [None]:
df = pd.DataFrame(zip(dates,observers,temperatures,rainfall),
                   columns=['Date', 'Observer', 'Temperature', 'Rainfall'])
print(df.dtypes)
df

In [None]:
df.columns

In [None]:
df['Date']=pd.to_datetime(df['Date'])
df.dtypes

In [None]:
df['Temperature']

In [None]:
df[['Date','Rainfall']]

In [None]:
df.loc[0:5,'Temperature']

In [None]:
df.iloc[0:4,2:4]
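The two previous cells look alike but slice differently: ```.loc``` is label-based and includes the end label, while ```.iloc``` is position-based and excludes the end position, like Python slicing. A sketch on a hypothetical 7-row ```temps``` table:

```python
import pandas as pd

temps = pd.DataFrame({'Temperature': [70.1, 68.4, 72.9, 71.3, 69.0, 73.5, 70.8]})

by_label = temps.loc[0:5, 'Temperature']   # labels 0..5, end label included -> 6 rows
by_position = temps.iloc[0:5, 0]           # positions 0..4, end position excluded -> 5 rows
print(len(by_label), len(by_position))
```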

In [None]:
df.groupby('Observer')[['Temperature', 'Rainfall']].mean()   # mean of the numeric columns for each observer

In [None]:
df.set_index(["Date"])

In [None]:
df['Temp/Rainfall']=df['Temperature']/df['Rainfall']
df

In [None]:
df.to_csv('mydf.csv')
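To load such a file back, ```pd.read_csv``` can restore the saved row index with ```index_col=0``` and convert date columns with ```parse_dates```. A round-trip sketch on a small hypothetical table, written to ```example.csv``` so that ```mydf.csv``` is not overwritten:

```python
import pandas as pd

small_df = pd.DataFrame({'Date': pd.to_datetime(['2019-06-01', '2019-06-02']),
                         'Observer': ['Bob', 'Carol'],
                         'Rainfall': [0.0, 0.12]})
small_df.to_csv('example.csv')   # the row index is written as the first column

# index_col=0 turns that first column back into the index; parse_dates converts 'Date'
df2 = pd.read_csv('example.csv', index_col=0, parse_dates=['Date'])
print(df2.dtypes)
print(df2)
```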