#### DS133: Business intelligence with advanced spreadsheet

# Introduction to Python and Jupyter notebook

**What is Python?** Python is an interpreted, high-level, general-purpose programming language (Wikipedia). It is versatile enough to build almost anything, and it is open source and free, with a vast number of applications in almost every field. For instance, you could use Python:
- to do simple calculations, i.e., use as a calculator
- to work on data for analysis
- to build websites
- to build applications

In this case, we will cover some of the basics of Python for the purpose of data analysis. If you would like to explore and find more materials, check *[here](https://wiki.python.org/moin/BeginnersGuide/NonProgrammers)*. 

## Jupyter notebook

Python can be run in many different environments. In this class, we use one of the most common environments, Jupyter Notebook.  
*The Jupyter Notebook* is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more (from [Jupyter.org](https://jupyter.org/)).

This case is live and is written in a Jupyter notebook, so you can edit the contents and run the analysis. The notebook consists of cells. There are two types of cells.
> **Markdown cell**: A cell that supports Markdown, a lightweight markup language that you can use to add formatting elements to plaintext text documents ([markdownguide](https://www.markdownguide.org/getting-started/#why-use-markdown)). It has many applications, such as creating websites, documents, notes, books, presentations, email messages, and technical documentation. 

In fact, this particular text is composed in a Markdown cell. In general, we will use Markdown cells for narrative text. There are many ways to format a narrative cell (as in Word); we will cover some of them along the way. If you want to explore more, [click here](https://www.markdownguide.org/basic-syntax/). 

> **Code cell**: This cell contains the actual code to run. 

You can easily change the cell type from the top menu in Jupyter notebook: ![image.png](attachment:1e8647da-dc5e-4d36-aad9-65d87f76728d.png)

**So why Jupyter notebook for data analytics/science?** In short, a Jupyter notebook is a document that supports mixing executable code, equations, visualizations, and narrative text ALL IN ONE PLACE. 

*Illustration*: For a moment, assume that you are working on a project that involves a lot of data analysis and custom visualizations, and suppose you are using JMP. Most likely you write up your results in MS Word, supported by plots and charts created in JMP. This means moving, copying, and pasting tables and charts from JMP to Word. Sometimes we also discover mistakes in an analysis that was already copied to Word (it happens to me a lot). In that case, we have to redo the analysis, recreate the charts, and move them back to Word again. 

A Jupyter notebook gives us a live document. This means we can code our analysis, create the charts, and write up the results in one place. When we make a mistake, we can fix it by modifying the code and running it again. 

## Let's start coding

We need to use a code cell to start coding. In a code cell, putting \# in front of a line makes it a comment (not executable). Indentation is very important in Python: improper indentation results in an error. Also, commands are case sensitive. 

To run a code cell, click the play button at the top, or press `Shift + Enter`.
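
As a minimal sketch of the three rules above (comments, indentation, and case sensitivity), consider the short cell below. The variable `temperature` and its value are made up for illustration only.

```python
# This whole line is a comment, so Python skips it.
print('hello')  # a comment can also follow code on the same line

temperature = 30  # hypothetical value, for illustration only
if temperature > 25:
    # Lines that belong to the if statement must be indented
    # (four spaces by convention).
    print('It is warm')

# Commands are case sensitive: print works, but Print is not defined.
# Print('hello')  # uncommenting this line raises a NameError
```

Try removing the indentation in front of `print('It is warm')` and running the cell again to see the error message Python gives.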

In [3]:
# basic operations
# plus
print(5+10)
# minus
print(5-10)
# multiply
print(5*10)
# divide
print(5/10)
# power
print(5**2)

15
-5
50
0.5
25


Strings (text) are written as `'text'` or `"text"`. Complete the rest of the cell by printing each operation before its answer. 

In [7]:
# basic operations
# plus
print('5+10 =', 5+10)
# minus
print('5-10 =', 5-10)
# multiply ....complete the rest

5+10 = 15
5-10 = -5


### Variables

Instead of using literal numbers, we can assign a value to a variable. This is similar to referencing a cell in an Excel formula instead of typing the value directly into the formula bar. Note that variable names are case sensitive; for instance, $a \neq A$.

In [10]:
# These are examples of variables
a= 5
A= 10
print('a+A = ', a+A)

a+A =  15


***Exercise 1***: Calculate the BMI for a person

Body Mass Index can be calculated as: 
$$
BMI=\frac{weight}{height^2}
$$

Define two variables, `weight` and `height`, and a third variable, `bmi`, based on the formula above. Calculate and print `bmi` for `weight=72` kg and `height=172` cm. The formula expects weight in kilograms and height in meters, so convert the height to meters first. 

#### Variable types

You can use `type(variable)` to see the type of a variable. Some common types are:
* `float`: real number with decimals
* `int`: integer, i.e., a whole number
* `str`: string
* `bool`: True or False
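
As a quick sketch, the cell below creates one value of each type listed above and prints its type. The variable names and values are hypothetical, chosen only for illustration.

```python
# One example value per type (names and values are made up)
price = 19.99       # float: a number with decimals
count = 3           # int: a whole number
course = 'DS133'    # str: text in quotes
is_open = True      # bool: either True or False

print('type of price is', type(price))
print('type of count is', type(count))
print('type of course is', type(course))
print('type of is_open is', type(is_open))
```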

In [17]:
weight=80
type(weight)
print('type of weight is ',type(weight))
height=170
print('type of height is ',type(height))
# note: height is in cm here, so this value is not in the usual kg/m^2 units
bmi=weight/height**2
print('bmi = ', bmi)
print('type of bmi is ',type(bmi))
love_exp='I love you'
print('type of love_exp is ', type(love_exp))

type of weight is  <class 'int'>
type of height is  <class 'int'>
bmi =  0.002768166089965398
type of bmi is  <class 'float'>
type of love_exp is  <class 'str'>
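
The value printed for `bmi` above is very small because the height was entered in centimeters. A minimal sketch of the same calculation with the height converted to meters is shown below; the names `height_cm` and `height_m` are introduced here for illustration, and the result is rounded to one decimal for readability.

```python
weight = 80                  # kilograms
height_cm = 170              # centimeters
height_m = height_cm / 100   # convert to meters; the result is a float

bmi = weight / height_m**2   # BMI in kg per square meter
print('bmi = ', round(bmi, 1))
print('type of height_m is ', type(height_m))
```

Note that dividing two integers with `/` gives a `float`, which is why `height_m` is no longer an `int`.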


***Exercise 2***

Try adding `weight+height`, `love_exp+love_exp` and `weight+love_exp`. Comment on what happens.


### Lists

A list is a collection of values. A list can contain values of any type, a mix of different types, and even other lists. 

In [24]:
# area variables (in square meters)
hall = 11.25
kit = 18.0
liv = 20.0
bed = 10.75
bath = 9.50

# Create list areas: each room name is followed by its area
areas = ["hallway",hall, "kitchen",kit, "living room", liv,"bedroom", bed, "bathroom", bath]

# Print areas
print('areas = ', areas)
print('type of areas is ', type(areas))

areas =  ['hallway', 11.25, 'kitchen', 18.0, 'living room', 20.0, 'bedroom', 10.75, 'bathroom', 9.5]
type of areas is  <class 'list'>


In [31]:
# area variables (in square meters)
hall = 11.25
kit = 18.0
liv = 20.0
bed = 10.75
bath = 9.50

# house information as list of lists
house = [["hallway", hall],
         ["kitchen", kit],
         ["living room", liv],
         ["bedroom", bed],
         ["bathroom",bath]]

# Print out house
print(house)

# Print out the type of house
print(type(house))
    

[['hallway', 11.25], ['kitchen', 18.0], ['living room', 20.0], ['bedroom', 10.75], ['bathroom', 9.5]]
<class 'list'>


#### Subsetting lists

List indexing starts at 0 in Python. We use [ ] to subset and slice the list for data manipulation:

![image.png](attachment:76797e93-8a91-4c49-a1af-5a0c25a89dcf.png)

Some common ways to subset a list:
- `list[k]` returns the element at index k. Because indexing starts at 0, this is actually the (k+1)-th element
- `list[-k]` returns the k-th element counting from the end of the list
- `list[:k]` returns the first k elements of the list
- `list[k:]` returns all elements from index k to the end of the list (use `list[-k:]` for the last k elements)
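
The subsetting forms above can be sketched on a small list. The list `letters` is a made-up example, used only so that each result is easy to check by eye.

```python
letters = ['a', 'b', 'c', 'd', 'e']

print(letters[1])     # element at index 1 (the second element): b
print(letters[-1])    # last element: e
print(letters[:2])    # first two elements: ['a', 'b']
print(letters[2:])    # from index 2 to the end: ['c', 'd', 'e']
print(letters[-2:])   # last two elements: ['d', 'e']

# For a list of lists, subset twice: first pick the inner list, then the item.
pairs = [['x', 1.5], ['y', 2.5]]
print(pairs[1][0])    # y
```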

In [30]:
areas[3]

18.0