# Introduction

## Course: Programming and Data Management (EDI 3400)

### *Vegard H. Larsen (Department of Data Science and Analytics)*

## Syllabus

1. **Python for everybody: Exploring data using Python 3** by *Charles R. Severance* 
    - Available online [here](https://www.py4e.com/book)
2. **Introduction to Python for Econometrics, Statistics and Data Analysis** by *Kevin Sheppard*
    - Available online [here](https://www.kevinsheppard.com/files/teaching/python/notes/python_introduction_2021.pdf)
3. **SQL queries for mere mortals: A hands-on guide to data manipulation in SQL** by *John L. Viescas*


## Notes on the Syllabus

**Severance:** 
- Start with this book if you are new to Python.

**Sheppard:**
- This is a more advanced text.
- Note that this book assumes NumPy has been imported from the very beginning of the text.
- We will not cover NumPy until a bit later in the course.
- I will also use NumPy slightly differently: NumPy is imported under the alias `np` with `import numpy as np`, so its functions are called as `np.function_name`.
- If this is confusing, just ignore it for now; it will become clear when we get to the NumPy part of the lectures.

**Viescas:**
- This book covers much more than we will use in this course. 
- We will only cover topics from the first sections of this book. 
- The rest is optional reading and a good reference for later use.

## Software we will use

- Much of the coding in this course will be done on the **Ed** platform.
- You will interact with both Python and SQL through this platform using a web browser, so you do not need to install anything on your computer to use it.
- However, we will also use some other software that I advise you to install on your computer.
- Being able to install this software is an important skill in itself.

The additional software we will use is:

1. [Python](https://www.python.org) through the [Anaconda](https://www.anaconda.com/) distribution
2. [SQLite](https://www.sqlite.org/) - Included with the Anaconda installation
3. [DB Browser for SQLite](https://sqlitebrowser.org/)

## Course outline

**Today**: Introduction

**Lecture 2 - 5**: Programming with basic Python 

**Lecture 6 - 8**: 3rd party Python libraries

**Lecture 9 - 10**: Advanced topics

**Lecture 11 - 13**: Databases with SQL and Python

**Lecture 14**: Summing up and Q&As

## Prerequisites and learning goals

No previous experience with programming is required.

**Learning goals:**
- Understand and be able to write computer programs in Python
- Be able to clean, visualize and analyze data using Python libraries
- Interact with data that is stored on disk or in an SQL database

# 1. Introduction to Programming, Python and the Jupyter Notebook

## Computer programming

- A computer program is a set of instructions that tells a computer how it can perform a specific task
- The instructions or recipe for a given task must be written in a programming language that a computer can understand and execute
- Many programming languages exist, and they have different strengths and weaknesses (e.g., C, C++, Java, MATLAB and R)
- We will use **Python** in this course

## What is Python?

Python is a versatile, high-level programming language known for its simplicity, readability and broad applicability. It was designed on the principle that code should be easy to write and even easier to read, and its clean syntax makes it a good first language for beginners. At the same time, it is powerful and flexible enough to be widely used in fields ranging from web development and data analysis to artificial intelligence and scientific computing. Python therefore offers a gentle introduction to programming while remaining a robust platform for more advanced work later on.

- A high-level, general-purpose, interpreted programming language
- Developed by *Guido van Rossum* and first released in 1991
- Free and open source with a huge ecosystem of libraries
- Easier to learn than many other languages, and one of the most popular programming languages today according to the [PYPL index](https://pypl.github.io/)
- Used for a wide variety of applications
    * e.g., web development, games, data science and automation
      
## Some challenges with learning Python

- Python can be run in many different ways on your computer, which can be confusing for beginners
- It is possible (and common) to have more than one version of Python installed at the same time
    - This is usually an advantage but can create problems for beginners
- In some cases Python behaves differently on different operating systems (Windows vs. MacOS vs. Linux)

## The basic Python interpreter

- Can be downloaded from [www.python.org](https://www.python.org)
- Python is often pre-installed on macOS and Linux distributions
- The Python interpreter is the program that runs Python code
- Not the most convenient way for us to work with Python, but useful for quick, simple tasks
- Python code files have the extension `.py`

## Installing Python using Anaconda

- We will use the Anaconda distribution to install Python 
- Anaconda is specifically created for scientific computing
- Includes some important 3rd party libraries useful for data science
- Makes it easy to install and manage additional libraries
- Anaconda also supports other languages such as [R](https://cran.r-project.org/) and [Julia](https://julialang.org/)

# 2. Working with the Jupyter Notebook

## [The Jupyter Notebook](https://jupyter.org/)

- This will be our main tool for Python programming in this course
- The Jupyter Notebook is available in Ed 
- Gives access to Python through a web browser such as Chrome
- It lets us run each piece of code individually, modify it and see the results right next to the code
- The Jupyter Notebook can also be installed together with the Anaconda distribution
- Everything in this course can be done within Ed. However, I recommend that everyone install Anaconda on their own computer.

## Installing Anaconda
Installing Anaconda is a relatively straightforward process. Anaconda is a popular distribution of Python and other tools, which makes it easy for users to work with Python libraries and manage various packages. Below are detailed instructions for installing Anaconda on both Windows and Mac:

**Installing Anaconda on Windows:**

1. **Download the Installer:**
   - Go to the [Anaconda Distribution page](https://www.anaconda.com/products/distribution).
   - Click on the "Download" button for Windows.
   - Select the version for Python 3.11 (recommended).

2. **Run the Installer:**
   - Locate the downloaded `.exe` file, typically in your "Downloads" folder. It will be named something like `Anaconda3-2023.x-Windows-x86_64.exe`.
   - Double-click on the `.exe` file to launch the installer.

3. **Follow the Installer Prompts:**
   - Click "Next" on the welcome screen.
   - Read and accept the license agreement.
   - Choose an install location (default is usually fine, but you can change it if you wish).
   - Beginners may find it easier to check the box "Add Anaconda to my PATH environment variable", even though the installer advises against it. Checking it makes it possible to run Anaconda from the regular Command Prompt.
   - Choose "Install".

4. **Wait for Installation to Complete:**
   - The installation may take several minutes. Once it's complete, click "Next".

5. **Finish Installation:**
   - After installation, you'll be presented with some options to learn more about Anaconda or get started with an Anaconda project. You can explore these or simply click "Finish".

6. **Launch Anaconda Navigator:**
   - You can find Anaconda Navigator on your desktop or in your list of installed programs. Launch it to access tools and applications such as the Jupyter Notebook.

**Installing Anaconda on macOS:**

1. **Download the Installer:**
   - Visit the [Anaconda Distribution page](https://www.anaconda.com/products/distribution).
   - Click on the "Download" button for macOS.
   - Select the version for Python 3.11.

2. **Run the Installer:**
   - Locate the downloaded `.pkg` file in your "Downloads" folder.
   - Double-click on the `.pkg` file to launch the installer.

3. **Follow the Installer Prompts:**
   - Click "Continue" and follow through the various prompts.
   - Accept the license when prompted.
   - Choose an install location (default is usually fine).
   - Click "Install".

4. **Wait for Installation to Complete and Finish:**
   - After installation, click "Close".

5. **Launch Anaconda Navigator:**
   - Use Spotlight (Cmd + Space) and type "Anaconda Navigator", or navigate to the Applications folder and find it there.

**Tips for All Installations:**

- After installation, it's good practice to check if everything was installed correctly. One way to do this is by launching the Anaconda Prompt (on Windows) or Terminal (on macOS) and typing `conda list`. If you see a list of packages, Anaconda is successfully installed.
- Anaconda Navigator is a graphical interface that allows you to launch applications (like Jupyter Notebook) and manage packages/environments. It's beginner-friendly and a good place to start.
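As a complement to `conda list`, a quick check from Python itself is to print the interpreter version and its location. This is a minimal sketch using only the standard library modules `sys` and `platform`; run it in a Jupyter Notebook cell. If Anaconda's Python is the one in use, the path printed by `sys.executable` will typically point into the Anaconda installation folder.

```python
import platform
import sys

# Version of the Python interpreter that is running this code, e.g. "3.11.5"
print("Python version:", platform.python_version())

# Full path to the interpreter; with Anaconda this usually lies inside the anaconda3 folder
print("Interpreter location:", sys.executable)

# A simple sanity check that we are running Python 3
if sys.version_info.major == 3:
    print("Python 3 is installed and working.")
else:
    print("This is not Python 3; check your installation.")
```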

Remember, it's okay to ask for help or look for additional resources online if you're unsure about any step.

## Other related and complementary tools

Some of these come with the Anaconda installation, while others (VSCode and PyCharm) must be installed separately. None of them will get much attention in this course:

- [IPython](https://ipython.org/) - Command shell for interactive computing
- [Spyder](https://www.spyder-ide.org/) - Integrated Development Environment (IDE) for scientific programming in Python
- [JupyterLab](https://jupyterlab.readthedocs.io/) - Web-based user interface for Project Jupyter
- [VSCode](https://code.visualstudio.com/) - Code editor with support for Jupyter Notebooks
- [PyCharm](https://www.jetbrains.com/pycharm/) - One of the most popular Python IDEs

## Modality in the Notebook

The Jupyter Notebook has a modal user interface, meaning that the keyboard has different functions depending on which mode the Notebook is in.

The Jupyter Notebook has two modes:

- **Command mode (blue cell border)**
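    * Keyboard shortcuts act on the notebook as a whole (e.g., adding, deleting or moving cells), but we cannot type into individual cells
- **Edit mode (green cell border)**
    * We can type into the cell, like a normal text editor (press Enter to switch to edit mode and Esc to return to command mode)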

## Different types of cells 
A Jupyter Notebook contains three types of cells. Because the notebook runs in a web browser, the output of a cell can in principle be anything a web browser can display.

- Code cells (for running your code)
- Markdown cells (useful for documenting your code with Markdown)
- Raw cells (not evaluated by the Notebook)

## Running code

- Run code by pressing Shift-Enter, by clicking the "Run" (play) button on the toolbar, or by selecting Run Cells from the Cell drop-down menu
- Within a cell, Python executes the code one statement at a time from top to bottom, and the notebook runs one cell at a time
- In a Python script it is straightforward to see the order of execution by reading the script from top to bottom
- In a Notebook this can get messy, since cells can be run in an arbitrary order and a cell may depend on a variable defined in a cell that has not been run yet
- The "Run All Cells" command runs every cell from top to bottom
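The order-of-execution problem can be imitated in plain Python. In this sketch, two hypothetical cells are stored as strings and run with the built-in `exec` function in a shared namespace (a dictionary), which loosely mirrors how a notebook keeps variables alive between cells. Running the cells in order works, while running the second cell first in a fresh namespace fails because `x` has not been defined yet.

```python
# Two hypothetical notebook cells, stored as strings of Python code
cell_1 = "x = 10"
cell_2 = "y = x * 2"

# Run the cells top to bottom in a shared namespace
namespace = {}
exec(cell_1, namespace)
exec(cell_2, namespace)
print("In order, y =", namespace["y"])

# Run only the second cell in a fresh namespace
fresh_namespace = {}
try:
    exec(cell_2, fresh_namespace)
except NameError as error:
    print("Out of order:", error)
```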

## Code cells and comments

In [None]:
# This is a code-cell
# Everything after the # is a comment and is ignored by Python

# Let's run some Python code
print('Hello World.....?')

In [None]:
"""
This is a multi-line string, which is often used as a multi-line comment.
Python does nothing with it, but since it is the last expression in the cell,
the Jupyter Notebook displays its value as output.
"""

In [None]:
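# Run an external Python script with the %run magic command (works in IPython/Jupyter, not in plain Python)
%run files/hello_world.py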

# This is a level 1 heading

### This is a level 3 heading

Add emphasis **bold** and __bold__, or *italic* and _italic_. 

* Sometimes we want to include lists. 
* Which can be bulleted using asterisks. 

[It is possible to include hyperlinks](https://www.example.com)

We can also include equations:
$$ \hat{\beta} = \frac{\sum_i x_iy_i}{\sum_i x_i^2}$$
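The equation above is the least-squares slope for a regression line through the origin. As an illustration, it can be computed directly in plain Python; the values of `x` and `y` below are made up for the example.

```python
# Hypothetical example data
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]

# Numerator: sum of x_i * y_i; denominator: sum of x_i squared
numerator = sum(xi * yi for xi, yi in zip(x, y))
denominator = sum(xi ** 2 for xi in x)

beta_hat = numerator / denominator
print(beta_hat)
```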

## Relying on libraries and 3rd party packages 

* Pure Python can solve many problems, but we will often need additional and more advanced functionality
* The Python Standard Library provides additional functionality and is included when we install the basic Python interpreter, e.g.,
    - **[math](https://docs.python.org/3/library/math.html)** - Mathematical functions like `sqrt`, `log` and `sin` 
    - **[re](https://docs.python.org/3/library/re.html)** - Regular expression matching for working with text
    - **[os](https://docs.python.org/3/library/os.html)** - Operating system dependent functionality
    - **[sqlite3](https://docs.python.org/3/library/sqlite3.html)** - Interface to SQLite, a lightweight disk-based SQL database
        
## We will use some powerful third party libraries 
If the functionality we need is not in the Standard library we can rely on 3rd party libraries.
 
- **[Numpy](https://numpy.org/)** - Numerical computing and linear algebra
- **[Pandas](https://pandas.pydata.org/)** - Excel-like functionality with data frames
- **[Matplotlib](https://matplotlib.org/)** - Plotting and graphics
- **[Seaborn](https://seaborn.pydata.org/)** - Plotting and graphics

# 3. Examples of what we can do with Python

In [None]:
# Taking the square root of a number
import math
math.sqrt(16)
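The `math` module also provides the other functions mentioned earlier, such as `log` and `sin`, along with constants like `pi` and `e`. A short sketch:

```python
import math

print(math.sqrt(16))          # square root
print(math.log(math.e))       # natural logarithm (base e)
print(math.log10(100))        # logarithm with base 10
print(math.sin(math.pi / 2))  # sine of an angle given in radians
```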

In [None]:
# Find all occurrences of the substring 'in' in a string (also inside other words)
import re
txt = "How can we find all instances of the term 'in' in this sentence"
x = re.findall("in", txt)
x
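Note that `re.findall("in", txt)` matches the two letters *in* wherever they occur, including inside words such as *find* and *instances*. To match only the stand-alone word, one option is the word-boundary token `\b` in a raw-string pattern:

```python
import re

txt = "How can we find all instances of the term 'in' in this sentence"

# Every occurrence of the substring "in", also inside other words
all_matches = re.findall("in", txt)

# Only "in" as a whole word: \b marks a boundary between word and non-word characters
word_matches = re.findall(r"\bin\b", txt)

print(len(all_matches), len(word_matches))  # prints: 4 2
```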

In [None]:
# Interact with the operating system
import os

os.name # Windows returns 'nt', while macOS and Linux return 'posix'
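Beyond `os.name`, the `os` module has many functions for working with files and folders. A small sketch using `os.getcwd`, `os.listdir` and `os.path.join`; the path `files/example_sales.xlsx` is only built as a string here, so the file does not need to exist.

```python
import os

# The current working directory (the folder Python is working in)
current_dir = os.getcwd()
print("Working directory:", current_dir)

# The names of the files and folders in that directory
print("Contents:", os.listdir(current_dir))

# Build a path with the correct separator for the operating system
# ("/" on macOS and Linux, "\" on Windows)
path = os.path.join("files", "example_sales.xlsx")
print("Path:", path)
```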

In [None]:
# Connect to an SQLite database file (example.db is created if it does not exist)
import sqlite3
con = sqlite3.connect('example.db')
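The cell above only opens a connection to `example.db`. A minimal sketch of a full round trip with the `sqlite3` module follows; it uses an in-memory database (`":memory:"`) so no file is written, and the table name `sales` and its rows are hypothetical.

```python
import sqlite3

# Connect to a temporary database that lives only in memory
mem_con = sqlite3.connect(":memory:")
cur = mem_con.cursor()

# Create a table and insert a few hypothetical rows
cur.execute("CREATE TABLE sales (year INTEGER, amount REAL)")
cur.executemany(
    "INSERT INTO sales (year, amount) VALUES (?, ?)",
    [(2021, 100.0), (2022, 150.0), (2023, 125.0)],
)
mem_con.commit()

# Query the data back with SQL
rows = cur.execute("SELECT year, amount FROM sales ORDER BY year").fetchall()
print(rows)

# Close the connection when done
mem_con.close()
```

The `?` placeholders let `sqlite3` insert the values safely instead of pasting them into the SQL string by hand.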

## Numpy example

In [None]:
# Creating an array of numbers and calculating the mean of those numbers

# The 'as np' part tells Python to give NumPy the alias np.
# This lets us call NumPy functions by typing np. followed by the function name, e.g., np.array.
# The alias does not have to be np, but this is the common convention.
import numpy as np

In [None]:
a = np.array([2, 3, 4])
a.mean()
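NumPy arrays support arithmetic on all elements at once (vectorized operations). Continuing with the same small array, a sketch that also contrasts this with a plain Python list:

```python
import numpy as np

a = np.array([2, 3, 4])

doubled = a * 2       # multiply every element by 2
squares = a ** 2      # square every element
total = a.sum()       # sum of all elements
roots = np.sqrt(a)    # element-wise square root

# For comparison: multiplying a plain list repeats it instead
repeated = [2, 3, 4] * 2

print(doubled, squares, total)
print(roots)
print(repeated)
```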

## Pandas example

In [None]:
# Importing an Excel file into Pandas

# Pandas is commonly imported as pd
import pandas as pd

In [None]:
data = pd.read_excel('files/example_sales.xlsx')

In [None]:
data
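Since the contents of `files/example_sales.xlsx` are not shown here, this sketch builds a small DataFrame by hand with the same column names used in the plotting example (`Year` and `Sales`); the numbers are hypothetical. It shows a few common first steps when exploring a data frame.

```python
import pandas as pd

# Hypothetical sales data with the same column names as the Excel example
df = pd.DataFrame({
    "Year": [2019, 2020, 2021, 2022, 2023],
    "Sales": [120, 95, 140, 160, 155],
})

print(df.head())            # first rows of the table
print(df.shape)             # (number of rows, number of columns)
print(df["Sales"].mean())   # average of one column

# Select the rows where sales exceed 130
high_sales = df[df["Sales"] > 130]
print(high_sales)
```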

## Matplotlib example

In [None]:
# Let's plot the sales data imported by Pandas with Matplotlib

import matplotlib.pyplot as plt # Matplotlib is commonly imported as plt

In [None]:
# Plot the year on the x-axis and the sales data on the y-axis
plt.plot(data['Year'], data['Sales'], linestyle='--', lw=5, color='blue')
plt.show()
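A plot is usually easier to read with axis labels and a title. This sketch uses Matplotlib's object-oriented interface (`plt.subplots`) and a small hypothetical data set in place of the Excel file, so it runs on its own.

```python
import matplotlib.pyplot as plt

# Hypothetical data standing in for the Year and Sales columns
years = [2019, 2020, 2021, 2022, 2023]
sales = [120, 95, 140, 160, 155]

fig, ax = plt.subplots()
ax.plot(years, sales, linestyle="--", lw=2, color="blue", marker="o")

# Label the axes and give the figure a title
ax.set_xlabel("Year")
ax.set_ylabel("Sales")
ax.set_title("Sales per year")

plt.show()
```

Working with `fig` and `ax` explicitly makes it clear which figure each command modifies, which becomes useful once a notebook contains several plots.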

## Other useful 3rd party libraries not covered in this course
- **[SciPy](https://scipy.org/)** - Scientific and technical computing
- **[Scikit-Learn](https://scikit-learn.org/)** - Machine learning with various classification, regression and clustering algorithms
- **[NLTK](https://www.nltk.org/)** - Programs to work with human language data
- **[BeautifulSoup](https://beautiful-soup-4.readthedocs.io/en/latest/)** - Popular library for parsing HTML and XML, often used for web scraping
- **[TensorFlow](https://www.tensorflow.org/)** - Deep learning
- **[Numba](https://numba.pydata.org/)** and **[Cython](https://cython.org/)** - Compile Python (Numba) or Python-like (Cython) code into fast machine code