# Intro to Python Programming for Data Science
## Dr Austin R Brown
## School of Data Science and Analytics
### Kennesaw State University

## Why Learn Python?  
- You may be asking yourself: out of all the programming languages that exist, why should I spend time learning Python?

- Great question!

- Python is a useful tool and worthwhile to learn for several reasons:
    1. It's free!
    2. Because it's open source, thousands of people have contributed packages and functions at a pace that proprietary software can't compete with
    3. It is a very flexible and robust general programming language, meaning there's a lot you can do with it in the data science space and beyond!
    4. It has become a de facto standard for data science work in industry

## So What is Python?

- Python is a command-line, object-oriented, general-purpose programming language commonly used for data analysis, data science, and statistics.

- **Command-line** means that we have to give it commands to get it to do something. For example:

In [9]:
## What is the sum of 2 & 3? ##
2 + 3

5

- **Object-oriented** means that we can save individual pieces of output under a name (an object) that we can use later. This is a super handy feature, especially when you have complicated scripts! For example:

In [10]:
## Save 2 + 3 as "a" ##
a = 2 + 3
print(a)
print(a*2)

5
10


## What Can Python do?

- What can Python do? Well, for the purpose of data analytics, I have yet to find a limit!

- In this class, we will be learning how to use Python as a tool in the data science workflow with specific attention placed on designing and evaluating experiments (more on that in the first week's lecture!)

- What is the data science workflow? Let's take a look!

![From R for Data Science 2nd Edition](Data%20Science%20Workflow.png)

## Importing Data/Data Loading

- Since a major reason we use Python is for the analysis of data, we need to know how to import/load data from various sources and file formats into our Python programming environment.

- There are a variety of ways of importing data into our Python programming environment, which largely depend on the type of datafile that you are importing (e.g., Excel file, CSV file, text file, SAS dataset, SPSS dataset, etc.).

- While lots of different file types can be imported into our Python programming environment (Google and GenAI tools are excellent resources for finding starter code), we're going to focus on two main types: Excel and CSV.

- For example, let's try importing a CSV file using Python. The file, `HEART.csv`, is part of the famous Framingham Heart Study and is located in the `Python for Data Science` subfolder of our class GitHub repo.

- To read in this CSV file using Python, we will use the `read_csv` function, which is part of the famous `pandas` package.

### Defining Packages and Functions

- Okay, but before we get into reading in the `HEART` CSV file, what in the world are packages and functions?

- We can think of packages like toolboxes in a mechanic's shop. Each toolbox contains different tools used for specific purposes.

- To access a particular tool, we have to go to the right toolbox.
    - A toolbox is like a package
    - The tools within the toolbox are like functions within a package

- Thus, `read_csv` is a tool (function) within the `pandas` toolbox (package).
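- As a small sketch of the toolbox idea, the cell below uses `math`, a package that ships with Python (so nothing needs to be installed). The choice of `math` and of these two functions is purely illustrative; it is not part of the original lecture materials.

```python
import math  # open the "math" toolbox (comes with Python)

# Reach for specific tools inside the toolbox: toolbox.tool(input)
root = math.sqrt(16)          # square root of 16
rounded_up = math.ceil(2.3)   # round 2.3 up to the next whole number

print(root)        # 4.0
print(rounded_up)  # 3
```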

- A function can also be thought of like a mathematical function: we provide some input and some specific output is returned. Now, while our Python programming environment comes with some functions pre-loaded, almost all others in existence have to be installed from the web, including `pandas`.

- To install a Python package, we use a command-line tool called `pip`, which is a recursive acronym for "pip installs packages".
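- To make the "input in, output out" idea concrete, here is a minimal sketch using three of the functions that come pre-loaded with Python. The specific inputs are made up for illustration.

```python
# Each built-in function takes an input and returns an output
absolute = abs(-7)       # absolute value of -7
length = len("Python")   # number of characters in the text "Python"
total = sum([2, 3])      # sum of the numbers in the list

print(absolute, length, total)  # 7 6 5
```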
    - In brief, `pip` is a package management system used to install and manage software packages written in Python.

- Since we need `pandas` to load the `HEART.csv` file, we can install it by using:

In [None]:
## Install pandas using pip ##
%pip install pandas
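- One way to check whether a package is available before importing it is sketched below. The helper name `is_installed` is hypothetical (it is not part of `pip` or `pandas`); it simply wraps `importlib.util.find_spec` from Python's standard library. If it reports `False` for `pandas`, running `import pandas` would raise a `ModuleNotFoundError`, and the install cell needs to be run first.

```python
import importlib.util

def is_installed(package_name):
    """Hypothetical helper: True if Python can locate the package, else False."""
    return importlib.util.find_spec(package_name) is not None

print(is_installed("json"))    # a package that ships with Python
print(is_installed("pandas"))  # result depends on whether pandas has been installed
```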

- Now, we can load the `pandas` library into our current Python environment by using the `import` statement. Note that to access the functions within a package, we have to use the following code syntax: `package_name.function_name`

- Typically, when we import a package, we give it a shorter alias to keep our code brief.
    - `pandas` is almost universally imported as `pd`
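- The aliasing pattern works the same way for any package. As a sketch, the cell below imports the built-in `statistics` package under the alias `st`; both the alias and the numbers are arbitrary choices made for illustration.

```python
import statistics as st  # "st" is just a shorter nickname for "statistics"

values = [2, 3, 5, 10]

# alias.function_name instead of statistics.function_name
average = st.mean(values)
middle = st.median(values)

print(average)
print(middle)
```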

In [None]:
## Load pandas library ##
import pandas as pd

## Load HEART CSV ##
heart = pd.read_csv("HEART.csv")

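- To see what `read_csv` returns without needing the `HEART.csv` file on hand, here is a minimal sketch that reads a tiny CSV held in memory. It assumes `pandas` is already installed, and the column names and values are made up; they are not taken from the Framingham data.

```python
import io
import pandas as pd

# A tiny, made-up CSV stored as text; these columns are hypothetical
# and are NOT taken from HEART.csv
csv_text = "id,height_cm\n1,170\n2,165\n3,180\n"

# read_csv accepts a file path or a file-like object such as StringIO
demo = pd.read_csv(io.StringIO(csv_text))

print(demo.shape)   # (3, 2): three rows, two columns
print(demo.head())  # first few rows of the resulting DataFrame
```

- Excel files can be read in a similar way with `pd.read_excel`, which relies on an additional package (such as `openpyxl`) to open `.xlsx` files.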