# Chapter 00 - Getting Started with Python and Jupyter Notebooks

In some applications, it's more common to work with Python scripts, but in this course we're going to work with Jupyter notebooks instead. Python scripts are files of code that are meant to be run from start to finish without any manual intervention, so you only see the results at the end. Jupyter notebooks are more interactive: you can run one small block of code at a time and see its result immediately, before moving on to the next block.

A Jupyter notebook is a special type of file that lets you write and run Python code and also write notes about your code or results. It is made up of (mainly) two types of **cells**, **code cells** and **Markdown cells**, which are blocks of code and blocks of text, respectively.

### Part 1: Running Code

The cell below is our first code cell. Code can only be run from code cells, not from Markdown cells. To run the code below, click the triangle symbol in the bar at the top of the window. Alternatively, click the grey part of the cell so that your cursor appears in the grey box, and then press Ctrl+Enter on PC or Cmd+Enter on Mac. Try it with the code cell below!

In [None]:
2 + 2
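
A code cell can hold more than one line. As a minimal sketch (the variable name `total` is only an illustration), the cell below stores a result in a variable and then prints it. In a notebook, the value of the last expression in a cell is also displayed automatically, which is why `2 + 2` above shows a result without `print()`.

```python
# Store the result of a calculation in a variable (the name is illustrative)
total = 2 + 2

# print() displays the value explicitly
print(total)
```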

### Part 2: Installing Modules
Most of the commonly used Python modules (sometimes called packages) are already installed, including `numpy`, `pandas`, and `scikit-learn`. One extra module we'll need throughout this course is the `ISLP` package, which provides all the datasets used in the textbook and more. Because we will need a few other packages as well, we will install them all at once using the `requirements.txt` file.

In [None]:
# Install necessary packages with a requirements.txt file (may take a few minutes)
!pip install -r requirements.txt
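
After the install finishes, you may want to confirm that Python can find the packages. The cell below is one possible check, not part of the course setup: the helper name `is_installed` is made up for this example, and the list of names is an assumption based on the packages mentioned above. Note that `scikit-learn` is imported under the name `sklearn`.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can find a top-level module with this import name."""
    return importlib.util.find_spec(module_name) is not None


# Import names to check (scikit-learn is imported as "sklearn")
for name in ["numpy", "pandas", "sklearn", "ISLP"]:
    status = "installed" if is_installed(name) else "NOT found"
    print(f"{name}: {status}")
```

If a package is reported as not found, rerunning the install cell or restarting the notebook kernel is a reasonable first thing to try.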

### Part 3: Importing Modules
Once a module is installed, we cannot use it until it is imported. To import a module, we use the following structure: `import package_name`.  To import the `ISLP` package, we use the following code.

In [None]:
# Import the ISLP Module
import ISLP

Another module we'll use later in the course to read in and work with data is the `pandas` module. This is a very widely used module, especially among data scientists. It is typically aliased under the name `pd`. You can think of an alias as a nickname or shortcut, so that you don't have to type out `pandas` every time you use the package. To alias a package, we use the following structure: `import package_name as alias`. We do this with the `pandas` package below.

In [None]:
# Import the pandas package and alias it as pd
import pandas as pd
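
To see that an alias is just a second name for the same module, here is a small sketch using Python's built-in `math` module. The alias `m` is chosen only for illustration and is not a standard convention the way `pd` is.

```python
import math
import math as m  # same module under a shorter, illustrative name

# The alias and the full name refer to the same module object
same_module = m is math
print(same_module)

# So these two calls do exactly the same thing
print(math.sqrt(16))
print(m.sqrt(16))
```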

### Part 4: Loading Data from the `ISLP` Package
To load a dataset from the `ISLP` package and store it in a variable named `df`, we use the following structure: `df = ISLP.load_data("dataset_name")`. Let's try this with the "OJ" dataset. We'll use this data later in the course.

In [None]:
# load OJ data
df = ISLP.load_data("OJ")

To get a preview of the data, we can use `df.head()`. This will show the first 5 rows of our data so we can get an idea of what variables we have.

In [None]:
# Preview data
df.head()
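
`head()` also accepts the number of rows to show. The sketch below uses a small made-up table rather than the OJ data (the name `toy` and its columns are invented for this example) so that it runs on its own. It also shows `shape` and `columns`, two other quick ways to inspect a DataFrame.

```python
import pandas as pd

# A small made-up table (not the OJ data), used only to show how head() behaves
toy = pd.DataFrame({
    "week": list(range(1, 9)),
    "price": [1.75, 1.75, 1.86, 1.69, 1.69, 1.99, 1.99, 1.86],
})

first_five = toy.head()    # default: first 5 rows
first_three = toy.head(3)  # ask for a different number of rows

print(first_three)
print(toy.shape)           # (number of rows, number of columns)
print(list(toy.columns))   # column names
```

The same calls work on `df` once the OJ data is loaded, for example `df.head(10)` or `df.shape`.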

In practice, your data will likely not come from Python packages. Instead, you'll want to read in data from a separate file, like a .csv file. We will not cover reading in data files in this course. See the following [LinkedIn Learning](https://www.linkedin.com/learning/pandas-essential-training/using-read-csv?resume=false&u=57888345) video for more information.
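
For orientation only, here is a minimal sketch of what reading a .csv file with `pandas` can look like. So that it runs without any extra files, it reads from a short made-up string instead of a file on disk. The contents and the name `df_csv` are assumptions for this example.

```python
import io

import pandas as pd

# Stand-in for a .csv file on disk; the contents are made up for illustration
csv_text = "store,price\nA,1.75\nB,1.99\n"

# pd.read_csv accepts a file path or a file-like object such as StringIO
df_csv = pd.read_csv(io.StringIO(csv_text))

print(df_csv.shape)
```

With a real file, you would pass its path instead, for example `pd.read_csv("my_data.csv")`, where `my_data.csv` is a hypothetical file name.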