**SA433 &#x25aa; Data Wrangling and Visualization &#x25aa; Fall 2025**

# Lesson 1. A Survival Course in JupyterLab and Python

❗️ This notebook assumes that you're using JupyterLab 4.0.

## What is JupyterLab?

* [__JupyterLab__](https://jupyter.org/) is an interactive computational environment where you can combine code, text and graphs in __notebooks__


* You're looking at a Jupyter notebook right now!


* We'll be using JupyterLab with the __Python__ programming language in this course

## Structure of a notebook document

* A notebook consists of a sequence of __cells__ of different types

* We'll use two types of cells frequently:
    * code cells
    * Markdown cells

* You can see and change the type of the selected cell using the dropdown menu in the toolbar

* You can run a cell by:
    * clicking the <kbd><i class="fa fa-play" aria-hidden="true"></i></kbd> button in the toolbar
    * selecting __Run &#8594; Run Selected Cell__ in the menu bar
    * pressing <kbd>Shift</kbd>-<kbd>Enter</kbd>

## Code cells

* In a __code cell__, you can edit and write Python code
    * We'll talk about Python shortly

* For now, we can use a code cell as a fancy calculator


* For example, in the code cell below, let's compute
$$ \frac{2^{5} - 368}{23 + 18} $$
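
* A minimal sketch of what this cell might contain, using only Python's built-in arithmetic operators:

```python
# ** is Python's exponentiation operator (^ is bitwise XOR, not a power)
# Parentheses group the numerator and the denominator
(2**5 - 368) / (23 + 18)
```

* Running this cell should display a value of roughly -8.2, since the numerator is 32 - 368 = -336 and the denominator is 41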

* Note that a code cell has
    * an __input__ section containing your code
    * an __output__ section, which appears after you run the cell

## Markdown cells

* In a __Markdown cell__, you can enter text to write notes about your code and document your workflow


* For example, this cell is a Markdown cell


* The __Markdown__ language is a popular way to provide formatting (e.g. bold, italics, lists) to plain text
    * Use Google to find documentation and tutorials. [Here's a pretty good cheat sheet.](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet)

* For now, here are a few basic, useful Markdown constructs:

```
You can format text as italic with *asterisks* or _underscores_.

You can format text as bold with **double asterisks** or __double underscores__.

To write a bulleted list, use *, -, or + as bullets, like this:

* One
* Two
* Three
```


* To edit a Markdown cell, double-click it 


* When you're done editing it, run the cell


* Try it in the cell below:

*Double-click to edit this cell. Try out Markdown here.*

## Manipulating cells

* You can insert a new cell by clicking the <kbd>+</kbd> button in the toolbar


* You can copy and paste cells using the __Edit__ menu


* You can also split, merge, move, and delete cells using the __Edit__ menu


* You can also move cells around with your mouse: click on the left part of a cell and drag it to where you want it

## Saving your notebook

* JupyterLab autosaves your notebook every few minutes
    * Check __Settings &#8594; Autosave Documents__ to make sure this setting is turned on

* To manually save, click the <kbd><i class="fa fa-floppy-o"></i></kbd> icon, or select __File &#8594; Save Notebook__


* When you're done, you can shut down JupyterLab by selecting __File &#8594; Shut Down__

## Moving on...

* We'll go over some other features of JupyterLab later


* [The official documentation is here](https://jupyterlab.readthedocs.io/)


* There are many resources out there on using Jupyter notebooks &mdash; Google is your friend!

<hr style="border-top: 2px solid gray; margin-top: 1px; margin-bottom: 1px"></hr>

## What is Python?

* __Python__ is a *free*, open-source, general-purpose programming language

* Python is popular and used everywhere &mdash; a few examples:
    - [Automation, monitoring, data science, and more at Netflix](https://medium.com/netflix-techblog/python-at-netflix-86b6028b3b3e)
    - [Analysis of historical texts](https://digitalorientalist.com/2019/07/01/making-a-basic-textual-analysis-program-in-python/)

* Python is "beautiful": its syntax was designed with an emphasis on readability


* Python has become the language of choice for data science and machine learning


* _Side note._ Even if you're familiar with other programming languages, having exposure to multiple programming languages will be very useful to you as a {data scientist, operations researcher, quantitative analyst, statistician, economist, etc.}

## Python basics

* In this lesson, we will review some basic Python concepts


* There is a wealth of information on Python on the web!


* [Here is the documentation for the latest version of Python](https://docs.python.org/)

## Fancy calculator

* You can define a variable using the `=` sign


* You can perform arithmetic operations on variables


* You can print the value of a variable using the `print()` function

* For example, in the cell below, let's:
    - define two variables representing the dimensions of a rectangle
    - compute the area of this rectangle
    - print the area of this rectangle

* Don't forget to run the cell when you're done!    
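
* One way this cell could look. The variable names and the dimensions 5 and 3 are example choices, not values given in the lesson:

```python
# Dimensions of a rectangle (example values)
length = 5
width = 3

# Area of the rectangle
area = length * width

# Print the area
print(area)
```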

* If you try to access a variable you haven't yet defined, Python will raise a `NameError`


* For example, if we try to print the value of the variable `volume`:

In [None]:
print(volume)

* Note that variable definitions are persistent across cells

* For example, in the cell below, let's:
    - define the height of a box
    - compute the volume of this box, using the length and width we defined in the cell above
    - print the volume of this box
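
* A sketch of the box computation. The names `length` and `width` and all three values are assumptions for illustration; they are defined again here only so that the sketch runs on its own:

```python
# In the notebook, length and width would carry over from the earlier cell;
# they are repeated here so that this example is self-contained
length = 5
width = 3

# Height of the box (example value)
height = 2

# Volume of the box
volume = length * width * height
print(volume)
```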

* Note that the __prompt numbers__ next to the code cells (e.g. `[3]`) indicate which cells have been run and *in which order*


* This is very useful, especially if you are running cells out-of-sequence

## Hello, world!

* __Strings__ are sequences of characters, written between either double quotes or single quotes


* Just like with variables, to print a string, you can use the `print()` function, like this:
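
* A minimal sketch, showing that single and double quotes produce the same string:

```python
# Print a string written with single quotes
print('Hello, world!')

# Print the same string written with double quotes
print("Hello, world!")
```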

* To insert a variable's value into a string:
    - place the letter `f` immediately before the opening quotation mark
    - put braces `{}` around the names of any variables you want to use inside the string

* These strings are called __f-strings__

* For example, in the cell below, let's:
    - define a variable for your neighbor's name
    - print a sentence containing your neighbor's name
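
* One possible version of this cell. The name `'Alex'` and the variable names are placeholders:

```python
# Placeholder name; substitute your actual neighbor's name
neighbor = 'Alex'

# The f prefix lets {neighbor} be replaced by the variable's value
sentence = f'My neighbor is {neighbor}.'
print(sentence)
```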

* More generally, f-strings can also be used to insert the value of a Python expression, like this:

In [None]:
print(f'Two plus two is equal to {2 + 2}.')

* Strings, like many **objects** in Python, have associated **methods** that do something to the string


* For example, consider the following example string:

In [None]:
example = 'the quick brown fox jumped over the lazy dog.'

* We can apply the `.upper()` method to return the same string, but in all uppercase:

* Similarly, we can apply the `.title()` method to return the same string, but in title case:

* We can replace certain phrases with another specified phrase using the `.replace()` method:
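
* The three string methods above can be sketched together as follows. The replacement phrase `'sleepy cat'` and the result variable names are made up for illustration:

```python
example = 'the quick brown fox jumped over the lazy dog.'

# Each method returns a new string; the original is left unchanged
upper_example = example.upper()
title_example = example.title()
replaced_example = example.replace('lazy dog', 'sleepy cat')

print(upper_example)
print(title_example)
print(replaced_example)
print(example)
```

* Note that the last line prints the original string, which none of these methods modify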

## Lists

* A __list__ is a collection of items that are organized in a particular order


* You can think of a list as an array or a vector


* A list is written as a sequence of comma-separated items between square brackets, like this:

In [None]:
# Define a list containing the first 5 square numbers
squares = [0, 1, 4, 9, 16]

# Define a list containing the days of the week
days_of_the_week = ['Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat']

* To get the first item in `days_of_the_week`, we would write

    ```python
    days_of_the_week[0]
    ```

* __In Python, indexing (that is, counting) starts at 0!__


* So, we can get the third day of the week, like this:
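
* A short sketch, with the list repeated so that the example runs on its own. The name `third_day` is just for illustration:

```python
days_of_the_week = ['Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat']

# Indexing starts at 0, so the third item has index 2
third_day = days_of_the_week[2]
print(third_day)
```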

* We can also grab consecutive items of a list through **slicing**. The slice `[0:4]` starts at index 0 and stops just before index 4, so the code below grabs the first 4 items in `days_of_the_week`:

    ```python
    days_of_the_week[0:4]
    ```

* So, we can get the last 3 days of the week, like this:
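
* Two equivalent ways to sketch this, with the list repeated so that the example is self-contained. The result variable names are illustrative:

```python
days_of_the_week = ['Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat']

# Slice from index 4 up to, but not including, index 7
last_three = days_of_the_week[4:7]

# A negative start index counts from the end of the list
last_three_alt = days_of_the_week[-3:]

print(last_three)
print(last_three_alt)
```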

* We can add items to the end of a list using the `.append()` method


* We can also print lists just like any other variable


* Let's add the 6th square number to the list `squares`:

* Let's check our work by printing the list:
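
* A sketch of both steps, with `squares` defined again so that the example runs on its own. Since the list starts at 0, the 6th square number is 5 squared, or 25:

```python
squares = [0, 1, 4, 9, 16]

# .append() modifies the list in place, adding the item to the end
squares.append(25)

# Check our work
print(squares)
```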

* We can determine the length of a list using the `len()` function


* For example, we can get the length of the list `days_of_the_week`, like this:
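
* A minimal sketch, with the list repeated for completeness. The name `number_of_days` is illustrative:

```python
days_of_the_week = ['Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat']

# len() returns the number of items in the list
number_of_days = len(days_of_the_week)
print(number_of_days)
```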

## Dictionaries

* A __dictionary__ is another way to organize a collection of items

* A dictionary maps __keys__ to __values__.
    - Just like a real-world dictionary maps *words* to *definitions*

* We can create a dictionary by starting with an empty dictionary and adding key-value pairs, like this:

* You can also print dictionaries just like any other variable
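
* Building a dictionary step by step and then printing it might look like the sketch below. The English-to-Spanish pairs are borrowed from the `spanish` dictionary used later in this lesson; any keys and values would work:

```python
# Start with an empty dictionary
spanish = {}

# Add key-value pairs one at a time
spanish['home'] = 'casa'
spanish['navy'] = 'armada'

# Print the whole dictionary
print(spanish)
```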

* We can also create a dictionary by specifying the key-value pairs directly between braces `{}`:
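
* For instance, the `spanish` dictionary that appears later in this lesson is written this way, with each key separated from its value by a colon:

```python
# Each pair is written as key: value, with pairs separated by commas
spanish = {'home': 'casa', 'navy': 'armada', 'blue': 'azul'}
print(spanish)
```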

* Just as we use an index in square brackets to get an item from a list, we can use a key in square brackets to look up the corresponding value in a dictionary, like this:
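
* A short sketch of a lookup, using the same English-to-Spanish pairs that appear later in this lesson. The name `translation` is illustrative:

```python
spanish = {'home': 'casa', 'navy': 'armada', 'blue': 'azul'}

# Put the key in square brackets to get its value
translation = spanish['navy']
print(translation)
```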

## Loops and nesting

* We can iterate through lists using a `for` statement


* For example, consider the following list, containing strings representing the months of the year:

In [None]:
months_of_the_year = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

* We can print each string in this list, like this: 
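
* A sketch of such a loop, with the list of all 12 months repeated so that the example runs on its own. The loop variable name `month` is a free choice:

```python
months_of_the_year = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                      "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# The indented line is the body of the loop; it runs once per item
for month in months_of_the_year:
    print(month)
```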

* Python defines blocks of code using a __colon `:`__ followed by __indentation__


* The code we wrote above is NOT the same as the following, which raises an `IndentationError` because the body of the loop is not indented:

    ```python
    for month in months_of_the_year:
    print(month)
    ```

* Always use the __Tab__ key to indent &mdash; this will keep your indentation consistent

## If this, then that

* The `==` operator performs __equality testing__: 
    - If the two items on either side of `==` are equal, then it returns `True`
    - Otherwise, it returns `False`

* For example, let's define the variable `today` to be the string `'Tuesday'`:    

In [None]:
today = 'Tuesday'

* We can check if the variable `today` is equal to `'Tuesday'`, like this:

* Let's see what happens when we check if the variable `today` is equal to `'Friday'`:
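
* The two checks above can be sketched as follows, with `today` defined again so that the example is self-contained. The result variable names are illustrative:

```python
today = 'Tuesday'

# The two sides are equal, so this comparison evaluates to True
is_tuesday = today == 'Tuesday'
print(is_tuesday)

# The two sides differ, so this comparison evaluates to False
is_friday = today == 'Friday'
print(is_friday)
```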

* Conditional statements are written using the same block/indentation structure as `for` statements, using the keywords `if`, `elif`, and `else`

In [None]:
# Today is...
today = 'Tuesday'

# What should I do?
if today == 'Friday':
    print('Go out.')
elif today == 'Saturday':
    print('Have fun.')
else:
    print('Study.')

* Other types of comparisons:

| Comparison | Meaning |
| :----------- | :-------- |
| `==`         | equal to |
| `!=`         | not equal to |
| `<`          | less than |
| `>`          | greater than |
| `<=`         | less than or equal to |
| `>=`         | greater than or equal to |
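
* As a quick illustration of a few of these comparisons, here is a sketch using a made-up variable `temperature`:

```python
# Hypothetical value, chosen only for illustration
temperature = 72

# Each comparison evaluates to either True or False
print(temperature != 72)
print(temperature < 80)
print(temperature >= 72)
```

* The first comparison is `False` and the other two are `True`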

<hr style="border-top: 2px solid gray; margin-top: 1px; margin-bottom: 1px"></hr>

## Useful JupyterLab features

### Line numbers

* Put line numbers in code cells by selecting __View &#8594; Show Line Numbers__

### Indenting multiple lines

* Highlight the lines you want to indent, and then press <kbd>Tab</kbd>


* If you want to de-indent them (i.e., indent them to the left), press <kbd>Shift</kbd>+<kbd>Tab</kbd>

In [None]:
# Play around with indenting and de-indenting code.
# Read the code in this cell. Make sure you understand what it does!
student_names = ['Amy', 'Bob', 'Carol']
for name in student_names:
    print(f'The name of this student is {name}.')

english_words = ['home', 'navy']
spanish = {'home': 'casa', 'navy': 'armada', 'blue': 'azul'}

for word in english_words:
    print(f"The Spanish word for {word} is {spanish[word]}.")

### Running multiple cells

* You can run all the cells in a notebook by selecting __Run &#8594; Run All Cells__


* You can run all the cells above the current cell by selecting __Run &#8594; Run All Above Selected Cell__


* You can run the current cell and all below by selecting __Run &#8594; Run Selected Cell and All Below__

### Clearing the output of code cells

* You can clear the output of a code cell by selecting __Edit &#8594; Clear Cell Output__


* You can clear the output of all code cells by selecting __Edit &#8594; Clear Outputs of All Cells__