# Discussion 01: Python Basics and Causality


Welcome to Discussion 01! This week, we will go over some Python basics. You can find additional help on these topics in the course [textbook](https://eldridgejm.github.io/dive_into_data_science/front.html).

Additionally, [here](https://ucsd-ets.github.io/dsc10-2020-fa/published/default/reference/babypandas-reference.pdf) is a potentially useful reference sheet that contains several data wrangling tips.

I also highly recommend checking out [this](https://nationalzoo.si.edu/webcams/panda-cam) baby pandas resource.

<img src="data/panda.jpeg" width="600">

Afterward, we will be talking about how to __establish causation__.

In [None]:
# please don't change this cell, but do make sure to run it
import babypandas as bpd
import matplotlib.pyplot as plt
import numpy as np 
import math
import otter
grader = otter.Notebook()

from notebook.services.config import ConfigManager
cm = ConfigManager()
cm.update(
    "livereveal", {
        "width": "90%",
        "height": "90%",
        "scroll": True,
})

## Jupyter Notebook Shortcuts

shift+enter: run cell and move focus to cell below <br>
ctrl+enter: run cell and keep focus on cell <br>

Command Mode (cell is blue):<br>
x: cut the cell, also quick way to delete<br>
c: copy the cell<br>
v: paste the cell<br>
d+d: delete cell<br>
a: make new cell above<br>
b: make new cell below<br>
y: change cell to code<br>
m: change cell to markdown<br>
enter: start editing cell<br>

Editing Mode (cell is green):<br>
esc: enter command mode<br>
shift+tab: info about a function<br>

# What we'll cover:
---

- What is Python?
- Primitive Types

# What is Python?
---

Python is a **high-level**, **interpreted** programming language created by Guido van Rossum and first released in 1991. It is a powerful language that is also **dynamically typed**, easily **readable**, and full of **whitespace**.

- Interpreted:
  - A file or cell can run right away; you do not need to compile it to a separate file first

- Dynamically Typed:
  - Python infers what type you want a variable to be; you don't tell it explicitly

- Readable:
  - Simply reading code aloud should largely reveal what's going on

- Whitespace:
  - You can *and should* use multiple lines to fit the `Python a e s t h e t i c`
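
To make the "Dynamically Typed" point above concrete, here is a minimal sketch. The variable name `x` and the values are only illustrative; the idea is that the same name can hold values of different types, and Python works out the type from the value.

```python
# We never declare a type for x; Python infers it from the value
x = 65
print(type(x))   # <class 'int'>

# The same name can later refer to a value of a different type
x = "sixty-five"
print(type(x))   # <class 'str'>
```

Try reassigning `x` to `1.0` or `False` and checking `type(x)` again.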

# Data Types in Python
---

Everything in Python has a type.

Some things are really simple—you could call them *"primitive"*.  
These things have a specific value.

We will focus on four types of primitives:
- Integers (ex. 1, 2, -12)
- Floats (ex. 1.0, 3.5, -0.34)
- Strings (ex. "this is a string", "a", "b")
- Booleans (True, False)

Other things are a bit more complex.  
These things act more like containers for values (or more containers).

Some examples include:
- Lists
- Arrays
- Tables
- Dictionaries
- Sets

We will cover these next week.

### Primitive Types: integers, floats, strings, booleans

In [None]:
# Integers
type(65)

In [None]:
# Floats
type(1.0)

In [None]:
# Strings
type("Hello")

In [None]:
# Booleans (True or False)
type(False) 

What are some things we can do with these primitive types?

In [None]:
# Let's do some testing together... here's a couple to start with:

3 + 5 # Can we do this?

In [None]:
3 + 5.9876 # What about this?

In [None]:
# How about this?
# 3 + "string"

In [None]:
# or this?
"string" + "another string"

In [None]:
# Feel free to play around with different types and see what else is possible!
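
If you want a starting point for experimenting, here is a small sketch of what happens when types are mixed. The variable names are made up for illustration. Adding an integer to a string raises a `TypeError`, so the sketch catches that error and then shows how converting one value first makes the operation work.

```python
# An int plus a float gives a float
mixed_sum = 3 + 5.9876
print(type(mixed_sum))

# An int plus a string is an error, so we catch it and print the message
try:
    3 + "string"
except TypeError as err:
    print("TypeError:", err)

# Converting first makes the types agree
combined_text = str(3) + "string"   # string + string joins them
total = 3 + int("5")                # int + int adds them
print(combined_text)
print(total)
```

Which conversion you choose decides what `+` means: joining text or adding numbers.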

# Causation

## The story of John Snow

In lecture, we started the story of John Snow ([chapter 2](https://www.inferentialthinking.com/chapters/02/causality-and-experiments.html)):

- What would Snow need to do to prove his theory that the water was responsible for causing cholera? 
- Would this be ethical?

## Observational Studies vs. Randomized Controlled Trials

In __observational studies, the treatment and control groups are observed rather than assigned__. Therefore, we are not changing the behavior of the people in the study, only observing the way they behave. However, in __randomized controlled trials, the subjects are randomly assigned to either the control or the treatment group__. For example, we randomly assign half of the subjects to take medicine, and half the subjects to take a sugar pill (placebo).
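
As a rough illustration of random assignment, the sketch below splits 20 hypothetical subjects into two equal groups using NumPy. The subject IDs, the group size, and the seed are all assumptions made for this example, not part of any real study.

```python
import numpy as np

# Seed chosen arbitrarily so the example gives the same split each time
rng = np.random.default_rng(seed=10)

# Hypothetical subject IDs 0 through 19
subjects = np.arange(20)

# Shuffle the subjects, then split the shuffled order in half
shuffled = rng.permutation(subjects)
treatment = shuffled[:10]   # would receive the medicine
control = shuffled[10:]     # would receive the sugar pill (placebo)

print("Treatment group:", np.sort(treatment))
print("Control group:  ", np.sort(control))
```

Because chance alone decides who lands in each group, the two groups should be similar on average in every way except the treatment.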


## Causality in Observational Studies

In an observational study, if the treatment and control groups differ in ways other than the treatment, it is difficult to draw conclusions about causality. Any such difference could itself explain the outcome, so we cannot tell whether the treatment or the difference is responsible. These differences are called **confounding factors**.
- **Confounding Factor**: an underlying difference between your two groups (other than the treatment) that could affect the outcome

## Correlation and Causation Examples

- Increase in ice cream sales and more sunburns
- Increase in COVID cases and fewer pregnancies
- Increase in restaurant closures and weight loss
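
The ice cream and sunburn example can be sketched with made-up data. In the simulation below, every number is invented for illustration: temperature plays the role of the confounding factor and drives both quantities, while neither quantity affects the other. The "accounting for temperature" step uses a straight-line fit from `np.polyfit`, which is one possible way to do this and not necessarily the method used in this course.

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # arbitrary seed for reproducibility
n_days = 365

# Made-up daily temperatures (the confounding factor), in degrees Celsius
temperature = rng.uniform(15, 35, n_days)

# Both quantities depend on temperature plus independent noise;
# neither one depends on the other
ice_cream_sales = 20 * temperature + rng.normal(0, 50, n_days)
sunburns = 2 * temperature + rng.normal(0, 5, n_days)

# Correlation between the two quantities, ignoring temperature
raw_corr = np.corrcoef(ice_cream_sales, sunburns)[0, 1]

def residuals(y, x):
    """Return what is left of y after removing a straight-line fit on x."""
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)

# Correlation between what is left once temperature is accounted for
adjusted_corr = np.corrcoef(
    residuals(ice_cream_sales, temperature),
    residuals(sunburns, temperature),
)[0, 1]

print("Correlation, ignoring temperature:", round(raw_corr, 2))
print("Correlation, accounting for temperature:", round(adjusted_corr, 2))
```

In this simulated setup the first correlation is strong and the second is close to zero, even though ice cream never causes a sunburn here.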

## Discuss in Breakout Rooms

### Question 1

Suppose you want to see how being quarantined is related to married couples' happiness. An [article](https://www.bloomberg.com/news/articles/2021-01-05/divorces-and-marriages-tumbled-in-u-s-during-covid-study-shows) from Bloomberg Wealth stated that divorce rates in the US have declined since lockdowns started. <br>

Can we state that lockdown has **caused** these married couples to be happier? <br>

If not, what are some confounding factors that could explain the link?

### Question 2

A new digital health study found that people who used health tracking devices had a relatively good year in 2020 despite lockdowns and limited social schedules. People who used fitness monitors from Withings got more sleep and were more successful with weight loss goals compared to 2019. 

The company found that people got about 10 minutes more sleep each night and were slightly more successful at hitting weight loss goals, despite an overall drop in physical activity.

Withings conducted the study based on anonymous, aggregated data from a pool of 5 million users of Withings devices, including a smart scale, hybrid smartwatches, and smart thermometers. The company said it took steps to avoid re-identification of the data. ([article](https://www.techrepublic.com/article/study-people-got-more-sleep-not-less-in-2020/)) <br>

What is the sample in this study? <br>

What kind of study is this? <br>

Are there clear treatment and control groups here? <br>

Are there any clear causal relationships we can establish here? If yes, explain them. If not, explain why not. <br>