# 🚢 Using Google Colab + Google Drive  
## A First Data Science Example (Titanic)

---

## Purpose of This Notebook

This notebook is designed to help you:

- Learn how to organize files using **Google Drive**
- Connect **Google Colab** with **Google Drive**
- Upload and load a dataset
- Run pre-written code successfully
- View and interpret data visualizations

---

⚠️ **Important**

You are **NOT expected to write code** in this notebook.  
Your task is to **run the cells** and **understand the outputs**.

---

## Step 0: Organize Your Google Drive (Very Important)

Before opening Google Colab, organize your Google Drive.

### 1. Open Google Drive
Go to: https://drive.google.com

### 2. Create a folder named: `colab`

### 3. Open the `colab` folder and create another folder named: `HW1` (Project folder)

---

## Step 1: Upload the Notebook and Dataset

Upload the files `HW1_Titanic.ipynb` and `titanic.csv` into the `HW1` folder.

The easiest way is to drag both files into the project folder in your browser.

Your folder should now contain:

- HW1_Titanic.ipynb  
- titanic.csv

---

## Step 2: Open the Notebook

Double-click `HW1_Titanic.ipynb` to open it in Google Colab.

---

## Step 3: Connect Google Drive to Colab

Run the following code cell:

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Your folder structure should look like this:

```
content
└── drive
    └── MyDrive
        └── colab
            └── HW1
                ├── HW1_Titanic.ipynb
                └── titanic.csv
```

## Step 4: Navigate to the Project Directory

The `cd` command changes the current working directory. Here we change it to our project directory, `/content/drive/MyDrive/colab/HW1`.

In [None]:
cd /content/drive/MyDrive/colab/HW1

Now we should be in the `HW1` directory. Use the `pwd` command to double-check.
The `pwd` command shows the current working directory. You should see `/content/drive/MyDrive/colab/HW1`.

In [None]:
pwd

The `ls` command lists all the folders and files in the working directory. You should see `HW1_Titanic.ipynb` and `titanic.csv`.

In [None]:
ls
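
The same check can be done in Python. The sketch below is optional and not part of the assignment. The helper name `missing_files` is hypothetical, and the file names are the ones from Step 1.

```python
from pathlib import Path

def missing_files(folder, expected):
    """Return the names in `expected` that are not files inside `folder`."""
    folder = Path(folder)
    return [name for name in expected if not (folder / name).is_file()]

# File names from Step 1; "." means the current working directory.
expected = ["HW1_Titanic.ipynb", "titanic.csv"]
print("Missing files:", missing_files(".", expected))
```

An empty list means both files are where the notebook expects them.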

## Step 5: Load the Dataset

Run the following code cell:

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

file_path = "titanic.csv"
df = pd.read_csv(file_path)

df.head()

You should see the first few rows of the Titanic dataset.
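
Beyond `df.head()`, a couple of other pandas calls give a quick overview of a table. The sketch below uses a tiny made-up table named `toy` instead of `titanic.csv` so that it runs on its own. The column names mirror ones used later in this notebook, but the values are invented.

```python
import pandas as pd

# Made-up stand-in data; NOT rows from titanic.csv.
toy = pd.DataFrame({
    "Survived": [0, 1, 1, 0],
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, 38.0, None, 35.0],
})

print(toy.shape)         # (number of rows, number of columns)
print(toy.isna().sum())  # number of missing values in each column
```

Replacing `toy` with `df` gives the same overview for the real dataset.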

---

## Step 6: Overall Survival on the Titanic

In the next code cell, we will create a bar chart showing:
- How many passengers survived
- How many passengers did not survive

Run the next code cell to generate the plot.

In [None]:
# sort_index() puts 0 before 1, so the bars always match the tick labels below
df["Survived"].value_counts().sort_index().plot(kind="bar")
plt.xticks([0,1], ["Did not survive", "Survived"], rotation=0)
plt.title("Overall Survival Count")
plt.ylabel("Number of Passengers")
plt.show()
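
One detail worth knowing about `value_counts()`: it orders its result by count, largest first, not by the value itself. The sketch below uses made-up data (not the Titanic file) in which survivors are the majority, to show how the order changes and how `sort_index()` restores a fixed 0, 1 order.

```python
import pandas as pd

# Made-up data for illustration; NOT rows from titanic.csv.
toy = pd.Series([1, 1, 1, 0], name="Survived")

counts = toy.value_counts()
print(list(counts.index))               # most frequent value comes first

ordered = counts.sort_index()
print(list(ordered.index))              # values in ascending order: 0, then 1
```

This matters whenever bar labels are assigned by position, as with `plt.xticks`.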

✏️ **Homework Question 1:**

What does this chart tell you about survival on the Titanic?

## Step 6b: Survival by Gender

In the next code cell, we will create a bar chart showing survival rate by gender.

Run the next code cell.

In [None]:
df.groupby("Sex")["Survived"].mean().plot(kind="bar")
plt.title("Survival Rate by Gender")
plt.ylabel("Survival Rate")
plt.show()
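
Why does `.mean()` give a survival rate? The `Survived` column holds only 0 and 1, so its average within a group equals the fraction of that group with a 1. A minimal sketch with made-up data (not the Titanic file):

```python
import pandas as pd

# Made-up data for illustration; NOT rows from titanic.csv.
toy = pd.DataFrame({
    "Sex": ["female", "female", "male", "male", "male", "male"],
    "Survived": [1, 1, 0, 1, 0, 0],
})

# Average of a 0/1 column within each group = share of 1s in that group.
rates = toy.groupby("Sex")["Survived"].mean()
print(rates)
```

In this toy table, 2 of 2 women and 1 of 4 men have `Survived == 1`, so the rates are 1.0 and 0.25.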

✏️ **Homework Question 2:**  
Which gender had a higher survival rate?

## Step 7: Survival by Passenger Class

In the next code cell, we will create a bar chart showing survival rate by passenger class.

Run the following cell.

In [None]:
df.groupby("Pclass")["Survived"].mean().plot(kind="bar")
plt.title("Survival Rate by Passenger Class")
plt.xlabel("Passenger Class")
plt.ylabel("Survival Rate")
plt.show()
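
A rate alone does not show how many passengers are behind it. As an optional sketch, `agg` can report the group size next to the rate. The data below is made up for illustration and is not taken from `titanic.csv`; the variable name `by_class` is arbitrary.

```python
import pandas as pd

# Made-up data for illustration; NOT rows from titanic.csv.
toy = pd.DataFrame({
    "Pclass": [1, 1, 2, 3, 3, 3],
    "Survived": [1, 1, 0, 0, 1, 0],
})

# "mean" is the survival rate; "count" is the number of passengers in the class.
by_class = toy.groupby("Pclass")["Survived"].agg(["mean", "count"])
print(by_class)
```

A rate based on only a handful of passengers deserves more caution than one based on hundreds.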

✏️ **Homework Question 3:**  
How does passenger class relate to survival?

## Step 8: Age Distribution and Survival
In the next code cell, we will compare the age distributions of:
- Passengers who survived  
- Passengers who did not survive

Run the cell below:

In [None]:
df[df["Survived"] == 1]["Age"].hist(alpha=0.5, bins=20, label="Survived")
df[df["Survived"] == 0]["Age"].hist(alpha=0.5, bins=20, label="Did not survive")

plt.legend()
plt.title("Age Distribution by Survival")
plt.xlabel("Age")
plt.ylabel("Count")
plt.show()
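
A histogram can be paired with a short numeric summary. Passengers with a missing `Age` are left out of both, so the number of ages counted can be smaller than the number of passengers in a group. The sketch below uses made-up data (not the Titanic file) with one missing age; the name `age_summary` is arbitrary.

```python
import pandas as pd

# Made-up data for illustration; NOT rows from titanic.csv.
toy = pd.DataFrame({
    "Survived": [1, 1, 1, 0, 0, 0],
    "Age": [4.0, 30.0, None, 25.0, 40.0, 60.0],
})

# "count" is the number of non-missing ages; "median" is the middle age.
age_summary = toy.groupby("Survived")["Age"].agg(["count", "median"])
print(age_summary)
```

Here the survivor group has three rows but only two known ages, which is why its `count` is 2.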

✏️ **Homework Question 4:**  
What differences do you observe between these two distributions?

## Step 9: Save Results Back to Google Drive

Now we will save analysis results back to Google Drive.

Run the following cell:

In [None]:
summary = df.groupby(["Sex", "Pclass"])["Survived"].mean()

output_path = "survival_summary.csv"
summary.to_csv(output_path)

summary

After running it:
- A new file named `survival_summary.csv` will be saved in your folder:
  `/content/drive/MyDrive/colab/HW1/`
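
To confirm that a saved summary can be read back, it can be reloaded with `pd.read_csv`. The sketch below is optional. It uses made-up data and a temporary folder so that it runs on its own and does not touch your Drive; only the file name `survival_summary.csv` comes from this notebook.

```python
import os
import tempfile

import pandas as pd

# Made-up data for illustration; NOT rows from titanic.csv.
toy = pd.DataFrame({
    "Sex": ["female", "female", "male", "male"],
    "Pclass": [1, 3, 1, 3],
    "Survived": [1, 0, 1, 0],
})
summary = toy.groupby(["Sex", "Pclass"])["Survived"].mean()

# Write to a temporary folder, then read the file back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "survival_summary.csv")
    summary.to_csv(path)
    # The first two columns hold the group labels, so use them as the index.
    loaded = pd.read_csv(path, index_col=["Sex", "Pclass"])

print(loaded)
```

The reloaded table has one row per (Sex, Pclass) pair and a single `Survived` column holding the rates.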

## Step 10: Submit Your Work

Download this notebook and `survival_summary.csv`, submit both files to Canvas, and answer the questions on the HW1 page on Canvas.
