# Worksheet: Introduction to GitHub & Version Control in GitHub Codespaces

**Audience:** First-year Data Science students

## Overview
This worksheet introduces the fundamentals of using **GitHub** for version control and shows you how to get started quickly in **GitHub Codespaces**. You will learn how to create a repository, make changes, commit them, create branches, and merge your changes back into the main project. By the end of this worksheet, you will have a basic understanding of how GitHub helps you collaborate on data science projects in a cloud-based development environment.

## 1. What is GitHub?
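GitHub is a popular web-based platform for hosting **Git** repositories. Git is the version control system that records the history of your files; GitHub hosts that history online and adds collaboration features on top. It provides tools for: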

- **Version control**: Tracking changes to your code and data files.
- **Collaboration**: Easily working with others, reviewing changes, and discussing ideas.
- **Continuous integration and deployment**: Automating tasks such as testing and building.

For data science, GitHub is widely used to store and share code, analyses, and documentation in one organized place.

## 2. What is GitHub Codespaces?
**GitHub Codespaces** is a cloud-hosted development environment integrated with GitHub. It allows you to:

- Develop directly in your browser or via supported editors like VS Code.
- Collaborate instantly without installing dependencies locally.
- Use a virtual machine with pre-configured environments for Python, R, or other data science tools.

## 3. Getting Started with Codespaces

### 3.1 Prerequisites
- A **GitHub account** (free or paid).
- Access to **GitHub Codespaces** (your organization or personal account must have it enabled).

### 3.2 Steps to Launch a Codespace
1. **Create or open a repository on GitHub** (can be your personal repo or from an organization).
2. Click on the green **“Code”** button, then select **“Codespaces.”**
3. If prompted, choose to create a **new codespace**. You may configure resources (CPU, RAM, etc.) depending on your quota or organizational settings.
4. Wait for the codespace to launch. You’ll be dropped into a web-based VS Code environment.

## 4. Hands-On Exercises

### Exercise 1: Create a New GitHub Repository
1. Go to [GitHub.com](https://github.com) and click on **“+”** (top-right), then select **“New repository.”**
2. Name your repository (e.g., **intro-to-git**).
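3. Choose whether it’s **Public** or **Private**. Optionally enable the option to add a README file so that the repository starts with an initial commit on `main`.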
4. Click **“Create repository.”**
5. You will now see your **new repository** page.

> **Note:** If you already have a repository you want to use, feel free to skip this step.

### Exercise 2: Open the Repository in a Codespace
1. From your new repository’s page, click on the green **“Code”** button.
2. Select **“Codespaces”** → **“Create codespace on main.”**
3. Wait for the codespace to set up. After a few moments, a web-based VS Code editor will open in your browser.

### Exercise 3: Make Your First Commit
1. In the **Explorer** tab (left sidebar), locate the **README.md** file. If it doesn’t exist, create it:
   - Right-click in the explorer pane → **“New File”** → name it `README.md`.
2. Open **README.md** and add some text:
   ```markdown
   # Intro to GitHub
   This repository is used for learning basic GitHub version control in Codespaces.
   ```
3. Save the file (`Ctrl+S` or `Cmd+S`).
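4. In the **Source Control** view (also in the left sidebar, the icon showing a branching line), you will see your changes listed under **Changes**.
5. Enter a **commit message** such as `Add README.md` and click **Commit** (shown as a checkmark in some versions). If you are asked whether to stage all changes and commit them directly, choose **Yes**.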
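
The same steps can also be run as Git commands in the terminal. The following is a minimal sketch, not the only way to do it: it assumes a throwaway directory created with `mktemp` (so it cannot affect your real repository), and the name and email are placeholder values. In a codespace the repository already exists, so only the `git add`, `git commit`, and `git log` lines would normally apply.

```shell
# Work in a throwaway directory so this sketch cannot touch a real project.
repo="$(mktemp -d)"
cd "$repo"
git init -q -b main                          # needs Git 2.28+; a codespace already has a repository
git config user.name "Student Example"       # placeholder identity (assumption)
git config user.email "student@example.com"  # placeholder identity (assumption)

# Steps 1-3: create README.md with the worksheet's text.
printf '# Intro to GitHub\nThis repository is used for learning basic GitHub version control in Codespaces.\n' > README.md

git status --short                  # README.md is listed as untracked
git add README.md                   # stage the file so it becomes part of the next commit
git commit -q -m "Add README.md"    # step 5: record the commit with its message
git log --oneline                   # the history now contains that commit
```

Staging (`git add`) is the step the Source Control view performs for you when it offers to stage all changes.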

### Exercise 4: Push Your Changes
1. After committing, click **…** in the Source Control view and choose **Push**, or open the Command Palette (`Ctrl+Shift+P` or `Cmd+Shift+P`) and run **“Git: Push.”**
2. If prompted, confirm that you want to **push** your changes to the remote repository on GitHub.
3. Once pushed, go back to your GitHub repo page in a browser, refresh, and verify that the `README.md` file is updated.
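
For reference, pushing from the terminal might look like the sketch below. To keep it runnable offline, it assumes a local bare repository as a stand-in for GitHub; the directory names and identity values are hypothetical. In a real codespace, `origin` already points at your GitHub repository, so only the `git push` line would be needed.

```shell
# A bare repository in a temp directory stands in for GitHub (assumption, so the sketch runs offline).
work="$(mktemp -d)"
git init -q --bare "$work/remote.git"
git init -q -b main "$work/local"            # needs Git 2.28+
cd "$work/local"
git config user.name "Student Example"       # placeholder identity (assumption)
git config user.email "student@example.com"  # placeholder identity (assumption)
git remote add origin "$work/remote.git"     # in a codespace, origin is already configured

echo "# Intro to GitHub" > README.md
git add README.md
git commit -q -m "Add README.md"

git push -q -u origin main       # upload the commit; -u links main to origin/main for later pushes
git status -sb                   # summary of the current branch and its upstream
git log --oneline origin/main    # the remote-tracking branch now contains the commit
```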

### Exercise 5: Create a New Branch and Merge
**Branches** allow you to develop features in isolation without affecting the main version of your project.

1. From your codespace, create a new branch named `feature-update`: click the branch name in the bottom-left status bar, or open the **Command Palette** and run **“Git: Create Branch...”**
2. Modify your `README.md` file:
   ```markdown
   # Intro to GitHub
   This repository is used for learning basic GitHub version control in Codespaces.
   
   ## New Feature
   Added a new section while working on a separate branch.
   ```
3. **Commit** your changes on `feature-update` with a message like `Add new feature section`.
4. **Push** your branch to the remote repository (VS Code may label the first push of a new branch **“Publish Branch”**).
5. In GitHub (browser), you’ll see a notice about recently pushed branches. Click **“Compare & pull request”** to open a **pull request**.
6. Review your changes, then click **“Create pull request.”**
7. On the pull request page, click **“Merge pull request,”** then **“Confirm merge.”**
8. Once merged, your changes will be integrated into the **main** branch.

> **Note:** In real projects, you might request a teammate to review your pull request before merging.
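
To see what branching and merging do underneath, the sketch below repeats the exercise with Git commands only. It is an illustration under assumptions: it runs in a throwaway directory, uses placeholder identity values, and replaces the pull request with a local `git merge`, which produces a similar result without the review step on GitHub.

```shell
repo="$(mktemp -d)"                          # throwaway directory (assumption)
cd "$repo"
git init -q -b main                          # needs Git 2.28+
git config user.name "Student Example"       # placeholder identity (assumption)
git config user.email "student@example.com"  # placeholder identity (assumption)

echo "# Intro to GitHub" > README.md
git add README.md
git commit -q -m "Add README.md"

# Step 1: create the branch and switch to it.
git switch -q -c feature-update

# Steps 2-3: append the new section, then commit it on the branch.
printf '\n## New Feature\nAdded a new section while working on a separate branch.\n' >> README.md
git commit -q -a -m "Add new feature section"    # -a stages changes to already tracked files

# Local stand-in for merging the pull request: bring the branch into main.
git switch -q main
git merge -q --no-ff -m "Merge feature-update" feature-update
git log --oneline --graph                        # shows the branch joining main
```

The `--no-ff` flag asks Git to record a merge commit even when a fast-forward would be possible, which keeps the branch visible in the history graph.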

## 5. Wrap-Up and Best Practices
- **Frequent Commits:** Commit small, logical changes. This makes it easier to track what happened and to revert if needed.
- **Meaningful Messages:** Write clear commit messages to explain *why* you made the changes.
- **Branching Strategy:** Use branches for experimental or new features to keep the main branch stable.
- **Pull Requests:** Review code changes, discuss potential issues, and merge when ready.
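
As an illustration of why small commits are easy to undo, the sketch below reverts a single commit with `git revert`. The file name `notes.txt`, its contents, and the identity values are hypothetical, and the example runs in a throwaway directory.

```shell
repo="$(mktemp -d)"                          # throwaway directory (assumption)
cd "$repo"
git init -q -b main                          # needs Git 2.28+
git config user.name "Student Example"       # placeholder identity (assumption)
git config user.email "student@example.com"  # placeholder identity (assumption)

echo "clean data" > notes.txt                # hypothetical file for the demo
git add notes.txt
git commit -q -m "Add notes"

echo "experimental idea" >> notes.txt
git commit -q -a -m "Try experimental idea"  # one small, self-contained change

git revert --no-edit HEAD > /dev/null        # add a new commit that undoes the previous one
git log --oneline                            # history keeps all three commits
cat notes.txt                                # the file content is back to its first version
```

Because `git revert` adds a new commit instead of deleting the old one, the history still shows what was tried and why it was undone.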

## 6. Further Exploration
- **GitHub Documentation**: [docs.github.com](https://docs.github.com/)
- **Git Concepts**: Branching, merging, conflict resolution, stash, cherry-pick, etc.
- **Integration with Data Science Tools**: Notebooks (Jupyter, etc.), containerized environments (Docker), CI/CD for data pipelines, etc.

## 7. Conclusion
You have successfully:

1. Created a repository on GitHub.
2. Launched a cloud-based development environment with GitHub Codespaces.
3. Performed basic Git operations: committing, pushing, branching, and merging.

These skills form the foundation for **collaborative data science projects**. Continue to practice and build confidence in version control workflows, as they are essential for professional development and teamwork in the data science field.

> **Happy Coding & Version Controlling!**