# OUBIC Bioinformatics Workshop

## Session 1 (Morning) — Colab Basics + Python Fundamentals  
### Hello, World!

- **Last updated:** 2025-06-29  
- **Author:** Shotaro Yamasaki (Osaka University, OUBIC, RNA Informatics)

> This material was authored by Shotaro Yamasaki. ChatGPT (OpenAI) was used to help refine structure, wording, and code; the author is responsible for verification and final judgment.

---

### Before You Start

#### 1) Save a Copy to Your Google Drive
1. Open **File → Save a copy in Drive**.  
2. A copy will appear in your **Colab Notebooks** folder. Move it wherever you like.  
3. **Work on the copy**, not on the original.

#### 2) Show the Table of Contents
Click the **Table of contents** icon on the left. A navigation pane appears so you can jump between sections — **recommended**.


---
## Purpose of this Colab Notebook

This notebook shows that **Google Colab is easy to try and low-risk**: the code runs in a cloud runtime, not on your own computer.  
You can run Python in the browser, see results instantly, and learn without installing anything.


---
## License and Terms of Use

Provided under **Creative Commons Attribution–NonCommercial 4.0 (CC BY‑NC 4.0)**.  
See: https://creativecommons.org/licenses/by-nc/4.0/

- **Allowed:** use, modify, and share for **education and research**.  
- **Not allowed:** **commercial use** (paid courses/services).  
- No author citation is required for results produced with these scripts.  
- Generic Python syntax and library usage may be reused freely.  
- If you reuse the unique structure/teaching flow, add a credit like:  
  `Based on code by Shotaro Yamasaki (Osaka University, OUBIC)`  
- When redistributing this file, **keep these notes** unchanged.


---
## Notes and Disclaimers

- These scripts assume **Google Colab**; other environments aren’t guaranteed.  
- No support is provided for environments outside Colab.  
- Be careful with **personal/sensitive data** (Colab runs in the cloud).


---
---
# 🔰 How to Use Google Colab (Ultra‑Beginner)

## Running a Cell
- Click **▶** or press **Shift+Enter** to run and move to the next cell, or **Ctrl/Cmd+Enter** to run in place.
- To interrupt a long run: click **■ Stop** or use **Runtime → Interrupt execution**.
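- To recover from issues: **Runtime → Restart runtime** (labeled **Restart session** in newer versions of Colab). This clears all variables, so rerun the cells you need afterward.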

## Editing a Cell
- Two modes: **Edit** (cursor inside) and **Command** (blue border).
- Add **Code** or **Text (Markdown)** cells from the toolbar.
- To show **line numbers**: cell’s ⋮ menu → **Show line numbers**.

## Hardware and Runtime
- Switch to **GPU/TPU** via **Runtime → Change runtime type**.

## Saving Files
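- Variables and files in the runtime are temporary and are lost when the session ends. To keep results, export them to files (e.g., CSV) and download them or save them to **Google Drive**.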
- Mount Drive when needed:
```python
from google.colab import drive
drive.mount('/content/drive')
```
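
As a minimal sketch of the export step, the following writes a small table to a CSV file with Python's standard `csv` module. The function name `save_rows`, the file name `results.csv`, and the sample rows are hypothetical. The output folder is a temporary directory so the example runs anywhere; in Colab, after mounting Drive, a folder under `/content/drive/MyDrive` could be passed instead (assuming the mount point shown above).

```python
import csv
import tempfile
from pathlib import Path


def save_rows(rows, out_dir, filename="results.csv"):
    """Write a list of dicts to a CSV file and return the file path."""
    out_path = Path(out_dir) / filename
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return out_path


# Hypothetical sample data
rows = [
    {"character": "a", "count": 3},
    {"character": "b", "count": 1},
]

# A temporary folder keeps the sketch runnable outside Colab.
# In Colab, something like "/content/drive/MyDrive/workshop" could be used
# instead (assumption: Drive is mounted at /content/drive).
out_dir = tempfile.mkdtemp()
saved_path = save_rows(rows, out_dir)
print(f"Saved to: {saved_path}")
```

Files written to the temporary runtime disappear with the session, whereas files written under the mounted Drive folder remain in your Google Drive.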


---
---
# First: Hello, World!

1. Insert a **Code** cell.  
2. Type:
```python
print("Hello, World!")
```
3. Run it. You should see:
```
Hello, World!
```

Now change the message and run again!
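
One possible next step, shown here as a small sketch: store part of the message in a variable and build the text with an f-string. The variable names `name` and `message` are only examples.

```python
name = "OUBIC"  # example value; change it to anything you like
message = f"Hello, {name}!"  # an f-string inserts the variable into the text
print(message)
print(len(message))  # number of characters in the message
```

Changing `name` and rerunning the cell is enough to see a different greeting.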


---
---
# ✨ Let’s Try a Simple Analysis!

The next cell asks you to type random text (letters/numbers). The code will:
- Count characters and build simple frequency tables
- Plot quick visualizations
- Compare with your previous run
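
The counting step can be sketched on its own with `collections.Counter`, the same standard-library class the analysis cell uses. The sample string `sample` and the other variable names are made up for illustration; this is a simplified sketch, not the full analysis.

```python
import collections

sample = "aabbbc"  # hypothetical input text

# Count how many times each character appears
counts = collections.Counter(sample)

# Convert counts to relative frequencies (each count divided by the total)
total = sum(counts.values())
rel_freq = {char: n / total for char, n in counts.items()}

# Print characters from most to least frequent
for char, n in counts.most_common():
    print(f"{char}: count={n}, relative frequency={rel_freq[char]:.2f}")
```
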

**How to use**
1. Click the cell below.  
2. Press **▶ Run** and follow the prompts.


In [None]:
# @title
import collections

import pandas as pd
from scipy.stats import pearsonr, chi2_contingency
import matplotlib.pyplot as plt


# ------------------------------------------------------------------------------
# Save previous input if available (if 'text' is already defined and valid)
try:
    if 'text' in locals() and len(text) >= 50 and all(ord(c) < 128 for c in text):
        prev_text = text
        prev_counter = collections.Counter(prev_text)
        prev_total = sum(prev_counter.values())
        prev_rel_freq = {k: v / prev_total for k, v in prev_counter.items()}
    else:
        prev_text = None
except Exception:  # e.g., 'text' exists but is not a string
    prev_text = None

# ------------------------------------------------------------------------------
# User input
while True:
    text = input("Type at least 50 random half-width alphanumeric characters (e.g., qawsedrftgyhujikol...):\n")

    # Check conditions
    if not text:
        print("⚠ Input is empty.")
    elif len(text) < 50:
        print(f"⚠ Input is too short ({len(text)} characters). Please enter at least 50 characters.")
    elif not all(ord(c) < 128 for c in text):
        print("⚠ Please enter ASCII (half-width) characters only. Full-width characters were detected.")
    else:
        break  # Finish when all conditions are satisfied

# ------------------------------------------------------------------------------
# Calculate frequency and order of appearance
counter = collections.Counter(text)
total = sum(counter.values())
rel_freq = {k: v / total for k, v in counter.items()}

first_index = {}
for i, char in enumerate(text):
    if char not in first_index:
        first_index[char] = i

# ------------------------------------------------------------------------------
# Create DataFrame
all_chars = set(counter.keys())
if prev_text:
    all_chars.update(prev_rel_freq.keys())

df = pd.DataFrame([
    {
        'Character': char,
        'Current Freq': rel_freq.get(char, 0),
        'First Index': first_index.get(char, None),
        **({'Previous Freq': prev_rel_freq.get(char, 0)} if prev_text else {})
    }
    for char in sorted(all_chars)
])
df = df.sort_values(by='Current Freq', ascending=False).reset_index(drop=True)

# Display
display(df)

# ------------------------------------------------------------------------------
# Bar chart
fig, ax = plt.subplots(figsize=(12, 5))
ax.bar(df['Character'], df['Current Freq'], label='Current', alpha=0.7)
if prev_text:
    ax.bar(df['Character'], df['Previous Freq'], label='Previous', alpha=0.5)
ax.set_xlabel('Character')
ax.set_ylabel('Relative Frequency')
ax.set_title('Character Frequency')
ax.grid(True, axis='y', linestyle='--', alpha=0.5)
if prev_text:
    ax.legend()
plt.show()

# ------------------------------------------------------------------------------
# Scatter plot (order of appearance vs relative frequency)
fig, ax = plt.subplots(figsize=(6, 6))
ax.scatter(df['First Index'], df['Current Freq'], c='blue')
ax.set_xlabel('First Appearance Index')
ax.set_ylabel('Relative Frequency')
ax.set_title('Character Frequency vs. First Appearance')
ax.grid(True, linestyle='--', alpha=0.5)
plt.show()

# Pearson correlation coefficient and its significance test (order vs relative frequency)
valid_corr = df[['First Index', 'Current Freq']].dropna()

if len(valid_corr) >= 2:  # At least 2 points are required to compute correlation
    r_val, p_val = pearsonr(valid_corr['First Index'], valid_corr['Current Freq'])

    print(f"📈 Pearson correlation: r = {r_val:.3f}, p = {p_val:.4f}")
    if p_val < 0.05:
        print("➡ There is a significant correlation between order of appearance and relative frequency (p < 0.05).")
    else:
        print("➡ No significant correlation between order of appearance and relative frequency was observed (p ≥ 0.05).")
else:
    print("⚠ Not enough data to compute the correlation coefficient.")

# ------------------------------------------------------------------------------
# Compare previous and current relative frequencies (scatter plot)
if prev_text:
    fig, ax = plt.subplots(figsize=(6, 6))

    # Plot scatter plot
    ax.scatter(
        df['Previous Freq'],
        df['Current Freq'],
        color='blue'
    )

    # Diagonal line (points fall here if frequencies are equal)
    ax.plot(
        [0, max(df['Previous Freq'].max(), df['Current Freq'].max())],
        [0, max(df['Previous Freq'].max(), df['Current Freq'].max())],
        linestyle='--', color='gray', linewidth=1
    )

    ax.set_xlabel('Previous Relative Frequency')
    ax.set_ylabel('Current Relative Frequency')
    ax.set_title('Current vs Previous Character Frequency')
    ax.grid(True, linestyle='--', alpha=0.5)
    plt.show()

    # Counts per character (previous + current)
    combined_counts = {
        c: prev_counter.get(c, 0) + counter.get(c, 0)
        for c in all_chars
    }

    # Merge characters below the threshold into 'Other' (e.g., total < 5)
    threshold = 5
    filtered_chars = [c for c in all_chars if combined_counts[c] >= threshold]
    other_chars = [c for c in all_chars if combined_counts[c] < threshold]

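    # Rebuild counts (including the merged 'Other' category when it exists)
    def count_with_other(counter_dict):
        counts = [counter_dict.get(c, 0) for c in filtered_chars]
        if not other_chars:
            # No rare characters: skip 'Other' so the table has no all-zero column,
            # which would make chi2_contingency raise a ValueError
            return counts
        other_count = sum(counter_dict.get(c, 0) for c in other_chars)
        return counts + [other_count]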

    # Create vectors of counts for previous and current runs
    prev_counts = count_with_other(prev_counter)
    curr_counts = count_with_other(counter)

    # 2-row by n-column contingency table
    contingency_table = [prev_counts, curr_counts]

    # Chi-squared test
    chi2, p, dof, expected = chi2_contingency(contingency_table)

    # Display results
    print(f"📊 Chi-squared test: χ² = {chi2:.3f}, p = {p:.4f}, dof = {dof}")
    if p < 0.05:
        print("➡ A significant difference exists between the previous and current appearance frequencies (p < 0.05).")
    else:
        print("➡ No significant difference between the previous and current appearance frequencies was observed (p ≥ 0.05).")


---
---
# 🛠️ Let’s Try a Simple Modification!

Try editing the code to change its behavior.

**Ideas**
- Change the input text
- Count only certain character types
- Add a new metric (e.g., uppercase or whitespace counts)
- Adjust plot labels and titles

> If you run into errors, you can always restart the runtime and try again.
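
As one possible starting point for the "new metric" idea, the sketch below counts uppercase letters, lowercase letters, digits, and whitespace using built-in string methods. The function name `count_character_types` and the sample text are hypothetical and not part of the original cell.

```python
def count_character_types(text):
    """Return counts of several character types in text."""
    return {
        "uppercase": sum(c.isupper() for c in text),
        "lowercase": sum(c.islower() for c in text),
        "digit": sum(c.isdigit() for c in text),
        "whitespace": sum(c.isspace() for c in text),
    }


sample = "Hello World 2025"  # hypothetical input
type_counts = count_character_types(sample)
for name, n in type_counts.items():
    print(f"{name}: {n}")
```

The same function could be called on the `text` variable from the analysis cell after that cell has been run.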


In [None]:
import matplotlib.pyplot as plt
import numpy as np
import random

# ↓ If you want Japanese labels, uncomment the next two lines
# !pip install japanize-matplotlib
# import japanize_matplotlib

# ------------------------------------------------------------------------------
# Prepare waveform data
x = np.linspace(0, 10, 100)  # Generate values for x
y1 = np.sin(x)  # Basic sine wave
y2 = np.cos(x)  # Cosine wave
y3 = np.sin(2 * x)  # Short-period sine wave (← Modification point: increase the 2 for denser waves, decrease it for smoother ones)
y4 = np.exp(-0.3 * x) * np.sin(3 * x)  # Damped oscillation (← Modification point: try adjusting -0.3 and 3)

# ------------------------------------------------------------------------------
# Prepare star data
n_stars = 20  # Number of stars (← Modification point: try changing the number)
star_x = np.random.uniform(0, 10, n_stars)
star_y = np.random.uniform(-2, 2, n_stars)
star_sizes = np.random.randint(100, 300, n_stars)  # Sizes: random integers from 100 to 299; the upper bound 300 is excluded (← Modification point: try changing the values)
star_colors = np.random.choice(['gold', 'orange', 'pink', 'skyblue'], n_stars)  # Colors: randomly chosen from the list

# ------------------------------------------------------------------------------
# Create figure
fig, ax = plt.subplots(figsize=(10, 6))  # ← Modification point: you can adjust figure size (in inches)

# ------------------------------------------------------------------------------
# Plot lines
# Modification point ①: If a line is unnecessary, comment it out by adding '#' at the start
# Modification point ②: Try changing color='blue' to another color (e.g., 'black', 'orange', 'cyan', 'magenta')
# Modification point ③: Try changing linestyle='-' (e.g., '--', '-.', ':')
ax.plot(x, y1, label='sin(x)', color='blue', linestyle='-')
ax.plot(x, y2, label='cos(x)', color='green', linestyle='--')
ax.plot(x, y3, label='sin(2x)', color='red', linestyle='-.')
ax.plot(x, y4, label='damped wave', color='purple', linestyle=':')

# ------------------------------------------------------------------------------
# Scatter ★ markers at random positions and colors
# Modification point ①: If this drawing is not needed, comment out this line with '#' and run
# Modification point ②: Try changing marker='*' to other shapes (e.g., 'o', '^', 's', 'D', 'X')
ax.scatter(star_x, star_y, s=star_sizes, c=star_colors, marker='*', alpha=0.8, edgecolors='black')

# ------------------------------------------------------------------------------
# Adjust axes
# Modification point ①: If no adjustment is needed, comment out with '#' and run
# Modification point ②: Try changing the numbers
ax.set_xlim(0, 10)           # x-axis range
ax.set_ylim(-2, 2)           # y-axis range
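ax.set_xticks(np.arange(0, 11, 2))  # x-axis ticks: np.arange(start, stop, step); stop is excluded, so use a value slightly above the maximum
ax.set_yticks(np.arange(-2, 2.1, 0.5))  # y-axis ticks: np.arange(start, stop, step); stop is excluded, so use a value slightly above the maximum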

# ------------------------------------------------------------------------------
# Title and labels
# Modification point ①: If not needed, comment it out with '#' and run
# Modification point ②: Try changing the text
ax.set_title('A Graph with Many Features', fontsize=16)
ax.set_xlabel('Value of x')
ax.set_ylabel('Value of y')

# ------------------------------------------------------------------------------
# Legend labels
# Search for 'ax.legend' and try adding a legend below

# ------------------------------------------------------------------------------
# Other decorations
# Modification point ①: If not needed, comment it out with '#' and run
# Modification point ②: Try changing the settings
ax.axhline(0, color='black', linewidth=0.5, linestyle='--')  # Horizontal reference line
ax.grid(True, linestyle='--', alpha=0.5)

# ------------------------------------------------------------------------------
# Add text at an appropriate position
# Modification point ①: If not needed, comment it out with '#' and run
# Modification point ②: Try changing the settings
ax.text(2, 1.5, 'Python is awesome!', fontsize=12, color='gray', alpha=0.6)
ax.text(7, -1.5, 'Just a sample text', fontsize=12, color='gray', alpha=0.6)

plt.show()


---
---
# Code Copy — 🛠️ Simple Modification

If your edits get messy, copy & paste the code in the cell below to restore a clean version.


In [None]:
import matplotlib.pyplot as plt
import numpy as np
import random

# ↓ If you want Japanese labels, uncomment the next two lines
# !pip install japanize-matplotlib
# import japanize_matplotlib

# ------------------------------------------------------------------------------
# Prepare waveform data
x = np.linspace(0, 10, 100)  # Generate values for x
y1 = np.sin(x)  # Basic sine wave
y2 = np.cos(x)  # Cosine wave
y3 = np.sin(2 * x)  # Short-period sine wave (← Modification point: increase the 2 for denser waves, decrease it for smoother ones)
y4 = np.exp(-0.3 * x) * np.sin(3 * x)  # Damped oscillation (← Modification point: try adjusting -0.3 and 3)

# ------------------------------------------------------------------------------
# Prepare star data
n_stars = 20  # Number of stars (← Modification point: try changing the number)
star_x = np.random.uniform(0, 10, n_stars)
star_y = np.random.uniform(-2, 2, n_stars)
star_sizes = np.random.randint(100, 300, n_stars)  # Sizes: random integers from 100 to 299; the upper bound 300 is excluded (← Modification point: try changing the values)
star_colors = np.random.choice(['gold', 'orange', 'pink', 'skyblue'], n_stars)  # Colors: randomly chosen from the list

# ------------------------------------------------------------------------------
# Create figure
fig, ax = plt.subplots(figsize=(10, 6))  # ← Modification point: you can adjust figure size (in inches)

# ------------------------------------------------------------------------------
# Plot lines
# Modification point ①: If a line is unnecessary, comment it out by adding '#' at the start
# Modification point ②: Try changing color='blue' to another color (e.g., 'black', 'orange', 'cyan', 'magenta')
# Modification point ③: Try changing linestyle='-' (e.g., '--', '-.', ':')
ax.plot(x, y1, label='sin(x)', color='blue', linestyle='-')
ax.plot(x, y2, label='cos(x)', color='green', linestyle='--')
ax.plot(x, y3, label='sin(2x)', color='red', linestyle='-.')
ax.plot(x, y4, label='damped wave', color='purple', linestyle=':')

# ------------------------------------------------------------------------------
# Scatter ★ markers at random positions and colors
# Modification point ①: If this drawing is not needed, comment out this line with '#' and run
# Modification point ②: Try changing marker='*' to other shapes (e.g., 'o', '^', 's', 'D', 'X')
ax.scatter(star_x, star_y, s=star_sizes, c=star_colors, marker='*', alpha=0.8, edgecolors='black')

# ------------------------------------------------------------------------------
# Adjust axes
# Modification point ①: If no adjustment is needed, comment out with '#' and run
# Modification point ②: Try changing the numbers
ax.set_xlim(0, 10)           # x-axis range
ax.set_ylim(-2, 2)           # y-axis range
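ax.set_xticks(np.arange(0, 11, 2))  # x-axis ticks: np.arange(start, stop, step); stop is excluded, so use a value slightly above the maximum
ax.set_yticks(np.arange(-2, 2.1, 0.5))  # y-axis ticks: np.arange(start, stop, step); stop is excluded, so use a value slightly above the maximum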

# ------------------------------------------------------------------------------
# Title and labels
# Modification point ①: If not needed, comment it out with '#' and run
# Modification point ②: Try changing the text
ax.set_title('A Graph with Many Features', fontsize=16)
ax.set_xlabel('Value of x')
ax.set_ylabel('Value of y')

# ------------------------------------------------------------------------------
# Legend labels
# Search for 'ax.legend' and try adding a legend below

# ------------------------------------------------------------------------------
# Other decorations
# Modification point ①: If not needed, comment it out with '#' and run
# Modification point ②: Try changing the settings
ax.axhline(0, color='black', linewidth=0.5, linestyle='--')  # Horizontal reference line
ax.grid(True, linestyle='--', alpha=0.5)

# ------------------------------------------------------------------------------
# Add text at an appropriate position
# Modification point ①: If not needed, comment it out with '#' and run
# Modification point ②: Try changing the settings
ax.text(2, 1.5, 'Python is awesome!', fontsize=12, color='gray', alpha=0.6)
ax.text(7, -1.5, 'Just a sample text', fontsize=12, color='gray', alpha=0.6)

plt.show()
