# 🎬 Lesson 3: AI Predict & Recommend!

**Time:** 60 minutes (30 minutes per part)

**Goals:**
- Part A: Predict an outcome from data (like predicting SPM results!)
- Part B: Build a recommendation system like Netflix, Spotify, or TikTok!

---

## 🤔 Why Learn This?

### Part A: Prediction from Data
- **Banks (Maybank, CIMB)** - Predict who will default on a loan
- **Insurance** - Calculate premiums based on your data
- **HR** - Predict which candidate will perform best

### Part B: Recommendation Systems
- **TikTok FYP** - Why does that video appear?
- **Spotify Discover Weekly** - How does it know which songs we like?
- **Netflix "Because you watched..."** - Same concept!
- **Shopee "You might also like"** - They all use AI! 🔥

---

# Part A: Tabular Data (Structured Data)

## 📊 AI can learn from spreadsheets!

If you have data in Excel, AI can predict outcomes from it!

In [None]:
# If using Google Colab:
# !pip install -Uqq fastai

from fastai.tabular.all import *

## Step 1: Load Dataset

We use the Adult Census dataset to predict whether a person earns at least $50K or not.

**Think of it like predicting:**
- Who will score excellent results in SPM?
- Who will win a scholarship?
- Who will succeed in their career?

In [None]:
path = untar_data(URLs.ADULT_SAMPLE)
df = pd.read_csv(path/'adult.csv')
print(f"Dataset size: {df.shape[0]} rows, {df.shape[1]} columns")
df.head()

In [None]:
# What do we want to predict?
print("Target variable (salary):")
print(df['salary'].value_counts())
print()
# Share of each class, shown as a percentage per row
print(df['salary'].value_counts(normalize=True).mul(100).round(1).astype(str) + '%')

## Step 2: Define Features

### 🧠 Concept: Categorical vs Continuous

| Type | Example | How the AI processes it |
|------|---------|-------------------------|
| **Categorical** | School type (SMK/MRSM/SBP), gender, ethnicity | Converted to **embeddings** |
| **Continuous** | Age, CGPA, household income | Values are **normalized** |

**Embeddings** = the AI learns hidden patterns within categories
- Example: the AI might learn that "Doctor" and "Engineer" follow a similar pattern!
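
To make "similar pattern" concrete, here is a minimal sketch that compares embeddings with cosine similarity. The three-number vectors and the "Farmer" entry are hypothetical values made up for illustration; a trained model learns its own numbers, and this is not how fastai stores them internally.

```python
import math

# Hypothetical 3-number embeddings, invented for illustration only.
# A trained model learns these values itself.
occupation_embeddings = {
    "Doctor":   [0.9, 0.8, 0.1],
    "Engineer": [0.8, 0.9, 0.2],
    "Farmer":   [0.1, 0.2, 0.9],
}

def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

doctor_vs_engineer = cosine_similarity(
    occupation_embeddings["Doctor"], occupation_embeddings["Engineer"])
doctor_vs_farmer = cosine_similarity(
    occupation_embeddings["Doctor"], occupation_embeddings["Farmer"])

print(f"Doctor vs Engineer: {doctor_vs_engineer:.2f}")
print(f"Doctor vs Farmer:   {doctor_vs_farmer:.2f}")
```

Categories whose vectors point in nearly the same direction score close to 1, which is what "similar pattern" means here.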

In [None]:
# Define feature types
cat_names = ['workclass', 'education', 'marital-status', 'occupation', 
             'relationship', 'race', 'sex', 'native-country']
cont_names = ['age', 'fnlwgt', 'education-num', 'capital-gain', 
              'capital-loss', 'hours-per-week']

# Preprocessing steps
procs = [Categorify, FillMissing, Normalize]

print(f"Categorical features: {len(cat_names)}")
print(f"Continuous features: {len(cont_names)}")
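
For intuition, the three preprocessing steps can be sketched in plain Python. This is a rough approximation under simplifying assumptions, not fastai's implementation: the column values are made up, and the real `FillMissing` can also add a column that flags which values were missing.

```python
from statistics import mean, median, pstdev

# Tiny made-up column values, for illustration only.
workclass = ["Private", "State-gov", "Private", "Self-emp"]
age = [25.0, None, 40.0, 55.0]

# Categorify idea: map each category to an integer code.
categories = sorted(set(workclass))
codes = {name: i + 1 for i, name in enumerate(categories)}  # 0 is left free for "unknown"
workclass_codes = [codes[w] for w in workclass]

# FillMissing idea: replace missing numbers with the median of the known values.
known = [a for a in age if a is not None]
fill_value = median(known)
age_filled = [fill_value if a is None else a for a in age]

# Normalize idea: subtract the mean, then divide by the standard deviation.
mu, sigma = mean(age_filled), pstdev(age_filled)
age_normalized = [(a - mu) / sigma for a in age_filled]

print(workclass_codes)
print(age_filled)
print([round(a, 2) for a in age_normalized])
```

After normalizing, every continuous column sits on a comparable scale, so a large-valued column such as `fnlwgt` does not drown out a small-valued one such as `age`.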

## Step 3: Prepare the Data

In [None]:
dls = TabularDataLoaders.from_df(
    df, 
    path, 
    procs=procs,
    cat_names=cat_names, 
    cont_names=cont_names,
    y_names='salary',
    valid_idx=list(range(800, 1000)),
    bs=64
)

In [None]:
# Look at our data
dls.show_batch()

## Step 4: Build & Train the Model

In [None]:
# Neural network with 2 hidden layers
# [200, 100] = first layer 200 neurons, second layer 100 neurons
learn = tabular_learner(dls, layers=[200, 100], metrics=accuracy)
print("Model ready! 🚀")

In [None]:
# Find learning rate
learn.lr_find()

In [None]:
# Train!
learn.fit_one_cycle(3, 1e-2)

## Step 5: Make Predictions!

In [None]:
# Predict for a single row
# (pred is the predicted class index; probs holds one probability per class)
row, pred, probs = learn.predict(df.iloc[0])
print(f"Prediction: {pred}")
print(f"\nProbabilities:")
print(f"  <50K: {probs[0]:.1%}")
print(f"  >=50K: {probs[1]:.1%}")
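
Where do those probabilities come from? A classifier typically produces one raw score per class and converts the scores to probabilities with a softmax. The sketch below shows that conversion on hypothetical raw scores; the numbers are made up and do not come from the model trained above.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw outputs for the two classes [<50k, >=50k].
raw_scores = [1.2, -0.3]
probs_sketch = softmax(raw_scores)

for label, p in zip(["<50K", ">=50K"], probs_sketch):
    print(f"{label}: {p:.1%}")
```

The class with the larger raw score always gets the larger probability, and the two values add up to 100%.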

### 💡 Real-world Application Ideas

If you have data like this, you can predict:

| Data | Prediction |
|------|------------|
| Student records (attendance, grades, activities) | SPM result |
| Game player data | Will they spend money? |
| Shopee seller data | Will product sell well? |
| Social media engagement | Will post go viral? |

---

# Part B: Collaborative Filtering (Recommendations)

## 🎬 How Do Netflix and TikTok Know What You Like?

**Basic idea:**
> "People who like the same things as you probably share your taste in other things too!"

Example:
- You like the K-drama "Squid Game"
- Other people who like Squid Game also like "Money Heist"
- The AI recommends "Money Heist" to you! 🎯
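
The basic idea can be sketched without any machine learning by simply counting what else was liked by the people who liked a given title. The viewers and their likes below are hypothetical, and this counting approach is a simplified stand-in for illustration, not the embedding model trained later in this lesson.

```python
from collections import Counter

# Hypothetical viewers and the shows they liked (made up for illustration).
likes = {
    "Aina":  {"Squid Game", "Money Heist", "Alice in Borderland"},
    "Ben":   {"Squid Game", "Money Heist"},
    "Chong": {"Squid Game", "Stranger Things"},
    "Devi":  {"Bridgerton"},
}

def also_liked(title, likes):
    """Count what else was liked by the people who liked `title`."""
    counts = Counter()
    for shows in likes.values():
        if title in shows:
            counts.update(shows - {title})
    return counts.most_common()

suggestions = also_liked("Squid Game", likes)
print(suggestions[0])  # the most frequently co-liked title
```

Counting like this only works for titles that many people share. Learned embeddings go further by generalizing from overall taste patterns.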

In [None]:
from fastai.collab import *

## Step 1: Load MovieLens Data

100,000 movie ratings from real users!

In [None]:
path = untar_data(URLs.ML_100k)
print(f"Dataset: {path}")

In [None]:
# Load ratings
ratings = pd.read_csv(path/'u.data', delimiter='\t', header=None,
                      usecols=(0,1,2), names=['user', 'movie', 'rating'])
print(f"Total ratings: {len(ratings):,}")
print(f"Unique users: {ratings['user'].nunique()}")
print(f"Unique movies: {ratings['movie'].nunique()}")
ratings.head()

In [None]:
# Load movie titles
movies = pd.read_csv(path/'u.item', delimiter='|', encoding='latin-1',
                     usecols=(0,1), names=['movie', 'title'], header=None)

# Merge to get the titles
ratings = ratings.merge(movies)
ratings.head()

In [None]:
# Look at the rating distribution
import matplotlib.pyplot as plt

ratings['rating'].value_counts().sort_index().plot(kind='bar', color='coral')
plt.title('Rating Distribution 🌟')
plt.xlabel('Rating')
plt.ylabel('Count')
plt.show()

## Step 2: Prepare the Data

In [None]:
dls = CollabDataLoaders.from_df(ratings, item_name='title', bs=64)

In [None]:
dls.show_batch()

## Step 3: Build & Train the Model

### 🧠 Concept: Embeddings & Latent Factors

The AI learns **hidden characteristics** (latent factors):

**For users:**
- Likes action movies? High score
- Likes romance? Low score
- Likes comedy? Medium

**For movies:**
- Lots of action? High score
- Some romance? Low score
- Comedy level? Medium

**Prediction:** User embedding × Movie embedding (a dot product) = Predicted rating!
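
Here is a minimal sketch of that multiplication as a dot product. The factor values are hypothetical and hand-picked to match the action/romance/comedy example; a real model learns them during training and also adds bias terms.

```python
# Hypothetical latent factors in the order [action, romance, comedy].
# Real models learn these numbers during training; nobody types them in.
user_factors = [0.9, -0.5, 0.2]    # loves action, dislikes romance, mild on comedy
action_movie = [0.8, -0.1, 0.1]    # mostly action
romance_movie = [-0.2, 0.9, 0.3]   # mostly romance

def dot(a, b):
    """Multiply matching factors and add up the results."""
    return sum(x * y for x, y in zip(a, b))

action_score = dot(user_factors, action_movie)
romance_score = dot(user_factors, romance_movie)

print(f"Action movie score:  {action_score:.2f}")
print(f"Romance movie score: {romance_score:.2f}")
```

When the user's tastes and the movie's traits line up, the products are positive and the score is high. When they clash, the score drops.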

In [None]:
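# n_factors = 50 hidden characteristics to learn
# y_range = allowed range for predictions; real ratings run from 1 to 5, and the
# range is a little wider because the sigmoid never quite reaches its endpoints
learn = collab_learner(dls, n_factors=50, y_range=(0.5, 5.5))
print("Recommendation model ready! 🎬")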
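
To see what `y_range` does, the sketch below squashes an arbitrary raw score into the interval (0.5, 5.5) with a scaled sigmoid. It is a plain-Python illustration of the idea, written under the assumption that the model applies this kind of scaling to its raw output. The raw scores are arbitrary examples.

```python
import math

def sigmoid_range(x, low, high):
    """Squash any real number into the open interval (low, high)."""
    return low + (high - low) / (1 + math.exp(-x))

for raw in (-4.0, 0.0, 4.0):
    rating = sigmoid_range(raw, 0.5, 5.5)
    print(f"raw score {raw:+.1f} -> rating {rating:.2f}")
```

Because the curve flattens near both ends, a range of exactly 1 to 5 would make a perfect 5 almost unreachable, which is why the range is set slightly wider than the real ratings.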

In [None]:
# Train!
learn.fit_one_cycle(5, 5e-3, wd=0.1)

## Step 4: Analyze Results

In [None]:
# Show predictions
learn.show_results()

In [None]:
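# Get movie biases (a learned "generally liked" score per movie)
# Use the 1,000 most-rated movies: titles with very few ratings may be
# missing from the training split and their biases are unreliable anyway
top_movies = ratings['title'].value_counts().index.values[:1000]
movie_bias = learn.model.bias(top_movies, is_item=True)
movie_bias = movie_bias.squeeze()

movie_ratings = [(b.item(), m) for b, m in zip(movie_bias, top_movies)]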

In [None]:
# Movies that EVERYONE LOVES! 🔥
print("🏆 TOP 10 UNIVERSALLY LOVED MOVIES:")
print("="*50)
for i, (score, title) in enumerate(sorted(movie_ratings, reverse=True)[:10], 1):
    print(f"{i}. {title[:40]}... (score: {score:.2f})")

In [None]:
# Movies that are LESS POPULAR 😅
print("👎 BOTTOM 10 MOVIES:")
print("="*50)
for i, (score, title) in enumerate(sorted(movie_ratings)[:10], 1):
    print(f"{i}. {title[:40]}... (score: {score:.2f})")

---

## 💡 Discussion: How TikTok FYP Works!

**Same concept but with more signals:**

| Signal | Weight |
|--------|--------|
| Watch time (how long you watch) | Very High |
| Like, Comment, Share | High |
| Follow creator | High |
| Skip video | Negative |
| Video completion rate | High |

**Mind-blown moment:**
> "TikTok knows you better than you know yourself!
> Because it analyzes every millisecond of your behavior!" 🤯

### Privacy Discussion
- AI knows a great deal about us
- Filter bubble - we only see what we already like
- Is this good or bad?

---

# 🏆 CHALLENGE: Build a Recommendation Function!

Build a function that recommends movies for a specific user!

In [None]:
# CHALLENGE: Complete this function!

def get_recommendations(user_id, n_recommendations=5):
    """
    Get movie recommendations for a specific user
    
    Args:
        user_id: The user to get recommendations for
        n_recommendations: How many movies to recommend
    
    Returns:
        List of (movie title, predicted rating) tuples, best first
    """
    # Get movies user hasn't rated yet (a set makes the lookup fast)
    user_rated = set(ratings.loc[ratings['user'] == user_id, 'title'])
    all_movies = ratings['title'].unique().tolist()
    unrated = [m for m in all_movies if m not in user_rated][:100]  # Limit for speed
    
    # Predict ratings for all candidate movies in one batch
    test_df = pd.DataFrame({'user': [user_id] * len(unrated), 'title': unrated})
    dl = learn.dls.test_dl(test_df)
    preds, _ = learn.get_preds(dl=dl)
    predictions = list(zip(unrated, preds.flatten().tolist()))
    
    # Sort by predicted rating
    predictions.sort(key=lambda x: x[1], reverse=True)
    
    return predictions[:n_recommendations]

# Test the function
print("🎬 RECOMMENDATIONS FOR USER 42:")
print("="*50)
recs = get_recommendations(42, n_recommendations=5)
for i, (movie, score) in enumerate(recs, 1):
    print(f"{i}. {movie[:40]}... (predicted: {score:.1f}/5)")

---

## 🏠 Homework Ideas

1. **Build Your Own Recommendation System:**
   - Collect data: which games, songs, or movies your friends like
   - Train model
   - Make recommendations!

2. **Predict Student Performance:**
   - Collect: attendance, homework completion, test scores
   - Predict: Final exam result

3. **Shopee Product Recommender:**
   - Scrape product data
   - Build recommendation system

---

## 🎯 Real-World Career Applications

| Company | Use Case | Salary Range |
|---------|----------|-------------|
| Grab | Driver/rider matching | RM15-25K |
| Shopee | Product recommendations | RM12-20K |
| TikTok | Content recommendations | RM20-35K |
| Maybank | Credit scoring | RM10-18K |
| AirAsia | Price optimization | RM12-22K |

> "Learn this now, and by the time you graduate, these skills will be very valuable!" 💰

---

## 🎉 Congratulations!

You have learned:
1. ✅ **Image Classification** - AI recognizes images
2. ✅ **Text Classification** - AI understands sentiment
3. ✅ **Tabular Prediction** - AI predicts from data
4. ✅ **Recommendation Systems** - AI suggests content

### Next Steps:
1. 📚 **course.fast.ai** - Full free course
2. 🏆 **Kaggle.com** - Compete and learn
3. 💻 **Build projects** - The best way to learn!

---

*"The future belongs to those who learn AI. And that's YOU!"* 🚀🇲🇾