## Install & Import

# ü•∑ Web Scraping X/Twitter dengan Import Cookies

## üìã Cara Kerja:

### ‚úÖ **Metode: Import Cookies dari Browser Manual**

Karena X/Twitter mendeteksi login otomatis, kita gunakan **cookies dari browser yang sudah login**.

### üöÄ **Langkah-langkah Penggunaan:**

1. **Install dependencies** (Cell 3)
2. **Setup driver** (Cell 4)
3. **Import cookies** dari browser yang sudah login (Cell 5) ‚úÖ
4. **Scrape tweets** dengan keyword "mbg" (Cell 6)
5. **Export ke CSV** (Cell 7-8)
6. **Tutup driver** (Cell 9)

---

## üí° **Cara Mendapatkan Cookies:**

### Metode 1: Menggunakan Extension Browser
1. Install extension **EditThisCookie** atau **Cookie-Editor**
2. Login ke x.com di browser biasa
3. Klik icon extension ‚Üí Export ‚Üí Copy JSON
4. Paste di Cell 5 (sudah ada contoh cookies Anda)

### Metode 2: Dari Developer Tools
1. Login ke x.com
2. Tekan F12 ‚Üí Tab "Application" ‚Üí "Cookies" ‚Üí "https://x.com"
3. Copy cookies penting: `auth_token`, `ct0`, `twid`

---

## ‚ö†Ô∏è **Disclaimer:**
Script ini untuk tujuan edukasi. Pastikan mematuhi Terms of Service X/Twitter dan gunakan secara bertanggung jawab.

In [37]:
# Jika perlu install dulu (jalankan sekali saja)
# !pip install selenium pandas webdriver-manager fake-useragent

import time
import os
import random
import pandas as pd

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Optional: untuk random user agent
try:
    from fake_useragent import UserAgent
    ua = UserAgent()
except:
    ua = None
    print("‚ö†Ô∏è fake-useragent tidak terinstall, menggunakan user agent manual")

## Setup Browser

In [38]:
def create_driver():
    """Setup Chrome driver sederhana untuk import cookies"""
    chrome_options = Options()
    
    # User agent
    user_agents = [
        'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
        'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36',
    ]
    chrome_options.add_argument(f'user-agent={random.choice(user_agents)}')
    
    # Window & options
    chrome_options.add_argument("--window-size=1920,1080")
    chrome_options.add_argument("--start-maximized")
    chrome_options.add_argument("--no-sandbox")
    chrome_options.add_argument("--disable-dev-shm-usage")
    chrome_options.add_argument("--disable-blink-features=AutomationControlled")
    chrome_options.add_experimental_option("excludeSwitches", ["enable-automation"])
    chrome_options.add_experimental_option('useAutomationExtension', False)
    
    # Preferences
    chrome_options.add_experimental_option('prefs', {
        'intl.accept_languages': 'id-ID,id,en-US,en',
        'profile.default_content_setting_values.notifications': 2,
    })
    
    # Create driver
    service = Service(ChromeDriverManager().install())
    driver = webdriver.Chrome(service=service, options=chrome_options)
    
    # Stealth script
    driver.execute_cdp_cmd('Page.addScriptToEvaluateOnNewDocument', {
        'source': '''
            Object.defineProperty(navigator, 'webdriver', {get: () => undefined});
            window.chrome = {runtime: {}};
        '''
    })
    
    print("‚úÖ Driver berhasil dibuat")
    return driver

# Buat driver
driver = create_driver()

‚úÖ Driver berhasil dibuat


## üç™ Import Cookies & Login

**Cara menggunakan:**
1. Export cookies dari browser yang sudah login ke X/Twitter
2. Simpan cookies sebagai file JSON dengan nama `cookies.json` di folder yang sama dengan notebook ini
3. Jalankan cell ini untuk import cookies dari file
4. Browser akan otomatis login menggunakan session dari cookies
5. Lanjut ke cell berikutnya untuk scraping

**Format file cookies.json:**
```json
[
  {
    "domain": ".x.com",
    "name": "auth_token",
    "value": "your_auth_token_here",
    ...
  },
  ...
]
```

In [39]:
def import_cookies_from_json(driver, cookies_json):
    """
    Import cookies dari JSON ke Selenium driver
    cookies_json: list of dict atau string JSON
    """
    import json
    
    # Buka halaman X terlebih dahulu
    driver.get("https://x.com")
    time.sleep(2)
    
    # Parse JSON jika berupa string
    if isinstance(cookies_json, str):
        cookies = json.loads(cookies_json)
    else:
        cookies = cookies_json
    
    # Daftar cookies yang penting untuk autentikasi
    essential_cookies = ['auth_token', 'ct0', 'twid', 'att', '__cf_bm', 'kdt', 'guest_id']
    
    added_count = 0
    skipped_count = 0
    
    # Add cookies
    for cookie in cookies:
        cookie_name = cookie.get('name')
        
        # Filter: hanya tambahkan cookies essential
        if cookie_name not in essential_cookies:
            skipped_count += 1
            continue
        
        try:
            # Selenium memerlukan format tertentu
            cookie_dict = {
                'name': cookie.get('name'),
                'value': cookie.get('value'),
                'domain': cookie.get('domain', '.x.com'),
                'path': cookie.get('path', '/'),
            }
            
            # Tambahkan field opsional jika ada
            if 'expiry' in cookie or 'expirationDate' in cookie:
                expiry = cookie.get('expiry') or cookie.get('expirationDate')
                if expiry:
                    cookie_dict['expiry'] = int(expiry)
            if 'secure' in cookie:
                cookie_dict['secure'] = cookie['secure']
            if 'httpOnly' in cookie:
                cookie_dict['httpOnly'] = cookie['httpOnly']
            if 'sameSite' in cookie:
                cookie_dict['sameSite'] = cookie['sameSite']
            
            driver.add_cookie(cookie_dict)
            print(f"  ‚úÖ Added: {cookie_name}")
            added_count += 1
        except Exception as e:
            print(f"  ‚ö†Ô∏è  Skip cookie {cookie.get('name')}: {e}")
    
    print(f"\nüìä Summary: {added_count} cookies added, {skipped_count} cookies skipped")
    print("‚úÖ Essential cookies imported successfully")
    
    # Refresh halaman untuk apply cookies
    driver.refresh()
    time.sleep(3)
    
    # Cek apakah sudah login
    current_url = driver.current_url
    if "/home" in current_url or "/explore" in current_url:
        print("‚úÖ Login berhasil menggunakan cookies!")
        print(f"üìç Current URL: {current_url}")
        print("\nüîë Key cookies used: auth_token, ct0, __cf_bm (Cloudflare Bot Management)")
        return True
    else:
        print("‚ö†Ô∏è  Cookies mungkin sudah expired, coba export ulang")
        print(f"üìç Current URL: {current_url}")
        return False


# ========== BACA FILE COOKIES ==========
import json
import os

# Path ke file cookies.json (di folder yang sama dengan notebook)
COOKIES_FILE = "cookies.json"
cookies_path = os.path.join(os.path.dirname(os.path.abspath(__file__)) if '__file__' in globals() else os.getcwd(), COOKIES_FILE)

print(f"üîç Mencari file cookies: {COOKIES_FILE}")
print(f"üìÅ Path lengkap: {cookies_path}")

# Cek apakah file ada
if not os.path.exists(COOKIES_FILE):
    print(f"\n‚ùå File '{COOKIES_FILE}' tidak ditemukan!")
    print(f"üí° Pastikan file cookies.json ada di folder yang sama dengan notebook ini")
    print(f"üìÇ Current directory: {os.getcwd()}")
    print("\nüìù Cara membuat file cookies.json:")
    print("   1. Login ke x.com di browser")
    print("   2. Install extension 'EditThisCookie' atau 'Cookie-Editor'")
    print("   3. Export cookies ‚Üí Save as 'cookies.json'")
    print("   4. Letakkan file di folder yang sama dengan notebook ini")
    active_driver = None
else:
    # Baca file JSON
    print(f"‚úÖ File cookies.json ditemukan!")
    with open(COOKIES_FILE, 'r', encoding='utf-8') as f:
        cookies_from_file = json.load(f)
    
    print(f"üìä Total cookies dalam file: {len(cookies_from_file)}")
    
    # ========== JALANKAN IMPORT COOKIES ==========
    print("\nüöÄ Mengimport cookies dan login...")
    print("=" * 60)
    success = import_cookies_from_json(driver, cookies_from_file)
    
    if success:
        print("=" * 60)
        print("üéâ BERHASIL! Siap untuk scraping...")
        active_driver = driver
    else:
        print("=" * 60)
        print("‚ö†Ô∏è  Login gagal. Cookies mungkin expired.")
        print("üí° Export cookies baru dari browser dan simpan sebagai cookies.json")
        active_driver = None

üîç Mencari file cookies: cookies.json
üìÅ Path lengkap: d:\Document\Kuliah\Semester7\PPW\PPW\exams\uas\crawling x\cookies.json
‚úÖ File cookies.json ditemukan!
üìä Total cookies dalam file: 99

üöÄ Mengimport cookies dan login...
  ‚ö†Ô∏è  Skip cookie guest_id: 
  ‚ö†Ô∏è  Skip cookie __cf_bm: 
  ‚ö†Ô∏è  Skip cookie guest_id: 
  ‚ö†Ô∏è  Skip cookie kdt: 
  ‚ö†Ô∏è  Skip cookie auth_token: 
  ‚ö†Ô∏è  Skip cookie ct0: 
  ‚ö†Ô∏è  Skip cookie twid: 
  ‚ö†Ô∏è  Skip cookie att: 
  ‚ö†Ô∏è  Skip cookie __cf_bm: 
  ‚ö†Ô∏è  Skip cookie guest_id: 
  ‚ö†Ô∏è  Skip cookie __cf_bm: 
  ‚ö†Ô∏è  Skip cookie guest_id: 
  ‚ö†Ô∏è  Skip cookie kdt: 
  ‚ö†Ô∏è  Skip cookie auth_token: 
  ‚ö†Ô∏è  Skip cookie ct0: 
  ‚ö†Ô∏è  Skip cookie twid: 
  ‚ö†Ô∏è  Skip cookie att: 
  ‚ö†Ô∏è  Skip cookie __cf_bm: 

üìä Summary: 0 cookies added, 81 cookies skipped
‚úÖ Essential cookies imported successfully
  ‚ö†Ô∏è  Skip cookie guest_id: 
  ‚ö†Ô∏è  Skip cookie __cf_bm: 
  ‚ö†Ô∏è  Skip cookie guest_id: 
  ‚ö†Ô∏è  Skip c

## üéØ Scrape Tweets dengan Kata Kunci "MBG"

In [40]:
# Pastikan sudah login terlebih dahulu!
if active_driver is None:
    print("‚ùå Driver belum siap atau belum login!")
    print("üí° Jalankan cell sebelumnya untuk login terlebih dahulu")
else:
    # Scrape tweets dengan kata kunci "MBG"
    SEARCH_KEYWORD = "mbg"
    SEARCH_URL = f"https://x.com/search?q={SEARCH_KEYWORD}&src=typed_query&f=live"
    
    print(f"üîç Mencari tweets dengan keyword: {SEARCH_KEYWORD}")
    active_driver.get(SEARCH_URL)
    print("‚è≥ Memuat hasil pencarian...")
    random_sleep(3, 5)
    
    # Scroll untuk load lebih banyak tweets
    def scroll_page(driver, times=5):
        """Scroll halaman untuk load lebih banyak konten"""
        for i in range(times):
            driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
            print(f"üìú Scrolling... ({i+1}/{times})")
            random_sleep(2, 4)
    
    scroll_page(active_driver, 5)
    
    # Cari semua tweet articles
    print("üîç Mengambil data tweets...")
    tweets = active_driver.find_elements(By.CSS_SELECTOR, "article[data-testid='tweet']")
    print(f"‚úÖ Ditemukan {len(tweets)} tweets")
    
    # Variabel untuk menyimpan data
    usernames = []
    tweet_texts = []
    timestamps = []
    likes = []
    retweets = []
    replies = []
    tweet_links = []
    
    for idx, tweet in enumerate(tweets, 1):
        try:
            # Username
            try:
                username = tweet.find_element(By.CSS_SELECTOR, "[data-testid='User-Name'] a").text
            except:
                username = "N/A"
            
            # Tweet text
            try:
                text = tweet.find_element(By.CSS_SELECTOR, "[data-testid='tweetText']").text
            except:
                text = "N/A"
            
            # Timestamp
            try:
                time_element = tweet.find_element(By.TAG_NAME, "time")
                timestamp = time_element.get_attribute("datetime")
            except:
                timestamp = "N/A"
            
            # Likes
            try:
                like_element = tweet.find_element(By.CSS_SELECTOR, "[data-testid='like']")
                like_count = like_element.get_attribute("aria-label")
            except:
                like_count = "0"
            
            # Retweets
            try:
                retweet_element = tweet.find_element(By.CSS_SELECTOR, "[data-testid='retweet']")
                retweet_count = retweet_element.get_attribute("aria-label")
            except:
                retweet_count = "0"
            
            # Replies
            try:
                reply_element = tweet.find_element(By.CSS_SELECTOR, "[data-testid='reply']")
                reply_count = reply_element.get_attribute("aria-label")
            except:
                reply_count = "0"
            
            # Tweet link
            try:
                link = tweet.find_element(By.CSS_SELECTOR, "a[href*='/status/']").get_attribute("href")
            except:
                link = "N/A"
            
            # Append ke list
            usernames.append(username)
            tweet_texts.append(text)
            timestamps.append(timestamp)
            likes.append(like_count)
            retweets.append(retweet_count)
            replies.append(reply_count)
            tweet_links.append(link)
            
            print(f"‚úì Tweet #{idx} - {username[:30]}")
            
        except Exception as e:
            print(f"‚ö†Ô∏è  Error pada tweet #{idx}: {e}")
            continue
    
    print(f"\nüìä Total data berhasil di-scrape: {len(tweet_texts)} tweets")

‚ùå Driver belum siap atau belum login!
üí° Jalankan cell sebelumnya untuk login terlebih dahulu


## üìä Buat DataFrame & Ekspor ke CSV

In [41]:
# Buat DataFrame dari data yang berhasil di-scrape
df = pd.DataFrame({
    "username": usernames,
    "tweet_text": tweet_texts,
    "timestamp": timestamps,
    "likes": likes,
    "retweets": retweets,
    "replies": replies,
    "link": tweet_links,
})

print(f"üìä DataFrame berhasil dibuat dengan {len(df)} baris")
print("\nüîç Preview 5 data pertama:")
df.head()

NameError: name 'usernames' is not defined

In [None]:
# Ekspor ke CSV dengan timestamp
from datetime import datetime

timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
OUTPUT_FILE = f"tweets_mbg_{timestamp}.csv"

df.to_csv(OUTPUT_FILE, index=False, encoding="utf-8-sig")

print(f"‚úÖ Data berhasil disimpan ke: {OUTPUT_FILE}")
print(f"üìä Total tweets tersimpan: {len(df)}")
print(f"üìÅ Lokasi file: {os.path.abspath(OUTPUT_FILE)}")

## Tutup Driver

In [42]:
# Tutup driver setelah selesai scraping
if 'active_driver' in locals() and active_driver:
    active_driver.quit()
    print("‚úÖ Driver ditutup.")
elif 'driver' in locals():
    driver.quit()
    print("‚úÖ Driver ditutup.")
else:
    print("‚ö†Ô∏è  Tidak ada driver aktif untuk ditutup")

‚úÖ Driver ditutup.
