Name: **Putri Nisrina Az-Zahra**

Student ID (NIM): **M0501241050**

Supervisor: **Prof. Dr. Ir. Anik Djuraidah, MS**



---



# **Statistical Data Management Project**

A web scraper that collects data on the first supervisor and handles the differing page structures of Scopus, Google Scholar, and SINTA:

*   **Scopus** – international publication database (restricted access).
*   **Google Scholar** – lists scholarly publications and researcher profiles.
*   **SINTA** – Indonesia's official portal for lecturer and researcher performance.



---



# **SINTA**

https://sinta.kemdikbud.go.id/authors/profile/6030258

## Import Libraries

In [1]:
import os, time, re, tempfile
import requests
from bs4 import BeautifulSoup
import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

## Target Profile and Data Storage Configuration

In [22]:
AUTHOR_ID = "6030258"
BASE_URL = f"https://sinta.kemdikbud.go.id/authors/profile/{AUTHOR_ID}"
OUTPUT_DIR = "sinta_data"
os.makedirs(OUTPUT_DIR, exist_ok=True)

## Initializing the WebDriver for Web Scraping

In [23]:
def init_driver():
    options = Options()
    options.add_argument("--headless")
    options.add_argument("--no-sandbox")
    options.add_argument("--disable-dev-shm-usage")
    temp_dir = tempfile.mkdtemp()
    options.add_argument(f"--user-data-dir={temp_dir}")
    try:
        return webdriver.Chrome(options=options)
    except Exception as e:
        print("❌ WebDriver gagal:", e)
        return None

## Extracting Lecturer Profile Information from the SINTA Page

In [24]:
def parse_profil_sinta(soup):
    # Default values, kept whenever an element is missing from the page
    profil = {
        "Nama Lengkap": "N/A",
        "Afiliasi": "N/A",
        "Program Studi": "N/A",
        "SINTA ID": "N/A",
        "SINTA Score Overall": "N/A",
        "SINTA Score 3Yr": "N/A",
        "Affil Score": "N/A",
        "Affil Score 3Yr": "N/A",
        "Article (Scopus)": "N/A",
        "Article (GScholar)": "N/A",
        "Citation (Scopus)": "N/A",
        "Citation (GScholar)": "N/A",
        "H-Index (Scopus)": "N/A",
        "H-Index (GScholar)": "N/A"
    }

    try:
        # Query each optional element once and keep the default when it is missing
        nama = soup.select_one("div.col-lg.col-md > h3 a")
        if nama:
            profil["Nama Lengkap"] = nama.text.strip()
        afiliasi = soup.select_one(".meta-profile a:nth-child(1)")
        if afiliasi:
            profil["Afiliasi"] = afiliasi.text.strip()
        prodi = soup.select_one(".meta-profile a:nth-child(3)")
        if prodi:
            profil["Program Studi"] = prodi.text.strip()
        sinta_id = soup.select_one(".meta-profile a:nth-child(5)")
        if sinta_id:
            profil["SINTA ID"] = sinta_id.text.split(":")[-1].strip()

        skor = soup.select(".stat-profile .pr-num")
        if len(skor) > 0: profil["SINTA Score Overall"] = skor[0].text.strip()
        if len(skor) > 1: profil["SINTA Score 3Yr"] = skor[1].text.strip()
        if len(skor) > 2: profil["Affil Score"] = skor[2].text.strip()
        if len(skor) > 3: profil["Affil Score 3Yr"] = skor[3].text.strip()

        table = soup.select_one(".stat-table tbody")
        if table:
            for row in table.select("tr"):
                td = row.select("td")
                if len(td) >= 3:
                    key = td[0].text.strip()
                    profil[f"{key} (Scopus)"] = td[1].text.strip()
                    profil[f"{key} (GScholar)"] = td[2].text.strip()
    except Exception as e:
        print("⚠ Error parsing profil:", e)

    return profil

## Main Function for Fetching the Lecturer Profile from SINTA

In [25]:
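def ambil_profil_sinta():
    driver = init_driver()
    if not driver:
        return {}
    try:
        driver.get(BASE_URL)
        time.sleep(2)  # give the page time to render before reading the HTML
        soup = BeautifulSoup(driver.page_source, "html.parser")
        return parse_profil_sinta(soup)
    finally:
        driver.quit()  # always release the browser, even if loading or parsing fails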

## Extracting Publication Data from the SINTA Profile Views

In [27]:
def scrape_sinta_selected_views(author_id, timeout=20):
    base_url = f"https://sinta.kemdikbud.go.id/authors/profile/{author_id}"
    views = ['googlescholar', 'scopus', 'garuda']
    all_data = []

    headers = {"User-Agent": "Mozilla/5.0"}
    for view in views:
        url = f"{base_url}/?view={view}"
        print(f"🔍 Ambil data {view.upper()}")
        try:
            response = requests.get(url, headers=headers, timeout=timeout)
            response.raise_for_status()
        except Exception as e:
            print(f"❌ Error fetch {view}:", e)
            continue

        soup = BeautifulSoup(response.text, 'html.parser')
        articles = soup.select('div.ar-list-item')
        if not articles:
            print(f"⚠ Tidak ada artikel ditemukan di {view}")
            continue

        for article in articles:
            try:
                title_elem = article.select_one('div.ar-title a')
                title = title_elem.get_text(strip=True) if title_elem else "N/A"
                link = title_elem.get('href', "N/A") if title_elem else "N/A"

                authors = "N/A"
                if view == "garuda":
                    ar_meta = article.select('div.ar-meta a')
                    for i, a in enumerate(ar_meta):
                        if "Author Order" in a.get_text():
                            if i + 1 < len(ar_meta):
                                authors = ar_meta[i + 1].get_text(strip=True)
                            break
                else:
                    author_text = article.find(string=re.compile("Creator|Authors"))
                    authors = re.sub(r"(Creator|Authors)\s*:\s*", "", author_text.strip()) if author_text else "N/A"

                journal_elem = article.select_one('a.ar-pub') or article.select_one('span.ar-pub')
                journal = journal_elem.get_text(strip=True) if journal_elem else "N/A"

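                # Guard both lookups: the element may be missing, or its text may contain no digits
                year_elem = article.select_one('a.ar-year')
                year_match = re.search(r'\d{4}', year_elem.get_text(strip=True)) if year_elem else None
                year = year_match.group(0) if year_match else "N/A"

                cited_elem = article.select_one('a.ar-cited')
                cited_match = re.search(r'\d+', cited_elem.get_text(strip=True)) if cited_elem else None
                cited = cited_match.group(0) if cited_match else "0"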

                all_data.append({
                    'Sumber': view.upper(),
                    'Judul': title,
                    'Penulis': authors,
                    'Jurnal': journal,
                    'Tahun': year,
                    'Cited': cited,
                    'Link': link
                })
            except Exception as e:
                print(f"⚠ Error parsing artikel di {view}: {e}")
                continue

    return pd.DataFrame(all_data)
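
The year and citation count are extracted from free text with regular expressions, and `re.search` returns `None` when nothing matches. The following is a minimal sketch of a guarded lookup, not the notebook's own code: `extract_first` is a hypothetical helper name introduced here for illustration, and the sample strings are only shaped like the scraped text.

```python
import re


def extract_first(pattern, text, default="N/A"):
    """Return the first regex match in `text`, or `default` when nothing matches.

    `extract_first` is a hypothetical helper, not part of the original notebook.
    """
    if not text:
        return default
    match = re.search(pattern, text)
    return match.group(0) if match else default


# Sample strings shaped like the scraped year and citation text
print(extract_first(r"\d{4}", "Journal of Agrometeorology 27 (1), 27-32, 2025"))
print(extract_first(r"\d{4}", "no year here"))
print(extract_first(r"\d+", "", default="0"))
```

With a helper like this, an article with an unexpected year or citation format keeps its other fields and falls back to the default value.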

## Saving the Scraping Results to Excel

In [31]:
if __name__ == "__main__":
    profil = ambil_profil_sinta()
    df_pub = scrape_sinta_selected_views(AUTHOR_ID)

    out_path = os.path.join(OUTPUT_DIR, f"sinta_gabungan_{AUTHOR_ID}.xlsx")
    with pd.ExcelWriter(out_path) as writer:
        df_prof = pd.DataFrame(list(profil.items()), columns=["Atribut", "Nilai"])
        df_prof.to_excel(writer, sheet_name="Profil", index=False)
        if not df_pub.empty:
            for sumber in df_pub["Sumber"].unique():
                df_pub[df_pub["Sumber"] == sumber].to_excel(writer, sheet_name=sumber, index=False)

    print(f"✅ Data lengkap disimpan di {out_path}")

🔍 Ambil data GOOGLESCHOLAR
🔍 Ambil data SCOPUS
🔍 Ambil data GARUDA
✅ Data lengkap disimpan di sinta_data/sinta_gabungan_6030258.xlsx


## Sample of the Saved Data

In [32]:
df_pub["Sumber"].unique()

array(['GOOGLESCHOLAR', 'SCOPUS', 'GARUDA'], dtype=object)

In [33]:
df_pub[df_pub["Sumber"] == "GOOGLESCHOLAR"].head()

| | Sumber | Judul | Penulis | Jurnal | Tahun | Cited | Link |
|---|---|---|---|---|---|---|---|
| 0 | GOOGLESCHOLAR | Spatial Clustering Regression in Identifying Local Factors in Stunting Cases in Indonesia | UA Syam, A Djuraidah, UD Syafitri | JTAM (Jurnal Teori dan Aplikasi Matematika) 9 (2), 600-614, 2025 | 2025 | 0 | https://scholar.google.com/scholar?q=+intitle:'Spatial Clustering Regression in Identifying Local Factors in Stunting Cases in Indonesia' |
| 1 | GOOGLESCHOLAR | Spatiotemporal Bayes model for estimating the number of hotspots as an indicator of forest and land fires in Kalimantan Island, Indonesia | F Rohimahastuti, A Djuraidah, H Wijayanto | Journal of Agrometeorology 27 (1), 27-32, 2025 | 2025 | 1 | https://scholar.google.com/scholar?q=+intitle:'Spatiotemporal Bayes model for estimating the number of hotspots as an indicator of forest and land fires in Kalimantan Island, Indonesia' |
| 2 | GOOGLESCHOLAR | Identifying Factors Affecting Waste Generation in West Java in 2021 Using Spatial Regression | A Djuraidah, A Rizki, T Alfan | JTAM (Jurnal Teori dan Aplikasi Matematika) 8 (2), 495-505, 2024 | 2024 | 1 | https://scholar.google.com/scholar?q=+intitle:'Identifying Factors Affecting Waste Generation in West Java in 2021 Using Spatial Regression' |
| 3 | GOOGLESCHOLAR | Rainfall modeling with CMIP6-DCPP outputs and local characteristic information using eigenvector spatial filtering varying coefficient (ESF-VC) | D Al Mahkya, A Djuraidah, AH Wigena, B Sartono | Journal of Agrometeorology 26 (3), 311-317, 2024 | 2024 | 0 | https://scholar.google.com/scholar?q=+intitle:'Rainfall modeling with CMIP6-DCPP outputs and local characteristic information using eigenvector spatial filtering varying coefficient (ESF-VC)' |
| 4 | GOOGLESCHOLAR | Bayesian conditional negative binomial autoregressive model: a case study of stunting on Java Island in 2021 | DJ Fitri, A Djuraidah, H Wijayanto | Commun. Math. Biol. Neurosci. 2024, Article ID 18, 2024 | 2024 | 3 | https://scholar.google.com/scholar?q=+intitle:'Bayesian conditional negative binomial autoregressive model: a case study of stunting on Java Island in 2021' |


In [34]:
df_pub[df_pub["Sumber"] == "SCOPUS"].head()

| | Sumber | Judul | Penulis | Jurnal | Tahun | Cited | Link |
|---|---|---|---|---|---|---|---|
| 10 | SCOPUS | Spatio-temporal Bayes model for estimating the number of hotspots as an indicator of forest and land fires in Kalimantan Island, Indonesia | Rohimahastuti F. | Journal of Agrometeorology | 2025 | 0 | https://www.scopus.com/record/display.uri?eid=2-s2.0-105000853549&origin=resultslist |
| 11 | SCOPUS | Bias correction and ensemble techniques in statistical downscaling model for rainfall prediction using Tweedie-LASSO in West Java, Indonesia | Dewanti D. | Journal of Agrometeorology | 2024 | 0 | https://www.scopus.com/record/display.uri?eid=2-s2.0-85203587924&origin=resultslist |
| 12 | SCOPUS | Rainfall modeling with CMIP6-DCPP outputs and local characteristic information using eigenvector spatial filtering varying coefficient (ESF-VC) | Mahkya D.A. | Journal of Agrometeorology | 2024 | 0 | https://www.scopus.com/record/display.uri?eid=2-s2.0-85203602030&origin=resultslist |
| 13 | SCOPUS | BAYESIAN CONDITIONAL NEGATIVE BINOMIAL AUTOREGRESSIVE MODEL: A CASE STUDY OF STUNTING ON JAVA ISLAND IN 2021 | Fitri D.J. | Communications in Mathematical Biology and Neuroscience | 2024 | 2 | https://www.scopus.com/record/display.uri?eid=2-s2.0-85186249577&origin=resultslist |
| 14 | SCOPUS | Multiclass Forecasting on Panel Data Using Autoregressive Multinomial Logit and C5.0 Decision Tree | Ardiansyah M. | Pakistan Journal of Statistics and Operation Research | 2023 | 1 | https://www.scopus.com/record/display.uri?eid=2-s2.0-85153754262&origin=resultslist |


In [35]:
df_pub[df_pub["Sumber"] == "GARUDA"].head()

| | Sumber | Judul | Penulis | Jurnal | Tahun | Cited | Link |
|---|---|---|---|---|---|---|---|
| 20 | GARUDA | Identifying Factors Affecting Waste Generation in West Java in 2021 Using Spatial Regression | Djuraidah, Anik; Rizki, Akbar; Alfan, Tony | JTAM (Jurnal Teori dan Aplikasi Matematika) Vol 8, No 2 (2024): April495-505 | 2024 | 10 | https://garuda.kemdikbud.go.id/documents/detail/4469322 |
| 21 | GARUDA | BCBimax Biclustering Algorithm with Mixed-Type Data | Hanifa Izzati; Indahwati Indahwati; Anik Djuraidah | JUITA: Jurnal Informatika JUITA Vol. 12 No. 1, May 2024131 - 139 | 2024 | 10 | https://garuda.kemdikbud.go.id/documents/detail/4148178 |
| 22 | GARUDA | STACKING ENSEMBLE APPROACH IN STATISTICAL DOWNSCALING USING CMIP6-DCPP FOR RAINFALL ESTIMATION IN RIAU | Mahkya, Dani Al; Djuraidah, Anik; Wigena, Aji Hamim; Sartono, Bagus | MEDIA STATISTIKA Vol 17, No 1 (2024): Media Statistika1-12 | 2024 | 10 | https://garuda.kemdikbud.go.id/documents/detail/4571418 |
| 23 | GARUDA | A-Optimal Pada Mixture Amount Design Dengan Modifikasi Rancangan Petak Terbagi Menggunakan Algoritma Point-Exchange | Sari, Mutia Dwi Permata; Syafitri, Utami Dyah; Djuraidah, Anik | Al-Khwarizmi : Jurnal Pendidikan Matematika dan Ilmu Pengetahuan Alam Vol. 12 No. 2 (2024): Al-Khwarizmi : Jurnal Pendidikan Matematika dan Ilmu Pengetahuan Alam137-146 | 2024 | 10 | https://garuda.kemdikbud.go.id/documents/detail/4788257 |
| 24 | GARUDA | PENDUGAAN FAKTOR – FAKTOR YANG MEMENGARUHI KASUS STUNTING DI JAWA BARAT TAHUN 2021 MENGGUNAKAN REGRESI SPASIAL BINOMIAL NEGATIF | Anik Djuraidah; Mely Amelia; Rahma Anisa | Jurnal Matematika, Statistika dan Komputasi Vol. 20 No. 1 (2023): SEPTEMBER, 202341-51 | 2023 | 10 | https://garuda.kemdikbud.go.id/documents/detail/3678067 |


# **Scopus**

https://www.scopus.com/authid/detail.uri?authorId=56716188100

## Import Libraries

In [36]:
import os
import time
import requests
import json
import csv

## Target Profile and Data Storage Configuration

In [37]:
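# Read the key from the environment instead of hard-coding a credential
# (the variable name SCOPUS_API_KEY is an assumption)
API_KEY_SCOPUS = os.environ.get("SCOPUS_API_KEY", "")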
AUTHOR_ID_SCOPUS = "56716188100"
BASE_URL_SCOPUS_API = "https://api.elsevier.com/content/search/scopus"
OUTPUT_DIR = "scopus_api_data"
os.makedirs(OUTPUT_DIR, exist_ok=True)
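
API credentials are usually kept out of a notebook so that sharing the file does not expose them. The following is a minimal sketch of reading a key from an environment variable; both the helper name `get_api_key` and the variable name `SCOPUS_API_KEY` are assumptions for illustration, not part of the original configuration.

```python
import os


def get_api_key(var_name="SCOPUS_API_KEY"):
    """Read an API key from the environment.

    Both `get_api_key` and the variable name are illustrative assumptions.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable before running the scraper")
    return key


# Demo only: set a placeholder value so the sketch runs on its own
os.environ["SCOPUS_API_KEY"] = "demo-key"
print(get_api_key())
```

Failing early with a clear message is easier to diagnose than a 401 response later in the run.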

## Extracting and Formatting Publication Data

In [38]:
def scrape_scopus_api(api_key, author_id):
    """
    Fetch publication data from the Scopus API using an Author ID.
    """

    headers = {
        "X-ELS-APIKey": api_key,
        "Accept": "application/json"
    }
    publikasi_scopus_api = []
    start_index = 0
    count_per_request = 25

    try:
        while True:
            params = {
                "query": f"AU-ID({author_id})",
                "start": start_index,
                "count": count_per_request
            }

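            # A timeout keeps a stalled connection from hanging the loop and lets the Timeout handler below fire
            response = requests.get(BASE_URL_SCOPUS_API, headers=headers, params=params, timeout=30)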
            response.raise_for_status()
            data = response.json()

            entries = data.get("search-results", {}).get("entry", [])
            if not entries:
                break

            for item in entries:
                creator = item.get("dc:creator", "-")
                penulis = creator.replace(";", ", ") if ";" in creator else creator
                publikasi_scopus_api.append({
                    "Judul": item.get("dc:title", "-"),
                    "Tahun": item.get("prism:coverDate", "-")[:4],
                    "Sitasi": item.get("citedby-count", "0"),
                    "Penulis": penulis,
                    "Nama Jurnal": item.get("prism:publicationName", "-"),
                    "DOI": item.get("prism:doi", "-"),
                    "EID": item.get("eid", "-"),
                    "Link Dokumen": item.get("prism:url", "-"),
                    "Sumber": "Scopus API"
                })

            total_results_str = data.get("search-results", {}).get("opensearch:totalResults", "0")
            total_results = int(total_results_str)
            start_index += count_per_request

            if start_index >= total_results:
                break

            time.sleep(0.5)

        print(f"✅ Berhasil mengambil total {len(publikasi_scopus_api)} publikasi dari Scopus API.")
        return publikasi_scopus_api

    except requests.exceptions.HTTPError as errh:
        print(f"❌ HTTP Error saat mengambil dari Scopus API: {errh}")
        status_code = errh.response.status_code
        if status_code == 401:
            print("🔑 Periksa API Key Anda. Mungkin tidak valid, salah, atau kedaluwarsa.")
        elif status_code == 404:
            print(f"🔍 Author ID {author_id} tidak ditemukan di Scopus, atau URL API salah.")
        elif status_code == 429:
            print("⚠️ Batas permintaan API terlampaui. Coba lagi nanti setelah beberapa waktu.")
        else:
            print(f"⚠️ Kode status HTTP: {status_code}")
            print(f"📝 Respons: {errh.response.text}")

    except requests.exceptions.ConnectionError as errc:
        print(f"❌ Error Koneksi saat mengambil dari Scopus API: {errc}. Periksa koneksi internet Anda.")
    except requests.exceptions.Timeout as errt:
        print(f"❌ Timeout Error saat mengambil dari Scopus API: {errt}. Permintaan terlalu lama.")
    except requests.exceptions.RequestException as err:
        print(f"❌ Error tidak terduga saat mengambil dari Scopus API: {err}.")
    except Exception as e:
        print(f"❌ Error umum saat memproses data Scopus API: {e}")
        import traceback
        traceback.print_exc()

    return []
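
The function pages through results by advancing `start` in steps of `count` until it reaches `opensearch:totalResults`. The following is a minimal sketch of that offset arithmetic, using the 59 publications and page size of 25 from this notebook; `page_offsets` is a hypothetical helper introduced only for illustration.

```python
def page_offsets(total_results, count_per_request=25):
    """Yield the `start` offsets needed to cover `total_results` items.

    `page_offsets` is a hypothetical helper used only for illustration.
    """
    start = 0
    while start < total_results:
        yield start
        start += count_per_request


offsets = list(page_offsets(59))
print(offsets)  # [0, 25, 50]
```

Three requests therefore cover all 59 records. Unlike this sketch, the notebook's loop always sends the first request, because the total is only known once a response has arrived.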

## Saving the Scraping Results to CSV and JSON

In [39]:
import os
import csv
import json

OUTPUT_DIR = "output"  # Created automatically by the save functions if it does not exist

def simpan_data_ke_csv(data, nama_file="scopus_publikasi.csv", header_title="PUBLIKASI DARI SCOPUS API"):
    """
    Save publication data to a CSV file.
    """
    try:
        os.makedirs(OUTPUT_DIR, exist_ok=True)

        path_file = os.path.join(OUTPUT_DIR, nama_file)
        with open(path_file, 'w', newline='', encoding='utf-8') as file:
            writer = csv.writer(file)
            writer.writerow([header_title])

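            if data:
                # "No" is filled in per row below, so register it as a column up front;
                # otherwise the header filter would drop it and the numbering would never be written
                all_headers = {"No"}
                for item in data:
                    all_headers.update(item.keys())

                ordered_headers = [
                    "No", "Judul", "Penulis", "Nama Jurnal", "Tahun",
                    "Sitasi", "DOI", "EID", "Link Dokumen", "Sumber"
                ]

                # Preferred columns first, then any extra keys in alphabetical order
                final_headers = [h for h in ordered_headers if h in all_headers]
                final_headers += sorted(all_headers - set(ordered_headers))

                writer.writerow(final_headers)

                for i, item_data in enumerate(data, start=1):
                    # Build the row from a copy so the caller's records are not mutated
                    row = {**item_data, "No": i}
                    writer.writerow([row.get(key, "N/A") for key in final_headers])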
            else:
                writer.writerow(["Tidak ada publikasi ditemukan."])

        print(f"✅ Data berhasil disimpan ke '{path_file}'")
    except IOError as e:
        print(f"❌ Gagal menyimpan file CSV: {e}")
        import traceback
        traceback.print_exc()


def simpan_data_ke_json(data, nama_file="scopus_publikasi.json"):
    """
    Save publication data to a JSON file.
    """
    try:
        os.makedirs(OUTPUT_DIR, exist_ok=True)
        path_file = os.path.join(OUTPUT_DIR, nama_file)

        with open(path_file, 'w', encoding='utf-8') as file:
            json.dump(data, file, ensure_ascii=False, indent=4)

        print(f"✅ Data berhasil disimpan ke '{path_file}'")
    except IOError as e:
        print(f"❌ Gagal menyimpan file JSON: {e}")
        import traceback
        traceback.print_exc()
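
The CSV writer puts a preferred set of columns first and appends any remaining keys alphabetically. The following is a minimal sketch of that ordering idea on two made-up records. It writes to an in-memory buffer and uses `csv.DictWriter`, which is a swapped-in alternative to the `csv.writer` used in the notebook; the record values and the shortened `preferred` list are assumptions for illustration.

```python
import csv
import io

# Made-up records; the second one lacks the "Extra" key
records = [
    {"Judul": "Paper A", "Tahun": "2024", "Extra": "x"},
    {"Judul": "Paper B", "Tahun": "2023"},
]

preferred = ["No", "Judul", "Penulis", "Tahun"]
all_headers = {"No"}  # the running number is added per row below
for record in records:
    all_headers.update(record.keys())

# Preferred columns that actually occur, then the leftovers in sorted order
headers = [h for h in preferred if h in all_headers]
headers += sorted(all_headers - set(preferred))

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=headers, restval="N/A", lineterminator="\n")
writer.writeheader()
for number, record in enumerate(records, start=1):
    writer.writerow({**record, "No": number})  # copy, so `records` stays unchanged

csv_text = buffer.getvalue()
print(csv_text)
```

`restval="N/A"` fills in keys that a record does not have, which plays the same role as the `"N/A"` default in the notebook's row builder.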

## Scraping Process

In [47]:
def main():

    if not os.path.exists(OUTPUT_DIR):
        os.makedirs(OUTPUT_DIR)

    publikasi_scopus = []
    try:
        publikasi_scopus = scrape_scopus_api(API_KEY_SCOPUS, AUTHOR_ID_SCOPUS)

        print(f"✨ Total Publikasi Ditemukan: {len(publikasi_scopus)}")

        simpan_data_ke_csv(publikasi_scopus, "scopus_publikasi.csv")
        simpan_data_ke_json(publikasi_scopus, "scopus_publikasi.json")

    except Exception as e:
        print(f"❌ Terjadi kesalahan umum selama proses scraping: {e}")
        import traceback
        traceback.print_exc()
    finally:
        print("✨ Proses scraping Scopus API selesai.")

if __name__ == "__main__":
    main()

✅ Berhasil mengambil total 59 publikasi dari Scopus API.
✨ Total Publikasi Ditemukan: 59
✅ Data berhasil disimpan ke 'output/scopus_publikasi.csv'
✅ Data berhasil disimpan ke 'output/scopus_publikasi.json'
✨ Proses scraping Scopus API selesai.


In [52]:
def main():

    if not os.path.exists(OUTPUT_DIR):
        os.makedirs(OUTPUT_DIR)

    publikasi_scopus = []
    try:
        publikasi_scopus = scrape_scopus_api(API_KEY_SCOPUS, AUTHOR_ID_SCOPUS)

        # ✅ Show the first 5 records
        for i, pub in enumerate(publikasi_scopus[:5], start=1):
            print(f"\n📄 Publikasi #{i}")
            for k, v in pub.items():
                print(f"{k}: {v}")

    except Exception as e:
        print(f"❌ Terjadi kesalahan umum selama proses scraping: {e}")
        import traceback
        traceback.print_exc()

if __name__ == "__main__":
    main()

✅ Berhasil mengambil total 59 publikasi dari Scopus API.

📄 Publikasi #1
Judul: Spatio-temporal Bayes model for estimating the number of hotspots as an indicator of forest and land fires in Kalimantan Island, Indonesia
Tahun: 2025
Sitasi: 0
Penulis: Rohimahastuti F.
Nama Jurnal: Journal of Agrometeorology
DOI: 10.54386/jam.v27i1.2761
EID: 2-s2.0-105000853549
Link Dokumen: https://api.elsevier.com/content/abstract/scopus_id/105000853549
Sumber: Scopus API

📄 Publikasi #2
Judul: Bias correction and ensemble techniques in statistical downscaling model for rainfall prediction using Tweedie-LASSO in West Java, Indonesia
Tahun: 2024
Sitasi: 0
Penulis: Dewanti D.
Nama Jurnal: Journal of Agrometeorology
DOI: 10.54386/jam.v26i3.2614
EID: 2-s2.0-85203587924
Link Dokumen: https://api.elsevier.com/content/abstract/scopus_id/85203587924
Sumber: Scopus API

📄 Publikasi #3
Judul: Rainfall modeling with CMIP6-DCPP outputs and local characteristic information using eigenvector spatial filtering varying

# **Google Scholar**

https://scholar.google.com/citations?user=fN-LWBEAAAAJ&hl=en

## Import Libraries

In [41]:
from serpapi import GoogleSearch
import pandas as pd
import time

## Target Profile Configuration

In [42]:
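# Read the SerpApi key from the environment instead of hard-coding a credential
# (the variable name SERPAPI_API_KEY is an assumption; `os` is imported earlier in the notebook)
api_key = os.environ.get("SERPAPI_API_KEY", "")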
author_id = "fN-LWBEAAAAJ"

## Scraping Process

In [43]:
all_articles = []
start = 0
max_retries = 3

while True:
    articles = None  # None means the request for this page never succeeded

    for attempt in range(max_retries):
        try:
            params = {
                "engine": "google_scholar_author",
                "author_id": author_id,
                "api_key": api_key,
                "hl": "id",
                "num": 100,
                "start": start
            }

            search = GoogleSearch(params)
            results = search.get_dict()
            articles = results.get("articles", [])
            break  # request succeeded, stop retrying

        except Exception as e:
            if attempt == max_retries - 1:
                print(f"❌ Gagal setelah {max_retries} percobaan: {str(e)}")
            else:
                print(f"⚠ Percobaan {attempt + 1} gagal, mencoba lagi...")
                time.sleep(5)

    if articles is None:
        break  # every retry failed; stop instead of reusing the previous page's results
    if not articles:
        print("✅ Semua artikel berhasil diambil.")
        break

    for article in articles:
        # "authors" may be a list or a plain string depending on the response
        authors = article.get("authors")
        if isinstance(authors, list):
            penulis = ", ".join(authors)
        else:
            penulis = authors if authors else "N/A"

        all_articles.append({
            "Judul": article.get("title"),
            "Link": article.get("link"),
            "Tahun": article.get("year"),
            "Penulis": penulis,
            "Publikasi": article.get("publication", "N/A")
        })

    print(f"🔄 Halaman offset {start} - Artikel terkumpul: {len(all_articles)}")
    start += 100
    time.sleep(2)  # pause between pages to stay polite to the API

🔄 Halaman offset 0 - Artikel terkumpul: 100
🔄 Halaman offset 100 - Artikel terkumpul: 183
✅ Semua artikel berhasil diambil.
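
Each page request is attempted up to `max_retries` times with a pause between attempts. The following is a minimal sketch of that retry pattern as a reusable function, shown with a fake request instead of a real SerpApi call; `with_retries` and `flaky_request` are hypothetical names introduced for illustration.

```python
import time


def with_retries(func, max_retries=3, delay=5.0, sleep=time.sleep):
    """Call `func()` until it succeeds or `max_retries` attempts have failed.

    `with_retries` is a hypothetical helper; `sleep` is injectable so the
    demo below does not actually wait.
    """
    last_error = None
    for attempt in range(1, max_retries + 1):
        try:
            return func()
        except Exception as error:  # broad on purpose, mirroring the notebook
            last_error = error
            if attempt < max_retries:
                sleep(delay)
    raise RuntimeError(f"failed after {max_retries} attempts") from last_error


# Demo with a fake request that fails twice before succeeding
calls = {"count": 0}


def flaky_request():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return {"articles": ["a", "b"]}


result = with_retries(flaky_request, sleep=lambda seconds: None)
print(result, calls["count"])
```

Raising after the final attempt makes the failure explicit to the caller, so a paging loop can stop cleanly rather than continue with stale data.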


## Saving the Scraping Results to CSV

In [48]:
if all_articles:
    df = pd.DataFrame(all_articles)
    df = df.sort_values(by='Tahun', ascending=False)

    filename = "Google Scholar.csv"
    df.to_csv(filename, index=False, encoding='utf-8-sig')
    print(f"📁 Data disimpan ke '{filename}'")

else:
    print("❌ Tidak ada artikel yang berhasil diambil.")

📁 Data disimpan ke 'Google Scholar.csv'


## Sample of the Saved Data

In [49]:
df.head()

| | Judul | Link | Tahun | Penulis | Publikasi |
|---|---|---|---|---|---|
| 128 | Spatial Clustering Regression in Identifying Local Factors in Stunting Cases in Indonesia | https://scholar.google.com/citations?view_op=view_citation&hl=id&user=fN-LWBEAAAAJ&cstart=100&pagesize=100&citation_for_view=fN-LWBEAAAAJ:Ug5p-4gJ2f0C | 2025 | UA Syam, A Djuraidah, UD Syafitri | JTAM (Jurnal Teori dan Aplikasi Matematika) 9 (2), 600-614, 2025 |
| 94 | Spatiotemporal Bayes model for estimating the number of hotspots as an indicator of forest and land fires in Kalimantan Island, Indonesia | https://scholar.google.com/citations?view_op=view_citation&hl=id&user=fN-LWBEAAAAJ&pagesize=100&citation_for_view=fN-LWBEAAAAJ:ruyezt5ZtCIC | 2025 | F Rohimahastuti, A Djuraidah, H Wijayanto | Journal of Agrometeorology 27 (1), 27-32, 2025 |
| 57 | Bayesian conditional negative binomial autoregressive model: a case study of stunting on Java Island in 2021 | https://scholar.google.com/citations?view_op=view_citation&hl=id&user=fN-LWBEAAAAJ&pagesize=100&citation_for_view=fN-LWBEAAAAJ:tuHXwOkdijsC | 2024 | DJ Fitri, A Djuraidah, H Wijayanto | Commun. Math. Biol. Neurosci. 2024, Article ID 18, 2024 |
| 130 | Rainfall modeling with CMIP6-DCPP outputs and local characteristic information using eigenvector spatial filtering varying coefficient (ESF-VC) | https://scholar.google.com/citations?view_op=view_citation&hl=id&user=fN-LWBEAAAAJ&cstart=100&pagesize=100&citation_for_view=fN-LWBEAAAAJ:SpbeaW3--B0C | 2024 | D Al Mahkya, A Djuraidah, AH Wigena, B Sartono | Journal of Agrometeorology 26 (3), 311-317, 2024 |
| 131 | Bias correction and ensemble techniques in statistical downscaling model for rainfall prediction using Tweedie-LASSO in West Java, Indonesia | https://scholar.google.com/citations?view_op=view_citation&hl=id&user=fN-LWBEAAAAJ&cstart=100&pagesize=100&citation_for_view=fN-LWBEAAAAJ:i2xiXl-TujoC | 2024 | D Dewanti, A Djuraidah, B Sartono, A Sopaheluwakan | Journal of Agrometeorology 26 (3), 324-330, 2024 |
