# Group 3 Big Data Platform

Members:
- Mochamad Zidan Rusdhiana (2305464)
- Muhammad Daffa Ma'arif (2305771)
- Yusrilia Hidayanti (2306828)  
- Ismail Fatih Raihan (2307840)  
- Hafsah Hamidah (2311474)  
- Yahyo Abdullozoda (2313368)  

## Background

In financial market analysis, real-time and historical stock data are essential for decision-making. The yfinance library retrieves stock data from Yahoo Finance and returns it as Python dictionaries (JSON-like) or pandas DataFrames, giving access to stock prices, financial statements, and market indicators. Ingestion consists of calling the library, formatting the result, and storing it in a database such as PostgreSQL for structured analysis or MongoDB for flexible storage. Once the data is stored in a consistent structure, it can be used for trend analysis, price prediction, and integration with other data sources.

<hr>

## Ingesting Stock Data and Historical Data of Indonesian Issuers with the yfinance Library

Ingestion Workflow
1. Connect to the Data Source
    - Use the yfinance library to access stock data.
    - Define the list of stock symbols to fetch.
2. Fetch the Data
    - Use yfinance.Ticker() to get company information (info).
    - Fetch historical price data with history(period="1y") to get the last year of data.
    - Make sure the data was fetched successfully before moving on to the next stage.
3. Transform the Data
    - Convert the historical data from a DataFrame into a list of dictionaries so that it is compatible with database storage.
    - Make sure the data has a uniform format to simplify further analysis.
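
The transformation step can be sketched in isolation on a small hand-made DataFrame, without calling Yahoo Finance. This is only an illustrative sketch: the helper name `history_to_records` and the sample values are assumptions made for this example and are not part of the notebook. It moves the date index into a column, converts it to a string, and returns a list of dictionaries.

```python
import pandas as pd


def history_to_records(history):
    """Convert a date-indexed price DataFrame into a list of plain dictionaries."""
    if history.empty:
        # Nothing to convert, e.g. when no price data came back for a ticker
        return []
    records = history.reset_index()                 # move the Date index into a column
    records["Date"] = records["Date"].astype(str)   # Timestamp -> string
    return records.to_dict(orient="records")        # one dictionary per row


# Hypothetical sample shaped like the result of ticker.history()
sample = pd.DataFrame(
    {"Close": [100.0, 101.5], "Volume": [1000, 1200]},
    index=pd.DatetimeIndex(["2024-03-06", "2024-03-07"], name="Date"),
)

records = history_to_records(sample)
print(records)
```

Each record then holds only strings and numbers, which is the shape both `json.dump()` and MongoDB expect.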

In [2]:
import yfinance as yf

stocks = [
    # BANKING SECTOR
    "BBRI.JK", "BMRI.JK", "BBCA.JK", "BBNI.JK", "BJBR.JK", "BJTM.JK", "AGRO.JK", "BBKP.JK", "BDMN.JK", 
    "NISP.JK", "PNBN.JK", "BCIC.JK", "MAYA.JK", "ARTO.JK", "BTPS.JK", "AMAR.JK", "NOBU.JK", "BINA.JK",
    "BSIM.JK", "BNGA.JK", "BGTG.JK", "BNLI.JK", "BABP.JK", "BBYB.JK", "BEKS.JK", "BMAS.JK", "BOSS.JK",

    # ENERGY SECTOR
    "ADRO.JK", "ITMG.JK", "PTBA.JK", "MEDC.JK", "PGAS.JK", "ELSA.JK", "AKRA.JK", "HRUM.JK", "INDY.JK",
    "MBAP.JK", "BIPI.JK", "DOID.JK", "ENRG.JK", "RAJA.JK", "GEMS.JK", "TPIA.JK", "BRPT.JK", "ESSA.JK",
    "PSAB.JK", "SMMT.JK", "BREN.JK", "TOBA.JK", "CNKO.JK", "MARI.JK", "BUMI.JK", "SUGI.JK", "DWGL.JK",

    # CONSUMER GOODS
    "UNVR.JK", "ICBP.JK", "INDF.JK", "MYOR.JK", "SIDO.JK", "KAEF.JK", "PEHA.JK", "KLBF.JK", "GOOD.JK",
    "DMND.JK", "KINO.JK", "ALTO.JK", "AISA.JK", "HOKI.JK", "CLEO.JK", "ULTJ.JK", "ADES.JK", "ROTI.JK",
    "FOOD.JK", "CMRY.JK", "SAPX.JK", "STTP.JK", "MRAT.JK", "FITT.JK", "NAYZ.JK", "TSPC.JK", "CBMF.JK",

    # INFRASTRUCTURE & TELECOMMUNICATION
    "TLKM.JK", "EXCL.JK", "ISAT.JK", "FREN.JK", "TOWR.JK", "SUPR.JK", "JSMR.JK", "CMNP.JK", "WIKA.JK",
    "PTPP.JK", "ADHI.JK", "WSKT.JK", "WTON.JK", "KRAS.JK", "SMGR.JK", "INTP.JK", "LPKR.JK", "PWON.JK",
    "NRCA.JK", "TOTL.JK", "ACST.JK", "MTLA.JK", "PPRO.JK", "KIJA.JK", "JRPT.JK", "CTRA.JK", "DMAS.JK",

    # PROPERTY & REAL ESTATE
    "BSDE.JK", "CTRA.JK", "SMRA.JK", "ASRI.JK", "DMAS.JK", "COWL.JK", "OMRE.JK", "KIJA.JK", "MTLA.JK",
    "JRPT.JK", "PPRO.JK", "NIRO.JK", "BKSL.JK", "URBN.JK", "PLIN.JK", "MKPI.JK", "PWON.JK", "LMAS.JK",
    "LAMI.JK", "DILD.JK", "ELTY.JK", "RDTX.JK", "TARA.JK", "LPCK.JK", "MDLN.JK", "SIPD.JK", "JGLE.JK",

    # TRANSPORTATION & LOGISTICS
    "GIAA.JK", "ASSA.JK", "SMDR.JK", "TMAS.JK", "HITS.JK", "SAFE.JK", "JSMR.JK", "MCOL.JK", "WEHA.JK",
    "BIRD.JK", "SAPX.JK", "BLTA.JK", "MBTO.JK", "IPCC.JK", "DEPO.JK", "NELY.JK", "CMPP.JK", "JAYA.JK",
    "ADES.JK", "TRUK.JK", "WEHA.JK", "LION.JK", "TAXI.JK", "JAYA.JK",

    # METAL & MINING
    "ANTM.JK", "INCO.JK", "MDKA.JK", "PSAB.JK", "TINS.JK", "HRTA.JK", "ZINC.JK", "DKFT.JK", "BOSS.JK",
    "KKGI.JK", "BIPI.JK", "TOBA.JK", "GEMS.JK", "DOID.JK", "MBAP.JK", "CITA.JK", "NICL.JK",
    "NIKL.JK", "SMRU.JK", "PTRO.JK", "SPTO.JK", "MAMI.JK", "LSIP.JK", "MGRO.JK", "TAPG.JK", "STAA.JK",

    # RETAIL & DISTRIBUTION
    "ACES.JK", "MAPI.JK", "RALS.JK", "LPPF.JK", "MPPA.JK", "AMRT.JK", "ERAA.JK", "CSAP.JK", "PRDA.JK",
    "TELE.JK", "KIOS.JK", "DIVA.JK", "DIGI.JK", "NFCX.JK", "GLOB.JK", "MCAS.JK", "TFAS.JK", "MDKA.JK",
    "DNET.JK", "PDES.JK", "MIDI.JK", "MAPA.JK", "OPMS.JK", "BATA.JK", "UNVR.JK", "INAF.JK",

    # FINTECH & TECHNOLOGY
    "GOTO.JK", "BUKA.JK", "DCII.JK", "EDGE.JK", "MTDL.JK", "SKYB.JK", "ALDO.JK", "YELO.JK", "MPXL.JK",
    "MCAS.JK", "BALI.JK", "TFAS.JK", "DIVA.JK", "NFCX.JK", "WIFI.JK", "DMMX.JK", "BIRD.JK", "DEPO.JK",
    "SLIS.JK", "TRIO.JK", "TECH.JK", "CLAY.JK", "LUCY.JK", "AGII.JK", "AXIO.JK",

    # MANUFACTURING
    "SMSM.JK", "AUTO.JK", "ASII.JK", "GJTL.JK", "IMAS.JK", "SPMA.JK", "INDS.JK", "PRAS.JK", "SSTM.JK",
    "INDR.JK", "ICBP.JK", "INDF.JK", "PICO.JK", "LTLS.JK", "INTA.JK", "MASA.JK", "DUTI.JK", "BTON.JK",
    "DKFT.JK", "TRST.JK", "STTP.JK", "ALMI.JK", "CASS.JK", "AGRO.JK", "BTON.JK", "GGRP.JK", "BRNA.JK",
]

# Dictionary that holds the stock data
stock_data = {}

success_list = []
failed_list = {}

# dict.fromkeys() drops tickers listed under more than one sector while keeping their order
for stock in dict.fromkeys(stocks):
    try:
        ticker = yf.Ticker(stock)
        info = ticker.info
        history = ticker.history(period="1y")

        # yfinance may log an error and return an empty DataFrame instead of raising
        if history.empty:
            failed_list[stock] = "no price data returned"
            continue

        # Convert the DataFrame to a JSON-friendly format
        history.reset_index(inplace=True)
        history["Date"] = history["Date"].astype(str)  # Convert Timestamp to string

        # Build a separate document per stock for MongoDB
        stock_doc = {
            "_id": stock,  # Use the ticker symbol as the unique ID
            "info": info,
            "history": history.to_dict(orient="records")  # List of records
        }
        stock_data[stock] = stock_doc

        success_list.append(stock)
    except Exception as e:
        failed_list[stock] = str(e)

# Print the results once all tickers have been processed
print("\nData successfully fetched for:")
print(", ".join(success_list))

if failed_list:
    print("\nData could not be fetched for:")
    for stock, error in failed_list.items():
        print(f"- {stock}: {error}")

print("\nData fetching finished!")

$LAMI.JK: possibly delisted; no price data found  (period=1y)

Data successfully fetched for:
BBRI.JK, BMRI.JK, BBCA.JK, BBNI.JK, BJBR.JK, BJTM.JK, AGRO.JK, BBKP.JK, BDMN.JK, NISP.JK, PNBN.JK, BCIC.JK, MAYA.JK, ARTO.JK, BTPS.JK, AMAR.JK, NOBU.JK, BINA.JK, BSIM.JK, BNGA.JK, BGTG.JK, BNLI.JK, BABP.JK, BBYB.JK, BEKS.JK, BMAS.JK, BOSS.JK, ADRO.JK, ITMG.JK, PTBA.JK, MEDC.JK, PGAS.JK, ELSA.JK, AKRA.JK, HRUM.JK, INDY.JK, MBAP.JK, BIPI.JK, DOID.JK, ENRG.JK, RAJA.JK, GEMS.JK, TPIA.JK, BRPT.JK, ESSA.JK, PSAB.JK, SMMT.JK, BREN.JK, TOBA.JK, CNKO.JK, MARI.JK, BUMI.JK, SUGI.JK, DWGL.JK, UNVR.JK, ICBP.JK, INDF.JK, MYOR.JK, SIDO.JK, KAEF.JK, PEHA.JK, KLBF.JK, GOOD.JK, DMND.JK, KINO.JK, ALTO.JK, AISA.JK, HOKI.JK, CLEO.JK, ULTJ.JK, ADES.JK, ROTI.JK, FOOD.JK, CMRY.JK, SAPX.JK, STTP.JK, MRAT.JK, FITT.JK, NAYZ.JK, TSPC.JK, CBMF.JK, TLKM.JK, EXCL.JK, ISAT.JK, FREN.JK, TOWR.JK, SUPR.JK, JSMR.JK, CMNP.JK, WIKA.JK, PTPP.JK, ADHI.JK, WSKT.JK, WTON.JK, KRAS.JK, SMGR.JK, INTP.JK, LPKR.JK, PWON.JK, NRCA.JK, TOTL.JK,

In [3]:
import pandas as pd
# Preview the stock info for BBRI (Bank Rakyat Indonesia)

stock_data["BBRI.JK"]["info"]

{'address1': 'Gedung BRI',
 'address2': 'Jalan Jenderal Sudirman Kav.44-46 Tromol Pos 1094/1000',
 'city': 'Jakarta',
 'zip': '10210',
 'country': 'Indonesia',
 'phone': '62 21 251 0244',
 'fax': '62 21 250 0077',
 'website': 'https://bri.co.id',
 'industry': 'Banks - Regional',
 'industryKey': 'banks-regional',
 'industryDisp': 'Banks - Regional',
 'sector': 'Financial Services',
 'sectorKey': 'financial-services',
 'sectorDisp': 'Financial Services',
 'longBusinessSummary': 'PT Bank Rakyat Indonesia (Persero) Tbk provides various banking products and services in Indonesia and internationally. The company offers savings and current accounts; time, foreign currency time, and on call deposits; mortgage, working capital, investment, franchise, and cash collateral loans, as well as supply and value chain financing and bank guarantees; and micro, small and medium, and program loans. It also provides bill payment, deposit, online transaction, remittance, money transfer, business, financial,

In [4]:
# Preview the historical data for BBRI (Bank Rakyat Indonesia)

history_df = pd.DataFrame(stock_data["BBRI.JK"]["history"])
history_df.head()

|  | Date | Open | High | Low | Close | Volume | Dividends | Stock Splits |
|---|---|---|---|---|---|---|---|---|
| 0 | 2024-03-06 00:00:00+07:00 | 5687.143626 | 5780.375488 | 5687.143626 | 5780.375488 | 84108800 | 0.0 | 0.0 |
| 1 | 2024-03-07 00:00:00+07:00 | 5780.375628 | 5850.299526 | 5780.375628 | 5803.683594 | 117724700 | 0.0 | 0.0 |
| 2 | 2024-03-08 00:00:00+07:00 | 5850.299249 | 5966.839075 | 5826.991284 | 5920.223145 | 163060000 | 0.0 | 0.0 |
| 3 | 2024-03-13 00:00:00+07:00 | 5920.223423 | 6013.455288 | 5920.223423 | 5966.839355 | 195173100 | 0.0 | 0.0 |
| 4 | 2024-03-14 00:00:00+07:00 | 5976.517717 | 6000.714145 | 5855.535577 | 5952.321289 | 271254000 | 235.0 | 0.0 |


In [5]:
# Preview the stock info for BMRI (Bank Mandiri)
stock_data["BMRI.JK"]["info"]

{'address1': 'Jalan Jenderal Gatot Subroto Kav. 36-38',
 'city': 'Jakarta',
 'zip': '12190',
 'country': 'Indonesia',
 'phone': '62 21 5299 7777',
 'fax': '62 21 5299 7735',
 'website': 'https://bankmandiri.co.id',
 'industry': 'Banks - Regional',
 'industryKey': 'banks-regional',
 'industryDisp': 'Banks - Regional',
 'sector': 'Financial Services',
 'sectorKey': 'financial-services',
 'sectorDisp': 'Financial Services',
 'longBusinessSummary': 'PT Bank Mandiri (Persero) Tbk provides various banking products and services to individuals and businesses in Indonesia, Singapore, Hong Kong, Timor Leste, Shanghai, Malaysia, England, and the Cayman Islands. It offers savings and current accounts, multicurrency, payroll, NOW, and foreign currency savings accounts; personal, mortgage, micro enterprise, small and medium enterprises, working capital, and investment loans; corporate, credit, and debit cards; digital banking; and e-banking services. The company also provides life, health, and accid

In [6]:
# Preview the historical data for BMRI (Bank Mandiri)
history_df = pd.DataFrame(stock_data["BMRI.JK"]["history"])
history_df.head()


### Saving the DataFrame to a JSON File

In [7]:
import json

# The history was already converted to a list of records during ingestion;
# convert it here only if it is still a DataFrame
for stock in stock_data:
    history = stock_data[stock]["history"]
    if isinstance(history, pd.DataFrame):
        stock_data[stock]["history"] = history.reset_index().to_dict(orient="records")

# Save the data to a JSON file
with open("stock_yfinance.json", "w", encoding="utf-8") as json_file:
    json.dump(stock_data, json_file, ensure_ascii=False, indent=4)

print("\nData has been saved to stock_yfinance.json")


Data has been saved to stock_yfinance.json


<hr>

In [8]:
import json
import pandas as pd

# List that stores each issuer as one element of an array of objects
stock_list = []

for stock, data in stock_data.items():
    stock_entry = {
        "info": data["info"],
        "history": data["history"] if isinstance(data["history"], list) 
                  else data["history"].reset_index().to_dict(orient="records")
    }
    stock_list.append(stock_entry)

# Save to a JSON file
with open("stock_yfinance.json", "w", encoding="utf-8") as json_file:
    json.dump(stock_list, json_file, ensure_ascii=False, indent=4)

print("\nData has been saved to stock_yfinance.json as an array of objects.")


Data has been saved to stock_yfinance.json as an array of objects.


## Storing the Ingested Data in MongoDB


Storage Workflow
1. Connect to MongoDB
   - Use the `pymongo` library to connect Python to MongoDB.
   - Choose the database (`big_data_platform`) and the collection (`stocks`) where the data will be stored.
2. Fetch the Data from yfinance
   - Use `yfinance.Ticker()` to get the stock information (`info`) and the historical data (`history`).
   - Convert the historical data from a DataFrame into a list of dictionaries so that it is compatible with MongoDB.
3. Transform the Data
   - The stored document contains symbol, info, and history.
   - Make sure every stock has the same data structure to simplify analysis.
4. Store the Data in MongoDB
   - Insert the data into the MongoDB collection with `insert_one()`.
   - When there are many documents, use `insert_many()` for efficiency.
5. Error Handling
   - Catch errors that may occur while fetching or storing the data.
   - Record the stocks that failed so that they can be retried or their cause investigated.
6. Verify the Storage
   - Use a MongoDB command such as `db.stocks.findOne()` to confirm that the data was stored correctly.
   - Inspect the data structure with MongoDB Compass or the Mongo Shell.
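
The storage steps above could look roughly like the following sketch. It is not the notebook's own implementation: the helper names (`build_documents`, `store_documents`, `get_collection`) and the connection URI `mongodb://localhost:27017` are assumptions made for illustration. It inserts documents one at a time with `insert_one()` so that each failure can be recorded per symbol; `insert_many()` would be the bulk alternative mentioned in step 4.

```python
def build_documents(stock_data):
    """Turn {symbol: {"info": ..., "history": [...]}} into MongoDB documents."""
    documents = []
    for symbol, data in stock_data.items():
        documents.append({
            "_id": symbol,            # the ticker symbol doubles as the unique key
            "symbol": symbol,
            "info": data["info"],
            "history": data["history"],
        })
    return documents


def store_documents(collection, documents):
    """Insert documents one by one and record which symbols failed."""
    stored, failed = [], {}
    for doc in documents:
        try:
            collection.insert_one(doc)
            stored.append(doc["_id"])
        except Exception as exc:      # e.g. a duplicate _id or a connection error
            failed[doc["_id"]] = str(exc)
    return stored, failed


def get_collection(uri="mongodb://localhost:27017"):
    """Return the `stocks` collection of the `big_data_platform` database.

    The URI is an assumed local default; adjust it to the actual deployment.
    """
    from pymongo import MongoClient   # imported lazily so the helpers above work without pymongo
    return MongoClient(uri)["big_data_platform"]["stocks"]


# Example usage (requires a running MongoDB instance and the stock_data dictionary):
# collection = get_collection()
# stored, failed = store_documents(collection, build_documents(stock_data))
# print(collection.count_documents({}))
# print(collection.find_one({"_id": "BBRI.JK"}, {"history": 0}))
```

Because `_id` is the ticker symbol, running the insert a second time reports every symbol as failed with a duplicate-key error rather than storing it twice.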
