# Black Hole Evolution Dataset (TNG100)

This notebook builds a dataset for modeling the evolution of black holes using the IllustrisTNG TNG100-1 simulation (snapshots 18–33). The dataset will be used to train an LSTM to predict future black hole properties from their past evolution.
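A common way to feed such histories to an LSTM is a sliding window: each input is a run of consecutive snapshots, and the target is the snapshot that follows it. The sketch below assumes each black hole's history is a NumPy array of shape `(n_snapshots, n_features)` ordered oldest to newest. The helper `make_windows`, the window length, and the two features are hypothetical choices for illustration, not part of the pipeline built in this notebook.

```python
import numpy as np

def make_windows(history: np.ndarray, window: int):
    """Split one black hole's time series into (past window, next step) pairs.

    history: shape (n_snapshots, n_features), ordered oldest -> newest.
    Returns X with shape (n_windows, window, n_features) and y with shape
    (n_windows, n_features); y[k] is the snapshot right after window X[k].
    """
    n_snaps = history.shape[0]
    if window >= n_snaps:
        raise ValueError("window must be shorter than the history")
    X = np.stack([history[k:k + window] for k in range(n_snaps - window)])
    y = history[window:]
    return X, y

# Toy history: 16 snapshots (18..33) with 2 hypothetical features,
# e.g. log10 BH mass and log10 stellar mass.
rng = np.random.default_rng(42)
history = rng.normal(size=(16, 2))
X, y = make_windows(history, window=5)
print(X.shape, y.shape)  # (11, 5, 2) (11, 2)
```

A window of 5 leaves 11 training pairs per black hole over 16 snapshots; longer windows give the model more context but fewer pairs.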



## 1. Environment Setup
---
Import core libraries and configure reproducibility settings.


In [4]:
import requests
import numpy as np
import torch
import random

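# Seed every RNG used in this notebook so that random sampling is reproducible
# (random.seed alone does not affect NumPy or PyTorch)
random.seed(42)
np.random.seed(42)
torch.manual_seed(42)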

print(f"NumPy version: {np.__version__}")
print(f"PyTorch version: {torch.__version__}")


NumPy version: 1.24.3
PyTorch version: 2.0.1+cpu


## 2. Data Access & Preprocessing
---


#### 2.1 Load Subhalo Catalog (Snapshot 33)
---
We begin by selecting black-hole-hosting subhalos (those with `SubhaloBHMass > 0`) at snapshot 33 (z ≈ 2), the latest snapshot in the sequence.

In [13]:
import illustris_python as il

basePath = "/home/tnguser/sims.TNG/TNG100-1/output" # Adjust to your environment

subhalos = il.groupcat.loadSubhalos(
    basePath, 
    33, 
    fields=['SubhaloBHMass', 'SubhaloMassType']
)

bh_mass = subhalos['SubhaloBHMass']
stellar_mass = subhalos['SubhaloMassType'][:, 4]  # Type 4 = stellar component

bh_mask = bh_mass > 0
print(f"Total subhalos with black holes: {bh_mask.sum()}")


Total subhalos with black holes: 29415
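Both `SubhaloBHMass` and `SubhaloMassType` are stored in TNG code units of 10^10 M☉/h, with h = 0.6774. If these masses are later used as LSTM features, converting them to log10 solar masses is a common choice. The sketch below assumes that choice; the helper `code_mass_to_log_msun` and the example values are hypothetical. Subhalos without a black hole (mass 0) map to `-inf`, so they still need the `> 0` mask used above.

```python
import numpy as np

HUBBLE_H = 0.6774  # h used by IllustrisTNG (Planck 2015 cosmology)

def code_mass_to_log_msun(mass_code) -> np.ndarray:
    """Convert TNG catalog masses (10^10 Msun/h) to log10(M / Msun).

    Zero masses (no black hole) become -inf instead of raising a warning.
    """
    mass_msun = np.asarray(mass_code, dtype=np.float64) * 1e10 / HUBBLE_H
    with np.errstate(divide="ignore"):
        return np.log10(mass_msun)

# Hypothetical catalog values in code units
bh_mass_code = np.array([0.0, 1e-4, 0.06774])
log_mbh = code_mass_to_log_msun(bh_mass_code)
has_bh = bh_mass_code > 0  # same selection as bh_mask above
print(has_bh, log_mbh)
```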


#### 2.2 Trace Black Holes with Complete Histories
---
Follow the SubLink main progenitor branch of every black-hole-hosting subhalo backward from snapshot 33 through snapshots 32 → 18. Only subhalos whose branch has an entry at every one of these 15 snapshots are kept, so that all sequences share the same length and time sampling.
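The completeness filter can be isolated as a small pure function and checked on toy input. This is a sketch: `complete_history` is a hypothetical helper, and the toy branch mimics the newest-first order in which `il.sublink.loadTree(..., onlyMPB=True)` returns a main progenitor branch (the root subhalo first, then its progenitors).

```python
import numpy as np

SNAP_MIN, SNAP_MAX = 18, 32  # inclusive snapshot range required by the filter

def complete_history(snap_nums, ids, snap_min=SNAP_MIN, snap_max=SNAP_MAX):
    """Return IDs ordered oldest -> newest if every snapshot in
    [snap_min, snap_max] is present on the branch, else None."""
    snap_nums = np.asarray(snap_nums)
    ids = np.asarray(ids)
    in_range = (snap_nums >= snap_min) & (snap_nums <= snap_max)
    snaps = snap_nums[in_range]
    required = np.arange(snap_min, snap_max + 1)
    if not np.isin(required, snaps).all():
        return None  # at least one snapshot is missing (e.g. a skipped link)
    order = np.argsort(snaps)
    return ids[in_range][order]

# Toy branch stored newest-first: snapshots 33, 32, ..., 18
snaps_full = np.arange(33, 17, -1)
ids_full = np.arange(100, 100 + snaps_full.size)
print(complete_history(snaps_full, ids_full)[:3])  # [115 114 113]

# Drop snapshot 25 to mimic a branch that skips a snapshot
keep = snaps_full != 25
print(complete_history(snaps_full[keep], ids_full[keep]))  # None
```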

In [None]:
# Trace the main progenitor branch (MPB) of each BH-hosting subhalo with SubLink.
# 'SubfindID' is the subhalo's index in the group catalog of its own snapshot;
# 'SubhaloID' is a tree-internal ID and cannot index the group catalogs.
full_histories = {}
required_snaps = set(range(18, 33))  # snapshots 18..32 inclusive

for sub_id in np.flatnonzero(bh_mask):
    tree = il.sublink.loadTree(
        basePath,
        33,
        int(sub_id),
        fields=['SubfindID', 'SnapNum'],
        onlyMPB=True
    )
    if tree is None:  # this subhalo has no SubLink tree
        continue

    mask = (tree['SnapNum'] >= 18) & (tree['SnapNum'] <= 32)
    snaps = tree['SnapNum'][mask]

    # Keep only if all snapshots 32 → 18 are present on the MPB
    if required_snaps.issubset(snaps.tolist()):
        sorted_idx = np.argsort(snaps)  # oldest -> newest
        full_histories[int(sub_id)] = tree['SubfindID'][mask][sorted_idx]

print(f"Black holes with complete histories: {len(full_histories)}")
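Because the filter guarantees exactly 15 entries (snapshots 18–32) per kept history, `full_histories` can be packed into a rectangular array whose column `t` holds every traced black hole's ID at one snapshot. A minimal sketch, with a hypothetical helper `stack_histories` and a toy dictionary standing in for the real one:

```python
import numpy as np

def stack_histories(histories: dict):
    """Turn {root_subhalo_id: ids_oldest_to_newest} into arrays.

    Returns (root_ids, id_matrix) where id_matrix[k, t] is the ID at the
    t-th snapshot (t = 0 -> snapshot 18) of the k-th root subhalo.
    Assumes every value has the same length, as the filter enforces.
    """
    if not histories:
        raise ValueError("no histories to stack")
    keys = sorted(histories)
    root_ids = np.array(keys, dtype=np.int64)
    id_matrix = np.stack([np.asarray(histories[k]) for k in keys])
    return root_ids, id_matrix

# Hypothetical two-entry dictionary with 15 snapshots (18..32) each
demo = {7: np.arange(15), 3: np.arange(15) + 100}
roots, ids = stack_histories(demo)
print(roots, ids.shape)  # [3 7] (2, 15)
```

Sorting the keys makes the row order deterministic, so row `k` of any per-snapshot feature array built from `id_matrix` refers to the same black hole as `root_ids[k]`.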
