# 🚀 GSoC 2025: Metadata for Atomic Data in Carsus

This notebook demonstrates the first objective of the GSoC 2025 project proposal for the Carsus project:  
**Adding metadata to Carsus atomic data outputs.**

We simulate a Carsus-like `levels` table, attach metadata including units, git commit, DOI, and citation info, and export everything into a structured HDF5 file.


In [7]:
# 📦 Install core dependencies (Colab)
!pip install git+https://github.com/tardis-sn/carsus.git
!pip install gitpython uncertainties

Collecting git+https://github.com/tardis-sn/carsus.git
  Cloning https://github.com/tardis-sn/carsus.git to /tmp/pip-req-build-ymixfldd
  Running command git clone --filter=blob:none --quiet https://github.com/tardis-sn/carsus.git /tmp/pip-req-build-ymixfldd
  Resolved https://github.com/tardis-sn/carsus.git to commit cb83ac00f6491e95a328388376536785461d2a1a
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: carsus
  Building wheel for carsus (pyproject.toml) ... done
  Created wheel for carsus: filename=carsus-2024.12.24.dev6+gcb83ac0-py3-none-any.whl size=110970 sha256=8e757ca0900fc8d89037438734f59e0edec2bf4a3d37132cdbe5b1ce1485e9f3
  Stored in directory: /tmp/pip-ephem-wheel-cache-3r_hadnt/wheels/a8/54/5d/6802a102260901271b6ad0a47abccd52225ba9b44b8bfa8588
Successfully built carsus
Installing collected packages: ca

In [8]:
import pandas as pd
import subprocess
from datetime import datetime
import os
from pathlib import Path

In [2]:
# Simulate a Carsus-like levels DataFrame
levels_df = pd.DataFrame({
    "atomic_number": [1, 1, 2, 2],
    "ion_charge": [0, 0, 1, 1],
    "level_index": [0, 1, 0, 1],
    "energy": [0.0, 10.2, 0.0, 20.6],
    "j": [2, 8, 1, 3],
    "label": ["1s", "2s", "1s", "2s"],
    "method": ["meas"]*4,
    "priority": [10]*4
})
levels_df["reference"] = "Kurucz GFALL"
levels_df.head()


Unnamed: 0,atomic_number,ion_charge,level_index,energy,j,label,method,priority,reference
0,1,0,0,0.0,2,1s,meas,10,Kurucz GFALL
1,1,0,1,10.2,8,2s,meas,10,Kurucz GFALL
2,2,1,0,0.0,1,1s,meas,10,Kurucz GFALL
3,2,1,1,20.6,3,2s,meas,10,Kurucz GFALL
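
The metadata built later in this notebook records a single unit string (`eV`) for the whole table. A per-column mapping may be easier to extend as more quantities are added. The sketch below is one possible shape for that, not part of Carsus: the helper `build_column_units` and the empty-string default for columns without a recorded unit are assumptions made for illustration; only the `eV` unit for `energy` comes from this notebook.

```python
# Column names of the simulated levels table in this notebook.
LEVELS_COLUMNS = [
    "atomic_number", "ion_charge", "level_index", "energy",
    "j", "label", "method", "priority", "reference",
]

# Only the energy unit is stated in the notebook; other columns are left unset.
COLUMN_UNITS = {"energy": "eV"}


def build_column_units(columns, units, default=""):
    """Return one {"column", "unit"} row per column (hypothetical helper).

    Columns without an entry in `units` get `default` (an empty string,
    meaning "no unit recorded"). Units given for columns that do not
    exist raise a KeyError so typos are caught early.
    """
    unknown = sorted(set(units) - set(columns))
    if unknown:
        raise KeyError(f"units given for unknown columns: {unknown}")
    return [{"column": name, "unit": units.get(name, default)} for name in columns]


rows = build_column_units(LEVELS_COLUMNS, COLUMN_UNITS)
for row in rows:
    print(row)
```

The resulting rows can be passed to `pd.DataFrame(rows)` and stored in the HDF5 file alongside the other tables.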


In [3]:
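# Helper to record the git commit of the current working directory
def get_git_commit():
    try:
        return subprocess.check_output(
            ["git", "rev-parse", "HEAD"], stderr=subprocess.DEVNULL
        ).decode().strip()
    except (subprocess.CalledProcessError, OSError):
        # Not a git checkout, or git is not installed
        return "unknown"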

metadata_df = pd.DataFrame({
    "data_source": ["https://doi.org/10.1086/313149"],
    "units": ["eV"],
    "generated_on": [datetime.now().isoformat()],
    "git_commit": [get_git_commit()],
    "notes": ["Energy levels for H and He from Kurucz GFALL"]
})
metadata_df


Unnamed: 0,data_source,units,generated_on,git_commit,notes
0,https://doi.org/10.1086/313149,eV,2025-03-24T23:48:39.326032,unknown,Energy levels for H and He from Kurucz GFALL
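
The `git_commit` field comes back as `unknown` in this run, presumably because the notebook's working directory is not a git checkout (Carsus was installed with pip). In that situation the installed package version can serve as a fallback provenance field; the pip log above shows a version string, `2024.12.24.dev6+gcb83ac0`, that embeds the short commit hash. The following is a minimal sketch using the standard library's `importlib.metadata`. The helper names `get_package_version` and `build_provenance` and the field names in the returned dict are hypothetical, not part of Carsus.

```python
from datetime import datetime, timezone
from importlib.metadata import PackageNotFoundError, version


def get_package_version(package_name):
    """Return the installed version of a package, or "unknown" if it is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return "unknown"


def build_provenance(package_name="carsus"):
    """Collect provenance fields that do not depend on a git checkout."""
    return {
        "package": package_name,
        "package_version": get_package_version(package_name),
        # Timezone-aware UTC timestamp, unambiguous across machines.
        "generated_on": datetime.now(timezone.utc).isoformat(),
    }


provenance = build_provenance()
print(provenance)
```

Using a UTC timestamp instead of the naive `datetime.now()` is a design choice of this sketch; it avoids ambiguity when files are produced on machines in different time zones.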


In [4]:
# Citation table for A_ij and Υ_ij
citation_df = pd.DataFrame({
    "Ref. A_ij": [
        "Bautista et al. (2015)",
        "Quinet (1996)",
        "Storey et al. (2016)",
        "Cassidy et al. (2016)",
        "Fivet et al. (2016)"
    ],
    "Ref. Υ_ij": [
        "Bautista et al. (2015)",
        "Zhang (1996)",
        "Storey et al. (2016)",
        "Cassidy et al. (2010)",
        "Watts & Burke (1998)"
    ]
})
citation_df


Unnamed: 0,Ref. A_ij,Ref. Υ_ij
0,Bautista et al. (2015),Bautista et al. (2015)
1,Quinet (1996),Zhang (1996)
2,Storey et al. (2016),Storey et al. (2016)
3,Cassidy et al. (2016),Cassidy et al. (2010)
4,Fivet et al. (2016),Watts & Burke (1998)


In [5]:
# Save levels, metadata, and citations to HDF5
output_path = "carsus_with_metadata.h5"

with pd.HDFStore(output_path) as store:
    store.put("levels", levels_df)
    store.put("levels_metadata", metadata_df)
    store.put("levels_citations", citation_df)

print(f"✅ Data saved to {output_path}")


✅ Data saved to carsus_with_metadata.h5


In [6]:
# Load and verify contents
with pd.HDFStore(output_path) as store:
    print("Available datasets:")
    print(store.keys())
    print("\nMetadata preview:")
    display(store["levels_metadata"])


Available datasets:
['/levels', '/levels_citations', '/levels_metadata']

Metadata preview:


Unnamed: 0,data_source,units,generated_on,git_commit,notes
0,https://doi.org/10.1086/313149,eV,2025-03-24T23:48:39.326032,unknown,Energy levels for H and He from Kurucz GFALL
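
Printing the reloaded metadata confirms that it round-trips, but it does not flag incomplete values such as the `unknown` commit. A small check along the following lines could do that. This is a sketch under assumptions: the function `find_metadata_problems`, the required field list, and the set of placeholder values are illustrative choices, not an existing Carsus API. The sample record copies the metadata row shown above.

```python
# Fields the metadata table in this notebook contains.
REQUIRED_FIELDS = ("data_source", "units", "generated_on", "git_commit", "notes")
# Values treated as "not really filled in" (an assumption of this sketch).
PLACEHOLDERS = {"", "unknown"}


def find_metadata_problems(record):
    """Return a list of human-readable problems found in a metadata record."""
    problems = []
    for field in REQUIRED_FIELDS:
        if field not in record:
            problems.append(f"missing field: {field}")
        elif str(record[field]).strip().lower() in PLACEHOLDERS:
            problems.append(f"placeholder value in field: {field}")
    return problems


# Same values as the metadata row stored in the HDF5 file.
record = {
    "data_source": "https://doi.org/10.1086/313149",
    "units": "eV",
    "generated_on": "2025-03-24T23:48:39.326032",
    "git_commit": "unknown",
    "notes": "Energy levels for H and He from Kurucz GFALL",
}

problems = find_metadata_problems(record)
print(problems)  # ['placeholder value in field: git_commit']
```

In the notebook, the record could be obtained from the reloaded table with `store["levels_metadata"].iloc[0].to_dict()`.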


## ✅ Summary

This notebook demonstrates:
- A Carsus-style atomic `levels` table
- Embedded metadata with source, units, timestamp, git commit
- Citation references (A<sub>ij</sub>, Υ<sub>ij</sub>)
- Exported HDF5 file with all content included

This prototypes the **first objective** of the proposal, adding metadata to Carsus atomic data outputs, on a simulated `levels` table.
