# BSP Download
This notebook does the following:
1. Cell 1: Make cURL scripts ("a.txt" and "c.txt" for asteroids and comets, respectively) to download 1.3+ million BSP files from JPL Horizons.
    * The scripts are generated from SBDB query results.
    * The scripts (txt files) total about 300 MB (!)
2. Run the scripts in a terminal using `curl`.
    * This requests the files individually (1.3+ million requests) and takes ~1-2 weeks.
3. Cell 1 (again): Some downloads will have failed. Rerun the cell used in step 1.
    * This removes erroneous BSP files and generates a smaller script that downloads only the missing cases.
4. Cell 2: The curl-downloaded files are TXT files containing base64-encoded data. They must be converted to BSP format.
    * Running the cell (1) converts each TXT to BSP, saved in a different directory, and (2) zips the TXT for archiving (you may delete these zip files, or skip zipping if they are not needed).
    * Total ~ 5 hours on my laptop (MBP 14" [2021, macOS 13.6.4, M1Pro(6P+2E/G16c/N16c/32G)])


## Preparation

The original suggestion, for the case of (65803) Didymos:

``curl -s "https://ssd.jpl.nasa.gov/api/horizons.api?format=text&COMMAND='65803%3B'&EPHEM_TYPE=SPK&START_TIME='2025-01-01'&STOP_TIME='2028-01-01'&OBJ_DATA=YES" | awk '/REFGL1NQ/,0' | base64 --decode > a65803.bsp``
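For reference, the decoding half of that pipeline can be sketched in Python. This is a minimal sketch, not code from the notebook: `extract_bsp_bytes` is a hypothetical helper and the sample response is synthetic. It assumes that everything from the base64 marker `REFGL1NQ` (which decodes to `DAF/SP`, the start of the `DAF/SPK` identifier) to the end of the response is kernel data, which is what the `awk '/REFGL1NQ/,0'` filter relies on.

```python
import base64


def extract_bsp_bytes(response_text, marker="REFGL1NQ"):
    """Return the decoded SPK bytes found in a Horizons text response.

    Hypothetical helper: assumes everything from `marker` to the end of the
    text is base64 data (possibly wrapped over several lines).
    """
    _head, sep, tail = response_text.partition(marker)
    if not sep:
        raise ValueError("base64 marker not found in response")
    # With the default validate=False, b64decode skips newlines in the payload.
    return base64.b64decode(marker + tail)


# Synthetic response: one header line followed by a base64-encoded payload.
payload = b"DAF/SPK " + bytes(16)
sample = "header line (synthetic)\n" + base64.encodebytes(payload).decode("ascii")
decoded = extract_bsp_bytes(sample)
print(decoded[:7])
```

Raising an error when the marker is missing makes a failed query (for example, an error message returned instead of a kernel) easy to detect.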

In [3]:
# Cell 0
import base64
import zipfile
from pathlib import Path

import pandas as pd


def iterator(it):
    try:
        from tqdm import tqdm
        return tqdm(it)
    except ImportError:
        return it


scripts = Path("_query_scripts")
dir_bsp = dict(c=Path("spkbsp/c"), a=Path("spkbsp/a"))
dir_txt = dict(c=Path("_spktxts/c"), a=Path("_spktxts/a"))
scripts.mkdir(exist_ok=True)
for dic in (dir_bsp, dir_txt):
    for dd in dic.values():
        dd.mkdir(exist_ok=True, parents=True)

dtypes = {
    "spkid": int, "full_name": str, "soln_date": str, "condition_code": str,
}

dfs = dict(
    c=pd.read_parquet("sbdb_c_2024-08-02.parq", columns=list(dtypes)),
    a=pd.read_parquet("sbdb_a_2024-08-02.parq", columns=list(dtypes)),
)

## Shell Script (cURL) for Downloading

https://stackoverflow.com/questions/71244217/how-to-use-curl-z-parallel-effectively

``{a/c}_all.txt`` are the cURL scripts for all 1.3+M objects (kept only for record-keeping and debugging; you may delete them), while ``{a/c}.txt`` cover only the objects still to be downloaded (e.g., after a download stopped midway).

Run the cell below whenever a new download is needed (or when you want to check for broken files).
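For orientation, each object contributes one `url`/`output` pair to the file that `curl --config` reads. The sketch below reproduces the format written by Cell 1 for a single object; `config_entry` is a hypothetical name used only for illustration, and the ID is taken from the notebook's own output.

```python
def config_entry(spkid, name):
    """Return the two curl-config lines for one object (hypothetical helper)."""
    # Comet entries ("c") get the CAP suffix; %3B is a URL-encoded semicolon.
    suffix = "%3BCAP" if name.startswith("c") else ""
    url = (
        "https://ssd.jpl.nasa.gov/api/horizons.api?format=text"
        f"&COMMAND='DES={spkid}{suffix}%3B'&EPHEM_TYPE=SPK"
        "&START_TIME='2025-01-01'&STOP_TIME='2028-01-01'&OBJ_DATA=NO"
    )
    return f"url={url}\noutput=\"./_spktxts/{name}/spk{spkid}.txt\"\n"


print(config_entry(20101955, "a"), end="")
```

With two lines per object and 1.3+M objects, the full file runs to several million lines, which is consistent with the large script size mentioned at the top.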

In [4]:
# Cell 1
def write_script(spkids, name, fpath, skip_existing=True):
    if skip_existing and fpath.exists():
        return
    with open(fpath, "w") as ff:
        for spkid in spkids:
            _write_script(spkid, name, ff)


def _write_script(spkid, name, filehandle):
    suffix = "%3BCAP" if name.startswith("c") else ""
    filehandle.write(
        "url=https://ssd.jpl.nasa.gov/api/horizons.api?format=text&COMMAND='"
        + f"DES={spkid}{suffix}%3B'&EPHEM_TYPE=SPK&"
        + "START_TIME='2025-01-01'&STOP_TIME='2028-01-01'&OBJ_DATA=NO\n"
    )
    output = f"./_spktxts/{name}/spk{spkid}.txt"
    filehandle.write(f"output=\"{output}\"\n")


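def check_and_get_spkid(fpath):
    """Return the SPK-ID parsed from the file name, or None if the file was too small and was deleted."""
    if (size := fpath.stat().st_size) > 1000:  # > 1 kB
        return int(fpath.stem[3:])
    fpath.unlink()
    print(f"Deleted {fpath} because its size is {size/1000:.1f} kB <= 1 kB")
    return None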


for name, folder in dir_bsp.items():
    spkid_already_downloaded = {check_and_get_spkid(fpath) for fpath in folder.glob("spk*.[tzb][xis][tp]")}
    spkid_already_downloaded.discard(None)  # deleted (too small) files yield None; do not count them

    print(f"{name}: {len(spkid_already_downloaded)} already downloaded")

    fpath_script_all = scripts/f"{name}_all.txt"
    fpath_script_part = scripts/f"{name}.txt"
    spkid_all = dfs[name]["spkid"]

    if len(spkid_already_downloaded) == 0:  # Fresh download
        print(f"You need to download all {len(spkid_all)} {name} spk files")
        write_script(spkid_all, name, fpath_script_all, skip_existing=True)
        from shutil import copy
        copy(fpath_script_all, fpath_script_part)

    elif len(spkid_already_downloaded) == len(spkid_all):  # No need to download...
        print(f"All {len(spkid_all)} {name} already downloaded")
        continue

    else:  # len(spkid_already_downloaded) > 0
        spkid2download = set(spkid_all) - spkid_already_downloaded
        print(f"You need to download {len(spkid2download)} {name} spk files")
        write_script(spkid_all, name, fpath_script_all, skip_existing=True)
        write_script(spkid2download, name, fpath_script_part, skip_existing=False)

c: 0 already downloaded
You need to download all 3958 c spk files
a: 0 already downloaded
You need to download all 1386302 a spk files


After running the code above, if there are still ``spk`` files to download, run the following in a terminal:

    curl --parallel --parallel-immediate --parallel-max 2 --config _query_scripts/c.txt
    curl --parallel --parallel-immediate --parallel-max 2 --config _query_scripts/a.txt
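If you prefer to launch the downloads from Python rather than typing the commands, something like the following could work. It is only a sketch: `curl_command` and `run_downloads` are hypothetical helpers, the flags are the ones shown above, and `curl` is assumed to be on the `PATH`. By default it performs a dry run that prints the commands without executing them.

```python
import subprocess
from pathlib import Path


def curl_command(config_path, parallel_max=2):
    """Build the curl argument list shown above (hypothetical helper)."""
    return [
        "curl", "--parallel", "--parallel-immediate",
        "--parallel-max", str(parallel_max),
        "--config", str(config_path),
    ]


def run_downloads(names=("c", "a"), scripts=Path("_query_scripts"), dry_run=True):
    """Print (dry_run=True) or execute the curl command for each config file."""
    for name in names:
        cmd = curl_command(scripts / f"{name}.txt")
        if dry_run:
            print(" ".join(cmd))
        else:
            # curl's exit status is not checked; failed files are caught by rerunning Cell 1.
            subprocess.run(cmd, check=False)


run_downloads()  # dry run: only prints the two commands
```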

## Base64 decoding
Below, base64 decoding, zipping, and deleting took ~10 ms/file (~300 min for all 1.3+M files) on the MBP 14" [2021, macOS 13.1, M1Pro(6P+2E/G16c/N16c/32G)].

* **TIP**: You can run the next cell multiple times while ``curl`` is still downloading. (I did not make this code "monitor" the directory in real time...)
* **TIP**: After all downloaded `.txt` files are converted to `.bsp`, rerun Cell 1 and then `curl` in the terminal to re-download the erroneous files.
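A size check alone cannot tell whether a decoded file is a valid kernel. As a further sanity check, one could test that each `.bsp` file begins with the `DAF/SPK` identifier, which is what the base64 marker `REFGL1NQ` decodes into. A minimal sketch, where `looks_like_spk` and the temporary files are hypothetical and not part of the notebook:

```python
import tempfile
from pathlib import Path


def looks_like_spk(fpath):
    """Return True if the file starts with the bytes b"DAF/SPK"."""
    with open(fpath, "rb") as ff:
        return ff.read(7) == b"DAF/SPK"


# Demonstration on two synthetic files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "spk_good.bsp"
    bad = Path(tmp) / "spk_bad.bsp"
    good.write_bytes(b"DAF/SPK " + bytes(100))
    bad.write_bytes(b"not a kernel")
    results = {p.name: looks_like_spk(p) for p in (good, bad)}

print(results)
```

Only the first few bytes are read, so running this over all files in `spkbsp/` should be cheap compared with the decoding step.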

In [6]:
# Cell 2
def decompress_and_delete(fpath, delete=True):
    with zipfile.ZipFile(fpath, "r") as z:  # ~ 0.5ms / file
        z.extractall(fpath.parent)
    if delete:
        fpath.unlink()


def compress_and_delete(fpath, delete=True):  # << 1 ms to 10+ ms / file ???
    with zipfile.ZipFile(f"{fpath.parent}/{fpath.stem}.zip", "w", zipfile.ZIP_DEFLATED, compresslevel=9) as z:
        z.write(fpath, arcname=f"{fpath.stem}.txt")
    if delete:
        fpath.unlink()


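def save_b64decode(fpath, output, compress=True, delete=True):
    with open(fpath, "r") as ff:
        txt = ff.read()

    try:
        # The kernel starts at the base64 marker "REFGL1NQ" (it decodes to b"DAF/SP").
        data = base64.b64decode("REFGL1NQ" + txt.split("REFGL1NQ", 1)[1])
    except (IndexError, ValueError):
        # IndexError: marker missing; ValueError: binascii.Error, e.g. "Incorrect padding"
        print(f"Error with {fpath}")
        return

    # Write the output only after a successful decode, so no broken .bsp is left behind.
    with open(output, "wb") as ff:
        ff.write(data)
    if compress:
        compress_and_delete(fpath, delete=delete)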


for name, folder in dir_txt.items():
    # for fpath in folder.glob("*.zip"):
    #     try:
    #         decompress_and_delete(fpath, delete=True)
    #     except zipfile.BadZipFile:
    #         continue
    for fpath in folder.glob("*.txt"):
        save_b64decode(fpath, dir_bsp[name]/f"{fpath.stem}.bsp", compress=True, delete=True)

Error with _spktxts/c/spk1003228.txt
Error with _spktxts/c/spk1003776.txt
Error with _spktxts/a/spk20101955.txt
Error with _spktxts/a/spk20134340.txt
