# Ariel Data Challenge 2024: Introductory model: inference

In this notebook, we compute test predictions using the model saved in [ADC24 Intro training](https://www.kaggle.com/code/ambrosm/adc24-intro-training).

<img width="700" src="https://www.ariel-datachallenge.space/static/images/transit_situation.png" />

This image is taken from [last year's competition](https://www.ariel-datachallenge.space/ML/documentation/about). It shows how a planet transits in front of its star and how the transit appears in the light curve as a dip in the star's brightness. The relative depth of this dip is approximately equal to the ratio of the projected disk areas of the planet and the star. It is this ratio (the "transit depth") that we model in this notebook.
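
As a rough numerical illustration of the transit depth: if limb darkening and similar effects are neglected, the depth is approximately the squared ratio of the planet and star radii. The sketch below uses a hypothetical helper `transit_depth` and approximate radii for Jupiter and the Sun; neither is part of this notebook's pipeline.

```python
def transit_depth(planet_radius, star_radius):
    """Approximate transit depth as the ratio of the projected disk areas.

    Simplifying assumption: a uniformly bright stellar disk (no limb darkening).
    Both radii must be given in the same unit.
    """
    if planet_radius < 0 or star_radius <= 0:
        raise ValueError("radii must be non-negative (planet) and positive (star)")
    return (planet_radius / star_radius) ** 2


# Approximate radii in kilometres, used for illustration only
R_JUPITER_KM = 71_492.0
R_SUN_KM = 695_700.0

depth = transit_depth(R_JUPITER_KM, R_SUN_KM)
print(f"Jupiter-sized planet, Sun-sized star: depth = {depth:.4f}")
```

Under this simplification, a Jupiter-sized planet in front of a Sun-sized star dims the star by roughly 1%.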

The present notebook is simple:
- It reads the pre- and postprocessing code, which is the same as the code used for training.
- It reads the test data.
- It reads the saved model.
- It executes the prediction pipeline and saves the submission file.

The real work was done in the [training notebook](https://www.kaggle.com/code/ambrosm/adc24-intro-training)!

In [1]:
import pandas as pd
import polars as pl
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats
from tqdm import tqdm
import pickle

from sklearn.linear_model import Ridge


In [2]:
directory = "/kaggle/input/adc24-intro-training/"

# Execute the helper scripts saved by the training notebook.
# Each script defines one function in the notebook's namespace.
for script in ['f_read_and_preprocess.py',
               'a_read_and_preprocess.py',
               'feature_engineering.py',
               'postprocessing.py']:
    with open(directory + script, 'r') as f:  # close the file handle after reading
        exec(f.read())
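
The cell above relies on `exec` defining functions from a script's source text. A minimal, self-contained sketch of that mechanism follows; the script contents, the file name `helper.py`, and the function `double` are hypothetical and stand in for the real helper scripts, which are not shown here.

```python
import os
import tempfile

# Hypothetical helper script, written to a temporary file for illustration only.
script_source = "def double(x):\n    return 2 * x\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    script_path = os.path.join(tmp_dir, "helper.py")
    with open(script_path, "w") as f:
        f.write(script_source)

    # Read the source and execute it in an explicit namespace dictionary,
    # so it is easy to see which names the script defines.
    namespace = {}
    with open(script_path, "r") as f:
        exec(f.read(), namespace)

# The function defined by the script is now available in the dictionary.
double = namespace["double"]
print(double(21))
```

Unlike this sketch, the notebook calls `exec` without a namespace argument, so the scripts run in the notebook's own namespace and can use whatever the notebook has already imported.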


In [3]:
# Load the data
wavelengths = pd.read_csv('/kaggle/input/ariel-data-challenge-2024/wavelengths.csv')
test_adc_info = pd.read_csv('/kaggle/input/ariel-data-challenge-2024/test_adc_info.csv',
                           index_col='planet_id')
sample_submission = pd.read_csv('/kaggle/input/ariel-data-challenge-2024/sample_submission.csv',
                                index_col='planet_id')
f_raw_test = f_read_and_preprocess('test', test_adc_info, sample_submission.index)
a_raw_test = a_read_and_preprocess('test', test_adc_info, sample_submission.index)
test = feature_engineering(f_raw_test, a_raw_test)

# Load the model
with open(directory + 'model.pickle', 'rb') as f:
    model = pickle.load(f)
with open(directory + 'sigma_pred.pickle', 'rb') as f:
    sigma_pred = pickle.load(f)
    
# Predict
test_pred = model.predict(test)

# Package into submission file
# Use the sigma_pred value loaded from the training notebook instead of a hard-coded copy
sub_df = postprocessing(test_pred, sample_submission.index, sigma_pred=sigma_pred)
display(sub_df)
sub_df.to_csv('submission.csv')
#!head submission.csv

100%|██████████| 1/1 [00:01<00:00,  1.31s/it]
100%|██████████| 1/1 [00:01<00:00,  1.57s/it]


| planet_id | wl_1 | wl_2 | wl_3 | wl_4 | wl_5 | wl_6 | wl_7 | wl_8 | wl_9 | wl_10 | ... | sigma_274 | sigma_275 | sigma_276 | sigma_277 | sigma_278 | sigma_279 | sigma_280 | sigma_281 | sigma_282 | sigma_283 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 499191466 | 0.002819 | 0.002847 | 0.002841 | 0.002836 | 0.002839 | 0.002831 | 0.00283 | 0.002838 | 0.002835 | 0.002831 | ... | 0.000293 | 0.000293 | 0.000293 | 0.000293 | 0.000293 | 0.000293 | 0.000293 | 0.000293 | 0.000293 | 0.000293 |
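
Before submitting, it can help to check that the file has the expected layout. The sketch below builds a frame of the same shape from made-up predictions and validates it. `make_submission` and `check_submission` are hypothetical helpers, not the notebook's `postprocessing` function, and the count of 283 wavelength columns is an assumption based on the last column shown (`sigma_283`).

```python
import numpy as np
import pandas as pd

N_WAVELENGTHS = 283  # assumption: wl_1..wl_283 and sigma_1..sigma_283


def make_submission(pred, planet_ids, sigma):
    """Hypothetical stand-in for postprocessing: one row per planet with the
    predicted depths (wl_*) followed by a constant uncertainty (sigma_*)."""
    pred = np.asarray(pred, dtype=float)
    wl_cols = [f"wl_{i}" for i in range(1, N_WAVELENGTHS + 1)]
    sigma_cols = [f"sigma_{i}" for i in range(1, N_WAVELENGTHS + 1)]
    wl = pd.DataFrame(pred, index=planet_ids, columns=wl_cols)
    sg = pd.DataFrame(np.full_like(pred, sigma), index=planet_ids, columns=sigma_cols)
    return pd.concat([wl, sg], axis=1)


def check_submission(sub):
    """Raise ValueError if the frame does not look like a valid submission."""
    if sub.shape[1] != 2 * N_WAVELENGTHS:
        raise ValueError(f"expected {2 * N_WAVELENGTHS} columns, got {sub.shape[1]}")
    if sub.isna().any().any():
        raise ValueError("submission contains missing values")
    if not (sub.filter(like="sigma_") > 0).all().all():
        raise ValueError("all sigma values must be positive")


# Made-up predictions for a single planet, for illustration only
planet_ids = pd.Index([499191466], name="planet_id")
fake_pred = np.full((1, N_WAVELENGTHS), 0.0028)
sub = make_submission(fake_pred, planet_ids, sigma=0.000293)
check_submission(sub)
print(sub.shape)
```

The same `check_submission` call could be applied to `sub_df` before it is written to `submission.csv`, provided the real column layout matches the assumption above.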
