# Ariel Data Challenge 2024: Introductory model: inference

In this notebook, we compute test predictions using the model saved in [ADC24 Intro training](https://www.kaggle.com/code/ambrosm/adc24-intro-training).

<img width="700" src="https://www.ariel-datachallenge.space/static/images/transit_situation.png" />

This image is taken from [last year's competition](https://www.ariel-datachallenge.space/ML/documentation/about). It shows how a planet transits in front of its star and how this transit maps to the lightcurve (a dip in the brightness of the star). To a first approximation, the relative depth of this dip equals the ratio of the disk areas of the planet and the star. It is this ratio (the "transit depth") that we model in the present notebook.
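
As a rough illustration of the area ratio, the sketch below computes the transit depth for a Jupiter-sized planet in front of a Sun-sized star. The radii are textbook values chosen for illustration, not competition data, and the calculation ignores effects such as limb darkening.

```python
# Illustrative values only (not competition data): Jupiter transiting the Sun.
r_planet_km = 71_492.0   # Jupiter's equatorial radius
r_star_km = 695_700.0    # nominal solar radius

# Ratio of disk areas: (pi * Rp^2) / (pi * Rs^2) = (Rp / Rs)^2
transit_depth = (r_planet_km / r_star_km) ** 2
print(f"transit depth = {transit_depth:.4f}")
```

The result is on the order of 1 % of the star's brightness, which is why the targets and prediction errors in this competition are such small numbers.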

The present notebook is simple:
- It reads the pre- and postprocessing code, which is the same as the code used for training.
- It reads the test data.
- It reads the saved model.
- It executes the prediction pipeline and saves the submission file.

The real work was done in the [training notebook](https://www.kaggle.com/code/ambrosm/adc24-intro-training)!

In [None]:
import pandas as pd
import polars as pl
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats
from tqdm import tqdm
import pickle

from sklearn.linear_model import Ridge


In [None]:
directory = "/kaggle/input/adc24-intro-training/"

# Execute the helper scripts saved by the training notebook so that
# their functions are defined here; the files are closed after reading.
for script in ['f_read_and_preprocess.py', 'a_read_and_preprocess.py',
               'feature_engineering.py', 'postprocessing.py']:
    with open(directory + script, 'r') as f:
        exec(f.read())


People have been asking how to choose a good value for `sigma_pred`. As explained in [Understanding the competition metric](https://www.kaggle.com/competitions/ariel-data-challenge-2024/discussion/528114), `sigma_pred` indicates the root mean squared error (RMSE) that we expect for our test predictions.
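
To see why the expected RMSE is a sensible choice, assume that the metric rewards the Gaussian log-likelihood of the true values given the predictions and `sigma_pred` (see the linked discussion for the exact definition). The sketch below uses synthetic errors, not competition data, and the names `errors` and `candidate_sigmas` are made up for this illustration. It scans a grid of candidate sigmas and finds the one with the highest mean log-likelihood.

```python
import numpy as np
import scipy.stats

# Synthetic prediction errors (y_true - y_pred); the scale is an assumption.
rng = np.random.default_rng(0)
errors = rng.normal(0.0, 0.0003, size=10_000)
rmse = np.sqrt(np.mean(errors ** 2))

# Mean Gaussian log-likelihood of the errors for a grid of candidate sigmas.
candidate_sigmas = np.linspace(0.5, 2.0, 151) * rmse
mean_loglik = np.array([
    scipy.stats.norm.logpdf(errors, loc=0.0, scale=s).mean()
    for s in candidate_sigmas
])

best_sigma = candidate_sigmas[np.argmax(mean_loglik)]
print(f"rmse = {rmse:.6f}, best sigma = {best_sigma:.6f}")
```

Under this assumption the best sigma on the grid coincides with the RMSE: a smaller sigma is penalized for overconfidence, a larger one for being needlessly vague.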

The training data cover planets of only two stars (stars 0 and 1), but the test data include planets of other stars.

This leads to the following recipe:
- For known stars (stars 0 and 1), we expect the test RMSE to be equal to our cross-validation RMSE, so we set `sigma_pred` to the out-of-fold RMSE of our model (0.000293 as shown in the training notebook).
- For unknown stars, we expect the prediction error to be higher because the model has never seen them. We thus set `sigma_pred` to a higher value (0.001 in this notebook).
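
The recipe can be sketched on a toy table as follows. The three planets and the name `toy_info` are hypothetical; the two sigma values and the 283 wavelengths are the ones used in this notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical metadata: two planets of known stars, one of an unknown star.
toy_info = pd.DataFrame({'star': [0, 1, 2]},
                        index=pd.Index([10, 11, 12], name='planet_id'))

sigma_known = 0.000293    # out-of-fold RMSE from the training notebook
sigma_unknown = 0.001     # more cautious value for unseen stars
n_wavelengths = 283

# One sigma per planet, chosen by star id ...
per_planet = np.where(toy_info['star'].to_numpy() <= 1,
                      sigma_known, sigma_unknown)
# ... then repeated for every wavelength: shape (n_planets, n_wavelengths).
sigma_matrix = np.repeat(per_planet[:, None], n_wavelengths, axis=1)
print(sigma_matrix.shape)
```

Each row of `sigma_matrix` holds a single value, so every wavelength of a planet gets the same uncertainty.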

In [None]:
# Load the data
wavelengths = pd.read_csv('/kaggle/input/ariel-data-challenge-2024/wavelengths.csv')
test_adc_info = pd.read_csv('/kaggle/input/ariel-data-challenge-2024/test_adc_info.csv',
                           index_col='planet_id')
f_raw_test = f_read_and_preprocess('test', test_adc_info, test_adc_info.index)
a_raw_test = a_read_and_preprocess('test', test_adc_info, test_adc_info.index)
test = feature_engineering(f_raw_test, a_raw_test)

# Load the model
with open(directory + 'model.pickle', 'rb') as f:
    model = pickle.load(f)
with open(directory + 'sigma_pred.pickle', 'rb') as f:
    sigma_pred = pickle.load(f)
    
# Predict
test_pred = model.predict(test)

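# Package into submission file.
# sigma_pred: out-of-fold rmse for known stars (0 and 1), 0.001 for unknown
# stars, repeated across the 283 wavelengths.
sigma_per_planet = np.where(test_adc_info[['star']] <= 1, sigma_pred, 0.001)  # shape (n_planets, 1)
sub_df = postprocessing(test_pred,
                        test_adc_info.index,
                        sigma_pred=np.tile(sigma_per_planet, (1, 283)))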
display(sub_df)
sub_df.to_csv('submission.csv')
#!head submission.csv