https://www.kaggle.com/c/PLAsTiCC-2018/discussion/75131
I used pyenv virtualenv to set up the environment. I think catboost==0.10.4.1 is the important pin; the versions of the other libraries shouldn't affect the score much.
$ pyenv install 3.5.1
$ pyenv virtualenv 3.5.1 plasticc
$ pyenv activate plasticc
$ pip install --upgrade pip
$ pip install cython==0.27.3
$ pip install numpy==1.13.0
$ pip install PyYAML==3.12
$ pip install -r requirements.txt
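Since the catboost version is the one that matters, it may be worth verifying the pinned versions before running anything. A minimal sanity check, assuming the plasticc environment is active:

```python
# Quick sanity check that the pinned versions were installed.
import catboost
import numpy

print("catboost:", catboost.__version__)  # expected: 0.10.4.1
print("numpy:", numpy.__version__)        # expected: 1.13.0
```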
I used an n1-standard-64 instance on Google Compute Engine, which has 64 vCPUs and 240 GB of RAM.
OS/Platform : Ubuntu 16.04
I will upload prepare.zip for the host, which contains these directories.
- buckets: It contains nyanp's train & test features.
- data: It contains the kaggle datasets. You can also download them via `kaggle competitions download -c PLAsTiCC-2018`.
- features: It contains all of my train & test features.
- fi: It contains feature names and the number of rounds used for training (see the loading sketch after this list).
  - exp_*.npy: numpy array that contains feature names.
  - exp_*rounds.pkl: pickle object that contains the number of rounds.
  - whole_fn_s.npy: numpy array that contains all feature names.
  - mamas_feature_names_*.npy: the names of the features that yuval used.
- models: It contains the trained models.
  - exp*.cbm: trained catboost model.
- others: It contains class weights.
  - W.npy: numpy array that contains the class weights.
- sub: It contains submission files.
  - experiment57_59(th985)_61_62.csv: nyanp's averaged submission file.
  - pred*.csv: yuval's submission file.
- utils.py: It contains utility functions.
- preprocess_*.py: I did easy preprocessing here, like converting .csv files into .feather files.
- save_features_train_*.py: I saved train features here.
- save_features_test_*.py: I saved test features here.
- save_features_nyanp.py: I saved nyanp's train & test features here.
- train.py: I trained models here.
- predict.py: I made predictions here.
- postprocess.py: I did postprocessing here, like ensembling and class99 handling.
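For reference, here is a minimal sketch of how these prepared artifacts can be inspected. The concrete file names (exp_01.npy, exp_01rounds.pkl, exp1.cbm) are hypothetical instances of the exp_* patterns above, not guaranteed to exist verbatim:

```python
# Minimal sketch for inspecting the prepared artifacts. The concrete
# file names are hypothetical examples of the exp_* patterns above.
import pickle

import numpy as np
from catboost import CatBoostClassifier

feature_names = np.load("fi/exp_01.npy")      # feature names of one experiment
whole_names = np.load("fi/whole_fn_s.npy")    # all feature names

with open("fi/exp_01rounds.pkl", "rb") as f:  # number of boosting rounds
    n_rounds = pickle.load(f)

model = CatBoostClassifier()
model.load_model("models/exp1.cbm")           # a trained catboost model

class_weights = np.load("others/W.npy")       # class weights

print(len(feature_names), n_rounds, class_weights.shape)
```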
full version:
It will take a few months to run on a single machine (64 cores, 240 GB RAM), so I do not recommend running it.
cd mamas/
unzip prepare.zip
cp -r prepare/* .
rm features/*
rm models/*
cd ../scripts
python preprocess_01.py
python preprocess_02.py
python save_features_train_01.py
python save_features_train_02.py
python save_features_train_03.py
python save_features_train_04.py
python save_features_train_05.py
python save_features_train_06.py
python save_features_test_01.py
python save_features_test_02.py
python save_features_test_03.py
python save_features_test_04.py
python save_features_test_05.py
python save_features_test_06.py
python save_features_nyanp.py
python save_features_for_yuval.py
python train.py
python predict.py
python postprocess.py
Then, mamas/sub/host_sub.csv.gz will be generated.
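The preprocess_*.py steps above are described as easy preprocessing like converting .csv files into .feather files. A minimal sketch of that kind of conversion, assuming pandas with feather support (pyarrow) is installed; the input names follow the kaggle dataset, and the output paths are illustrative, not the scripts' exact layout:

```python
# Minimal sketch of a csv -> feather conversion, in the spirit of the
# preprocess_*.py scripts. Input names follow the kaggle dataset;
# output paths are illustrative.
import pandas as pd

for name in ["training_set", "training_set_metadata"]:
    df = pd.read_csv("data/{}.csv".format(name))
    df.to_feather("data/{}.feather".format(name))  # much faster to reload
```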
short version:
This version takes about 4 hours; it uses the already extracted features and trained models.
cd mamas/
unzip prepare.zip
cp -r prepare/* .
cd scripts
python preprocess_01.py
python preprocess_02.py
python predict.py
python postprocess.py
Then, mamas/sub/host_sub.csv.gz will be generated.
It should score 0.680 on the public LB and 0.700 on the private LB.
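Before submitting, it is easy to sanity-check the generated file. A minimal sketch, assuming the standard PLAsTiCC submission format (an object_id column plus one probability column per class, including class_99):

```python
# Sanity-check the generated submission, assuming the standard
# PLAsTiCC format: object_id plus one probability column per class.
import pandas as pd

sub = pd.read_csv("mamas/sub/host_sub.csv.gz")  # pandas reads .gz directly
prob_cols = [c for c in sub.columns if c != "object_id"]

print(sub.shape)                                 # one row per test object
row_sums = sub[prob_cols].sum(axis=1)
print(row_sums.min(), row_sums.max())            # each row should sum to ~1.0
```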
The repository also contains these directories:
- preds/: prediction files.
- curve/: linearly interpolated curve files, made with yuval's method (see the sketch after this list).
- fe_extract/: feature extraction library.
- notebook/: It contains .ipynb files.
- scripts/: It contains scripts.
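As a rough illustration of the idea behind the curve/ files (not yuval's exact implementation), a minimal sketch of linear light-curve interpolation with np.interp; the column names follow the kaggle data, and the grid size is a hypothetical choice:

```python
# Rough illustration of linear light-curve interpolation onto a regular
# time grid, in the spirit of the curve/ files. The grid size (256) is
# a hypothetical choice, not yuval's exact method.
import numpy as np
import pandas as pd

lc = pd.read_feather("data/training_set.feather")
obj = lc[lc["object_id"] == lc["object_id"].iloc[0]]
band = obj[obj["passband"] == 3].sort_values("mjd")  # one passband of one object

grid = np.linspace(band["mjd"].min(), band["mjd"].max(), 256)
flux_on_grid = np.interp(grid, band["mjd"].values, band["flux"].values)
```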