tswsxk/spark_learning

Beginner Tutorial

Preparation

  1. Install the dependencies and initialize the package:
make install

If you have multiple Python versions installed, specify which pip to use, for example:

make install ENVPIP=pip3.9

Alternatively, you can install the package with pip directly:

pip install -e . --config-settings editable_mode=compat

(Optional) To include the spider dependencies, use the following command:

pip install -e .[spider] --config-settings editable_mode=compat
  2. Run the demo script to check that everything is set up:
cd scripts/beginner
python beginner.py

If you see *** Spark *** in the terminal, everything is working.

Then run the notebook eda.ipynb in scripts/EDA.

NOTICE: Download the data and place it in the data directory before you run the scripts:

data/
├── test_X.xlsx
├── test_y.xlsx
└── train.xlsx
  3. The following steps are optional:
  • Run the tests:
pip install -e .[test] --config-settings editable_mode=compat
pytest
  • Use the command-line tool to view the feature importance of the model:
# After you have trained the lgb model
tsl lgb imp scripts/lgb_model/lgb.dill
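The tsl command above is the provided way to inspect feature importance. If you want the same information in Python, for example inside a notebook, the following is a minimal sketch. It assumes lgb.dill holds a dill-serialized lightgbm.Booster or a scikit-learn-style LightGBM estimator (this README does not say which); load_model and rank_importance are hypothetical helpers, not part of this package. feature_name() and feature_importance(importance_type=...) are LightGBM Booster methods.

```python
def load_model(path):
    """Hypothetical helper: load a dill-serialized model (needs the `dill` package)."""
    import dill  # imported here so rank_importance works without dill installed

    with open(path, "rb") as f:
        return dill.load(f)


def rank_importance(model, importance_type="split"):
    """Hypothetical helper: return (feature, importance) pairs, most important first.

    Assumes `model` is a lightgbm.Booster, or an sklearn-style LightGBM
    estimator that exposes its trained Booster as `booster_`.
    """
    # Fall back to the object itself when there is no `booster_` attribute.
    booster = getattr(model, "booster_", model)
    names = booster.feature_name()
    scores = booster.feature_importance(importance_type=importance_type)
    return sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)


# Example usage (run from the project root; path taken from the tsl command):
# model = load_model("scripts/lgb_model/lgb.dill")
# for name, score in rank_importance(model)[:10]:
#     print(f"{name}: {score}")
```

Passing importance_type="gain" instead of the default "split" ranks features by total gain rather than by how often they are used in splits.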

Troubleshooting

  1. For M1/M2/M3 Mac users: If you have trouble installing LightGBM, create a conda virtual environment and install it with conda:

    conda install -c conda-forge lightgbm
  2. File Not Found Error: If you see an error like "No such file or directory: '../../data/train.xlsx'" even though the files are in the data directory, make sure you run the script from its own directory (e.g., .../spark_learning/scripts/lgb_model) rather than the project root (e.g., .../spark_learning). The scripts locate the data with paths relative to the working directory. By default, VSCode uses the project directory as the working directory, so run the script from the command line instead.
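If you would rather launch the scripts from VSCode or any other directory, one common alternative (not what the scripts currently do) is to build the data path from the script's own location rather than from the working directory, and to report missing files up front. This is a minimal sketch, assuming each script sits two levels below the project root as scripts/lgb_model does; data_dir_for and missing_data_files are hypothetical helpers, and the file names come from the data/ layout in the Preparation section.

```python
from pathlib import Path

# File names from the data/ layout in the Preparation section.
REQUIRED_FILES = ("test_X.xlsx", "test_y.xlsx", "train.xlsx")


def data_dir_for(script_file):
    """Hypothetical helper: return <project root>/data for a script two levels deep.

    Assumes the script lives at <project>/scripts/<folder>/<script>.py,
    as scripts/lgb_model does, so the project root is parents[2].
    """
    return Path(script_file).resolve().parents[2] / "data"


def missing_data_files(data_dir):
    """Hypothetical helper: list the required files absent from data_dir."""
    data_dir = Path(data_dir)
    return [name for name in REQUIRED_FILES if not (data_dir / name).is_file()]


# Inside a script, anchoring on __file__ makes the path independent of the
# working directory (project root, VSCode, or the script's own folder):
# data_dir = data_dir_for(__file__)
# missing = missing_data_files(data_dir)
# if missing:
#     raise FileNotFoundError(f"Missing in {data_dir}: {', '.join(missing)}")
# train_file = data_dir / "train.xlsx"
```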
