This code was developed and tested with Spark v3.1.2 on Python 3.9; it has also been tested with Python 3.8 and 3.6.
First, run the `setup.sh` script. It creates a Python virtual environment and installs the package dependencies:
```bash
chmod +x setup.sh
bash setup.sh
```
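The shipped `setup.sh` does this work for you; purely for reference, a minimal sketch of such a script might look like the following. It assumes dependencies are listed in a `requirements.txt`, which is an assumption, not something stated in this README:

```bash
#!/usr/bin/env bash
# Sketch only -- the actual setup.sh may differ.
python3 -m venv venv                  # create the virtual environment in ./venv
source venv/bin/activate              # activate it for the installs below
pip install --upgrade pip             # ensure a recent pip
pip install pyspark==3.1.2 venv-pack  # Spark runtime and the archiving tool
pip install -r requirements.txt       # remaining project dependencies (assumed file)
```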
Once the virtual environment has been created, you can run the code in an IDE/development environment or submit it with `spark-submit`.
Follow these steps to execute the code in an IDE/development environment:
- Extract the code archive and change into the extracted directory.
- Activate the virtual environment:

```bash
source venv/bin/activate
```

- Execute the code using the following command:

```bash
python main.py 'sku-1'
```

On successful execution, the recommendation output should appear on your console.
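`main.py` itself is not reproduced in this README. As a rough sketch, and assuming nothing about the recommendation logic, the entry point presumably takes the SKU from the command line and builds a SparkSession from the environment variables documented below:

```python
import os
import sys

from pyspark.sql import SparkSession

if __name__ == "__main__":
    sku = sys.argv[1]  # e.g. 'sku-1'

    # MASTER and LOG_LEVEL defaults mirror the environment-variable table below.
    spark = (
        SparkSession.builder
        .master(os.environ.get("MASTER", "local[*]"))
        .appName("recommendations")  # hypothetical app name
        .getOrCreate()
    )
    spark.sparkContext.setLogLevel(os.environ.get("LOG_LEVEL", "WARN"))

    # ... load the input data and print the top recommendations for `sku` ...
```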
NOTE: To use the interpreter from the virtual environment, every Spark node must have the same Python interpreter version installed.
- Archive the virtual environment and export the appropriate environment variables by running the following command:

```bash
source archive-venv.sh
```
`venv-pack` is used here to archive the virtual environment, as suggested in the official documentation (a sketch of the script follows the submit command below).

- Now the job can be submitted to the Spark master using `spark-submit`:

```bash
spark-submit --archives venv.tar.gz#venv main.py 'sku-6276'
```
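For reference, the `archive-venv.sh` sourced above presumably packs the environment with `venv-pack` and exports the interpreter paths the way the official Spark documentation on Python packaging recommends; its exact contents are an assumption:

```bash
#!/usr/bin/env bash
# Sketch only -- the actual archive-venv.sh may differ.
venv-pack -o venv.tar.gz                 # pack the active virtual environment
export PYSPARK_DRIVER_PYTHON=python      # driver uses the local interpreter
export PYSPARK_PYTHON=./venv/bin/python  # executors use the archive unpacked as ./venv
```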
| Environment Variable | Default Value | Usage |
|---|---|---|
| MASTER | local[*] | Sets the master URL in the Spark session |
| LOG_LEVEL | WARN | Sets the application's log4j logging level |
| RECOMMENDATIONS_COUNT | 10 | Number of recommendations to output; e.g. setting it to 5 shows the top 5 |
| INPUT_DATA_PATH | data/test-data-for-spark.json | Path to the input file; set it if your data is on another filesystem |
We can set environment variables like this:

```bash
RECOMMENDATIONS_COUNT=5 MASTER=spark://zahid:7077 INPUT_DATA_PATH=/tmp/data/test-data-for-spark.json spark-submit --archives venv.tar.gz#venv main.py 'sku-6276'
```
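Internally, the application presumably reads these variables with the defaults from the table above. A minimal sketch of such lookups, where the variable names come from the table but the surrounding code is assumed:

```python
import os

# Defaults mirror the environment-variable table above.
recommendations_count = int(os.environ.get("RECOMMENDATIONS_COUNT", "10"))
input_data_path = os.environ.get("INPUT_DATA_PATH", "data/test-data-for-spark.json")

# e.g. spark.read.json(input_data_path) ... limited to `recommendations_count` rows
```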