
Removed EMAIL, LICENSE, SPARK_MEM and elastic references from zingg.sh #253

Merged: 7 commits into zinggAI:main, May 26, 2022

Conversation

navinrathore
Contributor

Made EMAIL optional in ClientOptions

@navinrathore
Contributor Author

#15

Member

@sonalgoyal sonalgoyal left a comment


let us keep both email and license. we need to support them in the future.

@navinrathore
Contributor Author

Reverted License and email stuff.
Introduced a file based environment configuration setup

Member

@sonalgoyal sonalgoyal left a comment


  • zingg-env.sh - have an option to specify any property that starts with spark. (can be spark.executor.memory, spark.driver.memory, spark.hadoop.fs.impl, etc.), with sensible defaults where applicable and commented out for the rest (e.g. the bigquery hadoop fs). Any property that starts with spark. is put into conf: --conf spark.executor.memory=22g
    zingg.sh should not have any knowledge of which property; it just knows that stuff has to be passed into jars and conf.
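The suggestion above can be sketched in bash. This is a minimal, hypothetical sketch, not the actual zingg.sh: the file name zingg-sample.conf and the key=value format are assumptions; every key that starts with spark. is collected into "--conf" flags without the script knowing which properties exist.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: collect every "spark.*" property from a conf file
# into spark-submit "--conf" flags. File name and format are assumptions.
cat > /tmp/zingg-sample.conf <<'EOF'
spark.executor.memory=22g
spark.driver.memory=8g
zingg.some.option=abc
EOF

OPTION_SPARK_CONF=""
while IFS='=' read -r key value; do
  # skip blank lines and comments
  [[ -z "$key" || "$key" == \#* ]] && continue
  # only spark.* keys become --conf arguments
  if [[ "$key" == spark.* ]]; then
    OPTION_SPARK_CONF+=" --conf ${key}=${value}"
  fi
done < /tmp/zingg-sample.conf

echo "$OPTION_SPARK_CONF"
```

The point of the pattern match is that the launch script stays property-agnostic: adding a new Spark property only touches the conf file, never zingg.sh.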

EMAIL=xxx@yyy.com
LICENSE="test"
##for local
export SPARK_MEM=10g
Member


we can't remove SPARK_MEM, we should document that it is set outside. Docker should also have a way to set this.

Contributor Author


Added in zingg-env.sh

Contributor Author


Spark has multiple conf files (six), including:

  1. Environment variables consumed by a unix script (spark-env.sh), needed before starting a Spark program
  2. Property file (spark-defaults.conf), which is taken care of by a Java/Scala class
  3. log4j.properties
  4. workers.template

We should keep things simple unless they are a must. We are replicating only files 1) and 2), and only in scripts, through env variables and "--conf" params, respectively.

"--jars" has a corresponding conf param as well, e.g.
"spark.driver.extraClassPath=/path/myjarfile1.jar:/path/myjarfile2.jar"

Therefore, I think it's not required to process based on "pattern" matching.
If you think otherwise, please let me know.
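To illustrate the equivalence being argued here, the same extra jars can be expressed either as a "--jars" flag or as the spark.driver.extraClassPath conf property. A minimal sketch with placeholder paths (the jar paths are hypothetical):

```shell
#!/usr/bin/env bash
# Hypothetical sketch: the same extra jars expressed two ways.
EXTRA_JARS="/path/myjarfile1.jar,/path/myjarfile2.jar"

# Form 1: the --jars flag (comma-separated list)
OPTION_JARS="--jars ${EXTRA_JARS}"

# Form 2: the equivalent --conf property (colon-separated classpath)
CLASSPATH_JARS="${EXTRA_JARS//,/:}"
OPTION_CLASSPATH="--conf spark.driver.extraClassPath=${CLASSPATH_JARS}"

echo "$OPTION_JARS"
echo "$OPTION_CLASSPATH"
```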


scripts/zingg.sh Outdated
@@ -18,4 +14,4 @@ else
OPTION_SPARK_CONF="${ZINGG_EXTRA_SPARK_CONF}"
fi

- $SPARK_HOME/bin/spark-submit --master $SPARK_MASTER $OPTION_JARS $OPTION_SPARK_CONF --conf spark.serializer=org.apache.spark.serializer.KryoSerializer --conf spark.es.nodes="127.0.0.1" --conf spark.es.port="9200" --conf spark.es.resource="cluster/cluster1" --conf spark.default.parallelism="8" --conf spark.executor.extraJavaOptions="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+HeapDumpOnOutOfMemoryError -Xloggc:/tmp/memLog.txt -XX:+UseCompressedOops" --conf spark.executor.memory=10g --conf spark.debug.maxToStringFields=200 --driver-class-path $ZINGG_JARS --class zingg.client.Client $ZINGG_JARS $@ --email $EMAIL --license $LICENSE
+ $SPARK_HOME/bin/spark-submit --master $SPARK_MASTER $OPTION_JARS $OPTION_SPARK_CONF --conf spark.serializer=org.apache.spark.serializer.KryoSerializer --conf spark.default.parallelism="8" --conf spark.executor.extraJavaOptions="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+HeapDumpOnOutOfMemoryError -Xloggc:/tmp/memLog.txt -XX:+UseCompressedOops" --conf spark.executor.memory=10g --conf spark.debug.maxToStringFields=200 --driver-class-path $ZINGG_JARS --class zingg.client.Client $ZINGG_JARS $@
Member


please remove spark.executor.memory and document this to be set as part of the installation. also has to be set in docker.

Contributor Author


moved to property file


#### SPARK_EXECUTOR_MEMORY ########################################
# The SPARK_EXECUTOR_MEMORY variable updates spark.executor.memory. It may be modified based on memory available in the system. Default is 8GB
SPARK_EXECUTOR_MEMORY=8g
Member


can we keep spark properties as spark.executor.memory etc - exactly as we have them in spark? That makes remembering things easier

Contributor Author


Yes. Now they are in property file.

@@ -3,8 +3,12 @@
ZINGG_JARS=$ZINGG_HOME/zingg-0.3.3-SNAPSHOT.jar
EMAIL=xxx@yyy.com
LICENSE="test"
Member


move these too to the defaults?

Contributor Author


moved to environment file.

scripts/zingg.sh Outdated
OPTION_EXECUTOR_MEMORY="--conf spark.executor.memory=${SPARK_EXECUTOR_MEMORY}"

if [[ -z "${SPARK_DRIVER_MEMORY}" ]]; then
SPARK_DRIVER_MEMORY=8g
Member


the defaults need to go in the conf, not here.

Contributor Author


Moved from here.

scripts/zingg.sh Outdated

# All the additional options must be added here
ALL_OPTIONS=" ${OPTION_DRIVER_MEMORY} ${OPTION_EXECUTOR_MEMORY} ${OPTION_JARS} ${OPTION_SPARK_CONF} "
$SPARK_HOME/bin/spark-submit --master $SPARK_MASTER ${ALL_OPTIONS} --conf spark.serializer=org.apache.spark.serializer.KryoSerializer --conf spark.default.parallelism="8" --conf spark.executor.extraJavaOptions="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+HeapDumpOnOutOfMemoryError -Xloggc:/tmp/memLog.txt -XX:+UseCompressedOops" --conf spark.debug.maxToStringFields=200 --driver-class-path $ZINGG_JARS --class zingg.client.Client $ZINGG_JARS $@ --email $EMAIL --license $LICENSE
Member


spark.default.parallelism too needs to go

Member


go to the zingg-env.sh

Contributor Author


moved

ZINGG_ENV_SH="${ZINGG_CONF_DIR}/${ZINGG_ENV_SH}"
if [[ -f "${ZINGG_ENV_SH}" ]]; then
# Promote all variable declarations to environment (exported) variables
set -a
Member


do we need to set them as env variables?

Contributor Author


What is the alternative? As they are going to be supplied in another command (spark-submit), they must be set. Whether they have to be exported or not could be thought over!

Member


If we source them, we can read them here, no? Why export?

Contributor Author


I'll check whether it is needed or not.
Either all are exported or none.
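The export question above comes down to what "set -a" does: it auto-exports every variable a sourced file defines, so that child processes such as spark-submit can see them, whereas a plain "source" only sets shell-local variables. A minimal demonstration (the file path and variable are illustrative, not the actual Zingg files):

```shell
#!/usr/bin/env bash
# Hypothetical sketch: why "set -a" matters when sourcing an env file.
cat > /tmp/zingg-env-demo.sh <<'EOF'
SPARK_EXECUTOR_MEMORY=10g
EOF

# "set -a" marks every subsequently defined variable for export;
# "set +a" turns that behavior off again.
set -a
source /tmp/zingg-env-demo.sh
set +a

# Because the variable was exported, a child process can see it.
CHILD_SEES="$(bash -c 'echo "$SPARK_EXECUTOR_MEMORY"')"
echo "child sees: $CHILD_SEES"
```

Without the set -a / set +a pair, the child bash would print an empty string, which is the trade-off being debated: export is only needed if the values must survive into subprocesses rather than being read by the current script.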

scripts/zingg.sh Outdated
# Set the ZINGG environment variables
ZINGG_ENV="$(dirname "$0")"/load-zingg-env.sh
if [[ -f "${ZINGG_ENV}" ]]; then
source ${ZINGG_ENV}
Member


if you use source, I don't think you would need to also set env variables in load-zingg-env.sh.

Contributor Author


Here, load-zingg-env.sh is being executed, which, in turn, will export the variables defined in zingg-env.sh.
Am I missing something?

Member

@sonalgoyal sonalgoyal left a comment


  • We are missing spark.master - the default has to be local[*]
  • add a new folder extraLibs under zingg and add that by default to the classpath, so people can add their jars there. Or they can manipulate the ZINGG_EXTRA_JARS env variable.
  • there doesn't seem to be any benefit to exporting the variables.
  • zingg-defaults and zingg-env can be collapsed to one file.
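The first two review points can be sketched together. This is an assumed implementation, not the merged code: the extraLibs folder name and the ZINGG_EXTRA_JARS variable come from the comment above, and the demo paths are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: default spark.master to local[*] when unset,
# and pick up every jar dropped into an extraLibs folder.
SPARK_MASTER="${SPARK_MASTER:-local[*]}"

ZINGG_HOME="${ZINGG_HOME:-/tmp/zingg-demo}"
mkdir -p "$ZINGG_HOME/extraLibs"
# Demo jars so the glob below has something to match.
touch "$ZINGG_HOME/extraLibs/example1.jar" "$ZINGG_HOME/extraLibs/example2.jar"

# Join every jar under extraLibs into a comma-separated list for --jars.
ZINGG_EXTRA_JARS="$(ls "$ZINGG_HOME"/extraLibs/*.jar 2>/dev/null | paste -sd, -)"

echo "master=$SPARK_MASTER jars=$ZINGG_EXTRA_JARS"
```

Users then either drop jars into extraLibs or override ZINGG_EXTRA_JARS in the environment; the script itself stays ignorant of which jars exist.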

Member

@sonalgoyal sonalgoyal left a comment


What happens to EMAIL AND LICENSE?

@sonalgoyal
Member

sonalgoyal commented May 16, 2022 via email

@navinrathore
Contributor Author

What happens to EMAIL AND LICENSE?

They are in zingg-env file.

@sonalgoyal
Member

What happens to EMAIL AND LICENSE?

They are in zingg-env file.

ok

Member

@sonalgoyal sonalgoyal left a comment


  1. rename zingg-defaults.conf to zingg.conf
  2. move license and email to it

Member

@sonalgoyal sonalgoyal left a comment


only set ZINGG_HOME and SPARK_MASTER through the env; everything else is in zingg.conf. license and email remain in zingg.sh

@navinrathore navinrathore mentioned this pull request May 16, 2022
@sonalgoyal sonalgoyal merged commit 7129eff into zinggAI:main May 26, 2022
@navinrathore navinrathore deleted the zZinggScriptCleanup branch June 1, 2022 04:21