Removed EMAIL, LICENSE, SPARK_MEM and elastic references from zingg.sh #253
Conversation
let us keep both email and license. we need to support them in the future.
Reverted License and email stuff.
- zingg-env.sh - have an option to specify any property that starts with spark. (it can be spark.executor.memory, spark.driver.memory, spark.hadoop.fs.impl, etc.), with sensible defaults where applicable and commented out for the rest (e.g. the BigQuery Hadoop FS). Any property that starts with spark. gets put into conf: --conf spark.executor.memory=22g
zingg.sh should not have any knowledge of which property it is; it just knows that stuff has to be passed into jars and conf.
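A minimal sketch of that pattern-matching idea, assuming a hypothetical key=value file at $ZINGG_HOME/config/zingg.conf (the file name and format are illustrative, not the actual implementation):

SPARK_CONF_ARGS=""
while IFS='=' read -r key value; do
  # skip blank lines and comments
  [[ -z "${key}" || "${key}" == \#* ]] && continue
  # forward anything that looks like a Spark property as --conf;
  # zingg.sh does not need to know which property it is
  if [[ "${key}" == spark.* ]]; then
    SPARK_CONF_ARGS="${SPARK_CONF_ARGS} --conf ${key}=${value}"
  fi
done < "${ZINGG_HOME}/config/zingg.conf"
# ${SPARK_CONF_ARGS} would then be appended to the spark-submit call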
EMAIL=xxx@yyy.com
LICENSE="test"
##for local
export SPARK_MEM=10g
We can't remove SPARK_MEM; we should document that it has to be set outside. Docker should also have a way to set this.
Added in zingg-env.sh
Spark has multiple conf files (six):
- env variables executed by the unix script (spark-env.sh), needed before starting a Spark program
- property file (spark-defaults.conf), which is taken care of by the Java/Scala classes
- log4j.properties
- workers.template
We should keep things simple unless they are a must. We are replicating files 1) and 2), and only in scripts, through env vars and "--conf" params respectively.
"--jar" has a corresponding conf param as well, e.g.
"spark.driver.extraClassPath=/path/myjarfile1.jar:/path/myjarfile2.jar"
Therefore, I think it's not required to do processing based on "pattern" matching.
If you think otherwise, please let me know.
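For reference, the flag form and the conf-only form look roughly like this; the jar paths are placeholders, and --jars (which maps to spark.jars) and spark.driver.extraClassPath are not identical mechanisms, but both get the jars onto the classpath:

# using the dedicated flag (comma-separated)
$SPARK_HOME/bin/spark-submit --jars /path/myjarfile1.jar,/path/myjarfile2.jar ...

# using only --conf properties
$SPARK_HOME/bin/spark-submit \
  --conf spark.jars=/path/myjarfile1.jar,/path/myjarfile2.jar \
  --conf spark.driver.extraClassPath=/path/myjarfile1.jar:/path/myjarfile2.jar ...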
scripts/zingg.sh
Outdated
@@ -18,4 +14,4 @@ else
OPTION_SPARK_CONF="${ZINGG_EXTRA_SPARK_CONF}"
fi

$SPARK_HOME/bin/spark-submit --master $SPARK_MASTER $OPTION_JARS $OPTION_SPARK_CONF --conf spark.serializer=org.apache.spark.serializer.KryoSerializer --conf spark.es.nodes="127.0.0.1" --conf spark.es.port="9200" --conf spark.es.resource="cluster/cluster1" --conf spark.default.parallelism="8" --conf spark.executor.extraJavaOptions="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+HeapDumpOnOutOfMemoryError -Xloggc:/tmp/memLog.txt -XX:+UseCompressedOops" --conf spark.executor.memory=10g --conf spark.debug.maxToStringFields=200 --driver-class-path $ZINGG_JARS --class zingg.client.Client $ZINGG_JARS $@ --email $EMAIL --license $LICENSE
$SPARK_HOME/bin/spark-submit --master $SPARK_MASTER $OPTION_JARS $OPTION_SPARK_CONF --conf spark.serializer=org.apache.spark.serializer.KryoSerializer --conf spark.default.parallelism="8" --conf spark.executor.extraJavaOptions="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+HeapDumpOnOutOfMemoryError -Xloggc:/tmp/memLog.txt -XX:+UseCompressedOops" --conf spark.executor.memory=10g --conf spark.debug.maxToStringFields=200 --driver-class-path $ZINGG_JARS --class zingg.client.Client $ZINGG_JARS $@
Please remove spark.executor.memory and document that it is to be set as part of the installation. It also has to be set in Docker.
moved to property file
config/zingg-env.sh
Outdated
#### SPARK_EXECUTOR_MEMORY ########################################
# The SPARK_EXECUTOR_MEMORY variable updates spark.executor.memory. It may be modified based on memory available in the system. Default is 8GB
SPARK_EXECUTOR_MEMORY=8g
Can we keep Spark properties as spark.executor.memory etc., exactly as we have them in Spark? That makes remembering things easier.
Yes. Now they are in the property file.
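For illustration, a property file that keeps Spark's own names could look like the short sketch below (the file name and values are assumptions, in spark-defaults.conf style key-value pairs):

# config/zingg-defaults.conf (illustrative values)
spark.executor.memory     8g
spark.driver.memory       8g
spark.default.parallelism 8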
@@ -3,8 +3,12 @@
ZINGG_JARS=$ZINGG_HOME/zingg-0.3.3-SNAPSHOT.jar
EMAIL=xxx@yyy.com
LICENSE="test"
move these too to the defaults?
moved to environment file.
scripts/zingg.sh
Outdated
OPTION_EXECUTOR_MEMORY="--conf spark.executor.memory=${SPARK_EXECUTOR_MEMORY}"

if [[ -z "${SPARK_DRIVER_MEMORY}" ]]; then
  SPARK_DRIVER_MEMORY=8g
the defaults need to go in the conf, not here.
Moved from here.
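One possible way to keep such defaults in the conf rather than in zingg.sh is spark-submit's --properties-file flag; the file path below is an assumption, not the final layout:

# defaults live in the conf file (spark-defaults.conf format)...
#   spark.driver.memory    8g
#   spark.executor.memory  8g
# ...and the script only forwards the file, with no hard-coded fallbacks:
$SPARK_HOME/bin/spark-submit --properties-file "${ZINGG_CONF_DIR}/zingg-defaults.conf" ...

Anything passed explicitly via --conf would still override values coming from the properties file.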
scripts/zingg.sh
Outdated
# All the additional options must be added here
ALL_OPTIONS=" ${OPTION_DRIVER_MEMORY} ${OPTION_EXECUTOR_MEMORY} ${OPTION_JARS} ${OPTION_SPARK_CONF} "
$SPARK_HOME/bin/spark-submit --master $SPARK_MASTER ${ALL_OPTIONS} --conf spark.serializer=org.apache.spark.serializer.KryoSerializer --conf spark.default.parallelism="8" --conf spark.executor.extraJavaOptions="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+HeapDumpOnOutOfMemoryError -Xloggc:/tmp/memLog.txt -XX:+UseCompressedOops" --conf spark.debug.maxToStringFields=200 --driver-class-path $ZINGG_JARS --class zingg.client.Client $ZINGG_JARS $@ --email $EMAIL --license $LICENSE
spark.default.parallelism too needs to go
go to the zingg-env.sh
moved
ZINGG_ENV_SH="${ZINGG_CONF_DIR}/${ZINGG_ENV_SH}"
if [[ -f "${ZINGG_ENV_SH}" ]]; then
  # Promote all variable declarations to environment (exported) variables
  set -a
do we need to set them as env variables?
What is the alternative? As they are going to be supplied in another command (spark-submit), they must be set. Whether they have to be exported or not could be thought over!
If we source them, we can read them here, no? Why export?
I'll check if needed or not.
either all exported or none.
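For reference, the two options being discussed look roughly like this (a sketch, not the final script):

# Option 1: auto-export everything the env file declares
set -a
source "${ZINGG_ENV_SH}"
set +a

# Option 2: just source the file; the variables stay shell-local and are
# only used when zingg.sh itself expands them into the spark-submit command
source "${ZINGG_ENV_SH}"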
scripts/zingg.sh
Outdated
# Set the ZINGG environment variables
ZINGG_ENV="$(dirname "$0")"/load-zingg-env.sh
if [[ -f "${ZINGG_ENV}" ]]; then
  source ${ZINGG_ENV}
If you use source, I don't think you would need to also set env variables in load-zingg-env.sh.
Here, load-zingg-env.sh is being executed, which, in turn, will export the variables defined in zingg-env.sh.
Am I missing something?
- We are missing spark.master - the default has to be local[*]
- Add a new folder extraLibs under zingg and add it by default to the classpath, so people can add their jars there. Or they can manipulate the ZINGG_EXTRA_JARS env variable (a sketch of these two points follows below).
- There doesn't seem to be any benefit to exporting the variables.
- zingg-defaults and zingg-env can be collapsed into one file.
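A rough sketch of the first two points in zingg.sh (the folder name extraLibs and the exact wiring are assumptions):

# default the master to local mode when nothing is set
if [[ -z "${SPARK_MASTER}" ]]; then
  SPARK_MASTER="local[*]"
fi

# pick up any jars dropped into $ZINGG_HOME/extraLibs and merge them with
# whatever the user already put in ZINGG_EXTRA_JARS (comma-separated)
EXTRA_LIB_JARS=$(ls "${ZINGG_HOME}"/extraLibs/*.jar 2>/dev/null | tr '\n' ',')
ZINGG_EXTRA_JARS="${EXTRA_LIB_JARS}${ZINGG_EXTRA_JARS}"
ZINGG_EXTRA_JARS="${ZINGG_EXTRA_JARS%,}"   # drop a trailing comma if any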
What happens to EMAIL and LICENSE?
Most of our files will be used by people who do not understand or use Spark. Those who know Spark already know how to configure things inside Spark, or use a hosted service like Databricks or Azure where the config is different. That's why we are not asking people to change the Spark config directly, which would be a lot easier for us but not for them.
We can go with the Spark default property for jars instead of specifying our own.
Secondly, we have common properties like email and license that we read. And we will need to specify some properties for Snowflake tomorrow. The config will be common; the variables will be different.
They are in the zingg-env file.
ok
- rename zingg-defaults.conf to zingg.conf
- move license and email to it
Only set ZINGG_HOME and SPARK_MASTER through the env. Everything else is in zingg.conf. LICENSE and EMAIL remain in zingg.sh.
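Putting that decision together, the split would look roughly like this (values are illustrative, not the final files):

# zingg-env.sh - environment only
ZINGG_HOME=/opt/zingg
SPARK_MASTER=local[*]

# zingg.conf - everything Spark-related, using Spark's own property names
#   spark.executor.memory  8g
#   spark.driver.memory    8g

# zingg.sh - license and email stay here
EMAIL=xxx@yyy.com
LICENSE="test"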
Made EMAIL optional in ClientOptions
Added a file to configure env variables that are needed for spark
Moved EMAIL, LICENSE env vars to zingg-env.sh
force-pushed from 2f1695c to b291688
Made EMAIL optional in ClientOptions