Removed EMAIL, LICENSE, SPARK_MEM and elastic references from zingg.sh #253
Conversation
let us keep both email and license. we need to support them in the future.
Reverted License and email stuff.
- zingg-env.sh - have an option to specify any property that starts with spark. (it can be spark.executor.memory, spark.driver.memory, spark.hadoop.fs.impl, etc.), with sensible defaults where applicable and commented out for the rest (e.g. the BigQuery Hadoop FS). Any property that starts with spark. gets put into conf: --conf spark.executor.memory=22g
zingg.sh should not have any knowledge of which property it is; it just knows that stuff has to be passed into jars and conf.
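A minimal sketch of that pattern-matching idea, assuming a hypothetical key=value file at $ZINGG_HOME/config/zingg.conf (the file name and format are illustrative, not the actual implementation):

SPARK_CONF_ARGS=""
while IFS='=' read -r key value; do
  # skip blank lines and comments
  [[ -z "${key}" || "${key}" == \#* ]] && continue
  # forward anything that looks like a Spark property as --conf;
  # zingg.sh does not need to know which property it is
  if [[ "${key}" == spark.* ]]; then
    SPARK_CONF_ARGS="${SPARK_CONF_ARGS} --conf ${key}=${value}"
  fi
done < "${ZINGG_HOME}/config/zingg.conf"
# ${SPARK_CONF_ARGS} would then be appended to the spark-submit call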
EMAIL=xxx@yyy.com
LICENSE="test"
##for local
export SPARK_MEM=10g
We can't remove SPARK_MEM; we should document that it has to be set outside. Docker should also have a way to set this.
Added in zingg-env.sh
Spark has multiple conf files (six):
- env variables executed by the unix script (spark-env.sh), needed before starting a Spark program
- property file (spark-defaults.conf), which is taken care of by the Java/Scala classes
- log4j.properties
- workers.template
We should keep things simple unless they are a must. We are replicating files 1) and 2), and only in scripts, through env vars and "--conf" params respectively.
"--jar" has a corresponding conf param as well, e.g.
"spark.driver.extraClassPath=/path/myjarfile1.jar:/path/myjarfile2.jar"
Therefore, I think it's not required to do processing based on "pattern" matching.
If you think otherwise, please let me know.
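For reference, the flag form and the conf-only form look roughly like this; the jar paths are placeholders, and --jars (which maps to spark.jars) and spark.driver.extraClassPath are not identical mechanisms, but both get the jars onto the classpath:

# using the dedicated flag (comma-separated)
$SPARK_HOME/bin/spark-submit --jars /path/myjarfile1.jar,/path/myjarfile2.jar ...

# using only --conf properties
$SPARK_HOME/bin/spark-submit \
  --conf spark.jars=/path/myjarfile1.jar,/path/myjarfile2.jar \
  --conf spark.driver.extraClassPath=/path/myjarfile1.jar:/path/myjarfile2.jar ...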
scripts/zingg.sh
Outdated
@@ -18,4 +14,4 @@ else
OPTION_SPARK_CONF="${ZINGG_EXTRA_SPARK_CONF}"
fi

$SPARK_HOME/bin/spark-submit --master $SPARK_MASTER $OPTION_JARS $OPTION_SPARK_CONF --conf spark.serializer=org.apache.spark.serializer.KryoSerializer --conf spark.es.nodes="127.0.0.1" --conf spark.es.port="9200" --conf spark.es.resource="cluster/cluster1" --conf spark.default.parallelism="8" --conf spark.executor.extraJavaOptions="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+HeapDumpOnOutOfMemoryError -Xloggc:/tmp/memLog.txt -XX:+UseCompressedOops" --conf spark.executor.memory=10g --conf spark.debug.maxToStringFields=200 --driver-class-path $ZINGG_JARS --class zingg.client.Client $ZINGG_JARS $@ --email $EMAIL --license $LICENSE
$SPARK_HOME/bin/spark-submit --master $SPARK_MASTER $OPTION_JARS $OPTION_SPARK_CONF --conf spark.serializer=org.apache.spark.serializer.KryoSerializer --conf spark.default.parallelism="8" --conf spark.executor.extraJavaOptions="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+HeapDumpOnOutOfMemoryError -Xloggc:/tmp/memLog.txt -XX:+UseCompressedOops" --conf spark.executor.memory=10g --conf spark.debug.maxToStringFields=200 --driver-class-path $ZINGG_JARS --class zingg.client.Client $ZINGG_JARS $@
Please remove spark.executor.memory and document that it is to be set as part of the installation. It also has to be set in Docker.
moved to property file
config/zingg-env.sh
Outdated
#### SPARK_EXECUTOR_MEMORY ########################################
# The SPARK_EXECUTOR_MEMORY variable updates spark.executor.memory. It may be modified based on memory available in the system. Default is 8GB
SPARK_EXECUTOR_MEMORY=8g
Can we keep Spark properties as spark.executor.memory etc., exactly as we have them in Spark? That makes remembering things easier.
Yes. Now they are in the property file.
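For illustration, a property file that keeps Spark's own names could look like the short sketch below (the file name and values are assumptions, in spark-defaults.conf style key-value pairs):

# config/zingg-defaults.conf (illustrative values)
spark.executor.memory     8g
spark.driver.memory       8g
spark.default.parallelism 8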
@@ -3,8 +3,12 @@
ZINGG_JARS=$ZINGG_HOME/zingg-0.3.3-SNAPSHOT.jar
EMAIL=xxx@yyy.com
LICENSE="test"
move these too to the defaults?
moved to environment file.
scripts/zingg.sh
Outdated
OPTION_EXECUTOR_MEMORY="--conf spark.executor.memory=${SPARK_EXECUTOR_MEMORY}"

if [[ -z "${SPARK_DRIVER_MEMORY}" ]]; then
  SPARK_DRIVER_MEMORY=8g
the defaults need to go in the conf, not here.
Moved from here.
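One possible way to keep such defaults in the conf rather than in zingg.sh is spark-submit's --properties-file flag; the file path below is an assumption, not the final layout:

# defaults live in the conf file (spark-defaults.conf format)...
#   spark.driver.memory    8g
#   spark.executor.memory  8g
# ...and the script only forwards the file, with no hard-coded fallbacks:
$SPARK_HOME/bin/spark-submit --properties-file "${ZINGG_CONF_DIR}/zingg-defaults.conf" ...

Anything passed explicitly via --conf would still override values coming from the properties file.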
scripts/zingg.sh
Outdated
# All the additional options must be added here
ALL_OPTIONS=" ${OPTION_DRIVER_MEMORY} ${OPTION_EXECUTOR_MEMORY} ${OPTION_JARS} ${OPTION_SPARK_CONF} "
$SPARK_HOME/bin/spark-submit --master $SPARK_MASTER ${ALL_OPTIONS} --conf spark.serializer=org.apache.spark.serializer.KryoSerializer --conf spark.default.parallelism="8" --conf spark.executor.extraJavaOptions="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+HeapDumpOnOutOfMemoryError -Xloggc:/tmp/memLog.txt -XX:+UseCompressedOops" --conf spark.debug.maxToStringFields=200 --driver-class-path $ZINGG_JARS --class zingg.client.Client $ZINGG_JARS $@ --email $EMAIL --license $LICENSE
spark.default.parallelism too needs to go
go to the zingg-env.sh
moved
ZINGG_ENV_SH="${ZINGG_CONF_DIR}/${ZINGG_ENV_SH}"
if [[ -f "${ZINGG_ENV_SH}" ]]; then
  # Promote all variable declarations to environment (exported) variables
  set -a
do we need to set them as env variables?
What is the alternative? As they are going to be supplied in another command (spark-submit), they must be set. Whether they have to be exported or not could be thought over!
If we source them, we can read them here, no? Why export?
I'll check if needed or not.
either all exported or none.
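For reference, the two options being discussed look roughly like this (a sketch, not the final script):

# Option 1: auto-export everything the env file declares
set -a
source "${ZINGG_ENV_SH}"
set +a

# Option 2: just source the file; the variables stay shell-local and are
# only used when zingg.sh itself expands them into the spark-submit command
source "${ZINGG_ENV_SH}"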
scripts/zingg.sh
Outdated
# Set the ZINGG environment variables
ZINGG_ENV="$(dirname "$0")"/load-zingg-env.sh
if [[ -f "${ZINGG_ENV}" ]]; then
  source ${ZINGG_ENV}
If you use source, I don't think you would need to also set env variables in load-zingg-env.sh.
Here, load-zingg-env.sh is being executed, which, in turn, will export the variables defined in zingg-env.sh.
Am I missing something?
- We are missing spark.master - the default has to be local[*]
- Add a new folder extraLibs under zingg and add it by default to the classpath, so people can add their jars there. Or they can manipulate the ZINGG_EXTRA_JARS env variable (a sketch of these two points follows below).
- There doesn't seem to be any benefit to exporting the variables.
- zingg-defaults and zingg-env can be collapsed into one file.
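A rough sketch of the first two points in zingg.sh (the folder name extraLibs and the exact wiring are assumptions):

# default the master to local mode when nothing is set
if [[ -z "${SPARK_MASTER}" ]]; then
  SPARK_MASTER="local[*]"
fi

# pick up any jars dropped into $ZINGG_HOME/extraLibs and merge them with
# whatever the user already put in ZINGG_EXTRA_JARS (comma-separated)
EXTRA_LIB_JARS=$(ls "${ZINGG_HOME}"/extraLibs/*.jar 2>/dev/null | tr '\n' ',')
ZINGG_EXTRA_JARS="${EXTRA_LIB_JARS}${ZINGG_EXTRA_JARS}"
ZINGG_EXTRA_JARS="${ZINGG_EXTRA_JARS%,}"   # drop a trailing comma if any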
What happens to EMAIL and LICENSE?
Most of our files will be used by people who do not understand or use Spark. Those who know Spark already know how to configure things inside Spark, or use a hosted service like Databricks or Azure where the config is different. That's why we are not asking people to change the Spark config directly, which would be a lot easier for us but not for them.
We can go with the Spark default property for jars instead of specifying our own.
Secondly, we have common properties like email and license that we read. And we will need to specify some properties for Snowflake tomorrow. The config will be common; the variables will be different.
They are in the zingg-env file.
ok
- rename zingg-defaults.conf to zingg.conf
- move license and email to it
Only set ZINGG_HOME and SPARK_MASTER through the env. Everything else is in zingg.conf. LICENSE and EMAIL remain in zingg.sh.
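Putting that decision together, the split would look roughly like this (values are illustrative, not the final files):

# zingg-env.sh - environment only
ZINGG_HOME=/opt/zingg
SPARK_MASTER=local[*]

# zingg.conf - everything Spark-related, using Spark's own property names
#   spark.executor.memory  8g
#   spark.driver.memory    8g

# zingg.sh - license and email stay here
EMAIL=xxx@yyy.com
LICENSE="test"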
Made EMAIL optional in ClientOptions
Added a file to configure env variables that are needed for spark
Moved EMAIL, LICENSE env vars to zingg-env.sh
force-pushed from 2f1695c to b291688
Made EMAIL optional in ClientOptions