Removed EMAIL, LICENSE, SPARK_MEM and elastic references from zingg.sh #253
Changes from all commits: 1fa9fe1, bb5832c, c11c8cb, b7177fc, 8095861, 3a65ee7, b291688
config/zingg-defaults.conf (new file):

@@ -0,0 +1,26 @@
# file config/zingg-defaults.conf
# This file defines default Spark properties. These properties are passed to 'spark-submit' as Spark configurations (--conf).
# This is useful for setting default environment settings.
# Entries in this file may be:
#   A. Blank lines
#   B. Comment lines (starting with #)
#   C. Properties in key=value format
#
# Leading or trailing spaces are fine.
# Please note that any key or value that contains spaces or double quotes must be enclosed in single quotes ('').

### General properties
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.default.parallelism=8
spark.debug.maxToStringFields=200
spark.driver.memory=8g
spark.executor.memory=8g

# Additional jars can be passed to Spark through the configuration below. The jar list should be comma (,) separated.
#spark.jars=
#spark.executor.extraClassPath=
#spark.driver.extraClassPath=

### The property below must be set for BigQuery
#spark.hadoop.fs.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem
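Each key=value entry in a defaults file like this ultimately becomes a `--conf key=value` argument on the spark-submit command line. A minimal, self-contained sketch of that transformation, assuming a hypothetical temp file with sample properties (not the real config path):

```shell
#!/usr/bin/env bash
# Sketch: turn key=value lines from a defaults file into --conf flags.
# The file contents below are hypothetical, for illustration only.
conf_file=$(mktemp)
cat > "$conf_file" <<'EOF'
# comment line, ignored
spark.driver.memory=8g

spark.executor.memory=8g
EOF

args=""
# Drop comments and blank lines, then build the flag list
while IFS='=' read -r key value; do
  args+=" --conf ${key}=${value}"
done < <(sed 's/#.*//;/^[[:space:]]*$/d' "$conf_file")

echo "$args"
rm -f "$conf_file"
```

The process substitution (`< <(...)`) keeps the while loop in the current shell, so `args` survives the loop; a plain pipe into `while` would lose it in a subshell.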
New environment-loading script:

@@ -0,0 +1,12 @@
#!/usr/bin/env bash

ZINGG_ENV_SH="zingg-env.sh"
export ZINGG_CONF_DIR="$(dirname "$0")"/../config

ZINGG_ENV_SH="${ZINGG_CONF_DIR}/${ZINGG_ENV_SH}"
if [[ -f "${ZINGG_ENV_SH}" ]]; then
  # Promote all variable declarations to environment (exported) variables
  set -a
  . "${ZINGG_ENV_SH}"
  set +a
fi
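The `set -a` / `set +a` pair is what makes plain `KEY=value` assignments in zingg-env.sh visible to child processes such as spark-submit, without writing `export` in front of each one. A small self-contained demonstration (the variable names FOO and BAR are hypothetical, not part of Zingg):

```shell
#!/usr/bin/env bash
# Demonstrate that `set -a` auto-exports assignments made while it is active.
env_file=$(mktemp)
echo 'FOO=from_env_file' > "$env_file"

set -a            # auto-export every assignment from here on
. "$env_file"     # FOO is sourced and exported
set +a            # back to normal
BAR=not_exported  # assigned after set +a, so not exported

# Child shells only see exported variables
child_foo=$(bash -c 'echo "${FOO:-unset}"')
child_bar=$(bash -c 'echo "${BAR:-unset}"')
echo "FOO in child: $child_foo"
echo "BAR in child: $child_bar"
rm -f "$env_file"
```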
zingg.sh:

@@ -3,19 +3,34 @@
 ZINGG_JARS=$ZINGG_HOME/zingg-0.3.3-SNAPSHOT.jar
-EMAIL=xxx@yyy.com
-LICENSE="test"
Review comment: move these too to the defaults?
Reply: moved to environment file.
-##for local
-export SPARK_MEM=10g
Review comment: we can't remove SPARK_MEM; we should document setting it outside. Docker should also have a way to set this.
Reply: Added in zingg-env.sh.
Reply: Spark has multiple conf files (six). We should keep things simple unless they are a must. We are replicating only files 1) and 2), and only in the scripts, through env vars and "--conf" params respectively. "--jars" has a corresponding conf param as well. Therefore, I think it is not required to do processing based on "pattern" matching.
+if [[ -z "${ZINGG_EXTRA_JARS}" ]]; then
+  OPTION_JARS=""
+else
+  OPTION_JARS="--jars ${ZINGG_EXTRA_JARS}"
+fi
+
+if [[ -z "${ZINGG_EXTRA_SPARK_CONF}" ]]; then
+  OPTION_SPARK_CONF=""
+else
+  OPTION_SPARK_CONF="${ZINGG_EXTRA_SPARK_CONF}"
+fi
+
+function read_zingg_conf() {
+  local CONF_PROPS=""
+
+  ZINGG_CONF_DIR="$(cd "$(dirname "$0")"/../config; pwd)"
+
+  file="${ZINGG_CONF_DIR}/zingg.conf"
+  # Strip leading blanks, comment lines, and blank lines
+  PROPERTIES=$(sed 's/^[[:blank:]]*//;s/#.*//;/^[[:space:]]*$/d' "$file")
+
+  while IFS='=' read -r key value; do
+    # Trim leading and trailing spaces
+    key=$(echo "$key" | sed 's/^[[:blank:]]*//;s/[[:blank:]]*$//')
+    value=$(echo "$value" | sed 's/^[[:blank:]]*//;s/[[:blank:]]*$//')
+    # Append to the conf variable
+    CONF_PROPS+=" --conf ${key}=${value}"
+  done <<< "$(echo -e "$PROPERTIES")"
+
+  echo "$CONF_PROPS"
+}
+
-$SPARK_HOME/bin/spark-submit --master $SPARK_MASTER $OPTION_JARS $OPTION_SPARK_CONF --conf spark.serializer=org.apache.spark.serializer.KryoSerializer --conf spark.es.nodes="127.0.0.1" --conf spark.es.port="9200" --conf spark.es.resource="cluster/cluster1" --conf spark.default.parallelism="8" --conf spark.executor.extraJavaOptions="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+HeapDumpOnOutOfMemoryError -Xloggc:/tmp/memLog.txt -XX:+UseCompressedOops" --conf spark.executor.memory=10g --conf spark.debug.maxToStringFields=200 --driver-class-path $ZINGG_JARS --class zingg.client.Client $ZINGG_JARS $@ --email $EMAIL --license $LICENSE
+OPTION_SPARK_CONF+=$(read_zingg_conf)
+# All the additional options must be added here
+ALL_OPTIONS=" ${OPTION_JARS} ${OPTION_SPARK_CONF} "
+$SPARK_HOME/bin/spark-submit --master $SPARK_MASTER ${ALL_OPTIONS} --conf spark.executor.extraJavaOptions="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+HeapDumpOnOutOfMemoryError -Xloggc:/tmp/memLog.txt -XX:+UseCompressedOops" --driver-class-path $ZINGG_JARS --class zingg.client.Client $ZINGG_JARS "$@" --email $EMAIL --license $LICENSE
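The sed expressions in read_zingg_conf do two jobs: one pass strips comments and blank lines from the file, and a second trims surrounding whitespace from each key and value. A standalone sketch of the trimming idea, where the `trim` helper name and the sample line are illustrative and not part of the PR:

```shell
#!/usr/bin/env bash
# Sketch of the whitespace-trimming applied to each key and value.
trim() {
  echo "$1" | sed 's/^[[:blank:]]*//;s/[[:blank:]]*$//'
}

line="  spark.driver.memory = 8g  "
key=$(trim "${line%%=*}")    # text before the first '='
value=$(trim "${line#*=}")   # text after the first '='
echo "--conf ${key}=${value}"
```

This uses parameter expansion (`%%=*` and `#*=`) to split on the first `=`; the PR's function gets the same split from `IFS='=' read -r key value`, which additionally joins any later `=`-separated fields into `value`.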
Review comment: do we need to set them as env variables?
Reply: What is the alternative? As they are going to be supplied to another command (spark-submit), they must be set. Whether they have to be exported or not could be thought over!
Reply: If we source them, we can read them here, no? Why export?
Reply: I'll check whether it is needed or not; either all exported or none.