According to the official guidance, you can enable Spark to log detailed execution information by adding the following settings to the spark/conf/spark-defaults.conf file:
spark.eventLog.enabled true
spark.eventLog.dir XXX
# default is file:///tmp/spark-events
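If you would rather not edit spark-defaults.conf, the same two settings can be passed per application via spark-submit's --conf flag (a sketch; the application JAR name below is a placeholder):

spark-submit \
  --conf spark.eventLog.enabled=true \
  --conf spark.eventLog.dir=file:///tmp/spark-events \
  your-app.jar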
{"Event":"SparkListenerTaskStart","Stage ID":0,"Stage Attempt ID":0,"Task Info":{"Task ID":2,"Index":2,"Attempt":0,"Launch Time":1499522864395,"Executor ID":"11","Host":"172.31.38.103","Locality":"PROCESS_LOCAL","Speculative":false,"Getting Result Time":0,"Finish Time":0,"Failed":false,"Accumulables":[]}}
You can pretty-print it with https://jsonformatter.org:
{
  "Event": "SparkListenerTaskStart",
  "Stage ID": 0,
  "Stage Attempt ID": 0,
  "Task Info": {
    "Task ID": 2,
    "Index": 2,
    "Attempt": 0,
    "Launch Time": 1499522864395,
    "Executor ID": "11",
    "Host": "172.31.38.103",
    "Locality": "PROCESS_LOCAL",
    "Speculative": false,
    "Getting Result Time": 0,
    "Finish Time": 0,
    "Failed": false,
    "Accumulables": []
  }
}
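Because the log is just one JSON object per line, you can also inspect it directly with a few lines of Python (a minimal sketch; the log path is the example file from this repository, and the fields match the TaskStart event shown above):

import json

# Path to an event log file (here: the example log from this repo).
log_path = "/Spark-Log-Parser/log-example/app-20170708141026-0013"

with open(log_path) as f:
    for line in f:
        event = json.loads(line)  # each line is one self-contained JSON event
        if event["Event"] == "SparkListenerTaskStart":
            info = event["Task Info"]
            # Print a few fields from the TaskStart event shown above.
            print(info["Task ID"], info["Executor ID"], info["Host"])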
Finally, run the parser on the example log bundled with this repository:
python /Spark-Log-Parser/main.py /Spark-Log-Parser/log-example/app-20170708141026-0013