Take advantage of Spark DataFrames to filter a large .csv file.
This notebook does the following (see the sketch after the list) -
- Read events from a big file using PySpark
- Remove unwanted events using Spark DataFrame APIs
- Add a new column representing the epoch time
- ==> The approach that follows doesn't work now, probably because I used Spark 2.3 with Java 11, so use Spark itself to write the output file
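
Here is a minimal sketch of the whole pipeline. The file path `events.csv`, the columns `event_type` and `timestamp`, the filter predicate, and the output directory `out_dir` are all placeholders not taken from the notebook; substitute your own.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("filter-large-csv").getOrCreate()

# Read events from a large CSV file. Schema inference is convenient but
# requires an extra pass over the data; supply an explicit schema for speed.
events = spark.read.csv("events.csv", header=True, inferSchema=True)

# Remove unwanted events with a DataFrame filter
# (assumes a hypothetical 'event_type' column; use your own predicate).
wanted = events.filter(F.col("event_type") != "heartbeat")

# Add a new column with the epoch time, derived from a hypothetical
# 'timestamp' column that to_timestamp can parse.
with_epoch = wanted.withColumn(
    "epoch", F.unix_timestamp(F.to_timestamp(F.col("timestamp")))
)

# Write the result with Spark itself; output lands as one or more
# part files under the 'out_dir' directory.
with_epoch.write.mode("overwrite").option("header", True).csv("out_dir")
```

Writing through Spark sidesteps collecting the filtered data to the driver, which matters for a file too large to hold in memory on one machine.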