Skip to content

Ingest Github Organization Events into Delta lake with Azure ADLS Gen2

Notifications You must be signed in to change notification settings

yutongluo/github-org-events-deltalake-ingestor

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Github Org Events DeltaLake Ingestor

Requirements

Running

  1. Create a file in src/main/resources/settings.conf with the following content
    token=<github-api-token>
    app_id=<github-application-id>
    account_name=<storage-account-name>
    directory_id=<directory_id> 
    password=U-l8Q~AcG~Fmy5uTklapyBYzRJkH-aszR68TzbJS
    
  2. Run mvn package
  3. To store locally (store table in local /tmp/ folder), execute with $SPARK_HOME/bin/spark-submit --packages io.delta:delta-core_2.12:2.2.0 --class SparkIngestMain --master 'local[4]' ~/github-events-ingest/target/github-events-ingest-1.0-SNAPSHOT.jar -o microsoft
  4. To store in remote location specified in settings.conf, run with added -e prod option

If spark-submit complaints about missing jars, add them manually to jars folder of your spark submit

About

Ingest Github Organization Events into Delta lake with Azure ADLS Gen2

Resources

Stars

Watchers

Forks