
BqTail - command line loader

Standalone, Google Storage-based BigQuery loader.

Introduction

The BqTail command-line loader runs the ingestion process as a standalone process driven by data ingestion rules. For each source data file, an event is triggered to the local BqTail process. Because the BigQuery Load API accepts URIs that are valid Google Cloud Storage locations, every data event must also reference a valid GCS location.

A data event can be triggered directly to the bqtail process if the source URL is a valid Google Cloud Storage URL and the source path matches the bucket and the rule filter. Otherwise, all files are first copied from the source URL to gs://${bucket}/$filterPath, and then an event is fired. $filterPath is derived from the source path when it matches the rule filter, or is constructed from the rule prefix and pattern.
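The direct-versus-copy decision described above can be sketched in Go. This is a minimal, hypothetical model, not bqtail's code: the `Rule` type and `planEvent` function are invented for illustration, the rule filter is reduced to a bucket plus a `When.Prefix`-style path prefix (the rule's pattern matching is not modeled), and `my-bucket` is a placeholder bucket name.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
	"strings"
)

// Rule is a hypothetical, simplified view of an ingestion rule filter:
// only the bucket files are staged in (-b) and a path prefix filter.
type Rule struct {
	Bucket string // staging bucket, e.g. the value passed with -b
	Prefix string // object-path prefix, e.g. "/folder/"
}

// planEvent returns the GCS URL the event should carry and whether the
// source can be evented directly (true) or must first be copied (false).
func planEvent(sourceURL string, rule Rule) (string, bool, error) {
	u, err := url.Parse(sourceURL)
	if err != nil {
		return "", false, err
	}
	// Direct mode: already a GCS object in the rule's bucket matching the filter.
	if u.Scheme == "gs" && u.Host == rule.Bucket && strings.HasPrefix(u.Path, rule.Prefix) {
		return sourceURL, true, nil
	}
	// Copy mode: keep the source path as $filterPath if it already matches the
	// filter; otherwise build it from the rule prefix and the file name.
	filterPath := u.Path
	if !strings.HasPrefix(filterPath, rule.Prefix) {
		filterPath = path.Join(rule.Prefix, path.Base(u.Path))
	}
	return "gs://" + rule.Bucket + filterPath, false, nil
}

func main() {
	rule := Rule{Bucket: "my-bucket", Prefix: "/folder/"}
	for _, src := range []string{
		"gs://my-bucket/folder/mydatafile.csv", // direct
		"/tmp/mydatafile.csv",                  // local file: copied first
	} {
		eventURL, direct, err := planEvent(src, rule)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s -> %s (direct=%v)\n", src, eventURL, direct)
	}
}
```

In this sketch both sources end up as events for the same GCS object; only the first one skips the copy step.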

In direct eventing mode, all source data files are governed by the BqTail ingestion rule. For example, if the rule uses a batching window, each data file's last modification time is used to allocate it to the corresponding batch. Likewise, when a rule uses a delete action in OnSuccess, all matched source data files are deleted.

In non-direct mode, the original data files are never deleted. To avoid processing the same file again across separate bqtail runs, use the -h or -X option to record every successfully processed file in a history file.

By default, only streaming mode stores the history file, in the file:///${env.HOME}/.bqtail location; otherwise, an in-memory filesystem is used.

Installation

OSX(amd64)
wget https://github.com/viant/bqtail/releases/download/v2.10.3/bqtail_osx_amd64_2.10.3.tar.gz
tar -xvzf bqtail_osx_amd64_2.10.3.tar.gz
cp bqtail /usr/local/bin/
OSX(arm64)
wget https://github.com/viant/bqtail/releases/download/v2.10.3/bqtail_osx_arm64_2.10.3.tar.gz
tar -xvzf bqtail_osx_arm64_2.10.3.tar.gz
cp bqtail /usr/local/bin/
Linux(amd64)
wget https://github.com/viant/bqtail/releases/download/v2.10.3/bqtail_amd64_linux_2.10.3.tar.gz
tar -xvzf bqtail_amd64_linux_2.10.3.tar.gz
cp bqtail /usr/local/bin/
Linux(arm64)
wget https://github.com/viant/bqtail/releases/download/v2.10.3/bqtail_arm64_linux_2.10.3.tar.gz
tar -xvzf bqtail_arm64_linux_2.10.3.tar.gz
cp bqtail /usr/local/bin/

Building from the source

git clone https://github.com/viant/bqtail.git
cd bqtail/cmd/bqtail
go build

Usage

Make sure that a temp dataset exists in the project; the generated default rule uses it as its transient dataset.

Data ingestion rule validation

To validate a rule, use the -V option.

bqtail -r='myRuleURL' -p=myProject -V
bqtail -s=mydatafile -d='myProject:mydataset.mytable' -V
bqtail -r=gs://MY_CONFIG_BUCKET/BqTail/Rules/sys/bqjob.yaml -V

Local data file ingestion

bqtail -s=mydatafile -d='myProject:mydataset.mytable' -b=myGCSBucket

Google Storage file ingestion

The following command creates a default ingestion rule and uses it to ingest data directly from Google Storage:

bqtail -s=gs://myBucket/folder/mydatafile.csv -d='myProject:mydataset.mytable'

The command ingests data into the destination table and produces the following rule:

Async: true
Dest:
  Table: myProject:mydataset.mytable
  Transient:
    Alias: t
    Dataset: temp
    ProjectID: myProject
Info:
  LeadEngineer: awitas
  URL: mem://localhost/BqTail/config/rule/performance.yaml
  Workflow: rule
OnSuccess:
- Action: delete
  Request:
    URLs: $LoadURIs
When:
  Prefix: /folder/
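To make the relationship between the command-line flags and the generated rule concrete, here is a hypothetical Go sketch that derives the same field values from `-s` and `-d`. The struct types and `defaultRule` are invented for illustration (the `Info` section is omitted), and the derivation logic (prefix taken from the object's folder, project parsed from `project:dataset.table`) is an assumption inferred from the example above, not bqtail's implementation.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
	"path"
	"strings"
)

// Hypothetical Go mirror of the rule fields shown above.
type Transient struct {
	Alias     string
	Dataset   string
	ProjectID string
}

type Dest struct {
	Table     string
	Transient Transient
}

type Action struct {
	Action  string
	Request map[string]string
}

type When struct {
	Prefix string
}

type Rule struct {
	Async     bool
	Dest      Dest
	OnSuccess []Action
	When      When
}

// defaultRule derives rule fields from -s (a gs:// URL) and -d (project:dataset.table).
func defaultRule(source, destTable string) (Rule, error) {
	u, err := url.Parse(source)
	if err != nil {
		return Rule{}, err
	}
	if u.Scheme != "gs" {
		return Rule{}, fmt.Errorf("expected a gs:// URL, got %q", source)
	}
	project, _, ok := strings.Cut(destTable, ":")
	if !ok {
		return Rule{}, fmt.Errorf("expected project:dataset.table, got %q", destTable)
	}
	// The filter prefix is the folder holding the source object, e.g. "/folder/".
	prefix := strings.TrimSuffix(path.Dir(u.Path), "/") + "/"
	return Rule{
		Async: true,
		Dest: Dest{
			Table:     destTable,
			Transient: Transient{Alias: "t", Dataset: "temp", ProjectID: project},
		},
		// Delete the loaded source files once the load succeeds.
		OnSuccess: []Action{{Action: "delete", Request: map[string]string{"URLs": "$LoadURIs"}}},
		When:      When{Prefix: prefix},
	}, nil
}

func main() {
	rule, err := defaultRule("gs://my-bucket/folder/mydatafile.csv", "myProject:mydataset.mytable")
	if err != nil {
		panic(err)
	}
	out, _ := json.MarshalIndent(rule, "", "  ")
	fmt.Println(string(out))
}
```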

You can save this rule as performance.yaml, extend or customize it, and then ingest data with the updated rule:

bqtail -s=gs://myBucket/folder/mydatafile.csv -r=performance.yaml

Local data ingestion with data ingestion rule

bqtail -s=mydatafile -r='myRuleURL'  -b=myGCSBucket

Local data files ingestion

bqtail -s=mylocaldatafolder -d='myProject:mydataset.mytable' -b=myGCSBucket

Local data files ingestion in batch with 120 sec window

bqtail -s=mylocaldatafolder -d='myProject:mydataset.mytable' -w=120  -b=myGCSBucket

Local data files streaming ingestion with rule

bqtail -s=mylocaldatafolder -r='myRuleURL' -X 

Local data files ingestion in batch with 120 sec window with processed file tracking

bqtail -s=mylocaldatafolder -d='myProject:mydataset.mytable' -w=120 -h=~/.bqtail

Authentication

The BqTail client can use one of the following authentication methods:

  1. With the BqTail BigQuery OAuth client (default)
  • no environment setting needed

  2. With Google Service Account secrets

export GOOGLE_APPLICATION_CREDENTIALS=myGoogle.secret

  3. With gcloud/gsutil authentication

gcloud config set project my-project
gcloud auth login
export GCLOUD_AUTH=true

  4. With a custom BigQuery OAuth client, using the -c switch

bqtail -c=pathTo/custom.json

where pathTo/custom.json contains:

{
  "Id": "xxxx.apps.googleusercontent.com",
  "Secret": "xxxxxx"
}
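The sketch below shows how the four options could be told apart and how a `custom.json` file with the `Id` and `Secret` fields decodes. `OAuthClient`, `parseClient` and `authMethod` are hypothetical names, and the precedence order in `authMethod` (custom client, then service account, then gcloud, then the default client) is an assumption for illustration, not documented bqtail behavior.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"os"
)

// OAuthClient mirrors the custom.json fields passed with -c.
type OAuthClient struct {
	Id     string
	Secret string
}

// parseClient decodes custom.json content and checks that both fields are set.
func parseClient(data []byte) (OAuthClient, error) {
	var c OAuthClient
	if err := json.Unmarshal(data, &c); err != nil {
		return OAuthClient{}, err
	}
	if c.Id == "" || c.Secret == "" {
		return OAuthClient{}, errors.New("custom client requires both Id and Secret")
	}
	return c, nil
}

// authMethod picks one of the four methods; the precedence is an assumption.
func authMethod(customClientPath string, getenv func(string) string) string {
	switch {
	case customClientPath != "":
		return "custom OAuth client (-c)"
	case getenv("GOOGLE_APPLICATION_CREDENTIALS") != "":
		return "service account secrets"
	case getenv("GCLOUD_AUTH") == "true":
		return "gcloud authentication"
	default:
		return "BqTail BigQuery OAuth client"
	}
}

func main() {
	fmt.Println(authMethod("", os.Getenv))
	c, err := parseClient([]byte(`{"Id": "xxxx.apps.googleusercontent.com", "Secret": "xxxxxx"}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(c.Id)
}
```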

Help:

bqtail -h