Changing and overriding default behaviour
When fired, the Cloud Function creates a BigQuery load job for the triggering file.
This job will be auto-configured with sensible default options, but you can alter these options using:
- Environment variables
- Custom metadata on the file
- Mapping files
These environment variables can be set when deploying the Cloud Function to override the default behaviour without editing mappings:
- PROJECT_ID: GCP Project ID. String, mandatory, no default value.
- DATASET_ID: Default dataset for the destination table. String, defaults to Staging.
- CREATE_DISPOSITION: Should new tables be automatically created. CREATE_IF_NEEDED|CREATE_NEVER, defaults to CREATE_IF_NEEDED.
- WRITE_DISPOSITION: How new data for an existing table should be processed. WRITE_TRUNCATE|WRITE_APPEND|WRITE_EMPTY, defaults to WRITE_APPEND.
- ENCODING: Encoding of the file. UTF-8|ISO-8859-1, defaults to UTF-8.
- DRY_RUN: Dry run. True|False, defaults to False.
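To make the defaulting rules concrete, here is a minimal Python sketch of how the variables above could be resolved. It is an illustration only, not the Cloud Function's actual code: the helper name resolve_settings and the validation of allowed values are assumptions, and only the variable names, defaults, and allowed values come from the list above.

```python
import os
from typing import Mapping, Optional

# Defaults copied from the list above; PROJECT_ID has none and is mandatory.
DEFAULTS = {
    "DATASET_ID": "Staging",
    "CREATE_DISPOSITION": "CREATE_IF_NEEDED",
    "WRITE_DISPOSITION": "WRITE_APPEND",
    "ENCODING": "UTF-8",
    "DRY_RUN": "False",
}

# Allowed values copied from the list above.
ALLOWED = {
    "CREATE_DISPOSITION": {"CREATE_IF_NEEDED", "CREATE_NEVER"},
    "WRITE_DISPOSITION": {"WRITE_TRUNCATE", "WRITE_APPEND", "WRITE_EMPTY"},
    "ENCODING": {"UTF-8", "ISO-8859-1"},
    "DRY_RUN": {"True", "False"},
}


def resolve_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Hypothetical helper: overlay environment variables on the documented defaults."""
    env = os.environ if env is None else env
    if not env.get("PROJECT_ID"):
        raise ValueError("PROJECT_ID is mandatory and has no default value")
    settings = {"PROJECT_ID": env["PROJECT_ID"]}
    for name, default in DEFAULTS.items():
        value = env.get(name, default)
        # Rejecting unknown values is an assumption of this sketch.
        if name in ALLOWED and value not in ALLOWED[name]:
            raise ValueError(f"{name}={value!r} is not one of {sorted(ALLOWED[name])}")
        settings[name] = value
    return settings
```

Calling resolve_settings({"PROJECT_ID": "my-project", "WRITE_DISPOSITION": "WRITE_TRUNCATE"}) keeps every documented default except the write disposition, which is how a deployment-time override is expected to behave.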
Mapping files let you define options for a specific file, or for all files matching a specific pattern.
A mapping file is a handlebars template file that results in a JSON document defining a JobConfiguration object.
See https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobConfigurationLoad for the complete list of configuration options.
The handlebars templating engine is mainly used for:
- Variable substitution
- Conditional generation of sections
Additional variables are made available to the templating engine:
- Environment variables are available under the env prefix.
- The file that triggered the Cloud Function is available as the file variable.
A few handlebars helpers are pre-loaded and available for use in the mapping file templates:
- The complete collection of helpers from the handlebars-helpers module.
- regex-match: The enclosed section will be rendered only if the variable value matches the pattern. Named groups are made available as the new context inside the block.
    {{#regex-match variable pattern}} // ... content {{/regex-match}}
- assign: The enclosed section value will be evaluated and assigned to variable for later use in the template.
    {{#assign variable}}value{{/assign}}
All files of the autoload bucket matching the pattern /mappings/**/*.hbs will be aggregated to obtain the full mapping configuration.
Therefore, you can add any number of arbitrary .hbs files to the /mappings/ directory, defining the mappings you want to use.
These mappings can be specific to a single file, or a file pattern matching multiple similar files you want to process the same way.
Example:
The following file will instruct bigquery-autoloader to load data from export_{table}_{yyyyMMdd}.csv into the {table} table rather than export_{table}.
E.g. export_cities_20190506.csv will be loaded into the cities table.
    // File: mappings/export_TABLE_yyyyMMdd.hbs
    {{#regex-match file.name "\/export_(?<TABLE_ID>.*)_\d{8}\.csv$" }}
    {
      "configuration.load.destinationTable.tableId": "{{TABLE_ID}}",
      "configuration.load.writeDisposition": "WRITE_TRUNCATE"
    }
    {{/regex-match}}
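The capture performed by the pattern in this mapping can be checked outside the template. The following Python sketch applies the same regular expression to a few file names. Two things are assumptions: Python spells the named group (?P<TABLE_ID>...) rather than (?<TABLE_ID>...), and the helper table_for together with the use of an unanchored search are illustrative choices, not the regex-match helper's documented behaviour.

```python
import re

# Same pattern as in the mapping above, using Python's (?P<name>...) spelling
# for the named group.
PATTERN = re.compile(r"/export_(?P<TABLE_ID>.*)_\d{8}\.csv$")


def table_for(file_name: str):
    """Hypothetical helper: return the captured TABLE_ID, or None if there is no match."""
    match = PATTERN.search(file_name)
    return match.group("TABLE_ID") if match else None


# Hypothetical file names used only to exercise the pattern.
print(table_for("/export_cities_20190506.csv"))
print(table_for("exports/export_big_cities_20190506.csv"))
print(table_for("export_cities_20190506.csv"))
```

Because the pattern begins with a literal slash, a name with no slash in front of export_ does not match in this sketch, so it is worth checking what file.name contains for objects stored at the root of the bucket.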
HJSON
For convenience, the resulting file is parsed using hjson, so that the syntax is a bit more permissive.
You can use comments, or forget commas and quotes; hjson will try (and most likely succeed) to parse your file.
DOT-OBJECT
dot-object is used to expand properties named with dot-notation.
This JSON object:
    {
      "configuration.load.destinationTable.datasetId": "myDataset",
      "configuration.load.destinationTable.tableId": "myTable"
    }
will be parsed as:
    {
      "configuration": {
        "load": {
          "destinationTable": {
            "datasetId": "myDataset",
            "tableId": "myTable"
          }
        }
      }
    }
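The expansion of dot-notation keys can be sketched in a few lines of Python. This is not the dot-object module itself, only a minimal illustration of the transformation described above; the function name expand_dotted is hypothetical, and conflicting keys (for example "a" and "a.b" in the same object) are not handled.

```python
def expand_dotted(flat: dict) -> dict:
    """Hypothetical helper: turn {"a.b.c": 1} into {"a": {"b": {"c": 1}}}."""
    nested: dict = {}
    for dotted_key, value in flat.items():
        parts = dotted_key.split(".")
        node = nested
        # Walk (and create) the intermediate objects for all but the last part.
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return nested


flat = {
    "configuration.load.destinationTable.datasetId": "myDataset",
    "configuration.load.destinationTable.tableId": "myTable",
}
print(expand_dotted(flat))
```

Keys that share a prefix end up in the same nested object, which is why the two destinationTable properties land side by side.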
Any custom metadata of the file whose key is prefixed with "bigquery." will be added to the job configuration.
Options specified in custom metadata take precedence and override existing configuration options if present.
Dotted notation is used to define nested properties.
Example: Changing the table name and dataset by setting custom metadata at upload time
    gsutil -h "x-goog-meta-bigquery.configuration.load.destinationTable.datasetId: Test" \
           -h "x-goog-meta-bigquery.configuration.load.destinationTable.tableId: City" \
           cp "samples/cities_20190506.csv" "gs://bq-autoload/"
Note: Custom metadata keys must be prefixed with x-goog-meta- when using gsutil to upload the file.
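The precedence rule for custom metadata can be illustrated with a short Python sketch. It assumes that the metadata reaches the function as a flat key/value mapping without the x-goog-meta- prefix, and that the "bigquery." prefix is stripped before the remaining dotted key is applied. The function name apply_metadata and the sample values are hypothetical.

```python
import copy


def apply_metadata(config: dict, metadata: dict, prefix: str = "bigquery.") -> dict:
    """Hypothetical helper: overlay 'bigquery.'-prefixed metadata on a job configuration."""
    result = copy.deepcopy(config)  # leave the mapping-derived configuration untouched
    for key, value in metadata.items():
        if not key.startswith(prefix):
            continue  # unrelated custom metadata is ignored
        parts = key[len(prefix):].split(".")
        node = result
        for part in parts[:-1]:
            child = node.get(part)
            if not isinstance(child, dict):
                child = {}
                node[part] = child
            node = child
        node[parts[-1]] = value  # metadata wins over any existing value
    return result


# Configuration as a mapping file might have produced it (sample values).
from_mapping = {
    "configuration": {
        "load": {
            "destinationTable": {"tableId": "cities"},
            "writeDisposition": "WRITE_TRUNCATE",
        }
    }
}
# Metadata as set by the gsutil command above, plus one unrelated key.
metadata = {
    "bigquery.configuration.load.destinationTable.datasetId": "Test",
    "bigquery.configuration.load.destinationTable.tableId": "City",
    "owner": "someone",
}
print(apply_metadata(from_mapping, metadata))
```

In this sketch the table ID from the mapping is replaced by the metadata value, the dataset ID is added, and options not mentioned in the metadata, such as writeDisposition, are left as they were.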