bq-schema-validator is a CLI tool that detects the log entry causing a BigQuery table schema error in a log sink.
BigQuery creates a table's schema (fields, types, etc.) the first time a log router sends container logs to the table. If a field later arrives with a type different from the original one, a log sink error occurs. For example, suppose a log with a string log_id field is sent first, and the application later changes log_id to an int. The next log fails with a type mismatch, so we need to detect the change. However, the error is only reported by email and in the activity stream the first time it happens, which makes it hard to find.
To solve this issue, this CLI detects logs whose field types differ from the existing types in the schema. As far as I know, schema mismatches usually happen in top-level fields, so nested fields aren't checked.
- Permission to view the container log
- Permission to view the table schema (roles/bigquery.dataViewer for the GCP project)
$ gcloud auth application-default login
$ kubectl logs POD_NAME [CONTAINER_NAME] | bq-schema-validator -project GCP_PROJECT_ID -dataset BIGQUERY_DATASET_ID -table TABLE_ID
$ bq-schema-validator -project GCP_PROJECT_ID -dataset BIGQUERY_DATASET_ID -table TABLE_ID -file FILE_PATH_FOR_CONTAINER_LOGS
-dataset string
[Required] A name of BigQuery dataset
-file string
[Optional] A file path
-ignore string
[Optional] Ignore field when comparing log and schema
-project string
[Required] A name of GCP project
-table string
[Required] A name of BigQuery table
-target string
[Optional] Target field when comparing log and schema
Download the binary from GitHub Releases and drop it in your $PATH
Data types in BigQuery schema
Name | Data Type | Description |
---|---|---|
Integer | INT64 | Numeric values without fractional components |
Floating point | FLOAT64 | Approximate numeric values with fractional components |
Numeric | NUMERIC | Exact numeric values with fractional components |
BigNumeric (Preview) | BIGNUMERIC | Exact numeric values with fractional components |
Boolean | BOOL | TRUE or FALSE (case insensitive) |
String | STRING | Variable-length character (Unicode) data |
Bytes | BYTES | Variable-length binary data |
Date | DATE | A logical calendar date |
Date/Time | DATETIME | A year, month, day, hour, minute, second, and subsecond |
Time | TIME | A time, independent of a specific date |
Timestamp | TIMESTAMP | An absolute point in time, with microsecond precision |
Struct (Record) | STRUCT | Container of ordered fields each with a type (required) and field name (optional) |
Geography | GEOGRAPHY | A pointset on the Earth's surface (a set of points, lines and polygons on the WGS84 reference spheroid, with geodesic edges) |