Usage samples
Upload a CSV file to create a new table
The samples/cities_20190506.csv file:
- is a CSV file, separated with ','
- has a header line
- has 127 records
Upload the file to the bigquery-autoload GCS bucket:

$> gsutil cp samples/cities_20190506.csv gs://bq-autoload
-> A new Staging.cities BigQuery table is automatically created and populated with data from the CSV file:
- The column names are extracted from the header line
- The column types are inferred from the data (see the schema check after the sample output below)
$> bq query -q "SELECT * FROM Staging.cities LIMIT 5"
+------+------+------+-------+------+------+------+----+-----------------+-------+
| LatD | LatM | LatS |  NS   | LonD | LonM | LonS | EW |      City       | State |
+------+------+------+-------+------+------+------+----+-----------------+-------+
|   39 |   45 |    0 | false |   75 |   33 |    0 | W  | Wilmington      | DE    |
|   41 |   15 |    0 | false |   77 |    0 |    0 | W  | Williamsport    | PA    |
|   26 |   43 |   11 | false |   80 |    3 |    0 | W  | West Palm Beach | FL    |
|   36 |   40 |   11 | false |  121 |   39 |    0 | W  | Salinas         | CA    |
|   50 |   25 |   11 | false |  104 |   39 |    0 | W  | Regina          | SA    |
+------+------+------+-------+------+------+------+----+-----------------+-------+

$> bq query -q --format json "SELECT COUNT(*) as Count FROM Staging.cities"
[{"Count":"127"}]
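To see exactly which types auto-detection picked for each column, the resulting table's schema can be inspected with the standard bq CLI (shown purely as an illustration, this is not part of the autoload flow):

$> bq show --schema --format=prettyjson Staging.cities

For this sample file, the latitude/longitude columns should come back as INTEGER, NS as BOOLEAN, and EW, City and State as STRING.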
Upload a CSV file, adding data and a new column to the existing table
The samples/cities_20190507.csv file:
- has the same structure as samples/cities_20190506.csv
- has one additional Comment column
- has 1 record
Upload the file to the bigquery-autoload GCS bucket:
$> gsutil cp samples/cities_20190507.csv gs://bq-autoload
-> The Staging.cities BigQuery table is updated:
- The Comment column has been created with an inferred String type and a null default value
- The new line from the samples/cities_20190507.csv file has been added (see the equivalent manual load sketched after the sample output below)
$> bq query -q "SELECT * FROM Staging.cities LIMIT 5"
+------+------+------+-------+------+------+------+----+-----------------+-------+------------------------------------------------------+
| LatD | LatM | LatS |  NS   | LonD | LonM | LonS | EW |      City       | State |                       Comment                        |
+------+------+------+-------+------+------+------+----+-----------------+-------+------------------------------------------------------+
|   41 |    5 |   59 | false |   80 |   39 |    0 | W  | Youngstown      | OH    | This is a new line appended from cities_20190507.csv |
|   39 |   45 |    0 | false |   75 |   33 |    0 | W  | Wilmington      | DE    | NULL                                                 |
|   41 |   15 |    0 | false |   77 |    0 |    0 | W  | Williamsport    | PA    | NULL                                                 |
|   26 |   43 |   11 | false |   80 |    3 |    0 | W  | West Palm Beach | FL    | NULL                                                 |
|   36 |   40 |   11 | false |  121 |   39 |    0 | W  | Salinas         | CA    | NULL                                                 |
+------+------+------+-------+------+------+------+----+-----------------+-------+------------------------------------------------------+

$> bq query -q --format json "SELECT COUNT(*) as Count FROM Staging.cities"
[{"Count":"128"}]
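For reference, the schema-evolving append shown above behaves roughly like the following manual load, which appends rows while allowing new columns to be added (illustration only; the autoload Cloud Function drives the equivalent options through the BigQuery load-job API, so this exact command is an assumption, not the function's code):

$> bq load --source_format=CSV --autodetect --skip_leading_rows=1 \
           --noreplace --schema_update_option=ALLOW_FIELD_ADDITION \
           Staging.cities gs://bq-autoload/cities_20190507.csv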
Upload a CSV file, using custom metadata to specify an arbitrary table, and specify TRUNCATE mode to replace all existing data
The samples/export_cities.20190508.csv file:
- has the same structure as samples/cities_20190506.csv (i.e. no Comment column)
- has 10 records
Uploading this file as is would normally cause a new export_cities table to be created.
We will upload this file with additional custom metadata to override this behaviour:
$> gsutil -h "x-goog-meta-bigquery.configuration.load.destinationTable.tableId: cities" \
          -h "x-goog-meta-bigquery.configuration.load.writeDisposition: WRITE_TRUNCATE" \
          cp samples/export_cities.20190508.csv gs://bq-autoload
-> The Staging.cities BigQuery table is updated:
- All existing rows have been deleted (due to the WRITE_TRUNCATE mode being used), and the 10 records from the CSV file have been inserted
- The Comment column has been deleted
$> bq query -q "SELECT * FROM Staging.cities LIMIT 5"
+------+------+------+-------+------+------+------+----+-----------------+-------+
| LatD | LatM | LatS |  NS   | LonD | LonM | LonS | EW |      City       | State |
+------+------+------+-------+------+------+------+----+-----------------+-------+
|   41 |    5 |   59 | false |   80 |   39 |    0 | W  | Youngstown      | OH    |
|   39 |   45 |    0 | false |   75 |   33 |    0 | W  | Wilmington      | DE    |
|   41 |   15 |    0 | false |   77 |    0 |    0 | W  | Williamsport    | PA    |
|   26 |   43 |   11 | false |   80 |    3 |    0 | W  | West Palm Beach | FL    |
|   36 |   40 |   11 | false |  121 |   39 |    0 | W  | Salinas         | CA    |
+------+------+------+-------+------+------+------+----+-----------------+-------+

$> bq query -q --format json "SELECT COUNT(*) as Count FROM Staging.cities"
[{"Count":"10"}]
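If a load driven by custom metadata does not behave as expected, a quick way to verify which keys were actually attached to the uploaded object is gsutil stat (standard gsutil command, used here only as a debugging aid):

$> gsutil stat gs://bq-autoload/export_cities.20190508.csv

The Metadata section of the output should list the bigquery.configuration.load.* keys passed with the -h flags above.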
Upload a CSV file without a header into a new table, configuring column names and types using a mappings file
The samples/cities_noheader_20190506.csv file:
- has the same structure as samples/cities_20190506.csv
- has no header line
- has 10 records
Uploading this file as is would normally result in a new cities_noheader table being created, with generic column names like {inferredType}_field_{n}:
$> gsutil cp samples/cities_noheader_20190506.csv gs://bq-autoload

$> bq query -q "SELECT * FROM Staging.cities_noheader LIMIT 1"
+---------------+---------------+---------------+--------------+-----+
| int64_field_0 | int64_field_1 | int64_field_2 | bool_field_3 | ... |
+---------------+---------------+---------------+--------------+-----+
|            39 |            45 |             0 | false        | ... |
+---------------+---------------+---------------+--------------+-----+
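If you ran the upload above, you may want to drop this intermediate table before continuing, so the mapping-based example below starts from a clean slate (standard bq command; it assumes the table was created in the Staging dataset as shown):

$> bq rm -f -t Staging.cities_noheader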
The samples/mappings/cities_noheader_yyyyMMdd.json file defines a mapping matching the header-less CSV file name (/\/cities_noheader_\d{8}\.csv$/) and specifies the column names and types to be used:
// File: mappings/cities_noheader.hbs
{
  {{#regex-match file.name 'cities_noheader_\d{8}\.csv$'}}
  configuration: {
    load: {
      schema: {
        fields: [
          { name: "LatD",  type: "INTEGER" },
          { name: "LatM",  type: "INTEGER" },
          { name: "LatS",  type: "INTEGER" },
          { name: "NS",    type: "BOOLEAN" },
          { name: "LonD",  type: "INTEGER" },
          { name: "LonM",  type: "INTEGER" },
          { name: "LonS",  type: "INTEGER" },
          { name: "EW",    type: "STRING" },
          { name: "City",  type: "STRING" },
          { name: "State", type: "STRING" }
        ]
      }
    }
  }
  {{/regex-match}}
}
Uploading this mapping file to the mappings directory of the autoload GCS bucket lets the Cloud Function parse it and modify the load configuration for all files matching the cities_noheader_yyyyMMdd.csv pattern.
Uploading the samples/cities_noheader_20190506.csv file again will now yield a different result:
$> gsutil cp samples/mappings/cities_noheader_yyyyMMdd.json gs://bq-autoload/mappings/cities_noheader_yyyyMMdd.json
$> gsutil cp samples/cities_noheader_20190506.csv gs://bq-autoload
-> The Staging.cities_noheader BigQuery table is created:
- The column names and types are extracted from the load job configuration
$> bq query -q "SELECT * FROM Staging.cities_noheader LIMIT 1"
+------+------+------+-------+------+------+------+----+----------+-------+
| LatD | LatM | LatS |  NS   | LonD | LonM | LonS | EW |   City   | State |
+------+------+------+-------+------+------+------+----+----------+-------+
|   49 |   52 |   48 | false |   97 |    9 |    0 | W  | Winnipeg | MB    |
+------+------+------+-------+------+------+------+----+----------+-------+
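For comparison, the load configuration generated from this mapping is roughly equivalent to a manual load with an explicit inline schema (illustration only; the autoload function builds the schema from the mapping file, this command is not part of the flow):

$> bq load --source_format=CSV Staging.cities_noheader \
           gs://bq-autoload/cities_noheader_20190506.csv \
           LatD:INTEGER,LatM:INTEGER,LatS:INTEGER,NS:BOOLEAN,LonD:INTEGER,LonM:INTEGER,LonS:INTEGER,EW:STRING,City:STRING,State:STRING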