In [None]:
library(bigrquery)

# Intro to bigrquery

The following is a quick intro to the `bigrquery` R library. More specifically, it's a guide to getting your data out of the BigQuery database and into your R notebooks. The workflow we'll be using requires your queries to be written in SQL.

## SQL Basics

Mostly you'll be sending instructions to the BigQuery database in the form of `SELECT * FROM`. If that means absolutely nothing to you, or you want to brush up on your SQL skills, I recommend the following courses on Kaggle. They're particularly good in this case as they are taught using the BigQuery SQL syntax, so it's more-or-less exactly the tool you'll be using here. The lessons are scripted using the Python BigQuery API, but don't let that put you off - the actual nuts-and-bolts bits of the lessons, the bits that you'll be interacting with, are all SQL. Hey, it might even be useful as an intro to the Python API if you ever want to use that (the Python API is leagues better than the R one, so I'd recommend using that if you ever want to properly pipeline any data using BigQuery).

* Intro to SQL - https://www.kaggle.com/learn/intro-to-sql
* Advanced SQL - https://www.kaggle.com/learn/advanced-sql

On a related note, if you're not familiar with Kaggle, I highly recommend giving it a look. It's an interactive data science platform where you can build and run code, look at other people's work, and enter machine learning competitions. It's an amazing resource for anybody interested in data science (and particularly machine learning).

## Downloading a Table from BigQuery

Once you have SQL down, pick one of the tables in your BigQuery environment to take a gander at. You'll need the full table id, so that's:

1. the project id - that is always "yhcr-prd-phm-bia-core" (catchy eh?!)
2. your dataset id - usually in the form "CY_XXXX_surname", the Xs being numbers and "surname" being your surname. You might have access to other datasets with a totally different naming structure, but you get the idea
3. your table id - this will differ depending on the table

You should end up with a table ID that looks something like "yhcr-prd-phm-bia-core.CY_1715_example_surname.tbl_example_table". You can then stick it in this basic SQL query to pull the whole table:

```
SELECT * 
FROM `yhcr-prd-phm-bia-core.CY_1715_example_surname.tbl_example_table`
```

**Quick warning**: Always stick your table names inside backticks \`like this\`. BigQuery can get confused by hyphens (-) in queries and, helpfully, YHCR have decided to give us a project name with hyphens in it, so the backticks stop the database engine from thinking the hyphens are subtraction operators.

Once you have your SQL query, store it as a string like so:

In [None]:
sql_query <- "SELECT * FROM `yhcr-prd-phm-bia-core.CY_1715_example_surname.tbl_example_table`"
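If you'd rather not type the full table id out by hand, here's a rough sketch of building the same query string from the three parts listed above. The variable names `dataset_id`, `table_id`, `full_table_id` and `built_query` are just ones I've made up for illustration, and the dataset and table values are the example placeholders from earlier, so swap in your own:

```r
# The three parts of the full table id (example values - replace with your own)
project_id <- "yhcr-prd-phm-bia-core"
dataset_id <- "CY_1715_example_surname"  # illustrative placeholder
table_id   <- "tbl_example_table"        # illustrative placeholder

# Join the parts with dots to get the full table id
full_table_id <- paste(project_id, dataset_id, table_id, sep = ".")

# Wrap the id in backticks so the hyphens aren't read as subtraction
built_query <- sprintf("SELECT * FROM `%s`", full_table_id)
```

This should give you exactly the same string as typing the query out by hand, which makes it a bit easier to swap between tables.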

You can then send your query to the BigQuery database engine like so:

In [None]:
project_id <- "yhcr-prd-phm-bia-core"

my_query <- bq_project_query(project_id, sql_query)

That tells BigQuery to run your SQL instructions, and store the output in a temporary location in the cloud that `my_query` points to. Then, you can download the information in `my_query` like so:

In [None]:
# Download the query result into R (named my_table to avoid masking base R's table() function)
my_table <- bq_table_download(my_query)

And that's all there is to it really! You've downloaded your table into an R "tibble" object - basically the dataframe that you're all used to. You can now go ahead and use it like you would any other dataframe you've loaded. Enjoy!
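If you find yourself repeating the query-then-download steps a lot, one option is to wrap them in a small helper. This is only a sketch: `run_bq_query` is a name I've invented (it isn't part of `bigrquery`), and it assumes you're already authenticated with BigQuery. It passes `n_max` through to `bq_table_download()` so you can pull just the first few rows of a big table as a preview.

```r
# Hypothetical helper (not part of bigrquery): run a SQL query and
# download the result as a tibble in one step.
run_bq_query <- function(sql, project = "yhcr-prd-phm-bia-core", n_max = Inf) {
  # Expect a single SQL string
  stopifnot(is.character(sql), length(sql) == 1)

  # Run the query; the result points at a temporary table in the cloud
  query_result <- bigrquery::bq_project_query(project, sql)

  # Download up to n_max rows of that temporary table
  bigrquery::bq_table_download(query_result, n_max = n_max)
}

# Example usage (needs BigQuery access, so it's commented out here):
# preview <- run_bq_query(sql_query, n_max = 100)
```

Bear in mind that `n_max` only limits how many rows are downloaded; the query itself still runs in full on the BigQuery side.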