# Introduction 

This worksheet provides fill-in-the-blank exercises, learning prompts and other activities to support your first steps in using Python within GCP.

This worksheet is divided into several core sections, corresponding to the sections covered in the demonstration. As Section 1 explores the BigQuery Studio environment, it contains no coding questions, only prompts to build your experience ahead of the upcoming deep dive workshop.

Please note that an accompanying Solutions Sheet provides the answers as needed!

## Section 1: Navigating Data in BigQuery Studio 

Using BigQuery Studio (and potentially going beyond the dataset we explored in the session), explore the metadata (variable names, previews, etc.) of the various datasets available within the GCP environment.

## Section 2a: Package Installation and Setup 
Run the following code to prepare the environment and load the relevant packages:

In [None]:
# Install a pip package in the current Jupyter kernel
# here the package is bigframes https://pypi.org/project/bigframes/
import sys
!{sys.executable} -m pip install bigframes | grep -v 'already satisfied'

import bigframes.pandas as bpd
import matplotlib.pyplot as plt
import pandas as pd

# Import the warnings filter and ignore all FutureWarnings
# This is for teaching purposes only, to hide FutureWarnings raised by the bigframes compiler implementation
from warnings import simplefilter
simplefilter(action='ignore', category=FutureWarning)
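
Outside a teaching setting, a narrower alternative is to silence warnings only around the call that raises them. The sketch below uses the standard library's `warnings.catch_warnings` context manager; `noisy` is a hypothetical stand-in for a library call, not part of bigframes.

```python
import warnings


def noisy():
    # Hypothetical function standing in for a library call that warns
    warnings.warn("this behaviour will change", FutureWarning)
    return 42


# The filter applies only inside the with-block; the previous
# warning settings are restored when the block exits
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=FutureWarning)
    result = noisy()

print(result)
```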

## Section 3a: Load in Data
Load in the Fire Brigade Data from GCP. 

In [None]:
# Fill in the Blanks

# Load data from BigQuery
query_or_table = "bigquery-public-data.london_fire_brigade.---" # [Insert the table name here]
bq_df = bpd.read_gbq(---, use_cache=False) # [Insert Variable Name Here]

# Double Check the column names 
list(---) # [Insert the DataFrame read in from BigQuery]

## Section 3b: Using Magic Commands to Manipulate Data 

Complete the following magic command to explore the number of outdoor fires by hour of call. This will produce a breakdown, per hour of call, of the number of fire callouts that fall under the Outdoor property category.

In [None]:
%%bigquery hour_of_call_outdoorcategory
SELECT
    ----, # [Insert Variable here]
    COUNT(DISTINCT incident_number) as incident_number_count
FROM 
    ---- # [Insert Dataset]
WHERE 
    property_category = "Outdoor"
GROUP BY 
    ---- # [Insert Variable here]
ORDER BY 
    hour_of_call ---; # [Insert Direction (ASC or DESC)]

In [None]:
# Let's check what results you got!

hour_of_call_outdoorcategory
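
If it helps to see what this kind of query computes, the sketch below reproduces the same filter, group and distinct-count pattern in pandas. The data and column names (`call_id`, `call_hour`, `category`) are made up for illustration and are not taken from the Fire Brigade table.

```python
import pandas as pd

# Made-up call records; the column names are illustrative only
calls = pd.DataFrame({
    "call_id": ["a1", "a1", "b2", "c3", "d4", "d4"],
    "call_hour": [0, 0, 0, 1, 1, 1],
    "category": ["Outdoor", "Outdoor", "Outdoor", "Indoor", "Outdoor", "Outdoor"],
})

# WHERE -> boolean filter, GROUP BY -> groupby,
# COUNT(DISTINCT ...) -> nunique, ORDER BY -> sort_values
outdoor_counts = (
    calls[calls["category"] == "Outdoor"]
    .groupby("call_hour")["call_id"]
    .nunique()
    .reset_index(name="call_id_count")
    .sort_values("call_hour")
)
print(outdoor_counts)
```

Repeated `call_id` values within an hour are counted once, which is why a distinct count is used rather than a plain row count.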

Next, have a go at producing a breakdown per incident group for incidents reported within a dwelling (property category), using your knowledge of SQL and magic commands to manipulate the data.

In [None]:
%%bigquery hour_of_call_dwellingcategory

## Section 3c: Basic Descriptive Statistics

As we have already explored all of the available numerical variables for the given dataset, let's jump across to a slightly different dataset to explore this skill further.

Using the staged prompts available, explore the dataset "bigquery-public-data.noaa_hurricanes.hurricanes", focusing on the numerical data and producing basic descriptive statistics for:

- wmo_wind 
- wmo_pressure 
- usa_wind
- usa_pressure
- cma_pressure

Please check the solution sheet for our answer; however, as with most programming, there are always multiple ways to conduct this analysis!


In [None]:
%%bigquery hurricane_description
SELECT 
    ----, # [Insert Variables here]
FROM 
    bigquery-public-data.noaa_hurricanes.hurricanes

In [None]:
hurricane_description.------ # [Insert describe function]
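
As one of those alternative routes, the sketch below builds a small summary table by naming the statistics explicitly with `DataFrame.agg`. The data and column names are made up for illustration; they are not drawn from the hurricanes table.

```python
import pandas as pd

# Made-up measurements with one missing value; column names are illustrative only
readings = pd.DataFrame({
    "wind": [30.0, 45.0, None, 60.0],
    "pressure": [1005.0, 990.0, 985.0, 970.0],
})

# Missing values are skipped, so "count" can differ between columns
summary = readings.agg(["count", "mean", "min", "max"])
print(summary)
```

Comparing the `count` row across columns is a quick way to spot variables with many missing readings before interpreting their means.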

## Section 3d: Building Cross Tabs 
Let's build some more complex crosstabs. For this we will explore 'stop_code_description' and 'property_category'.

In [None]:
%%bigquery crosstab_explore_data_stopcategory
SELECT 
    -----, # [Insert Variables]
FROM 
    bigquery-public-data.london_fire_brigade.fire_brigade_service_calls

In [None]:
# Crosstab in Pandas 
## https://pandas.pydata.org/docs/reference/api/pandas.crosstab.html
pd.crosstab(
    ------,
    ------) # [Insert the two elements needed]
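
For reference, here is a minimal sketch of how `pd.crosstab` behaves on a tiny made-up DataFrame. The column names `outcome` and `location` are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Made-up records; the column names are illustrative only
records = pd.DataFrame({
    "outcome": ["Fire", "False Alarm", "Fire", "Fire", "False Alarm"],
    "location": ["Outdoor", "Dwelling", "Dwelling", "Outdoor", "Dwelling"],
})

# The first argument becomes the rows, the second the columns;
# each cell counts the records sharing that pair of values
table = pd.crosstab(records["outcome"], records["location"])
print(table)
```

Pairs that never occur together appear as 0 rather than being dropped from the table.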

Taking this further, we can now explore the breakdown between "stop_code_description" and "property_type".

In [None]:
%%bigquery crosstab_explore_data_stoptype
SELECT 
    -----, # [Insert Variables]
FROM 
    bigquery-public-data.london_fire_brigade.fire_brigade_service_calls

In [None]:
# Crosstab in Pandas 
## https://pandas.pydata.org/docs/reference/api/pandas.crosstab.html
pd.crosstab(
    ------,
    ------) # [Insert the two elements needed]

## Section 4: Data Visualisation 

Once again using the results from Section 3b, we can create visualisations using matplotlib! 

In [None]:
# Bar Chart 1: 
    # For this we will want to call hour of call vs incident number count 
plt.bar(hour_of_call_outdoorcategory.----, # [Call our X Variable]
        hour_of_call_outdoorcategory.----) # [Call our Y variable]
# Label our X and Y axes and add a title 
plt.xlabel('----') # [Give a relevant X Label]
plt.ylabel('----') # [Give a relevant Y Label]
plt.title('Number of Incidents recorded for Outdoor Callouts per Hour')

In [None]:
# Bar Chart 2: 
    # For this we will want to call Incident Group vs incident number count 
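
When the X variable is categorical rather than numeric, the same bar chart pattern applies, although long labels often need rotating. The sketch below assumes a small made-up DataFrame (`group`, `n_incidents`) and uses matplotlib's axes interface; it is an illustration, not the worksheet solution.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up counts per category; names and numbers are illustrative only
counts = pd.DataFrame({
    "group": ["Group A", "Group B", "Group C"],
    "n_incidents": [12, 30, 7],
})

fig, ax = plt.subplots()
bars = ax.bar(counts["group"], counts["n_incidents"])
ax.set_xlabel("Group")
ax.set_ylabel("Number of incidents")
ax.set_title("Incidents per group (made-up data)")
ax.tick_params(axis="x", labelrotation=45)  # tilt long category labels
fig.tight_layout()
```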

## Extension

If you would like to take these skills further, we suggest rerunning these exercises with the Hurricanes dataset used in Section 3c, focusing on variables such as 'mlc_class', 'mlc_wind' and 'mlc_pressure' and how they vary from other metrics such as nadi, usa and cma. These will not have solutions here, but we will be happy to discuss them during the deep dive to help fuel your imagination!
