----------------------------------------

Copyright 2018 Google LLC 

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

[http://www.apache.org/licenses/LICENSE-2.0](http://www.apache.org/licenses/LICENSE-2.0)

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and limitations under the License.

----------------------------------------

# Energy Forecasting with AutoML Tables

To use this Colab notebook, copy it to your own Google Drive and open it with [Colaboratory](https://colab.research.google.com/) (or Colab). To run a cell, hold the Shift key and press Enter (or Return). Colab automatically displays the return value of the last line in each cell. Refer to [this page](https://colab.research.google.com/notebooks/welcome.ipynb) for more information on Colab.

You can run a Colab notebook on a hosted runtime in the cloud. The hosted VM times out after 90 minutes of inactivity, and you lose everything stored in memory, including your authentication data. If your session is disconnected (for example, because you closed your laptop) for less than the 90-minute inactivity limit, press 'RECONNECT' in the top right corner of your notebook to resume the session. After a Colab timeout, you need to:

1.   Re-run the initialization and authentication.
2.   Continue from where you left off. You may need to copy-paste the value of some variables such as the `dataset_name` from the printed output of the previous cells.

Alternatively you can connect your Colab notebook to a [local runtime](https://research.google.com/colaboratory/local-runtimes.html).

## 1. Project set up





Follow the [AutoML Tables documentation](https://cloud.google.com/automl-tables/docs/) to
* Create a Google Cloud Platform (GCP) project.
* Enable billing.
* Apply to whitelist your project.
* Enable AutoML API.
* Enable AutoML Tables API.
* Create a service account, grant required permissions, and download the service account private key.

You also need to upload your data into Google Cloud Storage (GCS) or BigQuery. For example, to use GCS as your data source
* Create a GCS bucket.
* Upload the training and batch prediction files.


**Warning:** Private keys must be kept secret. If your private key is exposed, revoke it immediately in the Google Cloud Console.



---



## 2. Initialize and authenticate
This section runs initialization and authentication. It creates an authenticated session, which is required to run any of the following sections.

### Install the client library
Run the following cell to install the client library using `pip`.

In [0]:
#@title Install AutoML Tables client library { vertical-output: true }

!pip install google-cloud-automl

### Authenticate using service account key
Run the following cell. Click the 'Choose Files' button and select the service account private key file. If your service account key file or its folder is hidden, you can reveal it on a Mac by pressing <b>Command + Shift + .</b>

In [0]:
#@title Authenticate using service account key and create a client. { vertical-output: true }

from google.cloud import automl_v1beta1
from google.colab import files

# Upload service account key
keyfile_upload = files.upload()
keyfile_name = list(keyfile_upload.keys())[0]
# Authenticate and create an AutoML client.
client = automl_v1beta1.AutoMlClient.from_service_account_file(keyfile_name)
# Authenticate and create a prediction service client.
prediction_client = automl_v1beta1.PredictionServiceClient.from_service_account_file(keyfile_name)

### Set Project and Location

Enter your GCP project ID and location.

In [0]:
#@title GCP project ID and location

project_id = 'energy-forecasting' #@param {type:'string'}
location = 'us-central1' #@param {type:'string'}
location_path = client.location_path(project_id, location)
location_path



---



## 3. Import training data

### Create dataset

Select a dataset display name and pass your table source information to create a new dataset.

In [0]:
#@title Create dataset { vertical-output: true, output-height: 200 }

dataset_display_name = 'energy_forecasting_solution' #@param {type: 'string'}

create_dataset_response = client.create_dataset(
    location_path,
    {'display_name': dataset_display_name, 'tables_dataset_metadata': {}})
dataset_name = create_dataset_response.name
create_dataset_response

### Import data

You can import your data to AutoML Tables from GCS or BigQuery. For this tutorial, you can use the [iris dataset](https://storage.cloud.google.com/rostam-193618-tutorial/automl-tables-v1beta1/iris.csv) as your training data. You can create a GCS bucket and upload the data into your bucket. The URI for your file is `gs://BUCKET_NAME/FOLDER_NAME1/FOLDER_NAME2/.../FILE_NAME`. Alternatively, you can create a BigQuery table and upload the data into the table. The URI for your table is `bq://PROJECT_ID.DATASET_ID.TABLE_ID`.
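The two URI schemes map to two different input configurations. As a minimal sketch, the hypothetical helper `build_input_config` below (not part of the client library) picks the configuration from the URI prefix. It assumes the `gcs_source` and `bigquery_source` field layout of the v1beta1 API that this notebook uses.

```python
def build_input_config(uri):
    """Return an input_config dict for a gs:// or bq:// URI.

    Hypothetical helper for illustration; the dict layout is assumed
    to follow the v1beta1 InputConfig message.
    """
    if uri.startswith('gs://'):
        # A GCS URI needs both a bucket and an object path.
        bucket, _, object_path = uri[len('gs://'):].partition('/')
        if not bucket or not object_path:
            raise ValueError('Expected gs://BUCKET_NAME/.../FILE_NAME')
        # GCS sources take a list of file URIs.
        return {'gcs_source': {'input_uris': [uri]}}
    if uri.startswith('bq://'):
        # A BigQuery URI has the form bq://PROJECT_ID.DATASET_ID.TABLE_ID.
        parts = uri[len('bq://'):].rsplit('.', 2)
        if len(parts) != 3 or not all(parts):
            raise ValueError('Expected bq://PROJECT_ID.DATASET_ID.TABLE_ID')
        # BigQuery sources take a single table URI.
        return {'bigquery_source': {'input_uri': uri}}
    raise ValueError('URI must start with gs:// or bq://')


print(build_input_config('bq://energy-forecasting.Energy.automldata'))
```

The printed dictionary has the same shape as the `input_config` that the BigQuery cell in this section builds by hand.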

Importing data may take a few minutes or hours depending on the size of your data. If your Colab times out, run the following command to retrieve your dataset. Replace `dataset_name` with its actual value obtained in the preceding cells.

    dataset = client.get_dataset(dataset_name)
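If you no longer have the printed `dataset_name`, one option is to look the dataset up by its display name. The sketch below assumes that `client.list_datasets(location_path)` yields objects with `name` and `display_name` attributes. `find_by_display_name` is a hypothetical helper, and the stand-in objects exist only so that the example runs without a GCP connection.

```python
from types import SimpleNamespace


def find_by_display_name(resources, display_name):
    """Return the first resource whose display_name matches, else None."""
    for resource in resources:
        if resource.display_name == display_name:
            return resource
    return None


# Stand-in objects with made-up resource names, for illustration only.
fake_datasets = [
    SimpleNamespace(name='projects/p/locations/us-central1/datasets/TBL1',
                    display_name='some_other_dataset'),
    SimpleNamespace(name='projects/p/locations/us-central1/datasets/TBL2',
                    display_name='energy_forecasting_solution'),
]
match = find_by_display_name(fake_datasets, 'energy_forecasting_solution')
print(match.name)

# In this notebook, with the authenticated client, the call would be:
# dataset = find_by_display_name(
#     client.list_datasets(location_path), dataset_display_name)
# dataset_name = dataset.name
```

The same helper could be applied to `client.list_models(location_path)` to recover a model name. Display names may not be unique, so the helper returns only the first match.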

In [0]:
#@title Datasource in BigQuery { vertical-output: true }

dataset_bq_input_uri = 'bq://energy-forecasting.Energy.automldata' #@param {type: 'string'}
# Define input configuration.
input_config = {
    'bigquery_source': {
        'input_uri': dataset_bq_input_uri
    }
}

In [0]:
#@title Import data { vertical-output: true }

import_data_response = client.import_data(dataset_name, input_config)
print('Dataset import operation: {}'.format(import_data_response.operation))
# Wait until import is done.
import_data_result = import_data_response.result()
import_data_result

### Review the specs

Run the following command to see table specs such as row count.

In [0]:
#@title Table schema { vertical-output: true }

import google.cloud.automl_v1beta1.proto.data_types_pb2 as data_types

# List table specs
list_table_specs_response = client.list_table_specs(dataset_name)
table_specs = [s for s in list_table_specs_response]
# List column specs
table_spec_name = table_specs[0].name
list_column_specs_response = client.list_column_specs(table_spec_name)
column_specs = {s.display_name: s for s in list_column_specs_response}
[(x, data_types.TypeCode.Name(
  column_specs[x].data_type.type_code)) for x in column_specs.keys()]

The output of the cell above lists each column together with its inferred data type, that is, the inferred schema.

___

## 4. Update dataset: assign a label column and enable nullable columns

AutoML Tables automatically detects your data column type. For example, for the [Iris dataset](https://storage.cloud.google.com/rostam-193618-tutorial/automl-tables-v1beta1/iris.csv) it detects `species` to be categorical and `petal_length`, `petal_width`, `sepal_length`, and `sepal_width` to be numerical. Depending on the type of your label column, AutoML Tables chooses to run a classification or regression model. If your label column contains only numerical values, but they represent categories, change your label column type to categorical by updating your schema.

### Update a column: set as categorical

In [0]:
#@title Update dataset { vertical-output: true }

column_to_category = 'hour' #@param {type: 'string'}

update_column_spec_dict = {
    "name": column_specs[column_to_category].name,
    "data_type": {
        "type_code": "CATEGORY"
    }
}
update_column_response = client.update_column_spec(update_column_spec_dict)
update_column_response.display_name, update_column_response.data_type


### Update dataset: assign a label and split column

In [0]:
#@title Update dataset { vertical-output: true }

label_column_name = 'price' #@param {type: 'string'}
label_column_spec = column_specs[label_column_name]
label_column_id = label_column_spec.name.rsplit('/', 1)[-1]
print('Label column ID: {}'.format(label_column_id))

split_column_name = 'split' #@param {type: 'string'}
split_column_spec = column_specs[split_column_name]
split_column_id = split_column_spec.name.rsplit('/', 1)[-1]
print('Split column ID: {}'.format(split_column_id))
# Define the values of the fields to be updated.
update_dataset_dict = {
    'name': dataset_name,
    'tables_dataset_metadata': {
        'target_column_spec_id': label_column_id,
        'ml_use_column_spec_id': split_column_id,
    }
}
update_dataset_response = client.update_dataset(update_dataset_dict)
update_dataset_response
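The `rsplit('/', 1)[-1]` calls above rely on AutoML resource names being slash-separated paths whose last segment is the resource ID. The sketch below illustrates this with a made-up column spec name; the project, dataset, table spec, and column spec IDs are placeholders, not real values.

```python
def resource_id(resource_name):
    """Return the last path segment of a slash-separated resource name."""
    return resource_name.rsplit('/', 1)[-1]


# Made-up resource name for illustration only.
example_name = ('projects/my-project/locations/us-central1/'
                'datasets/TBL123/tableSpecs/456/columnSpecs/789')
print(resource_id(example_name))  # prints 789
```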

___

## 5. Creating a model

### Train a model
Specify the duration of the training. For example, `'train_budget_milli_node_hours': 1000` runs the training for one node hour. If your Colab times out, use `client.list_models(location_path)` to check whether your model has been created, then use the model name to continue to the next steps. Run the following command to retrieve your model. Replace `model_name` with its actual value.

    model = client.get_model(model_name)
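Because the budget is expressed in milli node hours, a value given in hours has to be multiplied by 1,000 before it goes into the model definition. A small sketch of that conversion follows. The helper name `to_milli_node_hours` is hypothetical, and the positivity check is an added assumption rather than a rule quoted from the API documentation.

```python
def to_milli_node_hours(node_hours):
    """Convert a training budget in node hours to milli node hours."""
    if node_hours <= 0:
        # Assumed sanity check: a zero or negative budget makes no sense.
        raise ValueError('Training budget must be positive')
    return int(round(node_hours * 1000))


print(to_milli_node_hours(1))   # prints 1000
print(to_milli_node_hours(12))  # prints 12000
```

With `model_train_hours = 12`, as in the cell below, the resulting budget is 12000 milli node hours.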

In [0]:
#@title Create model { vertical-output: true }



model_display_name = 'energy_model' #@param {type:'string'}
model_train_hours = 12 #@param {type:'integer'}
model_optimization_objective = 'MINIMIZE_MAE' #@param {type:'string'}
column_to_ignore = 'date_utc' #@param {type:'string'}

# Create list of features to use
feat_list = list(column_specs.keys())
feat_list.remove(label_column_name)
feat_list.remove(split_column_name)
feat_list.remove(column_to_ignore)

model_dict = {
    'display_name': model_display_name,
    'dataset_id': dataset_name.rsplit('/', 1)[-1],
    'tables_model_metadata': {
      'train_budget_milli_node_hours': model_train_hours * 1000,
      'optimization_objective': model_optimization_objective,
      'target_column_spec': column_specs[label_column_name],
      'input_feature_column_specs': [
            column_specs[x] for x in feat_list]}
    }
    
create_model_response = client.create_model(location_path, model_dict)
print('Create model operation: {}'.format(create_model_response.operation))
# Wait until model training is done.
create_model_result = create_model_response.result()
model_name = create_model_result.name
create_model_result

In [0]:
#@title Model Metrics {vertical-output: true }

metrics = list(client.list_model_evaluations(model_name))[-1]
metrics.regression_evaluation_metrics

![alt text](https://storage.googleapis.com/images_public/automl_test.png)
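The `MINIMIZE_MAE` objective chosen above targets the mean absolute error. To make the metric concrete, the sketch below computes MAE and RMSE by hand on four made-up price values (they are illustrative only, not results from this model). RMSE penalizes the single large miss more heavily than MAE does.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average of the absolute differences between actual and predicted."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the average squared difference."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Made-up values for illustration only.
actual = [50.0, 60.0, 55.0, 70.0]
predicted = [52.0, 58.0, 55.0, 60.0]

mae = mean_absolute_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)
print(mae)             # prints 3.5
print(round(rmse, 3))  # prints 5.196
```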

In [0]:
#@title Feature Importance {vertical-output: true }

model = client.get_model(model_name)
feat_list = [(x.feature_importance, x.column_display_name) for x in model.tables_model_metadata.tables_model_column_info]
feat_list.sort(reverse=True)
feat_list[:15]

![alt text](https://storage.googleapis.com/images_public/feature_importance.png)
![alt text](https://storage.googleapis.com/images_public/loc_portugal.png)
![alt text](https://storage.googleapis.com/images_public/weather_schema.png)
![alt text](https://storage.googleapis.com/images_public/training_schema.png)

___