### Install the Tecton SDK and set up access

Credentials configured with `tecton.set_credentials()` are scoped to the notebook session, so they must be reconfigured whenever the notebook is restarted or its state is cleared. If you read SDK credentials from the environment instead, it is recommended to store the API key in a secrets manager.

Set `_TOKEN_` and `_DEPLOYMENT_NAME_` to your Tecton token and deployment name.

A PySpark kernel is required for retrieving features.

_NOTE: You may need to restart the kernel after installing `tecton`, or the import will fail._

In [None]:
%pip install tecton~=0.8.0

In [None]:
import tecton

_TOKEN_ = ""
_DEPLOYMENT_NAME_ = ""

tecton.set_credentials(
    tecton_api_key=_TOKEN_,
    tecton_url=f"https://{_DEPLOYMENT_NAME_}.tecton.ai/api",
)
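
As a minimal sketch of the environment-based approach mentioned above, the helper below builds the keyword arguments for `tecton.set_credentials()` from environment variables. The variable names `TECTON_API_KEY` and `TECTON_DEPLOYMENT_NAME` and the helper `load_tecton_credentials` are hypothetical and chosen only for illustration; how the values get into the environment (for example, from a secrets manager) is left out.

```python
import os


def load_tecton_credentials(env=os.environ):
    """Build keyword arguments for tecton.set_credentials() from a mapping.

    TECTON_API_KEY and TECTON_DEPLOYMENT_NAME are hypothetical variable
    names used only in this sketch.
    """
    required = ("TECTON_API_KEY", "TECTON_DEPLOYMENT_NAME")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "tecton_api_key": env["TECTON_API_KEY"],
        # Same URL pattern as the cell above.
        "tecton_url": f"https://{env['TECTON_DEPLOYMENT_NAME']}.tecton.ai/api",
    }


# Demonstration with a stand-in mapping instead of the real environment.
example = load_tecton_credentials(
    {"TECTON_API_KEY": "dummy-key", "TECTON_DEPLOYMENT_NAME": "my-deployment"}
)
```

In the notebook, the result could then be passed along as `tecton.set_credentials(**load_tecton_credentials())`, which keeps the token out of the notebook source.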


## Notebook Driven Development
Any Tecton object can be defined and validated in a notebook. 

https://docs.tecton.ai/docs/the-feature-development-workflow


### Define and validate a BigQuery data source
Define a data source for retrieving top terms by international region.

https://docs.tecton.ai/tecton-on-gcp-datasources#connecting-to-bigquery

In [None]:
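# Import only the Tecton names used in this notebook instead of a wildcard import.
from tecton import Aggregation, BatchSource, Entity, batch_feature_view, spark_batch_config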

INTERNATIONAL_TOP_TERMS_TABLE = "bigquery-public-data.google_trends.international_top_terms"

@spark_batch_config()
def data_source_function(spark):
    df = (
        spark.read.format("com.google.cloud.spark.bigquery")
        .option("table", INTERNATIONAL_TOP_TERMS_TABLE)
        .load()
    )
    return df

top_terms = BatchSource(name='top_terms', batch_config=data_source_function)
top_terms.validate()

In [None]:
top_terms.get_dataframe().to_spark().limit(10).show()

### Define an Entity and an Aggregate Feature View

A Tecton Entity is used to organize and join features.

Feature Views take in data sources as inputs and define a transformation to compute one or more features.

https://docs.tecton.ai/docs/defining-features

In this example, we define an aggregate feature that computes the sum of term scores per region over a 30-day window.

In [None]:
from datetime import timedelta

entity = Entity(name='region_name')

@batch_feature_view(
    mode='spark_sql',
    entities=[entity],
    sources=[top_terms],
    aggregations=[
        Aggregation(
            column='score',
            function='sum',
            time_window=timedelta(days=30)
        )
    ],
    aggregation_interval=timedelta(days=1),
)
def scores_by_region(t):
    return f"""
        SELECT 
            region_name,
            score,
            to_timestamp(refresh_date) AS timestamp 
        FROM {t}
    """

scores_by_region.validate()
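
To make the aggregation concrete, here is a plain-Python sketch of a 30-day windowed sum per region. It does not use Tecton and runs on made-up rows; the window is assumed to be half-open, `[as_of - 30 days, as_of)`, which is a simplification. Tecton's exact boundary handling and alignment to the daily `aggregation_interval` may differ.

```python
from datetime import datetime, timedelta

# Made-up rows shaped like the feature view's SQL output:
# (region_name, score, timestamp).
ROWS = [
    ("Hanoi", 10, datetime(2022, 12, 5)),
    ("Hanoi", 20, datetime(2022, 12, 20)),
    ("Hanoi", 5, datetime(2022, 11, 15)),  # older than 30 days, excluded below
    ("Manitoba", 7, datetime(2022, 12, 25)),
]


def windowed_sum(rows, region, as_of, window=timedelta(days=30)):
    """Sum scores for one region over [as_of - window, as_of)."""
    start = as_of - window
    return sum(
        score
        for name, score, ts in rows
        if name == region and start <= ts < as_of
    )


as_of = datetime(2022, 12, 31)
hanoi_sum = windowed_sum(ROWS, "Hanoi", as_of)        # 10 + 20
manitoba_sum = windowed_sum(ROWS, "Manitoba", as_of)  # 7
```

The feature view above expresses the same idea declaratively: the SQL selects the raw `score` rows, and the `Aggregation` entry tells Tecton which column to sum and over what window.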

#### Retrieve historical features

https://docs.tecton.ai/docs/0.7/reading-feature-data/reading-feature-data-for-training/constructing-training-data

In [None]:
import pandas

# Retrieve historical features by entities.
entities = pandas.DataFrame({"region_name": ["Hanoi", "Manitoba"]})
historical_features = scores_by_region.get_historical_features(entities=entities)
# Display the query plan.
historical_features.explain()

In [None]:
# Show the first 10 feature rows, ordered by timestamp.
historical_features.to_spark().orderBy("timestamp").limit(10).show()

In [None]:
# Point-in-time spine for feature retrieval.
# `spark` is assumed to be the SparkSession provided by the PySpark kernel.
spine_df = spark.sql('SELECT TIMESTAMP("2022-12-31T00:00:01Z") AS timestamp, "Hanoi" AS region_name')
spine_df.show()

In [None]:
# Retrieve point-in-time features per entity.
spine_historical_features = scores_by_region.get_historical_features(spine=spine_df)
spine_historical_features.explain()

In [None]:
spine_historical_features.to_spark().show()
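
The idea behind a point-in-time spine can be sketched in plain Python as an as-of join: for each spine row, take the most recent feature value whose timestamp is not later than the spine timestamp. The data and the helper `as_of_join` below are made up for illustration and are not part of the Tecton SDK; Tecton's actual retrieval logic is more involved.

```python
from datetime import datetime

# Hypothetical precomputed feature values: (region_name, feature_timestamp, value).
FEATURES = [
    ("Hanoi", datetime(2022, 12, 30), 100),
    ("Hanoi", datetime(2022, 12, 31), 120),
    ("Hanoi", datetime(2023, 1, 1), 150),  # in the future relative to the spine
]

# Spine rows: (region_name, timestamp at which features are requested).
SPINE = [
    ("Hanoi", datetime(2022, 12, 31, 0, 0, 1)),
    ("Manitoba", datetime(2022, 12, 31, 0, 0, 1)),
]


def as_of_join(spine, features):
    """For each spine row, pick the latest feature value at or before its timestamp."""
    joined = []
    for region, ts in spine:
        candidates = [
            (feature_ts, value)
            for name, feature_ts, value in features
            if name == region and feature_ts <= ts
        ]
        # None marks a spine row with no feature data available yet.
        value = max(candidates, key=lambda c: c[0])[1] if candidates else None
        joined.append((region, ts, value))
    return joined


result = as_of_join(SPINE, FEATURES)
```

Here the Hanoi row picks up the value stamped 2022-12-31 and ignores the later one, which is what prevents future data from leaking into training rows. The Manitoba row has no matching feature data and gets `None`.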