# 📚 Tecton Self-Guided Quickstart

In this quickstart, you will:

1. Explore the Feature Library, where Data Scientists share Features
2. Add a New Feature to the Feature Library
3. Generate Training Data using a Feature Service
4. Use a Feature Service to call Production Data at Prediction Time

## What is Tecton?

Tecton is an Enterprise Feature Store. It empowers data scientists and engineers to:

1. Build great features from batch, streaming, and real-time data
2. Share and re-use features to build better models faster
3. Deploy and serve features in production with confidence

### ✅ 0.1) Attach to a cluster and run the cell below to import the Tecton SDK and disable noisy log messages.

First make sure to attach to the `notebook-cluster` Databricks cluster by selecting it in the top-left dropdown in this notebook.

To run cells in this notebook, click on the cell and press `shift + enter`.

```python
import logging
# Databricks' default py4j logging pollutes cell outputs, so only show errors
logging.getLogger('py4j').setLevel(logging.ERROR)

# Import the Tecton SDK (and pandas for working with results)
import tecton
import pandas as pd

# Show a summary of the installed Tecton SDK version
tecton.version.summary()
```

### ✅ 1) Exploring the Feature Library

🔑 **Concept: Feature Library**

The Feature Library is where you review your organization's registered features. It gives you an overview of every feature in the Library.

In your Tecton Web UI, click the **Features** tab.

<img src='https://github.com/tecton-ai-ext/ad-serving-tutorial/raw/master/example_notebooks/img/feature_library.PNG' width='80%'/>

Even from this overview, the Web UI answers a few important questions. For example:
- Who created the feature, and when?
- Has the feature data been persisted for training? For serving? This is the materialization status.
- What does the feature describe? This is the feature's entity (for example, a user or an ad).

Let's go into one feature, `partner_ctr_performance`. This is a feature we're using to understand the relative clickthrough rates of different partner sites. 

<img src='https://github.com/tecton-ai-ext/ad-serving-tutorial/raw/master/example_notebooks/img/feature_overview.PNG' width='80%'/>

Once in this feature, we'll be able to answer more detailed questions, such as:

- Is this feature produced from batch, streaming, or request-time data?
- What is the logic of the transformation?
- What is the health of this feature? Have all processing jobs been successful?
- What are the distributions of the values associated with this feature?

Let's look at the different tabs to answer each of the questions above.

**Transformations Tab**

*Here, we can answer:*
- *Is this feature produced from batch, streaming, or request-time data?*
- *What is the logic of the transformation?*

<img src='https://github.com/tecton-ai-ext/ad-serving-tutorial/raw/master/example_notebooks/img/feature_transformations.PNG' width='60%'/>

This tab shows that the feature uses a batch data source. It also shows the SQL query that generates the feature, which makes it easy to judge the quality of the code and, if you are considering the feature for a new model, whether its logic applies.


**Materialization Tab**

*Here, we can answer: What is the health of this feature? Have all processing jobs been successful?*

<img src='https://github.com/tecton-ai-ext/ad-serving-tutorial/raw/master/example_notebooks/img/feature_materialization.PNG' width='60%'/>

From this tab, we can see that all jobs have completed successfully, as the progress bar is green. We can also see how data moves through the processing jobs, from the underlying data source into the Offline and Online Feature Stores.

**Statistics Tab**

*Here, we can answer: What are the distributions of the values associated with this feature?*

<img src='https://github.com/tecton-ai-ext/ad-serving-tutorial/raw/master/example_notebooks/img/feature_statistics.PNG' width='60%'/>

This tab helps us get a sense of any outliers, nulls, and the general shape of the feature's values.

If you are reviewing features created by your colleagues, this information helps you decide whether you trust the feature enough to use it in your own models.
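The Statistics tab computes these summaries for you. If you'd like a comparable local check on a sample of a feature's values, here is a minimal pandas sketch. The `summarize_feature` helper, the sample CTR values, and the 1.5 × IQR outlier rule are illustrative assumptions, not how the Web UI computes its statistics.

```python
import pandas as pd


def summarize_feature(values: pd.Series) -> dict:
    """Summarize a numeric feature: size, nulls, basic stats, and IQR outliers.

    Hypothetical helper for illustration; the 1.5 * IQR rule is an assumption.
    """
    non_null = values.dropna()
    q1, q3 = non_null.quantile(0.25), non_null.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = non_null[(non_null < lower) | (non_null > upper)]
    return {
        "count": int(values.size),
        "nulls": int(values.isna().sum()),
        "mean": float(non_null.mean()),
        "median": float(non_null.median()),
        "min": float(non_null.min()),
        "max": float(non_null.max()),
        "outliers": outliers.tolist(),
    }


# Hypothetical CTR values for a few partner sites (None marks a missing value).
ctr = pd.Series([0.021, 0.018, 0.025, None, 0.019, 0.022, 0.35])
stats = summarize_feature(ctr)
print(stats)
```

Here the 0.35 value stands out as an outlier against CTRs that otherwise cluster around 0.02, which is exactly the kind of signal that should prompt a closer look before reusing a feature.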

🛑 **Exercises for Exploring the Feature Library**

*If you'd like a bit more practice navigating the Feature Library, we've included a few prompts below:*

- Review the Feature Package `user_ads_frequency_counts` - what is the underlying data source?
- How many features are generated in the `user_ads_frequency_counts` Feature Package?
- What does each feature in `user_ads_frequency_counts` represent? (Hint: The names involve operations, time windows, and slide intervals. It may be helpful to review the Python file in the Feature Repository associated with this feature.)
- `user_ads_frequency_counts` is one of three Feature Packages in the Feature Service `ctr_prediction_service`. What are the other two?
- Why doesn't the online feature `ad_is_displayed_as_banner` have materialization enabled? (Hint: It has to do with when the feature values are computed.)

### ✅ 2) Adding a New Feature Variant

🔑 **Concept: Variants**

Variants are a helpful way of organizing Feature Packages that have similar and related logic. Variants are often created when a new data scientist finds a Feature Package they want to use for their model, but it needs slight tweaks.

You can find variants in the Web UI next to a feature's name. For example, `partner_ctr_performance` has four variants, which calculate the feature over 7d, 14d, 30d, and 60d timeframes.

<img src='https://github.com/tecton-ai-ext/ad-serving-tutorial/raw/master/example_notebooks/img/feature_variant.png' width='60%'/>
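Conceptually, variants like these share one transformation and differ only in a parameter. As a minimal pandas sketch, assuming a hypothetical impression log with `partner_id`, `timestamp`, and `clicked` columns, here is one CTR function evaluated over the four windows. It illustrates the idea only; it is not how `partner_ctr_performance` is actually defined in the Feature Repository.

```python
import pandas as pd

# Hypothetical impression log: one row per impression, with a click flag.
events = pd.DataFrame({
    "partner_id": ["p1", "p1", "p1", "p1", "p2", "p2"],
    "timestamp": pd.to_datetime([
        "2021-02-27", "2021-02-20", "2021-02-05", "2021-01-10",
        "2021-02-28", "2021-01-15",
    ]),
    "clicked": [1, 0, 0, 1, 0, 1],
})


def partner_ctr(events: pd.DataFrame, as_of: pd.Timestamp, window: str) -> pd.Series:
    """CTR per partner over the trailing window (as_of - window, as_of]."""
    start = as_of - pd.Timedelta(window)
    in_window = events[(events["timestamp"] > start) & (events["timestamp"] <= as_of)]
    return in_window.groupby("partner_id")["clicked"].mean()


# Same logic, four "variants" that differ only in the window parameter.
as_of = pd.Timestamp("2021-03-01")
variants = pd.DataFrame({
    f"ctr_{w}": partner_ctr(events, as_of, w) for w in ["7d", "14d", "30d", "60d"]
})
print(variants)
```

Keeping the window as a parameter is what makes the variants easy to compare: partner `p1` looks like a 100% CTR over 7 days but 50% over 60 days.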

We are going to create our own variant for the feature `user_ads_impression_counts`. We'll do this by going back to the Feature Repository we cloned when setting up. There, navigate to the `user_ads_impression_counts` file. You'll see that a variant has already been created, but is commented out. Let's uncomment it and see what it means. 

<img src='https://github.com/tecton-ai-ext/ad-serving-tutorial/raw/master/example_notebooks/img/feature_variant_update.PNG' width='60%'/>

Here, we want to explore a different slide interval and set of time windows. We've changed the slide interval to 6h and changed the time windows to daily metrics for the first 3 days. This change exists only locally; it hasn't been applied to our production environment yet.
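To make the slide interval concrete: the time windows set how far back each count looks, and the slide interval sets how often the windowed values advance. Here is a minimal pandas sketch, assuming a hypothetical impression log for one user and reading "daily metrics for the first 3 days" as trailing 1d, 2d, and 3d windows (check the actual definition in the Feature Repository file; Tecton computes these differently at scale).

```python
import pandas as pd

# Hypothetical impression log for a single user.
impressions = pd.to_datetime([
    "2021-03-01 01:00", "2021-03-01 08:00", "2021-03-02 05:00",
    "2021-03-03 13:00", "2021-03-03 20:00",
])


def window_counts(timestamps: pd.DatetimeIndex, as_of: pd.Timestamp, windows: list) -> dict:
    """Count events in each trailing window (as_of - window, as_of]."""
    return {
        w: int(((timestamps > as_of - pd.Timedelta(w)) & (timestamps <= as_of)).sum())
        for w in windows
    }


# With a 6h slide interval, the windows are re-evaluated every 6 hours.
slide_points = pd.date_range(
    "2021-03-03 06:00", "2021-03-04 00:00", freq=pd.Timedelta(hours=6)
)
counts = {as_of: window_counts(impressions, as_of, ["1d", "2d", "3d"]) for as_of in slide_points}
for as_of, row in counts.items():
    print(as_of, row)
```

A shorter slide interval means fresher values (four updates per day here) at the cost of more computation.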

**To make this feature available in the Web UI, go back to the CLI and run `tecton apply`.**

When prompted, accept the change, then check the feature in the Web UI.

<img src='https://github.com/tecton-ai-ext/ad-serving-tutorial/raw/master/example_notebooks/img/feature_variant_updated.png' width='60%'/>

We can see that the new feature variant has been added. This lets us track closely related changes and gives other data scientists access to both variants when developing their own models.

### ✅ 3) Using a Feature Service for Training

🔑 **Concept: FeatureServices**

A FeatureService is used to expose an API for accessing FeaturePackages for training and serving. Typically, each deployed model will have one FeatureService to serve features. A Tecton FeatureService can be used both for generating training data sets and for serving feature values for batch and real-time predictions.

In the Web UI, you'll find a sample Feature Service we can use: `ctr_prediction_service`.

<img src='https://github.com/tecton-ai-ext/ad-serving-tutorial/raw/master/example_notebooks/img/feature_service.PNG' width='60%'/>

This service already includes features from several Feature Packages, and its page shows what a `curl` call would look like if we wanted to fetch the real-time values of these features. There is also a monitoring tab that shows how the endpoint is being used and whether you are meeting the SLOs you have for your model and customers.

Let's go back to the notebook and see how this can be used to develop a training set. At a basic level, a training set contains four things:

- **Index:** What you are predicting on
- **Prediction Cutoff:** The point in time you want to make a prediction. For inference, this is "now"; in training sets, it is some historical date for backtesting.
- **Target/Label:** What you are trying to predict, usually within a given time window
- **Feature Values:** The values of your features at the prediction cutoff time, which presumably correlate with the target
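The `sample_events_for_model` source in this notebook already contains the index, cutoff, and label columns. As a minimal sketch of how such a spine might be derived from raw logs, assuming hypothetical impression and click tables and a one-hour click attribution window (feature values are joined on afterwards):

```python
import pandas as pd

# Hypothetical raw logs: impressions (index + cutoff) and clicks (the outcome).
impressions = pd.DataFrame({
    "user_uuid": ["u1", "u1", "u2"],
    "ad_id": ["a1", "a2", "a1"],
    "timestamp": pd.to_datetime(["2021-03-01 10:00", "2021-03-01 11:00", "2021-03-01 12:00"]),
})
clicks = pd.DataFrame({
    "user_uuid": ["u1", "u2"],
    "ad_id": ["a1", "a1"],
    "timestamp": pd.to_datetime(["2021-03-01 10:20", "2021-03-01 14:00"]),
})

LABEL_WINDOW = pd.Timedelta(hours=1)  # assumed attribution window for a click


def build_spine(impressions: pd.DataFrame, clicks: pd.DataFrame, window: pd.Timedelta) -> pd.DataFrame:
    """One row per impression: index (user_uuid, ad_id), cutoff (timestamp), label (clicked)."""
    pairs = impressions.merge(
        clicks, on=["user_uuid", "ad_id"], how="left", suffixes=("", "_click")
    )
    # A click counts only if it happens after the cutoff and within the label window.
    in_window = (pairs["timestamp_click"] > pairs["timestamp"]) & (
        pairs["timestamp_click"] <= pairs["timestamp"] + window
    )
    pairs["clicked"] = in_window.astype(int)
    # Collapse multiple clicks per impression into a single 0/1 label.
    return pairs.groupby(["user_uuid", "ad_id", "timestamp"], as_index=False)["clicked"].max()


spine = build_spine(impressions, clicks, LABEL_WINDOW)
print(spine)
```

The `u2` click arrives two hours after its impression, so it falls outside the assumed window and that row is labeled 0.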

You've already loaded sample events as a data source named `sample_events_for_model`. Run the cell below in your notebook to see what it looks like:

```python
# Load the sample events registered as a virtual data source
ds = tecton.get_virtual_data_source('sample_events_for_model')
# Show a preview of its rows
ds.preview()
```

Each row is an event (a user and an ad at a given time) along with whether it resulted in a click-through. Let's combine these events with the features in the Feature Service to create a training set.

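```python
# Fetch the Feature Service and join its features onto the sample events
fs = tecton.get_feature_service('ctr_prediction_service')
training_data = fs.get_feature_dataframe(ds.dataframe())
# `display` is the Databricks notebook helper for rendering Spark DataFrames
display(training_data.to_spark())
```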

🛑 **Three things to notice:** 
* Given your events and the features you want to test, Tecton builds the feature values with time in mind. That is, it makes sure each feature value is accurate as of that event's cutoff point, so there is no target leakage.
* This training set combines streaming and batch data sources into one training table, which is traditionally a difficult undertaking.
* All of this took two lines of code! Because generating a training set is quick, you can rapidly start building your models, defining your model metrics, and iterating on different algorithms and hyperparameters.

### ✅ 4) Using a Feature Service for Prediction

To give the model the data it needs to make a prediction, your application sends Tecton a request with the index values, and Tecton returns the feature values in a JSON object. Let's do that together.

Assuming we've trained this model and put it into production, we will make requests based on `ad_id` and `user_uuid`. Copy one value of each from the training set you just created.

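```python
from pprint import pprint

# Paste an ad_id and a user_uuid from the training set above
keys = {
    'ad_id': '',
    'user_uuid': ''
}

# Fetch the current feature values for these keys from the Feature Service
response = fs.get_feature_vector(keys).to_dict()
pprint(response)
```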

What you now see is a feature vector: the values of all of these features as of this exact moment. These values can differ from those in your training set, which were calculated as of historical points in time. Once you define a Feature Service in Tecton, it is automatically available for scoring in production!
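Conceptually, the online lookup returns the newest known value for each feature, while each training row used the value as of its own cutoff. Here is a toy sketch over a hypothetical in-memory feature log; the `latest_feature_values` helper and its data are illustrative only, not the Tecton SDK or its online store.

```python
import pandas as pd

# Hypothetical feature log keyed by (user_uuid, ad_id); one row per refresh.
feature_log = pd.DataFrame({
    "user_uuid": ["u1", "u1", "u2"],
    "ad_id": ["a1", "a1", "a1"],
    "timestamp": pd.to_datetime(["2021-03-01", "2021-03-05", "2021-03-05"]),
    "impressions_1d": [3, 9, 4],
    "partner_ctr_7d": [0.02, 0.03, 0.01],
})


def latest_feature_values(log: pd.DataFrame, keys: dict, as_of: pd.Timestamp):
    """Return {feature: value} for the newest row matching `keys` at or before `as_of`."""
    mask = log["timestamp"] <= as_of
    for name, value in keys.items():
        mask &= log[name] == value
    rows = log[mask].sort_values("timestamp")
    if rows.empty:
        return None  # no known values for these keys yet
    feature_cols = [c for c in log.columns if c not in keys and c != "timestamp"]
    return {c: rows[c].iloc[-1].item() for c in feature_cols}


keys = {"ad_id": "a1", "user_uuid": "u1"}
online = latest_feature_values(feature_log, keys, pd.Timestamp("2021-03-06"))      # "now"
historical = latest_feature_values(feature_log, keys, pd.Timestamp("2021-03-02"))  # a past cutoff
print(online)
print(historical)
```

The same keys yield different vectors depending on the timestamp, which is why the online response need not match any row of the training set.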

## Conclusion

In this walkthrough we:
1. Learned key Tecton concepts
2. Launched a Tecton development environment
3. Explored the Feature Library, where Data Scientists share Features
4. Added a New Feature to the Feature Library
5. Generated Training Data using a Feature Service
6. Used a Feature Service to call Production Data at Prediction Time

Now you are ready to define, serve, and monitor feature pipelines and services using Tecton!

## What's next?

Try **Taking the Next Step**, the extended part of our tutorial, where you create features and add new services.

### Tecton Documentation
- <a href="https://docs.tecton.ai" target="_blank">Tecton Docs</a>
- <a href="https://s3-us-west-2.amazonaws.com/tecton.ai.public/documentation/tecton-py/index.html" target="_blank">Tecton Python SDK Docs</a>