ADS Sample Notebook.

Copyright (c) 2021 Oracle, Inc. All rights reserved. Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl.

***
# <font color=red>Feature Type Manager</font>
<p style="margin-left:10%; margin-right:10%;">by the <font color=teal>Oracle Cloud Infrastructure Data Science Service Team</font></p>

***


# Overview:

The feature type system allows data scientists to separate the concept of how data is represented physically from what the data actually measures. The data can have feature types that classify the data based on what it represents and not how the data is stored in memory. Each set of data can have multiple feature types through a system of multiple inheritance. Feature type warnings are used for rapid validation of the data. The feature type validators are a set of methods that return a boolean Pandas series that indicates which values meet the validation criteria. The feature type manager provides the tools to manage the handlers that drive this system. The system works by creating functions that are then registered as feature type validators or warnings. The role of the `feature_type_manager` is to provide the interface to manage these handlers.

---

## Prerequisites:
- Experience with a specific topic: Intermediate
- Professional experience: Basic

---

## Objectives:

- <a href="#overview">Feature Type System</a>
    - <a href="#feature_type">Feature Type</a>
    - <a href="#feature_type_list">List Feature Types</a>
    - <a href="#feature_type_object">Feature Type Object</a>
    - <a href="#feature_type_register">Register a Custom Feature Type</a>
    - <a href="#feature_type_unregister">Unregister a Custom Feature Type</a>
    - <a href="#feature_type_rest">Unregister All Custom Feature Types</a>
    - <a href="#warnings">Feature Type Warnings</a>
    - <a href="#warnings_list">List Feature Type Warnings</a>
- <a href="#reference">References</a>

---

**Important:**

Placeholder text for required values is surrounded by angle brackets that must be removed when adding the indicated content. For example, when adding a database name, `database_name = "<database_name>"` would become `database_name = "production"`.

---

<font color=gray>Datasets are provided as a convenience. Datasets are considered third-party content and are not considered materials under your agreement with Oracle applicable to the services. The [`orcl_attrition` dataset](oracle_data/UPL.txt) is distributed under the UPL license.
</font>

In [None]:
import ads
import pandas as pd

from ads.feature_engineering import feature_type_manager, FeatureType

<a id="overview"></a>
# Feature Type System

The feature type system allows the data scientist to separate the concept of how data is represented physically from what the data actually measures. The data can have feature types that classify the data based on what it represents and not how the data is stored in memory. Each feature can have multiple feature types through a system of multiple inheritance. For example, an organization that sells cars might have a set of data that represents their purchase price of a car (the wholesale price). This could have the feature types `wholesale_price`, `car_price`, `USD`, and `continuous`. This multiple inheritance allows a data scientist to create <a href="#warnings">feature type warnings</a> for each feature type.

Feature type warnings are used for rapid validation of the data. For example, the `wholesale_price` might have a method that ensures that the value is a positive number because you can't purchase a car with negative money. The `car_price` feature type might have a check to ensure that it is within a reasonable price range. `USD` can check the value to make sure that it represents a valid US dollar amount and it isn't below one cent. The `continuous` feature type is the default type, and it represents the way the data is stored internally.

The feature type validators are a set of `is_*` methods, where `*` is generally the name of the feature type. For example, the method `.is_wholesale_price()` can create a boolean Pandas series that indicates which values meet the validation criteria. It allows you to quickly identify which values need to be filtered or require further examination into problems in the data pipeline. The feature type validators can be as complex as they need to be. For example, they might take a client ID and call an API to validate that each client ID is active.
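As a minimal sketch of the idea, independent of ADS, a validator can be written as a plain function that takes a Pandas series and returns a boolean series. The function name `is_wholesale_price` and the sample values are hypothetical and used only for illustration; this is not the ADS registration API.

```python
import pandas as pd


def is_wholesale_price(series: pd.Series) -> pd.Series:
    """Hypothetical validator: True where the value is a positive number."""
    # Coerce non-numeric entries to NaN so that they fail the check.
    numeric = pd.to_numeric(series, errors="coerce")
    return numeric > 0


# Hypothetical sample data with one negative and one non-numeric value.
prices = pd.Series([15000.0, -200.0, "unknown", 32000.5])
valid = is_wholesale_price(prices)

# Keep only the rows that pass validation.
clean_prices = prices[valid]
print(valid.tolist())
```

The boolean series can be used directly as a mask, which is what makes this style of validator convenient for filtering out rows that need further examination.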

The feature type manager provides the tools to manage the handlers that are used to drive this system. The system works by creating functions that are then registered as feature type validators or warnings. The role of `feature_type_manager` is to provide the interface to manage these handlers.


<a id="feature_type"></a>
# Feature Type

Pandas dtypes are physical data types that indicate how data are stored. You can call `.dtype` on your Pandas dataframe or series to inspect the physical types. Feature types are the logical types that define how the data should be interpreted by the end user. Feature types categorize the features from the machine learning perspective. Different feature types could be the same physical type. For example, both categorical and ordinal can be an integer dtype. However, the difference between `categorical` and `ordinal` feature types is that `ordinal` features have an ordering while `categorical` features don't.
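The distinction can be seen with plain Pandas, without any ADS call. In this sketch, which uses made-up feature names and values, two series share the same physical dtype even though one holds unordered category codes and the other holds an ordered rating.

```python
import pandas as pd

# Hypothetical data: both features are stored as 64-bit integers.
department_code = pd.Series([3, 1, 2, 1], dtype="int64")  # categorical: codes have no order
satisfaction = pd.Series([1, 4, 5, 2], dtype="int64")  # ordinal: 1 < 2 < ... < 5

# The physical types are identical ...
same_dtype = department_code.dtype == satisfaction.dtype

# ... but an order comparison is only meaningful for the ordinal feature.
high_satisfaction = satisfaction >= 4
print(same_dtype, high_satisfaction.tolist())
```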

ADS allows a set of data to have multiple feature types through a system of inheritance. For example, a hospital may have a medical record number for each patient. That data might have the feature types `patient_id`, `id`, and `integer`. The `patient_id` is the child feature type with `id` being its parent. The `integer` is the parent of the `id` feature type. It is also the last feature type in the inheritance chain and is called the default feature type.
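The inheritance chain in this example can be pictured with ordinary Python classes. The classes below are hypothetical stand-ins rather than ADS feature types; they only illustrate how the chain runs from the most specific type to the default type.

```python
class Integer:
    """Stand-in for the default feature type derived from the dtype."""

    name = "integer"


class Id(Integer):
    """Stand-in for a generic identifier feature type."""

    name = "id"


class PatientId(Id):
    """Stand-in for the most specific feature type."""

    name = "patient_id"


def feature_type_chain(cls) -> list:
    """Return feature type names from the most specific to the default type."""
    # The method resolution order lists a class followed by its ancestors;
    # `object` is skipped because it is not a feature type.
    return [c.name for c in cls.__mro__ if c is not object]


chain = feature_type_chain(PatientId)
print(chain)
```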

In addition to the regular feature types, there are two special versions. The default type is based on the Pandas dtype and cannot be changed without changing the Pandas dtype. There is no need to set it because it is always the last feature type in the inheritance chain. The tag feature type supports neither feature type warnings nor feature type validators. It is designed to allow you to tag data with extra information.


<a id="feature_type_list"></a>
## List Feature Types

Calling `feature_type_manager.feature_type_registered()` gives an overview of all the registered feature types. ADS comes with various common feature types, but the idea is that you create feature types that explicitly define your data.

`feature_type_manager.feature_type_registered()` returns a dataframe with these columns:

- `Class`: Registered feature type class.
- `Name`: Feature type class name.
- `Description`: Description of each feature type class.

In [None]:
feature_type_manager.feature_type_registered()

<a id="feature_type_object"></a>
## Feature Type Object

Feature type objects are derived from the `FeatureType` class. Obtaining a feature type object gives you access to the feature type validators and feature type warnings that are associated with a given feature type. A feature type object is loaded by calling the `feature_type_manager.feature_type_object()` method with the feature type name. For example, the `PhoneNumber` feature type is loaded with `PhoneNumber = feature_type_manager.feature_type_object('phone_number')`.

In [None]:
PhoneNumber = feature_type_manager.feature_type_object('phone_number')

<a id="feature_type_register"></a>
## Register a Custom Feature Type

The feature type framework comes with some common feature types. However, the power of using feature types is that you can easily create your own, and apply them to your specific data. You don't need to try to represent your data in a synthetic way that does not match the nature of your data. This framework allows you to create methods that validate whether the data fits the specifications of your organization.

To create a custom feature type, you need to create a class that inherits from the `FeatureType` class. The class must be registered with ADS before it can be used. You do this by passing the class to the `feature_type_manager.feature_type_register()` method.

In the next cell, the custom feature type, `MyFeatureType`, is created and is inherited from the `FeatureType` base class. You can add an optional description. You can add various attributes and methods to the class, but none of them are required. The next cell also registers the class. If the class is already registered, an exception occurs.

In [None]:
class MyFeatureType(FeatureType):
    description = "This is an example of a custom feature type."


try:
    feature_type_manager.feature_type_register(MyFeatureType)
except Exception:
    # The class is already registered, for example when the cell is rerun.
    pass
feature_type_manager.feature_type_registered()

<a id="feature_type_unregister"></a>
## Unregister a Custom Feature Type

Custom feature types can be unregistered from ADS using the feature type name and the `feature_type_manager.feature_type_unregister()` method. Built-in feature types can't be unregistered.

The next cell unregisters the `MyFeatureType` class using the `my_feature_type` feature type name. It also displays the list of registered classes, which shows that `MyFeatureType` was removed.

In [None]:
try:
    feature_type_manager.feature_type_unregister('my_feature_type')
except Exception:
    # The feature type is not registered, so there is nothing to remove.
    pass
feature_type_manager.feature_type_registered()

<a id="feature_type_rest"></a>
## Unregister All Custom Feature Types

The `feature_type_manager.feature_type_reset()` method is used to unregister all custom feature types. The next cell registers `MyFeatureType` and checks that it is there. Then it resets the feature types and checks that `MyFeatureType` is no longer registered.

In [None]:
try:
    feature_type_manager.feature_type_register(MyFeatureType)
except Exception:
    # The class is already registered, for example when the cell is rerun.
    pass

print("MyFeatureType is registered: " + str('my_feature_type' in feature_type_manager.feature_type_registered()['Name'].unique()))
print("Removing all the custom feature types")
feature_type_manager.feature_type_reset()
print("MyFeatureType is registered: " + str('my_feature_type' in feature_type_manager.feature_type_registered()['Name'].unique()))

<a id="warnings"></a>
# Feature Type Warnings

Part of the exploratory data analysis (EDA) is to check the state or condition of your data. This includes checking that there are no missing values. With categorical data, you often want to ensure that the cardinality is low enough for the type of modeling that you are doing. Because the feature type system is meant to capture the nature of your data, it is an ideal mechanism to help automate the evaluation of the data. This evaluation is done by registering feature type warning handlers with ADS.

Feature type warning handlers are functions that are built-in or user-defined. They perform an analysis of a feature to determine whether there are any data condition problems. For example, a handler might report that a feature is skewed when the expectation is that the data is normally distributed. Another common example is that the data might have more than some threshold of missing values. ADS comes with various common built-in warnings for the feature types that it supports. However, you can create and register any warnings that you want.
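A handler of this kind might look like the following sketch, which flags a feature whose fraction of missing values exceeds a threshold. The function name, the 10% threshold, and the returned columns (`Warning`, `Message`, `Metric`, `Value`) are assumptions made for illustration; consult the ADS documentation for the exact contract that a registered handler must follow.

```python
import pandas as pd


def missing_values_handler(series: pd.Series, threshold: float = 0.10) -> pd.DataFrame:
    """Hypothetical warning handler: report an excessive share of missing values."""
    columns = ["Warning", "Message", "Metric", "Value"]
    missing_fraction = float(series.isna().mean())
    if series.empty or missing_fraction <= threshold:
        # No problem detected: return an empty report.
        return pd.DataFrame(columns=columns)
    return pd.DataFrame(
        [[
            "missing",
            f"{missing_fraction:.0%} of the values are missing",
            "fraction",
            missing_fraction,
        ]],
        columns=columns,
    )


# Hypothetical feature with two missing values out of five.
ages = pd.Series([34, None, 51, None, 29])
report = missing_values_handler(ages)
print(report)
```

Returning an empty dataframe when nothing is wrong keeps the output shape the same in both cases, so reports from several handlers can be concatenated without special handling.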

<a id="warnings_list"></a>
## List Feature Type Warnings

Feature type warnings are defined at the feature type level. Warnings can be registered dynamically at runtime. The `feature_type_manager.warning_registered()` method returns a dataframe of the registered warnings for each registered feature type. The columns of the returned dataframe are:

- `Feature Type`: Feature type class name.
- `Warning`: Warning name.
- `Handler`: Registered warning handler for that feature type.

In [None]:
feature_type_manager.warning_registered()

<a id="reference"></a>
# References
- [Oracle ADS Library documentation](https://docs.cloud.oracle.com/en-us/iaas/tools/ads-sdk/latest/index.html)
- [OCI Data Science Documentation](https://docs.cloud.oracle.com/en-us/iaas/data-science/using/data-science.htm)
- [Oracle Data & AI Blog](https://blogs.oracle.com/datascience/)
- [Data Science YouTube Videos](https://www.youtube.com/playlist?list=PLKCk3OyNwIzv6CWMhvqSB_8MLJIZdO80L)
