<a href="https://colab.research.google.com/github/rcrowe-google/tfx-addons-outreachy/blob/Fatima%2Ffeature%2Fexample/tfx_addons/feature_selection/example/Examples.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Examples using the TFX Feature Selection Component**

This notebook demonstrates the use of the Feature Selection component. The component lets the user choose among different algorithms for performing feature selection on dataset artifacts in TFX pipelines.

Base code adapted from: https://github.com/tensorflow/tfx/blob/master/docs/tutorials/tfx/components_keras.ipynb

**Install TFX**

Note: In Google Colab, because of package updates, the first time you run this cell you must restart the runtime (Runtime > Restart runtime ...).

In [None]:
!pip install -U tfx

In [None]:
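# Clone the repository and move into the feature_selection package,
# unless the notebook is already running inside it.
x = !pwd

if 'feature_selection' not in str(x):
  !git clone -b Fatima/feature/example https://github.com/rcrowe-google/tfx-addons-outreachy.git
  %cd tfx-addons-outreachy/tfx_addons/feature_selection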

**Import packages**

We import the necessary packages, including the standard TFX component classes.

In [None]:
import os
import pprint
import tempfile
import urllib.request  # `import urllib` alone does not guarantee urllib.request is loaded
import importlib

import absl
import tensorflow as tf
import tensorflow_model_analysis as tfma
from tfx import v1 as tfx
from tfx.components import CsvExampleGen
from tfx.orchestration.experimental.interactive.interactive_context import InteractiveContext

# Stop TensorFlow log records from propagating to the root logger (avoids duplicate lines).
tf.get_logger().propagate = False
pp = pprint.PrettyPrinter()

%load_ext tfx.orchestration.experimental.interactive.notebook_extensions.skip

# Feature Selection component from the cloned repository.
from feature_selection.component import FeatureSelection

# This is the root directory for your TFX pip package installation.
_tfx_root = tfx.__path__[0]

# Palmer Penguins example using TFX Feature Selection Component

### Download example data
We download the example dataset for use in our TFX pipeline.

The dataset we're using is the [Palmer Penguins dataset](https://allisonhorst.github.io/palmerpenguins/articles/intro.html) which is also used in other
[TFX examples](https://github.com/tensorflow/tfx/tree/master/tfx/examples/penguin).

There are four numeric features in this dataset:

- culmen_length_mm
- culmen_depth_mm
- flipper_length_mm
- body_mass_g

All features were already normalized to have range [0,1]. We will build a module that selects 2 of these features to be eliminated from the dataset, with the aim of improving the model's performance in predicting the `species` of penguins.
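To make "select features, drop the rest" concrete, here is a minimal plain-Python sketch of one common family of selectors: a univariate filter that scores each feature on its own with a one-way ANOVA F-statistic and keeps the top `k`. This is an illustration only; which scoring function `penguins_module` actually uses is not shown in this notebook, and the helper names `anova_f` and `select_k_best` and the toy values are hypothetical.

```python
from statistics import fmean


def anova_f(values, labels):
    """One-way ANOVA F-statistic of one numeric feature grouped by class label."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    k, n = len(groups), len(values)
    if k < 2 or n <= k:
        raise ValueError("need at least 2 classes and more rows than classes")
    grand_mean = fmean(values)
    # Between-class sum of squares: how far each class mean sits from the overall mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups.values())
    # Within-class sum of squares: spread of values around their own class mean.
    ss_within = sum(sum((v - fmean(g)) ** 2 for v in g) for g in groups.values())
    if ss_within == 0:
        # No within-class spread at all: constant feature scores 0, otherwise infinite.
        return 0.0 if ss_between == 0 else float("inf")
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_k_best(columns, labels, k):
    """Names of the k features with the highest F-score, best first."""
    scores = {name: anova_f(values, labels) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Made-up, already-normalized values for four penguins of two species.
columns = {
    "culmen_length_mm": [0.1, 0.2, 0.8, 0.9],
    "culmen_depth_mm": [0.5, 0.4, 0.5, 0.4],
    "flipper_length_mm": [0.2, 0.1, 0.7, 0.8],
    "body_mass_g": [0.3, 0.6, 0.4, 0.5],
}
labels = [0, 0, 1, 1]
print(select_k_best(columns, labels, k=2))  # ['culmen_length_mm', 'flipper_length_mm']
```

With 4 features and `k=2`, keeping the two best-separating features is the same as eliminating the other two.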

In [None]:
# Download the CSV into a fresh temporary directory; CsvExampleGen reads the files in it.
_data_root = tempfile.mkdtemp(prefix='tfx-data')
DATA_PATH = 'https://raw.githubusercontent.com/tensorflow/tfx/master/tfx/examples/penguin/data/labelled/penguins_processed.csv'

_data_filepath = os.path.join(_data_root, "data.csv")
urllib.request.urlretrieve(DATA_PATH, _data_filepath)
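
Before running `CsvExampleGen`, it can be worth checking that the download succeeded and contains the columns you expect. The sketch below uses only the standard library; `csv_columns` and `feature_columns` are hypothetical helper names, not part of TFX or the Feature Selection component.

```python
import csv


def csv_columns(path):
    """Return the header row of a CSV file as a list of column names."""
    with open(path, newline="") as f:
        return next(csv.reader(f))


def feature_columns(path, label):
    """Return every column except the label, i.e. the candidate features."""
    columns = csv_columns(path)
    if label not in columns:
        raise ValueError(f"label {label!r} not found in {columns}")
    return [c for c in columns if c != label]
```

In this notebook you might call `feature_columns(_data_filepath, 'species')` for the penguins file; the label column name is an assumption to adjust for each dataset.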

**Run TFX components**

In the cells that follow, we create and run TFX components one by one: `CsvExampleGen` ingests the CSV file as examples, and `StatisticsGen` computes statistics over them.

In [None]:
context = InteractiveContext()

# Create and run the CsvExampleGen component.
example_gen = CsvExampleGen(input_base=_data_root)
context.run(example_gen)

# Create and run the StatisticsGen component.
statistics_gen = tfx.components.StatisticsGen(
    examples=example_gen.outputs['examples'])
context.run(statistics_gen)
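
With its default output configuration, `CsvExampleGen` writes two splits, `train` and `eval`, in a 2:1 ratio by hashing each record into 3 buckets. The sketch below illustrates that idea with `hashlib.sha256` as a stand-in hash; TFX uses its own fingerprint function, so real assignments will differ, and `assign_split` is a hypothetical name.

```python
import hashlib


def assign_split(record, buckets=(("train", 2), ("eval", 1))):
    """Deterministically assign a serialized record to a split by hashing it."""
    total = sum(n for _, n in buckets)
    # Same record -> same hash -> same bucket, so reruns give identical splits.
    bucket = int(hashlib.sha256(record.encode("utf-8")).hexdigest(), 16) % total
    for name, n in buckets:
        if bucket < n:
            return name
        bucket -= n
    raise AssertionError("unreachable: bucket index exceeds total")


rows = [f"record-{i}" for i in range(3000)]
counts = {"train": 0, "eval": 0}
for row in rows:
    counts[assign_split(row)] += 1
print(counts)  # roughly 2000 train / 1000 eval
```

Hashing instead of random sampling keeps the split stable across runs without storing a seed.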

In [None]:
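# Feature Selection component. `module_file` is the dotted import path of the
# module that configures the selection for this dataset.
feature_selector = FeatureSelection(orig_examples=example_gen.outputs['examples'],
                                    module_file='example.modules.penguins_module')
context.run(feature_selector)
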
In [None]:
# Display Selected Features
context.show(feature_selector.outputs['feature_selection']._artifacts[0].selected_features)
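
The cell above shows the names the component kept. To also see which columns were eliminated, a small hypothetical helper can compare the selected names against the full feature list. It assumes you already have the selected names as a plain list of strings; the selection used in the usage line is made up for illustration.

```python
def split_features(all_features, selected):
    """Partition feature names into (kept, eliminated), preserving original order."""
    chosen = set(selected)
    unknown = chosen.difference(all_features)
    if unknown:
        raise ValueError(f"selected features not in dataset: {sorted(unknown)}")
    kept = [f for f in all_features if f in chosen]
    eliminated = [f for f in all_features if f not in chosen]
    return kept, eliminated


penguin_features = ["culmen_length_mm", "culmen_depth_mm", "flipper_length_mm", "body_mass_g"]
# Hypothetical selection, not an actual component result.
print(split_features(penguin_features, ["flipper_length_mm", "culmen_length_mm"]))
# (['culmen_length_mm', 'flipper_length_mm'], ['culmen_depth_mm', 'body_mass_g'])
```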

# Pima Indians Diabetes example using TFX Feature Selection Component

### Download example data
We download the example dataset for use in our TFX pipeline.

The dataset we're using is the [Pima Indians Diabetes dataset](https://www.kaggle.com/uciml/pima-indians-diabetes-database).

There are eight features in this dataset:

- Pregnancies
- Glucose
- BloodPressure
- SkinThickness
- Insulin
- BMI
- DiabetesPedigreeFunction
- Age


This is a binary classification task: predict whether a person has diabetes based on the 8 features above.
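For a binary target, one simple way to score a numeric feature is its Pearson correlation with the 0/1 label (the point-biserial correlation), ranking features by the absolute value. The sketch below is a swapped-in illustration of that idea, not a description of what `pima_indians_module_file` does; the helper names and the toy values are hypothetical.

```python
from math import sqrt


def point_biserial(values, labels):
    """Pearson correlation between a numeric feature and a 0/1 label."""
    n = len(values)
    mean_x = sum(values) / n
    mean_y = sum(labels) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(values, labels))
    var_x = sum((x - mean_x) ** 2 for x in values)
    var_y = sum((y - mean_y) ** 2 for y in labels)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant feature (or label) carries no linear signal
    return cov / sqrt(var_x * var_y)


def rank_by_correlation(columns, labels):
    """Feature names ordered by absolute correlation with the label, strongest first."""
    scores = {name: abs(point_biserial(v, labels)) for name, v in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Made-up values for four people; 1 means the person has diabetes.
columns = {
    "Glucose": [85, 90, 160, 170],
    "BMI": [25, 30, 28, 33],
    "Age": [30, 50, 35, 45],
}
labels = [0, 0, 1, 1]
print(rank_by_correlation(columns, labels))  # ['Glucose', 'BMI', 'Age']
```

Correlation only captures linear association, which is one reason selection libraries offer several scoring functions.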

In [None]:
# Download the CSV into a fresh temporary directory; CsvExampleGen reads the files in it.
_data_root = tempfile.mkdtemp(prefix='tfx-data')
DATA_PATH = 'https://raw.githubusercontent.com/npradaschnor/Pima-Indians-Diabetes-Dataset/master/diabetes.csv'
_data_filepath = os.path.join(_data_root, "data.csv")
urllib.request.urlretrieve(DATA_PATH, _data_filepath)

**Run TFX components**

In the cells that follow, we create and run TFX components one by one: `CsvExampleGen` ingests the CSV file as examples, and `StatisticsGen` computes statistics over them.

In [None]:
context = InteractiveContext()

# Create and run the CsvExampleGen component.
example_gen = CsvExampleGen(input_base=_data_root)
context.run(example_gen)

# Create and run the StatisticsGen component.
statistics_gen = tfx.components.StatisticsGen(
    examples=example_gen.outputs['examples'])
context.run(statistics_gen)

In [None]:
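# Feature Selection component. `module_file` is the dotted import path of the
# module that configures the selection for this dataset.
feature_selector = FeatureSelection(orig_examples=example_gen.outputs['examples'],
                                    module_file='example.modules.pima_indians_module_file')
context.run(feature_selector)
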
In [None]:
# Display Selected Features
context.show(feature_selector.outputs['feature_selection']._artifacts[0].selected_features)

# Iris example using TFX Feature Selection Component

### Download example data
We download the example dataset for use in our TFX pipeline.

The dataset we're using is the [Iris dataset](https://www.kaggle.com/uciml/iris).

There are four numeric features in this dataset:

- sepal_length
- sepal_width
- petal_length
- petal_width

We will build a module that selects 2 of these features to be eliminated from the dataset, with the aim of improving the model's performance in predicting the `species` of iris plants.
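Because iris measurements are all non-negative, a chi-squared statistic is one common way to score them against a categorical label. Below is a plain-Python sketch of the statistic scikit-learn's `chi2` computes: per-class sums of a feature compared with the sums expected if the feature were independent of the class. Whether `iris_module_file` uses this scoring is not stated here; `chi2_scores`, `top_k`, and the toy values are hypothetical.

```python
def chi2_scores(columns, labels):
    """Chi-squared statistic of each non-negative feature against class labels."""
    classes = sorted(set(labels))
    n = len(labels)
    scores = {}
    for name, values in columns.items():
        if any(v < 0 for v in values):
            raise ValueError(f"{name!r} has negative values; chi2 needs non-negative input")
        total = sum(values)
        stat = 0.0
        for c in classes:
            # Observed: sum of this feature over the rows of class c.
            observed = sum(v for v, y in zip(values, labels) if y == c)
            # Expected: class c's share of the feature total under independence.
            expected = labels.count(c) / n * total
            if expected > 0:  # skip zero-expected cells, e.g. an all-zero feature
                stat += (observed - expected) ** 2 / expected
        scores[name] = stat
    return scores


def top_k(scores, k):
    """Names of the k highest-scoring features, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Made-up values loosely resembling iris measurements (cm).
columns = {
    "sepal_length": [5.1, 4.9, 7.0, 6.4],
    "sepal_width": [3.5, 3.0, 3.2, 3.1],
    "petal_length": [1.4, 1.3, 4.7, 4.5],
    "petal_width": [0.2, 0.2, 1.4, 1.5],
}
labels = ["setosa", "setosa", "versicolor", "versicolor"]
print(top_k(chi2_scores(columns, labels), k=2))  # ['petal_length', 'petal_width']
```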

In [None]:
# Download the CSV into a fresh temporary directory; CsvExampleGen reads the files in it.
_data_root = tempfile.mkdtemp(prefix='tfx-data')
DATA_PATH = 'https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv'

_data_filepath = os.path.join(_data_root, "data.csv")
urllib.request.urlretrieve(DATA_PATH, _data_filepath)

**Run TFX components**

In the cells that follow, we create and run TFX components one by one: `CsvExampleGen` ingests the CSV file as examples, and `StatisticsGen` computes statistics over them.

In [None]:
context = InteractiveContext()

# Create and run the CsvExampleGen component.
example_gen = CsvExampleGen(input_base=_data_root)
context.run(example_gen)

# Create and run the StatisticsGen component.
statistics_gen = tfx.components.StatisticsGen(
    examples=example_gen.outputs['examples'])
context.run(statistics_gen)

In [None]:
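# Feature Selection component. `module_file` is the dotted import path of the
# module that configures the selection for this dataset.
feature_selector = FeatureSelection(orig_examples=example_gen.outputs['examples'],
                                    module_file="example.modules.iris_module_file")
context.run(feature_selector)
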
In [None]:
# Display Selected Features
context.show(feature_selector.outputs['feature_selection']._artifacts[0].selected_features)