# Scikit-learn Estimator

Any object that can estimate some parameters based on a dataset is called an **estimator** (e.g., an `imputer` is an estimator). The estimation itself is performed by the `fit()` method, which takes only a dataset as a parameter (or two for supervised learning algorithms, where the second dataset contains the labels). Any other parameter needed to guide the estimation process is considered a **hyperparameter** (such as an `imputer`'s strategy), and it must be set as an instance variable, generally via a constructor parameter (e.g. `SimpleImputer(strategy="median")` in the case of the `imputer`).
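A minimal sketch of this split, assuming `SimpleImputer` from `sklearn.impute` and a small NumPy array made up for illustration. The hyperparameter (`strategy`) goes into the constructor, `fit()` receives only the data, and the estimated parameters are exposed afterwards in the `statistics_` attribute:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy dataset (made up for illustration) with one missing value in the second column
X = np.array([[1.0, 2.0],
              [3.0, np.nan],
              [5.0, 6.0]])

# The hyperparameter `strategy` is set through the constructor...
imputer = SimpleImputer(strategy="median")

# ...while fit() only receives the dataset and estimates parameters from it
imputer.fit(X)

# Learned parameters carry a trailing underscore by scikit-learn convention
print(imputer.statistics_)  # [3. 4.] (per-column medians, NaN ignored)
```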

## Transformer

Some estimators (such as an `imputer`) can also transform a dataset; these are called **transformers**. The API is simple: the transformation is performed by the `transform()` method, which takes the dataset to transform as a parameter and returns the transformed dataset. This transformation generally relies on the learned parameters, as is the case for an `imputer`. The central piece of any custom estimator, transformers included, is `sklearn.base.BaseEstimator`, the base class from which all estimators in scikit-learn are derived. In more detail, this base class lets users get and set the parameters of the estimator through its `get_params()` and `set_params()` methods. It can be imported as:

```python
from sklearn.base import BaseEstimator
```

Once imported, we can create a class which inherits from this base class:

```python
class MyOwnEstimator(BaseEstimator):
    pass
```

Note: a `class` body cannot be empty. If we need a class definition with no content, we can place the `pass` statement in its body as a placeholder to avoid a syntax error.
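To see what `BaseEstimator` contributes, here is a sketch with a hypothetical estimator whose hyperparameters `alpha` and `fit_intercept` are made-up names. `get_params()` and `set_params()` work without being written by hand because, following scikit-learn's convention, `__init__` does nothing but store each constructor argument under an attribute of the same name:

```python
from sklearn.base import BaseEstimator


class MyParamEstimator(BaseEstimator):
    """Hypothetical estimator; `alpha` and `fit_intercept` are illustrative names."""

    def __init__(self, alpha=1.0, fit_intercept=True):
        # Store arguments unchanged, under the same names, so BaseEstimator can find them
        self.alpha = alpha
        self.fit_intercept = fit_intercept


est = MyParamEstimator(alpha=0.5)
print(est.get_params())   # {'alpha': 0.5, 'fit_intercept': True}

est.set_params(alpha=2.0)  # updates the attribute and returns the estimator itself
print(est.alpha)           # 2.0
```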

Transformers are scikit-learn estimators which implement a `transform` method. The use case is the following:

* at `fit`, some parameters can be learned from `X` and `y`;
* at `transform`, `X` will be transformed, using the parameters learned during `fit`.
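As an illustration of this two-step contract, the sketch below (toy arrays made up for illustration) fits a mean `SimpleImputer` on one dataset and then transforms a different one. The fill value comes from the training data, not from the data being transformed, and no `y` is passed at all:

```python
import numpy as np
from sklearn.impute import SimpleImputer

X_train = np.array([[1.0], [2.0], [np.nan], [9.0]])
X_new = np.array([[np.nan], [10.0]])

imputer = SimpleImputer(strategy="mean")

# fit: learn the column mean from X_train only, (1 + 2 + 9) / 3 = 4.0
imputer.fit(X_train)

# transform: reuse the learned mean; X_new's own values do not change it
X_new_filled = imputer.transform(X_new)
print(X_new_filled.ravel())  # the NaN becomes 4.0, the 10.0 is left as-is
```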

For some transformers, the input `y` is not used; by convention, it is still accepted for API consistency. In addition, scikit-learn provides a [mixin](https://en.wikipedia.org/wiki/Mixin), the `sklearn.base.TransformerMixin` class, which implements `fit_transform`, the combination of `fit` followed by `transform`. We can import the class as follows:

```python
from sklearn.base import TransformerMixin
```
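Conceptually, the mixin supplies `fit_transform` by chaining the two methods the class already defines. The sketch below is a simplified, hypothetical stand-in: `FitTransformMixin` and `AddOne` are names made up for illustration, and the real `TransformerMixin.fit_transform` additionally forwards extra fit parameters and does more bookkeeping:

```python
class FitTransformMixin:
    """Hypothetical, simplified stand-in for what TransformerMixin provides."""

    def fit_transform(self, X, y=None):
        # Works only because fit() returns self, so transform() can be called on its result
        return self.fit(X, y).transform(X)


class AddOne(FitTransformMixin):
    """Hypothetical transformer: learns nothing and adds 1 to every value."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return [x + 1 for x in X]


print(AddOne().fit_transform([1, 2, 3]))  # [2, 3, 4]
```

The class gains `fit_transform` purely by inheritance, which is the point of a mixin: reusable behavior layered onto classes that share an interface.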

When creating a transformer, we typically write a class which inherits from both `sklearn.base.BaseEstimator` and `sklearn.base.TransformerMixin`. The scikit-learn API requires `fit` to return `self`. This makes quick one-liners such as `transformer.fit(X).transform(X)` possible, and it is what allows a pipeline to apply a list of transforms sequentially, followed by a final estimator. The `fit_transform` method provided by `sklearn.base.TransformerMixin` relies on the same pattern, since it calls `fit` and then `transform` on the result. The `fit` method is expected to accept `X` and `y` as inputs, whereas `transform` takes only `X` and is expected to return the transformed `X`:

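```python
from sklearn.base import BaseEstimator, TransformerMixin


# scikit-learn's developer guide recommends listing mixins to the left of BaseEstimator
class MyOwnTransformer(TransformerMixin, BaseEstimator):
    def __init__(self, other_argument):
        # Store the argument under the same name so that get_params() can find it
        self.other_argument = other_argument

    def fit(self, X, y=None):
        # Nothing is learned here; returning self is what enables call chaining
        return self

    def transform(self, X):
        # Identity transformation: X is returned unchanged
        return X
```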
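To show a transformer that actually learns something, here is a sketch with a hypothetical `ColumnMaxScaler` (the class name and its `eps` hyperparameter are made up for illustration; scikit-learn's own `MaxAbsScaler` does a similar job). It demonstrates the three consequences of `fit` returning `self`: one-line chaining, the inherited `fit_transform`, and use inside a pipeline ahead of a final estimator:

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline


class ColumnMaxScaler(TransformerMixin, BaseEstimator):
    """Hypothetical transformer: divides each column by its largest absolute value."""

    def __init__(self, eps=1e-12):
        self.eps = eps  # hyperparameter guarding against division by zero

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        # Learned parameter, stored with a trailing underscore by convention
        self.max_abs_ = np.maximum(np.abs(X).max(axis=0), self.eps)
        return self  # required so that calls can be chained

    def transform(self, X):
        return np.asarray(X, dtype=float) / self.max_abs_


X = np.array([[1.0, -4.0], [2.0, 2.0], [4.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])

# fit() returns self, so fit and transform chain in one line...
X_a = ColumnMaxScaler().fit(X).transform(X)
# ...and TransformerMixin supplies fit_transform without us writing it
X_b = ColumnMaxScaler().fit_transform(X)

# The same contract lets a pipeline run the transformer, then a final estimator
model = make_pipeline(ColumnMaxScaler(), LinearRegression()).fit(X, y)
print(model.predict(X))
```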

Many classes need to create objects whose instances are customized to a specific *initial state*. Therefore, a class may define a special method named `__init__()`. When a class defines an `__init__()` method, class instantiation automatically invokes `__init__()` for the newly created instance. The `self` parameter is a reference to the current instance of the class and is used to access the variables that belong to that instance. It does not have to be named `self`; any other name works. However, it has to be the first parameter of any regular (instance) method defined in the class.
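A small sketch of both points, using a hypothetical `Counter` class made up for illustration: `__init__` runs automatically when the object is created, and the first parameter of a method works under any name, although `self` is the universal convention and linters will flag alternatives:

```python
class Counter:
    """Hypothetical class illustrating __init__ and the first (self) parameter."""

    def __init__(self, start=0):
        # Runs automatically on instantiation and sets the initial state
        self.count = start

    # The first parameter may have any name; `this` works, but `self` is the convention
    def increment(this, step=1):
        this.count += step
        return this.count


c = Counter(start=10)  # Python calls Counter.__init__(c, start=10) behind the scenes
print(c.increment())   # 11
print(c.increment(5))  # 16
```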