In [1]:
from zipline.pipeline import Pipeline
from zipline.component.research import run_pipeline
from zipline.pipeline.factors import SimpleMovingAverage, AverageDollarVolume

## Classifiers
A classifier is a function from an asset and a moment in time to a [categorical output](https://en.wikipedia.org/wiki/Categorical_variable) such as a `string` or `integer` label:
```
F(asset, timestamp) -> category
```
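To make the signature above concrete, here is a minimal plain-Python sketch of a classifier as a lookup from an asset and a date to a label. It does not use Zipline; the `RECORDS` table, the asset names, and `region_classifier` are all hypothetical and only illustrate the idea of returning the most recent known category.

```python
from datetime import date

# Hypothetical point-in-time records: (asset, effective date, region label).
RECORDS = [
    ("A", date(2015, 1, 1), "north"),
    ("A", date(2015, 4, 1), "south"),
    ("B", date(2015, 2, 1), "north"),
]


def region_classifier(asset, timestamp):
    """Return the most recent label known for `asset` on or before `timestamp`."""
    known = [(d, label) for a, d, label in RECORDS if a == asset and d <= timestamp]
    if not known:
        return None  # no label is known yet for this asset
    # Tuples sort by date first, so max() picks the latest record.
    return max(known)[1]


print(region_classifier("A", date(2015, 3, 1)))
print(region_classifier("A", date(2015, 5, 5)))
```

The output is a label rather than a number, which is what separates a classifier from a factor.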
An example of a classifier producing a string output is the exchange ID of a security. This version of the tutorial uses the security's region instead. To create this classifier, we import `Fundamentals` and use the [latest](https://www.zipline.com/tutorials/pipeline#lesson3) attribute of `Fundamentals.region` to instantiate it:

In [2]:
from zipline.pipeline.data import Fundamentals

# Use region in place of the exchange ID
# region is of type string, so .latest returns a Classifier
region = Fundamentals.region.latest

Previously, we saw that the `latest` attribute produced an instance of a `Factor`. In this case, since the underlying data is of type `string`, `latest` produces a `Classifier`.

Similarly, a computation producing the latest Morningstar sector code of a security is a `Classifier`. In this case, the underlying type is an `int`, but the integer doesn't represent a numerical value (it's a category) so it produces a classifier. To get the latest sector code, we can use the built-in `Sector` classifier.

In [3]:
# from zipline.pipeline.classifiers.fundamentals import Sector
# sector lives in Fundamentals as its own bound column
cninfo_sector = Fundamentals.cninfo.sector.latest

Where the original tutorial uses the `Sector` custom factor, use `Fundamentals.cninfo.sector.latest` directly instead.

### Building Filters from Classifiers
Classifiers can also be used to produce filters with methods like `isnull`, `eq`, and `startswith`. The full list of `Classifier` methods producing `Filters` can be found [here](https://www.zipline.com/help#zipline_pipeline_classifiers_Classifier).

As an example, if we wanted a filter to select securities whose region is Hunan (`'湖南'`), we can use the `eq` method of our `region` classifier.

In [4]:
region_filter = region.eq('湖南')

This filter will return `True` for securities having `'湖南'` as their most recent `region`.

When filtering on strings, consider using the `has_substring` method, which matches more broadly.
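To see how exact matching with `eq` differs from substring matching with `has_substring`, here is a plain-Python sketch of the two comparisons applied to a set of labels. It does not call Zipline. The asset symbols, the `'湖南省'` variant, and the helper functions are made up for illustration, and the real methods return pipeline `Filter` objects rather than dictionaries.

```python
# Hypothetical latest region labels keyed by asset symbol (None = missing).
latest_region = {
    "AAA": "湖南",
    "BBB": "湖南省",
    "CCC": "广东",
    "DDD": None,
}


def eq(labels, value):
    """True only where the label equals `value` exactly."""
    return {asset: label == value for asset, label in labels.items()}


def has_substring(labels, substring):
    """True where the label is present and contains `substring`."""
    return {
        asset: label is not None and substring in label
        for asset, label in labels.items()
    }


exact = eq(latest_region, "湖南")
broad = has_substring(latest_region, "湖南")
print(exact)
print(broad)
```

In this toy data the exact match keeps only `AAA`, while the substring match also keeps `BBB`, whose label is a longer spelling of the same region.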

### Quantiles
Classifiers can also be produced from various `Factor` methods. The most general of these is the `quantiles` method which accepts a bin count as an argument. The `quantiles` method assigns a label from 0 to (bins - 1) to every non-NaN data point in the factor output and returns a `Classifier` with these labels. `NaN`s are labeled with -1. Aliases are available for [quartiles](https://www.zipline.com/help/#zipline_pipeline_factors_Factor_quartiles) (`quantiles(4)`), [quintiles](https://www.zipline.com/help/#zipline_pipeline_factors_Factor_quintiles) (`quantiles(5)`), and [deciles](https://www.zipline.com/help/#zipline_pipeline_factors_Factor_deciles) (`quantiles(10)`). As an example, this is what a filter for the top decile of a factor might look like:

In [5]:
dollar_volume_decile = AverageDollarVolume(window_length=10).deciles()
top_decile = (dollar_volume_decile.eq(9))
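The labeling rule described above (labels from 0 to bins - 1, with -1 for `NaN`) can be sketched in plain Python. This is a rank-based approximation on made-up values, not Zipline's implementation; the way Zipline picks bin edges and handles ties may differ. The function name `quantile_labels` is hypothetical.

```python
import math


def quantile_labels(values, bins):
    """Label each non-NaN value with 0..bins-1 by rank; NaN gets -1."""
    # Sort the valid values while remembering their original positions.
    valid = sorted((v, i) for i, v in enumerate(values) if not math.isnan(v))
    labels = [-1] * len(values)
    n = len(valid)
    for rank, (_, i) in enumerate(valid):
        # Split the ranks into `bins` groups of roughly equal size.
        labels[i] = rank * bins // n
    return labels


# Made-up average dollar volumes for eleven assets, one of them missing.
dollar_volumes = [5.0, 1.0, float("nan"), 9.0, 3.0, 7.0, 2.0, 8.0, 4.0, 6.0, 10.0]

deciles = quantile_labels(dollar_volumes, bins=10)
top_decile = [label == 9 for label in deciles]
print(deciles)
print(top_decile)
```

Comparing the labels with 9 mirrors the `eq(9)` call: only values in the highest bin pass, and the `NaN` entry, labeled -1, never does.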

Let's put each of our classifiers into a pipeline and run it to see what they look like.

In [6]:
def make_pipeline():
    region = Fundamentals.region.latest
    region_filter = region.eq('湖南')

    cninfo_sector = Fundamentals.cninfo.sector.latest

    dollar_volume_decile = AverageDollarVolume(window_length=10).deciles()
    top_decile = (dollar_volume_decile.eq(9))

    return Pipeline(
        columns={
            'region': region,
            'cninfo_sector': cninfo_sector,
            'dollar_volume_decile': dollar_volume_decile
        },
        screen=(region_filter & top_decile)
    )

In [7]:
result = run_pipeline(make_pipeline(), '2015-05-05', '2015-05-05')
print('Number of securities that passed the filter: %d' % len(result))
result.head(5)

Number of securities that passed the filter: 3


| | | cninfo_sector | dollar_volume_decile | region |
|---|---|---|---|---|
| 2015-05-05 00:00:00+00:00 | Equity(000157 [中联重科]) | 工业 | 9 | 湖南 |
| 2015-05-05 00:00:00+00:00 | Equity(000917 [电广传媒]) | 可选消费 | 9 | 湖南 |
| 2015-05-05 00:00:00+00:00 | Equity(601901 [方正证券]) | 金融地产 | 9 | 湖南 |


Classifiers are also useful for describing grouping keys for complex transformations on Factor outputs. Grouping operations such as [demean](https://www.zipline.com/help#zipline_pipeline_factors_Factor_demean) and [groupby](https://www.zipline.com/help#zipline_pipeline_factors_Factor_groupby) are outside the scope of this tutorial. A future tutorial will cover more advanced uses for classifiers.

In the next lesson, we'll look at the different datasets that we can use in pipeline.