# CountVectorizer

```python
CountVectorizer(name: str,
                cursor = None,
                lowercase: bool = True,
                max_df: float = 1.0,
                min_df: float = 0.0,
                max_features: int = -1,
                ignore_special: bool = True,
                max_text_size: int = 2000)
```

Creates a Text Index that counts the occurrences of each word in the data.
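The counting step can be pictured with the minimal pure-Python sketch below. It is a local analogue only, not the library's in-database implementation: the helpers `concat_text_columns` and `count_words` are hypothetical, and the exact tokenization (splitting on whitespace, replacing special characters with spaces) is an assumption. It mirrors the `lowercase`, `ignore_special` and `max_text_size` parameters described in the table.

```python
import re
from collections import Counter

def concat_text_columns(row, columns, max_text_size=2000):
    """Hypothetical helper: join one row's text columns with spaces, then truncate."""
    text = " ".join(str(row[c]) for c in columns if row[c] is not None)
    return text[:max_text_size]

def count_words(texts, lowercase=True, ignore_special=True):
    """Hypothetical helper: count word occurrences across all texts."""
    counts = Counter()
    for text in texts:
        if lowercase:
            text = text.lower()
        if ignore_special:
            # Assumption: any character that is not a letter, digit, underscore
            # or whitespace is treated as a separator and replaced by a space.
            text = re.sub(r"[^\w\s]", " ", text)
        counts.update(text.split())
    return counts

# Usage on two toy rows with two text columns.
rows = [
    {"title": "Hello, World!", "body": "hello again"},
    {"title": "World news", "body": None},
]
texts = [concat_text_columns(r, ["title", "body"]) for r in rows]
counts = count_words(texts)
```

With the defaults, both "Hello" and "hello" map to the same word, so `counts` holds two occurrences each of "hello" and "world".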

### Parameters

<table id="parameters">
    <tr> <th>Name</th> <th>Type</th> <th>Optional</th> <th>Description</th> </tr>
    <tr> <td><div class="param_name">name</div></td> <td><div class="type">str</div></td> <td><div class = "no">&#10060;</div></td> <td>Name of the model. The model is stored in the DB.</td> </tr>
    <tr> <td><div class="param_name">cursor</div></td> <td><div class="type">DBcursor</div></td> <td><div class = "yes">&#10003;</div></td> <td>Vertica DB cursor.</td> </tr>
    <tr> <td><div class="param_name">lowercase</div></td> <td><div class="type">bool</div></td> <td><div class = "yes">&#10003;</div></td> <td>Converts all the text to lowercase before processing.</td> </tr>
    <tr> <td><div class="param_name">max_df</div></td> <td><div class="type">float</div></td> <td><div class = "yes">&#10003;</div></td> <td>Keeps only the words whose share of the total dictionary distribution is lower than this value.</td> </tr>
    <tr> <td><div class="param_name">min_df</div></td> <td><div class="type">float</div></td> <td><div class = "yes">&#10003;</div></td> <td>Keeps only the words whose share of the total dictionary distribution is higher than this value.</td> </tr>
    <tr> <td><div class="param_name">max_features</div></td> <td><div class="type">int</div></td> <td><div class = "yes">&#10003;</div></td> <td>Keeps only the <i>max_features</i> most frequent words of the dictionary.</td> </tr>
    <tr> <td><div class="param_name">ignore_special</div></td> <td><div class="type">bool</div></td> <td><div class = "yes">&#10003;</div></td> <td>Ignores all special characters when building the dictionary.</td> </tr>
    <tr> <td><div class="param_name">max_text_size</div></td> <td><div class="type">int</div></td> <td><div class = "yes">&#10003;</div></td> <td>Maximum size of the column that concatenates all the text columns during fitting.</td> </tr>
</table>
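The dictionary filters can be sketched in pure Python as below. This is an illustration under stated assumptions, not the library's code: `build_vocabulary` is a hypothetical helper, a word's share is assumed to be its count divided by the total count of all words, the bounds are treated as inclusive, ties are broken alphabetically, `max_features` is assumed to cap the words that already passed the share filter, and a non-positive `max_features` (such as the default -1) is assumed to mean "no limit". Words that are filtered out end up in `stop_words`, matching the attribute of that name.

```python
from collections import Counter

def build_vocabulary(counts, min_df=0.0, max_df=1.0, max_features=-1):
    """Hypothetical helper: split counted words into (vocabulary, stop_words)."""
    total = sum(counts.values())
    # Most frequent words first; alphabetical order breaks ties (assumption).
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    vocabulary, stop_words = [], []
    for word, cnt in ranked:
        share = cnt / total  # assumed definition of a word's share
        in_range = min_df <= share <= max_df
        # Assumption: max_features <= 0 disables the cap.
        under_cap = max_features <= 0 or len(vocabulary) < max_features
        if in_range and under_cap:
            vocabulary.append(word)
        else:
            stop_words.append(word)
    return vocabulary, stop_words

# Usage: "the" holds 50% of all occurrences, so max_df=0.4 drops it.
counts = Counter({"the": 5, "data": 3, "vertica": 1, "model": 1})
vocab, stop = build_vocabulary(counts, max_df=0.4)
vocab_top2, stop_top2 = build_vocabulary(counts, max_df=0.4, max_features=2)
```

Here `vocab` is `["data", "model", "vertica"]` and `stop` is `["the"]`; adding `max_features=2` also moves "vertica" into the stop words.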

### Attributes

After the object is created, all the parameters become attributes.
Fitting the model also creates the following attributes:

<table id="parameters">
    <tr> <th>Name</th> <th>Type</th>  <th>Description</th> </tr>
    <tr> <td><div class="param_name">stop_words</div></td> <td><div class="type">list</div></td> <td>Words excluded from the vocabulary.</td> </tr>
    <tr> <td><div class="param_name">vocabulary</div></td> <td><div class="type">list</div></td> <td>The final vocabulary.</td> </tr>
    <tr> <td><div class="param_name">input_relation</div></td> <td><div class="type">str</div></td> <td>Relation used to train the model.</td> </tr>
    <tr> <td><div class="param_name">X</div></td> <td><div class="type">list</div></td> <td>List of the predictors.</td> </tr>
</table>

### Methods

<table id="parameters">
    <tr> <th>Name</th> <th>Description</th> </tr>
    <tr> <td><a href="../Unsupervised/deploySQL">deploySQL</a></td> <td>Returns the SQL code needed to deploy the model.</td> </tr>
    <tr> <td><a href="../Unsupervised/drop">drop</a></td> <td>Drops the model from the Vertica DB.</td> </tr>
    <tr> <td><a href="../Unsupervised/fit">fit</a></td> <td>Trains the model.</td> </tr>
    <tr> <td><a href="../Unsupervised/to_vdf">to_vdf</a></td> <td>Creates a vDataFrame of the model.</td> </tr>
</table>

### Example

```python
from vertica_ml_python.learn.preprocessing import CountVectorizer
model = CountVectorizer(name = "public.vocabulary")
print(model)
```

```
<CountVectorizer>
```
