## Using the fine-tuned classifier

The code below uses the fine-tuned model to classify whether a website is child-directed. You can either fine-tune your own model by following the instructions in the [fine-tuning.ipynb](https://github.com/targeted-and-troublesome/targeted-and-troublesome-crawler/blob/main/classifier/fine-tuning.ipynb) notebook or download the fine-tuned model [here](TODO).

Here's what each step involves:

1. **Loading the Model and Tokenizer**:
   - **Model**: The fine-tuned model is loaded from its saved directory.
   - **Tokenizer**: The tokenizer that was used during the training phase is also loaded. The tokenizer converts text data into a format that the model can understand (i.e., tokenized tensors).

2. **Creating a Classification Pipeline**:
   - A pipeline is created using the `pipeline` function from the `transformers` library. The pipeline wraps the model and tokenizer in a single object, which simplifies making predictions.
   - The pipeline is set up for text classification and configured to run on a specific device (GPU or CPU). Truncation is enabled so that text exceeding the model's maximum input length is cut off rather than rejected.

3. **Running Predictions on Sample Text Data**:
   - With the pipeline ready, you can pass in sample text data to get predictions. The pipeline outputs the predicted class label along with a confidence score for each input. In the paper, the confidence scores were used to prioritize the websites most likely to be child-directed for manual labeling.
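
The prioritization mentioned in step 3 can be sketched as follows. This is a minimal illustration, not the paper's actual procedure. It assumes a two-class model and predictions shaped like the output of the `transformers` text-classification pipeline (a dict with `label` and `score` keys). The function name `rank_for_labeling`, the example URLs, and the label name `LABEL_1` are hypothetical; check your model's `id2label` config for the real label names.

```python
def rank_for_labeling(urls, predictions, positive_label="LABEL_1"):
    """Sort URLs so the most likely child-directed sites come first.

    `predictions` holds one {"label": ..., "score": ...} dict per URL.
    `positive_label` is an assumed name for the child-directed class.
    """
    ranked = []
    for url, pred in zip(urls, predictions):
        # The pipeline reports the score of the predicted label only, so for
        # a two-class model the other class's score is 1 - score.
        if pred["label"] == positive_label:
            positive_score = pred["score"]
        else:
            positive_score = 1.0 - pred["score"]
        ranked.append((url, positive_score))
    # Highest positive-class score first
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Hypothetical predictions in the pipeline's output format
example_urls = ["a.example", "b.example", "c.example"]
example_predictions = [
    {"label": "LABEL_1", "score": 0.75},
    {"label": "LABEL_0", "score": 0.875},
    {"label": "LABEL_1", "score": 0.5},
]

for url, score in rank_for_labeling(example_urls, example_predictions):
    print(url, score)
```

Sites at the top of the ranking would be reviewed first, so manual labeling effort goes to the candidates the model considers most likely to be child-directed.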


### Download the model and the tokenizer (if you haven't fine-tuned the model yourself)


In [None]:
# ! wget TODO (replace with the model URL)
# ! wget TODO (replace with the tokenizer URL)

In [None]:
import torch
from transformers import pipeline
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# TODO: set the path to the directory that holds the model and tokenizer files
MODEL_PATH = '/path/to/model'
# Load the model and tokenizer from the local directory
MODEL = AutoModelForSequenceClassification.from_pretrained(MODEL_PATH, local_files_only=True)
TOKENIZER = AutoTokenizer.from_pretrained(MODEL_PATH, local_files_only=True)

# Use the first GPU if one is available; otherwise fall back to the CPU (-1)
DEVICE = 0 if torch.cuda.is_available() else -1

def create_pipeline(model, tokenizer):
    """Create a text classification pipeline."""
    try:
        # truncation=True cuts off input that exceeds the model's maximum length
        return pipeline("text-classification", model=model, tokenizer=tokenizer, device=DEVICE, truncation=True)
    except Exception as e:
        print(f"Error creating pipeline: {e}")
        return None

def run_inference(text_data, classifier):
    """Run inference on provided text data using the classifier pipeline."""
    if classifier is None:
        print("Classifier pipeline is not available.")
        return

    try:
        for text in text_data:
            result = classifier(text)
            print(f"Text: {text}, Prediction: {result}")
    except Exception as e:
        print(f"Error during inference: {e}")

### Example usage

In [None]:
# create the pipeline
webpage_classifier = create_pipeline(MODEL, TOKENIZER)

test_data_text = [
    "Online Toddler Games and Online Games for Kids. Online games for toddlers, preschool kids and babies.",
    "Play the best games for children of all ages! Free games made for 2 - 3 - 4 - 5 years old."
    ]
run_inference(test_data_text, webpage_classifier)
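
If you want to keep the predictions rather than print them, a variant of `run_inference` could collect them into a list. The sketch below is one possible approach and is not part of the original notebook: `collect_predictions` is a hypothetical helper, and `fake_classifier` (with its made-up label and score) stands in for the real pipeline so the example runs without the model files.

```python
def collect_predictions(text_data, classifier):
    """Return one {"text", "label", "score"} record per input text."""
    records = []
    for text in text_data:
        # A text-classification pipeline called on a single string returns
        # a list containing one {"label": ..., "score": ...} dict.
        prediction = classifier(text)[0]
        records.append({
            "text": text,
            "label": prediction["label"],
            "score": prediction["score"],
        })
    return records


def fake_classifier(text):
    """Hypothetical stand-in that mimics the pipeline's output format."""
    return [{"label": "LABEL_1", "score": 0.5}]


records = collect_predictions(["Games for kids", "Tax filing software"], fake_classifier)
print(records)
```

With the real pipeline, you would pass `webpage_classifier` in place of `fake_classifier`. The resulting records can then be sorted or written to a file for manual review.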
