# Lesson 1: Text Classification for Spam Detection Using spaCy

## Introduction to Text Classification

Welcome! In today's lesson, we will delve into Text Classification: the process of assigning text to predefined categories. It is a core task in Natural Language Processing (NLP) because it helps organize data, simplify search, and keep labeling consistent.

In this practical exercise on spam detection, you will see Text Classification at work. Spam detection is a real-world problem in which we separate unwanted emails (spam) from legitimate ones (ham).

At a high level, text classification converts text into numerical feature vectors, which a machine learning model then uses to assign a category. We will dig deeper into this as we progress through the lesson. Let's get started.
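
To make "numerical feature vectors" concrete, here is a minimal sketch in plain Python. The vocabulary and the `bag_of_words` helper are hypothetical and chosen only for illustration; spaCy builds its features differently under the hood.

```python
from collections import Counter

def bag_of_words(text, vocabulary):
    """Return one count per vocabulary word (word order is discarded)."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]

# Hypothetical toy vocabulary; a real model derives its features from the training data
vocabulary = ["buy", "cheap", "meeting", "now", "today"]

spam_vector = bag_of_words("Buy cheap cheap watches now", vocabulary)
ham_vector = bag_of_words("Lunch meeting today", vocabulary)

print(spam_vector)  # [1, 2, 0, 1, 0]
print(ham_vector)   # [0, 0, 1, 0, 1]
```

Each position in the vector corresponds to one vocabulary word, so two texts can be compared numerically even though their lengths differ.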

## Setting up a spaCy Text Classification Pipeline

Before we start coding, we need to understand pipelines in spaCy. In simple terms, a pipeline is a sequence of data processing components. It takes raw text and applies each component in turn, turning the text into an annotated document we can extract information from.

```python
import spacy

# Load a blank spaCy model
nlp = spacy.blank("en")
```

In previous courses, we used `spacy.load("en_core_web_sm")` to load a pre-trained model with components for common NLP tasks. In contrast, `spacy.blank("en")` creates an empty English pipeline, so we can add only the components a specific task needs, such as a text classifier for spam detection.
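
The idea of a pipeline as an ordered sequence of components can be sketched without spaCy at all. The three component functions below are hypothetical stand-ins; spaCy's real components operate on `Doc` objects rather than plain strings.

```python
def lowercase(text):
    return text.lower()

def strip_punctuation(text):
    # Keep only letters, digits, and whitespace
    return "".join(ch for ch in text if ch.isalnum() or ch.isspace())

def tokenize(text):
    return text.split()

# Hypothetical components, applied in order; this is the "pipeline"
toy_pipeline = [lowercase, strip_punctuation, tokenize]

def run_pipeline(text, components):
    """Pass the output of each component to the next one."""
    result = text
    for component in components:
        result = component(result)
    return result

tokens = run_pipeline("Buy cheap watches now!", toy_pipeline)
print(tokens)  # ['buy', 'cheap', 'watches', 'now']
```

Starting from an empty list and appending only what you need mirrors the choice of a blank model over a pre-trained one.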

## Data Preparation and Labeling

Effective data preparation is a critical step in any machine learning project. In text classification, labeled data plays a crucial role: the labels tell the model which category each example belongs to, so it can learn the patterns that separate the categories.

```python
# Sample dataset with labels
training_data = [
    ("Buy cheap watches now!", {"cats": {"SPAM": 1, "HAM": 0}}),
    ("Get your discount codes today", {"cats": {"SPAM": 1, "HAM": 0}}),
    # More examples...
]
```

Each text is paired with a `cats` annotation that marks it as spam or ham. The snippet shows only spam examples; the full training set should also contain ham messages so the model sees both classes.
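
Before training, it can help to sanity-check the labels. The following sketch uses the spam and ham examples from this lesson's exercises; `label_counts` is a hypothetical helper, not part of spaCy.

```python
from collections import Counter

training_data = [
    ("Buy cheap watches now!", {"cats": {"SPAM": 1, "HAM": 0}}),
    ("Get your discount codes today", {"cats": {"SPAM": 1, "HAM": 0}}),
    ("Call me asap", {"cats": {"SPAM": 0, "HAM": 1}}),
    ("Lunch meeting at 1 pm?", {"cats": {"SPAM": 0, "HAM": 1}}),
]

def label_counts(data):
    """Count examples per label, checking each has exactly one positive label."""
    counts = Counter()
    for text, annotations in data:
        positives = [label for label, value in annotations["cats"].items() if value == 1]
        if len(positives) != 1:
            raise ValueError(f"Expected exactly one positive label for: {text!r}")
        counts[positives[0]] += 1
    return counts

counts = label_counts(training_data)
print(dict(counts))  # {'SPAM': 2, 'HAM': 2}
```

A heavily skewed count would be a hint to collect more examples of the rarer class.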

## Adding and Configuring Text Classifier in the Pipeline

With our dataset and initial pipeline in place, the next step is to add a text classifier to the pipeline. We'll use the TextCatBOW (Bag-of-Words) architecture for this purpose. The Bag-of-Words model represents a text as a 'bag' of its words, disregarding grammar and word order and keeping only which words occur and how often. That is often enough to pick up the vocabulary typical of spam.

```python
# Add the text classifier to the pipeline
config = {
    "threshold": 0.5,  # Decision threshold for labels
    "model": {
        "@architectures": "spacy.TextCatBOW.v1",  # Classifier architecture
        "exclusive_classes": True,  # Mutually exclusive labels
        "ngram_size": 1,  # Use unigrams
        "no_output_layer": False  # Include output layer
    }
}
textcat = nlp.add_pipe("textcat", config=config)
textcat.add_label("SPAM")
textcat.add_label("HAM")
```

This configuration allows the pipeline to leverage the Bag-of-Words architecture in distinguishing between different types of text data, tailored specifically for spam detection.
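
With mutually exclusive labels, the scores for a text are expected to sum to 1, which is how a softmax output behaves. The sketch below illustrates that idea in plain Python; the raw scores are hypothetical values, not numbers produced by spaCy.

```python
import math

def softmax(raw_scores):
    """Turn raw per-label scores into probabilities that sum to 1."""
    exps = {label: math.exp(score) for label, score in raw_scores.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}

# Hypothetical raw scores for one text
probs = softmax({"SPAM": 1.2, "HAM": 0.3})
for label, p in probs.items():
    print(f"{label}: {p:.4f}")
print(f"Sum: {sum(probs.values()):.4f}")

# Equal raw scores give an even split between the labels
even = softmax({"SPAM": 0.0, "HAM": 0.0})
```

Raising one label's score necessarily lowers the other's, which is why this setup suits a choice between SPAM and HAM.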

## Training the Text Classifier

Now that our pipeline has a text classifier, we can train it. Training runs the model through a loop in which it makes predictions on the training data, measures how wrong they are (the 'loss'), and adjusts its weights to reduce that error. Each pass over the data is one iteration, and it is common to run many iterations so the model can keep improving.

```python
from spacy.training import Example

# Training the classifier
def train_spam_detector(training_data, nlp, textcat, n_iter=30):
    optimizer = nlp.initialize(lambda: (
        Example.from_dict(nlp.make_doc(text), annotations)
        for text, annotations in training_data
    ))
    
    for i in range(n_iter):
        losses = {}
        for text, annotations in training_data:
            doc = nlp.make_doc(text)
            example = Example.from_dict(doc, annotations)
            nlp.update([example], sgd=optimizer, losses=losses)
        print(f"Iteration {i} - Loss: {losses}")

train_spam_detector(training_data, nlp, textcat)
```

In the train_spam_detector() function, several critical methods are employed:
- **nlp.initialize**: This method prepares the pipeline and optimizer for training. It sets up the weights of the model according to the architecture and configuration.
- **nlp.make_doc**: This converts a text string into a spaCy Doc object, which is a container that holds the processed text and is integral to how spaCy handles text data.
- **nlp.update**: This function performs an optimization step during training. It takes a batch of examples and updates the model's weights, improving its accuracy based on the loss computed.

The function prints the loss after each iteration, which helps you see how the model is learning. A falling loss means the model is fitting the training data better from one iteration to the next.
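
To see why the loss tends to fall, here is a deliberately simplified stand-in for the training loop: a bag-of-words logistic model trained by gradient descent in plain Python. It is an illustration of the idea only, not what spaCy does internally, and the learning rate is an assumed value.

```python
import math

# Toy data: 1 = spam, 0 = ham (texts taken from this lesson's examples)
data = [
    ("Buy cheap watches now!", 1),
    ("Get your discount codes today", 1),
    ("Call me asap", 0),
    ("Lunch meeting at 1 pm?", 0),
]

weights = {}         # one weight per word, created on first use
learning_rate = 0.5  # assumed value, chosen for illustration

def predict(words):
    """Probability of spam: sigmoid of the summed word weights."""
    score = sum(weights.get(w, 0.0) for w in words)
    return 1.0 / (1.0 + math.exp(-score))

epoch_losses = []
for epoch in range(10):
    total_loss = 0.0
    for text, label in data:
        words = text.lower().split()
        p = predict(words)
        # Cross-entropy loss for this example
        total_loss += -math.log(p if label == 1 else 1.0 - p)
        # Gradient step: nudge each word's weight toward the correct label
        for w in words:
            weights[w] = weights.get(w, 0.0) - learning_rate * (p - label)
    epoch_losses.append(total_loss)
    print(f"Iteration {epoch} - Loss: {total_loss:.4f}")
```

On this tiny dataset the printed loss shrinks every iteration, because each update moves the weights in the direction that reduces the error on the examples just seen.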

## Model Evaluation

Once the model is trained, it's time to evaluate its performance. To do this, we run the model on new texts that it did not see during training. The scores show how well the model generalizes what it learned to unseen data.

```python
# Test the trained model
test_texts = [
    "Exclusive deal just for you!",
    "Can we reschedule the meeting?",
    # More examples...
]

for text in test_texts:
    doc = nlp(text)
    print(f"Text: {text}")
    for cat, score in doc.cats.items():
        print(f"  {cat}: {score:.4f}")
    print("\n")

# Output:
# Text: Exclusive deal just for you!
#   SPAM: 0.5940
#   HAM: 0.4060

# Text: Can we reschedule the meeting?
#   SPAM: 0.3539
#   HAM: 0.6461
```
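
To turn scores into a decision, one simple option is to pick the label with the highest score and compare it with the expected label. The sketch below reuses the scores from the sample output above; the expected ("gold") labels are assumptions added for illustration.

```python
def predicted_label(cats):
    """Return the label with the highest score from a doc.cats-style dict."""
    return max(cats, key=cats.get)

# Scores copied from the sample output; gold labels are assumed for illustration
results = [
    ({"SPAM": 0.5940, "HAM": 0.4060}, "SPAM"),
    ({"SPAM": 0.3539, "HAM": 0.6461}, "HAM"),
]

correct = sum(predicted_label(cats) == gold for cats, gold in results)
accuracy = correct / len(results)
print(f"Accuracy: {accuracy:.2f}")  # Accuracy: 1.00
```

Two examples are far too few for a meaningful accuracy figure; a real evaluation would use a larger held-out set.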

Great work! In this lesson, you have delved deep into Text Classification using spaCy, specifically focusing on the scenario of spam detection. You learned how to set up a text classification pipeline, prepare data for training, add and configure a text classifier to the pipeline using the Bag-of-Words model, train the model, and evaluate its performance. The practical exercise allowed you to apply these concepts and gain real hands-on experience in this versatile field of Natural Language Processing. In the next lessons, we will continue to build on these foundations and explore more advanced topics. Keep practicing and happy coding!

## Spam Detection: Modify Test Texts

Stellar Navigator, can you modify the test_texts in the code? Replace the existing texts in test_texts with sentences that our model would classify as SPAM. Once you've replaced the sentences, run the code to confirm that the model correctly tags your sentences.

```python
import spacy
from spacy.training import Example


# Load a blank spaCy model
nlp = spacy.blank("en")

# Enhanced sample dataset
training_data = [
    ("Buy cheap watches now!", {"cats": {"SPAM": 1, "HAM": 0}}),
    ("Get your discount codes today", {"cats": {"SPAM": 1, "HAM": 0}}),
    ("Call me asap", {"cats": {"SPAM": 0, "HAM": 1}}),
    ("Lunch meeting at 1 pm?", {"cats": {"SPAM": 0, "HAM": 1}})
]

# Add the text classifier to the pipeline
config = {
    "threshold": 0.5,
    "model": {
        "@architectures": "spacy.TextCatBOW.v1",
        "exclusive_classes": True,
        "ngram_size": 1,
        "no_output_layer": False
    }
}
textcat = nlp.add_pipe("textcat", config=config)
textcat.add_label("SPAM")
textcat.add_label("HAM")

# Training the classifier
def train_spam_detector(training_data, nlp, textcat, n_iter=30):
    optimizer = nlp.initialize(lambda: (
        Example.from_dict(nlp.make_doc(text), annotations)
        for text, annotations in training_data
    ))
  
    for i in range(n_iter):
        losses = {}
        for text, annotations in training_data:
            doc = nlp.make_doc(text)
            example = Example.from_dict(doc, annotations)
            nlp.update([example], sgd=optimizer, losses=losses)


train_spam_detector(training_data, nlp, textcat)


# TODO: Replace the texts with sentences that our model would classify as SPAM
test_texts = [
    "Let's discuss our project over a cup of coffee",
    "Do not forget our meeting tomorrow morning"
]

for text in test_texts:
    doc = nlp(text)
    print(f"Text: {text}")
    for cat, score in doc.cats.items():
        print(f"  {cat}: {score:.4f}")
    print("\n")

```

Here’s the modified code with the `test_texts` replaced by sentences that the model would likely classify as SPAM. The new sentences include common phrases often associated with spam messages:

```python
import spacy
from spacy.training import Example

# Load a blank spaCy model
nlp = spacy.blank("en")

# Enhanced sample dataset
training_data = [
    ("Buy cheap watches now!", {"cats": {"SPAM": 1, "HAM": 0}}),
    ("Get your discount codes today", {"cats": {"SPAM": 1, "HAM": 0}}),
    ("Call me asap", {"cats": {"SPAM": 0, "HAM": 1}}),
    ("Lunch meeting at 1 pm?", {"cats": {"SPAM": 0, "HAM": 1}})
]

# Add the text classifier to the pipeline
config = {
    "threshold": 0.5,
    "model": {
        "@architectures": "spacy.TextCatBOW.v1",
        "exclusive_classes": True,
        "ngram_size": 1,
        "no_output_layer": False
    }
}
textcat = nlp.add_pipe("textcat", config=config)
textcat.add_label("SPAM")
textcat.add_label("HAM")

# Training the classifier
def train_spam_detector(training_data, nlp, textcat, n_iter=30):
    optimizer = nlp.initialize(lambda: (
        Example.from_dict(nlp.make_doc(text), annotations)
        for text, annotations in training_data
    ))
    
    for i in range(n_iter):
        losses = {}
        for text, annotations in training_data:
            doc = nlp.make_doc(text)
            example = Example.from_dict(doc, annotations)
            nlp.update([example], sgd=optimizer, losses=losses)

train_spam_detector(training_data, nlp, textcat)

# Replace the texts with sentences that our model would classify as SPAM
test_texts = [
    "Congratulations! You've won a $1000 gift card.",
    "Click here to claim your free trial now!",
    "Act now! Limited time offer just for you.",
    "You have been selected for a special promotion!",
    "Earn money from home with this simple trick."
]

for text in test_texts:
    doc = nlp(text)
    print(f"Text: {text}")
    for cat, score in doc.cats.items():
        print(f"  {cat}: {score:.4f}")
    print("\n")
```

### Explanation of Changes:

1. **Updated `test_texts`**: The sentences in `test_texts` have been replaced with typical spam messages that are likely to be classified as SPAM by the model.

2. **Model Evaluation**: The code will now evaluate these new test sentences, and you should see the model's confidence scores for each category (SPAM and HAM) printed out.

Run the code to check whether the model tags these sentences as SPAM. With only four training examples, expect scores close to 0.5 for texts whose words the model has not seen before.

Execution Result:

```
Text: Congratulations! You've won a $1000 gift card.
  SPAM: 0.5514
  HAM: 0.4486


Text: Click here to claim your free trial now!
  SPAM: 0.6495
  HAM: 0.3505


Text: Act now! Limited time offer just for you.
  SPAM: 0.6017
  HAM: 0.3983


Text: You have been selected for a special promotion!
  SPAM: 0.5514
  HAM: 0.4486


Text: Earn money from home with this simple trick.
  SPAM: 0.5000
  HAM: 0.5000
```
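
One plausible reading of these scores is that a bag-of-words model can only use tokens it saw during training. The sketch below lists which tokens of each test text also occur in the four training texts. It uses a rough regex tokenizer as a stand-in for spaCy's tokenizer, which is an assumption.

```python
import re

def tokens(text):
    """Rough tokenizer: words and single punctuation marks (spaCy's differs)."""
    return re.findall(r"\w+|[^\w\s]", text)

training_texts = [
    "Buy cheap watches now!",
    "Get your discount codes today",
    "Call me asap",
    "Lunch meeting at 1 pm?",
]
vocabulary = {tok for text in training_texts for tok in tokens(text)}

test_texts = [
    "Congratulations! You've won a $1000 gift card.",
    "Click here to claim your free trial now!",
    "Act now! Limited time offer just for you.",
    "You have been selected for a special promotion!",
    "Earn money from home with this simple trick.",
]

# Tokens of each test text that the training data also contains
overlaps = {text: sorted(set(tokens(text)) & vocabulary) for text in test_texts}
for text, seen in overlaps.items():
    print(f"{seen} <- {text}")
```

The two texts that score 0.5514 share only `!` with the training data, and the text that scores 0.5000 shares nothing, which fits this reading.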

## Enhance Training Data for Model

Stellar Navigator, we have a blank in our training_data. Add the extra provided spam and ham examples to it to enhance the model. The examples are:

```python
("Get rich quick! Visit now", {"cats": {"SPAM": 1, "HAM": 0}}),
("Congratulations! You've won a prize", {"cats": {"SPAM": 1, "HAM": 0}}),
("Important information about your account", {"cats": {"SPAM": 0, "HAM": 1}}),
("Don't miss our meeting tomorrow", {"cats": {"SPAM": 0, "HAM": 1}})
```

Check the changes these cause in the model's performance.

Here’s the updated code with the additional spam and ham examples added to the `training_data`. This will enhance the model's training set, allowing it to learn from a broader range of examples. After updating the training data, the model will be retrained, and we can evaluate its performance on the test texts.

```python
import spacy
from spacy.training import Example

# Load a blank spaCy model
nlp = spacy.blank("en")

# Enhanced sample dataset with additional examples
training_data = [
    ("Buy cheap watches now!", {"cats": {"SPAM": 1, "HAM": 0}}),
    ("Get your discount codes today", {"cats": {"SPAM": 1, "HAM": 0}}),
    ("Call me asap", {"cats": {"SPAM": 0, "HAM": 1}}),
    ("Lunch meeting at 1 pm?", {"cats": {"SPAM": 0, "HAM": 1}}),
    ("Get rich quick! Visit now", {"cats": {"SPAM": 1, "HAM": 0}}),
    ("Congratulations! You've won a prize", {"cats": {"SPAM": 1, "HAM": 0}}),
    ("Important information about your account", {"cats": {"SPAM": 0, "HAM": 1}}),
    ("Don't miss our meeting tomorrow", {"cats": {"SPAM": 0, "HAM": 1}})
]

# Add the text classifier to the pipeline
config = {
    "threshold": 0.5,
    "model": {
        "@architectures": "spacy.TextCatBOW.v1",
        "exclusive_classes": True,
        "ngram_size": 1,
        "no_output_layer": False
    }
}
textcat = nlp.add_pipe("textcat", config=config)
textcat.add_label("SPAM")
textcat.add_label("HAM")

# Training the classifier
def train_spam_detector(training_data, nlp, textcat, n_iter=30):
    optimizer = nlp.initialize(lambda: (
        Example.from_dict(nlp.make_doc(text), annotations)
        for text, annotations in training_data
    ))
    
    for i in range(n_iter):
        losses = {}
        for text, annotations in training_data:
            doc = nlp.make_doc(text)
            example = Example.from_dict(doc, annotations)
            nlp.update([example], sgd=optimizer, losses=losses)

train_spam_detector(training_data, nlp, textcat)

# Test texts: the same spam-like sentences as in the previous exercise
test_texts = [
    "Congratulations! You've won a $1000 gift card.",
    "Click here to claim your free trial now!",
    "Act now! Limited time offer just for you.",
    "You have been selected for a special promotion!",
    "Earn money from home with this simple trick."
]

for text in test_texts:
    doc = nlp(text)
    print(f"Text: {text}")
    for cat, score in doc.cats.items():
        print(f"  {cat}: {score:.4f}")
    print("\n")
```

### Explanation of Changes:

1. **Updated `training_data`**: The additional spam and ham examples have been added to the `training_data` list. This includes phrases that are commonly associated with spam and legitimate messages.

2. **Model Training**: The model is retrained with the enhanced dataset, which should improve its ability to classify texts accurately.

3. **Model Evaluation**: The same test texts are evaluated to see how the changes in training data affect the model's performance.

### Expected Outcome:

After running the updated code, you should see the model's confidence scores for the SPAM and HAM categories for each test text. The additional training examples should help the model better distinguish between spam and ham messages, potentially leading to more confident and more accurate classifications.

Execution Result:

```
Text: Congratulations! You've won a $1000 gift card.
  SPAM: 0.8188
  HAM: 0.1812


Text: Click here to claim your free trial now!
  SPAM: 0.6716
  HAM: 0.3284


Text: Act now! Limited time offer just for you.
  SPAM: 0.6710
  HAM: 0.3290


Text: You have been selected for a special promotion!
  SPAM: 0.6967
  HAM: 0.3033


Text: Earn money from home with this simple trick.
  SPAM: 0.4943
  HAM: 0.5057
```
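
To quantify the effect of the extra training data, you can compare the SPAM scores from the two runs. The numbers below are copied from the two execution results in this lesson.

```python
# SPAM scores for the five test texts, before and after adding training examples
before = [0.5514, 0.6495, 0.6017, 0.5514, 0.5000]
after = [0.8188, 0.6716, 0.6710, 0.6967, 0.4943]

changes = [round(a - b, 4) for b, a in zip(before, after)]
for i, change in enumerate(changes, start=1):
    print(f"Test text {i}: {change:+.4f}")

improved = sum(change > 0 for change in changes)
print(f"SPAM score rose for {improved} of {len(changes)} texts")
```

The SPAM score rises for four of the five texts. The last text stays close to 0.5 and now tips slightly toward HAM, so more data did not help in that case.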

## Add Extra Category to Classifier

Great work, Stellar Navigator! Now, let's modify our text classifier to recognize a new category, "PROMOTION". Your task is to complete the TODO in the code so that our model can identify this new category.

```python
import spacy
from spacy.training import Example

# Load a blank spaCy model
nlp = spacy.blank("en")

# Enhanced sample dataset with an additional category PROMOTION
training_data = [
    ("Buy cheap watches now!", {"cats": {"SPAM": 1, "HAM": 0, "PROMOTION": 0}}),
    ("Get your discount codes today", {"cats": {"SPAM": 1, "HAM": 0, "PROMOTION": 0}}),
    ("Call me asap", {"cats": {"SPAM": 0, "HAM": 1, "PROMOTION": 0}}),
    ("Lunch meeting at 1 pm?", {"cats": {"SPAM": 0, "HAM": 1, "PROMOTION": 0}}),
    ("Limited time offer, shop now!", {"cats": {"SPAM": 0, "HAM": 0, "PROMOTION": 1}}),
    ("Free shipping on all orders!", {"cats": {"SPAM": 0, "HAM": 0, "PROMOTION": 1}})
]

# Add the text classifier to the pipeline with extra category
config = {
    "threshold": 0.5,
    "model": {
        "@architectures": "spacy.TextCatBOW.v1",
        "exclusive_classes": True,
        "ngram_size": 1,
        "no_output_layer": False
    }
}
textcat = nlp.add_pipe("textcat", config=config)
textcat.add_label("SPAM")
textcat.add_label("HAM")
# TODO: Add the label for the extra category

# Training the classifier
def train_spam_detector(training_data, nlp, textcat, n_iter=30):
    optimizer = nlp.initialize(lambda: (
        Example.from_dict(nlp.make_doc(text), annotations)
        for text, annotations in training_data
    ))
    
    for i in range(n_iter):
        losses = {}
        for text, annotations in training_data:
            doc = nlp.make_doc(text)
            example = Example.from_dict(doc, annotations)
            nlp.update([example], sgd=optimizer, losses=losses)
        print(f"Iteration {i} - Loss: {losses}")

train_spam_detector(training_data, nlp, textcat)

# Test the trained model with new texts including the PROMOTION category
test_texts = [
    "Exclusive deal just for you!",
    "Can we reschedule the meeting?",
    "Limited time offer, shop now!",
    "Free shipping on all orders!"
]

for text in test_texts:
    doc = nlp(text)
    print(f"Text: {text}")
    for cat, score in doc.cats.items():
        print(f"  {cat}: {score:.4f}")
    print("\n")


```

To modify your text classifier to recognize the new category "PROMOTION", you need to add a label for this category in the code. Here’s the updated code with the appropriate TODO implemented:

```python
import spacy
from spacy.training import Example

# Load a blank spaCy model
nlp = spacy.blank("en")

# Enhanced sample dataset with an additional category PROMOTION
training_data = [
    ("Buy cheap watches now!", {"cats": {"SPAM": 1, "HAM": 0, "PROMOTION": 0}}),
    ("Get your discount codes today", {"cats": {"SPAM": 1, "HAM": 0, "PROMOTION": 0}}),
    ("Call me asap", {"cats": {"SPAM": 0, "HAM": 1, "PROMOTION": 0}}),
    ("Lunch meeting at 1 pm?", {"cats": {"SPAM": 0, "HAM": 1, "PROMOTION": 0}}),
    ("Limited time offer, shop now!", {"cats": {"SPAM": 0, "HAM": 0, "PROMOTION": 1}}),
    ("Free shipping on all orders!", {"cats": {"SPAM": 0, "HAM": 0, "PROMOTION": 1}})
]

# Add the text classifier to the pipeline with extra category
config = {
    "threshold": 0.5,
    "model": {
        "@architectures": "spacy.TextCatBOW.v1",
        "exclusive_classes": True,
        "ngram_size": 1,
        "no_output_layer": False
    }
}
textcat = nlp.add_pipe("textcat", config=config)
textcat.add_label("SPAM")
textcat.add_label("HAM")
textcat.add_label("PROMOTION")  # Register the extra category

# Training the classifier
def train_spam_detector(training_data, nlp, textcat, n_iter=30):
    optimizer = nlp.initialize(lambda: (
        Example.from_dict(nlp.make_doc(text), annotations)
        for text, annotations in training_data
    ))

    for i in range(n_iter):
        losses = {}
        for text, annotations in training_data:
            doc = nlp.make_doc(text)
            example = Example.from_dict(doc, annotations)
            nlp.update([example], sgd=optimizer, losses=losses)
        print(f"Iteration {i} - Loss: {losses}")

train_spam_detector(training_data, nlp, textcat)

# Test the trained model with new texts including the PROMOTION category
test_texts = [
    "Exclusive deal just for you!",
    "Can we reschedule the meeting?",
    "Limited time offer, shop now!",
    "Free shipping on all orders!"
]

for text in test_texts:
    doc = nlp(text)
    print(f"Text: {text}")
    for cat, score in doc.cats.items():
        print(f"  {cat}: {score:.4f}")
    print("\n")
```

### Changes Made:
- Added the line `textcat.add_label("PROMOTION")` to register the new category in the text classifier.

Now, your model will be able to recognize the "PROMOTION" category along with "SPAM" and "HAM".
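
As the number of categories grows, it is easy to forget a label. One way to avoid that, sketched here under the assumption that every example lists all categories in its `cats` dictionary, is to derive the label set from the training data itself.

```python
# One example per category, taken from this exercise's training data
training_data = [
    ("Buy cheap watches now!", {"cats": {"SPAM": 1, "HAM": 0, "PROMOTION": 0}}),
    ("Call me asap", {"cats": {"SPAM": 0, "HAM": 1, "PROMOTION": 0}}),
    ("Free shipping on all orders!", {"cats": {"SPAM": 0, "HAM": 0, "PROMOTION": 1}}),
]

# Collect every category name that appears in the annotations
labels = sorted({label for _, annotations in training_data for label in annotations["cats"]})
print(labels)  # ['HAM', 'PROMOTION', 'SPAM']

# With a spaCy textcat component, the labels could then be registered in a loop:
# for label in labels:
#     textcat.add_label(label)
```

This keeps the registered labels in sync with the data, so adding a fourth category later only requires new training examples.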

## Build a Spam Classifier Pipeline

Stellar Navigator, get ready to steer the spaceship solo! It's time to write the full text classification pipeline for spam detection from scratch.

Implement the process of creating the pipeline, preparing training data, adding a text classifier, training the classifier, and evaluating the model.

```python
import spacy
from spacy.training import Example

# TODO: Create a blank spaCy model

training_data = [
    ("Buy cheap watches now!", {"cats": {"SPAM": 1, "HAM": 0}}),
    ("Get your discount codes today", {"cats": {"SPAM": 1, "HAM": 0}}),
    ("Call me asap", {"cats": {"SPAM": 0, "HAM": 1}}),
    ("Lunch meeting at 1 pm?", {"cats": {"SPAM": 0, "HAM": 1}}),
    ("Get rich quick! Visit now", {"cats": {"SPAM": 1, "HAM": 0}}),
    ("Congratulations! You've won a prize", {"cats": {"SPAM": 1, "HAM": 0}}),
    ("Important information about your account", {"cats": {"SPAM": 0, "HAM": 1}}),
    ("Don't miss our meeting tomorrow", {"cats": {"SPAM": 0, "HAM": 1}})
]

# TODO: Create configuration for text classifier pipeline

# TODO: Add text classifier to the pipeline with labels

# TODO: Define train_spam_detector function to train the model

# TODO: Feed the training data to your function to train the model

# TODO: Prepare test texts 

# TODO: Classify the test texts using the trained model and print the results



```

Here’s a complete implementation of a text classification pipeline for spam detection using spaCy. This code includes creating a blank model, preparing training data, adding a text classifier, training the classifier, and evaluating the model.

```python
import spacy
from spacy.training import Example

# Create a blank spaCy model
nlp = spacy.blank("en")

# Prepare training data
training_data = [
    ("Buy cheap watches now!", {"cats": {"SPAM": 1, "HAM": 0}}),
    ("Get your discount codes today", {"cats": {"SPAM": 1, "HAM": 0}}),
    ("Call me asap", {"cats": {"SPAM": 0, "HAM": 1}}),
    ("Lunch meeting at 1 pm?", {"cats": {"SPAM": 0, "HAM": 1}}),
    ("Get rich quick! Visit now", {"cats": {"SPAM": 1, "HAM": 0}}),
    ("Congratulations! You've won a prize", {"cats": {"SPAM": 1, "HAM": 0}}),
    ("Important information about your account", {"cats": {"SPAM": 0, "HAM": 1}}),
    ("Don't miss our meeting tomorrow", {"cats": {"SPAM": 0, "HAM": 1}})
]

# Create configuration for text classifier pipeline
config = {
    "threshold": 0.5,
    "model": {
        "@architectures": "spacy.TextCatBOW.v1",
        "exclusive_classes": True,
        "ngram_size": 1,
        "no_output_layer": False
    }
}

# Add text classifier to the pipeline with labels
textcat = nlp.add_pipe("textcat", config=config)
textcat.add_label("SPAM")
textcat.add_label("HAM")

# Define train_spam_detector function to train the model
def train_spam_detector(training_data, nlp, textcat, n_iter=30):
    optimizer = nlp.initialize(lambda: (
        Example.from_dict(nlp.make_doc(text), annotations)
        for text, annotations in training_data
    ))

    for i in range(n_iter):
        losses = {}
        for text, annotations in training_data:
            doc = nlp.make_doc(text)
            example = Example.from_dict(doc, annotations)
            nlp.update([example], sgd=optimizer, losses=losses)
        print(f"Iteration {i} - Loss: {losses}")

# Feed the training data to your function to train the model
train_spam_detector(training_data, nlp, textcat)

# Prepare test texts
test_texts = [
    "Exclusive deal just for you!",
    "Can we reschedule the meeting?",
    "Get rich quick! Visit now",
    "Important information about your account"
]

# Classify the test texts using the trained model and print the results
for text in test_texts:
    doc = nlp(text)
    print(f"Text: {text}")
    for cat, score in doc.cats.items():
        print(f"  {cat}: {score:.4f}")
    print("\n")
```

### Explanation of the Code:

1. **Creating a Blank Model**: A blank spaCy model is created using `spacy.blank("en")`.

2. **Preparing Training Data**: The training data consists of pairs of text and their corresponding categories (SPAM or HAM).

3. **Configuration for Text Classifier**: A configuration dictionary is created to set up the text classifier.

4. **Adding the Text Classifier**: The text classifier is added to the pipeline, and the labels "SPAM" and "HAM" are registered.

5. **Training Function**: The `train_spam_detector` function is defined to train the model using the provided training data.

6. **Training the Model**: The training data is fed into the training function to train the model.

7. **Testing the Model**: A set of test texts is prepared, and the trained model is used to classify these texts. The results are printed, showing the probability scores for each category.

This pipeline is now ready to classify text as spam or ham based on the training data provided.
