# Single Class Classifier Tutorial

## Introduction

In this tutorial, we'll learn how to use the `SingleClassClassifier` from the `zenbase` library. This classifier uses a language model to assign each input to one of a set of predefined classes.

### Import the Zenbase Library

In [2]:
import sys
import subprocess

def install_package(package):
    try:
        subprocess.check_call([sys.executable, "-m", "pip", "install", package])
    except subprocess.CalledProcessError as e:
        print(f"Failed to install {package}: {e}")
        raise

def install_packages(packages):
    for package in packages:
        install_package(package)

try:
    # Check if running in Google Colab
    import google.colab
    IN_COLAB = True
except ImportError:
    IN_COLAB = False

if IN_COLAB:
    # Install the zenbase package if running in Google Colab
    # install_package('zenbase')
    # Install the zenbase package from a GitHub branch if running in Google Colab
    install_package('git+https://github.com/zenbase-ai/lib.git@main#egg=zenbase&subdirectory=py')

    # List of other packages to install in Google Colab
    additional_packages = [
        'python-dotenv',
        'openai',
        'langchain',
        'langchain_openai',
        'instructor',
        'datasets'
    ]
    
    # Install additional packages
    install_packages(additional_packages)

# Now import the zenbase library
try:
    import zenbase
except ImportError as e:
    print("Failed to import zenbase: ", e)
    raise

### Configure the Environment

In [3]:
from pathlib import Path
from dotenv import load_dotenv

# import os
#
# os.environ["OPENAI_API_KEY"] = "..."

load_dotenv(Path("../../.env.test"), override=True)

True
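
Note that `load_dotenv` returning `True` only tells us that variables were loaded, not that the key we need is among them. The following is a minimal sketch of an explicit check, assuming the `.env` file is expected to define `OPENAI_API_KEY` (the variable the OpenAI client reads by default). `require_env` is a hypothetical helper written for this tutorial, not part of zenbase or python-dotenv.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is missing or empty."""
    value = os.environ.get(name, "")
    if not value:
        raise RuntimeError(f"{name} is not set; check your .env file or export it manually")
    return value


# Example usage (raises RuntimeError if the key was not loaded):
# require_env("OPENAI_API_KEY")
```

Failing early here gives a clearer error than an authentication failure partway through the optimization run.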

## Setup

Next, let's import the necessary libraries and create the OpenAI client, the instructor client, and the Zenbase tracer.

In [4]:
import instructor
from openai import OpenAI
from zenbase.core.managers import ZenbaseTracer
from zenbase.predefined.single_class_classifier import SingleClassClassifier

# Set up the OpenAI client (it reads OPENAI_API_KEY from the environment)
openai_client = OpenAI()
instructor_client = instructor.from_openai(openai_client)
zenbase_tracer = ZenbaseTracer()

## Defining the Classifier

To use the SingleClassClassifier, we need to define a few key components:

1. Prompt definition
2. Class dictionary
3. Dataset (train, validation, and test sets)

Let's set these up:

In [16]:
# 1. Prompt definition
prompt_definition = """Your task is to accurately categorize each incoming arXiv paper into one of the given categories based on its title and abstract."""

# 2. Class dictionary
class_dict = {
    "Machine Learning": "Papers focused on algorithms and statistical models that enable computer systems to improve their performance on a specific task over time.",
    "Artificial Intelligence": "Research on creating intelligent machines that work and react like humans.",
    "Computational Linguistics": "Studies involving computer processing of human languages.",
    "Information Retrieval": "The science of searching for information in documents, databases, and on the World Wide Web.",
    "Computer Vision": "Field of study focused on how computers can be made to gain high-level understanding from digital images or videos.",
    "Human-Computer Interaction": "Research on the design and use of computer technology, focused on the interfaces between people and computers.",
    "Cryptography and Security": "Studies on secure communication techniques and cybersecurity measures.",
    "Robotics": "Research on the design, construction, operation, and use of robots.",
    "Computers and Society": "Exploration of the social impact of computers and computation on society.",
    "Software Engineering": "Application of engineering to the development of software in a systematic method.",
}

# 3. Dataset preparation
from datasets import load_dataset

def create_dataset_with_examples(item_set):
    return [{"inputs": item['input'], "outputs": item['output']} for item in item_set]

# Load the arxiv dataset
arxiv_dataset = load_dataset("dansbecker/arxiv_article_classification")

# Define the sizes for each set
TRAINSET_SIZE = 100
VALIDATIONSET_SIZE = 20
TESTSET_SIZE = 20

# Create train set
train_data = list(arxiv_dataset["train"].select(range(TRAINSET_SIZE)))
train_set = create_dataset_with_examples(train_data)

# Create validation set
validation_data = list(arxiv_dataset["train"].select(range(TRAINSET_SIZE, TRAINSET_SIZE + VALIDATIONSET_SIZE)))
validation_set = create_dataset_with_examples(validation_data)

# Create test set
test_data = list(arxiv_dataset["test"].select(range(TESTSET_SIZE)))
test_set = create_dataset_with_examples(test_data)
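
Before handing the sets to the classifier, it can be worth checking that every label in the data is actually a key of the class dictionary. The sketch below assumes each example's `"outputs"` value is a plain label string, which is an assumption about the dataset rather than something this tutorial confirms. `label_report` is a hypothetical helper, and the toy data stands in for `train_set` and `class_dict` so the snippet runs on its own.

```python
from collections import Counter


def label_report(examples, class_dict):
    """Count examples per label and list labels that are not keys of class_dict."""
    counts = Counter(example["outputs"] for example in examples)
    unknown = sorted(label for label in counts if label not in class_dict)
    return counts, unknown


# Toy data in the same {"inputs": ..., "outputs": ...} shape as the sets above.
toy_classes = {
    "Robotics": "Research on robots.",
    "Computer Vision": "Understanding images or videos.",
}
toy_examples = [
    {"inputs": "title: Grasp planning\nabstract: We plan grasps.", "outputs": "Robotics"},
    {"inputs": "title: Legged gaits\nabstract: We study gaits.", "outputs": "Robotics"},
    {"inputs": "title: Query plans\nabstract: We tune joins.", "outputs": "Databases"},
]

counts, unknown = label_report(toy_examples, toy_classes)
print(dict(counts))  # {'Robotics': 2, 'Databases': 1}
print(unknown)       # ['Databases']
```

With the real data, the call would be `label_report(train_set, class_dict)`. A non-empty `unknown` list means some examples carry a label the classifier is never allowed to choose.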

## Creating the SingleClassClassifier

Now that we have all the components, let's create our SingleClassClassifier:

In [17]:
classifier = SingleClassClassifier(
    instructor_client=instructor_client,
    prompt=prompt_definition,
    class_dict=class_dict,
    model="gpt-4o-mini",  # You can change this to the appropriate model
    zenbase_tracer=zenbase_tracer,
    training_set=train_set,
    validation_set=validation_set,
    test_set=test_set,
    samples=20,
)

## Performing Classification

To run the classification and optimization, we call the `perform()` method. The returned result exposes the best function and the candidate results:

In [18]:
result = classifier.perform()

## Analyzing Results

After performing the classification, we can analyze the results:

In [25]:
# Score of the base evaluation on the test set
print(classifier.base_evaluation.evals['score'])

# Score of the best function's evaluation on the test set
print(classifier.best_evaluation.evals['score'])

0.5
0.8
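
To put the two scores above in perspective, we can compute the absolute and relative improvement. This is a small arithmetic sketch using the printed values 0.5 and 0.8. `improvement` is a hypothetical helper, and the results are rounded to avoid floating-point noise.

```python
def improvement(base_score: float, best_score: float):
    """Return (absolute, relative) improvement of best_score over base_score."""
    absolute = best_score - base_score
    # Guard against division by zero when the base score is 0.
    relative = absolute / base_score if base_score else float("inf")
    return round(absolute, 4), round(relative, 4)


print(improvement(0.5, 0.8))  # (0.3, 0.6)
```

In this run, the best function scores 0.3 higher than the base, a 60% relative gain. With a test set of only 20 examples, each example moves the score by a large step, so treat the comparison as indicative rather than precise.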


In [27]:
print("Best function:", result.best_function)
print("Number of candidate results:", len(result.candidate_results))
print("Best candidate result:", result.best_candidate_result.evals)

# Check the traces
print("Number of traces:", len(classifier.zenbase_tracer.all_traces))

Best function: <zenbase.types.LMFunction object at 0x127c605e0>
Number of candidate results: 20
Best candidate result: {'score': 0.85}
Number of traces: 470


## Using the Classifier

Now that we have trained and optimized our classifier, we can use it to classify new inputs:

In [28]:
new_paper = """
title: Advances in Quantum Computing Algorithms
abstract: This paper explores recent developments in quantum computing algorithms, 
focusing on their potential applications in cryptography and optimization problems. 
We present a novel approach to quantum error correction that significantly improves 
the stability of qubit states in noisy environments.
"""

classification = result.best_function(new_paper)
print(f"The paper is classified as: {classification.class_label.name}")

The paper is classified as: Cryptography and Security
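
To classify several papers at once, we can wrap the call in a small loop. The sketch below assumes only what the cell above shows: the function takes a text string and returns an object with a `class_label.name` attribute. `format_paper` and `classify_papers` are hypothetical helpers, and `fake_predict` is a stand-in so the snippet runs without an API call.

```python
from types import SimpleNamespace


def format_paper(title: str, abstract: str) -> str:
    """Build the 'title: ... / abstract: ...' text used in the example above."""
    return f"title: {title}\nabstract: {abstract}"


def classify_papers(predict, papers):
    """Apply predict to each paper and collect the predicted label names."""
    labels = []
    for paper in papers:
        prediction = predict(format_paper(paper["title"], paper["abstract"]))
        labels.append(prediction.class_label.name)
    return labels


def fake_predict(text: str):
    """Stand-in for result.best_function: a keyword rule, NOT a real model."""
    name = "Robotics" if "robot" in text.lower() else "Machine Learning"
    return SimpleNamespace(class_label=SimpleNamespace(name=name))


papers = [
    {"title": "Robot grasping in clutter", "abstract": "We study grasp planning."},
    {"title": "Faster gradient descent", "abstract": "We analyze convergence rates."},
]
print(classify_papers(fake_predict, papers))  # ['Robotics', 'Machine Learning']
```

In the notebook, you would pass `result.best_function` in place of `fake_predict`. Each call then goes to the language model, so the loop's cost grows with the number of papers.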


## Conclusion

In this tutorial, we've learned how to:
1. Set up the necessary components for the SingleClassClassifier
2. Create and initialize the classifier
3. Perform classification and optimization
4. Analyze the results
5. Use the optimized classifier for new inputs

