# Day 14: Building my own GPT

### Assignment: Code-Focused Inference

Your task is to load a pre-trained GPT-2 model and configure it to answer *only* questions related to Python coding.

1. **Load Model and Tokenizer:** Load a suitable pre-trained GPT-2 model and its corresponding tokenizer. You can use `transformers.AutoModelForCausalLM` and `transformers.AutoTokenizer`. A smaller model like `gpt2` or `gpt2-medium` might be sufficient.
2. **Implement a Filtering Mechanism:** Before generating a response, check if the input prompt is related to Python coding. You can use simple keyword matching (e.g., "Python", "code", "function", "class", "import") or a more sophisticated approach using a text classification model (optional).
3. **Generate Response:** If the prompt is deemed a Python coding question, generate a response using the loaded GPT-2 model.
4. **Handle Non-Coding Questions:** If the prompt is not related to Python coding, return a predefined message indicating that the model can only answer coding questions.
5. **Test:** Test your implementation with various prompts, including both Python coding questions and non-coding questions, to ensure the filtering mechanism works correctly.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
```

## Load the model and tokenizer

```python
model_name = "gpt2-medium"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
```

```python
# GPT-2 has no dedicated padding token, so reuse the end-of-sequence token
# as the pad token for open-ended generation.
tokenizer.pad_token_id = tokenizer.eos_token_id
model.config.pad_token_id = model.config.eos_token_id
```


## Keyword filter and answer generation

```python
import re

# Keywords that suggest a Python coding question, matched as whole words.
# Generic English words such as "for", "if", "as" and "with" can still
# produce false positives on non-coding prompts.
CODING_KEYWORDS = {
    "python", "code", "function", "class", "import", "def", "lambda",
    "list", "dict", "tuple", "set", "string", "int", "float", "bool",
    "if", "else", "elif", "for", "while", "try", "except", "finally",
    "with", "as", "return", "yield", "print", "input", "open", "read", "write",
}


def is_coding_question(prompt):
    """
    Checks if a prompt is related to Python coding using keyword matching.
    Matching whole words (rather than substrings) keeps "int" from firing on
    "interesting" or "read" from firing on "bread".
    """
    words = set(re.findall(r"[a-z_]+", prompt.lower()))
    return not words.isdisjoint(CODING_KEYWORDS)


def answer_question(prompt, max_len=100):
    """
    Answers a question if it's related to Python coding, otherwise returns a predefined message.
    """
    if is_coding_question(prompt):
        inputs = tokenizer(
            prompt,
            return_tensors="pt",
            padding=True,
            return_attention_mask=True,
        )

        # Generate a continuation. max_length counts the prompt tokens too;
        # use max_new_tokens instead to cap only the generated part.
        output = model.generate(
            input_ids=inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            max_length=max_len,
            num_return_sequences=1,
            no_repeat_ngram_size=2,  # never repeat the same bigram
            pad_token_id=tokenizer.pad_token_id,
            eos_token_id=tokenizer.eos_token_id,
        )

        # GPT-2 is decoder-only: output[0] holds the prompt tokens followed by the
        # new ones, so the decoded text starts by repeating the question.
        generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
        return generated_text
    else:
        return "I can only answer questions related to Python coding."
```
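Keyword filters built on a bare `keyword in text` substring test fire on keywords hidden inside unrelated words, and even whole-word matching still fires on ordinary English words that double as Python keywords. The sketch below compares the two strategies on a few prompts. `KEYWORDS`, `substring_match` and `word_match` are hypothetical names, and the keyword list is only a subset of the one used in `is_coding_question`.

```python
import re

# A subset of the keywords used by is_coding_question, for illustration only.
KEYWORDS = ["python", "code", "function", "import", "int", "read", "list", "for", "write"]


def substring_match(prompt: str) -> bool:
    """Naive check: a keyword anywhere in the text, even inside another word."""
    text = prompt.lower()
    return any(keyword in text for keyword in KEYWORDS)


def word_match(prompt: str) -> bool:
    """Stricter check: a keyword must appear as a whole word."""
    words = set(re.findall(r"[a-z_]+", prompt.lower()))
    return any(keyword in words for keyword in KEYWORDS)


prompts = [
    "Tell me an interesting fact about bread.",  # "int" and "read" hide inside words
    "How do I import a module in Python?",       # genuine coding question
    "Write a poem for my friend.",               # whole-word false positive
]
for p in prompts:
    print(f"substring={substring_match(p)!s:<5} whole_word={word_match(p)!s:<5} {p}")
```

Whole-word matching removes the first kind of false positive but not the third: "write" and "for" are real words in that prompt, so a keyword list alone cannot tell the two apart. That is where the optional text-classification approach from step 2 would earn its cost.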

## Driver code: Testing the model :)

```python
prompt = "How to define a function in Python?"
print(f"Prompt: '{prompt}'")
print(f"Response: {answer_question(prompt)}\n")
```

```text
Prompt: 'How to define a function in Python?'
Response: How to define a function in Python?

The Python language is a collection of functions. Each function is defined by a class, and each class has a set of methods that it can call.
, the Python interpreter, is the main program that runs the interpreter. The interpreter is responsible for executing the code that is in the source code. It is also responsible to execute the compiled code, which is what the compiler does. In Python, each function has its own set, called the class
```
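The response above echoes the question and then drifts into loosely related statements. GPT-2 is a base language model trained to continue text, not to follow instructions, so a bare question is just more text to continue. One common way to steer a base model is few-shot prompting: prepend an example question/answer pair so the continuation is more likely to follow the same format. The sketch below is a one-shot template under that assumption; `build_code_prompt` and `EXAMPLE_QA` are hypothetical names, the wording is illustrative rather than tuned, and there is no guarantee `gpt2-medium` produces correct answers with it.

```python
# A one-shot prompt template for a base (non-instruction-tuned) model such as GPT-2.
# The wording and the example pair are illustrative choices, not a tuned prompt.
EXAMPLE_QA = (
    "Question: How do I reverse a list in Python?\n"
    "Answer: Use slicing, my_list[::-1], to get a reversed copy, "
    "or call my_list.reverse() to reverse the list in place.\n\n"
)


def build_code_prompt(question: str) -> str:
    """Frame the user's question so the model is nudged to continue with an answer."""
    return (
        "Below are questions about Python programming with short answers.\n\n"
        + EXAMPLE_QA
        + f"Question: {question.strip()}\n"
        + "Answer:"
    )


print(build_code_prompt("How to define a function in Python?"))
```

To try it, tokenize `build_code_prompt(prompt)` instead of `prompt` inside `answer_question`, but keep calling `is_coding_question` on the raw question, since the template itself contains keywords such as "Python" and "list". Because `generate` returns the prompt tokens followed by the new ones, decoding `output[0][inputs["input_ids"].shape[1]:]` instead of `output[0]` keeps only the model's answer; the model may still go on to write another "Question:" line, which can be cut off after decoding.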



```python
prompt = "Tell me a story."
print(f"Prompt: '{prompt}'")
print(f"Response: {answer_question(prompt)}\n")
```

```text
Prompt: 'Tell me a story.'
Response: I can only answer questions related to Python coding.
```
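Step 5 asks for testing with various prompts, and the two prompts above exercise each branch only once. A small table-driven check makes it cheap to run many labelled prompts through any filter at once and list the ones it gets wrong. This is a sketch: `find_mismatches`, `TEST_CASES` and `naive_filter` are hypothetical names, the True/False labels are judgment calls, and `naive_filter` is only a stand-in so the cell runs on its own.

```python
from typing import Callable, Iterable, List, Tuple


def find_mismatches(
    classify: Callable[[str], bool],
    cases: Iterable[Tuple[str, bool]],
) -> List[Tuple[str, bool, bool]]:
    """Return (prompt, expected, got) for every case the classifier gets wrong."""
    mismatches = []
    for prompt, expected in cases:
        got = classify(prompt)
        if got != expected:
            mismatches.append((prompt, expected, got))
    return mismatches


# Hypothetical labelled prompts: True means "should be treated as a Python question".
TEST_CASES = [
    ("How to define a function in Python?", True),
    ("How do I read a file line by line in Python?", True),
    ("What does a list comprehension look like?", True),
    ("Tell me a story.", False),
    ("Write a poem for my friend.", False),
    ("What is the capital of France?", False),
]


def naive_filter(prompt: str) -> bool:
    """Stand-in classifier: flags only prompts that mention Python by name."""
    return "python" in prompt.lower()


for prompt, expected, got in find_mismatches(naive_filter, TEST_CASES):
    print(f"MISMATCH expected={expected} got={got}: {prompt}")
```

Passing `is_coding_question` instead of `naive_filter` lists the prompts the keyword filter misclassifies; "Write a poem for my friend." is one of them, because it contains the words "write" and "for".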

