# Building an AI-detector: fine-tuning DistilBERT with Keras (GPT/Claude/Gemini)

The model trained in the previous notebook on GPT data identified text from gpt-4o and gpt-4o-mini with high accuracy, but was less accurate on other models. In this notebook I'll build a new model that incorporates training data from Claude and Gemini and has a more complex, hierarchical structure: a base DistilBERT three-class classifier, followed by a custom Keras layer that fuses the multiclass probabilities into a binary output.
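
To make the fusion step concrete, here is a minimal sketch in plain Python. It assumes, following the label assignment used in this notebook (human = 0, AI = 1), that class 0 is kept as is and classes 1 and 2 are summed into the second output. The input values are hypothetical, and what the two merged classes individually represent is not specified here.

```python
def fuse_probs(probs_3_class):
    """Collapse [p0, p1, p2] into [p0, p1 + p2]."""
    p0, p1, p2 = probs_3_class
    return [p0, p1 + p2]

# Hypothetical three-class probabilities for a single text sample
example = [0.125, 0.5, 0.375]
fused = fuse_probs(example)
print(fused)  # [0.125, 0.875]
```

Because the three inputs sum to 1, the two fused values also sum to 1, so the result can be read directly as a binary probability distribution.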

## Install and import dependencies

First, we import the necessary libraries, making sure that the latest version of the Hugging Face "transformers" library is installed and compatible with Keras.

In [None]:
pip install --upgrade transformers

Collecting transformers
  Downloading transformers-4.51.3-py3-none-any.whl.metadata (38 kB)
Downloading transformers-4.51.3-py3-none-any.whl (10.4 MB)
Installing collected packages: transformers
  Attempting uninstall: transformers
    Found existing installation: transformers 4.51.1
    Uninstalling transformers-4.51.1:
      Successfully uninstalled transformers-4.51.1
Successfully installed transformers-4.51.3
Note: you may need to restart the kernel to use updated packages.


In [None]:
!pip install tf-keras
import os
os.environ['TF_USE_LEGACY_KERAS'] = '1'



In [None]:
from transformers import TFDistilBertForSequenceClassification, DistilBertTokenizerFast
import numpy as np
import tensorflow as tf



## Loading the training data

Next, we load and explore the training data.

In [None]:
import pandas as pd

human_train = pd.read_csv('human_train.csv')
AI_train = pd.read_csv('AI_train_gpt_claude_gemini.csv')

In [None]:
human_train

Unnamed: 0,text,source
0,Alan Mathison Turing (; 23 June 1912 – 7 June ...,English Wikipedia
1,"James Dewey Watson (born April 6, 1928) is an ...",English Wikipedia
2,"Harry George Drickamer (November 19, 1918 – Ma...",English Wikipedia
3,Anthony Stephen Fauci ( FOW-chee; born Decemb...,English Wikipedia
4,"Charles Hard Townes (July 28, 1915 – January 2...",English Wikipedia
...,...,...
25175,I’ve been reading through AITA and found a pos...,Reddit (r/OffMyChest)
25176,"So, my mom bakes cakes and she got an order t...",Reddit (r/OffMyChest)
25177,My brother is 16 and has Down Syndrome. For a ...,Reddit (r/OffMyChest)
25178,With the news of Bill and Melinda Gates divorc...,Reddit (r/OffMyChest)


In [None]:
AI_train

Unnamed: 0,text,prompt,system,model,temperature,cleaning
0,Alan Turing (23 June 1912 – 7 June 1954) was a...,Write the introductory section to a Wikipedia ...,You are a wikipedia contributor.,gpt-4o-mini,0.22,Removed headers and markdown formatting
1,"James Dewey Watson (born April 6, 1920) is an ...",Write the introductory section to a Wikipedia ...,You are a wikipedia contributor.,gpt-4o-mini,0.31,Removed headers and markdown formatting
2,Harry George Drickamer (born [insert date of b...,Write the introductory section to a Wikipedia ...,You are a wikipedia contributor.,gpt-4o-mini,0.54,Removed headers and markdown formatting
3,"Anthony Stephen Fauci (born December 24, 1940)...",Write the introductory section to a Wikipedia ...,You are a wikipedia contributor.,gpt-4o-mini,0.38,Removed headers and markdown formatting
4,"Charles H. Townes (July 28, 1915 – January 27,...",Write the introductory section to a Wikipedia ...,You are a wikipedia contributor.,gpt-4o-mini,0.03,Removed headers and markdown formatting
...,...,...,...,...,...,...
35249,Throwaway because my friends know my main. I (...,Write a post in r/relationship_advice with the...,You are a redditor.,gemini-2.0-flash,1.06,Removed headers
35250,AITA for embarrassing my FIL after I repeatedl...,Write a post in r/AmItheAsshole with the title...,You are a redditor.,gemini-1.5-flash,1.09,"Removed headers; removed ""So, "" at beginning o..."
35251,"Look, I know it sucks to feel like you're talk...",Write a post in r/dating_advice with the title...,You are a redditor.,gemini-2.0-flash,1.17,Removed headers
35252,AITA: For giving my deceased son's college fun...,Write a post in r/AmItheAsshole with the title...,You are a redditor.,gemini-1.5-flash,1.03,Removed headers


In [None]:
AI_train['model'].unique()

array(['gpt-4o-mini', 'gpt-4o', 'claude-3-5-haiku-20241022',
       'claude-3-7-sonnet-20250219', 'gemini-2.0-flash',
       'gemini-1.5-flash'], dtype=object)

In [None]:
AI_train.groupby('model').describe()

Unnamed: 0_level_0,temperature,temperature,temperature,temperature,temperature,temperature,temperature,temperature
Unnamed: 0_level_1,count,mean,std,min,25%,50%,75%,max
model,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2
claude-3-5-haiku-20241022,4532.0,0.550402,0.30804,0.0,0.2675,0.6,0.82,0.99
claude-3-7-sonnet-20250219,503.0,0.549861,0.316231,0.0,0.26,0.6,0.84,0.99
gemini-1.5-flash,1285.0,0.795922,0.357694,0.0,0.51,0.9,1.1,1.2
gemini-2.0-flash,3750.0,0.792899,0.357044,0.0,0.52,0.9,1.1,1.2
gpt-4o,2560.0,0.789266,0.361525,0.0,0.51,0.9,1.1,1.2
gpt-4o-mini,22624.0,0.792187,0.358038,0.0,0.52,0.9,1.1,1.2


According to the counts above, our new dataset adds 10,070 AI samples generated by Claude and Gemini models (5,035 from each family) to the GPT samples. Let's combine the AI and human data into a single training dataset.

In [None]:
AI_train['label'] = 1
human_train['label'] = 0
full_train = pd.concat([human_train[['text','label']],AI_train[['text','label']]],
                       ignore_index=True)

## Tokenization

Next, we have to tokenize our training corpus. As in the previous notebook, I will create a custom tokenizer that truncates the input at a paragraph boundary where possible, falling back to line, sentence, and finally token boundaries when the first unit is too long.

In [None]:
tokenizer = DistilBertTokenizerFast.from_pretrained('distilbert-base-cased')


In [None]:
import regex
PARAGRAPH_SEP_PATTERN = regex.compile(r'(?<=\n\n)')
LINE_SEP_PATTERN = regex.compile('[\n]+')
PUNCT_PATTERN = regex.compile(r'(?<=[\p{P}])(?=\s+)')

def truncator(group_encodings):
    # start with [CLS], then greedily append whole groups while they still fit
    input_ids = [tokenizer.cls_token_id]
    attention_mask = [1]
    n = 0
    while n < len(group_encodings):
        if len(input_ids) + len(group_encodings[n]['input_ids']) + 1 >= tokenizer.model_max_length:
            break
        input_ids = [*input_ids, *group_encodings[n]['input_ids']]
        attention_mask = [*attention_mask, *group_encodings[n]['attention_mask']]
        n += 1

    input_ids.append(tokenizer.sep_token_id)
    attention_mask.append(1)

    pad_length = tokenizer.model_max_length - len(input_ids)
    input_ids = [*input_ids, *[tokenizer.pad_token_id]*pad_length]
    attention_mask = [*attention_mask, *[0]*pad_length]

    return {'input_ids': input_ids,
            'attention_mask': attention_mask}, n

def tokenizer_custom_truncation(text):
    # split text into paragraphs and tokenize
    paragraphs = PARAGRAPH_SEP_PATTERN.split(text)
    paragraph_encodings = [tokenizer(para, add_special_tokens=False) for para in paragraphs]

    # if first paragraph is too long, further split text into lines and tokenize
    if len(paragraph_encodings[0]['input_ids']) + 2 >= tokenizer.model_max_length:
        lines = LINE_SEP_PATTERN.split(paragraphs[0])
        line_encodings = [tokenizer(line, add_special_tokens=False) for line in lines]

        # if first line is still too long, split first line on punctuation and tokenize
        if len(line_encodings[0]['input_ids']) + 2 >= tokenizer.model_max_length:
            sentences = PUNCT_PATTERN.split(lines[0])
            sentence_encodings = [tokenizer(sentence, add_special_tokens=False) for sentence in sentences]

            # if first sentence is still too long, just return truncated first sentence
            if len(sentence_encodings[0]['input_ids']) + 2 >= tokenizer.model_max_length:
                return tokenizer(sentences[0], truncation=True, padding='max_length')
            # otherwise truncate first line split on sentences
            encodings, _ = truncator(sentence_encodings)
            return encodings
        # otherwise truncate first paragraph split on lines
        encodings, _ = truncator(line_encodings)
        return encodings
    # otherwise truncate whole text split on paragraphs
    encodings, _ = truncator(paragraph_encodings)
    return encodings

def tokenize_list(texts):
    encodings = [tokenizer_custom_truncation(text) for text in texts]
    return {'input_ids': np.array([e['input_ids'] for e in encodings]),
            'attention_mask': np.array([e['attention_mask'] for e in encodings])}
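
The greedy packing done by `truncator` can be illustrated without the real tokenizer. The sketch below is a simplified stand-in, not the notebook's code: it treats whitespace-separated words as tokens, and `pack_segments` and `max_length` are hypothetical names. It keeps whole segments for as long as the running length, plus the two special tokens, stays below the limit.

```python
def pack_segments(segments, max_length):
    """Greedily keep whole segments while they fit alongside two special tokens."""
    kept = []
    used = 1  # one slot for the leading [CLS]-style token
    for seg in segments:
        # same style of check as the notebook: stop if this segment plus the
        # trailing [SEP]-style token would reach max_length
        if used + len(seg) + 1 >= max_length:
            break
        kept.append(seg)
        used += len(seg)
    return kept, used + 1  # +1 for the trailing [SEP]-style token

# Toy "paragraphs", tokenized by whitespace for illustration only
paragraphs = ["a b c", "d e f g", "h i j k l"]
segments = [p.split() for p in paragraphs]
kept, length = pack_segments(segments, max_length=12)
print(len(kept), length)  # 2 9
```

The third paragraph is dropped whole rather than cut in the middle, which is the point of truncating on structural boundaries.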

In [None]:
%%time
full_train_encodings = tokenize_list(full_train['text'].tolist())

Token indices sequence length is longer than the specified maximum sequence length for this model (570 > 512). Running this sequence through the model will result in indexing errors


CPU times: user 1min 37s, sys: 1.33 s, total: 1min 38s
Wall time: 1min 38s


## Hyperparameter tuning using a validation set

We now perform hyperparameter tuning as in the previous notebook. This will require a train/val split:

In [None]:
from sklearn.model_selection import train_test_split

train_indices, val_indices = train_test_split(np.arange(len(full_train)), test_size=0.2, random_state=623, stratify=full_train['label'])

train_encodings = {'input_ids': full_train_encodings['input_ids'][train_indices,:],
                   'attention_mask': full_train_encodings['attention_mask'][train_indices,:]}
val_encodings = {'input_ids': full_train_encodings['input_ids'][val_indices,:],
                 'attention_mask': full_train_encodings['attention_mask'][val_indices,:]}
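
To show what the `stratify` argument is for, here is a small sketch of a stratified split in plain Python. It is an illustration of the idea only, not scikit-learn's implementation, and `stratified_split` and the toy labels are hypothetical. Indices are split per label so that both parts keep the same class proportions.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_size, seed):
    """Split indices so each label keeps the same share in both parts."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    train_idx, val_idx = [], []
    for indices in by_label.values():
        rng.shuffle(indices)
        n_val = round(len(indices) * test_size)
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    return sorted(train_idx), sorted(val_idx)

# Toy labels: 40 "human" (0) and 60 "AI" (1) samples
labels = [0] * 40 + [1] * 60
train_idx, val_idx = stratified_split(labels, test_size=0.2, seed=623)

# Share of label 1 in each part
train_share = sum(labels[i] for i in train_idx) / len(train_idx)
val_share = sum(labels[i] for i in val_idx) / len(val_idx)
print(train_share, val_share)  # 0.6 0.6
```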

We can now convert our training and validation data into TensorFlow datasets.

In [None]:
def create_dataset(encodings, labels, batch_size):
    input_ids = tf.convert_to_tensor(encodings['input_ids'], dtype=tf.int32)
    attention_mask = tf.convert_to_tensor(encodings['attention_mask'], dtype=tf.int32)
    labels = tf.keras.utils.to_categorical(labels)

    return tf.data.Dataset.from_tensor_slices(
        ({'input_ids': input_ids, 'attention_mask': attention_mask}, labels)
        ).shuffle(buffer_size=len(encodings['input_ids'])).batch(batch_size).prefetch(tf.data.AUTOTUNE)
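
`to_categorical` turns the integer labels into one-hot rows. A plain-Python sketch of the same transformation for the binary labels used here (the helper `one_hot` is hypothetical and only mirrors the behaviour for this case):

```python
def one_hot(labels, num_classes=2):
    """Convert integer labels into one-hot rows."""
    rows = []
    for label in labels:
        row = [0.0] * num_classes
        row[label] = 1.0
        rows.append(row)
    return rows

# 0 = human, 1 = AI, as assigned when the training data was combined
print(one_hot([0, 1, 1]))  # [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
```

Column 0 therefore corresponds to human text and column 1 to AI text, which needs to match the column order of the model's two-value output.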

In [None]:
batch_size = 8

train_dataset = create_dataset(train_encodings, full_train.iloc[train_indices]['label'].values, batch_size)
val_dataset = create_dataset(val_encodings, full_train.iloc[val_indices]['label'].values, batch_size)

I0000 00:00:1745033230.054287      19 gpu_device.cc:2022] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 13942 MB memory:  -> device: 0, name: Tesla T4, pci bus id: 0000:00:04.0, compute capability: 7.5
I0000 00:00:1745033230.055133      19 gpu_device.cc:2022] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 13942 MB memory:  -> device: 1, name: Tesla T4, pci bus id: 0000:00:05.0, compute capability: 7.5


We are now ready to train the model. We'll load the model, then set the parameters for the training loop:

In [None]:
from tensorflow.keras.optimizers import RMSprop
from tensorflow.keras.callbacks import LearningRateScheduler,TensorBoard
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Lambda, Softmax

def lr_scheduler(epoch, lr):
    return learning_rate

def create_model():
    base_model = TFDistilBertForSequenceClassification.from_pretrained('distilbert-base-cased', num_labels=3)
    input_ids = Input(shape=(None,), dtype=tf.int32, name='input_ids')
    attention_mask = Input(shape=(None,), dtype=tf.int32, name='attention_mask')
    logits_3_class = base_model([input_ids, attention_mask]).logits
    probs_3_class = Softmax(name='softmax_3class')(logits_3_class)

    def fuse_probs(probs):
        class_0 = probs[:,0]
        class_1 = probs[:,1] + probs[:,2]
        return tf.stack([class_0, class_1], axis=1)

    fused_probs = Lambda(fuse_probs, name='fused_probs')(probs_3_class)
    model = Model(inputs=[input_ids, attention_mask], outputs=fused_probs)

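    # The fused output is already a probability distribution, not logits,
    # so from_logits must be False; True would apply a second softmax.
    model.compile(optimizer=RMSprop(learning_rate=learning_rate),
                  loss=tf.keras.losses.CategoricalCrossentropy(from_logits=False),
                  metrics=['accuracy'])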
    return model

def fit_model(model, train, val=None):
    # batch size is fixed when the tf.data.Dataset is built, so it is not passed here
    history = model.fit(train,
                        epochs=epochs,
                        callbacks=[LearningRateScheduler(lr_scheduler),
                                   TensorBoard(log_dir="logs/fit", histogram_freq=1, update_freq='batch')],
                        validation_data=val,
                        verbose=1)
    return history
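
Because the fused layer outputs probabilities rather than logits, it matters how the loss interprets its input. The following is a minimal sketch in plain Python, not Keras's implementation, of what happens to cross-entropy when softmax is applied a second time to a probability vector. The clamping constant `eps` is an assumption chosen for illustration.

```python
import math

def softmax(values):
    """Numerically stable softmax over a short list."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target, probs):
    """Cross-entropy between a one-hot target and a probability vector."""
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)

target = [0.0, 1.0]
fused = [0.0, 1.0]  # a perfectly confident, correct prediction

# Treating the output as probabilities (clamped to avoid log(0))
eps = 1e-7
clamped = [min(max(p, eps), 1 - eps) for p in fused]
loss_as_probs = cross_entropy(target, clamped)

# Treating the same output as logits applies softmax a second time
loss_as_logits = cross_entropy(target, softmax(fused))

print(loss_as_probs, loss_as_logits)
```

In the second case the loss cannot fall below ln(1 + e^-1), roughly 0.313, even for a perfect prediction, because softmax of values confined to [0, 1] can never produce a probability close to 1.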

In [None]:
learning_rate = 1e-5
model_for_val = create_model()

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`



Some weights of the PyTorch model were not used when initializing the TF 2.0 model TFDistilBertForSequenceClassification: ['vocab_transform.weight', 'vocab_layer_norm.weight', 'vocab_transform.bias', 'vocab_layer_norm.bias', 'vocab_projector.bias']
- This IS expected if you are initializing TFDistilBertForSequenceClassification from a PyTorch model trained on another task or with another architecture (e.g. initializing a TFBertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFDistilBertForSequenceClassification from a PyTorch model that you expect to be exactly identical (e.g. initializing a TFBertForSequenceClassification model from a BertForSequenceClassification model).
Some weights or buffers of the TF 2.0 model TFDistilBertForSequenceClassification were not initialized from the PyTorch model and are newly initialized: ['pre_classifier.weight', 'pre_classifier.bias', 'classifier.weight', 'classifier.bias']
You should 

As with the previous model, I'll begin training with a learning rate of $10^{-5}$, then halve the learning rate before each further epoch.
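
The resulting schedule is easy to tabulate. A small sketch, where `halving_schedule` is a hypothetical helper rather than part of the training code:

```python
def halving_schedule(initial_lr, num_epochs):
    """Learning rate for each epoch when the rate is halved after every epoch."""
    return [initial_lr / (2 ** epoch) for epoch in range(num_epochs)]

rates = halving_schedule(1e-5, 3)
print(rates)  # [1e-05, 5e-06, 2.5e-06]
```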

In [None]:
epochs = 1
fit_model(model_for_val, train_dataset, val_dataset)





We immediately see that both the train and validation loss after one epoch are nearly an order of magnitude higher than in the previous model. The model is struggling to form a decision boundary that incorporates the new text samples.

In [None]:
learning_rate /= 2
fit_model(model_for_val, train_dataset, val_dataset)



After the second epoch, the train and validation loss have both improved slightly.

In [None]:
learning_rate /= 2
fit_model(model_for_val, train_dataset, val_dataset)



After the third epoch, the train loss has continued to improve, while the validation loss has increased very slightly. I'll stop training at this point to avoid overfitting, then retrain on the full dataset for three epochs using the same learning rate schedule.
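
The stopping rule applied by hand above can be written down as a simple check. This is a sketch only: `should_stop` is a hypothetical helper, and the loss values are made up for illustration and are not the values observed in this notebook.

```python
def should_stop(val_losses):
    """True when the latest validation loss is higher than the previous one."""
    return len(val_losses) >= 2 and val_losses[-1] > val_losses[-2]

# Hypothetical validation losses per epoch, for illustration only
history = [0.40, 0.35, 0.36]

stop_epoch = None
for epoch in range(1, len(history) + 1):
    if should_stop(history[:epoch]):
        stop_epoch = epoch
        break

print(stop_epoch)  # 3
```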

## Training on the full dataset

In [None]:
batch_size = 8
full_train_dataset = create_dataset(full_train_encodings, full_train['label'].values, batch_size)

In [None]:
learning_rate = 1e-5
model_full = create_model()


In [None]:
epochs = 1
fit_model(model_full, full_train_dataset)



In [None]:
model_full.save('model_epoch_1')

In [None]:
learning_rate /= 2
fit_model(model_full, full_train_dataset)



In [None]:
model_full.save('model_epoch_2')

In [None]:
learning_rate /= 2
fit_model(model_full, full_train_dataset)



In [None]:
model_full.save('model_epoch_3')