## Overview

KerasNLP makes it easy to build simple model pipelines quickly. In this guide we build a simple text classification pipeline from scratch, covering data loading, experiment tracking with Weights & Biases, and model building.

## Imports & setup

This tutorial requires you to have KerasNLP installed:

```shell
pip install keras-nlp
```

We begin by installing the packages (including `wandb`, used here for experiment tracking) and importing everything we need:

In [1]:
!pip install keras-nlp wandb




In [2]:
# Core libraries for data handling and modeling.
import numpy as np
import pandas as pd
import tensorflow as tf
import keras
import keras_nlp



Using TensorFlow backend


## Data loading

This guide uses the
[Quora Insincere Questions Classification Dataset](https://www.kaggle.com/competitions/quora-insincere-questions-classification/data)
for demonstration purposes.

To get started, we first load the dataset:


In [3]:
df = pd.read_csv('/kaggle/input/quora-insincere-questions-classification/train.csv')
df

| | qid | question_text | target |
|---|---|---|---|
| 0 | 00002165364db923c7e6 | How did Quebec nationalists see their province... | 0 |
| 1 | 000032939017120e6e44 | Do you have an adopted dog, how would you enco... | 0 |
| 2 | 0000412ca6e4628ce2cf | Why does velocity affect time? Does velocity a... | 0 |
| 3 | 000042bf85aa498cd78e | How did Otto von Guericke used the Magdeburg h... | 0 |
| 4 | 0000455dfa3e01eae3af | Can I convert montra helicon D to a mountain b... | 0 |
| ... | ... | ... | ... |
| 1306117 | ffffcc4e2331aaf1e41e | What other technical skills do you need as a c... | 0 |
| 1306118 | ffffd431801e5a2f4861 | Does MS in ECE have good job prospects in USA ... | 0 |
| 1306119 | ffffd48fb36b63db010c | Is foam insulation toxic? | 0 |
| 1306120 | ffffec519fa37cf60c78 | How can one start a research project based on ... | 0 |


In [4]:
text = df['question_text'].tolist()
target = df['target'].tolist()
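
Before training, it can be worth checking how the two labels are distributed, since a skewed distribution makes plain accuracy hard to interpret. The following is a minimal sketch, assuming `target` is a list of 0/1 labels as built above; the helper name `label_distribution` and the toy labels are hypothetical and only illustrate the idea:

```python
from collections import Counter


def label_distribution(labels):
    """Return {label: fraction of examples} for a list of class labels."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in sorted(counts.items())}


# Toy labels for illustration only; in the notebook you would pass `target`.
toy_target = [0, 0, 0, 1, 0, 0, 1, 0, 0, 0]
distribution = label_distribution(toy_target)
print(distribution)  # {0: 0.8, 1: 0.2}
```

If one class dominates, a model that always predicts the majority class already scores high accuracy, so that baseline is a useful number to keep in mind when reading training metrics.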

In [5]:
from kaggle_secrets import UserSecretsClient
import wandb

# Read the W&B API key stored as a Kaggle secret named "api_key".
user_secrets = UserSecretsClient()
wandb_api_key = user_secrets.get_secret("api_key")
wandb.login(key=wandb_api_key)

wandb: W&B API key is configured. Use `wandb login --relogin` to force relogin
wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc


True

In [6]:
# Log the raw DataFrame to W&B as a table so it can be browsed in the dashboard.
run = wandb.init(project="quora")
table = wandb.Table(dataframe=df)
run.log({"data": table})
run.finish()

wandb: Currently logged in as: tensorgirl. Use `wandb login --relogin` to force relogin
wandb: wandb version 0.16.4 is available!  To upgrade, please run:
wandb:  $ pip install wandb --upgrade
wandb: Tracking run with wandb version 0.16.1
wandb: Run data is saved locally in /kaggle/working/wandb/run-20240317_201746-hqidhmcb
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run fanciful-capybara-6
wandb: ⭐️ View project at https://wandb.ai/tensorgirl/quora
wandb: 🚀 View run at https://wandb.ai/tensorgirl/quora/runs/hqidhmcb
wandb: 🚀 View run fanciful-capybara-6 at: https://wandb.ai/tensorgirl/quora/runs/hqidhmcb
wandb: Synced 5 W&B file(s), 1

## Model Building

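We use the pretrained `RobertaClassifier` from KerasNLP to build a simple text classifier. The backbone is frozen so that only the classification head is trained, and the model is fit on the first 5,000 questions for a single epoch, with metrics logged to W&B.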

In [7]:
from wandb.keras import WandbMetricsLogger

run = wandb.init(project="quora", name="model_training")

# Load pretrained RoBERTa with a freshly initialized 2-class head.
classifier = keras_nlp.models.RobertaClassifier.from_preset(
    "roberta_base_en",
    num_classes=2,
)
# Freeze the backbone so only the classification head is updated.
classifier.backbone.trainable = False

# Train on the first 5,000 questions for one epoch, logging metrics to W&B.
history = classifier.fit(
    x=text[:5000],
    y=target[:5000],
    epochs=1,
    batch_size=16,
    verbose=1,
    callbacks=[WandbMetricsLogger()],
)

wandb: wandb version 0.16.4 is available!  To upgrade, please run:
wandb:  $ pip install wandb --upgrade
wandb: Tracking run with wandb version 0.16.1
wandb: Run data is saved locally in /kaggle/working/wandb/run-20240317_202026-lvg1zh2s
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run model_training
wandb: ⭐️ View project at https://wandb.ai/tensorgirl/quora
wandb: 🚀 View run at https://wandb.ai/tensorgirl/quora/runs/lvg1zh2s
Attaching 'config.json' from model 'keras/roberta/keras/roberta_base_en/1' to your Kaggle notebook...
Attaching 'model.weights.h5' from model 'keras/roberta/keras/roberta_base_en/1' to your Kaggle notebook...
Attaching 'tokenizer.json' from model 'keras/roberta/keras/roberta_base_en/1' to
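
The training call above fits on the first 5,000 questions without holding anything out, so the reported accuracy is measured on training data only. One way to get a held-out estimate is to split that subset before calling `fit`. The sketch below is an assumption about how this could be done, not part of the original notebook; `split_examples`, `val_fraction`, and `seed` are hypothetical names, and the toy data stands in for `text` and `target`:

```python
import random


def split_examples(texts, labels, val_fraction=0.2, seed=0):
    """Shuffle paired examples and split them into train and validation parts."""
    if len(texts) != len(labels):
        raise ValueError("texts and labels must have the same length")
    indices = list(range(len(texts)))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    n_val = int(len(indices) * val_fraction)
    val_idx, train_idx = indices[:n_val], indices[n_val:]
    train = ([texts[i] for i in train_idx], [labels[i] for i in train_idx])
    val = ([texts[i] for i in val_idx], [labels[i] for i in val_idx])
    return train, val


# Toy stand-ins for `text[:5000]` and `target[:5000]`.
toy_text = [f"question {i}" for i in range(10)]
toy_target = [i % 2 for i in range(10)]
(train_x, train_y), (val_x, val_y) = split_examples(toy_text, toy_target)
print(len(train_x), len(val_x))  # 8 2
```

The resulting lists could then be passed as `x=train_x, y=train_y`, with the held-out part supplied through the standard Keras `validation_data` argument of `fit`; whether plain Python lists are accepted there in your KerasNLP version is worth verifying.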



In [8]:
run.finish()

wandb: Run history:
wandb:                       epoch/epoch ▁
wandb:               epoch/learning_rate ▁
wandb:                        epoch/loss ▁
wandb: epoch/sparse_categorical_accuracy ▁
wandb: 
wandb: Run summary:
wandb:                       epoch/epoch 0
wandb:               epoch/learning_rate 2e-05
wandb:                        epoch/loss 0.17204
wandb: epoch/sparse_categorical_accuracy 0.9382
wandb: 
wandb: 🚀 View run model_training at: https://wandb.ai/tensorgirl/quora/runs/lvg1zh2s
wandb: Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20240317_202026-lvg1zh2s/logs


In [9]:
classifier.predict([text[0]])



array([[-0.69298095,  0.8235075 ]], dtype=float32)
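
The two numbers returned by `predict` are unnormalized logits, one per class, rather than probabilities (KerasNLP classifiers output logits by default). Here is a minimal sketch of turning them into probabilities with a softmax, using only the standard library and the logits printed above; `softmax` is a local helper written for this example, not a KerasNLP function:

```python
import math


def softmax(logits):
    """Convert a list of logits into probabilities that sum to 1."""
    shifted = [value - max(logits) for value in logits]  # for numerical stability
    exps = [math.exp(value) for value in shifted]
    total = sum(exps)
    return [value / total for value in exps]


logits = [-0.69298095, 0.8235075]  # output of classifier.predict above
probs = softmax(logits)
predicted_class = probs.index(max(probs))  # index of the most likely class
print(probs, predicted_class)
```

For these logits the softmax puts roughly 0.82 on class 1, so the predicted class index is 1.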