Model, Architecture, and Checkpoints

Remember: an architecture is the skeleton of a model, while a checkpoint is a set of trained weights for a given architecture. For example, BERT is an architecture, while bert-base-uncased is a checkpoint. Model is a general term that can mean either an architecture or a checkpoint.
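
The distinction can be illustrated with a toy sketch in plain Python. TinyLinear and both weight dictionaries below are hypothetical and have nothing to do with the transformers library; they only show one skeleton being filled with two different sets of weights.

```python
# Toy illustration only: TinyLinear and both weight dicts are hypothetical.


class TinyLinear:
    """The "architecture": it fixes the computation y = w * x + b."""

    def __init__(self, weights):
        # The "checkpoint": concrete parameter values plugged into the skeleton.
        self.w = weights["w"]
        self.b = weights["b"]

    def __call__(self, x):
        return self.w * x + self.b


checkpoint_a = {"w": 2.0, "b": 1.0}
checkpoint_b = {"w": -1.0, "b": 0.5}

# Same architecture, two checkpoints, two different behaviours.
model_a = TinyLinear(checkpoint_a)
model_b = TinyLinear(checkpoint_b)

print(model_a(3.0))  # 7.0
print(model_b(3.0))  # -2.5
```

Both objects share one class, much as bert-base-uncased and bert-base-cased share the BERT architecture but carry different weights.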

Nearly every NLP task begins with a tokenizer. A tokenizer converts your input into a format that can be processed by the model.

In [None]:
# Load a tokenizer with AutoTokenizer

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

sequence = "In a hole in the ground there lived a hobbit. That's the chinese gift."

print(tokenizer(sequence))



In [None]:
# Use Tokenizer to encode and decode

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

encoded = tokenizer("Do not meddle in the affairs of wizards, for they are subtle and quick to anger!")

print(encoded)

tokenizer.decode(encoded["input_ids"])
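
To make the encode and decode round trip concrete, here is a minimal sketch in plain Python. It assumes a hypothetical eight-entry vocabulary and simple whitespace splitting; a real BERT tokenizer uses a much larger WordPiece vocabulary and splits rare words into subwords, so treat this as an illustration of the idea rather than the library's implementation.

```python
# Hypothetical 8-entry vocabulary; a real checkpoint ships its own vocabulary.
VOCAB = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "do", "not", "meddle", "wizards"]
TOKEN_TO_ID = {token: i for i, token in enumerate(VOCAB)}


def toy_encode(text):
    """Lowercase, split on whitespace, wrap in [CLS] ... [SEP], map to ids."""
    tokens = ["[CLS]"] + text.lower().split() + ["[SEP]"]
    unk = TOKEN_TO_ID["[UNK]"]
    # Words missing from the vocabulary fall back to the [UNK] id.
    return [TOKEN_TO_ID.get(token, unk) for token in tokens]


def toy_decode(ids):
    """Map ids back to tokens and join them with spaces."""
    return " ".join(VOCAB[i] for i in ids)


ids = toy_encode("Do not meddle with wizards")
print(ids)              # [2, 4, 5, 6, 1, 7, 3]
print(toy_decode(ids))  # [CLS] do not meddle [UNK] wizards [SEP]
```

As in the real tokenizer, the decoded string includes the special tokens that were added during encoding.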


In [None]:
# Batch inputs and padding

# Set the padding parameter to True to pad the shorter sequences in the batch to match the longest sequence

batch_sentences = [
    "But what about second breakfast?",
    "Don't think he knows about second breakfast, Pip.",
    "What about elevensies?",
]

encoded_input = tokenizer(batch_sentences, padding=True)

print(encoded_input)
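
What padding does to a batch can be sketched in plain Python. The helper pad_batch and the token ids below are hypothetical; the sketch assumes right-side padding with id 0, which matches the usual BERT setup but may differ for other checkpoints.

```python
def pad_batch(sequences, pad_id=0):
    """Right-pad every sequence to the longest one and build attention masks."""
    longest = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = longest - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding the model should ignore.
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Hypothetical token ids for three sentences of different lengths.
batch = pad_batch([[101, 7, 8, 102], [101, 7, 8, 9, 10, 102], [101, 7, 102]])
for ids, mask in zip(batch["input_ids"], batch["attention_mask"]):
    print(ids, mask)
```

Every row ends up with the same length, which is what allows the batch to be stacked into a single rectangular tensor.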

In [None]:
# Return tensors for model input

batch_sentences = [
    "But what about second breakfast?",
    "Don't think he knows about second breakfast, Pip.",
    "What about elevensies?",
]
encoded_input = tokenizer(batch_sentences, padding=True, truncation=True, return_tensors="pt")

print(encoded_input)

For audio tasks, you’ll need a feature extractor to prepare your dataset for the model. The feature extractor extracts features from raw audio data and converts them into tensors.

In [None]:
# Install datasets first

! pip install -U datasets

In [3]:
# Install the missing librosa package

! pip install -U librosa

Collecting librosa
  Downloading librosa-0.10.1-py3-none-any.whl.metadata (8.3 kB)
Collecting audioread>=2.1.9 (from librosa)
  Downloading audioread-3.0.1-py3-none-any.whl.metadata (8.4 kB)
Collecting pooch>=1.0 (from librosa)
  Downloading pooch-1.8.0-py3-none-any.whl.metadata (9.9 kB)
Collecting soxr>=0.3.2 (from librosa)
  Downloading soxr-0.3.7-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (5.5 kB)
Collecting lazy-loader>=0.1 (from librosa)
  Downloading lazy_loader-0.3-py3-none-any.whl.metadata (4.3 kB)
Collecting msgpack>=1.0 (from librosa)
  Downloading msgpack-1.0.7-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (9.1 kB)
Downloading librosa-0.10.1-py3-none-any.whl (253 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 253.7/253.7 kB 9.8 MB/s eta 0:00:00
Downloading audioread-3.0.1-py3-none-any.whl (23 kB)
Downloading lazy_loader-0.3-py3-none-any.whl (9.1 kB)
Downloading msgpack-1.0.7-cp310-c

In [5]:
# Use datasets to load and preprocess audio data

from datasets import load_dataset, Audio

dataset = load_dataset("PolyAI/minds14", name="en-US", split="train")

from transformers import AutoModelForAudioClassification, AutoFeatureExtractor

model = AutoModelForAudioClassification.from_pretrained("facebook/wav2vec2-base")
feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/wav2vec2-base")

dataset = dataset.cast_column("audio", Audio(sampling_rate=16000))

dataset[0]["audio"]

def preprocess_function(examples):
    audio_arrays = [x["array"] for x in examples["audio"]]
    inputs = feature_extractor(
        audio_arrays,
        sampling_rate=16000,
        padding=True,
        max_length=100000,
        truncation=True,
    )
    return inputs

dataset = dataset.map(preprocess_function, batched=True)

dataset = dataset.rename_column("intent_class", "labels")

from torch.utils.data import DataLoader

dataset.set_format(type="torch", columns=["input_values", "labels"])
dataloader = DataLoader(dataset, batch_size=4)


Some weights of the model checkpoint at facebook/wav2vec2-base were not used when initializing Wav2Vec2ForSequenceClassification: ['project_hid.weight', 'project_q.weight', 'quantizer.weight_proj.bias', 'quantizer.weight_proj.weight', 'project_q.bias', 'quantizer.codevectors', 'project_hid.bias']
- This IS expected if you are initializing Wav2Vec2ForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing Wav2Vec2ForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of Wav2Vec2ForSequenceClassification were not initialized from the model checkpoint at facebook/wav2vec2-base and are newly initialized: ['projector.bias', 'projector.weight', 'classifier.

Map:   0%|          | 0/563 [00:00<?, ? examples/s]
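
The combined effect of the padding, max_length, and truncation arguments in preprocess_function can be sketched in plain Python. The helper pad_or_truncate and the tiny waveforms are hypothetical, and the sketch assumes zero-valued padding up to the longest clip in the batch; it leaves out any normalization the real feature extractor may apply.

```python
def pad_or_truncate(audio_arrays, max_length, pad_value=0.0):
    """Cut each waveform to at most max_length samples, then pad to the longest."""
    clipped = [list(a[:max_length]) for a in audio_arrays]
    target = max(len(a) for a in clipped)
    return [a + [pad_value] * (target - len(a)) for a in clipped]


# Hypothetical tiny "waveforms"; at 16 kHz a real clip has 16,000 samples per second.
waves = [[0.1, 0.2, 0.3, 0.4, 0.5, 0.6], [0.7, 0.8], [0.9, 1.0, 1.1]]
batch = pad_or_truncate(waves, max_length=4)
print(batch)
# [[0.1, 0.2, 0.3, 0.4], [0.7, 0.8, 0.0, 0.0], [0.9, 1.0, 1.1, 0.0]]
```

With max_length=100000 and a 16,000 Hz sampling rate, as in the cell above, clips longer than 6.25 seconds would be cut.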