# ⭐ Arabic Named Entity Recognition (NER) Project

### Project Overview
This project builds a Named Entity Recognition (NER) system for Arabic text that identifies entities such as:
- **Persons (PER)**
- **Organizations (ORG)**
- **Locations (LOC)**
- **Time expressions (TIMEX)**
---

### Tools and Libraries:
- HuggingFace Transformers
- PyTorch
- HuggingFace Datasets
- Tokenizers
- Scikit-Learn

In [19]:
import warnings
warnings.filterwarnings('ignore')

In [20]:
!pip install datasets



In [21]:
from datasets import load_dataset

# Login using e.g. `huggingface-cli login` to access this dataset
dataset = load_dataset("iahlt/arabic_ner_mafat")
train_dataset = dataset["train"]

In [22]:
dataset

DatasetDict({
    train: Dataset({
        features: ['tokens', 'raw_tags', 'ner_tags', 'spaces', 'spans', 'record', 'text'],
        num_rows: 40000
    })
})

In [23]:
# Extract all the tags
all_tags = set(tag for example in dataset["train"]['raw_tags'] for tag in example)

# Create a mapping from tag name to number
tag2id = {tag: idx for idx, tag in enumerate(sorted(all_tags))}
id2tag = {v: k for k, v in tag2id.items()}

# Add new columns with numbers instead of raw_tags
def convert_tags(example):
    example['labels'] = [tag2id[tag] for tag in example['raw_tags']]
    return example

# Apply the conversion to the entire dataset
dataset = dataset.map(convert_tags)

# If you want to see the mapping
print("Tag2ID:", tag2id)
print("ID2Tag:", id2tag)

Tag2ID: {'B-ANG': 0, 'B-DUC': 1, 'B-EVE': 2, 'B-FAC': 3, 'B-GPE': 4, 'B-INFORMAL': 5, 'B-LOC': 6, 'B-MISC': 7, 'B-ORG': 8, 'B-PER': 9, 'B-TIMEX': 10, 'B-TTL': 11, 'B-WOA': 12, 'I-ANG': 13, 'I-DUC': 14, 'I-EVE': 15, 'I-FAC': 16, 'I-GPE': 17, 'I-INFORMAL': 18, 'I-LOC': 19, 'I-MISC': 20, 'I-ORG': 21, 'I-PER': 22, 'I-TIMEX': 23, 'I-TTL': 24, 'I-WOA': 25, 'L-ANG': 26, 'L-DUC': 27, 'L-EVE': 28, 'L-FAC': 29, 'L-GPE': 30, 'L-INFORMAL': 31, 'L-LOC': 32, 'L-MISC': 33, 'L-ORG': 34, 'L-PER': 35, 'L-TIMEX': 36, 'L-TTL': 37, 'L-WOA': 38, 'O': 39}
ID2Tag: {0: 'B-ANG', 1: 'B-DUC', 2: 'B-EVE', 3: 'B-FAC', 4: 'B-GPE', 5: 'B-INFORMAL', 6: 'B-LOC', 7: 'B-MISC', 8: 'B-ORG', 9: 'B-PER', 10: 'B-TIMEX', 11: 'B-TTL', 12: 'B-WOA', 13: 'I-ANG', 14: 'I-DUC', 15: 'I-EVE', 16: 'I-FAC', 17: 'I-GPE', 18: 'I-INFORMAL', 19: 'I-LOC', 20: 'I-MISC', 21: 'I-ORG', 22: 'I-PER', 23: 'I-TIMEX', 24: 'I-TTL', 25: 'I-WOA', 26: 'L-ANG', 27: 'L-DUC', 28: 'L-EVE', 29: 'L-FAC', 30: 'L-GPE', 31: 'L-INFORMAL', 32: 'L-LOC', 33: 'L-MISC'

# 🏷️ Tag2ID Mappings for Arabic NER Dataset

The table below explains the main entity types in the dataset, with several Arabic examples for each:

<table style="border-collapse: collapse; width: 100%; text-align: center;">
  <thead style="background-color: #f2f2f2; color: #333;">
    <tr>
      <th style="padding: 10px; border: 1px solid #ccc;">Tag</th>
      <th style="padding: 10px; border: 1px solid #ccc;">Meaning</th>
      <th style="padding: 10px; border: 1px solid #ccc;">Arabic Examples</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="padding: 10px; border: 1px solid #ccc;">PER</td>
      <td style="padding: 10px; border: 1px solid #ccc;">Person 👤</td>
      <td style="padding: 10px; border: 1px solid #ccc;">"أحمد"، "فاطمة"، "محمد صلاح"، "نجيب محفوظ"</td>
    </tr>
    <tr>
      <td style="padding: 10px; border: 1px solid #ccc;">ORG</td>
      <td style="padding: 10px; border: 1px solid #ccc;">Organization 🏢</td>
      <td style="padding: 10px; border: 1px solid #ccc;">"شركة الاتصالات"، "اليونيسكو"، "جامعة القاهرة"، "نادي الأهلي"</td>
    </tr>
    <tr>
      <td style="padding: 10px; border: 1px solid #ccc;">LOC</td>
      <td style="padding: 10px; border: 1px solid #ccc;">Location 📍</td>
      <td style="padding: 10px; border: 1px solid #ccc;">"الحديقة"، "النهر"، "الصحراء"، "الجبل"</td>
    </tr>
    <tr>
      <td style="padding: 10px; border: 1px solid #ccc;">GPE</td>
      <td style="padding: 10px; border: 1px solid #ccc;">Geopolitical Entity 🌍</td>
      <td style="padding: 10px; border: 1px solid #ccc;">"مصر"، "السعودية"، "القاهرة"، "الرياض"</td>
    </tr>
    <tr>
      <td style="padding: 10px; border: 1px solid #ccc;">TIMEX</td>
      <td style="padding: 10px; border: 1px solid #ccc;">Time Expression ⏰</td>
      <td style="padding: 10px; border: 1px solid #ccc;">"الساعة"، "اليوم"، "عام ٢٠٢٠"، "منتصف الليل"</td>
    </tr>
    <tr>
      <td style="padding: 10px; border: 1px solid #ccc;">TTL</td>
      <td style="padding: 10px; border: 1px solid #ccc;">Title (honorific or job title) 🎓</td>
      <td style="padding: 10px; border: 1px solid #ccc;">"البروفسور"، "الدكتور"، "الرئيس"، "الوزير"</td>
    </tr>
    <tr>
      <td style="padding: 10px; border: 1px solid #ccc;">WOA</td>
      <td style="padding: 10px; border: 1px solid #ccc;">Work of Art 🎨</td>
      <td style="padding: 10px; border: 1px solid #ccc;">"لوحة الموناليزا"، "تمثال الحرية"، "قصيدة الأطلال"، "مقطوعة موسيقية"</td>
    </tr>
    <tr>
      <td style="padding: 10px; border: 1px solid #ccc;">MISC</td>
      <td style="padding: 10px; border: 1px solid #ccc;">Miscellaneous 📦</td>
      <td style="padding: 10px; border: 1px solid #ccc;">"موقع إلكتروني"، "اسم حدث"، "مصطلح علمي"، "براءة اختراع"</td>
    </tr>
  </tbody>
</table>

> In `raw_tags`, each entity type carries a position prefix: `B-` marks the first token of an entity, `I-` an inside token, and `L-` the last token of a multi-token entity, while `O` marks tokens outside any entity.
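
A short sketch of how a prefixed tag sequence can be turned back into entity spans. It assumes, based on the sample record shown in this notebook, that a single-token entity is tagged with `B-` alone; `tags_to_spans` is a hypothetical helper, not part of the dataset tooling.

```python
def tags_to_spans(tags):
    """Convert B-/I-/L-/O tags into (label, start, end) token spans (end exclusive)."""
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags):
        prefix, _, entity = tag.partition("-")
        if prefix == "B":
            # A new entity starts; close any entity that is still open
            if start is not None:
                spans.append((label, start, i))
            start, label = i, entity
        elif prefix in ("I", "L") and start is not None and entity == label:
            if prefix == "L":
                # Last token of a multi-token entity closes the span
                spans.append((label, start, i + 1))
                start, label = None, None
        else:
            # "O" or an inconsistent tag: close any open entity
            if start is not None:
                spans.append((label, start, i))
            start, label = None, None
    if start is not None:
        spans.append((label, start, len(tags)))
    return spans


example_spans = tags_to_spans(["B-TTL", "B-PER", "L-PER"])
print(example_spans)
```

Here the title and the two-token person name come out as separate spans, which matches the `spans` field of the first training record.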


In [24]:
dataset["train"][0]

{'tokens': ['البروفسور', 'محمود', 'خليل'],
 'raw_tags': ['B-TTL', 'B-PER', 'L-PER'],
 'ner_tags': [47, 38, 37],
 'spaces': [1, 1, 0],
 'spans': [{'end': 9, 'label': 'TTL', 'start': 0, 'text': 'البروفسور'},
  {'end': 20, 'label': 'PER', 'start': 10, 'text': 'محمود خليل'}],
 'record': '{"metadata": {"doc_id": "003875358633647be47857dcee7869cfc03720431774756bd997a7cf87dd93bb", "url": "https://www.alarab.com//Article/824736", "source": "AlArab", "title": "كلية سخنين لتأهيل المعلمين تحصل على اعتراف مجلس التعليم العالي بإعطاء لقب M.Teach", "authors": "موقع العرب وصحيفة كل العرب- الناصرة", "date": "2017-09-14 15:59:34", "domains": "Students", "parnumber": "8", "sentnumber": "none"}, "text": "البروفسور محمود خليل", "label": [[0, 9, "TTL"], [10, 20, "PER"]], "user": "nlhowell", "timestamp": 1685356355.6331542, "flatten": {"tokens": ["البروفسور", "محمود", "خليل"], "ner_tags": ["B-TTL", "B-PER", "L-PER"], "spaces": [1, 1, 0]}, "label_hierarchy": {"0": [{"end": 9, "label": "TTL", "start": 0, "te
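
The `tokens`, `spaces`, and `spans` fields appear to fit together as follows: appending a space after each token whose `spaces` entry is 1 rebuilds the sentence, and each span's `start`/`end` are character offsets into that sentence. The sketch below checks this on the record above. This reading of `spaces` is inferred from this single example rather than from the dataset documentation, and `join_tokens` is a hypothetical helper.

```python
def join_tokens(tokens, spaces):
    """Rebuild the sentence, adding a space after a token when its flag is 1."""
    return "".join(tok + (" " if flag else "") for tok, flag in zip(tokens, spaces))


tokens = ["البروفسور", "محمود", "خليل"]
spaces = [1, 1, 0]
spans = [
    {"start": 0, "end": 9, "label": "TTL"},
    {"start": 10, "end": 20, "label": "PER"},
]

text = join_tokens(tokens, spaces)
# Slice the rebuilt sentence with each span's character offsets
span_texts = [(s["label"], text[s["start"]:s["end"]]) for s in spans]
print(span_texts)
```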

In [25]:
from datasets import DatasetDict

def simplify(example):
    return {
        "tokens": example["tokens"],
        "labels": example["labels"]
    }

keep_cols = ["tokens", "labels"]

simplified_dataset = DatasetDict({
    split: dataset[split]
        .map(simplify, remove_columns=[col for col in dataset[split].column_names if col not in keep_cols])
    for split in dataset
})


In [26]:
simplified_dataset["train"][0]

{'tokens': ['البروفسور', 'محمود', 'خليل'], 'labels': [11, 9, 35]}

# 🥰 Dataset Preprocessing Summary

In this project, we focused on preprocessing the **Arabic NER** dataset by following these steps:

### 1. **Selecting Important Columns**  
We selected only the necessary columns: **tokens** and **labels**. This helps to focus on the key information needed for Named Entity Recognition (NER).

### 2. **Tag Mapping**  
We created a mapping between the tag names and numerical values:
- **Tag to ID Mapping**: We assigned a unique number to each tag.
- **ID to Tag Mapping**: We reversed the mapping to get back from IDs to tags.

### 3. **Unified Tagging**  
We replaced the original **raw_tags** strings with the numerical label for each tag (entity type plus position prefix), which is the format the model expects for training.

---

With these steps, each example holds only its tokens and integer labels and is ready for tokenization and model training! 🚀


In [27]:
from transformers import AutoModelForTokenClassification, AutoTokenizer

model_name = "asafaya/bert-base-arabic"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Derive the label count from the mapping instead of hard-coding it
num_labels = len(tag2id)

model = AutoModelForTokenClassification.from_pretrained(
    model_name,
    num_labels=num_labels,
    id2label=id2tag,
    label2id=tag2id
)


# ✨ Tokenization and Label Alignment for Arabic NER

To prepare the Arabic NER dataset for model training, we apply a tokenization and label alignment step.  
The goal is to **tokenize** each word properly and **align** the NER labels with the corresponding tokens, taking into account that some words may be split into multiple sub-tokens.

---

### 🔹 Step Explanation:

- 1️⃣ **Tokenization:** Tokenize the pre-split words of each example with the pretrained tokenizer (`is_split_into_words=True`).
- 2️⃣ **Tracking:** Keep track of which token belongs to which original word (`word_ids`).
- 3️⃣ **Label Assignment:** Assign the correct label only to the first token of each word. For sub-tokens or special tokens, assign `-100` (to ignore them during loss calculation).
- 4️⃣ **Store Labels:** Attach the aligned labels back into the tokenized output.
- 5️⃣ **Batch Processing:** Apply the mapping to the entire dataset using `batched=True` for efficiency.

---

### 🎯 Why This Matters

- **Subword Tokenization:** WordPiece tokenizers such as BERT's often split Arabic words into several sub-tokens, so there are more tokens than word-level labels.
- **Correct Label Alignment:** Only the first sub-token of each word carries the label, so each word contributes exactly once to the loss.
- **Loss Masking:** Positions labeled `-100` are skipped by the loss function, so special tokens, padding, and continuation sub-tokens do not affect training.
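
The alignment rule can be sketched on a toy example without loading a tokenizer. The `word_ids` list below is written by hand to mimic what a fast tokenizer returns for a two-word input whose second word splits into two sub-tokens; `align_labels` is a hypothetical helper that mirrors the logic of the notebook's function.

```python
IGNORE_INDEX = -100  # label value skipped by the loss


def align_labels(word_ids, word_labels):
    """Give each word's label to its first sub-token and mask everything else."""
    aligned = []
    previous = None
    for word_id in word_ids:
        if word_id is None or word_id == previous:
            # Special tokens and continuation sub-tokens are masked
            aligned.append(IGNORE_INDEX)
        else:
            aligned.append(word_labels[word_id])
        previous = word_id
    return aligned


# Hand-written stand-in for: [CLS] اسحاق نيوت ##ن [SEP]
word_ids = [None, 0, 1, 1, None]
word_labels = [9, 35]  # B-PER, L-PER in this notebook's tag2id
aligned = align_labels(word_ids, word_labels)
print(aligned)
```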


In [28]:
def tokenize_and_align_labels(examples):
    tokenized_inputs = tokenizer(
        examples["tokens"],
        truncation=True,
        is_split_into_words=True,
        padding=True,
        max_length=512
    )

    all_labels = []
    for i in range(len(examples["tokens"])):
        # word_ids maps each sub-token to its source word (None for special/padding tokens)
        word_ids = tokenized_inputs.word_ids(batch_index=i)
        labels = []
        previous_word_idx = None
        for word_idx in word_ids:
            if word_idx is None or word_idx == previous_word_idx:
                # Special tokens, padding, and continuation sub-tokens are ignored by the loss
                labels.append(-100)
            else:
                # The first sub-token of a word carries the word's label
                labels.append(examples["labels"][i][word_idx])
            previous_word_idx = word_idx
        all_labels.append(labels)

    tokenized_inputs["labels"] = all_labels
    return tokenized_inputs


tokenized_datasets = simplified_dataset.map(tokenize_and_align_labels, batched=True)

In [29]:
!pip install evaluate



In [30]:
!pip install seqeval




# 📊 Evaluation Setup for Arabic NER

To properly evaluate our NER model performance, we set up the data collator, evaluation metric, and metric computation function.

---

### 🔹 Key Steps:

- **Data Collation:**  
  Use `DataCollatorForTokenClassification` to dynamically pad inputs and labels during training.

- **Metric Loading:**  
  Load the `seqeval` metric, specialized for sequence labeling tasks like NER.

- **Metric Computation:**  
  - Predict the best label for each token using `argmax`.
  - Align predictions and labels, ignoring positions labeled `-100` (special tokens, padding, and continuation sub-tokens).
  - Calculate overall **precision**, **recall**, **f1-score**, and **accuracy**.
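
The argmax-and-filter step can be illustrated in plain Python on one toy sequence. The three-tag `toy_id2tag` mapping and the logits are made up for illustration, and `decode` is a hypothetical helper; the notebook's own function does the same thing with NumPy over a whole batch.

```python
IGNORE_INDEX = -100


def argmax(row):
    """Index of the largest value in a list of scores."""
    return max(range(len(row)), key=row.__getitem__)


def decode(logits, labels, id2tag):
    """Return predicted and gold tag strings at positions that carry a real label."""
    pred_tags, gold_tags = [], []
    for token_logits, label in zip(logits, labels):
        if label == IGNORE_INDEX:
            continue  # masked position: not scored
        pred_tags.append(id2tag[argmax(token_logits)])
        gold_tags.append(id2tag[label])
    return pred_tags, gold_tags


toy_id2tag = {0: "B-PER", 1: "L-PER", 2: "O"}  # illustrative mapping only
logits = [
    [0.1, 0.2, 0.9],  # special token, masked
    [2.0, 0.1, 0.3],
    [0.2, 1.5, 0.1],
    [0.3, 0.2, 0.1],  # continuation sub-token, masked
    [0.0, 0.1, 3.0],  # special token, masked
]
labels = [-100, 0, 1, -100, -100]
pred_tags, gold_tags = decode(logits, labels, toy_id2tag)
print(pred_tags, gold_tags)
```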

 


In [31]:
from transformers import DataCollatorForTokenClassification
import numpy as np
import evaluate

# Data collator
data_collator = DataCollatorForTokenClassification(tokenizer=tokenizer)

# Metrics
metric = evaluate.load("seqeval")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    # The model returns per-token logits; argmax over the last axis picks the predicted label id
    predictions = np.argmax(logits, axis=2)

    # Predicted tags, keeping only positions with a real label (not -100)
    true_predictions = [
        [id2tag[p] for (p, l) in zip(prediction, label) if l != -100]
        for prediction, label in zip(predictions, labels)
    ]

    # Gold tags at the same positions
    true_labels = [
        [id2tag[l] for (p, l) in zip(prediction, label) if l != -100]
        for prediction, label in zip(predictions, labels)
    ]

    # Calculate the results using the seqeval library
    results = metric.compute(predictions=true_predictions, references=true_labels)

    # Return the evaluation metrics
    return {
        "precision": results["overall_precision"],
        "recall": results["overall_recall"],
        "f1": results["overall_f1"],
        "accuracy": results["overall_accuracy"],
    }

In [32]:
import torch


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)

model.to(device)

cuda


BertForTokenClassification(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(32000, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0-11): 12 x BertLayer(
          (attention): BertAttention(
            (self): BertSdpaSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12

In [33]:
import os
import warnings
import logging
os.environ["WANDB_DISABLED"] = "true"
os.environ["TOKENIZERS_PARALLELISM"] = "false"
logging.getLogger("transformers").setLevel(logging.ERROR)
warnings.filterwarnings("ignore")


# 🚀 Model Training Setup for Arabic NER

In this section, we use the **Trainer API** from Hugging Face Transformers to fine-tune the **`asafaya/bert-base-arabic`** model loaded above on the **Arabic NER dataset**.



### 🔹 Training Arguments Explanation:

- **output_dir="/kaggle/working/"**: Specifies the directory where model checkpoints and training results will be saved.
- **eval_strategy="epoch"**: Evaluates the model at the end of each *epoch*.
- **save_strategy="epoch"**: Saves the model after each *epoch*.
- **learning_rate=2e-5**: Sets the learning rate for the optimizer.
- **per_device_train_batch_size=16**: Batch size per device (CPU/GPU) during training.
- **per_device_eval_batch_size=16**: Batch size per device during evaluation.
- **num_train_epochs=3**: Number of *epochs* for training, set to 3 here.
- **weight_decay=0.01**: Applies weight decay, a regularizer similar to L2, to reduce overfitting.
- **disable_tqdm=False**: Enables the progress bar during training.
- **gradient_checkpointing=True**: Saves memory by recomputing activations during the backward pass (useful for large models).
- **dataloader_num_workers**: Not set in the training cell, so the default of 0 applies and data is loaded in the main process.



### 🔹 Trainer Setup Explanation:

- **model=model.to(device)**: Moves the model to the appropriate device (CPU or GPU).
- **args=training_args**: Uses the specified training arguments defined earlier.
- **train_dataset=train_dataset**: The 80% training split of the tokenized dataset.
- **eval_dataset=valid_dataset**: The 10% validation split, evaluated at the end of each epoch. The remaining 10% is held out as a test set.
- **tokenizer=tokenizer**: The tokenizer used to convert text into tokens.
- **data_collator=data_collator**: Responsible for padding the inputs to the correct size during batching.
- **compute_metrics=compute_metrics**: A function that calculates Precision, Recall, F1, and overall Accuracy.



### 🛠️ Final Step: Training

Finally, we train the model using `trainer.train()`, which starts the training process on the **Arabic NER dataset**.




In [34]:
tokenized_split = tokenized_datasets["train"].train_test_split(test_size=0.2, seed=42)
val_test_split = tokenized_split["test"].train_test_split(test_size=0.5, seed=42)

train_dataset = tokenized_split["train"]
valid_dataset = val_test_split["train"]
test_dataset = val_test_split["test"]

In [35]:
from transformers import TrainingArguments
from transformers import Trainer

training_args = TrainingArguments(
    output_dir="/kaggle/working/",
    eval_strategy="epoch",
    save_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    weight_decay=0.01,
    disable_tqdm=False,
    gradient_checkpointing=True
)

trainer = Trainer(
    model=model.to(device),
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=valid_dataset,
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics
)
trainer.train()

| Epoch | Training Loss | Validation Loss | Precision | Recall | F1 | Accuracy |
|---|---|---|---|---|---|---|
| 1 | 0.1903 | 0.179529 | 0.746202 | 0.800179 | 0.772249 | 0.948414 |
| 2 | 0.1398 | 0.166288 | 0.761528 | 0.813257 | 0.786543 | 0.951965 |
| 3 | 0.1068 | 0.173279 | 0.767163 | 0.820879 | 0.793112 | 0.953025 |


TrainOutput(global_step=6000, training_loss=0.17119370396931965, metrics={'train_runtime': 7807.3911, 'train_samples_per_second': 12.296, 'train_steps_per_second': 0.769, 'total_flos': 2.495136155267712e+16, 'train_loss': 0.17119370396931965, 'epoch': 3.0})
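
As a rough sanity check on the reported `global_step=6000`, the step count can be derived from the dataset size and the batch size. The sketch assumes a single device and no gradient accumulation (the notebook sets neither, but the hardware is not stated); the variable names are illustrative.

```python
import math

num_rows = 40_000      # rows in the original train split
test_percent = 20      # first train_test_split call (test_size=0.2)
batch_size = 16        # per_device_train_batch_size
num_epochs = 3         # num_train_epochs

held_out = num_rows * test_percent // 100   # rows set aside for validation + test
train_rows = num_rows - held_out            # rows actually used for training
steps_per_epoch = math.ceil(train_rows / batch_size)
total_steps = steps_per_epoch * num_epochs
print(train_rows, steps_per_epoch, total_steps)
```

Under these assumptions the total comes to 6000 optimizer steps, which agrees with the `TrainOutput` above.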

In [36]:
results = trainer.evaluate(eval_dataset=test_dataset)
print(results)

{'eval_loss': 0.17174355685710907, 'eval_precision': 0.7637311620598529, 'eval_recall': 0.8175076452599388, 'eval_f1': 0.7897049591964846, 'eval_accuracy': 0.9523553162853298, 'eval_runtime': 67.229, 'eval_samples_per_second': 59.498, 'eval_steps_per_second': 3.719, 'epoch': 3.0}
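
The reported F1 is the harmonic mean of precision and recall, which can be checked directly from the test-set numbers above. A minimal sketch (`f1_score` is a hypothetical helper, not the `seqeval` function):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


precision = 0.7637311620598529   # eval_precision from the output above
recall = 0.8175076452599388      # eval_recall from the output above
f1 = f1_score(precision, recall)
print(f1)
```

The result should agree with the reported `eval_f1` up to floating-point rounding. Note that these entity-level scores are much lower than the token-level accuracy (about 0.95), since most tokens are `O` and are easy to classify.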


In [41]:
# Count all parameters and those that will be updated during training
total_params = sum(p.numel() for p in model.parameters())
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)

print(f"Total parameters: {total_params}")
print(f"Trainable parameters: {trainable_params}")

Total parameters: 110057512
Trainable parameters: 110057512




# 📝 Test Sentences for Arabic NER

This section defines a set of unlabeled test sentences for a qualitative check of the Arabic NER model's predictions.


In [37]:
test_sentences_ner = [
    "سافر أحمد إلى القاهرة لحضور مؤتمر التكنولوجيا.",
    "مُنح كتاب 'الخيميائي' جائزة أفضل رواية.",
    "ستقام المباراة النهائية في ملعب الملك فهد الدولي.",
    "تعمل ليلى في شركة مايكروسوفت منذ خمس سنوات.",
    "ولد العالم إسحاق نيوتن في إنجلترا.",
    "أقيمت فعاليات معرض الكتاب الدولي في الرياض.",
    "يبدأ العام الدراسي الجديد في سبتمبر.",
    "يتحدث سامي اللغة الفرنسية بطلاقة.",
    "زار السائحون مدينة البتراء الأثرية في الأردن.",
    "يقدم المستشفى الوطني خدمات طبية عالية الجودة.",
    "تم عرض الفيلم الوثائقي الجديد على قناة الجزيرة.",
    "افتتحت شركة سامسونج فرعاً جديداً في دبي.",
    "سيتم تسليم الجوائز يوم الخميس القادم.",
    "أحب قراءة كتاب 'مئة عام من العزلة'.",
    "تعيّن الدكتور يوسف عميداً لكلية الهندسة.",
    "اللغة الإسبانية منتشرة في قارة أمريكا الجنوبية.",
    "زار الفريق الرئاسي قصر قرطاج الرئاسي في تونس.",
    "يبدأ مهرجان كان السينمائي في مايو من كل عام.",
    "تعد جبال الهيمالايا من أعلى سلاسل الجبال في العالم.",
    "تم إعلان الفائز بجائزة نوبل للسلام هذا الأسبوع."
]

In [38]:
model.eval()  # switch off dropout for inference

for sentence in test_sentences_ner:
    inputs = tokenizer(sentence, return_tensors="pt", truncation=True, padding=True)
    inputs = {key: value.to(device) for key, value in inputs.items()}
    with torch.no_grad():
        outputs = model(**inputs)
    # Highest-scoring label id for every sub-token (special tokens included)
    predictions = np.argmax(outputs.logits.cpu().numpy(), axis=2)
    predicted_tags = [id2tag[pred] for pred in predictions[0]]
    tokens = tokenizer.convert_ids_to_tokens(inputs['input_ids'][0])

    print(f"Sentence: {sentence}")
    for token, tag in zip(tokens, predicted_tags):
        print(f"{token}: {tag}")
    print("-" * 50)


Sentence: سافر أحمد إلى القاهرة لحضور مؤتمر التكنولوجيا.
[CLS]: O
سافر: O
احمد: B-PER
الى: O
القاهرة: B-GPE
لحضور: O
موتمر: B-EVE
التكنولوجيا: L-EVE
.: O
[SEP]: O
--------------------------------------------------
Sentence: مُنح كتاب 'الخيميائي' جائزة أفضل رواية.
[CLS]: O
منح: O
كتاب: O
': O
الخي: B-WOA
##مي: L-WOA
##ايي: L-WOA
': O
جايزة: O
افضل: O
رواية: O
.: O
[SEP]: O
--------------------------------------------------
Sentence: ستقام المباراة النهائية في ملعب الملك فهد الدولي.
[CLS]: O
ستقام: O
المباراة: O
النهايية: O
في: O
ملعب: B-FAC
الملك: I-FAC
فهد: I-FAC
الدولي: L-FAC
.: O
[SEP]: O
--------------------------------------------------
Sentence: تعمل ليلى في شركة مايكروسوفت منذ خمس سنوات.
[CLS]: O
تعمل: O
ليلى: B-PER
في: O
شركة: O
مايكروسوفت: B-ORG
منذ: O
خمس: O
سنوات: O
.: O
[SEP]: O
--------------------------------------------------
Sentence: ولد العالم إسحاق نيوتن في إنجلترا.
[CLS]: O
ولد: O
العالم: B-TTL
اسحاق: B-PER
نيوت: L-PER
##ن: L-PER
في: O
انجلترا: B-GPE
.: O
[SEP]: O
--
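
The output above is printed per sub-token, so words split by the tokenizer appear as several lines (for example `نيوت` and `##ن`). One possible post-processing step, sketched here as an assumption rather than part of the notebook, merges `##` pieces back into words and keeps the tag of the first piece, consistent with how labels were aligned for training. `merge_wordpieces` is a hypothetical helper.

```python
SPECIAL_TOKENS = ("[CLS]", "[SEP]", "[PAD]")


def merge_wordpieces(tokens, tags):
    """Merge '##' continuation pieces into words, keeping the first piece's tag."""
    words = []
    for token, tag in zip(tokens, tags):
        if token in SPECIAL_TOKENS:
            continue  # special tokens are not part of the sentence
        if token.startswith("##") and words:
            word, first_tag = words[-1]
            words[-1] = (word + token[2:], first_tag)
        else:
            words.append((token, tag))
    return words


# Tokens and tags copied from the fifth sentence in the output above
tokens = ["[CLS]", "ولد", "العالم", "اسحاق", "نيوت", "##ن", "في", "انجلترا", ".", "[SEP]"]
tags = ["O", "O", "B-TTL", "B-PER", "L-PER", "L-PER", "O", "B-GPE", "O", "O"]
merged = merge_wordpieces(tokens, tags)
for word, tag in merged:
    print(f"{word}: {tag}")
```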

In [39]:
import os

# Step 1: Save the model and tokenizer
save_directory = "/kaggle/working/FinalModel"
os.makedirs(save_directory, exist_ok=True)
model.save_pretrained(save_directory)
tokenizer.save_pretrained(save_directory)
print(f"✅ Model and tokenizer have been saved to {save_directory}")

# Step 2: Zip the directory
!zip -r /kaggle/working/FinalModel.zip /kaggle/working/FinalModel
print("✅ Model has been zipped successfully!")

✅ Model and tokenizer have been saved to /kaggle/working/FinalModel
  adding: kaggle/working/FinalModel/ (stored 0%)
  adding: kaggle/working/FinalModel/model.safetensors (deflated 7%)
  adding: kaggle/working/FinalModel/special_tokens_map.json (deflated 42%)
  adding: kaggle/working/FinalModel/config.json (deflated 63%)
  adding: kaggle/working/FinalModel/vocab.txt (deflated 63%)
  adding: kaggle/working/FinalModel/tokenizer.json (deflated 73%)
  adding: kaggle/working/FinalModel/tokenizer_config.json (deflated 74%)
✅ Model has been zipped successfully!


In [40]:
from IPython.display import FileLink

# Direct download link for the file
FileLink('/kaggle/working/FinalModel.zip')