# TAPAS

TAPAS (short for Table Parser) is a BERT-based transformer model designed for question answering over tables. It was introduced by Google Research in the paper "TaPas: Weakly Supervised Table Parsing via Pre-training".

TAPAS extends the BERT architecture with additional embeddings that encode table structure, such as the row and column each token belongs to. Given a table and a natural-language question, it answers by selecting table cells and, optionally, predicting an aggregation operator (such as SUM, AVERAGE, or COUNT) to apply to them.

Unlike distilled models such as DistilBERT, TAPAS is not a compressed alternative to BERT. It reuses BERT's encoder, so the base model is comparable to BERT-base in parameter count and inference cost.

The checkpoint used in this notebook, google/tapas-base-finetuned-wtq, is the base-sized TAPAS model fine-tuned on the WikiTableQuestions (WTQ) dataset.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import os

In [None]:
import tensorflow as tf
from tensorflow import keras

In [None]:
!pip install transformers
from transformers import pipeline



In [10]:
answerer = pipeline("table-question-answering", model='google/tapas-base-finetuned-wtq')


In [14]:
data = pd.read_csv('/content/plane_crash_info.csv')

In [15]:
data.head()

Unnamed: 0.1,Unnamed: 0,Fatal,Date,Location,Carrier,Flight,Type
0,1,2907*,09/11/2001,"New York City, New York",American /United Airlines,11/93,B767 / B767
1,2,583,03/27/1977,"Tenerife, Canary Islands",Pan Am / KLM,1736/4805,B747 / B747
2,3,520,08/12/1985,"Mt. Osutaka, Japan",Japan Air Lines,123,B747
3,4,349,11/12/1996,"New Delhi, India",Saudi / Kazastan,763/1907,B747 / Il76
4,5,346,03/03/1974,"Bois d' Ermenonville, France",Turkish Airlines,981,DC10


In [19]:
# The TAPAS tokenizer expects every table cell to be a string
data = data.astype(str)

In [21]:
query = input("Enter your question from the data...")

Enter your question from the data...What is highest fatalities?


In [22]:
print(answerer(table=data, query=query)["answer"])

AVERAGE > 2907*
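
The `AVERAGE >` prefix in the output is the aggregation operator the model predicted, followed by the cell(s) it selected. As far as I can tell, the pipeline only reports the operator and the cells and does not compute the aggregate itself. The sketch below shows one way to apply the operator afterwards. It assumes the pipeline result also exposes `"aggregator"` and `"cells"` keys alongside `"answer"`. The `result` dictionary is a hand-written stand-in for the output shown above, and `to_number` and `apply_aggregator` are hypothetical helpers, not part of transformers.

```python
import re


def to_number(cell):
    """Parse a cell such as '2907*' by dropping every non-numeric character."""
    return float(re.sub(r"[^0-9.]", "", cell))


def apply_aggregator(aggregator, cells):
    """Apply a WTQ-style aggregation operator to the selected cell strings."""
    if aggregator == "NONE":
        return cells  # no aggregation: the cells themselves are the answer
    if aggregator == "COUNT":
        return len(cells)
    values = [to_number(c) for c in cells]
    if aggregator == "SUM":
        return sum(values)
    if aggregator == "AVERAGE":
        return sum(values) / len(values)
    raise ValueError(f"Unknown aggregator: {aggregator}")


# Hand-written stand-in for the pipeline result above (assumed structure)
result = {"answer": "AVERAGE > 2907*", "cells": ["2907*"], "aggregator": "AVERAGE"}

value = apply_aggregator(result["aggregator"], result["cells"])
print(value)  # 2907.0
```

Averaging a single cell simply returns that cell's value, which is why the answer reads as the maximum here even though the predicted operator is AVERAGE.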


In the dataset, '2907*' is not a valid integer, but our answerer still picked it out correctly.
Let's see what happens when we remove the first row.

In [40]:
n_data = data.drop(data.index[0]).reset_index(drop=True)
n_data.head()

Unnamed: 0.1,Unnamed: 0,Fatal,Date,Location,Carrier,Flight,Type
0,2,583,03/27/1977,"Tenerife, Canary Islands",Pan Am / KLM,1736/4805,B747 / B747
1,3,520,08/12/1985,"Mt. Osutaka, Japan",Japan Air Lines,123,B747
2,4,349,11/12/1996,"New Delhi, India",Saudi / Kazastan,763/1907,B747 / Il76
3,5,346,03/03/1974,"Bois d' Ermenonville, France",Turkish Airlines,981,DC10
4,6,329,06/23/1985,Atlantic Ocean West of Ireland,Air India,182,B747


In [39]:
print(answerer(table=n_data, query=query)["answer"])

AVERAGE > 583


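This time every remaining value in the Fatal column is a plain number, and the model selects 583. Let us check this against the column maximum. Note that the column was cast to strings earlier, so max() compares the values as text rather than as numbers.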

In [41]:
n_data['Fatal'].max()

'583'
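
Because the Fatal column holds strings, the check above is a lexicographic comparison. It agrees with the numeric maximum only when the text ordering happens to match the numeric one. A minimal sketch of a numeric check follows, using the five Fatal values shown in the first `data.head()` output. The regex-based cleaning step is an assumption about how to treat the trailing `*`.

```python
import pandas as pd

# The five 'Fatal' values from the first data.head() output, stored as strings
fatal = pd.Series(["2907*", "583", "520", "349", "346"], name="Fatal")

# String comparison: '5...' sorts after '2...', so '2907*' is not the maximum
string_max = fatal.max()

# Numeric comparison: strip non-digit characters, then convert to integers
numeric = pd.to_numeric(fatal.str.replace(r"[^0-9]", "", regex=True))
numeric_max = int(numeric.max())

print(string_max)   # 583
print(numeric_max)  # 2907
```

On the original table, which still contains '2907*', a plain string `max()` would therefore miss the row that the model identified.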