<a href="https://colab.research.google.com/github/zap-frs/myrepo/blob/main/ask_law_cdqa.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Notebook [1]: First steps with cdQA

This notebook shows how to use the `cdQA` pipeline to perform question answering on a custom dataset of Philippine Supreme Court decisions.

***Note:*** *If you are using Colab, you will need to install `cdQA` by executing `!pip install cdqa` in a cell.*

In [2]:
!pip install cdqa

Collecting cdqa
  Downloading https://files.pythonhosted.org/packages/39/f5/af831b7ee653aa6bace99e39ec6b2754b1adb10bb60a1296f5e16f1f24ee/cdqa-1.3.9.tar.gz (45kB)
Collecting Flask==1.1.1
  Downloading https://files.pythonhosted.org/packages/9b/93/628509b8d5dc749656a9641f4caf13540e2cdec85276964ff8f43bbb1d3b/Flask-1.1.1-py2.py3-none-any.whl (94kB)
Collecting flask_cors==3.0.8
  Downloading https://files.pythonhosted.org/packages/78/38/e68b11daa5d613e3a91e4bf3da76c94ac9ee0d9cd515af9c1ab80d36f709/Flask_Cors-3.0.8-py2.py3-none-any.whl
Collecting joblib==0.13.2
  Downloading https://files.pythonhosted.org/packages/cd/c1/50a758e8247561e58cb87305b1e90b171b8c767b15b12a1734001f41d356/joblib-0.13.2-py2.py3-none-any.whl (278kB)
Collecting pandas==0.25.0
  Downloading https://files.pytho

In [1]:
import os
import pandas as pd
from ast import literal_eval

from cdqa.utils.filters import filter_paragraphs
from cdqa.pipeline import QAPipeline



### Download pre-trained reader model and example dataset


In [2]:
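from cdqa.utils.download import download_model, download_bnpp_data

# `dir` is the local directory to save into, not a URL. The BNP example data is not
# used below: the notebook loads its own dataset from a remote CSV instead.
download_bnpp_data(dir='./data/bnpp_newsroom_v1.1/')
# Saves the pre-trained reader that is loaded later from ./models/bert_qa.joblib
download_model(model='bert-squad_1.1', dir='./models')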


Downloading BNP data...

Downloading trained model...


### Visualize the dataset

In [53]:
df = pd.read_csv('http://constructiontrack1ng.com/data/ffask43-Copy2.csv', converters={'paragraphs': literal_eval})
df = filter_paragraphs(df)
df.head()

| Unnamed: 0 | date | title | category | link | abstract | paragraphs |
|---|---|---|---|---|---|---|
| 0 | 11.01.2016 | G.R. No. 167333 PEDRO LADINES Petitioner v. PE... | Press release | https://www.chanrobles.com/cralaw/2016januaryd... | | [We point out that the concept of newly-discov... |
| 1 | 11.01.2016 | G.R. No. 203882 LORELEI O. ILADAN Petitioner v... | Press release | https://www.chanrobles.com/cralaw/2016januaryd... | | [By this Petition for Review on Certiorari,1 p... |
| 2 | 11.01.2016 | G.R. No. 198450 PEOPLE OF THE PHILIPPINES Plai... | Press release | https://www.chanrobles.com/cralaw/2016januaryd... | | [Their observance is the key to the successful... |
| 3 | 11.01.2016 | G.R. No. 197825 CAMILO SIBAL Petitioner v. PED... | Press release | https://www.chanrobles.com/cralaw/2016januaryd... | | [A petition for annulment of judgment is a rem... |
| 4 | 27.01.2016 | G.R. No. 180993 REPUBLIC OF THE PHILIPPINES R... | Press release | https://www.chanrobles.com/cralaw/2016januaryd... | | [The power of the OSG to deputize legal office... |
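The `paragraphs` column of the CSV appears to store each document's paragraphs as the string form of a Python list, which is why the notebook passes `literal_eval` as a converter: without it, pandas would load each cell as one plain string. The sketch below illustrates this on a small inline CSV; the titles and paragraphs are invented, and the CSV layout is an assumption inferred from the output above.

```python
from ast import literal_eval
from io import StringIO

import pandas as pd

# Toy CSV (invented data; layout assumed) whose `paragraphs` cells hold the string form of a list.
csv_text = """title,paragraphs
Doc A,"['First paragraph.', 'Second paragraph.']"
Doc B,"['Only paragraph.']"
"""

# Without a converter, each `paragraphs` cell is loaded as one plain string.
raw = pd.read_csv(StringIO(csv_text))
print(type(raw.loc[0, 'paragraphs']))

# With literal_eval as a converter, each cell is parsed back into a Python list.
parsed = pd.read_csv(StringIO(csv_text), converters={'paragraphs': literal_eval})
print(type(parsed.loc[0, 'paragraphs']))
print(parsed.loc[0, 'paragraphs'][1])
```

`literal_eval` only evaluates Python literals, so it is a safer choice than `eval` for parsing untrusted CSV cells.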


In [52]:
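from cdqa.utils.converters import df2squad

# Export the documents in ./sample_data as a SQuAD v1.1-style JSON file whose
# question lists are empty, ready for questions and answers to be annotated.
json_data = df2squad(df=df, squad_version='v1.1', output_dir='./sample_data', filename='dataset_anong')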

7it [00:00, 1599.75it/s]
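`df2squad` writes the documents in the SQuAD v1.1 JSON layout. The sketch below builds that layout by hand with a hypothetical `to_squad_v11` helper (an illustration, not cdQA's implementation), then adds one hand-written question to show where annotated questions and their `answer_start` character offsets go. The document and question are invented.

```python
import json

import pandas as pd


def to_squad_v11(docs: pd.DataFrame, version: str = 'v1.1') -> dict:
    """Hypothetical helper: wrap each document's paragraphs in a SQuAD v1.1-style
    structure whose `qas` lists start empty, waiting for annotation."""
    data = []
    for _, row in docs.iterrows():
        data.append({
            'title': row['title'],
            'paragraphs': [{'context': p, 'qas': []} for p in row['paragraphs']],
        })
    return {'version': version, 'data': data}


# One invented document with two paragraphs, in the same title/paragraphs shape as df.
toy_docs = pd.DataFrame([{
    'title': 'Doc A',
    'paragraphs': ['Certiorari is an extraordinary remedy.',
                   'It corrects errors of jurisdiction.'],
}])
squad = to_squad_v11(toy_docs)

# An annotated question lives in a paragraph's `qas` list; `answer_start` is the
# character offset at which the answer text begins inside `context`.
context = squad['data'][0]['paragraphs'][0]['context']
answer = 'an extraordinary remedy'
squad['data'][0]['paragraphs'][0]['qas'].append({
    'id': 'q1',  # hypothetical question id
    'question': 'What is certiorari?',
    'answers': [{'text': answer, 'answer_start': context.find(answer)}],
})

print(json.dumps(squad, indent=2))
```

The `version` string is copied from the `squad_version` argument, so passing `'v1.1'` produces a file that declares version `"v1.1"`.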


### Instantiate the cdQA pipeline from a pre-trained reader model

In [49]:
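# Load the pre-trained reader saved by download_model, then fit the retriever on the documents in df
cdqa_pipeline = QAPipeline(reader='./models/bert_qa.joblib')
cdqa_pipeline.fit_retriever(df=df)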

100%|██████████| 231508/231508 [00:00<00:00, 22933207.30B/s]


QAPipeline(reader=BertQA(adam_epsilon=1e-08, bert_model='bert-base-uncased',
                         do_lower_case=True, fp16=False,
                         gradient_accumulation_steps=1, learning_rate=5e-05,
                         local_rank=-1, loss_scale=0, max_answer_length=30,
                         n_best_size=20, no_cuda=False,
                         null_score_diff_threshold=0.0, num_train_epochs=3.0,
                         output_dir=None, predict_batch_size=8, seed=42,
                         server_ip='', server_po...size=8,
                         verbose_logging=False, version_2_with_negative=False,
                         warmup_proportion=0.1, warmup_steps=0),
           retrieve_by_doc=False,
           retriever=BM25Retriever(b=0.75, floor=None, k1=2.0, lowercase=True,
                                   max_df=0.85, min_df=2, ngram_range=(1, 2),
                                   preprocessor=None, stop_words='english',
                                   t
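The printed configuration shows that the retriever is a `BM25Retriever` with `k1=2.0` and `b=0.75`. As a rough illustration of how such a retriever ranks text, here is a textbook Okapi BM25 scorer that uses those two values. It is a sketch, not cdQA's code: it tokenizes on whitespace and skips the lowercasing, English stop words, bigrams and `min_df`/`max_df` pruning suggested by the other parameters, and the IDF formula (the non-negative variant used by Lucene) is an assumption. The paragraphs and query are invented.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs_tokens, k1=2.0, b=0.75):
    """Okapi BM25 score of each tokenized document for a tokenized query.
    k1=2.0 and b=0.75 mirror the BM25Retriever parameters printed above."""
    n_docs = len(docs_tokens)
    avgdl = sum(len(d) for d in docs_tokens) / n_docs
    # Document frequency: the number of documents containing each term.
    doc_freq = Counter(term for d in docs_tokens for term in set(d))
    scores = []
    for d in docs_tokens:
        tf = Counter(d)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            # Non-negative IDF variant (assumed here; implementations differ).
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term-frequency saturation (k1) with document-length normalization (b).
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy paragraphs and query (invented), tokenized by whitespace.
toy_paragraphs = [
    'certiorari is available when a tribunal acts without jurisdiction'.split(),
    'annulment of judgment is a remedy in equity'.split(),
    'the power of the osg to deputize legal officers'.split(),
]
toy_query = 'when is certiorari available'.split()

scores = bm25_scores(toy_query, toy_paragraphs)
best = max(range(len(scores)), key=scores.__getitem__)
print(best, [round(s, 3) for s in scores])
```

Rare terms such as `certiorari` receive a high IDF, while a common word like `is` contributes little, so the first paragraph ranks highest and the third, which shares no query term, scores zero.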

### Evaluate the pipeline and execute a query

In [58]:
from cdqa.utils.evaluation import evaluate_pipeline

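# Assumes cdqa-v1.1.json is a SQuAD v1.1 file annotated with questions and answers for
# these documents; each question is run through the full pipeline (retriever + reader).
evaluate_pipeline(cdqa_pipeline, '/content/sample_data/cdqa-v1.1.json')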

Evaluation results: {'exact_match': 25.0, 'f1': 47.69512159218041}


{'exact_match': 25.0, 'f1': 47.69512159218041}

In [59]:
from cdqa.utils.evaluation import evaluate_reader

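# As its name suggests, this scores the reader alone on the annotated paragraphs, without
# retrieval. The version warning below is only a warning: the file declares version
# "v1.1" while the SQuAD evaluation script expects "1.1", and evaluation still runs.
evaluate_reader(cdqa_pipeline, '/content/sample_data/cdqa-v1.1.json')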

Evaluation expects v-1.1, but got dataset with v-v1.1


{'exact_match': 37.5, 'f1': 67.34796408985049}
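Both evaluation helpers report `exact_match` and `f1`, the SQuAD v1.1 metrics. The sketch below re-implements them for illustration and is not cdQA's code. Answers are normalized (lowercased, punctuation and the articles a/an/the removed, whitespace collapsed), exact match compares the normalized strings, and F1 measures token overlap. For each question the best score over its reference answers counts, and the results are averaged and expressed as percentages. The predictions and references are invented.

```python
import re
import string
from collections import Counter


def normalize_answer(s):
    """Lowercase, drop punctuation and the articles a/an/the, collapse whitespace."""
    s = s.lower()
    s = ''.join(ch for ch in s if ch not in string.punctuation)
    s = re.sub(r'\b(a|an|the)\b', ' ', s)
    return ' '.join(s.split())


def exact_match(prediction, truth):
    return float(normalize_answer(prediction) == normalize_answer(truth))


def f1_score(prediction, truth):
    pred_tokens = normalize_answer(prediction).split()
    truth_tokens = normalize_answer(truth).split()
    # Multiset intersection counts each shared token at most as often as it appears in both.
    num_same = sum((Counter(pred_tokens) & Counter(truth_tokens)).values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)


def squad_scores(predictions, references):
    """Average over questions of the best score against any reference answer, in percent."""
    n = len(predictions)
    em = sum(max(exact_match(p, r) for r in refs) for p, refs in zip(predictions, references))
    f1 = sum(max(f1_score(p, r) for r in refs) for p, refs in zip(predictions, references))
    return {'exact_match': 100.0 * em / n, 'f1': 100.0 * f1 / n}


# Invented predictions and reference answers for two questions.
toy_predictions = ['The writ of certiorari.', 'quasi judicial powers']
toy_references = [['writ of certiorari'], ['quasi-judicial powers']]
print(squad_scores(toy_predictions, toy_references))
```

Removing punctuation merges `quasi-judicial` into the single token `quasijudicial`, so the second prediction gets partial F1 credit but no exact match. This is why F1 is usually higher than exact match, as in both results above.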

In [56]:
query = ' when is certiorari available?'
prediction = cdqa_pipeline.predict(query)

### Explore predictions

In [57]:
print('query: {}'.format(query))
print('answer: {}'.format(prediction[0]))
print('title: {}'.format(prediction[1]))
print('paragraph: {}'.format(prediction[2]))

query:  when is certiorari available?
answer: when a court or other tribunal exercising quasi-judicial powers acts without or in excess of its jurisdiction
title: G.R. No. 218536 ROLANDO P. TOLENTINO Petitioner v. COMMISSION ON ELECTIONS (FIRST DIVISION)
paragraph: Certiorari is available when a court or other tribunal exercising quasi-judicial powers acts without or in excess of its jurisdiction or with grave abuse of discretion amounting to lack of jurisdiction. It is an extraordinary remedy of last resort designed to correct errors of jurisdiction.There is grave abuse of discretion justifying the issuance of the writ of certiorari when there is such capricious and whimsical exercise of judgment as is equivalent to lack of jurisdiction
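The answer above is taken word for word from the first sentence of the returned paragraph: the reader selects a span of text rather than generating new words. The printed `BertQA` settings include `n_best_size=20` and `max_answer_length=30`, which in BERT-style question answering bound the search for that span. Below is a simplified sketch of that search over invented start and end logits. Real BERT readers work on WordPiece tokens, skip question tokens and map the span back to the original text, and none of that is shown here.

```python
import numpy as np


def best_span(start_logits, end_logits, n_best_size=20, max_answer_length=30):
    """Pick the (start, end) token pair with the highest start+end logit among the
    top-n candidates, requiring start <= end and at most max_answer_length tokens."""
    top_starts = np.argsort(start_logits)[::-1][:n_best_size]
    top_ends = np.argsort(end_logits)[::-1][:n_best_size]
    best, best_score = None, -np.inf
    for s in top_starts:
        for e in top_ends:
            if e < s or e - s + 1 > max_answer_length:
                continue  # invalid or over-long span
            score = start_logits[s] + end_logits[e]
            if score > best_score:
                best, best_score = (int(s), int(e)), score
    return best, best_score


# Invented tokens and logits standing in for the reader's output on one paragraph.
tokens = 'certiorari is available when a court acts without jurisdiction'.split()
start_logits = np.array([0.5, 0.1, 0.2, 4.0, 0.3, 0.1, 0.0, 0.1, 0.0])
end_logits = np.array([0.0, 5.0, 0.1, 0.0, 0.2, 0.3, 0.1, 0.4, 3.0])

(span_start, span_end), span_score = best_span(start_logits, end_logits)
print(' '.join(tokens[span_start:span_end + 1]), span_score)
```

The highest end logit sits before the best start, so that pairing is rejected as invalid. Lowering `max_answer_length` to 3 would also rule out the six-token span and change the answer.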
