## Efficient Q&A and search system for Parliamentary Questions

### Problem Statement

*During a parliament session, each department receives a number of Parliament Questions (PQs) on varied topics raised by MPs, and these must be handled as a top priority within tight deadlines. Each reply is generally prepared by seeking inputs from all other relevant departments, which takes a lot of effort and time. A platform is therefore desired that can retrieve responses to similar PQs asked earlier, suggest a probable reply, and indicate the different departments that have similar programs and information. This would help in preparing a proper reply to a PQ. Some search tools exist today, but they are separate for the Lok Sabha and the Rajya Sabha; a unified, fast, and effective mechanism is missing.*

In [61]:
```python
# Importing libraries
import sys
import spacy
import pandas as pd
import math
import re
import numpy as np
from collections import Counter
import gzip
import gensim
import logging
```

In [9]:
```python
# Fetching the data
data = pd.read_csv("dataset/rajyasabha_questions_and_answers_2009.csv")
data.head()
```

| | question | answer_date | ministry | question_type | question_no | question_by | question_title | question_description | answer | Unnamed: 9 | Unnamed: 10 | Unnamed: 11 | Unnamed: 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 150000 | 16.12.2009 | COMMERCE AND INDUSTRY | STARRED | 396 | SHRI ISHWAR SINGH | SPURT IN PRICES OF GOLD . | (a) whether Government is aware that there is ... | MINISTER OF COMMERCE AND INDUSTRY (SHRI ANAND ... | | | | |
| 1 | 150001 | 16.12.2009 | COMMERCE AND INDUSTRY | UNSTARRED | 2927 | Dr. T. Subbarami Reddy | OPENING OF TRADE CENTRES IN LATIN AMERICAN COU... | (a) whether in order to capitalize on the lull... | MINISTER OF COMMERCE AND INDUSTRY (SHRI ANAND ... | | | | |
| 2 | 150002 | 16.12.2009 | COMMERCE AND INDUSTRY | UNSTARRED | 2929 | SHRI SHREEGOPAL VYAS | EARLY EXIT OF CHINESE BUSINESSMEN FROM TRADE F... | (a) whether the Chinese businessmen packed of ... | MINISTER OF COMMERCE AND INDUSTRY ( SHRI ANAND... | | | | |
| 3 | 150003 | 16.12.2009 | COMMERCE AND INDUSTRY | UNSTARRED | 2931 | Smt. Kusum Rai | DONATION BY STC AND MMTC TO STUDENT WINGS OF P... | to answer to Starred Question 97 given in the ... | MINISTER OF COMMERCE AND INDUSTRY ( SHRI ANAND... | | | | |
| 4 | 150004 | 16.12.2009 | COMMERCE AND INDUSTRY | UNSTARRED | 2932 | SHRI GIREESH KUMAR SANGHI | ENVISAGED EXPORT EARNING TARGETS . | (a) the envisaged export earning targets for t... | MINISTER OF COMMERCE AND INDUSTRY ( SHRI ANAND... | | | | |
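
The preview shows four empty trailing columns (`Unnamed: 9` to `Unnamed: 12`). Missing text values also matter later: an expression such as `questions[i] + '\n'` raises a `TypeError` if a description is NaN. A minimal cleaning sketch, assuming the `Unnamed` columns carry no data and that rows without a title, description, or answer can be skipped by the search (both are assumptions); `clean_pq_frame` is a hypothetical helper, not part of the notebook:

```python
import numpy as np
import pandas as pd

def clean_pq_frame(df):
    # Keep only columns whose names do not start with "Unnamed"
    # (here, the empty trailing columns Unnamed: 9 to Unnamed: 12)
    df = df.loc[:, ~df.columns.str.startswith("Unnamed")]
    # Drop rows missing any of the text fields the search relies on
    df = df.dropna(subset=["question_title", "question_description", "answer"])
    # Renumber rows 0..n-1 so list positions from .tolist() match row labels
    return df.reset_index(drop=True)

# Toy frame with made-up values that mimics the dataset's layout
raw = pd.DataFrame({
    "question_title": ["SPURT IN PRICES OF GOLD .", None, "SELLING STAKES OF PSES ."],
    "question_description": ["(a) whether ...", "(a) whether ...", "(a) whether ..."],
    "answer": ["MINISTER OF ...", "MINISTER OF ...", None],
    "Unnamed: 9": [np.nan, np.nan, np.nan],
})
print(clean_pq_frame(raw))
```

Resetting the index keeps the positional indices returned by the title search aligned with the DataFrame rows after filtering.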


In [27]:
```python
questions = data['question_description']

print('Sample questions from the dataset\n')
for i in range(10):
    print(f'Question No {i + 1}\n')
    print(questions[i] + '\n')

print(questions)
```

Sample questions from the dataset

Question No 1

(a) whether Government is aware that there is spurt in prices of gold in the country in the last few months; (b) if so, the reasons therefor; (c) whether the import policy of gold needs further changes in view of spurt in prices of gold; and (d) if so, the steps Government proposes to take to bring down the prices of gold in the country?

Question No 2

(a) whether in order to capitalize on the lull in trade between the US and Latin American countries, Government proposed to open up four to five trade centres in a few Latin American countries on a permanent basis, to create awareness among investors and business people alike on Indian products; (b) whether India and Latin American countries have potential to treble their bilateral trade from 2007-08 level of $ 12 billion to $36 billion by 2012-2013; and (c) whether Government is also planning to set up help desks across India like Bangalore and Hyderabad to provide valuable information 

In [21]:
```python
answer = data['answer']
for i in range(10):
    print(f'Answer No {i + 1}\n')
    print(answer[i] + '\n')
```

Answer No 1

MINISTER OF COMMERCE AND INDUSTRY (SHRI ANAND SHARMA) a) to d): A Statement is laid on the Table of the House. STATEMENT REFERRED TO IN REPLY TO PARTS (a) TO (d) OF RAJYA SABHA STARRED QUESTION NO. 396 FOR ANSWER ON 16TH DECEMBER, 2009 REGARDING “SPURT IN PRICES OF GOLD” (a) Yes, Sir. (b) Increase in prices of gold in the international markets, seasonal demand by major consumers and investment buying are the major factors known to affect the prices of gold. (c) & (d): The gold prices are broadly driven by the international gold prices. Government has minimal control over them.

Answer No 2

MINISTER OF COMMERCE AND INDUSTRY (SHRI ANAND SHARMA) (a) to (c) At present, thirteen Indian Missions are functioning in the Latin America region. Ten posts of Marketing Assistants have been provided in nine Indian missions of the said region to exclusively look after the trade related matters and to respond queries of exporters and importers interested to unde

## Creating a cosine similarity calculator

This computes the similarity between two sentences. Since we plan to take (a) a title and (b) a question as input from the user, we first use the title to narrow the dataset down to the relevant rows, and then search through only those rows.

![title](img/step1.png)
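
The two-stage idea above (filter by title, then search within the filtered rows using the question text) can be sketched as follows. This is a self-contained illustration under stated assumptions, not the notebook's code: `vectorize`, `cosine`, and `two_stage_search` are hypothetical names, records are assumed to be `(title, question_text)` pairs, the cut-offs `k_titles` and `k_final` are arbitrary, and the toy records are made up. Unlike the notebook's `text_to_vector`, this sketch lowercases text before counting words.

```python
import math
import re
from collections import Counter

WORD = re.compile(r"\w+")

def vectorize(text):
    # Lowercased bag-of-words counts (an assumption; the notebook keeps case)
    return Counter(WORD.findall(text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors
    numerator = sum(a[t] * b[t] for t in a.keys() & b.keys())
    denominator = (math.sqrt(sum(v * v for v in a.values()))
                   * math.sqrt(sum(v * v for v in b.values())))
    return numerator / denominator if denominator else 0.0

def two_stage_search(title, question, records, k_titles=10, k_final=3):
    """Return indices into `records`, a list of (title, question_text) pairs."""
    title_vec = vectorize(title)
    # Stage 1: keep the k_titles records whose titles best match the user's title
    candidates = sorted(range(len(records)),
                        key=lambda i: cosine(title_vec, vectorize(records[i][0])),
                        reverse=True)[:k_titles]
    # Stage 2: re-rank only those candidates by similarity of the question text
    question_vec = vectorize(question)
    return sorted(candidates,
                  key=lambda i: cosine(question_vec, vectorize(records[i][1])),
                  reverse=True)[:k_final]

# Made-up toy records for illustration
records = [
    ("SPURT IN PRICES OF GOLD .", "whether prices of gold have risen"),
    ("SELLING STAKES OF PSES .", "whether Government will sell stakes in PSEs"),
    ("DISINVESTMENT OF NALCO", "whether Government will sell its stake in NALCO"),
    ("FAKE CURRENCY RACKETS AT KOLKATA", "whether fake currency rackets were busted"),
]
print(two_stage_search("Selling stakes of PSEs", "sell stakes in PSEs",
                       records, k_titles=2, k_final=1))  # prints [1]
```

The title pass is cheap because titles are short; the more expensive comparison on full question text then runs only on the shortlisted rows.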

In [24]:
```python
# Function to find the cosine similarity between two strings that have been converted into vectors
WORD = re.compile(r"\w+")

def get_cosine(vec1, vec2):
    # Dot product over the words the two vectors share
    intersection = set(vec1.keys()) & set(vec2.keys())
    numerator = sum(vec1[x] * vec2[x] for x in intersection)

    # Product of the two Euclidean norms
    sum1 = sum(v ** 2 for v in vec1.values())
    sum2 = sum(v ** 2 for v in vec2.values())
    denominator = math.sqrt(sum1) * math.sqrt(sum2)

    if not denominator:
        return 0.0
    return float(numerator) / denominator

# Word vectorization step: bag-of-words counts (this has to be improved)
def text_to_vector(text):
    words = WORD.findall(text)
    return Counter(words)
```
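
As written, `text_to_vector` is case-sensitive: `WORD.findall` keeps the original casing, so "Selling" in a query and "SELLING" in an upper-case title count as different words. One possible improvement, sketched here as a hypothetical `text_to_vector_normalized` (the name and the choice to lowercase are assumptions, not the notebook's code), is to lowercase the text before tokenizing:

```python
import re
from collections import Counter

WORD = re.compile(r"\w+")

def text_to_vector_normalized(text):
    # Lowercase first so that "Selling" and "SELLING" produce the same token
    return Counter(WORD.findall(text.lower()))

print(text_to_vector_normalized("Selling stakes of PSEs")
      == text_to_vector_normalized("SELLING STAKES OF PSES ."))  # prints True
```

Stop-word removal or TF-IDF weighting are other common refinements, since frequent words such as "of" currently weigh as much as topical words.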

In [25]:
```python
# Test case
text1 = "This is a foo bar sentence ."
text2 = "This sentence is similar to a foo bar sentence ."

vector1 = text_to_vector(text1)
vector2 = text_to_vector(text2)

cosine = get_cosine(vector1, vector2)

print("Cosine:", cosine)
```

Cosine: 0.8616404368553293
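
`get_cosine` implements the standard cosine similarity between two word-count vectors $a$ and $b$, summing over words $w$:

$$\cos(a, b) = \frac{\sum_{w} a_w b_w}{\sqrt{\sum_{w} a_w^2}\,\sqrt{\sum_{w} b_w^2}}$$

For the test case, `WORD` does not match the trailing ".", so `text1` gives a count of 1 to each of *This, is, a, foo, bar, sentence*, while `text2` gives *sentence* a count of 2 and each of *This, is, similar, to, a, foo, bar* a count of 1. The shared words contribute $5 \cdot 1 + 1 \cdot 2 = 7$ to the numerator, and the squared norms are $6$ and $2^2 + 7 = 11$, so

$$\cos(a, b) = \frac{7}{\sqrt{6}\,\sqrt{11}} = \frac{7}{\sqrt{66}} \approx 0.8616,$$

which matches the printed value.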


In [36]:
```python
# Function to get the indices of the 10 rows whose titles are most similar to the query
def get_relevant_indices(query, title_list):
    query_vector = text_to_vector(query)
    cosine_list = np.array([get_cosine(query_vector, text_to_vector(title))
                            for title in title_list])

    # argsort is ascending, so take the last 10 and reverse for highest-first
    relevant_indices = cosine_list.argsort()[-10:][::-1]
    return relevant_indices
```
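
`argsort()[-10:][::-1]` sorts the scores in ascending order, takes the last 10 positions, and reverses them so that the best match comes first. When many titles tie (for example, several scoring 0), NumPy's default sort is not stable, so the order among tied rows is not guaranteed. A small sketch of a parameterized variant, assuming ties should be broken by lower row index (a design choice, not something the notebook specifies); `top_k_indices` is a hypothetical name:

```python
import numpy as np

def top_k_indices(scores, k=10):
    # Negate so that an ascending stable sort yields the highest scores first,
    # with tied scores kept in their original (lowest-index-first) order
    scores = np.asarray(scores, dtype=float)
    return np.argsort(-scores, kind="stable")[:k]

print(top_k_indices([0.2, 0.9, 0.0, 0.5], k=2))  # prints [1 3]
```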

In [49]:
```python
# Function to get the relevant titles along with their row indices
def get_relevant_titles(query):
    titles = data['question_title'].values.tolist()
    relevant_indices = get_relevant_indices(query, titles)
    relevant_titles = [titles[i] for i in relevant_indices]

    relevant_titles_df = pd.DataFrame({'Indices': relevant_indices,
                                       'Titles': relevant_titles})
    return relevant_titles_df
```
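
`get_relevant_titles` returns only indices and titles, which hides how close each match actually is. A hypothetical variant that also reports the score (`relevant_titles_with_scores` is an illustrative name, not part of the notebook) makes ranking problems visible. It keeps the same case-sensitive tokenization as `text_to_vector`; with that tokenizer, the query "Selling stakes of PSEs" shares the word "of" with "Disinvestment of NALCO" but no word at all with the upper-case title "SELLING STAKES OF PSES .", so the latter scores 0:

```python
import math
import re
from collections import Counter

import pandas as pd

WORD = re.compile(r"\w+")

def _vector(text):
    # Same case-sensitive bag-of-words as the notebook's text_to_vector
    return Counter(WORD.findall(text))

def _cosine(a, b):
    numerator = sum(a[t] * b[t] for t in a.keys() & b.keys())
    denominator = (math.sqrt(sum(v * v for v in a.values()))
                   * math.sqrt(sum(v * v for v in b.values())))
    return numerator / denominator if denominator else 0.0

def relevant_titles_with_scores(query, titles, k=10):
    query_vec = _vector(query)
    scores = [_cosine(query_vec, _vector(t)) for t in titles]
    # Highest score first; Python's sort is stable, so ties keep list order
    order = sorted(range(len(titles)), key=lambda i: scores[i], reverse=True)[:k]
    return pd.DataFrame({"Indices": order,
                         "Titles": [titles[i] for i in order],
                         "Score": [scores[i] for i in order]})

print(relevant_titles_with_scores("Selling stakes of PSEs",
                                  ["Disinvestment of NALCO", "SELLING STAKES OF PSES ."]))
```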

In [57]:
```python
# Test case
relevant_titles = get_relevant_titles("Selling stakes of PSEs")
print(relevant_titles)

print('\n\n For better visualization')
print('\n\nRelevant Indices: \n')
print(relevant_titles['Indices'].values)

print('\nRelevant Titles:\n')
print(relevant_titles['Titles'].values)
```


   Indices                                             Titles
0      177                             Disinvestment of NALCO
1      270                    INCOME TAX SLABS FOR INDUSTRIES
2       86                      FAKE CURRENCY IN ATM OF BANKS
3       87                            EMI CALCULATION METHODS
4       88  SEPARATION OF LENDING BUSINESS FROM INVESTMENT...
5       89                      SCRAPPING OF INSURANCE AGENTS
6       90                   FAKE CURRENCY RACKETS AT KOLKATA
7       91                           SELLING STAKES OF PSES .
8       92                           DISCUSSION AT G SUMMIT .
9      134                            PATTERN OF RURAL CREDIT


 For better visualization


Relevant Indices: 

[177 270  86  87  88  89  90  91  92 134]

Relevant Titles:

['Disinvestment of NALCO' 'INCOME TAX SLABS FOR INDUSTRIES'
 'FAKE CURRENCY IN ATM OF BANKS' 'EMI CALCULATION METHODS'
 'SEPARATION OF LENDING BUSINESS FROM INVESTMENT BUSINESS BY BANKS .'
 'SCRAPPING OF INS