## Contents:
    00 Crawling top 10 documents from Google
    0. Pre-processing
        0.1 Splitting documents into sentences, removing unnecessary characters
        0.2 Extracting potential answer type from question
        0.3 Extracting query terms from question
        0.4 Extracting entities from all documents
        0.5 Building AIDA mention dictionary for all entities
    1. QUEST
        1.1 Creating SPO Tuples
        1.2 Building Graph
        1.3 Running GST Algorithm
          

## 00 Crawling top 10 documents from Google
    raw data from Google

In [56]:
import networkx as nx
import pickle

#### [Input Data] 
A question entered at the terminal. Example: "When was Steve Jobs born?"

#### [Output Data]
10 documents, one from each of the top 10 Google results.
#### One of the Output Files:

In [34]:
fp1=open('./docs/doc1.txt','r')
lines=fp1.readlines()
lines[:2]

['https://en.wikipedia.org/wiki/Steve_Jobs\n',
 '  CentralNotice  From Wikipedia, the free encyclopedia Jump to navigation Jump to search This article is about the person. For other uses, see  Steve Jobs (disambiguation) American entrepreneur and co-founder of Apple Inc. Jobs in 2010 Steven Paul Jobs February 24, 1955 October 5, 2011 Palo Alto, California Alta Mesa Memorial Park Pioneer of the  personal computer revolution  with  Co-creator of the  , and first  US$7 billion (September 2011) Co-founder, Chairman, and CEO of  Primary investor and Chairman of  Founder, Chairman, and CEO of  The Walt Disney Company 4, including  Steven Paul Jobs ; February 24, 1955 \xe2\x80\x93 October 5, 2011) was an American  , and  . He was the chairman, chief executive officer (CEO), and co-founder of  , the chairman and majority shareholder of  , a member of  The Walt Disney Company \'s board of directors following its acquisition of Pixar, and the founder, chairman, and CEO of  . Jobs is widely recog

In [35]:
fp1.close()

## 0. Pre-processing

### 0.1 Splitting documents into sentences, removing unnecessary characters

In [None]:
#Format the documents for future processing, split into sentences, remove unnecessary characters
# s11='question_top10.txt'
# get_formatted_docs(question,docs,s11)

#### [Input Data] 
1. question: "When was Steve Jobs born?"  
2. docs: paths to the documents saved from the top 10 links, e.g. ['./docs/doc1.txt', './docs/doc2.txt', ...]

#### [Output Data] 
question_top10.txt
#### Output File:

In [76]:
fp1=open('./question_top10.txt','r')
lines=fp1.readlines()
lines[:10]

['ques-q1\tWhen was Steve Jobs born\n',
 'doc-0\t./docs/doc1.txt\thttps://en.wikipedia.org/wiki/Steve_Jobs\n',
 '\t\tsent-0\t\tCentralNotice  From Wikipedia, the free encyclopedia Jump to navigation Jump to search This article is about the person.\n',
 '\t\tsent-1\t\tFor other uses, see  Steve Jobs  disambiguation  American entrepreneur and co-founder of Apple Inc.\n',
 '\t\tsent-2\t\tJobs in 2010 Steven Paul Jobs February 24, 1955 October 5, 2011 Palo Alto, California Alta Mesa Memorial Park Pioneer of the  personal computer revolution  with  Co-creator of the  , and first  US$7 billion  September 2011  Co-founder, Chairman, and CEO of  Primary investor and Chairman of  Founder, Chairman, and CEO of  The Walt Disney Company 4, including  Steven Paul Jobs ; February 24, 1955 \xe2\x80\x93 October 5, 2011  was an American  , and  .\n',
 "\t\tsent-3\t\tHe was the chairman, chief executive officer  CEO , and co-founder of  , the chairman and majority shareholder of  , a member of  The Walt

In [77]:
fp1.close()
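
The tab-separated layout shown above (one `ques-` line, then a `doc-` line per document followed by its `sent-` lines) can be read back into a nested structure. The following is a minimal sketch based only on the sample output; `parse_formatted_corpus` is a hypothetical helper and not part of the QUEST code.

```python
def parse_formatted_corpus(lines):
    """Parse lines in the question_top10.txt layout into a nested dict.

    Hypothetical helper: the layout is inferred from the sample output only.
    """
    corpus = {"question": None, "docs": []}
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue
        fields = line.split("\t")
        if fields[0].startswith("ques-"):
            # 'ques-q1<TAB>question text'
            corpus["question"] = (fields[0], fields[1])
        elif fields[0].startswith("doc-"):
            # 'doc-0<TAB>local path<TAB>source URL'
            corpus["docs"].append(
                {"id": fields[0], "path": fields[1], "url": fields[2], "sentences": []}
            )
        elif len(fields) >= 5 and fields[2].startswith("sent-") and corpus["docs"]:
            # '<TAB><TAB>sent-0<TAB><TAB>sentence text'
            text = "\t".join(fields[4:])
            corpus["docs"][-1]["sentences"].append((fields[2], text))
    return corpus


sample = [
    "ques-q1\tWhen was Steve Jobs born\n",
    "doc-0\t./docs/doc1.txt\thttps://en.wikipedia.org/wiki/Steve_Jobs\n",
    "\t\tsent-1\t\tFor other uses, see  Steve Jobs  disambiguation  American entrepreneur and co-founder of Apple Inc.\n",
]
corpus = parse_formatted_corpus(sample)
print(corpus["question"], len(corpus["docs"][0]["sentences"]))
```

Each sentence is attached to the most recent `doc-` line, so the order of lines in the file matters.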

### 0.2 Extracting potential answer type from question

In [None]:
# s1='question_type.txt'
# get_answer_type_main(question,s1,nlp)

#### [Input Data] 
question: "When was Steve Jobs born?"

#### [Output Data]
question_type.txt
#### Output File:

In [15]:
fp1=open('./question_type.txt','r')
lines=fp1.readlines()
lines

['date\n']

In [16]:
fp1.close()

### 0.3 Extracting query terms from question

In [None]:
# s1='question_subject_predicate.txt'
# get_sub_pred_ques_main(question,s1,nlp)

#### [Input Data]
question: "When was Steve Jobs born?"

#### [Output Data]
question_subject_predicate.txt
#### Output File:

In [17]:
fp1=open('./question_subject_predicate.txt','r')
lines=fp1.readlines()
lines

['was P\n', 'Steve Jobs NE\n', 'born P\n']

In [18]:
fp1.close()
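
Each line of `question_subject_predicate.txt` holds a query term followed by a tag. A small sketch of how these lines could be split into (term, tag) pairs is given below. It assumes that `P` marks a predicate and `NE` a named entity, which is inferred from the sample output; `parse_query_terms` is a hypothetical helper.

```python
def parse_query_terms(lines):
    """Split lines such as 'Steve Jobs NE' into (term, tag) pairs."""
    terms = []
    for raw in lines:
        line = raw.strip()
        # The tag is the last space-separated token; the term itself may contain spaces.
        parts = line.rsplit(" ", 1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        terms.append((parts[0], parts[1]))
    return terms


terms = parse_query_terms(['was P\n', 'Steve Jobs NE\n', 'born P\n'])
entities = [term for term, tag in terms if tag == "NE"]    # assumed: NE = named entity
predicates = [term for term, tag in terms if tag == "P"]   # assumed: P = predicate
print(entities, predicates)
```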

### 0.4 Extracting entities from all documents

In [None]:
# s2='subjects_from_all_docs.txt'
# get_all_subject_main(s11,s2,nlp)

#### [Input Data]
"question_top10.txt" (top 10 links with their sentences)

#### [Output Data]
subjects_from_all_docs.txt
#### Output File:

In [25]:
fp1=open('./subjects_from_all_docs.txt','r')
lines=fp1.readlines()
lines[:50]

['Yale School of Management\n',
 'sister Patty\n',
 '34 people\n',
 'Five years later\n',
 'A-Changin\n',
 'Webarchive template wayback links Webarchive template webcite links Wikipedia\n',
 'Amelio-Jobs cooperation\n',
 'demand\n',
 'Apple office\n',
 'June 2008\n',
 'molded plastic case\n',
 'original Apple computers\n',
 'Syrian migration\n',
 'mansion\n',
 'other treatments\n',
 'Schuster\n',
 'votes\n',
 'clasps\n',
 'latter idea\n',
 'hormone imbalance\n',
 'Baig\n',
 '$ 1.64 billion\n',
 'Mediterranean\n',
 'Three Faces\n',
 'Unstrip post\xe2\x80\x90expand size\n',
 'biggest threat\n',
 'point A\n',
 'disappointing sales\n',
 'tenacity\n',
 'comparable technologies\n',
 'pride\n',
 'terrible mistake\n',
 'short movie\n',
 'nearly a decade\n',
 'California People\n',
 'May 1985\n',
 'late 1980s\n',
 'risk\n',
 'Monta Loma elementary school\n',
 '100,000 units\n',
 'blanket\n',
 'little girl\n',
 'Getaway\n',
 'a day\n',
 'Reed College\n',
 'floppy disk drive\n',
 'Adoption Jobs\n

In [26]:
fp1.close()

### 0.5 Building AIDA mention dictionary for all entities

In [None]:
# s1='mention_dict_all_docs'
# get_mentions_main(s2,s1)

#### [Input Data]
1. 'question_subject_predicate.txt' (query terms from the question: ['was P\n', 'Steve Jobs NE\n', 'born P\n'])
2. 'subjects_from_all_docs.txt' (entities from all documents)

#### [Output Data]
mention_dict_all_docs
#### Output File:

In [36]:
fp1=open('./mention_dict_all_docs','r')
lines=fp1.readlines()
lines[:10]

['(dp0\n',
 "S'roy disney'\n",
 'p1\n',
 'c__builtin__\n',
 'set\n',
 'p2\n',
 '((lp3\n',
 "S'<roy_e._disney>'\n",
 'p4\n',
 "aS'<roy_o._disney>'\n"]

In [37]:
fp1.close()
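
The lines printed above are the text form of a protocol-0 pickle, so `readlines()` only shows the serialization. To work with the dictionary itself, it is usually loaded with `pickle`. The sketch below round-trips a small stand-in dictionary in memory; the single entry is taken from the output above, and the real file may hold more candidates per mention.

```python
import io
import pickle

# Stand-in for the real dictionary: mention string -> set of candidate entity ids.
# Only this one entry is visible in the output above; the rest is unknown.
mention_dict = {"roy disney": {"<roy_e._disney>", "<roy_o._disney>"}}

# Protocol 0 is the text-based pickle format seen in the readlines() output.
buffer = io.BytesIO()
pickle.dump(mention_dict, buffer, protocol=0)
buffer.seek(0)

# With a real file this would be pickle.load(open(path, 'rb')).
loaded = pickle.load(buffer)
candidates = sorted(loaded.get("roy disney", set()))
print(candidates)
```

When a pickle written by Python 2 is read under Python 3, passing `encoding="latin1"` to `pickle.load` may be needed if the stored strings are not plain ASCII.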

## 1. QUEST

### 1.1 Creating SPO Tuples

In [None]:
# context_match_flag,NE_types=call_main_SPO(con, spo_file, option, context_len, verbose, nlp ,gt)
# call_main_SPO(f1,f2,f3,f4,f5,nlp,gt)
# f1=argv[1] #input file
# f2=argv[2] #output file
# f3=argv[3] #Option
# f4=argv[4] #context Length
# f5=argv[5] #verbose

#### [Input Data]
    1. './files/context_ques-q1' (generated from './question_top10.txt')
    2. config:
        c1=config['corpus'] "question_top10.txt" #input file with question, question type, context, answer, supporting fact
        c2=config['mention_dict'] "mention_dict_all_docs" #input file with mention dictionary
        c3=config['ques_ent_pred'] "question_subject_predicate.txt" #input file with question terms
        c4=config['ans_type'] "question_type.txt" #input file with question answer types
        c5=config['outfile'] "answer.txt" #output file

#### [Output Data]
    './files/SPO_paragraph_ques-q1.txt' 
    [document | address | sentence number | subject 1 | d1 | predicate | d2 | subject 2]
    d1: distance between subject 1 and the predicate
    d2: distance between the predicate and subject 2

#### Input File:

In [40]:
fp1=open('./files/context_ques-q1','r')
lines=fp1.readlines()
lines[:10]

['doc-0\t./docs/doc1.txt\thttps://en.wikipedia.org/wiki/Steve_Jobs\n',
 '\t\tsent-0\t\tCentralNotice  From Wikipedia, the free encyclopedia Jump to navigation Jump to search This article is about the person.\n',
 '\t\tsent-1\t\tFor other uses, see  Steve Jobs  disambiguation  American entrepreneur and co-founder of Apple Inc.\n',
 '\t\tsent-2\t\tJobs in 2010 Steven Paul Jobs February 24, 1955 October 5, 2011 Palo Alto, California Alta Mesa Memorial Park Pioneer of the  personal computer revolution  with  Co-creator of the  , and first  US$7 billion  September 2011  Co-founder, Chairman, and CEO of  Primary investor and Chairman of  Founder, Chairman, and CEO of  The Walt Disney Company 4, including  Steven Paul Jobs ; February 24, 1955 \xe2\x80\x93 October 5, 2011  was an American  , and  .\n',
 "\t\tsent-3\t\tHe was the chairman, chief executive officer  CEO , and co-founder of  , the chairman and majority shareholder of  , a member of  The Walt Disney Company 's board of directors fo

In [41]:
fp1.close()

#### Output File:

In [38]:
fp1=open('./files/SPO_paragraph_ques-q1.txt','r')
lines=fp1.readlines()
lines[:10]

['doc-0 | ./docs/doc1.txt | sent-0 | navigation Jump | 2 | search | 2 | article\n',
 'doc-0 | ./docs/doc1.txt | sent-0 | free encyclopedia Jump | 4 | search | 2 | article\n',
 'doc-0 | ./docs/doc1.txt | sent-0 | Wikipedia | 7 | search | 2 | article\n',
 'doc-0 | ./docs/doc1.txt | sent-0 | article | 1 | is about | 2 | person\n',
 'doc-0 | ./docs/doc1.txt | sent-1 | other uses | 2 | see | 1 | Steve Jobs disambiguation American entrepreneur\n',
 'doc-0 | ./docs/doc1.txt | sent-1 | other uses | 2 | see | 3 | co-founder\n',
 'doc-0 | ./docs/doc1.txt | sent-1 | other uses | 2 | see | 5 | Apple Inc\n',
 'doc-0 | ./docs/doc1.txt | sent-2 | California Alta Mesa Memorial Park Pioneer | 3 | personal computer revolution with | 5 | first US$ 7 billion September 2011 Co-founder\n',
 'doc-0 | ./docs/doc1.txt | sent-2 | California Alta Mesa Memorial Park Pioneer | 3 | personal computer revolution with | 7 | Chairman\n',
 'doc-0 | ./docs/doc1.txt | sent-2 | California Alta Mesa Memorial Park Pioneer | 

In [39]:
fp1.close()
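
Each line of the SPO file has eight fields separated by ` | `. The sketch below parses one such line into a named tuple. `SPOTuple` and `parse_spo_line` are hypothetical names, and the code assumes that no field contains the separator itself.

```python
from collections import namedtuple

# Field names follow the column description of the SPO output file.
SPOTuple = namedtuple(
    "SPOTuple", "doc_id path sent_id subject1 d1 predicate d2 subject2"
)


def parse_spo_line(line):
    """Parse one 'doc | path | sent | subj1 | d1 | pred | d2 | subj2' line."""
    fields = [field.strip() for field in line.rstrip("\n").split(" | ")]
    if len(fields) != 8:
        raise ValueError("expected 8 fields, got %d" % len(fields))
    doc_id, path, sent_id, subj1, d1, pred, d2, subj2 = fields
    # d1 and d2 are the two distances and are stored as integers.
    return SPOTuple(doc_id, path, sent_id, subj1, int(d1), pred, int(d2), subj2)


row = parse_spo_line(
    "doc-0 | ./docs/doc1.txt | sent-0 | article | 1 | is about | 2 | person\n"
)
print(row.subject1, row.predicate, row.subject2)
```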

### 1.2 Building Graph 

In [None]:
# q_ent, cornerstones, QKG_match_flag = call_main_GRAPH(spo_file, con, terms, f2,QKG_file, cornerstone_file,gdict,prune,verbose,gt,NE_types,config)
# call_main_GRAPH(f1, f2, f4, f5, f6, f7, gdict, prune, verbose, gt1, NE_types, config)
# f1=argv[1] #input triple file
# f2=argv[2] #input context file
# f4=argv[4] #file with question terms
# f5=argv[5] #mention dictionary pickle files
# f6=argv[6] #output graph path
# f7=argv[7] #output cornerstone path

#### [Input Data]
    1. [f1, triple file] ~/quest/Code/files/SPO_paragraph_ques-q1.txt
    2. [f2, context file] ~/quest/Code/files/context_ques-q1
    3. [f4, file with question terms] ~/quest/Code/question_subject_predicate.txt
    4. [f5] config['mention_dict'] ~/quest/Code/mention_dict_all_docs, containing AIDA mentions for entities in corpora and questions

#### [Output Data]
    1. [f6, output graph path] ~/quest/Code/files/QKG_ques-q1
    2. [f7, output cornerstone path] ~/quest/Code/files/QKG_cornerstones_ques-q1

#### Input File:

In [43]:
fp1=open('./files/SPO_paragraph_ques-q1.txt','r')
lines=fp1.readlines()
lines[:5]

['doc-0 | ./docs/doc1.txt | sent-0 | navigation Jump | 2 | search | 2 | article\n',
 'doc-0 | ./docs/doc1.txt | sent-0 | free encyclopedia Jump | 4 | search | 2 | article\n',
 'doc-0 | ./docs/doc1.txt | sent-0 | Wikipedia | 7 | search | 2 | article\n',
 'doc-0 | ./docs/doc1.txt | sent-0 | article | 1 | is about | 2 | person\n',
 'doc-0 | ./docs/doc1.txt | sent-1 | other uses | 2 | see | 1 | Steve Jobs disambiguation American entrepreneur\n']

In [45]:
fp2=open('./files/context_ques-q1','r')
lines=fp2.readlines()
lines[:5]

['doc-0\t./docs/doc1.txt\thttps://en.wikipedia.org/wiki/Steve_Jobs\n',
 '\t\tsent-0\t\tCentralNotice  From Wikipedia, the free encyclopedia Jump to navigation Jump to search This article is about the person.\n',
 '\t\tsent-1\t\tFor other uses, see  Steve Jobs  disambiguation  American entrepreneur and co-founder of Apple Inc.\n',
 '\t\tsent-2\t\tJobs in 2010 Steven Paul Jobs February 24, 1955 October 5, 2011 Palo Alto, California Alta Mesa Memorial Park Pioneer of the  personal computer revolution  with  Co-creator of the  , and first  US$7 billion  September 2011  Co-founder, Chairman, and CEO of  Primary investor and Chairman of  Founder, Chairman, and CEO of  The Walt Disney Company 4, including  Steven Paul Jobs ; February 24, 1955 \xe2\x80\x93 October 5, 2011  was an American  , and  .\n',
 "\t\tsent-3\t\tHe was the chairman, chief executive officer  CEO , and co-founder of  , the chairman and majority shareholder of  , a member of  The Walt Disney Company 's board of directors fo

In [48]:
fp3=open('./question_subject_predicate.txt','r')
lines=fp3.readlines()
lines[:5]

['was P\n', 'Steve Jobs NE\n', 'born P\n']

In [52]:
fp4=open('./mention_dict_all_docs','r')
lines=fp4.readlines()
lines[:10]

['(dp0\n',
 "S'roy disney'\n",
 'p1\n',
 'c__builtin__\n',
 'set\n',
 'p2\n',
 '((lp3\n',
 "S'<roy_e._disney>'\n",
 'p4\n',
 "aS'<roy_o._disney>'\n"]

In [53]:
fp1.close()
fp2.close()
fp3.close()
fp4.close()

#### Output File:

In [57]:
G1 = nx.read_gpickle("./files/QKG_ques-q1")

In [60]:
G1.nodes(data=True)[0]

(u'11 presentation lessons:Entity',
 {'did': ['doc-0'],
  'dtitle': ['./docs/doc1.txt'],
  'matched': '',
  'sid': [('sent-858', 'doc-0')],
  'weight': 0.0})

In [61]:
G1.edges(data=True)[0]

(u'11 presentation lessons:Entity',
 u'learn from:Predicate:1',
 {'did': ['doc-0'],
  'dtitle': ['./docs/doc1.txt'],
  'etype': 'Triple',
  'sid': [('sent-858', 'doc-0')],
  'weight': 0.25,
  'wlist': [0.25]})
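
Node labels in the graph appear to follow the pattern `text:Entity` or `text:Predicate:n`. The sketch below splits a label into its parts. The pattern is inferred from the two labels shown above, what the trailing number on predicate nodes means is not stated here, and `parse_node_label` is a hypothetical helper.

```python
def parse_node_label(label):
    """Split a node label into (text, kind, counter).

    'learn from:Predicate:1'         -> ('learn from', 'Predicate', 1)
    '11 presentation lessons:Entity' -> ('11 presentation lessons', 'Entity', None)
    """
    text, _, last = label.rpartition(":")
    if last.isdigit():
        # Predicate labels carry a trailing counter after the node kind.
        text, _, kind = text.rpartition(":")
        return text, kind, int(last)
    return text, last, None


entity = parse_node_label("11 presentation lessons:Entity")
predicate = parse_node_label("learn from:Predicate:1")
print(entity, predicate)
```

Splitting from the right keeps any colon inside the text part intact.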

In [67]:
G2 = nx.read_gpickle("./files/QKG_cornerstones_ques-q1")

In [75]:
G2

{u'11 steve jobs:Entity': 'steve jobs',
 u'1985 jobs:Entity': 'steve jobs',
 u'1986 jobs:Entity': 'steve jobs',
 u'2003 jobs:Entity': 'steve jobs',
 u'2009 jobs:Entity': 'steve jobs',
 u'actor steve carell:Entity': 'steve jobs',
 u'adoption jobs:Entity': 'steve jobs',
 u'after jobs:Entity': 'steve jobs',
 u'apple co-founder 06 oct 2011 steve jobs:Entity': 'steve jobs',
 u'apple co-founder steve jobs:Entity': 'steve jobs',
 u'apple jobs:Entity': 'steve jobs',
 u'biological father steve jobs:Entity': 'steve jobs',
 u'birth after:Predicate:1': 'born',
 u'birth family during:Predicate:1': 'born',
 u'birth father:Predicate:1': 'born',
 u'birth mother out:Predicate:1': 'born',
 u'book steve jobs 2011:Entity': 'steve jobs',
 u'book steve jobs:Predicate:1': 'steve jobs',
 u'born in:Predicate:1': 'born',
 u'born in:Predicate:2': 'born',
 u'born in:Predicate:3': 'born',
 u'born in:Predicate:4': 'born',
 u'born in:Predicate:5': 'born',
 u'born in:Predicate:6': 'born',
 u'born in:Predicate:7': 'bo
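
The cornerstone file maps each matched graph node to the question term it matched. Inverting that mapping gives all cornerstone nodes per question term, which can be easier to inspect. This is a minimal sketch on a few entries copied from the output above; `group_cornerstones` is a hypothetical helper.

```python
from collections import defaultdict


def group_cornerstones(cornerstones):
    """Invert {node label: question term} into {question term: [node labels]}."""
    grouped = defaultdict(list)
    for node, term in cornerstones.items():
        grouped[term].append(node)
    # Sort each list so the result does not depend on dict ordering.
    return {term: sorted(nodes) for term, nodes in grouped.items()}


sample = {
    "apple co-founder steve jobs:Entity": "steve jobs",
    "adoption jobs:Entity": "steve jobs",
    "born in:Predicate:1": "born",
    "birth father:Predicate:1": "born",
}
by_term = group_cornerstones(sample)
print(by_term["born"])
```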

### Graph G1

<img src="graph.png">

### 1.3 Running GST Algorithm