# Semantic Parsing via Staged Query Graph Generation: Question Answering with Knowledge Base

    Wen-tau Yih, Ming-Wei Chang, Xiaodong He, Jianfeng Gao
    2015

https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/ACL15-STAGG.pdf

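## Summary
1. Defines a query graph that can be mapped to a logical form in λ-calculus.
1. Trains two CNNs:
    1. one for the pattern (e.g. “who first voiced meg on <e>”)
    1. one for the inferential chain (e.g. cast-actor)
1. Learning is treated as a ranking problem rather than a classification problem.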

## Introduction
1. We first define a `query graph` that can be straightforwardly mapped to a logical form in λ-calculus and is semantically closely related to λ-DCS (Liang, 2013).
1. Semantic parsing is then reduced to query graph generation, formulated as a search problem with staged states and actions. Each state is a candidate parse in the query graph representation and each action defines a way to **grow** the graph.
1. In particular, we stage the actions into three main steps:   
    1. locating the topic `entity` in the question, 
    1. finding the main `relationship` between the answer and the topic entity, and 
    1. expanding the query graph with additional `constraints` that describe properties the answer needs to have, or relationships between the answer and other entities in the question.

## Background
### Knowledge base
The knowledge base K considered in this work is a collection of subject-predicate-object triples (e1, p, e2), where e1, e2 ∈ E are the `entities` (e.g., FamilyGuy or MegGriffin) and p ∈ P is a binary `predicate` like character. A knowledge base in this form is often called a `knowledge graph`.

![1](http://ou8qjsj0m.bkt.clouddn.com//17-8-6/77503092.jpg)
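
As a minimal sketch of the triple representation described above, the knowledge base can be held as a list of `(e1, p, e2)` tuples and indexed by subject and predicate. The node name `cast_1` and the helper `build_index` are hypothetical and used only for illustration; the paper does not prescribe a storage format.

```python
from collections import defaultdict

# Toy triples loosely following the Family Guy example.
# "cast_1" is a made-up identifier for the intermediate casting node.
TRIPLES = [
    ("FamilyGuy", "cast", "cast_1"),
    ("cast_1", "actor", "LaceyChabert"),
    ("cast_1", "character", "MegGriffin"),
]


def build_index(triples):
    """Index triples by (subject, predicate) so objects can be looked up quickly."""
    index = defaultdict(set)
    for e1, p, e2 in triples:
        index[(e1, p)].add(e2)
    return index


index = build_index(TRIPLES)
cast_nodes = index[("FamilyGuy", "cast")]  # set of nodes reachable via "cast"
```

Following an edge in the knowledge graph is then a dictionary lookup on `(entity, predicate)`.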

### Query graph
Our `query graph` consists of four types of nodes:

1. grounded entity (rounded rectangle),
1. existential variable (circle), 
1. lambda variable (shaded circle),
1. aggregation function (diamond).

![2](http://ou8qjsj0m.bkt.clouddn.com//17-8-6/92401601.jpg)

1. `Grounded entities` are existing entities in the knowledge base K.
1. `Existential variables` and `lambda variables` are ungrounded entities. In particular, we would like to retrieve all the entities that can map to the lambda variables in the end as the answers.
1. An `aggregation function` is designed to operate on a specific entity, and typically captures some numerical property.

The two entities, MegGriffin and FamilyGuy are represented by two rounded rectangle nodes. The circle node y means that there should exist an entity describing some casting relations like the character, actor and the time she started the role. The shaded circle node x is also called the answer node, and is used to map entities retrieved by the query. The diamond node arg min constrains that the answer needs to be the earliest actor for this role.

Equivalently, the logical form query in λ-calculus without the aggregation function is: λx.∃y.cast(FamilyGuy, y) ∧ actor(y, x) ∧ character(y, MegGriffin)

Running this query graph against K as in Fig. 1 will match both LaceyChabert and MilaKunis before applying the aggregation function, but only LaceyChabert is the correct answer as she started this role earlier.
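
The behaviour described above can be sketched on a toy knowledge base. Everything below is an illustrative assumption: the node names `cast_1` to `cast_3`, the `OtherActor` and `OtherCharacter` entities, and the `from` values (placeholder integers that only encode the ordering, not real dates).

```python
# Toy triples; "from" values are placeholders that only encode ordering.
TRIPLES = [
    ("FamilyGuy", "cast", "cast_1"),
    ("cast_1", "actor", "LaceyChabert"),
    ("cast_1", "character", "MegGriffin"),
    ("cast_1", "from", 1),
    ("FamilyGuy", "cast", "cast_2"),
    ("cast_2", "actor", "MilaKunis"),
    ("cast_2", "character", "MegGriffin"),
    ("cast_2", "from", 2),
    ("FamilyGuy", "cast", "cast_3"),
    ("cast_3", "actor", "OtherActor"),
    ("cast_3", "character", "OtherCharacter"),
    ("cast_3", "from", 1),
]


def objects(triples, subj, pred):
    """Return every e2 such that (subj, pred, e2) is in the knowledge base."""
    return [o for s, p, o in triples if s == subj and p == pred]


def matches_without_aggregation(triples):
    """Evaluate cast(FamilyGuy, y) AND actor(y, x) AND character(y, MegGriffin)."""
    results = []
    for y in objects(triples, "FamilyGuy", "cast"):
        if "MegGriffin" in objects(triples, y, "character"):
            for x in objects(triples, y, "actor"):
                results.append((x, y))
    return results


def earliest_actor(triples):
    """Apply the arg min aggregation over the 'from' value of each matching y."""
    matches = matches_without_aggregation(triples)
    x, _ = min(matches, key=lambda m: objects(triples, m[1], "from")[0])
    return x


candidates = [x for x, _ in matches_without_aggregation(TRIPLES)]
answer = earliest_actor(TRIPLES)
```

Here `candidates` contains both actors who satisfy the conjunction, and `answer` keeps only the one with the smallest `from` value.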

## Staged Query Graph Generation
We focus on generating query graphs with the following properties.

1. The tree graph has one entity node as the root, referred to as the `topic entity`.
1. There is exactly one lambda variable x, the answer node. A directed path leads from the root to x, with zero or more existential variables in between. We call this path the `core inferential chain` of the graph, as it describes the main relationship between the answer and the topic entity.
1. Zero or more entity or aggregation nodes can be attached to each variable node, including the answer node.

In Fig. 2:

1. FamilyGuy is the root
1. FamilyGuy → y → x is the core inferential chain
1. The branch y → MegGriffin specifies the character and y → arg min constrains that the answer needs to be the earliest actor for this role.

![3](http://ou8qjsj0m.bkt.clouddn.com//17-8-6/90536406.jpg)

Given a question, we formalize the query graph generation process as a search problem, with staged `states` and `actions`.

1. Let S = ∪{φ, Se, Sp, Sc} be the set of states, where each state could be 
    1. an empty graph (φ), 
    1. a single node graph with the topic entity (Se), 
    1. a core inferential chain (Sp), 
    1. a more complex query graph with additional constraints (Sc).
1. Let A = ∪{Ae, Ap, Ac, Aa} be the set of actions. An action grows a given graph by adding some edges and nodes. In particular, 
    1. Ae picks an entity node;
    1. Ap determines the core inferential chain; 
    1. Ac and Aa add constraints and aggregation nodes, respectively.
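
The staged states and actions above can be sketched as a small immutable data structure whose stage is derived from its contents. The class `QueryGraph`, the function names, and the rule that constraints and aggregations may only be added once a core chain exists are assumptions made for this illustration, not the authors' implementation.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class QueryGraph:
    topic_entity: Optional[str] = None
    core_chain: Tuple[str, ...] = ()
    constraints: Tuple[Tuple[str, str], ...] = ()
    aggregation: Optional[str] = None


def stage(g):
    """Classify a graph as phi, Se, Sp or Sc based on what it contains."""
    if g.topic_entity is None:
        return "phi"
    if not g.core_chain:
        return "Se"
    if not g.constraints and g.aggregation is None:
        return "Sp"
    return "Sc"


def pick_topic_entity(g, entity):  # action Ae
    if stage(g) != "phi":
        raise ValueError("Ae only applies to the empty graph")
    return replace(g, topic_entity=entity)


def set_core_chain(g, predicates):  # action Ap
    if stage(g) != "Se":
        raise ValueError("Ap only applies to a single-node graph")
    return replace(g, core_chain=tuple(predicates))


def add_constraint(g, predicate, entity):  # action Ac
    if stage(g) not in ("Sp", "Sc"):
        raise ValueError("Ac needs a core inferential chain")
    return replace(g, constraints=g.constraints + ((predicate, entity),))


def add_aggregation(g, function):  # action Aa
    if stage(g) not in ("Sp", "Sc"):
        raise ValueError("Aa needs a core inferential chain")
    return replace(g, aggregation=function)


# Grow the Fig. 2 graph one action at a time and record the stage after each.
g0 = QueryGraph()
g1 = pick_topic_entity(g0, "FamilyGuy")
g2 = set_core_chain(g1, ("cast", "actor"))
g3 = add_constraint(g2, "character", "MegGriffin")
g4 = add_aggregation(g3, "argmin")
stages = [stage(g) for g in (g0, g1, g2, g3, g4)]
```

Because each action returns a new graph, a search procedure can keep several partial candidates alive at once without them interfering with each other.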

### Linking Topic Entity
Starting from the initial state s0, the valid actions are to create a single-node graph that corresponds to the topic entity found in the given question.

![4](http://ou8qjsj0m.bkt.clouddn.com//17-8-6/41081525.jpg)

### Identifying Core Inferential Chain
Given a state s that corresponds to a single-node graph with the topic entity e, the valid actions to extend this graph are to identify the core inferential chain; namely, the relationship between the topic entity and the answer.

![5](http://ou8qjsj0m.bkt.clouddn.com//17-8-6/36131909.jpg)

#### Deep Convolutional Neural Networks
![6](http://ou8qjsj0m.bkt.clouddn.com//17-8-6/66390053.jpg)

We propose using Siamese neural networks (Bromley et al., 1993) for identifying the core inferential chain. For instance, one of our constructions maps the question to a pattern by replacing the entity mention with a generic symbol <e> and then compares it with a candidate chain, such as “who first voiced meg on <e>” vs. cast-actor. The model consists of two neural networks, one for the pattern and the other for the inferential chain.

Training the model needs positive pairs, such as a pattern like “who first voiced meg on <e>” and an inferential chain like cast-actor. These pairs can be extracted from the full semantic parses when provided in the training data.
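
The pattern construction and the comparison step can be sketched as follows. This is not the CNN itself: the vectors are made-up stand-ins for the outputs of the two trained networks, `other-chain` is a hypothetical competing candidate, and the use of cosine similarity as the comparison function is an assumption of this sketch rather than something stated in these notes.

```python
import math


def make_pattern(question, mention):
    """Replace the topic entity mention with the generic symbol <e>."""
    return question.replace(mention, "<e>")


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_chains(pattern_vec, chain_vecs):
    """Sort candidate chains by similarity to the pattern vector, best first."""
    return sorted(
        chain_vecs,
        key=lambda chain: cosine(pattern_vec, chain_vecs[chain]),
        reverse=True,
    )


pattern = make_pattern("who first voiced meg on family guy", "family guy")

# Made-up 3-d vectors standing in for the two networks' outputs.
pattern_vec = [0.9, 0.1, 0.2]
chain_vecs = {
    "cast-actor": [0.8, 0.2, 0.1],
    "other-chain": [0.1, 0.9, 0.3],  # hypothetical competing candidate
}
ranked = rank_chains(pattern_vec, chain_vecs)
```

In a trained model the two vectors would come from the pattern network and the chain network respectively; only the comparison and ranking logic is shown here.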

### Augmenting Constraints & Aggregations
To further restrict the set of answer entities, the graph with only the core inferential chain can be expanded by two types of actions: Ac and Aa. 

![7](http://ou8qjsj0m.bkt.clouddn.com//17-8-6/85577135.jpg)

### Learning the reward function
#### Features
![8](http://ou8qjsj0m.bkt.clouddn.com//17-8-6/30019334.jpg)

1. Topic Entity
1. Core Inferential Chain
    1. PatChain compares the pattern (replacing the topic entity with an entity symbol) and the predicate sequence.
    1. QuesEP concatenates the canonical name of the topic entity and the predicate sequence, and compares it with the question.
1. Constraints & Aggregations
1. Overall

#### Learning
We view learning the reward function as a ranking problem. Suppose we have several candidate query graphs for each question. Let ga and gb be the query graphs described in states sa and sb for the same question q, and let the entity sets Aa and Ab be those retrieved by executing ga and gb, respectively. Let A be the set of labeled answers to q. We first compute the precision, recall and F1 score of Aa and Ab against the gold answer set A. We then rank sa and sb by their F1 scores.
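
The scoring step described above can be sketched as follows. The function names and the example answer sets are illustrative assumptions; only the precision, recall and F1 definitions and the idea of ranking candidates by F1 come from the text.

```python
def prf1(predicted, gold):
    """Precision, recall and F1 of a predicted answer set against the gold set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def rank_by_f1(candidates, gold):
    """Order candidate names by the F1 of their retrieved answers, best first."""
    return sorted(candidates, key=lambda name: prf1(candidates[name], gold)[2], reverse=True)


# Illustrative answer sets: ga omits the aggregation, gb includes it.
gold = {"LaceyChabert"}
candidates = {
    "ga": {"LaceyChabert", "MilaKunis"},
    "gb": {"LaceyChabert"},
}
scores = {name: prf1(answers, gold) for name, answers in candidates.items()}
ranking = rank_by_f1(candidates, gold)
```

In this toy case `ga` retrieves the correct answer plus one wrong entity, so its precision is 0.5 and its F1 is 2/3, while `gb` scores 1.0 and ranks first.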