# Semantic Parsing for Single-Relation Question Answering

    Wen-tau Yih 
    Xiaodong He 
    Christopher Meek
    2014

https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/SingleRelationQA-YihHeMeek-ACL14.pdf

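## Summary
1. Answer single-relation questions of the form r(e1, e2).
1. Mention–entity pair: the entity mention in the question and the KB entity it refers to.
1. Pattern–relation pair: the relation pattern (the question with the mention replaced by a placeholder) and the KB relation r.
1. Train two CNN semantic models, one to match mention–entity pairs and one to match pattern–relation pairs.
1. Once both pairs are identified, plug them into the lambda expression.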

## Problem Definition & Approach
In this paper, we focus on using a knowledge base to answer `single-relation` questions.

An example of a single-relation question is “When were DVD players invented?” The entity is `dvd-player` and the relation is `be-invent-in`. The answer can thus be described as the following lambda expression:

    λx. be-invent-in(dvd-player, x)

```
Q → RP ∧ M                             (1)
RP → when were X invented              (2)
M → dvd players                        (3)
when were X invented → be-invent-in    (4)
dvd players → dvd-player               (5)
```

Figure 1: A potential semantic parse of the question “When were DVD players invented?”

A knowledge base in this work can be simply viewed as a collection of binary relation instances in the form of r(e1, e2), where r is the relation and e1 and e2 are the first and second entity arguments.
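To make the r(e1, e2) view concrete, here is a minimal sketch of a toy knowledge base and a lookup that evaluates λx. r(e1, x). The `Triple` type, the `answer` helper, and the placeholder answer strings are all hypothetical names introduced for illustration; they do not come from the paper.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One binary relation instance r(e1, e2)."""
    relation: str
    e1: str
    e2: str


# Toy knowledge base. The second arguments are placeholders, not real facts.
KB = {
    Triple("be-invent-in", "dvd-player", "YEAR-PLACEHOLDER-A"),
    Triple("be-invent-in", "telephone", "YEAR-PLACEHOLDER-B"),
}


def answer(relation, entity, kb=KB):
    """Return every x such that relation(entity, x) is in the knowledge base."""
    return sorted(t.e2 for t in kb if t.relation == relation and t.e1 == entity)


# Evaluates the lambda expression  λx. be-invent-in(dvd-player, x)
print(answer("be-invent-in", "dvd-player"))
```

Under this view, answering a single-relation question reduces to picking the right `relation` and `entity` arguments for one such lookup.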

Given a question, we first separate it into two disjoint parts: the `entity mention` and the `relation pattern`. The entity mention is a subsequence of consecutive words in the question, and the relation pattern is the question with the mention replaced by a special symbol.
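The decomposition above can be sketched as a naive enumeration of every contiguous word span as a candidate mention. This is only an illustration of the definition: the `candidate_splits` function, the `X` placeholder symbol, and the lowercasing and question-mark stripping are assumptions, not the paper's preprocessing.

```python
def candidate_splits(question, symbol="X"):
    """Enumerate (mention, pattern) pairs for every contiguous word span.

    The mention is the span itself; the pattern is the question with that
    span replaced by `symbol`.
    """
    words = question.lower().rstrip("?").split()
    splits = []
    for start in range(len(words)):
        for end in range(start + 1, len(words) + 1):
            mention = " ".join(words[start:end])
            pattern = " ".join(words[:start] + [symbol] + words[end:])
            splits.append((mention, pattern))
    return splits


for mention, pattern in candidate_splits("When were DVD players invented?"):
    print(f"{mention!r:35} {pattern!r}")
```

A question of n words yields n(n+1)/2 candidate splits; the semantic models are what decide which split, such as `dvd players` with `when were X invented`, is plausible.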

## Convolutional Neural Network based Semantic Model
![1](http://ou8qjsj0m.bkt.clouddn.com//17-8-6/45019825.jpg)

Figure 2: The CNNSM maps a variable-length word sequence to a low-dimensional vector in a latent semantic space. A word contextual window size (i.e., the receptive field) of three is used in the illustration. Convolution over word sequence via learned matrix Wc is performed implicitly via the earlier word hashing layer’s mapping with a local receptive field. The max operation across the sequence is applied for each of 500 feature dimensions separately.
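The convolution and max-pooling steps in the caption can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions: the input word vectors stand in for the output of the word hashing layer, the `tanh` activation and zero padding at the sequence boundaries are assumptions, the dimensions are tiny instead of 500, and any layers after pooling are omitted. The function name `conv_max_pool` is hypothetical.

```python
import math
import random


def conv_max_pool(word_vectors, Wc, window=3):
    """Map a variable-length sequence to one fixed-length vector.

    word_vectors: list of T >= 1 vectors, each of dimension d.
    Wc: list of K rows, each of length window * d (the convolution matrix).
    Returns a list of K values: the max over positions of each feature.
    """
    d = len(word_vectors[0])
    pad = [0.0] * d
    half = window // 2
    # Assumed zero padding so that every word is the center of one window.
    padded = [pad] * half + list(word_vectors) + [pad] * half
    pooled = [-math.inf] * len(Wc)
    for t in range(len(word_vectors)):
        # Concatenate the vectors inside the window centered on word t.
        context = [x for vec in padded[t:t + window] for x in vec]
        for k, row in enumerate(Wc):
            h = math.tanh(sum(w * x for w, x in zip(row, context)))
            pooled[k] = max(pooled[k], h)
    return pooled


random.seed(0)
d, K = 4, 5  # toy sizes; the figure uses 500 feature dimensions
Wc = [[random.uniform(-1, 1) for _ in range(3 * d)] for _ in range(K)]
short = [[random.random() for _ in range(d)] for _ in range(2)]
longer = [[random.random() for _ in range(d)] for _ in range(7)]
print(len(conv_max_pool(short, Wc)), len(conv_max_pool(longer, Wc)))
```

Both sequences map to a vector of length `K`, which is the point of the max operation: the output size does not depend on the number of words.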

Given a pattern $Q$ and a relation $R$, their semantic relevance score is defined as the cosine similarity between their semantic vectors $y_Q$ and $y_R$.
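As a minimal sketch of the relevance score, the cosine between two semantic vectors can be computed as below. The `cosine` helper and the example vectors are illustrative only; returning 0 for a zero vector is an assumption made to avoid division by zero.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumed convention for degenerate vectors
    return dot / (norm_u * norm_v)


y_Q = [0.2, 0.7, -0.1]  # made-up semantic vector for a pattern
y_R = [0.1, 0.9, 0.0]   # made-up semantic vector for a relation
print(cosine(y_Q, y_R))
```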

We train two CNN semantic models from sets of pattern–relation and mention–entity pairs, respectively.

The posterior probability of the positive relation given the pattern is computed based on the cosine scores using softmax:

$P(R^{+} \mid Q)=\frac{\exp(\gamma \cdot \cos(y_{R^{+}},y_Q))}{\sum_{R'} \exp(\gamma \cdot \cos(y_{R'},y_Q))}$

where γ is a scaling factor set to 5 and the sum in the denominator ranges over the candidate relations R'. Model training is done by maximizing the log-posterior using stochastic gradient descent.
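The softmax above can be sketched numerically as follows, taking the cosine scores as given. The function names `posterior` and `neg_log_posterior` and the example scores are hypothetical; subtracting the maximum before exponentiating is a standard numerical-stability step that does not change the result.

```python
import math


def posterior(cos_scores, positive_index, gamma=5.0):
    """P(R+ | Q) from the cosine scores of all candidate relations.

    cos_scores: cosine of y_Q with each candidate relation vector y_R'.
    positive_index: position of the positive relation R+ in cos_scores.
    """
    scaled = [gamma * s for s in cos_scores]
    shift = max(scaled)  # stability trick; cancels out in the ratio
    exps = [math.exp(s - shift) for s in scaled]
    return exps[positive_index] / sum(exps)


def neg_log_posterior(cos_scores, positive_index, gamma=5.0):
    """Loss whose minimization corresponds to maximizing the log-posterior."""
    return -math.log(posterior(cos_scores, positive_index, gamma))


scores = [0.9, 0.1, 0.1]  # made-up cosines; index 0 is the positive relation
print(posterior(scores, 0), neg_log_posterior(scores, 0))
```

With cosine scores bounded in [-1, 1], the scaling factor γ controls how sharply the posterior can separate the positive relation from the others.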