How to spread the local evidence score using Markov chain #34

Closed
sareaghaei opened this issue Mar 23, 2021 · 3 comments

sareaghaei commented Mar 23, 2021

Hi Antonin,
Thanks for your great work. I would appreciate it if you could explain some parts of the paper for me.
1- As far as I understand, a vector of features F is computed for each entity e as its local compatibility.
The third feature of the vector is log p(e). What is it exactly? The paper mentions that it is a log-linear combination of the number of statements, the number of sitelinks and the PageRank, but based on the code it seems to be based only on the PageRank.
2- The output of the semantic similarity step is a column-stochastic matrix M_d. I cannot see how to use the local compatibility vector and the similarity matrix M_d to define the final feature vectors for the classification. Could you further clarify equation (1) of the paper?

wetneb commented Mar 24, 2021

Hi @sareaghaei,

Sure!

  1. Concerning the features for each mention, you can find them here:
    https://github.com/wetneb/opentapioca/blob/ad96f385b5eca424dbd431d6661984ca98928c94/opentapioca/classifier.py#L41-L47

  2. Here is a quick explanation of equation 1.

If you have a probability distribution "p" over the nodes of a Markov chain, you can get the distribution after following one edge of the graph by computing "T p" (where "T" is the transition matrix and "p" is represented as a vector). Similarly, the distribution after k steps is "T^k p".
In our case we take the transition matrix to be "alpha . I + (1 - alpha) M_d", which means that with probability alpha we stay on the same node, and with probability 1 - alpha we follow an edge of the graph (chosen with probability determined by the similarity measure). That should explain the "(alpha . I + (1 - alpha) M_d)^k" term.
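
In numpy terms, a minimal sketch of that computation (the function and variable names, and the values of alpha and k, are just illustrative here, not the ones OpenTapioca actually uses):

```python
import numpy as np

def propagate(local_compat, M_d, alpha=0.85, k=2):
    # Transition matrix: stay on the same node with probability alpha,
    # otherwise follow an edge weighted by the similarity measure.
    n = M_d.shape[0]
    T = alpha * np.eye(n) + (1 - alpha) * M_d
    # Spread the local compatibility scores over k steps: T^k applied
    # to the initial vector.
    return np.linalg.matrix_power(T, k) @ local_compat
```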

Let me know if anything is still unclear!

sareaghaei commented Mar 24, 2021

Thanks for your reply.
1- LC(d) in equation (1) on page 6 denotes the local compatibility, right? Which part of the code computes the distribution after k steps?
2- Do you think adding another feature to the feature vector, namely the similarity score between the tag-description context and the input-text context, could improve the accuracy to some extent?

wetneb commented Mar 24, 2021

> 1- LC(d) in equation (1) on page 6 denotes the local compatibility, right? Which part of the code computes the distribution after k steps?

It is done here:

https://github.com/wetneb/opentapioca/blob/ad96f385b5eca424dbd431d6661984ca98928c94/opentapioca/classifier.py#L300-L303
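
Schematically, that computation amounts to applying the transition step k times rather than building "T^k" explicitly, which is cheaper when M_d is sparse. A sketch of the idea (illustrative names only, not the actual identifiers in classifier.py):

```python
import numpy as np

def propagate_iterative(local_compat, M_d, alpha=0.85, k=2):
    p = np.asarray(local_compat, dtype=float)
    for _ in range(k):
        # One application of T = alpha * I + (1 - alpha) * M_d
        p = alpha * p + (1 - alpha) * (M_d @ p)
    return p
```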

> 2- Do you think adding another feature to the feature vector, namely the similarity score between the tag-description context and the input-text context, could improve the accuracy to some extent?

Maybe! It's hard to say without trying :)
