How to spread the local evidence score using Markov chain #34

Closed
sareaghaei opened this issue Mar 23, 2021 · 3 comments

sareaghaei commented Mar 23, 2021

Hi Antonin,
Thanks for your great work. I would appreciate it if you could explain some parts of the paper for me.
1- As far as I understand, a vector of features F is computed for each entity e as its local compatibility.
The third feature of the vector is log p(e). What is it exactly? The paper mentions that it is a log-linear combination of the number of statements, the number of sitelinks and the PageRank, but based on the code it seems to be based only on the PageRank.
2- The output of the semantic similarity step is a column-stochastic matrix M_d. I cannot see how to use the local compatibility vector and the similarity matrix M_d to define the final feature vectors for the classification. Could you further clarify equation (1) of the paper?

wetneb commented Mar 24, 2021

Hi @sareaghaei,

Sure!

  1. Concerning the features for each mention, you can find them here:
    https://github.com/wetneb/opentapioca/blob/ad96f385b5eca424dbd431d6661984ca98928c94/opentapioca/classifier.py#L41-L47

  2. Here is a quick explanation of equation 1.

If you have a probability distribution "p" over the nodes of a Markov chain, you can get the distribution after following one edge of the graph by computing "T p" (where "T" is the transition matrix and "p" is represented as a vector). Similarly, the distribution after k steps is "T^k p".
In our case we take the transition matrix to be "alpha . I + (1 - alpha) M_d", which means that with probability alpha we stay on the same node, and with probability 1 - alpha we follow an edge of the graph (chosen with probability determined by the similarity measure). That should explain the "(alpha . I + (1 - alpha) M_d)^k" term.
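
In numpy terms, a minimal sketch of that computation (the function and variable names, and the values of alpha and k, are just illustrative here, not the ones OpenTapioca actually uses):

```python
import numpy as np

def propagate(local_compat, M_d, alpha=0.85, k=2):
    # Transition matrix: stay on the same node with probability alpha,
    # otherwise follow an edge weighted by the similarity measure.
    n = M_d.shape[0]
    T = alpha * np.eye(n) + (1 - alpha) * M_d
    # Spread the local compatibility scores over k steps: T^k applied
    # to the initial vector.
    return np.linalg.matrix_power(T, k) @ local_compat
```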

Let me know if anything is still unclear!

sareaghaei commented Mar 24, 2021

Thanks for your reply.
1- LC(d) in equation (1) on page 6 denotes the local compatibility, right? Which part of the code computes the distribution after k steps?
2- Do you think adding another feature to the feature vector, namely the similarity score between the tag-description context and the input-text context, could improve the accuracy to some extent?

wetneb commented Mar 24, 2021

> 1- LC(d) in equation (1) on page 6 denotes the local compatibility, right? Which part of the code computes the distribution after k steps?

It is done here:

https://github.com/wetneb/opentapioca/blob/ad96f385b5eca424dbd431d6661984ca98928c94/opentapioca/classifier.py#L300-L303
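
Schematically, that computation amounts to applying the transition step k times rather than building "T^k" explicitly, which is cheaper when M_d is sparse. A sketch of the idea (illustrative names only, not the actual identifiers in classifier.py):

```python
import numpy as np

def propagate_iterative(local_compat, M_d, alpha=0.85, k=2):
    p = np.asarray(local_compat, dtype=float)
    for _ in range(k):
        # One application of T = alpha * I + (1 - alpha) * M_d
        p = alpha * p + (1 - alpha) * (M_d @ p)
    return p
```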

> 2- Do you think adding another feature to the feature vector, namely the similarity score between the tag-description context and the input-text context, could improve the accuracy to some extent?

Maybe! It's hard to say without trying :)
