Training set and accuracy calculation #26
Hi @tkipf

When I review the code, the training set you generated is a subgraph of the original graph, while in the model, when calculating the loss, the function `weighted_cross_entropy_with_logits` compares `pred_scores` with the subgraph adjacency matrix. The ones in the subgraph adjacency matrix represent the train_edges, while the zeros include both *val_edges_false* and *test_edges_false* during training. Is that true?

If yes, I think the loss should be calculated by sampling the adjacency matrix with *train_edges* and *train_edges_false*, which represent both 1s and 0s.

Thus, I think the edges should be split into
E_G = E_train + E_val + E_test
and the non-existent edges into
\bar{E}_G = \bar{E}_train + \bar{E}_val + \bar{E}_test

Also, since the reconstructed matrix does not have values restricted to [0, 1], I think there is a problem when applying the sigmoid function to calculate the accuracy: most entries in the reconstructed matrix are positive, which could lead to a biased calculation. Is that correct?

Comments
Yes, this is indeed the case. The training setup is based on a real-world use case where one might want to predict missing links in a corrupted or incomplete graph. At training time, nothing is known about the later test edges, i.e. we treat all edges that are missing at training time as negative examples. These will unavoidably contain a small number of false negatives, but as long as the ratio of false negatives to positives is small, this is not a big issue. This is related to the open-world assumption in statistical relational learning.
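For concreteness, here is a minimal NumPy sketch of this loss setup (not the repo's actual TensorFlow code; shapes, names, and the `pos_weight` choice are illustrative assumptions). The reconstruction target is the training adjacency matrix, so every zero entry, including held-out validation/test edges, is treated as a negative, and the rare positives are up-weighted, mirroring what `weighted_cross_entropy_with_logits` computes:

```python
# Sketch only: NumPy re-implementation of a weighted sigmoid cross-entropy
# between decoder logits and the *training* adjacency matrix.
import numpy as np

def weighted_bce_with_logits(logits, labels, pos_weight):
    """-[pos_weight * y * log sigmoid(x) + (1 - y) * log(1 - sigmoid(x))], element-wise."""
    log_sig = -np.logaddexp(0.0, -logits)               # log(sigmoid(x)), numerically stable
    log_one_minus_sig = -np.logaddexp(0.0, logits)      # log(1 - sigmoid(x))
    return -(pos_weight * labels * log_sig + (1.0 - labels) * log_one_minus_sig)

rng = np.random.default_rng(0)
n, d = 100, 16
Z = rng.normal(size=(n, d))                             # toy node embeddings from the encoder
logits = Z @ Z.T                                        # inner-product decoder (raw scores)
adj_train = (rng.random((n, n)) < 0.05).astype(float)   # toy training adjacency matrix

# All zero entries are treated as negatives; positives are up-weighted because they are rare.
pos_weight = (adj_train.size - adj_train.sum()) / adj_train.sum()
loss = weighted_bce_with_logits(logits, adj_train, pos_weight).mean()
print(loss)
```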
…On Wed, Mar 13, 2019 at 7:59 PM Hongwei Jin ***@***.***> wrote:
Hi @tkipf <https://github.com/tkipf>
When I review the code, the training set you generated is a subgraph of
the original graph, while in model when calculating the loss, the function
weighted_cross_entropy_with_logits compares the pred_scores and the
subgraph adjacency matrix.
The ones in the subgraph adjacency matrix represent the train_edge, while
the zeros have both *val_edges_false* and *test_edge_false* as the
trainings. Is that true?
If yes, what I think the loss should be calculated by sampling the
adj_matrix with train_edges and train_edges_false, which represent both
1s and 0s.
Thus, what I think is to split the edges into
E_G = E_train + E_val + E_test
and split the non-existed edges into
\bar{E}_G = \bar{E}_train + \bar{E}_val + \bar{E}_test
Also, since the reconstructed matrix is not a matrix with values from [0,
1].
I think there is a problem when applying the sigmoid function to
calculate the accuracy. Most of entries in the reconstruct matrix are
positive, and it could leads to a bias calculation. Is that correct?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#26>, or mute the thread
<https://github.com/notifications/unsubscribe-auth/AHAcYKOVtZ-7H0eF3VN9BZqxJp7Zx2Slks5vWUqsgaJpZM4bv6u->
.
|
The sigmoid function is not an issue, but accuracy is not a good metric in
this case as the classes are heavily unbalanced. It is better to report F1
score, average precision or area under the ROC curve.
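As an illustration (sklearn-based; the edge lists and embeddings below are hypothetical, not code from this repo), evaluation on held-out positive and negative edges with ranking-based metrics could look like this:

```python
# Sketch: score held-out positive/negative edges and report ranking metrics
# that are insensitive to class imbalance (unlike raw accuracy).
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

def edge_probs(Z, edges):
    """Sigmoid of the inner-product decoder for a list of (i, j) node pairs."""
    logits = np.array([Z[i] @ Z[j] for i, j in edges])
    return 1.0 / (1.0 + np.exp(-logits))

rng = np.random.default_rng(0)
Z = rng.normal(size=(100, 16))                    # toy embeddings
test_edges = [(0, 1), (2, 3), (4, 5)]             # hypothetical held-out positive edges
test_edges_false = [(0, 9), (7, 3), (6, 5)]       # hypothetical sampled negative edges

preds = np.concatenate([edge_probs(Z, test_edges), edge_probs(Z, test_edges_false)])
labels = np.concatenate([np.ones(len(test_edges)), np.zeros(len(test_edges_false))])
print("ROC AUC:", roc_auc_score(labels, preds))
print("AP:     ", average_precision_score(labels, preds))
```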
Will it be possible to predict the link based on the node embeddings Z_i, Z_j using metric learning? That is, in the case where we have learned the node embeddings from the subgraph, learn a metric to predict the existence of a link, say f(Z_i, Z_j) = Z_i' W Z_j?
Yes, this is the main idea behind these two papers:
https://arxiv.org/abs/1703.06103, https://arxiv.org/abs/1706.02263
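A minimal PyTorch sketch of such a learned bilinear decoder f(z_i, z_j) = z_i^T W z_j is shown below; the class name and shapes are made up, and the papers above use related but not identical parameterizations (e.g. per-relation or diagonal weight matrices):

```python
# Sketch of a learnable bilinear decoder f(z_i, z_j) = z_i^T W z_j.
# Names and shapes are illustrative; W would be trained jointly with the encoder.
import torch
import torch.nn as nn

class BilinearDecoder(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.W = nn.Parameter(torch.eye(dim))        # start close to the plain inner-product decoder

    def forward(self, z, pairs):
        src, dst = pairs                             # two index tensors of shape (num_pairs,)
        return torch.sigmoid(((z[src] @ self.W) * z[dst]).sum(dim=-1))

z = torch.randn(100, 16)                             # toy embeddings from the encoder
pairs = (torch.tensor([0, 2, 4]), torch.tensor([1, 3, 5]))
decoder = BilinearDecoder(16)
print(decoder(z, pairs))                             # link probabilities for each pair
```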
Got it, thanks for the information.
Why can't we just try to generate the graph from scratch via simulation? I would like to use these methods to create an ensemble of different graphs that are close to the original graph, rather than to complete missing edges, mainly to compare statistical properties and to address the degeneracy problems of ERGMs.
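For illustration, here is a hedged sketch of that workflow (not part of this repo; treating edges as independent Bernoulli draws from the decoder is an assumption): sample an ensemble of graphs from the decoder's edge probabilities and compare their statistics to the original graph.

```python
# Sketch: simulate an ensemble of graphs by sampling each edge independently
# from the decoder's edge probabilities (assumed workflow, not repo code).
import numpy as np

def sample_graphs(Z, num_samples=10, seed=0):
    rng = np.random.default_rng(seed)
    probs = 1.0 / (1.0 + np.exp(-(Z @ Z.T)))         # sigmoid of inner-product logits
    np.fill_diagonal(probs, 0.0)                     # no self-loops
    graphs = []
    for _ in range(num_samples):
        draw = rng.random(probs.shape) < probs       # one Bernoulli draw per node pair
        adj = np.triu(draw, k=1).astype(int)         # keep the upper triangle only
        graphs.append(adj + adj.T)                   # symmetrise for an undirected graph
    return graphs

Z = np.random.default_rng(1).normal(size=(100, 16))  # toy embeddings from a trained encoder
ensemble = sample_graphs(Z)
print([int(g.sum() // 2) for g in ensemble])         # edge count of each sampled graph
```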