A question about negative samples generation in preprocessing.py #70

Open
huxiaoti opened this issue Mar 24, 2021 · 4 comments

@huxiaoti

Hi Thomas,

I'm confused by how you generate the negative edges for the validation set:

val_edges_false = []
while len(val_edges_false) < len(val_edges):
    idx_i = np.random.randint(0, adj.shape[0])
    idx_j = np.random.randint(0, adj.shape[0])
    if idx_i == idx_j:
        continue
    if ismember([idx_i, idx_j], train_edges):
        continue
    if ismember([idx_j, idx_i], train_edges):
        continue
    if ismember([idx_i, idx_j], val_edges):
        continue
    if ismember([idx_j, idx_i], val_edges):
        continue
    if val_edges_false:
        if ismember([idx_j, idx_i], np.array(val_edges_false)):
            continue
        if ismember([idx_i, idx_j], np.array(val_edges_false)):
            continue
    val_edges_false.append([idx_i, idx_j])

However, the negative test set is checked against all edges:

if ismember([idx_i, idx_j], edges_all):
    continue

Why does the validation set check against train_edges and val_edges separately, e.g. ismember([idx_j, idx_i], train_edges) and ismember([idx_i, idx_j], val_edges), instead of using ismember([idx_i, idx_j], edges_all)?

Wu Shiauthie

@gonzalesMK

Hi, I had the same issue.

I gave it some thought and realized that the negative validation/training samples should be allowed to include the test set's edges; otherwise the algorithm would gain an unfair advantage on the test samples.

In other words, edges in the test set can be sampled as negative examples in the validation/training sets (this could happen in a real-world scenario).

This explains why the ismember checks are done separately against train_edges and val_edges. However, there is this line:

assert ~ismember(val_edges_false, edges_all)

I don't understand its purpose.

@lif323

lif323 commented Dec 14, 2021

I understand why the AssertionError sometimes appears when running the program: an edge in val_edges_false may also appear in edges_all.

File "train.py", line 47, in <module>
    adj_train, train_edges, val_edges, val_edges_false, test_edges, test_edges_false = mask_test_edges(adj)
  File "/home/lf/work/gae/gae/preprocessing.py", line 100, in mask_test_edges
    assert ~ismember(val_edges_false, edges_all)
AssertionError
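
A minimal sketch of how that collision can happen, using a toy 4-node graph. The `ismember` below is a simplified stand-in written for this example (an assumption, not the repository's implementation), and the edge arrays are made up for illustration.

```python
import numpy as np

def ismember(a, b):
    """Hypothetical stand-in: True if any row of `a` equals some row of `b`."""
    a = np.atleast_2d(np.asarray(a))
    b = np.atleast_2d(np.asarray(b))
    return bool(np.any(np.all(a[:, None, :] == b[None, :, :], axis=-1)))

# Toy split (made-up data): one undirected edge per split.
train_edges = np.array([[0, 1]])
val_edges = np.array([[1, 2]])
test_edges = np.array([[2, 3]])

# Assumption: edges_all stores every edge in both directions.
undirected = np.vstack([train_edges, val_edges, test_edges])
edges_all = np.vstack([undirected, undirected[:, ::-1]])

# This candidate passes every train/val check from the sampling loop ...
candidate = [2, 3]
passes_train_val = not (
    ismember(candidate, train_edges)
    or ismember(candidate[::-1], train_edges)
    or ismember(candidate, val_edges)
    or ismember(candidate[::-1], val_edges)
)

# ... yet it is a real (test) edge, so a check against edges_all flags it.
collides_with_edges_all = ismember([candidate], edges_all)

print(passes_train_val, collides_with_edges_all)
```

Whenever the random sampler happens to draw such a pair, `assert ~ismember(val_edges_false, edges_all)` fails; when it does not, the run succeeds, which matches the intermittent behavior.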

@lif323

lif323 commented Dec 14, 2021

Hi,
I think the correct code, i.e. a version that never triggers the AssertionError, is equivalent to the following:

val_edges_false = []
while len(val_edges_false) < len(val_edges):
    idx_i = np.random.randint(0, adj.shape[0])
    idx_j = np.random.randint(0, adj.shape[0])
    if idx_i == idx_j:
        continue
    if ismember([idx_j, idx_i], edges_all):
        continue
    if val_edges_false:
        if ismember([idx_j, idx_i], np.array(val_edges_false)):
            continue
        if ismember([idx_i, idx_j], np.array(val_edges_false)):
            continue
    val_edges_false.append([idx_i, idx_j])
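
For anyone who wants to try this idea outside the repository, here is a self-contained sketch of the same rejection-sampling approach. The function name `sample_negative_edges`, its parameters, and the toy `edges_all` array are hypothetical and introduced only for this example. It uses Python sets instead of `ismember` and checks both directions explicitly, so it does not rely on `edges_all` being symmetric.

```python
import numpy as np

def sample_negative_edges(num_samples, num_nodes, edges_all, seed=0):
    """Sample node pairs that are not edges (hypothetical helper).

    Rejects self-loops, any pair present in `edges_all` in either
    direction, and pairs already chosen in either direction.
    Note: loops forever if fewer than `num_samples` non-edges exist.
    """
    rng = np.random.default_rng(seed)
    existing = {tuple(e) for e in np.asarray(edges_all).tolist()}
    chosen = set()
    negatives = []
    while len(negatives) < num_samples:
        i, j = (int(x) for x in rng.integers(0, num_nodes, size=2))
        if i == j:
            continue  # no self-loops
        if (i, j) in existing or (j, i) in existing:
            continue  # a real edge, whichever split it belongs to
        if (i, j) in chosen or (j, i) in chosen:
            continue  # already sampled
        chosen.add((i, j))
        negatives.append([i, j])
    return np.array(negatives)

# Toy usage with made-up data: 5 nodes, two undirected edges.
edges_all = np.array([[0, 1], [1, 0], [1, 2], [2, 1]])
val_edges_false = sample_negative_edges(3, 5, edges_all)
```

Because every sampled pair is checked against all edges, a later assertion that the negatives are disjoint from `edges_all` cannot fail. The trade-off, as discussed above, is that test edges can then never appear as validation negatives.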

@sheenahora

Hello, I am having the same issue.
assert ~ismember(val_edges_false, edges_all)
AssertionError
Did anyone find a solution? Any help would be appreciated.
