A question about negative samples generation in preprocessing.py #70

huxiaoti · 2021-03-24T12:13:54Z

Hi Thomas,

I'm confused by how you generate the negative edges for the validation set:

val_edges_false = []
while len(val_edges_false) < len(val_edges):
    idx_i = np.random.randint(0, adj.shape[0])
    idx_j = np.random.randint(0, adj.shape[0])
    if idx_i == idx_j:
        continue
    if ismember([idx_i, idx_j], train_edges):
        continue
    if ismember([idx_j, idx_i], train_edges):
        continue
    if ismember([idx_i, idx_j], val_edges):
        continue
    if ismember([idx_j, idx_i], val_edges):
        continue
    if val_edges_false:
        if ismember([idx_j, idx_i], np.array(val_edges_false)):
            continue
        if ismember([idx_i, idx_j], np.array(val_edges_false)):
            continue
    val_edges_false.append([idx_i, idx_j])

However, the negative edges for the test set are checked with:

if ismember([idx_i, idx_j], edges_all):
    continue

Why does validation set use ismember([idx_j, idx_i], train_edges) and ismember([idx_i, idx_j], val_edges) instead of ismember([idx_i, idx_j], edges_all)?

Wu Shiauthie

gonzalesMK · 2021-10-19T15:11:02Z

Hi, I had the same issue.

I gave it some thought, and I realized that the negative validation/training samples should be allowed to include test edges; otherwise, excluding them would leak information about the test set, giving the algorithm an edge over the test samples.

In other words, edges in the test set can be sampled as negative examples in the validation/training sets (this could happen in a real world scenario).

So this explains why the ismember checks are done separately against train_edges and val_edges. However, there is this line:

assert ~ismember(val_edges_false, edges_all)

I don't understand the purpose of this assertion.

lif323 · 2021-12-14T03:34:27Z

I understand why the assertion error sometimes appears when running the program. It is because entries of val_edges_false may also appear in edges_all.

File "train.py", line 47, in <module>
    adj_train, train_edges, val_edges, val_edges_false, test_edges, test_edges_false = mask_test_edges(adj)
  File "/home/lf/work/gae/gae/preprocessing.py", line 100, in mask_test_edges
    assert ~ismember(val_edges_false, edges_all)
AssertionError
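
To make the failure concrete, here is a minimal sketch on a toy graph. The ismember helper below is an assumption about how the repository's function behaves (row-wise matching), and the edge arrays are made up for illustration. It shows a test edge passing the train/val filters while still being present in edges_all, which is the situation that trips the assertion.

```python
import numpy as np

def ismember(a, b, tol=5):
    # Assumed helper: True if any row of `a` matches any row of `b`.
    a = np.atleast_2d(a)
    rows_close = np.all(np.round(a - b[:, None], tol) == 0, axis=-1)
    return bool(np.any(rows_close))

# Toy split (hypothetical data): one edge per set.
train_edges = np.array([[0, 1]])
val_edges = np.array([[1, 2]])
test_edges = np.array([[2, 3]])
# Assumption: edges_all lists every edge in both directions.
edges_all = np.array([[0, 1], [1, 0], [1, 2], [2, 1], [2, 3], [3, 2]])

candidate = [2, 3]  # a true test edge, drawn by chance as a "negative"
reverse = candidate[::-1]

# The validation filter only looks at train_edges and val_edges ...
passes_val_filter = not (
    ismember(candidate, train_edges)
    or ismember(reverse, train_edges)
    or ismember(candidate, val_edges)
    or ismember(reverse, val_edges)
)
# ... so the candidate is accepted even though it is a real edge.
in_edges_all = ismember(candidate, edges_all)

print(passes_val_filter, in_edges_all)
```

Whether this happens in a given run depends on the random draws, which would explain why the error only appears sometimes.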

lif323 · 2021-12-14T03:45:29Z

Hi,
I think a version that never raises the assertion error, that is, the correct code, is equivalent to the following:

val_edges_false = []
while len(val_edges_false) < len(val_edges):
    idx_i = np.random.randint(0, adj.shape[0])
    idx_j = np.random.randint(0, adj.shape[0])
    if idx_i == idx_j:
        continue
    if ismember([idx_j, idx_i], edges_all):
        continue
    if val_edges_false:
        if ismember([idx_j, idx_i], np.array(val_edges_false)):
            continue
        if ismember([idx_i, idx_j], np.array(val_edges_false)):
            continue
    val_edges_false.append([idx_i, idx_j])
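
For anyone who wants to try this idea in isolation, below is a self-contained sketch of the same filtering, not the repository's code. The function name sample_negative_edges, its parameters, and the toy edges_all array are hypothetical, and it swaps the repeated ismember calls for a Python set of tuples. It rejects self-loops, any pair found in edges_all (in either direction), and duplicates of negatives already drawn.

```python
import numpy as np

def sample_negative_edges(num_nodes, num_samples, edges_all, seed=0):
    """Draw node pairs that are not edges in `edges_all` (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Store every known edge in both directions for fast membership checks.
    forbidden = {(int(i), int(j)) for i, j in edges_all}
    forbidden |= {(j, i) for i, j in forbidden}

    negatives = []
    while len(negatives) < num_samples:
        i = int(rng.integers(0, num_nodes))
        j = int(rng.integers(0, num_nodes))
        if i == j or (i, j) in forbidden:
            continue  # self-loop, real edge, or already sampled
        negatives.append([i, j])
        # Block both directions so the same pair is not drawn twice.
        forbidden.add((i, j))
        forbidden.add((j, i))
    return np.array(negatives)

# Toy graph (made-up data): 6 nodes, 3 undirected edges.
edges_all = np.array([[0, 1], [1, 0], [1, 2], [2, 1], [2, 3], [3, 2]])
val_edges_false = sample_negative_edges(6, 3, edges_all)
print(val_edges_false)
```

Note that this loop never terminates if more negatives are requested than there are non-edges, and that it deliberately excludes test edges from the validation negatives, which is the opposite of the reasoning given earlier in this thread for checking train_edges and val_edges separately.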

sheenahora · 2022-02-10T17:59:41Z

Hello, I am having the same issue.
assert ~ismember(val_edges_false, edges_all)
AssertionError
Did anyone find the solution? Kindly help.
