error in the make_del_knob function? #5

Open
frankang opened this issue Sep 14, 2020 · 1 comment

Comments

@frankang

In the make_del_knob function, when the size product (e_size * f_size) is smaller than sample_size (20000 by default), the script ends up calculating the similarity score for all combinations of the src and tgt sentences, plus the remainder (20000 - e_size * f_size). Is this behavior a mistake or an intended feature? It creates a biased histogram of the "real" distribution, because the (0, 0) sentence pair gets scored multiple times.

if e_size * f_size < sample_size:
    # don't sample, just compute the full matrix
    sample_size = e_size * f_size
    x_idxs = np.zeros(sample_size, dtype=np.int32)
    y_idxs = np.zeros(sample_size, dtype=np.int32)
    c = 0
    for ii in range(e_size):
        for jj in range(f_size):
            x_idxs[c] = ii
            y_idxs[c] = jj
            c += 1
else:
    # get random samples
    x_idxs = np.random.choice(range(e_size), size=sample_size, replace=True).astype(np.int32)
    y_idxs = np.random.choice(range(f_size), size=sample_size, replace=True).astype(np.int32)

# output
random_scores = np.empty(sample_size, dtype=np.float32)

score_path(x_idxs, y_idxs,
           e_laser_norms, f_laser_norms,
           e_laser, f_laser,
           random_scores, )
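
One way to test this concern is to run only the index-building part of the snippet in isolation. The sketch below is a hypothetical standalone copy (the name pair_indices is made up; score_path and the LASER embeddings are left out entirely). It uses np.random.default_rng instead of the global np.random.choice purely so the sampling branch is reproducible. If the remainder were padded with (0, 0) pairs, the arrays would be longer than e_size * f_size and (0, 0) would appear more than once.

```python
import numpy as np


def pair_indices(e_size, f_size, sample_size=20000, seed=0):
    """Hypothetical standalone copy of the index-building step quoted above.

    Returns (x_idxs, y_idxs, sample_size); sample_size may be reduced to
    e_size * f_size, mirroring the quoted code.
    """
    if e_size * f_size < sample_size:
        # don't sample, just compute the full matrix
        sample_size = e_size * f_size  # overwritten BEFORE any array is allocated
        x_idxs = np.zeros(sample_size, dtype=np.int32)
        y_idxs = np.zeros(sample_size, dtype=np.int32)
        c = 0
        for ii in range(e_size):
            for jj in range(f_size):
                x_idxs[c] = ii
                y_idxs[c] = jj
                c += 1
    else:
        # get random samples (with replacement)
        rng = np.random.default_rng(seed)
        x_idxs = rng.choice(e_size, size=sample_size, replace=True).astype(np.int32)
        y_idxs = rng.choice(f_size, size=sample_size, replace=True).astype(np.int32)
    return x_idxs, y_idxs, sample_size


x_idxs, y_idxs, n = pair_indices(e_size=3, f_size=4)
pairs = list(zip(x_idxs.tolist(), y_idxs.tolist()))
print(n, len(x_idxs))        # 12 12
print(pairs.count((0, 0)))   # 1
print(len(set(pairs)) == n)  # True
```

With 3 x 4 sentences the arrays hold exactly 12 entries and every (ii, jj) pair appears once, so no extra (0, 0) scores are produced in this branch.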
@janisdd

janisdd commented Apr 25, 2024

What do you mean by "plus the remainder (20000 - e_size * f_size)"? Which lines are you referring to?

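The assignment sample_size = e_size * f_size makes sample_size hold the correct size in both cases.
If e_size * f_size < sample_size is true, sample_size is overwritten before x_idxs, y_idxs, and random_scores are allocated, so all three arrays have exactly e_size * f_size entries and each (ii, jj) pair is scored exactly once. There is no remainder.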
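Put differently, the full-matrix branch enumerates the Cartesian product of the two sentence index ranges in row-major order. As an illustrative sketch (full_matrix_indices is a hypothetical helper, not part of the project), the same arrays can be built without the Python loop using np.repeat and np.tile, which makes the length e_size * f_size explicit:

```python
import numpy as np


def full_matrix_indices(e_size, f_size):
    """Hypothetical vectorized equivalent of the nested loop in the full-matrix branch.

    Row-major order: x = 0,...,0,1,...,1,...  and  y = 0,1,...,f_size-1,0,1,...
    Both arrays have exactly e_size * f_size entries.
    """
    x_idxs = np.repeat(np.arange(e_size, dtype=np.int32), f_size)  # each row index f_size times
    y_idxs = np.tile(np.arange(f_size, dtype=np.int32), e_size)    # column range repeated per row
    return x_idxs, y_idxs


x, y = full_matrix_indices(2, 3)
print(x.tolist())  # [0, 0, 0, 1, 1, 1]
print(y.tolist())  # [0, 1, 2, 0, 1, 2]
```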
