Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Exception in generate_sample with 0 as sample size (#754). #191

Closed
wants to merge 2 commits into from
Closed

Fix Exception in generate_sample with 0 as sample size (#754). #191

wants to merge 2 commits into from

Conversation

addy007-icy
Copy link
Contributor

Setting output to an empty string to avoid negative index reference.
Ticket link: https://trac.xapian.org/ticket/754

@ojwb
Copy link
Contributor

ojwb commented Mar 13, 2018

Don't keep opening and closing PRs for the same fix - it fractures the discussion and makes it much harder for reviewers to follow what's going on.

I'm going to reopen this one, since it's the first. Just push to the branch the PR was made from to add more changes (i.e. your master in this case, though it's better practice to create a new branch for each PR so that you can handle concurrent in-progress PRs).

@ojwb ojwb reopened this Mar 13, 2018
@ojwb
Copy link
Contributor

ojwb commented Mar 13, 2018

Also, are you OK with the patch licensing requirements documented here:

https://trac.xapian.org/browser/git/xapian-core/HACKING#L1376

@addy007-icy addy007-icy changed the title Handling negative indices when maxlen <= ind.size() Fix Exception in generate_sample with 0 as sample size (#754). Mar 13, 2018
Copy link
Contributor

@ojwb ojwb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As well as the comment here, a bug fix really should add a regression testcase that fails before the change and passes after it.

Firstly this helps to avoid the common problem where a patch claiming to fix a bug doesn't actually fix it due to a bug in the patch.

Secondly it helps to prevent the bug reappearing at a later date when somebody changes or even reimplements this function. One person got this wrong when they wrote this code and someone else could make the same mistake.

Tests for omega go in the omegatest script.

// Monster word! We'll have to just split it.
output.replace(maxlen - ind.size(), string::npos, ind);
}
} else {
output.replace(last_word_end, string::npos, ind2);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your indentation here is screwy - it looks like the whole thing is indented an extra level compared to before, and some lines within the new code are indented an additional extra level.

You may need to make sure your editor is set to display a tab character as moving to the next multiple of 8 column (that's often the default, but some editors seem to default to 4 instead).

// coll_freq, and the first wdf value, which more often than not is
// actually the exact bound (in 77% of cases in an example database of
// wikipedia data).
wdf_bound = min(wdf_bound, postlist_table.get_wdf_upper_bound(term));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You've seem to have a commit from master on your branch too, which means those changes are appearing in the diff making it harder to review, and also mean that github gets a conflict when it tries to merge master with the branch to CI test it.

I'm not sure how you managed to get that commit there, but whatever you did wasn't quite right.

I think in this case, it's simplest to fix this by rewriting the history of the branch to drop this extra commit. Generally you shouldn't rewrite history of a branch under review because it makes it harder to follow what's changed, but in this case it seems the least bad answer.

You should be able to rebase your branch interactively onto master like so:

git rebase -i master

That'll open an editor with the list of rebased commits - I'd expect the extra commit to automatically drop out as it's present on current master already, but if not just delete the line for it and save the file. Exit the editor and git will rewrite the branch for you.

You can need to force push it to the PR so the remote will accept the rewritten history (to re-iterate, this isn't something you should usually be doing):

git push --force

Addy007-icy added 2 commits March 19, 2018 00:51
Avoided negative reference of index by returning an negative string instead.
@addy007-icy
Copy link
Contributor Author

addy007-icy commented Mar 19, 2018

Added a commit, adding the edge-testcase, where the existing code fails, (the travis build check failed),
Then patched the file to handle the edge case.

@ojwb
Copy link
Contributor

ojwb commented Mar 21, 2018

The code changes look good, and it's smart to commit the test case separately and first to clearly show it fails before the fix and passes after (a sadly rather common pitfall is that the regression test doesn't fail without the fix).

The commit messages could be better (see
https://xapian-developer-guide.readthedocs.io/en/latest/contributing/workflow.html#good-commit-messages for some additional guidance on this):

Added testcase giving exception of existing code

It's not really clear what this is trying to say to someone looking at this later without the context we have right here and now (and that's a common situation when maintaining code - for example, you reproduce a reported bug, use git bisect to find when it was introduced based on when the reproducer started to fail, and that gives you a commit hash. So a commit message really ought to give you pointers to useful context - in this case the bug report).

You talk about "negative" in the context of string indexing but std::string::size_type is unsigned, so can't be negative (it's actually wrapping round to a very large number).

You also say "returning an negative string instead" but a "negative string" isn't a thing. You actually make it return an empty string instead.

The summary line (first line) of the commit message should ideally be at most 50 characters (you're second commit's summary is 65). It's precious space and really just needs to summarise the commit concisely (git log --oneline and GUI tools like gitk will show just this summary line. Putting the bug number there (especially first) is not a good use of this precious space. It is useful to include, but better put in the more detailed message.

Also (as noted in the developer guide at the URL above the convention is to use the imperative tense (so "Add" not "Added").

We also aim to credit bug reporters - we like people to report bugs, and a brief acknowledgement helps them feel appreciated.

Taking these points together, I'd have written something more like:

Add testcase to reproduce exception in generate_sample()

See https://trac.xapian.org/ticket/754

(My summary line above is 56 which is a little longer than ideal, but 50 isn't a hard limit - having something clear is important, especially if it still makes sense if truncated).

And:

Fix exception in generate_sample()

generate_sample() was causing an exception if the requested sample size
was less than the size of the truncation indicator string.  We now return
an empty sample in this case instead.

Fixes https://trac.xapian.org/ticket/754 reported by Gaurav Arora.

Gaurav's bug report was for the case where a 0 size sample was requested, but the bug was actually a bit wider than that (and your fix correctly addresses its full impact) so it's better to document the actual bug rather than just the reported case.

Don't worry about trying to rewrite the commit messages for this change, but please look through my comments above and take a bit more care writing good commit messages.

There's one piece of admin to take care of before I can merge this (this is actually in the proposal template, but you omitted it from your current draft proposal):

Do you agree to dual-license all your contributions to Xapian under the GNU GPL version 2 and all later versions, and the MIT/X licence?

For the avoidance of doubt this includes all contributions to our wiki, mailing lists and documentation, including anything you write in your project's wiki pages.

(For more details, including the rationale for this with respect to code, please see the "Licensing of patches" section in the "HACKING" document: https://trac.xapian.org/browser/git/xapian-core/HACKING#L1376)

@addy007-icy
Copy link
Contributor Author

I agree to dual-license all my contributions to Xapian under the GNU GPL version 2 and all later versions, and the MIT/X licence. I don’t have any objections regarding this.

Yeah thanks! I think I mixed up some words while writing.
Yeah! Those commit statements look neater and better!
Also, took care of the liscence in my current gsoc propsal, sorry about that too.

@ojwb
Copy link
Contributor

ojwb commented Apr 2, 2018

Thanks, merged as d21a213.

@ojwb ojwb closed this Apr 2, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants