Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Java Examples #12

Merged
merged 7 commits into from Dec 18, 2016
Merged

Java Examples #12

merged 7 commits into from Dec 18, 2016

Conversation

mkash32
Copy link
Contributor

@mkash32 mkash32 commented Apr 5, 2016

No description provided.

@jaylett
Copy link
Contributor

jaylett commented Apr 5, 2016

@mkash32 these need to be called things like 'delete1.java' in order to work with the automatic system. Can you run these with a Java build of the docs (building the docs will compile and run the examples automatically), and modify as appropriate? The docs system actually has a test framework built into it, so by copying .out files from another language it should be possible to pull these into shape fairly quickly.

@mkash32
Copy link
Contributor Author

mkash32 commented Apr 6, 2016

I commented out line 309 of conf.py which exits when there are bad characters. If I run make with appropriate JAVA and JAVAC configured I am receiving the following errors:
Bad characters in $JAVAC
Bad characters in $JAVA

@jaylett
Copy link
Contributor

jaylett commented Apr 6, 2016

@mkash32 yes, we need to properly fix that sanitisation somehow, but that's a separate issue to getting the Java examples done. If you've commented out the exit, you can comment out the print line as well since it doesn't tell you anything interesting. Even with the print line left in, it'll now actually build and run the examples (and will tell you if their exit non-zero, or if their output doesn't match / if the expected out file is missing, which you can largely copy from the python versions).

@mkash32
Copy link
Contributor Author

mkash32 commented Apr 6, 2016

@jaylett I'm getting an exit status of 256 for each file. I will try compiling/running one of the files explicitly and see how it goes.

@jaylett
Copy link
Contributor

jaylett commented Apr 6, 2016

Are you getting that from the compile step, or the run step? Generally the build system prints out lots of useful information to help figure out what's going wrong.

@mkash32
Copy link
Contributor Author

mkash32 commented Apr 7, 2016

Oh sorry, I forgot to copy the .out files from python into the java folder so that was the main problem.
I didn't understand the build process. What is the purpose of the .out files?

@jaylett
Copy link
Contributor

jaylett commented Apr 7, 2016

The .out files give expected output of running the program (with a particular set of arguments). The rig will then compare the actual output with them, to verify that everything's okay. (It also complains about non-zero exit codes of either build or run steps, as you noticed.)

@ojwb
Copy link
Contributor

ojwb commented Apr 8, 2016

The basic idea is to ensure that the example code is valid and actually does what it is meant to - it's easy to have typos slip into code examples in documentation, and they make that documentation less useful to people.

@mkash32
Copy link
Contributor Author

mkash32 commented Apr 12, 2016

When I run the make command with my respective directory structure I am getting Error: Could not find or load main class delete1 for delete1 and similarly for the rest. I changed the first letter of each class file (and class declaration in the code) to lowercase.

@ojwb
Copy link
Contributor

ojwb commented Apr 12, 2016

You don't seem to have included the conf.py changes, so I'm not sure how to replicate the error.

@mkash32
Copy link
Contributor Author

mkash32 commented Apr 12, 2016

Oh sorry, I thought it wasn't needed.

{
if(args.length < 2)
{
System.out.println("Insufficient number of arguments (should be dbpath datapath)");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, must be DATAPATH DBPATH as per the other examples. (Because these are all run automatically, they have to exactly the same arguments, in the same order.)

@jaylett
Copy link
Contributor

jaylett commented Apr 12, 2016

There are some whitespace issues (I haven't pointed out all of them), and a couple of functional changes which will be needed to make this work. It's possible that will throw up some minor other issues in the process.

In order to correctly build the documentation (rather than just build and run the code), you should put 'marker lines' around the important part of the code in each example (see the python ones for an idea). I think in Java you probably want:

// Start of example code.
...
// End of example code.

This is actually using the same system and lines 42 & 45 of search1.java, which you converted probably without thinking about it from the python. That section is used explicitly in the code example in practical_example/searching/prefix.rst. The default 'marker' detail is 'example code.' but as you can see there, it can be specified each time, meaning you can excerpt different parts of the same example at different points in the documentation.

Once you add the example code markers, you should be able to see the built documentation including the Java code (in _build/html).

@mkash32 mkash32 force-pushed the javaexamples branch 4 times, most recently from 11dfa20 to d2f9e8a Compare April 12, 2016 19:35
package code.java;
import java.util.ArrayList;

public class support {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each tab is 8 spaces, should be changed to 4.

@ojwb
Copy link
Contributor

ojwb commented Apr 18, 2016

The idea is that the examples are executed in the order they appear in the book.

I suspect they're instead getting processed in order of the filename of the RST files they're included from.

I wonder if sphinx has changed behaviour, or if this got broken at some point, or if it never worked (and we failed to noticed because it only ever got tested with the DB present from a previous run).

@jaylett
Copy link
Contributor

jaylett commented Apr 18, 2016

For me, the python examples are run in an order which would suggest that the RST files that contain them are parsed (and hence the examples run) in the following order:

howtos/boolean_filters.rst
howtos/facets.rst
howtos/range_queries.rst
howtos/sorting.rst
practical_example/indexing/updating_the_database.rst
practical_example/indexing/writing_the_code.rst
practical_example/searching/prefix.rst
practical_example/searching/running_the_search.rst

which matches @ojwb's suggestion: these are in alphabetical order. The reason it doesn't cause errors with python is that delete1 is run so late, because all the examples are there.

Complete list of python examples in the order they're run:

Running: python2 code/python/index_filters.py data/100-objects-v1.csv db
Running: python2 code/python/search_filters.py db clock 'steel (metal)'
Running: python2 code/python/search_filters2.py db 'clock material:"steel (metal)"'
Running: python2 code/python/index_facets.py data/100-objects-v1.csv db
Running: python2 code/python/search_facets.py db clock
Running: python2 code/python/index_ranges.py data/100-objects-v1.csv db
Running: python2 code/python/search_ranges.py db ..50mm
Running: python2 code/python/search_ranges.py db 1980..1989
Running: python2 code/python/search_ranges.py db clock 1960..
Running: python2 code/python/search_ranges.py db 1000..mm 1800..1899
Running: python2 code/python/search_ranges.py db 1000mm..
Running: python2 code/python/index_ranges2.py data/states.csv statesdb
Running: python2 code/python/search_ranges2.py statesdb spanish
Running: python2 code/python/search_ranges2.py statesdb 1800..1899
Running: python2 code/python/search_ranges2.py statesdb 11/08/1889..07/10/1890
Running: python2 code/python/search_ranges2.py statesdb 10000000..
Running: python2 code/python/search_ranges2.py statesdb 1780..1789 10000000..
Running: python2 code/python/index_ranges2.py data/states.csv statesdb
Running: python2 code/python/search_sorting.py statesdb spanish
Running: python2 code/python/search_sorting2.py statesdb State
Running: python2 code/python/index_values_with_geo.py data/states.csv statesdb
Running: python2 code/python/search_sorting3.py statesdb State
Running: python2 code/python/delete1.py db 1953-448 1985-438
Running: python2 code/python/index1.py data/100-objects-v1.csv db
Running: python2 code/python/search1.py db title:sunwatch
Running: python2 code/python/search1.py db description:\"leather case\" AND title:sundial
Running: python2 code/python/search1.py db watch
Running: python2 code/python/search1.py db Dent watch

@ojwb
Copy link
Contributor

ojwb commented Apr 18, 2016

I can't see a sphinx option to force it to process in output order, and it seems sphinx has support for only regenerating changed files, so I think it expects that the order of processing doesn't really matter (which is probably true for most documents).

Explicit marking of dependencies (e.g. delete1 says "must be run after index1") is nice because it's explicit, but it needs code to manage the graph and pick an order, and I fear it would be a bit of a maintenance headache.

The examples being run in the text order is very natural, and ensures that someone methodically working through the book can follow them, which seems desirable. So overall I like that best.

So I guess we find a way to rediscover the book order. Looks like we may be able to get it from the JSON output - start with _build/json/index.json and follow the next elements. I'll have a poke.

As for this patch, if you can run the examples by hand in the correct order then that should be OK. Once we sort out the order problem, we can check the tests work, and then merge this.

@ojwb
Copy link
Contributor

ojwb commented Apr 19, 2016

I have a script to generate that list and a patch to conf.py to use it, but I need to sanely fit this in. The key problem is that the script to generate the list parses the JSON output from sphinx, which means sphinx needs to run before the script, and in that run conf.py needs the list. Argh.

I guess we break the cycle with a dummy list there, or something.

Or else we find a way to build the list at the end of the sphinx run, just before we actually run the examples.

@mkash32
Copy link
Contributor Author

mkash32 commented Apr 19, 2016

@ojwb Sure, I will just run each by hand. As it is now, are there any other ordering issues, apart from delete1, that would affect the output of the examples (compared to expected output)? From the order jaylett gave above, it looks like there aren't any other issues.

I'm not so familiar with Sphinx but as an alternative to what was suggested above, we could possibly merge some examples which are dependent on each other into one example and call the "sub examples" based on the arguments we give. For example, we can bundle up index1, delete1, and search1 and make them functions within a main example. The sub examples can be run with "index", "delete", or "search" as an argument. In the case that delete is run before a db exists, we could make a call to index (this is assuming we don't have control over the order in which the same example with different arguments is run, if we do have control then we don't have to worry about this case).

@jaylett
Copy link
Contributor

jaylett commented Apr 19, 2016

Perhaps what we need is for each .rst to stand alone, and so have a non-output run example directive in the sphinx xapian domain. So immediately before calling delete1, we'd call the full index. It would also mean that when rebuilding only the files that have changed, everything will still work properly. Kind of a simplified dependency system; rather than trying to track dependency between the code, we do it explicitly in the usage.

@jaylett
Copy link
Contributor

jaylett commented Apr 19, 2016

@mkash32 you should be able to just run them and ignore the errors with delete1. Providing the output matches what (eg) the python equivalents do for each runexample then it should be fine.

@ojwb
Copy link
Contributor

ojwb commented Apr 19, 2016

we could possibly merge some examples which are dependent on each other into one example and call the "sub examples" based on the arguments we give

That would work, but it would make the example code harder to follow (it's OK in the book, as that just has snippets, but it makes the full source artificially more complex).

jaylett's "quiet example" idea seems quite appealing though, and supporting incremental generation of pages is a nice bonus. If we remove any generated databases at the end of each .rst processed, we can also ensure that there aren't any implicit dependencies. Most dependencies should already be within a single .rst file, so the overhead of the extra example runs shouldn't be large.

@ojwb
Copy link
Contributor

ojwb commented Apr 20, 2016

Implemented "quiet examples" in ee04c2f.

@mkash32 If you rebase on current git master, the tests of the Java example code ought to now pass, assuming your versions are correct.

Beware that currently failed examples still result in generated documentation - you need to make clean to fully retest. I'm working on this...

@mkash32
Copy link
Contributor Author

mkash32 commented Apr 20, 2016

Ok cool!
I'm in the middle of lab exams right now, I'll rebase and try it out tomorrow.

@mkash32
Copy link
Contributor Author

mkash32 commented Apr 20, 2016

@ojwb @jaylett I tried it out and this is the output I got: https://gist.github.com/mkash32/02738d9e08648b8093195a1a314e344e. (Ignore the diffs of the search1 example outputs, it still needs a little more work)

In lines 6-11 the same message is being repeated, earlier this message would only be shown once.
In line 76 I'm getting an error. This occurred when I was trying to build python examples earlier but it didn't come up while building the java examples until now.

@ojwb
Copy link
Contributor

ojwb commented Apr 21, 2016

You see the message repeated in lines 6-11 because we now run that example multiple times (so we don't rely on DB state from a previous .rst file.

The "KeyError" is bogus - just ignore it (unless you can work out how to stop it happening).

I think the problem with the search1 output diffs is you've only copied one of the expected output files for search1 - it'll produce different output for different command line options, so there's a different expected output for each case we show.

@mkash32
Copy link
Contributor Author

mkash32 commented Apr 21, 2016

@ojwb I included all of the output files for search1. The output is still a little off, I think the problem is from the parsing of the csv. I'll take a look at the csv parsing function again and make sure that the input file is being parsed properly. Apart from that, I also have to print the "INFO" line about the search query (INFO:xapian.search:'watch'[0:10] = 4 18 13 33 15 36 46)

@@ -307,7 +307,7 @@ def get_tool_name(envvar, default):
if re.search(r'[^-/_+.A-Za-z0-9]', tool):
# Or we could actually escape it...
print("Bad characters in $%s" % envvar)
sys.exit(1)
# sys.exit(1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current regexp is definitely overly conservative - what characters are causing you problems here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be the hyphens when we we are setting -classpath and -Djava.library.path. There isn't an issue now since we are setting the JAVA_BINDINGS_DIR and generating the classpath and library path from that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

- is in there - the leading ^ means "not one of..." and the - after than is literal, not indicating a range.

If you were passing options, the "bad character" was probably a space - the current check assumes there aren't options included with the tool (which is probably flawed).

Copy link
Contributor Author

@mkash32 mkash32 Apr 22, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there still a need for options with java and javac? If there isn't then we can un-comment the exit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think your classpath and java_library_path handling means there isn't, so we can.

@jaylett
Copy link
Contributor

jaylett commented Apr 22, 2016

@mkash32 I think the remaining error that isn't formatting is because you aren't constructing the query string properly, not because you aren't parsing the CSV correctly.

https://trac.xapian.org/ticket/716

Converting Python examples to Java

Examples converted:
* index1.py
* search1.py
* delete1.py

Created support.java for parsing csv

ToDo:
* index_ranges.py
* index_filters.py
* index_sorting.py
* search_filters.py
* search_ranges.py
* search_sorting.py
Modified parseCsvLine function in support.java to ignore quotes of the csv values while parsing.
Also when double quotes "" are present within a quoted csv value, those are changed to single quotes in the output.
Modified delete1.java
* Split arg array before passing to the delete method
Modified index1.java
* Fixed Spacing and typos in comments
The build command will now additionaly take JAVA_BINDINGS_DIR as an argument. JAVA_BINDINGS_DIR is the path of the built directory of the java bindings.

* Package names removed from java files
* JAVA_BINDINGS_DIR argument added. Classpath and java.library.path will be built from the JAVA_BINDINGS_DIR
* JAVA and JAVAC should be used when non-default java and javac are needed
Ex: make html SPHINXOPTS=-tjava JAVA_BINDINGS_DIR=<path-to-bindings-built-directory>/built/
@jaylett
Copy link
Contributor

jaylett commented May 4, 2016

@mkash32 please don't force push your commits while there are outstanding review comments, because it makes it harder to see how they relate and basically requires that someone review all the changes again from scratch. Instead, add fixup commits (using git commit --fixup <sha> to fixup a previous commit specified by <sha>) that can be reviewed as dealing with earlier comments, and then once everything is good you can squash and rebase onto master immediately before merge.

@ojwb ojwb merged commit 91f8cf7 into xapian:master Dec 18, 2016
@ojwb
Copy link
Contributor

ojwb commented Dec 18, 2016

Sorry for losing track of this - it looks like it's ready to merge, aside from uncommenting sys.exit(1), which I'll just do as a follow-on commit.

@mkash32 mkash32 deleted the javaexamples branch December 19, 2016 04:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants