Web App Help

Tim P edited this page Jun 20, 2013 · 4 revisions

This is the help page for the Public Demo of the Glimmer Web App. At the time of writing, the demo is serving indexes built over data from the Web Data Commons Project. The tuples have been filtered so that the indexes include only schema.org types.

Glimmer uses MG4J. For complete documentation on the query syntax see the MG4J Query documentation. Here we give some example queries in the context of the public demo and also explain how to access the data with simple HTTP GET requests to Glimmer's ajax service. To get a better understanding of how the data is indexed and ranked please see the following papers:

We have a Yahoo! Groups group you can join for discussion and support: Yahoo! Glimmer Support Group

Some Example Queries

name:(barack obama) birthdate:"08 04 1961"

Find all entities with the property 'name' containing 'barack' and 'obama', and the property 'birthdate' containing the phrase "08 04 1961". The tuples this matches have the predicates http://schema.org/name and http://schema.org/birthdate, with the given values contained in the respective objects.

type:{http://schema.org/Person} birthdate:"08 04 1961"

Find all entities with the given values for their birthdate and that have a http://www.w3.org/1999/02/22-rdf-syntax-ns#type predicate of http://schema.org/Person. The Person Resource URL surrounded by '{' and '}' is pre-processed by the web app before the query is passed to MG4J. The pre-processing replaces all resource URLs and BNodes in '{ }' with their internal unique numeric identifier assigned during the indexing process.
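The pre-processing step described above can be pictured with a minimal sketch. It is not Glimmer's actual code: the lookup table, the identifier values and the exact form of the rewritten term are all assumptions made for illustration.

```python
import re

# Hypothetical lookup table. In Glimmer the identifiers are assigned during
# indexing; these numbers are made up for the example.
RESOURCE_IDS = {
    "http://schema.org/Person": 1234,
    "http://schema.org/deathdate": 5678,
}

# Matches a resource URL or BNode wrapped in '{' and '}'.
BRACED = re.compile(r"\{([^{}]+)\}")


def replace_braced_resources(query, resource_ids):
    """Replace each '{resource}' in the query with its numeric identifier."""

    def lookup(match):
        resource = match.group(1).strip()
        if resource not in resource_ids:
            raise KeyError("unknown resource: " + resource)
        return str(resource_ids[resource])

    return BRACED.sub(lookup, query)


rewritten = replace_braced_resources(
    'type:{http://schema.org/Person} birthdate:"08 04 1961"', RESOURCE_IDS
)
print(rewritten)
```

The quoted phrase is left untouched because only text inside '{ }' is rewritten.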

The most common predicate types can be queried in the form <predicate>:<value in object> as they are backed by their own MG4J index. For less common predicates this isn't the case; however, all predicates are indexed in the predicate index.

predicate:{http://schema.org/deathdate}

In addition to the predicate index, Glimmer also has the object index and, optionally, the context index.

predicate:{http://schema.org/deathdate} object:1888

The predicate, object and context indices are 'parallel' indexes in the MG4J sense. This allows positional queries. Note the difference between the results from the above query and this:

predicate:{http://schema.org/deathdate} ^ object:1888

Adding the 'alignment operator' ^ asserts that its operands are matched at the same position (think tokens) in the source document.
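The difference between the two queries can be illustrated with a toy model. This is not how MG4J is implemented, and the token layout is an assumption: each entity is modelled as two parallel token lists in which position i of the predicate list describes position i of the object list.

```python
# Toy entity with a birth date of 1888 and a death date of 1950.
# The layout of the parallel lists is an assumption made for illustration.
entity = {
    "predicate": ["birthdate", "deathdate"],
    "object": ["1888", "1950"],
}


def conjunction_matches(doc, predicate, term):
    """Both terms occur somewhere in the document, at any positions."""
    return predicate in doc["predicate"] and term in doc["object"]


def aligned_matches(doc, predicate, term):
    """Both terms occur at the same position in their parallel lists."""
    pairs = zip(doc["predicate"], doc["object"])
    return any(p == predicate and o == term for p, o in pairs)


print(conjunction_matches(entity, "deathdate", "1888"))
print(aligned_matches(entity, "deathdate", "1888"))
print(aligned_matches(entity, "deathdate", "1950"))
```

In this model the unaligned query matches an entity that merely mentions 1888 anywhere, while the aligned query only matches when 1888 is the value of the deathdate predicate itself.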

Query Limitations

The < query operator isn't supported in the public demo. If you deploy your own Glimmer web app < should work as documented in the MG4J guide.

Currently Glimmer doesn't have the ability to do range queries on values. This would require that a scalar value be enforced on the given type, for example by parsing dates and indexing them as a scalar value type. The problem here is the diversity of value representations used on the web.

The public demo has request rate limiting. If you make too many requests too quickly, your IP address will be temporarily blocked. The current limit is around one request per second sustained for more than an hour. Contact us if you need to run large experiments on the dataset.
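One way to stay well under the limit is to space requests out on the client side. The sketch below is an assumption about how a client might do this, not part of Glimmer; the Throttle class and its 1.5 second default interval are hypothetical and chosen only to stay below roughly one request per second.

```python
import time


class Throttle:
    """Enforce a minimum interval between consecutive calls to wait()."""

    def __init__(self, min_interval=1.5, clock=time.monotonic, sleep=time.sleep):
        # min_interval is in seconds; 1.5 is an illustrative choice, not a
        # value published by the demo server.
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self._min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# Usage: call throttle.wait() immediately before each request.
# throttle = Throttle()
# for query in queries:
#     throttle.wait()
#     ...send the request...
```

The clock and sleep functions are injectable so the pacing logic can be exercised without real delays.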

Accessing the Data from Code

If you're using Java, you are in luck: so are we! There is a very simple example of querying and parsing results in Java in the project's git repository: WebRequestDemo. Assuming you have the .jar dependencies and class files in your current directory, you can run it as follows:

java -cp gson-2.1.jar:commons-httpclient-3.0.1.jar:commons-logging-1.1.1.jar:commons-codec-1.5.jar:. com.yahoo.glimmer.web.WebRequestDemo "bill bailey"

If you want to use another language to access the service, simply make an HTTP GET request to the ajax service URL http://glimmer.research.yahoo.com/ajax/query and you'll get a JSON response. The service takes the following parameters:

  • index The index (dataset) to query. See ajax/dataSetList for a list of datasets on the server.
  • query The MG4J query string as discussed above.
  • deref 'true'|'false' - If the object of a relation is a URL or BNode, look it up and use its name/label in place of the URL or BNode.
  • pageStart
  • pageSize
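As a rough sketch of such a request in Python, using only the standard library: the index name "example-index" is a placeholder rather than a real dataset, the paging values are passed through as given because their exact semantics are not described here, and the shape of the JSON response is not assumed.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

QUERY_URL = "http://glimmer.research.yahoo.com/ajax/query"


def build_query_url(index, query, deref=False, page_start=None, page_size=None):
    """Build the GET URL for the ajax query service, URL-encoding each value."""
    params = {
        "index": index,
        "query": query,
        "deref": "true" if deref else "false",
    }
    # Paging parameters are only sent when the caller supplies them.
    if page_start is not None:
        params["pageStart"] = page_start
    if page_size is not None:
        params["pageSize"] = page_size
    return QUERY_URL + "?" + urlencode(params)


def run_query(index, query, **options):
    """Send the request and return the decoded JSON response as-is."""
    with urlopen(build_query_url(index, query, **options), timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


# "example-index" is a placeholder; pick a real name from ajax/dataSetList.
url = build_query_url("example-index", "name:(barack obama)", page_start=0, page_size=10)
print(url)
```

Encoding the query matters because MG4J syntax uses characters such as ':', '(', '{' and '"' that are not safe in a raw URL.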

Additionally, each dataset (index) also has some metadata/stats associated with it. You can retrieve the stats by making a request to the ajax service URL http://glimmer.research.yahoo.com/ajax/indexStatistics. This service takes only one parameter, index.
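A statistics request can be built the same way. In this short sketch the index name is again a placeholder, and the structure of the returned JSON is not assumed.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

STATS_URL = "http://glimmer.research.yahoo.com/ajax/indexStatistics"


def build_stats_url(index):
    """Build the GET URL for the statistics of one dataset (index)."""
    return STATS_URL + "?" + urlencode({"index": index})


def fetch_stats(index):
    """Send the request and return the decoded JSON response as-is."""
    with urlopen(build_stats_url(index), timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


# "example-index" is a placeholder, not a real dataset name.
stats_url = build_stats_url("example-index")
print(stats_url)
```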

Final Note

Please be considerate about the number of requests you make to the demo server. It has only limited resources and is not intended to handle large numbers of requests. If you need to make more requests than it can handle, consider setting up your own server. You can contact us about getting the indexed data if you don't have access to a Hadoop cluster to generate it yourself.