This repository has been archived by the owner on Nov 10, 2022. It is now read-only.

Implement "type_strict" #4

Open
wetneb opened this issue Feb 6, 2017 · 7 comments
Comments

@wetneb
Owner

wetneb commented Feb 6, 2017

@thadguidry: can you point me to the description of what each value of "type_strict" should do?
I can see that it expects "any", "all" or "should", but I'm not sure what they mean. I think the interface currently implements the "any" mode: items have to match at least one type provided (so, the union). I guess "all" means the intersection of the provided types. About "should", I suppose it becomes a soft constraint? How is it specified?

@thadguidry
Contributor

Hmm, it's not used in OpenReconcile...
https://code.google.com/archive/p/open-reconcile/source/default/source

\source-archive\open-reconcile\trunk\reconcile\src\com\googlecode\openreconcile\server\Query.java

	/**
	 * Returns the type strictness of the query.
	 *
	 * @return the string value indicating whether the type is to be strictly enforced (note: this is not used)
	 */
	public String getTypeStrict(){
		return type_strict;
	}

\source-archive\open-reconcile\trunk\reconcile\src\com\googlecode\openreconcile\server\ReconcileMatching.java

I've reached out to David Huynh (original Freebase Reconcile server author) to find out for sure how or if he used to handle that "type_strict" parameter.

@wetneb
Owner Author

wetneb commented Sep 18, 2017

We don't have any idea what this is, so let's just forget about it.

@wetneb
Owner Author

wetneb commented Jan 7, 2018

This would actually be useful for #29.

@wetneb wetneb reopened this Jan 7, 2018
@tfmorris

A lot of what was used in the Freebase reconciliation service derives from capabilities which were available in the Freebase Search API and Freebase Suggest, but the Freebase Search API was always woefully underdocumented with many more capabilities in the code than the docs.

This isn't in the search API docs, but I don't remember if it was in the search API or something implemented separately by Refine.

In any case, the description in the issue is correct, I believe, except that "intersection" isn't how I think of the "all" case. Given a list of types, candidates for "all" are required to have every type on the list, whereas "any" means that only a single type from the list is needed. As you guessed, "should" uses the types for scoring, but doesn't do a hard filter on them.
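
To make the three modes concrete, here is a minimal sketch of the hard-filter part, assuming each candidate carries a set of type identifiers. The helper name `passes_type_filter` is hypothetical and only illustrates the semantics described above; it is not the actual implementation of any reconciliation service.

```python
def passes_type_filter(candidate_types, requested_types, mode="any"):
    """Return True if a candidate survives the hard type filter.

    Hypothetical helper illustrating the "any" / "all" / "should"
    semantics discussed in this issue.
    """
    candidate = set(candidate_types)
    requested = set(requested_types)
    if not requested:
        # No types requested: nothing to filter on.
        return True
    if mode == "any":
        # A single type from the list is enough.
        return bool(candidate & requested)
    if mode == "all":
        # The candidate must have every type on the list.
        return requested <= candidate
    if mode == "should":
        # Types are used for scoring only, never as a hard filter.
        return True
    raise ValueError(f"unknown type_strict value: {mode!r}")
```

With requested types `/people/person` and `/location/citytown`, a candidate typed only as `/people/person` passes under "any" and "should" but is rejected under "all".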

@thadguidry
Contributor

thadguidry commented May 13, 2020

@tfmorris Sorry, I should have updated this issue after I found the info 3 years ago. I did eventually remember that Andi Vajda and I had documented this quite well on the Freebase Wiki, and I had to search my email to find what it was called: "filter constraints". That eventually got translated into the official API docs. Its combining behavior is documented here (scroll down just a bit):
https://developers.google.com/freebase/v1/search-overview#advanced-filtering

The Search API supports a large number of filter constraints to better aim the search at the correct entities.

For example, using a "type" filter constraint, we can show a list of the most notable people in Freebase.

filter=(any type:/people/person)

Filter constraints accept a variety of inputs:

  • Human readable IDs for schema entities or users, for example:
    • /people/person for a type constraint
    • /film for a domain constraint
  • Freebase MIDs, for example:
    • /m/04kr for the same /people/person type constraint
    • /m/010s for the above /film domain constraint
  • Entity names, for example:
    • "person" for a less precise /people/person type constraint
    • "film" for a less precise /film domain constraint

Filter constraints can be classified into a few categories. See the Search Cookbook for more details.

Filter constraints can be freely combined and repeated in the SearchRequest directly. Repeated filter constraint parameters are combined into an OR query. Different filter constraint parameters or groups are combined into an AND query.

For example:

To search for "people or cities named Gore", try:

query=gore
&filter=(any type:/people/person type:/location/citytown)

This combining behavior can be overridden and better controlled with the filter parameter, which offers a richer interface for combining constraints. It is an s-expression, possibly arbitrarily nested, where the operator is one of:

  • any, logically an OR
  • all, logically an AND
  • not
  • should, which can only be used at the top level and which denotes that the constraint is optional. During scoring, matches that don't match optional constraints have their score divided in half for each optional constraint they don't match.

For example:

To match on the /people/person type or the /film domain, try:

query=gore
&filter=(any type:/people/person domain:/film)
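
As a rough illustration of the quoted documentation, the sketch below renders a flat (non-nested) filter s-expression and applies the "divided in half for each optional constraint they don't match" rule for `should`. Both function names are hypothetical, and the base score is assumed to be a plain number; the real Freebase scoring pipeline is not documented in this thread.

```python
VALID_OPERATORS = {"any", "all", "not", "should"}


def build_filter(operator, constraints):
    """Render a flat filter s-expression.

    `constraints` is a list of (kind, value) pairs, for example
    [("type", "/people/person"), ("domain", "/film")].
    Hypothetical helper: nesting is deliberately not handled.
    """
    if operator not in VALID_OPERATORS:
        raise ValueError(f"unknown operator: {operator!r}")
    parts = " ".join(f"{kind}:{value}" for kind, value in constraints)
    return f"({operator} {parts})"


def apply_should_penalty(score, unmatched_optional):
    """Halve the score once per optional constraint that did not match."""
    return score / (2 ** unmatched_optional)
```

For instance, `build_filter("any", [("type", "/people/person"), ("domain", "/film")])` produces the filter string shown above, and a candidate that misses two optional constraints keeps a quarter of its score.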

@diegodlh
Contributor

This would be very useful for the Cita add-on for Zotero, which uses openrefine-wikibase to fetch matching QIDs from Wikidata for bibliographic works.

Strict and flexible type matching are both important in Cita: if matching is too strict, items classified with the wrong type in Zotero may return no matches, and users may create duplicates; if matching is too flexible, the QIDs returned may correspond to items with the same name but of another type (diegodlh/zotero-cita/issues/101), and users may end up working on the wrong item.

As a workaround, I will implement two calls to openrefine-wikibase: one for a specific item type, and a second one for a more general item type; and treat any matches from this second call as partial matches.
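
A minimal sketch of that two-call workaround, under the assumption that a `reconcile(name, type_id)` callable returns a list of candidate dicts with an `"id"` key. That callable, the `partial` flag, and the function name are all hypothetical; they stand in for whatever client code actually talks to openrefine-wikibase.

```python
def reconcile_with_fallback(reconcile, name, specific_type, general_type):
    """Query with a specific type first, then with a more general one.

    Candidates found only by the second, more general query are
    flagged as partial matches. `reconcile` is an assumed callable
    (name, type_id) -> list of dicts that each have an "id" key.
    """
    results = []
    seen = set()
    # First call: strict matches for the specific item type.
    for candidate in reconcile(name, specific_type):
        seen.add(candidate["id"])
        results.append({**candidate, "partial": False})
    # Second call: anything new from the general type is a partial match.
    for candidate in reconcile(name, general_type):
        if candidate["id"] not in seen:
            seen.add(candidate["id"])
            results.append({**candidate, "partial": True})
    return results
```

Passing the client in as a parameter keeps the sketch independent of any particular HTTP library and makes it easy to exercise with a stub.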

@wetneb
Owner Author

wetneb commented Jun 18, 2021

@diegodlh good to know! There are various proposals in this issue about what this parameter should do. It would be super useful if you could describe precisely how this should work for it to be useful for your use case.
