Wondering if the simple query string query is set to search in all fields or in content field. #39

Closed
JosedeKruif opened this issue Aug 28, 2017 · 6 comments
Labels: question (issues that require more information or discussion - not ready for development)

Comments

@JosedeKruif
Collaborator

Default Field

When not explicitly specifying the field to search on in the query string syntax, the index.query.default_field will be used to derive which field to search on. It defaults to _all field.

If the _all field is disabled and no fields are specified in the request, the simple_query_string query will automatically attempt to determine the existing fields in the index’s mapping that are queryable, and perform the search on those fields.
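
To make the quoted documentation concrete, here is a minimal Python sketch of a simple_query_string request body that names no fields. The function name `simple_query_body` and the example query text are hypothetical, chosen only for illustration; the body is a plain dict, and how it gets sent to Elasticsearch is left out. Because there is no `fields` entry, the choice of fields is left to Elasticsearch, as described in the passage above.

```python
def simple_query_body(query_text):
    """Build a simple_query_string request body without a `fields` entry.

    Hypothetical helper for illustration. With no `fields` given,
    Elasticsearch decides which fields to search (the default field).
    """
    return {
        "query": {
            "simple_query_string": {
                "query": query_text,
            }
        }
    }


# Example query text is made up for this sketch.
body = simple_query_body("bank robbery")
```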

@JosedeKruif added the "question" label (issues that require more information or discussion - not ready for development) on Aug 28, 2017
@jgonggrijp
Member

It is certainly not set to search in the content field (not every corpus may have a field with that exact name, so naturally, the application does not make assumptions about this). As far as I can tell, the _all field is not disabled anywhere in our code, either. Does this sufficiently answer the question?

@BeritJanssen
Member

BeritJanssen commented Sep 14, 2017

I see now that the regular query string query actually seems to have features like the ones described in José's manual, i.e. the user can specify which fields to search by typing "field1:this field2:that" (searches for 'this' in field1 OR 'that' in field2).
https://www.elastic.co/guide/en/elasticsearch/reference/2.3/query-dsl-query-string-query.html
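
As a rough illustration of the syntax mentioned above, the following Python sketch assembles a query_string request body from (field, term) pairs. The helper name `field_term_query` is hypothetical, and the sketch assumes simple terms that need no escaping of reserved characters. Clauses are joined with a space, which the query string query treats as OR unless a different default operator is configured.

```python
def field_term_query(field_terms):
    """Build a query_string body from (field, term) pairs.

    Hypothetical helper for illustration. Assumes the terms contain no
    reserved query-string characters, so no escaping is done.
    """
    clauses = ["{}:{}".format(field, term) for field, term in field_terms]
    return {
        "query": {
            "query_string": {
                # Space-separated clauses are combined with OR by default.
                "query": " ".join(clauses),
            }
        }
    }


body = field_term_query([("field1", "this"), ("field2", "that")])
```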

@JosedeKruif
Collaborator Author

OK. That means I might have made the wrong decision in choosing the simple query string query. Should this be changed? Or should some code be written to enable users to search in specified fields? Or, as I think Berit mentioned: should we offer a button in the user interface to enable users to search in a specified field? (That means writing extra code, right?)

@jgonggrijp
Member

Part of the confusion may come from the fact that the old search field actually suggested that the user enter field:term syntax. I don't know whether the new Angularized frontend is still doing that. In any case, it made me believe that the field:term syntax was available, too, until I read the simple query string documentation.

As to what to do, I think this depends on what use cases we believe need to be supported and which ones not. Searching the whole query in all fields at the same time seems like an important use case to me, which most users will find convenient. At least Ortal-Paz Saar was very positive about this way of searching in a corpus. In fact, she said that she thought that if searching was in all fields, she probably didn't need much other functionality. (@JelmerVNuss do you recall the same?)

A step upwards in terms of user control would be to apply the whole query only to one field or a subset of the fields. This is what would be achieved by providing a <select> dropdown or a set of checkboxes in the search panel, as Berit suggested. Yes, this would amount to writing extra code, though not necessarily a lot. I am not sure that users actually need this, but maybe they do.

A second step upwards would be to enable users to search some parts of the query in one field and other parts in another field. This is what the field:term format in the non-simple query string format provides. I doubt that this is actually in the interest of the end user; while it theoretically gives them more control, in practice I suspect that it mostly gives them more ways to shoot themselves in the foot. At least, this is what we saw with Texcavator. While researchers may in principle be intelligent enough to use the additional control, Daniel Kahneman predicts that most of the time they will opt out (unconsciously) of applying that intelligence.

So I think these are the options:

  1. Accept that searching is always in all fields and simply leave the field:term syntax out of the documentation.
  2. Allow users to restrict the search to a set of fields, with user-friendly checkboxes to select that set.
  3. Switch to non-simple query strings, accept that most users probably either don't use it or use it incorrectly. I'm somewhat against this last option because of KISS.
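
For what it is worth, option 2 could be sketched in Python roughly as follows. This is only an assumption about how such a restriction might look, not the application's actual code: the function name `build_search_body` and the field names in the example are hypothetical. The idea is that the checkboxes yield a list of field names, which is passed as the `fields` parameter of simple_query_string; with nothing selected, the `fields` entry is omitted and the search behaves as in option 1.

```python
def build_search_body(query_text, selected_fields=None):
    """Build a simple_query_string body, optionally restricted to fields.

    Hypothetical helper for illustration. `selected_fields` stands for
    whatever the user ticked in the search panel; None or an empty
    selection means the search is not restricted.
    """
    simple_query = {"query": query_text}
    if selected_fields:
        # Restrict the whole query to the chosen fields only.
        simple_query["fields"] = list(selected_fields)
    return {"query": {"simple_query_string": simple_query}}


# Field names here are made up; a real corpus defines its own.
unrestricted = build_search_body("election")
titles_only = build_search_body("election", ["title"])
```

The whole query still goes to every selected field at once, so users never have to type field names themselves.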

@JosedeKruif
Collaborator Author

Options:

  1. Leaving the field:term syntax out of the documentation is not wise. Ortal-Paz is not a typical user; her texts are like short sentences. One might want to know what newspapers are writing in their article titles, for instance. You would want to search in title: only.
  2. Might be the best option: it offers flexibility and prevents confusion at the same time.
  3. Kahneman is right...

@BeritJanssen
Member

This issue is closed per #133. Julian implemented option 2.
