Support for string type in indexed fields on Solr #311

Open
dedsm opened this Issue January 04, 2011 · 6 comments

6 participants

dedsm Daniel Lindsley Simon Willison Molanda picturedots Tim Keefer
dedsm

It would be nice to have indexable fields of type string in Solr. Right now, if you want a string field, it either cannot be indexed or it has to be faceted.

Daniel Lindsley
Owner

Can you describe the use case here? I'm not sure how plain strings are useful. The only use I'm aware of would be exact matching, which you can accomplish by making the field faceted and searching in the shadowed facet field (<fieldname>_exact).
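To illustrate the distinction being drawn here, the following is a minimal pure-Python sketch of how a tokenized text field and an untokenized string field (such as the <fieldname>_exact shadow field) respond to the same query. It is a simplified model, not Solr's actual analysis: tokenization is reduced to lowercasing and splitting on whitespace, a text match is assumed to require every query term, and the helper names are hypothetical.

```python
def analyze_text(value):
    # Rough stand-in for a Solr "text" field: lowercase, then split on whitespace.
    return value.lower().split()


def text_field_matches(value, query):
    # Tokenized field: matches if every query term is among the indexed terms.
    indexed_terms = set(analyze_text(value))
    return all(term in indexed_terms for term in analyze_text(query))


def string_field_matches(value, query):
    # Untokenized "string" field (e.g. a <fieldname>_exact shadow field):
    # the whole value is one term, so only an identical query matches.
    return value == query


title = "The Quick Brown Fox"
print(text_field_matches(title, "brown fox"))              # True
print(string_field_matches(title, "brown fox"))            # False
print(string_field_matches(title, "The Quick Brown Fox"))  # True
```

In this model the string field only matches the complete value, which is the behavior the faceted _exact field provides.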

Simon Willison
simonw commented June 18, 2011

I just ran into this problem: I want to order alphabetically by a specific field, but Solr can't sort on fields that have been tokenized. I'll have to use the fieldname_exact hack, which is frustrating because my automated deployments currently use the output of build_solr_schema, so now I'll have to change them to modify the schema.xml on its way through (or save the output of build_solr_schema in source control).

Simon Willison
simonw commented June 18, 2011

Hmm... it looks like I can override the solr.xml template by dropping my own search_configuration/solr.xml file into my template directory. I'll try that for now.

Molanda

I ran into this issue as well. To sort on a CharField, it must be indexed, but an indexed CharField is given the "text" type, which is tokenized by the WhitespaceTokenizerFactory and can therefore throw a java.lang.ArrayIndexOutOfBoundsException when sorted.
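The sorting failure can be modelled roughly in pure Python. The sketch below assumes, as we understand Lucene's sort cache in the Solr versions discussed here, that sorting relies on a single indexed term per document; in this model exactly one term is required, so a whitespace-tokenized title with several terms raises an error instead of producing a guessed key. All function names are hypothetical, and lowercasing plus str.split() only approximates the real analyzers.

```python
def whitespace_terms(value):
    # Rough stand-in for a tokenized "text" field: one lowercased term per word.
    return value.lower().split()


def keyword_terms(value):
    # Rough stand-in for an untokenized field: the whole value is a single term.
    return [value.lower()]


def sort_by_single_term(values, analyze):
    """Sort values by their indexed term; in this model each must have exactly one."""
    keyed = []
    for value in values:
        terms = analyze(value)
        if len(terms) != 1:
            # Solr may fail or misorder here; this sketch raises instead of guessing.
            raise ValueError(f"{value!r} produced {len(terms)} terms; cannot sort")
        keyed.append((terms[0], value))
    return [value for _, value in sorted(keyed)]


titles = ["Zebra Crossing", "apple pie", "Mango"]
print(sort_by_single_term(titles, keyword_terms))
# ['apple pie', 'Mango', 'Zebra Crossing']
```

Calling sort_by_single_term(titles, whitespace_terms) instead raises ValueError on "Zebra Crossing", the model's analogue of the exception described above.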

I was able to make a simple change to my copy of Haystack (1.2.4) so I could specify the field type.

haystack/fields.py - add an index_fieldtype attribute to SearchField
@@ -21,7 +21,7 @@
     def __init__(self, model_attr=None, use_template=False, template_name=None,
                  document=False, indexed=True, stored=True, faceted=False,
                  default=NOT_PROVIDED, null=False, index_fieldname=None,
-                 facet_class=None, boost=1.0, weight=None):
+                 index_fieldtype=None, facet_class=None, boost=1.0, weight=None):
         # Track what the index thinks this field is called.
         self.instance_name = None
         self.model_attr = model_attr
@@ -34,6 +34,7 @@
         self._default = default
         self.null = null
         self.index_fieldname = index_fieldname
+        self.index_fieldtype = index_fieldtype
         self.boost = weight or boost
         self.is_multivalued = False

haystack/backends/solr_backend.py - pass this attribute along to the schema
@@ -360,6 +360,10 @@
                 if field_data['type'] == 'text':
                     field_data['type'] = 'string'

+            # Let the class have the final say on its type.
+            if field_class.index_fieldtype is not None:
+                field_data['type'] = field_class.index_fieldtype
+
             schema_fields.append(field_data)

         return (content_field_name, schema_fields)
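The combined effect of the two hunks on a field's Solr type can be summarized in a small, self-contained sketch. FieldSpec and resolve_solr_type are hypothetical stand-ins, not Haystack classes. The rule that a non-indexed text field becomes string is inferred from the hunk context above (only its last two lines are shown), so treat it as an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldSpec:
    # Hypothetical stand-in for the SearchField attributes the patch touches.
    base_type: str                          # e.g. "text" for a CharField
    indexed: bool = True
    index_fieldtype: Optional[str] = None   # the attribute added by the patch


def resolve_solr_type(field):
    solr_type = field.base_type
    # Assumed existing rule: a non-indexed "text" field is downgraded to "string".
    if not field.indexed and solr_type == "text":
        solr_type = "string"
    # Patched rule: an explicit index_fieldtype has the final say.
    if field.index_fieldtype is not None:
        solr_type = field.index_fieldtype
    return solr_type


print(resolve_solr_type(FieldSpec("text")))                              # text
print(resolve_solr_type(FieldSpec("text", indexed=False)))               # string
print(resolve_solr_type(FieldSpec("text", index_fieldtype="text_sort"))) # text_sort
```

Placing the override after the existing rule is what lets an indexed field keep a sortable type instead of falling back to "text".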

I then added a text_sort field type to my solr.xml template, and I pass index_fieldtype="text_sort" when declaring a CharField that should use this type.

+    <fieldType name="text_sort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
+      <analyzer>
+        <tokenizer class="solr.KeywordTokenizerFactory"/>
+        <filter class="solr.LowerCaseFilterFactory"/>
+        <filter class="solr.TrimFilterFactory"/>
+      </analyzer>
+    </fieldType>
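To see what the text_sort analyzer does to a value, here is a rough Python approximation of its chain (KeywordTokenizerFactory, then LowerCaseFilterFactory, then TrimFilterFactory) plus an ascending sort that mimics sortMissingLast. It is a sketch only: Python's str.lower() and str.strip() are assumed to behave close enough to the Java filters for plain ASCII input, and the function names are hypothetical.

```python
def text_sort_analyze(value):
    """Approximate the text_sort analyzer chain; returns the list of tokens."""
    tokens = [value]                       # KeywordTokenizerFactory: whole input is one token
    tokens = [t.lower() for t in tokens]   # LowerCaseFilterFactory
    tokens = [t.strip() for t in tokens]   # TrimFilterFactory: drop leading/trailing whitespace
    return tokens


def sort_missing_last(values):
    """Ascending sort on the text_sort key; None (no value) goes last, as with sortMissingLast."""
    present = [v for v in values if v is not None]
    missing = [v for v in values if v is None]
    return sorted(present, key=lambda v: text_sort_analyze(v)[0]) + missing


print(text_sort_analyze("  The Beatles "))  # ['the beatles']
print(sort_missing_last(["cherry ", None, "Apple", "  banana split"]))
# ['Apple', '  banana split', 'cherry ', None]
```

Because the keyword tokenizer always emits exactly one token, every document gets a single, case-insensitive sort key.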

picturedots

The use case for me was creating a field that could be used for sorting.

I encountered the same problem: I was unable to determine how to make a sortable field for Solr via Haystack.

My solution was to change the schema at build time, converting the field from type="text" to type="string":
./manage.py build_solr_schema | sed 's/<field name="result_title_sort" type="text"/<field name="result_title_sort" type="string"/' > schema.xml

Details here:
http://stackoverflow.com/questions/7399871/django-haystack-sort-results-by-title/8791793
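For deployments that would rather not depend on sed quoting, the same post-processing step could be done in Python. This is a hypothetical sketch, not part of Haystack: it assumes the generated schema writes the name attribute before type on each <field> element (as the sed pattern above also assumes) and rewrites only the one named field.

```python
import re


def retype_field(schema_xml, field_name, new_type, old_type="text"):
    """Change type="<old_type>" to type="<new_type>" on the <field> named field_name."""
    pattern = re.compile(
        r'(<field\b[^>]*\bname="' + re.escape(field_name) + r'"[^>]*\btype=")'
        + re.escape(old_type)
        + r'"'
    )
    # Keep everything up to and including type=", then swap in the new type.
    return pattern.sub(lambda m: m.group(1) + new_type + '"', schema_xml)


schema = (
    '<field name="result_title" type="text" indexed="true" stored="true" />\n'
    '<field name="result_title_sort" type="text" indexed="true" stored="true" />\n'
)
print(retype_field(schema, "result_title_sort", "string"))
```

It could be wired into the same pipeline by a small wrapper (hypothetically named retype_schema.py) that reads the build_solr_schema output from standard input, calls retype_field, and writes the result to schema.xml.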

The solution described on Stack Overflow works fine and was very easy to implement once I identified the problem, but it was very confusing until I worked out what was going on. So I agree that this is a problem. At the very least, it should be made clearer that text fields generated by vanilla Haystack are not properly sortable in Solr.

Tim Keefer

Alphabetical sorting seems fundamental to a search index. This issue is still open, and it seems our only option is to manually edit the schema file after Haystack generates it. Can someone correct me if there's a better way to handle fields of type "string"?
