layout | title | section |
---|---|---|
page | Implementing Search with WebSolr | Small Topics |
Many of the strongest search engines for applications run on Java. Solr, from the Apache Software Foundation, has emerged as a favorite in the Rails community.
Managing a second application running in a different language and virtual machine can be a headache. WebSolr, http://websolr.com/, emerged as an easy way to outsource the running of your search engine. Though it runs as an external service, it is completely transparent to your user and, generally, transparent to the developer.
Let's look at making use of it from a Rails application.
The Sunspot gem, http://outoftime.github.com/sunspot/, is the most comprehensive Ruby interface to Solr-powered search engines.
To install Sunspot and its dependencies, add `sunspot_rails` to your `Gemfile` and run `bundle`.
You can optionally set up a local Solr instance by installing the `sunspot_solr` gem, which bundles a Solr distribution. To get the latest beta version, install with this command:
gem install sunspot_solr -v 1.3.0.rc3 --pre
Then start the server with `sunspot-solr start`.
At the time of this writing, I had trouble getting `sunspot_solr` to run correctly.
Solr is also available as a package in Homebrew for OS X (`brew install solr`) or Ubuntu's apt (`apt-get install solr`).
To set up your Heroku application to make use of WebSolr, run this command from your project directory:
heroku addons:add websolr
The lowest-tier WebSolr package is a $20/month add-on.
There are two options for telling your application and the Sunspot library how to find the Solr server.
Sunspot will look for and use a `WEBSOLR_URL` environment variable if available.
When you use the WebSolr add-on, this is automatically managed for you.
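The environment-variable lookup can be pictured with a small sketch. This is not Sunspot's own code: `solr_url` is a hypothetical helper, and the local fallback URL is an assumption chosen for illustration.

```ruby
require 'uri'

# Assumed local fallback; adjust to wherever your development Solr listens.
DEFAULT_SOLR_URL = 'http://localhost:8983/solr'

# Hypothetical helper (not part of Sunspot): prefer WEBSOLR_URL when it is
# set and non-blank, otherwise fall back to the local default.
def solr_url(env = ENV)
  url = env['WEBSOLR_URL']
  url = DEFAULT_SOLR_URL if url.nil? || url.strip.empty?
  URI.parse(url)
end

local = solr_url({})
puts "#{local.host}:#{local.port}#{local.path}"
```

Passing the environment in as an argument keeps the helper easy to exercise without touching the real `ENV`.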
If you want to set up the configuration information in the application, generate a config file by running:
rails generate sunspot_rails:install
That will create a `config/sunspot.yml` file where you can set the host and port.
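As a rough picture of what such a file holds, the sketch below parses a sample configuration with Ruby's standard YAML library. The layout (one `solr` section per environment with `hostname` and `port` keys) and the host names are assumptions for illustration; check the file your generator actually writes, since keys may differ between Sunspot versions.

```ruby
require 'yaml'

# Assumed shape of config/sunspot.yml, inlined here so the example runs alone.
SAMPLE_CONFIG = <<~YAML
  development:
    solr:
      hostname: localhost
      port: 8982
  production:
    solr:
      hostname: solr.example.com
      port: 8983
YAML

# Hypothetical helper: return the Solr settings for one environment.
def solr_settings(yaml_text, environment)
  config = YAML.safe_load(yaml_text)
  config.fetch(environment).fetch('solr')
end

settings = solr_settings(SAMPLE_CONFIG, 'production')
puts "#{settings['hostname']}:#{settings['port']}"
```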
Once you have Solr running and Sunspot set up, you need to tell it how to index your model data.
In the model, call the `searchable` method and pass a block. In the block, we call methods specifying the type and name of attributes to index. For instance:
class Article < ActiveRecord::Base
  searchable do
    text :title
    text :body
    time :published_at
  end
end
Then Sunspot will index each of these three fields in Solr.
The following indexing methods are available:
- `text`: breaks the data into individual keywords
- `string`: indexes the data as a single string
- `time`: datetime fields
- `integer`: numeric fields, especially foreign keys
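The difference between `text` and `string` is easiest to see side by side. The following is a toy illustration in plain Ruby, not what Solr does internally (its analyzers are considerably more sophisticated); both helper names are hypothetical.

```ruby
# Toy model of a `text` field: lowercase the value and split it into keywords.
def index_as_text(value)
  value.downcase.scan(/[a-z0-9]+/)
end

# Toy model of a `string` field: the whole value is kept as one term.
def index_as_string(value)
  [value]
end

title = 'Implementing Search with WebSolr'
text_terms = index_as_text(title)
string_terms = index_as_string(title)

puts text_terms.include?('search')   # a keyword lookup can match a text field
puts string_terms.include?('search') # a string field only matches the exact value
```

Under this model, `string` suits exact-match filtering and sorting, while `text` suits keyword search.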
If you index multiple fields, like the title and the body here, then it's likely some components are more important than others.
For instance, you might want to promote matches in the title more highly than matches in the body. You can add the `default_boost` parameter, like this:
searchable do
  text :title, :default_boost => 2
  text :body
end
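To build intuition for what a boost does to ranking, here is a deliberately simplified scoring sketch. It is not Solr's relevance formula: it just counts term occurrences per field and multiplies by an assumed boost, and `toy_score` and `BOOSTS` are hypothetical names.

```ruby
# Assumed per-field boosts mirroring the example above (title counts double).
BOOSTS = { title: 2.0, body: 1.0 }

# Toy relevance score: occurrences of the term in each field, times the boost.
def toy_score(document, term)
  BOOSTS.sum do |field, boost|
    occurrences = document.fetch(field).downcase.scan(/\w+/).count(term.downcase)
    occurrences * boost
  end
end

title_match = { title: 'Hello world', body: 'An introduction' }
body_match  = { title: 'Introduction', body: 'We say hello here' }

puts toy_score(title_match, 'hello')
puts toy_score(body_match, 'hello')
```

With these inputs the title match scores twice as high as the body match, which is the effect the boost is meant to produce.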
By default, Sunspot will update the index whenever an object is created, saved, or destroyed.
This is easy, but in production it can slow your application down because it happens during the request/response cycle. Instead, it'd be better to push the index updating to an asynchronous worker process.
Sunspot has a built-in capability to use background workers, triggered by calling `handle_asynchronously :solr_index`:
class Article < ActiveRecord::Base
  searchable do
    text :title
    text :body
  end
  handle_asynchronously :solr_index
end
The only catch is that this relies on the Heroku default background job queue, `delayed_job`. If you're using Resque instead, try the following code written by the author of Sunspot: https://gist.github.com/659188
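The idea behind asynchronous indexing can be sketched with nothing but Ruby's standard `Queue` and a thread. This is a stand-in for a real job queue such as `delayed_job`, not how either library is implemented; `save_article`, `INDEX_QUEUE`, and `INDEXED_IDS` are hypothetical names.

```ruby
# In-process stand-ins for a job queue and for the search index.
INDEX_QUEUE = Queue.new
INDEXED_IDS = []

# The "request": it only enqueues the indexing work and returns right away.
def save_article(id)
  INDEX_QUEUE << id
  :saved
end

# The "worker": runs outside the request/response cycle. A nil job stops it.
worker = Thread.new do
  while (id = INDEX_QUEUE.pop)
    INDEXED_IDS << id # a real worker would send the document to Solr here
  end
end

save_article(1)
save_article(2)
INDEX_QUEUE << nil # signal the worker to finish
worker.join
puts INDEXED_IDS.inspect
```

The trade-off is that the index lags slightly behind the database, so a freshly saved record may not be searchable until the worker catches up.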
You've set up the server and indexed the data; now you can actually run queries. Use the `search` class method and pass in a block.
A basic search might look like this:
search_result = Article.search { keywords 'hello' }
The block passed to search can be more specific, too:
search_result = Article.search do
  fulltext 'hello world'
  with(:published_at).less_than Time.now
  order_by :published_at, :desc
end
There are many more options and techniques that can be used to refine the search results. For information on them, check out the Sunspot gem API.
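To make the three parts of the more specific query concrete (keyword match, date restriction, descending order), here is an in-memory approximation in plain Ruby. It does not use Sunspot or Solr, it matches a single keyword by substring rather than doing real full-text analysis, and `ToyArticle` and `toy_search` are hypothetical names.

```ruby
# Hypothetical stand-in for the Article model; not an ActiveRecord class.
ToyArticle = Struct.new(:title, :published_at)

def toy_search(articles, term, now)
  # Roughly `fulltext term`: keep articles whose title mentions the keyword.
  matching = articles.select { |a| a.title.downcase.include?(term.downcase) }
  # Roughly `with(:published_at).less_than now`: drop future-dated articles.
  published = matching.select { |a| a.published_at < now }
  # Roughly `order_by :published_at, :desc`: newest first.
  published.sort_by(&:published_at).reverse
end

now = Time.utc(2012, 1, 10)
articles = [
  ToyArticle.new('Hello, first post', Time.utc(2012, 1, 1)),
  ToyArticle.new('Unrelated news', Time.utc(2012, 1, 2)),
  ToyArticle.new('Hello again', Time.utc(2012, 1, 5)),
  ToyArticle.new('Hello from the future', Time.utc(2012, 1, 20))
]

puts toy_search(articles, 'hello', now).map(&:title).inspect
```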
Once you execute a search you have access to both the matched objects and metadata about the search itself.
Call the `.results` method to get back the ordered set of search results:
search_result = Article.search { keywords 'hello' }
@articles = search_result.results
These are just your normal domain objects with no metadata.
If you're interested in the metadata, use the `.hits` method. The Sunspot wiki has two great examples of ways you could use the metadata along with the matched objects, adapted below.
We can use the `each_hit_with_result` method to iterate through the match data and the matched objects. Call the `.score` method for the numeric quality-of-match indicator. Here's how we might output it in the results:
<div class="results">
  <% @search.each_hit_with_result do |hit, article| -%>
    <div class="result">
      <h2><%= article.title %></h2>
      <div class="score"><%= hit.score %></div>
      <p><%= article.body %></p>
    </div>
  <% end -%>
</div>
Or, you could highlight the fragment of the object which matched the search:
<div class="results">
  <% @search.each_hit_with_result do |hit, article| -%>
    <div class="result">
      <h2><%= article.title %></h2>
      <p class="summary"><%= hit.highlight(:body).format { |fragment| content_tag(:em, fragment) } %></p>
    </div>
  <% end -%>
</div>
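The highlighting shown in that view amounts to wrapping the matched words in a tag. The sketch below imitates that effect in plain Ruby so it can be tried without a Solr server. It is not how Sunspot or Solr compute highlights, and `toy_highlight` is a hypothetical helper.

```ruby
require 'cgi'

# Toy highlighter: HTML-escape the text, then wrap whole-word,
# case-insensitive matches of the term in <em> tags.
def toy_highlight(text, term)
  escaped = CGI.escapeHTML(text)
  escaped.gsub(/\b#{Regexp.escape(term)}\b/i) { |match| "<em>#{match}</em>" }
end

puts toy_highlight('Say hello to Solr', 'hello')
```

Escaping before wrapping means any markup in the source text is shown literally, while the inserted `<em>` tags remain real HTML.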
- Heroku DevCenter on WebSolr: http://devcenter.heroku.com/articles/websolr
- Sunspot quickstart: https://github.com/sunspot/sunspot/wiki/Adding-Sunspot-search-to-Rails-in-5-minutes-or-less
- Working with Sunspot Results: https://github.com/sunspot/sunspot/wiki/Working-with-search
- WebSolr Add-On Service Levels: http://addons.heroku.com/websolr