Trying to get WebSolr working consistently with Heroku

Solr PDF Search Example in Rails 3.2

I spent the majority of a day bashing my head into a wall trying to get Solr to index PDF files with Rails. I am not a Java developer, so grokking the Apache Solr documentation was a bit difficult. Twenty different blog posts all said slightly different things, and none of them pointed out the pitfalls.

I ran into 500 errors, PDF searches returning nothing, gems fighting each other, etc.

This example application will get you up and running with Solr working for PDF indexing.

My goal was to make an app that worked with Rails 3.2, Amazon AWS/S3, and Solr for indexing (searching) PDF files.

To get started, clone this repo and install the needed gems. Beware that it's around 40MB because of all the .jar files in the /solr/lib directory.

git clone
cd Solr-Rails-3.2-Example
bundle install
cp .env_example .env
rake db:migrate

I am using the 'dotenv' gem from bkeepers so that WEBrick and the Rails console can easily access environment variables. Foreman can also be used for this, but then the Rails console stuff gets funny.

Edit the .env file with your Amazon credentials from
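
For anyone curious what loading a .env file amounts to, here is a minimal sketch in plain Ruby. It is not the dotenv gem's actual implementation (the real gem handles more cases), and the variable names in the usage note are assumptions, since the contents of .env_example are not shown here. Like dotenv's default behavior, it leaves variables that are already set untouched.

```ruby
# A simplified sketch of a dotenv-style loader. The real gem handles more
# cases (export prefixes, interpolation, multi-line values).

# Parse KEY=VALUE lines into a hash, skipping blank lines and comments.
def parse_env(text)
  text.each_line.with_object({}) do |line, vars|
    line = line.strip
    next if line.empty? || line.start_with?('#')

    key, value = line.split('=', 2)
    next if value.nil? # not a KEY=VALUE line

    # Strip one pair of surrounding quotes, if present.
    vars[key.strip] = value.strip.gsub(/\A["']|["']\z/, '')
  end
end

# Copy parsed values into env without overriding anything already set.
def load_env(path, env = ENV)
  return {} unless File.exist?(path)

  parse_env(File.read(path)).each { |key, value| env[key] ||= value }
end
```

Calling load_env('.env') from a script would then make something like ENV['AWS_ACCESS_KEY_ID'] available, assuming that is what the key is called in your .env file.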

To start Solr:

rake sunspot:solr:start

To stop Solr:

rake sunspot:solr:stop

Things to watch out for:

  1. I was accidentally running Solr twice. This created a port conflict and caused all sorts of weirdness. Double-check whether an additional instance of Solr is running with
ps aux | grep java
  2. The versions of the .jar files in /solr/lib matter quite a bit. I tried updating pdfbox to version 1.7.x. This was a no-go and caused errors immediately.
  3. The versions of the gems in your Gemfile really seem to matter, as do the git repos they point to. Development of the Solr (and even more so the SolrCell) Ruby libraries has been all over the place. Change these with the utmost caution.
  4. There are a few specific things in the /solr/conf/schema.xml file that enable the PDF search, specifically lines 95, 235 and 236. If you install Solr elsewhere, make sure those lines go into the schema.xml file. Otherwise, I didn't change anything in the configuration.
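
The duplicate-instance problem in the first item can also be checked from Ruby by testing whether something is already listening on the Solr port. This is a rough sketch, not part of this repo. The helper name port_in_use? is made up for illustration, and the port in the usage note (8982) is an assumption based on Sunspot's usual development default, so check config/sunspot.yml for the real value.

```ruby
require 'socket'

# Returns true if something accepts TCP connections on host:port.
# Hypothetical helper for illustration; it only tells you the port is taken,
# not that the listener is actually Solr.
def port_in_use?(port, host = '127.0.0.1')
  Socket.tcp(host, port, connect_timeout: 1) { true }
rescue Errno::ECONNREFUSED, Errno::ETIMEDOUT, Errno::EHOSTUNREACH
  false
end
```

Running port_in_use?(8982) before rake sunspot:solr:start gives a quick hint that an instance may already be up.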

You can test that Solr is working by doing the following:

rails s

Go to http://localhost:3000/documents and upload a new document with a title. Now go to http://localhost:3000/documents?search=foobar, where foobar is a term you know is in the file you uploaded.
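
If the search term contains spaces or punctuation, it needs to be escaped in the query string. A small sketch of building that URL in Ruby follows. The helper name search_url is made up for illustration, and the host and port defaults simply mirror the local URLs above.

```ruby
require 'uri'

# Build the search URL for the documents index, escaping the term.
# Hypothetical helper; host and port default to the local rails server.
def search_url(term, host: 'localhost', port: 3000)
  URI::HTTP.build(
    host: host,
    port: port,
    path: '/documents',
    query: URI.encode_www_form(search: term)
  ).to_s
end
```

With the server running, passing URI(search_url('foobar')) to Net::HTTP.get_response (after require 'net/http') should fetch the same page you would see in the browser.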

You can also run a search from the console:

rails c
r = Document.search { keywords 'foobar' }  # assumes the model is named Document


  • Add a simple search box to the index
  • Format this document better
  • Ensure that this all works with WebSolr on Heroku as well.