Solr Installation

benallenallen edited this page Apr 5, 2017 · 17 revisions

Installing Solr

Solr is the full-text index used by HPI/OpenContent to allow fast attribute and full-text searching of all of the content in your Hadoop or standalone Solr repository.

In an HBase environment, it is recommended that the Solr server be separate from your HBase servers in production. In small environments, it can safely be co-located with your application server (OpenContent/HPI Tomcat).

In a standalone Solr environment, it is recommended to install Solr on a separate server from your OpenContent server.

It is also possible to deploy Solr on more than one server to replicate and/or shard your index. This guide will install a single node and single shard of Solr.

Installing Solr

There are several ways to install and run Solr, so pick the option that applies to your OS. All of them require downloading Solr from http://lucene.apache.org/solr/ (either the zip or the tgz). The examples in this guide use version 5.5.3.

Option 1 [Linux Only] - Install and run Solr as a Service on Linux:

tar xzf solr-5.5.3.tgz solr-5.5.3/bin/install_solr_service.sh --strip-components=2
sudo bash ./install_solr_service.sh solr-5.5.3.tgz

This will install Solr as a service and start it in standalone mode (not SolrCloud) with no collections.

If you want to force the Solr service to start in SolrCloud mode connected to an external Zookeeper quorum (recommended for production), add the following environment variable on the Linux server:

ZK_HOST=zk1,zk2,zk3
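
With the service install from Option 1, one place this variable is commonly set is the include file that the install script creates, which is /etc/default/solr.in.sh by default. The fragment below is a minimal sketch, assuming that default location and three placeholder Zookeeper hosts named zk1, zk2, and zk3 listening on the default client port 2181:

```shell
# Fragment for /etc/default/solr.in.sh (assumed default include file of the service install).
# zk1, zk2, zk3 are placeholder host names; replace them with your own Zookeeper servers.
ZK_HOST="zk1:2181,zk2:2181,zk3:2181"
```

After changing the file, restart the service (for example with sudo service solr restart) so that the setting takes effect.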

Option 2 - Extract and run 'on-demand' standalone Solr

This will simply extract Solr somewhere and start it as a standalone Solr instance (not SolrCloud).

  1. Extract Solr somewhere on your filesystem (Ex: /opt/solr-5.5.3 or C:\Apache\solr-5.5.3)
  2. Navigate to the bin directory (Ex: /opt/solr-5.5.3/bin or C:\Apache\solr-5.5.3\bin)
  3. Run the command: solr start
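
To confirm that the instance actually came up, you can run solr status from the same bin directory. For scripted setups, the following is a rough sketch of a wait loop that polls Solr's system info endpoint with curl. It assumes the default port 8983 and that curl is installed; the function names solr_info_url and wait_for_solr are made up for this example.

```shell
# Build the URL of Solr's system info endpoint; host and port default to localhost:8983.
solr_info_url() {
  printf 'http://%s:%s/solr/admin/info/system?wt=json\n' "${1:-localhost}" "${2:-8983}"
}

# Poll the endpoint once per second until it answers or the attempts run out.
# Arguments: host, port, maximum attempts (default 30). Returns 0 once Solr answers.
wait_for_solr() {
  url=$(solr_info_url "${1:-}" "${2:-}")
  attempts=${3:-30}
  while [ "$attempts" -gt 0 ]; do
    if curl -sf "$url" >/dev/null 2>&1; then
      return 0
    fi
    attempts=$((attempts - 1))
    sleep 1
  done
  return 1
}

# Example: wait_for_solr localhost 8983 60 && echo "Solr is up"
```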

Option 3 - Extract and run Solr 'on-demand' as SolrCloud with an Embedded Zookeeper

This will extract Solr somewhere and start it in 'SolrCloud' mode with an embedded Zookeeper instance (not recommended for production).

  1. Extract Solr somewhere on your filesystem (Ex: /opt/solr-5.5.3 or C:\Apache\solr-5.5.3)
  2. Navigate to the bin directory (Ex: /opt/solr-5.5.3/bin or C:\Apache\solr-5.5.3\bin)
  3. Run the command: solr start -c
  4. You now have a single running Solr server, running as a SolrCloud instance with an embedded Zookeeper.

Option 4 - Extract and run Solr 'on-demand' as SolrCloud leveraging an External Zookeeper

This will extract Solr somewhere and start it in "SolrCloud" mode, connecting to an external Zookeeper quorum (see https://github.com/tsgrp/OpenContent/wiki/Solr-Installation#install-an-external-zookeeper-cluster).

  1. Extract Solr somewhere on your filesystem (Ex: /opt/solr-5.5.3 or C:\Apache\solr-5.5.3)
  2. Navigate to the bin directory (Ex: /opt/solr-5.5.3/bin or C:\Apache\solr-5.5.3\bin)
  3. Run the command: solr start -c -z zk1:2181,zk2:2181,zk3:2181 where zk1, zk2, and zk3 are the IPs/server names of the RUNNING external Zookeeper instances.
  4. Solr will start and register with this Zookeeper quorum as a SolrCloud node.
  5. You must start ALL separate Solr instances with the -z option BEFORE creating any collections, so that the replicas and shards can be balanced properly according to your requested replication factor and sharding.
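
Because every Solr server has to be started with the same -z value, it can help to build that value in one place. The following is a small sketch of a helper that joins host names into a connection string. It assumes the default Zookeeper client port 2181 for any host given without a port; zk_connect_string is a made-up name.

```shell
# Build the value for the -z option from a list of Zookeeper host names.
# A host given without a port gets the default client port 2181 appended.
zk_connect_string() {
  result=""
  for host in "$@"; do
    case "$host" in
      *:*) entry="$host" ;;        # port already supplied, keep as-is
      *)   entry="$host:2181" ;;   # assume the default client port
    esac
    if [ -z "$result" ]; then
      result="$entry"
    else
      result="$result,$entry"
    fi
  done
  printf '%s\n' "$result"
}

# Example (run from the Solr bin directory on every Solr server):
#   ./solr start -c -z "$(zk_connect_string zk1 zk2 zk3)"
```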

Installing our 'collection' to Solr

Once the Solr instance is up and running (whether it is standalone, SolrCloud with an embedded Zookeeper, or SolrCloud with an external Zookeeper), run the following command to create a collection using TSG's schema, which is tuned for performance and full-text search. It is best practice to start all of the members of a SolrCloud cluster before creating the collection, because the replicas/shards are balanced most efficiently when all instances are up at creation time. Copy TSG's solr-core-sample configuration to a location on the Solr server that the script can read, and update the path in the command below to match.

Note: On Linux, be sure you run this command as the 'solr' user if you used the service install, as other users will not have permissions.

Note 2: If you are running a SolrCloud instance and want to use replication/sharding, add the -rf parameter to set the replication factor. For example, -rf 2 sets a replicationFactor of 2: solr create -c hadoop -d /home/ubuntu/solr-core-sample -rf 2

Linux

  1. Run sudo su - solr to 'switch users' and become the solr user (skip this step if you are running Solr on-demand rather than as a service)
  2. Navigate to the Solr bin directory (Linux default: /opt/solr/bin)
  3. Run the command: solr create -c hadoop -d /home/ubuntu/solr-core-sample
  4. Run logout to return to your original user

Windows

solr create -c hadoop -d C:\dev\HPI\hpiTrunk\demo-setup\solr\solr-core\solr-core-sample
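
The variations of the create command described above can be sketched as a small helper that prints the command for review before you run it. This is only an illustration: solr_create_command is a made-up name, the paths are the examples from this page, and the -rf and -shards options only have an effect in SolrCloud mode.

```shell
# Print (not run) a "solr create" command for the given settings.
# Arguments: collection name, config directory,
#            replication factor (optional), shard count (optional).
# Note: paths containing spaces would need quoting before the output is run.
solr_create_command() {
  name=$1
  confdir=$2
  rf=${3:-}
  shards=${4:-}
  cmd="solr create -c $name -d $confdir"
  if [ -n "$rf" ]; then
    cmd="$cmd -rf $rf"            # replication factor (SolrCloud only)
  fi
  if [ -n "$shards" ]; then
    cmd="$cmd -shards $shards"    # number of shards (SolrCloud only)
  fi
  printf '%s\n' "$cmd"
}

# Example: solr_create_command hadoop /home/ubuntu/solr-core-sample 2
```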

Advanced Solr Installation Settings

Install an external Zookeeper Cluster

Use this in a production environment where you want three external Zookeeper instances to coordinate your SolrCloud cluster:

  1. Download Zookeeper from https://zookeeper.apache.org/releases.html
  2. Extract somewhere on your filesystem (Ex: /opt/zookeeper-3.4.9)
  3. Navigate to the conf directory (Ex: /opt/zookeeper-3.4.9/conf)
  4. Create a text file named zoo.cfg in this folder with the following:
    clientPort=2181
    initLimit=5
    syncLimit=2
    server.1=serverIp1:2888:3888
    server.2=serverIp2:2888:3888
    server.3=serverIp3:2888:3888
    
Where serverIp1, serverIp2, and serverIp3 are the IP addresses of the servers in your Zookeeper quorum.

  5. Navigate to the bin directory (Ex: /opt/zookeeper-3.4.9/bin)
  6. Start the Zookeeper instance by running the command: ./zkServer.sh start
  7. Rinse and repeat for each server in the quorum
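
Note that a replicated Zookeeper setup also expects a dataDir entry in zoo.cfg and, on each server, a file named myid inside that directory holding the server's number (the N in its server.N line). The following is a minimal sketch of a generator for the file, assuming a hypothetical data directory such as /var/lib/zookeeper and the tickTime value used in Zookeeper's sample configuration; make_zoo_cfg is a made-up name.

```shell
# Print a zoo.cfg for the given data directory and list of server addresses.
# The first argument is the data directory; the rest are server IPs or host names.
make_zoo_cfg() {
  data_dir=$1
  shift
  printf 'tickTime=2000\n'            # value from Zookeeper's sample config
  printf 'dataDir=%s\n' "$data_dir"   # must exist and be writable by Zookeeper
  printf 'clientPort=2181\n'
  printf 'initLimit=5\n'
  printf 'syncLimit=2\n'
  id=1
  for server in "$@"; do
    printf 'server.%s=%s:2888:3888\n' "$id" "$server"
    id=$((id + 1))
  done
}

# Example for the second server of a three-node quorum (paths are assumptions):
#   make_zoo_cfg /var/lib/zookeeper serverIp1 serverIp2 serverIp3 > /opt/zookeeper-3.4.9/conf/zoo.cfg
#   echo 2 > /var/lib/zookeeper/myid
```

Once all servers are started, running ./zkServer.sh status on each one reports whether it is acting as leader or follower, which is a quick way to check that the quorum has formed.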