Crawler

A search engine powered by Apache Lucene

1. Overview

This is a simple search engine powered by Lucene, developed with IntelliJ IDEA.

2. Installing

Use Maven to build and install the project.

3. Steps

Step 1: Clone this repository

git clone https://github.com/woyumen4597/lucene.git
cd lucene

Step 2: Maven install

mvn install

Step 3: Configuration

Edit db.properties so that it points to your own database. Then switch to that database and run the following SQL:

CREATE TABLE `task` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `url` varchar(255) DEFAULT NULL,
  `state` tinyint(4) DEFAULT NULL COMMENT '0: not crawled, 1: extracted, 2: extraction failed',
  `update_time` timestamp NULL DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `NewIndex1` (`url`)
) ENGINE=InnoDB AUTO_INCREMENT=26781 DEFAULT CHARSET=utf8;
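
The numeric codes in the `state` column (0: not crawled, 1: extracted, 2: extraction failed) can be easier to work with when mirrored as an enum. The following is only a sketch: `TaskState` and its constant names are hypothetical and are not taken from this repository's code.

```java
/** Hypothetical mirror of the `state` column codes in the `task` table. */
enum TaskState {
    PENDING(0),    // 0: URL has not been crawled yet
    EXTRACTED(1),  // 1: page was extracted
    FAILED(2);     // 2: extraction failed

    private final int code;

    TaskState(int code) {
        this.code = code;
    }

    /** Returns the numeric value stored in the `state` column. */
    int code() {
        return code;
    }

    /** Maps a stored column value back to its enum constant. */
    static TaskState fromCode(int code) {
        for (TaskState state : values()) {
            if (state.code == code) {
                return state;
            }
        }
        throw new IllegalArgumentException("Unknown task state: " + code);
    }
}
```

For example, `TaskState.fromCode(2)` yields `FAILED`, and an unexpected value such as 7 raises an `IllegalArgumentException` instead of being silently accepted.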

Step 4: Run

mkdir indexDir

Then find LuceneApplication and run its main method. Wait a moment, then open http://localhost:8080/api/collect to start collecting the index. While it runs, you can watch the task table in MySQL and inspect indexDir with Luke. Collection may take a long time.
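
If you would rather trigger collection from code than from a browser, a plain GET request to the same URL should behave the same way. The sketch below uses the JDK's `java.net.http.HttpClient` (Java 11 or later). The class `CollectTrigger`, its method names, and the one-hour timeout are assumptions made for illustration; they are not part of this project.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

/** Hypothetical helper that calls the /api/collect endpoint. */
final class CollectTrigger {

    /** Builds the collect URL from a base such as "http://localhost:8080". */
    static URI collectUri(String baseUrl) {
        String trimmed = baseUrl.endsWith("/")
                ? baseUrl.substring(0, baseUrl.length() - 1)
                : baseUrl;
        return URI.create(trimmed + "/api/collect");
    }

    /** Sends the GET request and returns "<status> <body>". */
    static String trigger(String baseUrl) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(collectUri(baseUrl))
                // Assumed generous limit, since collection may take a long time.
                .timeout(Duration.ofHours(1))
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode() + " " + response.body();
    }
}
```

Calling `CollectTrigger.trigger("http://localhost:8080")` blocks until the server responds or the timeout expires, so it is best run from a separate thread or a small command-line tool.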

Step 5: Search

When the page returns ok, open http://localhost:8080 to start searching. Congratulations!

License

Feel free to use, reuse and abuse the code in this project.