A job search engine built using Elasticsearch to illustrate the role of tracking, measurement and evaluation for search quality
crawler
script
_createindex.json
_elasticfeeder.json
_elasticquery.json
define.php
discriminator.php
feeder.php
footer.php
header.php
index.php
license.txt
location.php
logger.php
query.php
readme.txt
search.php

readme.txt

OVERVIEW
- A job search engine with a simple front end built in PHP on top of Elasticsearch. The crawler is written in Perl. The purpose is to illustrate the role of tracking, measurement and evaluation in search quality. The entire process is described in the blog post https://wilsonwong.co/2016/10/06/search-engineering-101.

INSTALLATION
- ActivePerl for Windows
- PHP
- Elasticsearch
- MySQL

CONFIGURATION
- add the following to elasticsearch.yml:
http.cors.allow-origin: "*"
http.cors.enabled: true
node.master: true
index.max_result_window: 200000
- set the values in ./define.php and ./crawler/conf.pl (database connections, Elasticsearch index, target URL for crawling, etc.)
- all the crawled jobs are first stored in ./crawler/rawcontent/
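
NOTE ON NEWER ELASTICSEARCH VERSIONS
- From Elasticsearch 5.0 onwards, index-level settings such as index.max_result_window are not accepted in elasticsearch.yml and have to be applied per index through the index settings API instead. The sketch below is not part of this repository; it only shows one way the same value could be set. The index name "jobs", the address http://localhost:9200 and both function names are assumptions made for illustration, so substitute the index configured in ./define.php.

```python
import json
import urllib.request


def build_settings_request(base_url, index, max_result_window):
    """Build (but do not send) a PUT <index>/_settings request.

    base_url and index are placeholders supplied by the caller; nothing
    here is read from define.php or conf.pl.
    """
    body = {"index": {"max_result_window": max_result_window}}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index}/_settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def apply_settings(request):
    """Send the request to a running Elasticsearch node and return its JSON reply."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Example usage (requires a running node and an existing index):
#   req = build_settings_request("http://localhost:9200", "jobs", 200000)
#   print(apply_settings(req))
```

- Building the request separately from sending it makes it possible to inspect the URL and body before anything is changed on the cluster.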