
Excluding Resources From Indexing

Andy Jackson edited this page Sep 4, 2015 · 1 revision

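According to the code, exclusions are switched on by setting

warc.index.exclusions.enabled = true

and then pointing the indexer at an exclusions file and giving it a check interval:

warc.index.exclusions.file = /path/to/file.txt
warc.index.exclusions.check_interval = 600

Under the hood, this uses StaticMapExclusionFilterFactory.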

It is currently not clear how best to handle this when running under Hadoop Map-Reduce.

Alternatively, the completed Solr index can be cleaned up using delete queries. This is probably a sensible final step anyway, to make sure the problematic content is not there.
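
As a rough illustration of that clean-up step, here is a minimal Python sketch that builds a Solr delete-by-query request using Solr's JSON update format. The Solr URL, the collection name and the `url` field are assumptions made for the example, not something this page specifies, so adjust them to match the actual index schema. The helper names are hypothetical too.

```python
import json
import urllib.request

# Characters with special meaning in the Solr query syntax, plus space.
_SOLR_SPECIAL = set('\\+-!(){}[]^"~*?:/&| ')


def escape_term(text):
    """Backslash-escape Solr query syntax characters in a literal term."""
    return "".join("\\" + c if c in _SOLR_SPECIAL else c for c in text)


def build_delete_request(solr_update_url, query):
    """Build (but do not send) a delete-by-query request for Solr.

    solr_update_url is the collection's /update endpoint; commit=true is
    appended so the deletion becomes visible once the request is processed.
    """
    body = json.dumps({"delete": {"query": query}}).encode("utf-8")
    return urllib.request.Request(
        solr_update_url + "?commit=true",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Assumed endpoint and field name, for illustration only.
    update_url = "http://localhost:8983/solr/collection1/update"
    # Prefix query: everything whose (assumed) url field starts with this.
    query = "url:" + escape_term("http://example.com/private/") + "*"
    request = build_delete_request(update_url, query)
    print(request.full_url)
    print(request.data.decode("utf-8"))
    # To actually run the deletion against a live index:
    # with urllib.request.urlopen(request) as response:
    #     print(response.status)
```

Building the request separately from sending it makes it easy to print and review the query before anything is deleted, which seems prudent for an irreversible operation.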