An Improvement of SMAPH-S for Entity Linking of Web Queries

Abstract:

SMAPH-S is a precursor of SMAPH-2, a state-of-the-art system for joint entity mention detection and linking in web queries. Both systems annotate queries with a piggyback approach: a set of candidate entities is drawn directly from Bing search results or from annotations of Bing snippets, so their performance depends heavily on the accuracy of Bing itself. Our system improves on SMAPH-S by systematically detecting queries that produce uninformative Bing results and rewriting them to extract better candidate entities. To this end, we split query strings into smaller chunks based on their linking probability. We also improve the way mention candidates are generated so that the system can handle noisy input, which is very common in web queries. Finally, we report the results of experimenting with different regressors in the pruning phase, such as probabilistic logistic regression and AdaBoost.
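
The query splitting described in the abstract can be illustrated with a small sketch. It is not necessarily how our annotator does it. The sketch assumes a hypothetical dictionary `link_prob` that maps a lowercase n-gram to its linking probability, meaning the fraction of its occurrences in a reference corpus (such as Wikipedia) in which it appears as a link anchor. It then uses dynamic programming to choose the segmentation whose chunks have the highest total linking probability. The function name `split_by_link_prob` and the `max_len` cap on chunk length are illustrative assumptions.

```python
from functools import lru_cache


def split_by_link_prob(tokens, link_prob, max_len=4):
    """Split a tokenized query into chunks with maximal total linking probability.

    tokens:    list of query tokens, e.g. ["new", "york", "pizza"]
    link_prob: hypothetical dict from a lowercase n-gram to its linking
               probability; n-grams missing from the dict score 0.0
    max_len:   longest chunk (in tokens) that is considered
    """
    n = len(tokens)

    @lru_cache(maxsize=None)
    def best(i):
        # Returns (total score, chunks) for the best split of tokens[i:].
        if i == n:
            return 0.0, ()
        best_score, best_chunks = None, ()
        for j in range(i + 1, min(n, i + max_len) + 1):
            chunk = " ".join(tokens[i:j]).lower()
            rest_score, rest_chunks = best(j)
            score = link_prob.get(chunk, 0.0) + rest_score
            # Strict ">" keeps the shorter first chunk when scores tie.
            if best_score is None or score > best_score:
                best_score, best_chunks = score, (chunk,) + rest_chunks
        return best_score, best_chunks

    return list(best(0)[1])


if __name__ == "__main__":
    lp = {"new york": 0.8, "pizza": 0.3, "york": 0.2, "new": 0.01}
    print(split_by_link_prob(["new", "york", "pizza"], lp))
```

With the toy probabilities above, the sketch returns `["new york", "pizza"]`. Each chunk could then be sent to Bing as a separate, shorter query. Summing the scores is the simplest possible objective; a real system might instead apply a minimum threshold per chunk or normalize by the number of chunks.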

The piggyback paper contains additional details.

This project is based on marcocor's query annotator stub and is built with Maven.

Dependencies

Python with scikit-learn and Flask
- The pruner is written in Python using scikit-learn. It relies on Flask to expose an API; the Java pipeline starts the pruner service and calls this API.
Scala
- We use Scala to generate the dataset for training the pruner.
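
The pruning decision itself can be illustrated without scikit-learn or Flask. The sketch below is a stand-in, not the pruner's actual code. It assumes that each candidate annotation has already been turned into a numeric feature vector and that a binary logistic regression model has already been trained. A candidate is kept when the model's estimated probability that the annotation is correct exceeds a threshold. The weights, bias, entity names, and the functions `logistic` and `prune` are hypothetical.

```python
import math


def logistic(z):
    """Numerically stable logistic (sigmoid) function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow in exp(-z) for very negative z
    return e / (1.0 + e)


def prune(candidates, weights, bias, threshold=0.5):
    """Keep candidate entities whose logistic score exceeds `threshold`.

    candidates:    list of (entity, feature_vector) pairs
    weights, bias: parameters of an already-trained logistic regression
                   model (hypothetical values in the example below)
    """
    kept = []
    for entity, features in candidates:
        # Linear score w.x + b, mapped to a probability by the logistic function.
        z = bias + sum(w * x for w, x in zip(weights, features))
        if logistic(z) > threshold:
            kept.append(entity)
    return kept


if __name__ == "__main__":
    candidates = [("New_York_City", [1.0, 0.2]), ("York", [0.1, 0.9])]
    print(prune(candidates, weights=[2.0, -1.0], bias=-0.5))
```

In this toy example, `New_York_City` gets a linear score of 1.3 and is kept, while `York` gets -1.2 and is dropped. With scikit-learn, the value computed by `logistic` corresponds to the positive-class column of `LogisticRegression.predict_proba`. Swapping in an AdaBoost model would only change how that score is produced, not the thresholding step.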

Running

Make sure you have all dependencies installed.
Fill in your Bing API key in config.properties.
To benchmark our annotator, run BenchmarkMain.

Included classes and POM

POM

The file pom.xml defines a Maven project with two dependencies: bat-framework and bing-api-java. You need the BAT-framework to benchmark your annotation system, and the Bing Java API to access the Bing API (if your project is built on top of Bing).

Important classes

SmaphSAnnotator contains the improved SMAPH-S annotator we implemented.

taivop/eth-nlp-project

Folders and files

Latest commit

History

Repository files navigation

An Improvement of SMAPH-S for Entity Linking of Web Queries

Dependencies

Running

Included classes and POM

POM

Important classes

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 4

Languages

Packages