<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>WSDM Cup 2017</title>
<link href="css/bootstrap.min.css" rel="stylesheet" />
<link href="css/prettify.css" rel="stylesheet" />
<style>
.navbar .navbar-nav {
font-weight: bold;
}
</style>
<!-- HTML5 Shim and Respond.js IE8 support of HTML5 elements and media queries -->
<!-- WARNING: Respond.js doesn't work if you view the page via file:// -->
<!--[if lt IE 9]>
<script src="js/html5shiv.js"></script>
<script src="js/respond.min.js"></script>
<![endif]-->
<link rel="shortcut icon" href="img/icon-wsdm.png">
<!--
<link rel="apple-touch-icon-precomposed" sizes="144x144" href="ico/apple-touch-icon-144-precomposed.png">
<link rel="apple-touch-icon-precomposed" sizes="114x114" href="ico/apple-touch-icon-114-precomposed.png">
<link rel="apple-touch-icon-precomposed" sizes="72x72" href="ico/apple-touch-icon-72-precomposed.png">
<link rel="apple-touch-icon-precomposed" href="ico/apple-touch-icon-57-precomposed.png">
-->
</head>
<body>
<nav class="navbar navbar-inverse navbar-static-top" style="margin-bottom:0px;">
<div class="container-fluid">
<div class="navbar-header">
<button type="button" class="navbar-toggle" data-toggle="collapse" data-target="#bs-example-navbar-collapse-1">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a class="navbar-brand" href="index.html">WSDM Cup 2017</a>
</div>
<div class="collapse navbar-collapse" id="bs-example-navbar-collapse-1">
<ul class="nav navbar-nav navbar-right">
<li><a href="index.html">Home</a></li>
<li><a href="about.html">Organization</a></li>
<li><a href="about.html#important-dates">Important Dates</a></li>
<li><a href="proceedings.html">Proceedings</a></li>
<li class="dropdown active">
<a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-haspopup="true" aria-expanded="false">Tasks <span class="caret"></span></a>
<ul class="dropdown-menu">
<li><a href="vandalism-detection.html">Vandalism Detection</a></li>
<li><a href="triple-scoring.html">Triple Scoring</a></li>
</ul>
</li>
</ul>
</div>
</div>
</nav>
<div class="container">
<div class="row">
<div class="col-xs-12">
<div class="clearfix">
<h1 id="task-description" class="page-header">
Triple Scoring
<div class="thumbnail pull-right" style="text-align:right;margin-left:15px;"><a href="http://www.adobe.com/" target="_blank"><img src="img/logo-adobe.png" alt="Adobe" style="max-height:150px"></a><div style="font-size:7pt;margin-right:10px;margin-top:2px;">Sponsor</div></div>
</h1>
<p>Knowledge base queries typically produce a list of entities. For
reasons similar to those in full-text search, it is usually desirable to
<i>rank</i> these entities. A basic ingredient of such a ranking is a
relevance score for each individual triple.</p>
<!-- <p style="color:darkred">Page last updated on 24-10-2016 (more information about the
calling conventions for your software + added evaluator script and
explanations).</p> -->
<p style="color:darkred">Page last updated on 09-01-2017: the submission
deadline is over and the test data is now available for download, see
section "Output / Test data" below.</p>
</div>
<div class="panel panel-default">
<div class="panel-heading">Task</div>
<div class="panel-body">
<p>Given a triple from a "type-like" relation, compute a score that measures the relevance of the statement expressed by the triple compared to other triples from the same relation.
<p><i>Note: read on to understand the emphasis on "type-like" relations. In a nutshell, these are the
relations for which relevance scores are needed most. The task focuses on two such relations:
"profession" and "nationality".</i></p>
</div>
</div>
<div class="panel panel-default">
<div class="panel-heading">Awards</div>
<div class="panel-body">
<p>The three best-performing approaches submitted by eligible participants as per the performance measures used for this task will receive the following awards, kindly sponsored by Adobe Systems, Inc.:
<ol>
<li>$1500 for the best-performing approach,</li>
<li>$750 for the second best-performing approach, and</li>
<li>$500 for the third best-performing approach.</li>
</ol></p>
</div>
</div>
<div class="panel panel-default">
<div class="panel-heading">Task Rules</div>
<div class="panel-body">
<p>You are free to use all of the data provided in the next section, but you
do not have to use all of it, and you may use any kind or amount of other
data as well.</p>
<p>You are also free to use an arbitrary amount of computation.</p>
<p>However, you should not generate or make use of large amounts of
human judgements, in addition to the ones provided in the
<i>.train</i> files in the next section.</p>
</div>
</div>
<div class="panel panel-default">
<div class="panel-heading">Input / Training data</div>
<div class="panel-body">
<p>We provide the following text files. You can just click on the link
and look at the file in your browser. At the end of the list is a link
to a ZIP archive containing all the files. Below the list we provide
some more explanations.</p>
<p><i>Note: some of the file names were changed slightly on
16-09-2016. The contents of the files are still exactly the same,
however. We think the new file names are clearer.</i></p>
<p><table>
<tr><td><a href="http://broccoli.cs.uni-freiburg.de/wsdm-cup-2017/profession.kb">profession.kb</a></td>
<td> </td><td>all professions for a set of 343,329 persons</td></tr>
<tr><td><a href="http://broccoli.cs.uni-freiburg.de/wsdm-cup-2017/profession.train">profession.train</a></td>
<td> </td><td>relevance scores for 515 tuples (pertaining to 134 persons) from profession.kb</td></tr>
<tr><td><a href="http://broccoli.cs.uni-freiburg.de/wsdm-cup-2017/nationality.kb">nationality.kb</a></td>
<td> </td><td>all nationalities for a set of 301,590 persons</td></tr>
<tr><td><a href="http://broccoli.cs.uni-freiburg.de/wsdm-cup-2017/nationality.train">nationality.train</a></td>
<td> </td><td>relevance scores for 162 tuples (pertaining to 77 persons) from nationality.kb</td></tr>
<tr><td><a href="http://broccoli.cs.uni-freiburg.de/wsdm-cup-2017/professions">professions</a></td>
<td> </td><td>the 200 different professions from profession.kb (for your convenience)</td></tr>
<tr><td><a href="http://broccoli.cs.uni-freiburg.de/wsdm-cup-2017/nationalities">nationalities</a></td>
<td> </td><td>the 100 different nationalities from nationality.kb (for your convenience)</td></tr>
<tr><td><a href="http://broccoli.cs.uni-freiburg.de/wsdm-cup-2017/persons">persons</a></td>
<td> </td><td>385,426 different person names from the two .kb files and their Freebase ids (for your convenience)</td></tr>
<tr><td><a href="http://broccoli.cs.uni-freiburg.de/wsdm-cup-2017/wiki-sentences">wiki-sentences</a></td>
<td> </td><td>33,159,353 sentences from Wikipedia with annotations of these 385,426 persons (can but does not have to be used)</td></tr>
</table></p>
<p><table>
<tr><td><a href="http://broccoli.cs.uni-freiburg.de/wsdm-cup-2017/triple-scoring.zip">triple-scoring.zip</a></td>
<td> </td><td>a ZIP file containing all of the files above (1.5 GB compressed, 4.2 GB uncompressed)</td></tr>
</table></p>
<p>Some more explanations:</p>
<ul>
<li>The two <i>.kb</i> files were extracted from a 14-04-2014 dump
of Freebase. This is not important for this task, however. Just in
case you were curious.</li>
<li>The training sets (the <i>.train</i> files provided above)
contain only tuples from the respective <i>.kb</i> files. The same
will hold true for the test sets (provided after the submission
deadline, and on which your submission will be evaluated).</li>
<li>When working on the task you will realize that the two training
sets are not sufficient on their own, but that you need additional
data. In particular, there will be professions / nationalities in the
test set for which there is no tuple in the training set.</li>
<li>The <i>wiki-sentences</i> are just one example of such
additional data, provided above to make it easier for you to get
started. Feel free to use any other data instead or in
addition. The only thing you are not allowed to use is additional
training data generated from human judgement.</li>
<li>We limited the set of professions / nationalities to 200 / 100
to make the task feasible for you, since you probably want to learn
something for each profession / nationality.</li>
<li>The contents of the files <i>professions</i> and
<i>nationalities</i> are redundant; they are provided just for your
convenience. They are exactly the sets of distinct professions /
nationalities in the second column of the two <i>.kb</i> files.</li>
<li>The file <i>persons</i> contains a few person names that occur in
neither of the two <i>.kb</i> files. This does no harm, though.</li>
<li>The person names are exactly the names used by the English
Wikipedia. That is, http://en.wikipedia.org/wiki/&lt;person
name&gt; takes you to the respective Wikipedia page.</li>
<li>The Freebase ids provided in the <i>persons</i> file might be
useful if you want to work with a dataset like FACC1 (which is
analogous to the <i>wiki-sentences</i> provided above, but for
ClueWeb instead of Wikipedia). You don't have to, though.</li>
<li>For each of the names in <i>persons</i>, there are sentences in
<i>wiki-sentences</i> (68,662 sentences for the most frequently
mentioned person, 3 sentences for the least frequently mentioned
person).</li>
<li>As mentioned in the task rules above:
feel free to use the provided data, but feel equally free to use any
kind or amount of additional data (except for human judgments for
the person-profession / person-nationality pairs in the <i>.kb</i>
files).</li>
</ul>
</div>
</div>
<div class="panel panel-default">
<div class="panel-heading">Output / Test data</div>
<div class="panel-body">
<p>Your software will be evaluated on two test sets (one for
professions and one for nationalities) of exactly the same nature as
the two training sets (the <i>.train</i> files) above. The test sets
will be subsets of the <i>.kb</i> files above, but with scores like
in the <i>.train</i> files.</p>
<p>Your software should produce an output exactly like in the
<i>.train</i> files above. That is, given a test file, append an
additional column (tab-separated, like for all files in this task)
with the score, which should be an integer from the range 0..7.</p>
<p>Your software has to figure out whether it is being fed the test
file with professions or nationalities (see the section below for the
command line call). It can tell this from the base of the file name,
that is, the part before the first dot. The base names of the test
sets will be <i>profession</i> and <i>nationality</i>, just as for the
training sets above.</p>
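<p>The two conventions above (relation taken from the base of the file
name, score appended as an extra tab-separated column) could be sketched
in Python roughly as follows. This is only an illustration: the function
names, the <i>score_fn</i> callback, and the clamping of out-of-range
values are our assumptions, not part of the task definition.</p>
<pre class="prettyprint lang-py" style="overflow-x:auto">
import os


def relation_from_path(path):
    # Base name = the part of the file name before the first dot,
    # e.g. "profession" for "/dataset/profession.test".
    return os.path.basename(path).split(".")[0]


def append_scores(lines, score_fn):
    # score_fn is a hypothetical callback (person, value) returning a number.
    # Each output line is the input line plus a tab-separated integer score.
    out = []
    for line in lines:
        person, value = line.rstrip("\n").split("\t")[:2]
        # Clamping to 0..7 is an assumption made here to keep the output valid.
        score = min(7, max(0, int(round(score_fn(person, value)))))
        out.append(person + "\t" + value + "\t" + str(score))
    return out
</pre>
<p>A caller would pick its scoring model based on
<i>relation_from_path(path)</i> and then pass the lines of the input
file through <i>append_scores</i>.</p>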
<p>Here is the script that we will use for the evaluation, and that
(of course) you can use, too:</p>
<p><table>
<tr><td><a href="http://broccoli.cs.uni-freiburg.de/wsdm-cup-2017/evaluator.py">evaluator.py</a></td></tr>
</table></p>
<p>It is written in Python 3. You get a short usage info with <i>python
evaluator.py -h</i>, and a longer explanation in the comment at the
beginning of the script. The script also tests whether the formatting
of the input files is correct and, if it is not, tells you what is
wrong and where. The three measures evaluated are explained in the next
section.</p>
<p>Update 08-11-2016: the script can now also be used to evaluate
multiple run-truth pairs (in particular, for a joint evaluation of
your performance on the profession and nationality test sets, as will
be done after the submission deadline). The numbers are then for the
unions of the pairs, that is, as if all the run files and all the
truth files were concatenated. Note that you can also still run
the script for a single run-truth pair as before.</p>
<p>Update 09-01-2017: the submission deadline is over and the test
data is now public:</p>
<p><table>
<tr><td><a href="http://broccoli.cs.uni-freiburg.de/wsdm-cup-2017/profession.test">profession.test</a></td>
<td> </td><td>relevance scores for 513 tuples
(pertaining to 134 persons) from profession.kb (see above)</td></tr>
<tr><td><a href="http://broccoli.cs.uni-freiburg.de/wsdm-cup-2017/nationality.test">nationality.test</a></td>
<td> </td><td>relevance scores for 197 tuples
(pertaining to 96 persons) from nationality.kb (see above)</td></tr>
</table></p>
</div>
</div>
<div class="panel panel-default">
<div class="panel-heading">Performance Measures</div>
<div class="panel-body">
<p>The scores in the train and test files have been obtained via
crowdsourcing. Each tuple (<i>&lt;person&gt; &lt;profession&gt;</i> or
<i>&lt;person&gt; &lt;nationality&gt;</i>) has been judged by 7 human judges. Each judgement
is binary: primarily relevant (= 1) or secondarily relevant (= 0).
Note that all our tuples are "correct", so there is no category
"irrelevant" here (in the rare case that a tuple is incorrect, judges
will label it 0).
The 7 judgements per triple are added up, which gives an integer score in the range
0..7.</p>
<p>We evaluate three relevance measures, two score-based and one
rank-based:</p>
<p>Average score difference: for each triple, take the absolute
difference of the relevance score computed by your system and
the score from the ground truth; add up these differences and
divide by the number of triples.</p>
<p>Accuracy: the percentage of triples for which the score
computed by your system differs from the score from the ground
truth by at most 2.</p>
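<p>The two score-based measures above can be sketched in Python as
follows. This is a minimal illustration and not the code of
<i>evaluator.py</i>; the function names are ours, and the accuracy is
returned as a fraction (multiply by 100 for a percentage).</p>
<pre class="prettyprint lang-py" style="overflow-x:auto">
def average_score_difference(system, truth):
    # Mean absolute difference between system scores and ground truth scores.
    diffs = [abs(s - t) for s, t in zip(system, truth)]
    return sum(diffs) / len(diffs)


def accuracy(system, truth, tolerance=2):
    # Fraction of triples whose score is within `tolerance` of the ground truth.
    hits = sum(1 for s, t in zip(system, truth) if tolerance >= abs(s - t))
    return hits / len(system)
</pre>
<p>For example, with system scores 5, 0, 7, 3 and ground truth scores
7, 4, 7, 2, the differences are 2, 4, 0, 1, so the average score
difference is 7 / 4 = 1.75 and the accuracy is 3 / 4 = 0.75.</p>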
<p>Kendall's Tau: for each relation, for each subject, compute
the ranking of all triples with that subject and
relation according to the scores computed by your system and the
score from the ground truth. Compute the difference of the two
rankings using Kendall's Tau. See the (well-documented) code of
the <i>evaluator.py</i> script above for how ties are handled.</p>
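<p>To illustrate the idea of comparing two rankings by counting
concordant and discordant pairs, here is a sketch of the plain
Kendall tau-a coefficient for the triples of one subject and one
relation. Please note that this is an assumption-laden illustration:
tied pairs simply count as neither concordant nor discordant here,
whereas the variant and the tie handling actually used for this task
are those implemented in <i>evaluator.py</i> and may differ.</p>
<pre class="prettyprint lang-py" style="overflow-x:auto">
from itertools import combinations


def kendall_tau_a(system, truth):
    # system, truth: scores for the triples of ONE subject and ONE relation.
    pairs = list(combinations(range(len(system)), 2))
    if not pairs:
        return 0.0  # fewer than two triples: nothing to rank (our convention)
    concordant = 0
    discordant = 0
    for i, j in pairs:
        product = (system[i] - system[j]) * (truth[i] - truth[j])
        if product > 0:
            concordant += 1
        elif 0 > product:
            discordant += 1
        # product == 0: tied pair, counted as neither (assumption)
    return (concordant - discordant) / len(pairs)
</pre>
<p>The result is 1 if both score lists order all pairs the same way and
-1 if they order all pairs in opposite ways.</p>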
<p>More details on the crowdsourcing task used to obtain the ground
truth scores, on the performance measures, and on a number of
baselines for solving the task can be found in the SIGIR paper cited
in the "Related Work" section below.</p>
<p>The award will go to the system/team that achieves the highest
accuracy on the combination of both test sets (profession and
nationality). In our final report about the competition, we will
report results for all three performance measures.</p>
</div>
</div>
<div class="panel panel-default">
<div class="panel-heading">Submission</div>
<div class="panel-body">
<p>We ask you to prepare your software so that it can be executed via a command line call.</p>
<p>
<pre class="prettyprint lang-c" style="overflow-x:auto">
> mySoftware <b>-i</b> path/to/input/file <b>-o</b> path/to/output/directory</pre></p>
<p>The name of the output file (to be written to the
<i>path/to/output/directory</i> folder) must be the same as the name
of the input file. There can be more than one <b>-i</b> argument. In that case your
software should process each of the input files and produce one output
file for each.</p>
<p>For example, if your software is called like this:
<p><pre class="prettyprint lang-c" style="overflow-x:auto">
> mySoftware <b>-i</b> /dataset/profession.test <b>-i</b> /dataset/nationality.test <b>-o</b> /output</pre></p>
<p>It should write files <i>profession.test</i> and
<i>nationality.test</i> to the folder <i>/output</i>, and the
two files should be identical to the two input files, except that they
contain an additional column with the scores (from the integer range 0..7).</p>
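<p>A command line interface of this shape could be sketched in Python
with the standard <i>argparse</i> module as follows. The names
<i>parse_args</i>, <i>output_path</i>, <i>main</i>, and
<i>PLACEHOLDER_SCORE</i> are hypothetical; the constant score only
stands in for a real scoring model.</p>
<pre class="prettyprint lang-py" style="overflow-x:auto">
import argparse
import os

PLACEHOLDER_SCORE = 5  # hypothetical constant; a real system computes this


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Triple scoring (sketch)")
    # action="append" collects every occurrence of -i into one list.
    parser.add_argument("-i", dest="inputs", action="append", required=True)
    parser.add_argument("-o", dest="output_dir", required=True)
    return parser.parse_args(argv)


def output_path(input_path, output_dir):
    # The output file has the same name as the input file.
    return os.path.join(output_dir, os.path.basename(input_path))


def main(argv=None):
    args = parse_args(argv)
    os.makedirs(args.output_dir, exist_ok=True)
    for input_path in args.inputs:
        target = output_path(input_path, args.output_dir)
        with open(input_path, encoding="utf-8") as src:
            with open(target, "w", encoding="utf-8") as dst:
                for line in src:
                    # Copy the line and append the score as an extra column.
                    dst.write(line.rstrip("\n") + "\t" + str(PLACEHOLDER_SCORE) + "\n")
</pre>
<p>Calling <i>main()</i> at the end of the script makes it usable from
the command line with the <b>-i</b> and <b>-o</b> arguments shown
above.</p>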
<p>You can choose freely among the available programming languages and among the operating systems Microsoft Windows and Ubuntu. We will ask you to deploy your software onto a virtual machine that will be made accessible to you after registration. You will be able to reach the virtual machine via ssh and via remote desktop. More information about how to access the virtual machines can be found in the user guide below:</p>
<p><a class="btn btn-default" href="wsdm-cup-17-virtual-machine-user-guide.pdf">Virtual Machine User Guide »</a></p>
<p>Once your software is deployed in your virtual machine, we ask you to access TIRA at <a href="http://www.tira.io">www.tira.io</a>, where you can self-evaluate your software on the test data.</p>
<p><strong>Note:</strong> By submitting your software you retain full copyrights. You agree to grant us usage rights only for the purpose of the WSDM Cup 2017. We agree not to share your software with a third party or use it for other purposes than the WSDM Cup 2017.</p>
</div>
</div>
<div class="panel panel-default">
<div class="panel-heading">Related Work</div>
<div class="panel-body">
<p>Hannah Bast, Björn Buchhold, and Elmar Haußmann.
<a href="http://ad-publications.informatik.uni-freiburg.de/SIGIR_triplescores_BBH_2015.pdf">Relevance Scores
for Triples from Type-Like Relations</a>. In SIGIR 2015: 243 -- 252.</p>
<p>Hannah Bast, Björn Buchhold, and Elmar Haußmann.
<a href="http://ad-publications.informatik.uni-freiburg.de/FNTIR_semanticsearch_BBH_2016.pdf">Semantic Search on
Text and Knowledge Bases</a>. In FnTIR 10(2-3): 119 -- 271 (2016).</p>
</div>
</div>
<div id="task-committee" class="row" style="padding-top:30px;">
<div class="col-xs-12">
<h1 class="page-header">Task Chairs</h1>
</div>
</div>
<div class="row">
<div class="col-xs-6 col-sm-3">
<div class="thumbnail" style="text-align:center;">
<a href="http://ad.informatik.uni-freiburg.de/staff/bast" target="_blank"><img src="https://ad.informatik.uni-freiburg.de/bilder/HB17Mai14" class="img-rounded" alt="Hannah Bast" height="140"></a>
<p style="white-space:nowrap"><a href="http://ad.informatik.uni-freiburg.de/staff/bast" target="_blank">Hannah Bast</a></p>
<p style="font-size:10pt">University of Freiburg</p>
</div>
</div>
<div class="col-xs-6 col-sm-3">
<div class="thumbnail" style="text-align:center;">
<a href="http://ad.informatik.uni-freiburg.de/staff/buchhold" target="_blank"><img src="http://ad.informatik.uni-freiburg.de/bilder/Bjoern" class="img-rounded" alt="Björn Buchhold" height="140"></a>
<p style="white-space:nowrap"><a href="http://ad.informatik.uni-freiburg.de/staff/buchhold" target="_blank">Björn Buchhold</a></p>
<p style="font-size:10pt">University of Freiburg</p>
</div>
</div>
<div class="col-xs-6 col-sm-3">
<div class="thumbnail" style="text-align:center;">
<a href="http://ad.informatik.uni-freiburg.de/staff/haussmann" target="_blank"><img src="http://ad.informatik.uni-freiburg.de/bilder/Elmar" class="img-rounded" alt="Elmar Haussmann" height="140"></a>
<p style="white-space:nowrap"><a href="http://ad.informatik.uni-freiburg.de/staff/haussmann" target="_blank">Elmar Haussmann</a></p>
<p style="font-size:10pt">University of Freiburg</p>
</div>
</div>
</div>
</div> <!-- /container -->
<script src="js/jquery.js"></script>
<script src="js/bootstrap.min.js"></script>
<script src="js/prettify.js"></script>
<script>
!function ($) {
$(function(){
window.prettyPrint && prettyPrint()
})
}(window.jQuery)
</script>
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','https://www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-19597677-4', 'auto');
ga('send', 'pageview');
</script>
</body>
</html>