
<!DOCTYPE html>
<html lang="en">
<head>

<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0">

<title>WSDM Cup 2017</title> 

<link href="css/bootstrap.min.css" rel="stylesheet" />
<link href="css/prettify.css" rel="stylesheet" />

<style>
.navbar .navbar-nav {
  font-weight: bold;
}
</style>

<!-- HTML5 Shim and Respond.js IE8 support of HTML5 elements and media queries -->
<!-- WARNING: Respond.js doesn't work if you view the page via file:// -->
<!--[if lt IE 9]>
  <script src="js/html5shiv.js"></script>
  <script src="js/respond.min.js"></script>
<![endif]-->

<link rel="shortcut icon" href="img/icon-wsdm.png">
<!--
<link rel="apple-touch-icon-precomposed" sizes="144x144" href="ico/apple-touch-icon-144-precomposed.png">
<link rel="apple-touch-icon-precomposed" sizes="114x114" href="ico/apple-touch-icon-114-precomposed.png">
<link rel="apple-touch-icon-precomposed" sizes="72x72" href="ico/apple-touch-icon-72-precomposed.png">
<link rel="apple-touch-icon-precomposed" href="ico/apple-touch-icon-57-precomposed.png">
-->

</head>
<body>

<nav class="navbar navbar-inverse navbar-static-top" style="margin-bottom:0px;">
  <div class="container-fluid">
    <div class="navbar-header">
      <button type="button" class="navbar-toggle" data-toggle="collapse" data-target="#bs-example-navbar-collapse-1">
        <span class="sr-only">Toggle navigation</span>
        <span class="icon-bar"></span>
        <span class="icon-bar"></span>
        <span class="icon-bar"></span>
      </button>
      <a class="navbar-brand" href="index.html">WSDM Cup 2017</a>
    </div>
    <div class="collapse navbar-collapse" id="bs-example-navbar-collapse-1">
      <ul class="nav navbar-nav navbar-right">
        <li><a href="index.html">Home</a></li>
        <li><a href="about.html">Organization</a></li>
        <li><a href="about.html#important-dates">Important Dates</a></li>
        <li><a href="proceedings.html">Proceedings</a></li>
        <li class="dropdown active">
          <a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-haspopup="true" aria-expanded="false">Tasks <span class="caret"></span></a>
          <ul class="dropdown-menu">
            <li><a href="vandalism-detection.html">Vandalism Detection</a></li>
            <li><a href="triple-scoring.html">Triple Scoring</a></li>
          </ul>
        </li>
      </ul>
    </div>
  </div>
</nav>

<div class="container">

  <div class="row">
    <div class="col-xs-12">
      <div class="clearfix">
        <h1 id="task-description" class="page-header">
          Triple Scoring
          <div class="thumbnail pull-right" style="text-align:right;margin-left:15px;"><a href="http://www.adobe.com/" target="_blank"><img src="img/logo-adobe.png" alt="Adobe" style="max-height:150px"></a><div style="font-size:7pt;margin-right:10px;margin-top:2px;">Sponsor</div></div>
        </h1>

        <p>Knowledge base queries typically produce a list of entities. For
        reasons similar to those in full-text search, it is usually desirable to
        <i>rank</i> these entities. A basic ingredient of such a ranking is a
        relevance score for each individual triple.</p>

        <!-- <p style="color:darkred">Page last updated on 24-10-2016 (more information about the
          calling conventions for your software + added evaluator script and
          explanations).</p> -->

        <p style="color:darkred">Page last updated on 09-01-2017: the submission
        deadline is over and the test data is now available for download, see
        section "Output / Test data" below.</p>
      </div>
      
      <div class="panel panel-default">
        <div class="panel-heading">Task</div>
        <div class="panel-body">
          <p>Given a triple from a "type-like" relation, compute a score that measures the relevance of the statement expressed by the triple compared to other triples from the same relation.
          <p><i>Note: read on to understand the emphasis on "type-like" relations. In a nutshell, these are the
                  relations for which relevance scores are needed most. The task focuses on two such relations:
          "profession" and "nationality".</i></p>
        </div>
      </div>
          
      <div class="panel panel-default">
        <div class="panel-heading">Awards</div>
        <div class="panel-body">
          <p>The three best-performing approaches submitted by eligible participants as per the performance measures used for this task will receive the following awards, kindly sponsored by Adobe Systems, Inc.:
          <ol>
            <li>$1500 for the best-performing approach,</li>
            <li>$750 for the second best-performing approach, and</li>
            <li>$500 for the third best-performing approach.</li>
          </ol></p>
        </div>
      </div>

      <div class="panel panel-default">
        <div class="panel-heading">Task Rules</div>
        <div class="panel-body">
          <p>You are free to use all of the data provided in the next section, but you
          do not have to use all of it, and you may use any kind or amount of other
          data as well.</p>
          <p>You are also free to use an arbitrary amount of computation.</p>
          <p>However, you should not generate or make use of large amounts of
          human judgements, in addition to the ones provided in the
          <i>.train</i> files in the next section.</p>
        </div>
      </div>
          
      <div class="panel panel-default">
        <div class="panel-heading">Input / Training data</div>
        <div class="panel-body">
          <p>We provide the following text files. You can just click on the link
          and look at the file in your browser. At the end of the list is a link
          to a ZIP archive containing all the files.  Below the list we provide
          some more explanations.</p>
          <p><i>Note: some of the file names were changed slightly on
          16-09-2016.  The contents of the files are still exactly the same,
          however. We think the new file names are clearer.</i></p>
          <p><table>
            <tr><td><a href="http://broccoli.cs.uni-freiburg.de/wsdm-cup-2017/profession.kb">profession.kb</a></td>
                <td>&nbsp;&nbsp;</td><td>all professions for a set of 343,329 persons</td></tr>
            <tr><td><a href="http://broccoli.cs.uni-freiburg.de/wsdm-cup-2017/profession.train">profession.train</a></td>
                <td>&nbsp;&nbsp;</td><td>relevance scores for 515 tuples (pertaining to 134 persons) from profession.kb</td></tr>
            <tr><td><a href="http://broccoli.cs.uni-freiburg.de/wsdm-cup-2017/nationality.kb">nationality.kb</a></td>
                <td>&nbsp;&nbsp;</td><td>all nationalities for a set of 301,590 persons</td></tr>
            <tr><td><a href="http://broccoli.cs.uni-freiburg.de/wsdm-cup-2017/nationality.train">nationality.train</a></td>
                <td>&nbsp;&nbsp;</td><td>relevance scores for 162 tuples (pertaining to 77 persons) from nationality.kb</td></tr>
            <tr><td><a href="http://broccoli.cs.uni-freiburg.de/wsdm-cup-2017/professions">professions</a></td>
                <td>&nbsp;&nbsp;</td><td>the 200 different professions from profession.kb (for your convenience)</td></tr>
            <tr><td><a href="http://broccoli.cs.uni-freiburg.de/wsdm-cup-2017/nationalities">nationalities</a></td>
                <td>&nbsp;&nbsp;</td><td>the 100 different nationalities from nationality.kb (for your convenience)</td></tr>
            <tr><td><a href="http://broccoli.cs.uni-freiburg.de/wsdm-cup-2017/persons">persons</a></td>
                <td>&nbsp;&nbsp;</td><td>385,426 different person names from the two .kb files and their Freebase ids (for your convenience)</td></tr>
            <tr><td><a href="http://broccoli.cs.uni-freiburg.de/wsdm-cup-2017/wiki-sentences">wiki-sentences</a></td>
                <td>&nbsp;&nbsp;</td><td>33,159,353 sentences from Wikipedia with annotations of these 385,426 persons (can but does not have to be used)</td></tr>
          </table></p>
          <p><table>
            <tr><td><a href="http://broccoli.cs.uni-freiburg.de/wsdm-cup-2017/triple-scoring.zip">triple-scoring.zip</a></td>
                <td>&nbsp;&nbsp;</td><td>a ZIP file containing all of the files above (1.5 GB compressed, 4.2 GB uncompressed)</td></tr>
          </table></p>
          <p>Some more explanations:</p>
          <ul>
            <li>The two <i>.kb</i> files were extracted from a 14-04-2014 dump
            of Freebase. This is not important for the task; we mention it just
            in case you are curious.</li>
            <li>The training sets (the <i>.train</i> files provided above)
            contain only tuples from the respective <i>.kb</i> files. The same
            will hold true for the test sets (provided after the submission
            deadline, and on which your submission will be evaluated).</li>
            <li>When working on the task you will realize that the two training
            sets are not sufficient on their own, but that you need additional
            data. In particular, there will be professions / nationalities in the
            test set for which there is no tuple in the training set.</li>
            <li>The <i>wiki-sentences</i> are just one example of such
            additional data, provided above to make it easier for you to get
            started. Feel free to use any other data instead or in
            addition. The only thing you are not allowed to use is additional
            training data generated from human judgement.</li>
            <li>We limited the set of professions / nationalities to 200 / 100
            to make the task feasible for you, since you probably want to learn
            something for each profession / nationality.</li>
            <li>The contents of the files <i>professions</i> and
            <i>nationalities</i> are redundant; they are provided just for your
            convenience. They contain exactly the set of distinct professions /
            nationalities in the second column of the two <i>.kb</i> files.</li>
            <li>The file <i>persons</i> contains a few person names that occur in
            neither of the two <i>.kb</i> files. This does no harm, though.</li>
            <li>The person names are exactly the names used by the English
            Wikipedia. That is, http://en.wikipedia.org/wiki/&lt;person
            name&gt; takes you to the respective Wikipedia page.</li>
            <li>The Freebase ids provided in the <i>persons</i> file might be
            useful if you want to work with a dataset like FACC1 (which is
            analogous to the <i>wiki-sentences</i> provided above, but for
            ClueWeb instead of Wikipedia). You don't have to, though.</li>
            <li>For each of the names in <i>persons</i>, there are sentences in
            <i>wiki-sentences</i> (68,662 sentences for the most frequently
            mentioned person, 3 sentences for the least frequently mentioned
            person).</li>
            <li>As mentioned in the task rules above:
            feel free to use the provided data, but feel equally free to use any
            kind or amount of additional data (except for human judgements for
            the person-profession / person-nationality pairs in the <i>.kb</i>
            files).</li>
          </ul>
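          <p>A minimal sketch of how the tab-separated files might be read. It
          assumes that each line of a <i>.kb</i> file holds a person name and a
          profession / nationality, and that each line of a <i>.train</i> file
          holds an integer score as an additional third column. The function
          names <i>read_pairs</i> and <i>read_scores</i> are hypothetical and
          not part of the provided material; the example lines are made up.</p>

<pre class="prettyprint lang-python" style="overflow-x:auto">
```python
import io


def read_pairs(lines):
    """Parse (person, value) pairs from tab-separated lines of a .kb-style file."""
    pairs = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip empty lines
        person, value = line.split("\t")[:2]
        pairs.append((person, value))
    return pairs


def read_scores(lines):
    """Parse {(person, value): score} from tab-separated lines of a .train-style file."""
    scores = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip empty lines
        person, value, score = line.split("\t")
        scores[(person, value)] = int(score)
    return scores


# Made-up example lines, only to illustrate the assumed layout.
example = io.StringIO("Person A\tActor\t7\nPerson A\tSinger\t2\n")
print(read_scores(example))
```
</pre>

          <p>Both functions accept any iterable of lines, so an open file
          (for example one opened with <i>encoding="utf-8"</i>) can be passed
          directly.</p>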
        </div>
      </div>
          
      <div class="panel panel-default">
        <div class="panel-heading">Output / Test data</div>
        <div class="panel-body">
          <p>Your software will be evaluated on two test sets (one for
          professions and one for nationalities) of exactly the same nature as
          the two training sets (the <i>.train</i> files) above. The test sets
          will be subsets of the <i>.kb</i> files above, but with scores like
          in the <i>.train</i> files.</p>

          <p>Your software should produce output in exactly the format of the
          <i>.train</i> files above. That is, given a test file, append an
          additional column (tab-separated, like for all files in this task)
          with the score, which should be an integer from the range 0..7.</p>

          <p>Your software has to figure out whether it is being fed the test
          file with professions or nationalities (see the section below for the
          command line call). It can tell this from the base of the file name,
          that is, the part before the first dot. The base names of the test
          sets will be <i>profession</i> and <i>nationality</i>, just as for the
          training sets above.</p>
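          <p>The base-name rule can be sketched as follows. The function name
          <i>relation_from_path</i> is a hypothetical helper chosen for this
          illustration.</p>

<pre class="prettyprint lang-python" style="overflow-x:auto">
```python
import os


def relation_from_path(path):
    """Return the base name of the file, i.e. the part before the first dot."""
    return os.path.basename(path).split(".", 1)[0]


print(relation_from_path("/dataset/profession.test"))   # profession
print(relation_from_path("/dataset/nationality.test"))  # nationality
```
</pre>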

          <p>Here is the script that we will use for the evaluation, and that
          (of course) you can use, too:
          
          <p><table>
            <tr><td><a href="http://broccoli.cs.uni-freiburg.de/wsdm-cup-2017/evaluator.py">evaluator.py</a></td>
          </table></p>
          
          It is written in Python 3. You get a short usage info with <i>python
            evaluator.py -h</i>, and a longer explanation in the comment at the
          beginning of the script. The script also tests whether the formatting
          of the input files is correct, and if not, tells you what is wrong
          and where. The three measures evaluated are explained in the next
          section.</p> 

          <p>Update 08-11-2016: the script can now also be used to evaluate
          multiple run-truth pairs (in particular, for a joint evaluation of
          your performance on the profession and nationality test sets, as will
          be done after the submission deadline). The numbers are then for the
          unions of the pairs, that is, as if all the run files and all the
          truth files were concatenated. Note that you can still run
          the script for a single run-truth pair as before.</p>

          <p>Update 09-01-2017: the submission deadline is over and the test
          data is now public:</p>

          <p><table>
            <tr><td><a href="http://broccoli.cs.uni-freiburg.de/wsdm-cup-2017/profession.test">profession.test</a></td>
                <td>&nbsp;&nbsp;</td><td>relevance scores for 513 tuples
                  (pertaining to 134 persons) from profession.kb (see above)</td></tr>
            <tr><td><a href="http://broccoli.cs.uni-freiburg.de/wsdm-cup-2017/nationality.test">nationality.test</a></td>
                <td>&nbsp;&nbsp;</td><td>relevance scores for 197 tuples
                  (pertaining to 96 persons) from nationality.kb (see above)</td></tr>
          </table></p>
        </div>
      </div>
        
      <div class="panel panel-default">
        <div class="panel-heading">Performance Measures</div>
        <div class="panel-body">
          <p>The scores in the train and test files have been obtained via
          crowdsourcing. Each tuple (<i>&lt;person&gt; &lt;profession&gt;</i> or
          <i>&lt;person&gt; &lt;nationality&gt;</i>) has been judged by 7 human judges.  Each judgement
          is binary: primarily relevant (= 1) or secondarily relevant (= 0).
          Note that all our tuples are "correct", so there is no category
          "irrelevant" here (in the rare case that a tuple is incorrect, judges
          will label it 0).
          The 7 judgements per triple are added up, which gives an integer score in the range
          0..7.
          <p>We evaluate three performance measures, two score-based and one
          rank-based:</p>
          
          <p>Average score difference: for each triple, take the absolute
          difference of the relevance score computed by your system and
          the score from the ground truth; add up these differences and
          divide by the number of triples.</p>

          <p>Accuracy: the percentage of triples for which the score
          computed by your system differs from the score from the ground
          truth by at most 2.</p>
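          <p>The two score-based measures can be sketched as follows. This is
          an illustration only, not the code of <i>evaluator.py</i>; the
          function names and the <i>tolerance</i> parameter are hypothetical,
          and the scores are made up. Note that the sketch returns accuracy as
          a fraction; multiply by 100 for a percentage.</p>

<pre class="prettyprint lang-python" style="overflow-x:auto">
```python
def average_score_difference(predicted, truth):
    """Mean absolute difference between predicted and ground-truth scores."""
    return sum(abs(p - t) for p, t in zip(predicted, truth)) / len(truth)


def accuracy(predicted, truth, tolerance=2):
    """Fraction of triples whose predicted score is within `tolerance` of the truth."""
    hits = sum(1 for p, t in zip(predicted, truth) if tolerance >= abs(p - t))
    return hits / len(truth)


# Made-up scores for four triples, only for illustration.
predicted = [7, 3, 0, 5]
truth = [6, 0, 1, 5]
print(average_score_difference(predicted, truth))  # 1.25
print(accuracy(predicted, truth))                  # 0.75
```
</pre>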

          <p>Kendall's Tau: for each relation, for each subject, compute
          two rankings of all triples with that subject and
          relation: one according to the scores computed by your system and
          one according to the scores from the ground truth. Compute the
          difference of the two rankings using Kendall's Tau. See the
          (well-documented) code of
          the <i>evaluator.py</i> script above for how ties are handled.</p>
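          <p>As a rough illustration of the rank-based measure, the sketch
          below computes a normalized Kendall tau distance for the triples of a
          single subject. It is not the implementation in <i>evaluator.py</i>:
          the function name, the <i>tie_penalty</i> of 0.5 for pairs tied in
          exactly one of the two score lists, and the normalization by the
          number of pairs are all assumptions made for this example. How the
          per-subject values are combined is likewise left to the script.</p>

<pre class="prettyprint lang-python" style="overflow-x:auto">
```python
from itertools import combinations


def kendall_tau_distance(predicted, truth, tie_penalty=0.5):
    """Normalized Kendall tau distance between two score lists for one subject.

    A pair ordered oppositely by the two lists counts 1; a pair tied in exactly
    one list counts `tie_penalty` (an assumption, not taken from evaluator.py).
    """
    pairs = list(combinations(range(len(truth)), 2))
    if not pairs:
        return 0.0  # a single triple cannot be ranked wrongly
    penalty = 0.0
    for i, j in pairs:
        product = (predicted[i] - predicted[j]) * (truth[i] - truth[j])
        tied_pred = predicted[i] == predicted[j]
        tied_true = truth[i] == truth[j]
        if tied_pred != tied_true:
            penalty += tie_penalty  # tied in exactly one of the two lists
        elif 0 > product:
            penalty += 1.0          # ordered oppositely
    return penalty / len(pairs)


# Made-up scores for three professions of one person.
print(kendall_tau_distance([7, 2, 5], [6, 1, 3]))  # 0.0, same order
print(kendall_tau_distance([1, 2, 3], [3, 2, 1]))  # 1.0, reversed order
```
</pre>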
          
          <p>More details on the crowdsourcing task used to obtain the ground
          truth scores, on the performance measures, and on a number of
          baselines for solving the task can be found in the SIGIR paper cited
          in the "Related Work" section below.</p>

          <p>The award will go to the system/team that achieves the highest
          accuracy on the combination of both test sets (profession and
          nationality). In our final report about the competition, we will
          report results for all three performance measures.</p>
        </div>
      </div>

      <div class="panel panel-default">
        <div class="panel-heading">Submission</div>
          <div class="panel-body">
            <p>We ask you to prepare your software so that it can be executed via a command line call.</p>

            <p>
            <pre class="prettyprint lang-c" style="overflow-x:auto">
 > mySoftware <b>-i</b> path/to/input/file <b>-o</b> path/to/output/directory</pre></p>

            <p>The name of the output file (to be written to the
            <i>path/to/output/directory</i> folder) must be the same as the name
            of the input file. There can be more than one <b>-i</b> argument. In that case your
            software should process each of the input files and produce one
            output file for each.</p>

            <p>For example, suppose your software is called like this:</p>

            <p><pre class="prettyprint lang-c" style="overflow-x:auto">
 > mySoftware <b>-i</b> /dataset/profession.test <b>-i</b> /dataset/nationality.test <b>-o</b> /output</pre></p>

            <p>It should write files <i>profession.test</i> and
            <i>nationality.test</i> to the folder <i>/output</i>, and the
            two files should be identical to the two input files, except that they
            contain an additional column with the scores (from the integer range 0..7).</p>
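            <p>A minimal skeleton of such a command line program, written in
            Python as one possible choice. It is a sketch under stated
            assumptions, not a reference implementation: <i>predict_score</i>
            is a hypothetical placeholder that always returns 4,
            <i>process_file</i> and <i>main</i> are illustrative names, and the
            input is assumed to hold the person in the first and the
            profession / nationality in the second tab-separated column. The
            last lines run the skeleton on a made-up input file in a temporary
            directory.</p>

<pre class="prettyprint lang-python" style="overflow-x:auto">
```python
import argparse
import os
import tempfile


def predict_score(relation, person, value):
    """Hypothetical placeholder: a real system would compute a score in 0..7."""
    return 4


def process_file(input_path, output_dir):
    """Copy the input file to output_dir, appending a tab-separated score column."""
    relation = os.path.basename(input_path).split(".", 1)[0]
    output_path = os.path.join(output_dir, os.path.basename(input_path))
    with open(input_path, encoding="utf-8") as src, \
            open(output_path, "w", encoding="utf-8") as dst:
        for line in src:
            line = line.rstrip("\n")
            if not line:
                continue  # skip empty lines
            person, value = line.split("\t")[:2]
            score = predict_score(relation, person, value)
            dst.write(f"{line}\t{score}\n")
    return output_path


def main(argv=None):
    parser = argparse.ArgumentParser(description="Triple scoring skeleton")
    # action="append" collects every -i occurrence into a list.
    parser.add_argument("-i", dest="inputs", action="append", required=True,
                        help="input file; may be given more than once")
    parser.add_argument("-o", dest="output_dir", required=True,
                        help="output directory")
    args = parser.parse_args(argv)
    os.makedirs(args.output_dir, exist_ok=True)
    for input_path in args.inputs:
        process_file(input_path, args.output_dir)


# Small self-check with a made-up input file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    in_path = os.path.join(tmp, "profession.test")
    with open(in_path, "w", encoding="utf-8") as f:
        f.write("Person A\tActor\n")
    main(["-i", in_path, "-o", os.path.join(tmp, "output")])
    with open(os.path.join(tmp, "output", "profession.test"), encoding="utf-8") as f:
        print(repr(f.read()))  # 'Person A\tActor\t4\n'
```
</pre>

            <p>In a deployed script, the self-check at the end would be
            replaced by a plain call to <i>main()</i>, so that the arguments
            are taken from the command line.</p>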

            <p>You can choose freely among the available programming languages and among the operating systems Microsoft Windows and Ubuntu. We will ask you to deploy your software onto a virtual machine that will be made accessible to you after registration. You will be able to reach the virtual machine via ssh and via remote desktop. More information about how to access the virtual machines can be found in the user guide below:</p>
            <p><a class="btn btn-default" href="wsdm-cup-17-virtual-machine-user-guide.pdf">Virtual Machine User Guide »</a></p>
            <p>Once deployed in your virtual machine, we ask you to access TIRA at <a href="http://www.tira.io">www.tira.io</a>, where you can self-evaluate your software on the test data.</p>
            <p><strong>Note:</strong> By submitting your software you retain full copyrights. You agree to grant us usage rights only for the purpose of the WSDM Cup 2017. We agree not to share your software with a third party or use it for other purposes than the WSDM Cup 2017.</p>
          </div>
        </div>
        
      <div class="panel panel-default">
        <div class="panel-heading">Related Work</div>
        <div class="panel-body">
          <p>Hannah Bast, Björn Buchhold, and Elmar Haußmann.
          <a href="http://ad-publications.informatik.uni-freiburg.de/SIGIR_triplescores_BBH_2015.pdf">Relevance Scores
          for Triples from Type-Like Relations</a>. In SIGIR 2015: 243&ndash;252.</p>
          <p>Hannah Bast, Björn Buchhold, and Elmar Haußmann.
          <a href="http://ad-publications.informatik.uni-freiburg.de/FNTIR_semanticsearch_BBH_2016.pdf">Semantic Search on 
          Text and Knowledge Bases</a>. In FnTIR 10(2-3): 119&ndash;271 (2016).</p>
        </div>
      </div>

      <div id="task-committee" class="row" style="padding-top:30px;">
        <div class="col-xs-12">
          <h1 class="page-header">Task Chairs</h1>
        </div>
      </div>
      <div class="row">
        <div class="col-xs-6 col-sm-3">
          <div class="thumbnail" style="text-align:center;">
            <a href="http://ad.informatik.uni-freiburg.de/staff/bast" target="_blank"><img src="https://ad.informatik.uni-freiburg.de/bilder/HB17Mai14" class="img-rounded" alt="Hannah Bast" height="140"></a>
            <p style="white-space:nowrap"><a href="http://ad.informatik.uni-freiburg.de/staff/bast" target="_blank">Hannah Bast</a></p>
            <p style="font-size:10pt">University of Freiburg</p>
          </div>
        </div>
        <div class="col-xs-6 col-sm-3">
          <div class="thumbnail" style="text-align:center;">
            <a href="http://ad.informatik.uni-freiburg.de/staff/buchhold" target="_blank"><img src="http://ad.informatik.uni-freiburg.de/bilder/Bjoern" class="img-rounded" alt="Björn Buchhold" height="140"></a>
            <p style="white-space:nowrap"><a href="http://ad.informatik.uni-freiburg.de/staff/buchhold" target="_blank">Björn Buchhold</a></p>
            <p style="font-size:10pt">University of Freiburg</p>
          </div>
        </div>
        <div class="col-xs-6 col-sm-3">
          <div class="thumbnail" style="text-align:center;">
            <a href="http://ad.informatik.uni-freiburg.de/staff/haussmann" target="_blank"><img src="http://ad.informatik.uni-freiburg.de/bilder/Elmar" class="img-rounded" alt="Elmar Haußmann" height="140"></a>
            <p style="white-space:nowrap"><a href="http://ad.informatik.uni-freiburg.de/staff/haussmann" target="_blank">Elmar Haußmann</a></p>
            <p style="font-size:10pt">University of Freiburg</p>
          </div>
        </div>
      </div>
  
</div> <!-- /container -->

<script src="js/jquery.js"></script>
<script src="js/bootstrap.min.js"></script>
<script src="js/prettify.js"></script>
<script>
  !function ($) {
    $(function(){
      window.prettyPrint && prettyPrint()   
    })
  }(window.jQuery)
</script>

<script>
  (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
  (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
  m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
  })(window,document,'script','https://www.google-analytics.com/analytics.js','ga');

  ga('create', 'UA-19597677-4', 'auto');
  ga('send', 'pageview');

</script>

</body>
</html>