Author: Tony Terrasa
Today, you will be exploring a filtered dataset from the English Wiktionary using your XPath knowledge.
The English Wiktionary describes itself as:
“a collaborative project to produce a free-content multilingual dictionary. It aims to describe all words of all languages using definitions and descriptions in English.” (English Wiktionary homepage)
In this repository, you will find an XML database of some of the entries. To keep the XML file small, the actual text of each entry has been omitted, but the metadata remains. You can find the XML data dump from which this exercise was pulled here.
Because of its size, the original file is not included in this repository. However, you will find the Python script used to filter the entries, and you can visit the link above if you'd like to download the original.
You can find the XSD for the output data here.
Before starting this activity, you should have a working knowledge of XPath syntax for selecting nodes.
You will also find it easier if you know what a function is in programming.
In the activities folder, you will find prompts as well as possible XPath expressions to help you find the answers. I STRONGLY RECOMMEND YOU ATTEMPT EACH PROBLEM ON YOUR OWN BEFORE LOOKING AT THE ANSWER. This may involve googling how to do something. Ask for help as you need it.
You will need to download the file enwiki-sample-bodyless.xml from this repository.
You may find this page useful for testing your XPath expressions. Note that you can run your expressions on an input file, but you have to manually re-upload the file each time you run an expression.
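If re-uploading the file for every expression becomes tedious, one possible alternative is to try simple expressions locally. The sketch below uses Python's standard-library xml.etree.ElementTree on a small inline sample. The element names (pages, page, title, ns) are hypothetical stand-ins and are not taken from the XSD; the real file may use different names and may also declare an XML namespace. ElementTree supports only a limited subset of XPath, with no functions such as count(), so Python's len() is used in its place here.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data. The real enwiki-sample-bodyless.xml may use
# different element names and an XML namespace; check the XSD first.
SAMPLE = """
<pages>
  <page><title>apple</title><ns>0</ns></page>
  <page><title>Talk:apple</title><ns>1</ns></page>
  <page><title>banana</title><ns>0</ns></page>
</pages>
"""

root = ET.fromstring(SAMPLE)

# ElementTree understands paths and simple predicates, but not XPath
# functions. This selects every page whose <ns> child has the text "0".
main_pages = root.findall("./page[ns='0']")

# Python's len() stands in for XPath's count() function.
main_page_count = len(main_pages)
titles = [page.findtext("title") for page in main_pages]

print(main_page_count)  # 2
print(titles)           # ['apple', 'banana']
```

This is only useful for checking node selection. Because the activities rely on XPath functions, you will still need a full XPath evaluator, such as the page linked above, to run your final expressions.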
You will need to make heavy use of functions to answer the questions. You can find a useful list of functions here. Note that these may not be the only functions you will need.