The dataset included in datasets/timeparse_corpus.json
contains a set of ~2000 human annotated time expression in english and german.
The dataset is a list of json records with the following fields:
- text: the text for the time expression
- ref_time: a timestamp in ISO 8601 format
YYYY-MM-DDTHH:MM:SS
- gold_parse: the human annotation of the time expression. It can be a
Time
orInterval
. - language: a two-digit code indicating the language. In this dataset it is either "en" or "de".
For Time
, the format is as follows:
Time[]{YYYY-MM-DD HH:MM (dow/tod)}
Where:
- YYYY
is a four-digit year or X
, if year is missing
- MM
is a two-digit month or X
, if month is missing
- DD
is a two-digit day or X
, if day is missing
- HH
is a two-digit hour (24 hour clock) or X
, if hour is missing
- MM
is a two-digit minute or X
, if minute is missing
- dow
is an integer between 0 and 6 representing day of week or X, if missing (in the dataset, day of week is always missing)
- tod
is a string representing the time of day (such as earlymorning, morning, forenoon, noon, afternoon, evening, lateevening) or X if not specified.
Example:
Morning of the 11th June 2017 Time[]{2017-06-11 X:X (X/morning)}
For Interval
the format is as follows:
Interval[]{<START_T> - <END_T>}
Where <START_T>
and <END_T>
are the beginning and end of the interval. <START_T>
or <END_T>
can be None if the interval is open-ended. They can be specified
using the same representation for times, as described above:
YYYY-MM-DD HH:MM (dow/tod)
Example:
Wed, Oct 11 2017 8:30 PM - 9:47 PM Interval[]{2017-10-11 08:30 (X/X) - 2017-10-11 09:47 (X/X)}