This directory contains English source documents from various content domains commonly handled by translation service providers (TSPs).
To simplify analysis, we extracted text segments from the original documents, which are in HTML, PDF, or Word format. A text segment is not necessarily a single sentence; it may be a sequence of multiple sentences, a section title, a sub-sentential expression, a non-linguistic textual element, or something else.
See license.json for the origin and license of each document. For every document, it lists the source URL, the license, and the URL of the license statement.
We thank the copyright holders of the documents (where applicable) for making them publicly available for research and redistribution.