Source documents

This directory contains English source documents from a range of content domains commonly handled by translation service providers (TSPs). See license.json for their origin and license.

For ease of analysis, we extracted text segments from the original documents, which were in HTML, PDF, or Word format. A text segment is not necessarily a single sentence; it may be a sequence of multiple sentences, a section title, a sub-sentential expression, a non-linguistic textual element, or something else.

License

See license.json, which lists the source URL, the license, and the URL of the license statement for each document.
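
As a rough illustration of how such a file might be inspected, the sketch below groups documents by license. The layout is an assumption: the key names `url`, `license`, and `license_url`, the document identifiers, and the sample values are all hypothetical placeholders, and the real license.json may be organized differently.

```python
import json
from collections import defaultdict

# Hypothetical sample with the three fields the README mentions.
# Key names, document ids, and values are placeholders, not real data.
SAMPLE = """
{
  "doc_a": {"url": "https://example.com/a",
            "license": "CC BY 4.0",
            "license_url": "https://example.com/a/license"},
  "doc_b": {"url": "https://example.com/b",
            "license": "CC0 1.0",
            "license_url": "https://example.com/b/license"},
  "doc_c": {"url": "https://example.com/c",
            "license": "CC BY 4.0",
            "license_url": "https://example.com/c/license"}
}
"""


def group_by_license(records):
    """Map each license name to the sorted list of document ids using it."""
    groups = defaultdict(list)
    for doc_id, info in records.items():
        groups[info["license"]].append(doc_id)
    return {name: sorted(ids) for name, ids in groups.items()}


records = json.loads(SAMPLE)
by_license = group_by_license(records)
for name, ids in sorted(by_license.items()):
    print(f"{name}: {', '.join(ids)}")
```

To run this against the actual file, replace `json.loads(SAMPLE)` with a `json.load` call on the opened license.json and adjust the key names to match its real structure.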

Acknowledgments

We would like to thank the copyright holders of the documents (if any) who made their documents publicly available for research and redistribution.