A Java implementation of Brat standoff format

pengyifan-brat

A Java implementation of data structures and code to read/write Brat standoff format.

Brat

(from brat standoff format)

Annotations created in brat are stored on disk in a standoff format: annotations are stored separately from the annotated document text, which is never modified by the tool.

For each text document in the system, there is a corresponding annotation file. The two are associated by the file naming convention that their base name (file name without suffix) is the same: for example, the file DOC-1000.ann contains annotations for the file DOC-1000.txt.
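The naming convention above can be sketched in plain Java. This is only an illustration: the class `BratFileNames` and the method `annotationFileFor` are hypothetical names and are not part of the pengyifan-brat API.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of pengyifan-brat: derives the annotation
// file that pairs with a text file under the shared-base-name convention.
final class BratFileNames {

    /** Returns the sibling ".ann" path for the given text file. */
    static Path annotationFileFor(Path textFile) {
        String name = textFile.getFileName().toString();
        int dot = name.lastIndexOf('.');
        // Base name = file name without its suffix (if it has one).
        String base = (dot > 0) ? name.substring(0, dot) : name;
        return textFile.resolveSibling(base + ".ann");
    }

    public static void main(String[] args) {
        // DOC-1000.txt is paired with DOC-1000.ann
        System.out.println(annotationFileFor(Paths.get("DOC-1000.txt")));
    }
}
```

Using `resolveSibling` keeps the annotation file in the same directory as the text file, which is what the convention implies.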

Within the document, individual annotations are connected to specific spans of text through character offsets. For example, in a document beginning "Japan was today struck by ..." the text "Japan" is identified by the offset range 0..5. (All offsets are indexed from 0 and include the character at the start offset but exclude the character at the end offset.)
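The half-open offset range happens to match the arguments of Java's `String.substring(start, end)`, so the example above can be checked directly. This is a minimal sketch: `BratOffsets` and `spanText` are hypothetical names, not part of the pengyifan-brat API. It also assumes the text contains only characters from the Basic Multilingual Plane. Java indexes strings by UTF-16 code unit, so offsets may need adjusting for text with supplementary characters.

```java
// Hypothetical helper, not part of pengyifan-brat: extracts the text
// covered by a standoff offset range.
final class BratOffsets {

    /** Returns the span [start, end): start is inclusive, end is exclusive. */
    static String spanText(String document, int start, int end) {
        return document.substring(start, end);
    }

    public static void main(String[] args) {
        String document = "Japan was today struck by ...";
        // Offsets 0..5 cover the five characters at indices 0 through 4.
        System.out.println(spanText(document, 0, 5)); // prints "Japan"
    }
}
```

Because the end offset is exclusive, the length of a span is simply `end - start`.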

Getting started

To use the release version, add the following dependency to your Maven pom.xml:

<dependency>
  <groupId>com.pengyifan.brat</groupId>
  <artifactId>pengyifan-brat</artifactId>
  <version>1.1.0</version>
</dependency>

or, to use the latest snapshot build, add the Sonatype snapshots repository as well:

<repositories>
    <repository>
        <id>oss-sonatype</id>
        <name>oss-sonatype</name>
        <url>https://oss.sonatype.org/content/repositories/snapshots/</url>
        <snapshots>
            <enabled>true</enabled>
        </snapshots>
    </repository>
</repositories>
...
<dependency>
  <groupId>com.pengyifan.brat</groupId>
  <artifactId>pengyifan-brat</artifactId>
  <version>1.2.0-SNAPSHOT</version>
</dependency>

Developers

Webpage

The official Brat webpage provides up-to-date instructions, code, and corpora in the Brat format, as well as other research on, based on, and related to Brat.

A repository of biomedical corpora that use the Brat and BioC formats is also available.