This repository has been archived by the owner on Sep 22, 2022. It is now read-only.

ubffm/Annohub

Annohub

Annohub extracts information about the annotation schemes and languages used in annotated language resources such as corpora, in various formats (e.g. RDF, CoNLL and XML). Results are made available via a web interface for editing and exporting the harvested meta-data. A forthcoming paper and the manual in the doc folder describe the use case in more depth.
Annohub was developed in the context of the Specialized Information Service Linguistics (FID), funded by the German Research Foundation (DFG/LIS, 2017-2019).

Installation

  1. Prerequisites

  2. Download Tinkerpop Gremlin Server version 3.3.10

    https://tinkerpop.apache.org/downloads.html

  3. Unpack the file and install the neo4j-gremlin driver

    cd apache-tinkerpop-gremlin-server-3.3.10

    bin/gremlin-server.sh install org.apache.tinkerpop neo4j-gremlin 3.3.10

The process of plugin installation is handled by Grape, which helps resolve dependencies into the classpath. If you run
into problems you can obtain further information on the installation of Grape at https://tinkerpop.apache.org/docs/current/reference/#neo4j-gremlin

  1. Edit the Gremlin Server configuration file conf/neo4j-empty.properties to set the server's database directory

    gremlin.neo4j.directory=/your/server/directory

  2. Start the server

    bin/gremlin-server.sh

  3. Edit the Annohub configuration file (you can use /src/main/resources/FIDConfig.xml as a template)

    Database setup
    a. Gremlin.Server.home - /your/path/to/apache-tinkerpop-gremlin-server-3.3.10

    b. Gremlin.Server.conf - /your/path/to/apache-tinkerpop-gremlin-server-3.3.10/conf/gremlin-server-neo4j.yaml

    c. Gremlin.Server.data - /another/database/directory (this must be different from the directory set as gremlin.neo4j.directory in conf/neo4j-empty.properties!)

    Application setup
    a. RunParameter.downloadFolder - crawler-download-directory (e.g. /tmp/annohub/downloads)

    b. RunParameter.ServiceUploadDirectory - web-application-upload-directory (e.g. /tmp/annohub/uploads)

    c. RunParameter.decompressionUtility - enter 7z (or 7za)

  4. For easy maintenance of your configuration you can set the environment variable FID_CONFIG_FILE to the location of your configuration file
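
As a minimal sketch of this step, the variable can be exported in the shell that later runs Annohub. The path /opt/annohub/FIDConfig.xml is a hypothetical location chosen for illustration; substitute the path of your own configuration file.

```shell
# Hypothetical location; replace with the path to your own configuration file.
export FID_CONFIG_FILE="/opt/annohub/FIDConfig.xml"

# Confirm the value that child processes will inherit.
echo "Using Annohub configuration: $FID_CONFIG_FILE"
```

To keep the setting across sessions, the export line can be added to your shell profile (for example ~/.profile).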

  5. Build the Annohub application with maven

    mvn clean install

  6. Initialize the Annohub model database

    run.sh -init

  7. After initialization has finished you can parse data

    run.sh -execute -seed seed_file

    where seed_file contains a list of language resource URLs (one URL per line)
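
A seed file could be prepared as in the following sketch. The file location and the example.org URLs are placeholders made up for illustration, not real language resources; only the one-URL-per-line layout comes from the description above.

```shell
# Write a seed file with one resource URL per line.
# The example.org URLs are placeholders, not real language resources.
seed_file="${TMPDIR:-/tmp}/annohub_seed.txt"
cat > "$seed_file" <<'EOF'
https://example.org/corpora/sample.conllu
https://example.org/corpora/sample.rdf
EOF

# Sanity check: count the non-empty lines (one URL each).
url_count=$(grep -c . "$seed_file")
echo "Seed file $seed_file lists $url_count URLs"

# The file can then be passed to Annohub as shown above:
#   run.sh -execute -seed "$seed_file"
```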

  8. Deploying the Annohub web-application requires an installation of TomEE (https://tomee.apache.org/).
    Please consider the following configuration options:

    • CATALINA_OPTS=-Xmx4g -Xss5m
    • in context.xml set <Resources cachingAllowed="true" cacheMaxSize="100000" />
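
One common way to apply the CATALINA_OPTS setting is a bin/setenv.sh file, which the Tomcat startup script bundled with TomEE sources if it exists. The sketch below assumes CATALINA_HOME points at your TomEE installation; the fallback directory /tmp/tomee-example is a hypothetical scratch location that only serves to make the snippet runnable on its own.

```shell
# CATALINA_HOME is assumed to point at your TomEE installation;
# the fallback below is only a scratch directory for trying the snippet.
CATALINA_HOME="${CATALINA_HOME:-/tmp/tomee-example}"
mkdir -p "$CATALINA_HOME/bin"

# Append the JVM options from the list above to bin/setenv.sh
# (appending preserves any settings already in the file).
cat >> "$CATALINA_HOME/bin/setenv.sh" <<'EOF'
export CATALINA_OPTS="-Xmx4g -Xss5m"
EOF
chmod +x "$CATALINA_HOME/bin/setenv.sh"

# Show the resulting file for review.
cat "$CATALINA_HOME/bin/setenv.sh"
```

The Resources element is not covered by this sketch; it still has to be added to context.xml by hand.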

About

Tool for extracting language and annotation meta-data from language resources
