# Extracting relations from [CRAFT 3.1](https://github.com/UCDenver-ccp/CRAFT)

This notebook demonstrates how to extract relations using [Dep2Rel](https://github.com/tuh8888/Dep2Rel/) from the [CRAFT 3.1](https://github.com/UCDenver-ccp/CRAFT) dataset.

## The data

[CRAFT 3.1](https://github.com/UCDenver-ccp/CRAFT) contains both semantic and structural annotations. 

### Semantic annotations
Semantic annotations (concept annotations) are used in named entity recognition (NER) tasks. In CRAFT, these were made using 10 of the Open Biomedical Ontologies which serve as formal dictionaries mapping persistent URIs to definitions and some relationships including subsumption relations so that they form a hierarchy. The URIs serve as the tags for these annotations. 

The format of the CRAFT semantic annotations is Knowtator XML, but we will convert these to Knowtator 2 XML.

### Structural annotations
Structural annotations consist of part-of-speech (POS) tags, treebank (dependency parses), and span/section tagging. Here, we will mostly be taking advantage of the dependency parses which define syntactic relations between tokens within a sentence. 

The format of the CRAFT syntactic annotations is PennTreebank, but we will convert these to ConllU.

In [4]:
%%bash
cd /media/tuh8888/Seagate\ Expansion\ Drive/data/craft-versions
git clone https://github.com/UCDenver-ccp/CRAFT.git
boot all-concepts -x convert -k
boot treebank convert -u

fatal: destination path 'CRAFT' already exists and is not an empty directory.
/bin/bash: line 2: boot: command not found
/bin/bash: line 3: boot: command not found



## Relation extraction

Now that we have some data in the correct formats, we can read it in.

In [1]:
(require '[clojure.java.io :as io])
(.listFiles(io/file "."))

[./CRAFT 3.1 - Relation Extraction.ipynb, ./util.clj, ./scripts, ./edu, ./BioCreative 2017 - Relation Extraction.ipynb, ./.ipynb_checkpoints, ./README.md]

In [4]:
(load "edu/ucdenver/ccp/nlp/relation_extraction")

java.io.FileNotFoundException:  Could not locate edu/ucdenver/ccp/nlp/relation_extraction__init.class or edu/ucdenver/ccp/nlp/relation_extraction.clj on classpath. Please check that namespaces with dashes use underscores in the Clojure file name.

In [2]:
(ns scripts.relation-extraction-script
  (:require [edu.ucdenver.ccp.nlp.relation-extraction :refer :all]
            [clojure.java.io :as io]
            [taoensso.timbre :as t]
            [edu.ucdenver.ccp.nlp.readers :as rdr]
            [edu.ucdenver.ccp.clustering :refer [single-pass-cluster]]
            [edu.ucdenver.ccp.nlp.evaluation :as evaluation]
            [edu.ucdenver.ccp.knowtator-clj :as k]
            [util :refer [cosine-sim]]
            [clojure.set :as set1]))

java.io.FileNotFoundException:  Could not locate edu/ucdenver/ccp/nlp/relation_extraction__init.class or edu/ucdenver/ccp/nlp/relation_extraction.clj on classpath. Please check that namespaces with dashes use underscores in the Clojure file name.