yugagarin edited this page Dec 25, 2017 · 7 revisions

Welcome to the ctakesspark wiki!

Disclaimer:

This project is by no means a fully Spark-aware implementation of cTakes. It is meant to combine the cTakes processing engine with the parallelism provided by Spark/Hadoop, so that cTakes can run at scale and read data stored in HDFS-like storage. It can be viewed as an attempt to provide an alternative to UIMA DUCC.
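
As a rough illustration of that idea (not necessarily how this project is implemented), a common pattern is to build the heavyweight engine once per partition and reuse it for every document in that partition. The sketch below is plain Python with no Spark dependency; `StubEngine`, `process_partition`, and the sample notes are hypothetical stand-ins, and the token count is only a placeholder for real analysis.

```python
from typing import Iterable, Iterator, List


class StubEngine:
    """Hypothetical stand-in for a heavyweight engine such as a cTakes pipeline."""

    instances_created = 0  # counts constructions, to show the reuse

    def __init__(self) -> None:
        StubEngine.instances_created += 1

    def process(self, text: str) -> int:
        # Placeholder "analysis": count whitespace-separated tokens.
        return len(text.split())


def process_partition(docs: Iterable[str]) -> Iterator[int]:
    """Build the engine once, then reuse it for every document in the partition."""
    engine = StubEngine()
    for doc in docs:
        yield engine.process(doc)


# Two lists stand in for two Spark partitions of clinical notes (made-up text).
partitions: List[List[str]] = [
    ["patient denies chest pain", "no fever"],
    ["history of asthma"],
]
results = [list(process_partition(p)) for p in partitions]
```

A function of this shape (iterator in, iterator out) is what PySpark's `RDD.mapPartitions` accepts. Since cTakes itself is Java, a real integration would more likely be written in Java or Scala, but the per-partition reuse idea is the same.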

To make things completely Spark-aware, one would need to change the cTakes processing engine code directly. For example, NER lookups would need to move from hsql (or any other relational database) to data frames/RDDs for performance.
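
To make the lookup point concrete, here is a minimal sketch of a dictionary lookup served from an in-memory map instead of a relational query per term. It is an assumption-laden illustration, not cTakes code: `DICTIONARY`, `lookup`, and the concept ids are made up, and the longest-match-first scan is just one simple matching strategy.

```python
from typing import Dict, List, Tuple

# Placeholder dictionary: normalized term -> concept id (ids are made up).
DICTIONARY: Dict[str, str] = {
    "chest pain": "CONCEPT-1",
    "asthma": "CONCEPT-2",
    "fever": "CONCEPT-3",
}
# Longest dictionary term, in tokens; bounds the window tried at each position.
MAX_TERM_TOKENS = max(len(term.split()) for term in DICTIONARY)


def lookup(text: str) -> List[Tuple[str, str]]:
    """Return (term, concept id) pairs found by longest-match-first scanning."""
    tokens = text.lower().split()
    hits: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate starting at token i first.
        for n in range(min(MAX_TERM_TOKENS, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in DICTIONARY:
                hits.append((candidate, DICTIONARY[candidate]))
                i += n  # skip past the matched term
                break
        else:
            i += 1  # no term starts here; move to the next token
    return hits
```

In a Spark job, a read-only map like this could be shipped to executors once as a broadcast variable (`SparkContext.broadcast`), so each lookup becomes a local hash probe. For dictionaries too large to hold in memory, a data frame join would be the more likely route.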

To be continued ...
