Home

Welcome to the ctakesspark wiki!

Disclaimer:

This project is by no means a fully Spark-aware implementation of cTakes. It is meant to combine the cTakes processing engine with the parallelism provided by Spark/Hadoop, so that cTakes can run at scale and read data stored in HDFS-like storage. It can be viewed as an attempt to provide an alternative to UIMA DUCC.

To make things completely Spark-aware, one would need to change the cTakes processing engine code directly. For example, NER lookups would need to switch from hsql (or any other relational database) to data frames/RDDs for performance.
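
As a rough illustration of the NER lookup change mentioned above, here is a minimal plain-Java sketch of the idea. It is not code from cTakes or from this project. The class `InMemoryDictionaryLookup`, its `annotate` method, and the dictionary entries are all hypothetical. The sketch replaces a per-term relational query with a lookup in an in-memory map; in a Spark job, such a map could be shared with the workers (for example as a broadcast variable), but no Spark API is used here so that the example stays self-contained.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: dictionary lookup against an in-memory map
// instead of issuing one relational query per term.
final class InMemoryDictionaryLookup {

    // Term -> concept code. In a Spark job this map would be loaded once
    // and shared with the workers (assumption, not shown here).
    private final Map<String, String> dictionary;

    InMemoryDictionaryLookup(Map<String, String> dictionary) {
        this.dictionary = new HashMap<>(dictionary);
    }

    // Returns the concept codes of all single-token dictionary terms
    // found in the text, in order of appearance.
    List<String> annotate(String text) {
        List<String> codes = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            String code = dictionary.get(token);
            if (code != null) {
                codes.add(code);
            }
        }
        return codes;
    }

    public static void main(String[] args) {
        // Placeholder entries; these are not real terminology codes.
        Map<String, String> dict = new HashMap<>();
        dict.put("aspirin", "CODE_ASPIRIN");
        dict.put("fever", "CODE_FEVER");

        InMemoryDictionaryLookup lookup = new InMemoryDictionaryLookup(dict);
        System.out.println(lookup.annotate("Patient has fever, given Aspirin."));
    }
}
```

The sketch only matches single tokens. A real dictionary lookup would also need to handle multi-word terms and term variants, which is part of why the change would have to be made inside the cTakes engine itself.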

To be continued ...
