The Computational Journalism Server is an appliance composed of bleeding-edge open source technologies. The base is a SUSE Studio appliance built on 64-bit openSUSE 12.1. It starts with the Server template and adds:
- Complete Linux/Apache/MySQL/PHP (LAMP) stack including phpMyAdmin,
- PostgreSQL, phpPgAdmin, PostgreSQL contributed packages, PL/Perl, PL/Python, PL/Tcl and PL/R,
- The AppArmor application security framework,
- SQLite3, CouchDB, MongoDB and Redis data persistence packages,
- RabbitMQ message queueing,
- GCC and OpenJDK development tools,
- The Pandoc universal document converter,
- The Tesseract Optical Character Recognition (OCR) engine,
- The ImageMagick suite of command-line image processing utilities,
- Perl Redis and Twitter API libraries,
- Redland, Rasqal, Raptor, 4Store and Virtuoso RDF / SPARQL tools,
- R 2.15, and
- The Armadillo high-performance linear algebra libraries.
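One way to sanity-check a freshly booted appliance against the list above is to look for each component's executable on the `PATH`. The following is only a minimal sketch, assuming a Python 3.6+ interpreter is available; the executable names in `EXPECTED_TOOLS` are the usual upstream command names, not names confirmed for this appliance, and the helper functions are hypothetical, not part of the project's scripts.

```python
import shutil

# Assumed executable names for some components in the list above.
# These are the common upstream names; the appliance may differ.
EXPECTED_TOOLS = {
    "PostgreSQL client": "psql",
    "MySQL client": "mysql",
    "SQLite3": "sqlite3",
    "Redis server": "redis-server",
    "MongoDB server": "mongod",
    "RabbitMQ control": "rabbitmqctl",
    "GCC": "gcc",
    "OpenJDK": "java",
    "Pandoc": "pandoc",
    "Tesseract": "tesseract",
    "ImageMagick": "convert",
    "R": "R",
}


def find_tools(tools=EXPECTED_TOOLS):
    """Map each component name to its executable path, or None if not on PATH."""
    return {name: shutil.which(cmd) for name, cmd in tools.items()}


def missing_tools(tools=EXPECTED_TOOLS):
    """Return the sorted names of components whose executable was not found."""
    return sorted(name for name, path in find_tools(tools).items() if path is None)


if __name__ == "__main__":
    # Print a simple report, one component per line.
    for name, path in find_tools().items():
        print(f"{name:20s} {path or 'NOT FOUND'}")
```

A missing entry does not necessarily mean the package is absent; server binaries are sometimes installed outside the default `PATH` of an unprivileged user.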
After downloading and booting the appliance, the administrator runs scripts to install:
- R library packages for mapping, Twitter data collection, text mining, graphics, animation, audio analysis, election audits, weather, and sports analytics,
- R library packages for test-driven development, package management and database interface,
- R web server construction tools, including rApache,
- The RStudio™ Server Integrated Development Environment, and
- Optional selected packages from the R CRAN task views for High Performance Computing, Graphics, Reproducible Research, Machine Learning, Natural Language Processing, Spatial, Econometrics, Time Series and Finance.
The problem domain of computational journalism is dominated by three main application areas:
- Geospatial processing / mapping (GIS),
- Data science, including
  - Natural language processing,
  - Text mining,
  - Machine learning, and
  - Social network analysis, and
- Finance, time series and econometrics.
The R language and Comprehensive R Archive Network (CRAN) libraries provide robust open source solutions for all of these.
There are three goals for the Computational Journalism Server:
- Provide an Integrated Development Environment and a platform for computational journalism web server applications written in R,
- Provide a computational journalism server that can be optimized for the native hardware of a physical machine, and
- Provide a computational journalism compute node that can run in a cluster, grid or cloud infrastructure.
The long-term goal of this project is to provide a fully open source computational journalism Platform as a Service (PaaS). The current bill of materials is inspired by the Red Hat / Fedora OpenShift Origin project and the CloudFreeStyle project.
I'm looking for contributors! As with any open source project, users and testers are always wanted. I'm especially interested in computational journalism use cases. I've got a few things I want to build, mostly in the Finance and Twitter text mining areas, but I don't know much about the other application areas.
The other type of contribution I'm seeking is from people who know how to turn an appliance into a full PaaS. I've looked at Cloud Foundry and OpenShift, and that's the sort of thing this wants to be when it grows up. Cloud Foundry is built on Ubuntu and OpenShift on Red Hat Enterprise Linux, so why shouldn't openSUSE share in that fun?
See the project Milestones / Issues page at https://github.com/znmeb/Computational-Journalism-Server/issues for the project road map.