Cleaned up the intro.

davedash committed on Aug 5, 2011 (commit 7c24fff97a97477c50a3e49d95fd8d1d4b16fb8f, 1 parent 82dba5a)
Showing with 35 additions and 22 deletions.
  1. +1 −1 .gitignore
  2. +5 −5 docs/index.rst
  3. +29 −16 docs/introduction.rst
.gitignore
@@ -3,4 +3,4 @@ conf/grouperfish.json
.classpath
.settings/
target/
-
+docs/_build
docs/index.rst
@@ -1,16 +1,16 @@
Welcome to Grouperfish's documentation!
=======================================
-**Note:**
-This documentation serves as a specification.
-It describes a system that has not reached a usable state yet.
+.. note::
+ This documentation serves as a specification.
+ It describes a system that has not reached a usable state yet.
Contents:
.. toctree::
:maxdepth: 2
-
+
introduction
architecture
installation
@@ -19,7 +19,7 @@ Contents:
batch_system
transforms
queries
-
+ todo
Indices and tables
==================
docs/introduction.rst
@@ -1,15 +1,19 @@
Introduction
============
-*What is this about?* The scenarios in which Grouperfish might be used, and some vocabulary.
+Grouperfish is built to perform text clustering for `Firefox Input`_.
+Due to its generic nature, it also serves as a testbed to prototype machine
+learning algorithms.
+.. _Firefox Input: http://input.mozilla.com
-What is Grouperfish?
---------------------
+How does it work?
+-----------------
-Grouperfish is a *document transformation system*, for high throughput applications.
+Grouperfish is a *document transformation system*, for high throughput
+applications.
-Roughly summarized
+Roughly summarized:
* users put *documents* into Grouperfish using a REST interface
@@ -20,29 +24,38 @@ Roughly summarized
* all components are distributed for high volume applications
-What is it for?
-"""""""""""""""
-
-Grouperfish is built to perform text clustering for `Firefox Input`_. Due to its generic nature, it also serves as a testbed to prototype machine learning algorithms.
+What can be done?
+"""""""""""""""""
-.. _Firefox Input: http://input.mozilla.com
+Assume a scenario where a steady stream of documents is generated.
+For example:
+* user feedback
+* software crash reports
+* Twitter messages
-Ok... wait --- what is it for?
-""""""""""""""""""""""""""""""
+Now, these documents can be processed to make them more useful.
+For example:
-Assume a scenario where a steady stream of documents is generated. Examples would include user feedback forms or questionnaires, crash reports a desktop application sends home, and of course twitter messages. Now, these documents can be processed to make them more useful. An example would be to generate an index, for which very advanced solutions exist already. Other examples include clustering (grouping related documents together, detecting common topics), classification (associating documents with predefined categories -- for example when detecting spam) or trending (identifying new topics over time).
+* clustering (grouping related documents together, detecting common topics)
+* classification (associating documents with predefined categories, such as
+ spam)
+* trending (identifying new topics over time)
Vocabulary
----------
Grouperfish users can assume one of three roles (or any combination thereof):
-* Document Producer: Some user (usually another piece of software) that will put documents into the System.
+Document Producer
+ Some user (usually another piece of software) that will
+ put documents into the System.
-* Result Consumer: Some user/software that gets the generated results.
+Result Consumer
+ Some user/software that gets the generated results.
-* Admin: A user who configures which subsets of documents to transform, and how.
+Admin
+ A user who configures which subsets of documents to transform, and how.
