[WIP] Add white paper

thinkingmachines · Jul 22, 2019 · f9b06af
1 parent 99bd5db

Showing 2 changed files with 28 additions and 21 deletions.
diff --git a/docs/paper/bibliography.bib b/docs/paper/bibliography.bib
@@ -12,3 +12,10 @@ @article{smith2017ballet
     journal = {Machine Learning Systems Workshop at the Conference on Neural Information Processing Systems},
     month = {November}
 }
+
+@book{gamma1995design,
+    title={Design patterns: elements of reusable object-oriented software},
+    author={Gamma, Erich and Helm, Richard and Johnson, Ralph and Vlissides, John},
+    year={1995},
+    publisher={Addison-Wesley Longman Publishing Co}
+}
diff --git a/docs/paper/white_paper.tex b/docs/paper/white_paper.tex
@@ -26,7 +26,7 @@
 \usepackage{nicefrac}       % compact symbols for 1/2, etc.
 \usepackage{microtype}      % microtypography
 \usepackage{listings}       % for code listings
-\usepackage{inconsolata}    % use as code font
+%\usepackage{inconsolata}    % use as code font
 \usepackage{color}          % for custom colors
 \usepackage{lipsum}
 
@@ -89,13 +89,13 @@
 \section{Introduction}
 
 Geospatial data differ from most datasets due to their spatial component:
-samples can be a set of points, polygons, or raster pixels with real-world
-coordinates. When coupled with large-scale datasets such as OpenStreetMap
-(OSM) \cite{osm2017}, we can easily gain massive amounts of information
-from a given sample based on its location. For example, given a point in
-Manila, it is possible to obtain the number of malls within 1.5-km, distance to
-the nearest supermarket, and frequency of traffic jams that can be used later
-on for downstream machine learning tasks. 
+samples can be a set of points, polygons, or rasters with real-world
+coordinates. When coupled with large-scale datasets such as OpenStreetMap (OSM)
+\cite{osm2017}, we can easily gain massive amounts of information from a given
+sample based on its location. To illustrate, given your current position, it is
+possible to obtain, say, the number of malls within 1.5~km, the distance to the
+nearest supermarket, or the frequency of traffic jams\textemdash all of which
+can be used later on for downstream machine learning tasks. 
 
 Due to this nature, engineering features for geospatial data is a challenging
 task\textemdash it requires a significant amount of compute and storage.
@@ -117,19 +117,18 @@ \section{Architecture}
 
 \paragraph{Concepts}
 The fundamental unit in Geomancer is a logical feature \cite{smith2017ballet}
-called \textit{Spell}. It maps a coordinate to a vector of feature values,
+called a \textit{Spell}. It maps a coordinate to a vector of feature values,
 $f_{j}^{\mathcal{D}} : \mathcal{V}^{2} \rightarrow \mathbb{R}^{q_j}$, where
 $\mathcal{V}$ is the set of feasible coordinates (latitude and longitude in
 EPSG:4326) and $q_{j}$ is the dimensionality of the $j$th feature vector. A
-collection of spells called a \textit{SpellBook} is then defined as a set of
-feature functions $\mathcal{F}^{\mathcal{D}} = \{f_j \vert j = 1 \dots m\}$. 
+collection of spells\textemdash that is, a \textit{SpellBook}\textemdash is
+then defined as a set of feature functions $\mathcal{F}^{\mathcal{D}} =
+\{f_j^{\mathcal{D}} \mid j = 1, \dots, m\}$.
 
 Geomancer allows users to define feature transforms $F^{\mathcal{D}}$, and
-apply these functions\footnote{In Geomancer, function application is aptly
-named \textit{cast}, like in spellcasting.} to a dataset containing spatial
-coordinates $\mathcal{D}^{\prime}$. The result is a feature matrix
-$X^{\mathcal{D}^\prime}$ \cite{smith2017ballet} that can be used for downstream
-machine learning tasks: 
+apply these functions to a dataset containing spatial coordinates
+$\mathcal{D}^{\prime}$. The result is a feature matrix $X^{\mathcal{D}^\prime}$
+\cite{smith2017ballet} that can be used for downstream machine learning tasks: 
 
 \begin{equation}
     X^{\mathcal{D}^\prime} = 
@@ -139,15 +138,16 @@ \section{Architecture}
 \end{equation}
 
 \paragraph{System Design} A user interacts with Geomancer by defining and
-casting Spells and SpellBooks, which are then translated into a SQL dialect at
+casting Spells or SpellBooks, which are then translated into a SQL dialect at
 runtime. In turn, the SQL dialect connects to the database API and executes the
 query. Figure \ref{architecture} shows the system architecture for Geomancer.
 
 
 Adding new Spells and database connections to the framework is done via the
-factory design pattern. On the other hand, a SpellBook is created through the
-builder pattern, i.e., define Spells that will be included in the SpellBook.
-Figures \ref{spellFactory} and \ref{spellbookBuilder} illustrates the concept.
+factory design pattern \cite{gamma1995design}. On the other hand, a SpellBook
+is created through the builder pattern \cite{gamma1995design}, i.e., by defining
+the Spells that it will include. Figures \ref{spellFactory} and
+\ref{spellbookBuilder} illustrate these concepts.
 
 
 
@@ -160,7 +160,7 @@ \section{Framework and usage}
 
 \paragraph{Feature transforms for geospatial feature engineering}
 \texttt{Spells} are transforms that users define based on the features they
-want to obtain. Once a \texttt{Spell}is defined, it can then be casted to a set
+want to obtain. Once a \texttt{Spell} is defined, it can then be cast on a set
 of coordinates.  For example, if we wish to get the distance to the nearest
 embassy given a sample of coordinates: