This repository has been archived by the owner on May 24, 2021. It is now read-only.

[WIP] Add white paper
ljvmiranda921 committed Jul 22, 2019
1 parent 99bd5db commit f9b06af
Showing 2 changed files with 28 additions and 21 deletions.
7 changes: 7 additions & 0 deletions docs/paper/bibliography.bib
@@ -12,3 +12,10 @@ @article{smith2017ballet
 journal = {Machine Learning Systems Workshop at the Conference on Neural Information Processing Systems},
 month = {November}
 }
+
+@book{gamma1995design,
+title={Design patterns: elements of reusable object-oriented software},
+author={Gamma, Erich and Helm, Richard and Johnson, Ralph and Vlissides, John},
+year={1995},
+publisher={Addison-Wesley Longman Publishing Co}
+}
42 changes: 21 additions & 21 deletions docs/paper/white_paper.tex
@@ -26,7 +26,7 @@
 \usepackage{nicefrac} % compact symbols for 1/2, etc.
 \usepackage{microtype} % microtypography
 \usepackage{listings} % for code listings
-\usepackage{inconsolata} % use as code font
+%\usepackage{inconsolata} % use as code font
 \usepackage{color} % for custom colors
 \usepackage{lipsum}
 
@@ -89,13 +89,13 @@
 \section{Introduction}
 
 Geospatial data differ from most datasets due to their spatial component:
-samples can be a set of points, polygons, or raster pixels with real-world
-coordinates. When coupled with large-scale datasets such as OpenStreetMap
-(OSM) \cite{osm2017}, we can easily gain massive amounts of information
-from a given sample based on its location. For example, given a point in
-Manila, it is possible to obtain the number of malls within 1.5-km, distance to
-the nearest supermarket, and frequency of traffic jams that can be used later
-on for downstream machine learning tasks.
+samples can be a set of points, polygons, or rasters with real-world
+coordinates. When coupled with large-scale datasets such as OpenStreetMap (OSM)
+\cite{osm2017}, we can easily gain massive amounts of information from a given
+sample based on its location. To illustrate, given your current position, it is
+possible to obtain, say, the number of malls within 1.5-km, the distance to the
+nearest supermarket, or the frequency of traffic jams\textemdash all of which
+can be used later on for downstream machine learning tasks.
 
 Due to this nature, engineering features for geospatial data is a challenging
 task\textemdash it requires a significant amount of compute and storage.
@@ -117,19 +117,18 @@ \section{Architecture}
 
 \paragraph{Concepts}
 The fundamental unit in Geomancer is a logical feature \cite{smith2017ballet}
-called \textit{Spell}. It maps a coordinate to a vector of feature values,
+called a \textit{Spell}. It maps a coordinate to a vector of feature values,
 $f_{j}^{\mathcal{D}} : \mathcal{V}^{2} \rightarrow \mathbb{R}^{q_j}$, where
 $\mathcal{V}$ is the set of feasible coordinates (latitude and longitude in
 EPSG:4326) and $q_{j}$ is the dimensionality of the $j$th feature vector. A
-collection of spells called a \textit{SpellBook} is then defined as a set of
-feature functions $\mathcal{F}^{\mathcal{D}} = \{f_j \vert j = 1 \dots m\}$.
+collection of spells\textemdash that is, a \textit{SpellBook}\textemdash is
+then defined as a set of feature functions $\mathcal{F}^{\mathcal{D}} = \{f_j
+\vert j = 1 \dots m\}$.
 
 Geomancer allows users to define feature transforms $F^{\mathcal{D}}$, and
-apply these functions\footnote{In Geomancer, function application is aptly
-named \textit{cast}, like in spellcasting.} to a dataset containing spatial
-coordinates $\mathcal{D}^{\prime}$. The result is a feature matrix
-$X^{\mathcal{D}^\prime}$ \cite{smith2017ballet} that can be used for downstream
-machine learning tasks:
+apply these functions to a dataset containing spatial coordinates
+$\mathcal{D}^{\prime}$. The result is a feature matrix $X^{\mathcal{D}^\prime}$
+\cite{smith2017ballet} that can be used for downstream machine learning tasks:
 
 \begin{equation}
 X^{\mathcal{D}^\prime} =
@@ -139,15 +138,16 @@ \section{Architecture}
 \end{equation}
 
 \paragraph{System Design} A user interacts with Geomancer by defining and
-casting Spells and SpellBooks, which are then translated into a SQL dialect at
+casting Spells or SpellBooks, which are then translated into a SQL dialect at
 runtime. In turn, the SQL dialect connects to the database API and executes the
 query. Figure \ref{architecture} shows the system architecture for Geomancer.
 
 
 Adding new Spells and database connections to the framework is done via the
-factory design pattern. On the other hand, a SpellBook is created through the
-builder pattern, i.e., define Spells that will be included in the SpellBook.
-Figures \ref{spellFactory} and \ref{spellbookBuilder} illustrates the concept.
+factory design pattern \cite{gamma1995design}. On the other hand, a SpellBook
+is created through the builder pattern \cite{gamma1995design}, i.e., define
+Spells that will be included in the SpellBook . Figures \ref{spellFactory} and
+\ref{spellbookBuilder} illustrates these concepts.
 
 
 
@@ -160,7 +160,7 @@ \section{Framework and usage}
 
 \paragraph{Feature transforms for geospatial feature engineering}
 \texttt{Spells} are transforms that users define based on the features they
-want to obtain. Once a \texttt{Spell}is defined, it can then be casted to a set
+want to obtain. Once a \texttt{Spell} is defined, it can then be casted to a set
 of coordinates. For example, if we wish to get the distance to the nearest
 embassy given a sample of coordinates:
 

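The Architecture paragraphs touched by this commit describe a Spell as a function from a coordinate to a feature vector, new Spells being added through a factory, and a SpellBook being assembled through a builder. The Python sketch below illustrates that arrangement under stated assumptions. It is not Geomancer's API: every name in it (`make_spell`, `SpellBookBuilder`, `cast`, `EMBASSIES`) is hypothetical, and the spells run against a small in-memory list of points instead of being translated into a SQL dialect as the paper describes.

```python
import math
from typing import Callable, Dict, List, Tuple

Coordinate = Tuple[float, float]             # (latitude, longitude) in EPSG:4326
Spell = Callable[[Coordinate], List[float]]  # maps a coordinate to a feature vector

# Hypothetical points of interest; a real system would query a database instead.
EMBASSIES: List[Coordinate] = [(14.5547, 121.0244), (14.5995, 120.9842)]


def haversine_km(a: Coordinate, b: Coordinate) -> float:
    """Great-circle distance in kilometres between two lat/lon pairs."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def distance_to_nearest(points: List[Coordinate]) -> Spell:
    """Spell with a one-dimensional output: distance to the closest point."""
    return lambda coord: [min(haversine_km(coord, p) for p in points)]


def count_within(points: List[Coordinate], radius_km: float) -> Spell:
    """Spell with a one-dimensional output: number of points inside a radius."""
    return lambda coord: [
        float(sum(haversine_km(coord, p) <= radius_km for p in points))
    ]


# Factory: a registry from a spell name to the function that constructs it.
SPELL_FACTORY: Dict[str, Callable[..., Spell]] = {
    "distance_to_nearest": distance_to_nearest,
    "count_within": count_within,
}


def make_spell(kind: str, **kwargs) -> Spell:
    return SPELL_FACTORY[kind](**kwargs)


class SpellBookBuilder:
    """Builder: collect spells step by step, then emit the finished collection."""

    def __init__(self) -> None:
        self._spells: List[Spell] = []

    def add(self, kind: str, **kwargs) -> "SpellBookBuilder":
        self._spells.append(make_spell(kind, **kwargs))
        return self

    def build(self) -> List[Spell]:
        return list(self._spells)


def cast(spellbook: List[Spell], dataset: List[Coordinate]) -> List[List[float]]:
    """Apply every spell to every coordinate: one row per coordinate,
    with the spells' output vectors concatenated column-wise."""
    return [[v for spell in spellbook for v in spell(c)] for c in dataset]


book = (SpellBookBuilder()
        .add("distance_to_nearest", points=EMBASSIES)
        .add("count_within", points=EMBASSIES, radius_km=1.5)
        .build())
X = cast(book, [(14.5547, 121.0244), (14.6000, 121.0000)])
```

Here `X` has one row per input coordinate and one column per feature value, which mirrors the role of the feature matrix in the paper. The registry keeps spell construction in one place, so adding a spell means adding one entry; the builder keeps the choice of spells separate from the act of casting them.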