doc/main.tex


%outline:

% an explanation of both the type system, a run through of one or two examples 
% an architectural description of the implementation.
% Features and limitations should be prominently displayed in your documentation. 

\documentclass[a4paper]{article}

\usepackage{semantic}
\usepackage{dsfont}
\usepackage{a4wide}
\usepackage{array}
\usepackage{bussproofs}
\usepackage{latexsym}
\usepackage{verbatim}

% things for the semantic package
\reservestyle{\command}{\textbf}
\command{let,in,:,case,of,if,then,else,letrec,nil,cons,false,true,[]}
\mathlig{-->}{\longrightarrow}
\newcommand{\tyrel}{\sqsubseteq}
% end semantic

\author{Jurri\"en Stutterheim\footnote{3555003}\and Ruben de Gooijer\footnote{3508617}\and Paul van der Walt\footnote{3120805}}
\date{\today} 
\title{Implementation notes for Type and Effect Systems}

\begin{document}

\maketitle \tableofcontents

\section{Introduction}

This project aims to implement a tool which can do a control-flow analysis on a
lambda-calculus language with support for data structures. The focus is on
 lists%and pairs
, for now.

Control-flow analysis aims to determine for each function application, which
functions may be applied. We also track the creation of data structures and that
of lambda functions. The implementation of our analysis is built on the
algorithm W type inferencing algorithm, using an annotated type system.

Our implementation currently offers support for monovariance and subeffecting.
While support for polyvariance has been attempted, we failed to deliver it fully
operational on time.

This document is organised as follows. First we introduce the design of the
analysis by showing our target language, its type system, its semantics and
our analysis rules. We then proceed to the code level implementation details of
the analysis, after which we discuss in more detail what does and does not work
in our current implementation.

\section{Design}

Here we will detail the design of the language, for example which features it
supports. In Tables \ref{tab:elems} and \ref{tab:grammar} one can see the
grammar of the language. It is based on Haskell (in fact we use the Haskell
source extensions parser), but of course only a subset is supported. 

We include support for % pairs and
 lists, as well as a case statement in which one
can pattern-match on these data structures. There is not yet support for general
data structures.

\subsection{A simple functional language}

%todo: copied from slides, may need adjustment.

The basic elements of the language can be found in Table \ref{tab:elems}. The
language includes natural number constants, variables, and operators. Program
points are introduced whenever a lambda-expression is encountered.
\begin{table}
    \begin{centering}
    \begin{tabular}{rcll}
        $n$ &           $\in$ & \textbf{Num = $\mathds{N}$}& numerals \\
        $f,x$ &         $\in$ & \textbf{Var}               & variables \\
        $\oplus$ &      $\in$ & \textbf{Op}                & binary operators \\
        $\pi$ &         $\in$ & \textbf{Pnt}               & program points  \\
        $t$ &           $\in$ & \textbf{Tm}                & terms \\
    \end{tabular}
    \caption{\label{tab:elems}Basic elements}
    \end{centering}
\end{table}

In Table \ref{tab:grammar} one can see what constructs are valid in our subset
of Haskell, since only a very limited subset is supported. Note that recursion
is only supported inside \textbf{let}s. 

\begin{table}
    \begin{centering}
    \begin{tabular}{lcl}
        \hline
        $t$    & ::= & $n\: |\: \textbf{ false }\: |\: \textbf{ true }\: |\: x\: |\: \lambda_\pi x.t_1 \:$ \\
               & $|$ & $t_1 t_2 \:|\: \textbf{ if } t_1 \textbf{ then } t_2 \textbf{ else } t_3 \:|\:  \textbf{ let } x = t_1 \textbf{ in } t_2$\\
               & $|$ & $\mu f.\textbf{ let } x = t_1 \textbf{ in } t_2$\\
               & $|$ & $t_1 \oplus t_2 \:|\: [\,]_\pi \:|\:t_1\<:>_\pi t_2   $\\%     \: | \: (t_1,t_2)_\pi $ \\ 
               & $|$ & $ \textbf{case } e_0 \textbf{ of } alts $ \\
        $alts$ & ::= & $alt_0; alt_1$ \\
        $alt$  & ::= & $patt \rightarrow t$ \\
        $patt$ & ::= & $x \:|\: \textbf{True} \:|\: \textbf{False}            $\\%                \:|\: (patt, patt)$ \\
               & $|$ & $(x: patt)  \:|\: [\,]$\\
        \hline
    \end{tabular}
    \caption{Abstract syntax. $t$ is a term.}
    \label{tab:grammar}
    \end{centering}
\end{table}

\subsection{Type System}

Here we will detail the type system which has been implemented in this project.
It consists of a polymorphic type system, with the important addition of
annotations. We use annotations and constraints to be able to do a control-flow
analysis, eventually telling us which functions one may expect to be used where. 

Our type system has variables, arrows, and type environments, as well as
annotations, which is what makes it different from a normal polymorphic type
system. Table \ref{tab:typingelems} and \ref{tab:typing} show the elements of
the type system. In the var and let/letrec case \emph{inst} and \emph{gen} are mentioned.
Their definition is identical to the ones found in literature on the standard
Hindley Milner let-polymorphic type system. The final syntax-directed type system is given in table \ref{tab:typing-rules}.

\begin{table}
    \begin{centering}
    \begin{tabular}{rcll}
        $\varphi$ &               $\in$ & \textbf{Ann}                   & annotations \\ 
        $\widehat{\tau}$&         $\in$ & $\widehat{\textbf{Ty}      } $ & annotated types \\
        $\widehat{\sigma} $&      $\in$ & $\widehat{\textbf{TyScheme}} $ & annotated type schemes\\
        $\widehat{\Gamma}$&       $\in$ & $\widehat{\textbf{TyEnv}   } $ & annotated type environments  \\
    \end{tabular}
    \caption{Typing elements, annotated types}
    \label{tab:typingelems}
    \end{centering}
\end{table}
\begin{table}
    \begin{centering}
    \begin{tabular}{lcl}
        $ \varphi$         & ::= & $ \emptyset \:|\: \{\pi\} \:|\: \varphi_1 \cup \varphi_2 $ \\
        $\widehat{\tau}$   & ::= & $\alpha \:|\: Nat \: | \: Bool \: | \: $ \\
        & $|$ & $\widehat{\tau}_1^{\varphi_1}
                           \stackrel{\varphi}{\rightarrow} 
                           \widehat{\tau}_2^{\varphi_2} $ \\
                           & $|$ & $ % \widehat{\tau}_1 \times^\varphi \widehat{\tau}_2 \:|\: 
                           [\widehat{\tau}]^\varphi   $ \\ 
        $\widehat{\sigma}$ & ::= & $\widehat{\tau} \:|\: \forall \alpha. \widehat{\sigma}_1 $ \\ 
        $\widehat{\Gamma}$ & ::= & $[\,] \:|\: \widehat{\Gamma}_1[x \mapsto \widehat{\sigma}] $ \\
    \end{tabular}
    \caption{Typing}
    \label{tab:typing}
    \end{centering}
\end{table}

\begin{table}
    \begin{centering}
    \begin{tabular}{ll}
        \hline \\
        $ [$\emph{t-nat}$] $& \inference{}
        {
        {\Gamma},C |- \mathds{N} : Nat^{\varphi}
        } \\
~&~\\
        $ [$\emph{t-true}$] $& \inference{}
        {
        {\Gamma},C |- \<true>\ : Bool^{\varphi}
        } \\
~&~\\
        $ [$\emph{t-false}$] $& \inference{}
        {
        {\Gamma},C |- \<false>\ : Bool^{\varphi}
        } \\
~&~\\
        $ [$\emph{t-nil}$] $& \inference{
        C |- \varphi \sqsupseteq \pi
        }
        {
        {\Gamma},C |- \<nil>_\pi : [\tau^{\varphi_1}]^{\varphi}
        } \\
~&~\\
        $ [$\emph{t-cons}$] $& \inference{
        {\Gamma},C |- e_1 : \tau^{\varphi_1}
        & \Gamma,C |- e_2 : [\tau^{\varphi_2}]^{\varphi_3}
        & C |- \varphi_1 \sqsupseteq \varphi_2
        & C |- \varphi \sqsupseteq \pi
        }
        {
        {\Gamma},C |- \<cons>_\pi\ e_1\ e_2  : [\tau^{\varphi_1}]^{\varphi}
        } \\
~&~\\
        $ [$\emph{t-var}$] $& \inference{
        {\Gamma}(x) = ({\sigma},\varphi)
        & C |- \varphi_1 \sqsupseteq \varphi
        & inst(\sigma) = \tau
        }
        {
        {\Gamma},C |- x  : {\tau^{\varphi_1}}
        } \\
~&~\\
        $ [$\emph{t-lam}$] $& \inference{
        \Gamma[x \mapsto (\tau_x, \varphi_x)], C |- e_o : \tau_0^{\varphi_0}
        & C |- \varphi \sqsupseteq \pi
        }
        {
        \Gamma,C |- \lambda_\pi x -> e_0 : \tau_x^{\varphi_x} ->^\varphi \tau_0^{\varphi_0}
        } \\
~&~\\
        $ [$\emph{t-app}$] $& \inference{
        \Gamma,C |- e_1 : \tau_2^{\varphi_2} ->^{\varphi} \tau_0^{\varphi_0}
        & \Gamma,C|- e_2 : \tau_2^{\varphi_3}
        & C|- \varphi_2 \sqsupseteq \varphi_3
        & C|- \varphi_4 \sqsupseteq \varphi_0
        }
        {
        \Gamma, C |- e_1 e_2 : \tau_0^{\varphi_4}
        } \\
~&~\\
        $ [$\emph{t-if}$] $& \inference{
        \Gamma, C |- e_0 : Bool^{\varphi_0}
        & \Gamma, C |- e_1 : \tau^{\varphi_1}
        & \Gamma, C |- e_2 : \tau^{\varphi_2}
        & C |- \varphi \sqsupseteq \varphi_1
        & C |- \varphi \sqsupseteq \varphi_2
        }
        {
        \Gamma,C|- \<if>\: e_0\: \<then>\: e_1\: \<else>\: e_2 : \tau^{\varphi}
        } \\
~&~\\
        $ [$\emph{t-bin-op}$] $& \inference{ % what about all the annotated simple vars?
        \Gamma, C |- e_1 : \tau_{1\oplus}^{\varphi_1}
        & \Gamma, C |- e_2 : \tau_{2\oplus}^{\varphi_2}
        & C |- \varphi \sqsupseteq \varphi_1
        & C |- \varphi \sqsupseteq \varphi_2
        }
        {
        \Gamma, C|- e_1 \oplus e_2 : \tau_{\oplus}^\varphi
        } \\
~&~\\
        $ [$\emph{t-let}$] $& \inference{
        \Gamma, C |- e_1 : \tau_1^{\varphi_1}
        & \sigma = gen(\tau_1)
        & \Gamma[x \mapsto (\sigma, \varphi_1)],C |- e_2 : \tau_2^{\varphi_2}
        & C |- \varphi_3 \sqsupseteq \varphi_2
        }
        {
        \Gamma,C |- \<let>\: x = e_1\: \<in>\: e_2 : \tau_2^{\varphi_3}
        } \\
~&~\\
        $ [$\emph{t-letrec}$] $& \inference{
        & \Gamma[f \mapsto (\tau_x^{\varphi_x} -> \tau_r^{\varphi_r}, \varphi),x \mapsto (\tau_x, \varphi_x)],
        & C |- e_1 : \tau_r^{\varphi_r}, \\
        & \sigma = gen(\tau_r),
        & \Gamma[f \mapsto (\sigma, \varphi_r)],
        & C |- e_2 : \tau_2^{\varphi_2}
        & C |- \varphi_3 \sqsupseteq \varphi_2
        }
        {
        \Gamma,C |- \<letrec>\: f\ x = e_1\: \<in>\: e_2 : \tau_2^{ \varphi_3}
        } \\
~&~\\
        $ [$\emph{t-case}$] $& \inference{
        & \Gamma, C |- e_0 : [\tau_0]^\varphi
        & C |- \varphi_3 \sqsupseteq \varphi_1
        & C |- \varphi_3 \sqsupseteq \varphi_2
        }
        {
        \Gamma,C |- \<case>\:\ e_0\ \<of>\ \{ pat : \tau_0 \hookrightarrow e_1 : \tau_1^{\varphi_1} ; pat : \tau_0 \hookrightarrow e_2 : \tau_1^{\varphi_2} \} : \tau_2^{ \varphi_3}
        } \\
~&~\\
%         $ [$\emph{t-app}$] $& \inference{
% ~ 
%         }
%         {
% ~ 
%         } \\
% ~&~\\
%TODO FIXME 
        \hline
    \end{tabular}
    \caption{Typing judgements}
    \label{tab:typing-rules}
    \end{centering}
\end{table}

% \subsection{Example Derivation}

% \begin{prooftree}
% \small
% \AxiomC{$asdf$}
% \RightLabel{\scriptsize{id}}
% \UnaryInfC{$A a \vdash A a$}
%   \AxiomC{$instance A a \Rightarrow A (Maybe a)$}
%   \RightLabel{\scriptsize{inst}}
% \BinaryInfC{$A a oplus A Maybe( a)$}
% \RightLabel{\scriptsize{closure}}
% \UnaryInfC{$A (Maybe a) oplus A (Maybe (Maybe a))$}
%   \AxiomC{$$}
%   \RightLabel{\scriptsize{id}} 
%   \UnaryInfC{$A a oplus A a$}
%     \AxiomC{$instance A a \Rightarrow A (Maybe a)$} 
%   \RightLabel{\scriptsize{inst}}
%   \BinaryInfC{$A a oplus A (Maybe a)$}
% \RightLabel{\scriptsize{trans}}
% \BinaryInfC{$A  a oplus A (Maybe (Maybe a))$}
% \RightLabel{\scriptsize{closure}}
% \UnaryInfC{$A Int oplus A (Maybe (Maybe Int))$}
%   \AxiomC{$asdf$}
%   \RightLabel{\scriptsize{id}}
%   \UnaryInfC{$B a oplus B a$}
%     \AxiomC{$class (A a) \Rightarrow B a$}
%   \BinaryInfC{$B a oplus A a$}
%   \RightLabel{\scriptsize{closure}}
%   \UnaryInfC{$B Int oplus A Int$}
% \RightLabel{\scriptsize{trans}}
% \BinaryInfC{$B Int oplus A (Maybe (Maybe Int))$}
% \end{prooftree}


\subsection{Natural semantics}

In this section, Table \ref{tab:natural-semantics}, we define how terms are
evaluated. This is also known as big-step semantics. Notice the addition of
\textbf{case}-statements for data structure use. See Table 5.4 in \cite{nnh}
for reference. Our listing isn't complete, only the additions with respect to the 
aforementioned table are shown. 

\begin{table}
    \begin{centering}
    \begin{tabular}{ll}
        \hline
        $ [$\emph{ns-con}$] $& $ |- c --> c$ \\ ~&~\\
        $ [$\emph{ns-fn}$]  $& $ |- \lambda_\pi x.t_1 --> \lambda_\pi x.t_1$ \\ ~&~\\
%        $ [$\emph{ns-fn$_{rec}$}$]  $& $ |- \mu f.\lambda_\pi x.t_1 --> \lambda_\pi x.(t_1[f \mapsto \lambda_\pi x.t_1 ])$ \\ ~&~\\
%         $ [$\emph{ns-case$_{pair}$}$] $& \inference{|- e_0 --> (x_1,x_2) & |- x_1 --> v_1 & |- x_2 --> v_2}
% {|- (\<case>\: e_0\: \<of>\: (x_1,x_2) -> e_1) --> e_1[x_1\mapsto v_1, x_2 \mapsto v_2]} \\ ~&~\\
        $ [$\emph{ns-case$_{list1}$}$] $& \inference{|- e_0 --> [] }
{|- (\<case>\: e_0\: \<of>\: [] -> e_1; (x\<:>xs) -> e_2) --> e_1} \\ ~&~\\
        $ [$\emph{ns-case$_{list2}$}$] $& \inference{|- e_0 --> (x\<:>xs) & x --> v & xs --> vs }
{|- (\<case>\: e_0\: \<of>\: [] -> e_1; (x\<:>xs) -> e_2) --> e_2[x\mapsto v, xs \mapsto vs]} \\ ~&~\\


        \hline
    \end{tabular}
    \caption{Natural semantics for the language}
    \label{tab:natural-semantics}
    \end{centering}
\end{table}

\section{Program design}
The program compiles to a single executable, named \emph{cfa}. It relies heavily
on the UU Attribute Grammar system. Below is a list with filenames and a short
description what every one of these files does.

\begin{description}
\item [Components.hs] Uses haskell-src-exts to parse Haskell code and convert
it into a simplified AST. It also provides functions to start the analysis.
\item [Main.hs] Main file containing the program entry point.
\item [AG.ag] Main AG file which includes all other AG files. Contains the worklist,
unification and substitution algorithms.
\item [CollectBinders.ag] Disects a quantified type into separate parts.
\item [DataTypes.ag] Contains all of our data types.
\item [FreeAnnVars.ag] Collects all free annotation variables from the AST.
\item [FreeTyVars.ag] Collects all free type variables from the AST.
\item [FreeVars.ag] Collects all free variables from the AST.
\item [Infer.ag] Contains algorithm W and the generalise and instantiate functions.
\end{description}

\section{Features and Limitations}

The assignment suggested implementing CFA with support for tracking data structure
creation, with lists, pairs and a case-statement. Our implementation supports lists and 
a case statement, with pattern matching and a fall-through case. As mentioned in
the introduction, we do not offer support for pairs.

Currently we only support a monovariant analysis. One of the most prominent issues
preventing us from fully implementing a polyvariant analysis was that it was not
entirely clear yet how we should go about resolving constraints during the analysis.
In addition, the role of the qualified type (or rather, qualified constraints)
was not completely clear either. While we did have support for quantifying over
annotation variables and generating several constraints on the quantified types,
we were unfortunately forced to remove this support in order to make the monovariant
analysis work correctly. So, even though we did not manage to implement a polyvariant
analysis, we did at the very least learn how to do a polyvariant anaylsis on paper,
how to support it in the generalisation and instantiation functions and which 
constraints we should generate for it.

Our case blocks are limited to support for exactly two alternatives.
Generalising to $n$ alternatives would be quite straight-forward, but was dropped
due to time constraints.

We currently know of one particular case in which our current implementation fails to
deliver the expected output:

\begin{verbatim}
let f = \lambda x -> x + 1 in 
    let g = \lambda y -> y * 2 in 
        let h = \lambda z -> z 3 in 
            h g + h f
\end{verbatim}

We expect the result of \emph{h} to be:

\begin{verbatim}
(Nat -> t) ^ {x,y} -> t ^ {z}
\end{verbatim}

But what we actually get is:

\begin{verbatim}
 (Nat -> t) ^ {x} -> t ^ {z}
\end{verbatim}

We could not figure out within the time allotted what was causing this to happen. Although
we suspect that somewhere a substitution is going wrong or a constraint that should have been
present isn't generated.


\section{Example programs}

Example programs can be found in the \texttt{examples/} directory of the distribution, 
and have an extension \texttt{.hm}. Feed the program the files via \emph{stdin}. Each file
includes comments which describe its expected analysis result, and possible peculiarities
it may have. Note the directory \texttt{ill-typed/}, in which the examples fail by design.
They include examples of infinite types, etc. 


\begin{thebibliography}{9}

\bibitem{nnh}
  {Flemming Nielson, Hanne Riis Nielson, Chris Hankin},
  \emph{Principles of Program Analysis},
  Springer, Berlin,
  2nd Edition,
  2005.

\end{thebibliography}

\end{document}