# Abstract Interpretation of SPARK Programs

Traditionally, SPARK programs have been analysed by the Examiner using data flow and information flow analysis. This study does not consider the proof features of the Examiner (the VCG generator) but rather investigates whether results the same as, or better than, those of flow analysis can be achieved using abstract interpretation. Better results may be achievable because abstract interpretation might be able to determine some non-executable paths and exclude them from the analysis. To achieve this goal, two of the more advanced techniques of abstract interpretation, variable analysis and path analysis, are needed.

The SPARK language was developed to achieve accurate and straightforward flow analysis. Many of the restrictions placed on Ada by SPARK to make it amenable to flow analysis may also be advantageous for abstract interpretation.  


## Abstract Interpretation

Abstract interpretation is commonly viewed as having three stages:
1. Translate
2. Merge
3. Widen

These three stages are applied to each statement of the given source text until it has been completed.

Translate converts a statement into a simple model representing the statement. Merge takes the model of the statement and the abstractions of the immediately preceding statements (in general, there may be more than one due to gotos, if statements and loops) and merges them into a single abstraction for the statement. The abstraction is an approximation of the state at the statement. Generally, it consists of an entry for each variable and an approximation of the range of values that variable may have at the statement. The Widen stage is typically used after loops to widen the approximate range of possible values a variable may have, to represent executing the loop multiple times.

What is interesting about these stages is that they can be adapted to suit a number of different analyses but still fit within the framework of calling each of the three stages for each statement.

For instance, constant analysis may be used to obtain an approximation of the range of values a variable may have at a particular statement based on the values of constants within the source text. Variable analysis is similar but more complex, as it is based on the expressions assigned to variables within the source text.

To perform abstract interpretation, an abstract model of the source text needs to be constructed consisting of a model of each statement (the translation) and a sequence of abstractions representing each of the previous statements.  Merge consolidates the immediately preceding statements with the model of the current statement to obtain an abstraction for the current statement which is appended to the sequence.  Merging is a simple operation when the statement has only one immediate predecessor but becomes a little more complex around if statements, loops and the targets of goto statements.

SPARK 2005 (SPARK Classic) has an advantage over general-purpose programming languages in that, other than in a loop, an immediately preceding statement cannot be later in the text than the current statement, which simplifies the sequence of preceding statements that needs to be maintained. SPARK is modular, so each subprogram is essentially self-contained and only variables used within the subprogram need to be in the abstractions. Very little extra context has to be maintained, and the sequence of abstractions can be discarded after completing the analysis of the subprogram.

This study starts with the simple but very important check for SPARK variable definedness.   In Ada terms, ideally every variable is initialized to a valid value before it is read.  

# Abstract Interpretation of Definedness

Abstract interpretation uses an abstraction based on an approximation of the range of values a variable may have at a given statement.  In variable definedness we are concerned whether a variable has been assigned a (preferably valid) value but are not concerned with the precise value.  A representation of the values of definedness needs to be used.

For a SPARK program the basic unit of analysis is a subprogram (although some analysis of packages and tasks is required) and in this document only analysis of subprograms is considered.

## Definedness Values

The value of a variable is "used" if its name appears in an expression. The value of a variable may be "defined", generally by an assignment. Consider a model to represent the values of definedness. Definedness is not concerned with the actual value of a variable, only whether it is initialized. Three values are obvious: ***Undefined***, ***Defined*** and ***Used***. However, the analysis has to detect the use of an uninitialized variable, the update of a variable with an expression containing an uninitialized variable and the subsequent use of a variable assigned from such an expression. Another definedness value, ***Unsound***, is therefore introduced: it is the definedness value a variable has when it is assigned from an expression containing a variable with a definedness value of ***Undefined***. The meaning of ***Unsound*** is extended to include assignment from an expression containing a variable with an ***Unsound*** definedness value. The values of definedness are thus:

### Values of Definedness

1. ***Undefined*** -- An uninitialized variable
2. ***Unsound***   -- An update with an expression containing an Undefined or Unsound variable
3. ***Defined***   -- A variable that has been updated with a sound expression
4. ***Used***      -- A Defined variable that has been used

If a definedness value is not Undefined or Unsound, it is considered a Sound value. A Sound value is not necessarily a Valid value in the Ada sense as the value could be out of range and, therefore, an invalid value.  A more sophisticated abstract interpretation technique may be able to approximate the range of actual (rather than definedness) values a variable may have.
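The four values and the notion of a Sound value can be sketched as an ordered enumeration. This is only an illustrative Python model, not part of any SPARK tool: the names `Definedness` and `is_sound` are hypothetical, and the numeric encoding simply reflects the ordering Undefined < Unsound < Defined < Used that this document places on the values.

```python
from enum import IntEnum


class Definedness(IntEnum):
    """Definedness values, ordered Undefined < Unsound < Defined < Used."""
    UNDEFINED = 0  # an uninitialized variable
    UNSOUND = 1    # updated from an expression containing an Undefined or Unsound variable
    DEFINED = 2    # updated with a sound expression
    USED = 3       # a Defined variable that has been used


def is_sound(value: Definedness) -> bool:
    """A value is Sound if it is neither Undefined nor Unsound."""
    return value >= Definedness.DEFINED
```

Using an integer-backed enumeration means the ordering can be tested directly with `<` and `>=`, which is convenient when ranges of definedness values are compared later.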

## Model of Program Statements

The statements of a subprogram read (a use) or assign (a definition) variables, which suggests a simple model of a statement for definedness: a sequence of ***Read*** and ***Assign*** actions. The translation phase will convert a program statement into this simple model.

Statements of a program may also indicate a branch, e.g., through an if statement, and a rejoining of a branch at the end of the statement, e.g., the end if. The definedness model must take account of these branches and correctly merge the abstractions of statements from different branches. In Classic SPARK branching can only occur through if, case and loop statements, but conditional computation can also be controlled by using the **and then** and **or else** operators.

Definedness analysis also needs to know the start and end of a subprogram, as this is the basic unit in SPARK analysis.

To account for the above effects of program statements on definedness the following actions are defined in the translation of the program text.

### Translation Actions

1. ***Read***
2. ***Assign***
3. ***If_Condition***
4. ***Then_Branch***
5. ***Elsif_Condition***
6. ***Else_Branch***
7. ***End_If***
8. ***Case_Condition***
9. ***Case_When***
10. ***Case_Others***
11. ***End_Case***
12. ***While_Condition***
13. ***For_Forward***
14. ***For_Reverse***
15. ***Loop_Branch***
16. ***Exit_Branch***
17. ***Exit_When***
18. ***End_Loop***
19. ***And_Then***
20. ***Or_Else***
21. ***Proc_Start***
22. ***Proc_End***
23. ***Fun_Start***
24. ***Fun_Return***
25. ***Fun_End***

At this stage RavenSPARK is not considered. It may introduce further actions.

In a single statement, a variable may be both read and assigned, but in SPARK, expressions do not have side effects, so all the variables on the right-hand side of an assignment statement are only read. On the left-hand side, the only variables that are read are array indices, although unusually, in object declarations, more than one variable may be initialized. In a subprogram call, formal parameters of mode **in** or **in out** are considered as reads of the corresponding actual parameters and a pre-definition of the formal parameters at the call of a subprogram. Similarly, formal parameters of mode **out** or **in out** are considered as an assignment of the corresponding actual parameters at the return of the subprogram. A function result is modelled as an **out** parameter. A subprogram can have many parameters and globals, leading to the model of a statement potentially having multiple uses of variables (possibly of the same variable) and multiple definitions of variables. SPARK anti-aliasing rules guarantee all variables defined in a single statement are unique.

## The Translation

The proposed translation first considers all of the variables read by the statement and then those that are assigned by the statement. This avoids ambiguity when the same variable is both read and written by the statement.  Each read and assignment of a variable will have a separate entry in the translation.  For instance:

    X := X + X + Y;

would be translated as:

    Read   (X)
    Read   (X)
    Read   (Y)
    Assign (X)

For reporting anomalies, an association between the statement position and its translation item is needed. For simplicity, in this study, only one statement per line is assumed, so the position can just be the line number.

Assuming the above statement is on line 10, the translation becomes:

    Read   (X, (10, 1))
    Read   (X, (10, 2))
    Read   (Y, (10, 3))
    Assign (X, (10, 4))

It may be unnecessary to record the two ***Reads*** of X; one may be sufficient, as a second ***Read*** will not change the definedness value of the variable.
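The translation of a simple assignment statement, reads first and then the assignment, each tagged with (line, item), might be sketched as follows. This is a deliberately naive illustration: `translate_assignment` is a hypothetical helper, it assumes the statement has the form `Target := Expression;` with a plain variable as target, and it treats every identifier in the expression as a variable (no literals with letters, function calls, array indexing or attributes).

```python
import re
from typing import List, Tuple

# (action kind, variable name, (line number, item number))
Action = Tuple[str, str, Tuple[int, int]]


def translate_assignment(statement: str, line: int) -> List[Action]:
    """Translate 'Target := Expression;' into Read actions followed by one Assign."""
    target, expression = statement.rstrip(";").split(":=")
    # Every identifier on the right-hand side is treated as a variable read.
    reads = re.findall(r"[A-Za-z_]\w*", expression)
    actions: List[Action] = [
        ("Read", name, (line, item)) for item, name in enumerate(reads, start=1)
    ]
    # The assignment is always the last item for the statement.
    actions.append(("Assign", target.strip(), (line, len(reads) + 1)))
    return actions


translation = translate_assignment("X := X + X + Y;", 10)
```

For the statement on line 10, `translation` holds the same four entries as listed above, in the same order.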

The call of a procedure, 

**procedure** P (A : **in** Integer; B : **in out** Integer; C : **out** Integer);

on line 20

    P (X, Y, X);

would be translated as (assuming no SPARK global or dependency relation is applied to the procedure declaration):

    Read   (X, (20, 1))
    Read   (Y, (20, 2))
    Read   (X, (20, 3))
    Assign (Y, (20, 4))
    Assign (X, (20, 5))

## Statement Abstraction (STAB)

Each statement has an abstraction representing all the variables that are read or updated by the subprogram and their state on completion of execution of the statement. Commonly, in abstract interpretation, the state of each variable is represented by an approximation of the range of values that the variable may have at a statement. For definedness, the maximum range is [Undefined, Used]. An order is placed on the definedness values: ***Undefined*** < ***Unsound*** < ***Defined*** < ***Used***. The abstraction of a statement, S, is formed by merging the abstractions of all its immediately preceding statements and applying the effects of executing S to the merged abstractions. In this document the abstraction representing a statement is called a STatement ABstraction, abbreviated to STAB.

## Merging

Merging creates the STAB of a statement, S, by merging STABs of all of the immediately preceding statements of S and applying the effects of executing S on the merged abstractions.

### Initial Merge (JOIN)

The initial part of the merge for a statement, S, considers the state of all variables from the STABs of all immediately preceding statements of S and determines the widest possible definedness range of each variable. That is, the variable's lower bound is taken to be the lowest lower bound from all its immediate predecessors and the upper bound is taken from the highest. The merged abstractions of the immediately preceding statements of S are the basis for the STAB of S. In this document the abstraction formed by this initial merge of the abstractions of the immediately preceding statements is termed a JOIN. The STAB of a statement is equal to the JOIN before considering the effects of executing the statement.
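A minimal sketch of the JOIN, assuming a STAB is represented as a mapping from variable name to a (lower bound, upper bound) pair; the names `Definedness`, `Stab` and `join` are illustrative choices, not taken from any existing tool.

```python
from enum import IntEnum
from typing import Dict, List, Tuple


class Definedness(IntEnum):
    UNDEFINED = 0
    UNSOUND = 1
    DEFINED = 2
    USED = 3


Range = Tuple[Definedness, Definedness]  # (lower bound, upper bound)
Stab = Dict[str, Range]                  # one range per variable


def join(predecessors: List[Stab]) -> Stab:
    """Form the JOIN: the widest range of each variable over all predecessor STABs."""
    joined: Stab = {}
    for stab in predecessors:
        for name, (low, high) in stab.items():
            if name in joined:
                cur_low, cur_high = joined[name]
                # Lowest lower bound and highest upper bound seen so far.
                joined[name] = (min(cur_low, low), max(cur_high, high))
            else:
                joined[name] = (low, high)
    return joined


# V is assigned on one branch of an if statement but not on the other.
then_stab = {"V": (Definedness.DEFINED, Definedness.DEFINED)}
else_stab = {"V": (Definedness.UNDEFINED, Definedness.UNDEFINED)}
after_if = join([then_stab, else_stab])
```

Here `after_if["V"]` is the range [Undefined, Defined]: the variable may or may not have been assigned by the time the branches rejoin. With a single predecessor the JOIN is simply a copy of that predecessor's STAB.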

### Detecting Definedness Anomalies

Consider the variables read by statement S. By definition, all of the variables read will be in the JOIN. If the range of a variable in the JOIN is [Undefined, Undefined], then a ***Read*** action on this variable is an unconditional use of an uninitialized variable. If a variable in the JOIN has a lower bound of ***Undefined*** but a higher upper bound, then a read of this variable is a conditional use of an uninitialized variable.

If the lower bound of the range of a variable in the JOIN is ***Unsound***, then a read of this variable may be a read of an unsound value; it is unconditionally so if the range is [Unsound, Unsound].

### The Effects of Read Actions on STAB

A read of a variable by statement S (a use of the variable) whose upper bound in the JOIN is ***Defined*** will set its upper bound to ***Used*** in the STAB of S. The lower bound of the variable will be the same as it is in the JOIN. If the upper bound of the variable in the JOIN is not ***Defined***, the bounds of the variable in the STAB of S are as in the JOIN.

In summary here are the rules for the effects of a read of a variable in a statement on the definedness value of the variable in the STAB of the statement:

#### Read Action Table:

        JOIN                 <Read>          STAB           
    [Undefined, Undefined]     ->     [Undefined, Undefined] (1)
    [Undefined, Unsound]       ->     [Undefined, Unsound]   (2)
    [Undefined, Defined]       ->     [Undefined, Used]      (2)
    [Undefined, Used]          ->     [Undefined, Used]      (2)
    [Unsound, Unsound]         ->     [Unsound, Unsound]     (3)
    [Unsound, Defined]         ->     [Unsound, Used]        (4)
    [Unsound, Used]            ->     [Unsound, Used]        (4)
    [Defined, Defined]         ->     [Defined, Used]
    [Defined, Used]            ->     [Defined, Used]

Only definedness ranges that are possible are considered in the table.
***Used*** cannot be a lower bound as the variable must have a definedness value of ***Defined*** before it can take a value of ***Used***.  ***Defined*** is considered to be less than ***Used*** in the definedness ordering.

Notes:
(1) An unconditional read of an uninitialized variable.
(2) A conditional read of an uninitialized variable.
(3) An unconditional read of a variable with an unsound value.
(4) A conditional read of a variable with an unsound value.
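The Read Action Table and its notes can be expressed compactly as two small functions, one giving the new range and one classifying the anomaly. This is a sketch under the assumption that a range is a (lower, upper) pair of ordered definedness values; `apply_read` and `read_note` are hypothetical names.

```python
from enum import IntEnum
from typing import Optional, Tuple


class Definedness(IntEnum):
    UNDEFINED = 0
    UNSOUND = 1
    DEFINED = 2
    USED = 3


Range = Tuple[Definedness, Definedness]  # (lower bound, upper bound)


def apply_read(join_range: Range) -> Range:
    """Return the range of a variable in the STAB after a Read action."""
    low, high = join_range
    if high == Definedness.DEFINED:
        high = Definedness.USED  # only a Defined upper bound can become Used
    return (low, high)


def read_note(join_range: Range) -> Optional[int]:
    """Return the note number (1 to 4) that applies to the read, or None."""
    low, high = join_range
    if low == Definedness.UNDEFINED:
        # (1) unconditional, (2) conditional read of an uninitialized variable
        return 1 if high == Definedness.UNDEFINED else 2
    if low == Definedness.UNSOUND:
        # (3) unconditional, (4) conditional read of an unsound value
        return 3 if high == Definedness.UNSOUND else 4
    return None
```

Note that the classification looks only at the range in the JOIN, while the new range is what is stored in the STAB; the two are computed independently, so a report can be issued without altering the abstraction.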

### The Effects of Assign Actions on STAB

An assignment to a variable changes its actual value as well as its definedness value, so the range of its definedness value can be widened or reduced.  The range of its definedness values depends on the variables used in determining its actual value.

After the assignment of a variable, its definedness value in the STAB depends on the definedness bounds, in the STAB, of the variables read by the statement after the ***Read*** action has been applied as described above. If the upper bound in the STAB of any of the variables read is less than ***Defined***, i.e., ***Undefined*** or ***Unsound***, the value is questionable and the upper bound of the assigned variable in the STAB should be set to ***Unsound***. Otherwise its upper bound is set to ***Defined***. It cannot be ***Used*** as the assigned variable has not been read yet. If the lower bound of any of the variables read in the STAB is ***Undefined*** or ***Unsound***, the lower bound, in the STAB, of the variable assigned is set to ***Unsound***. It cannot be ***Undefined*** as the variable has been assigned. If the lower bound of all read variables in the STAB is greater than or equal to ***Defined***, i.e., ***Defined*** or ***Used***, the assignment is sound and the lower bound of the assigned variable in the STAB should be set to ***Defined***.

In summary, the effects of the variables Read in determining the value of the assignment are below but only the possible bounds as given above are included:

#### Assign Action Table

          STAB              <Assign>      STAB'
    [Undefined, Undefined]     ->    [Unsound, Unsound] (1) 
    [Undefined, Unsound]       ->    [Unsound, Unsound] (1)
    [Undefined, Defined]       ->    [Unsound, Defined] (1)
    [Undefined, Used]          ->    [Unsound, Defined] (1)
    [Unsound, Unsound]         ->    [Unsound, Unsound] (1)
    [Unsound, Defined]         ->    [Unsound, Defined] (1)
    [Unsound, Used]            ->    [Unsound, Defined] (1)
    [Defined, Defined]         ->    [Defined, Defined] (2)
    [Defined, Used]            ->    [Defined, Defined]

In other words, a bound becomes ***Defined*** if it is ***Defined*** or ***Used*** otherwise it becomes ***Unsound***.  

Notes:
(1) The assigned value is of questionable validity.
(2) Successive assignments without an intervening read.
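The rule that a bound becomes Defined if it is Defined or Used, and Unsound otherwise, extends naturally to an assignment whose expression reads several variables. The sketch below is one reading of the rules above; `apply_assign` is a hypothetical name, and treating an expression that reads no variables (for example a literal) as giving [Defined, Defined] is an assumption that follows from the "otherwise" cases rather than something stated explicitly.

```python
from enum import IntEnum
from typing import Iterable, Tuple


class Definedness(IntEnum):
    UNDEFINED = 0
    UNSOUND = 1
    DEFINED = 2
    USED = 3


Range = Tuple[Definedness, Definedness]  # (lower bound, upper bound)


def apply_assign(read_ranges: Iterable[Range]) -> Range:
    """Range of the assigned variable, given the STAB ranges of the variables read."""
    low = Definedness.DEFINED
    high = Definedness.DEFINED
    for read_low, read_high in read_ranges:
        if read_low < Definedness.DEFINED:
            low = Definedness.UNSOUND   # some path may supply a questionable value
        if read_high < Definedness.DEFINED:
            high = Definedness.UNSOUND  # every path supplies a questionable value
    return (low, high)
```

Because the lower bound of a read variable never exceeds its upper bound, the result is always a well-formed range: the upper bound can only drop to Unsound when the lower bound has done so as well.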

SPARK anti-aliasing rules ensure the same actual parameter in a subprogram call cannot be assigned to more than one mode **out** or **in out** parameter so the same variable cannot be defined more than once by the same statement.


## Validity of Model of Definedness

The definedness values are assumed to be ordered:

***Undefined*** < ***Unsound*** < ***Defined*** < ***Used***.

### Axioms ###

1. A variable will have a definedness value of ***Undefined*** when it is declared without an initialization expression or if it is an **out** mode formal parameter or **out** mode global of a subprogram before it has been assigned a value.  Its definedness range will be [Undefined, Undefined].
2.  The definedness value of a variable can only be increased to ***Used*** if it has a value of ***Defined*** and a ***Read*** action is applied to the variable.
3.  A ***Read*** action on a variable can only change the definedness value of the upper bound of its range from ***Defined*** to ***Used***, otherwise a ***Read*** action has no effect on the definedness range of the variable.
4.  The definedness value of a variable cannot be decreased to ***Undefined*** from a higher value.
5.  If the definedness range of a variable is [Undefined, Undefined] then a ***Read*** of the variable is an unconditional use of an ***Undefined*** variable - the read of an uninitialized variable.
6.  If a variable has a lower bound definedness value of ***Undefined*** but the upper bound is higher, then a ***Read*** of the variable is a conditional use of an ***Undefined*** variable - the variable read may be uninitialized.
7.  If the definedness range of a variable is [Unsound, Unsound], then a ***Read*** of the variable is an unconditional use of an ***Unsound*** value - an uninitialized variable has been used in calculating the actual value of the variable.
8.  If a variable has a lower bound definedness value of ***Unsound*** but the upper bound is higher, then a ***Read*** of the variable is a conditional use of an ***Unsound*** variable - an uninitialized variable may have been used in determining the actual value of the variable.
9.  An ***Assign*** action can only set the definedness value to ***Unsound*** or ***Defined***.
10.  Only an ***Assign*** action can increase the definedness value of a variable from ***Undefined*** or ***Unsound***.
11.  An assignment to a variable is ***Unsound*** if any of the variables ***Read*** in determining its value have a definedness value of ***Undefined*** or ***Unsound***.  Otherwise the definedness value of the assigned variable will be ***Defined***.
12.  A Statement Abstraction, STAB, contains a definedness range for each variable in a subprogram.
13.  A JOIN at a statement, S, contains a definedness range for each variable in the immediately preceding statements. The lower bound of the range of each variable is the lowest lower bound the variable has in all the STABs of the immediately preceding statements of S.  The upper bound of the range of each variable is the highest upper bound the variable has in all of the STABs of the immediately preceding statements of S.
14.  A STAB of a statement, S, is formed from the JOIN at S by:  
    a. Applying axioms 2 and 3 to the upper bound of the definedness range, in the JOIN, of each variable ***Read*** by S, giving an intermediate STAB,  
    b. Applying axioms 9, 10 and 11 to the definedness bounds, in the intermediate STAB, of each variable to which an ***Assign*** action is applied by S, to give the STAB.
     
The above axioms are summarised in the tables Read Action Table and Assign Action Table given earlier.

The question is: do these rules ensure that the use of an ***Undefined*** value is always detected?

In straight-line code (the sequence of statements has no branches) the value of the initial JOIN at a statement, S, is identical to the STAB of the immediately preceding statement of S.  A local variable of a subprogram, V, that has not been initialized when declared will have an initial definedness range of [Undefined, Undefined].  If there are no assignments of any form to V in the preceding statements of S, V's range will remain [Undefined, Undefined] and a ***Read*** action on V will indicate the unconditional use of an ***Undefined*** variable.  This will always be detected.  If this were not the case, either:
1. The local variable has been defined when declared without initialization.  This is a contradiction of axiom 1.
2. The definedness range of the local variable has been changed. From axioms 10 and 2 this can only be achieved by an ***Assign*** action, through some form of assignment in the preceding statements.  This is also a contradiction as the hypothesis was that there were no assignments.

A similar argument is true for subprogram parameters and globals which have a mode of **out** as these types of variables also have an initial range of [Undefined, Undefined] as specified in axiom 1.

Code that contains branches has to be merged at the points where branches join.  The JOIN at a point, P, is formed by creating the widest possible range of each variable in the subprogram from the minimum and maximum range values from the STAB of the last statement of all of the branches joining at P, as given in axiom 13.  

Since ***Undefined*** is the lowest definedness value, if any branch joining at point P has a STAB in which a variable, V, has a lower bound of ***Undefined***, V will have the same lower bound in the JOIN at P.  

If V also has an upper bound of ***Undefined*** in the STABs of all branches joining at point P, then the definedness range of V will be [Undefined, Undefined] in the JOIN at P. The statement, S, immediately following the joining branches at P will have a JOIN in which the definedness range of V is [Undefined, Undefined].  As in straight-line code, a ***Read*** action on V by statement S indicates an unconditional use of an ***Undefined*** variable, according to axiom 5, and will always be detected.

If instead V has an upper bound greater than ***Undefined*** in the STAB of any branch joining at P, then the upper bound in the JOIN will be the greatest upper bound (GUB) of V from the STABs of all of the branches joining at P.  The range of V in the JOIN at P will be [Undefined, GUB].  Since GUB is greater than ***Undefined***, a ***Read*** action on V by statement S indicates a conditional use of an ***Undefined*** variable, according to axiom 6, and will always be detected.
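The branch argument can be checked on a small case: a variable V that is assigned on only one of two branches joining at P and is then read. The sketch below is a toy model with hypothetical names (`D`, `join`, `read`, `assign_sound`); it assumes every STAB holds the same set of variables (axiom 12) and only models an assignment from a sound expression.

```python
from enum import IntEnum


class D(IntEnum):
    UNDEFINED = 0
    UNSOUND = 1
    DEFINED = 2
    USED = 3


def join(stabs):
    """Widest range of each variable over the STABs of all joining branches (axiom 13)."""
    return {
        name: (min(s[name][0] for s in stabs), max(s[name][1] for s in stabs))
        for name in stabs[0]
    }


def read(stab, name):
    """Apply a Read action; return the new STAB and the kind of Undefined use, if any."""
    low, high = stab[name]
    anomaly = None
    if low == D.UNDEFINED:
        anomaly = "unconditional" if high == D.UNDEFINED else "conditional"  # axioms 5, 6
    new_high = D.USED if high == D.DEFINED else high                         # axioms 2, 3
    return {**stab, name: (low, new_high)}, anomaly


def assign_sound(stab, name):
    """Assign action whose expression reads only sound variables (axiom 11)."""
    return {**stab, name: (D.DEFINED, D.DEFINED)}


entry = {"V": (D.UNDEFINED, D.UNDEFINED)}  # local declared without initialization (axiom 1)

# Case 1: neither branch assigns V, so the read at S is an unconditional anomaly.
_, neither = read(join([entry, entry]), "V")

# Case 2: only the 'then' branch assigns V, so the read at S is a conditional anomaly.
at_p = join([assign_sound(entry, "V"), entry])
after_read, one_branch = read(at_p, "V")

# Case 3: both branches assign V, so the read at S raises no anomaly.
_, both = read(join([assign_sound(entry, "V"), assign_sound(entry, "V")]), "V")
```

In this model `neither` is `"unconditional"`, `one_branch` is `"conditional"` and `both` is `None`, matching the three situations discussed above. It is an illustration of the argument on one example, not a proof of it.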




## Abstract Interpretation of Definedness
Abstract interpretation uses an abstraction based on an approximation of the range of values a variable may have at a given statement.  In variable definedness we are concerned whether a variable has been assigned a (preferably valid) value but are not concerned with the precise value.  A representation of the values of definedness needs to be used.

The value of a variable is "used" if its name appears in an Expression.  The value of a variable may be "defined", generally by an assignment.  Consider a model to represent the values of definedness. Definedness is not concerned with the actual value of a variable, only whether it is initialized.  Three values are obvious: Undefined, Defined and Used. However, the analysis has to detect the use of an uninitialized variable and the update of a variable with an expression containing an uninitialized variable.  To track these anomalies three extra values are introduced, giving the following Definedness values a variable may take:

1. Undefined      -- An uninitialized variable
2. UD_Used        -- A variable that has been used while Undefined
3. Unsound        -- An update with an expression containing an Undefined or Unsound variable
4. US_Used        -- A variable that has been used while Unsound
5. Defined        -- A variable that has been updated with a sound expression
6. Used           -- A Defined variable that has been used

The values have the following meanings: Undefined, the variable has not been assigned a sound value; UD_Used, the variable has been used while Undefined; Unsound, the variable has been updated with an Undefined or Unsound value; US_Used, the variable has been used whilst it is Unsound; Defined, the variable has been updated with a value not dependent on an Undefined or Unsound value; and Used, the variable has a Defined value and it has been used. The value Unsound represents a value derived from an Undefined or Unsound value.  If a definedness value is not Undefined, UD_Used, Unsound, or US_Used, it is considered a Sound value. A Sound value is not necessarily a Valid value in the Ada sense as a value could be out of range and, therefore, an invalid value.  A more sophisticated abstract interpretation technique may be able to approximate the range of values a variable may have.

The statements of a subprogram read (a use) or assign (a definition) variables. The read of a variable with an undefined value is erroneous and the read of a variable with an unsound value is of questionable validity.  Two successive assignments to the same variable without an intervening read of the variable mean (in SPARK) that the value of the first assignment is unused, which may suggest a programming mistake.

In a single statement, a variable may be both read and assigned, but in SPARK, expressions do not have side effects, so all the variables on the right-hand side of an assignment statement are only used.  On the left-hand side, only array indices are used, although unusually, in object declarations, more than one variable may be initialized.  In a subprogram call, formal parameters of mode **in** or **in out** are modelled as uses of the corresponding actual parameters and a definition of the formal parameters.  Similarly, formal parameters of mode **out** or **in out** are modelled as a definition of the corresponding actual parameters in a procedure call statement.  A subprogram can have many parameters and globals, leading to the model of a statement potentially having multiple uses of variables (possibly of the same variable) and multiple definitions of variables.  All variables defined in a single statement are unique.

The proposed translation first considers all of the variables read by the statement and then those that are assigned by the statement. This avoids ambiguity when the same variable is both read and written by the statement.  Each use and update of a variable will have a separate entry in the translation.  For instance:

    X := X + X + Y;

would be translated as:

    Read   (X)
    Read   (X)
    Read   (Y)
    Assign (X)

To keep an association between the translation and the statement, a position and a translation item number are needed.  For simplicity, in this study, only one statement per line is assumed, so the position can just be the line number.

Assuming the above statement is on line 10, the translation becomes:

    Read   (X, (10, 1))
    Read   (X, (10, 2))
    Read   (Y, (10, 3))
    Assign (X, (10, 4))

It may be unnecessary to record the two Reads of X; one may be sufficient, as a second Read will not change the definedness value of the variable.

The call of a procedure, 

**procedure** P (A : **in** Integer; B : **in out** Integer; C : **out** Integer);

on line 20

    P (X, Y, X);

would be translated as (assuming no dependency relation is given):

    Read   (X, (20, 1))
    Read   (Y, (20, 2))
    Assign (Y, (20, 3))
    Read   (X, (20, 4))
    Read   (Y, (20, 5))
    Assign (X, (20, 6))

The reads of variables must use the variable's definedness value from the merged abstractions of the immediately preceding statements rather than the actual values Assigned in the procedure call.

Each statement has an abstraction representing the state of all variables on completion of execution of the statement.  The abstraction of the statement is formed by merging the abstractions of all immediately preceding statements and applying the effects of the current statement.  Commonly, an abstraction has a notion of the range of values that a variable may have at the statement.  For definedness, the maximum range is [Undefined, Used]. In definedness there is not necessarily an order to these possible values but, for simplicity, an order is assumed: Undefined < UD_Used < Unsound < US_Used < Defined < Used.  A merge has to form the abstraction of the statement from a pair of inputs: the abstractions of the immediately preceding statements and the translation of the statement.

First, consider merging the immediately preceding statements.  This part of the merge determines the widest range of values the variable could take from the abstractions of all of its immediately preceding statements.  That is, the lower bound is taken to be the lowest bound of all its immediate predecessors and the high bound from the highest.

Next, consider the variables read by the statement.  SPARK flow analysis reports on all uses of an uninitialized variable, whether unconditional (it will always happen when executing the program) or conditional (it happens depending on the execution path taken when the program is run). The abstract interpretation model of definedness needs to do this too.

If a variable read by a statement has a range in the merged abstractions of immediately preceding statements of [Undefined, Undefined], then this is an unconditional use of an uninitialized variable and should be reported.  In this case, the range becomes [Undefined, UD_Used].  If the lower bound of the variable in the merged abstractions of preceding statements is undefined but the upper bound is defined or used, then this is the conditional use of an uninitialized variable and should be reported as such.  The lower bound should remain as undefined as the use of the variable has not changed this, however, if the upper bound is defined or used, it should become used as the previously defined or used variable has been read.

Determining the range of an Unsound variable is similar.  If the range of the variable is [Undefined, Unsound] or [Unsound, Unsound] then a read of the variable changes the upper bound to US_Used.  If the range of the variable is [Unsound, Defined] then it becomes [Unsound, Used] after the read of the variable and a variable with the range [Unsound, Used] remains unchanged. 

In summary here are the rules for the effects of a read of a variable in a statement on the definedness value of the variable:

### Read Action Table
    Preceding Abstract State <Read>   New Bounds
    [Undefined, Undefined]     ->     [Undefined, UD_Used] (1)
    [Undefined, UD_Used]       ->     [Undefined, UD_Used]
    [Undefined, Unsound]       ->     [Undefined, US_Used]
    [Undefined, US_Used]       ->     [Undefined, US_Used]
    [Undefined, Defined]       ->     [Undefined, Used]    (2)
    [Undefined, Used]          ->     [Undefined, Used]    (2)
    -- UD_Used cannot be a lower bound as it is impossible to assign a UD_Used value
    [Unsound, Unsound]         ->     [Unsound, US_Used]
    [Unsound, US_Used]         ->     [Unsound, US_Used]
    [Unsound, Defined]         ->     [Unsound, Used]
    [Unsound, Used]            ->     [Unsound, Used]
    -- US_Used cannot be a lower bound as it is impossible to assign a US_Used value
    [Defined, Defined]         ->     [Defined, Used]
    [Defined, Used]            ->     [Defined, Used]
    -- Used cannot be a lower bound as it is impossible to assign a Used value

In other words, a variable must be Defined to become Used.

Notes:
(1) Issue an unconditional Read of an uninitialized variable message.
(2) Issue a conditional Read of an uninitialized variable message.

It is not clear that a message should be issued yet regarding the Read of an Unsound variable.
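
The read rules can be written as a small lookup. The sketch below is only an illustration: the names `Definedness`, `READ_UPPER` and `read_action` are hypothetical and do not appear in the notebook code, and a Defined upper bound is handled according to the prose rule that a Defined or Used upper bound becomes Used after a read.

```python
from enum import Enum


class Definedness(Enum):
    """Definedness values used in the Read Action Table (names are illustrative)."""
    UNDEFINED = "Undefined"
    UD_USED = "UD_Used"
    UNSOUND = "Unsound"
    US_USED = "US_Used"
    DEFINED = "Defined"
    USED = "Used"


D = Definedness

# Effect of a read on the upper bound; a read never changes the lower bound.
READ_UPPER = {
    D.UNDEFINED: D.UD_USED,
    D.UD_USED: D.UD_USED,
    D.UNSOUND: D.US_USED,
    D.US_USED: D.US_USED,
    D.DEFINED: D.USED,
    D.USED: D.USED,
}


def read_action(lower, upper):
    """Return (new_lower, new_upper, message) for a read of one variable."""
    message = None
    if lower is D.UNDEFINED and upper is D.UNDEFINED:
        message = "unconditional read of an uninitialized variable"
    elif lower is D.UNDEFINED and upper in (D.DEFINED, D.USED):
        message = "conditional read of an uninitialized variable"
    return lower, READ_UPPER[upper], message


print(read_action(D.UNDEFINED, D.UNDEFINED))
print(read_action(D.UNSOUND, D.DEFINED))
```

No message is produced for reads of Unsound variables, in line with the open question above.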

Assigning to a variable potentially changes its definedness value depending on the variables read in determining its value.  SPARK flow analysis also reports on two successive assignments to a variable without an intervening read of the variable.   The value of the first assignment is not used and may be a mistake in the program.  Such assignments should be detected with abstract interpretation too.

First, consider the definedness bounds of the variables read by a statement after the Read action has been applied as described above.  If any variable read has a lower bound of less than Defined, the value is questionable and the lower bound of the assigned value should be set to Unsound.  If the upper bound is Undefined or Unsound, then the upper bound should also be set to Unsound, as the value assigned (the actual value, as opposed to its definedness) must have an Unsound definedness value.

In summary, the effects of the variables read in determining the value of the assignment are given below.  Only the bounds that are possible after the Read action, as given above, are included:

### Assign Action Table
    Bounds of Read Variable <Assign> Bounds of Assignment Value
    [Undefined, Undefined]     ->    [Unsound, Unsound]   
    [Undefined, UD_Used]       ->    [Unsound, Unsound] 
    [Undefined, Unsound]       ->    [Unsound, Unsound]
    [Undefined, US_Used]       ->    [Unsound, Unsound]
    [Undefined, Defined]       ->    [Unsound, Defined]
    [Undefined, Used]          ->    [Unsound, Defined] 
    [Unsound, Unsound]         ->    [Unsound, Unsound]
    [Unsound, US_Used]         ->    [Unsound, Unsound]
    [Unsound, Defined]         ->    [Unsound, Defined]
    [Unsound, Used]            ->    [Unsound, Defined]
    [Defined, Defined]         ->    [Defined, Defined] (1)
    [Defined, Used]            ->    [Defined, Defined]

In other words, a bound becomes Defined if it is Defined or Used; otherwise it becomes Unsound.

Notes:
(1) Successive assignments without an intervening read.
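
The Assign Action Table reduces to one rule applied to each bound separately. The following sketch assumes a single variable is read; `Definedness`, `assigned_bound` and `assign_action` are hypothetical names introduced only for illustration.

```python
from enum import Enum

# Illustrative definedness values; the names are assumptions, not notebook code.
Definedness = Enum(
    "Definedness",
    ["UNDEFINED", "UD_USED", "UNSOUND", "US_USED", "DEFINED", "USED"],
)
D = Definedness


def assigned_bound(bound):
    """A bound of the assigned value is Defined only if the read bound is Defined or Used."""
    return D.DEFINED if bound in (D.DEFINED, D.USED) else D.UNSOUND


def assign_action(read_lower, read_upper):
    """Map the bounds of the variable read to the bounds of the value assigned."""
    return assigned_bound(read_lower), assigned_bound(read_upper)


print(assign_action(D.UNDEFINED, D.USED))
print(assign_action(D.DEFINED, D.USED))
```

Detecting successive assignments without an intervening read (note 1) needs the bounds of the assigned variable itself and is not covered by this sketch.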

SPARK anti-aliasing rules ensure the same actual parameter in a subprogram call cannot be assigned to more than one mode **out** or **in out** parameter so the same variable cannot be defined more than once by the same statement.

It would be possible to check that read-only variables (constants in Ada or **in** mode parameters) are not Assigned but Ada compilers already do this check.

The abstract state at a statement, which is the possible definedness value range of all variables in the subprogram, has to be determined.  The definedness value range of a variable may be affected by both reads and assignments of a statement.  The abstract state for the statement is formed from the abstract state immediately preceding the statement (the merged abstractions from all immediately preceding statements) by leaving the ranges of the variables not read or assigned by the statement unchanged.  Variables read by the statement but not assigned have the bounds determined from the Read Action Table above.  Variables assigned by the statement have the bounds determined from the Assign Action Table.
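
As a rough illustration of how the abstract state for one statement could be formed, the sketch below applies the read rules and then the assign rules to a dictionary of (lower, upper) pairs. The function `step` and its data layout are hypothetical. How the bounds of several read variables combine is also an assumption: here a single questionable bound makes the corresponding bound of the assigned value Unsound.

```python
# Definedness values are plain strings here to keep the sketch short.
READ_UPPER = {
    "Undefined": "UD_Used", "UD_Used": "UD_Used",
    "Unsound": "US_Used", "US_Used": "US_Used",
    "Defined": "Used", "Used": "Used",
}
SOUND = ("Defined", "Used")


def step(state, reads, assigns):
    """Return the abstract state after a statement.

    state maps each variable name to its (lower, upper) definedness pair.
    """
    after_read = dict(state)  # variables not mentioned keep their ranges
    for var in reads:
        lower, upper = state[var]
        after_read[var] = (lower, READ_UPPER[upper])

    # Assumption: bounds of the assigned value are Defined only if the
    # corresponding bound of every variable read is Defined or Used.
    lower = "Defined" if all(after_read[v][0] in SOUND for v in reads) else "Unsound"
    upper = "Defined" if all(after_read[v][1] in SOUND for v in reads) else "Unsound"

    new_state = dict(after_read)
    for var in assigns:  # assigned variables take the Assign Action bounds
        new_state[var] = (lower, upper)
    return new_state


state = {
    "X": ("Defined", "Defined"),
    "Y": ("Defined", "Defined"),
    "Temp": ("Undefined", "Undefined"),
}
print(step(state, reads=["X"], assigns=["Temp"]))  # Temp := X
print(step(state, reads=["Temp"], assigns=["Y"]))  # Y := Temp, with Temp unset
```

An assignment that reads no variables, such as a literal, yields [Defined, Defined] in this sketch because there is no questionable bound to propagate.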



So far individual statements have been considered, but the basic analysis unit in SPARK is the subprogram.  Further actions beyond Read variable and Assign variable are needed, as is more analysis of the final abstract state of the subprogram.

The subprogram declaration has to be translated into a model.  A subprogram may have mode **in**, **out**, or **in out** formal parameters. A mode **in** or **in out** formal parameter is taken to have the actual parameter in a subprogram call assigned to it for analysis purposes and, therefore, has the bounds [Defined, Defined]. A mode **out** formal parameter has initial bounds of [Undefined, Undefined].  On completion of the interpretation of the subprogram, the actual parameters of mode **in out** or **out** are assumed to be assigned from the formal parameters, i.e., formal parameters of mode **in out** or **out** are read at the end of the subprogram. To identify the type of parameter a tag is added to the abstract state of each variable:

    Pin -- a mode in parameter
    Pio -- a mode in out parameter
    Pou -- a mode out parameter
    Rou -- a result variable
    Gin -- a mode in global
    Gio -- a mode in out global
    Gou -- a mode out global
    Lrw -- a local read/write variable
    Lro -- a local read only variable
    Cst -- a static constant (in SPARK terms)

Extra actions performed by statements are required in the translation:

    In_Param
    Io_Param
    Ou_Param

With these actions, the declaration

    procedure P (A : in Integer; B : in out Integer; C : out Integer);

will be translated as:

    Action                 Abstraction of the statement
    In_Param (A, (1, 1))  (A, Pin, (Defined, (1, 1)), (Defined, (1, 1)))
    Io_Param (B, (1, 2))  (B, Pio, (Defined, (1, 2)), (Defined, (1, 2)))
    Ou_Param (C, (1, 3))  (C, Pou, (Undefined, (1, 3)), (Undefined, (1, 3)))
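
The parameter translation can be sketched in Python as a table lookup on the parameter mode. The names `Bound`, `VarAbstraction`, `MODES` and `translate_parameters` are hypothetical, and the positions are simply (line, index) pairs as in the listing above.

```python
import collections

Bound = collections.namedtuple("Bound", ["value", "position"])
VarAbstraction = collections.namedtuple(
    "VarAbstraction", ["var", "tag", "lower", "upper"]
)

# mode -> (translation action, tag, initial definedness of both bounds)
MODES = {
    "in": ("In_Param", "Pin", "Defined"),
    "in out": ("Io_Param", "Pio", "Defined"),
    "out": ("Ou_Param", "Pou", "Undefined"),
}


def translate_parameters(params, line=1):
    """Translate (name, mode) pairs into (action, abstraction) rows."""
    rows = []
    for index, (name, mode) in enumerate(params, start=1):
        action, tag, value = MODES[mode]
        bound = Bound(value, (line, index))
        rows.append((action, VarAbstraction(name, tag, bound, bound)))
    return rows


for row in translate_parameters([("A", "in"), ("B", "in out"), ("C", "out")]):
    print(row)
```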

In SPARK all parameters of a function subprogram must be mode **in**.  The return value is modelled as an auxiliary output variable that is Assigned the value of the return statement.

SPARK 2005 requires all global variables used by a subprogram to be included in the subprogram declaration.  These also have a mode **in**, **in out** or **out** and are treated similarly to formal parameters.  However, Ada compilers do not check the correct use (according to their mode) of global variables as is done with parameters.  For instance, an Ada compiler does not report the assignment to a mode **in** global variable.  Extra checks for correct use of global variables are needed when interpreting assignments.

Local variable declarations also have to be considered as these introduce new variables into the state abstraction.  Local variables will have initial bounds of [Undefined, Undefined], but if they have an initialization expression, it will be translated as Read of the variables in the expression followed by an Assignment to the declared variable.

An extra statement action is required:

    Declare

The Translation of:

    V1 : Integer;
    V2 : Integer := 16;       --  No variables Read in initialization expression
    V3 : Integer := V2 + 16;  --  Not allowed in SPARK 2005 where initializations must be static
    C1 : constant Integer := 32;
    C2 : constant Integer := C1 + 64;
    RO1 : constant Integer := V2 + 128;  --  not allowed in SPARK 2005
    RO2 : constant Integer := RO1 + C2;  --  not allowed in SPARK 2005
    V4  : Integer := RO1 + RO2;          --  not allowed in SPARK 2005

is:

    Declare (V1, (1, 1))   (V1,  Lrw, (Undefined, (1, 1)), (Undefined, (1, 1)))
    Declare (V2, (2, 1))   (V2,  Lrw, (Defined, (2, 1)),   (Defined, (2, 1)))
    Read    (V2, (3, 1))   (V2,  Lrw, (Defined, (2, 1)),   (Used, (3,1)))
    Declare (V3, (3, 2))   (V3,  Lrw, (Defined, (3, 2)),   (Defined, (3, 2)))
    Declare (C1, (4, 1))   (C1,  Cst, (Defined, (4, 1)),   (Defined, (4, 1)))
    Declare (C2, (5, 1))   (C2,  Cst, (Defined, (5, 1)),   (Defined, (5, 1)))
    Read    (V2, (6, 1))   (V2,  Lrw, (Defined, (2, 1)),   (Used, (6, 1)))
    Declare (RO1, (6, 2))  (RO1, Lro, (Defined, (6, 2)),   (Defined, (6, 2)))
    Read    (RO1, (7, 1))  (RO1, Lro, (Defined, (6, 2)),   (Used, (7, 1)))
    Declare (RO2, (7, 2))  (RO2, Lro, (Defined, (7, 2)),   (Defined, (7, 2)))
    Read    (RO1, (8, 1))  (RO1, Lro, (Defined, (6, 2)),   (Used, (8, 1)))
    Read    (RO2, (8, 2))  (RO2, Lro, (Defined, (7, 2)),   (Used, (8, 2)))
    Declare (V4, (8, 3))   (V4,  Lrw, (Defined, (8, 3)),   (Defined, (8, 3)))
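
The ordering of actions in this translation (one Read per non-static variable in the initialization expression, followed by the Declare) can be sketched as below. The function `translate_declaration` is hypothetical, and the caller is assumed to pass only the non-static variables that the expression reads.

```python
def translate_declaration(line, name, reads=()):
    """Return the actions for one declaration as (action, variable, position) tuples.

    reads lists the non-static variables in the initialization expression;
    static constants such as C1 are assumed not to generate a Read.
    """
    actions = [("Read", var, (line, i)) for i, var in enumerate(reads, start=1)]
    actions.append(("Declare", name, (line, len(reads) + 1)))
    return actions


print(translate_declaration(1, "V1"))
print(translate_declaration(3, "V3", ["V2"]))
print(translate_declaration(8, "V4", ["RO1", "RO2"]))
```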

A translation is required for an end of the subprogram to notionally read the formal parameters of mode **in out** or **out** and assign their values to the actual parameters. The subprogram-wide checks are then instigated, followed by the abstract state clean-up.  The abstract state of the parameters and locals of the subprogram are not required after the end of the subprogram:

    Action
    End_Sub
    
### Straight Line Code

First, consider the following simple SPARK procedure taken from early SWES courses.

    procedure Swap (X : in out Integer; Y : in out Integer) is
      Temp : Integer;
    begin
      Temp := X;
      X := Y;
      Y := Temp;
    end Swap;

As it is written there are no uses of uninitialised variables.

Using the ideas for translations and abstractions, the following spreadsheet was constructed to demonstrate using abstract interpretation to check for definedness.

![image.png](attachment:84ecc425-647b-4f3d-ac88-001f7b003066.png)

#### Python Version of Spreadsheet
The Translate operation of the Abstract Interpretation is done by hand.

An enumeration is introduced to represent the possible states of a variable, and two named tuples are declared: one to record the position of an action, and one to record the variable, the action applied to it and that position.

Two sequences are now declared, one to record where each statement starts and one containing the expanded and translated statements.

In [103]:
from enum import Enum
class Defindness (Enum):
    UNINITIALIZED = 1
    READ          = 2
    ASSIGNED      = 3
    UNSOUND       = 4
    @classmethod
    def pos (cls, state):
        return state.value - 1

class Translate_Actions (Enum):
    READ_VAR    = 1
    ASSIGN_VAR  = 2
    IN_PARAM    = 3
    OUT_PARAM   = 4
    DECLARE_VAR = 5
    @classmethod
    def pos (cls, act):
        return act.value - 1
    
State  = Enum('State',  ['UNINITIALIZED', 'READ', 'ASSIGNED', 'UNSOUND'])
Action = Enum('Action', ['READ_VAR', 'ASSIGN_VAR', 'IN_PARAM', 'OUT_PARAM', 'DECLARE_VAR'])

import collections
Position  = collections.namedtuple ('Postition', ['line', 'expand'])
Expansion = collections.namedtuple ('Expansion',['var', 'action', 'position'])

translation = [
    [Expansion("X__in",  Action.IN_PARAM,    Position(0, 1)),  # Statement 0 represents declaring a 
     Expansion("X__out", Action.OUT_PARAM,   Position(0, 2)),  # variable for each in and out parameter.
     Expansion("Y__in",  Action.IN_PARAM,    Position(0, 3)),  # Variables with the parameter names
     Expansion("Y__out", Action.OUT_PARAM,   Position(0, 4)),  # are declared for use within the
     Expansion("X",      Action.DECLARE_VAR, Position(0, 5)),  # translation of the body of the
     Expansion("Y",      Action.DECLARE_VAR, Position(0, 6))], # subprogram.
     
    [Expansion("X__in",  Action.READ_VAR,     Position(1, 1)),  # The in parameters are assigned to the
     Expansion("Y__in",  Action.READ_VAR,     Position(1, 2)),  # local variables X and Y.
     Expansion("X",      Action.ASSIGN_VAR,   Position(1, 3)),
     Expansion("Y",      Action.ASSIGN_VAR,   Position(1, 4))],

    [Expansion("Temp",   Action.DECLARE_VAR,  Position(2, 1))], # The declaration of Temp.
                                                                # The begin keyword is not translated?
    [Expansion("X",      Action.READ_VAR,     Position(4, 1)),
     Expansion("Temp",   Action.ASSIGN_VAR,   Position(4, 2))],

    [Expansion("Y",      Action.READ_VAR,     Position(5, 1)),
     Expansion("X",      Action.ASSIGN_VAR,   Position(5, 2))],

    [Expansion("Temp",   Action.READ_VAR,     Position(6, 1)),
     Expansion("Y",      Action.ASSIGN_VAR,   Position(6, 2))],

    [Expansion("X",      Action.READ_VAR,     Position(7, 1)),  # end keyword denotes updating the out
     Expansion("Y",      Action.READ_VAR,     Position(7, 2)),  # parameters.
     Expansion("X__out", Action.ASSIGN_VAR,   Position(7, 3)),
     Expansion("Y__out", Action.ASSIGN_VAR,   Position(7, 4))]]

In [104]:
translation

[[Expansion(var='X__in', action=<Action.IN_PARAM: 3>, position=Postition(line=0, expand=1)),
  Expansion(var='X__out', action=<Action.OUT_PARAM: 4>, position=Postition(line=0, expand=2)),
  Expansion(var='Y__in', action=<Action.IN_PARAM: 3>, position=Postition(line=0, expand=3)),
  Expansion(var='Y__out', action=<Action.OUT_PARAM: 4>, position=Postition(line=0, expand=4)),
  Expansion(var='X', action=<Action.DECLARE_VAR: 5>, position=Postition(line=0, expand=5)),
  Expansion(var='Y', action=<Action.DECLARE_VAR: 5>, position=Postition(line=0, expand=6))],
 [Expansion(var='X__in', action=<Action.READ_VAR: 1>, position=Postition(line=1, expand=1)),
  Expansion(var='Y__in', action=<Action.READ_VAR: 1>, position=Postition(line=1, expand=2)),
  Expansion(var='X', action=<Action.ASSIGN_VAR: 2>, position=Postition(line=1, expand=3)),
  Expansion(var='Y', action=<Action.ASSIGN_VAR: 2>, position=Postition(line=1, expand=4))],
 [Expansion(var='Temp', action=<Action.DECLARE_VAR: 5>, position=Posti

In [105]:
statements = [
    0,  # Represent setting values of in parameters in a procedure call.
    7,  # procedure Swap (X : in out Integer; Y : in out Integer) is
    11, # Temp : Integer;
    12,  # Temp := X;
    14, # X := Y;
    16, # Y := Temp;
    18  # end; Represent setting values of out parameters.
]

In [106]:
statements

[0, 7, 11, 12, 14, 16, 18]

When conditional statements are considered the structure will be more complex, but first consider straight line code, in which only the current statement translation and the abstraction current at the immediately preceding statement have to be merged.
A set of rules for merging and a data structure representing the abstract state are required.  The merge rules may be represented as a matrix, and the abstract state, looking at the spreadsheet, could be an extension of the Expansion structure.


|                  |READ_VAR     |ASSIGN_VAR |IN_PARAM |OUT_PARAM     |DECLARE_VAR   |
|------------------|-------------|-----------|---------|--------------|--------------|
|**UNINITIALIZED** |UNSOUND      |ASSIGNED   |ASSIGNED |UNINITIALIZED |UNINITIALIZED |
|**READ**          |READ         |ASSIGNED   |ASSIGNED |UNINITIALIZED |UNINITIALIZED | 
|**ASSIGNED**      |READ         |ASSIGNED   |ASSIGNED |UNINITIALIZED |UNINITIALIZED | 
|**UNSOUND**       |UNSOUND      |ASSIGNED   |ASSIGNED |UNINITIALIZED |UNINITIALIZED |

There is a lot of redundancy in this table: the current state is immaterial for the actions IN_PARAM, OUT_PARAM and DECLARE_VAR.  The same holds for ASSIGN_VAR, except that two successive assignments without an intervening read are a reportable state change.  The matrix is small, so the redundancy is of little concern.

In Python the matrix is represented as a list of lists.

In [135]:
# Rows are indexed by the current State, columns by the Action applied.
merge_matrix = [
  # READ_VAR      ASSIGN_VAR     IN_PARAM        OUT_PARAM            DECLARE_VAR
 [State.UNSOUND, State.ASSIGNED, State.ASSIGNED, State.UNINITIALIZED, State.UNINITIALIZED],# UNINITIALIZED
 [State.READ,    State.ASSIGNED, State.ASSIGNED, State.UNINITIALIZED, State.UNINITIALIZED],# READ
 [State.READ,    State.ASSIGNED, State.ASSIGNED, State.UNINITIALIZED, State.UNINITIALIZED],# ASSIGNED
 [State.UNSOUND, State.ASSIGNED, State.ASSIGNED, State.UNINITIALIZED, State.UNINITIALIZED] # UNSOUND
]

In [134]:
merge_matrix [Defindness.pos(State.UNSOUND)][Translate_Actions.pos(Action.READ_VAR)]

<State.UNSOUND: 4>

The detection of errant reads of an uninitialized variable or of an unsound value is directly available from the matrix, but two assignments to the same variable without an intervening read are not.  Nor is the assignment of an unsound value.  Is Unread another state? If so, what then happens to ASSIGNED?

Consider the structure of the abstract state.  It contains an element for each variable in the subprogram.  Each element contains the merged state of the variable after the current instruction.  For error reporting the position may be needed too.  The requirements look the same as for the Expansion. 

In [None]:
State = Enum('State', ['UNINITIALIZED', 'READ', 'ASSIGNED', 'UNSOUND'])

Var_State = collections.namedtuple ('Var_State',['var', 'state', 'position'])
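
One possible way to use `Var_State` with the merge matrix for straight line code is sketched below. This is not the notebook's implementation: the driver `interpret`, the helper `E` and the shortened translation `swap_body` (which treats the **in out** parameters as plain IN_PARAM variables) are assumptions made to keep the example self-contained.

```python
import collections
from enum import Enum

State = Enum('State', ['UNINITIALIZED', 'READ', 'ASSIGNED', 'UNSOUND'])
Action = Enum('Action', ['READ_VAR', 'ASSIGN_VAR', 'IN_PARAM', 'OUT_PARAM', 'DECLARE_VAR'])
Position = collections.namedtuple('Position', ['line', 'expand'])
Expansion = collections.namedtuple('Expansion', ['var', 'action', 'position'])
Var_State = collections.namedtuple('Var_State', ['var', 'state', 'position'])

UN, RD, AS, US = State.UNINITIALIZED, State.READ, State.ASSIGNED, State.UNSOUND
# Same contents as the merge matrix above: rows by State, columns by Action.
merge_matrix = [
    [US, AS, AS, UN, UN],  # UNINITIALIZED
    [RD, AS, AS, UN, UN],  # READ
    [RD, AS, AS, UN, UN],  # ASSIGNED
    [US, AS, AS, UN, UN],  # UNSOUND
]


def interpret(translation):
    """Hypothetical driver: merge each Expansion into the abstract state."""
    abstract_state = {}
    for statement in translation:
        for exp in statement:
            previous = abstract_state.get(exp.var)
            # A variable not seen before is treated as UNINITIALIZED.
            current = previous.state if previous is not None else State.UNINITIALIZED
            new = merge_matrix[current.value - 1][exp.action.value - 1]
            abstract_state[exp.var] = Var_State(exp.var, new, exp.position)
    return abstract_state


def E(var, action, line, expand):
    """Shorthand for building an Expansion."""
    return Expansion(var, action, Position(line, expand))


# Simplified translation of Swap (assumption: in out parameters as IN_PARAM).
swap_body = [
    [E("X", Action.IN_PARAM, 0, 1), E("Y", Action.IN_PARAM, 0, 2)],
    [E("Temp", Action.DECLARE_VAR, 2, 1)],
    [E("X", Action.READ_VAR, 4, 1), E("Temp", Action.ASSIGN_VAR, 4, 2)],
    [E("Y", Action.READ_VAR, 5, 1), E("X", Action.ASSIGN_VAR, 5, 2)],
    [E("Temp", Action.READ_VAR, 6, 1), E("Y", Action.ASSIGN_VAR, 6, 2)],
]

final = interpret(swap_body)
unsound = [v.var for v in final.values() if v.state is State.UNSOUND]
print(unsound)  # no variable ends up UNSOUND for the correct Swap

# Dropping "Temp := X" makes the later read of Temp an unsound read.
faulty = interpret(swap_body[:2] + swap_body[3:])
print(faulty["Temp"])
```

Keeping the position in each `Var_State` means that an UNSOUND entry already identifies the expansion at which the questionable read occurred, which is what an error report would need.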
