# Abstract Interpretation of SPARK Programs

Traditionally, SPARK programs have been analysed by the Examiner using data flow and information flow analysis.  This study does not consider the proof features of the Examiner (VCG generator) but rather investigates whether the same or better results to flow analysis can be achieved using abstract interpretation. Better results may be achievable using abstract interpretation as it might be possible to determine some non-executable paths and exclude them from the analysis.  To achieve this goal, two of the more advanced techniques of abstract interpretation - variable analysis and path analysis are needed. 

The SPARK language was developed to achieve accurate and straightforward flow analysis. Many of the restrictions placed on Ada by SPARK to make it amenable to flow analysis may also be advantageous for abstract interpretation.  


## Abstract Interpretation

Abstract interpretation is commonly viewed as having three stages:
1. Translate
2. Merge
3. Widen

These three stages are applied to each statement of the given source text until it has been completed.

Translate converts a statement into a simple model representing the statement.  Merge takes the model of the statement and the abstractions of the immediately preceding statements (in general, there may be more than one due to gotos if statements and loops) and merges them into a single abstraction for the statement.  The abstraction is an approximation of the state at the statement.  Generally, it consists of an entry for each variable and an approximation of the range of values that variable may have at the statement.  The Widen stage is typically used after loops to widen the approximate range of possible values a variable may have to represent executing the loop multiple times.

What is interesting about these stages is that they can be adapted to suit a number of different analyses but still fit within the framework of calling each of the three stages for each statement.

For instance,  constant analysis is may be used to obtain an approximation of the range of values a variable may have at a particular statement based on the value of constants within the source text.  Variable analysis is similar but more complex based on the expressions assigned to variables within the source text.

To perform abstract interpretation, an abstract model of the source text needs to be constructed consisting of a model of each statement (the translation) and a sequence of abstractions representing each of the previous statements.  Merge consolidates the immediately preceding statements with the model of the current statement to obtain an abstraction for the current statement which is appended to the sequence.  Merging is a simple operation when the statement has only one immediate predecessor but becomes a little more complex around if statements, loops and the targets of goto statements.

SPARK 2005 (SPARK Classic) has the advantage over general programming language in that, other than in a loop, an immediately preceding statement cannot be later in text than the current statement, simplifying the sequence of preceding statements that need to be maintained.  As SPARK is modular each subprogram is essentially self-contained and only variables used within the subprogram need to be in the abstractions, very little extra context has to be maintained and the sequence of abstractions can be discarded after completing the analysis of the subprogram.


## SPARK 2005 Examiner Flow Analysis

The Examiner performs analyses based on flow relations.  The flow relations facilitate regular data flow analysis but provide further deeper analysis. If dependency relations are provided for subprograms, the Examiner performs an information-flow analysis whereby the output - input dependencies specified for a subprogram are checked against the program statements of the subprogram.  The information flow analysis tends to be less widely used than it once was since formal proofs of various properties of a subprogram have become much more practical with modern provers.  Information flow is still used in security analysis, but, ideally, it should be more flexible than currently provided by the Examiner.

Initially, this study will concentrate on the analyses provided by the Examiner, excluding information-flow analysis.  In the following subsections, the warnings and errors potentially reported by the Examiner concerning flow analysis (but not information flow) are taken from the SPARK Examiner User Manual 2012.  The Examiner has a severity notation scheme using 3 leading characters.  Warnings begin with "---", conditional errors with "???" and unconditional errors with "!!!".

An unconditional error will be encountered no matter which program path is executed. In contrast, a conditional error is only encountered on some program paths and may not occur on all program executions.

### Examiner reports Data-flow errors (References to undefined variables)

!!! Flow Error :20: Expression contains reference(s) to variable XXX which has an undefined value.
The expression may be that in an assignment or return statement, an actual parameter, or a condition occurring in an if or case statement, an iteration scheme or exit statement. NOTE: the presence of random and possibly invalid values introduced by data flow errors invalidates proof of exception freedom for the subprogram body which contains them. All unconditional data flow errors must be eliminated before attempting exception freedom proofs. See the manual "SPARK Proof Manual" for full details.

!!! Flow Error :23: Statement contains reference(s) to variable XXX which has an undefined value.
The statement here is a procedure call or an assignment to an array element, and the variable XXX may appear in an actual parameter, whose value is imported when the procedure is executed. If the variable XXX does not occur in the actual parameter list, it is an imported global variable of the procedure (named in its global definition). NOTE: the presence of random and possibly invalid values introduced by data flow errors invalidates proof of exception freedom for the subprogram body which contains them. All unconditional data flow errors must be eliminated before attempting exception freedom proofs. See the manual "SPARK Proof Manual" for full details.

??? Flow Error :501: Expression contains reference(s) to variable XXX, which which may have an undefined value.
The expression may be that in an assignment or return statement, an actual parameter, or a condition occurring in an if or case statement, an iteration scheme or exit statement. The Examiner has identified at least one syntactic path to this point where the variable has NOT been given a value. Conditional data flow errors are extremely serious and must be carefully investigated. NOTE: the presence of random and possibly invalid values introduced by data flow errors invalidates proof of exception freedom for the subprogram body which contains them. All reports of data flow errors must be eliminated or shown to be associated with semantically infeasible paths before attempting exception freedom proofs. See the manual "SPARK Proof Manual" for full details.

??? Flow Error :504: Statement contains reference(s) to variable XXX, which which may have an undefined value.
The statement here is a procedure call, and the variable XXX may appear in an actual parameter, whose value is imported when the procedure is executed. If the variable XXX does not occur in the actual parameter list, it is an imported global variable of the procedure (named in its global definition). The Examiner has identified at least one syntactic path to this point where the variable has NOT been given a value. Conditional data flow errors are extremely serious and must be carefully investigated. NOTE: the presence of random and possibly invalid values introduced by data flow errors invalidates proof of exception freedom for the subprogram body which contains them. All reports of data flow errors must be eliminated or shown to be associated with semantically infeasible paths before attempting exception freedom proofs. See the manual "SPARK Proof Manual" for full details

### Examiner reports Data-flow anomalies and ineffective statements

!!! Flow Error :10: Ineffective statement.
Execution of this statement cannot affect the final value of any exported variable of the subprogram in which it occurs. The cause may be a data-flow anomaly (i.e. the statement could be an assignment to a variable, which is always updated again before it is read. However, statements may be ineffective for other reasons - see Section 4.1 of Appendix A.

!!! Flow Error :10: Assignment to XXX is ineffective.
This message always relates to a procedure call or an assignment to a record. The variable XXX may be an actual parameter corresponding to a formal one that is exported; otherwise XXX is an exported global variable of the procedure. The message indicates that the updating of XXX, as a result of the procedure call, has no effect on any final values of exported variables of the calling subprogram. Where the ineffective assignment is expected (e.g. calling a supplied procedure that returns more parameters than are needed for the immediate purpose), it can be a useful convention to choose a distinctive name, such as "Unused" for the actual parameter concerned. The message "Assignment to Unused is ineffective" is then self-documenting.

!!! Flow Error :53: The package initialization of XXX is ineffective.
Here XXX is an own variable of a package, initialized in the package initialization. The message states that XXX is updated elsewhere, before being read.

!!! Flow Error :54: The initialization at declaration of XXX is ineffective.
Issued if the value assigned to a variable at declaration cannot affect the final value of any exported variable of the subprogram in which it occurs because, for example, it is overwritten before it is used.

### Examiner reports Invariant conditions and stable exit conditions

!!! Flow Error :22: Value of expression is invariant.
The expression is either a case expression or a condition (Boolean-valued expression) associated with an if-statement, not contained in a loop statement. The message indicates that the expression takes the same value whenever it is evaluated, in all program executions. Note that if the expression depends on values obtained by a call to another other subprogram then a possible source for its invariance might be an incorrect annotation on the called subprogram.

!!! Flow Error:40: Exit condition is stable, of index 0.
!!! Flow Error:40: Exit condition is stable, of index 1.
!!! Flow Error:40: Exit condition is stable, of index greater than 1.
In these cases the (loop) exit condition occurs in an iteration scheme, an exit statement, or an if-statement whose (unique) sequence of statements ends with an unconditional exit statement - see the SPARK Definition. The concept of loop stability is explained in Section 4.4 of Appendix A. A loop exit condition which is stable of index 0 takes the same value at every iteration around the loop, and with a stability index of 1, it always takes the same value after the first iteration. Stability with indices greater than 0 does not necessarily indicate a program error, but the conditions for loop termination require careful consideration.

!!! Flow Error:41: Expression is stable, of index 0.
!!! Flow Error:41: Expression is stable, of index 1.
!!! Flow Error:41: Expression is stable, of index greater than 1.
The expression, occurring within a loop, is either a case expression or a condition (Boolean- valued expression) associated with an if-statement, whose value determines the path taken through the body of the loop, but does not (directly) cause loop termination. Information flow analysis shows that the expression does not vary as the loop is executed, so the same branch of the case or if statement will be taken on every loop iteration. An Index of 0 means that the expression is immediately stable, 1 means it becomes stable after the first pass through the loop and so on. The stability index is given with reference to the loop most closely-containing the expression. Stable conditionals are not necessarily an error but do require careful evaluation; they can often be removed by lifting them outside the loop.

### Examiner reports Supplementary error messages

The following supplementary messages are issued, to assist error diagnosis. The meanings of these messages are evident.

!!! Flow Error:30: The variable XXX is imported but neither referenced nor exported. :31: The variable XXX is exported but not (internally) defined.
!!! Flow Error:31: The variable XXX is exported but not (internally) defined.
!!! Flow Error:32: The variable XXX is neither imported nor defined.
!!! Flow Error:33: The variable XXX is neither referenced nor exported.
!!! Flow Error:35: Importation of the initial value of variable XXX is ineffective.
The meaning of this message is explained in Section 4.2 of Appendix A.

### Examiner reports Discrepancies between specified dependency relations and executable code

At this stage, information flow analysis against a given dependency relation is not considered. However, there is one information flow anomaly that should be addressed:

??? Flow Error :602: The undefined initial value of XXX may be used in the derivation of YYY.
Here XXX is a non-imported variable, and YYY is an export, of a procedure subprogram.

### Examiner reports Violation of restriction on imported-only variables

!!! Flow Error :34: The imported, non-exported variable XXX may be redefined.
The updating of imported-only variables is forbidden under all circumstances

### Examiner reports Warnings

The Examiner reports many warnings.  The pertinent ones for this part of the study are:

--- Warning :400: Variable XXX is declared but not used.
Issued when a variable declared in a subprogram is neither referenced, nor updated. (warning
control file keyword: unused_variables).

--- Warning :403: XXX is declared as a variable but used as a constant.
XXX is a variable which was initialized at declaration but whose value is only ever read not
updated; it could therefore have been declared as a constant. (warning control file keyword:
constant_variables).

## Classification of Examiner warnings and errors

The Examiner warnings and errors may be categorised into those related to a variable, those that relate to the value of the variable, and those that relate to the value of an expression:

### Errors related to a variable

!!! Flow Error:30: The variable XXX is imported but neither referenced nor exported. :31: The variable XXX is exported but not (internally) defined.
!!! Flow Error:31: The variable XXX is exported but not (internally) defined.
!!! Flow Error:32: The variable XXX is neither imported nor defined.
!!! Flow Error:33: The variable XXX is neither referenced nor exported.
!!! Flow Error:35: Importation of the initial value of variable XXX is ineffective.
!!! Flow Error :34: The imported, non-exported variable XXX may be redefined.
--- Warning :400: Variable XXX is declared but not used.
--- Warning :403: XXX is declared as a variable but used as a constant.

### Errors related to the value of a variable

!!! Flow Error :20: Expression contains reference(s) to variable XXX which has an undefined value.
!!! Flow Error :23: Statement contains reference(s) to variable XXX which has an undefined value.
??? Flow Error :501: Expression contains reference(s) to variable XXX, which which may have an undefined value.
??? Flow Error :504: Statement contains reference(s) to variable XXX, which which may have an undefined value.
!!! Flow Error :10: Ineffective statement.
!!! Flow Error :10: Assignment to XXX is ineffective.
!!! Flow Error :53: The package initialization of XXX is ineffective.
!!! Flow Error :54: The initialization at declaration of XXX is ineffective.
??? Flow Error :602: The undefined initial value of XXX may be used in the derivation of YYY.

### Errors related to the value of an expression

!!! Flow Error :22: Value of expression is invariant.
!!! Flow Error:40: Exit condition is stable, of index 0.
!!! Flow Error:40: Exit condition is stable, of index 1.
!!! Flow Error:40: Exit condition is stable, of index greater than 1.
!!! Flow Error:41: Expression is stable, of index 0.
!!! Flow Error:41: Expression is stable, of index 1.
!!! Flow Error:41: Expression is stable, of index greater than 1.



### Data Flow Analysis

Lecture Notes on Program Validation Part 2: Program Flow-Analysis classifies data flow errors as the potential use of an undefined variable and data flow anomalies as the potential redefinition of a variable without its previous value having been used.

The SPADE Data-Flow Analyser, on which the SPARK 2005 Examiner is based, reports the following flow errors and anomalies:

1. Unconditional data flow errors; an undefined variable is always used;
2. Sets of blocking statements which are not errors of type (1); use of undefined variables on first iteration of a loop;
3. Conditional flow errors which are not associated with blocking statements; an undefined variable may be used;
4. Unconditional data flow anomalies; a variable is always redefined without using its previous value;
5. Conditional data flow anomalies; the Examiner does not report these anomalies;
6. Ineffective switch statements 

# Abstract Interpretation of Definedness

The following description is taken from HandWiki.org:
In mathematics, an expression is called well-defined or unambiguous if its definition assigns it a unique interpretation or value. Otherwise, the expression is said to be not well-defined, ill-defined or ambiguous.

Ada, and, therefore, SPARK are imperative programming languages and, unlike mathematical varables, Ada variables may have different values during their lifetime.  Some of these values may not be well-defined.  The same variable, in Ada, may be well-defined or ill-defined at different parts of the program.  Ada has a special case of ill-defined, ***uninitalized***, when a variable is unassigned which can happen when the variable is declared without an initialization.  Clearly an uninitialized variable is ill-defined.

Expressions, in Ada, are ***evaluated***.  When an ill-defined variable in an expression is evaluated, the value of the expression is ill-defined.  In SPARK, expressions do not have side-effects and, therefore, an evaluation of an expression in which all the variables are well-defined will render a mathematically well-defined value for the expression.  Note, however, the expression is not necessarily ***valid*** in Ada as evaluation may lead to an accumulator overflow or produce a value which is out of range for its type.

Ada refers to the act of assigning a value to a variable as ***assignment*** and, in SPARK, assignment to a variable is by an initialisation expression, an assignment statement or by a call of a procedure specified with mode **in out** or **out** global variables or formal parameters.

Abstract interpretation uses an abstraction based on an approximation of the range of values a variable may have at a given statement.  In variable definedness we are concerned whether a variable has been assigned a (preferably well-defined) value but are not concerned with the precise value.  A representation of the values of definedness may be used.

For a SPARK program the basic unit of analysis is a subprogram (although some analysis of packages tasks and packages is required) and in this document only analysis of subprograms is considered.

## Definedness Values

Consider a model to represent the values of definedness. Definedness is not concerned with the actual value of a variable, only whether it is well-defined.  Definedness is basically binary, well-defined or ill-defined, but as noted, Ada has the special case of ill-defined, uninitialized, which can be considered as another value. Another value can be added which is a special case of well-defined, which represents the value of a well-defined variable that has been evaluated.

For the model of definednes the values that a variable may have is specified below:

### Values of Definedness

1. ***Uninitialized*** -- The value of an uninitialized variable
2. ***Ill***           -- The value of an ill-defined variable
4. ***Well***          -- The value of a well-defined variable
5. ***Used***          -- The value of a well-defined variable which has been evaluated

If a definedness value is not ***Uninitialized*** or ***Ill***, it is considered a well-defined value. A well-defined value is not necessarily a valid value in the Ada sense as the value could be out of range and, therefore, an invalid value.  A more sophisticated abstract interpretation technique may be able to approximate the range of actual (rather than definedness) values a variable may have.

## Model of Program Statements

The statements of a subprogram evaluate expressions or assign values to variables suggesting a simple model of a statement for defindness of a sequence of ***Eval*** and ***Assign*** actions. The translation phase will convert a program statement into this simple model.

Statements of a program may also indicate a branch, e.g., through an if statement, and a rejoining of a branch, at the end of the statement, e.g., the end if.  The definedness model must take account of these branches and to correctly merge the abstractions of statements from different branches.  In Classic SPARK branching can only occur through, if, case and loop statements, but conditional computation can be also controlled by using **and then** and **or else** operators.  Ada 2012 onwards introduced conditional expressions which introduce further branches.

In a simple defindeness abstract interpretation model it will not be possible determine which branch of a conditional expression will be taken as this would require a more sophiticated analysis.  Each branch of a conditional expression only contains a (possibly conditional) expression.

Definedness analysis also needs to know the start and end of a subproram as the is the basic unit in SPARK analysis.

To account for the above effects of program statements on definedness the following actions are defined in the translation of the program text.

### Translation Actions

### Simple Actions

1. ***Eval***
2. ***Assign***

### Conditional Statements

3. ***If_Condition***
4. ***Then_Branch***
5. ***Elsif_Condition***
6. ***Else_Branch***
7. ***End_If***
8. ***Case_Condition***
9. ***Case_When***
10. ***Case_Others***
11. ***End_Case***
12. ***While_Condition***
13. ***For_Forward***
14. ***For_Reverse***
15. ***Loop_Branch***
16. ***Exit_Branch***
17. ***Exit_When***
18. ***End_Loop***
19. ***And_Then***
20. ***Or_Else***

### Conditional Expressions

21. ***And_If***
22. ***And_If_Branch***
23. ***Or_Else***
24. ***Or_Else_Branch***
25. ***Cond_If***
26. ***Cond_If_Branch***
27. ***Cond_Elsif***
28. ***Cond_Elsif_Branch***
29. ***Cond_Else_Branch***
30. ***Cond_If_End***
31. ***Cond_Case***
32. ***Cond_Case_When***
33. ***Cond_Case_Others***
34. ***Cond_Case_End***

## Subprogram Start and End

35. ***Proc_Start***
36. ***Proc_End***
37. ***Fun_Start***
38. ***Fun_Return***
39. ***Fun_End***

At this stage Raven SPARK is not considered.  This mwy introduce further actions.

In a single statement, a variable may be both read and assigned, but in SPARK, expressions do not have side effects, so, in an assignment statement, all the variables on the right-hand side of of the statement are only evaluated.  On the left-hand side of an assignment statement, the only variables that are evaluated are array indices.

In a subprogram call, formal parameters of mode **in** or **in out** are considered as an evaluation of the corresponding actual parameters and a pre-definition of the formal parameters at the call of a subprogram.  Similarly, formal parameters of mode **out** or **in out** are considered as an assignment of the corresponding actual parameters at the return of the subprogram.  A function result nay be modelled as an **out** parameter.  A subprogram can have many parameters and globals, leading to the model of a statement potentially having multiple uses of variables (possibly of the same variable) and multiple assignments to variables.  SPARK ant-aliasing rules guarantee all variables assigned in a single statement are unique.

## The Translation

The proposed translation first considers all of the variables evaluated by the statement and then those that are assigned by the statement. This avoids ambiguity when the same variable is both evaluated and assigned by the statement.  Each evaluation and assignment of a variable will have a separate entry in the translation.  For instance:

    X := X + X + Y;

would be translated as:

    Eval   (X)
    Eval   (X)
    Eval   (Y)
    Assign (X)

For reporting anomolies an association between the statement position and its translation item are useful.  For simplicity, in this study, only one statement per line is assumed so they can just be the line number.

Assuming the above statement is on line 10, the translation becomes:

    Eval (X, (10, 1))
    Eval (X, (10, 2))
    Eval (Y, (10, 3))
    Eval (X, (10, 4))

It may be unnecessary to record the two ***Eval***s of X, one may be sufficient as a second ***Eval*** will not change the definedness value of the variable.

The call of a procedure, 

**procedure** P (A : **in** Integer; B : **in out** Integer; C : **out** Integer);

on line 20

    P (X, Y, X);

would be translated as (assuming no SPARK global or dependency relation is applied to the procedure declaration):

    Eval   (X, (20, 1))
    Eval   (Y, (20, 2))
    Eval   (X, (20, 3))
    Assign (Y, (20, 4))
    Assign (X, (20, 5))

## Statement Abstraction (STAB)

Each statement of a subprogram has an abstraction representing all the variables that are evaluated or assigned by the subprogram and their state on completion of execution of the statement.  Commonly, in abstract interpretation, the state of a each variable is represented by an approximation of the range of values that the variable may have at a statement.  For definedness, the maximum range is [Uninitialized, Used]. An order is placed on the definedness values: ***Uninitialized*** < ***Ill*** < ***Well*** < ***Used***.  The abstraction of a statement, S, is formed by merging the abstractions of all its immediately preceding statements and applying the effects of executing S to the merged statements. In this document the abstraction representing a statement is called a STatement ABtraction, abreviated to STAB.

## Merging

Merging creates the STAB of a statement, S, by merging STABs of all of the immediately preceding statements of S and applying the effects of executing S on the merged abstractions.

#### Initial Merge (JOIN)

The initial part of the merge for a statement, S, considers the state of all variables from the STABs of all immediately preceding statements of S and determines the widest posssiible definedness range of each variable.   That is, the variable's lower bound is taken to be the lowest bound of all its immediate predecessors and the high bound from the highest.  The merged abstractions of the immediately preceding statements of S is the basis for the STAB of S.  In this document the abstraction formed by this initial merge of the abstractions of the immediatly preceding statements is termed a JOIN.  The STAB of a statement is equal to the JOIN before considering the effects of executing the statement.

### Detecting Definedness Anomalies

Consider the variables evaluated by statement S. By definition all of the variables evaluated will be in the JOIN.  If the range of a variable in the JOIN is [Uninitialized, Uninitialized], then an ***Eval** action on this variable is an unconditional use of an uninitialized variable.   If the lower bound of a variable in the JOIN, has a lower bound of ***Uninitialized*** but an upper bound of which is higher, then a read of this variable is a conditional use of an uninitialized variable.

If the lower bound of the range of a variable in the JOIN is ***Ill***, then an ***Eval*** of this variable may be ill-defined, unconditonally if the range is [Ill, Ill].

### The Effect of Evaluation on the JOIN

An evaluation by a statement forms an intermediate STAB for the statement from the JOIN.

An ***Eval*** by statement, S, of a variable (it uses it), with an upper bound in the JOIN of ***Well***, will set its upper bound to ***Used*** in the STAB of S.  The lower bound of the variable will be the same as it is in the JOIN.  If the upper bound of the variable, in the JOIN is not ***Well***, the bounds of the variable in intermediate STAB of S are as in the JOIN.

In summary here are the rules for the effects of a read of a variable in a statement on the definedness value of the variable in the STAB of the statement:

#### Eval Action Table:

        JOIN                       <Eval>      Intermediate STAB           
    [Uninitialized, Uninitialized]   ->     [Uninitialized, Uninitialized] (1)
    [Uninitialized, Ill]             ->     [Uninitialized, Ill]           (2)
    [Uninitialized, Well]            ->     [Uninitialized, Used]          (3)
    [Uninitialized, Used]            ->     [Uninitialized, Used]          (3)
    [Ill, Ill]                       ->     [Ill, Ill]                     (2)
    [Ill, Well]                      ->     [Ill, Used]                    (4)
    [Ill, Used]                      ->     [Ill, Used]                    (4)
    [Well, Well]                     ->     [Well, Used]
    [Well, Used]                     ->     [Well, Used]

Only definedness ranges that are possible are considered in the table.
***Used*** cannot be a lower bound as the variable must have a definedness value of ***Well*** before it can take a value of ***Used***.  ***Well*** is considered to be less then ***Used*** in the definedness ordering.

Notes:
(1) An unconditional evaluation of an uninitialized variable.
(2) An unconditional evaluation of an ill-defined variable
(3) A conditional evaluation of an uninitialized variable.
(4) A conditional evaluation of an ill-defined variable.

The Evaluation of variables by a statement defines an intermediate STAB which is finalised by the actions of assignments by the statement.

### The Effects of Assign Actions on STAB

The assignments of a statement form the STAB of the statement from the intermediate STAB.

An assignment to a variable changes its actual value as well as its definedness value, so the range of its definedness value range can be widened or reduced.  The range of its definedness values depends on the state variables used in determining its actual value.

The definedness range of a variable assigned by a statement depends on the definedness range of the variables evaluated in determining its value.  The definedness range of each variable evaluated is given by the range of the variable in the intermediate STAB, i.e.,  the ***Eval*** actions have been applied to the JOIN as described above.

If the upper bound in the intermediate STAB of any of the variables evaluated in determining the assigned value is less than ***Well***, i.e., ***Uninitialized*** or ***Ill***, the value is questionable and the upper bound of the assigned variable in STAB should be ***Ill***. Otherwise its upper bound is ***Well***.  It cannot be ***Used*** as the assigned variable has not been evaluated yet. If the lower bound of any variables evaluated in determing the assigned value, in the intermediate STAB, is ***Uninitialized*** or ***Ill*** the lower bound, in STAB, of the variable assigned is ***Ill***.  It cannot be ***Uninitialized*** as the variable has been assigned. If the lower bound value of all evaluated variables determining the value assigned, in the intermediate STAB, is greater or equal to ***Well***, i.e., ***Well*** or ***Used***, the assignment is sound and the lower bound of the assigned variable in STAB should be set to ***Well***. 

In summary, the effects of the variables evaluated in determining the value of the assignment are below but only the possible bounds as given above are included:

#### Assign Action Table

    Intermediate STAB               <Assign>     STAB
    [Uninitialized, Uninitialized]     ->    [Ill, Ill]   (1) 
    [Uninitialized, Ill]               ->    [Ill, Ill]   (1)
    [Uninitialized, Well]              ->    [Ill, Well]  (2)
    [Uninitialized, Used]              ->    [Ill, Well]  (2)
    [Ill, Ill]                         ->    [Ill, Ill]   (1)
    [Ill, Well]                        ->    [Ill, Well]  (2)
    [Ill, Used]                        ->    [Ill, Well]  (2)
    [Well, Well]                       ->    [Well, Well] (3)
    [Well, Used]                       ->    [Well, Well]

In other words, a bound becomes ***Well*** if it is ***Well*** or ***Used*** otherwise it becomes ***Ill***.  

Notes:
(1) The assigned variable is Ill-Defined.
(2) The assigned variable may be Ill-Defined
(3) Successive assignments without an intervening evaluatiion.

SPARK anti-aliasing rules ensure the same actual parameter in a subprogram call cannot be assigned to more than one mode **out** or **in out** parameter or global variable so the same variable cannot be defined more than once by the same statement.


## Validity of Model of Definedness

The definedness values are assumed to be ordered:

***Uninitialized*** < ***Ill*** < ***Well*** < ***Used***.

***Uninitialized*** is a special case of  ***Ill*** and ***Used*** is a special case of ***Well***.

### Axioms ###

1. A variable may have a definedness value of ***Uninitialized***, ***Ill***, ***Well*** or ***Used*** and the values are considered to be ordered as, ***Uninitialized*** < ***Ill*** < ***Well*** < ***Used***.
2. The definedness value of a variable cannot be decreased to ***Uninitialized*** from a higher value.
3. A definedness range of a variable is the lowest definedness value the variable may have (lower bound) to the highest value it may have (upper bound), inclusive.  It is denoted by [Lower bound, Upper bound].
4. A statement abstraction, STAB, is a set containing a definedness range for each variable used by a subprogram.
5. A JOIN of program branches at a statement, S, is a set containing the definedness range of all variables in immediately preceding statements (all branches); the lower bound of the range each variable is the lowest lower bound of the variable has in all the STABs of the immediately preceding statements of S.  The upper bound of the range of each variable is the maximum value the variable has in all of the STABs of the immediately preceding statement of S.
6. The initial STAB of a subprogram is determined by the specification of a the subprogram.  Mode **in** or **in out** subprogram formal parameters or globals are considered to be well defined when the subprogram is called and have an initial definedness range of [Well, Well].  All formal paramters and globals of mode **out** are considered to be uninitialized and have an initial definedness range of [Undefined, Undefined].
7. A local variable will have an initial definedness range of [Uninitialized, Uninitialized] when it is declared.  An initialization expression of a declaration is considered as an assignment after the declaration.
8. Only an ***Assign*** action can instigate a change in the potential definedness value of a variable.  An ***Eval*** action has to be applied to a ***Well*** variable to increase its potential definedness value to ***Used***.
11.  A statement applying an ***Assign** action to a variable, V, has the definedness range [Well, Well] if all of the variables evaluated in determining the actual value of V, the defining expression, have a definedness lower bound equal or greater than ***Well***.  If the definedness lower bound of any variable evaluated in the defining expression has a definedess lower bound of less than ***Well*** the definedness lower bound of V is ***Ill***.  If the definedness upper bound of all variables evaluated in the defining expression is less or equal to ***Ill*** then the definedness range of V is [Ill, Ill] otherwise the range of V is [Ill, Well]. 
12.  It follows from 8 and 9, in order for a variable to have an ***Ill*** value, an expression containing an ***Uninitialized*** value has been used in determining its value.  The value may have been determined directly using an ***Uninitilzed*** variable or indirectly using another variable that has an ***Ill*** value due to its assignment having an Ill-Defined value derived from an expression containing an ***Uninitialized*** variable.  In other words, an ***Ill*** definedness value variable cannot exist without evaluating an ***Uninitialized*** definedness value variable.
13.  An ***Eval*** action on a variable can only change the definedness value of the upper bound of its range from ***Well*** to ***Used***, otherwise a ***Eval*** action has no effect on the definedness range of the variable.
14.  If the definedness range of a variable is [Uninitialized, Uninitialized] then a ***Eval** of the variable is an unconditional evaluation of an ***Uninitialized*** variable - the value is Ill-Defined.
15.  If a variable has lower bound definedness value of ***Uninitialized*** but the upper bound is higher, then a ***Eval*** of the variable is a conditional evaluation of an ***Uninitialized*** variable - the value may be Ill-Defined.
16.  If the definedness range of a variable is [Ill, Ill], then a ***Eval*** of the variable is an unconditional evaluation of an ***Ill*** variable - the value will be Ill-Defined.
17.  If a variable has a lower bound definedness value of ***Ill*** but the upper bound is higher, then a ***Eval** of the variable is a conditional evaluation of an ***Ill*** variable - the value may be Ill-Defined.
18.  A Statement Abstraction, STAB, contains a definedness range for each variable in a subprogram.
19.  A JOIN of program branches at a statement, S, contains definedness range for all variables in immediately preceding statements (all branches); the lower bound of the range each variable is the lowest lower bound of the variable has in all the STABs of the immediately preceding statements of S.  The upper bound of the range of each variable is the maximum value the variable has in all of the STABs of the immediately preceding statement of S.
20.  A STAB of a statement, S, is formed from the JOIN at S by applying:  
    a. Applying axioms 2 and 3 to the upper bound of the definedness range, in the JOIN, of each variable having an ***Eval*** action applied by S, resulting in an intermediate STAB,  
    b. Applying axioms 9, 10 and 11 to the definedness bounds, in the intermediate STAB, of each variable to which an ***Assign*** action is applied by S, to give the STAB.
     
The above axioms are summarised in the tables ***Eval*** Action Table and ***Assign*** Action Table given earlier.

The question is: do these rules ensure that the use of Ill-Defined values are always detected?

As ***Uninitialized*** is a special case of ***Ill***, in the following argument, what applies to ***Ill*** applies to ***Uninitialized*** too.  Similarly, ***Used*** is a special case of ***Well***, and what applies to ***Well*** applies to ***Used*** also. 
  
In straight line code (the sequence of statements has no branches) the value of the initial JOIN at a statement, S, is identical to the STAB of the immediately preceding statement of S.  A local variable of a subprogram, L, if it has not been initialized when declared, will have an initial definedness range of [Uninitialized, Uninitialized].  If there are no assignments of any form to L in the preceding statements of S, L's range will remain [Uninitialized, Uninitialized] and a ***Eval*** action on L will indicate the unconditional ***Eval*** of an ***Uninitialized*** and therefore Ill-Defined variable.  The use of an uninitialized local variable will always be detected.  If this were not the case either:
1. The local variable has been Well-Defined when declared without initialization.  This is a contradiction of axiom 1.
2. The definedness range of the local variable has been changed. From axiom 9 this can only be acheived by an ***Assign*** action by some form of assignment in the preceding statements.  This is also a contradiction as the hypothesis was that there were no assignments.

A similar argument is true for subprogram parameters and globals which have a mode of **out** as these types of variables also have an initial range of [Uninitialized, Uninitialized] as specified in axiom 1.

From axiom 11, a variable with a potential definedness value of ***Ill*** has an actual value that has be derived from an expression containing a variable with a definedness value of ***Uninitalized***.  The use of the uninitialized vaiable will have been detected when it was evaluated.



Code that contains branches has to be merged at the points where branches join.  The JOIN at a point, P, is formed by creating the widest possible range of each variable in the subprogram from the minimum and maximum range values from the STAB of the last statement of all of the branches joining at P, as given in axiom 13.  

Since ***Uninitialized*** is the lowest definedness value, any branch joining at point P having a STAB with a variable, V, that has a lower bound of ***Uninitialized***, V will have the same lower bound in the JOIN at P.  

If V also has maximum definedness upper bound of ***Uninitialized*** in the STABs of all branches joining at point P, then at the defindness range of V will be [Uninitialized, Uninitialized] in the JOIN at P. The statement, S, immediately following the joining branches at P will have the JOIN with the definedness range of V of [Uninitialized, Uninitialized].  As in straigt-line code, a ***Read*** action on V by statement S, indicates an unconditional use of an ***Uninitialized*** variable, according to axiom 5, and will always be detected.    

If instead, V has an upper bound in greater than  ***Uninitialized*** in the STAB of any branch joining at P, then the upper bound in JOIN will be the greatest upper bound (GUB) of V from the STABs of all of the branches joining at P.  The range of V in the JOIN at P will be [Uninitialized, GUB].  A ***Read*** action on V by statement S, if GUB is greater then ***Uninitialized***, indicates a conditional use of an ***Uninitialized*** variable, according to axiom 6, and will always be detected.




## Abstract Interpretation of Definedness
Abstract interpretation uses an abstraction based on an approximation of the range of values a variable may have at a given statement.  In variable definedness we are concerned whether a variable has been assigned a (preferably valid) value but are not concerned with the precise value.  A representation of the values of definedness needs to be used.

The value of a variable is "used" if its name appears in an Expression.  The value of a variable may be "defined", generally by an assignment.  Consider a model to represent the values of definedness. Definedness is not concerned with the actual value of a variable, only whether it is initialized.  Three values are obvious: Uninitialized, Defined and Used. However, the analysis has to detect the use of an uninitialized variable and the update of a variable with an expression containing an uninitialized variable.  To track these anomalies three extra values are introduced, giving the following Definedness values a variable may take:

1. Uninitialized      -- An uninitialized variable
2. UD_Used        -- A variable that has been used while Undefined
3. Unsound        -- An update with an expression containing an Undefined or Unsound variable
4. US_Used        -- A variable that has been used while Unsound
5. Defined        -- A variable that has been updated with a sound expression
7. Used           -- A Defined variable that has been used

Undefined, the variable has not been assigned a sound value, UD_Used, the variable has been used while Undefined, Unsound, the variable has been updated with an Undefined or Unsound value, US_Used, the variable has been used whilst it is Unsound, Defined, the variable has been updated with a value not dependent on an Undefined or Unsound value, and Used, the variable has a Defined value and it has been used. The value Unsound represents a value derived from an Undefined or Unsound value.  If a definedness value is not Undefined, UD_Used, Unsound, or US_Used, it is considered a Sound value. A Sound value is not necessarily a Valid value in the Ada sense as a value could be out of range and, therefore, an invalid value.  A more sophisticated abstract interpretation technique may be able to approximate the range of values a variable may have.

The statements of a subprogram read (a use) or Assign (a definition) variables. The read of a variable with an undefined value is erroneous and the read of a variable with an unsound value is of questionable Validity.  Two successive Assignments to the same variable without an intervening read of the variable means (in SPARK) that the value of the first assignment is unused and therefore may suggest a programming mistake. 

In a single statement, a variable may be both read and assigned, but in SPARK, expressions do not have side effects, so all the variables on the right-hand side of an assignment statement are only used.  On the left-hand side, only array indices are used, although unusually, in object declarations, more than one variable may be initialized.  In a subprogram call, formal parameters of mode **in** or **in out** are modelled as uses of the corresponding actual parameters and a definition of the formal parameters.  Similarly, formal parameters of mode **out** or **in out** are modelled as a definition of the corresponding actual parameters in a procedure call statement.  A subprogram can have many parameters and globals, leading to the model of a statement potentially having multiple uses of variables (possibly of the same variable) and multiple definitions of variables.  All variables defined in a single statement are unique.

The proposed translation first considers all of the variables read by the statement and then those that are assigned by the statement. This avoids ambiguity when the same variable is both read and written by the statement.  Each use and update of a variable will have a separate entry in the translation.  For instance:

    X := X + X + Y;

would be translated as:

    Read   (X)
    Read   (X)
    Read   (Y)
    Assign (X)

To keep an association between the translation and the statement a position and a translation item are needed.  For simplicity, in this study, only one statement per line is assumed so they can just be the line number.

Assuming the above statement is on line 10, the translation becomes:

    Read   (X, (10, 1))
    Read   (X, (10, 2))
    Read   (Y, (10, 3))
    Assign (X, (10, 4))

It may be unnecessary to record the two Reads of X, one may be sufficient as a second Read will not change the definedness value of the variable.

The call of a procedure, 

**procedure** P (A : **in** Integer; B : **in out** Integer; C : **out** Integer);

on line 20

    P (X, Y, X);

would be translated as (assuming no dependency relation is given):

    Read   (X, (20, 1))
    Read   (Y, (20, 2))
    Assign (Y, (20, 3))
    Read   (X, (20, 4))
    Read   (Y, (20, 5))
    Assign (X, (20, 6))

The reads of variables must use the variable's definedness value from the merged abstractions of the immediately preceding statements rather than the actual values Assigned in the procedure call.

Each statement has an abstraction representing the state of all variables on completion of execution of the statement.  The abstraction of the statement is formed by merging the abstractions of all immediately preceding statements and applying the effects of the current statement.  Commonly, an abstraction has a notion of the range of values that a variable may have at the statement.  For definedness, the maximum range is [Undefined, Used], although in definedness there is not necessarily an order to these possible values but, for simplicity, an order is assumed Undefined < UD_Used, Unsound < US_Used < Defined < Used.  A merge has to take the abstractions from the immediately preceding statements and the translation of the statement from this pair of values.

First, consider merging the immediately preceding statements.  This part of the merge determines the widest range of values the variable could take from the abstractions of all of its immediately preceding statements.  That is, the lower bound is taken to be the lowest bound of all its immediate predecessors and the high bound from the highest.

Next, consider the variables read by the statement.  SPARK flow analysis reports on all uses of an uninitialized variable, whether unconditional (it will always happen when executing the program) or conditional (it happens depending on the execution path taken when the program is run). The abstract interpretation model of definedness needs to do this too.

If a variable read by a statement has a range in the merged abstractions of immediately preceding statements of [Undefined, Undefined], then this is an unconditional use of an uninitialized variable and should be reported.  In this case, the range becomes [Undefined, UD_Used].  If the lower bound of the variable in the merged abstractions of preceding statements is undefined but the upper bound is defined or used, then this is the conditional use of an uninitialized variable and should be reported as such.  The lower bound should remain as undefined as the use of the variable has not changed this, however, if the upper bound is defined or used, it should become used as the previously defined or used variable has been read.

Determining the range of an Unsound variable is similar.  If the range of the variable is [Udefined, Unsound] or [Unsound, Unsound] then a read of the variable changes the upper bound to US_Used.  If the range of the variable is [Unsound, Defined] then it becomes [Unsound, Used] after the read of the variable and a variable with the range [Unsound, Used] remains unchanged. 

In summary here are the rules for the effects of a read of a variable in a statement on the definedness value of the variable:

### Read Action Table
    Preceding Abstract State <Read>   New Bounds   
    [Undefined, Undefined]     ->     [Undefined, Ud_Used] (1)
    [Undefined, UD_Used]       ->     [Undefined, UD_Used] 
    [Undefined, Unsound]       ->     [Undefined, US_Used] 
    [Undefined, US_Used]       ->     [Undefined, US_Used]
    [Undefined, Defined]       ->     [Undefined, Defined] (2)
    -- UD_Used cannot be a lower bound as it is impossible to assign a UD_Used value
    [Unsound, Unsound]         ->     [Unnsound, US_Used]
    [Unsound, US_Used]         ->     [Unnsound, US_Used] 
    [Unsound, Defined]         ->     [Unsound, Used]
    [Unsound, Used]            ->     [Unsound, Used]
    -- US_Used cannot be a lower bound as it is impossible to assign a US_Used value
    [Defined, Defined]         ->     [Defined, Used]
    [Defined, Used]            ->     [Defined, Used]
    -- Used cannot be a lower bound as it is impossible to assign a Used value

In other words, a variable must be Defined to become Used.

Notes:
(1) Issue an unconditional Read of an uninitialized variable message.
(2) Issue a conditional Read of an uninitialized variable message.

It is not clear that a message should be issued yet regarding the Read of an Unsound variable.

Assigning to a variable potentially changes its definedness value depending on the variables read in determining its value.  SPARK flow analysis also reports on two successive assignments to a variable without an intervening read of the variable.   The value of the first assignment is not used and may be a mistake in the program.  Such assignments should be detected with abstract interpretation too.

First, consider the definedness bounds of the variables read by a statement after the Read action has been applied as described above.  If any of the bounds of the variables Read have a lower bound of less than Defined, the value is questionable and the lower bound of the assigned value should be set to Unsound.  If the upper bound is Undefined or Unsound, then the upper bound should also be set to Unsound as the (actual as opposed to defindness) value assigned must have an Unsound defindness value.

In summary, the effects of the variables Read in determining the value of the assignment are below but only the possible bounds as given above are included:

### Assign Action Table
    Bounds of Read Variable <Assign> Bounds of Assignment Value
    [Undefined, Undefined]     ->    [Unsound, Unsound]   
    [Undefined, UD_Used]       ->    [Unsound, Unsound] 
    [Undefined, Unsound]       ->    [Unsound, Unsound]
    [Undefined, US_Used]       ->    [Unsound, Unsound]
    [Undefined, Defined]       ->    [Unsound, Defined]
    [Undefined, Used]          ->    [Unsound, Defined] 
    [Unsound, Unsound]         ->    [Unsound, Unsound]
    [Unsound, US_Used]         ->    [Unsound, Unsound]
    [Unsound, Defined]         ->    [Unsound, Defined]
    [Unsound, Used]            ->    [Unsound, Defined]
    [Defined, Defined]         ->    [Defined, Defined] (1)
    [Defined, Used]            ->    [Defined, Defined]

In other words, a bound becomes Defined if it is Defined or Read otherwise it becomes Unsound.  

Notes:
(1) Successive assignments without an intervening read.

SPARK anti-aliasing rules ensure the same actual parameter in a subprogram call cannot be assigned to more than one mode **out** or **in out** parameter so the same variable cannot be defined more than once by the same statement.

It would be possible to check that read-only variables (constants in Ada or **in** mode parameters) are not Assigned but Ada compilers already do this check.

The abstract state at a statement, which is the possible definedness value range of all variables in the subprogram, has to be determined.  The definedness value range of a variable may be affected by both reads and assignments of a statement.  The abstract state for the statement is formed from the abstract state immediately preceding the statement (the merged abstractions from all immediately preceding statements) by leaving the ranges of the variables not read or assigned by the statement unchanged.  Variables read by the statement but not assigned have the bounds determined from the Read Action Table above.  Variables assigned by the statement have the bounds determined from the Assign Action Table.



So far individual statements have been considered but the basic analysis unit in SPARK is the subprogram and further actions than just Read variable and Assign variable are needed as well as more analysis of the final abstract state of the subprogram.

The subprogram declaration has to be translated into a model.  A subprogram may have mode **in**, **out**, or **in out** formal parameters. A mode **in** or **in out** formal parameter is taken to have the actual parameter in a subprogram call assigned to it for analysis purposes and, therefore, has the bounds [Defined, Defined]. A mode **out** formal parameter has initial bounds of [Undefined, Undefined].  On completion of the interpretation of the subprogram, the actual parameters of mode **in out** or **out** are assumed to be assigned from the formal parameters, i.e., formal parameters of mode **in out** or **out** are read at the end of the subprogram. To identify the type of parameter a tag is added to the abstract state of each variable:

    Pin -- a mode in parameter
    Pio -- a mode in out parameter
    Pou -- a mode out parameter
    Rou -- a result variable
    Gin -- a mode in global
    Gio -- a mode in out global
    Gou -- a mode out global
    Lrw -- a local read/write variable
    Lro -- a local read only variable
    Cst -- a static constant (in SPARK terms)

Extra actions performed by statements are required in the translation:

    In_Param
    Io_Param
    Ou_Param

then
    procedure P (A : in Integer; B : in out Integer; C : out Integer);

will be translated as:

    Action                 Abstraction of the statement
    In_Param (A, (1, 1))  (A, Pin, (Defined, (1, 1)), (Defined, (1, 1)))
    Io_Param (B, (1, 2))  (B, Pio  (Defined, (1, 2)), (Defined, (1, 2)))
    Ou_Param (C, (1, 3))  (C, Pou, (Undefined, (1, 3)), (Undefined, (1, 3)))

In SPARK all parameters of a function subprogram must be mode **in**.  The return value is modelled as an auxiliary output variable that is Assigned the value of the return statement.

SPARK 2005 requires all global variables used by a subprogram to be included in the subprogram declaration these also have a mode **in**, **in out** or **out** and are treated similarly to formal parameters.  However, Ada compilers do not check the correct use (according to their mode) of global variables as is done with parameters.  For instance, an Ada compiler does not report the assignment to a mode **in** global variable.  Extra checks for correct use of global variables are needed when interpreting assignments.

Local variable declarations also have to be considered as these introduce new variables into the state abstraction.  Local variables will have initial bounds of [Undefined, Undefined], but if they have an initialization expression, it will be translated as Read of the variables in the expression followed by an Assignment to the declared variable.

Extra statement actions are required:

    Declare

The Translation of:

    V1 : Integer;
    V2 : Integer := 16;       --  No variables Read in initialization expression
    V3 : Integer := V2 + 16;  --  Not allowed in SPARK 2005 where initializations must be static
    C1 : constant Integer := 32;
    C2 : constant Integer := C1 + 64;
    RO1 : constant Integer := V2 + 128;  --  not allowed in SPARK 2005
    RO2 : constant Integer := RO1 + C2;  --  not allowed in SPARK 2005
    V4  : Integer := RO1 + RO2;          --  not allowed in SPARK 2005

is:

    Declare (V1, (1, 1))   (V1,  Lrw, (Undefined, (1, 1)), (Uninitialised, (1, 1)))
    Declare (V2, (2, 1))   (V2,  Lrw, (Defined, (2, 1)),   (Defined, (2, 1)))
    Read    (V2, (3, 1))   (V2,  Lrw, (Defined, (2, 1)),   (Used, (3,1)))
    Declare (V3, (3, 2))   (V3,  Lrw, (Defined, (3, 2)),   (Defined, (3, 2)))
    Declare (C1, (4, 1))   (C1,  Cst, (Defined, (4, 1)),   (Defined, (4, 1)))
    Declare (C2, (5, 1))   (C2,  Cst, (Defined, (5, 1)),   (Defined, (4, 1)))
    Read    (V2, (6, 1))   (V2,  Lrw, (Defined, (2, 1)),   (Used, (6, 1)))
    Declare (RO1, (6, 2))  (RO1, Lro, (Defined, (6, 2)),   (Defined. (6, 2)))
    Read    (RO1, (7, 1))  (RO1, Lro, (Defined, (6, 2)),   (Used, (7, 1)))
    Declare (RO2, (7, 2))  (RO2, Lro, (Defined, (7, 2)),   (Defined, (7, 2)))
    Read    (RO1, (8, 1))  (RO1, Lro, (Defined, (6, 2)),   (Used, (8, 1)))
    Read    (RO2, (8, 2))  (RO2, Lro, (Defined, (7, 2)),   (Used, (8, 2)))
    Declare (V4, (8, 3))   (V4,  Lrw, (Defined, (8, 3)),   (Defined, (8, 3)))

A translation is required for an end of the subprogram to notionally read the formal parameters of mode **in out** or **out** and assign their values to the actual parameters. The subprogram-wide checks are then instigated, followed by the abstract state clean-up.  The abstract state of the parameters and locals of the subprogram are not required after the end of the subprogram:

    Action
    End_Sub
    
### Straight Line Code

First, consider the following simple SPARK procedure taken from early SWES courses.

    procedure Swap (X : in out Integer; Y : in out Integer) is
      Temp : Integer;
    begin
      Temp := X;
      X := Y;
      Y := Temp;
    end Swap;

As it is written there are no uses of uninitialised variables.

Using the ideas for translations and abstractions, the following spreadsheet was constructed to demonstrate using abstract interpretation to check for definedness.

![image.png](attachment:84ecc425-647b-4f3d-ac88-001f7b003066.png)

#### Python Version of Spreadsheet
The Translate operation of the Abstract Interpretation is done by hand.

An enumeration is introduced to represent the possible states of a variable and two named tuples are declared to  of an action and the action and the variable to which it refers.

Two sequences are now declared, one to record where each statement starts and one containing the expanded and translated statements.

When conditional statements are considered the structure will be more complex but first consider straight line code in which only the current statement translation and the abstraction current at the immediatly preceding statement have to be merged.
A set of rules for merging and a data structure representing the abstract state is required.  The merge rules may be represented as a matrix and the abstract state, looking at the spreadsheet could be an extension of the Expansion structure.

In [103]:
from enum import Enum
class Defindness (Enum):
    UNINITIALIZED = 1
    READ          = 2
    ASSIGNED      = 3
    UNSOUND       = 4
    @classmethod
    def pos (cls, state):
        return state.value - 1

class Translate_Actions (Enum):
    READ_VAR    = 1
    ASSIGN_VAR  = 2
    IN_PARAM    = 3
    OUT_PARAM   = 4
    DECLARE_VAR = 5
    @classmethod
    def pos (cls, act):
        return act.value - 1
    
State  = Enum('State',  ['UNINITIALIZED', 'READ', 'ASSIGNED', 'UNSOUND'])
Action = Enum('Action', ['READ_VAR', 'ASSIGN_VAR', 'IN_PARAM', 'OUT_PARAM', 'DECLARE_VAR'])

import collections
Position  = collections.namedtuple ('Postition', ['line', 'expand'])
Expansion = collections.namedtuple ('Expansion',['var', 'action', 'position'])

translation = [
    [Expansion("X__in",  Action.IN_PARAM,    Position(0, 1)),  # Statement 0 represents declaring a 
     Expansion("X__out", Action.OUT_PARAM,   Position(0, 2)),  # variable for each in and out parameter.
     Expansion("Y__in",  Action.IN_PARAM,    Position(0, 3)),  # Variables with the parameter names
     Expansion("Y__out", Action.OUT_PARAM,   Position(0, 4)),  # are declared for use within the
     Expansion("X",      Action.DECLARE_VAR, Position(0, 5)),  # translation of the body of the
     Expansion("Y",      Action.DECLARE_VAR, Position(0, 6))], # subprogram.
     
    [Expansion("X__in",  Action.READ_VAR,     Position(1, 1)),  # The in parameters are assigned to the
     Expansion("Y__in",  Action.READ_VAR,     Position(1, 2)),  # local variables X and Y.
     Expansion("X",      Action.ASSIGN_VAR,   Position(1, 3)),
     Expansion("Y",      Action.ASSIGN_VAR,   Position(1, 4))],

    [Expansion("Temp",   Action.DECLARE_VAR,  Position(2, 1))], # The declaration of Temp.
                                                                # The begin keyword is not translated?
    [Expansion("X",      Action.READ_VAR,     Position(4, 1)),
     Expansion("Temp",   Action.ASSIGN_VAR,   Position(4, 2))],

    [Expansion("Y",      Action.READ_VAR,     Position(5, 1)),
     Expansion("X",      Action.ASSIGN_VAR,   Position(5, 2))],

    [Expansion("Temp",   Action.READ_VAR,     Position(6, 1)),
     Expansion("Y",      Action.ASSIGN_VAR,   Position(6, 2))],

    [Expansion("X",      Action.READ_VAR,     Position(7, 1)),  # end keyword denotes updating the out
     Expansion("Y",      Action.READ_VAR,     Position(7, 2)),  # parameters.
     Expansion("X__out", Action.ASSIGN_VAR,   Position(7, 3)),
     Expansion("Y__out", Action.ASSIGN_VAR,   Position(7, 4))]]

In [104]:
translation

[[Expansion(var='X__in', action=<Action.IN_PARAM: 3>, position=Postition(line=0, expand=1)),
  Expansion(var='X__out', action=<Action.OUT_PARAM: 4>, position=Postition(line=0, expand=2)),
  Expansion(var='Y__in', action=<Action.IN_PARAM: 3>, position=Postition(line=0, expand=3)),
  Expansion(var='Y__out', action=<Action.OUT_PARAM: 4>, position=Postition(line=0, expand=4)),
  Expansion(var='X', action=<Action.DECLARE_VAR: 5>, position=Postition(line=0, expand=5)),
  Expansion(var='Y', action=<Action.DECLARE_VAR: 5>, position=Postition(line=0, expand=6))],
 [Expansion(var='X__in', action=<Action.READ_VAR: 1>, position=Postition(line=1, expand=1)),
  Expansion(var='Y__in', action=<Action.READ_VAR: 1>, position=Postition(line=1, expand=2)),
  Expansion(var='X', action=<Action.ASSIGN_VAR: 2>, position=Postition(line=1, expand=3)),
  Expansion(var='Y', action=<Action.ASSIGN_VAR: 2>, position=Postition(line=1, expand=4))],
 [Expansion(var='Temp', action=<Action.DECLARE_VAR: 5>, position=Posti

In [105]:
statements = [
    0,  # Represent setting values of in parameters in a procedure call.
    7,  # procedure Swap (X : in out Integer; Y : in out Integer) is
    11, # Temp : Integer;
    12,  # Temp := X;
    14, # X := Y;
    16, # Y := Temp;
    18  # end; Represent setting values of out parameters.
]

In [106]:
statements

[0, 7, 11, 12, 14, 16, 18]

When conditional statements are considered the structure will be more complex but first consider straight line code in which only the current statement translation and the abstraction current at the immediatly preceding statement have to be merged.
A set of rules for merging and a data structure representing the abstract state is required.  The merge rules may be represented as a matrix and the abstract state, looking at the spreadsheet could be an extension of the Expansion structure.


|                  |READ_VAR     |ASSIGN_VAR |IN_PARAM |OUT_PARAM     |DECLARE_VAR   |
|------------------|-------------|-----------|---------|--------------|--------------|
|**UNINITIALIZED** |UNSOUND      |ASSIGNED   |ASSIGNED |UNINITIALIZED |UNINITIALIZED |
|**READ**          |READ         |ASSIGNED   |ASSIGNED |UNINITIALIZED |UNINITIALIZED | 
|**ASSIGNED**      |READ         |ASSIGNED   |ASSIGNED |UNINITIALIZED |UNINITIALIZED | 
|**UNSOUND**       |UNSOUND      |ASSIGNED   |ASSIGNED |UNINITIALIZED |UNINITIALIZED |

There is a lot of redundancy in this table, it is immaterial what the current state is for the actions IN_PARAM, OUT_PARAM or DECLARE_VAR.  Similarly for ASSIGN_VAR except that two successive assigns without an intervening read is a reportable state change.  The matrix is only small so the redundancy is of little concern. 

In Python the matrix is represented as a list of lists.

In [135]:
  # READ_VAR      ASSIGN_VAR     IN_PARAM        OUT_PARAM            DECALRE_VAR
merge_matrix = [
  # READ_VAR      ASSIGN_VAR     IN_PARAM        OUT_PARAM            DECALRE_VAR
 [State.UNSOUND, State.ASSIGNED, State.ASSIGNED, State.UNINITIALIZED, State.UNINITIALIZED],# UNINITIALIZED
 [State.READ,    State.ASSIGNED, State.ASSIGNED, State.UNINITIALIZED, State.UNINITIALIZED],# READ
 [State.READ,    State.ASSIGNED, State.ASSIGNED, State.UNINITIALIZED, State.UNINITIALIZED],# ASSIGNED
 [State.UNSOUND, State.ASSIGNED, State.ASSIGNED, State.UNINITIALIZED, State.UNINITIALIZED] # UNSOUND
]

In [134]:
merge_matrix [Defindness.pos(State.UNSOUND)][Translate_Actions.pos(Action.READ_VAR)]

<State.UNSOUND: 4>

The detection of errant actions of reading an uninitialized variable or an unsound value is directly available from the matrix but two assignments of the same variable without an intervening read is not.  Nor is assignment of an unsound value.  Is Unread another state? - but then what happens to assigned?

Consider the structure of the abstract state.  It contains an element for each variable in the subprogram.  Each element contains the merged state of the variable after the current instruction.  For error reporting the position may be needed too.  The requirements look the same as for the Expansion. 

In [None]:
State = Enum('Action', ['UNINITIALIZED', 'READ', 'ASSIGNED', 'UNSOUND'])

Var_State = collections.namedtuple ('Var_State',['var', 'state', 'position'])
