A parser generator for visibly pushdown grammars (VPGs) is implemented
and verified in the proof assistant Coq under
the folder VPGParser/
. The parser generator is exported to OCaml
library VerifiedParser.ml
with commands under the folder Export/
.
Coq@8.16.1
The following command compiles the Coq library, exports the OCaml
library to a single file VerifiedParser.ml
, and generates documentation for the Coq library. The command takes ~5
mins on Intel 9700 with 16 GB memory.
make && make doc
Given a VPG
The function
The subfunctions are briefly discussed as follows. The parsing
function
Let
Finally, the backward extraction function
All functions are implemented and verified in the proof assistant
Coq; the library is then exported to OCaml
(VerifiedParser.ml
) for further utilization.
The verified properties of the parser generator are summarized in
Theorem Property_VPG_Parser_Generator
in the module
VPGParser/TimedExtraction.v
, declared as follows.
Theorem Property_VPG_Parser_Generator:
∃ k b1 b2,
∀ m M T w i, Forest (m::M) T (w++[i]) ->
let Vinit := f_init m in
let Vw := extraction M Vinit in
(** 1. Vw has no duplicates *)
NoDup Vw /\
(** 2. The time cost is O(k·|Vw|·|w|) *)
cost_extraction M Vinit <= (k * |Vw| + b1) * |w++[i]| + b2 /\
(** 3. The correctness *)
(∀ v, (∃ I, In (v, I) Vw) <-> DM.Dm L_0 (w ++ [i]) v).
The relation Forest (m::M) T (w++[i])
formalizes the process to generate a forest
Inductive Forest : (list CME) -> (list CME) -> (list symbol) -> Prop :=
| FInit: Forest [m0] [] []
| FInte i m M T w m' T':
Forest (m::M) T w
-> (m', T') = p m T i
-> Forest (m'::m::M) T' (w++[i]).
Back to the theorem, we can see that
-
$V_w$ , conceptually a set of VPG parse trees, is implemented as a list of VPG parse trees. The first propertyNoDup Vw
shows$V_w$ has no duplicates, therefore can be viewed as a set. -
The function
$\text{extraction}$ has a monadic counterpart$\text{extraction}'$ , which is verified to have the time complexitycost_extraction
. The second property showscost_extraction}=O(|V_w||w|)
, i.e.,$O(|w|)$ times the number of all valid parse trees. When$|V_w|\leq 1$ , which is always the case for unambiguous grammars, it shows that the extraction function is linear-time. -
The last property shows the correctness: a VPG parse tree
$v$ is in$V_w$ , iff the following relation holds.$$\text{Dm}\ L_0\ w\ v$$ The relation
$\text{Dm}$ ("major derivation") is called the big-step derivation of the VPG parse trees; it defines how to "correctly" derive a VPG parse tree from the start nonterminal$L_0$ . In the paper [1], the notation ofDm
is$$L_0\Downarrow (w, v).$$ In Coq, the relation is defined as follows.Inductive Dm : var -> list symbol -> list VE -> Prop := | DEps : ∀ L, (in_rules (L, Alt_Epsilon) P) -> Dm L [] [] | DLa : ∀ L a L1 w' pt', (in_rules (L, Alt_Linear (Call a) L1) P) -> Dm L1 w' pt' -> Dm L (Call a::w') (Calv (PndCalE L a L1)::pt') | DLc : ∀ L c L1 w' pt', (in_rules (L, Alt_Linear (Plain c) L1) P) -> Dm L1 w' pt' -> Dm L (Plain c::w') (Plnv (PlnE L c L1)::pt') | DLb : ∀ L b L1 w' pt', (in_rules (L, Alt_Linear (Ret b) L1) P) -> Dm L1 w' pt' -> Dm L (Ret b::w') (Retv (PndRetE L b L1)::pt') | DMat : ∀ L a b L1 L2 w1 w2 pt1 pt2, (in_rules (L, Alt_Match (a) (b) L1 L2) P) -> Dm L1 w1 pt1 -> Dm L2 w2 pt2 -> Dm L ([Call a] ++ w1 ++ [Ret b] ++ w2) ([Calv (MatCalE L a L1 b L2)]++pt1++ [Retv (MatRetE L a L1 b L2)]++pt2).
Because it works for general VPGs, the above definition has two more rules than the big-step relation for well-matched VPGs discussed in the paper [1], namely
DLa
, derivation for linear call rules, andDLb
, derivation for linear return rules.
The library depends on two axioms, collected in the module type
VPG
, defined as follows.
Module Type VPG.
Parameter L_0 : vpg_var.
Parameter P : vpg.rules.
Axiom A_VPG_Linear: ∀ L c L1,
in_rules (L, Alt_Linear c L1) P
-> (∃ u, L = V0 u)
-> ∃ u1 i, L1 = V0 u1 /\ c=Plain i.
Axiom A_VPG_Match: ∀ L a L1 b L2,
in_rules (L, Alt_Match a b L1 L2) P
-> (∃ u1, L1 = V0 u1) /\
((∃ u, L = V0 u) -> ∃ u2, L2 = V0 u2).
End VPG.
The type VPG
requires a VPG
Please refer to VPGParser/README.md for more implementation details.
The benchmarks are located in Benchmark/
.
[1] TOPLAS '23. Xiaodong Jia, Ashish Kumar, and Gang Tan. A Derivative-Based Parser Generator for Visibly Pushdown Grammars. PDF