Review of Rosette Symbolic Virtual Machine Paper
Torlak, E., & Bodik, R. (2013). A lightweight symbolic virtual machine for solver-aided host languages. Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation - PLDI '14. doi:10.1145/2594291.2594340
In the paper, the authors introduce a new method to conduct verification for all languages. Developing a symbolic compiler could take years of effort and the ouput of so could likely be some hard-to-solve constraints. A DSL (domain specific language) operating on the host language, however, can mitigate this issue since they have smaller logical encodings. The host language then reduces the implementation plus the DSLs to constraints to be solved by SMT solver backends like Z3. One benefit of this technique is that developer of the DSL could use metaprogramming. These all in together have been referred to as SVM.
SVM could be used to conduct following queries:
- Angelic Execution: executing with symbolic variables and returns the input for assertions are true
- Debugging: find the minimum unsatisfiable core that need to be changed in order to make assertions true for a given input
- Verification: verify whether all assertions are true. Return counterexample if assertions fail.
- Synthesis: fill in sketch of the program that fits the constraints
The authors also propose other techniques to improve this which can be categorized as following:
Lifting: Since accomadating metaprogramming could be tedious and complex, the authors then introduce the term lifting. A lifted constructs are that are already supported by the symbolic compiler (e.g. plus, minus). An unlifted constructs are that are not implemented. These would not require implementation if it is only for concrete evaluation (i.e. does not involve symbolic variable in input).
Guard: $[b,v] \in B \times (V\setminus U)$, b is the guard and V is any constructs except bool. This could prevent implementation of constraint synthesis for objects like lists.
Type-driven state merging: Since direct bounded model checking is hard to develop, the authors instead merge states from alternative execution paths but not creating symbolic unions by merging same class. $u \dagger v \implies \phi(b,u,v)$
Termination: SVM execution is not bounded by a depth of recursion. The authors instead assume that the DSL does not contain any infinite loop since this could easily been set during the definition of DSL in the host language.
The authors manage to leverage an innovative combination of different techniques to create a solver-aided host language. The SVM could allow DSL implementation developers to leverage multiple benefits of the host language. For instance, the developers could utilize metaprogramming and additionally use operators that are not commonly supported by solver (e.g. list operation and string operation). Even though they might have to write their own constraint generating mechanism, this is unneeded most of the time with the help of guard and support of concrete evaluation. Apart from that, the authors have proven the correctness and efficiency of it. SVM would not get into the case of path explosion and the time of it is polynomial s.t. input size. All in all, the authors successfully leverage concepts from symbolic execution and bounded model checking to create a great, concise and easy to use symbolic compiler.
However, the authors have not clearly address the benefits of SVM over IVL. For the claim that
This eases some of the engineering burden of building a verifier, but compilation to an IVL still requires significant effort and expertise, especially when there is a fundamental mismatch between the verifier’s source language and the target IVL.
there are counterexamples like LLVM and other popular IR toolkits. There are a lot of exisiting compilers to LLVM IR for almost all languages. Does this imply that to make a easy-to-use verifier, we only need build something for LLVM IR? It also seems to be more efficient to compile LLVM IR to constraint comparing to compiling Racket since there are less operations and more structured grammar in LLVM. Additionally, the authors mention that
IVL compilers are more limited than the SVM in the kinds of queries they can encode
In my opinion, both are the same (e.g. it is easy to represent array in LLVM IR). Authors of IVL could also similarly define a few lifted constructs and ask the upstream DSL implementation developers to define unlifted constructs with these. For other that may not be able to be defined with existing lifted constructs in LLVM IR for example, I could only think of unverifiable constructs like concurrency, file operation, etc. which are also not supported by SVM.
Question for the author
- What is the biggest benefit of SVM comparing to other existing tools like Dafny and Sketch? Is SVM more efficient and robust? Is there a benchmark of other tools that SVM can compare to?
- Even though the time complexity is polynomial, is it possible to make the whole thing concurrent or even distributed?