Review of CDCL-driven Program Synthesis Paper

Yu Feng, Ruben Martins, Osbert Bastani, and Isil Dillig. 2018. Program Synthesis using Conflict-Driven Learning. In Proceedings of 39th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’18). ACM, New York, NY, USA, 16 pages. https://doi.org/10.1145/3192366.3192382

In this paper, the authors propose a new algorithm for program synthesis: given a specification of the desired behavior, a partial program, and both the semantic specification and the context-free grammar of a DSL, deduce a complete program that satisfies the constraints. Compared to existing program synthesis algorithms, this one maintains a knowledge base of constraints learned from conflicts encountered while exploring infeasible programs.

The algorithm starts by deciding which hole to fill. It first collects the AST nodes (holes) of the partial program together with the productions that, if chosen, would remain consistent with the knowledge base; it then selects one (hole, production) pair using a statistical model. Next, the Propagate procedure infers any additional assignments forced on other nodes by the DSL specification and the constraints. The CheckConflict procedure then attempts to extract a minimal unsatisfiable core (MUC). If the MUC is empty, there is no conflict and the algorithm continues to fill further holes. If the MUC is non-empty, the AnalyzeConflict procedure is invoked to learn a lemma from the conflict, and the algorithm backtracks to the last consistent state. For AnalyzeConflict, the authors propose two ways to generalize a conflict. First, a learned lemma can also rule out components that are equivalent modulo the conflict; demanding full logical equivalence would be nearly useless in a real-world DSL, since two distinct operations rarely have identical specifications, but equivalence with respect to the conflict alone is a much weaker requirement. Second, the lemma can rule out components that are more restrictive than the conflicting one, since choosing them would make the program infeasible in the same way.
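The fill / check-conflict / learn loop can be illustrated with a deliberately tiny toy (my own sketch, not Neo's code): two holes are filled with arithmetic components, a parity transformer serves as the over-approximate specification, and a conflict blocks the whole class of programs that are equivalent modulo that conflict, here meaning programs with the same parity signature. All names and the parity abstraction are hypothetical.

```python
import itertools

# Each component: (concrete semantics, over-approximate parity transformer).
COMPONENTS = {
    "inc": (lambda x: x + 1, "flip"),   # flips parity
    "dec": (lambda x: x - 1, "flip"),   # same abstract spec as inc
    "dbl": (lambda x: x * 2, "even"),   # result is always even
    "neg": (lambda x: -x,    "keep"),   # preserves parity
}

def apply_abstract(parity, tag):
    # Over-approximate (parity) semantics of a component.
    return {"flip": 1 - parity, "even": 0, "keep": parity}[tag]

def synthesize(x_in, x_out, n_holes=2):
    blocked = set()          # learned lemmas: blocked abstract signatures
    concrete_checks = 0
    for assign in itertools.product(COMPONENTS, repeat=n_holes):
        sig = tuple(COMPONENTS[c][1] for c in assign)
        if sig in blocked:   # candidate pruned by a learned lemma
            continue
        # CheckConflict: run the candidate under the over-approximate spec.
        p = x_in % 2
        for tag in sig:
            p = apply_abstract(p, tag)
        if p != x_out % 2:
            # AnalyzeConflict: every program with this abstract signature
            # fails identically, so block the whole equivalence class.
            blocked.add(sig)
            continue
        # Concrete check against the input-output example.
        concrete_checks += 1
        v = x_in
        for c in assign:
            v = COMPONENTS[c][0](v)
        if v == x_out:
            return assign, concrete_checks
    return None, concrete_checks
```

The sketch omits the statistical decision model and propagation; it only shows conflict detection against an over-approximate specification and lemma learning over an equivalence class of components. On the example `synthesize(3, 8)`, the first conflict (the signature "flip, flip") also prunes the `("inc", "dec")` candidate without re-checking it.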

The authors adapt conflict-driven clause learning (CDCL), the core algorithm behind modern SAT solvers, to program synthesis. Their algorithm mirrors several parts of CDCL: both learn from past conflicts and avoid repeating them in the future. In program synthesis, this pruning can eliminate a considerable fraction of the $O(m^n)$ candidate programs, where $n$ is the number of holes to fill and $m$ is the number of components available for each hole. Benchmarks against the state-of-the-art synthesizers Morpheus and DeepCoder also indicate that Neo, the implementation of this algorithm, is more efficient. Additionally, the authors evaluated an ablated version of their implementation with the conflict-learning component removed, and Neo performs much better than this counterpart. In sum, the authors innovatively transfer CDCL, an algorithm from another field, to program synthesis, and the experimental results show that this approach is more efficient than prior ones.
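The pruning payoff can be quantified in a back-of-the-envelope way: with $n$ holes and $m$ components per hole there are $m^n$ complete programs, so a learned lemma that fixes the choices for only $k$ of the holes rules out $m^{n-k}$ complete programs at once. The numbers below are illustrative, not from the paper.

```python
# Illustrative numbers: n holes, m components per hole.
m, n = 10, 5
total_candidates = m ** n        # full search space: 100,000 programs
k = 2                            # a lemma fixing 2 of the 5 holes...
pruned_per_lemma = m ** (n - k)  # ...eliminates 1,000 complete programs
```

This is why learning lemmas over partial assignments (rather than blocking one complete program at a time) is what makes the approach pay off.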

Neo is also straightforward to use. Users only need to define their DSL specification and provide a program with holes, the same inputs required by other state-of-the-art program synthesizers. The authors also mention that instantiating Neo for two different application domains took them only one day. As a result, the newly proposed algorithm does not require extra human effort on its inputs.

One part of the algorithm, nevertheless, may be incomplete. The authors acknowledge that if the DSL specifications are over-approximate, the returned program may not actually satisfy the provided constraints, and users must execute the program on the examples to verify it:

"a sensible design choice is to use over-approximate specifications of the DSL constructs and then check whether P actually satisfies Φ by executing P on these examples"

However, the authors never explain what to do when a generated program fails to satisfy the constraints. If the only recourse is to add new constraints learned from the failing execution and restart the algorithm, this could be extremely inefficient. This case should be clearly addressed, since the approach explicitly relies on over-approximate specifications.
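The restart behavior in question might look like the following wrapper (my own hypothetical sketch; the paper does not give one). The point of the sketch is the weakness itself: each failed verification teaches the synthesizer only that this one concrete program is wrong.

```python
def synthesize_and_verify(synthesize, run, examples, max_rounds=100):
    """Hypothetical outer loop around a synthesizer whose over-approximate
    specs may admit programs that fail the concrete examples."""
    blocked = set()                      # programs that failed verification
    for _ in range(max_rounds):
        prog = synthesize(blocked)       # next candidate, or None if exhausted
        if prog is None:
            return None
        if all(run(prog, x) == y for x, y in examples):
            return prog                  # verified on every example
        blocked.add(prog)                # learn only "not this exact program"
    return None
```

Because each round eliminates a single concrete program, the loop degenerates toward plain enumeration unless the verification failure is generalized into a stronger lemma, which is exactly the gap the review points out.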

Questions for the authors