
Definite Clause Grammars

Not Just for Parsing Anymore

Abstract

Definite Clause Grammars (DCGs) have proved to be a simple and effective means of building parsing systems. Unfortunately, DCGs are so effective as a parser-building tool that their other uses are too often ignored -- it seems that DCGs have been pigeon-holed into a parser-building niche. This paper demonstrates the common usage of DCGs and then expands these common usage patterns into areas other than parsing, showing that DCGs are an effective means of increasing programming efficiency while simultaneously maintaining (or even clarifying) declarative semantics.

Introduction to DCG

Definite Clause Grammars (DCGs) are syntactic extensions to Prolog that allow a subset of definite programs 1 to be written as Prolog programs. DCGs, and the definite programs they represent, facilitate, among other things, an almost unchanged translation of grammars defined in Backus-Naur Form (BNF) into (parts of) Prolog programs -- DCGs are a useful tool for generating parsers and scanners.

Take, for example, a simple command-and-control language for, say, a robot arm, 2 issuing commands as per the following:

up down up up down

A representation of the above language in BNF is as follows ...

< move > ::= < step > < move >
< move > ::= < step >
< step > ::= up
< step > ::= down

... and the equivalent Prolog program using DCG is the following:

move --> step, move.
move --> step.
step --> [up].
step --> [down].

This nearly direct translation is simply amazing, especially as compared to other programming language families. It is inconceivable in most other programming languages to construct a scanner for the above language so concisely in the native representation. The other amazing facet, not discussed here, is that such scanners and parsers scale linearly with the grammar's complexity ... writing scanners and parsers in other programming languages becomes much more difficult with increasing complexity, because, usually, such scanners and parsers increase in complexity polynomially or even exponentially relative to the complexity of the grammar.
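To see the translation in action, the standard phrase/2 predicate runs a DCG nonterminal against a list of tokens; the queries below (shown as comments) accept the example command sequence and reject a malformed one:

```prolog
% The robot-arm grammar from above:
move --> step, move.
move --> step.
step --> [up].
step --> [down].

% ?- phrase(move, [up, down, up, up, down]).
% true .

% ?- phrase(move, [up, sideways]).
% false.
```

Note that phrase/2 is the portable interface to DCGs; it supplies the two hidden difference-list arguments that the --> translation introduces.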

Unfortunately, perhaps because the above scanner and associated parsers are so facilitated by DCGs, the other uses of DCGs are too often overlooked. Let's step back from that application of DCGs and examine the essentials -- a DCG provides a system that accepts some atoms (updating the state of the world) and rejects others (backtracking to find a match); in other words, a DCG guarantees a limited set of symbols and associated actions on those symbols.

Type systems can be viewed in a similar fashion: they accept some inputs (compiling and executing on accepted instantiated values) and reject others (causing a compilation failure). The advantages of typeful systems have been covered exhaustively in the literature, and recently developed programming systems accept programming with types as a matter of course. Viewing DCGs in the light of types can convey the advantages of types to a programming language without types.

One of the oft-trumpeted advantages of programming with (static) types is that programs with type declarations execute faster than equivalent programs without types (or with dynamic typing). Why? Types form a duality: those things acceptable to perform computations, and those other things that are not to be used in the computation. Given that duality, the system can perform the computation in the safety that all the values are valid (in that they conform to the formal type). Under a dynamic typing system, the system must first check the type of each value, ensuring that the value's type is acceptable, before it can perform any computation. In a statically typed system, every check is compiled away (verified) before execution occurs.

Definite Clause Grammars can provide the same service that static typing does. In the usual case, a scanning/parsing system receives a language (set of data) that (we hope) conforms to the grammar. However, we can build a system where we control the grammar definition AND the input data set. A good example to examine where this approach yields effective results is in the domain of generate-and-test problem-solvers.

SEND + MORE = MONEY

(Cryptarithmic problem solvers)

Cryptarithmic problem solvers take a trivially encrypted mathematical problem and determine what that problem represents. The most general case is that a symbol stands for any digit at each position, e.g.:

        6 . .
      x # . .
-------------
        # . .
    # . . .
+ # 5 . 5
-------------
  # . 5 . 4 .

where # stands for any digit other than 0, and
 . stands for any digit

The above problem solved in the usual manner would take the generate-and-test approach:

digit(0). digit(1). digit(2). digit(3). digit(4).
digit(5). digit(6). digit(7). digit(8). digit(9).

first_digit(X) :- digit(X), X > 0.

The solver would also need ways to construct representations of (composite) numbers and translations between these representations and the "actual" numbers they represent (so that, for example, the system can use the standard operators to perform arithmetic):

as_number(Digits, Number) :- as_number_aux(Digits, 1, 0, Number).

% an auxiliary pred to make the above interface pred tail recursive3
% (the multiplier accumulator handles the units-first digit order)
as_number_aux([], _, Num, Num).
as_number_aux([Digit|Digits], Mult, Sum, Result) :-
  NewSum is Sum + Digit * Mult,
  NewMult is Mult * 10,
  as_number_aux(Digits, NewMult, NewSum, Result).

as_list(0, []).
as_list(N, [Digit|Rest]) :-
  N > 0,
  Digit is N mod 10,
  Remainder is N // 10,
  as_list(Remainder, Rest).
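As a quick sanity check of the units-first representation (assuming the predicates above have been consulted):

```prolog
% ?- as_list(645, Digits).
% Digits = [5, 4, 6].    % 5 units, 4 tens, 6 hundreds
```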

Then, using the above, one converts the problem as written into the equivalent representations and asks the system to find the solution (note that the above list representation puts numbers in reverse order: units are the first element of the list):

dots([SixNum, Multiplier, Ones, Tens, Hundreds,  Solution]) :-
  digit(A), digit(B),
  as_number([B, A, 6], SixNum),
  digit(C), digit(D), first_digit(E),
  as_number([C, D, E], Multiplier),
  Ones is SixNum * C, Ones < 1000,
  Tens is 10 * D * SixNum, Tens < 100000,
  Fivers is SixNum * E, as_list(Fivers, [5, _, 5, _]),
  Hundreds is Fivers * 100,
  Solution is Ones + Tens + Hundreds,
  as_list(Solution, [_, 4, _, 5, _, _]).

A Pentium Windows system using SWI-Prolog solves this problem using 476,919 inferences in 0.79 seconds (602827 Lips). Pretty impressive, and, for this general type of problem, there's no need to improve on this generic generate-and-test methodology.

What happens with a cryptarithmic problem with more stringent constraints, such as uniqueness? To solve the equation ...

  SEND
+ MORE
------
 MONEY

... one could use the generate-and-test approach, and enforce that each letter represents a unique digit:

send_more_money([Send, More, Money]) :-
  first_digit(S), first_digit(M),
  digit(E), digit(N), digit(D), digit(O), digit(R), digit(Y),
  all_different([S, E, N, D, M, O, R, Y], []),
  Send is S * 1000 + E * 100 + N * 10 + D,
  More is M * 1000 + O * 100 + R * 10 + E,
  Money is M * 10000 + O * 1000 + N * 100 + E * 10 + Y,
  Money is Send + More.

all_different([], _).
all_different([Num|Rest], Diffs) :-
  not(member(Num, Diffs)),
  all_different(Rest, [Num|Diffs]).
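Querying this solver at the top level produces the unique assignment (also listed under "Solutions to Problems" below):

```prolog
% ?- send_more_money(Answer).
% Answer = [9567, 1085, 10652].    % 9567 + 1085 = 10652
```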

Such an approach is simple to implement and easy to understand, but the runtime cost is enormous -- it took 1,571,755,688 inferences in 1817.34 seconds (864865 Lips) to find the solution. Instead, let us reexamine the problem constraint (uniqueness) and use it to provide information to the problem solver so as to facilitate its effort, while at the same time retaining the clarity of the declarative syntax.

The uniqueness constraint tells us that each letter must represent a unique digit. This can be described inductively: the first letter may be assigned any one of the ten digits; each subsequent letter may be assigned any digit from those remaining after the preceding letters have taken theirs.

This inductive description of the uniqueness constraint shows a consumer pattern very much like how a DCG system consumes symbols from a list. So, we'll implement just such a system; in this particular case the symbols of the list are digits. As we rewrite the system to use DCGs to replace the generate-and-test predicates, the digit/1 predicate becomes a definite predicate that consumes a digit from the (input) digit list, and the first_digit/1 predicate becomes definite, as well:

takeout(X, [X|Y], Y).
takeout(X, [H|T], [H|R]) :- takeout(X, T, R).

digit(X) --> takeout(X).4
first_digit(X) --> digit(X), { X > 0 }.
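To see how the definite version of digit consumes from the digit pool, run it through phrase/3, which exposes the leftover input; each solution removes a different element from the pool (a sketch assuming the clauses above are loaded):

```prolog
% ?- phrase(digit(X), [0, 1, 2], Rest).
% X = 0, Rest = [1, 2] ;
% X = 1, Rest = [0, 2] ;
% X = 2, Rest = [0, 1].

% first_digit//1 rejects 0, since a number may not begin with it:
% ?- phrase(first_digit(X), [0, 1, 2], Rest).
% X = 1, Rest = [0, 2] ;
% X = 2, Rest = [0, 1].
```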

Then the solution predicate changes in that it now operates under the structure of a DCG, with the input list being the ten enumerated digits (we also include an interface predicate so that the user is not burdened with providing the digits):

send_money_quickly([Send, More, Money]) :-
  do_send_moola([Send, More, Money], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], _).

do_send_moola([Send, More, Money]) -->
  first_digit(S), first_digit(M),
  digit(E), digit(N), digit(D), digit(O), digit(R), digit(Y),
  { Send is (S * 1000) + (E * 100) + (N * 10) + D,
    More is (M * 1000) + (O * 100) + (R * 10) + E,
    Money is (M * 10000) + (O * 1000) + (N * 100) + (E * 10) + Y,
    Money is Send + More }.
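The DCG version is queried through its interface predicate and finds the same solution, just far sooner:

```prolog
% ?- send_money_quickly(Answer).
% Answer = [9567, 1085, 10652].
```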

Did you notice that the above code now has no uniqueness check? By translating the code into a definite predicate, we guarantee that each logical variable holds a unique digit at unification! What is occurring here, by using DCGs to assign unique digits, is equivalent to creating a type for each logical variable, where each successive type is more restrictive than its predecessor. The code (algorithm) changed very little structurally, but what does this DCG-as-types modification buy us?

Quite a bit. The reduction in cost at runtime is astounding! This new system solves the equation using only 8,215,467 inferences in 36.91 seconds (222562 Lips). This is nearly fifty times faster than the fully nondeterministic version presented first. By iteratively restricting the search space with DCG we have simulated a type system that allows the system to find a solution much more efficiently without sacrificing the clarity of the declarative semantics of the problem description.

Summary for DCGs as Types

The above example demonstrates that when one needs to search a restricted solution space, DCGs can provide runtime benefits over pure nondeterminism without sacrificing clarity of code (as so many other "optimization" techniques do). It also demonstrates that if the search space becomes more restrictive in a predictable way over the duration of the search, DCGs can provide enormous runtime benefits.

DCG Implementation of Dynamic Programming

DCGs are effective parser generators and are commonly used as such; we've also seen that DCGs can simulate types and reap the benefits of static type systems in a dynamically typed environment. DCGs have other uses as well. One such use is to implement dynamic programming, 5 which computes solutions with an iterative bottom-up approach instead of the recursive top-down approach usual for Prolog and other languages that rely on recursion.

Dynamic programming shines where the solution must be approached incrementally from a known-good set of states; usually these are situations where a declarative top-down description of the solution leads to a combinatorial explosion of attempted solutions that can overwhelm any computational resource. An excellent example of such a problem is computing the nth Fibonacci number.

Only some Fibonacci numbers declaratively

Below is the formula to find the nth Fibonacci number:

fib n | n == 0    = 1
      | n == 1    = 1
      | otherwise = fib (n-1) + fib (n-2)

Or, as translated (naively) into Prolog:

naive_fib(0, 1).
naive_fib(1, 1).
naive_fib(X, Y) :-
  X > 1,
  Idx1 is X - 1,
  Idx2 is X - 2,
  naive_fib(Idx1, A),
  naive_fib(Idx2, B),
  Y is A + B.
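Note the base cases here define fib 0 = fib 1 = 1, so, for example:

```prolog
% ?- naive_fib(10, X).
% X = 89.
```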

This is all very well and good, but, unfortunately, the above specification is exponentially recursive ... not only does it take a long time to find a solution, but it can only find the first few dozen solutions before running out of available computational resources (memory). A query of naive_fib(20, X) takes 65,671 inferences in 4.54 seconds (14476 Lips) to find the solution, but a call for the 30th Fibonacci number causes an "out of local stack" error.

The problem with the top-down approach is that it attempts to find the solution by finding the two previous (unknown) solutions, until it locates a known answer. Unfortunately, the bottom is too far down to be reached with the resources available, and the problem does not make conversion into a tail-recursive solution practicable.

Any Fibonacci, quickly, with DCGs

DCGs solve this problem by simulating dynamic programming ... approach the solution from a set of known states, and keep building from that known state until the system finds the solution. This approach has the additional advantage that since it is iterative, it eliminates the exponential recursive explosion that the previous declarative approach suffered.

The idea is to use a list6 to memoize the result set as the solution space grows, until the system reaches the desired solution; 7 that is where DCGs assist us in maintaining a declarative description while controlling resources. Declaratively, then, if the requested Fibonacci number is at the head of the solution list, then return it:

fibonacci(X, [X|_]) --> [].

Otherwise, if we're at the end of the list and still have not found the solution, push the next two Fibonacci numbers onto the result set and keep searching:

fibonacci(X, [A, B]) -->
  next_fib(X, A, B, Fib1),
  next_fib(X, B, Fib1, Fib2),
  fibonacci(X, [Fib1, Fib2]).

Finally, if we're in the middle of the result set and have not found the solution, keep searching:

fibonacci(X, [A, B|Rest]) -->
  succ_push(X, A),
  fibonacci(X, [B|Rest]).8

The DCG housekeeping predicates are what one would expect:

% Ensures the current fib is not the one we need, generates a new
% one, and pushes the current one onto the result set.
next_fib(X, fib(Idx1, Num1), fib(Idx2, Num2), fib(Idx3, Num3)) -->
  succ_push(X, fib(Idx1, Num1)),
  { Idx3 is Idx2 + 1,
    Num3 is Num1 + Num2 }.

% add a definite branch to gt that pushes the compared-to element
% onto the end of the list (transformed into a difference list)
succ_push(A, B, Old, New) :-
  gt(A, B),
  concat([Old|T1] - T1, [B|T2] - T2, New).9
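Two predicates referenced here are not shown in the text: gt/2 (the comparison succ_push/4 relies upon) and the interface predicate quick_fib/2 used in the timing below. Hypothetical definitions consistent with the fib(Index, Value) terms and with the query given in "Solutions to Problems" might look like this:

```prolog
% Hypothetical: the target is "greater than" the current entry
% while the target index has not yet been reached.
gt(fib(Target, _), fib(Current, _)) :- Target > Current.

% Hypothetical interface: seed the result set with the first two
% Fibonacci numbers, mirroring the query in "Solutions to Problems".
quick_fib(N, X) :-
  fibonacci(fib(N, X), [fib(1, 1), fib(0, 1)], _, _).
```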

As before, we retain the declarative nature using DCGs to solve the problem, but we also obtain the benefit of much more powerful computational ability with much less runtime cost, 10 to wit: quick_fib(1450, X) finds the solution using 12,327 inferences in 0.02 seconds (615464 Lips) -- out-of-reach and impossibly fast for the top-down standard predicate.12

Conclusion

Definite Clause Grammars (DCGs) have long been known to be an effective tool for generating scanners and parsers. In this article, we explored alternative uses for DCGs, such as emulating a static type system and implementing a dynamic programming system. We have seen that, unlike other "optimization" techniques, DCGs maintain or even improve the clarity of declarative specifications while at the same time providing the benefits of these other systems. These benefits manifest as improvements of orders of magnitude in both computational power and the speed at which the runtime achieves desired solutions.


Endnotes

1 Definite programs, and Prolog programs using DCGs to model definite programs, are discussed in detail in [DM93].
2 This example is developed in [Bratko01], chapter 21, with two alternative BNF representations offered.
3

A naive version of as_number/2 would be ...

as_number([], 0).
as_number([Digit|Digits], Result) :-
  as_number(Digits, Rest),
  Result is Digit + 10 * Rest.

... but this is not tail recursive, and so may not benefit from optimization by the Prolog system. In general, this kind of transformation from a "naive" recursive system to one that uses an accumulator is known as folding, and is a fruit of research in the functional programming community.

4 We use takeout/3 instead of delete/3 because delete/3 is not reversible, eliminating the desired backtracking to explore alternate solutions. The definition, use, and a wonderful explanation of takeout/3 come from [Fisher99], § 2.7.
5 Dynamic programming is discussed in detail in [RL99], which provides an implementation of a generic dynamic programming algorithm in the Haskell programming language.
6 Actually, [RL99] uses a dictionary, not a list, as the data structure to memoize the known state, but the difference list approach was so effective that it was unnecessary to use a different data structure.
7 As each Fibonacci number is computed from the previous two Fibonacci numbers, I found it most convenient to use (specifically) a difference list as the data structure to memoize the results.
8 We have two iteration branches for this predicate. The first adds two more Fibonacci numbers when we've come to the final two Fibonacci numbers in the result set; the second continues iteration because there are more than two remaining Fibonacci numbers. As we need two Fibonacci numbers to compute the next Fibonacci number, these two branches guarantee that we perform that computation when we reach the last two numbers, and not before. This approach forms the basis of this dynamic programming system.
9 The implementation of concat/3 is straight from [Bratko01], § 8.5.3. (concat(A1-Z1, Z1-Z2, A1-Z2)).
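A small top-level example may make concat/3 on difference lists concrete: the append happens in constant time by unifying the first list's tail variable with the head of the second:

```prolog
concat(A1 - Z1, Z1 - Z2, A1 - Z2).

% ?- concat([1, 2|T1] - T1, [3|T2] - T2, Result).
% Result = [1, 2, 3|T2] - T2 (with T1 bound to [3|T2]).
```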
10 Not only that, but this solution also provides the history of all the Fibonacci numbers prior to the requested one in a difference list by calling fibonacci/4 directly;11 the traditional top-down solution provides no such history.
11

The resulting difference list can be converted into human-readable form with the following predicate:

simple_list([], List, List).
simple_list([H, Datum|Diff] - Diff, L0, List) :-
  simple_list(H, [Datum|L0], List).

12 An implementation of the Fibonacci computation using monads in Haskell is available at logicaltypes.blogspot.com

Works Consulted

[Bratko01] Prolog Programming for Artificial Intelligence, 3rd ed. Ivan Bratko, Pearson Education Limited, Essex, England, 2001.
[DM93] A Grammatical View of Logic Programming, Pierre Deransart and Jan Małuszyński, MIT Press, Cambridge, Massachusetts, 1993.
[Fisher99] prolog :- tutorial, J. R. Fisher, http://www.csupomona.edu/~jrfisher/www/prolog_tutorial/contents.html, 1999.
[RL99] Algorithms: A Functional Programming Approach, Fethi Rabhi and Guy Lapalme, Addison-Wesley, Essex, England, 1999.

Definitions

backtrack(ing): exploring alternate solutions by attempting new values before the point of failure.
BNF: Backus-Naur Form is a system to describe grammars of (programming) languages.
DCG: Definite Clause Grammar(s)
difference list: Difference lists are lists that allow direct access to elements within the list; they avoid the need to "cdr down" the list (a Lisp term meaning to walk the list from its first element to the desired element) to reach the desired elements.
Lips: Logical Inferences Per Second
memoize: To memoize a result is to keep that result locally active so that it is not necessary to recompute it.

Solutions to Problems

The dots problem has the following (only) solution:

     645 
   x 721
   -----
     645
   12900
+ 451500
--------
  465045

The solution to SEND + MORE = MONEY is 9567 + 1085 = 10652.

The 20th Fibonacci number is 10946 (computed from fibonacci(fib(20, X), [fib(1,1), fib(0,1)], A, B)).

The 1450th Fibonacci number is 7.79078e+302; the 1500th Fibonacci number causes a floating-point overflow error.


Last modified: Mon Aug 09 16:15:25 Eastern Daylight Time 2004
Copyright © 2003, 2004, Cotillion Group, Inc. All rights reserved.