Keywords and phrases: Computational model, computation, program, programming language, syntax, semantics, pragmatics, bound, free, scope, environment, block.
Suppose that we have the values 3.14 and 5, the operation of multiplication (×) and we perform the computation specified by the following arithmetic expression
the result of which is the value:
If 3.14 is an approximation for pi, we can replace 3.14 with pi abstracting the expression to:
We say that pi is bound to 3.14 and is a constant. The where introduces a local environment or block for local definitions. The scope of the definitions is just the expression. If 5 is intended to be the value of a radius, then the expression can be generalized by introducing a variable for the radius:
Of course the value of the expression is the circumference of a circle so we may further abstract by assigning a name to the expression:
This last equation binds the name Circumference to the expression 2 × pi × radius where pi=3.14. The variable radius is said to be free in the right hand side of the equation. It is a variable since its value is not determined. pi is not a variable, it is a constant, the name of a particular value. Any context (scope), in which this equation and the variable radius appears and radius is assigned to a value, determines a value for Circumference. A further generalization is possible by parameterizing Circumference with the variable radius.
The variable radius appearing in the right hand side is no longer free. It is bound to the parameter radius. Circumference has a value (other than the right hand side) only when the parameter is replaced with an expression. For example, in
The parameter radius is bound to the value 5 and, as a result, Circumference(5) is bound to 3.14. In this form, the definition is a recipe or program for computing the circumference of a circle from the radius of the circle. The mathematical notation (syntax) provides the programming language and arithmetic provides the computational model for the computation. The mapping from the syntax to the computational model provides the meaning (semantics) for the program. The notation employed in this example is based on the very pragmatic considerations of ease of use and understanding. It is so similar to the usual mathematical notation that most people have difficulty in distinguishing between the syntax and the computational model.
This example serves to illustrate several key ideas in the study of programming languages which are summarized in definition 1.1.
- A computational model is a collection of values and operations.
- A computation is the application of a sequence of operations to a value to yield another value.
- A program is a specification of a computation.
- A programming language is a notation for writing programs.
- The syntax of a programming language refers to the structure or form of programs.
- The semantics of a programming language describe the relationship between a program and the model of computation.
- The pragmatics of a programming language describe the degree of success with which a programming language meets its goals both in its faithfulness to the underlying model of computation and in its utility for human programmers.
Another view of a program is that it models a problem domain and the execution of the program is a simulation of the problem domain.
In any case, data objects are central to programs.
The values can be separated into two groups, primitive and compound. The primitive values are usually numbers, boolean values, and characters. The composite values are usually arrays, records, and recursively defined values. Strings may occur as either primitive or composite values. Lists, stacks, trees, and queues are examples of recursively defined values.
Associated with the primitive values are the usual operations (e.g., arithmetic operations for the numbers). Associated with each composite value are operations to construct the values of that type and operations to access component elements of the type. A collection of values that share a common set of operations is called a data type.
The primitive types are implemented using the underlying hardware and, sometimes, special purpose software. So that only appropriate operations are applied to values, the value's type must be known. In assembly language programs it is up to the programmer to keep track of a datum's type. Type information is contained in a descriptor.
Descriptor | Value |
Boolean values are implemented using a single bit of storage. Since single bits are not usually addressable, the implementation is extended to be a single addressable unit of memory. In this case either a single bit within the addressable unit is used for the value or a zero value in the storage unit designates false while any non-zero value designates true. Operation on bits and boolean values are included in processor instruction sets.
Integer values are most often implemented using a hardware defined integer storage representation, often 32-bits or four bytes with one bit for the sign.
sign
bit |
7-bits | byte | byte | byte |
Natural number values are most often implemented using the hardware defined storage unit. The advantage of providing an natural number type is that an additional bit of storage is available thus providing larger positive values than are provided for integer values.
Rational number values may be implemented as pairs of integers. Rationals are provided when it is desired to avoid the problems of round off and truncation which occurs when floating point numbers are used to represent rational numbers.
Real number values are most often implemented using a hardware
defined floating point representation. One such representation consists
of 32-bits or four bytes where the first bit is the sign, the next seven
bits the exponent and the remaining three bytes the mantissa.
sign
bit |
exponent | byte | byte | byte |
Character values are almost always supported by the underlying
hardware and operating system, usually one byte per character.
Characters are encoded using the 8-bit ASCII or EBCDIC encoding scheme
or the emerging 16-bit Unicode encoding scheme.
Enumeration values are usually represented by a subsequence of the integers and as such inherit an appropriate subset of the integer operations.
Where strings are treated as a primitive type, they are usually of fixed length and their operations are implemented in hardware.
Compound (or structured) data types include arrays, records, and files.
Abstract data types are best implemented with pointers. The user program holds a pointer to a value of the abstract type. This use of pointers is quite safe since the pointer manipulation is restricted to the implementation module and the pointer is notationally hidden.
The initial example of the computation of the circumference of a circle is an example of functional programming. A more interesting example is a program to compute the standard deviation of a list of scores. The formula for standard deviation is:
where xi is an individual score and N is the number of scores. The formula requires computing both the sum of the scores and the sum of the squares of the scores. The higher-order function map applies a function to each element of a list and the higher-order function fold reduces a list by applying a function to the first element of a list and the result of folding the rest of the list. Figure 1 illustrates what an implementation in a functional programming language might look like.
Figure 1.1: Standard deviation using higher-order functions sd(xs) = sqrt(v) where n = length( xs ) v = fold( plus, map(sqr, xs ))/n - sqr( fold(plus, xs)/n)
The functional model is important because it has been under development for hundreds of years and its notation and methods form the base upon which a large portion of our problem solving methodologies rest. The prime concern in functional programming is defining functional relationships.
values
functions function definition function application function composition |
Program = set of function definitions
Computation = function application |
The function is represented as a relation between R and C.
A better illustration logic programming is a program to determine the mortality of Socrates and Penelope. We begin with the fact that Socrates and Penelope are human and the rule that all humans are mortal or equivalently for all X, if X is human then X is mortal. An equivalent form of the fact and rule are:
human(Socrates)To determine the mortality of Socrates or Penelope we make the assumption that there are no mortals i. e.
human(Penelope)
mortal(X) if human(X)
¬mortal(Y)Figure 2 contains a computation (proof) that Socrates and Penelope are mortal.
1a. human(Socrates) | Fact |
1b. human(Penelope) | Fact |
2. mortal(X) if human(X) | Rule |
3. ¬mortal(Y) | Assumption |
4a. X = Y | from 2 & 3 by unification |
4b. ¬human(Y) | and modus tollens |
5a. Y = Socrates | from 1 and 4 by unification |
5b. Y = Penelope | |
6. Contradiction | 5a, 4b, and 1a; 5b, 4b and 1b |
The first step in the computation is the deduction of line 4 from lines 2 and 3. It is justified by the inference rule modus tollens which states that if the conclusion of a rule is known to be false, then so is the hypothesis. The variables X and Y may be unified since they may have any value. By unification, Lines 5a, 4b, and 1a; 5b, 4b and 1b produce contradictions and identify both Socrates and Penelope as mortal.
Resolution is the an inference rule which looks for a contradiction
and it is facilitated by unification which determines if there is
a substitution which makes two terms the same. The logic model is important
because it is a formalization of the reasoning process. It is related to
relational data bases and expert systems. The prime concern in logic programming
is defining relationships.
values
relations logical inference |
Program = set of relation definitions
Computation = constructive proof (inference from definitions) |
Figure 1.5: State Sequence
S0 -O0-> S1 - ... -> Sn-1 -On-1-> Sn
The computation requires the implementation to determine the value of Radius and pi in the state and then change the state by pairing Circumference with a new value.constant pi = 3.14input (Radius) Circumference := 2 * pi * Radius Output (Circumference)
It is easier to keep track of the state when state information is included with the code.
where _|_ designates an undefined value.constant pi = 3.14 Radius _|_, Circumference = _|_, pi=3.14 input (Radius) Radius x, Circumference = _|_, pi=3.14 Circumference := 2 * pi * Radius Radius x, Circumference = 2 × x × pi, pi=3.14 Output (Circumference) Radius x, Circumference = 2 × x × pi, pi=3.14
The imperative model is often called the procedural model because groups of operations are abstracted into procedures.
The imperative-procedural model is important because it models change and changes are an integral part of our environment. It is the model of computation that is closest to the hardware on which programs are executed. Its closeness to hardware makes it the easiest to implement and imperative programs tend to make the least demands for system resources (time and space). The prime concern in imperative programming is defining a sequence of state changes.
memory cells
values commands |
Program = sequence of commands
Computation = sequence of state changes |
The method of computation provided in a programming language is dependent on the model of computation implemented by the programming language. Most programming languages utilize more than one model of computation but one model usually predominates. Lisp, Scheme, and ML are based on the functional model of computation but provide some imperative constructs while, Miranda and Haskell provide a nearly pure implementation of the functional model of computation. Prolog provides a partial implementation of the logic computational model but, for reasons of efficiency and practicality, fails in several areas and contains imperative constructs. The language Gödel is much closer to the ideal. The imperative model requires some functional and logical elements and languages such as Pascal, C/C++, Ada and Java emphasize assignments, methods of defining various computation sequences and provide minimal implementations of the functional and logic model of computation.
The structure of a programming language should be well defined, and the outcome of a particular section of code easily predicted.The notation used in the functional and logic models reflects common mathematical practice and exhibits the notational simplicity and regularity found in that discipline. Because the notation used for the imperative model must be able to specify both a variety of state sequences and expressions, it tends to be irregular and of greater complexity than the notation for functional and logic models. Because of this complexity and the wide spread use of imperative programming languages, the bulk of the work done in the area of programming language semantics deals with imperative programming languages.
For a language to have wide applicability it must make provision for abstraction, generalization and modularity. Abstraction (associating a name with an object and using the name to whenever the object is required) permits the suppression of detail and provides constructs which permit the extension of a programming language. These extensions are necessary to reduce the complexity of programs. Generalization (replacing a constant with a variable) permits the application of constructs to more objects and possibly to other classes of objects. Modularity is a partitioning of a program into sections usually for separate compilation and into libraries of reusable code. Abstraction, generalization and modularity ease the burden on a programmer by permitting the programmer to introduce levels of detail and logical partitioning of a program. The implementation of the programming language should be faithful to the underlying computational model and be an efficient implementation.
Concurrent programming involves the notations for expressing potential parallel execution of portions of a program and the techniques for solving the resulting synchronization and communication problems. The concurrent programming may be implemented within any of the computational models. Concurrency within the functional and logic model is particularly attractive since, subexpression evaluation and inferences may be performed concurrently and requires no additional syntax. Concurrency in the imperative model requires additional syntactic elements.
Object-oriented programming OOP involves the notations for structuring a program into a collection of objects which compute by exchanging messages. Each object is bound up with a value and a set of operations which determine the messages to which it can respond. The objects are organized hierarchically and inherit operations from objects higher up in the hierarchy. Object-oriented programming may be implemented within any of the other computational models.
Programs are written and read by humans but are executed by computers. Since both humans and computers must be able to understand programs, it is necessary to understand the requirements of both classes of users.
The native programming languages of computers bear little resemblance to natural languages. Machine languages are unstructured and contain few, if any, constructs resembling the level at which humans think. The instructions typically include arithmetic and logical operations, memory modification instructions and branching instructions. For example, the circumference computation might be written in assembly language as:
Because the imperative model is closer to actual hardware, imperative programs have tended to be more efficient in their use of time and space than equivalent functional and logic programs.Load Radius R1 Mult R1 2 R1 Load Pi R2 Mult R1 R2 R1 Store R1 Circumference
Natural languages are not suitable for programming languages because humans themselves do not use natural languages when they construct precise formulations of concepts and principles of particular knowledge domains. Instead, they use a mix of natural language, formalized symbolic notations of mathematics and logic and diagrams. The most successful of these symbolic notations contain a few basic objects which may be combined through a few simple rules to produce objects of arbitrary levels of complexity. In these systems, humans reduce complexity by the use of definitions, abstractions, generalizations and analogies. Successful programming languages do the same by catering to the natural problem solving approaches used by humans. Ideally, programming languages should approach the level at which humans reason and should reflect the notational approaches that humans use in problem solving and must include ways of structuring programs to ease the tasks of program understanding, debugging and maintenance.
Research in computer architecture is producing Research in programming language implementation is producing more efficient implementations of programming languages for all models of computation.
Research in software engineering is producing a better understanding of the program structuring techniques that lead to programs that are easier to write, read (understand), and maintain.
All general purpose programming languages adhere to the following programming language design principle.
In the following pages we will study programming languages as the realization of computational models, semantics as the relationship between computational models and syntax, and associated pragmatic concerns.
fac(N) = if N = 0 then 1 else N*fac(N-1)
N := 4; F := 1; While N > 0 do F := N*F; N := N-1; end;
complete the following computation (proof) and determine the result of concatenating the two lists.list([ ]) -- the empty list list([X|L]) if list(L) -- first element is X the rest of the list is L [X0,...Xn] is an abbreviation for [X0|[...[Xn|[ ]]...]
1. concat([ ],L,L) | Fact |
2. concat([X|L0],L1,[X|L2]) if concat(L0,L1,L2) | Rule |
3. ¬concat([0,1],[a,b],L) | Assumption |