Introduction

A complete description of a programming language includes the computational model, the syntax and semantics of programs, and the pragmatic considerations that shape the language.

Keywords and phrases: Computational model, computation, program, programming language, syntax, semantics, pragmatics, bound, free, scope, environment, block.


Suppose that we have the values 3.14 and 5, the operation of multiplication (×) and we perform the computation specified by the following arithmetic expression

2 × 3.14 × 5

the result of which is the value:

31.4

If 3.14 is an approximation for pi, we can replace 3.14 with pi abstracting the expression to:

2 × pi × 5 where pi = 3.14

We say that pi is bound to 3.14 and is a constant. The where introduces a local environment or block for local definitions. The scope of the definitions is just the expression. If 5 is intended to be the value of a radius, then the expression can be generalized by introducing a variable for the radius:

2 × pi × radius where pi = 3.14

Of course the value of the expression is the circumference of a circle so we may further abstract by assigning a name to the expression:

Circumference = 2 × pi × radius where pi = 3.14

This last equation binds the name Circumference to the expression 2 × pi × radius where pi=3.14. The variable radius is said to be free in the right hand side of the equation. It is a variable since its value is not determined. pi is not a variable, it is a constant, the name of a particular value. Any context (scope), in which this equation and the variable radius appears and radius is assigned to a value, determines a value for Circumference. A further generalization is possible by parameterizing Circumference with the variable radius.

Circumference(radius) = 2 × pi × radius where pi = 3.14

The variable radius appearing in the right hand side is no longer free. It is bound to the parameter radius. Circumference has a value (other than the right hand side) only when the parameter is replaced with an expression. For example, in

Circumference(5) = 3.14

The parameter radius is bound to the value 5 and, as a result, Circumference(5) is bound to 3.14. In this form, the definition is a recipe or program for computing the circumference of a circle from the radius of the circle. The mathematical notation (syntax) provides the programming language and arithmetic provides the computational model for the computation. The mapping from the syntax to the computational model provides the meaning (semantics) for the program. The notation employed in this example is based on the very pragmatic considerations of ease of use and understanding. It is so similar to the usual mathematical notation that most people have difficulty in distinguishing between the syntax and the computational model.

This example serves to illustrate several key ideas in the study of programming languages which are summarized in definition 1.1.


Definition 1.1

  1. A computational model is a collection of values and operations.
  2. A computation is the application of a sequence of operations to a value to yield another value.
  3. A program is a specification of a computation.
  4. A programming language is a notation for writing programs.
  5. The syntax of a programming language refers to the structure or form of programs.
  6. The semantics of a programming language describe the relationship between a program and the model of computation.
  7. The pragmatics of a programming language describe the degree of success with which a programming language meets its goals both in its faithfulness to the underlying model of computation and in its utility for human programmers.

Data

A program can be viewed as a function, the output data values are a function of the input data values.
Output = Program(Input)

Another view of a program is that it models a problem domain and the execution of the program is a simulation of the problem domain.

Program = Model of a problem domain
Execution of a program = simulation of the problem domain

In any case, data objects are central to programs.

The values can be separated into two groups, primitive and compound. The primitive values are usually numbers, boolean values, and characters. The composite values are usually arrays, records, and recursively defined values. Strings may occur as either primitive or composite values. Lists, stacks, trees, and queues are examples of recursively defined values.

Associated with the primitive values are the usual operations (e.g., arithmetic operations for the numbers). Associated with each composite value are operations to construct the values of that type and operations to access component elements of the type.  A collection of values that share a common set of operations is called a data type.

The primitive types are implemented using the underlying hardware and, sometimes, special purpose software. So that only appropriate operations are applied to values, the value's type must be known. In assembly language programs it is up to the programmer to keep track of a datum's type. Type information is contained in a descriptor.

 
Descriptor Value
When the type of a value is known at compile time the type descriptor is a part of the compiler's symbol table and the descriptor is not needed at run-time and therefore, the descriptor is discarded after compilation. When the type of a value is not known until run-time, the type descriptor must be associated with the value to permit type checking.

Boolean values are implemented using a single bit of storage. Since single bits are not usually addressable, the implementation is extended to be a single addressable unit of memory. In this case either a single bit within the addressable unit is used for the value or a zero value in the storage unit designates false while any non-zero value designates true.  Operation on bits and boolean values are included in processor instruction sets.

Integer values are most often implemented using a hardware defined integer storage representation, often 32-bits or four bytes with one bit for the sign.

sign
bit
7-bits byte byte byte
The integer arithmetic and relational operations are implemented using the set of hardware operations. The storage unit is divided into a sign and a binary number. Since the integers form an infinite set, only a subrange of integers is provided. Some languages (for example Lisp and Scheme) provide for a greatly extended range by implementing integers in lists and providing the integer operations in software. This provides for ``infinite'' precision arithmetic.

Natural number values are most often implemented using the hardware defined storage unit. The advantage of providing an natural number type is that an additional bit of storage is available thus providing larger positive values than are provided for integer values.

Rational number values may be implemented as pairs of integers. Rationals are provided when it is desired to avoid the problems of round off and truncation which occurs when floating point numbers are used to represent rational numbers.

Real number values are most often implemented using a hardware defined floating point representation. One such representation consists of 32-bits or four bytes where the first bit is the sign, the next seven bits the exponent and the remaining three bytes the mantissa.
 

sign
bit
exponent byte byte byte
The floating point arithmetic and relational operations are implemented using the set of hardware operations. Some floating point operations such as exponentiation are provided in software. The storage unit is divided into a mantissa and an exponent. Sometimes more than one storage unit is used to provide greater precision.

Character values are almost always supported by the underlying hardware and operating system, usually one byte per character.
Characters are encoded using the 8-bit ASCII or EBCDIC encoding scheme or the emerging 16-bit Unicode encoding scheme.

Enumeration values are usually represented by a subsequence of the integers and as such inherit an appropriate subset of the integer operations.

Where strings are treated as a primitive type, they are usually of fixed length and their operations are implemented in hardware.

Compound (or structured) data types include arrays, records, and files.

Abstract data types are best implemented with pointers. The user program holds a pointer to a value of the abstract type. This use of pointers is quite safe since the pointer manipulation is restricted to the implementation module and the pointer is notationally hidden.

Models of Computation

There are three basic computational models -- functional, logic, and imperative. In addition to the set of values and associated operations, each of these computational models has a set of operations which are used to define computation. The functional model uses function application, the logic model uses logical inference and the imperative model uses sequences of state changes.

The Functional Model

The functional model of computation consists of a set of values, functions, and the operations of function application and function composition. In addition to the usual values, functions can take other functions as arguments and return functions as results (higher-order functions). A program is a collection of definitions of functions and a computation is function application (evaluation of an expression).

The initial example of the computation of the circumference of a circle is an example of functional programming. A more interesting example is a program to compute the standard deviation of a list of scores. The formula for standard deviation is:

sigma = sqrt( (Sigmai=1N xi2)/N - [(Sigmai=1N xi)/N]2 )

where xi is an individual score and N is the number of scores. The formula requires computing both the sum of the scores and the sum of the squares of the scores. The higher-order function map applies a function to each element of a list and the higher-order function fold reduces a list by applying a function to the first element of a list and the result of folding the rest of the list. Figure 1 illustrates what an implementation in a functional programming language might look like.


Figure 1.1: Standard deviation using higher-order functions
sd(xs) = sqrt(v) 
         where 
            n = length( xs )
            v = fold( plus, map(sqr, xs ))/n 
                - sqr( fold(plus, xs)/n)

The functional model is important because it has been under development for hundreds of years and its notation and methods form the base upon which a large portion of our problem solving methodologies rest. The prime concern in functional programming is defining functional relationships.



 
Figure 1.2: Functional Programming
values 
functions 
function definition 
function application 
function composition
Program = set of function definitions
Computation = function application

The Logic Model

The logic model of computation consists of a set of values, definitions of relations and logical inference. Programs consist of definitions of relations and a computation is a proof (a sequence of inferences). For example the earlier circumference computation can be represented as:
circle(R, C) if Pi = 3.14 and C = 2 * pi * R.

The function is represented as a relation between R and C.

A better illustration logic programming is a program to determine the mortality of Socrates and Penelope. We begin with the fact that Socrates and Penelope are human and the rule that all humans are mortal or equivalently for all X, if X is human then X is mortal. An equivalent form of the fact and rule are:

human(Socrates)
human(Penelope)
mortal(X) if human(X)
To determine the mortality of Socrates or Penelope we make the assumption that there are no mortals i. e.
¬mortal(Y)
Figure 2 contains a computation (proof) that Socrates and Penelope are mortal.



 
Figure 1.3: Socrates is mortal
1a. human(Socrates) Fact 
1b. human(Penelope) Fact 
2. mortal(X) if human(X) Rule 
3. ¬mortal(Y) Assumption 
4a. X = Y  from 2 & 3 by unification 
4b. ¬human(Y) and modus tollens 
5a. Y = Socrates from 1 and 4 by unification 
5b. Y = Penelope
6. Contradiction  5a, 4b, and 1a; 5b, 4b and 1b 

The first step in the computation is the deduction of line 4 from lines 2 and 3. It is justified by the inference rule modus tollens which states that if the conclusion of a rule is known to be false, then so is the hypothesis. The variables X and Y may be unified since they may have any value. By unification, Lines 5a, 4b, and 1a; 5b, 4b and 1b produce contradictions and identify both Socrates and Penelope as mortal.

Resolution is the an inference rule which looks for a contradiction and it is facilitated by unification which determines if there is a substitution which makes two terms the same. The logic model is important because it is a formalization of the reasoning process. It is related to relational data bases and expert systems. The prime concern in logic programming is defining relationships.


Figure 1.4: Logic  Programming
values 
relations 
logical inference
Program = set of relation definitions
Computation = constructive proof (inference from definitions)
 

Inferences

program = set of axioms -- the formalization of knowledge computation = constructive proof of a goal statement from the program

The Imperative Model

The imperative model of computation consists of a set of values including a state and the operation of assignment to modify the state. State is the set of name-value pairs of the constants and variables. Programs consist of sequences of assignments and a computation is a sequence of states. Each step in the computation is the result of an assignment operation (See Figure 3).


Figure 1.5: State Sequence
S0 -O0-> S1 - ... -> Sn-1 -On-1-> Sn




For example, an imperative implementation of the earlier circumference computation might be written as:
constant pi = 3.14
input (Radius) Circumference := 2 * pi * Radius Output (Circumference)
The computation requires the implementation to determine the value of Radius and pi in the state and then change the state by pairing Circumference with a new value.

It is easier to keep track of the state when state information is included with the code.

constant pi = 3.14

Radius _|_, Circumference = _|_, pi=3.14
input (Radius)
Radius x, Circumference = _|_, pi=3.14
Circumference := 2 * pi * Radius
Radius x, Circumference = 2 × x × pi, pi=3.14
Output (Circumference)
Radius x, Circumference = 2 × x × pi, pi=3.14
where _|_ designates an undefined value.

The imperative model is often called the procedural model because groups of operations are abstracted into procedures.

The imperative-procedural model is important because it models change and changes are an integral part of our environment. It is the model of computation that is closest to the hardware on which programs are executed. Its closeness to hardware makes it the easiest to implement and imperative programs tend to make the least demands for system resources (time and space). The prime concern in imperative programming is defining a sequence of state changes.


Figure 1.6: Imperative  Programming
memory cells 
values 
commands
Program = sequence of commands
Computation = sequence of state changes

Computability

The functional, logic and imperative models of computation are equivalent in the sense that any problem that has a solution in one model is solvable (in principle) each of the other models. Other models of computation have been proposed. The other models have been shown to be equivalent to these three models. These are said to be universal models of computation.

The method of computation provided in a programming language is dependent on the model of computation implemented by the programming language. Most programming languages utilize more than one model of computation but one model usually predominates. Lisp, Scheme, and ML are based on the functional model of computation but provide some imperative constructs while, Miranda and Haskell provide a nearly pure implementation of the functional model of computation. Prolog provides a partial implementation of the logic computational model but, for reasons of efficiency and practicality, fails in several areas and contains imperative constructs. The language Gödel is much closer to the ideal. The imperative model requires some functional and logical elements and languages such as Pascal, C/C++, Ada and Java emphasize assignments, methods of defining various computation sequences and provide minimal implementations of the functional and logic model of computation.

Syntax and Semantics

Syntax describes the structure of programs and semantics defines the relationship between the syntax and the computational model. To simplify the task of reasoning about programs, the syntax of a programming language should be closely related to the computational model. The key principle is the principle of clarity.

Principle of Clarity

The structure of a programming language should be well defined, and the outcome of a particular section of code easily predicted.
The notation used in the functional and logic models reflects common mathematical practice and exhibits the notational simplicity and regularity found in that discipline. Because the notation used for the imperative model must be able to specify both a variety of state sequences and expressions, it tends to be irregular and of greater complexity than the notation for functional and logic models. Because of this complexity and the wide spread use of imperative programming languages, the bulk of the work done in the area of programming language semantics deals with imperative programming languages.

Pragmatics

Pragmatics is concerned about the usability of the language, the application areas, ease of implementation and use, and the language's success in fulfilling its design goals. The forces that shape a programming language include computer architecture, software engineering practices (especially the software life cycle), computational models, and the application domain (e.g. user interfaces, systems programming, and expert systems).

For a language to have wide applicability it must make provision for abstraction, generalization and modularity. Abstraction (associating a name with an object and using the name to whenever the object is required) permits the suppression of detail and provides constructs which permit the extension of a programming language. These extensions are necessary to reduce the complexity of programs. Generalization (replacing a constant with a variable) permits the application of constructs to more objects and possibly to other classes of objects. Modularity is a partitioning of a program into sections usually for separate compilation and into libraries of reusable code. Abstraction, generalization and modularity ease the burden on a programmer by permitting the programmer to introduce levels of detail and logical partitioning of a program. The implementation of the programming language should be faithful to the underlying computational model and be an efficient implementation.

Concurrent programming involves the notations for expressing potential parallel execution of portions of a program and the techniques for solving the resulting synchronization and communication problems. The concurrent programming may be implemented within any of the computational models. Concurrency within the functional and logic model is particularly attractive since, subexpression evaluation and inferences may be performed concurrently and requires no additional syntax. Concurrency in the imperative model requires additional syntactic elements.

Object-oriented programming OOP involves the notations for structuring a program into a collection of objects which compute by exchanging messages. Each object is bound up with a value and a set of operations which determine the messages to which it can respond. The objects are organized hierarchically and inherit operations from objects higher up in the hierarchy. Object-oriented programming may be implemented within any of the other computational models.

Programs are written and read by humans but are executed by computers. Since both humans and computers must be able to understand programs, it is necessary to understand the requirements of both classes of users.

The native programming languages of computers bear little resemblance to natural languages. Machine languages are unstructured and contain few, if any, constructs resembling the level at which humans think. The instructions typically include arithmetic and logical operations, memory modification instructions and branching instructions. For example, the circumference computation might be written in assembly language as:

Load Radius R1
Mult R1 2 R1
Load Pi R2
Mult R1 R2 R1
Store R1 Circumference
Because the imperative model is closer to actual hardware, imperative programs have tended to be more efficient in their use of time and space than equivalent functional and logic programs.

Natural languages are not suitable for programming languages because humans themselves do not use natural languages when they construct precise formulations of concepts and principles of particular knowledge domains. Instead, they use a mix of natural language, formalized symbolic notations of mathematics and logic and diagrams. The most successful of these symbolic notations contain a few basic objects which may be combined through a few simple rules to produce objects of arbitrary levels of complexity. In these systems, humans reduce complexity by the use of definitions, abstractions, generalizations and analogies. Successful programming languages do the same by catering to the natural problem solving approaches used by humans. Ideally, programming languages should approach the level at which humans reason and should reflect the notational approaches that humans use in problem solving and must include ways of structuring programs to ease the tasks of program understanding, debugging and maintenance.

Language Design Principles

Programming languages are largely determined by the importance the language designers attach to the the computational model, the intended application domain readability, write-ability and efficient execution. Some languages are largely determined by the necessity for efficient implementation and execution. Others are designed to be faithful to a computational model.

Research in computer architecture is producing Research in programming language implementation is producing more efficient implementations of programming languages for all models of computation.

Research in software engineering is producing a better understanding of the program structuring techniques that lead to programs that are easier to write, read (understand), and maintain.

All general purpose programming languages adhere to the following programming language design principle.

Principle of Computational Completeness
The computational model for a general purpose programming language must be universal.
Principle of Implementation
The implementation should be efficient in its use of space and time.
Principle of Programming The program should be written in a language that reflects the problem domain.
The line of reasoning developed above may be summarized in the following principle.
Principle of Programming Language Design
A programming language must be designed to facilitate readability and writ-ability for its human users and efficient execution on the available hardware.
Readability and write-ability are facilitated by the following principles.
Principle of Simplicity
The language should be based upon as few
Principle of Orthogonality
Independent functions should be controlled by independent mechanisms.
Principle of Regularity
A set of objects is said to be regular with respect to some condition if, and only if, the condition is applicable to each element of the set.
Principle of Extensibility
New objects of each syntactic class may be constructed (defined) from the basic and defined constructs in a systematic way.
The principle of regularity and and extensibility require that the basic concepts of the language should be applied consistently and universally.

In the following pages we will study programming languages as the realization of computational models, semantics as the relationship between computational models and syntax, and associated pragmatic concerns.

Historical Perspectives and Further Reading

For a programming languages text which presents programming languages from the virtual machine point of view see Pratt, from the point of view of denotational semantics see Tennent, and from a programming methodology point of view see Hehner.

Exercises

  1. Identify the applicable scope rules in Figure 2.
  2. Construct a trace of the execution of the following program (i.e. complete the following proof).
    1. parentOf(john, mary).
    2. parentOf(kay, john).
    3. parentOf(bill, kay).
    4. ancestorOf(X,Y) if parentOf(X,Y).
    5. ancestorOf(X,Z) if parentOf(X,Y) and ancestorOf(Y,Z).
    6. not ancestorOf(bill,mary).
  3. Construct a trace of the execution of fac(4) given the function definition
  4. fac(N) = if N = 0 then 1
             else N*fac(N-1)
  5. Construct a trace of the execution of the following program
  6. N := 4;
    F := 1;
    While N > 0 do
       F := N*F;
       N := N-1;
    end;
  7. Using the following definition of a list,
  8. list([ ]) -- the empty list
    list([X|L]) if list(L) -- first element is X the rest of
                              the list is L
    [X0,...Xn] is an abbreviation for [X0|[...[Xn|[ ]]...]
    complete the following computation (proof) and determine the result of concatenating the two lists.
     
    1. concat([ ],L,L) Fact 
    2. concat([X|L0],L1,[X|L2]) if concat(L0,L1,L2) Rule 
    3. ¬concat([0,1],[a,b],L) Assumption 
  9. Classify the following languages in terms of a computational model: Ada, APL, BASIC, C, COBOL, FORTRAN, Haskell, Icon, LISP, Pascal, Prolog, SNOBOL.
  10. For the following applications, determine an appropriate computational model which might serve to provide a solution: automated teller machine, flight-control system, a legal advice service, nuclear power station monitoring system, and an industrial robot.
  11. Compare the syntactical form of the if-command/expression as found in Ada, APL, BASIC, C, COBOL, FORTRAN, Haskell, Icon, LISP, Pascal, Prolog, SNOBOL.
  12. An extensible language is a language which can be extended after language design time. Compare the extensibility features of C or Pascal with those of LISP or Scheme.
  13. What programming language constructs of C are dependent on the local environment?
  14. What languages provide for binding of type to a variable at run-time?
  15. Discuss the advantages and disadvantages of early and late binding for the following language features. The type of a variable, the size of an array, the forms of expressions and commands.
  16. Compare two programming languages from the same computational paradigm with respect to the programming language design principles.
  17. Construct a program in your favorite language to do one of the following:
    1. Perform numerical integration where the function is passed as a parameter.
    2. Perform sorting where the the less-than function is passed as a parameter.

by Anthony A. Aaby. Send comments to aabyan@wwc.edu