Abstraction and Generalization

Abstraction is an emphasis on the idea, qualities and properties rather than the particulars (a suppression of detail).

Generalization is a broadening of application to encompass a larger domain of objects of the same or different type.

A parameter is a quantity whose value varies with the circumstances of its application.

Substitution---To put something in the place of another.

Encapsulate---To completely enclose.

Keywords and phrases: Name, binding, abstract, definition, declaration, variables, parameters, arguments, formals, actuals. Modularity, encapsulation, function, procedure, abstract type, generic, library, object, class, inheritance, partition, package, unit, separate compilation, linking, import, export, instance, scope. block, garbage collection, static and dynamic links, display, static and dynamic binding, activation record, environment, Static and Dynamic Scope, aliasing, variables, value, result, value-result, reference, name, unification, eager evaluation, lazy evaluation, strict, non-strict, Church-Rosser, overloading, polymorphism, monomorphism, coercion, transfer functions.


The ability to abstract and to generalize is an essential part of any intellectual activity. Abstraction and generalization are fundamental to mathematics and philosophy and are essential in computer science as well.

The importance of abstraction is derived from its ability to hide irrelevant details and from the use of names to reference objects. Programming languages provide abstraction through procedures, functions, and modules which permit the programmer to distinguish between what a program does and how it is implemented. The primary concern of the user of a program is with what it does. This is in contrast with the writer of the program whose primary concern is with how it is implemented. Abstraction is essential in the construction of programs. It places the emphasis on what an object is or does rather than how it is represented or how it works. Thus, it is the primary means of managing complexity in large programs.

Of no less importance is generalization. While abstraction reduces complexity by hiding irrelevant detail, generalization reduces complexity by replacing multiple entities which perform similar functions with a single construct. Programming languages provide generalization through variables, parameterization, generics and polymorphism. Generalization is essential in the construction of programs. It places the emphasis on the similarities between objects. Thus, it helps to manage complexity by collecting individuals into groups and providing a representative which can be used to specify any individual of the group.

Abstraction and generalization are often used together. Abstracts are generalized through parameterization to provide greater utility. In parameterization, one or more parts of an entity are replaced with a name which is new to the entity. The name is used as a parameter. When the parameterized abstract is invoked, it is invoked with a binding of the parameter to an argument. Figure N.1 summarizes the notation which will be used for abstraction and generalization.



 
Figure N.1: Abstraction and Generalization
Abstraction  name : abstract
Invocation  name
Substitution  E[p:a] (a replaces p in E)
Generalization  lambda p.E
Specialization  (lambda p.E a) = E[p:a]
Abstraction and generalization name : lambda p.E name(p) : E name p : E
Invocation and specialization  (name a) name(a)

When an abstraction is fully parameterized (all free variables bound to parameters) the abstraction may be understood without looking beyond the abstraction.

Abstraction and generalization depend on the principle of referential transparency.

Principle of Referential Transparency The meaning of an entity is unchanged when a part of the entity is replaced with an equal part.

Abstraction

Principle of Abstraction An abstract is a named entity which may be invoked by mentioning the name.
Giving an object a name gives permission to substitute the name for the thing named (or vice versa) without changing the meaning. We use the notation
name : abstract

to denote the binding of a name to an abstract. Declarations and definitions are all instances of the use of abstraction in programming languages

In addition to naming there is a second aspect to abstraction. It is that the abstract is encapsulated, that is, the details of the abstract are hidden so that the name is sufficient to represent the entity. This aspect of abstraction is considered in more detail in a later chapter.

An object is said to be fully abstract if it can be understood without reference to any thing external to the object.

Terminology. The naming aspect of abstraction is captured in the concepts of binding, definition and declaration while the hiding of irrelevant details is captured by the concept of encapsulation. A binding is an association of two entities. A definition is a binding of a name to an entity, a declaration is a definition which binds a name to a variable, and an assignment is a binding of a value and a variable.

We could equally well say identifier instead of name. A variable is an entity whose value is not fixed but may vary. Names are bound to variables in declaration statements. Among the various terms for abstracts found in other texts are module, package, library, unit, subprogram, subroutine, routine, function, procedure, abstract type, object.

Binding

The concept of binding is common to all programming languages. The objects which may be bound to names are called the bindables of the language. The bindables may include: primitive values, compound values, references to variables, types, and executable abstractions. While binding occurs in definitions and declarations, it also occurs at the virtual and hardware machine levels between values and storage locations. Typically the text of a program contains a number of bindings between names and objects and the bindings may be composed collaterally, sequentially or recursively.

A collateral binding is to perform the bindings independently of each other and then to combine the bindings to produce the completed set of bindings. Nether binding can reference a name used in any other binding. Collateral bindings are not very common but occur in Scheme and ML.

The most common way of composing bindings is sequentially. A sequential binding is to perform the bindings in the sequence in which they occur. The effect is to allow later bindings to use bindings produced earlier in the sequence. It must be noted that sequential bindings do not permit mutually recursive definitions.

In C/C++ and Pascal, constant, variable, and procedure and function bindings are sequential. To provide for mutually recursive definitions of functions and procedures, C/C++ and Pascal provide for the separation of the signature of a function or procedure from the body by the means of function prototypes & forward declarations so that so that mutually recursive definitions may be constructed.

A recursive binding is one in which the name being bound is used (directly or indirectly) in its own binding.

Programming languages that require "declaration before reference" have to invent special mechanisms to handle forward references. For dynamic data types, the rule is relaxed to permit the definition of pointer types. For functions and procedures, there are separate declarations for the signature of the function or procedure and its body. Pascal with its "forward" declarations and C++ with its function prototypes are typical.

The "declaration before reference" is often chosen to simplify the construction of the compiler. In Modula-3 and Java the choice has been made to simplify the programmer's task rather than the compiler's and permit forward references.

Encapsulation

The abstract part of a binding often contains other bindings which are said to be local definitions. Such local definitions are not visible or available to be referenced outside of the abstract. Thus the abstract part of a binding involves ``information hiding''. This hidden information is sometimes made available by exporting the names.

A module system provides a way of writing large program so that the various pieces of the program don't interfere with on another because of name clashes and also provides a way of hiding implementation details. ... A module generally consists of two parts, the export part and the local part. The export part of a module consists of language declarations for the symbols available for use in either part of the module and in other modules which import them and module declaration giving the symbols from other modules which are available for use in either part of the module and in other modules which import them. The local part of a module consists of language declarations for the symbols avaliable for use only in this part. TGPL-Hill and Lloyd

The work of constructing large programs is divided among several people, each of whom must produce a part of the whole. Each part is called a module and each programmer must be able to construct his/her module without knowing the internal details of the other parts. This is only possible when each module is is separated into an interface part and an implementation part. The interface part describes all the information required to use the module while the implementation part describes the implementation. This idea is already present in most programming languages in the manner in which functions and procedures are defined. Function and procedure definitions usually are separated into two parts. The first part gives the subprogram's name and parameter requirements and the second part describes the implementation. A module is a generalization of the concept of abstraction in that a module is permitted to contain a collection of definitions. An additional goal of modules is to confine changes to a few modules rather than throughout the program.

While the concept of modules is a useful abstraction, the full advantages of modules are gained only when modules may be written, compiled and possibly executed separately. In many cases modules should be able to be tested independently of other modules.

EXPANDTHIS!!! Advantages

Implementation Typical applications: examples from Ada, C++, etc

Generalization

Principle of Generalization A generic is an entity which may be specialized (elaborated) upon invocation.
Generalization permits the use of a single pattern to represent each member of a group. We use the notation: lambda p.B'

(called a lambda abstraction) to denote the generalization of B where p is called a parameter and B' is B with p replacing any number of occurrences of some part of B by p. The parameter p is said to be bound in the expression but free in B' and the scope of p is said to be B'.

The symbol lambda is a quantifier. Quantifiers are used to replace constants with variables.

The specialization (elaboration) of a generic is called application and takes the form:

(lambda p.B a)

It denotes the expression B' obtained from the lambda expression when the free occurences of p in B are replaced by a.

Aside. The symbol lambda was introduced by Church for variable introduction in the lambda calculus. It roughly corresponds to the symbol forall, the universal quantifier, of first-order logic. The appendix contains a brief introduction to first-order logic. The functional programming chapter contains a brief introduction to the lambda calculus.
Generalization is often combined with abstraction and takes the following form:
n( p ) : B

where p is the name, x is the parameter, and B is the abstract. The invocation of the abstract takes the form:

n(a)

or occaisionally (n a) where n is the name and a is called the argument whose value is substituted for the parameter. Upon invocation of the abstract, the argument is bound to the parameter. Figure N.1 summarizes the variety of notation that is used to denote the elaboration of a generalization.

Most programming languages permit an implicit form of generalization in which variables may be introduced without providing for an invocation procedure which replaces the parameter with an argument. For example, consider the following psudocode for a program which computes the circumference of a circle:
 

pi : 3.14

c : 2*pi*r

begin
   r := 5
   write c
   r := 20
   write c
end
The value of r depends on the context in which the function is defined. The variable r is a global name and is said to be free. In the first write command, the circumference is computed for a circle of radius 5 while in the second write command the circumference is computed for a circle of radius 20. The write commands cannot be understood without reference to both the definition of c and to the environment (pi is viewed as a constant). Therefore, this program is not ``fully abstract''. In contrast, the following program is fully abstract:
 
pi : 3.14

c(r) : 2*pi*r

begin
   FirstRadius := 5
   write c(FirstRadius)
   SecondRadius := 20
   write c(SecondRadius)
end
The principle of generalization depends on the analogy principle.
Analogy Principle When there is a conformation in pattern between two different objects, the objects may be replaced with a single object parameterized to permit the reconstruction of the original objects.
It is the analogy principle which permits the introduction of a variable to represent an arbitrary element of a class.

The Principle of Generalization makes no restrictions on parameters or the parts of an entity that may be parameterized. Neither should programming languages. This is emphasized in the following principle:

Principle of Parameterization A parameter of a generic may be from any domain.
Terminology. The terms formal parameters (formals) and actual parameters (actuals) are sometimes used instead of the terms parameters and arguments respectively.

Substitution

The utility of both abstraction and generalization depend on substitution. The tie between the two is captured in the following principle:
Principle of Correspondence Parameter binding mechanisms and definition mechanisms are equivalent.
The Principle of Correspondence is a formalization of that aspect of the Principle of Abstraction that implies that definition and substitution are intimately related.

We use the notation

E[p:a]

to denote the substitution of a for p in E. The notation is read as ``E[p:a] is the expression obtained from E by replacing all free occurrences of p with a'' .

Terminology. The notation for substitution was chosen to emphasize the relationship between abstraction and substitution. Other texts use the notation E[p:=a] for substitution. Their notation is motivated by the assignment operation which assigns the value a to p. Other texts use the notation E[a/p] for substitution. This latter notation is motivated by the cancelation that occurs when a number is multiplied by its inverse ( p(a/p) = a).
Together, abstraction, invokation, generalization and specialization provide powerful mechanisms for program development. Generalization provides a mechanism for the construction of common subexpressions and abstraction a mechanism for the factoring out of the common subexpressions. In the following example, the factors are first generalized to contain common subexpressions and then abstracted out. The product
(a+b-c)*(x+y-z)

is formed from two very similar factors. The factors generalize to a common expression

lambda i j k. i+j-k.

The lambda expression can use to rewrite the product as:

(lambda i j k. i+j-k) a b c * (lambda i j k. i+j-k) x y z.

The lambda expression can be abstracted to a name with three arguments,

f(i j k) : i+j-k,

which can be used to replace the lambda expressions with the name and we get the expression

f(a b c) * f(x y z) where f(i j k) : i+j-k

which clearly indicates the similarity of the the factors.

Block structure

A block is a construct that delimits the scope of any definitions that it may contain. It provides a local environment i.e., a opportunity for local definitions. The block structure (the textual relationship between blocks) of a programming language has a great deal of influence over program structure and modularity. There are three basic block structures--monolithic, flat and nested. In the discussion that follows, we will refer to the block structures found in Figure N.2.



 
Figure M.N: Block Syntax
let Definitions in Body end 

Body where Definitions 


Figure N.2 presents two styles of blocks, the first requires the definitions to proceed the body and the second requires definitions to follow the body.

A program has a monolithic block structure if it consists of just one block. This structure is typical of BASIC and early versions of COBOL. The monolithic structure is suitable only for small programs. The scope of every definition is the entire program. Typically all definitions are grouped in one place even if they are used in different parts of the program. 


Figure M.N: Monolithic Block Structure
 
Global Data 
Return Address1 
... 
Return Addressn 
 

A program has a flat block structure if it is partitioned into distinct blocks, an outer all inclosing block one or more inner blocks i.e., the body may contain additional blocks but the inner blocks may not contain blocks. This structure is typical of FORTRAN and C. In these languages, all subprograms (procedures and functions) are separate, and each acts as a block. Variables can be declared inside a subprogram are then local to that subprogram. Subprogram names are part of the outer block and thus their scope is the entire program along with global variables. All subprogram names and global variables must be unique. If a variable cannot be local to a subprogram then it must be global and accessable by all subprograms even though it is used in only a couple of subprograms.
 
 

A: 
 
 
B: 
 
 
... N: 
 
 
 

A program has nested block structure if blocks may be nested inside other blocks i.e., there is no restriction on the nesting of blocks within the body. This is typical of the block structure of the Algol-like languages. A block can be located close to the point of use.  In blocks visibility is controlled by nesting.  All names are visible (implicitly exported) to internally nested blocks.  No names are visible (exported) to enclosing blocks.  In a block, the only names visible are those that are declared in all enclosing blocks or are declared in the block, but not those declared in nested blocks.

Figure M.N: Nested blocks
A: 
B: 
C: 
 
 
 

D: 

 
 
 
A: 
  • Can see all names in A including the names A and B.
  • Cannot see names in B, C, or D.
B: 
  • Can see all names in A and B including the names A, B, and C.
  • Cannot see names in C or D.
C: 
  • Can see all names in A, B, and C including the names A, B, and C.
  • Cannot see names in D..
D: 
  • Can see all names in A and D including the names A, B, and D.
  • Cannot see names in B or C.
A local name is one that is declared within a block for use only within that block.

A global name is a name that when referenced within a block refers to a name declared outside the block.

An activation of a block is a time interval during which that block is being executed.

The three basic block structures are sufficient for what is called programming in the small (PITS). These are programs which are comprehensible in their entirety by an individual programmer. However, they are not general enough for very large programs. Large programs which are written by many individuals and which must consist of modules that can be developed and tested independently of other modules. Such programming is called programming in the large (PITL).

Activation Records

Each block Storage for local variables.

Scope Rules

The act of partitioning a program raises the issue of the scope of names. Which objects with in the partition are to be visible outside the partition? The usual solution is to designate some names to be exported and others to be private or local to the partition and invisible to other partitions. In case there might be name conflict between exported names from partitions, partitions are often permitted to designate names that are to be imported from designated partitions or to qualify the name with the partition name.

The scope rules for modules define relationships among the names within the partitions. There are four choices.

Name conflict is resolved via qualification with the partition name.

Dynamic scope rules

A dynamic scope rule defines the dynamic scope of each association in terms of the dynamic course of program execution. Lisp.

implementation ease, cheap generalization for parameterless functions.

Static scope rules

Teminology. Static scope rules are also called lexical scope rules.
Cobol, BASIC, FORTRAN, Prolog, Lambda calculus, Scheme, Miranda, Algol-60, Pascal

Environment

An environment is a set of bindings.

Scope has to do with the range of visibility of names. For example, a national boundary may encapsulate a natural language. However, some words used within the boundary are not native words. They are words borrowed from some other language and are defined in that foreign language. So it is in a program. A definition introduces a name and a boundary ( the object ). The object may contain names for which there is no local definition (assuming definitions may be nested). These names are said to be free. The meaning assigned to these names is to be found outside of the definition. The rules followed in determining the meaning of these free names are called scope rules.

Scope It is concerned with name control.

ADTs

An even more effective approach is to separate the signatures of the operations from the bodies of the operations and the type representation so that the operation bodies and type representation can be compiled separately. This facilitates the development of software in that when an abstract data type's representation is changed (e.g. to improve performance) the changes are localized to the abstract data type.
name : adt
operation signatures
...
name : adt body
type representation definition
operation bodies
...

Pragmatics

Bindings and Binding Times

Bindings may occur at various times from the point of language definition through program execution. The time at which the binding occurs is termed the binding time. Four distinct binding times may be distinguished.
  1. Language design time. Much of the structure of a programming language is fixed and language design time. Data types, data structures, command and expression forms, and program structure are examples of language features that are fixed at language design time. Most programming languages make provision for extending the language by providing for programmer defined data types, expressions and commands.
  2. Language implementation time. Some language features are determined by the implementation. Programs that run on one computer may not run or give incorrect results when run on another machine. This occurs when the hardware differs in its representation of numbers and arithmetic operations. For example, the maxint of Pascal is determined by the implementation. The C programming language provides access to the underlying machine and therefore programs which depend on the characteristics of the underlying machine may not perform as expected when moved to another machine.
  3. Program translation time. The binding between the source code and the object code occurs at program translation time. Programmer defined variables and types are another example of bindings that occur at program translation time.
  4. Program execution time. Binding of values to variables and formal parameters to actual parameters occur during program execution.
Early binding often permits more efficient execution of programs though translation time type checking while late binding permits more flexibility through program modification a run-time. The implementation of recursion may require allocation of memory at run-time in contrast a one time to allocation of memory at compile-time. An Example from Pratt 1984:

X := X + 10

  1. Set of possible data types for X (Language design time: Fortran; Translation time: C, Pascal (user defined))
  2. Type of variable X (Translation time: C; Execution time: Lisp, APL)
  3. Set of possible values for X (Language implementation time (often constrained by hardware))
  4. Value of the variable X (Execution time (assignment))
  5. Representation of the constant 10 (language definition time (base 10); language implementation time (base 2)).
  6. Properties of the operater + (language definition time - addition operations; translation time - type of addition; execution-time - APL)

Procedures and Functions

THISSHOULDBEGENERALIZETOINCLUDEOTHERABSTRACTIONS!
In the discussion which follows, the term subprogram will be used to refer to whole programs, procedures and functions.

A program may be composed of a main program which during execution may call subprograms which in turn may call other subprograms and so on. When a subprogram is called, the calling subprogram waits for the called subprogram to terminate. Each subprogram is expected to eventually terminate and return control to the calling subprogram. The execution of the calling subprogram resumes at the point immediately following the point of call. Each subprogram may have its own local data which is found in an activation record. An activation record consists of an association between variables and the value to which they are assigned. An activation record may be created each time a subprogram is called and destroyed when the subprogram terminates.

DYNAMIC VS. STATIC ALLOCATION

The run time environment must keep track of the current instruction and the referencing environment for each active or waiting program so that when a subprogram terminates, the proper instruction and data environment may be selected for the calling subprogram.

The current instruction of the calling subprogram is maintained on a stack. When a subprogram is called, the address of the instruction following the call of the calling program is pushed on the stack. When a subprogram terminates, the instruction pointer is set to the address on the top of the stack and the address popped off the stack. The stack is often called the return address stack


Figure M.N: Return Address Stack
 
Return Address1 
... 
Return Addressn 
 

The addresses of the current environment is also maintained on a stack. The top of the stack always points to the current environment. When a subprogram is called, the address of the new environment is pushed on the stack. When a subprogram terminates, the stack is popped revealing the previous environment. The stack is often called the dynamic links because the stack contains links (pointers) which reveal the dynamic history of the program.

When a programming language does not permit recursive procedures and data structure size is independent of computed or input data, the maximum storage requirements of the program can be determined at compile time. This simplifies the run time support required by the program and it is possible to statically allocate the storage used during program execution.

Parameters and Arguments

Strict, Non-strict, Eager and Lazy
An generic is said to be strict in a parameter if it is sure to need the value of the parameter and non-strict in a parameter if it may not require the value of the parameter. Arithmetic operators are strict
A + B

because their arguments must be evaluated to determine the value of the arithmetic expression but the conditional expression

if B then E1 else E2

is not strict in its second and third arguments since the selection of the second or third argument is dependent on the value of the boolean condition (the first argument).

Most programming languages assume that abstracts are strict in their parameters and, therefore, the parameters are evaluated when the function is called. This evaluation scheme is called eager evaluation. This is not always desirable and so some languages provide a mechanism for the programmer to inform a function not to evaluate its parameters. Scheme provides for the quote operator to prevent the evaluation of an argument. Logic languages like Prolog and functional languages languages like Haskell and Miranda are non-strict languages and the arguments are evaluated only when the value is required. This evaluation scheme is called normal-order evaluation and is often implemented using lazy evaluation (the argument is evaluated only when it is first needed).

Most languages use strict evaluation because it is more efficient and simplifies the implementation of parameter passing for imperative programming languages. Normal-order evaluation coupled with side-effects found in imperative langauges produces unexpected results. Algol-60's provides a parameter passing mechanism (pass by name) which is based on that does not provide the generality that is required in the imperative model as the following example shows.
 

Figure M.N: Algol-60, Jensen's device 
procedure swap(x,y:sometype);
var t:sometype
begin
   t := x; x := y; y := t
end;
...
I := 1
a[I] := 3
swap(I,a[I])
Based on the code in the body of the procedure, it would seem that the values of the arguments would be swaped. That this is not the case is easily seen when the formal parameters are textually replaced with the actual parameters and the resulting code is executed in the context of the actual parameters. In this case, prior to the call to sort, I is 1 and A[i] is 3. Upon textual substitution, we have
 
...
I := 1
{I = 1}
a[I] := 3 
{I=1, a[I] = a[1] = 3}
t := I; I := a[I]; a[I] := t -- replaces call to swap
{T=1, I=3, a[i] = a[3] = 1, a[1] = 3}
After execution, I is 3 and a[1] is still 3, but a[3] is now 1.
Argument Passing Mechanisms
In the previous chapter (Abstraction and Generalization), it appears that when an argument is passed to an abstract, it replaces the parameter, that is, it textually replaces the parameter. If the argument is large, the space and time requirements can be a significant overhead. Especially since the each time the argument is referenced, it must be evaluated not in the internal (local) environment of the abstract but in the environment external to (global) the abstract. This need not be the case and several mechanisms have been developed to make passing arguments simpler and more efficient.

The copy mechanism requires values to be copied into an generic when it is entered and copied out of the generic when the generic is exited. This form of parameter passing is often referred to as passing by value. The formal parameters are local variables and the argument is copied into the local variable on entry to the generic and copied out of the local variable to the argument on exit from the generic.

The value parameter of Pascal and the in parameter of Ada are examples of parameters which may be passed by using the copy mechanism. The value of the argument is copied into the parameter on entry but the value of the formal parameter is not copied to the actual parameter on exit. In imperative languages, copying is unnecessary if the language prohibits assignment to the formal parameter. In such a case, the parameter may be passed by reference.

Ada's out parameter and function results are examples of parameters which may be passed by using the copy mechanism. The value of the argument (actual parameter) is not copied into the formal parameter on entry but the value of the parameter is copied into the argument upon exit. In Pascal the function name is used as the parameter and assignments may be made to the function name. This form of parameter passing is often referred to as passing by result.

When the passing by value and result are combined, the passing mechanism is referred to as passing by value-result. Ada's in out parameter is an example of a parameter which may be passed by this form of the copy mechanism. The value of the actual parameter is copied into the formal parameter on entry and the value of the formal parameter is copied into the actual parameter upon exit.

The copy mechanism has some disadvantages. The copying of large composite values (arrays etc) is expensive and the parameters must be assignable (e.g. expressions and file types in Pascal are not assignable).

The effect of a definitional mechanism is as if the abstact were surrounded by a block, in which there is a definition that binds the parameter to the argument.

Parameter : Argument

An parameter is said to be passed by reference if the argument is an address. References to the parameter are references to the argument. Assignments to the parmeter are assignments to the argument. The reference parameter of Pascal and the array and structure parameters of C++ are passed using this mechanism. A toster provides an illustration of the effect of passing by reference.

If an argument --- A parameter is said to be passed by name if, in effect, the argument replaces parameter throughout the body of the subroutine (textual substitution with suitable renaming of local variables to avoid conflicts between local variables and variables occuring in the argument) i.e., in the subprogram, each reference to the parameter results in an evaluation of the argument in the calling environment.

In addition to the problems of the pass-by-name mechanism of Algol-60, imperative languages with with reference parameters present the possibility of aliasing. Aliasing occurs when two or more names reference the same object. For example, the following procedure and call,

procedure confuse (var m, n : Integer );
   begin
      n := 1; n := m + n
   end;
...
i := 5;
confuse(i,i)
both m and n are bound to the same variable, i, and i is initially 5, then after the call to the procedure, the value of i is 2 not 6. m and n are both bound to i and after the assignment n := 1, the value of m is also 1.

Scope and Blocks

A variables declared within a block have a lifetime which extends from the moment an activation record is created for the block until the activation record for the block is destroyed. A variable is bound to an offset within the activation record at compile time. It is bound to a specific storage location when the block is activated and becomes a part of storage.

STATIC/DYNAMIC ALLOCATION

Data Access

block structure, COMMON, ADT's, aliasing

Scope Rules

Conceptually the dynamic scope rules may be implemented as follows. Each variable is assigned a stack to hold the current values of the variable.

When a subprogram is called, a new uninitialized stack element is pushed on the stack corresponding to each variable in the block.

A reference to a variable involves the inspection or updating of the top element of the appropriate stack. This provides access to the variable in closest block with respect to the dynamic calling sequence.

When a subprogram terminates, the stacks corresponding to the variables of the block are popped, restoring the calling environment. The static scope rules may be implemented as follows. The data section of each procedure is associated with an activation record. The activation records are dynamically allocated space on a runtime stack. Each recursive call is associated with it own activation record. Associated with each activation record is a dynamic link which points to the previous activation records, a return address which is the address of the instruction to be executed upon return from the procedure and a static link which provides access to the referencing environment.

An activation record consists of storage for local variables, the static and dynamic links and the return address.


Figure M.N: Activation Record
 
Static Link Return Address Dynamic Link 
Local Data 
 

The runtime stack of activation records (local data, static and dynamic links).


Figure M.N: Runtime Stack
 
Main 
Local data for M 
 
SL  RA  DL 
Local data for A 
 
SL RA DL 
Local data for B 
 
Main calls A which calls B.

Global data values are found by following the static chain to the appropriate activation record.

An alternative method for the implementation of static scope rules is the display. A display is a set of registers ( in hardware or software) which contain pointers to the current environment. On procedure call, the current display is pushed onto the runtime stack and a new display is constructed containing the revised environment. On procedure exit, the display is restored from the copy on the stack.


Figure M.N: Display & Runtime Stack


Partitions

A partition of a set is a collection of disjoint sets whose union is the set.

There are a number of mechanisms for partitioning program text. Functions and procedures are among the most common. However, the result is still a single file. When the partitions of program text are arranged in separate files, the partitions are called modules. Here are several program partitioning mechanisms.

Partitioning of program text is desirable to provide for separate compilation and for pipeline processing of data.

There are a number of mechanisms for combining the partitions into a single program for the purposes of compilation and execution. The include statement is provided in a number of languages. It is a compiler directive with directs the compiler to textually include the named file in the source program. In some systems the partitions may be separately compiled and there is a linking phase in which the compiled program modules are linked together for execution. In other systems, at run-time any missing function or procedure results in a run-time search for the missing module which if found is then executed or if not found results in a run-time error.

Modules

A module is a program unit which is an (more or less) independent entity.  A module consists of a number of definitions (of types, variables, functions, procedures and so on), with a clearly defined interface stating what it exports to other modules which use it.  Modules have a number of advantages for the construction of large programs. Modules are used to construct libraries, ADTs, classes, interfaces, and implementations.  A module is the compilation unit. A module which contains only type abstractions is a specification or interface module.

In program construction the module designer must answer the following questions.

Programming in the large is concerned with programs that are not comprehensible by a single individual and are developed by teams of programmers. At this level programs must consist of modules that can be written, compiled, and tested independently of other modules. A module has a single purpose, and has a narrow interface to other modules. It is likely to be reuseable (able to be incorporated into may programs) and modifiable with out forcing changes in other modules.

Modules must provide answers to two questions:

The what is of concern to the user of the module while the how is of concern to the implementer of the module.

Functions and procedures are simple modules. Their signature is a description of what they do while their body describes how it is achieved. More typically a module encapsulates a group of components such as types, constants, variables, procedures, functions and so on.

To present a narrow interface to other modules, a module makes only a few components visible outside. Such components are said to be exported by the module. The other components are said to be hidden inside the module. The hidden components are used to implement the exported components.

Access to the components is often by a qualified name -- module name. component name. When strong safety considerations are important, modules using components of another module may be required to explicitly import the required module and the desired components.

Historical Perspectives and Further Reading

Exercises

  1. [time/difficulty] (section) Problem statement
  2. Extend the compiler to handle constant, type, variable, function and procedure definitions and references to the same.
  3. What is the effect of providing lazy evaluation in an imperative programming language? % Ans: Due to the presence of side effects, the value of the actual % parameter may be different between the point of entry and the point % of evaluation.
  4. Extend the compiler to handle parameterization of functions and procedures.
  5. For a specific programming language, report on its provision for abstraction and generalization. Specifically, what entities may be named, what entities may be parameterized, what entities may be passed as parameters, and what entities may be returned as results (of functions). What irregularities do you find in the language?
  6. Algebraic Semantics: stack, tree, queue, grade book etc
  7. Lexical Scope Rules
  8. Dynamic Scope Rules
  9. Parameter Passing
  10. Run-time Stack

Author: Anthony Aaby.