Generalization is a broadening of application to encompass a larger domain of objects of the same or different type.
A parameter is a quantity whose value varies with the circumstances of its application.
Substitution---To put something in the place of another.
Encapsulate---To completely enclose.
Keywords and phrases: Name, binding, abstract, definition, declaration, variables, parameters, arguments, formals, actuals. Modularity, encapsulation, function, procedure, abstract type, generic, library, object, class, inheritance, partition, package, unit, separate compilation, linking, import, export, instance, scope. block, garbage collection, static and dynamic links, display, static and dynamic binding, activation record, environment, Static and Dynamic Scope, aliasing, variables, value, result, value-result, reference, name, unification, eager evaluation, lazy evaluation, strict, non-strict, Church-Rosser, overloading, polymorphism, monomorphism, coercion, transfer functions.
The ability to abstract and to generalize is an essential part of any intellectual activity. Abstraction and generalization are fundamental to mathematics and philosophy and are essential in computer science as well.
The importance of abstraction is derived from its ability to hide irrelevant details and from the use of names to reference objects. Programming languages provide abstraction through procedures, functions, and modules which permit the programmer to distinguish between what a program does and how it is implemented. The primary concern of the user of a program is with what it does. This is in contrast with the writer of the program whose primary concern is with how it is implemented. Abstraction is essential in the construction of programs. It places the emphasis on what an object is or does rather than how it is represented or how it works. Thus, it is the primary means of managing complexity in large programs.
Of no less importance is generalization. While abstraction reduces complexity by hiding irrelevant detail, generalization reduces complexity by replacing multiple entities which perform similar functions with a single construct. Programming languages provide generalization through variables, parameterization, generics and polymorphism. Generalization is essential in the construction of programs. It places the emphasis on the similarities between objects. Thus, it helps to manage complexity by collecting individuals into groups and providing a representative which can be used to specify any individual of the group.
Abstraction and generalization are often used together. Abstracts are generalized through parameterization to provide greater utility. In parameterization, one or more parts of an entity are replaced with a name which is new to the entity. The name is used as a parameter. When the parameterized abstract is invoked, it is invoked with a binding of the parameter to an argument. Figure N.1 summarizes the notation which will be used for abstraction and generalization.
Abstraction | name : abstract | ||
Invocation | name | ||
Substitution | E[p:a] | (a replaces p in E) | |
Generalization | lambda p.E | ||
Specialization | (lambda p.E a) = E[p:a] | ||
Abstraction and generalization | name : lambda p.E | name(p) : E | name p : E |
Invocation and specialization | (name a) | name(a) |
When an abstraction is fully parameterized (all free variables bound to parameters) the abstraction may be understood without looking beyond the abstraction.
Abstraction and generalization depend on the principle of referential transparency.
Principle of Referential Transparency The meaning of an entity is unchanged when a part of the entity is replaced with an equal part.
Principle of Abstraction An abstract is a named entity which may be invoked by mentioning the name.Giving an object a name gives permission to substitute the name for the thing named (or vice versa) without changing the meaning. We use the notation
to denote the binding of a name to an abstract. Declarations and definitions are all instances of the use of abstraction in programming languages
In addition to naming there is a second aspect to abstraction. It is that the abstract is encapsulated, that is, the details of the abstract are hidden so that the name is sufficient to represent the entity. This aspect of abstraction is considered in more detail in a later chapter.
An object is said to be fully abstract if it can be understood without reference to any thing external to the object.
Terminology. The naming aspect of abstraction is captured in the concepts of binding, definition and declaration while the hiding of irrelevant details is captured by the concept of encapsulation. A binding is an association of two entities. A definition is a binding of a name to an entity, a declaration is a definition which binds a name to a variable, and an assignment is a binding of a value and a variable.We could equally well say identifier instead of name. A variable is an entity whose value is not fixed but may vary. Names are bound to variables in declaration statements. Among the various terms for abstracts found in other texts are module, package, library, unit, subprogram, subroutine, routine, function, procedure, abstract type, object.
A collateral binding is to perform the bindings independently of each other and then to combine the bindings to produce the completed set of bindings. Nether binding can reference a name used in any other binding. Collateral bindings are not very common but occur in Scheme and ML.
The most common way of composing bindings is sequentially. A sequential binding is to perform the bindings in the sequence in which they occur. The effect is to allow later bindings to use bindings produced earlier in the sequence. It must be noted that sequential bindings do not permit mutually recursive definitions.
In C/C++ and Pascal, constant, variable, and procedure and function bindings are sequential. To provide for mutually recursive definitions of functions and procedures, C/C++ and Pascal provide for the separation of the signature of a function or procedure from the body by the means of function prototypes & forward declarations so that so that mutually recursive definitions may be constructed.
A recursive binding is one in which the name being bound is used (directly or indirectly) in its own binding.
Programming languages that require "declaration before reference" have to invent special mechanisms to handle forward references. For dynamic data types, the rule is relaxed to permit the definition of pointer types. For functions and procedures, there are separate declarations for the signature of the function or procedure and its body. Pascal with its "forward" declarations and C++ with its function prototypes are typical.
The "declaration before reference" is often chosen to simplify the construction of the compiler. In Modula-3 and Java the choice has been made to simplify the programmer's task rather than the compiler's and permit forward references.
A module system provides a way of writing large program so that the various pieces of the program don't interfere with on another because of name clashes and also provides a way of hiding implementation details. ... A module generally consists of two parts, the export part and the local part. The export part of a module consists of language declarations for the symbols available for use in either part of the module and in other modules which import them and module declaration giving the symbols from other modules which are available for use in either part of the module and in other modules which import them. The local part of a module consists of language declarations for the symbols avaliable for use only in this part. TGPL-Hill and Lloyd
The work of constructing large programs is divided among several people, each of whom must produce a part of the whole. Each part is called a module and each programmer must be able to construct his/her module without knowing the internal details of the other parts. This is only possible when each module is is separated into an interface part and an implementation part. The interface part describes all the information required to use the module while the implementation part describes the implementation. This idea is already present in most programming languages in the manner in which functions and procedures are defined. Function and procedure definitions usually are separated into two parts. The first part gives the subprogram's name and parameter requirements and the second part describes the implementation. A module is a generalization of the concept of abstraction in that a module is permitted to contain a collection of definitions. An additional goal of modules is to confine changes to a few modules rather than throughout the program.
While the concept of modules is a useful abstraction, the full advantages of modules are gained only when modules may be written, compiled and possibly executed separately. In many cases modules should be able to be tested independently of other modules.
EXPANDTHIS!!! Advantages
Principle of Generalization A generic is an entity which may be specialized (elaborated) upon invocation.Generalization permits the use of a single pattern to represent each member of a group. We use the notation: lambda p.B'
(called a lambda abstraction) to denote the generalization of B where p is called a parameter and B' is B with p replacing any number of occurrences of some part of B by p. The parameter p is said to be bound in the expression but free in B' and the scope of p is said to be B'.
The symbol lambda is a quantifier. Quantifiers are used to replace constants with variables.
The specialization (elaboration) of a generic is called application and takes the form:
It denotes the expression B' obtained from the lambda expression when the free occurences of p in B are replaced by a.
Aside. The symbol lambda was introduced by Church for variable introduction in the lambda calculus. It roughly corresponds to the symbol forall, the universal quantifier, of first-order logic. The appendix contains a brief introduction to first-order logic. The functional programming chapter contains a brief introduction to the lambda calculus.Generalization is often combined with abstraction and takes the following form:
where p is the name, x is the parameter, and B is the abstract. The invocation of the abstract takes the form:
or occaisionally (n a) where n is the name and a is called the argument whose value is substituted for the parameter. Upon invocation of the abstract, the argument is bound to the parameter. Figure N.1 summarizes the variety of notation that is used to denote the elaboration of a generalization.
Most programming languages permit an implicit form of generalization
in which variables may be introduced without providing for an invocation
procedure which replaces the parameter with an argument. For example, consider
the following psudocode for a program which computes the circumference
of a circle:
pi : 3.14 c : 2*pi*r begin r := 5 write c r := 20 write c end |
pi : 3.14 c(r) : 2*pi*r begin FirstRadius := 5 write c(FirstRadius) SecondRadius := 20 write c(SecondRadius) end |
It is the analogy principle which permits the introduction of a variable to represent an arbitrary element of a class.Analogy Principle When there is a conformation in pattern between two different objects, the objects may be replaced with a single object parameterized to permit the reconstruction of the original objects.
The Principle of Generalization makes no restrictions on parameters or the parts of an entity that may be parameterized. Neither should programming languages. This is emphasized in the following principle:
Principle of Parameterization A parameter of a generic may be from any domain.
Terminology. The terms formal parameters (formals) and actual parameters (actuals) are sometimes used instead of the terms parameters and arguments respectively.
Principle of Correspondence Parameter binding mechanisms and definition mechanisms are equivalent.The Principle of Correspondence is a formalization of that aspect of the Principle of Abstraction that implies that definition and substitution are intimately related.
We use the notation
to denote the substitution of a for p in E. The notation is read as ``E[p:a] is the expression obtained from E by replacing all free occurrences of p with a'' .
Terminology. The notation for substitution was chosen to emphasize the relationship between abstraction and substitution. Other texts use the notation E[p:=a] for substitution. Their notation is motivated by the assignment operation which assigns the value a to p. Other texts use the notation E[a/p] for substitution. This latter notation is motivated by the cancelation that occurs when a number is multiplied by its inverse ( p(a/p) = a).Together, abstraction, invokation, generalization and specialization provide powerful mechanisms for program development. Generalization provides a mechanism for the construction of common subexpressions and abstraction a mechanism for the factoring out of the common subexpressions. In the following example, the factors are first generalized to contain common subexpressions and then abstracted out. The product
is formed from two very similar factors. The factors generalize to a common expression
The lambda expression can use to rewrite the product as:
The lambda expression can be abstracted to a name with three arguments,
which can be used to replace the lambda expressions with the name and we get the expression
which clearly indicates the similarity of the the factors.
let Definitions in Body end
Body where Definitions |
Figure N.2 presents two styles of blocks, the first requires the definitions to proceed the body and the second requires definitions to follow the body.
A program has a monolithic block structure if it consists of just one block. This structure is typical of BASIC and early versions of COBOL. The monolithic structure is suitable only for small programs. The scope of every definition is the entire program. Typically all definitions are grouped in one place even if they are used in different parts of the program.
Global Data Return Address1 ... Return Addressn
A program has a flat block structure if it is partitioned into
distinct blocks, an outer all inclosing block one or more inner blocks
i.e., the body may contain additional blocks but the inner blocks may not
contain blocks. This structure is typical of FORTRAN and C. In these languages,
all subprograms (procedures and functions) are separate, and each acts
as a block. Variables can be declared inside a subprogram are then local
to that subprogram. Subprogram names are part of the outer block and thus
their scope is the entire program along with global variables. All subprogram
names and global variables must be unique. If a variable cannot be local
to a subprogram then it must be global and accessable by all subprograms
even though it is used in only a couple of subprograms.
A:
|
B:
|
... | N:
|
A program has nested block structure if blocks may be nested inside other blocks i.e., there is no restriction on the nesting of blocks within the body. This is typical of the block structure of the Algol-like languages. A block can be located close to the point of use. In blocks visibility is controlled by nesting. All names are visible (implicitly exported) to internally nested blocks. No names are visible (exported) to enclosing blocks. In a block, the only names visible are those that are declared in all enclosing blocks or are declared in the block, but not those declared in nested blocks.
A:
|
|
A global name is a name that when referenced within a block refers to a name declared outside the block.
An activation of a block is a time interval during which that block is being executed.
The three basic block structures are sufficient for what is called programming in the small (PITS). These are programs which are comprehensible in their entirety by an individual programmer. However, they are not general enough for very large programs. Large programs which are written by many individuals and which must consist of modules that can be developed and tested independently of other modules. Such programming is called programming in the large (PITL).
The scope rules for modules define relationships among the names within the partitions. There are four choices.
implementation ease, cheap generalization for parameterless functions.
Teminology. Static scope rules are also called lexical scope rules.Cobol, BASIC, FORTRAN, Prolog, Lambda calculus, Scheme, Miranda, Algol-60, Pascal
Scope has to do with the range of visibility of names. For example, a national boundary may encapsulate a natural language. However, some words used within the boundary are not native words. They are words borrowed from some other language and are defined in that foreign language. So it is in a program. A definition introduces a name and a boundary ( the object ). The object may contain names for which there is no local definition (assuming definitions may be nested). These names are said to be free. The meaning assigned to these names is to be found outside of the definition. The rules followed in determining the meaning of these free names are called scope rules.
Scope It is concerned with name control.
name : adt operation signatures ...
name : adt body type representation definition operation bodies ...
X := X + 10
A program may be composed of a main program which during execution may call subprograms which in turn may call other subprograms and so on. When a subprogram is called, the calling subprogram waits for the called subprogram to terminate. Each subprogram is expected to eventually terminate and return control to the calling subprogram. The execution of the calling subprogram resumes at the point immediately following the point of call. Each subprogram may have its own local data which is found in an activation record. An activation record consists of an association between variables and the value to which they are assigned. An activation record may be created each time a subprogram is called and destroyed when the subprogram terminates.
DYNAMIC VS. STATIC ALLOCATION
The run time environment must keep track of the current instruction and the referencing environment for each active or waiting program so that when a subprogram terminates, the proper instruction and data environment may be selected for the calling subprogram.
The current instruction of the calling subprogram is maintained on a stack. When a subprogram is called, the address of the instruction following the call of the calling program is pushed on the stack. When a subprogram terminates, the instruction pointer is set to the address on the top of the stack and the address popped off the stack. The stack is often called the return address stack.
Return Address1 ... Return Addressn
The addresses of the current environment is also maintained on a stack. The top of the stack always points to the current environment. When a subprogram is called, the address of the new environment is pushed on the stack. When a subprogram terminates, the stack is popped revealing the previous environment. The stack is often called the dynamic links because the stack contains links (pointers) which reveal the dynamic history of the program.
When a programming language does not permit recursive procedures and data structure size is independent of computed or input data, the maximum storage requirements of the program can be determined at compile time. This simplifies the run time support required by the program and it is possible to statically allocate the storage used during program execution.
because their arguments must be evaluated to determine the value of the arithmetic expression but the conditional expression
is not strict in its second and third arguments since the selection of the second or third argument is dependent on the value of the boolean condition (the first argument).
Most programming languages assume that abstracts are strict in their parameters and, therefore, the parameters are evaluated when the function is called. This evaluation scheme is called eager evaluation. This is not always desirable and so some languages provide a mechanism for the programmer to inform a function not to evaluate its parameters. Scheme provides for the quote operator to prevent the evaluation of an argument. Logic languages like Prolog and functional languages languages like Haskell and Miranda are non-strict languages and the arguments are evaluated only when the value is required. This evaluation scheme is called normal-order evaluation and is often implemented using lazy evaluation (the argument is evaluated only when it is first needed).
Most languages use strict evaluation because it is more efficient and
simplifies the implementation of parameter passing for imperative programming
languages. Normal-order evaluation coupled with side-effects found in imperative
langauges produces unexpected results. Algol-60's provides a parameter
passing mechanism (pass by name) which is based on that does not provide
the generality that is required in the imperative model as the following
example shows.
procedure swap(x,y:sometype); var t:sometype begin t := x; x := y; y := t end; ... I := 1 a[I] := 3 swap(I,a[I]) |
... I := 1 {I = 1} a[I] := 3 {I=1, a[I] = a[1] = 3} t := I; I := a[I]; a[I] := t -- replaces call to swap {T=1, I=3, a[i] = a[3] = 1, a[1] = 3} |
The copy mechanism requires values to be copied into an generic when it is entered and copied out of the generic when the generic is exited. This form of parameter passing is often referred to as passing by value. The formal parameters are local variables and the argument is copied into the local variable on entry to the generic and copied out of the local variable to the argument on exit from the generic.
The value parameter of Pascal and the in parameter of Ada are examples of parameters which may be passed by using the copy mechanism. The value of the argument is copied into the parameter on entry but the value of the formal parameter is not copied to the actual parameter on exit. In imperative languages, copying is unnecessary if the language prohibits assignment to the formal parameter. In such a case, the parameter may be passed by reference.
Ada's out parameter and function results are examples of parameters which may be passed by using the copy mechanism. The value of the argument (actual parameter) is not copied into the formal parameter on entry but the value of the parameter is copied into the argument upon exit. In Pascal the function name is used as the parameter and assignments may be made to the function name. This form of parameter passing is often referred to as passing by result.
When the passing by value and result are combined, the passing mechanism is referred to as passing by value-result. Ada's in out parameter is an example of a parameter which may be passed by this form of the copy mechanism. The value of the actual parameter is copied into the formal parameter on entry and the value of the formal parameter is copied into the actual parameter upon exit.
The copy mechanism has some disadvantages. The copying of large composite values (arrays etc) is expensive and the parameters must be assignable (e.g. expressions and file types in Pascal are not assignable).
The effect of a definitional mechanism is as if the abstact were surrounded by a block, in which there is a definition that binds the parameter to the argument.
An parameter is said to be passed by reference if the argument is an address. References to the parameter are references to the argument. Assignments to the parmeter are assignments to the argument. The reference parameter of Pascal and the array and structure parameters of C++ are passed using this mechanism. A toster provides an illustration of the effect of passing by reference.
If an argument --- A parameter is said to be passed by name if, in effect, the argument replaces parameter throughout the body of the subroutine (textual substitution with suitable renaming of local variables to avoid conflicts between local variables and variables occuring in the argument) i.e., in the subprogram, each reference to the parameter results in an evaluation of the argument in the calling environment.
In addition to the problems of the pass-by-name mechanism of Algol-60, imperative languages with with reference parameters present the possibility of aliasing. Aliasing occurs when two or more names reference the same object. For example, the following procedure and call,
both m and n are bound to the same variable, i, and i is initially 5, then after the call to the procedure, the value of i is 2 not 6. m and n are both bound to i and after the assignment n := 1, the value of m is also 1.procedure confuse (var m, n : Integer ); begin n := 1; n := m + n end; ... i := 5; confuse(i,i)
STATIC/DYNAMIC ALLOCATION
When a subprogram is called, a new uninitialized stack element is pushed on the stack corresponding to each variable in the block.
A reference to a variable involves the inspection or updating of the top element of the appropriate stack. This provides access to the variable in closest block with respect to the dynamic calling sequence.
When a subprogram terminates, the stacks corresponding to the variables of the block are popped, restoring the calling environment. The static scope rules may be implemented as follows. The data section of each procedure is associated with an activation record. The activation records are dynamically allocated space on a runtime stack. Each recursive call is associated with it own activation record. Associated with each activation record is a dynamic link which points to the previous activation records, a return address which is the address of the instruction to be executed upon return from the procedure and a static link which provides access to the referencing environment.
An activation record consists of storage for local variables, the static and dynamic links and the return address.
Static Link Return Address Dynamic Link Local Data
The runtime stack of activation records (local data, static and dynamic links).
Figure M.N: Runtime Stack Main calls A which calls B.
Main
0 0 0 Local data for M A
SL RA DL Local data for A B
SL RA DL Local data for B
Global data values are found by following the static chain to the appropriate activation record.
An alternative method for the implementation of static scope rules is the display. A display is a set of registers ( in hardware or software) which contain pointers to the current environment. On procedure call, the current display is pushed onto the runtime stack and a new display is constructed containing the revised environment. On procedure exit, the display is restored from the copy on the stack.
There are a number of mechanisms for partitioning program text. Functions and procedures are among the most common. However, the result is still a single file. When the partitions of program text are arranged in separate files, the partitions are called modules. Here are several program partitioning mechanisms.
There are a number of mechanisms for combining the partitions into a single program for the purposes of compilation and execution. The include statement is provided in a number of languages. It is a compiler directive with directs the compiler to textually include the named file in the source program. In some systems the partitions may be separately compiled and there is a linking phase in which the compiled program modules are linked together for execution. In other systems, at run-time any missing function or procedure results in a run-time search for the missing module which if found is then executed or if not found results in a run-time error.
In program construction the module designer must answer the following questions.
Modules must provide answers to two questions:
Functions and procedures are simple modules. Their signature is a description of what they do while their body describes how it is achieved. More typically a module encapsulates a group of components such as types, constants, variables, procedures, functions and so on.
To present a narrow interface to other modules, a module makes only a few components visible outside. Such components are said to be exported by the module. The other components are said to be hidden inside the module. The hidden components are used to implement the exported components.
Access to the components is often by a qualified name -- module name. component name. When strong safety considerations are important, modules using components of another module may be required to explicitly import the required module and the desired components.