Abstraction

Topics:

Procedures, functions, and iterators as abstraction mechanisms
Parameterization mechanisms (reference vs. value)
Activation records and storage management
Modules in programming languages

Learning Objectives:

Explain how abstraction mechanisms support the creation of reusable software components.
Demonstrate the difference between call-by-value and call-by-reference parameter passing.
Defend the importance of abstractions, especially with respect to programming-in-the-large.
Describe how the computer system uses activation records to manage program modules and their data.

Note:

This lecture comes from the excellent lectures of Anthony A. Aaby.
Further reading Programming Language Pragmatics, chapter 3

Abstraction

The ability to abstract and to generalize is an essential part of any intellectual activity. Abstraction and generalization are fundamental to mathematics and philosophy and are essential in computer science as well.

The importance of abstraction is derived from its ability to hide irrelevant details and from the use of names to reference objects. Programming languages provide abstraction through procedures, functions, and modules which permit the programmer to distinguish between what a program does and how it is implemented. The primary concern of the user of a program is with what it does. This is in contrast with the writer of the program whose primary concern is with how it is implemented. Abstraction is essential in the construction of programs. It places the emphasis on what an object is or does rather than how it is represented or how it works. Thus, it is the primary means of managing complexity in large programs.

Of no less importance is generalization. While abstraction reduces complexity by hiding irrelevant detail, generalization reduces complexity by replacing multiple entities which perform similar functions with a single construct. Programming languages provide generalization through variables, parameterization, generic and polymorphism. Generalization is essential in the construction of programs. It places the emphasis on the similarities between objects. Thus, it helps to manage complexity by collecting individuals into groups and providing a representative which can be used to specify any individual of the group.

Abstraction and generalization are often used together. Abstracts are generalized through parameterization to provide greater utility. In parameterization, one or more parts of an entity are replaced with a name which is new to the entity. The name is used as a parameter. When the parameterized abstract is invoked, it is invoked with a binding of the parameter to an argument.

Exercise: draw a mammal that is not a dog, cat, bear, human, etc...

Principle of Abstraction An abstract is a named entity which may be invoked by mentioning the name.

An object is said to be fully abstract if it can be understood without reference to any thing external to the object.

Terminology. The naming aspect of abstraction is captured in the concepts of binding, definition and declaration while the hiding of irrelevant details is captured by the concept of encapsulation. A binding is an association of two entities. A definition is a binding of a name to an entity, a declaration is a definition which binds a name to a variable, and an assignment is a binding of a value and a variable.
We could equally well say identifier instead of name. A variable is an entity whose value is not fixed but may vary. Names are bound to variables in declaration statements. Among the various terms for abstracts found in other texts are module, package, library, unit, subprogram, subroutine, routine, function, procedure, abstract type, object.

Binding

The concept of binding is common to all programming languages. The objects which may be bound to names are called the bindables of the language. The bindables may include: primitive values, compound values, references to variables, types, and executable abstractions. While binding occurs in definitions and declarations, it also occurs at the virtual and hardware machine levels between values and storage locations.

Encapsulation

The abstract part of a binding often contains other bindings which are said to be local definitions. Such local definitions are not visible or available to be referenced outside of the abstract. Thus the abstract part of a binding involves ``information hiding''. This hidden information is sometimes made available by exporting the names.

A module system provides a way of writing large program so that the various pieces of the program don't interfere with on another because of name clashes and also provides a way of hiding implementation details. ... A module generally consists of two parts, the export part and the local part. The export part of a module consists of language declarations for the symbols available for use in either part of the module and in other modules which import them and module declaration giving the symbols from other modules which are available for use in either part of the module and in other modules which import them. The local part of a module consists of language declarations for the symbols available for use only in this part. TGPL-Hill and Lloyd

The work of constructing large programs is divided among several people, each of whom must produce a part of the whole. Each part is called a module and each programmer must be able to construct his/her module without knowing the internal details of the other parts. This is only possible when each module is is separated into an interface part and an implementation part. The interface part describes all the information required to use the module while the implementation part describes the implementation. This idea is already present in most programming languages in the manner in which functions and procedures are defined. Function and procedure definitions usually are separated into two parts. The first part gives the subprogram's name and parameter requirements and the second part describes the implementation. A module is a generalization of the concept of abstraction in that a module is permitted to contain a collection of definitions. An additional goal of modules is to confine changes to a few modules rather than throughout the program.

While the concept of modules is a useful abstraction, the full advantages of modules are gained only when modules may be written, compiled and possibly executed separately. In many cases modules should be able to be tested independently of other modules.

Advantages

reduction in complexity
team programming
maintainability
re-usability of code
project management

Implementation

common storage area -- Fortran
include directive -- C++
subroutine library

Typical applications:

subroutine packages -- mathematical, statistical etc
ADTs

Block structure

A block is a construct that delimits the scope of any definitions that it may contain. It provides a local environment i.e., a opportunity for local definitions. The block structure (the textual relationship between blocks) of a programming language has a great deal of influence over program structure and modularity. There are three basic block structures--monolithic, flat and nested.

A program has a monolithic block structure if it consists of just one block. This structure is typical of BASIC and early versions of COBOL. The monolithic structure is suitable only for small programs. The scope of every definition is the entire program. Typically all definitions are grouped in one place even if they are used in different parts of the program.

Global Data

Return Address₁

...

Return Address_n

A program has a flat block structure if it is partitioned into distinct blocks, an outer all inclosing block one or more inner blocks i.e., the body may contain additional blocks but the inner blocks may not contain blocks. This structure is typical of FORTRAN and C. In these languages, all subprograms (procedures and functions) are separate, and each acts as a block. Variables can be declared inside a subprogram are then local to that subprogram. Subprogram names are part of the outer block and thus their scope is the entire program along with global variables. All subprogram names and global variables must be unique. If a variable cannot be local to a subprogram then it must be global and accessible by all subprograms even though it is used in only a couple of subprograms.

...

A program has nested block structure if blocks may be nested inside other blocks i.e., there is no restriction on the nesting of blocks within the body. This is typical of the block structure of the Algol-like languages. A block can be located close to the point of use. In blocks visibility is controlled by nesting. All names are visible (implicitly exported) to internally nested blocks. No names are visible (exported) to enclosing blocks. In a block, the only names visible are those that are declared in all enclosing blocks or are declared in the block, but not those declared in nested blocks.

Figure M.N: Nested blocks

Can see all names in A including the names A and B.
Cannot see names in B, C, or D.

Can see all names in A and B including the names A, B, and C.
Cannot see names in C or D.

Can see all names in A, B, and C including the names A, B, and C.
Cannot see names in D..

Can see all names in A and D including the names A, B, and D.
Cannot see names in B or C.

A local name is one that is declared within a block for use only within that block.

A global name is a name that when referenced within a block refers to a name declared outside the block.

An activation of a block is a time interval during which that block is being executed.

The three basic block structures are sufficient for what is called programming in the small (PITS). These are programs which are comprehensible in their entirety by an individual programmer. However, they are not general enough for very large programs. Large programs which are written by many individuals and which must consist of modules that can be developed and tested independently of other modules. Such programming is called programming in the large (PITL).

Activation Records

Each block Storage for local variables.

Scope Rules

The act of partitioning a program raises the issue of the scope of names. Which objects with in the partition are to be visible outside the partition? The usual solution is to designate some names to be exported and others to be private or local to the partition and invisible to other partitions. In case there might be name conflict between exported names from partitions, partitions are often permitted to designate names that are to be imported from designated partitions or to qualify the name with the partition name.

The scope rules for modules define relationships among the names within the partitions. There are four choices.

All local names visible globally.
All external names visible locally.
Only local explicitly exported names visible globally.
Only external names explicitly imported are visible locally.

Name conflict is resolved via qualification with the partition name.

Environment

An environment is a set of bindings.

Pragmatics

Bindings and Binding Times

Bindings may occur at various times from the point of language definition through program execution. The time at which the binding occurs is termed the binding time. Four distinct binding times may be distinguished.

Language design time. Much of the structure of a programming language is fixed and language design time. Data types, data structures, command and expression forms, and program structure are examples of language features that are fixed at language design time. Most programming languages make provision for extending the language by providing for programmer defined data types, expressions and commands.
Language implementation time. Some language features are determined by the implementation. Programs that run on one computer may not run or give incorrect results when run on another machine. This occurs when the hardware differs in its representation of numbers and arithmetic operations. For example, the maxint of Pascal is determined by the implementation. The C programming language provides access to the underlying machine and therefore programs which depend on the characteristics of the underlying machine may not perform as expected when moved to another machine.
Program translation time. The binding between the source code and the object code occurs at program translation time. Programmer defined variables and types are another example of bindings that occur at program translation time.
Program execution time. Binding of values to variables and formal parameters to actual parameters occur during program execution.

Early binding often permits more efficient execution of programs though translation time type checking while late binding permits more flexibility through program modification a run-time.

Argument Passing Mechanisms

In the previous chapter, it appears that when an argument is passed to an abstract, it replaces the parameter, that is, it textually replaces the parameter. If the argument is large, the space and time requirements can be a significant overhead. Especially since the each time the argument is referenced, it must be evaluated not in the internal (local) environment of the abstract but in the environment external to (global) the abstract. This need not be the case and several mechanisms have been developed to make passing arguments simpler and more efficient.

The copy mechanism requires values to be copied into an generic when it is entered and copied out of the generic when the generic is exited. This form of parameter passing is often referred to as passing by value. The formal parameters are local variables and the argument is copied into the local variable on entry to the generic and copied out of the local variable to the argument on exit from the generic.

The value parameter of Pascal and the in parameter of Ada are examples of parameters which may be passed by using the copy mechanism. The value of the argument is copied into the parameter on entry but the value of the formal parameter is not copied to the actual parameter on exit. In imperative languages, copying is unnecessary if the language prohibits assignment to the formal parameter. In such a case, the parameter may be passed by reference.

The copy mechanism has some disadvantages. The copying of large composite values (arrays etc) is expensive and the parameters must be assignable (e.g. expressions and file types in Pascal are not assignable).

An parameter is said to be passed by reference if the argument is an address. References to the parameter are references to the argument. Assignments to the parameter are assignments to the argument. The reference parameter of Pascal and the array and structure parameters of C++ are passed using this mechanism. A toaster provides an illustration of the effect of passing by reference.

If an argument --- A parameter is said to be passed by name if, in effect, the argument replaces parameter throughout the body of the subroutine (textual substitution with suitable renaming of local variables to avoid conflicts between local variables and variables occurring in the argument) i.e., in the subprogram, each reference to the parameter results in an evaluation of the argument in the calling environment.

Scope and Blocks

A variables declared within a block have a lifetime which extends from the moment an activation record is created for the block until the activation record for the block is destroyed. A variable is bound to an offset within the activation record at compile time. It is bound to a specific storage location when the block is activated and becomes a part of storage.

Conceptually the dynamic scope rules may be implemented as follows. Each variable is assigned a stack to hold the current values of the variable.

When a subprogram is called, a new uninitialized stack element is pushed on the stack corresponding to each variable in the block.

A reference to a variable involves the inspection or updating of the top element of the appropriate stack. This provides access to the variable in closest block with respect to the dynamic calling sequence.

When a subprogram terminates, the stacks corresponding to the variables of the block are popped, restoring the calling environment. The static scope rules may be implemented as follows. The data section of each procedure is associated with an activation record. The activation records are dynamically allocated space on a runtime stack. Each recursive call is associated with it own activation record. Associated with each activation record is a dynamic link which points to the previous activation records, a return address which is the address of the instruction to be executed upon return from the procedure and a static link which provides access to the referencing environment.

An activation record consists of storage for local variables, the static and dynamic links and the return address.

Figure M.N: Activation Record

Static Link Return Address Dynamic Link

Local Data

The runtime stack of activation records (local data, static and dynamic links).

Figure M.N: Runtime Stack

Main

0 0 0

Local data for M

A

SL RA DL

Local data for A

B

SL RA DL

Local data for B

Main calls A which calls B.

Global data values are found by following the static chain to the appropriate activation record.

Modules

A module is a program unit which is an (more or less) independent entity. A module consists of a number of definitions (of types, variables, functions, procedures and so on), with a clearly defined interface stating what it exports to other modules which use it. Modules have a number of advantages for the construction of large programs.

Modules facilitate parallel and independent development
Modules facilitate separate compilation
Modules facilitate code reuse

Modules are used to construct libraries, ADTs, classes, interfaces, and implementations. A module is the compilation unit. A module which contains only type abstractions is a specification or interface module.

In program construction the module designer must answer the following questions.

What is the module's purpose?
How does it achieve that purpose?

Programming in the large is concerned with programs that are not comprehensible by a single individual and are developed by teams of programmers. At this level programs must consist of modules that can be written, compiled, and tested independently of other modules. A module has a single purpose, and has a narrow interface to other modules. It is likely to be reusable (able to be incorporated into may programs) and modifiable with out forcing changes in other modules.

Modules must provide answers to two questions:

What is the purpose of the module?
How does it achieve that purpose?

The what is of concern to the user of the module while the how is of concern to the implementer of the module.

Functions and procedures are simple modules. Their signature is a description of what they do while their body describes how it is achieved. More typically a module encapsulates a group of components such as types, constants, variables, procedures, functions and so on.

To present a narrow interface to other modules, a module makes only a few components visible outside. Such components are said to be exported by the module. The other components are said to be hidden inside the module. The hidden components are used to implement the exported components.

Access to the components is often by a qualified name -- module name. component name. When strong safety considerations are important, modules using components of another module may be required to explicitly import the required module and the desired components.