cs310: Programming Languages: 2.8 Abstraction

2.8 Abstraction

A Little History

(note: for pedagogical reasons, the following tale is a little chronologically tangled.)

Once apon a time, a long time ago, in a galaxy far far away, memory was a set of bits. Programmers wrote little notes to themselves that bits 0 to 6 held a "name", that the next 32 bits were a pointer and the rest was an image bit map.

Then, at the end of WWII, Grace Hopper asked herself "for tools are needed to make programs easier to write?". Her answer was "the compiler", a tool that translates instructions written in a programming language into the binary coding of machine language (where the programming language was written in some kind of subset of English that anyone could read). Memory was no longer a string of bits, but "records" with "fields"; e.g.

;; COBOL records
FD Employee-File
     RECORD CONTAINS 65 CHARACTERS
     DATA RECORD IS Employee-Rec.

01 Employee-Rec. 
   02 Employee-ID PIC X(10). 
   02 Employee-Name. 
      03 Last-Name PIC X(20). 
      03 First-Name PIC X(12). 
      03 Middle-Init PIC X. 
   02 Position. 
      03 Job-Code PIC X(4). 
      03 Department PIC X(3). 
      03 Manager-ID PIC X(10)
   02 Hourly-Pay PIC 9(3)V99.

And so was born abstraction. It was realized that different records had similar fields (e.g. numbers, strings, dates) and each of these "types" came with a set of constraints. Some simple types can be built into the compiler (e.g. Integer, Real) and some others might be domain-specific. So compilers got augmented with type definitions:

/* types in pascal */
type  
  Boolean= (true, false);
  Char = 'a' .. 'z';
  Letter = (draft,sending,sent);
var
  r: Real;
  i: Integer;
  c: Char;
  b: Boolean;
  t: Letter;
  e: (apple, pear, banana, orange, lemon);

Then it was realized that some types are complex combinations of simple types:

type 
  a = Array [1..10] of Integer;
  b = record
        x: Integer;
        y: Char
      end;
  c = File of a;
  Nodeptr = ^Node;
  Node = record
        a: Integer;
        b: Char;
        c: Nodeptr
      end;

(In the above, the notation "^Node" is a special PASCAL type for "pointers").

Here’s another example:

(* complext types in OCAML *)

type paragraph =
    Normal of par_text
  | Pre of string * string option
  | Heading of int * par_text
  | Quote of paragraph list
  | Ulist of paragraph list * paragraph list list
  | Olist of paragraph list * paragraph list list

and par_text = text list

and text =
    Text of string
  | Emph of string
  | Bold of string
  | Struck of par_text
  | Code of string
  | Link of href
  | Anchor of string
  | Image of img_ref

and href = { href_target : string; href_desc : string; }

and img_ref = { img_src : string; img_alt : string; }

Then it was realized that the types can be augmented with operations that apply just to those types.

 -- complex numbers in LUA

 complex = {}
    
    function complex.new (r, i) return {r=r, i=i} end
    
    -- defines a constant `i'
    complex.i = complex.new(0, 1)
    
    function complex.add (c1, c2) return complex.new(c1.r + c2.r, c1.i + c2.i) end
    
    function complex.sub (c1, c2) return complex.new(c1.r - c2.r, c1.i - c2.i) end
    
    function complex.mul (c1, c2)
      return complex.new(c1.r*c2.r - c1.i*c2.i,
                         c1.r*c2.i + c1.i*c2.r)
    end
    
    function complex.inv (c)
      local n = c.r^2 + c.i^2
      return complex.new(c.r/n, -c.i/n)
    end
    
    return complex

   -- example 
   c = complex.add(complex.i, complex.new(10, 20))

Then it was realized that packages of operations and types can units of parrallelism and concurrecny (it you pick up "X", you get all "X"’s internal state and operations). And that gave us simulation languages like Simula.

Then it was realized that types can have sub-types that specialize the operations (welcome to polymorphism and inheritance).

And some argued that should be no "primitive types" pre-defined and unalterable by the compiler- that everything should be customizable (welcome to Smalltalk)

And all through this sequence, the compilers got smarter and smarter about types and abstraction: if we know that only these operations are valid for this variable in this memory location, then we can block the application of the wrong operations to the wrong memory location.

Note that if we try to infer as much as possible about the space of what types are touched by some variable (of some other type), then we can detect programming errors before we run the program.

Which takes us to the modern functional languages that view runtime operations as the conversion of type1 to type2 (and, it type2 is some error type then "type checking" is still type conversion).

But the notion of types and abstraction applies to more than just OCaml, Haskell, etc. As shown below, this sequence of named memory locations of certain types prevades all modern languages.

So too our list of "what is common across all languages", we now add "types" and "abstraction" (and the rest of the list is "tests, paradigms" and, for some languages "closures" (a.k.a. lambda bodies)).

So lets go back to basics.

What is Abstraction

The ability to abstract and to generalize is an essential part of any intellectual activity. Abstraction and generalization are fundamental to mathematics and philosophy and are essential in computer science as well.

The importance of abstraction is derived from its ability to hide irrelevant details and from the use of names to reference objects. Programming languages provide abstraction through procedures, functions, and modules which permit the programmer to distinguish between what a program does and how it is implemented. The primary concern of the user of a program is with what it does. This is in contrast with the writer of the program whose primary concern is with how it is implemented. Abstraction is essential in the construction of programs. It places the emphasis on what an object is or does rather than how it is represented or how it works. Thus, it is the primary means of managing complexity in large programs.

Of no less importance is generalization. While abstraction reduces complexity by hiding irrelevant detail, generalization reduces complexity by replacing multiple entities which perform similar functions with a single construct. Programming languages provide generalization through variables, parameterization, generic and polymorphism. Generalization is essential in the construction of programs. It places the emphasis on the similarities between objects. Thus, it helps to manage complexity by collecting individuals into groups and providing a representative which can be used to specify any individual of the group.

Abstraction and generalization are often used together. Abstracts are generalized through parameterization to provide greater utility. In parameterization, one or more parts of an entity are replaced with a name which is new to the entity. The name is used as a parameter. When the parameterized abstract is invoked, it is invoked with a binding of the parameter to an argument.

Exercise: draw a mammal that is not a dog, cat, bear, human, etc...

Principle of Abstraction An abstract is a named entity which may be invoked by mentioning the name.

An object is said to be fully abstract if it can be understood without reference to any thing external to the object. So a fully abstract concept is a complete thing unto itself and can be studied seperately to the rest of the universe

Among the various terms for abstracts found in other texts are module, package, library, unit, subprogram, subroutine, routine, function, procedure, abstract type, object. So, as you can see, abstraction is a core concept to many languages.

Terminology. The naming aspect of abstraction is captured in the concepts of binding, definition and declaration while the hiding of irrelevant details is captured by the concept of encapsulation. A binding is an association of two entities:

A definition is a binding of a name to an entity;
A declaration is a definition which binds a name to a variable;
An assignment is a binding of a value and a variable.

We could equally well say identifier instead of name:

Names are bound to variables in declaration statements.
A variable is an entity whose value is not fixed but may vary.

Binding

The concept of binding is common to all programming languages. The objects which may be bound to names are called the bindables of the language. The bindables may include: primitive values, compound values, references to variables, types, and executable abstractions. While binding occurs in definitions and declarations, it also occurs at the virtual and hardware machine levels between values and storage locations.

Encapsulation

The abstract part of a binding often contains other bindings which are said to be local definitions. Such local definitions are not visible or available to be referenced outside of the abstract. Thus the abstract part of a binding involves “information hiding”. This hidden information is sometimes made available by exporting the names.

A module system provides a way of writing large program so that the various pieces of the program don’t interfere with on another because of name clashes and also provides a way of hiding implementation details. ... A module generally consists of two parts, the export part and the local part. The export part of a module consists of language declarations for the symbols available for use in either part of the module and in other modules which import them and module declaration giving the symbols from other modules which are available for use in either part of the module and in other modules which import them. The local part of a module consists of language declarations for the symbols available for use only in this part. TGPL-Hill and Lloyd

-- exporting names in LUA
    local P = {} -- shorthand for "complex"... save LOTS of typing
    complex = P

    local function checkComplex (c)
      if not ((type(c) == "table")
         and tonumber(c.r) and tonumber(c.i)) then
        error("bad complex number", 3)
      end
    end
    
    local function new (r, i) return {r=r, i=i} end
    
    local function add (c1, c2)
      checkComplex(c1);
      checkComplex(c2);
      return new(c1.r + c2.r, c1.i + c2.i)
    end
    
      ...
    
    complex = {
      new = new,
      add = add,
      sub = sub,
      mul = mul,
      div = div,
    }

To fully understand the LUA you ahve to understand that "new = new" binds a novel symbol to an existing symbol inside the package. Its hard to get your brain around but simple enough to do in LUA.

Here’s packages in SWI-PROLOG. You don’t need to understand the language- the only important detail here is that if someone else loads the following file, all they can call is something called "reverse" that takes two arguments. Any lower-level stuff (e.g. the fact that there is some sub-routine called "rev") is hidden away.

% packages in swi prolog
:- module(reverse, [reverse/2]).

reverse(List1, List2) :-
        rev(List1, [], List2).

rev([], List, List).
rev([Head|List1], List2, List3) :-
        rev(List1, [Head|List2], List3).

And in case you missed it- you’ve been using packages in LISP from day1. Ever noticed the prompt on the REPL?

CL-USER>

That says that you are working in the common-lisp user package. Every symbol (which includes variables and functions) in LISP exists in one package. Symbols from other packages can be imported . And this lets us divide up large stuff into smaller chunks with well-defined interfaces.

CL-USER> (in-package :com.gigamonkeys.email-db)
#<The COM.GIGAMONKEYS.EMAIL-DB package>
EMAIL-DB> 
EMAIL-DB> (defun hello-world () (format t "hello from EMAIL-DB package~%"))
HELLO-WORLD
EMAIL-DB> (hello-world)
hello from EMAIL-DB package
NIL
EMAIL-DB> (in-package :cl-user)
#<The COMMON-LISP-USER package>
CL-USER> 
CL-USER> (hello-world)
hello, world
NIL

While the concept of modules is a useful abstraction, the full advantages of modules are gained only when modules may be written, compiled and possibly executed separately. In many cases modules should be able to be tested independently of other modules.

Advantages

reduction in complexity
team programming maintainability
re-usability of code project management

Implementation

common storage area – Fortran
include directive – C++
subroutine library

Typical applications:

subroutine packages – mathematical, statistical etc
ADTs

Abstract Data Types

An abstract data type (ADT) is a mathematical model for a certain class of data structures that have similar behavior; or for certain data types of one or more programming languages that have similar semantics.

An abstract data type is defined indirectly, only by the operations that may be performed on it and by mathematical constraints on the effects (and possibly cost) of those operations. For example:

an abstract stack data structure could be defined by two operations: push, that inserts some data item into the structure, and pop, that extracts an item from it; with the constraint that each pop always returns the most recently pushed item that has not been popped yet.
When analyzing the efficiency of algorithms that use stacks, one may also specify that both operations take the same time no matter how many items have been pushed into the stack, and that the stack uses a constant amount of storage for each element.

 -- complex numbers in LUA

 complex = {}
    
    function complex.new (r, i) return {r=r, i=i} end
    
    -- defines a constant `i'
    complex.i = complex.new(0, 1)
    
    function complex.add (c1, c2) return complex.new(c1.r + c2.r, c1.i + c2.i) end
    
    function complex.sub (c1, c2) return complex.new(c1.r - c2.r, c1.i - c2.i) end
    
    function complex.mul (c1, c2)
      return complex.new(c1.r*c2.r - c1.i*c2.i,
                         c1.r*c2.i + c1.i*c2.r)
    end
    
    function complex.inv (c)
      local n = c.r^2 + c.i^2
      return complex.new(c.r/n, -c.i/n)
    end
    
    return complex

   -- example 
   c = complex.add(complex.i, complex.new(10, 20))

Weak vs Strong Types

Weakly typed programming languages support either implicit type conversion:

;; e.g. Perl, Rexx 
a = 2
b = '2'
 
concatenate(a, b) # Returns '22'
add(a, b)         # Returns 4

Stong typing:

;; e.g. JAVA, Python
a = 2
b = '2'
 
concatenate(str(a), b) # Returns '22'
add(a, int(b))         # Returns 4
concatenate(a, b)     # Type Error
add(a, b)             # Type Error

If you are not weak, you are strong:

Strong guarantees about the run-time behavior of a program before program execution, whether provided by static analysis, the execution semantics of the language or another mechanism.
Compile-time checks for type constraint violations. That is, the compiler ensures that operations only occur on operand types that are valid for the operation.
At compile or run time you can rejection operations or function calls which attempt to disregard data types.
Fixed and invariable typing of data objects. The type of a given data object does not vary over that object’s lifetime. For example, class instances may not have their class altered.
A complex, fine-grained type system with compound types.

The object-oriented programming languages Smalltalk, Ruby, Python, and Self are all "strongly typed" in the sense that typing errors are prevented at runtime and they do little implicit type conversion, but these languages make no use of static type checking: the compiler does not check or enforce type constraint rules.

The term duck typing is used to describe the dynamic typing paradigm used by the languages in this group.

Standard ML, Objective Caml and Haskell have purely static type systems, in which the compiler automatically infers a precise type for all values. These languages (along with most functional languages) are considered to have stronger type systems than Java, as they permit no implicit type conversions. While OCaml’s libraries allow one form of evasion (Object magic), this feature remains unused in most applications

Dynamic vs Static Typing

A programming language is said to be dynamically typed when the majority of its type checking is performed at run-time as opposed to at compile-time.

In dynamic typing, values have types but variables do not; that is, a variable can refer to a value of any type.

Dynamically typed languages include APL, Erlang, Groovy, JavaScript, Lisp, Lua, MATLAB/GNU Octave, Perl (with respect to user-defined types but not built-in types), PHP, Prolog, Python, Ruby, Smalltalk and Tcl.

Compared to static typing, dynamic typing can be more flexible (e.g., by allowing programs to generate types and functionality based on run-time data), though at the expense of fewer a priori guarantees. This is because a dynamically typed language accepts and attempts to execute some programs which may be ruled as invalid by a static type checker.

e.g. Caml is expression-based, there are no pure "commands" like in Java/C++; instead, commands are also expressions, they return values.

# (if (2=3) then 5 else 6) + 1;;
- : int = 7

One sometimes annoying consequence of the above is the two branches of the if need to return the same type.

if (2=3) then 5 else 6.5;;
Characters 21-24:

This expression has type float but is here used with type int

Dynamic typing may result in runtime type errors—that is, at runtime, a value may have an unexpected type, and an operation nonsensical for that type is applied. This operation may occur long after the place where the programming mistake was made—that is, the place where the wrong type of data passed into a place it should not have. This may make the bug difficult to locate.

Iterators

Q: If we break things up into seperate bits, each with invariants and internal details, how do we run over them all? Keeping internals private?

A: Iterators. Generic looping constructs that return "next" or "nil".

While next not nil
Process the next

Example (if you understand the following, then you understand closures- a.k.a. lambda bodies- and you understand that LUA uses closures):

-- LUA: return an iterator for  all items in a table of size table.getn(t)
  function list_iter (t)
      local i = 0
      local n = table.getn(t)
      return function ()
               i = i + 1
               if i <= n then return t[i] end
             end
  end

  t = {10, 20, 30}
  for element in list_iter(t) do
      print(element)
  end

Note: in LUA, the "for" keyword knows about iterators.

A more complex example: returns all words in a file.

   -- LUA
   function allwords ()
      local line = io.read()  -- current line
      local pos = 1           -- current position in the line
      return function ()      -- iterator function
        while line do         -- repeat while there are lines
          local s, e = string.find(line, "%w+", pos)
          if s then           -- found a word?
            pos = e + 1       -- next position is after this word
            return string.sub(line, s, e)     -- return the word
          else
            line = io.read()  -- word not found; try next line
            pos = 1           -- restart from first position
          end
        end
        return nil            -- no more lines: end of traversal
      end
    end

The main part of the iterator function is the call to string.find. This call searches for a word in the current line, starting at the current position. It describes a "word" using the pattern ’%w+’, which matches one or more alphanumeric characters. If it finds the word, the function updates the current position to the first character after the word and returns that word. (The string.sub call extracts a substring from line between the given positions). Otherwise, the iterator reads a new line and repeats the search. If there are no more lines, it returns nil to signal the end of the iteration.

Despite its complexity, the use of allwords is straightforward:

    -- LUA
    for word in allwords() do
      print(word)
    end

This is a common situation with iterators: They may be difficult to write, but are easy to use. This is not a big problem; more often than not, end users programming in Lua do not define iterators, but only use those provided by the application.

Block structure

A block is a construct that delimits the scope of any definitions that it may contain. It provides a local environment i.e., a opportunity for local definitions. The block structure (the textual relationship between blocks) of a programming language has a great deal of influence over program structure and modularity. There are three basic block structures–monolithic, flat and nested.

A program has a monolithic block structure if it consists of just one block. This structure is typical of BASIC and early versions of COBOL. The monolithic structure is suitable only for small programs. The scope of every definition is the entire program. Typically all definitions are grouped in one place even if they are used in different parts of the program.

Flat Blocks

A program has a flat block structure if it is partitioned into distinct blocks, an outer all inclosing block one or more inner blocks i.e., the body may contain additional blocks but the inner blocks may not contain blocks. This structure is typical of FORTRAN and C. In these languages, all subprograms (procedures and functions) are separate, and each acts as a block. Variables can be declared inside a subprogram are then local to that subprogram. Subprogram names are part of the outer block and thus their scope is the entire program along with global variables. All subprogram names and global variables must be unique. If a variable cannot be local to a subprogram then it must be global and accessible by all subprograms even though it is used in only a couple of subprograms.

Nested Blocks

A program has nested block structure if blocks may be nested inside other blocks i.e., there is no restriction on the nesting of blocks within the body. This is typical of the block structure of the Algol-like languages. A block can be located close to the point of use. In blocks visibility is controlled by nesting. All names are visible (implicitly exported) to internally nested blocks. No names are visible (exported) to enclosing blocks. In a block, the only names visible are those that are declared in all enclosing blocks or are declared in the block, but not those declared in nested blocks.

In this example, procedure:

P2 is called only by P1 , and need not be visible outside. It is therefore declared inside P1 , limiting its scope (its region of visibility) to the portion of the program shown here.
In a similar fashion, P4 is visible only within P1 , P3 is visible only within P2 , and F1 is visible only within P4 .
Under the standard rules for nested scopes, F1 could call P2 , and P4 could call F1 , but P2 could not call F1 .
Though they are hidden from the rest of the program, nested subroutines are able to access the parameters and local variables (and other local objects) of the surrounding scope(s).
In our example, P3 can name (and modify) A1 , X , and A2 , in addition to A3 .
Because P1 and F1 both declare local variables named X , the inner declaration hides the outer one within a portion of its scope.
Uses of X in F1 refer to the inner X ; uses of X in other regions of the code shown here refer to the outer X.

Note that

A local name is one that is declared within a block for use only within that block.
A global name is a name that when referenced within a block refers to a name declared outside the block.
An activation of a block is a time interval during which that block is being executed (and an activation block stores the local variables)

The three basic block structures are sufficient for what is called programming in the small (PITS). These are programs which are comprehensible in their entirety by an individual programmer. However, they are not general enough for very large programs. Large programs which are written by many individuals and which must consist of modules that can be developed and tested independently of other modules. Such programming is called programming in the large (PITL).

Scope Rules

The act of partitioning a program raises the issue of the scope of names. Which objects with in the partition are to be visible outside the partition? The usual solution is to designate some names to be exported and others to be private or local to the partition and invisible to other partitions. In case there might be name conflict between exported names from partitions, partitions are often permitted to designate names that are to be imported from designated partitions or to qualify the name with the partition name. The scope rules for modules define relationships among the names within the partitions. There are four choices.

All local names visible globally.
All external names visible locally.
Only local explicitly exported names visible globally.
Only external names explicitly imported are visible locally.
Name conflict is resolved via qualification with the partition name.

Internally, this means that the symbols are stored in a tree structure and when looking up a definition, you look locally in your branch and upwards:

Bindings and Binding Times

Bindings may occur at various times from the point of language definition through program execution. The time at which the binding occurs is termed the binding time. Four distinct binding times may be distinguished.

Language design time. Much of the structure of a programming language is fixed and language design time. Data types, data structures, command and expression forms, and program structure are examples of language features that are fixed at language design time. Most programming languages make provision for extending the language by providing for programmer defined data types, expressions and commands.
Language implementation time. Some language features are determined by the implementation. Programs that run on one computer may not run or give incorrect results when run on another machine. This occurs when the hardware differs in its representation of numbers and arithmetic operations. For example, the maxint of Pascal is determined by the implementation. The C programming language provides access to the underlying machine and therefore programs which depend on the characteristics of the underlying machine may not perform as expected when moved to another machine.
Program translation time. The binding between the source code and the object code occurs at program translation time. Programmer defined variables and types are another example of bindings that occur at program translation time.
Program execution time. Binding of values to variables and formal parameters to actual parameters occur during program execution.

Early binding often permits more efficient execution of programs though translation time type checking while late binding permits more flexibility through program modification a run-time.

Misc Topics

Exception Classes

Here’s one version of the Smalltalk class hierarchy. Note that in 116 Objects, nearly a third are OO models of errors.

     1	Object	                  | one ring to rule then all
     2	    Behavior
     3	        ClassDescription
     4	            Class                | Class is an instance of MetaClass
     5	            Metaclass            | MetaClass is an instance of itself
     6	    BlockClosure                 | e.g. [30 > 3] 
     7	    Boolean
     8	        False                    | has one instance: false
     9	        True                     | has one instance: true
    10	    Directory
    11	    FilePath                       
    12	        File                     | File
    13	    Iterable
    14	        Collection
    15	            Bag                  | when  adding, inc counts
    16	            HashedCollection
    17	                Dictionary       | Set of Associations (key/value pairs)
    18	                        ...
    19	                             SystemDictionary | one instance: Smalltalk
    20	                    LookupTable
    21	                        IdentityDictionary    | fast. uses == for equality
    22	                Set                           | unordered, no repeats
    23	                    IdentitySet               | fase. uses == for equality
    24	            SequenceableCollection
    25	                ArrayedCollection
    26	                    Array                     | fixed size, can't grow
    27	                    CharacterArray
    28	                        String
    29	                            Symbol            | memory effecient Strings
    30	                        UnicodeString
    31	                    Interval                  | start to stop by step
    32	                LinkedList                    | an old friend
    33	                    Semaphore
    34	                OrderedCollection             | closest thing we have to a list
    35	                    SortedCollection          | sort as a side effect of addition
    36	        Stream                                | pointers into a Collection
    37	            FileDescriptor
    38	                FileStream
    39	            Generator
    40	            PeekableStream
    41	            PositionableStream
    42	                ReadStream
    43	                WriteStream
    44	                    ReadWriteStream | toss of the coin
    45	            Random
    46	            RoundRobinStream
    47	    Magnitude               | anything that can be compared with "<"
    48	        Character
    49	            UnicodeCharacter
    50	        Date
    51	            DateTime
    52	        LookupKey
    53	            Association
    54	        Number
    55	            Float
    56	            Fraction   | uses two intergers
    57	            Integer
    58	                LargeInteger
    59	                    LargeNegativeInteger
    60	                    LargePositiveInteger
    61	                        LargeZeroInteger
    62	                SmallInteger
    63	        Time
    64	            Duration
    65	    Point
    66	    Regex
    67	    Signal    | OO error handling
    68	        Exception
    69	            Error
    70	                ArithmeticError
    71	                    ZeroDivide
    72	                InvalidValue
    73	                    EmptyCollection
    74	                    InvalidArgument
    75	                        AlreadyDefined
    76	                        ArgumentOutOfRange
    77	                            IndexOutOfRange
    78	                        InvalidSize
    79	                        NotFound
    80	                            PackageNotAvailable
    81	                    InvalidProcessState
    82	                    NotIndexable
    83	                    ProcessTerminated
    84	                    ReadOnlyObject
    85	                    WrongClass
    86	                        MustBeBoolean
    87	                MessageNotUnderstood
    88	                MutationError
    89	                NotEnoughElements
    90	                NotImplemented
    91	                    NotYetImplemented
    92	                    ShouldNotImplement
    93	                        SubclassResponsibility
    94	                        WrongMessageSent
    95	                VMError
    96	                    BadReturn
    97	                    NoRunnableProcess
    98	                    PrimitiveFailed
    99	                        CInterfaceError
   100	                        FileError
   101	                        WrongArgumentCount
   102	                    SecurityError
   103	                    UserInterrupt
   104	                    VerificationError
   105	            Halt
   106	            Notification
   107	                EndOfStream
   108	                PackageSkip
   109	                ProcessBeingTerminated
   110	                Warning
   111	            UnhandledException
   112	    UndefinedObject     | one instance: nil

Signatures

A type signature defines the inputs and outputs for a function, subroutine or method.

A type signature includes at least the function name and the number of its arguments.

In some programming languages, it may also specify the function’s return type, the types of its arguments, or errors it may pass back.

// Ocaml
#let rec fib n =
   if n < 2 then 1 else fib(n-1) + fib(n-2);;
val fib : int -> int = <fun>

Pre-conditions, Post-conditions

decrement is  -- Decrease counter by one.
     require  
             item > 0           -- pre-condition
     do
            item := item - 1
     ensure
            item = old item - 1 -- post-condition
     end

Types as Conditions.

Descend the hierarchy, make the types more specific, covers few examples (e.g. mammals -> dogs).

Acsend the hierachy, make the types more general, cover more examples (e.g. OldMan -> Man -> Object).

Note: another way to write pre- post- conditions is to strengthen and weaken signatures

e.g. weaken pre-condition: make the input types less specific (higher in the class hierarchy)
e.g. strengthen post-condition: make the return class more specific (lower in the type hierarchy)

Liskov substitution principle

If S is a subtype of T, then:

Objects of type T may be replaced with objects of type S (i.e., objects of type S may be substitutes for objects of type T),
without altering any of the desirable properties of that program (correctness, task performed, etc.).

A.k.a. strong behavioural subtyping

It is a semantic rather than merely syntactic relation;
Intends to guarantee semantic interoperability of types in a hierarchy, object types in particular.
Formally:
- Let q(x) be a property provable about objects x of type T.
- Then q(y) should be true for objects y of type S where S is a subtype of T.

Example of violation:

Square isa Rectangle
Rectanlge has getter and setter methods exist for both width and height.
Square always assumes that the width is equal with the height.
But someone uses the settings inherited from Rectangle to make the Square oblong.
Note: hard to fix:
- If we can modify the setter methods in the Square class so that they preserve the Square invariant (i.e. keep the dimensions equal) ...
- ... then these methods will weaken (violate) the postconditions for the Rectangle setters, which state that dimensions can be modified independently.
- Mutability is a key issue here. If Square and Rectangle had only getter methods (i.e. they were immutable objects), then no violation of LSP could occur.

Hard to prove (to say the least):

If q is the property "method for always terminates", then it is impossible for a program (compiler) to verify that it holds true for some subtype S of T even if q does hold for T.
The principle is useful however in reasoning about the design of class hierarchies:
- Preconditions cannot be strengthened in a subtype.
- Postconditions cannot be weakened in a subtype.
- Invariants of the supertype must be preserved in a subtype.

Modules

A module is a program unit which is an (more or less) independent entity. A module consists of a number of definitions (of types, variables, functions, procedures and so on), with a clearly defined interface stating what it exports to other modules which use it. Modules have a number of advantages for the construction of large programs.

Modules facilitate parallel and independent development
Modules facilitate separate compilation
Modules facilitate code reuse

Modules are used to construct libraries, ADTs, classes, interfaces, and implementations. A module is the compilation unit. A module which contains only type abstractions is a specification or interface module.

In program construction the module designer must answer the following questions.

What is the module’s purpose?
How does it achieve that purpose?

Programming in the large is concerned with programs that are not comprehensible by a single individual and are developed by teams of programmers. At this level programs must consist of modules that can be written, compiled, and tested independently of other modules. A module has a single purpose, and has a narrow interface to other modules. It is likely to be reusable (able to be incorporated into may programs) and modifiable with out forcing changes in other modules.

Modules must provide answers to two questions:

What is the purpose of the module?
How does it achieve that purpose?

The what is of concern to the user of the module while the how is of concern to the implementer of the module.

Functions and procedures are simple modules. Their signature is a description of what they do while their body describes how it is achieved. More typically a module encapsulates a group of components such as types, constants, variables, procedures, functions and so on.

To present a narrow interface to other modules, a module makes only a few components visible outside. Such components are said to be exported by the module. The other components are said to be hidden inside the module. The hidden components are used to implement the exported components.

Access to the components is often by a qualified name – module name. component name. When strong safety considerations are important, modules using components of another module may be required to explicitly import the required module and the desired components.

This document was generated on April 19, 2011 using texi2html 5.0.