[ << ] | [ < ] | [ Up ] | [ > ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
(note: for pedagogical reasons, the following tale is a little chronologically tangled.)
Once apon a time, a long time ago, in a galaxy far far away, memory was a set of bits. Programmers wrote little notes to themselves that bits 0 to 6 held a "name", that the next 32 bits were a pointer and the rest was an image bit map.
Then, at the end of WWII, Grace Hopper asked herself "for tools are needed to make programs easier to write?". Her answer was "the compiler", a tool that translates instructions written in a programming language into the binary coding of machine language (where the programming language was written in some kind of subset of English that anyone could read). Memory was no longer a string of bits, but "records" with "fields"; e.g.
;; COBOL records FD Employee-File RECORD CONTAINS 65 CHARACTERS DATA RECORD IS Employee-Rec. 01 Employee-Rec. 02 Employee-ID PIC X(10). 02 Employee-Name. 03 Last-Name PIC X(20). 03 First-Name PIC X(12). 03 Middle-Init PIC X. 02 Position. 03 Job-Code PIC X(4). 03 Department PIC X(3). 03 Manager-ID PIC X(10) 02 Hourly-Pay PIC 9(3)V99.
And so was born abstraction. It was realized that different records had similar fields (e.g. numbers, strings, dates) and each of these "types" came with a set of constraints. Some simple types can be built into the compiler (e.g. Integer, Real) and some others might be domain-specific. So compilers got augmented with type definitions:
/* types in pascal */ type Boolean= (true, false); Char = 'a' .. 'z'; Letter = (draft,sending,sent); var r: Real; i: Integer; c: Char; b: Boolean; t: Letter; e: (apple, pear, banana, orange, lemon);
Then it was realized that some types are complex combinations of simple types:
type a = Array [1..10] of Integer; b = record x: Integer; y: Char end; c = File of a; Nodeptr = ^Node; Node = record a: Integer; b: Char; c: Nodeptr end;
(In the above, the notation "^Node" is a special PASCAL type for "pointers").
Here’s another example:
(* complext types in OCAML *) type paragraph = Normal of par_text | Pre of string * string option | Heading of int * par_text | Quote of paragraph list | Ulist of paragraph list * paragraph list list | Olist of paragraph list * paragraph list list and par_text = text list and text = Text of string | Emph of string | Bold of string | Struck of par_text | Code of string | Link of href | Anchor of string | Image of img_ref and href = { href_target : string; href_desc : string; } and img_ref = { img_src : string; img_alt : string; }
Then it was realized that the types can be augmented with operations that apply just to those types.
-- complex numbers in LUA complex = {} function complex.new (r, i) return {r=r, i=i} end -- defines a constant `i' complex.i = complex.new(0, 1) function complex.add (c1, c2) return complex.new(c1.r + c2.r, c1.i + c2.i) end function complex.sub (c1, c2) return complex.new(c1.r - c2.r, c1.i - c2.i) end function complex.mul (c1, c2) return complex.new(c1.r*c2.r - c1.i*c2.i, c1.r*c2.i + c1.i*c2.r) end function complex.inv (c) local n = c.r^2 + c.i^2 return complex.new(c.r/n, -c.i/n) end return complex -- example c = complex.add(complex.i, complex.new(10, 20))
Then it was realized that packages of operations and types can units of parrallelism and concurrecny (it you pick up "X", you get all "X"’s internal state and operations). And that gave us simulation languages like Simula.
Then it was realized that types can have sub-types that specialize the operations (welcome to polymorphism and inheritance).
And some argued that should be no "primitive types" pre-defined and unalterable by the compiler- that everything should be customizable (welcome to Smalltalk)
And all through this sequence, the compilers got smarter and smarter about types and abstraction: if we know that only these operations are valid for this variable in this memory location, then we can block the application of the wrong operations to the wrong memory location.
Note that if we try to infer as much as possible about the space of what types are touched by some variable (of some other type), then we can detect programming errors before we run the program.
Which takes us to the modern functional languages that view runtime operations as the conversion of type1 to type2 (and, it type2 is some error type then "type checking" is still type conversion).
But the notion of types and abstraction applies to more than just OCaml, Haskell, etc. As shown below, this sequence of named memory locations of certain types prevades all modern languages.
So too our list of "what is common across all languages", we now add "types" and "abstraction" (and the rest of the list is "tests, paradigms" and, for some languages "closures" (a.k.a. lambda bodies)).
So lets go back to basics.
The ability to abstract and to generalize is an essential part of any intellectual activity. Abstraction and generalization are fundamental to mathematics and philosophy and are essential in computer science as well.
The importance of abstraction is derived from its ability to hide irrelevant details and from the use of names to reference objects. Programming languages provide abstraction through procedures, functions, and modules which permit the programmer to distinguish between what a program does and how it is implemented. The primary concern of the user of a program is with what it does. This is in contrast with the writer of the program whose primary concern is with how it is implemented. Abstraction is essential in the construction of programs. It places the emphasis on what an object is or does rather than how it is represented or how it works. Thus, it is the primary means of managing complexity in large programs.
Of no less importance is generalization. While abstraction reduces complexity by hiding irrelevant detail, generalization reduces complexity by replacing multiple entities which perform similar functions with a single construct. Programming languages provide generalization through variables, parameterization, generic and polymorphism. Generalization is essential in the construction of programs. It places the emphasis on the similarities between objects. Thus, it helps to manage complexity by collecting individuals into groups and providing a representative which can be used to specify any individual of the group.
Abstraction and generalization are often used together. Abstracts are generalized through parameterization to provide greater utility. In parameterization, one or more parts of an entity are replaced with a name which is new to the entity. The name is used as a parameter. When the parameterized abstract is invoked, it is invoked with a binding of the parameter to an argument.
Exercise: draw a mammal that is not a dog, cat, bear, human, etc...
Principle of Abstraction An abstract is a named entity which may be invoked by mentioning the name.
An object is said to be fully abstract if it can be understood without reference to any thing external to the object. So a fully abstract concept is a complete thing unto itself and can be studied seperately to the rest of the universe
Among the various terms for abstracts found in other texts are module, package, library, unit, subprogram, subroutine, routine, function, procedure, abstract type, object. So, as you can see, abstraction is a core concept to many languages.
Terminology. The naming aspect of abstraction is captured in the concepts of binding, definition and declaration while the hiding of irrelevant details is captured by the concept of encapsulation. A binding is an association of two entities:
We could equally well say identifier instead of name:
The concept of binding is common to all programming languages. The objects which may be bound to names are called the bindables of the language. The bindables may include: primitive values, compound values, references to variables, types, and executable abstractions. While binding occurs in definitions and declarations, it also occurs at the virtual and hardware machine levels between values and storage locations.
The abstract part of a binding often contains other bindings which are said to be local definitions. Such local definitions are not visible or available to be referenced outside of the abstract. Thus the abstract part of a binding involves “information hiding”. This hidden information is sometimes made available by exporting the names.
A module system provides a way of writing large program so that the various pieces of the program don’t interfere with on another because of name clashes and also provides a way of hiding implementation details. ... A module generally consists of two parts, the export part and the local part. The export part of a module consists of language declarations for the symbols available for use in either part of the module and in other modules which import them and module declaration giving the symbols from other modules which are available for use in either part of the module and in other modules which import them. The local part of a module consists of language declarations for the symbols available for use only in this part. TGPL-Hill and Lloyd
-- exporting names in LUA local P = {} -- shorthand for "complex"... save LOTS of typing complex = P local function checkComplex (c) if not ((type(c) == "table") and tonumber(c.r) and tonumber(c.i)) then error("bad complex number", 3) end end local function new (r, i) return {r=r, i=i} end local function add (c1, c2) checkComplex(c1); checkComplex(c2); return new(c1.r + c2.r, c1.i + c2.i) end ... complex = { new = new, add = add, sub = sub, mul = mul, div = div, }
To fully understand the LUA you ahve to understand that "new = new" binds a novel symbol to an existing symbol inside the package. Its hard to get your brain around but simple enough to do in LUA.
Here’s packages in SWI-PROLOG. You don’t need to understand the language- the only important detail here is that if someone else loads the following file, all they can call is something called "reverse" that takes two arguments. Any lower-level stuff (e.g. the fact that there is some sub-routine called "rev") is hidden away.
% packages in swi prolog :- module(reverse, [reverse/2]). reverse(List1, List2) :- rev(List1, [], List2). rev([], List, List). rev([Head|List1], List2, List3) :- rev(List1, [Head|List2], List3).
And in case you missed it- you’ve been using packages in LISP from day1. Ever noticed the prompt on the REPL?
CL-USER>
That says that you are working in the common-lisp user package. Every symbol (which includes variables and functions) in LISP exists in one package. Symbols from other packages can be imported . And this lets us divide up large stuff into smaller chunks with well-defined interfaces.
CL-USER> (in-package :com.gigamonkeys.email-db) #<The COM.GIGAMONKEYS.EMAIL-DB package> EMAIL-DB> EMAIL-DB> (defun hello-world () (format t "hello from EMAIL-DB package~%")) HELLO-WORLD EMAIL-DB> (hello-world) hello from EMAIL-DB package NIL EMAIL-DB> (in-package :cl-user) #<The COMMON-LISP-USER package> CL-USER> CL-USER> (hello-world) hello, world NIL
While the concept of modules is a useful abstraction, the full advantages of modules are gained only when modules may be written, compiled and possibly executed separately. In many cases modules should be able to be tested independently of other modules.
Advantages
Implementation
Typical applications:
An abstract data type (ADT) is a mathematical model for a certain class of data structures that have similar behavior; or for certain data types of one or more programming languages that have similar semantics.
An abstract data type is defined indirectly, only by the operations that may be performed on it and by mathematical constraints on the effects (and possibly cost) of those operations. For example:
-- complex numbers in LUA complex = {} function complex.new (r, i) return {r=r, i=i} end -- defines a constant `i' complex.i = complex.new(0, 1) function complex.add (c1, c2) return complex.new(c1.r + c2.r, c1.i + c2.i) end function complex.sub (c1, c2) return complex.new(c1.r - c2.r, c1.i - c2.i) end function complex.mul (c1, c2) return complex.new(c1.r*c2.r - c1.i*c2.i, c1.r*c2.i + c1.i*c2.r) end function complex.inv (c) local n = c.r^2 + c.i^2 return complex.new(c.r/n, -c.i/n) end return complex -- example c = complex.add(complex.i, complex.new(10, 20))
Weakly typed programming languages support either implicit type conversion:
;; e.g. Perl, Rexx a = 2 b = '2' concatenate(a, b) # Returns '22' add(a, b) # Returns 4
Stong typing:
;; e.g. JAVA, Python a = 2 b = '2' concatenate(str(a), b) # Returns '22' add(a, int(b)) # Returns 4 concatenate(a, b) # Type Error add(a, b) # Type Error
If you are not weak, you are strong:
The object-oriented programming languages Smalltalk, Ruby, Python, and Self are all "strongly typed" in the sense that typing errors are prevented at runtime and they do little implicit type conversion, but these languages make no use of static type checking: the compiler does not check or enforce type constraint rules.
Standard ML, Objective Caml and Haskell have purely static type systems, in which the compiler automatically infers a precise type for all values. These languages (along with most functional languages) are considered to have stronger type systems than Java, as they permit no implicit type conversions. While OCaml’s libraries allow one form of evasion (Object magic), this feature remains unused in most applications
A programming language is said to be dynamically typed when the majority of its type checking is performed at run-time as opposed to at compile-time.
In dynamic typing, values have types but variables do not; that is, a variable can refer to a value of any type.
Dynamically typed languages include APL, Erlang, Groovy, JavaScript, Lisp, Lua, MATLAB/GNU Octave, Perl (with respect to user-defined types but not built-in types), PHP, Prolog, Python, Ruby, Smalltalk and Tcl.
Compared to static typing, dynamic typing can be more flexible (e.g., by allowing programs to generate types and functionality based on run-time data), though at the expense of fewer a priori guarantees. This is because a dynamically typed language accepts and attempts to execute some programs which may be ruled as invalid by a static type checker.
e.g. Caml is expression-based, there are no pure "commands" like in Java/C++; instead, commands are also expressions, they return values.
# (if (2=3) then 5 else 6) + 1;; - : int = 7
One sometimes annoying consequence of the above is the two branches of the if need to return the same type.
if (2=3) then 5 else 6.5;; Characters 21-24:
This expression has type float but is here used with type int
Dynamic typing may result in runtime type errors—that is, at runtime, a value may have an unexpected type, and an operation nonsensical for that type is applied. This operation may occur long after the place where the programming mistake was made—that is, the place where the wrong type of data passed into a place it should not have. This may make the bug difficult to locate.
Q: If we break things up into seperate bits, each with invariants and internal details, how do we run over them all? Keeping internals private?
A: Iterators. Generic looping constructs that return "next" or "nil".
Example (if you understand the following, then you understand closures- a.k.a. lambda bodies- and you understand that LUA uses closures):
-- LUA: return an iterator for all items in a table of size table.getn(t) function list_iter (t) local i = 0 local n = table.getn(t) return function () i = i + 1 if i <= n then return t[i] end end end t = {10, 20, 30} for element in list_iter(t) do print(element) end
Note: in LUA, the "for" keyword knows about iterators.
A more complex example: returns all words in a file.
-- LUA function allwords () local line = io.read() -- current line local pos = 1 -- current position in the line return function () -- iterator function while line do -- repeat while there are lines local s, e = string.find(line, "%w+", pos) if s then -- found a word? pos = e + 1 -- next position is after this word return string.sub(line, s, e) -- return the word else line = io.read() -- word not found; try next line pos = 1 -- restart from first position end end return nil -- no more lines: end of traversal end end
The main part of the iterator function is the call to string.find. This call searches for a word in the current line, starting at the current position. It describes a "word" using the pattern ’%w+’, which matches one or more alphanumeric characters. If it finds the word, the function updates the current position to the first character after the word and returns that word. (The string.sub call extracts a substring from line between the given positions). Otherwise, the iterator reads a new line and repeats the search. If there are no more lines, it returns nil to signal the end of the iteration.
Despite its complexity, the use of allwords is straightforward:
-- LUA for word in allwords() do print(word) end
This is a common situation with iterators: They may be difficult to write, but are easy to use. This is not a big problem; more often than not, end users programming in Lua do not define iterators, but only use those provided by the application.
A block is a construct that delimits the scope of any definitions that it may contain. It provides a local environment i.e., a opportunity for local definitions. The block structure (the textual relationship between blocks) of a programming language has a great deal of influence over program structure and modularity. There are three basic block structures–monolithic, flat and nested.
A program has a monolithic block structure if it consists of just one block. This structure is typical of BASIC and early versions of COBOL. The monolithic structure is suitable only for small programs. The scope of every definition is the entire program. Typically all definitions are grouped in one place even if they are used in different parts of the program.
A program has a flat block structure if it is partitioned into distinct blocks, an outer all inclosing block one or more inner blocks i.e., the body may contain additional blocks but the inner blocks may not contain blocks. This structure is typical of FORTRAN and C. In these languages, all subprograms (procedures and functions) are separate, and each acts as a block. Variables can be declared inside a subprogram are then local to that subprogram. Subprogram names are part of the outer block and thus their scope is the entire program along with global variables. All subprogram names and global variables must be unique. If a variable cannot be local to a subprogram then it must be global and accessible by all subprograms even though it is used in only a couple of subprograms.
A program has nested block structure if blocks may be nested inside other blocks i.e., there is no restriction on the nesting of blocks within the body. This is typical of the block structure of the Algol-like languages. A block can be located close to the point of use. In blocks visibility is controlled by nesting. All names are visible (implicitly exported) to internally nested blocks. No names are visible (exported) to enclosing blocks. In a block, the only names visible are those that are declared in all enclosing blocks or are declared in the block, but not those declared in nested blocks.
In this example, procedure:
Note that
The three basic block structures are sufficient for what is called programming in the small (PITS). These are programs which are comprehensible in their entirety by an individual programmer. However, they are not general enough for very large programs. Large programs which are written by many individuals and which must consist of modules that can be developed and tested independently of other modules. Such programming is called programming in the large (PITL).
The act of partitioning a program raises the issue of the scope of names. Which objects with in the partition are to be visible outside the partition? The usual solution is to designate some names to be exported and others to be private or local to the partition and invisible to other partitions. In case there might be name conflict between exported names from partitions, partitions are often permitted to designate names that are to be imported from designated partitions or to qualify the name with the partition name. The scope rules for modules define relationships among the names within the partitions. There are four choices.
Internally, this means that the symbols are stored in a tree structure and when looking up a definition, you look locally in your branch and upwards:
Bindings may occur at various times from the point of language definition through program execution. The time at which the binding occurs is termed the binding time. Four distinct binding times may be distinguished.
Early binding often permits more efficient execution of programs though translation time type checking while late binding permits more flexibility through program modification a run-time.
Here’s one version of the Smalltalk class hierarchy. Note that in 116 Objects, nearly a third are OO models of errors.
1 Object | one ring to rule then all 2 Behavior 3 ClassDescription 4 Class | Class is an instance of MetaClass 5 Metaclass | MetaClass is an instance of itself 6 BlockClosure | e.g. [30 > 3] 7 Boolean 8 False | has one instance: false 9 True | has one instance: true 10 Directory 11 FilePath 12 File | File 13 Iterable 14 Collection 15 Bag | when adding, inc counts 16 HashedCollection 17 Dictionary | Set of Associations (key/value pairs) 18 ... 19 SystemDictionary | one instance: Smalltalk 20 LookupTable 21 IdentityDictionary | fast. uses == for equality 22 Set | unordered, no repeats 23 IdentitySet | fase. uses == for equality 24 SequenceableCollection 25 ArrayedCollection 26 Array | fixed size, can't grow 27 CharacterArray 28 String 29 Symbol | memory effecient Strings 30 UnicodeString 31 Interval | start to stop by step 32 LinkedList | an old friend 33 Semaphore 34 OrderedCollection | closest thing we have to a list 35 SortedCollection | sort as a side effect of addition 36 Stream | pointers into a Collection 37 FileDescriptor 38 FileStream 39 Generator 40 PeekableStream 41 PositionableStream 42 ReadStream 43 WriteStream 44 ReadWriteStream | toss of the coin 45 Random 46 RoundRobinStream 47 Magnitude | anything that can be compared with "<" 48 Character 49 UnicodeCharacter 50 Date 51 DateTime 52 LookupKey 53 Association 54 Number 55 Float 56 Fraction | uses two intergers 57 Integer 58 LargeInteger 59 LargeNegativeInteger 60 LargePositiveInteger 61 LargeZeroInteger 62 SmallInteger 63 Time 64 Duration 65 Point 66 Regex 67 Signal | OO error handling 68 Exception 69 Error 70 ArithmeticError 71 ZeroDivide 72 InvalidValue 73 EmptyCollection 74 InvalidArgument 75 AlreadyDefined 76 ArgumentOutOfRange 77 IndexOutOfRange 78 InvalidSize 79 NotFound 80 PackageNotAvailable 81 InvalidProcessState 82 NotIndexable 83 ProcessTerminated 84 ReadOnlyObject 85 WrongClass 86 MustBeBoolean 87 MessageNotUnderstood 88 MutationError 89 NotEnoughElements 90 NotImplemented 91 NotYetImplemented 92 ShouldNotImplement 93 SubclassResponsibility 94 WrongMessageSent 95 VMError 96 BadReturn 97 NoRunnableProcess 98 PrimitiveFailed 99 CInterfaceError 100 FileError 101 WrongArgumentCount 102 SecurityError 103 UserInterrupt 104 VerificationError 105 Halt 106 Notification 107 EndOfStream 108 PackageSkip 109 ProcessBeingTerminated 110 Warning 111 UnhandledException 112 UndefinedObject | one instance: nil
A type signature defines the inputs and outputs for a function, subroutine or method.
A type signature includes at least the function name and the number of its arguments.
In some programming languages, it may also specify the function’s return type, the types of its arguments, or errors it may pass back.
// Ocaml #let rec fib n = if n < 2 then 1 else fib(n-1) + fib(n-2);; val fib : int -> int = <fun>
decrement is -- Decrease counter by one. require item > 0 -- pre-condition do item := item - 1 ensure item = old item - 1 -- post-condition end
Descend the hierarchy, make the types more specific, covers few examples (e.g. mammals -> dogs).
Acsend the hierachy, make the types more general, cover more examples (e.g. OldMan -> Man -> Object).
Note: another way to write pre- post- conditions is to strengthen and weaken signatures
If S is a subtype of T, then:
A.k.a. strong behavioural subtyping
Example of violation:
Hard to prove (to say the least):
A module is a program unit which is an (more or less) independent entity. A module consists of a number of definitions (of types, variables, functions, procedures and so on), with a clearly defined interface stating what it exports to other modules which use it. Modules have a number of advantages for the construction of large programs.
Modules are used to construct libraries, ADTs, classes, interfaces, and implementations. A module is the compilation unit. A module which contains only type abstractions is a specification or interface module.
In program construction the module designer must answer the following questions.
Programming in the large is concerned with programs that are not comprehensible by a single individual and are developed by teams of programmers. At this level programs must consist of modules that can be written, compiled, and tested independently of other modules. A module has a single purpose, and has a narrow interface to other modules. It is likely to be reusable (able to be incorporated into may programs) and modifiable with out forcing changes in other modules.
Modules must provide answers to two questions:
The what is of concern to the user of the module while the how is of concern to the implementer of the module.
Functions and procedures are simple modules. Their signature is a description of what they do while their body describes how it is achieved. More typically a module encapsulates a group of components such as types, constants, variables, procedures, functions and so on.
To present a narrow interface to other modules, a module makes only a few components visible outside. Such components are said to be exported by the module. The other components are said to be hidden inside the module. The hidden components are used to implement the exported components.
Access to the components is often by a qualified name – module name. component name. When strong safety considerations are important, modules using components of another module may be required to explicitly import the required module and the desired components.
[ << ] | [ < ] | [ Up ] | [ > ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
This document was generated on April 19, 2011 using texi2html 5.0.