What is a type?
A computation is a sequence of operations applied to a value to yield a value. Thus values and operations are fundamental to computation. Values are the subject of this chapter and operations are the subject of later chapters.
In mathematical terminology, the sets from which the arguments and results of a function are taken are known as the function's ``domain'' and ``codomain'', respectively. Consequently, the term domain will denote any set of values that can be passed as arguments or returned as results. Associated with every domain are certain ``essential'' operations. For example, the domain of natural numbers is equipped with an the ``constant'' operation which produces the number zero and the operation that constructs the successor of any number. Additional operations (such as addition and multiplication) on the natural numbers may be defined using these basic operations.
Programming languages utilize a rich set of domains. Truth values, characters, integers, reals, records, arrays, sets, files, pointers, procedure and function abstractions, environments, commands, and definitions are but some of the domains that are found in programming languages. There are two approaches to domains. One approach is to assume the existence of a universal domain. It contains all those objects which are of computational interest. The second approach is to begin with a small set of values and some rules for combining the values and then to construct the universe of values. Programming languages follow the second approach by providing several basic sets of values and a set of domain constructors from which additional domains may be constructed.
Domains are categorized as primitive or compound. A primitive domain is a set that is fundamental to the application being studied. Its elements are atomic. A compound domain is a set whose values are constructed from existing domains by one or more domain constructors.
Aside. It is common in mathematics to define a set but fail to give an effective method for determining membership in the set. Computer science on the other hand is concerned with determining membership with in a finite number of steps. In addition, a program is often constrained by requirements to complete its work with in bounds of time and space. % In computer science ... streams ... infinite sequences ... % halting problem ... robust
Terminology. Domain theory is the study of structured sets and their operations. A domain is a set of elements and an accompanying set of operations defined on the domain.The terms domain, type, and data type may be used interchangeably.
The term data refers to either an element of a domain or a collection of elements from one or more domains.
The terms compound, composite and structured when applied to values, data, domains, types are used interchangeably.
Compound domains are constructed by a domain builder. A domain builder consists of a set of operations for assembling and disassembling elements of a compound domain. The domain builders are:
In the binary case, the product domain builder ,×, builds the domain A × B from domains A and B. The domain builder includes the assembly operation, ordered pair builder, and a set of disassembly operations called projection functions. The assembly operation, ordered pair builder, is defined as follows:
if a is an element of A and b is an element of B then (a, b) is an element of A × B. That is,The disassembly operations fst and snd are projection functions which extract elements from tuples. For example, fst extracts the first component and snd extracts the second element.A × B = { (a,b) | a in A, b in B }
The product domain is easily generalized (see Figure N.1) to construct the product of an arbitrary number of domains.
Assembly operation: (a0,...,an) in D0 × ... × Dn where ai in Di and
D0 × ... × Dn = { (a0,...,an) | ai in Di }Disassembly operation: (a0,...,an)|i = ai for 0 <= i <= n
Both relational data bases and logic programming paradigm (Prolog) are based on programming with tuples.
Elements of product domains are usually implemented as a contiguous block of storage in which the components are stored in sequence. Component selection is determined by an offset from the address of the first storage unit of the storage block. An alternate implementation (possibly required in functional or logic programming languages) is to implement the value as a list of values. Component selection utilizes the available list operations.
Terminology. The product domain is also called the ``Cartesian'' or ``cross'' product. In Pascal it is called a record and in C a structure.Implementation.
Product domain elements are usually implemented as contiguous locations
in memory. Using the notation introduced in the Introduction,
Product Descriptor | Values
|
In the binary case, the sum domain builder, +, builds the domain A + B from domains A and B. The domain builder includes a pair of assembly operations and a disassembly operation. The two assembly operations of the sum builder are defined as follows:
if a is an element of A and b is an element of B then (A,a) and (B, b) are elements of A + B. That is,The disassembly operation returns the element iff the tag matches the request.A + B = { (A,a) | a in A } union { (B,b) | b in B } where the A and B are called tags and are used to distinguish between the elements contributed by A and the elements contributed by B.
The sum domain differs from ordinary set union in that the elements of the union are labeled with the parent set. Thus even when two sets contain the same element, the sum domain builder tags them differently.
The sum domain generalizes (see Figure N.2) to sums of an arbitrary number of domains.
Figure N.2: Sum Domain: D0 + ... + Dn Assembly operations: (Di, di) in D0 + ... + Dn and D0 + ... + Dn = Unioni=0n { (Di,d) | d in Di }
Disassembly operations: Di(Di, di) = di
Terminology. The sum domain is also called the disjoint union or co-product domains. In Pascal it is called a variant record and in C a union.
Pascal it is called a record and in C a structure.Implementation.
Sum domain elements are usually implemented as a contiguous piece of
memory large enough to hold a value of any of the domains and a tag which
is used to determine the domain to which the value belongs.
Using the notation introduced in the Introduction,
Sum Descriptor | Tag, Value |
The function domain builder creates the domain A --> B from the domains A and B. The domain A --> B consists of all the functions from A to B. A is called the domain and B is called the co-domain.
The assembly operation is:
(lambda x.e) is an element in A --> B whenever e is an expression containing occurrences of an identifier x, such that whenever a value a in A replaces the occurrences of x in e, the value e[a:x] in B results, then.The disassembly operation is function application. It takes two arguments, an element f of A --> B and an element a of A and produces f(a) an element of B. In the case of arrays, the disassembly operation is called subscripting.
The function domain is summarized in Figure N.3.
Figure N.3: Function Domain: A --> B Assembly operation: (lambda x.E) in A --> B where for all a in A, E[x:a] is a unique value in B.
Disassembly operation: (g a) in B, for g in A --> B and a in A.
Mappings (or functions) from one set to another are an extremely important compositional method. The map m from a element x of S (called the domain) to the corresponding element m(x) of T (called the range) is written as:
where if m(x) = a and m(y) = a then x = y. Mappings are more restricted than the Cartesian product since, for each element of the domain there is a unique range element. Often it is either difficult to specify the domain of a function or an implementation does not support the full domain or range of a function. In such cases the function is said to be a partial function. It is for efficiency purposes that partial functions are permitted and it becomes the programmer's responsibility to inform the users of the program of the nature of the unreliability.
Arrays are mappings from an index set to an array element type. An array is a finite mapping. Apart from arrays, mappings occur as operations and function abstractions. Array values are implemented by allocating a contiguous block of storage where the size of the block is based on the product of the size of an element of the array and the number of elements in the array.
The operations provided for the primitive types are maps. For example, the addition operation is a mapping from the Cartesian product of numbers to numbers.
The functional programming paradigm is based on programming with maps.
Terminology. The function domain is also called the function space.Implementation.
Function domain elements are usually implemented in code.
However, arrays are a special case of function domain and they are usually
implemented in contiguous memory elements. Using the notation introduced
in the Introduction,
Function Descriptor | value1 |
... | |
valuen |
The set of all subsets of a set is the power set and is defined:
Subtypes and subranges are examples of the power set constructor.
Functions are subsets of product domains. For example, the square function can be represented as a subset of the product domain Nat × Nat.
Generalization helps to simplify this infinite list to:
The programming language SetL is based on computing with sets.
Set values may be implemented by using the underlying hardware for bit-strings. This makes set operations efficient but constrains the size of sets to the number of bits(typically) in a word of storage. Alternatively, set values may be implemented using software, in which case, hash-coding or lists may be used.
Some languages provide mechanisms for decomposing a type into subtypes
Figure N.4: Power Domain: PD Assembly operations: Ø in PD, { a } in PD for a in D, and Si union Sj in PD for Si, Sj in PD
The definition is called recursive because the name of the domain ``recurs'' on the right hand side of the definition. Recursively defined domains depend on abstraction since the name of the domain is an essential part the definition of the domain. The context-free grammars used in the definition of programming languages contain recursive definitions so programming languages are examples of recursive types.
More than one set may satisfy a recursive definition. However, it may be shown that a recursive definition always has a least solution. The least solution is a subset of every other solution.
The least solution of a recursively defined domain is obtained through a sequence of approximations (D0, D1,...) to the domain with the domain being the limit of the sequence of approximations (D = limi --> infty Di). The limit is the smallest solution to the recursive domain definition.
We illustrate the limit construction (see Figure N.5) with three examples.
Figure N.5: Limit Construction
D0 = null Di+1 = e[D:Di] for i=0,... D = limi --> infty Di
The defining sequence for the natural numbers is:
N0 = Null |
Ni+1 = 0 | S(Ni) for i = 0,... |
N0 = Null |
N1 = 0 |
N2 = 0 | S(0) |
N3 = 0 | S(0) | S(S(0)) |
N4 = 0 | S(0)| S(S(0)) | S(S(S(0))) |
... |
The factorial function is often recursively defined as:
fac(n) = { | 1 | if n = 0 |
n × fac(n-1) | otherwise |
And the function faci+1 is defined as
faci+1(n) = { | 1 | if n = 0 |
n × faci(n-1) | otherwise |
fac0 = { } |
fac1 = { (0,1) } |
fac2 = { (0,1), (1,1) } |
fac3 = { (0,1), (1,1), (2,2) } |
fac4 = { (0,1), (1,1), (2,2), (3,6) } |
... |
The proof of this last equation is beyond the scope of this text. This construction suggests that recursive definitions can be understood in terms of a family of non-recursive definitions and in format common to each member of the family.
ancestor(A,D), if | parent(A,D) or |
parent(A,I) & ancestor(I,D) |
And the relation ancestori is defined as
ancestori+1(A,D), if | parent(A,D) or |
parent(A,I) & ancestori(I,D) |
parent( John, Mary ) |
parent( Mary, James ) |
parent( James, Alice ) |
ancestor0 = { } |
ancestor1 = { (John, Mary), (Mary, James), (James, Alice)} } |
ancestor2 = ancestor1 union {(John, James), (Mary, Alice)} |
ancestor3 = ancestor2 union { (John,Alice) } |
Loop0 is defined as:Loop : if i < n --> if a[i] != target --> i := i + 1; Loop fi fi
and Loopi+1 is defined as:
with the result of unrolling the recursion into a sequence of if-commands.Loopi+1 : if i < n --> if a[i] != target --> i := i + 1; Loopi fi fi
Implementation
Since recursively defined domains like lists, stacks and trees are unbounded (in general may be infinite objects) they are implemented using a product domain where one domain is a node and one or more are address domains. In Pascal, Ada and C such domains are defined in terms of pointers while Prolog and functional languages like ML and Miranda allow recursive types to be defined directly.
Definition N.M: Type System A type system is a set of rules for defining types and associating a type with expression in the language. A type system rejects an expression if it does not associate a type with the expression.
If the errors are to be detected at compile time then a static type checking system is required. One approach to static type checking is to require the programmer to specify the type of each object in the program. This permits the compiler to perform type checking before the execution of the program and this is the approach taken by languages like Pascal, Ada, C++, and Java. Another approach to static type checking is to add type inference capabilities to the compiler. In such a system, the compiler performs type checking by means of a set of type inference rules and is able to flag type errors prior to runtime. This is the approach taken by Miranda and Haskell.
If the error detection is to be delayed until execution time, then dynamic type checking is required. In dynamic type checking, each data value is tagged with type information so that the run time environment can check type compatibility and possibly perform type conversions if necessary. The programming languages Lisp, Scheme and Small-talk are examples of dynamically typed languages
Example. In C, the decision portion of any control structure can be any expression that produces a value. If the value is 0, it is treated as false and any nonzero values is treated as true. Since the value of an assignment command is the value of its right-hand side, the command if x = 4 ... any else clause will be ignored. In C characters are treated as integers and thus may occur in arithmetic expressions. C's type system is not robust enough to protect novice programmers these and other errors.The advantage of untyped languages is their flexibility. The programmer has complete control over how a data value is used but must assume full responsibility for detecting the application of operations to objects of incompatible type.
Figure N.M: Type Checking Definitions A language is said to be
- untyped if no type abstractions are inforced,
- strongly typed if it enforces type abstractions (operations may be applied only to objects of the appropriate type),
- statically typed if the type of each expression can be determined from the program text,
- dynamically typed if the determination of the type of some expression depends on the runtime behavior of the program.
A strongly typed language enforces type abstractions. Most languages are strongly typed with respect to the primitive types supported by the language. So, for example, the mixing of numeric and character types that is permissible in C is not permitted in Pascal or Ada.
Strong typing helps to insure the security and portability of the code and it often requires the programmer to explicitly define the types of each object in a program. It is also important in compilation for picking appropriate operations and for optimization.
If the types of all variables can be known from an examination of the text (i.e. at compile time), then the a language is said to be statically typed. Pascal, Ada, and Haskell are examples of statically typed languages.
Static typing is widely recognized as a requirement for the production of safe and reliable software. Static type checking implies that the types are checked at compile time. Static typing is chosen when efficiency in execution time is important and compiler support is used to support good software engineering practices.
If the type of a variable can only be known at run-time, then the language is said to be dynamically typed. Lisp and Smalltalk are examples of dynamically typed languages.
Dynamic type checking implies that the types are checked at execution time and that every value is tagged to identify its type in order to make the type checking possible. The penalty for dynamic type checking is additional space and time overheads.
Dynamic typing is often justified on the assumption that its flexibility permits the rapid prototyping of software.
Prolog relies on pattern matching to provide a semblance of type checking. There is active research on adapting type checking systems for Prolog.
Modern functional programming languages such as Miranda and Haskell and object-oriented languages combine the safety of static type checking with the flexibility of dynamic type checking through polymorphic types.
Name equivalence was chosen for Modula-2, Ada, C (for records), and Miranda. The predecessor of Modula-2, Pascal violates name equivalence since file type names are not required to be shared by different programs accessing the same file.
Definition N.1: Two types T, T' are name equivalent iff T and T' are the same name.
Two types T, T' are structurally equivalent iff T and T' have the same set of values.
The following three rules may be used to determine if two types are structurally equivalent.
A type checker must be able to
Axiom given that: f is of type A --> B and x is of type A
infer that: f(x) is type correct and has type B
Examples. In Miranda (a functional language) the types for the arithmetic + operation are declared as follows:+ :: num --> num --> num In Pascal the type of a function for computing the circumference of a circle is declared as follows:
function circumference( radius : real ) : real;
Completely monomorphic systems are rare. Most programming languages contain some operators or procedures which permit arguments of more than one type. For example, Pascal's input and output procedures permit variation both in type and in number of arguments. This is an example of overloading.
Definition N.2:
- Monomorphism: every constant, variable, parameter, operator and function has a unique type.
- Overloading refers to the use of a single syntactic identifier to refer to several different operations discriminated by the type and number of the arguments to the operation.
- Polymorphism: an operator, function or procedure that has a family of related types and operates uniformly on its arguments regardless of type.
- A polymorphic operation is one that can be applied to different but related types of arguments.
The type of the plus operation defined for integer addition is
When the same operation symbol is used for the plus operation for rational numbers and for set union, the symbol as in Pascal it is overloaded. Most programming languages provide for the overloading of the arithmetic operators. A few programming languages (Ada among others) provide for programmer defined overloading of both built-in and programmer defined operators.
When overloaded operators are applied to mixed expressions such as plus to an integer and a rational number there are two possible choices. Either the evaluation of the expression fails or one or more of the subexpressions are coerced into a corresponding object of another type. Integers are often coerced into the corresponding rational number. This type of coercion is called widening. When a language permits the coercion of a real number into an integer (by truncation for example) the coercion is called narrowing. Narrowing is not usually permitted in a programming language since information is usually lost. Coercion is an issue in programming languages because numbers do not have a uniform representation. This type of overloading is called context-dependent overloading.
Many languages provide type transfer functions so that the programmer can control where and when the type coercion is performed. Truncate and round are examples of type transfer functions.
Overloading is sometimes called ad-hoc polymorphism.
Most sorting algorithms can be explained without referring to the kind (type) of data being sorted. Typically, the data is an array of pointers to records each with an associated key. The type of the key does not matter as long as there is a "comparison" procedure which finds the minimum of a pair of keys. The sorting procedures use the compare two keys using the comparison procedure and swap the records by resetting pointers accordingly. However, in a strongly typed language this is not possible since the pointer type depends on the record type. This forces us to write a separate procedure for each type of data.
Stacks, queues, lists and trees also are largely type independent and yet in a strongly typed language, separate code must be written for each element type. Some languages permit type variables and these data structures can be defined with a type variable which then allows the user
A type system is polymorphic if abstractions operate uniformly on arguments of a family of related types.
This type of polymorphism is sometimes called parametric polymorphism.
Generalization can be applied to may aspects of programming languages.
Sometimes there are several domains which share a common operation. For example, the natural numbers, the integers, the rationals, and the reals all share the operation of addition. So, most programming languages use the same addition operator to denote addition in all of these domains. Pascal extends the use of the addition operator to represent set union. The multiple use of a name in different domains is called overloading. Ada permits user defined overloading of built in operators.
Prolog permits the programmer to use the same functor name for predicates of different arity thus permitting the overloading of functor names. This is an example of data generalization or polymorphism.
While the parameterization of an object gives the ability to deal with more than one particular object, polymorphism is the ability of an operation to deal with objects of more than a single type.
Generalization of control has focused on advanced control structures (RAM): iterators, generators, backtracking, exception handling, coroutines, and parallel execution (processes).
Principle of Type Completeness No operation should be arbitrarily restricted in the types of the values involved.
Declarations of variables of the primitive types have the following form in the imperative languages.
Declarations of enumeration types involve listing of the values in the type.I : T; { Modula-2: item I of type T} T I; // C++: item I of type T
Here are the enumerations of the items I1,...,In of type T.
Modula-2, Ada, C++, Prolog, Scheme, Miranda -- list primitive types Haskell provides the built in functions fst and snd to extract the first and second elements from binary tuples.T = {I1,...,In}; { Modula-2}\\ enum T {I1,...,In}; // C++ \\ T ::= I1 | ... | In || Miranda
Imperative languages require that the elements of a tuple be named. Modula-2 is typical; product domains are defined by record types:
The Iis are names for the component of the tuple. The individual components of the record are accessed by the use of qualified names. for example, if MyRec is a element of the above type, then the first component is referenced by MyRec.I1 and the last component is referenced by MyRec.In.record I1 : T1; ... In : Tn; end
C and C++ calls a product domain a structure and uses the following type declaration:
The Iis are names for the entries in the tuple.struct name { T1 I1; ... Tn : In; };
Prolog does not require type declaration and elements of a product domain may be represented in a number of ways, one way is by a term of the form:
The Iis are the entries in the tuple. The entries are accessed by pattern matching. Miranda does not require type declaration and the elements of a product domain are represented by tuples.name(I1,...In)
The Iis are the entries in the tuple.(I1,...In)
Here is an example of a variant record in Pascal.
The initialization of the record should follow the sequence of assigning a value to the tag and then to the appropriate subfields.% From condensed pascal type Shape = (Square, Rectangle, Rhomboid, Trapezoid, Parallelogram); Dimensions = record case WhatShape : Shape of Square : (Side1: real); Rectangle : (Length, Width : real); Rhomboid : (Side2: real; AcuteAngle: 0..360); Trapezoid : (Top1, Bottom, Height: real); Parallelogram : (\=Top2, Side3: real; ObtuseAngle: 0..360) end; var FourSidedObject : Dimensions;
The corresponding definition in Miranda isFourSidedObject.WhatShape := Rectangle; FourSidedObject.Length := 4.3; FourSidedObject.Width := 7.5;
Dimensions :: Square num | Rectangle num num | Rhomboid num num | Trapezoid num num num | Parallelogram num num num
Modula-2, Ada, C++, Prolog, Scheme, Miranda Disjoint union values are implemented by allocating storage based on the largest possible value and additional storage for the tag.area Square S = S*S area Rectangle L W = L * W ...
Modula-2
Prolog and Miranda do not provide for an array type and while Scheme does, it is not a part of the purely functional part of Scheme. Modula-2, Ada, C++, Prolog, Scheme, Miranda -- mapping type In Pascal the notation [i..j] indicates the subset of an ordinal type from element i to element j inclusive. In addition to subranges, Miranda provides infinite lists [i..] and finite and infinite arithmetic series [a,b..c], [a,b..] (the interval is (b-a)). Miranda also provides list comprehensions which are used to construct lists (sets). A list comprehension has the form [exp | qualifier]array[domain_type] of range_type {Modula-2} range_type identifier [natural number] // C++
Modula-2, Ada, C++, Prolog, Scheme, Miranda -- power set Prologsqs = [ n*n | n <-[1..] ] factors n = [ r | r <-[1..n div 2]; n mod r = 0 ] knights_moves [i,j] = [ [i+a,j+b] | a,b <-[-2..2]; a\verb+^+2+\verb+^+2=5 ]
The Miranda syntax for lists is similar to that of Prolog however, elements of lists must be all of the same type.[] [I0,...In] [H | T]
Recursive types in imperative programming languages are usually defined using a pointer type. Pointer types are an additional primitive type. Pointers are addresses.[*] [I_0,...In] [H | T]
Referencing/Dereferencing{Modula-2: the pointer and the list} type NextItem = \verb+^+ListType ListType = record item : Itemtype; next : NextItem end; // C++: the list type struct list { ItemType Item; list* Next; // pointer to list }; || Miranda: list of objects of type T and || a binary tree of type T [T] tree ::= Niltree | Node T tree tree
Recursive values are implemented using pointers. The run-time support system for the functional and logic programming languages, provides for automatic allocation and recovery of storage (garbage collection). The alternative is for the language to provide access to the run-time system so that the programmer can explicitly allocate and recover the linked structures.type ListType = record item : Itemtype; next : ListType end;
Most programming languages provide different types for persistent and transient variables. Typically files are the only type of persistent variable permitted in a programming language. Files are not usually permitted to be transient variables.
Most programming languages provide different types for persistent and transient The type completeness principle suggests that all the types of the programming language should be allowed both for transient variables and persistent variables. A language applying this principle would be simplified by having no special types, commands or procedures for input/output. The programmer would be spared the effort of converting data from a persistent data type to a transient data type on input and vice versa on output.
A persistent variable of array type corresponds to a direct file. If heap variables were persistent then the storage of arbitrary data structures would be possible.
ADT Boolean Operations and(boolean,boolean) --> boolean or(boolean,boolean) --> boolean not(boolean) --> boolean Semantic Equations and(true,true) = true or(true,true) = true not(true) = false not(false) = true Restrictions