Values, Domains and Types

A value is any thing that may be evaluated, stored, incorporated in a data structure, passed as an argument or returned as a result.

What is a type?

Realist: A type is a set of values.

Idealist: No. A type is a conceptual entity whose values are accessible only through the interpretive filter of type.

Beginning Programmer: Isn't a type a name for a set of values?

Intermediate Programmer: A type is a set of values and operations.

Advanced Programmer: A type is a way to classify values by their properties and behavior.

Algebraist: Ah! So a type is an algebra, a set of values and operations defined on the values.

Type checker: Types are more practical than that, they are constraints on expressions to ensure compatibility between operators and their operand(s).

Type Inference System: Yes and more, since a type system is a set of rules for associating with every expression a unique and most general type that reflects the set of all meaningful contexts in which the expression may occur.

Program verifier: Lets keep it simple, types are the behavioral invariants that instances of the type must satisfy.

Software engineer: What is important to me is that types are a tool for managing software development and evolution.

Compiler: All this talk confuses me, types specify the storage requirements for variables of the type.

Keywords and phrases: value, domain, type, type constructor, Cartesian product, disjoint union, map, power set, recursive type, binding, strong and weak typing, static and dynamic type checking, type inference, type equivalence, name and structural equivalence, abstract types, generic types. block, garbage collection, static and dynamic links, display, static and dynamic binding, activation record, environment, Static and Dynamic Scope, aliasing, variables, value, result, value-result, reference, name, unification, eager evaluation, lazy evaluation, strict, non-strict, Church-Rosser, overloading, polymorphism, monomorphism, coercion, transfer functions.

A computation is a sequence of operations applied to a value to yield a value. Thus values and operations are fundamental to computation. Values are the subject of this chapter and operations are the subject of later chapters.

In mathematical terminology, the sets from which the arguments and results of a function are taken are known as the function's ``domain'' and ``codomain'', respectively. Consequently, the term domain will denote any set of values that can be passed as arguments or returned as results. Associated with every domain are certain ``essential'' operations. For example, the domain of natural numbers is equipped with an the ``constant'' operation which produces the number zero and the operation that constructs the successor of any number. Additional operations (such as addition and multiplication) on the natural numbers may be defined using these basic operations.

Programming languages utilize a rich set of domains. Truth values, characters, integers, reals, records, arrays, sets, files, pointers, procedure and function abstractions, environments, commands, and definitions are but some of the domains that are found in programming languages. There are two approaches to domains. One approach is to assume the existence of a universal domain. It contains all those objects which are of computational interest. The second approach is to begin with a small set of values and some rules for combining the values and then to construct the universe of values. Programming languages follow the second approach by providing several basic sets of values and a set of domain constructors from which additional domains may be constructed.

Domains are categorized as primitive or compound. A primitive domain is a set that is fundamental to the application being studied. Its elements are atomic. A compound domain is a set whose values are constructed from existing domains by one or more domain constructors.

Aside. It is common in mathematics to define a set but fail to give an effective method for determining membership in the set. Computer science on the other hand is concerned with determining membership with in a finite number of steps. In addition, a program is often constrained by requirements to complete its work with in bounds of time and space. % In computer science ... streams ... infinite sequences ... % halting problem ... robust

Terminology. Domain theory is the study of structured sets and their operations. A domain is a set of elements and an accompanying set of operations defined on the domain.
The terms domain, type, and data type may be used interchangeably.
The term data refers to either an element of a domain or a collection of elements from one or more domains.
The terms compound, composite and structured when applied to values, data, domains, types are used interchangeably.

Elements of Domain Theory

There are many compound domains that are useful in computer science: arrays, tuples, records, variants, unions, sets, lists, trees, files, relations, definitions, mappings, etc., are all examples of compound domains. Each of these domains may be constructed from simpler domains by one or more applications of domain constructors.

Compound domains are constructed by a domain builder. A domain builder consists of a set of operations for assembling and disassembling elements of a compound domain. The domain builders are:

Product Domains
Sum Domains
Function Domains
Power Domains
Recursive Domains

Product Domain

The domains constructed by the product domain builder are called tuples in ML, records in Cobol, Pascal and Ada, and structures in C and C++. Product domains form the basis for relational databases and logic programming.

In the binary case, the product domain builder ,×, builds the domain A × B from domains A and B. The domain builder includes the assembly operation, ordered pair builder, and a set of disassembly operations called projection functions. The assembly operation, ordered pair builder, is defined as follows:

if a is an element of A and b is an element of B then (a, b) is an element of A × B. That is, A × B = { (a,b) | a in A, b in B }

The disassembly operations fst and snd are projection functions which extract elements from tuples. For example, fst extracts the first component and snd extracts the second element. nsnd(a,b) = b

The product domain is easily generalized (see Figure N.1) to construct the product of an arbitrary number of domains.

Figure N.1: Product Domain: D₀ × ... × D_n

Assembly operation: (a₀,...,a_n) in D₀ × ... × D_n where a_i in D_i and
D₀ × ... × D_n = { (a₀,...,a_n) | a_i in D_i }
Disassembly operation: (a₀,...,a_n)|_i = a_i for 0 <= i <= n

Both relational data bases and logic programming paradigm (Prolog) are based on programming with tuples.

Elements of product domains are usually implemented as a contiguous block of storage in which the components are stored in sequence. Component selection is determined by an offset from the address of the first storage unit of the storage block. An alternate implementation (possibly required in functional or logic programming languages) is to implement the value as a list of values. Component selection utilizes the available list operations.

Terminology. The product domain is also called the ``Cartesian'' or ``cross'' product. In Pascal it is called a record and in C a structure.

Implementation.

Product domain elements are usually implemented as contiguous locations in memory. Using the notation introduced in the Introduction,

Product Descriptor

Values

Descriptor₁	value₁
...	...
Descriptor_n	value_n

Sum Domain

Domains constructed by the sum domain builder are called variant records in Pascal and Ada, unions in Algol-68, constructions in ML and algebraic types in Miranda.

In the binary case, the sum domain builder, +, builds the domain A + B from domains A and B. The domain builder includes a pair of assembly operations and a disassembly operation. The two assembly operations of the sum builder are defined as follows:

if a is an element of A and b is an element of B then (A,a) and (B, b) are elements of A + B. That is, A + B = { (A,a) | a in A } union { (B,b) | b in B }
where the A and B are called tags and are used to distinguish between the elements contributed by A and the elements contributed by B.

The disassembly operation returns the element iff the tag matches the request. A(A,a) = a

The sum domain differs from ordinary set union in that the elements of the union are labeled with the parent set. Thus even when two sets contain the same element, the sum domain builder tags them differently.

The sum domain generalizes (see Figure N.2) to sums of an arbitrary number of domains.

Figure N.2: Sum Domain: D₀ + ... + D_n
Assembly operations: (D_i, d_i) in D₀ + ... + D_n and D₀ + ... + D_n = Union_i=0ⁿ { (D_i,d) | d in D_i }
Disassembly operations: D_i(D_i, d_i) = d_i

Terminology. The sum domain is also called the disjoint union or co-product domains. In Pascal it is called a variant record and in C a union.

Pascal it is called a record and in C a structure.

Implementation.

Sum domain elements are usually implemented as a contiguous piece of memory large enough to hold a value of any of the domains and a tag which is used to determine the domain to which the value belongs. Using the notation introduced in the Introduction,

Sum Descriptor

Tag, Value

Function Domain

The domains constructed by the function domain builder are called functions in Haskell, procedures in Modula-3, and procs in SR. Although their syntax often differs from that of functions, arrays are also examples of domains constructed by the function domain builder.

The function domain builder creates the domain A --> B from the domains A and B. The domain A --> B consists of all the functions from A to B. A is called the domain and B is called the co-domain.

The assembly operation is:

(lambda x.e) is an element in A --> B whenever e is an expression containing occurrences of an identifier x, such that whenever a value a in A replaces the occurrences of x in e, the value e[a:x] in B results, then.

The disassembly operation is function application. It takes two arguments, an element f of A --> B and an element a of A and produces f(a) an element of B. In the case of arrays, the disassembly operation is called subscripting.

The function domain is summarized in Figure N.3.

Figure N.3: Function Domain: A --> B
Assembly operation: (lambda x.E) in A --> B where for all a in A, E[x:a] is a unique value in B.
Disassembly operation: (g a) in B, for g in A --> B and a in A.

Mappings (or functions) from one set to another are an extremely important compositional method. The map m from a element x of S (called the domain) to the corresponding element m(x) of T (called the range) is written as:

m : S --> T

where if m(x) = a and m(y) = a then x = y. Mappings are more restricted than the Cartesian product since, for each element of the domain there is a unique range element. Often it is either difficult to specify the domain of a function or an implementation does not support the full domain or range of a function. In such cases the function is said to be a partial function. It is for efficiency purposes that partial functions are permitted and it becomes the programmer's responsibility to inform the users of the program of the nature of the unreliability.

Arrays are mappings from an index set to an array element type. An array is a finite mapping. Apart from arrays, mappings occur as operations and function abstractions. Array values are implemented by allocating a contiguous block of storage where the size of the block is based on the product of the size of an element of the array and the number of elements in the array.

The operations provided for the primitive types are maps. For example, the addition operation is a mapping from the Cartesian product of numbers to numbers.

+: number × number --> number

The functional programming paradigm is based on programming with maps.

Terminology. The function domain is also called the function space.

Implementation.

Function domain elements are usually implemented in code. However, arrays are a special case of function domain and they are usually implemented in contiguous memory elements. Using the notation introduced in the Introduction,

Function Descriptor	value₁
	...
	value_n

Power Domain

Set theory provides an elegant notation for the description of computation. However, it is difficult to provide efficient implementation of the the set operations. SetL is a programming language based on sets and was used to provide an early compiler for Ada. The Pascal family of languages provide for set union and intersection and set membership. Set variables represent subsets of user defined sets.

The set of all subsets of a set is the power set and is defined:

P^S = { s | s is a subset of S}

Subtypes and subranges are examples of the power set constructor.

Functions are subsets of product domains. For example, the square function can be represented as a subset of the product domain Nat × Nat.

sqr = {(0,0),(1,1),(2,4),(3,9),...}

Generalization helps to simplify this infinite list to:

sqr = {(x,x*x) | x in Nat}

The programming language SetL is based on computing with sets.

Set values may be implemented by using the underlying hardware for bit-strings. This makes set operations efficient but constrains the size of sets to the number of bits(typically) in a word of storage. Alternatively, set values may be implemented using software, in which case, hash-coding or lists may be used.

Some languages provide mechanisms for decomposing a type into subtypes

one is the enumeration of the elements of the subtype.
another is subranges since, enumeration is tedious for large sub-domains and many types have a natural ordering.

The power domain construction builds a domain of sets of elements. For a domain A, the power domain builder P() creates the domain P(A), a collection whose members are subsets of A.

Figure N.4: Power Domain: P^D
Assembly operations: Ø in P^D, { a } in P^D for a in D, and S_i union S_j in P^D for S_i, S_j in P^D

Recursively Defined Domain

Recursively defined domains are domains whose definition is of the form: D : ... D ...

The definition is called recursive because the name of the domain ``recurs'' on the right hand side of the definition. Recursively defined domains depend on abstraction since the name of the domain is an essential part the definition of the domain. The context-free grammars used in the definition of programming languages contain recursive definitions so programming languages are examples of recursive types.

More than one set may satisfy a recursive definition. However, it may be shown that a recursive definition always has a least solution. The least solution is a subset of every other solution.

The least solution of a recursively defined domain is obtained through a sequence of approximations (D₀, D₁,...) to the domain with the domain being the limit of the sequence of approximations (D = lim_{i --> infty} D_i). The limit is the smallest solution to the recursive domain definition.

We illustrate the limit construction (see Figure N.5) with three examples.

Figure N.5: Limit Construction

D₀ = null

D_i+1 = e[D:D_i] for i=0,...

D = lim_{i --> infty} D_i

The Natural Numbers

A representation of the natural numbers given earlier in the text was: N ::= 0 | S(N)

The defining sequence for the natural numbers is:

N₀ = Null

N_i+1 = 0 | S(N_i) for i = 0,...

The definition results in the following:

N₀ = Null

N₁ = 0

N₂ = 0 | S(0)

N₃ = 0 | S(0) | S(S(0))

N₄ = 0 | S(0)| S(S(0)) | S(S(S(0)))

...

The factorial function

For functions, Null can be replaced with _|_ which means undefined.

The factorial function is often recursively defined as:

fac(n) = {	1	if n = 0
	n × fac(n-1)	otherwise

The factorial function is approximated by a sequence of functions where the function fac₀ is defined as fac₀(n) = _|_

And the function fac_i+1 is defined as

fac_i+1(n) = {	1	if n = 0
	n × fac_i(n-1)	otherwise

Writing the functions as sets of ordered pairs helps us to understand the limit construction.

fac₀ = { }

fac₁ = { (0,1) }

fac₂ = { (0,1), (1,1) }

fac₃ = { (0,1), (1,1), (2,2) }

fac₄ = { (0,1), (1,1), (2,2), (3,6) }

...

Note that each function in the sequence includes the previously defined function and the sequence suggests that fac = lim_{i --> infty} fac_i

The proof of this last equation is beyond the scope of this text. This construction suggests that recursive definitions can be understood in terms of a family of non-recursive definitions and in format common to each member of the family.

Ancestors

For logical predicates, Null can be replaced with false. A recursive definition of the ancestor relation is:

ancestor(A,D), if	parent(A,D) or
	parent(A,I) & ancestor(I,D)

The ancestor relation is approximated by a sequence of relations: ancestor₀(A,D) = false

And the relation ancestor_i is defined as

ancestor_i+1(A,D), if	parent(A,D) or
	parent(A,I) & ancestor_i(I,D)

Writing the relations as sets of order pairs helps us to understand the limit construction. An example will help. Suppose we have the following:

parent( John, Mary )

parent( Mary, James )

parent( James, Alice )

then we have:

ancestor₀ = { }

ancestor₁ = { (John, Mary), (Mary, James), (James, Alice)} }

ancestor₂ = ancestor₁ union {(John, James), (Mary, Alice)}

ancestor₃ = ancestor₂ union { (John,Alice) }

Again note that each predicate in the sequence includes the previously defined predicate and the sequence suggests that ancestor = lim_{i --> infty} ancestor_i

Linear Search

The final example of domain construction is a recursive variant of linear search.

Loop : if i < n --> if a[i] != target --> i := i + 1; Loop
                    fi
       fi

Loop₀ is defined as: Loop₀ = _|_

and Loop_i+1 is defined as:

Loop_i+1 : if i < n --> if a[i] != target --> i := i + 1; Loop_i                      fi
         fi

with the result of unrolling the recursion into a sequence of if-commands.

Implementation

Since recursively defined domains like lists, stacks and trees are unbounded (in general may be infinite objects) they are implemented using a product domain where one domain is a node and one or more are address domains. In Pascal, Ada and C such domains are defined in terms of pointers while Prolog and functional languages like ML and Miranda allow recursive types to be defined directly.

Type Systems

A large percentage of errors in programs is due to the application of operations to objects of incompatible types. Type systems have been developed to assist the programmer in the detection of these errors. A type system is a set of rules for defining types and associating a type with expression in the language. A type system rejects an expression if it does not associate a type with the expression. Type checking may performed at compile time or run time or both.

Definition N.M: Type System
A type system is a set of rules for defining types and associating a type with expression in the language. A type system rejects an expression if it does not associate a type with the expression.

If the errors are to be detected at compile time then a static type checking system is required. One approach to static type checking is to require the programmer to specify the type of each object in the program. This permits the compiler to perform type checking before the execution of the program and this is the approach taken by languages like Pascal, Ada, C++, and Java. Another approach to static type checking is to add type inference capabilities to the compiler. In such a system, the compiler performs type checking by means of a set of type inference rules and is able to flag type errors prior to runtime. This is the approach taken by Miranda and Haskell.

If the error detection is to be delayed until execution time, then dynamic type checking is required. In dynamic type checking, each data value is tagged with type information so that the run time environment can check type compatibility and possibly perform type conversions if necessary. The programming languages Lisp, Scheme and Small-talk are examples of dynamically typed languages

Type Checking

Machine operations manipulate bit patterns. Whether a bit pattern represents a character, an integer, a real, an address, or an instruction, any machine operation may be applied to any data item. There is no type checking at the assembly language level. Languages which permit operations to be applied to data of any type are called untyped. Prolog is one of the few high-level languages that is an untyped language. In Prolog, lists can consist of elements of any type and different sorts of values may be compared with the equality relation `=', but such comparison will always yield false.

Example. In C, the decision portion of any control structure can be any expression that produces a value. If the value is 0, it is treated as false and any nonzero values is treated as true. Since the value of an assignment command is the value of its right-hand side, the command if x = 4 ... any else clause will be ignored. In C characters are treated as integers and thus may occur in arithmetic expressions. C's type system is not robust enough to protect novice programmers these and other errors.

The advantage of untyped languages is their flexibility. The programmer has complete control over how a data value is used but must assume full responsibility for detecting the application of operations to objects of incompatible type.

Figure N.M: Type Checking Definitions
A language is said to be

untyped if no type abstractions are inforced,

strongly typed if it enforces type abstractions (operations may be applied only to objects of the appropriate type),

statically typed if the type of each expression can be determined from the program text,

dynamically typed if the determination of the type of some expression depends on the runtime behavior of the program.

A strongly typed language enforces type abstractions. Most languages are strongly typed with respect to the primitive types supported by the language. So, for example, the mixing of numeric and character types that is permissible in C is not permitted in Pascal or Ada.

Strong typing helps to insure the security and portability of the code and it often requires the programmer to explicitly define the types of each object in a program. It is also important in compilation for picking appropriate operations and for optimization.

If the types of all variables can be known from an examination of the text (i.e. at compile time), then the a language is said to be statically typed. Pascal, Ada, and Haskell are examples of statically typed languages.

Static typing is widely recognized as a requirement for the production of safe and reliable software. Static type checking implies that the types are checked at compile time. Static typing is chosen when efficiency in execution time is important and compiler support is used to support good software engineering practices.

If the type of a variable can only be known at run-time, then the language is said to be dynamically typed. Lisp and Smalltalk are examples of dynamically typed languages.

Dynamic type checking implies that the types are checked at execution time and that every value is tagged to identify its type in order to make the type checking possible. The penalty for dynamic type checking is additional space and time overheads.

Dynamic typing is often justified on the assumption that its flexibility permits the rapid prototyping of software.

Prolog relies on pattern matching to provide a semblance of type checking. There is active research on adapting type checking systems for Prolog.

Modern functional programming languages such as Miranda and Haskell and object-oriented languages combine the safety of static type checking with the flexibility of dynamic type checking through polymorphic types.

Type Equivalence

Two unnamed types (sets of objects) are the same if they contain the same elements. The same cannot be said of named types for if they were, then there would be no need for the disjoint union type. When types are named, there are two major approaches to determining whether two types are equal.

Name Equivalence

In name equivalence two types are the same if they have the same name. Types that are given different names are treated as distinct and cannot be accidentally mixed just because their structure happens to be the same. Name equivalence requires type definitions to be global.

Name equivalence was chosen for Modula-2, Ada, C (for records), and Miranda. The predecessor of Modula-2, Pascal violates name equivalence since file type names are not required to be shared by different programs accessing the same file.

Structural Equivalence

In structural equivalence, the names of the types are ignored and the elements of the types are compared for equality. It is possible that two logically different types may turn out to be the same by coincidence and may be mixed. Type definitions are not required to be global. Structural equivalence is important in programming distributed systems, in which separate programs must communicate typed data.

Definition N.1:
Two types T, T' are name equivalent iff T and T' are the same name.
Two types T, T' are structurally equivalent iff T and T' have the same set of values.

The following three rules may be used to determine if two types are structurally equivalent.

A type name is structurally equivalent to its self.
Two types are structurally equivalent if they are formed by applying the same type constructor (recursively) to structurally equivalent types.
After a type declaration, type n = T, the type name n is structurally equivalent to T.

Structural equivalence was chosen by Algol-68 and C (except for records) because it is easy to implement.

Type Inference

Type inference is the general problem of transforming untyped or partially typed syntax into well-typed terms. Pascal constant declarations are an example of type inference, the type of the name is inferred from the type of the constant. In Pascal's for loop the type of the loop index can be inferred from the types of the loop limits and thus the loop index should be a variable local to the loop. The programming languages Miranda and Haskell are statically types and provide powerful type inference systems so that a programmer need not declare any types. The languages also permit programmers to provide explicit type specifications.

A type checker must be able to

determine if a program is well typed and
if the program is well typed, determine the type of any expression in the program.

Type inference axioms

Axiom
given that: f is of type A --> B and x is of type A
infer that: f(x) is type correct and has type B

Type declarations

Even languages that provide a type inference system permit programmers to make explicit declarations of type. Even if the compiler can correctly infer types, human readers may have to scan several pages of code to determine the type of a function. Slight errors by the programmer can cause the compiler to emit obscure error messages or to infer a different type than intended. For these reasons it is good programming practice to explicitly state types on all but the most obvious cases.

Examples. In Miranda (a functional language) the types for the arithmetic + operation are declared as follows: + :: num --> num --> num
In Pascal the type of a function for computing the circumference of a circle is declared as follows:
function circumference( radius : real ) : real;

Polymorphism

A type system is monomorphic if each constant, variable, parameter, and function result has a unique type. Type checking in a monomorphic system is straightforward. But purely monomorphic type systems are unsatisfactory for writing reusable software. Many algorithms such as sorting and list and tree manipulation routines are generic in the sense that they depend very little on the type of the values being manipulated. For example, a general purpose array sorting routine cannot be written in Pascal. Pascal requires that the element type of the array be part of the declaration of the routine. This means that different sorting routines must be written for each element type and array size.

Completely monomorphic systems are rare. Most programming languages contain some operators or procedures which permit arguments of more than one type. For example, Pascal's input and output procedures permit variation both in type and in number of arguments. This is an example of overloading.

Definition N.2:

Monomorphism: every constant, variable, parameter, operator and function has a unique type.

Overloading refers to the use of a single syntactic identifier to refer to several different operations discriminated by the type and number of the arguments to the operation.

Polymorphism: an operator, function or procedure that has a family of related types and operates uniformly on its arguments regardless of type.

A polymorphic operation is one that can be applied to different but related types of arguments.

The type of the plus operation defined for integer addition is

+: int × int --> int

When the same operation symbol is used for the plus operation for rational numbers and for set union, the symbol as in Pascal it is overloaded. Most programming languages provide for the overloading of the arithmetic operators. A few programming languages (Ada among others) provide for programmer defined overloading of both built-in and programmer defined operators.

When overloaded operators are applied to mixed expressions such as plus to an integer and a rational number there are two possible choices. Either the evaluation of the expression fails or one or more of the subexpressions are coerced into a corresponding object of another type. Integers are often coerced into the corresponding rational number. This type of coercion is called widening. When a language permits the coercion of a real number into an integer (by truncation for example) the coercion is called narrowing. Narrowing is not usually permitted in a programming language since information is usually lost. Coercion is an issue in programming languages because numbers do not have a uniform representation. This type of overloading is called context-dependent overloading.

Many languages provide type transfer functions so that the programmer can control where and when the type coercion is performed. Truncate and round are examples of type transfer functions.

Overloading is sometimes called ad-hoc polymorphism.

Most sorting algorithms can be explained without referring to the kind (type) of data being sorted. Typically, the data is an array of pointers to records each with an associated key. The type of the key does not matter as long as there is a "comparison" procedure which finds the minimum of a pair of keys. The sorting procedures use the compare two keys using the comparison procedure and swap the records by resetting pointers accordingly. However, in a strongly typed language this is not possible since the pointer type depends on the record type. This forces us to write a separate procedure for each type of data.

Stacks, queues, lists and trees also are largely type independent and yet in a strongly typed language, separate code must be written for each element type. Some languages permit type variables and these data structures can be defined with a type variable which then allows the user

A type system is polymorphic if abstractions operate uniformly on arguments of a family of related types.

This type of polymorphism is sometimes called parametric polymorphism.

Generalization can be applied to may aspects of programming languages.

Sometimes there are several domains which share a common operation. For example, the natural numbers, the integers, the rationals, and the reals all share the operation of addition. So, most programming languages use the same addition operator to denote addition in all of these domains. Pascal extends the use of the addition operator to represent set union. The multiple use of a name in different domains is called overloading. Ada permits user defined overloading of built in operators.

Prolog permits the programmer to use the same functor name for predicates of different arity thus permitting the overloading of functor names. This is an example of data generalization or polymorphism.

While the parameterization of an object gives the ability to deal with more than one particular object, polymorphism is the ability of an operation to deal with objects of more than a single type.

Generalization of control has focused on advanced control structures (RAM): iterators, generators, backtracking, exception handling, coroutines, and parallel execution (processes).

Type Completeness

Principle of Type Completeness No operation should be arbitrarily restricted in the types of the values involved.

Pragmatics

type declarations: spelling, type checking, type inference vs type declaration

Historical Perspectives and Further Reading

Declarations

Constants

literals

User Defined Types

Miranda's universe of values is numbers, characters and boolean values while Pascal provides boolean, integer, real, and char.

Declarations of variables of the primitive types have the following form in the imperative languages.

I : T;    { Modula-2: item I of type T}
T  I;     // C++: item I of type T

Declarations of enumeration types involve listing of the values in the type.

Here are the enumerations of the items I₁,...,I_n of type T.

T = {I₁,...,I_n};    { Modula-2}\\
enum T {I₁,...,I_n}; // C++ \\
T ::= I₁ | ... | I_n  || Miranda

Modula-2, Ada, C++, Prolog, Scheme, Miranda -- list primitive types Haskell provides the built in functions fst and snd to extract the first and second elements from binary tuples.

Imperative languages require that the elements of a tuple be named. Modula-2 is typical; product domains are defined by record types:

record
  I₁ : T₁;
  ...
  I_n : T_n;
end

The I_is are names for the component of the tuple. The individual components of the record are accessed by the use of qualified names. for example, if MyRec is a element of the above type, then the first component is referenced by MyRec.I₁ and the last component is referenced by MyRec.I_n.

C and C++ calls a product domain a structure and uses the following type declaration:

struct name {
  T₁ I₁;
  ...
  T_n : I_n;
};

The I_is are names for the entries in the tuple.

Prolog does not require type declaration and elements of a product domain may be represented in a number of ways, one way is by a term of the form:

name(I₁,...I_n)

The I_is are the entries in the tuple. The entries are accessed by pattern matching. Miranda does not require type declaration and the elements of a product domain are represented by tuples.

(I₁,...I_n)

The I_is are the entries in the tuple.

Here is an example of a variant record in Pascal.

% From condensed pascal
type Shape = (Square, Rectangle, Rhomboid, Trapezoid, Parallelogram);
   Dimensions = record
      case WhatShape : Shape of
         Square : (Side1: real);
         Rectangle : (Length, Width : real);
         Rhomboid : (Side2: real; AcuteAngle: 0..360);
         Trapezoid : (Top1, Bottom, Height: real);
         Parallelogram : (\=Top2, Side3: real;
             ObtuseAngle: 0..360)
      end;
var FourSidedObject : Dimensions;

The initialization of the record should follow the sequence of assigning a value to the tag and then to the appropriate subfields.

FourSidedObject.WhatShape := Rectangle;
FourSidedObject.Length := 4.3;
FourSidedObject.Width := 7.5;

The corresponding definition in Miranda is

Dimensions :: Square num |
   Rectangle num num | 
   Rhomboid num num |
   Trapezoid num num num | 
   Parallelogram num num num

area Square S = S*S
area Rectangle L W = L * W 
...

Modula-2, Ada, C++, Prolog, Scheme, Miranda Disjoint union values are implemented by allocating storage based on the largest possible value and additional storage for the tag.

Modula-2

array[domain_type] of range_type {Modula-2}
range_type identifier [natural number] // C++

Prolog and Miranda do not provide for an array type and while Scheme does, it is not a part of the purely functional part of Scheme. Modula-2, Ada, C++, Prolog, Scheme, Miranda -- mapping type In Pascal the notation [i..j] indicates the subset of an ordinal type from element i to element j inclusive. In addition to subranges, Miranda provides infinite lists [i..] and finite and infinite arithmetic series [a,b..c], [a,b..] (the interval is (b-a)). Miranda also provides list comprehensions which are used to construct lists (sets). A list comprehension has the form [exp | qualifier]

sqs = [ n*n | n <-[1..] ]

factors n = [ r | r <-[1..n div 2]; n mod r = 0 ]

knights_moves [i,j] = [ [i+a,j+b] | a,b <-[-2..2]; a\verb+^+2+\verb+^+2=5 ]

Modula-2, Ada, C++, Prolog, Scheme, Miranda -- power set Prolog

[]
[I₀,...I_n]
[H | T]

The Miranda syntax for lists is similar to that of Prolog however, elements of lists must be all of the same type.

[*]
[I_0,...I_n]
[H | T]

Recursive types in imperative programming languages are usually defined using a pointer type. Pointer types are an additional primitive type. Pointers are addresses.

{Modula-2: the pointer and the list}
type NextItem = \verb+^+ListType 
   ListType = record 
     item : Itemtype;
     next : NextItem
   end;

// C++: the list type
struct list {
  ItemType Item;
  list* Next; // pointer to list 
};

|| Miranda: list of objects of type T and 
|| a binary tree of type T
[T] 
tree ::= Niltree | Node T tree tree

Referencing/Dereferencing

type ListType = record
      item : Itemtype;
      next : ListType
   end;

Recursive values are implemented using pointers. The run-time support system for the functional and logic programming languages, provides for automatic allocation and recovery of storage (garbage collection). The alternative is for the language to provide access to the run-time system so that the programmer can explicitly allocate and recover the linked structures.

Variables

\marginpar{state:store} It is frequently necessary to refer to an arbitrary element of a type. Such a reference is provided through the use of variables. A variable is a name for an arbitrary element of a type and it is a generalization of a value since it can be the name of any element.

Functions and Procedures

Persistent Types

A persistent variable is one whose lifetime transcends an activation of any particular program. In contrast, a transient variable is one whose lifetime is bounded by the activation of the program that created it. Local and heap variables are transient variables.

Most programming languages provide different types for persistent and transient variables. Typically files are the only type of persistent variable permitted in a programming language. Files are not usually permitted to be transient variables.

Most programming languages provide different types for persistent and transient The type completeness principle suggests that all the types of the programming language should be allowed both for transient variables and persistent variables. A language applying this principle would be simplified by having no special types, commands or procedures for input/output. The programmer would be spared the effort of converting data from a persistent data type to a transient data type on input and vice versa on output.

A persistent variable of array type corresponds to a direct file. If heap variables were persistent then the storage of arbitrary data structures would be possible.

Exercises

Extend the compiler to handle constant, type, variable, function and procedure definitions and references to the same.
Static and dynamic scope
Define algebraic semantics for the following data types.

Boolean

ADT Boolean
Operations
  and(boolean,boolean) --> boolean
  or(boolean,boolean) --> boolean
  not(boolean) --> boolean
Semantic Equations
  and(true,true) = true
  or(true,true) = true
  not(true) = false
  not(false) = true
Restrictions

Integer
Real
Character
String

Name or Structure equivalence (type checking)
Algebraic Semantics: stack, tree, queue, grade book etc.
Abstraction
Generalization
Name or Structure equivalence (type checking)
Extend the compiler to handle additional types. This requires modifications to the syntax of the language with extensions of the scanner, parser, symbol table and code generators.

Author: A. Aaby