Concurrent Programming
The root of all successful human organization is co-operation, not competition.
Concurrent programming is characterized by programming with more than
one process.
Keywords and phrases: pipelines, parallel processes, message passing,
monitors, concurrent programming, safety, liveness, deadlock, live-lock,
fairness, communication, synchronization, producer-consumer, dining philosophers.
There are several reasons for a programmer to be interested in concurrency:
-
to better understand computer architecture (which involves a great deal of
concurrency through pipelining (multiple steps) and super-scalar execution
(multiple instructions)) and compiler design,
-
some problems are most naturally solved by using a set of co-operating
processes,
-
a sequential solution constitutes an overspecification of the order in
which operations must be performed, and
-
to reduce the execution time.
At the machine level, operations are sequential if they occur one
after the other, ordered in time. Operations are concurrent if
they overlap in time. In Figure 1, sequential operations are
connected by a single thread of control while concurrent operations have
multiple threads of control.
Figure 1: Sequential and Concurrent Operations

Sequential operations (a single thread of control):

   --O--O--O--O-->

Concurrent operations (multiple threads of control):

      -O-O-
   --|     |-->
      -O-O-

   O: operation    -: thread
Operations in the source text of a program are concurrent if they
could be, but need not be, executed in parallel. In Figure 2a the second
assignment depends on the outcome of the first, while in Figure 2b
neither assignment depends on the other, so the two assignments may be
executed concurrently.
Figure 2: Sequential and Concurrent Code

a. not concurrent        b. concurrent

   X := 5;                  X := A*B + C;
   Y := 3*X + 4             Y := 3*A + 7;
Concurrent programming involves the notations for expressing potential
parallelism, so that operations may be executed in parallel, and the
techniques for solving the resulting synchronization and communication
problems. Notations for explicit concurrency are a program structuring
technique, while parallelism is a mode of execution provided by the
underlying hardware. Thus we can have parallel execution without explicit
concurrency in the language, and we can have concurrency in a language
without parallel execution. The latter is the case when a program (with or
without explicit concurrent sections) is executed on a single processor by
interleaving the executions of the concurrent operations in the source text.
Aside. The terms concurrent, distributed and parallel have
been used at various times to describe various types of concurrent
programming. Multiple processors and disjoint or shared store are
implementation concepts and are not important from the programming
language point of view. What matters is the notation used to indicate
concurrent execution, communication and synchronization.
Functional and logic programming languages do not necessarily need explicit
specification of concurrency and, with a parallelizing compiler, may be
executed on parallel hardware. It is important to note that the notion
of processes is orthogonal to that of inference, functions and assignments.
The two fundamental concepts in concurrent programming are processes and
resources. A process corresponds to a sequential computation with
its own thread of control. Concurrent programs are distinguished
from sequential programs in that they permit multiple, simultaneous
processes. Processes may share resources. Shared resources include program
resources (data structures) and hardware resources (CPU, memory, and
I/O devices).
Aside. Processes which share an address space are called
threads or light-weight processes. For some programming languages
(C, C++) there are thread packages that permit concurrent programming. In
other cases, the operating system (Microsoft Windows NT, Sun Solaris) provides
system calls for threads. Processes which do not share an address space
are called heavy-weight processes. The Unix family of operating
systems provides a system call to allow programmers to create heavy-weight
processes.
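For example, in Java (a minimal sketch; the class and variable names are
illustrative) two threads share the address space of their process, so
both can write to the same array:

// Two threads sharing the address space of a single process.
public class SharedSpace {
    static int[] shared = new int[2];          // visible to every thread

    public static void main(String[] args) throws InterruptedException {
        Thread a = new Thread(() -> { shared[0] = 1; });
        Thread b = new Thread(() -> { shared[1] = 2; });
        a.start(); b.start();                  // run concurrently
        a.join(); b.join();                    // wait for both to terminate
        System.out.println(shared[0] + shared[1]);   // prints 3
    }
}

The join calls synchronize the main thread with the termination of a and b,
so the final read is well defined.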
The Concurrent Nature of Systems
Co-operation
The Bakery: busy waiting, fairness, liveness.
The Secretary: scheduling, priority, co-operative multitasking,
interrupts, competitive multitasking, pre-emptive multitasking.
The Secretarial Pool: parallel tasking.
Geoffrey Fox's wall: the construction of a brick wall by a number
of workers.
Questions:
-
How do we break down the task to extract maximum parallelism?
-
How do we get the task done in the shortest possible time with a given
number of workers?
-
What is the minimum amount of supervision needed?
-
Can all workers be kept equally busy?
-
Does the task demand specialized workers?
-
Can we maintain efficiency as either the size of the problem or the number
of workers grows?
The Nature of Concurrent Systems
Abstraction
Performance
Communication
In the previous solution, it was assumed that the processes shared the
address space and that synchronization was achieved by the use of a monitor
and condition queues. If the address spaces are disjoint, then both
communication and synchronization must be achieved through message passing,
which may be either synchronous or asynchronous.
Behavior
Synchronization and Communication
Two processes are said to communicate if an action of one process must
entirely precede an action of a second process. Synchronization
is related to communication.
Live-lock (starvation) may result if there is more than one waiting process
and, when the signal is received, access is not granted fairly: multiple
processes wait for access, but access is not provided in a fair manner.
Coroutines.
Real-time Programming language issues
When message passing is asynchronous, synchronization can be obtained
by requiring a reply to a synchronizing message. In the examples that follow,
synchronized message passing is assumed.
Communication commands in the guards. Most communication-based programming
languages permit input commands in the guards but not output commands.
The asymmetry is due to the complexity required to implement
output commands in the guards.
process Q;                      // bounded buffer
const qsize = 10;
var head, tail : integer;
    queue : array[0..qsize-1] of integer;
begin
  head,tail := 0,0;
  *[ head != tail, C?request --> C!queue[head]; head := (head + 1) mod qsize
   [] head != (tail+1) mod qsize, P?X --> queue[tail],tail := X, (tail + 1) mod qsize ]
end;

process P;                      // producer
begin
  *[ true --> produce(X); Q!X ]
end;

process C;                      // consumer
begin
  *[ true --> Q!request; Q?X; consume(X) ]
end;

begin
  [ P || C || Q ]
end.
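A rough Java analogue of this program (an illustrative sketch, not a
transcription of the notation above) replaces the buffer process Q with a
bounded blocking queue; put blocks while the buffer is full and take blocks
while it is empty:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ProducerConsumer {
    public static void main(String[] args) {
        BlockingQueue<Integer> q = new ArrayBlockingQueue<>(10);  // the buffer Q

        Thread p = new Thread(() -> {                 // producer P
            try {
                for (int x = 0; x < 100; x++)
                    q.put(x);                         // blocks while full
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        Thread c = new Thread(() -> {                 // consumer C
            try {
                for (int n = 0; n < 100; n++)
                    System.out.println(q.take());     // blocks while empty
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        p.start(); c.start();
    }
}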
Nondeterminism
A program is deterministic if its evaluations on the same input
always produce the same output; the evaluation strategy might not
be unique.
A program is nondeterministic if it has more than one allowable
evaluation strategy and different evaluation strategies lead to different
results.
A concept related to nondeterminism is parallel evaluation. Parallel
evaluation that does not involve interaction among its subparts
is called noninterfering parallelism. Processes which have disjoint
address spaces cannot interfere with each other and thus can operate without
fear of corrupting each other. For example, the two processes in
[|| i:=1, j:=2]
do not share an address space; therefore, the assignments may take place
in parallel.
Another example of non-interfering processes is found in matrix multiplication.
When two matrices are multiplied, each entry in the product matrix is the
result of multiplying a row by a column and summing the products. This
is called an inner product. Each inner product can be computed independently
of the others. Figure M.N

Figure M.N: Matrix Multiplication

# multiply n by n matrices a and b in parallel
# place result in matrix c
# all matrices are global to multiply
process multiply( i := 1 to n, j := 1 to n)
  var inner_prod := 0
  fa k := 1 to n ->
    inner_prod := inner_prod + a[i,k]*b[k,j]
  af
  c[i,j] := inner_prod
end

is an example of a matrix multiplication routine written in the SR
programming language. This particular example also illustrates dynamic
process creation in that n^2 processes are created to perform the
multiplication.
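A comparable sketch in Java (the class and pool names are illustrative,
and this is not SR) submits one task per entry of c; each task computes its
inner product independently, so the tasks need no synchronization with one
another:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParMatMul {
    public static void main(String[] args) throws InterruptedException {
        final int n = 3;
        final double[][] a = new double[n][n], b = new double[n][n],
                         c = new double[n][n];
        // ... initialize a and b here ...
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) {
                final int fi = i, fj = j;
                pool.execute(() -> {              // one of n*n independent tasks
                    double inner = 0;
                    for (int k = 0; k < n; k++)
                        inner += a[fi][k] * b[k][fj];
                    c[fi][fj] = inner;            // each task writes a distinct entry
                });
            }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);   // wait for all inner products
    }
}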
In interfering parallelism, there is interaction and the relative speeds
of the subparts can affect the final result.
Processes that access a common address space may interfere with each
other. In this program,
[ i:=1 || i:=2 ]
the resulting value of i could be either 1 or 2, depending on which process
executed last, and in this program,
[ i:=0; i:=i+1 || i:=2 ]
the resulting value of i could be either 1, 2 or 3.
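The second program can be reproduced in Java (an illustrative sketch);
because the threads interfere, repeated runs may print different values:

public class Interference {
    static int i = 0;                         // shared variable

    public static void main(String[] args) throws InterruptedException {
        Thread p = new Thread(() -> { i = 0; i = i + 1; });
        Thread q = new Thread(() -> { i = 2; });
        p.start(); q.start();                 // the assignments race
        p.join(); q.join();
        System.out.println(i);                // may print 1, 2 or 3
    }
}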
A language is concurrent if it provides for interfering parallelism.
Sequential programs are nearly always deterministic. A deterministic
program follows a sequence of steps that can be predicted in advance. Its
behavior is reproducible and thus deterministic programs are testable.
Concurrent programs are likely to be nondeterministic because the order
and speed of execution of the processes are unpredictable. This makes testing
of concurrent programs a difficult task.
The requirement for disjoint address spaces may be too severe. What is
required is that shared resources be protected so that only one process
is permitted access to a resource at a time. This permits processes to
co-operate, sharing the resource while maintaining the integrity of the
resource.
Mutual Exclusion
Often a process must have exclusive access to a resource. For example,
when a process is updating a data structure, no other process should have
access to the same data structure; otherwise the accuracy of the data may
be in doubt. The necessity to restrict access is termed mutual exclusion
and involves the following requirements:
-
At most one process has access at a time.
-
If there are multiple requests for a resource, it must be granted to one
of the processes in finite time.
-
When a process has exclusive access to a shared resource, it releases it
in finite time.
-
When a process requests a resource, it must obtain the resource in finite
time.
-
A process should not consume processing time while waiting for a resource.
There are several solutions to the mutual exclusion problem. Among the
solutions are semaphores, critical regions and monitors.
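As one concrete illustration (a minimal Java sketch using the language's
built-in lock; the class is hypothetical), a synchronized block admits at
most one thread at a time to the critical section:

public class Counter {
    private int count = 0;
    private final Object lock = new Object(); // lock for the shared resource

    public void increment() {
        synchronized (lock) {                 // at most one thread at a time
            count = count + 1;                // critical section
        }
    }

    public int get() {
        synchronized (lock) { return count; }
    }
}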
Deadlock
Deadlock is a liveness problem; it is a situation in which a set
of processes are prevented from making any further progress by their mutually
incompatible demands for additional resources. For example, in the dining
philosophers problem, deadlock occurs if each philosopher picks up his/her
left fork. No philosopher can make further progress.
Deadlock can occur in a system of processes and resources if, and only
if, the following conditions all hold together.
-
Mutual exclusion: processes have exclusive access to the resources.
-
Wait and hold: processes continue to hold a resource while waiting
for a new resource request to be granted.
-
No preemption: resources cannot be removed from a process.
-
Circular wait: there is a cycle of processes, each awaiting a
resource held by the next process in the cycle.
There are several approaches to the problem of deadlock. A common approach
is to ignore deadlock and hope that it will not happen. If deadlock
occurs (much as when a program enters an infinite loop), the system's operators
abort the program. This is not an adequate solution in highly concurrent
systems where reliability is required.
A second approach is to allow deadlocks to occur but detect them and
recover automatically. Once deadlock is detected, processes are
selectively aborted, or one or more processes are rolled back to
an earlier state and temporarily suspended until the danger point is passed.
This might not be an acceptable solution in real-time systems.
A third approach is to prevent deadlock by weakening one or more
of the conditions. The wait-and-hold condition may be modified to require
a process to request all needed resources at one time. The circular-wait
condition may be modified by imposing a total ordering on resources and
insisting that they be requested in that order; a sketch of this technique
follows.
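A minimal Java sketch of this resource-ordering technique for the dining
philosophers (the names are illustrative): every philosopher locks the
lower-numbered fork first, so a circular wait cannot form:

import java.util.concurrent.locks.ReentrantLock;

public class OrderedForks {
    // Acquire both forks in a fixed global order: lower index first.
    static void pickUp(ReentrantLock[] forks, int left, int right) {
        forks[Math.min(left, right)].lock();
        forks[Math.max(left, right)].lock();
    }

    // Release in the reverse order of acquisition.
    static void putDown(ReentrantLock[] forks, int left, int right) {
        forks[Math.max(left, right)].unlock();
        forks[Math.min(left, right)].unlock();
    }
}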
Another example of a liveness problem is live-lock (also called lockout
or starvation). Live-lock occurs when a process is prevented from making
progress while other processes continue to run. This is an issue of fairness.
Scheduling
When there are active requests for a resource there must be a mechanism
for granting the requests. Often a solution is to grant access on a first-come,
first-served basis. This may not always be desirable, since there may be
processes whose progress is more important; such processes may be given
a higher priority so that their requests are processed first. When processes
are prioritized, some processes may be prevented from making progress (such
a process is live-locked). A fair scheduler ensures that
all processes eventually make progress, thus preventing live-lock.
Semantics
Parallel processes require consideration of:
-
Synchronization: coordination of tasks which are not completely independent.
-
Communication: exchange of information.
-
Scheduling: priority.
-
Nondeterminism: arbitrary selection of an execution path.
Explicit parallelism is provided by message passing, semaphores, and monitors.
Languages which have been designed for concurrent execution include Concurrent
Pascal, Ada and Occam. Application areas are typically operating systems and
distributed processing -- ensemble activity.
Concurrency in Programming Languages
Threads/Communication/Metaphor
From the programmer's point of view, concurrent programming notations
allow programs to be structured as a set of possibly interacting processes.
Such an organization is particularly useful for operating systems, real-time
control systems, simulation studies, and combinatorial search applications.
To permit the effective use of multiple processes, concurrent programming
languages must provide notations for:
-
Concurrent execution: A notation that denotes operations that could
be, but need not be, executed in parallel.
PCN:    [|| P1, P2, ..., Pn]

Occam:  PAR
          P1
          ...
          Pn
-
Communication: A notation that permits processes to exchange information
either through shared variables (visible to each process) or a message
passing mechanism.
-
Shared Memory
-
Assignment: X := E
-
Message Passing
-
Synchronous Pi!E, Pj?X
-
Asynchronous Pi!E, Pj?X
-
Remote procedure call
-
Synchronization: A notation to require a process to wait
for a signal from another process. In general, processes are not
independent. Often a process depends on data produced by another process;
if the data is not available, the process must wait until the data is
available.
wait(Pi), signal(Pj)
A process can change its state to Blocked (waiting for some condition
to change) and can signal Blocked processes so that they can continue.
In this case, the OS must provide the system calls BLOCK and WAKEUP.
A blocking version of a semaphore:

type semaphore = record
       value : integer;
       L : list of processes;      // processes blocked waiting for the signal
     end;

down(S): S.value := S.value - 1;   // wait
         if S.value < 0 then
           add this process to S.L;
           block;
         end;

up(S):   S.value := S.value + 1;   // signal
         if S.value <= 0 then
           remove a process P from S.L;
           wakeup(P);
         end;
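Java's standard library provides this blocking behavior directly in
java.util.concurrent.Semaphore; a brief usage sketch (the printer example
is illustrative):

import java.util.concurrent.Semaphore;

public class PrinterAccess {
    private static final Semaphore mutex = new Semaphore(1); // one permit

    static void print(String job) throws InterruptedException {
        mutex.acquire();                  // down(S): blocks if no permit is free
        try {
            System.out.println(job);      // exclusive access to the printer
        } finally {
            mutex.release();              // up(S): wakes a blocked thread, if any
        }
    }
}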
Implementation
-
Single processor: The normal way is to implement the semaphore operations
(up and down) as system calls, with the OS disabling interrupts while
executing the code.
-
Multiprocessor: Each semaphore should be protected by a lock variable,
with the TSL (test-and-set lock) instruction used to be sure that only one
CPU at a time examines the semaphore. Using the TSL instruction to prevent
several CPUs from accessing the semaphore at the same time is different from
busy waiting for the semaphore itself: the wait lasts only for the few
instructions that manipulate the semaphore. A sketch follows.
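In Java, the effect of a TSL-protected lock variable can be sketched with
an atomic test-and-set (an illustration, not an OS implementation):

import java.util.concurrent.atomic.AtomicBoolean;

public class SpinLock {
    private final AtomicBoolean locked = new AtomicBoolean(false);

    public void lock() {
        // compareAndSet is an atomic test-and-set: it examines and updates
        // the lock variable in one indivisible step.
        while (!locked.compareAndSet(false, true)) { }   // spin until acquired
    }

    public void unlock() {
        locked.set(false);
    }
}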
-
Mutual exclusion: A notation to synchronize access to shared resources.
-
semaphores
-
Monitors: One approach is to protect the critical section by a monitor.
The monitor approach requires that only one process at a time may execute
in the monitor.
monitor Queue_ADT
  const qsize = 10;
  var head, tail : integer;
      queue : array[0..qsize-1] of integer;
      notempty, notfull : condition;

  procedure enqueue (x : integer);
  begin
    [ head  = (tail+1) mod qsize --> wait(notfull)
    [] head != (tail+1) mod qsize --> skip ];
    queue[tail],tail := x, (tail + 1) mod qsize;
    signal(notempty)
  end;

  procedure dequeue (var x : integer);
  begin
    [ head  = tail --> wait(notempty)
    [] head != tail --> skip ];
    x,head := queue[head], (head + 1) mod qsize;
    signal(notfull)
  end;

begin
  head,tail := 0,0;
end;

begin
  [ produce(x); enqueue(x) || dequeue(y); consume(y) || dequeue(y); consume(y) ]
end.
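Java objects carry a built-in monitor; the following sketch (hypothetical
class name) renders Queue_ADT with synchronized methods, using wait and
notifyAll in place of the condition variables notempty and notfull:

public class QueueADT {
    private static final int QSIZE = 10;
    private final int[] queue = new int[QSIZE];
    private int head = 0, tail = 0;

    public synchronized void enqueue(int x) throws InterruptedException {
        while (head == (tail + 1) % QSIZE)
            wait();                       // wait(notfull): releases the monitor
        queue[tail] = x;
        tail = (tail + 1) % QSIZE;
        notifyAll();                      // signal(notempty)
    }

    public synchronized int dequeue() throws InterruptedException {
        while (head == tail)
            wait();                       // wait(notempty)
        int x = queue[head];
        head = (head + 1) % QSIZE;
        notifyAll();                      // signal(notfull)
        return x;
    }
}

Because wait releases the monitor, a waiting process does not prevent other
processes from entering.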
-
Correctness (safety and liveness)
-
Performance
-
Architecture
-
Implementation
-
Aside. Fork(P) and Join(P) are a notation for concurrency with a
combined notation for communication and synchronization. C, Scheme, Ada,
PVM, PCN, SR, Java and Occam are just some of the programming languages
that provide for processes.
Producer-Consumer
In the following program there is a producer process and a consumer process.
The producer adds items to the queue and the consumer removes
items from the queue. The safety condition that must be satisfied is that
the head and tail of the queue must not overrun each other. The liveness
condition that must be satisfied is that when the queue contains an item,
the consumer process must be able to access the queue, and when the queue
contains space for another item, the producer process must be able to access
the queue.
const qsize = 10;
var head, tail : integer;
    queue : array[0..qsize-1] of integer;

procedure enqueue (x : integer);
begin
  *[ head = (tail+1) mod qsize --> skip ];   // busy-wait while full
  queue[tail],tail := x, (tail + 1) mod qsize
end;

procedure dequeue (var x : integer);
begin
  *[ head = tail --> skip ];                 // busy-wait while empty
  x,head := queue[head], (head + 1) mod qsize
end;

begin
  head,tail := 0,0;
  [ *[ produce(x); enqueue(x) ] || *[ dequeue(y); consume(y) ] ]
end.
Since the processes access different portions of the queue and test for
the presence or absence of items in the queue before accessing it,
the desired safety properties are satisfied. Note, however, that busy waiting
is involved.
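The busy-waiting version can be sketched in Java for a single producer and
a single consumer (illustrative; the volatile indices make each side's
updates visible to the other):

public class SpinQueue {
    private static final int QSIZE = 10;
    private final int[] queue = new int[QSIZE];
    private volatile int head = 0, tail = 0;

    public void enqueue(int x) {                 // called by the producer only
        while (head == (tail + 1) % QSIZE) { }   // busy-wait while full
        queue[tail] = x;
        tail = (tail + 1) % QSIZE;
    }

    public int dequeue() {                       // called by the consumer only
        while (head == tail) { }                 // busy-wait while empty
        int x = queue[head];
        head = (head + 1) % QSIZE;
        return x;
    }
}

As in the pseudocode, each spin loop consumes processor time while it waits,
which is exactly the cost that blocking solutions avoid.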
Shared Memory Model
Process Creation
Process Identification
Synchronization
In many applications it is necessary to order the actions of a set of
processes as well as interleave their access to shared resources: a common
address space, critical sections protected by a monitor, and synchronization
provided through wait and signal.
Some alternative synchronization primitives are
-
Semaphores
-
Critical Regions
-
Monitors
-
Synchronized Message Passing
If, in the previous example, another process were to be added, either a
producer or a consumer process, an unsafe condition could result: two
processes could compete for access to the same item in the queue. The solution
is to permit only one process at a time to access the enqueue or dequeue
routines. One approach is to protect the critical section by a monitor. The
monitor approach requires that only one process at a time may execute in the
monitor. The following monitor solution is incorrect.
monitor Queue_ADT
  const qsize = 10;
  var head, tail : integer;
      queue : array[0..qsize-1] of integer;

  procedure enqueue (x : integer);
  begin
    *[ head = (tail+1) mod qsize --> skip ];   // busy-wait while full
    queue[tail],tail := x, (tail + 1) mod qsize
  end;

  procedure dequeue (var x : integer);
  begin
    *[ head = tail --> skip ];                 // busy-wait while empty
    x,head := queue[head], (head + 1) mod qsize
  end;

begin
  head,tail := 0,0;
end;

begin
  [ produce(x); enqueue(x) || dequeue(y); consume(y) || dequeue(y); consume(y) ]
end.
Note that busy waiting is still involved and, further, once a process is
in the monitor and is waiting, no other process can get in; the program
is deadlocked.
Message Passing Model
Process Creation
Process Identification
Message Passing
Data Flow
-
Unidirectional
-
Bidirectional
MPI
Hardware
Processes
-
Process: a single flow of control through a set of instructions
-
Processor: a hardware device for executing processes
-
Parallel computer: two or more processors connected through an interconnection
network
Flynn's Taxonomy
System classification by number of instruction and data streams.
-
SISD: classical sequential von Neumann machine. Inherently sequential;
parallelism may be simulated by interleaving instructions and multiprogramming.
-
Pipelining and vector architectures
-
SIMD: synchronous, since there is a single instruction stream, but each
processor has its own data stream. Matrix operations are a good example.
Examples: Thinking Machines CM, MasPar Computer Corp. MP (single sequencing units).
-
MISD: does not seem to be useful
-
MIMD/SPMD: asynchronous processes but with occasional pauses to synchronize;
Intel iPSC, nCUBE, Sequent Symmetry, SGI Onyx, SUN MP system
-
shared-memory (sometimes called multiprocessors): locking and protection
mechanisms
-
distributed-memory (sometimes called multicomputers): message passing
Shared-Memory MIMD
-
Bus-based architectures
-
Cache coherence -- for bus based systems use the snoopy protocol
-
Switch-based architectures, crossbar switch
-
NUMA - nonuniform memory access
Distributed-Memory MIMD
-
Dynamic interconnection networks
-
Static interconnection networks
-
linear array, 2D mesh, 3D mesh
-
ring, torus
-
hypercube
-
bus
Communication and routing
-
routing
-
store-and-forward
-
cut-through
The Engineering of Concurrent Programs
A parallel programming environment must support the following three phases
of system behavior specification.
-
Programming: behavior of processes and their interconnection
-
Network description: processors and their interconnection
-
Configuration: mapping of software onto hardware
Programming
The way to design parallel software is to begin with the most
parallel algorithm possible and then gradually render it more sequential
... until it suits the machine on which it is to run.
East (1995)
Chandy and Taylor (1992) define an elegant parallel programming
language PCN (Program Composition Notation) based on:
-
Shared definition variables (single assignment) -- X=Exp,
-
Parallel composition -- [||P0,...,Pn],
-
Choice composition -- [? G0 -> P0, ..., Gn -> Pn],
-
Sequential composition -- [; S0,...,Sn], and
-
Recursion -- name(parameters) composition expression
The definition variable eliminates indeterminacy problems. Communication
is through shared variables, which may be streams. Synchronization is achieved
by requiring a process that references an undefined variable to wait until
it is defined by some other process before continuing. Recursion with parallel
composition permits dynamic process creation.
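A rough Java analogy to a definition variable (an illustrative sketch, not
PCN) is a single-assignment future: a reader blocks until some other process
defines the value:

import java.util.concurrent.CompletableFuture;

public class DefinitionVariable {
    public static void main(String[] args) throws Exception {
        CompletableFuture<Integer> x = new CompletableFuture<>();

        Thread reader = new Thread(() -> {
            try {
                // Blocks until x is defined by another process.
                System.out.println("y = " + (3 * x.get() + 4));
            } catch (Exception e) { Thread.currentThread().interrupt(); }
        });
        reader.start();

        x.complete(5);    // define x once; later attempts to define are ignored
        reader.join();
    }
}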
If a program that uses only parallel and choice composition
and definition variables does not have adequate efficiency, ...
We use the following steps in the introduction of mutables and sequencing
into a parallel block.
-
We order the statements in a parallel block so that all variables that
appear on the right-hand sides of definition statements reduce to ground
values or tuples, and all guards reduce to the ground values true
or false, given only the definitions established by statements earlier
in the ordering. In other words, we order statements in the direction of
data flow; statements that write a variable appear earlier than statements
that read that variable. Then we convert the parallel block into a sequential
block by replacing "||" by ";", retaining the data-flow order of statements.
-
Next, we introduce mutables, add assignment statements to our program,
and show that the mutable m has the same value as the definition
variable x it is to replace, at every point in the program in which
x is read - i.e., where x appears on the right-hand side
of a definition statement or assignment or guard.
-
Finally, we remove the definition variables that are replaced by mutables,
secure in the knowledge that the mutables have the same value as the definition
variables in the statements in which they are read. We must, of course,
be sure that mutables shared by constituent blocks of a parallel block
are not modified within the parallel block.
Chandy and Taylor (1992)
Decomposition
Function decomposition
Break down the task so that each worker performs a distinct function.
Disadvantages:
-
Fewer tasks than workers
-
Some tasks are easier than others
Domain decomposition
Divide the domain among the available workers.
-
Horizontal domain decomposition: each group is responsible for the entire
project.
-
Vertical domain decomposition: assembly line, pipelining.
Communication and Synchronization
Co-operation requires communication. Communication requires a protocol.
Alternation and Competition
Allocate time to multiple tasks.
-
Scheduling
-
Co-operative multitasking: multi-person game, using the copy machine
-
Priority: telephone vs email
-
Competitive multitasking: time slice
-
Client-server: bakery
-
Busy waiting
-
Fairness
Correctness
Partial correctness, total correctness, satisfaction of specifications...
Chandy & Taylor (1992) require that:
-
Shared mutable variables remain constant during parallel composition.
-
Mutable variables are copied when used in definitions.
-
When defined, definition variables act as constants in assignments.
Lewis (1993) develops a theory of program correctness called flow-correctness.
Lewis requires for each shared variable:
-
it must be defined before it is referenced,
-
it must be referenced before it is updated, and
-
only one process at a time may (re)define it.
These rules apply only to the dependencies among variables and do not include
either total correctness (termination) or logical correctness (satisfaction
of specifications).
Correctness issues in the design of concurrent programs fall into one
of two categories: safety and liveness.
-
Safety: nothing bad will happen. For example, access to a shared
resource like a printer requires that the user process have exclusive access
to the resource. So there must be a mechanism to provide mutual exclusion.
-
Liveness: something good will happen. On the other hand, no process
should prevent other processes from eventual access to the printer. Thus
any process which wants the printer must eventually have access to the
printer.
Safety is related to the concept of a loop invariant. A program should
produce the ``right'' answer. Liveness is related to the concept of a loop
variant. A program is expected to make progress. Termination is an example
of a liveness property when a program is expected to terminate.
Network description
Configuration
Implementation
A sequential program consists of a single process (thread); a concurrent
program consists of multiple threads or multiple processes.

A thread's registers (PC, data, heap and stack pointers) refer into the
name space of its process:

   Thread                 Name space
   PC    ------------->   Code
   Data  ------------->   Global data
   Heap  ------------->   Heap
   Stack ------------->   Stack

Multiple threads share a single name space: threads 1 through n share the
code, the global data and the heap, but each thread has its own stack:

   Thread 1 ... Thread n   Shared space
   PC, Data, Heap ----->   Code, Global data, Heap
   Stack1 ... Stackn -->   Individual stacks: Stack1 ... Stackn

Multiple (heavy-weight) processes do not share an address space: each
process i has its own name space i with its own code, global data, heap
and stack.
Historical Perspectives and Further Reading
Related issues: Lazy evaluation vs Parallel execution; Backtracking vs
Parallel execution
Concurrency occurs in hardware when two or more operations overlap in
time. Concurrency in hardware dates back to the 1950s when special-purpose
processors were developed for controlling input/output devices. This permitted
the overlapping of CPU instructions with I/O actions. For example, the
execution of an I/O instruction no longer delayed the execution of the
next instruction. The programmer was insulated from this concurrency by
the operating system. The problems presented to the operating systems by
this concurrency and the resulting solutions form the basis for constructs
supporting concurrency in programming languages. Hardware signals called
interrupts provided the synchronization between the CPU and the
I/O devices.
Other advances in hardware have led to the development of alternative
architectures. Pipeline processors fetch the next instruction
while the current instruction is being decoded. Superscalar processors
combine multiple pipelines to provide an even greater level of concurrency.
Array processors provide a large number of identical processors
that operate simultaneously on different parts of the same data structure.
Data flow computers aim at extracting maximum concurrency from a
computation by performing as much of the computation in parallel as possible.
Connectionism-based hardware models provide concurrency by modeling
computation after the neural networks found in the brain.
Interrupts together with a hardware clock made it possible to implement
multiprogramming systems, which are designed to maximize the utilization
of the computer system's resources (CPU, store, I/O devices) by running
two or more jobs concurrently. When one job is performing I/O, another
job can be executing on the CPU.
Interrupts and the hardware clock also made possible the development
of interactive systems where multiple users have simultaneous access
to the system resources. Such a system must provide for a large number
of jobs whose combined demands on the system may exceed the system resources.
Various techniques of swapping and paging meet this need by moving jobs
in and out of the store to the larger capacity of backing store devices.
With the increase of jobs comes the need to increase the capacity of the
CPU. The solution was to develop multiprocessor systems in which
several CPUs are available and simultaneously operate on separate jobs
in the shared store.
An alternate solution is to develop distributed systems consisting
of several complete computers (each containing both CPU and an associated
store) that can operate independently but also communicate efficiently.
Such systems of local area networks permit the efficient use of shared
resources (such as printers and large backing store via file servers) and
increase the computational throughput of the system.
-
Andrews, Gregory R. and Olsson, Ronald A. (1993) The SR Programming
Language, Benjamin/Cummings, Redwood City, CA.
-
Ben-Ari, M. (1990) Principles of Concurrent and Distributed Programming,
Prentice Hall International, Hemel Hempstead, Hertfordshire.
-
Chandy, K. Mani and Taylor, Stephen (1992) An Introduction to Parallel
Programming, Jones and Bartlett, Boston.
-
East, Ian (1995) Parallel Processing with Communicating Process
Architecture, UCL Press, London, England.
-
Foster, I. (1996) Compositional Parallel Programming Languages, TOPLAS
Vol. 18, No. 4 (July 1996): pp. 454-476.
-
Hehner, Eric C. R. (1993) A Practical Theory of Programming,
Springer-Verlag, New York.
-
Lewis, Ted G. (1993) Foundations of Parallel Programming: A Machine
Independent Approach, IEEE Computer Society Press, Los Alamitos, CA.
-
Pacheco, Peter S. (1997) Parallel Programming with MPI, Morgan Kaufmann
Publishers Inc., San Francisco, CA.
-
Watt, David A. (1990) Programming Language Concepts and Paradigms,
Prentice-Hall International, Hemel Hempstead, Hertfordshire.
Exercises
For each of the following problems identify the potential for concurrent
execution and the synchronization and communication requirements. Define
appropriate safety and liveness invariants. Construct solutions using ...
-
Producer-Consumer/Bounded Buffer (Models race conditions) Producers
create data elements which are placed in a buffer. The consumers remove
data elements from the buffer and perform some internal computation. The
problem is to keep the producer from overwriting full buffers and the consumer
from rereading empty buffers.
-
Readers and Writers (Models access to a database) A data object
is shared among several concurrent processes, some of which only want to
read the content of the shared object, whereas others want to update (read
and write) the shared object. The problem is to ensure that only one writer
at a time has access to the object. Readers are processes which are not
required to exclude one another; writers are required to exclude every
other process, readers and writers alike.
-
The Dining Philosophers. (Models exclusive access to limited resources)
N philosophers spend their lives seated around a circular table, thinking
and eating. Each philosopher has a plate of spaghetti and, on each side,
shares a fork with his/her neighbor. To eat, a philosopher must first acquire
the forks to his/her immediate left and right. After eating, a philosopher
places the forks back on the table. The problem is to write a program that
lets each philosopher eat and think.
The philosophers correspond to processes and the forks correspond to
resources.
A safety property for this problem is that a fork is held by one and
only one philosopher at a time. A desirable liveness property is that
whenever a philosopher wants to eat, eventually the philosopher will get
to eat.
-
Solve the dining philosophers problem using a central fork manager (centralized).
-
Solve the dining philosophers problem where there is a manager for each
fork (distributed).
-
Solve the dining philosophers problem where the philosophers handle their
own forks (decentralized).
-
Solve the dining philosophers problem if the philosophers must acquire
all the forks in order to eat (distributed mutual exclusion).
-
Sleeping Barber. The barber shop has one barber, a barber chair,
and n chairs for waiting customers. The problem is to construct
an appropriate simulation.
-
Searching
-
Find the largest element in an unordered list
-
Sorting
-
Merge sort: Your program should break the list into two halves and sort
each half concurrently. While sorting, the two halves should be concurrently
merged.
-
Parallel merge of sorted lists -- if X[i] should just precede Y[j], then
X[i] should appear at Z[i+j-1].
-
Rank sort: X[i] has rank k if X has exactly k items less than X[i], i.e.,
X[i] should be placed in position k.
-
Insertion sort: value is placed into its place in the sorted list.
-
Exchange/Bubble sort: small values flow left and large values flow right.
-
Quicksort
-
Bitonic sort
-
The N-body problem. The N-body problem is used in astrophysics to
calculate the dynamics of the solar system and galaxies. Each mass in this
problem experiences a gravitational attraction by every other mass, in
proportion to the inverse square of the distance between the objects.
-
The sieve of Eratosthenes. The sieve of Eratosthenes is a method
of generating prime numbers by deleting composite numbers. This is done
by the following steps, beginning with two as the first prime:
-
Delete all multiples of the prime number other than the prime number.
-
Iterate with the next remaining number which is prime.
-
Polynomial Multiplication -- initialize, form cross-product, sort
by power, combine like powers
-
The quadrature problem. The quadrature problem is to approximate
the area under a curve, i.e., to approximate the integral of a function.
Given a continuous, non-negative function f(x) and two endpoints l and
r, the problem is to compute the area of the region bounded by f(x), the
x axis, and the vertical lines through l and r. The typical way to solve
the problem is to subdivide the region into a number of smaller ones,
use something like a trapezoid to approximate the area of each smaller
region, and then sum the areas of the smaller regions.
-
Matrix Operations.
-
Multiplication: AB = C, where A is a p x q matrix, B a q x r matrix,
C a p x r matrix, and C[i,j] = sum from k = 1 to q of A[i,k]*B[k,j].
-
Triangularization: Triangularization is a method for reducing a real matrix
to upper-triangular form. It involves iterating across the columns and
zeroing out the elements in the column below the diagonal element. This
is done by performing the following step for each column.
-
For each row r below the diagonal row d, subtract a multiple
of row d from row r. The multiple is m[r,d]/m[d,d]; subtracting
this multiple of row d has the effect of setting m[r,d] to zero.
-
Backsubstitution:
-
Gaussian elimination: Gaussian elimination with partial pivoting is a method
for reducing a real matrix to upper-triangular form. It involves iterating
across the columns and zeroing out the elements in the column below the
diagonal element. This is done by performing the following three steps
for each column.
-
Select a pivot element, which is the element in column d having
the largest absolute value.
-
Swap row d and the row containing the pivot element.
-
For each row r below the new diagonal row, subtract a multiple of
row d from row r. The multiple is m[r,d]/m[d,d]; subtracting
this multiple of row d has the effect of setting m[r,d] to zero.
Assume the matrix is non-singular (the divisor is non-zero).
-
Shortest Path between two vertices of a graph (edges are weighted).
-
Traveling salesman problem. Find the shortest tour that visits each
city exactly once.
-
Dutch national flag. A collection of colored balls is distributed
among N processes. There are at most N different colors of balls. The goal
is for the processes to exchange balls so that eventually, for all i, process
i holds all balls of color i. The number of balls in the collection is
unknown to the processes.
-
Distributed Synchronization
-
Write a program that polls N processes for yes or no votes and terminates
when at least N/2 responses have been received. Assume N is even.
-
Repeat the previous exercise, but terminate when a majority of identical
responses have been received. Assume N is even.
-
Random election of a leader amongst n processes. Create n processes.
Each process generates a random number to decide its "vote", and sends
the "vote" to the process it is voting for. Each process counts its votes
and broadcasts the results to all other processes. Now everyone knows the
leader. (You may have to think about starting the process over again in
case of a tie, or simply deciding that the process with the larger id is
the leader, or some such thing.) This is a rather silly problem, but it
will help you to learn about broadcasts and synchronizing processes, both
of which are extremely important for any kind of parallel programming.
-
The eight-queens problem. The eight-queens problem is concerned
with placing eight queens on a chess board in such a way that none can
attack another. One queen can attack another if they are in the same row
or column or are on the same diagonal.
-
Miscellaneous
-
Sum a set of numbers
-
(Conway) Read 80-character records and write 125-character records. Add an
extra blank after each input record. Replace every pair of asterisks (**)
with an exclamation point (!).
-
(Manna and Pnueli) Compute the binomial coefficient C(n,k) = n(n-1)...(n-k+1)/k!
-
(Roussel) Compare the structure of two binary trees
-
(Dijkstra) Let S and T be two disjoint sets of numbers with s and t the
number of elements, respectively. Modify S and T so that S contains the
s smallest members of S union T and T contains the t largest members of
S union T.
-
(Conway) The game of life
-
(Hoare) Write a disk server that minimizes amount of seek time
-
Show that Lewis' flow-correctness rules are safety or liveness rules.
-
PCN is a single assignment language (in a single assignment language,
the assignment of a value to a variable may occur just once within a program).
In addition, when a program must reference an undefined variable, it waits
until the variable becomes defined. Show that PCN programs satisfy Lewis'
flow-correctness rules.
Author: A. Aaby