Concurrent Programming

The root of all successful human organization is co-operation not competition. Concurrent programming is characterized by programming with more than one process.

Keywords and phrases: pipelines, parallel processes, message passing, monitors, concurrent programming, safety, liveness, deadlock, live-lock, fairness, communication, synchronization, producer-consumer, dining philosophers.


There are several reasons for a programmer to be interested in concurrency:

  1. to better understand computer architecture, which exhibits a great deal of concurrency through pipelining (multiple instruction steps) and super-scalar execution (multiple instructions),
  2. to better understand compiler design,
  3. because some problems are most naturally solved by using a set of co-operating processes,
  4. because a sequential solution over-specifies the order of operations, and
  5. to reduce execution time.
At the machine level, operations are sequential if they occur one after the other, ordered in time. Operations are concurrent if they overlap in time. In Figure 1, sequential operations are connected by a single thread of control while concurrent operations have multiple threads of control.


 
Figure 1: Sequential and Concurrent Operations
 
Sequential operations:
   --O-O-O-O--> 
O: operation
Concurrent operations:
      -O-O-
   --|     |-->  
      -O-O-
-: thread

Operations in the source text of a program are concurrent if they could be, but need not be, executed in parallel. In Figure 2a the second assignment depends on the outcome of the first, while in Figure 2b neither assignment depends on the other, so the two assignments may be executed concurrently.


 
Figure 2: Sequential and Concurrent Code
a. not concurrent          b. concurrent
   X := 5;                    X := A*B + C;
   Y := 3*X + 4               Y := 3*A + 7;

Concurrent programming involves the notations for expressing potential parallelism, so that operations may be executed in parallel, and the techniques for solving the resulting synchronization and communication problems. Notations for explicit concurrency are a program structuring technique, while parallelism is a mode of execution provided by the underlying hardware. Thus we can have parallel execution without explicit concurrency in the language, and we can have concurrency in a language without parallel execution. The latter is the case when a program (with or without explicit concurrent sections) is executed on a single processor; the program is then executed by interleaving executions of the concurrent operations in the source text.
Aside. The terms concurrent, distributed and parallel have been used at various times to describe various types of concurrent programming. Multiple processors and disjoint or shared store are implementation concepts and are not important from the programming language point of view. What matters is the notation used to indicate concurrent execution, communication and synchronization.

Functional and logic programming languages do not necessarily need explicit specification of concurrency and, with a parallelizing compiler, may be executed on parallel hardware. It is important to note that the notion of processes is orthogonal to that of inference, functions and assignments.

The two fundamental concepts in concurrent programming are processes and resources. A process corresponds to a sequential computation with its own thread of control. Concurrent programs are distinguished from sequential programs in that they permit multiple processes. Processes may share resources: program resources (data structures) and hardware resources (CPU, memory, and I/O devices).
Aside. Processes which share an address space are called threads or light-weight processes. For some programming languages (C, C++) there are threads packages to permit concurrent programming. In other cases, the operating system (Microsoft Windows NT, Sun Solaris) provides system calls for threads. Processes which do not share an address space are called heavy-weight processes. The Unix family of operating systems provides a system call to allow programmers to create heavy-weight processes.
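As a concrete illustration, the following is a minimal Java sketch (added here; the class and thread names are only examples) of two light-weight processes sharing the address space of the program that creates them:

// Minimal sketch: two light-weight processes (threads) sharing one address space.
public class TwoThreads {
    public static void main(String[] args) throws InterruptedException {
        Runnable task = () ->
            System.out.println(Thread.currentThread().getName() + " running");
        Thread t1 = new Thread(task, "worker-1");
        Thread t2 = new Thread(task, "worker-2");
        t1.start();   // both threads now run concurrently with the main thread
        t2.start();
        t1.join();    // wait for both to finish
        t2.join();
    }
}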

The Concurrent Nature of Systems

Co-operation

The Bakery. busy waiting, fairness, liveness

The Secretary. scheduling, priority, co-operative multitasking, interrupts, competitive multitasking, pre-emptive multitasking

The Secretarial Pool. parallel tasking

Geoffrey Fox's wall. the construction of a brick wall by a number of workers.

Questions:

The Nature of Concurrent Systems

Abstraction

Performance

Communication

In the previous solution, it was assumed that the processes shared the address space and that synchronization was achieved by the use of a monitor and condition queues. If the address spaces are disjoint, then both communication and synchronization must be achieved through message passing. There are two choices: message passing can be synchronous or asynchronous. When message passing is asynchronous, synchronization can be obtained by requiring a reply to a synchronizing message. In the examples that follow, synchronized message passing is assumed.

Behavior

Synchronization and Communication

Two processes are said to communicate if an action of one process must entirely precede an action of a second process. Synchronization is related to communication.

Live-lock may result if there is more than one waiting process and, when the signal is received, access is not granted fairly.

Starvation (live-lock): multiple processes are waiting for access, but access is not provided in a fair manner.

Coroutines.

Real-time Programming language issues

When message passing is asynchronous, synchronization can be obtained by requiring a reply to a synchronizing message. In the examples that follow, synchronized message passing is assumed.

Communication commands in the guards. Most communication-based programming languages permit input commands in guards but not output commands. The asymmetry is due to the complexity required to implement output commands in guards.

process Q;
const qsize = 10;
var head, tail : integer;
   queue : array[0..qsize-1] of integer;
   X : integer;

begin                        
    head,tail := 0,0;
    *[ head != tail, C?X --> C!queue[head]; head := (head + 1) mod qsize 
      [] head != (tail+1) mod qsize, P?X --> queue[tail],tail := X, (tail + 1) mod qsize]
end;

process P;
var X : integer;
begin
   *[ true --> produce(X); Q!X]
end;

process C;
var X : integer;
begin
   *[ true --> Q!X; Q?X; consume(X)]
end;


begin
[ P || C || Q ]
end.
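A rough Java approximation of this program (added as an illustration; the class BoundedBufferProcess and the channel names fromP and toC are assumptions, not part of the notation above) uses SynchronousQueue to model synchronous message passing. Java has no guarded alternation, so the buffer process polls its two channels with short timeouts to simulate choosing between P?X and C!queue[head]:

import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.TimeUnit;

public class BoundedBufferProcess {
    static final int QSIZE = 10;
    static final SynchronousQueue<Integer> fromP = new SynchronousQueue<>(); // P -> Q
    static final SynchronousQueue<Integer> toC   = new SynchronousQueue<>(); // Q -> C

    public static void main(String[] args) {
        new Thread(BoundedBufferProcess::bufferQ).start();          // process Q
        new Thread(() -> {                                          // process P
            for (int i = 0; ; i++) {
                try { fromP.put(i); } catch (InterruptedException e) { return; }
            }
        }).start();
        new Thread(() -> {                                          // process C
            while (true) {
                try { System.out.println(toC.take()); } catch (InterruptedException e) { return; }
            }
        }).start();
    }

    static void bufferQ() {
        int[] queue = new int[QSIZE];
        int head = 0, tail = 0;
        try {
            while (true) {
                if (head != (tail + 1) % QSIZE) {                   // not full: try P?X
                    Integer x = fromP.poll(1, TimeUnit.MILLISECONDS);
                    if (x != null) { queue[tail] = x; tail = (tail + 1) % QSIZE; }
                }
                if (head != tail) {                                 // not empty: try C!queue[head]
                    if (toC.offer(queue[head], 1, TimeUnit.MILLISECONDS))
                        head = (head + 1) % QSIZE;
                }
            }
        } catch (InterruptedException e) { /* terminate */ }
    }
}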

Nondeterminism

A program is deterministic if its evaluation on the same input always produces the same output. The evaluation strategy might not always be unique.

A program is nondeterministic if it has more than one allowable evaluation strategy and different evaluation strategies lead to different results.

A concept related to nondeterminism is parallel evaluation. Parallel evaluation that does not involve interaction among its subparts is called noninterfering parallelism. Processes which have disjoint address spaces cannot interfere with each other and thus can operate without fear of corrupting each other. For example, the two processes in

[|| i:=1, j:=2]
do not share an address space; therefore, the assignments may take place in parallel.

Another example of non-interfering processes is found in matrix multiplication. When two matrices are multiplied, each entry in the product matrix is the result of multiplying a row by a column and summing the products. This is called an inner product. Each inner product can be computed independently of the others. Figure M.N



 
Figure M.N: Matrix Multiplication
# multiply n by n matrices a and b in parallel 
# place result in matrix c 
# all matrices are global to multiply 
process multiply( i := 1 to n, j := 1 to n) 
   var inner_prod := 0 
   fa k := 1 to n -> 
      inner_prod := inner_prod + a[i,k]*b[k,j] 
   af
   c[i,j] := inner_prod 
end


is an example of a matrix multiplication routine written in the SR programming language. This particular example also illustrates dynamic process creation in that n^2 processes are created to perform the multiplication.
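A comparable sketch in Java (added as an illustration; the class name MatMul is an assumption) exploits the same non-interference: each (i, j) inner product writes a distinct cell of c, so all of them may be computed in parallel, here using a parallel stream rather than explicit process creation:

import java.util.stream.IntStream;

public class MatMul {
    // multiply n by n matrices a and b in parallel; each task computes one inner product
    static double[][] multiply(double[][] a, double[][] b) {
        int n = a.length;
        double[][] c = new double[n][n];
        IntStream.range(0, n * n).parallel().forEach(idx -> {
            int i = idx / n, j = idx % n;
            double innerProd = 0.0;
            for (int k = 0; k < n; k++)
                innerProd += a[i][k] * b[k][j];
            c[i][j] = innerProd;   // each task writes a distinct cell, so there is no interference
        });
        return c;
    }
}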

In interfering parallelism, there is interaction and the relative speeds of the subparts can affect the final result.

Processes that access a common address space may interfere with each other. In this program,

[i:=1 || i:=2]
the resulting value of i could be either 1 or 2, depending on which process executed last, and in this program,
[i:=0;i:=i+1 || i:=2]
the resulting value of i could be 1, 2 or 3.
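The same interference can be demonstrated directly in Java (a minimal sketch added here; the class name Race is an assumption): two threads update the shared variable i, and the printed result depends on the interleaving chosen by the scheduler.

public class Race {
    static int i = 0;

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(() -> { i = 0; i = i + 1; });
        Thread t2 = new Thread(() -> { i = 2; });
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(i);   // may print 1, 2 or 3 on different runs
    }
}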

A language is concurrent if it provides for interfering parallelism.

Sequential programs are nearly always deterministic. A deterministic program follows a sequence of steps that can be predicted in advance. Its behavior is reproducible and thus deterministic programs are testable. Concurrent programs are likely to be nondeterministic because the order and speed of execution of the processes is unpredictable. This makes testing of concurrent programs a difficult task.

The requirement for disjoint address spaces may be too severe. What is required is that shared resources be protected so that only one process is permitted access to a resource at a time. This permits processes to co-operate, sharing the resource while maintaining its integrity.

Mutual Exclusion

Often a process must have exclusive access to a resource. For example, when a process is updating a data structure, no other process should have access to the same data structure; otherwise the accuracy of the data may be in doubt. The necessity to restrict access is termed mutual exclusion, and a solution must ensure that at most one process is in its critical section at a time, that a process halted outside its critical section does not block other processes, and that no process is delayed forever in its attempt to enter its critical section. There are several solutions to the mutual exclusion problem; among them are semaphores, critical regions and monitors.
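A minimal Java sketch of mutual exclusion (added as an illustration; the class SharedLog is an assumption): the synchronized block ensures that only one process at a time updates the shared data structure.

import java.util.ArrayList;
import java.util.List;

public class SharedLog {
    private final List<String> entries = new ArrayList<>();

    public void append(String entry) {
        synchronized (entries) {   // critical section: one thread at a time
            entries.add(entry);
        }
    }
}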

Deadlock

Deadlock is a liveness problem; it is a situation in which a set of processes are prevented from making any further progress by their mutually incompatible demands for additional resources. For example, in the dining philosophers problem, deadlock occurs if each philosopher picks up his/her left fork. No philosopher can make further progress.

Deadlock can occur in a system of processes and resources if, and only if, the following conditions all hold together:
1. Mutual exclusion: each resource can be held by at most one process at a time.
2. Wait-and-hold: a process may hold resources while waiting for additional resources.
3. No preemption: a resource can be released only voluntarily by the process holding it.
4. Circular wait: there is a circular chain of processes, each waiting for a resource held by the next.

There are several approaches to the problem of deadlock. A common approach is to ignore deadlock and hope that it will not happen. If deadlock occurs (much as when a program enters an infinite loop), the system's operators abort the program. This is not an adequate solution in highly concurrent systems where reliability is required.

A second approach is to allow deadlocks to occur but detect and recover automatically. Once deadlock is detected, processes are selectively aborted, or one or more processes are rolled back to an earlier state and temporarily suspended until the danger point is passed. This might not be an acceptable solution in real-time systems.

A third approach is to prevent deadlock by weakening one or more of the conditions. The wait-and-hold condition may be modified to require a process to request all needed resources at one time. The circular-wait condition may be modified by imposing a total ordering on resources and insisting that they be requested in that order.
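A minimal Java sketch of the ordering approach for the dining philosophers (added as an illustration; the class Philosophers is an assumption): forks are numbered and always acquired in increasing order, which breaks the circular-wait condition.

import java.util.concurrent.locks.ReentrantLock;

public class Philosophers {
    static final int N = 5;
    static final ReentrantLock[] forks = new ReentrantLock[N];

    public static void main(String[] args) {
        for (int i = 0; i < N; i++) forks[i] = new ReentrantLock();
        for (int p = 0; p < N; p++) {
            int left = p, right = (p + 1) % N;
            int first = Math.min(left, right), second = Math.max(left, right);
            new Thread(() -> {
                while (true) {
                    forks[first].lock();    // always take the lower-numbered fork first
                    forks[second].lock();
                    try { /* eat */ } finally {
                        forks[second].unlock();
                        forks[first].unlock();
                    }
                    /* think */
                }
            }).start();
        }
    }
}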

Another example of a liveness problem is live-lock (also called lockout or starvation). Live-lock occurs when a process is prevented from making progress even though other processes are running. This is an issue of fairness.

Scheduling

When there are active requests for a resource there must be a mechanism for granting the requests. Often a solution is to grant access on a first-come, first-served basis. This may not always be desirable, since there may be processes whose progress is more important; such processes may be given a higher priority and their requests processed first. When processes are prioritized, some processes may be prevented from making progress (such a process is live-locked). A fair scheduler ensures that all processes eventually make progress, thus preventing live-lock.
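In Java this can be sketched (an added illustration; the class FairResource is an assumption) with a fair ReentrantLock, which grants the lock roughly in first-come, first-served order so that no waiting process is starved by faster competitors:

import java.util.concurrent.locks.ReentrantLock;

public class FairResource {
    private final ReentrantLock lock = new ReentrantLock(true);   // true selects fair ordering

    public void use() {
        lock.lock();              // waiters are served roughly in arrival order
        try {
            // access the shared resource
        } finally {
            lock.unlock();
        }
    }
}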

Semantics

    Parallel processes involve:
      • Synchronization: coordination of tasks which are not completely independent.
      • Communication: exchange of information.
      • Scheduling: priority.
      • Nondeterminism: arbitrary selection of execution path.

    Explicit parallelism (message passing, semaphores, monitors): languages which have been designed for concurrent execution include Concurrent Pascal, Ada and Occam. Application areas are typically operating systems and distributed processing. Ensemble activity.
  • Concurrency in Programming Languages

    Threads/Communication/Metaphor

     From the programmer's point of view, concurrent programming notations allow programs to be structured as a set of possibly interactive processes. Such an organization is particularly useful for operating systems, real-time control systems, simulation studies, and combinatorial search applications.

    To permit the effective use of multiple processes, concurrent programming languages must provide notations for:

    1. Concurrent execution: A notation that denotes operations that could be, but need not be, executed in parallel.
      PCN                     Occam
      [|| P1, P2, ..., Pn]    PAR
                                P1
                                ...
                                Pn
    2. Communication: A notation that permits processes to exchange information either through shared variables (visible to each process) or a message passing mechanism.
      Shared memory
         Assignment: X := E
      Message passing
         Synchronous: Pi!E, Pj?X
         Asynchronous: Pi!E, Pj?X
         Remote procedure call
    3. Synchronization: A notation to require a process to wait for a signal from another process. In general, processes are not independent; often a process depends on data produced by another process, and if the data is not available the process must wait until it is.
      wait(Pi), signal(Pj)

      A process can change its state to Blocked (waiting for some condition to change) and can signal Blocked processes so that they can continue.

      In this case, the OS must provide the system calls BLOCK and WAKEUP. The following is a blocking version of a semaphore:

      type semaphore = record
                           value : integer;
                           L : list of processes; // processes blocked waiting for the signal
                       end;
      
      down(S): S.value := S.value - 1;  // wait
               if S.value < 0 then
                  add this process to S.L;
                  block;
               end;
      
      up(S): S.value := S.value + 1;    // signal
             if S.value <= 0 then
                remove a process P from S.L;
                wakeup(P);
             end;
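      For comparison, a minimal Java sketch (added as an illustration; the class SemaphoreDemo is an assumption) of the same down/up behaviour using the library class java.util.concurrent.Semaphore, which blocks the caller instead of busy waiting:

      import java.util.concurrent.Semaphore;

      public class SemaphoreDemo {
          static final Semaphore mutex = new Semaphore(1);   // binary semaphore
          static int shared = 0;

          static void increment() throws InterruptedException {
              mutex.acquire();                    // down(S): blocks if no permit is available
              try { shared = shared + 1; }
              finally { mutex.release(); }        // up(S): wakes one blocked waiter, if any
          }
      }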
      Implementation. In many applications it is necessary to order the actions of a set of processes as well as to interleave their access to shared resources: a common address space, critical sections protected by a monitor, and synchronization provided through wait and signal.

      Some alternative synchronization primitives are

    4. Mutual exclusion: A notation to synchronize access to shared resources.
      Semaphores.
      Monitors: One approach is to protect the critical section by a monitor. The monitor approach requires that only one process at a time may execute in the monitor.
      monitor Queue_ADT
      const qsize = 10;
      var head, tail : integer;
         queue : array[0..qsize-1] of integer;
         notempty, notfull : condition;
      procedure enqueue (x : integer);
         begin
            [ head=(tail+1) mod qsize --> wait(notfull)
            [] head!=(tail+1) mod qsize --> skip];
            queue[tail],tail := x, (tail + 1) mod qsize
            signal(notempty)
         end;
      procedure dequeue (var x : integer);
         begin
            [ head=tail --> wait(notempty)
            [] head!=tail --> skip];
            x,head := queue[head],(head + 1) mod qsize;
            signal(notfull)
         end;
      begin                        
          head,tail := 0,0;
      end;
      begin                        
       [ produce(x); enqueue(x) || dequeue(y); consume(y) || dequeue(y); consume(y)] 
      end.
    Aside. Other notations for concurrency include fork(P) and join(P), as well as combined notations for communication and synchronization. C, Scheme, Ada, PVM, PCN, SR, Java and Occam are just some of the programming languages that provide for processes.

    Producer-Consumer

    In the following program there is a producer process and a consumer process. The producer process adds items to the queue and the consumer process removes items from the queue. The safety condition that must be satisfied is that the head and tail of the queue must not overrun each other. The liveness condition that must be satisfied is that when the queue contains an item, the consumer process must be able to access the queue, and when the queue contains space for another item, the producer process must be able to access the queue.
    const qsize = 10;
    var head, tail : integer;
       queue : array[0..qsize-1] of integer;
    procedure enqueue (x : integer);
       begin
           *[ head=(tail+1) mod qsize --> skip]; 
          queue[tail],tail := x, (tail + 1) mod qsize
       end;
    procedure dequeue (var x : integer);
       begin
           *[ head=tail --> skip]; 
          x,head := queue[head],(head + 1) mod qsize
       end;
    begin                        
     head,tail := 0,0;
     [ *[produce(x); enqueue(x)] || *[dequeue(y); consume(y)]] 
    end.
    Since the processes access different portions of the queue and test for the presence or absence of items in the queue before accessing the queue, the desired safety properties are satisfied. Note however, that busy waiting is involved.

    Shared Memory Model

    Process Creation. Process Identification. Synchronization. In many applications it is necessary to order the actions of a set of processes as well as to interleave their access to shared resources: a common address space, critical sections protected by a monitor, and synchronization provided through wait and signal.

    Some alternative synchronization primitives are

    If in the previous example another process were to be added, either a producer or a consumer process, an unsafe condition could result. Two processes could compete for access to the same item in the queue. The solution is to permit only one process at a time to access the enqueue or dequeue routines. One approach is to protect the critical section by a monitor. The monitor approach requires that only one process at a time may execute in the monitor. The following monitor solution is incorrect.
    monitor Queue_ADT
       const qsize = 10;
       var head, tail : integer;
          queue : array[0..qsize-1] of integer;
       procedure enqueue (x : integer);
          begin
              *[ head=(tail+1) mod qsize -> skip]; 
             queue[tail],tail := x, (tail + 1) mod qsize
          end;
       procedure dequeue (var x : integer);
          begin
              *[ head=tail -> skip]; 
             x,head := queue[head],(head + 1) mod qsize
          end;
       begin                        
           head,tail := 0,0;
       end;
    begin                        
        [ produce(x); enqueue(x) || dequeue(y); consume(y) || dequeue(y); consume(y)] 
    end.
    Note that busy waiting is still involved and, further, once a process is in the monitor and is busy waiting, no other process can enter the monitor, so the program is deadlocked.
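    By contrast, a minimal Java sketch of a correct monitor (added as an illustration; the class QueueADT is an assumption): the synchronized methods admit one process at a time, and wait/notifyAll suspend a waiting process and release the monitor, so there is neither busy waiting nor monitor deadlock.

    public class QueueADT {
        private static final int QSIZE = 10;
        private final int[] queue = new int[QSIZE];
        private int head = 0, tail = 0;

        public synchronized void enqueue(int x) throws InterruptedException {
            while (head == (tail + 1) % QSIZE)   // queue full
                wait();                          // wait(notfull)
            queue[tail] = x;
            tail = (tail + 1) % QSIZE;
            notifyAll();                         // signal(notempty)
        }

        public synchronized int dequeue() throws InterruptedException {
            while (head == tail)                 // queue empty
                wait();                          // wait(notempty)
            int x = queue[head];
            head = (head + 1) % QSIZE;
            notifyAll();                         // signal(notfull)
            return x;
        }
    }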

    Message Passing Model

    Process Creation. Process Identification. Message Passing. Data Flow. MPI.

    Hardware

    Processes. Flynn's Taxonomy.

    System classification by number of instruction and data streams.

    Shared-Memory MIMD. Distributed-Memory MIMD. Communication and routing.

    The Engineering of Concurrent Programs

    A parallel programming environment must support the following three phases of system behavior specification.

    Programming

    The way to design parallel software is to begin with the most parallel algorithm possible and then gradually render it more sequential ... until it suits the machine on which it is to run.
    East (1995)
    Chandy and Taylor (1992) define an elegant parallel programming language, PCN (Program Composition Notation), based on parallel and choice composition and definition (single-assignment) variables. The definition variable eliminates the indeterminacy problems. Communication is through shared variables, which may be streams. Synchronization is achieved by requiring a process that references an undefined variable to wait until it is defined by some other process before continuing. Recursion with parallel composition permits dynamic process creation.
    If a program that uses only parallel and choice composition and definition variables does not have adequate efficiency, ...
    We use the following steps in the introduction of mutables and sequencing into a parallel block.
    1. We order the statements in a parallel block so that all variables that appear on the right-hand sides of definition statements reduce to ground values or tuples, and all guards reduce to the ground values true or false, given only the definitions established by statements earlier in the ordering. In other words, we order statements in the direction of data flow; statements that write a variable appear earlier than statements that read that variable. Then we convert the parallel block into a sequential block by replacing "||" by ";", retaining the data-flow order of statements.
    2. Next, we introduce mutables, add assignment statements to our program, and show that the mutable m has the same value as the definition variable x it is to replace, at every point in the program in which x is read - i.e., where x appears on the right-hand side of a definition statement or assignment or guard.
    3. Finally, we remove the definition variables that are replaced by mutables, secure in the knowledge that the mutables have the same value as the definition variables in the statements in which they are read. We must, of course, be sure that mutables shared by constituent blocks of a parallel block are not modified within the parallel block.
    Chandy and Taylor (1992)

    Decomposition

    Function decomposition

    Break down the task so that each worker performs a distinct function.

    Advantages
    Disadvantages

    Domain decomposition

    Divide the domain by the number of workers available.

    Communication and Synchronization

    Co-operation requires communication. Communication requires a protocol.

    Alternation and Competition

    Allocate time to multiple tasks.

    Correctness

    Partial correctness, Total correctness, satisfaction of specifications...

    Chandy & Taylor (1992) require

    1. Shared mutable variables remain constant during parallel composition.
    2. Mutable variables are to be copied when used in definitions.
    3. When defined, definition variables act as constants in assignment.
    Lewis (1993) develops a theory of program correctness called flow-correctness. Lewis requires for each shared variable:
    1. it must be defined before it is referenced,
    2. it must be referenced before it is updated, and
    3. only one process at a time may (re)define it.
    These rules apply only to the dependencies among variables and do not include either total correctness (termination) or logical correctness (satisfaction of specifications).

    Correctness issues in the design of concurrent programs fall in one of two categories: safety and liveness.

    Safety is related to the concept of a loop invariant: a program should produce the "right" answer. Liveness is related to the concept of a loop variant: a program is expected to make progress. Termination is an example of a liveness property: the program is expected to terminate.

    Network description

    Configuration

    Implementation

    Sequential Program versus Concurrent Program

    Sequential program, single process/thread: the thread's PC, data, heap, and stack pointers map onto a single name space containing the code, global data, heap, and stack.

    Concurrent program, multiple threads: threads 1..n each have their own PC and their own stack (Stack1 .. Stackn) but share a common space containing the code, global data, and heap.

    Concurrent program, multiple processes: each process i has its own name space i containing its own code, global data, heap, and stack.

    Historical Perspectives and Further Reading

    Related issues: Lazy evaluation vs Parallel execution; Backtracking vs Parallel execution

    Concurrency occurs in hardware when two or more operations overlap in time. Concurrency in hardware dates back to the 1950s when special-purpose processors were developed for controlling input/output devices. This permitted the overlapping of CPU instructions with I/O actions. For example, the execution of an I/O instruction no longer delayed the execution of the next instruction. The programmer was insulated from this concurrency by the operating system. The problems presented to the operating systems by this concurrency and the resulting solutions form the basis for constructs supporting concurrency in programming languages. Hardware signals called interrupts provided the synchronization between the CPU and the I/O devices.

    Other advances in hardware have led to the development of alternative architectures. Pipeline processors fetch the next instruction while the current instruction is being decoded. Superscalar processors combine multiple pipelines to provide an even greater level of concurrency. Array processors provide a large number of identical processors that operate simultaneously on different parts of the same data structure. Data flow computers aim at extracting maximum concurrency from a computation by performing as much of the computation in parallel as possible. Connectionist hardware models provide concurrency by modeling computation after the neural networks found in the brain.

    Interrupts together with a hardware clock made it possible to implement multiprogramming systems, which are designed to maximize the utilization of the computer system's resources (CPU, store, I/O devices) by running two or more jobs concurrently. When one job was performing I/O, another job could be executing using the CPU.

    Interrupts and the hardware clock also made possible the development of interactive systems where multiple users have simultaneous access to the system resources. Such a system must provide for a large number of jobs whose combined demands on the system may exceed the system resources. Various techniques of swapping and paging meet this need by moving jobs in and out of the store to the larger capacity of backing store devices. With the increase of jobs comes the need to increase the capacity of the CPU. The solution was to develop multiprocessor systems in which several CPUs are available and simultaneously operate on separate jobs in the shared store.

    An alternate solution is to develop distributed systems consisting of several complete computers (each containing both CPU and an associated store) that can operate independently but also communicate efficiently. Such systems of local area networks permit the efficient use of shared resources (such as printers and large backing store via file servers) and increase the computational throughput of the system.

    Andrews, Gregory R. and Olsson, Ronald A. (1993)
     The SR Programming Language, Benjamin/Cummings, Redwood City, CA.
    Ben-Ari, M. (1990)
     Principles of Concurrent and Distributed Programming, Prentice Hall International, Hemel Hempstead, Hertfordshire.
    Chandy, K. Mani and Taylor, Stephen (1992)
     An Introduction to Parallel Programming, Jones and Bartlett, Boston, MA.
    East, Ian (1995)
     Parallel Processing with Communicating Process Architecture, UCL Press, London.
    Foster, I. (1996)
     Compositional Parallel Programming Languages, ACM TOPLAS, Vol. 18, No. 4 (July 1996), pp. 454-476.
    Hehner, Eric C. R. (1993)
     A Practical Theory of Programming, Springer-Verlag, New York.
    Lewis, Ted G. (1993)
     Foundations of Parallel Programming: A Machine-Independent Approach, IEEE Computer Society Press, Los Alamitos, CA.
    Pacheco, Peter S. (1997)
     Parallel Programming with MPI, Morgan Kaufmann, San Francisco, CA.
    Watt, David A. (1990)
     Programming Language Concepts and Paradigms, Prentice-Hall International, Hemel Hempstead, Hertfordshire.

    Exercises

    For each of the following problems identify the potential for concurrent execution and the synchronization and communication requirements. Define appropriate safety and liveness invariants. Construct solutions using ...
    Author: A. Aaby