Concurrent Programming

The root of all successful human organization is co-operation not competition. Concurrent programming is characterized by programming with more than one process.

Keywords and phrases: pipelines, parallel processes, message passing, monitors, concurrent programming, safety, liveness, deadlock, live-lock, fairness, communication, synchronization, producer-consumer, dining philosophers.


There are several reasons for a programmer to be interested in concurrency:

  1. to better understand computer architecture, which exhibits a great deal of concurrency through pipelining (multiple instruction steps) and super-scalar execution (multiple instructions),
  2. to better understand compiler design,
  3. because some problems are most naturally solved by using a set of co-operating processes,
  4. because a sequential solution over-specifies the order of operations, and
  5. to reduce execution time.
At the machine level, operations are sequential if they occur one after the other, ordered in time. Operations are concurrent if they overlap in time. In Figure 1, sequential operations are connected by a single thread of control while concurrent operations have multiple threads of control.


 
Figure 1: Sequential and Concurrent Operations
 
Sequential operations:
   --O-O-O-O--> 
O: operation
Concurrent operations:
      -O-O-
   --|     |-->  
      -O-O-
-: thread

Operations in the source text of a program are concurrent if they could be, but need not be, executed in parallel. In Figure 2a the second assignment depends on the outcome of the first, while in Figure 2b neither assignment depends on the other, so the two assignments may be executed concurrently.


 
Figure 2: Sequential and Concurrent Code
a. not concurrent          b. concurrent
   X := 5;                    X := A*B + C;
   Y := 3*X + 4               Y := 3*A + 7;

Concurrent programming involves the notations for expressing potential parallelism, so that operations may be executed in parallel, and the techniques for solving the resulting synchronization and communication problems. Notations for explicit concurrency are a program structuring technique, while parallelism is a mode of execution provided by the underlying hardware. Thus we can have parallel execution without explicit concurrency in the language, and we can have concurrency in a language without parallel execution. The latter is the case when a program (with or without explicit concurrent sections) is executed on a single processor; the program is then executed by interleaving executions of the concurrent operations in the source text.
Aside. The terms concurrent, distributed and parallel have been used at various times to describe various types of concurrent programming. Multiple processors and disjoint or shared store are implementation concepts and are not important from the programming language point of view. What matters is the notation used to indicate concurrent execution, communication and synchronization.

Functional and logic programming languages do not necessarily need explicit specification of concurrency and, with a parallelizing compiler, may be executed on parallel hardware. It is important to note that the notion of processes is orthogonal to that of inference, functions and assignments.

The two fundamental concepts in concurrent programming are processes and resources. A process corresponds to a sequential computation with its own thread of control. Concurrent programs are distinguished from sequential programs in that they permit multiple processes. Processes may share resources: program resources (data structures) and hardware resources (CPU, memory, and I/O devices).
Aside. Processes which share an address space are called threads or light-weight processes. For some programming languages (C, C++) there are threads packages to permit concurrent programming. In other cases, the operating system (Microsoft Windows NT, Sun Solaris) provides system calls for threads. Processes which do not share an address space are called heavy-weight processes. The Unix family of operating systems provides a system call to allow programmers to create heavy-weight processes.
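As a concrete illustration, the following is a minimal Java sketch (added here; the class and thread names are only examples) of two light-weight processes sharing the address space of the program that creates them:

// Minimal sketch: two light-weight processes (threads) sharing one address space.
public class TwoThreads {
    public static void main(String[] args) throws InterruptedException {
        Runnable task = () ->
            System.out.println(Thread.currentThread().getName() + " running");
        Thread t1 = new Thread(task, "worker-1");
        Thread t2 = new Thread(task, "worker-2");
        t1.start();   // both threads now run concurrently with the main thread
        t2.start();
        t1.join();    // wait for both to finish
        t2.join();
    }
}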

The Concurrent Nature of Systems

Co-operation

The Bakery. busy waiting, fairness, liveness

The Secretary. scheduling, priority, co-operative multitasking, interrupts, competitive multitasking, pre-emptive multitasking

The Secretarial Pool. parallel tasking

Geoffrey Fox's wall. the construction of a brick wall by a number of workers.

Questions:

The Nature of Concurrent Systems

Abstraction

Performance

Communication

In the previous solution, it was assumed that the processes shared the address space and that synchronization was achieved by the use of a monitor and condition queues. If the address spaces are disjoint, then both communication and synchronization must be achieved through message passing. There are two choices: message passing can be synchronous or asynchronous. When message passing is asynchronous, synchronization can be obtained by requiring a reply to a synchronizing message. In the examples that follow, synchronized message passing is assumed.

Behavior

Synchronization and Communication

Two processes are said to communicate if an action of one process must entirely precede an action of a second process. Synchronization is related to communication.

Live-lock may result if there is more than one waiting process and, when the signal is received, access is not granted fairly.

Starvation (live-lock): multiple processes are waiting for access, but access is not provided in a fair manner.

Coroutines.

Real-time Programming language issues

When message passing is asynchronous, synchronization can be obtained by requiring a reply to a synchronizing message. In the examples that follow, synchronized message passing is assumed.

Communication commands in the guards. Most communication-based programming languages permit input commands in guards but not output commands. The asymmetry is due to the complexity required to implement output commands in guards.

process Q;
const qsize = 10;
var head, tail : integer;
   queue : array[0..qsize-1] of integer;
   X : integer;

begin                        
    head,tail := 0,0;
    *[ head != tail, C?X --> C!queue[head]; head := (head + 1) mod qsize 
      [] head != (tail+1) mod qsize, P?X --> queue[tail],tail := X, (tail + 1) mod qsize]
end;

process P;
var X : integer;
begin
   *[ true --> produce(X); Q!X]
end;

process C;
var X : integer;
begin
   *[ true --> Q!X; Q?X; consume(X)]
end;


begin
[ P || C || Q ]
end.
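A rough Java approximation of this program (added as an illustration; the class BoundedBufferProcess and the channel names fromP and toC are assumptions, not part of the notation above) uses SynchronousQueue to model synchronous message passing. Java has no guarded alternation, so the buffer process polls its two channels with short timeouts to simulate choosing between P?X and C!queue[head]:

import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.TimeUnit;

public class BoundedBufferProcess {
    static final int QSIZE = 10;
    static final SynchronousQueue<Integer> fromP = new SynchronousQueue<>(); // P -> Q
    static final SynchronousQueue<Integer> toC   = new SynchronousQueue<>(); // Q -> C

    public static void main(String[] args) {
        new Thread(BoundedBufferProcess::bufferQ).start();          // process Q
        new Thread(() -> {                                          // process P
            for (int i = 0; ; i++) {
                try { fromP.put(i); } catch (InterruptedException e) { return; }
            }
        }).start();
        new Thread(() -> {                                          // process C
            while (true) {
                try { System.out.println(toC.take()); } catch (InterruptedException e) { return; }
            }
        }).start();
    }

    static void bufferQ() {
        int[] queue = new int[QSIZE];
        int head = 0, tail = 0;
        try {
            while (true) {
                if (head != (tail + 1) % QSIZE) {                   // not full: try P?X
                    Integer x = fromP.poll(1, TimeUnit.MILLISECONDS);
                    if (x != null) { queue[tail] = x; tail = (tail + 1) % QSIZE; }
                }
                if (head != tail) {                                 // not empty: try C!queue[head]
                    if (toC.offer(queue[head], 1, TimeUnit.MILLISECONDS))
                        head = (head + 1) % QSIZE;
                }
            }
        } catch (InterruptedException e) { /* terminate */ }
    }
}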

Nondeterminism

A program is deterministic if its evaluation on the same input always produces the same output. The evaluation strategy might not always be unique.

A program is nondeterministic if it has more than one allowable evaluation strategy and different evaluation strategies lead to different results.

A concept related to nondeterminism is parallel evaluation. Parallel evaluation that does not involve interaction among its subparts is called noninterfering parallelism. Processes which have disjoint address spaces cannot interfere with each other and thus can operate without fear of corrupting each other. For example, the two processes in

[|| i:=1, j:=2]
do not share an address space; therefore, the assignments may take place in parallel.

Another example of non-interfering processes is found in matrix multiplication. When two matrices are multiplied, each entry in the product matrix is the result of multiplying a row by a column and summing the products. This is called an inner product. Each inner product can be computed independently of the others. Figure M.N



 
Figure M.N: Matrix Multiplication
# multiply n by n matrices a and b in parallel 
# place result in matrix c 
# all matrices are global to multiply 
process multiply( i := 1 to n, j := 1 to n) 
   var inner_prod := 0 
   fa k := 1 to n -> 
      inner_prod := inner_prod + a[i,k]*b[k,j] 
   af
   c[i,j] := inner_prod 
end


is an example of a matrix multiplication routine written in the SR programming language. This particular example also illustrates dynamic process creation in that n^2 processes are created to perform the multiplication.
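A comparable sketch in Java (added as an illustration; the class name MatMul is an assumption) exploits the same non-interference: each (i, j) inner product writes a distinct cell of c, so all of them may be computed in parallel, here using a parallel stream rather than explicit process creation:

import java.util.stream.IntStream;

public class MatMul {
    // multiply n by n matrices a and b in parallel; each task computes one inner product
    static double[][] multiply(double[][] a, double[][] b) {
        int n = a.length;
        double[][] c = new double[n][n];
        IntStream.range(0, n * n).parallel().forEach(idx -> {
            int i = idx / n, j = idx % n;
            double innerProd = 0.0;
            for (int k = 0; k < n; k++)
                innerProd += a[i][k] * b[k][j];
            c[i][j] = innerProd;   // each task writes a distinct cell, so there is no interference
        });
        return c;
    }
}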

In interfering parallelism, there is interaction and the relative speeds of the subparts can affect the final result.

Processes that access a common address space may interfere with each other. In this program,

[i:=1 || i:=2]
the resulting value of i could be either 1 or 2, depending on which process executed last, and in this program,
[i:=0;i:=i+1 || i:=2]
the resulting value of i could be 1, 2 or 3.
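The same interference can be demonstrated directly in Java (a minimal sketch added here; the class name Race is an assumption): two threads update the shared variable i, and the printed result depends on the interleaving chosen by the scheduler.

public class Race {
    static int i = 0;

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(() -> { i = 0; i = i + 1; });
        Thread t2 = new Thread(() -> { i = 2; });
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(i);   // may print 1, 2 or 3 on different runs
    }
}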

A language is concurrent if it provides for interfering parallelism.

Sequential programs are nearly always deterministic. A deterministic program follows a sequence of steps that can be predicted in advance. Its behavior is reproducible and thus deterministic programs are testable. Concurrent programs are likely to be nondeterministic because the order and speed of execution of the processes is unpredictable. This makes testing of concurrent programs a difficult task.

The requirement for disjoint address spaces may be too severe. What is required is that shared resources be protected so that only one process is permitted access to a resource at a time. This permits processes to co-operate, sharing the resource while maintaining its integrity.

Mutual Exclusion

Often a process must have exclusive access to a resource. For example, when a process is updating a data structure, no other process should have access to the same data structure; otherwise the accuracy of the data may be in doubt. The necessity to restrict access is termed mutual exclusion, and a solution must ensure that at most one process is in its critical section at a time, that a process halted outside its critical section does not block other processes, and that no process is delayed forever in its attempt to enter its critical section. There are several solutions to the mutual exclusion problem; among them are semaphores, critical regions and monitors.
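A minimal Java sketch of mutual exclusion (added as an illustration; the class SharedLog is an assumption): the synchronized block ensures that only one process at a time updates the shared data structure.

import java.util.ArrayList;
import java.util.List;

public class SharedLog {
    private final List<String> entries = new ArrayList<>();

    public void append(String entry) {
        synchronized (entries) {   // critical section: one thread at a time
            entries.add(entry);
        }
    }
}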

Deadlock

Deadlock is a liveness problem; it is a situation in which a set of processes are prevented from making any further progress by their mutually incompatible demands for additional resources. For example, in the dining philosophers problem, deadlock occurs if each philosopher picks up his/her left fork. No philosopher can make further progress.

Deadlock can occur in a system of processes and resources if, and only if, the following conditions all hold together:
1. Mutual exclusion: each resource can be held by at most one process at a time.
2. Wait-and-hold: a process may hold resources while waiting for additional resources.
3. No preemption: a resource can be released only voluntarily by the process holding it.
4. Circular wait: there is a circular chain of processes, each waiting for a resource held by the next.

There are several approaches to the problem of deadlock. A common approach is to ignore deadlock and hope that it will not happen. If deadlock occurs (much as when a program enters an infinite loop), the system's operators abort the program. This is not an adequate solution in highly concurrent systems where reliability is required.

A second approach is to allow deadlocks to occur but detect and recover automatically. Once deadlock is detected, processes are selectively aborted, or one or more processes are rolled back to an earlier state and temporarily suspended until the danger point is passed. This might not be an acceptable solution in real-time systems.

A third approach is to prevent deadlock by weakening one or more of the conditions. The wait-and-hold condition may be modified to require a process to request all needed resources at one time. The circular-wait condition may be modified by imposing a total ordering on resources and insisting that they be requested in that order.
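A minimal Java sketch of the ordering approach for the dining philosophers (added as an illustration; the class Philosophers is an assumption): forks are numbered and always acquired in increasing order, which breaks the circular-wait condition.

import java.util.concurrent.locks.ReentrantLock;

public class Philosophers {
    static final int N = 5;
    static final ReentrantLock[] forks = new ReentrantLock[N];

    public static void main(String[] args) {
        for (int i = 0; i < N; i++) forks[i] = new ReentrantLock();
        for (int p = 0; p < N; p++) {
            int left = p, right = (p + 1) % N;
            int first = Math.min(left, right), second = Math.max(left, right);
            new Thread(() -> {
                while (true) {
                    forks[first].lock();    // always take the lower-numbered fork first
                    forks[second].lock();
                    try { /* eat */ } finally {
                        forks[second].unlock();
                        forks[first].unlock();
                    }
                    /* think */
                }
            }).start();
        }
    }
}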

Another example of a liveness problem is live-lock (also called lockout or starvation). Live-lock occurs when a process is prevented from making progress even though other processes are running. This is an issue of fairness.

Scheduling

When there are active requests for a resource there must be a mechanism for granting the requests. Often a solution is to grant access on a first-come, first-served basis. This may not always be desirable, since there may be processes whose progress is more important; such processes may be given a higher priority and their requests processed first. When processes are prioritized, some processes may be prevented from making progress (such a process is live-locked). A fair scheduler ensures that all processes eventually make progress, thus preventing live-lock.
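In Java this can be sketched (an added illustration; the class FairResource is an assumption) with a fair ReentrantLock, which grants the lock roughly in first-come, first-served order so that no waiting process is starved by faster competitors:

import java.util.concurrent.locks.ReentrantLock;

public class FairResource {
    private final ReentrantLock lock = new ReentrantLock(true);   // true selects fair ordering

    public void use() {
        lock.lock();              // waiters are served roughly in arrival order
        try {
            // access the shared resource
        } finally {
            lock.unlock();
        }
    }
}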

Semantics

    Parallel processes involve:
      • Synchronization: coordination of tasks which are not completely independent.
      • Communication: exchange of information.
      • Scheduling: priority.
      • Nondeterminism: arbitrary selection of execution path.

    Explicit parallelism (message passing, semaphores, monitors): languages which have been designed for concurrent execution include Concurrent Pascal, Ada and Occam. Application areas are typically operating systems and distributed processing. Ensemble activity.
  • Concurrency in Programming Languages

    Threads/Communication/Metaphor

     From the programmer's point of view, concurrent programming notations allow programs to be structured as a set of possibly interactive processes. Such an organization is particularly useful for operating systems, real-time control systems, simulation studies, and combinatorial search applications.

    To permit the effective use of multiple processes, concurrent programming languages must provide notations for:

    1. Concurrent execution: A notation that denotes operations that could be, but need not be, executed in parallel.
      PCN                     Occam
      [|| P1, P2, ..., Pn]    PAR
                                P1
                                ...
                                Pn
    2. Communication: A notation that permits processes to exchange information either through shared variables (visible to each process) or a message passing mechanism.
      Shared memory
         Assignment: X := E
      Message passing
         Synchronous: Pi!E, Pj?X
         Asynchronous: Pi!E, Pj?X
         Remote procedure call
    3. Synchronization: A notation to require a process to wait for a signal from another process. In general, processes are not independent; often a process depends on data produced by another process, and if the data is not available the process must wait until it is.
      wait(Pi), signal(Pj)

      A process can change its state to Blocked (waiting for some condition to change) and can signal Blocked processes so that they can continue.

      In this case, the OS must provide the system calls BLOCK and WAKEUP. The following is a blocking version of a semaphore:

      type semaphore = record
                           value : integer;
                           L : list of processes; // processes blocked waiting for the signal
                       end;
      
      down(S): S.value := S.value - 1;  // wait
               if S.value < 0 then
                  add this process to S.L;
                  block;
               end;
      
      up(S): S.value := S.value + 1;    // signal
             if S.value <= 0 then
                remove a process P from S.L;
                wakeup(P);
             end;
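      For comparison, a minimal Java sketch (added as an illustration; the class SemaphoreDemo is an assumption) of the same down/up behaviour using the library class java.util.concurrent.Semaphore, which blocks the caller instead of busy waiting:

      import java.util.concurrent.Semaphore;

      public class SemaphoreDemo {
          static final Semaphore mutex = new Semaphore(1);   // binary semaphore
          static int shared = 0;

          static void increment() throws InterruptedException {
              mutex.acquire();                    // down(S): blocks if no permit is available
              try { shared = shared + 1; }
              finally { mutex.release(); }        // up(S): wakes one blocked waiter, if any
          }
      }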
      Implementation. In many applications it is necessary to order the actions of a set of processes as well as to interleave their access to shared resources: a common address space, critical sections protected by a monitor, and synchronization provided through wait and signal.

      Some alternative synchronization primitives are

    4. Mutual exclusion: A notation to synchronize access to shared resources.
      Semaphores.
      Monitors: One approach is to protect the critical section by a monitor. The monitor approach requires that only one process at a time may execute in the monitor.
      monitor Queue_ADT
      const qsize = 10;
      var head, tail : integer;
         queue : array[0..qsize-1] of integer;
         notempty, notfull : condition;
      procedure enqueue (x : integer);
         begin
            [ head=(tail+1) mod qsize --> wait(notfull)
            [] head!=(tail+1) mod qsize --> skip];
            queue[tail],tail := x, (tail + 1) mod qsize
            signal(notempty)
         end;
      procedure dequeue (var x : integer);
         begin
            [ head=tail --> wait(notempty)
            [] head!=tail --> skip];
            x,head := queue[head],(head + 1) mod qsize;
            signal(notfull)
         end;
      begin                        
          head,tail := 0,0;
      end;
      begin                        
       [ produce(x); enqueue(x) || dequeue(y); consume(y) || dequeue(y); consume(y)] 
      end.
    Aside. Other notations for concurrency include fork(P) and join(P), as well as combined notations for communication and synchronization. C, Scheme, Ada, PVM, PCN, SR, Java and Occam are just some of the programming languages that provide for processes.

    Producer-Consumer

    In the following program there is a producer process and a consumer process. The producer process adds items to the queue and the consumer process removes items from the queue. The safety condition that must be satisfied is that the head and tail of the queue must not overrun each other. The liveness condition that must be satisfied is that when the queue contains an item, the consumer process must be able to access the queue, and when the queue contains space for another item, the producer process must be able to access the queue.
    const qsize = 10;
    var head, tail : integer;
       queue : array[0..qsize-1] of integer;
    procedure enqueue (x : integer);
       begin
           *[ head=(tail+1) mod qsize --> skip]; 
          queue[tail],tail := x, (tail + 1) mod qsize
       end;
    procedure dequeue (var x : integer);
       begin
           *[ head=tail --> skip]; 
          x,head := queue[head],(head + 1) mod qsize
       end;
    begin                        
     head,tail := 0,0;
     [ *[produce(x); enqueue(x)] || *[dequeue(y); consume(y)]] 
    end.
    Since the processes access different portions of the queue and test for the presence or absence of items in the queue before accessing the queue, the desired safety properties are satisfied. Note however, that busy waiting is involved.

    Shared Memory Model

    Process Creation. Process Identification. Synchronization. In many applications it is necessary to order the actions of a set of processes as well as to interleave their access to shared resources: a common address space, critical sections protected by a monitor, and synchronization provided through wait and signal.

    Some alternative synchronization primitives are

    If in the previous example another process were to be added, either a producer or a consumer process, an unsafe condition could result. Two processes could compete for access to the same item in the queue. The solution is to permit only one process at a time to access the enqueue or dequeue routines. One approach is to protect the critical section by a monitor. The monitor approach requires that only one process at a time may execute in the monitor. The following monitor solution is incorrect.
    monitor Queue_ADT
       const qsize = 10;
       var head, tail : integer;
          queue : array[0..qsize-1] of integer;
       procedure enqueue (x : integer);
          begin
              *[ head=(tail+1) mod qsize -> skip]; 
             queue[tail],tail := x, (tail + 1) mod qsize
          end;
       procedure dequeue (var x : integer);
          begin
              *[ head=tail -> skip]; 
             x,head := queue[head],(head + 1) mod qsize
          end;
       begin                        
           head,tail := 0,0;
       end;
    begin                        
        [ produce(x); enqueue(x) || dequeue(y); consume(y) || dequeue(y); consume(y)] 
    end.
    Note that busy waiting is still involved and, further, once a process is in the monitor and is busy waiting, no other process can enter the monitor, so the program is deadlocked.
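    By contrast, a minimal Java sketch of a correct monitor (added as an illustration; the class QueueADT is an assumption): the synchronized methods admit one process at a time, and wait/notifyAll suspend a waiting process and release the monitor, so there is neither busy waiting nor monitor deadlock.

    public class QueueADT {
        private static final int QSIZE = 10;
        private final int[] queue = new int[QSIZE];
        private int head = 0, tail = 0;

        public synchronized void enqueue(int x) throws InterruptedException {
            while (head == (tail + 1) % QSIZE)   // queue full
                wait();                          // wait(notfull)
            queue[tail] = x;
            tail = (tail + 1) % QSIZE;
            notifyAll();                         // signal(notempty)
        }

        public synchronized int dequeue() throws InterruptedException {
            while (head == tail)                 // queue empty
                wait();                          // wait(notempty)
            int x = queue[head];
            head = (head + 1) % QSIZE;
            notifyAll();                         // signal(notfull)
            return x;
        }
    }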

    Message Passing Model

    Process Creation. Process Identification. Message Passing. Data Flow. MPI.

    Hardware

    Processes. Flynn's Taxonomy.

    System classification by number of instruction and data streams.

    Shared-Memory MIMD. Distributed-Memory MIMD. Communication and routing.

    The Engineering of Concurrent Programs

    A parallel programming environment must support the following three phases of system behavior specification.

    Programming

    The way to design parallel software is to begin with the most parallel algorithm possible and then gradually render it more sequential ... until it suits the machine on which it is to run.
    East (1995)
    Chandy and Taylor (1992) define an elegant parallel programming language, PCN (Program Composition Notation), based on parallel and choice composition and definition (single-assignment) variables. The definition variable eliminates the indeterminacy problems. Communication is through shared variables, which may be streams. Synchronization is achieved by requiring a process that references an undefined variable to wait until it is defined by some other process before continuing. Recursion with parallel composition permits dynamic process creation.
    If a program that uses only parallel and choice composition and definition variables does not have adequate efficiency, ...
    We use the following steps in the introduction of mutables and sequencing into a parallel block.
    1. We order the statements in a parallel block so that all variables that appear on the right-hand sides of definition statements reduce to ground values or tuples, and all guards reduce to the ground values true or false, given only the definitions established by statements earlier in the ordering. In other words, we order statements in the direction of data flow; statements that write a variable appear earlier than statements that read that variable. Then we convert the parallel block into a sequential block by replacing "||" by ";", retaining the data-flow order of statements.
    2. Next, we introduce mutables, add assignment statements to our program, and show that the mutable m has the same value as the definition variable x it is to replace, at every point in the program in which x is read - i.e., where x appears on the right-hand side of a definition statement or assignment or guard.
    3. Finally, we remove the definition variables that are replaced by mutables, secure in the knowledge that the mutables have the same value as the definition variables in the statements in which they are read. We must, of course, be sure that mutables shared by constituent blocks of a parallel block are not modified within the parallel block.
    Chandy and Taylor (1992)

    Decomposition

    Function decomposition

    Break down the task so that each worker performs a distinct function.

    Advantages
    Disadvantages

    Domain decomposition

    Divide the domain by the number of workers available.

    Communication and Synchronization

    Co-operation requires communication. Communication requires a protocol.

    Alternation and Competition

    Allocate time to multiple tasks.

    Correctness

    Partial correctness, Total correctness, satisfaction of specifications...

    Chandy & Taylor (1992) require

    1. Shared mutable variables remain constant during parallel composition.
    2. Mutable variables are to be copied when used in definitions.
    3. When defined, definition variables act as constants in assignment.
    Lewis (1993) develops a theory of program correctness called flow-correctness. Lewis requires for each shared variable:
    1. it must be defined before it is referenced,
    2. it must be referenced before it is updated, and
    3. only one process at a time may (re)define it.
    These rules apply only to the dependencies among variables and do not include either total correctness (termination) or logical correctness (satisfaction of specifications).

    Correctness issues in the design of concurrent programs fall in one of two categories: safety and liveness.

    Safety is related to the concept of a loop invariant: a program should produce the "right" answer. Liveness is related to the concept of a loop variant: a program is expected to make progress. Termination is an example of a liveness property: the program is expected to terminate.

    Network description

    Configuration

    Implementation

    Sequential Program versus Concurrent Program

    Sequential program, single process/thread: the thread's PC, data, heap, and stack pointers map onto a single name space containing the code, global data, heap, and stack.

    Concurrent program, multiple threads: threads 1..n each have their own PC and their own stack (Stack1 .. Stackn) but share a common space containing the code, global data, and heap.

    Concurrent program, multiple processes: each process i has its own name space i containing its own code, global data, heap, and stack.

    Historical Perspectives and Further Reading

    Related issues: Lazy evaluation vs Parallel execution; Backtracking vs Parallel execution

    Concurrency occurs in hardware when two or more operations overlap in time. Concurrency in hardware dates back to the 1950s when special-purpose processors were developed for controlling input/output devices. This permitted the overlapping of CPU instructions with I/O actions. For example, the execution of an I/O instruction no longer delayed the execution of the next instruction. The programmer was insulated from this concurrency by the operating system. The problems presented to the operating systems by this concurrency and the resulting solutions form the basis for constructs supporting concurrency in programming languages. Hardware signals called interrupts provided the synchronization between the CPU and the I/O devices.

    Other advances in hardware have led to the development of alternative architectures. Pipeline processors fetch the next instruction while the current instruction is being decoded. Superscalar processors combine multiple pipelines to provide an even greater level of concurrency. Array processors provide a large number of identical processors that operate simultaneously on different parts of the same data structure. Data flow computers aim at extracting maximum concurrency from a computation by performing as much of the computation in parallel as possible. Connectionist hardware models provide concurrency by modeling computation after the neural networks found in the brain.

    Interrupts together with a hardware clock made it possible to implement multiprogramming systems, which are designed to maximize the utilization of the computer system's resources (CPU, store, I/O devices) by running two or more jobs concurrently. When one job was performing I/O, another job could be executing using the CPU.

    Interrupts and the hardware clock also made possible the development of interactive systems where multiple users have simultaneous access to the system resources. Such a system must provide for a large number of jobs whose combined demands on the system may exceed the system resources. Various techniques of swapping and paging meet this need by moving jobs in and out of the store to the larger capacity of backing store devices. With the increase of jobs comes the need to increase the capacity of the CPU. The solution was to develop multiprocessor systems in which several CPUs are available and simultaneously operate on separate jobs in the shared store.

    An alternate solution is to develop distributed systems consisting of several complete computers (each containing both CPU and an associated store) that can operate independently but also communicate efficiently. Such systems of local area networks permit the efficient use of shared resources (such as printers and large backing store via file servers) and increase the computational throughput of the system.

    Andrews, Gregory R. and Olsson, Ronald A. (1993)
     The SR Programming Language, Benjamin/Cummings, Redwood City, CA.
    Ben-Ari, M. (1990)
     Principles of Concurrent and Distributed Programming, Prentice Hall International, Hemel Hempstead, Hertfordshire.
    Chandy, K. Mani and Taylor, Stephen (1992)
     An Introduction to Parallel Programming, Jones and Bartlett, Boston, MA.
    East, Ian (1995)
     Parallel Processing with Communicating Process Architecture, UCL Press, London.
    Foster, I. (1996)
     Compositional Parallel Programming Languages, ACM TOPLAS, Vol. 18, No. 4 (July 1996), pp. 454-476.
    Hehner, Eric C. R. (1993)
     A Practical Theory of Programming, Springer-Verlag, New York.
    Lewis, Ted G. (1993)
     Foundations of Parallel Programming: A Machine-Independent Approach, IEEE Computer Society Press, Los Alamitos, CA.
    Pacheco, Peter S. (1997)
     Parallel Programming with MPI, Morgan Kaufmann, San Francisco, CA.
    Watt, David A. (1990)
     Programming Language Concepts and Paradigms, Prentice-Hall International, Hemel Hempstead, Hertfordshire.

    Exercises

    For each of the following problems identify the potential for concurrent execution and the synchronization and communication requirements. Define appropriate safety and liveness invariants. Construct solutions using ...
    Author: A. Aaby