This section defines the assumptions of POM2. There are five sources of these assumptions: (1) the literature reviewed by Port08; (2) the Boehm and Turner paper; (3) the COCOMO literature (used to determine, say, the effects on cost of increased reliability); (4) [Penharkar07] (used to determine team size); and (5) a set of values extrapolated from Port08 that handle larger projects. For example, a close reading of Port08 shows that that paper tacitly assumed small projects (about 10 developers per project with a maximum of 25 tasks). Here, we work with up to 300 developers, so we expand the maximum number of tasks to 300/10*25 = 750.
We acknowledge that our conclusions are sensitive to the assumptions of POM2. However, in defense of our current assumption set, we note that, unlike many other publications in this area, we at least fully state our assumptions. We believe that other studies reporting (say) that agile development is always best may be glossing over the assumptions of those studies.
Port08 implemented a simulator of teams implementing requirements. For the sake of brevity, POM2 calls requirements ``tasks''.
A project is divided into teams and teams perform tasks. Each task has a value and a cost. As we shall see, these values may change over the lifetime of a task (using the mechanics discussed below).
The semantics of the tree is that all child tasks must be completed before a parent task can start. We say that a task becomes "ready" when it has no uncompleted child tasks. Initially, all leaf tasks are "ready" since, having no sub-tasks, they have no sub-tasks that can block their start.
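The readiness rule above can be sketched as follows. This is a minimal sketch, assuming a task is a dict with a "kids" list and a "done" flag; these names are ours, not POM2's.

```python
# Minimal sketch of the "ready" rule: a task is ready once every child
# task is completed. Task representation (dicts) is our assumption.

def ready(task):
    """Leaf tasks have no kids, so they are always ready."""
    return all(kid["done"] for kid in task["kids"])

# Example: a parent with one finished and one unfinished child.
leaf1 = {"kids": [], "done": True}
leaf2 = {"kids": [], "done": False}
parent = {"kids": [leaf1, leaf2], "done": False}

print(ready(leaf2))   # True: leaves are always ready
print(ready(parent))  # False: blocked until leaf2 is done
```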
Initially, all teams are assigned different trees. After the trees are generated, links are added between trees from different teams such that one team must wait for another team to finish some tasks before it can finish its own. Inter-team dependencies are generated as follows:
This method is used since it mimics a common structure seen in industry: teams often share large sub-systems (tasks that use many sub-tasks) rather than very small task assemblies.
The tree is generated as follows:
kids = 2 + 8*rand()^(1/T)
At large T values, the number of kids approaches 10 and, at low T values, the number of kids approaches 2.
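A minimal sketch of this rule, taking the exponent as 1/T (the only form consistent with the limits just stated, since rand() lies in [0,1)):

```python
import random

# Sketch of the kid-count rule with exponent 1/T: as T grows,
# rand()**(1/T) -> 1 so kids -> 10; as T shrinks, rand()**(1/T) -> 0
# so kids -> 2.

def kids(T):
    return 2 + 8 * random.random() ** (1.0 / T)

random.seed(1)
print(kids(100.0))  # typically close to 10
print(kids(0.01))   # typically close to 2
```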
In the usual case, the set of tasks in one tree is handled by one team. However, sometimes the children of a task are completed by multiple teams. For example, in the above, there is a dotted line between task "l" in tree2 and task "d" in tree1. This shows that team1's task "d" requires two sub-tasks from the same tree ("e" and "f") plus one task ("l") from another team implementing tree2.
As a project progresses, teams may discover that they have new tasks. We model this as follows. Suppose that the project size dictates we have, say, 500 tasks. We then build task trees with 500 tasks. At first, all tasks are marked as "hidden". Then, starting at the leaves and working up the project trees, we mark fewer than 500 tasks as "visible" (ensuring that all descendants of visible nodes are also visible). This means that, above the visible tasks, hidden tasks may be lurking that the teams cannot yet see. As the project continues, we may make more parent tasks "visible", at which point teams will "see" that they have new tasks to perform.
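One way to sketch the bottom-up visibility marking, assuming tasks are dicts with a "kids" list and a "visible" flag (names and representation are ours):

```python
# Sketch of the "hidden task" mechanism: build all tasks up front, then
# mark only some of them visible, leaves first, so that every descendant
# of a visible task is also visible.

def mark_visible(root, quota):
    """Post-order walk: children are considered before their parent,
    and we stop once the quota of visible tasks is spent."""
    count = 0
    def walk(task):
        nonlocal count
        for kid in task["kids"]:
            walk(kid)
        # a parent only becomes visible if all its kids already are
        if count < quota and all(k["visible"] for k in task["kids"]):
            task["visible"] = True
            count += 1
    walk(root)
    return count

leaf = lambda: {"kids": [], "visible": False}
root = {"kids": [leaf(), leaf(), leaf()], "visible": False}
print(mark_visible(root, 3))  # three leaves visible, root still hidden
print(root["visible"])
```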
The above process handles inter-team dependencies. As we move further up the project tree, it becomes more likely that we will make "visible" a parent node that has children assigned to different teams. That is, it is very unlikely that tasks near the leaves will be completed by more than one team. However, as more tasks are completed, large sub-systems are built which, sooner or later, must be combined with sub-systems built by other teams.
Each node in the tree has N kids at probability 1/(d^(N+1)). By increasing "d", we increase the number of dependencies. For our sims, we'll use d = 2..10.
Teams maintain a plan: a set of tasks taken from their heap.
Teams process their plans in iterations. The total number of iterations is 6. After each iteration_i, a project continues with probability 0.9. This means that a project assigned 6 iterations has only a 0.9^6 = 53.1% chance of lasting that long. This early termination rule models the industry reality that many projects are canceled before their originally planned end date.
The number of tasks that can be completed in each iteration is controlled by the budget. Following Port08, we say that the budget for an iteration is
budget = (total initial cost of all tasks) / num_iter
where "total initial cost" is computed from the costs of the plans, as they are known at the start of the project, after criticality is applied (see below). Note that this base budget will be adjusted to reflect the effects of using less skilled programmers (see below).
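The budget rule above can be sketched as follows; task_costs holds the initial, criticality-adjusted costs (names are ours):

```python
# Sketch of the per-iteration budget rule: the total known up-front cost
# is split evenly across the planned iterations.

def iteration_budget(task_costs, num_iter):
    return sum(task_costs) / num_iter

costs = [10, 20, 30, 40, 50, 60]   # hypothetical initial task costs
print(iteration_budget(costs, 6))  # 210 / 6 = 35.0
```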
At each iteration, teams copy some unfinished tasks from the heap to their plan. The teams may also reflect over the relative cost/value of the tasks in their plan and (possibly) reorder their tasks using a planning "policy". POM2 experiments with four policies: three of which are variants of agile programming while the fourth emulates traditional non-agile software development:
At the end of an iteration, all uncompleted tasks left in the plan are passed to the next iteration. Also, all the tasks completed in that iteration are (a) marked "not waiting" and (b) moved to a "done" set.
As each new task is added to "done", statistics are kept on the team's performance:
By tracking (sum of costs,sum of values) on a (X,Y) plane, for X = 1 to N, we can visualize the time-varying performance of the team:
        ^                  KEY:
        |          ** *      * = from policy
 total  |      ****
 value  |  ******
        | *
        |*
        |----------------->
             total cost
When the project is finished, this plot can be compared to an optimal frontier obtained by sorting all the "done" tasks using the final value/cost of those tasks. Note that the sort used to generate the optimal frontier relies on information not available halfway through the simulation (specifically, the _final_ cost/value of each task). This is the sort order offered by a god-like being who knows the future cost/value of tasks. No method can do better than the optimal frontier.
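A sketch of how such a frontier could be computed from the final (cost, value) pairs of the "done" tasks (representation as tuples is our assumption):

```python
# Sketch of the optimal frontier: sort finished tasks by their *final*
# value/cost ratio (hindsight information) and accumulate cost and value.

def optimal_frontier(done):
    """done: list of (cost, value) pairs with their final numbers.
    Returns the cumulative (cost, value) points of the frontier."""
    ordered = sorted(done, key=lambda cv: cv[1] / cv[0], reverse=True)
    points, cost, value = [], 0.0, 0.0
    for c, v in ordered:
        cost += c
        value += v
        points.append((cost, value))
    return points

tasks = [(4, 2), (1, 3), (2, 4)]  # hypothetical (cost, value) pairs
print(optimal_frontier(tasks))    # [(1.0, 3.0), (3.0, 7.0), (7.0, 9.0)]
```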
If the project finishes after completing (say) 100 tasks, then we score each run as follows:
POM2 explores a set of options for five factors identified by Boehm and Turner that distinguish agile projects from traditional plan-based projects. This section summarizes Boehm and Turner's descriptions of those factors, as well as other literature that explicates further details.
Agile methods are untested on safety-critical products: they present potential difficulties with simple design and lack of documentation. Plan-based methods, on the other hand, evolved to handle highly critical products but are hard to tailor down efficiently to low-criticality products.
Criticality is measured in terms of losses due to the impact of defects and ranges from "none" (best for agile development) to "impact on discretionary funds" to "impact on essential funds" to "loss of single life" to "loss of many lives". Plan-based methods are best suited to projects that must be carefully planned, lest defects cause loss of many lives.
According to the COCOMO research, criticality affects cost as follows:
      |     1    |       2       |     3     |      4      |     5     |
      | very low |      low      |  nominal  |    high     | very high |
loss  |   none   | impact on     | impact on | single life | many      |
      |          | discretionary | essential |             | lives     |
      |          | funds         | funds     |             |           |
cost' |   0.82   |     0.92      |   1.00    |    1.10     |   1.26    |
POM2 assumes that all tasks performed by one team are of equal criticality (this reflects the industrial reality that specialist teams work on particular portions of the code base). Once that criticality is known, we adjust the iteration budget as follows:
budget' = budget / criticality
where budget comes from above and criticality is one of 0.82, 0.92, 1, 1.1, 1.26. That is, increasing criticality means we can code less each iteration (since we need to take greater care with each line of code).
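A sketch of the criticality adjustment; the rating names follow the COCOMO table above:

```python
# Sketch of the criticality adjustment: a more critical project gets a
# smaller effective budget per iteration.

CRITICALITY = {"very low": 0.82, "low": 0.92, "nominal": 1.00,
               "high": 1.10, "very high": 1.26}

def adjust_for_criticality(budget, rating):
    return budget / CRITICALITY[rating]

print(adjust_for_criticality(100.0, "nominal"))    # unchanged: 100.0
print(adjust_for_criticality(100.0, "very high"))  # about 79.4: less gets coded
```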
While agile methods work best for simple design and continuous refactoring in highly dynamic environments, they present a source of potentially expensive rework for highly stable environments.
Plan-based methods, on the other hand, are better suited for detailed plans and "big design up front". This approach is excellent for highly stable environments.
According to Boehm and Turner, dynamism is measured in terms of the percent of requirements changed each month and has the range 50% (best for agile) to 30 to 10 to 5 to 1 (best for plan-based).
According to Port08, dynamism affects tasks as follows:
value'= value + (value*N(0,sigma))
Note that, as dynamism increases, we discover more new tasks and the value of the tasks becomes more variable.
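The value-update rule can be sketched as follows; the sigma setting in the example is hypothetical:

```python
import random

# Sketch of the dynamism effect on task value: each iteration, a task's
# value drifts by a zero-mean Gaussian whose sigma grows with dynamism.

def perturb_value(value, sigma):
    return value + value * random.gauss(0.0, sigma)

random.seed(3)
v = 100.0
for _ in range(3):
    v = perturb_value(v, 0.1)  # sigma = 0.1 is a hypothetical setting
    print(round(v, 1))
```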
Note also a special case of the above process. If the dynamism parameter tells a team to find some number of "new" tasks, but there are not that many "visible" or "ready" tasks, then a team's plan may cost less to complete than the budget for that iteration. In this case, a team can finish an iteration with leftover budget and no remaining tasks that it can complete. If this happens, the budget is kept until the next round. This is accomplished by attaching two numbers to each team: totalAccumulatedBudget and totalSpentBudget. Budget is burned by setting totalSpentBudget to totalAccumulatedBudget. At the start of each iteration, totalAccumulatedBudget is incremented by the budget for the iteration.
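The bookkeeping can be sketched as follows. We model burning with an explicit amount, which reduces to setting totalSpentBudget to totalAccumulatedBudget when a team spends everything it has; the class and method names are ours.

```python
# Sketch of the leftover-budget bookkeeping: each team carries two running
# totals, and unspent budget simply accumulates into the next iteration.

class TeamBudget:
    def __init__(self):
        self.total_accumulated = 0.0
        self.total_spent = 0.0

    def start_iteration(self, budget):
        self.total_accumulated += budget

    def burn(self, amount):
        self.total_spent += amount

    def available(self):
        return self.total_accumulated - self.total_spent

team = TeamBudget()
team.start_iteration(35.0)
team.burn(20.0)            # ran out of visible, ready tasks
team.start_iteration(35.0)
print(team.available())    # 15 left over + 35 new = 50.0
```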
Before continuing, we digress to discuss another method (which we do not use) for handling leftover budgets. We considered allowing a team to work on tasks from other teams. However, Brooks's Law (adding programmers to a late task makes it later) convinced us of the folly of that approach. Teams are specialists in the quirks and capabilities of their own code base. An outsider coming in to work temporarily on a small part of another team's code base can be quite unproductive (since they do not know the quirks of that code). Further, they can slow down the remaining team (while the team teaches the newcomer the tricks of that code).
For a fully agile project, changing task values means re-sorting the tasks in the plan to ensure that the most cost-effective task is completed first. However, as discussed in this section, the corporate culture may inhibit that re-sorting process.
Agile processes thrive in a culture where people feel comfortable and empowered by having many degrees of freedom and thrive on chaos. Plan-based methods, on the other hand, thrive in a culture where people feel comfortable and empowered by having their roles defined by clear policies and procedures. Personnel in plan-based projects thrive on order.
Culture is measured in terms of the percent of staff thriving on chaos and has the range 90% (best for agile) to 70 to 50 to 30 to 10 (best for plan-based). At culture=90%, the changes to task value described above are used when re-sorting tasks for the next iteration. However, at culture=10%, developers are loath to change the initial project plan since this introduces a degree of disorder into their work life.
We therefore distinguish between the true value and the accepted value of a task, calculated as follows:
accepted = value + (value * N(0,sigma) * culture)
So, as the percent thriving on chaos decreases, "culture" drops towards 0 and the accepted value stays close to the old value. Note that the accepted value is used to re-sort the tasks (exception: the plan-based policy, which never re-sorts) but, when performance statistics are gathered, we use the true value.
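The accepted-value rule can be sketched as follows; the culture scaling is taken directly from the formula above:

```python
import random

# Sketch of the accepted-value rule: culture scales how much of a value
# change the team is willing to acknowledge when re-sorting its plan.

def accepted_value(value, sigma, culture):
    """culture in [0,1]: 0.9 = thrives on chaos, 0.1 = thrives on order."""
    return value + value * random.gauss(0.0, sigma) * culture

# At culture = 0 the accepted value never moves off the true value:
print(accepted_value(100.0, 0.5, 0.0))  # always 100.0
```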
Agile and plan-based development require different kinds of personnel. Before discussing those differences, we introduce Cockburn's personnel levels.
According to Cockburn, level 3 developers are able to revise a method, breaking its rules to fit an unprecedented new situation. Level 2 developers, on the other hand, prefer to tailor a method to fit a precedented new situation.
Moving down the scale, level 1A developers can, with training, perform discretionary method steps such as sizing stories to fit increments, composing patterns, compound refactoring, or complex COTS integration. Next on the scale are the level 1B developers. With training, these developers are able to perform procedural method steps such as coding a simple method, simple refactoring, following coding standards and CM procedures, or running tests.
According to Boehm and Turner, agile software development requires the continuous presence of a critical mass of scarce Cockburn Level 2 or 3 experts. In agile projects, it is risky to use non-agile Level 1B people.
Plan-based methods, on the other hand, need a critical mass of scarce Cockburn Level 2 and 3 experts during project definition, but can work with fewer later in the project.
Personnel is scored according to the following table (from Boehm and Turner). In this table, agile methods are best suited to left-hand-side projects while plan-based methods are best suited to right-hand-side projects:
A    | % alpha-level programmers (level 2 and 3) | 45  | 50  | 55  | 60  | 65  |
B    | % beta-level programmers (level 1a)       | 40  | 30  | 20  | 10  |  0  |
O    | % gamma-level programmers (level 1b)      | 15  | 20  | 25  | 30  | 35  |
SUM: | 1*A + 1.22*B + 1.6*O                      | 118 | 119 | 119 | 120 | 121 |
Personnel affects the cost of tasks.
After Port08, we say that the base cost of a task is a random variable from 0 to 100. Personnel skill changes this cost.
Personnel have different productivity scales, given by the COCOMO [Chulani99] dimensions:
     | vl   | l    | n    | h    | vh   |
acap | 1.42 | 1.19 | 1.00 | 0.85 | 0.71 |
pcap | 1.34 | 1.15 | 1.00 | 0.88 | 0.76 |
Assuming that vh, h, n, l, vl represent alpha, alpha, beta, gamma, gamma developers respectively, we can compute the cost implications of using different kinds of personnel. This leads to values describing how using beta and gamma programmers slows down task completion: gamma = 1.6; beta = 1.22; alpha = 1.
These values are used to adjust the budget available to an iteration as follows:
budget'' = budget'/(1.6*O + 1.22*B + A)
where budget' was calculated above and (O, B, A) comes from one column of the above table. That is, less can be completed in each iteration if a team comprises higher numbers of gamma- and beta-programmers.
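A sketch of this adjustment, with (A, B, O) expressed as fractions rather than percentages so the divisor stays near one (an assumption on our part; the source leaves the scaling implicit):

```python
# Sketch of the personnel adjustment: the budget shrinks as the share of
# slower (beta, gamma) programmers grows. (A, B, O) are fractions here.

def adjust_for_personnel(budget, A, B, O):
    return budget / (1.6 * O + 1.22 * B + A)

# Leftmost column of the table, as fractions: 45% alpha, 40% beta, 15% gamma.
print(round(adjust_for_personnel(100.0, 0.45, 0.40, 0.15), 1))  # about 84.9
```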
Note that the net effect of all the above is minimal. As shown in the "SUM:" row of the above table, the above calculations lead to adjustment factors ranging only from 118 to 121. Hence, for the current study, we do not apply the personnel adjustment.
Agile methods are well matched to small products and teams. Their reliance on tacit knowledge limits scalability. Plan-driven projects, on the other hand, use methods evolved to handle large products and teams that are hard to tailor down to small projects.
Size is measured in terms of the number of personnel and has the range 3 (best for agile), 10, 30, 100, 300 (best for plan-based). The size of a POM2 project is picked at random from this range and the number of tasks is then set to size * 25/10.
Once size is known, we build teams as follows.
Originally, we planned to apply the [Penharkar07] results, which report that the size of software development teams has (min, mean, sd) = (1, 8, 20). However, this leads to a large number of single-person teams. We therefore modified that result slightly, in consultation with some of our NASA colleagues: POM2 selects team sizes randomly from the following distribution until the total of the team sizes exceeds size.
1 ***
3 *********
6 *****************
9 ****************************
12 **********************
15 ******************
18 **************
21 **********
24 ******
27 ***
30 **
33 *
36 *
39 *
42 *
We use the following algorithm:

    Size = 300*rand()
    t = 0
    while Size > 0
        t++
        team[t] = max(1, N(8,20))
        Size = Size - team[t]
Each team of size team[t] is allocated team[t]/(total of all team sizes) * Tasks of the tasks.
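The loop above can be sketched in runnable form. Rounding N(8, 20) to whole people is our assumption; the source does not say how fractional draws are handled.

```python
import random

# Runnable sketch of the team-building loop: draw team sizes from
# N(mean=8, sd=20), clipped below at 1, until the project size is used up.

def build_teams(size, mean=8.0, sd=20.0):
    teams = []
    remaining = size
    while remaining > 0:
        t = max(1, round(random.gauss(mean, sd)))  # whole people, at least 1
        teams.append(t)
        remaining -= t
    return teams

random.seed(4)
teams = build_teams(300)
print(len(teams), sum(teams))  # sum is at least 300
```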