Hi Tim,

Thanks very much for all the data. In between other duties (Akbar's defence, a PhD student I was examining, and so on) I have been mulling it over and drawing conclusions. I'm still not sure how to interpret it. I'll give you my analysis, and from it you may be able to judge whether there is anything else you could discover by doing different analyses.

====== My analysis

Every "base gene name" (e.g. chanceOfNull, valuePoolActivityBitSet) corresponds to some code in the Nighthawk system. Basically, I want to figure out which of those bits of code I can safely comment out.

There are really only 10 base gene names:

  candidateBitSet
  chanceOfNull
  chanceOfTrue
  lowerBound
  methodWeight
  numberOfCalls
  numberOfValuePools
  numberOfValues
  upperBound
  valuePoolActivityBitSet

For every base gene name, I searched all the .out files for genes with that base name. I sorted the lines found by merit, and then again by rank. I then took the 20 lines with the best (lowest) rank and the 20 lines with the best (highest) merit. The output is in the attached PostScript file.

Here are my conclusions, gene by gene.

- candidateBitSet: Rank is often 1 to 4, and merit is over 0.3 several times. It also appears several times in your "interestings90" set. So it is often very important: changing this gene often affects the score, and therefore it should be kept. There is one of these genes for each parameter position of each method of each class. If a bit in the bitset is set, the parameter can be drawn from a value pool of the corresponding class. In other words, these genes control the actual types of the values passed for those parameters. There is also one of these genes for the result of each method of each class. If a bit in the bitset is set, the result can be put in a value pool of the corresponding class.
It's a bit surprising that this gene is so important, since many parameter positions have only one class that can go in them.

- chanceOfNull: Rank is never lower (better) than 24, and the gene doesn't appear in your "interestings" sets until "interesting50". So it is not very important; in fact, it is probably the least important. I'll try deleting this one. There is one of these genes for every parameter position of every method of every class that can take a non-primitive type. It controls how often null is chosen to be passed as the parameter. The fact that the gene is not important doesn't mean that null should never be passed. It just indicates that the default value (a 3% chance) usually doesn't need to be changed.

- chanceOfTrue: Rank is several times between 1 and 10, though usually not. However, when the rank is low, the merit is not very high at all. That suggests the gene gets a "good" rank only because nothing else scores higher. I have a feeling these good rankings are spurious, only the result of chance, but I'm not confident enough to delete the gene. This gene controls the chance of choosing true rather than false for a parameter of type boolean.

- lowerBound, upperBound: These are often important, especially the upper bounds (rank often between 1 and 4 for the various Hash classes). This may indicate that it's important to be able to set them, though perhaps it just means the defaults are not good. There is one of each for every value pool of every numeric primitive type (fixed-point or floating-point). Together they set the minimum and maximum values that can be seeded in the pool. Most of the important ones seem to be for the float type, and for the Hash classes. There, the only parameter these bounds affect is the hash table's "load factor", which in turn determines how often the table is rehashed. That's probably why they are important: whenever the table rehashes, it executes code that wouldn't be executed otherwise.
- methodWeight: Surprisingly, this is never very high in either rank or merit. There is one of these genes for each method, and it controls how often the method gets chosen. I'm not comfortable deleting it just yet, though, since the exploratory study showed that it could be important.

- numberOfCalls: This also surprised me by how rarely it was high in rank or merit. That might just mean the default number of calls was not bad. This gene controls the total number of calls made in the randomly constructed test case. The default is 5 times the number of methods.

- numberOfValuePools: This was never better than rank 14 and never higher than 0.186 in merit. It also doesn't appear in your "interestings" list until "interesting60". So it is probably not very important, and I'll try deleting this one too. There is one of these genes for each type (both primitive types and classes), and it controls the number of value pools that exist for the type. The lack of "interestingness" indicates that I chose the default value (2 value pools for each type) well.

- numberOfValues: This has a low rank a few times, but never lower than 5, and its merit is never higher than 0.16. However, I'm still not confident enough to delete it. Because of those few low ranks, it falls into the same grey area as chanceOfTrue, methodWeight and numberOfCalls. There is one of these genes for each value pool of each type, and it determines how many values are in the pool. The initial number is 30 values for integer-like and float-like types, 2 for String (or the number of "seed strings" that exist), and 2 for every other class (non-primitive type).

- valuePoolActivityBitSet: Often very important. It often has a rank between 1 and 10 and merit over 0.2. It appears several times in "interesting90" and many times in "interesting70". It competes with candidateBitSet for the most important kind of gene.
There is one of these genes for each candidate type of each parameter position of each method of each class. It determines which value pools the values for that parameter are chosen from. Thus, together with candidateBitSet, it basically determines how methods are "plugged together": which results of which method calls are reused as parameters and receivers of other method calls. This gives some confirmation of our intuition that the "value reuse policy" is important.

====== Conclusions

I will try modifying the system in this way:

- Fix the chance of null for parameters of non-primitive types at 3%, and comment out all references to chanceOfNull genes.
- Fix the number of value pools for all types at 2, and comment out all references to numberOfValuePools genes.

I will then re-run everything, keeping track of timing and also the final scores.

cheers
--Jamie.
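The per-gene procedure described under "My analysis" can be sketched in Python. It groups genes by base name and keeps the 20 best by rank (lowest) and the 20 best by merit (highest). This is only a sketch under assumptions. The .out file format isn't described here, so it works on already-parsed (gene, merit, rank) records. The "/" separator used by the base_name helper is a hypothetical naming convention, not Nighthawk's.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

# The 10 base gene names listed in the analysis.
BASE_NAMES = [
    "candidateBitSet", "chanceOfNull", "chanceOfTrue", "lowerBound",
    "methodWeight", "numberOfCalls", "numberOfValuePools",
    "numberOfValues", "upperBound", "valuePoolActivityBitSet",
]


@dataclass(frozen=True)
class GeneScore:
    gene: str     # full gene name (hypothetical format: "baseName/qualifier")
    merit: float  # higher is better
    rank: int     # lower is better (1 = best)


def base_name(gene: str) -> str:
    # Hypothetical convention: the base name is everything before the first "/".
    return gene.split("/", 1)[0]


def best_per_base_name(
    records: Iterable[GeneScore], k: int = 20
) -> Dict[str, Tuple[List[GeneScore], List[GeneScore]]]:
    """For each base gene name, return (k best by rank, k best by merit)."""
    groups: Dict[str, List[GeneScore]] = {name: [] for name in BASE_NAMES}
    for record in records:
        name = base_name(record.gene)
        if name in groups:  # skip anything that isn't one of the 10 base names
            groups[name].append(record)

    result = {}
    for name, group in groups.items():
        by_rank = sorted(group, key=lambda r: r.rank)[:k]                  # lowest rank first
        by_merit = sorted(group, key=lambda r: r.merit, reverse=True)[:k]  # highest merit first
        result[name] = (by_rank, by_merit)
    return result
```

Sorting twice mirrors "ordered by merit and then again by rank". A base name with no matching genes simply gets two empty lists.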
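The next sketch makes the interplay between candidateBitSet, valuePoolActivityBitSet and chanceOfNull concrete. It shows how one argument might be chosen for one parameter position, with the chance of null fixed at 3% as proposed in the conclusions. It rests on several hypothetical assumptions. Bitsets are represented as plain integers. Choices among enabled classes and enabled pools are uniform. Every enabled pool is non-empty. Nighthawk's actual selection code may well differ.

```python
import random
from typing import Any, Dict, List, Optional

# Fixed chance of passing null for a non-primitive parameter (the proposed
# replacement for the per-parameter chanceOfNull genes).
CHANCE_OF_NULL = 0.03

# Hypothetical layout: pools[class_name] is a list of value pools (lists of values).
Pools = Dict[str, List[List[Any]]]


def set_bits(bits: int) -> List[int]:
    """Indices of the bits that are set in an integer used as a bitset."""
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]


def choose_argument(
    rng: random.Random,
    candidate_classes: List[str],   # candidate types for this parameter position
    candidate_bits: int,            # candidateBitSet gene for this position
    activity_bits: Dict[str, int],  # valuePoolActivityBitSet gene, per candidate class
    pools: Pools,
    non_primitive: bool = True,
) -> Optional[Any]:
    if non_primitive and rng.random() < CHANCE_OF_NULL:
        return None
    # candidateBitSet: which classes may supply a value (raises if no bit is set).
    classes = [candidate_classes[i] for i in set_bits(candidate_bits)]
    cls = rng.choice(classes)
    # valuePoolActivityBitSet: which of that class's value pools are used.
    pool_index = rng.choice(set_bits(activity_bits[cls]))
    return rng.choice(pools[cls][pool_index])
```

For example, with candidate classes ["String", "Integer"], candidate_bits = 0b01 and activity_bits["String"] = 0b10, every non-null argument comes from the second String pool.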
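The lowerBound/upperBound explanation can be illustrated with two small pieces. The first seeds a float value pool between its bounds, using 30 values, the stated default for float-like types. The second counts how often a hash table grows for a given load factor. The seeding function is hypothetical and assumes values are drawn uniformly. The rehash counter is a simplified model of the growth rule documented for java.util.HashMap: default initial capacity 16, with the table doubled whenever the number of entries exceeds capacity times load factor. It is not Nighthawk or JDK code.

```python
import random
from typing import List

DEFAULT_NUMBER_OF_VALUES = 30  # stated default pool size for integer-like and float-like types


def seed_float_pool(rng: random.Random, lower_bound: float, upper_bound: float,
                    number_of_values: int = DEFAULT_NUMBER_OF_VALUES) -> List[float]:
    """Hypothetical: fill one value pool with floats drawn uniformly between the bounds."""
    return [rng.uniform(lower_bound, upper_bound) for _ in range(number_of_values)]


def count_rehashes(n_inserts: int, load_factor: float, initial_capacity: int = 16) -> int:
    """Simplified HashMap-style growth: double the table when size > capacity * load_factor."""
    capacity = initial_capacity
    rehashes = 0
    for size in range(1, n_inserts + 1):
        if size > capacity * load_factor:
            capacity *= 2
            rehashes += 1
    return rehashes


rng = random.Random(0)
for lf in seed_float_pool(rng, 0.25, 3.0)[:3]:
    print(f"load factor {lf:.2f}: {count_rehashes(100, lf)} rehashes for 100 inserts")
```

In this model, 100 inserts cause 5 rehashes with a load factor of 0.25, 4 with 0.75, and 2 with 3.0. So the bounds on the float pool directly control how often the rehashing code runs.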
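methodWeight and numberOfCalls together shape the randomly constructed test case. In this minimal sketch, each call is picked independently with probability proportional to its method's weight, which is a hypothetical assumption. When no numberOfCalls value is given, the sketch uses the stated default of 5 times the number of methods.

```python
import random
from typing import List, Optional, Sequence


def build_call_sequence(
    rng: random.Random,
    methods: Sequence[str],
    method_weights: Sequence[float],        # one methodWeight gene per method
    number_of_calls: Optional[int] = None,  # numberOfCalls gene; None means use the default
) -> List[str]:
    """Return the names of the methods to call, in order."""
    if number_of_calls is None:
        number_of_calls = 5 * len(methods)  # default: 5 times the number of methods
    # random.choices draws with replacement, proportionally to the weights.
    return rng.choices(list(methods), weights=list(method_weights), k=number_of_calls)


calls = build_call_sequence(random.Random(42), ["put", "get", "remove"], [3.0, 2.0, 1.0])
print(len(calls), calls[:5])
```

A method whose weight is 0 is never called. Raising one weight shifts calls toward that method without changing the total, which is governed by numberOfCalls alone.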