DieHarder: A Random Number Test Suite

Version 2.6.24

Robert G. Brown (rgb)

At the suggestion of Linas Vepstas on the Gnu Scientific Library (GSL) list this GPL'd suite of random number tests will be named "DieHarder". Using a movie sequel pun for the name is a double tribute to George Marsaglia, whose "Diehard battery of tests" of random number generators has enjoyed years of enduring usefulness as a test suite.

The DieHarder suite is more than just the diehard tests cleaned up and given a pretty GPL'd source face in native C. Tests from the Statistical Test Suite (STS) developed by the National Institute of Standards and Technology (NIST) are being incorporated, as are new tests developed by rgb. Where possible or appropriate, all tests that can be parameterized ("cranked up") to where failure, at least, is unambiguous are so parameterized and controllable from the command line.

A further design goal is to provide some indication of why a generator fails a test, where such information can be extracted during the test process and placed in usable form. For example, the bit-distribution tests should (eventually) be able to display the actual histogram for the different bit ntuplets.
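As a rough illustration of the kind of diagnostic meant here, the following Python sketch counts how often each bit ntuplet occurs in a stream of 32-bit words and prints a text histogram. It is not dieharder code: the function names, the non-overlapping way the bits are grouped, and the use of Python's built-in generator as the source are all assumptions made for the example.

```python
import random
from collections import Counter


def ntuple_histogram(words, ntuple, numbit=32):
    """Count non-overlapping ntuple-bit patterns in a list of numbit-wide integers."""
    # Concatenate all words into one long bit string, most significant bit first.
    bits = "".join(format(w, "0{}b".format(numbit)) for w in words)
    counts = Counter()
    for i in range(0, len(bits) - ntuple + 1, ntuple):
        counts[bits[i:i + ntuple]] += 1
    return counts


def print_histogram(counts, ntuple, width=40):
    """Print one line per pattern; a bar of width/2 marks the expected count."""
    total = sum(counts.values())
    expected = total / 2 ** ntuple
    for value in range(2 ** ntuple):
        pattern = format(value, "0{}b".format(ntuple))
        n = counts.get(pattern, 0)
        bar = "#" * round(width * n / (2 * expected)) if expected else ""
        print("{} {:8d} {}".format(pattern, n, bar))


# Hypothetical input: 10,000 words from Python's own generator.
rng = random.Random(12345)
words = [rng.getrandbits(32) for _ in range(10000)]
hist = ntuple_histogram(words, ntuple=2)
print_histogram(hist, ntuple=2)
```

For a good generator the bars should be of nearly equal length; a generator with a stuck or biased bit shows up as visibly uneven bars.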

DieHarder is by design extensible. It is intended to be the "Swiss Army Knife of random number test suites", or if you prefer, "the last suite you'll ever ware" for testing random numbers.


DieHarder Download Area

The version numbers have the following meaning:

The single-tree dieharder source files (.tgz and .src.rpm) can be downloaded from this directory. In addition, i386 binary rpms built on top of Fedora Core 6 are present. Be warned: the GSL is a build dependency. The current packaging builds both the library and the dieharder UI from a single source rpm, or from running "make" in the toplevel directory of the source tarball. With a bit of effort (making a private rpm building tree), "make rpm" should work for you as well in this toplevel directory.

This project is under very active development. Considerable effort is being expended so that the suite will "run out of the box" to produce a reasonably understandable report for any given random number generator it supports via the "-a" flag, in addition to the ability to considerably vary most specific tests as applied to the generator. A brief synopsis of command options to get you started is presented below. In general, though, documentation (including this page, the man page, and built-in documentation) may lag the bleeding edge snapshot by a few days or more.

An rpm installation note from Court Shrock:

I was reading about your work on dieharder.  First, some info
about getting dieharder working in Gentoo:

cd ~
emerge rpm gsl
wget http://www.phy.duke.edu/~rgb/General/dieharder/dieharder-0.6.11-1.i386.rpm
rpm -i --nodeps dieharder-0.6.11-1.i386.rpm

Rebuilding from tarball source should always work as well, and if you are planning to play a lot with the tool, it may be a desirable way to proceed, as there are some documentation goodies in the ./doc subdirectory and the ./manual subdirectory of the source tarball (such as the original diehard test descriptions and the STS white paper).

Also, George Marsaglia retired from FSU in 1996, and diehard appears to have finally disappeared from FSU webspace. However, the diehard CDROM is still mirrored here in what appears to be an authorized copy. The source code of diehard itself is (of course) Copyright George Marsaglia but Marsaglia did not incorporate an explicit license into his code which muddles the issue of how and when it can be distributed, freely or otherwise. Existing diehard sources are not directly incorporated into dieharder in source form for that reason, to keep authorship and GPL licensing issues clear.

Note that the same is not true about data. Several of the diehard tests require that one use precomputed numbers as e.g. target mean, sigma for some test statistic. Obviously in these cases we use the same numbers as diehard so we get the same, or comparable, results. These numbers were all developed with support from Federal grants and have all been published in the literature, though, and should therefore be in the public domain as far as reuse in a program is concerned.

Note also that most of the diehard tests are modified in dieharder, usually in a way that should improve them. There are three improvements that were basically always made if possible.

In a few cases other variations are possible for specific tests. This should be noted in the built-in test documentation for that test where appropriate.

Aside from these major differences, note that the algorithms were independently written more or less from the test descriptions alone (sometimes illuminated by a look at the code implementations, but only to clear up just what was meant by the description). They may well do things in a different (but equally valid) order or using different (but ultimately equivalent) algorithms altogether and hence produce slightly different (but equally valid) results even when run on the same data with the same basic parameters. Then, there may be bugs in the code, which might have the same general effect. Finally, it is always possible that diehard implementations have bugs and can be in error. Your Mileage May Vary. Be Warned.


About DieHarder

The primary point of DieHarder (like Diehard before it) is to make it easy to time and test (pseudo)random number generators, both software and hardware, for a variety of purposes in research and cryptography. The tool is built entirely on top of the GSL's random number generator interface and uses a variety of other GSL tools (e.g. sort, erfc, incomplete gamma, distribution generators) in its operation. Five examples are provided of wrapping a random number generator (including both file I/O and the entropy-based /dev/random and /dev/urandom available on many Linux systems) and inserting it so that it can be called via the GSL interface. It is strongly suggested that any software random number generator to be tested be provided with such a GSL-compatible interface.

A file interface (still consistent with the GSL to the extent possible) has been added that allows random numbers in either binary unsigned integer or a variety of ascii encoded formats to be read in from a file. This permits dieharder to be used with "any" generator (directly wrappable or not) that can generate a table of random numbers, but this will place severe limits on some of the tests, which can require very large numbers of random numbers. For this reason software generators should be implemented directly if at all possible and not via the file interface.

In this respect, DieHarder differs significantly from Diehard, which used file-based sources of random numbers exclusively and would "work" with only a few million random numbers in such a file. Modern random number generators in a typical simulation application can easily need to generate 10^18 or more random numbers, generated from hundreds, thousands, or millions of different seeds, over months to years of accumulated run time, and are therefore sensitive to weaknesses that might not be revealed by such short sequences even with excellent and sensitive tests.

This was, in part, the motivation for the development of the Statistical Test Suite by NIST, which focuses more on cryptographic strength (although the general testing methodology is much the same).

The development of DieHarder was motivated by the following, in rough order of importance:

Note well that the primary objections I have towards diehard and STS are not that they are or are not adequate, accurate and complete; it is that the code itself is not properly packaged for reuse, testing, and extension. Diehard is remarkably poorly documented (with one small paragraph of text describing each test, even very complex ones, and with no accompanying description of how certain important data used in the program were actually computed). STS, in contrast, is really nothing but its description in documentation with no readily available open source code for implementation. DieHarder will hopefully rectify both situations and be both well documented and available in clearly publicly licensed code to make it extremely easy for anybody to test random numbers on any GSL-supported platform.

Although this tool is being developed on Linux/GCC-based platforms, it should port with no particular difficulty to other Unices, especially ones that support RPMs. No particular effort is being expended at this time to make it run on Windows based compute platforms (due to a lack of availability of such platforms and compilers to rgb) but there is no reason to think that such a port would be terribly difficult PROVIDED that the Gnu Scientific Library is installable under Windows.

Essential Usage Synopsis

If you compile the tool or install the provided binary rpms and run it as:

dieharder -a

it should run -a(ll) tests on the default GSL generator.

Choose alternative generators with -g number, where

dieharder -g -1

will list all generator numbers known to the current snapshot of DieHarder (mostly from the GSL).

dieharder -l

should list all the tests implemented in the current snapshot of DieHarder. Finally, the venerable and time tested:

dieharder -h

provides a Usage synopsis (which can be quite long) and

man dieharder

is the (installed) man page, which may or may not be completely up to date as the suite is under active development. For developers, additional documentation is available in the toplevel directory or doc subdirectory of the source tree. Eventually, a complete DieHarder manual in printable PDF form will be available both on this website and in /usr/share/doc/dieharder-*/.

List of Random Number Generators and Tests Available

List of GSL and user-defined random number generators that can be tested by DieHarder:

rgb@lilith|B:1344>dieharder
              Listing available built-in gsl-linked generators:           |
 Id Test Name           | Id Test Name           | Id Test Name           |
==========================================================================|
  0 borosh13            |  1 cmrg                |  2 coveyou             |
  3 fishman18           |  4 fishman20           |  5 fishman2x           |
  6 gfsr4               |  7 knuthran            |  8 knuthran2           |
  9 lecuyer21           | 10 minstd              | 11 mrg                 |
 12 mt19937             | 13 mt19937_1999        | 14 mt19937_1998        |
 15 r250                | 16 ran0                | 17 ran1                |
 18 ran2                | 19 ran3                | 20 rand                |
 21 rand48              | 22 random128-bsd       | 23 random128-glibc2    |
 24 random128-libc5     | 25 random256-bsd       | 26 random256-glibc2    |
 27 random256-libc5     | 28 random32-bsd        | 29 random32-glibc2     |
 30 random32-libc5      | 31 random64-bsd        | 32 random64-glibc2     |
 33 random64-libc5      | 34 random8-bsd         | 35 random8-glibc2      |
 36 random8-libc5       | 37 random-bsd          | 38 random-glibc2       |
 39 random-libc5        | 40 randu               | 41 ranf                |
 42 ranlux              | 43 ranlux389           | 44 ranlxd1             |
 45 ranlxd2             | 46 ranlxs0             | 47 ranlxs1             |
 48 ranlxs2             | 49 ranmar              | 50 slatec              |
 51 taus                | 52 taus2               | 53 taus113             |
 54 transputer          | 55 tt800               | 56 uni                 |
 57 uni32               | 58 vax                 | 59 waterman14          |
 60 zuf                 |
                   Listing available non-gsl generators:                  |
 Id Test Name           | Id Test Name           | Id Test Name           |
==========================================================================|
 61 /dev/random         | 62 /dev/urandom        | 63 empty               |
 64 file_input          | 65 file_input_raw      |

Note that the last five generators are examples of random number generators that have been wrapped up in GSL-compatible clothes and linked to the GSL so that the standard GSL interface works for them. Any random number generator that one wishes to test can thus easily be added for testing using these as prototypes, and can likely be submitted to the GSL for inclusion if it passes the tests as well as or better than the generators that are already there. That makes this a very convenient tool for testing new RNGs.

Note also that the last two non-gsl generators are "universal" generators in the sense that they permit you to input your OWN random number stream from a file (but NOT from /dev/random or /dev/urandom, be warned). The file_input generator requires a file of "cooked" (ascii readable) random numbers, one per line, with a header that describes the format to dieharder. This interface is still somewhat experimental -- not all ascii formats have been tested. However, it has been tested and should work for 32-bit unsigned integers represented directly in ascii or as 32 bits of binary. Examples of the required header for these two formats are given here:

#==================================================================
# generator mt19937_1999  seed = 1274511046
#==================================================================
type: u
count: 100000
numbit: 32
3129711816
  85411969
2545911541
 903839182
2564046000
1157728411
 202655667
 969286899
1519043834
... (for 100,000 rands total).
#==================================================================
# handmade.  Comments are ignored, obviously.
#==================================================================
type: b
count: 10
numbit: 32
00000000000000000000000000000001
00000000000000000000000000000010
00000000000000000000000000000011
00000000000000000000000000000100
00000000000000000000000000000101
00000000000000000000000000000110
00000000000000000000000000000111
00000000000000000000000000001000
11111111111111111111111111111111
11111111111111110000000000000000

(where the latter is clearly not very random).
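A file in the first (type: u) format can be produced by a few lines of script. The Python sketch below writes the header fields shown above (type, count, numbit) followed by one right-aligned unsigned integer per line. The function name, the comment text, and the choice of Python's built-in generator as the source are assumptions for illustration; only the three header fields and the one-number-per-line layout come from the examples above.

```python
import io
import random


def write_file_input(stream, rands, numbit=32, comment="hypothetical example"):
    """Write rands as an ascii table of unsigned integers with a type/count/numbit header."""
    rule = "#" + "=" * 66          # decorative only; comment lines are ignored
    stream.write(rule + "\n")
    stream.write("# {}\n".format(comment))
    stream.write(rule + "\n")
    stream.write("type: u\n")                      # u = unsigned integer, as in the example above
    stream.write("count: {}\n".format(len(rands)))
    stream.write("numbit: {}\n".format(numbit))
    for r in rands:
        stream.write("{:10d}\n".format(r))         # right-aligned, one per line


# Hypothetical usage: five 32-bit values written to an in-memory buffer.
rng = random.Random(1)
rands = [rng.getrandbits(32) for _ in range(5)]
buf = io.StringIO()
write_file_input(buf, rands)
text = buf.getvalue()
print(text, end="")
```

To write a real file, pass an open text file object instead of the StringIO buffer. Keep in mind that the count must be large enough for the tests you intend to run.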

The last type, file_input_raw, accepts a file of raw bits as input, such as might be generated by

 dd if=/dev/urandom of=testrands.raw bs=4 count=1000000
(to generate 1,000,000 four-byte ints directly from the software-augmented kernel entropy generator). That is, running the tests from such a file should be approximately the same as testing /dev/urandom directly.
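On systems without dd, or when the numbers come from your own generator, roughly the same raw file can be produced from a script. The Python sketch below writes count four-byte values from os.urandom to a binary stream and reads them back as unsigned integers. The function names are hypothetical, and the native byte order used when unpacking is an assumption, not a statement about how dieharder interprets the file.

```python
import io
import os
import struct


def write_raw_rands(stream, count, source=os.urandom, chunk=65536):
    """Write count four-byte random values to a binary stream, in chunks."""
    remaining = 4 * count
    while remaining > 0:
        n = min(chunk, remaining)
        stream.write(source(n))    # source(n) must return exactly n bytes
        remaining -= n


def read_raw_rands(data):
    """Interpret raw bytes as 32-bit unsigned integers (native byte order assumed)."""
    count = len(data) // 4
    return list(struct.unpack("={}I".format(count), data[:4 * count]))


# Hypothetical usage: 1,000 values written to an in-memory buffer.
raw = io.BytesIO()
write_raw_rands(raw, 1000)
values = read_raw_rands(raw.getvalue())
print(len(values), "values read back")
```

Replacing the BytesIO buffer with open("testrands.raw", "wb") gives a file on disk; passing a different source function lets you dump the output of a generator of your own.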

The main (important!) difference is that some of the tests require a lot of random numbers -- far more than were needed by diehard. Indeed, dieharder runs many of the diehard tests 100 independent times, generating a p-value for each, and plots a histogram of the p-values and generates a p-value for the (presumed uniform) distribution of p-values! This approach mimics the histogram presented in the STS suite but augments it with a hard Kolmogorov-Smirnov p-value that describes the distribution of p itself in many independent test runs!

This protects one somewhat from the "p happens" problem described by Marsaglia -- every now and then you will have a run with a very low p from a good generator, but overall a good generator will generate a uniform distribution of p-values. Dieharder lets you visually decide if the distribution is or isn't credibly uniform, while giving you an index that in most cases is a fairly clear "good" or "bad" indicator for a given random sequence or generator. Direct control over the number of samples used in the computation of the KS p-value (as well as other important test parameters) permits one to "crank up" the test to clarify what appears to be a marginal level of success or failure based on a few separate runs.
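The "p-value of p-values" idea can be sketched in a few lines of Python. The example below runs a simple monobit-style test 100 times, collects the p-values, and compares their distribution with the uniform distribution using a Kolmogorov-Smirnov statistic. It is only an illustration: the function names are hypothetical, the KS p-value uses the standard asymptotic series with a common small-sample correction, and dieharder's own implementation may well differ in detail.

```python
import math
import random


def monobit_pvalue(words, numbit=32):
    """p-value for the hypothesis that ones and zeros are equally likely."""
    n = numbit * len(words)
    ones = sum(bin(w).count("1") for w in words)
    s = abs(2 * ones - n) / math.sqrt(n)
    return math.erfc(s / math.sqrt(2))


def ks_statistic(pvalues):
    """Largest distance between the empirical CDF of pvalues and the uniform CDF."""
    p = sorted(pvalues)
    n = len(p)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(p))


def ks_pvalue(d, n):
    """Asymptotic Kolmogorov p-value (an approximation, with a small-n correction)."""
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    if lam < 0.05:
        return 1.0                 # the series converges too slowly here; Q is ~1
    total = 0.0
    for k in range(1, 101):
        total += (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
    return min(1.0, max(0.0, 2.0 * total))


def overall_pvalue(next_word, runs=100, words_per_run=1000):
    """Run the monobit test runs times and reduce the p-values to one KS p-value."""
    pvalues = [monobit_pvalue([next_word() for _ in range(words_per_run)])
               for _ in range(runs)]
    return ks_pvalue(ks_statistic(pvalues), runs)


rng = random.Random(2006)
good = overall_pvalue(lambda: rng.getrandbits(32))   # Python's built-in generator
bad = overall_pvalue(lambda: 0xFFFFFFFF)             # every bit stuck at one
print("good generator:", good)
print("stuck generator:", bad)
```

Raising runs or words_per_run is the sketch's version of cranking up a test: a generator with a small bias may pass at the default settings and fail clearly once more samples are used.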

File input rands are delivered to the tests on demand, but if the test needs more than are available it simply rewinds the file and cycles through it again, and again, and again as needed. Obviously this significantly reduces the sample space and can lead to completely incorrect results for the p-value histograms unless there are enough rands to run EACH test without repetition (it is harmless to reuse the sequence for different tests). Let the user beware!
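The rewinding behaviour, and why it is worth watching, can be pictured with a small Python sketch. The class below hands out numbers from a fixed list and starts over when the list is exhausted, keeping a count of rewinds so the caller can tell whether a test saw repeated data. The class and attribute names are hypothetical and are not taken from the dieharder sources.

```python
class RewindingSource:
    """Deliver numbers from a fixed table, cycling back to the start when it runs out."""

    def __init__(self, rands):
        if not rands:
            raise ValueError("need at least one random number")
        self._rands = list(rands)
        self._pos = 0
        self.rewinds = 0           # how many times the table has been restarted

    def next(self):
        if self._pos == len(self._rands):
            self._pos = 0
            self.rewinds += 1
        value = self._rands[self._pos]
        self._pos += 1
        return value


# Hypothetical usage: a "test" that asks for seven numbers from a table of three.
src = RewindingSource([1, 2, 3])
drawn = [src.next() for _ in range(7)]
print(drawn, "rewinds:", src.rewinds)
```

A nonzero rewind count after a single test is the warning sign: the test consumed more numbers than the file holds, so its p-values should not be trusted.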

List of the CURRENT fully implemented tests (as of the 07/12/06 snapshot):

rgb@lilith|B:1346>dieharder -l

                     DieHarder Test Suite
========================================================================
The following tests are available and will be run when diehard -a is
invoked.  Special options or suggested parameters are indicated if
they are needed to get a satisfactory result (such as completion in a
reasonable amount of time).

            Diehard Tests
   -d 1  Diehard Birthdays test
   -d 2  Diehard Overlapping Permutations test
   -d 3  Diehard 32x32 Binary Rank test
   -d 4  Diehard 6x8 Binary Rank test
   -d 5  Diehard Bitstream test
   -d 6  Diehard OPSO test
   -d 7  Diehard OQSO test
   -d 8  Diehard DNA test
   -d 9  Diehard Count the 1s (stream) test
   -d 10 Diehard Count the 1s (byte) test
   -d 11 Diehard Parking Lot test
   -d 12 Diehard Minimum Distance (2D Spheres) test
   -d 13 Diehard 3D Spheres (minimum distance) test
   -d 14 Diehard Squeeze test
   -d 15 Diehard Sums test
   -d 16 Diehard Runs test
   -d 17 Diehard Craps test
   -d 18 Marsaglia and Tsang GCD test

             RGB Tests
   -r 1 Bit Persist test
   -r 2 Bit Ntuple Distribution test suite (-n ntuple for 1-8)
   -r 3 Timing test (times rng)

      Statistical Test Suite (STS)
   -s 1 STS Monobit test
   -s 2 STS Runs test

            User Tests
   -u 1 User Template (Lagged Sum Test)

Note that the design goal of completely encapsulating diehard is COMPLETED with all tests apparently functional as of 7/12/06. dieharder is now in a "beta" debugging/testing phase until the new code shakes out, but it produces what are for the most part very reasonable and consistent values for all the tests on known "good" or "bad" random number generators encapsulated in the GSL.

Full descriptions of the tests are available (as you can see) from within the tool and source documentation. All tests are completely and independently rewritten from their description alone, and may be functionally modified or extended relative to the original source code published in the originating suite. The author (rgb) bears complete responsibility for these changes, subject to the standard GPL code disclaimer (in essence, yes it's my fault if they don't work but using the tool is at your own risk and you can fix it if it bothers you and/or I don't fix it first).

Development Notes

All tests are encapsulated to be as standard as possible in the way they compute p-values from single statistics or from vectors of statistics, and in the way they implement the underlying KS and chisq tests. Diehard is now complete in dieharder, and attention will turn towards implementing more selected tests from the STS. I also have my eye on the as-yet unimplemented tests from Knuth's The Art of Computer Programming, lagged correlation, and more bitwise tests that have occurred to me as I ported diehard (which does some things somewhat backwards or indirectly, IMO).

Thoughts for the Future/Wish List/To Do

Conclusions

I hope that even during its development, you find dieharder useful. Remember, it is fully open source, so you can freely modify and redistribute the code according to the rules laid out in the Gnu Public License (version 2b), which might cost you as much as a beer one day. In particular, you can easily add random number generators using the provided examples as templates, or you can add tests of your own by copying the general layout of the existing tests (working toward a p-value per run, cumulating (say) 100 runs, and turning the resulting KS test into an overall p-value). Best of all, you can look inside the code and see how the tests work, which may inspire you to create a new test -- or a new generator that can pass a test.

To conclude, if you have any interest in participating in the development of dieharder, be sure to let me know, especially if you have decent C coding skills (including familiarity with Subversion and the GSL) and a basic knowledge of statistics. I even have documents to help with the latter, if you have the programming skills and want to LEARN statistics. Bug reports or suggestions are also welcome.

Submit bug reports, etc. to

rgb at phy dot duke dot edu