[platform-releng-dev] ACTION required: failing performance tests

We have been seeing performance test failures for several builds now.
I therefore did a quick analysis of some of the failing performance tests
and came to the conclusion that some of them fail because they are
too fragile.

Quite a few of the tests fail because the measured time (CPU_TIME, ELAPSED_TIME) is too close to the granularity of the timing mechanism of the underlying platform.

Here is an example:

The Javadoc comment of System.currentTimeMillis() states:
    /**
     * Returns the current time in milliseconds.  Note that
     * while the unit of time of the return value is a millisecond,
     * the granularity of the value depends on the underlying
     * operating system and may be larger.  For example, many
     * operating systems measure time in units of tens of
     * milliseconds.
     */

So if you measure an operation that always takes exactly 100ms,
you might still get values anywhere in the range of 90-110ms, simply
because of the granularity of the underlying timing mechanism.
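
If you want to see the actual granularity on your machine, here is a small
stand-alone probe (not part of the performance framework, just an illustration):
it busy-waits until currentTimeMillis() changes and prints the size of each jump.
On platforms with a coarse clock you will see jumps of 10-16ms rather than 1ms:

	public class TimerGranularity {
		public static void main(String[] args) {
			long previous= System.currentTimeMillis();
			for (int i= 0; i < 10; i++) {
				long now= System.currentTimeMillis();
				while (now == previous)
					now= System.currentTimeMillis();	// busy-wait for the next clock tick
				System.out.println("tick: " + (now - previous) + "ms");
				previous= now;
			}
		}
	}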

You can see a good example of this behavior here:

http://fullmoon.rtp.raleigh.ibm.com/downloads/performance/win/graphs/org.eclipse.swt.tests.junit.performance.Test_org_eclipse_swt_graphics_GC.test_drawStringLjava_lang_StringIIZ()_%20transparent_CPU%20Time.html

The assertPerformance(...) method of the performance plugin asserts that measured
values don't become more than 10% slower than the reference value.
In our example the test would fail if the value becomes larger than 110ms (for a reference value of 100ms). However, 110ms is still within the range of possible readings for a 100ms operation.
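
In other words (a hypothetical sketch of the check, not the plugin's actual code):

	long reference= 100;	// ms, value from the reference build
	long measured= 111;	// ms, value measured in the current build
	// the 10% criterion: fail if measured > reference * 1.1
	boolean tooSlow= measured > reference * 1.1;	// 111 > 110 --> test fails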


To fix this issue I suggest modifying all performance tests that produce timing results in the 100ms range
so that they "run longer".

A good lower bound is 1 second (for a 100 microsecond operation that means on the order of 10,000 iterations, as in the examples below).

The important thing here is to "do more work" inside the start/stop bracket of the performance meter,
rather than putting only the start/stop bracket into a loop.


So this example doesn't help:

	for (int i= 0; i < 10000; i++) {
		pm.start();
		my_100_microsecond_operation();
		pm.stop();
	}

because it is possible that the time difference between the calls to
start() and stop() is 0, or flips between two adjacent values. And taking the average
of 10000 zeroes doesn't result in a significantly better value.


So do this instead:

	pm.start();
	for (int i= 0; i < 10000; i++) {
		my_100_microsecond_operation();
	}
	pm.stop();

And if you want the performance framework to automatically calculate the average and standard deviation
over 10 runs, you can use two nested loops:

	for (int n= 0; n < 10; n++) {
		pm.start();
		for (int i= 0; i < 10000; i++) {
			my_100_microsecond_operation();
		}
		pm.stop();
	}
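
Putting it all together, a complete test might look like this (a sketch assuming
the org.eclipse.test.performance API; my_100_microsecond_operation() is just a
placeholder for the code under test):

	import junit.framework.TestCase;

	import org.eclipse.test.performance.Performance;
	import org.eclipse.test.performance.PerformanceMeter;

	public class MyPerformanceTest extends TestCase {

		public void testMyOperation() {
			Performance perf= Performance.getDefault();
			PerformanceMeter pm= perf.createPerformanceMeter(perf.getDefaultScenarioId(this));
			try {
				for (int n= 0; n < 10; n++) {		// 10 measured runs -> average and standard deviation
					pm.start();
					for (int i= 0; i < 10000; i++) {	// enough work for ~1 second per run
						my_100_microsecond_operation();
					}
					pm.stop();
				}
				pm.commit();			// store the measured values
				perf.assertPerformance(pm);	// compare against the reference data
			} finally {
				pm.dispose();
			}
		}

		private void my_100_microsecond_operation() {
			// placeholder for the operation under test
		}
	}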

Therefore please make a pass over your (failing) performance tests
and make them more robust where necessary.

Yes, most likely the modified tests will fail even more at first (because you made them slower). But this shouldn't be a problem: we regenerate the reference data weekly, and the regeneration uses the modified tests as well. So those test failures should disappear
after each Friday's run.


Thanks for your attention.
--andre

