[platform-releng-dev] ACTION required: failing performance tests
We have been seeing performance test failures for several builds now.
I therefore did a quick analysis of some of the failing performance tests
and came to the conclusion that some tests fail because they are
too fragile.
Quite a few of the tests fail because the measured time (CPU_TIME,
ELAPSED_TIME)
is too close to the granularity of the timing mechanism of the
underlying platform.
Here is an example:
The Javadoc comment of System.currentTimeMillis() states:
/**
 * Returns the current time in milliseconds. Note that
 * while the unit of time of the return value is a millisecond,
 * the granularity of the value depends on the underlying
 * operating system and may be larger. For example, many
 * operating systems measure time in units of tens of
 * milliseconds.
 */
So if you are measuring an operation that always takes exactly 100ms,
you might get values in the range 90-110ms due to the granularity of
the underlying timing mechanism.
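You can probe this granularity directly by busy-waiting until the clock ticks to its next value. A minimal sketch (the class and method names here are mine, not part of any framework):

```java
public class TimerGranularity {
    // Returns the smallest observed step of System.currentTimeMillis(),
    // found by spinning until the reported value changes.
    static long smallestStep(int probes) {
        long min = Long.MAX_VALUE;
        for (int i = 0; i < probes; i++) {
            long t0 = System.currentTimeMillis();
            long t1;
            while ((t1 = System.currentTimeMillis()) == t0) {
                // spin until the clock ticks
            }
            min = Math.min(min, t1 - t0);
        }
        return min;
    }

    public static void main(String[] args) {
        // On some platforms this prints 1 ms, on others 10-15 ms.
        System.out.println("smallest observed step: " + smallestStep(10) + " ms");
    }
}
```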
You can see a good example for this behavior here:
http://fullmoon.rtp.raleigh.ibm.com/downloads/performance/win/graphs/org.eclipse.swt.tests.junit.performance.Test_org_eclipse_swt_graphics_GC.test_drawStringLjava_lang_StringIIZ()_%20transparent_CPU%20Time.html
The assertPerformance(...) method of the performance plugin asserts
that measured values don't become more than 10% slower than the
reference value.
In our example the test would fail if the value becomes larger than
110ms (for a reference value of 100ms).
However, 110ms is still within the range of plausible results for a
100ms operation.
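The check itself is simple arithmetic; here is a hypothetical sketch of the 10% criterion (the actual assertPerformance(...) implementation differs and also handles statistics across runs):

```java
public class TenPercentCheck {
    // Hypothetical sketch of the 10% slowdown criterion; the real
    // assertPerformance(...) in the performance plugin differs.
    static boolean withinTenPercent(long measured, long reference) {
        return measured <= reference * 1.10;
    }

    public static void main(String[] args) {
        // 110ms still passes for a 100ms reference, but a single
        // granularity tick can push a genuine 100ms operation past it.
        System.out.println(withinTenPercent(110, 100)); // true
        System.out.println(withinTenPercent(111, 100)); // false
    }
}
```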
To fix this issue, I suggest modifying all performance tests that
produce timing results in the 100ms range
so that they "run longer".
A good lower bound is 1 second.
The important thing here is to "do more work" inside the start/stop
bracket of the performance meter,
and not to put only the start/stop bracket into a loop.
So this example doesn't help:
for (int i= 0; i < 10000; i++) {
    pm.start();
    my_100_microsecond_operation();
    pm.stop();
}
because it is possible that the time difference between the calls to
start() and stop() is 0 or flips between two values. And taking the average
of 10000 zeroes doesn't result in significantly better values.
So do this instead:
pm.start();
for (int i= 0; i < 10000; i++) {
    my_100_microsecond_operation();
}
pm.stop();
And if you want to make the performance framework automatically
calculate the average and standard deviation
over 10 runs, you can use two nested loops:
for (int n= 0; n < 10; n++) {
    pm.start();
    for (int i= 0; i < 10000; i++) {
        my_100_microsecond_operation();
    }
    pm.stop();
}
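To illustrate what the framework computes over those 10 runs, here is a self-contained sketch with a toy meter. The real org.eclipse.test.performance API differs; ToyMeter and my100MicrosecondOperation are stand-ins of my own:

```java
import java.util.ArrayList;
import java.util.List;

public class NestedLoopTiming {
    // Toy stand-in for the performance meter: records one elapsed-time
    // sample per start()/stop() pair, then computes average and standard
    // deviation over all samples, as the framework does over the 10 runs.
    static class ToyMeter {
        private long startTime;
        final List<Long> samples = new ArrayList<>();
        void start() { startTime = System.currentTimeMillis(); }
        void stop()  { samples.add(System.currentTimeMillis() - startTime); }
        double average() {
            double sum = 0;
            for (long s : samples) sum += s;
            return sum / samples.size();
        }
        double stdDev() {
            double avg = average(), sq = 0;
            for (long s : samples) sq += (s - avg) * (s - avg);
            return Math.sqrt(sq / samples.size());
        }
    }

    static void my100MicrosecondOperation() {
        // Stand-in workload; any short operation works here.
        Math.sqrt(Math.random());
    }

    public static void main(String[] args) {
        ToyMeter pm = new ToyMeter();
        for (int n = 0; n < 10; n++) {        // 10 samples for statistics
            pm.start();
            for (int i = 0; i < 10000; i++) { // enough work per sample
                my100MicrosecondOperation();
            }
            pm.stop();
        }
        System.out.println("average: " + pm.average()
                + " ms, stddev: " + pm.stdDev() + " ms");
    }
}
```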
Therefore please make a pass over your (failing) performance tests
and make them more robust where necessary.
Yes, most likely the modified tests will fail even more at first
(because you made them slower).
But this shouldn't be a problem, because we'll regenerate the
reference data weekly,
and it will include the modified tests as well. So those test failures
should disappear after each Friday's run.
Thanks for your attention.
--andre