Re: [equinox-dev] interning strings in RegistryCacheReader
There are two notions of interning at play here: String.intern(), and
the uniquification(tm) of objects read from the registry. You will note
that in readCachedString() there is a case for the value coming in from
the stream being an index. This allows for uniquification of Strings
within the registry itself. The net result is that on the second run,
each string written and read using the cached version of the method is
written/read only once. There is then a separate choice as to whether or
not that string should also be interned using String.intern().
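
To make the distinction concrete, here is a minimal sketch of index-based uniquification within a stream, with String.intern() as a separate, optional step. The tag bytes, the class names, and the stream layout are assumptions made for illustration; this is not the actual RegistryCacheReader format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: tags and layout are assumptions, not the real cache format.
class StringTableSketch {
    static final byte NEW_STRING = 0; // the full string follows
    static final byte INDEX = 1;      // the index of an earlier string follows

    static class Writer {
        private final Map<String, Integer> seen = new HashMap<>();
        private final DataOutputStream out;

        Writer(DataOutputStream out) { this.out = out; }

        void writeCachedString(String s) throws IOException {
            Integer index = seen.get(s);
            if (index != null) {
                // Already written once: emit only its index.
                out.writeByte(INDEX);
                out.writeInt(index);
            } else {
                seen.put(s, seen.size());
                out.writeByte(NEW_STRING);
                out.writeUTF(s);
            }
        }
    }

    static class Reader {
        private final List<String> table = new ArrayList<>();
        private final DataInputStream in;

        Reader(DataInputStream in) { this.in = in; }

        String readCachedString(boolean intern) throws IOException {
            if (in.readByte() == INDEX) {
                // Reuse the object read earlier from this same stream.
                return table.get(in.readInt());
            }
            String s = in.readUTF();
            if (intern) {
                s = s.intern(); // separate choice: also share with the JVM-wide table
            }
            table.add(s);
            return s;
        }
    }

    // Writes the values to a byte array and reads them back.
    static String[] roundTrip(boolean intern, String... values) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            Writer writer = new Writer(out);
            for (String v : values) {
                writer.writeCachedString(v);
            }
            out.flush();
            Reader reader = new Reader(
                new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            String[] result = new String[values.length];
            for (int i = 0; i < values.length; i++) {
                result[i] = reader.readCachedString(intern);
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With intern set to false, repeated values read from one stream still share a single object, but that object is distinct from an equal string constant elsewhere in the JVM. Passing true removes that last copy at the cost of an entry in the intern() table.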

When you look through the objects using YourKit, see if you can compare
identity to check whether the strings that appear as duplicates really
are duplicates. In many cases I would expect up to about 3 copies of a
string, between ones read from files, ones in constants, etc. If you
start seeing more than that, it gets interesting.
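
As a rough illustration of what "compare identity" means here, the sketch below counts how many distinct String objects carry each value. It assumes the strings have somehow been collected into a list; the class and method names are hypothetical, and a profiler would normally do this for you.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, for illustration only.
class DuplicateStringSketch {
    // For each distinct value (equals), counts the distinct objects (==) holding it.
    static Map<String, Integer> copiesPerValue(List<String> strings) {
        Map<String, Set<String>> objectsByValue = new HashMap<>();
        for (String s : strings) {
            objectsByValue
                .computeIfAbsent(s, k ->
                    Collections.newSetFromMap(new IdentityHashMap<String, Boolean>()))
                .add(s); // the identity-based set ignores repeated references to one object
        }
        Map<String, Integer> counts = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : objectsByValue.entrySet()) {
            counts.put(e.getKey(), e.getValue().size());
        }
        return counts;
    }
}
```

A count of 1 means the apparent duplicates are many references to one object; a count above 1 means real copies are being held.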

More generally, we are very much interested in ideas for how to improve
this model. IMHO it was flawed from the outset, but we did not notice
until it was too late. 3.0 maintains soft references to configuration
elements, so at least some of this stuff goes away, but in many cases
(in the UI in particular) people keep pointers to the registry
structure. This greatly inhibits the runtime's ability to manage the
space. Our challenge now is to a) come up with a better model and
b) introduce it in a compatible way.
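
The effect of clients holding pointers can be sketched as follows. This is a generic soft-reference holder, not the actual registry implementation; the class name, the loader, and the simulateClear() helper are all assumptions for illustration.

```java
import java.lang.ref.SoftReference;
import java.util.function.Supplier;

// Hypothetical sketch of a softly held, reloadable element.
class SoftCachedElement<T> {
    private final Supplier<T> loader; // e.g. re-reads the element from the cache file
    private SoftReference<T> ref = new SoftReference<>(null);
    int loads = 0; // number of (re)loads, kept for illustration only

    SoftCachedElement(Supplier<T> loader) {
        this.loader = loader;
    }

    synchronized T get() {
        T value = ref.get();
        if (value == null) {
            // Never loaded, or cleared by the collector: load it again.
            value = loader.get();
            loads++;
            ref = new SoftReference<>(value);
        }
        return value;
    }

    // Stands in for the collector clearing the reference under memory pressure.
    synchronized void simulateClear() {
        ref.clear();
    }
}
```

The holder can only give memory back if nobody else keeps a strong reference to the value returned by get(). A client that stores that value in a field keeps the object alive regardless, which is the situation described above.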

Please (please!) contribute ideas if you have them.

Jeff

"Ed Burnette" <Ed.Burnette@xxxxxxx>
Sent by: equinox-dev-admin@xxxxxxxxxxx
07/21/2004 12:42 PM
Please respond to: equinox-dev
To: <equinox-dev@xxxxxxxxxxx>
cc:
Subject: [equinox-dev] interning strings in RegistryCacheReader

I've been looking at Eclipse startup in YourKit
3.0 beta and about half of the memory used is taken up with Strings. I
looked at the strings and the same strings are repeated over and over again,
for example "org.eclipse.ui.defaultAcceleratorConfiguration".
I traced this back to the org.eclipse.core.internal.registry.RegistryCacheReader
class. It has two methods, readString() and readCachedString(), which take
an 'intern' boolean parameter that would cause String.intern() to be called
on the strings, thus eliminating the dups. Only a few callers pass true
to these methods, though.
http://www.eclipse.org/eclipse/development/performance/bloopers.html
talks about this a little and it says "On some JVM implementations
the performance of intern() degrades dramatically. Interning the registry
strings eagerly and early seeds the intern() table increasing the collision
rate". This makes it sound like at some point in the past, somebody
tried using intern() all the time and didn't like the results. Can
anybody shed some light on the design decision not to use intern() and
whether or not this caveat is still true?