RE: [equinox-dev] interning strings in RegistryCacheReader
There must be a lot of uniquification going on, because when I changed every
call to set the intern flag to true it saved only a small amount. YourKit 3.0
beta says there were 4182 fewer String objects for a total savings of 388K
(which left 189,331 String objects for a total of 18M). So interning in
RegistryCacheReader may not be worth it.
With the tool I can only see the first 500 string values, but without the
intern change I can eyeball dozens of duplicates of these strings near the
top:
org.eclipse.ui.defaultAcceleratorConfiguration
org.eclipse.ui.contexts.dialogAndWindow
org.eclipse.ui.textEditorScope
org.eclipse.jdt.ui.javaEditorScope
Looking at this small sample, maybe uniquification is not working for
KeySequenceBindingDefinitions.
(I wish this tool would produce a report of duplicate Strings; maybe they
will accept an enhancement request.)
Looking at the memory dump more closely, the org.eclipse.core package was
only responsible for 39% of the memory usage on startup. jdt was 32%, pde 13%,
ui 5%, osgi 3%, jface 2%, and swt 1%.
Most of core's memory is in the indexing package, which is at 29% (11M just
in core.internal.indexing.Buffer), followed by resources at 7% (ResourceInfo,
ProjectInfo) and dtree at 5% (DeltaDataTree).
Most of jdt is in the corext TypeInfo array at 19%, and
related things like JavaModelCache 6%. The jdt ui only takes 4%, mostly in the
javaeditor ASTProvider.
Most of pde is in the Plugin extensions vector at 10%.
244 Plugin classes took 5.3M. Most of that (8% of total) was in the
PluginElement attributes Hashtable, 4% (1.8M) in the value strings for
PluginAttribute.
This covers only the live objects once startup is done; I didn't look at
garbage-collected objects.
I don't know whether you already have this information, but I hope it's
helpful. I just started looking at this to figure out a) why my workbench
takes so long to come up, and b) why it is so slow doing garbage collection
when I haven't used it in a while, no matter what memory options I use.
There are two notions of interning at play here. There is String.intern(),
and then there is the uniquification(tm) of objects read from the registry.
You will note that in readCachedString() there is a case for the value coming
in from the stream being an index. This allows for uniquification of Strings
within the registry itself. The net result is that on the second run, all
strings written and read using the cached version of the method are
written/read once. There is then a separate choice as to whether or not that
string should also be interned using String.intern().
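
To make the distinction concrete, here is a minimal sketch of index-based
uniquification in a cache stream. It is not the actual RegistryCacheReader
code: the class StringTableSketch, the LITERAL/INDEX tags, the stream layout
and the roundTrip() helper are all assumptions made up for illustration. Only
the idea (a repeated value is stored as an index, and String.intern() is a
separate, optional step) comes from the description above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; names and stream format are assumptions.
final class StringTableSketch {
    static final byte LITERAL = 0; // the full string follows
    static final byte INDEX = 1;   // an int index of an earlier string follows

    static final class Writer {
        private final Map<String, Integer> seen = new HashMap<>();

        void writeCachedString(DataOutputStream out, String s) throws IOException {
            Integer index = seen.get(s);
            if (index != null) {          // repeated value: store only its index
                out.writeByte(INDEX);
                out.writeInt(index);
                return;
            }
            seen.put(s, seen.size());     // first occurrence: remember its position
            out.writeByte(LITERAL);
            out.writeUTF(s);
        }
    }

    static final class Reader {
        private final List<String> table = new ArrayList<>();

        String readCachedString(DataInputStream in, boolean intern) throws IOException {
            if (in.readByte() == INDEX) {
                return table.get(in.readInt()); // same object as the first read
            }
            String s = in.readUTF();
            if (intern) {
                s = s.intern();           // separate, optional JVM-wide sharing
            }
            table.add(s);
            return s;
        }
    }

    // Writes the values to an in-memory stream and reads them back.
    static String[] roundTrip(String[] values, boolean intern) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            Writer writer = new Writer();
            for (String v : values) {
                writer.writeCachedString(out, v);
            }
            out.flush();
            DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
            Reader reader = new Reader();
            String[] result = new String[values.length];
            for (int i = 0; i < values.length; i++) {
                result[i] = reader.readCachedString(in, intern);
            }
            return result;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In this sketch, two equal values written to the same stream come back as one
shared object even with intern set to false, which would explain why turning
the intern flag on everywhere saves little. Copies that come from other
sources (constants, other files) are shared only if both sides are interned.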
When you look through the objects using YourKit, see if you can compare
identity to check whether the strings that appear as duplicates really are
duplicates. In many cases I would expect up to about 3 copies of a string,
between ones read from files, ones in constants, etc. If you start seeing
more than that, it gets interesting.
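
As a rough illustration of the identity check, the sketch below counts, for
each string value, how many distinct String objects carry it. It assumes you
can get the live strings into a List somehow (for example from a profiler
export); that step, and the DuplicateStringReport class itself, are
hypothetical and not something YourKit or Eclipse provides as far as this
thread says.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper; how the list of live strings is obtained is left open.
final class DuplicateStringReport {

    /** Maps each string value to the number of distinct String objects holding it. */
    static Map<String, Integer> distinctCopies(List<String> live) {
        // Group by equals(), but collect the members by identity (==).
        Map<String, Set<String>> byValue = new HashMap<>();
        for (String s : live) {
            byValue
                .computeIfAbsent(s, k ->
                    Collections.newSetFromMap(new IdentityHashMap<String, Boolean>()))
                .add(s);
        }
        // Sorted output makes the report easier to scan.
        Map<String, Integer> copies = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : byValue.entrySet()) {
            copies.put(e.getKey(), e.getValue().size());
        }
        return copies;
    }
}
```

A count of 1 means every reference points at the same object, so the apparent
duplicates in the profiler view are just repeated references. Counts well
above 3 would be the interesting cases mentioned above.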
More generally, we are very much interested in ideas for how to improve this
model. IMHO it was flawed from the outset, but we did not notice until it was
too late. 3.0 maintains soft references to configuration elements, so at
least some of this stuff goes away, but in many cases (in the UI in
particular) people keep pointers to the registry structure. This greatly
inhibits the runtime's ability to manage the space. Our challenge now is to
a) come up with a better model and b) introduce it in a compatible way.
Please (please!) contribute ideas if you have them.
Jeff