I have done further investigation and I'm really worried
about what we've created...
Here is the experiment I have performed:
- Load Helios in its current format and force it to be rewritten using
  the new serialization format (I changed the code to force writing in
  the new format).
Here is a list of observations:
- The new format is significantly bigger. For example, 42M vs 36M in XML
  and 3.9M instead of 3.6M compressed.
- Massive increase in memory consumption:
  - No sharing of the expression object resulting from the parsing of the
    match expressions in each requirement.
  - Unnecessary string pooling of the complete match expression being loaded.
  - No pooling of the parameters (the parameters contain the id for IUs,
    packages, the versions, etc.).
- In short, it does not scale.
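To make the sharing and pooling points concrete, here is a minimal sketch of what a load-time pool could look like. It is not the p2 implementation: ParsedExpression and LoadPool are hypothetical names invented for illustration, and "parsing" is reduced to wrapping the source text.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a parsed match expression; not a p2 class. */
final class ParsedExpression {
    final String source;

    ParsedExpression(String source) {
        this.source = source;
    }
}

/** Hypothetical load-time pool, assumed to be discarded once loading is done. */
final class LoadPool {
    private final Map<String, ParsedExpression> expressions = new HashMap<>();
    private final Map<Object, Object> parameters = new HashMap<>();

    /** Number of times an expression actually had to be "parsed". */
    int parseCount = 0;

    /** Returns one shared parsed expression per distinct source text. */
    ParsedExpression expression(String source) {
        return expressions.computeIfAbsent(source, s -> {
            parseCount++; // a real parser would run here
            return new ParsedExpression(s);
        });
    }

    /** Returns one canonical instance per distinct parameter (ids, versions, ...). */
    @SuppressWarnings("unchecked")
    <T> T parameter(T value) {
        Object existing = parameters.putIfAbsent(value, value);
        return existing == null ? value : (T) existing;
    }
}
```

With something like this, every requirement that uses the same match text would hold the same expression instance, and equal parameters would collapse to one object. It assumes the parameter types implement equals and hashCode.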
That said, for the Helios release, given that none of the IUs
being generated by the publisher use the new match expressions, and
given that the metadata writer persists things in the 3.4 format as
much as possible, we will not be encountering these memory problems.
So my real question is: do we want to try to fix this new
serialization format for 3.6.0, or do we want to go out in the field
with this and define yet another format next year, knowing that we will
still have to support the 3.6 format?
PaScaL
On 2010-05-08, at 2:25 AM, Thomas Hallgren wrote:
Hi,
I agree that the XML encoded string that represents an expression is
ugly.
On 05/07/2010 06:32 PM, Pascal Rapicault wrote:
Hi,
While working on a solution to prevent RAP
and the IDE from being installed together (306709), I met the serialized
format of queries and I find it extremely unreadable (see example
below). On top of that, I'm also questioning the ability of this format
to compress as well as before.
So my few questions are:
1) can we make this format more readable?
We can write it out as a CDATA element, i.e.
<requirement min='0' max='0' greedy='false'>
<match>
<![CDATA[providedCapabilities.exists(x |
x.name == $0 && x.namespace ==
$1 && x.version >= $2 && x.version < $3)]]>
</match>
<matchParameters>
<![CDATA[['org.eclipse.rap.rwt', 'org.eclipse.equinox.p2.iu',
version('1.0.0'), version('2.0.0')]]]>
</matchParameters>
</requirement>
What does the old parser do when it encounters elements that it doesn't
recognize? I know that attributes are ignored. Does that also apply to
elements?
2) does this compress as well as before?
I can't see why not. It's all keywords, operators, and well known
entities.
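One quick way to check this is to compare the compressed size of the same metadata written in both formats. The sketch below is only an assumption about how such a check could be done: CompressedSize is a hypothetical helper, and it uses GZIP from the JDK as a stand-in for whatever compression the repository jar actually goes through.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper: size in bytes of a text once GZIP-compressed. */
final class CompressedSize {
    static int gzipSize(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the GZIP stream flushes the trailer, so size() is final afterwards.
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return buffer.size();
    }
}
```

Calling gzipSize on the old and the new serialization of the same repository would give two numbers to compare directly.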
3) is parsing as fast as before?
The QL parser is extremely fast, so I don't think its parsing time will be
measurable. The XML parser is exposed to an attribute with a lot of
entities in it, but my guess is that it's very optimized to deal with
that. The only way to find out is to write performance tests. An easy
test would be to force the serializer to write everything in this
format.
- thomas
_______________________________________________
p2-dev mailing list
p2-dev@xxxxxxxxxxx
https://dev.eclipse.org/mailman/listinfo/p2-dev