[eclipselink-dev] Some ideas for EclipseLink 3.0.0
Hello,
I'm not sure if this is the correct place for this kind of mail - if
not, please let me know where to post it instead.
We have been working with EclipseLink for a long time, and I'd like to
share some thoughts from our work with it, along with
our proposals for the upcoming EclipseLink release 3.0.0.
I'm sorry that this got a little long, but hopefully it will be of some
help to you.
Most of them are performance or correctness related and I'm quite
confident that a significant number of users might benefit from the
following improvements:
1. Implicitly use (OUTER) JOIN FETCH for EAGERly loaded ManyToOne
references
If an entity contains a ManyToOne reference that is loaded eagerly,
either because it uses FetchType.EAGER (the default) or because the
weaving required for lazy loading is disabled, EclipseLink issues a
separate (sub-)query for each of those references when retrieving the
entity.
With complex hierarchies of such ManyToOne references, this may
result in a dramatic performance drop, as a single simple query
might trigger many additional queries and round trips to
the database.
IMHO, EclipseLink should implicitly use an (OUTER) JOIN FETCH for
fetching EAGER references
(maybe as a configurable option of the Session/EntityManager).
This should affect all queries irrespective of the source of such a
query (e.g. find(), transparent indirection containers, Query).
To avoid overly large or deep JOIN FETCH queries, there should be a
limit (ideally configurable) on how many references (or levels)
are considered for implicit JOIN FETCH queries.
We had scenarios where the introduction of a JOIN FETCH increased
single-threaded throughput by factors of 5-10.
I would love to see that kind of "tuning" done automatically by
EclipseLink.
(BTW, LAZY loaded references can only help here in cases where the
ManyToOne reference is actually NOT accessed, otherwise the
performance impact matches the one for eagerly fetched ManyToOne
references).
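To make the proposed depth limit a bit more concrete, here is a minimal
sketch in plain Java. It is not EclipseLink API: JoinFetchPlanner,
addEagerManyToOne, planPaths and maxDepth are names I made up, and the
entity model is reduced to a map of eager ManyToOne attributes. It only
shows how a level limit could decide which reference paths an implicit
JOIN FETCH covers.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not EclipseLink API: picks the eager ManyToOne paths
// that an implicit JOIN FETCH would cover, bounded by a maximum depth.
class JoinFetchPlanner {
    // entity name -> (attribute name -> target entity name), eager ManyToOne only
    private final Map<String, Map<String, String>> eagerRefs = new LinkedHashMap<>();

    void addEagerManyToOne(String entity, String attribute, String target) {
        eagerRefs.computeIfAbsent(entity, k -> new LinkedHashMap<>()).put(attribute, target);
    }

    // Returns dotted attribute paths (e.g. "customer.address") up to maxDepth levels.
    List<String> planPaths(String rootEntity, int maxDepth) {
        List<String> paths = new ArrayList<>();
        collect(rootEntity, "", 1, maxDepth, paths);
        return paths;
    }

    private void collect(String entity, String prefix, int depth, int maxDepth, List<String> out) {
        if (depth > maxDepth) {
            return; // deeper references would still be loaded by separate queries
        }
        Map<String, String> refs = eagerRefs.getOrDefault(entity, Map.of());
        for (Map.Entry<String, String> ref : refs.entrySet()) {
            String path = prefix.isEmpty() ? ref.getKey() : prefix + "." + ref.getKey();
            out.add(path);
            // The depth bound also terminates self-referencing entities.
            collect(ref.getValue(), path, depth + 1, maxDepth, out);
        }
    }
}
```

With Order -> customer -> address -> country registered and a limit of
2, the planner yields "customer" and "customer.address"; the country
reference would still be fetched separately.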
2. Enable a way to provide query hints to transparent indirection
containers
Currently, I'm not aware of a way to provide query hints for transparent
indirection containers like IndirectSet.
This prevents us from providing performance-related query hints such as
join/batch fetches or FetchSize
(ideally with the option of specifying them globally, e.g. use a
fetchSize of xyz for all indirect container loads).
3. Improve sequence caching
Using sequence caching is a good idea in many scenarios.
However, there's still room for improvement:
- Introduce a configuration option to share the sequence cache across
transactions. Currently, it looks like each transaction has its own
sequence cache.
This lowers the cache efficiency (and causes unnecessary database round
trips) for transactions with only a few persists.
In most cases, especially where sequence values are retrieved
from a globally synchronized database sequence generator, there's no
reason why the sequence cache should not be shared across
transactions/client sessions.
- Expose the sequence cache through an official API so that non-JPA
code can also benefit from EclipseLink sequence caching,
hence improving its efficiency even more.
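The idea of a cache shared across transactions can be sketched roughly
as follows. This is plain Java, not EclipseLink code:
SharedSequenceCache and its nextBlockStart supplier are hypothetical
names, and the sketch assumes the database sequence increments by the
allocation size, so each database call reserves a whole block.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch, not EclipseLink API: one sequence cache shared by all
// transactions. Assumes the database sequence increments by allocationSize and
// each call to nextBlockStart returns the first value of a freshly reserved block.
class SharedSequenceCache {
    private final LongSupplier nextBlockStart; // one database round trip per call
    private final int allocationSize;
    private long next;     // next value to hand out
    private long blockEnd; // exclusive upper bound of the current block

    SharedSequenceCache(LongSupplier nextBlockStart, int allocationSize) {
        if (allocationSize < 1) {
            throw new IllegalArgumentException("allocationSize must be >= 1");
        }
        this.nextBlockStart = nextBlockStart;
        this.allocationSize = allocationSize;
    }

    // Thread-safe, so any transaction (or non-JPA code) can draw from the same block.
    synchronized long nextValue() {
        if (next >= blockEnd) {
            next = nextBlockStart.getAsLong(); // reserve a new block
            blockEnd = next + allocationSize;
        }
        return next++;
    }
}
```

With an allocation size of 50, handing out 120 values costs three
database calls in total, no matter how many transactions requested
them.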
4. Make EntityManager.getReference() database roundtrip free
Currently, there is little difference in behavior between
EntityManager.find() and EntityManager.getReference().
Both issue database calls, which the JPA spec does not require for the
latter. This lowers the efficiency of getReference().
Imagine that you just want to link an entity (with a known PK) to a
newly persisted one - getReference() would be perfect for that job
without the need of fetching anything from the database (e.g. by
returning a proxy).
Other JPA implementations do better here (Hibernate for example).
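What I mean by "returning a proxy" could look roughly like this. It is
a plain-Java sketch built on java.lang.reflect.Proxy, not how
EclipseLink or any other provider actually implements it; the Customer
interface, LazyReferences and the loader function are hypothetical.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.function.Function;

// Hypothetical entity contract used only for this sketch.
interface Customer {
    Long getId();
    String getName();
}

// Hypothetical sketch, not a JPA provider API: a reference that knows its
// primary key and loads the real entity only when another attribute is used.
class LazyReferences {
    static Customer customerReference(Long id, Function<Long, Customer> loader) {
        InvocationHandler handler = new InvocationHandler() {
            private Customer target; // loaded on first real access

            @Override
            public synchronized Object invoke(Object proxy, Method method, Object[] args)
                    throws Throwable {
                if (method.getName().equals("getId") && method.getParameterCount() == 0) {
                    return id; // answered from the known key, no database call
                }
                if (target == null) {
                    target = loader.apply(id); // the only round trip
                }
                try {
                    return method.invoke(target, args);
                } catch (InvocationTargetException e) {
                    throw e.getCause(); // surface the entity's own exception
                }
            }
        };
        return (Customer) Proxy.newProxyInstance(
                Customer.class.getClassLoader(), new Class<?>[] {Customer.class}, handler);
    }
}
```

Linking such a reference to a newly persisted entity only needs the key,
so the loader would never be invoked in that scenario.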
5. Add support for retrieving detached entities within transactions
Many entities retrieved by a JPA provider are not changed at all.
Often, this is known at development time. If EclipseLink supported
some kind of hint (e.g. READ_DETACHED)
to return a detached entity (even within a transaction), a significant
amount of work and memory for building and managing the backup clone
could be saved.
Initially, I thought QueryHints.READ_ONLY would be exactly what I
need; however, according to the documentation,
it can only be used with the shared cache and for
non-transactional queries. Neither applies in our use case.
6. Avoid StackOverflowError for certain entity models
Imagine a single entity which represents some kind of linked list
element. Each record/entity references a previous record to define
the chain
(by using an EAGERly fetched ManyToOne reference).
The very first element has a null reference to the previous record.
Even though this kind of data structure is perfectly valid, EclipseLink
has some issues with it:
When the "chain" grows, EclipseLink will sooner or later throw a
StackOverflowError.
The reason for this is that eagerly fetched references are processed
using a recursion-based approach.
For each element, the stack grows, and we had cases where a
30-element chain already caused a StackOverflowError with default stack
size
settings in an enterprise-grade Linux environment. I'm not aware of any
workaround
(besides using a lazy previous reference or increasing the stack size)
to avoid that issue.
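For comparison, resolving such a chain with a loop instead of recursion
keeps the stack depth constant. The following is a plain-Java sketch
under my own assumptions, not EclipseLink internals: ChainRow,
ChainElement, IterativeChainLoader and the fetchRow function (standing
in for one database read per record) are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongFunction;

// Hypothetical row of the linked-list table: own id plus id of the previous record.
class ChainRow {
    final long id;
    final Long previousId; // null for the very first element

    ChainRow(long id, Long previousId) {
        this.id = id;
        this.previousId = previousId;
    }
}

// Hypothetical entity with an eagerly resolved reference to its predecessor.
class ChainElement {
    final long id;
    ChainElement previous;

    ChainElement(long id) {
        this.id = id;
    }
}

class IterativeChainLoader {
    // Resolves the eager 'previous' references in a loop, so the call stack
    // depth does not depend on the length of the chain.
    static ChainElement load(long startId, LongFunction<ChainRow> fetchRow) {
        Map<Long, ChainElement> built = new HashMap<>(); // identity map, also guards cycles
        ChainElement head = new ChainElement(startId);
        built.put(startId, head);
        ChainElement current = head;
        ChainRow row = fetchRow.apply(startId);
        while (row.previousId != null) {
            ChainElement known = built.get(row.previousId);
            if (known != null) {
                current.previous = known; // already built: link it and stop
                break;
            }
            ChainElement previous = new ChainElement(row.previousId);
            built.put(row.previousId, previous);
            current.previous = previous;
            current = previous;
            row = fetchRow.apply(row.previousId);
        }
        return head;
    }
}
```

A chain of 100,000 elements loads this way with default stack settings,
because each element costs one loop iteration rather than one stack
frame.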
7. Exploit parallelism of CPU bound tasks
Some tasks within EclipseLink are quite CPU-intensive, e.g. change
tracking calculations or creating backup clones in large transactions.
Throughput could be significantly increased in certain scenarios if
EclipseLink exploited the parallelism of such tasks by using multiple
threads
(this should be configurable).
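As a rough illustration of the kind of parallelism I have in mind, the
sketch below compares working copies against their backup clones using
a parallel stream. It is plain Java with entities modelled as attribute
maps; ParallelChangeDetector and changedIndexes are hypothetical names
and say nothing about how EclipseLink computes change sets internally.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical sketch, not EclipseLink internals: compares each working copy
// with its backup clone, optionally in parallel.
class ParallelChangeDetector {
    // Returns the indexes of entities whose working copy differs from its backup.
    static List<Integer> changedIndexes(List<Map<String, Object>> working,
                                        List<Map<String, Object>> backups,
                                        boolean parallel) {
        if (working.size() != backups.size()) {
            throw new IllegalArgumentException("working and backup lists differ in size");
        }
        IntStream indexes = IntStream.range(0, working.size());
        if (parallel) {
            indexes = indexes.parallel(); // the configurable switch
        }
        return indexes
                .filter(i -> !Objects.equals(working.get(i), backups.get(i)))
                .boxed()
                .collect(Collectors.toList()); // encounter order is preserved
    }
}
```

Each comparison is independent of the others, which is why this kind of
work can be split across threads without coordination; whether it pays
off depends on the transaction size.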
8. Weaving: Eliminate need for a backup clone in certain scenarios
This is something more experimental:
For entities with a significant number of mappings, the backup clone
adds significant CPU and memory overhead.
In certain scenarios where no or only a few fields are changed and
where weaving is used, alternative methods might perform much better
here, e.g. storing the original values in the original clone before
changing them
(so that the original clone contains both database and changed values).
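A woven setter along these lines might look roughly like the following
plain-Java sketch. TrackedPerson and its methods are hypothetical and
hand-written here; real weaving would generate the equivalent code, and
this is not what EclipseLink currently does.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch of a woven entity: it remembers the database value of an
// attribute the first time that attribute is overwritten, instead of relying
// on a full backup clone.
class TrackedPerson {
    private String name;
    private String city;
    private Map<String, Object> originals; // allocated lazily on first change

    TrackedPerson(String name, String city) {
        this.name = name;
        this.city = city;
    }

    void setName(String newName) {
        recordOriginal("name", name, newName);
        name = newName;
    }

    void setCity(String newCity) {
        recordOriginal("city", city, newCity);
        city = newCity;
    }

    private void recordOriginal(String attribute, Object oldValue, Object newValue) {
        if (Objects.equals(oldValue, newValue)) {
            return; // nothing changes, nothing to remember
        }
        if (originals == null) {
            originals = new HashMap<>();
        }
        if (!originals.containsKey(attribute)) {
            originals.put(attribute, oldValue); // keep only the database value
        }
    }

    // Attributes whose current value differs from the database value.
    Map<String, Object> changes() {
        Map<String, Object> result = new HashMap<>();
        if (originals != null) {
            for (Map.Entry<String, Object> e : originals.entrySet()) {
                Object current = e.getKey().equals("name") ? name : city;
                if (!Objects.equals(current, e.getValue())) {
                    result.put(e.getKey(), current);
                }
            }
        }
        return result;
    }
}
```

An unchanged entity carries no extra state at all, and a changed one
only stores the original values of the attributes that were actually
touched.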
9. Address bug reports that affect correctness
Correctness should be the most crucial feature for any ORM.
Please have a look at corresponding bug reports that affect correctness,
e.g.:
349477 (42 votes)
391279 (35 votes)
371743 (16 votes)
247662 (15 votes)
416837 (12 votes)
467470 (12 votes)
10. Care about startup time
EclipseLink takes (relatively) long to start up with large
persistence units and/or classpaths.
Most of the time is spent in I/O operations, which could be avoided in
many scenarios (maybe configurable).
In certain short-lived EntityManagerFactory scenarios (e.g.
unit/automated testing) it matters significantly
whether EclipseLink needs 1 or 3 seconds to start up (please also take
a look at bug 352845).
11. Address open bug reports
Currently, there are ~155 open, unresolved and unassigned
critical/blocker bug reports for EclipseLink in Bugzilla.
The same is true for feature requests with a significant number of
votes.
I'm sure the community would appreciate it if some of them finally got
a response.
Finally, I'd like to thank all involved developers/companies for their
great work related to EclipseLink!
Regards,
Patric