Hi Daniel,
I like that you are thinking about making things faster there.
I was also wondering about switching the DI (delta indexing) checking to a
batch-oriented process, as you did with the Router and Listeners. At the moment
each record is checked and added to the DB individually, and I could imagine
that processing sets of N records at a time would be faster.
But I don't know the code, so I can't tell whether it is feasible.
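To make the "sets of N" idea concrete, here is a minimal sketch. It assumes a hypothetical batch write that accepts a whole chunk of record IDs at once (the class and method names are made up; I don't know the actual DI API). If the DI store is backed by JDBC, such a batch write could use PreparedStatement.addBatch()/executeBatch(), but that is an assumption about the storage, not a description of it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical illustration only: not the SMILA delta indexing API.
public class BatchedDeltaIndexing {

    /** Splits items into consecutive chunks of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    /**
     * Hands each chunk to a (hypothetical) batch writer, so the DB sees one
     * round trip per chunk instead of one per record. Returns the number of
     * batch writes performed.
     */
    static <T> int processInBatches(List<T> items, int batchSize, Consumer<List<T>> batchWriter) {
        List<List<T>> batches = partition(items, batchSize);
        for (List<T> batch : batches) {
            batchWriter.accept(batch);
        }
        return batches.size();
    }

    public static void main(String[] args) {
        List<String> ids = List.of("r1", "r2", "r3", "r4", "r5");
        int calls = processInBatches(ids, 2, batch -> System.out.println("writing " + batch));
        System.out.println(calls + " batch writes instead of " + ids.size());
    }
}
```

The right N would have to be measured; larger batches mean fewer round trips but more work lost or retried when a batch fails.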
Kind regards
Thomas Menzel @ brox IT-Solutions GmbH
From: smila-dev-bounces@xxxxxxxxxxx
[mailto:smila-dev-bounces@xxxxxxxxxxx] On Behalf Of Daniel.Stucky@xxxxxxxxxxx
Sent: Monday, 17 August 2009 11:27
To: smila-dev@xxxxxxxxxxx
Subject: [smila-dev] CrawlerController - ConnectivityManager interaction
Hi all,
I am a little dissatisfied with the way the CrawlerController interacts with the
ConnectivityManager and the internal Router.
As the API is currently designed, the CrawlerController gets feedback for each
invocation of the method add(), and internally the ConnectivityManager gets
feedback for each invocation of route().
Adding records to the Storages via the Blackboard can be a time-consuming
operation, and we have to wait for it to complete before we can insert a
message into the Queue. This is currently done in a simple loop, and all
callers are blocked until every record has been added to the Queue (or the
attempt has failed) and each method has generated its return value.
Do we really need the return values of the methods add() and route()? I think
we should strive for more asynchronous processing of incoming records in the
ConnectivityManager to increase throughput. I don't think clients of the
ConnectivityManager need this kind of feedback. Errors on single records are
still logged in the ConnectivityManager and could also be made available (to
some extent) via JMX.
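A rough sketch of what a fire-and-forget add() could look like, assuming a worker pool inside the ConnectivityManager. The class name and the storeAndEnqueue callback (standing in for "add to the Storages via the Blackboard, then send to the Queue") are hypothetical, not the current API. Errors are logged and counted; counters like these could be exposed as JMX attributes.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical illustration only: not the actual ConnectivityManager.
public class AsyncConnectivitySketch {
    private static final Logger LOG = Logger.getLogger(AsyncConnectivitySketch.class.getName());

    private final ExecutorService workers;
    // Stand-in for the Blackboard add plus Queue send (assumption).
    private final Consumer<String> storeAndEnqueue;
    private final AtomicLong processed = new AtomicLong();
    private final AtomicLong failed = new AtomicLong();

    AsyncConnectivitySketch(int threads, Consumer<String> storeAndEnqueue) {
        this.workers = Executors.newFixedThreadPool(threads);
        this.storeAndEnqueue = storeAndEnqueue;
    }

    /** Returns immediately; no per-record result is reported to the caller. */
    public void add(String recordId) {
        workers.execute(() -> {
            try {
                storeAndEnqueue.accept(recordId);
                processed.incrementAndGet();
            } catch (RuntimeException e) {
                // Errors on single records are logged and counted, not returned.
                failed.incrementAndGet();
                LOG.log(Level.WARNING, "could not add record " + recordId, e);
            }
        });
    }

    /** Counters like these could be published as JMX attributes. */
    public long getProcessedCount() { return processed.get(); }
    public long getFailedCount() { return failed.get(); }

    /** Stops accepting records and waits (up to a minute) for queued ones to finish. */
    public void shutdown() throws InterruptedException {
        workers.shutdown();
        workers.awaitTermination(1, TimeUnit.MINUTES);
    }

    public static void main(String[] args) throws InterruptedException {
        AsyncConnectivitySketch cm = new AsyncConnectivitySketch(4, id -> {
            if (id.endsWith("3")) throw new IllegalStateException("storage error");
        });
        for (int i = 0; i < 10; i++) {
            cm.add("record" + i);
        }
        cm.shutdown();
        System.out.println(cm.getProcessedCount() + " ok, " + cm.getFailedCount() + " failed");
    }
}
```

Note that Executors.newFixedThreadPool uses an unbounded work queue, so a real version would probably want a bounded queue to apply back-pressure to a fast crawler.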
Another option would be to use multithreading in the CrawlerController
(currently there is only one thread), but that could make crawler
implementations more difficult.
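To illustrate why this could complicate crawlers: in a hypothetical multithreaded controller, several threads pull records from one crawler, so the crawler's next() must be thread-safe and hand out each record exactly once. All names below are made up for the sketch; the blocking add callback stands in for the current ConnectivityManager.add().

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical illustration only: not the actual CrawlerController.
public class MultiThreadedControllerSketch {

    /** A crawler shared by several threads must synchronize access to its state. */
    static final class SynchronizedCrawler {
        private final Iterator<String> source;

        SynchronizedCrawler(Iterator<String> source) {
            this.source = source;
        }

        /** Returns the next record id, or null when the crawl is finished. */
        synchronized String next() {
            return source.hasNext() ? source.next() : null;
        }
    }

    /** Runs `threads` workers that each pull records and call the blocking add. */
    static int crawl(SynchronizedCrawler crawler, int threads, Consumer<String> add)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger count = new AtomicInteger();
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                String id;
                while ((id = crawler.next()) != null) {
                    // Blocking call; an exception here would end this worker's loop,
                    // so real code would need error handling.
                    add.accept(id);
                    count.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return count.get();
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            ids.add("record" + i);
        }
        Set<String> seen = ConcurrentHashMap.newKeySet();
        int n = crawl(new SynchronizedCrawler(ids.iterator()), 4, seen::add);
        System.out.println(n + " records added, " + seen.size() + " distinct");
    }
}
```

Every crawler would have to meet this thread-safety requirement, which is the extra burden on crawler implementations mentioned above.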
Any thoughts or comments?
Bye,
Daniel