Re: General memory consumption of SMILA [message #644836 is a reply to message #644821]
Tue, 14 December 2010 12:06 |
Eclipse User |
Originally posted by: juergen.schumacher.attensity.com
On 14.12.2010, 11:45, Martin <martin.roebert@gmx.de> wrote:
> Hi, I have a question about how much memory SMILA needs to work properly
> and what its memory behaviour should be in general.
>
> We ran SMILA several times with different options, and the amount of
> memory used is somewhat... huge. From time to time SMILA even
> quits with an OutOfMemoryError, but I cannot reproduce this reliably.
>
> At the moment I have one crawl job with 200 listener threads running. After
> 21 hours of crawling I have a heap size of 2.2GB; the used heap lies
> between 1.9 and 2.1GB. Is this normal behaviour?
I'm not aware of any memory problems in any projects, but then I'm not using
the crawlers currently, and I don't have any other numbers for comparison.
Of course it depends very much on how large the records are that you are
pushing through the system. Depending on this, 200 listener threads may be
quite a lot for a single machine.
If you want to do further analysis, I think you will find the Eclipse Memory
Analyzer [http://eclipse.org/mat/] helpful. It shows you quite easily where
in the system large amounts of memory are being referenced.
Regards,
Juergen.
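[Editor's note: to feed MAT, a heap dump can be captured externally with `jmap -dump:live,format=b,file=heap.hprof <pid>`, or from inside the process via the HotSpot diagnostic MBean. A minimal sketch, assuming a HotSpot JVM (the `com.sun.management` API is not part of the Java standard); the file name is arbitrary:]

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.File;
import java.lang.management.ManagementFactory;

public class HeapDumpSketch {

    // Writes a heap dump in HPROF format to the given path for analysis in MAT.
    static void dump(String path) throws Exception {
        HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(),
                "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(path, true); // true = dump only live (reachable) objects
    }

    public static void main(String[] args) throws Exception {
        File file = new File("smila-heap.hprof");
        file.delete(); // dumpHeap refuses to overwrite an existing file
        dump(file.getPath());
        System.out.println("Dumped " + file.length() + " bytes to " + file);
    }
}
```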
Re: General memory consumption of SMILA [message #644897 is a reply to message #644840]
Tue, 14 December 2010 16:09 |
Eclipse User |
Originally posted by: juergen.schumacher.attensity.com
On 14.12.2010, 13:29, Martin <martin.roebert@gmx.de> wrote:
> Hi Jürgen,
>
> thanks for the fast answer.
>
> I crawl a site (average size 500KB per page according to FireBug) on a
> server with 4 cores (2.5GHz) and currently 4GB RAM. How many crawl jobs
> with how many threads are the top limit for this setup?
I cannot say for sure; it depends on how computationally intensive it is to
process the pages, but I would reckon that about 10, maybe 20 worker
listener threads should be enough. Then you can use as many crawl jobs as
needed to always have some messages in the work queue - you can use the JMX
monitoring of ActiveMQ to see this. But that's just guessing on my side.
I just asked a colleague who once did some tests, and he used 16 listener
threads. He tried up to 64 threads, but it didn't improve anything.
> Have any of your colleagues tested SMILA regarding memory consumption?
We once did tests to ensure that there are no memory leaks (and didn't find
any, of course ;-), but I don't think we have measured memory consumption
parameters yet.
Cheers,
Juergen
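[Editor's note: the same JMX layer mentioned here for watching ActiveMQ queue depths also exposes the JVM's own heap figures, so the numbers Martin quotes (used vs. total heap) can be logged from inside the process. A minimal sketch using only the standard `java.lang.management` API, no SMILA or ActiveMQ specifics:]

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class HeapWatch {

    // Returns the current heap usage of this JVM via the standard Memory MXBean.
    static MemoryUsage heap() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        return memory.getHeapMemoryUsage();
    }

    public static void main(String[] args) {
        MemoryUsage usage = heap();
        // getMax() may be -1 if no limit is set; >> 20 converts bytes to MB
        System.out.printf("heap used: %d MB of %d MB max%n",
                usage.getUsed() >> 20, usage.getMax() >> 20);
    }
}
```

Calling `heap()` periodically from a monitoring thread gives a cheap trend line without attaching an external tool.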
Re: General memory consumption of SMILA [message #645462 is a reply to message #645420]
Fri, 17 December 2010 10:11 |
Eclipse User |
Originally posted by: juergen.schumacher.attensity.com
On 17.12.2010, 09:33, Martin <martin.roebert@gmx.de> wrote:
> Hi Juergen,
>
>
> as you recommended I used MAT to look over a heap dump of SMILA.
>
> I got the following message:
>
> One instance of
> "org.eclipse.smila.connectivity.framework.crawler.web.WebSiteIterator",
> loaded by "...crawler.web", occupies 1,759,810,632 (87.58%) bytes. The
> memory is accumulated in one instance of "java.util.HashMap$Entry[]".
>
> I uploaded a picture of a part of the report at ImageShack, so you can
> have a look: http://img232.imageshack.us/img232/8839/matscreen.png
>
> Is my assumption correct that this Map contains the links from my
> crawled/parsed web pages that should be crawled in the future?
> Some facts about my web.xml: it contains 15 seeds and does a deep search:
> <CrawlingModel Value="5" Type="MaxDepth"></CrawlingModel>
> <CrawlScope Type="Host"></CrawlScope>
>
> Any hints on that?
No, not really; I do not know the crawler code very well. Maybe someone else
can comment better. But it sounds like a memory leak to me. Could you please
create a Bugzilla issue so we can track this (though I have to admit that
our time is rather limited currently...).
Thanks,
Juergen.
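[Editor's note: the HashMap finding is at least consistent with the configuration quoted above. With a Host crawl scope the WebSiteIterator must remember every URL seen on the host, and the candidate set can grow roughly geometrically with MaxDepth. A rough upper-bound sketch; the ~50 links per page is an assumption for illustration, not a measured SMILA figure:]

```java
public class FrontierBound {

    // Upper bound on URLs reachable from the seeds within maxDepth levels,
    // assuming each page links to about linksPerPage further pages.
    static long bound(long seeds, long linksPerPage, int maxDepth) {
        long total = 0;
        long level = seeds; // number of pages at the current depth
        for (int depth = 0; depth <= maxDepth; depth++) {
            total += level;
            level *= linksPerPage;
        }
        return total;
    }

    public static void main(String[] args) {
        // 15 seeds, assumed ~50 links/page, MaxDepth 5 as in the web.xml above
        System.out.println(bound(15, 50, 5)); // → 4783163265
    }
}
```

Deduplication keeps the real set far smaller than this bound, but even a few tens of millions of retained URL strings in a HashMap could plausibly account for the 1.7GB MAT reports; lowering MaxDepth would be a cheap first experiment.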