Re: [smila-dev] Problems with BinStorage
Hi,
There was a discussion about the BinStorage redesign some time ago, in
which this problem came up too.
The discussion started here:
http://dev.eclipse.org/mhonarc/lists/smila-dev/msg00084.html
So I think BinStorage should be in the process of being redesigned now.
Thanks,
Dmitry
Daniel.Stucky@xxxxxxxxxxx wrote:
Hi all,
we ran some tests with a larger amount of data than in the usual
development cases to create some index dump files. The system performed
OK for about 2 hours, during which 20 index dump files (about 10 MB
each) were created. Creating the 21st file took about 30 minutes, and
the 22nd took 4 hours.
I assume that one of the causes of the decreasing performance is
BinStorage. For every record attachment, a folder containing one file is
created in
workspace\.metadata\.plugins\org.eclipse.eilf.binstorage\storage\default.
After 7 hours it contained 109295 files (754 MB) and 109298 folders.
NTFS (and also most Linux filesystems) is not optimized for such a huge
number of folders (or files) in ONE directory.
Remember that the goal is to index millions of documents! So we have to
change the behavior of BinStorage; storing all documents in one folder
is a NO GO. I guess that the whole logic of BinStorage was programmed by
ourselves. Why did we do that? Aren't there any implementations already
available in the open source community? We should take a look at how,
for example, distributed filesystems like Hadoop, or Lucene, store their
data. Or at least create a tree-like structure beneath
org.eclipse.eilf.binstorage\storage\default.
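The tree-like structure could look roughly like the sketch below. It is
only an illustration, not the existing BinStorage code: the class name
ShardedPath, the method resolve, the choice of SHA-1, and the two-level
fan-out are all assumptions made for the example. The idea is to hash
the attachment id and use the first hex characters of the hash as
subfolder names, so that no single directory grows without bound.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of BinStorage: spreads attachments over
// a two-level directory tree derived from a hash of the attachment id.
final class ShardedPath {

    private ShardedPath() {
    }

    // Maps an id to <root>/<hex[0..2]>/<hex[2..4]>/<full hex digest>.
    static Path resolve(Path root, String attachmentId) {
        String hex = sha1Hex(attachmentId);
        return root.resolve(hex.substring(0, 2))
                   .resolve(hex.substring(2, 4))
                   .resolve(hex);
    }

    // Returns the SHA-1 digest of the id as 40 lowercase hex characters.
    static String sha1Hex(String value) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] digest = md.digest(value.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed relative root; the real storage location may differ.
        Path root = Paths.get("storage", "default");
        System.out.println(resolve(root, "record-1"));
    }
}
```

With two levels of 256 subfolders each there are 65536 leaf folders, so
10 million attachments would average about 150 files per folder,
assuming the hash distributes the ids evenly.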
Of course this is all up for discussion.
BTW: there is currently no documentation for BinStorage available in the
Eclipse wiki. This should be added by the responsible developers.
Bye,
Daniel