Re: [geomesa-users] GDELT - Results incomplete?
Thanks for the detailed explanation of the table design. But I assume this
knowledge can't solve my problem.
Some more details on what I want to do:
I imported my 10GB file into PostGIS, and a count returns around 35M records.
To compare the systems I loaded the same file into GeoMesa twice. Now
the strange thing happens:
On my first try there were around 30M entries in Accumulo, but on my
second try there were only 6M! For the same file!
There is some data loss here. Maybe a problem with failing mappers?
Marcel.
On 03.08.2015 19:44, Jim Hughes wrote:
Hi Marcel,
Yes, Filter.INCLUDE should return all the entries. There are a few
quick reasons you should not expect to see a direct match-up between
the number of features ingested and the number of entries in Accumulo.
First, in order to better support different query patterns, GeoMesa
uses a number of different table structures. The original
spatio-temporal table has two entries per SimpleFeature. If you
ingested 5.4M records, you should see 10.8M records in a table ending
in st_idx.
We introduced a new table design using a composite space filling
curve. That table should end in z3. If the SimpleFeatureType has a
date field and the default geometry is a point, then there should be
one entry in the Z3 table per record.
To support attribute-based queries, there is an attribute table (ending
in attr_idx). For each attribute indexed, there should be 0 or 1
entries per record ingested. (If I recall, features which have a null
value for an indexed attribute are not indexed.)
Finally, there is a records table which contains a copy of each
feature. This table is used in conjunction with the attribute table.
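As a rough illustration of the arithmetic above, here is a minimal Python sketch. It is not part of GeoMesa: the function name, its parameters, and the 5M non-null count in the second call are made up for this example. It assumes all four table types exist and that the z3 precondition (date field, point geometry) holds.

```python
def expected_entries(features, attr_non_null_counts=(), has_z3=True):
    """Estimate Accumulo entries per table for `features` ingested records.

    attr_non_null_counts: for each indexed attribute, how many features
    have a non-null value (hypothetical input; null values are assumed
    not to be indexed, as described above).
    """
    counts = {
        "st_idx": 2 * features,   # two entries per SimpleFeature
        "records": features,      # one copy of each feature
        "attr_idx": sum(attr_non_null_counts),  # 0 or 1 per feature per attribute
    }
    if has_z3:
        counts["z3"] = features   # one entry per record
    counts["total"] = sum(counts.values())
    return counts


# 5.4M features, no indexed attributes
print(expected_entries(5_400_000))
# 5.4M features, one indexed attribute with a hypothetical 5M non-null values
print(expected_entries(5_400_000, attr_non_null_counts=(5_000_000,)))
```

For 5.4M features this gives 10.8M entries in st_idx alone, so the total reported by the monitor will be several times the feature count.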
I hope that helps explain what's going on. I'd suggest checking out
the various table sizes (available via the 'Master Server' link on
the left of the Accumulo monitor page).
Cheers,
Jim
On 08/03/2015 05:46 AM, Marcel wrote:
Hello,
it's me again. I created a 10GB GDELT file and ingested it into
Accumulo. Looking at the web console, it estimates that there are
around 30M entries.
Running a Query with Filter.INCLUDE should return all of
my events, right? When I call the size() method, the output is only
5.4M. What could be the cause of this?
Is there a problem with empty values?
Thanks.
Marcel Jacob.
_______________________________________________
geomesa-users mailing list
geomesa-users@xxxxxxxxxxxxxxxx
To change your delivery options, retrieve your password, or
unsubscribe from this list, visit
http://www.locationtech.org/mailman/listinfo/geomesa-users