Re: [geomesa-users] GDELT - Results incomplete?

Thanks for the detailed explanation of the table design, but I don't think it solves my problem.
Some more details on what I want to do:
I imported my 10GB file into PostGIS, and a count returns around 35M records.
To compare the systems, I loaded the same file into GeoMesa twice. Now the strange thing: on my first try there were around 30M entries in Accumulo, but on my second try there were only 6M, for the same file!
There seems to be some data loss here. Maybe a problem with failing mappers?

Marcel.

On 03.08.2015 19:44, Jim Hughes wrote:
Hi Marcel,

Yes, Filter.INCLUDE should return all the entries. There are a few quick reasons why you should not expect a direct match-up between the number of features you ingested and the number of entries in Accumulo.

First, in order to better support different query patterns, GeoMesa uses a number of different table structures. The original spatio-temporal table has two entries per SimpleFeature. If you ingested 5.4M records, you should see 10.8M records in a table ending in st_idx.

We introduced a new table design using a composite space-filling curve. That table should end in z3. If the SimpleFeatureType has a date field and the default geometry is a point, then there should be one entry in the Z3 table per record.

To support attribute-based queries, there is an attribute table (ending in attr_idx). For each attribute indexed, there should be 0 or 1 entries per record ingested. (If I recall, features which have a null value for an indexed attribute are not indexed.)

Finally, there is a records table which contains a copy of each feature. This table is used in conjunction with the attribute table.
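
The per-table arithmetic above can be sketched roughly as follows. This is only an illustration of the counts described in this message, not GeoMesa code: the function name `expected_entries` and its parameters are hypothetical, and the choice of one indexed attribute with no null values is an assumption made for the example.

```python
def expected_entries(features, indexed_attributes=0, non_null_fraction=1.0):
    """Rough estimate of Accumulo entries per table for `features` ingested records.

    Assumes the schema has a date field and a point default geometry,
    so the z3 table holds one entry per record.
    """
    return {
        # original spatio-temporal table: two entries per SimpleFeature
        "st_idx": 2 * features,
        # Z3 table: one entry per record (under the assumption above)
        "z3": features,
        # records table: one copy of each feature
        "records": features,
        # attribute table: 0 or 1 entries per record per indexed attribute
        "attr_idx": round(features * indexed_attributes * non_null_fraction),
    }


# Hypothetical case: 5.4M features, one indexed attribute, no null values.
estimate = expected_entries(5_400_000, indexed_attributes=1)
total = sum(estimate.values())

for table, count in estimate.items():
    print(f"{table:>8}: {count:,}")
print(f"   total: {total:,}")
```

The point of the sketch is that the total entry count shown by the Accumulo monitor is expected to be several times the feature count, so the two numbers should be compared table by table rather than directly.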

I hope that helps explain what's going on. I'd suggest checking the sizes of the various tables (available via the 'Master Server' link on the left of the Accumulo monitor page).

Cheers,

Jim

On 08/03/2015 05:46 AM, Marcel wrote:
Hello,
it's me again. I created a 10GB GDELT file and ingested it into Accumulo. Looking at the web console, it estimates that there are around 30M entries. Writing a Query with a Filter.INCLUDE statement should return all of my events, right? When calling the size() method, the output is only 5.4M. What could be the cause of this?
Is there a problem with empty values?

Thanks.
Marcel Jacob.
_______________________________________________
geomesa-users mailing list
geomesa-users@xxxxxxxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
http://www.locationtech.org/mailman/listinfo/geomesa-users
