Re: [geowave-dev] Accumulo Key Structure - Storing Point Data
Rich,
thank you for your explanations. I made a new, simpler drawing
that covers only the single-tier strategy, since I only have point
data. I hope this sketch is easier to understand and more precise
than the first one.
Assume a precision of 2 for each dimension. According to the
TieredSFCIndexFactory class, only one space-filling curve is created
for point data, so each tier has exactly one SFC, is that right?
That would mean my SFC has to span all of my bins: the
Hilbert values 1-8 belong to binId = 2000, values 9-16 (or do
they start again from 1?) to binId 2001, and so on.
Okay, the AccumuloRowId helped a lot, but the most interesting part
for me is the insertionId. How would you decompose it further
(tier, bin and Hilbert value) if you want to analyze the indexId?
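
To make the question concrete, here is a minimal sketch of such a decomposition. It assumes a purely hypothetical byte layout (one tier byte, then a fixed-length bin ID, then the SFC value); the thread does not confirm that layout, and `split_insertion_id`, `InsertionIdParts` and `bin_length` are illustrative names, not GeoWave API.

```python
from typing import NamedTuple


class InsertionIdParts(NamedTuple):
    tier: int         # assumed: first byte identifies the tier
    bin_id: bytes     # assumed: fixed-length bin ID follows the tier byte
    sfc_value: bytes  # assumed: everything after the bin ID is the SFC value


def split_insertion_id(insertion_id: bytes, bin_length: int) -> InsertionIdParts:
    """Split an insertion ID under the hypothetical layout tier | bin | SFC."""
    if len(insertion_id) < 1 + bin_length:
        raise ValueError("insertion ID is shorter than tier byte plus bin ID")
    tier = insertion_id[0]
    bin_id = insertion_id[1:1 + bin_length]
    sfc_value = insertion_id[1 + bin_length:]
    return InsertionIdParts(tier, bin_id, sfc_value)
```

If the real key uses a different order or a variable-length bin ID, only the slicing offsets would need to change.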
Thanks in advance,
Marcel Jacob.
On 12.10.2015 at 21:50, Rich Fecher wrote:
Marcel,
I'm trying to understand the attachment and have really just
concluded that there seems to be some confusion about binning
that I can't quite pinpoint. I think the diagram conflates
binning with Hilbert values, a coupling that just does
not exist. The bin is actually a basic concept that is
completely decoupled from the Hilbert curve. There is
a lot of discussion on space-filling curves that could serve as
good background here: https://github.com/geotrellis/curve/issues/3#issuecomment-76588640
If you look at Rob's comment regarding "Unbounded
Dimensions", it's a fairly accurate characterization of binning
as primarily concerned with bounding the unbounded dimension
using periodicity. Our default uses a year as the
periodicity, but we have an enum that easily allows day or month
to be used through index configuration ( https://github.com/ngageoint/geowave/blob/master/core/geotime/src/main/java/mil/nga/giat/geowave/core/geotime/index/dimension/TemporalBinningStrategy.java).
With GDELT data, you will likely want to continue to use
year. In the case of space and time, bin IDs are really
straightforward because there is only one unbounded dimension
(time). When there are multiple unbounded dimensions, it
can become less clear. If you use the default index, your bin
ID will be the year. The Hilbert value would be a 3D SFC
value with the time dimension bounded by the beginning and end
of that particular year. So 2010 would be the prefix in your
example and the SFC value would be based on January 1st
(year-agnostic, i.e. it would be much like a longitude value of
-180).
Does that make more sense?
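
The year-based binning described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not GeoWave's TemporalBinningStrategy: the function name `year_bin` is hypothetical, timestamps are assumed to be UTC, and the time dimension is simply normalized to the fraction of the year that has elapsed.

```python
from datetime import datetime, timezone


def year_bin(ts: datetime):
    """Return (bin_id, normalized_time) for a UTC timestamp.

    bin_id is the year; normalized_time is the position of the timestamp
    within that year, in the range [0, 1).
    """
    start = datetime(ts.year, 1, 1, tzinfo=timezone.utc)
    end = datetime(ts.year + 1, 1, 1, tzinfo=timezone.utc)
    # Dividing two timedeltas yields a float fraction of the year.
    fraction = (ts - start) / (end - start)
    return ts.year, fraction


# January 1st maps to the lower bound of the time dimension in every year,
# so only the bin ID distinguishes 2010 from 2011.
jan_2010 = year_bin(datetime(2010, 1, 1, tzinfo=timezone.utc))
jan_2011 = year_bin(datetime(2011, 1, 1, tzinfo=timezone.utc))
```

Under this sketch the value fed into the SFC restarts at the lower bound in each bin, and the bin ID acts as a prefix rather than as an offset into one continuous curve.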
It would be nice to provide an implementation of org.apache.accumulo.core.util.format.Formatter for
this so that the scans could be performed directly in the
accumulo shell and the keys could be nicely formatted and
human readable. We have that for the values as much as
possible by using this, but have nothing equivalent for
understanding the keys:
I just created the issue for that ( https://github.com/ngageoint/geowave/issues/528)
as it's a fairly straightforward task for interested new
contributors, although it requires some digging into
the key structure, which may be a valuable way
to digest the concept.
Rich
_______________________________________________
geowave-dev mailing list
geowave-dev@xxxxxxxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
https://www.locationtech.org/mailman/listinfo/geowave-dev
Attachment:
geowave-index-singletier.pdf
Description: Adobe PDF document