Re: [geomesa-users] Key/Index construction question.

One other note on Accumulo keys:

An Accumulo key is composed of a row, a column family, and a column qualifier (plus a timestamp and visibility, which I won't discuss here). A row is a logical structure; in reality, Accumulo is a key->value store. In GeoMesa, we split the key up between these different parts. So in the code below, where we have 'only' 6 rows, there may actually be many more keys and corresponding entries (broken up by column family and column qualifier).
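
To make the split concrete, here is a minimal sketch of how one Geohash and one feature ID might be spread across the parts of a key. It assumes a layout resembling the schema string in the quoted test (a constant and a shard number plus the first four Geohash characters in the row, the next three characters in the column family, and the feature ID in the column qualifier). `SketchKey`, `KeyLayoutSketch`, and `keyFor` are hypothetical names for illustration only, not Accumulo or GeoMesa API, and the exact row formatting is an assumption.

```scala
// Hypothetical, simplified model of a key; NOT the Accumulo Key class.
final case class SketchKey(row: String, columnFamily: String, columnQualifier: String)

object KeyLayoutSketch {
  // Assumed layout (for illustration only):
  //   row              = constant ~ shard ~ first 4 Geohash characters
  //   column family    = next 3 Geohash characters
  //   column qualifier = feature ID
  def keyFor(shard: Int, geohash: String, featureId: String): SketchKey = {
    require(geohash.length >= 7, "this sketch expects at least 7 Geohash characters")
    SketchKey(
      row = s"feature~$shard~${geohash.substring(0, 4)}",
      columnFamily = geohash.substring(4, 7),
      columnQualifier = featureId
    )
  }
}
```

With a made-up Geohash, `KeyLayoutSketch.keyFor(7, "dqb0muw", "TEST_LINE")` gives `SketchKey("feature~7~dqb0", "muw", "TEST_LINE")`. Two entries that share the same row text but differ in column family or qualifier are still distinct keys, which is why a handful of rows can hide many more entries.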

Thanks,

Emilio

On 09/18/2015 02:47 PM, Chris Eichelberger wrote:
Moises,

Good question!  The good news is that there is nothing special about how
the keys are being constructed; the interesting part is in how GeoMesa
decides which keys should be constructed...

(Apologies in advance if, in the course of lecturing, I tell you things
you already know.)

The first point to remember is that each Geohash index-entry represents
a cell.  For 35-bit Geohashes, each cell is no more than ~150 meters
square.  A 0-bit (degenerate) Geohash is the entire surface of the
(flat) Earth.  Each bit of precision you add to a Geohash halves exactly
one of its dimensions (when zero-based, even bits halve longitude; odd
bits halve latitude).
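
The arithmetic behind that cell size can be sketched as follows. This is only a back-of-the-envelope calculation, not GeoMesa code; the object name and the 111,320 meters-per-degree figure (a common approximation at the equator) are assumptions.

```scala
object GeohashCellSize {
  // Zero-based even bits halve longitude and odd bits halve latitude,
  // so longitude gets the extra bit when the total is odd.
  def lonBits(bits: Int): Int = (bits + 1) / 2
  def latBits(bits: Int): Int = bits / 2

  // Cell extent in degrees: the full range halved once per bit spent on that axis.
  def widthDegrees(bits: Int): Double = 360.0 / math.pow(2, lonBits(bits))
  def heightDegrees(bits: Int): Double = 180.0 / math.pow(2, latBits(bits))

  // Assumed conversion factor: roughly 111.32 km per degree at the equator.
  val MetersPerDegree = 111320.0
  def widthMetersAtEquator(bits: Int): Double = widthDegrees(bits) * MetersPerDegree
  def heightMeters(bits: Int): Double = heightDegrees(bits) * MetersPerDegree
}
```

For 35 bits this gives 18 longitude bits and 17 latitude bits, so both the width and the height come out at about 0.00137 degrees, or roughly 153 meters at the equator. Cells get narrower in the east-west direction away from the equator.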

Whenever you are indexing data that contain only single-point
geometries, there will be one index-key per record, because every point
will fall inside exactly one Geohash cell.  (Each Geohash cell in
GeoMesa includes its minimum X and minimum Y values, but excludes its
maximum X and maximum Y extents.)
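
A minimal sketch of why a point lands in exactly one cell: each bit picks the lower or upper half of the current range, and the comparison `>=` puts a point sitting exactly on the dividing line into the upper half. This is a generic Geohash bit encoder written for illustration (the object and method names are made up), not the GeoMesa implementation.

```scala
object GeohashBitsSketch {
  // Returns the first `bits` Geohash bits of a point as a string of '0' and '1'.
  def encode(lon: Double, lat: Double, bits: Int): String = {
    var (minLon, maxLon) = (-180.0, 180.0)
    var (minLat, maxLat) = (-90.0, 90.0)
    val sb = new StringBuilder
    for (i <- 0 until bits) {
      if (i % 2 == 0) {
        // Even (zero-based) bits halve the longitude range.
        val mid = (minLon + maxLon) / 2
        // ">=" makes each cell include its minimum edge and exclude its maximum edge.
        if (lon >= mid) { sb += '1'; minLon = mid } else { sb += '0'; maxLon = mid }
      } else {
        // Odd bits halve the latitude range.
        val mid = (minLat + maxLat) / 2
        if (lat >= mid) { sb += '1'; minLat = mid } else { sb += '0'; maxLat = mid }
      }
    }
    sb.toString
  }
}
```

For example, `GeohashBitsSketch.encode(0.0, 0.0, 2)` returns `"11"`: the point on the shared corner of four cells belongs only to the cell whose minimum X and minimum Y it sits on.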

Whenever you are indexing non-point geometries -- line strings;
polygons; etc. -- you have a problem:  How do you create a single
index-entry for a geometry that can cross multiple cell boundaries?  If
you only index the vertices, you lose information about the fact that
the geometry covers the space between them.  There are typically two
approaches to solving this problem:

1.  You can encode a single entry that represents the minimum-bounding
cell description that contains your geometry; or

2.  you can decompose your geometry into covering cells, at potentially
heterogeneous resolutions (different sizes), and index each of those
separately (and then de-duplicate results at query time so that each
feature appears no more than once in any given results set).
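
The two approaches can be sketched side by side. To keep the example short, the geometry is stood in for by its axis-aligned bounding box, which is a simplification: a real decomposition would test cells against the actual line or polygon. `Box`, `boundingCell`, and `coveringCells` are illustrative names, not GeoMesa functions, and cells are written as bit strings rather than base-32 Geohash text.

```scala
object CoveringSketch {
  // Axis-aligned box in degrees.
  final case class Box(minLon: Double, minLat: Double, maxLon: Double, maxLat: Double) {
    def intersects(o: Box): Boolean =
      minLon < o.maxLon && o.minLon < maxLon && minLat < o.maxLat && o.minLat < maxLat
    def contains(o: Box): Boolean =
      minLon <= o.minLon && o.maxLon <= maxLon && minLat <= o.minLat && o.maxLat <= maxLat
  }

  val World: Box = Box(-180.0, -90.0, 180.0, 90.0)

  // Splits a cell in two: zero-based even depths halve longitude, odd depths halve latitude.
  private def split(cell: Box, depth: Int): Seq[(Char, Box)] =
    if (depth % 2 == 0) {
      val mid = (cell.minLon + cell.maxLon) / 2
      Seq('0' -> cell.copy(maxLon = mid), '1' -> cell.copy(minLon = mid))
    } else {
      val mid = (cell.minLat + cell.maxLat) / 2
      Seq('0' -> cell.copy(maxLat = mid), '1' -> cell.copy(minLat = mid))
    }

  // Approach 1: the single smallest cell (up to maxBits) that contains the whole target.
  def boundingCell(target: Box, maxBits: Int): String = {
    def go(prefix: String, cell: Box): String =
      if (prefix.length >= maxBits) prefix
      else split(cell, prefix.length).find(_._2.contains(target)) match {
        case Some((bit, child)) => go(prefix + bit, child)
        case None               => prefix // target straddles the split line; stop here
      }
    go("", World)
  }

  // Approach 2: cells of possibly different sizes that together cover the target.
  def coveringCells(target: Box, maxBits: Int): Seq[String] = {
    def go(prefix: String, cell: Box): Seq[String] =
      if (!cell.intersects(target)) Seq.empty
      else if (target.contains(cell) || prefix.length >= maxBits) Seq(prefix)
      else split(cell, prefix.length).flatMap { case (bit, child) => go(prefix + bit, child) }
    go("", World)
  }
}
```

A small box straddling longitude 0 and latitude 0, such as `Box(-10, -10, 10, 10)`, shows the trade-off: `boundingCell` returns the empty string (the 0-bit cell covering the whole Earth), while `coveringCells(..., 2)` returns the four cells `"00"`, `"01"`, `"10"`, `"11"`, each of which would need its own index entry.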

GeoMesa takes approach #2 (for now; we're experimenting with other ways
to do this).  This is how the polygon you quote, with a large number of
points, can be decomposed into just a few covering cells; each of those
covering cells receives its own index key.  I've attached an image to
this email that shows how a polygon and a line-string can be decomposed.
In practice, we do not allow non-point geometries to be decomposed into
as many covering Geohashes as the image shows.  Here is the reference to
the code in GeoMesa where this decomposition is called:

https://github.com/locationtech/geomesa/blob/master/geomesa-accumulo/geomesa-accumulo-datastore/src/main/scala/org/locationtech/geomesa/accumulo/index/STIndexEntry.scala#L49
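
As a rough illustration of the consequence (one index entry per covering cell, then de-duplication at query time), consider the sketch below. It is not the code behind the link above; `IndexEntry`, `entriesFor`, and `query` are hypothetical names, the index is just an in-memory sequence, and cells are written as bit strings.

```scala
object DedupSketch {
  // Hypothetical index entry: one per (covering cell, feature) pair.
  final case class IndexEntry(cell: String, featureId: String)

  // A feature with several covering cells gets several entries.
  def entriesFor(featureId: String, coveringCells: Seq[String]): Seq[IndexEntry] =
    coveringCells.map(cell => IndexEntry(cell, featureId))

  // A cell overlaps a query cell when one bit string is a prefix of the other.
  // The same feature can match through more than one entry, so the IDs are
  // de-duplicated before being returned.
  def query(index: Seq[IndexEntry], queryCells: Seq[String]): Seq[String] =
    index
      .filter(e => queryCells.exists(q => e.cell.startsWith(q) || q.startsWith(e.cell)))
      .map(_.featureId)
      .distinct
}
```

If a line is indexed under cells `"0110"` and `"0111"` and a point under `"011011"`, a query for cell `"011"` matches three entries but returns each feature ID only once.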

Please note that, with the advent of the new Z3 index, we will be
revisiting this scheme.  The Z3 index is much faster than the old
Geohash-based index, but does not yet support non-point geometries, so
it's a great opportunity for us to improve that feature.

I hope this addressed some of your questions; if not, or if you think of
new ones, please just let us know.

Thanks!

Sincerely,
  -- Chris


On Fri, 2015-09-18 at 14:14 -0400, Moises Baly wrote:
Hi there:


I've come across some tests in the project in my quest to understand
how indexes work and how the index is partitioned across Accumulo's Key
(what goes where, and how it is constructed).


val dummyType =
SimpleFeatureTypes.createType("DummyType",s"foo:String,bar:Geometry,baz:Date,$DEFAULT_GEOMETRY_PROPERTY_NAME:Geometry,$DEFAULT_DTG_PROPERTY_NAME:Date,$DEFAULT_DTG_END_PROPERTY_NAME:Date")
  val customType =
SimpleFeatureTypes.createType("DummyType",s"foo:String,bar:Geometry,baz:Date,*the_geom:Geometry,dt_start:Date,$DEFAULT_DTG_END_PROPERTY_NAME:Date")
  customType.setDtgField("dt_start")
  val dummyEncoder = SimpleFeatureSerializers(dummyType,
SerializationType.AVRO)
  val customEncoder = SimpleFeatureSerializers(customType,
SerializationType.AVRO)
  val dummyIndexValueEncoder = IndexValueEncoder(dummyType)
  val geometryFactory = new GeometryFactory(new PrecisionModel, 4326)
  val now = new DateTime().toDate

  val Apr_23_2001 = new DateTime(2001, 4, 23, 12, 5, 0,
DateTimeZone.forID("UTC")).toDate

  val schemaEncoding = "%~#s%feature#cstr%99#r::%~#s%0,4#gh::%~#s%4,3#gh%#id"

  val index = IndexSchema.buildKeyEncoder(dummyType, schemaEncoding)
 val line : Geometry = WKTUtils.read("LINESTRING(-78.5000092574703
38.0272986617359,-78.5000196719491 38.0272519798381,-78.5000300864205
38.0272190279085,-78.5000370293904 38.0271853867342,-78.5000439723542
38.027151748305,-78.5000509153117 38.027118112621,-78.5000578582629
38.0270844741902,-78.5000648011924 38.0270329867966,-78.5000648011781
38.0270165108316,-78.5000682379314 38.026999348366,-78.5000752155953
38.026982185898,-78.5000786870602 38.0269657099304,-78.5000856300045
38.0269492339602,-78.5000891014656 38.0269327579921,-78.5000960444045
38.0269162820211,-78.5001064588197 38.0269004925451,-78.5001134017528
38.0268847030715,-78.50012381616 38.0268689135938,-78.5001307590877
38.0268538106175,-78.5001411734882 38.0268387076367,-78.5001550593595
38.0268236046505,-78.5001654737524 38.0268091881659,-78.5001758881429
38.0267954581791,-78.5001897740009 38.0267810416871,-78.50059593303
38.0263663951609,-78.5007972751677 38.0261625038609)")
      val item = AvroSimpleFeatureFactory.buildAvroFeature(dummyType,
List("TEST_LINE", line, now, line, now, now), "TEST_LINE")
      val toWrite = new FeatureToWrite(item, "", dummyEncoder,
dummyIndexValueEncoder)
      val indexEntries = index.encode(toWrite).toList
      indexEntries.size must equalTo(1)
      indexEntries.head.size() mustEqual(6)
      val cf = new
Text(indexEntries.head.getUpdates.get(0).getColumnFamily)
      val cq = new
Text(indexEntries.head.getUpdates.get(0).getColumnQualifier)
      val keyStr = cf + "::" + cq


How do all those points in the LineString translate into only 6
rows in Accumulo? As far as I understand, the Key definition
(string :: gh :: gh + ID) should encode a single point, correct? What
am I missing in the process here?


If somebody could walk me through this example with special attention
to how the key is being constructed it would be very much appreciated.


Thank you for your time


Moises


_______________________________________________
geomesa-users mailing list
geomesa-users@xxxxxxxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
http://www.locationtech.org/mailman/listinfo/geomesa-users

      
