Re: [geowave-dev] Split query results in chunks

Marcel,

I would tackle this problem in one of two ways:
(1) An Accumulo Iterator/Combiner. GeoWave uses this concept for statistics such as Count.
(2) An RDD. I have been remiss in finishing the Spark offering; at the moment it consists more of examples than concrete classes. I will try to wrap that up this week. You can use it by inspecting the kNN branch, which adds a new analytics/spark subproject. It uses the HadoopRDD and the GeoWaveInputFormat. The input format still uses a Hadoop job context, so there are some extra functions to configure the query and data store parameters (e.g. ZooKeeper, user, password, instance and namespace).
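The idea behind option (1) is that aggregation happens server-side on the tablet servers, so far less data reaches the client, which then only merges small partial results. The plain-Java sketch below illustrates that principle with hypothetical names (PartialCounts, countPartition, mergeCounts). It is not the Accumulo Combiner API, and string keys such as "DE|FR" are just an assumed encoding of a country-code pair.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the combiner principle, NOT the Accumulo API:
// each partition produces a partial count per key, and the client merges them.
final class PartialCounts {
    private PartialCounts() {}

    // Count keys within one partition (e.g. the rows a single server scans).
    static Map<String, Long> countPartition(List<String> keys) {
        Map<String, Long> counts = new HashMap<>();
        for (String key : keys) {
            counts.merge(key, 1L, Long::sum);
        }
        return counts;
    }

    // Merge two partial results; addition is associative and commutative,
    // so partitions can be combined in any order.
    static Map<String, Long> mergeCounts(Map<String, Long> a, Map<String, Long> b) {
        Map<String, Long> merged = new HashMap<>(a);
        b.forEach((key, count) -> merged.merge(key, count, Long::sum));
        return merged;
    }
}
```

Because merging is just addition, partial maps can be combined in any order, which is what allows this kind of aggregation to run in parallel.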


On Tue, Nov 10, 2015 at 6:22 AM, Marcel Jacob <m.jacob@xxxxxxxxxxx> wrote:
Hello,
I wrote a query that needs a group-by statement. Since this keyword is
not supported in GeoWave, I use Spark.
This is fine for small data sizes of 1 to 3 GB. However, if I change to
10 GB, there is not enough heap space to answer the query, and I can't
give more heap space to my mini cluster.
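If the group-by is a count, one alternative to collecting every row is to aggregate while iterating, so client memory grows with the number of distinct groups rather than the number of rows. The sketch below assumes a count is what is needed; the class and method names (StreamingGroupBy, groupCount) are hypothetical.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: a group-by count that consumes an iterator once and
// keeps only one counter per distinct key instead of a list of all rows.
final class StreamingGroupBy {
    private StreamingGroupBy() {}

    static <T, K> Map<K, Long> groupCount(Iterator<T> rows, Function<? super T, ? extends K> keyOf) {
        Map<K, Long> counts = new HashMap<>();
        while (rows.hasNext()) {
            // The row is not stored here, so it can be garbage collected after
            // this step, provided the underlying iterator does not retain it.
            counts.merge(keyOf.apply(rows.next()), 1L, Long::sum);
        }
        return counts;
    }
}
```

Applied to the iterator below, the key function could be something like sf -> sf.getAttribute("Actor1CountryCode") + "|" + sf.getAttribute("Actor2CountryCode"), where the "|" separator is an arbitrary choice.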

Iterator<SimpleFeature> intermediateResults;

This is the iterator for my intermediate results. Unfortunately, the
.remove() method is not supported, so I thought chunking up the results
should save me space. A SimpleFeature is not serializable, so I have to
encapsulate it in a custom object for use with Spark, like so:

while (intermediateResults.hasNext()) {
    // Copy the two attributes of interest into a serializable holder for Spark
    SimpleFeature sf = intermediateResults.next();
    String countryCode1 = String.valueOf(sf.getAttribute("Actor1CountryCode"));
    String countryCode2 = String.valueOf(sf.getAttribute("Actor2CountryCode"));
    actorCountryList.add(new CountryNames(countryCode1, countryCode2));
}
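The CountryNames class itself is not shown. A minimal version might look like the following; the field and accessor names are assumptions. Besides implementing Serializable, it overrides equals and hashCode so that it can also serve as a grouping key (for example in a HashMap or as the key of a Spark pair RDD), not only as a container.

```java
import java.io.Serializable;
import java.util.Objects;

// Hypothetical version of the CountryNames holder; field names are assumptions.
final class CountryNames implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String actor1CountryCode;
    private final String actor2CountryCode;

    CountryNames(String actor1CountryCode, String actor2CountryCode) {
        this.actor1CountryCode = actor1CountryCode;
        this.actor2CountryCode = actor2CountryCode;
    }

    String getActor1CountryCode() { return actor1CountryCode; }

    String getActor2CountryCode() { return actor2CountryCode; }

    // Value equality so that identical country pairs group together.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof CountryNames)) return false;
        CountryNames other = (CountryNames) o;
        return Objects.equals(actor1CountryCode, other.actor1CountryCode)
            && Objects.equals(actor2CountryCode, other.actor2CountryCode);
    }

    @Override
    public int hashCode() {
        return Objects.hash(actor1CountryCode, actor2CountryCode);
    }

    @Override
    public String toString() {
        return actor1CountryCode + "|" + actor2CountryCode;
    }
}
```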

CountryNames is serializable. This loop is my bottleneck and causes
the error, because it runs on the client node. I added a counter, and after
every 1 million results I process them with Spark and clear my list.
Afterwards I merge the partial results into the final one. But this causes
the same error, so the memory could not be released. So I think the iterator
is the main problem here. Is there another way to chunk the results? Or do
you have an idea what else I could try?
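For reference, chunked consumption of an iterator can be sketched as below. Chunks and forEachChunk are hypothetical names. A fresh list is allocated for each chunk, so a chunk only becomes collectable if the handler (and anything it calls, such as a Spark job or a merge step) keeps no reference to it after returning.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: hand an iterator's elements to a handler in fixed-size
// batches, returning how many batches were produced.
final class Chunks {
    private Chunks() {}

    static <T> int forEachChunk(Iterator<T> source, int chunkSize, Consumer<List<T>> handler) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int chunks = 0;
        while (source.hasNext()) {
            // New list per batch; the previous one is no longer referenced here.
            List<T> chunk = new ArrayList<>(chunkSize);
            while (chunk.size() < chunkSize && source.hasNext()) {
                chunk.add(source.next());
            }
            handler.accept(chunk);
            chunks++;
        }
        return chunks;
    }
}
```

If each chunk is reduced to a small aggregate (such as a count per key) inside the handler and only that aggregate is kept, memory use stays bounded by one chunk plus the running aggregate.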

Best regards,
Marcel Jacob.
_______________________________________________
geowave-dev mailing list
geowave-dev@xxxxxxxxxxxxxxxx

