The problem remains.
I put the geomesa-accumulo-distributed-runtime jar in the main Accumulo lib directory:
[g.rinchin@netris-cassandra-stage60-04 lib]$ pwd
/opt/accumulo/lib
[g.rinchin@netris-cassandra-stage60-04 lib]$ ls | grep geomesa
geomesa-accumulo-distributed-runtime_2.12-3.2.2.jar
After this, I can correctly load the class org.locationtech.geomesa.accumulo.data.stats.StatsCombiner:
root@accumulo> setiter -t examples.runners -p 10 -scan -minc -majc -n decStats -class org.locationtech.geomesa.accumulo.data.stats.StatsCombiner
Combiners apply reduce functions to multiple versions of values with otherwise equal keys
----------> set StatsCombiner parameter all, set to true to apply Combiner to every column, otherwise leave blank. if true, columns option will be ignored.:
I recreated the namespace like this:
root@accumulo> deletenamespace -f myNamespace
root@accumulo> createnamespace myNamespace
root@accumulo> grant NameSpace.CREATE_TABLE -ns myNamespace -u root
root@accumulo> config -ns myNamespace -s table.classpath.context=myNamespace
Then I ran the application, which created the GeoMesa tables.
Here is the config of my myNamespace.geomesa_stats table:
root@accumulo> config -t myNamespace.geomesa_stats
-----------+-------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SCOPE | NAME | VALUE
-----------+-------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
default | table.balancer ............................................ | org.apache.accumulo.server.master.balancer.DefaultLoadBalancer
default | table.bloom.enabled ....................................... | false
default | table.bloom.error.rate .................................... | 0.5%
default | table.bloom.hash.type ..................................... | murmur
default | table.bloom.key.functor ................................... | org.apache.accumulo.core.file.keyfunctor.RowFunctor
default | table.bloom.load.threshold ................................ | 1
default | table.bloom.size .......................................... | 1048576
default | table.cache.block.enable .................................. | false
default | table.cache.index.enable .................................. | true
default | table.classpath.context ................................... |
namespace | @override .............................................. | myNamespace
default | table.compaction.major.everything.idle .................... | 1h
default | table.compaction.major.ratio .............................. | 3
default | table.compaction.minor.idle ............................... | 5m
default | table.compaction.minor.logs.threshold ..................... | 3
default | table.compaction.minor.merge.file.size.max ................ | 0
table | table.constraint.1 ........................................ | org.apache.accumulo.core.constraints.DefaultKeySizeConstraint
default | table.durability .......................................... | sync
default | table.failures.ignore ..................................... | false
default | table.file.blocksize ...................................... | 0B
default | table.file.compress.blocksize ............................. | 100K
default | table.file.compress.blocksize.index ....................... | 128K
default | table.file.compress.type .................................. | gz
default | table.file.max ............................................ | 15
default | table.file.replication .................................... | 0
default | table.file.summary.maxSize ................................ | 256K
default | table.file.type ........................................... | rf
default | table.formatter ........................................... | org.apache.accumulo.core.util.format.DefaultFormatter
default | table.groups.enabled ...................................... |
default | table.interepreter ........................................ | org.apache.accumulo.core.util.interpret.DefaultScanInterpreter
table | table.iterator.majc.stats-combiner ........................ | 10,org.locationtech.geomesa.accumulo.data.stats.StatsCombiner
table | table.iterator.majc.stats-combiner.opt.all ................ | true
table | table.iterator.majc.stats-combiner.opt.sep ................ | ~
table | table.iterator.majc.stats-combiner.opt.sft-SignalBuilder .. | *geo:Point,time:Date,cam:String:keep-stats=true,imei:String,dir:Double,alt:Double,vlc:Double,sl:Integer,ds:Integer,dir_y:Double,poi_azimuth_x:Double,poi_azimuth_y:Double
table | table.iterator.majc.vers .................................. | 20,org.apache.accumulo.core.iterators.user.VersioningIterator
table | table.iterator.majc.vers.opt.maxVersions .................. | 1
table | table.iterator.minc.stats-combiner ........................ | 10,org.locationtech.geomesa.accumulo.data.stats.StatsCombiner
table | table.iterator.minc.stats-combiner.opt.all ................ | true
table | table.iterator.minc.stats-combiner.opt.sep ................ | ~
table | table.iterator.minc.stats-combiner.opt.sft-SignalBuilder .. | *geo:Point,time:Date,cam:String:keep-stats=true,imei:String,dir:Double,alt:Double,vlc:Double,sl:Integer,ds:Integer,dir_y:Double,poi_azimuth_x:Double,poi_azimuth_y:Double
table | table.iterator.minc.vers .................................. | 20,org.apache.accumulo.core.iterators.user.VersioningIterator
table | table.iterator.minc.vers.opt.maxVersions .................. | 1
table | table.iterator.scan.stats-combiner ........................ | 10,org.locationtech.geomesa.accumulo.data.stats.StatsCombiner
table | table.iterator.scan.stats-combiner.opt.all ................ | true
table | table.iterator.scan.stats-combiner.opt.sep ................ | ~
table | table.iterator.scan.stats-combiner.opt.sft-SignalBuilder .. | *geo:Point,time:Date,cam:String:keep-stats=true,imei:String,dir:Double,alt:Double,vlc:Double,sl:Integer,ds:Integer,dir_y:Double,poi_azimuth_x:Double,poi_azimuth_y:Double
table | table.iterator.scan.vers .................................. | 20,org.apache.accumulo.core.iterators.user.VersioningIterator
table | table.iterator.scan.vers.opt.maxVersions .................. | 1
default | table.majc.compaction.strategy ............................ | org.apache.accumulo.tserver.compaction.DefaultCompactionStrategy
default | table.replication ......................................... | false
default | table.sampler ............................................. |
default | table.scan.dispatcher ..................................... | org.apache.accumulo.core.spi.scan.SimpleScanDispatcher
default | table.scan.max.memory ..................................... | 512K
default | table.security.scan.visibility.default .................... |
default | table.split.endrow.size.max ............................... | 10K
default | table.split.threshold ..................................... | 1G
default | table.suspend.duration .................................... | 0s
default | table.walog.enabled ....................................... | true
-----------+-------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
root@accumulo>
But the statistics are still not gathered correctly for the first iteration.
I put 1000 geocoordinates. stats-count returns the right total, but the count by cam returns only 866:
✘ ~/bin/geomesa-accumulo_2.12-3.2.2/bin ./geomesa-accumulo stats-count -c myNamespace.geomesa -z 10.200.217.27 -i accumulo -u root -p qweasd123 -f SignalBuilder
Estimated count: 1000
~/bin/geomesa-accumulo_2.12-3.2.2/bin ./geomesa-accumulo stats-count -c myNamespace.geomesa -z 10.200.217.27 -i accumulo -u root -p qweasd123 -f SignalBuilder -q "cam='0000c1fe-a727-4a86-9eee-5b99d21038ea'"
Estimated count: 866
866 is the size of the last batch of events saved by the code.
From the application log:
16.02.2022 12:16:21.199 INFO [pool-3-thread-4] r.netris.gps.sampler.GeoEventSampler - Saved 866 of 866 events
Events added after that are correctly added to the count stats:
~/bin/geomesa-accumulo_2.12-3.2.2/bin ./geomesa-accumulo stats-count -c myNamespace.geomesa -z 10.200.217.27 -i accumulo -u root -p qweasd123 -f SignalBuilder -q "cam='0000c1fe-a727-4a86-9eee-5b99d21038ea'"
Estimated count: 1866
~/bin/geomesa-accumulo_2.12-3.2.2/bin ./geomesa-accumulo stats-count -c myNamespace.geomesa -z 10.200.217.27 -i accumulo -u root -p qweasd123 -f SignalBuilder -q "cam='0000c1fe-a727-4a86-9eee-5b99d21038ea'"
Estimated count: 2866
~/bin/geomesa-accumulo_2.12-3.2.2/bin ./geomesa-accumulo stats-count -c myNamespace.geomesa -z 10.200.217.27 -i accumulo -u root -p qweasd123 -f SignalBuilder -q "cam='0000c1fe-a727-4a86-9eee-5b99d21038ea'"
Estimated count: 2866
~/bin/geomesa-accumulo_2.12-3.2.2/bin ./geomesa-accumulo stats-count -c myNamespace.geomesa -z 10.200.217.27 -i accumulo -u root -p qweasd123 -f SignalBuilder -q "cam='0000c1fe-a727-4a86-9eee-5b99d21038ea'"
Estimated count: 3866
~/bin/geomesa-accumulo_2.12-3.2.2/bin ./geomesa-accumulo stats-count -c myNamespace.geomesa -z 10.200.217.27 -i accumulo -u root -p qweasd123 -f SignalBuilder -q "cam='0000c1fe-a727-4a86-9eee-5b99d21038ea'"
Estimated count: 4866
~/bin/geomesa-accumulo_2.12-3.2.2/bin ./geomesa-accumulo stats-count -c myNamespace.geomesa -z 10.200.217.27 -i accumulo -u root -p qweasd123 -f SignalBuilder -q "cam='0000c1fe-a727-4a86-9eee-5b99d21038ea'"
Estimated count: 5866
~/bin/geomesa-accumulo_2.12-3.2.2/bin ./geomesa-accumulo stats-count -c myNamespace.geomesa -z 10.200.217.27 -i accumulo -u root -p qweasd123 -f SignalBuilder -q "cam='0000c1fe-a727-4a86-9eee-5b99d21038ea'"
Estimated count: 6866
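To make the pattern in these numbers explicit, here is a small sketch in plain Java (it does not use GeoMesa). It assumes that every run writes exactly 1000 events, which I only stated for the first run; the class name CountDeficit is made up for illustration. It compares the expected cumulative count with the distinct values reported by stats-count above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only, not part of GeoMesa.
final class CountDeficit {

    // For run i (1-based) the expected cumulative count is batchSize * i.
    // Returns expected minus observed for each run.
    static List<Long> deficits(long batchSize, long[] observed) {
        List<Long> result = new ArrayList<>();
        for (int i = 0; i < observed.length; i++) {
            long expected = batchSize * (i + 1);
            result.add(expected - observed[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        // Distinct "Estimated count" values from the stats-count runs above.
        long[] observed = {866, 1866, 2866, 3866, 4866, 5866, 6866};
        // Assumption: 1000 events per run.
        System.out.println(deficits(1000, observed));
        // prints [134, 134, 134, 134, 134, 134, 134]
    }
}
```

Under that assumption the shortfall stays at 134 (1000 - 866) and never grows, so only the first run seems to lose part of its count.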
To work around this in code, I first write the first event with its own writer, then write all the other events with a second writer, something like this:
private Integer writeDataInternalTest(List<GeoEvent> events) throws IOException {
    if (events == null || events.isEmpty()) {
        return 0;
    }
    int count = 0;
    // Take the first event out so it can be written separately (this modifies the passed-in list).
    GeoEvent firstEvent = events.remove(0);
    // First writer: write only the first event, then close the writer.
    try (FeatureWriter<SimpleFeatureType, SimpleFeature> writer = dataStore.getFeatureWriterAppend(
            SimpleFeatureUtils.TYPE.getTypeName(), Transaction.AUTO_COMMIT)) {
        SimpleFeature feature = SimpleFeatureUtils.toSimpleFeature(firstEvent);
        String event_id = feature.getID();
        if (!event_id.contains(firstEvent.getCam())) {
            log.info("event not contain camId");
        }
        SimpleFeature toWrite = writer.next();
        toWrite.setAttributes(feature.getAttributes());
        toWrite.getUserData().put(Hints.PROVIDED_FID, event_id);
        toWrite.getUserData().putAll(feature.getUserData());
        writer.write();
        count++;
        log.info("Event id = {}, for event = {}", event_id, firstEvent);
    } catch (Exception e) {
        log.error("Geomesa write error", e);
    }
    // Second writer: write all the remaining events.
    try (FeatureWriter<SimpleFeatureType, SimpleFeature> writer = dataStore.getFeatureWriterAppend(
            SimpleFeatureUtils.TYPE.getTypeName(), Transaction.AUTO_COMMIT)) {
        for (GeoEvent event : events) {
            SimpleFeature feature = SimpleFeatureUtils.toSimpleFeature(event);
            String event_id = feature.getID();
            if (!event_id.contains(event.getCam())) {
                log.info("event not contain camId");
            }
            SimpleFeature toWrite = writer.next();
            toWrite.setAttributes(feature.getAttributes());
            toWrite.getUserData().put(Hints.PROVIDED_FID, event_id);
            toWrite.getUserData().putAll(feature.getUserData());
            writer.write();
            count++;
            log.info("Event id = {}, for event = {}", event_id, event);
        }
    } catch (Exception e) {
        log.error("Geomesa write error", e);
    }
    return count;
}
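A side note on the workaround above: events.remove(0) modifies the list passed in by the caller, and it throws UnsupportedOperationException if that list is fixed-size or unmodifiable. A minimal sketch of the same split without touching the input, in plain Java; the class FirstAndRest and its methods are made up for illustration and are not part of GeoMesa or GeoTools:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration only.
final class FirstAndRest {

    // Returns the first element without modifying the input list.
    static <T> T first(List<T> items) {
        return items.get(0);
    }

    // Returns a read-only view of everything after the first element.
    static <T> List<T> rest(List<T> items) {
        return Collections.unmodifiableList(items.subList(1, items.size()));
    }

    public static void main(String[] args) {
        // An unmodifiable input list: remove(0) would throw here.
        List<String> events = Collections.unmodifiableList(Arrays.asList("e1", "e2", "e3"));
        System.out.println(first(events));  // prints e1
        System.out.println(rest(events));   // prints [e2, e3]
        System.out.println(events.size());  // prints 3, the input is unchanged
    }
}
```

In the method above this would mean writing first(events) with the first writer and iterating over rest(events) with the second one. The caller is expected to check for an empty list first, as the method already does.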
With this change, the count by cam after putting the first 1000 geoevents is 999:
~/bin/geomesa-accumulo_2.12-3.2.2/bin ./geomesa-accumulo stats-count -c myNamespace.geomesa -z 10.200.217.27 -i accumulo -u root -p qweasd123 -f SignalBuilder -q "cam='0000c1fe-a727-4a86-9eee-5b99d21038ea'"
Estimated count: 999
But if I then run stats-analyze, it still resets the count by cam to 0:
✘ ~/bin/geomesa-accumulo_2.12-3.2.2/bin ./geomesa-accumulo stats-analyze -c myNamespace.geomesa -z 10.200.217.27 -i accumulo -u root -p qweasd123 -f SignalBuilder
INFO Running stat analysis for feature type SignalBuilder...
INFO Stats analyzed:
Total features: 1000
Bounds for geo: [ 37.598174, 55.736823, 37.681424, 55.820073 ] cardinality: 981
Bounds for time: [ 2022-02-27T08:26:42.000Z to 2022-02-27T09:00:00.000Z ] cardinality: 973
Bounds for cam: [ 0000c1fe-a727-4a86-9eee-5b99d21038ea to 0000c1fe-a727-4a86-9eee-5b99d21038ea ] cardinality: 1
INFO Use 'stats-histogram', 'stats-top-k' or 'stats-count' commands for more details
~/bin/geomesa-accumulo_2.12-3.2.2/bin ./geomesa-accumulo stats-count -c myNamespace.geomesa -z 10.200.217.27 -i accumulo -u root -p qweasd123 -f SignalBuilder -q "cam='0000c1fe-a727-4a86-9eee-5b99d21038ea'"
Estimated count: 0
Thanks.