Re: [jakartabatch-dev] "Reactive Batch" ideas

From: Romain Manni-Bucau <rmannibucau@xxxxxxxxx>
Date: Wed, 24 Mar 2021 18:57:01 +0100
Delivered-to: jakartabatch-dev@xxxxxxxxxxx
List-archive: <https://www.eclipse.org/mailman/private/jakartabatch-dev/>
List-help: <mailto:jakartabatch-dev-request@eclipse.org?subject=help>
List-subscribe: <https://www.eclipse.org/mailman/listinfo/jakartabatch-dev>, <mailto:jakartabatch-dev-request@eclipse.org?subject=subscribe>
List-unsubscribe: <https://www.eclipse.org/mailman/options/jakartabatch-dev>, <mailto:jakartabatch-dev-request@eclipse.org?subject=unsubscribe>

Hi,

Without getting into how heavy an implementation would be, I will just mention that we already have standards in this field:

- rx

- microprofile-reactive (even if this one is neither adopted nor integrable)

- spring mono

- to some extent apache camel (it is meant more as a long-running instance, but in terms of design it matches and it is regularly used for batches)

What is important to see is that all these APIs - including Java SE CompletionStage - make it possible to define at least:

1. a flow, thanks to a fluent API

2. a reactive model, thanks to a push/event-like API

To answer your questions:

> 1. static, XML job definition

We can imagine it indeed, since in the end it is about having a flow DSL, reactive or not, but I strongly think it is not needed and it goes a bit against the jakarta spirit since 1999, where XML descriptors are slowly being dropped from new specs because Java developers abandoned them in practice.

The other big advantage of not using XML is being type safe by construction, rather than relying on a maven plugin that checks the job.xml and still failing at runtime because the data in the step/job context is not what was expected.

If desired, users can make the flow configurable using jsonp/jsonb/jaxb/whatever fits their app config, instead of a custom solution which must rely on system properties and does not integrate properly with their environment (thinking of k8s, where the batch will just be a synchronous main with state persistence but needs configmap/secrets support, which is not built into the spec).

> 2. programmatic, synchronous job definition

Being reactive does not mean being asynchronous, typically:

    final var jobPromise = completedFuture(...);
    jobPromise.get();

is actually synchronous because the code pushing the result is synchronous. What makes it reactive is the capacity to combine it with other steps which "react" to the completion of the previous step. A typical example is a NIO call ("when ready, call the next step"), but if your previous call is immediately ready - or is not done asynchronously/in another thread - then "when ready" means "now".
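To make the point above concrete, here is a minimal sketch using only java.util.concurrent. The class and method names (SyncReactiveDemo, callbackRanOnCaller) are invented for illustration. It relies on the behavior of the JDK's CompletableFuture, where a non-async callback attached to an already completed stage runs right away on the thread that attaches it:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;

class SyncReactiveDemo {
    // Chains two steps on an already completed stage and reports whether
    // the last callback ran on the thread that built the chain.
    static boolean callbackRanOnCaller() {
        final Thread caller = Thread.currentThread();
        final CompletionStage<Boolean> job = CompletableFuture.completedFuture("data")
                .thenApply(String::toUpperCase) // "reacts" to the first result
                .thenApply(ignored -> Thread.currentThread() == caller);
        // The chain is already complete here, so join() returns immediately.
        return job.toCompletableFuture().join();
    }

    public static void main(String[] args) {
        System.out.println("ran on caller thread: " + callbackRanOnCaller());
    }
}
```

The same chain attached to a stage completed later by an NIO callback would run on that other thread, without any change to the step code.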

> 3. programmatic, async / reactive definition

The nice thing about reactive support is that it unifies sync and async programming models.

The best for us would be to rely on the Java 9 Flow interfaces (java.util.concurrent.Flow), since they would also solve chunking with a standard, Java SE interoperable API, and a batchlet would just be a function returning a CompletionStage.
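As a rough sketch of what chunking on top of the Flow interfaces could look like, assuming nothing beyond the JDK: ChunkingSubscriber and FlowChunkingDemo are hypothetical names, and collecting the chunks into a list stands in for a real item writer. The subscriber requests one chunk of items at a time, which is where back-pressure comes from:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;

// Hypothetical subscriber that groups incoming items into fixed-size chunks.
class ChunkingSubscriber<T> implements Flow.Subscriber<T> {
    private final int chunkSize;
    private final List<List<T>> chunks = new ArrayList<>();
    private final CompletableFuture<List<List<T>>> done = new CompletableFuture<>();
    private List<T> current = new ArrayList<>();
    private Flow.Subscription subscription;

    ChunkingSubscriber(int chunkSize) {
        this.chunkSize = chunkSize;
    }

    // Completes with all chunks once the publisher signals the end of the data.
    CompletionStage<List<List<T>>> completion() {
        return done;
    }

    @Override
    public void onSubscribe(Flow.Subscription subscription) {
        this.subscription = subscription;
        subscription.request(chunkSize); // ask for one chunk worth of items
    }

    @Override
    public void onNext(T item) {
        current.add(item);
        if (current.size() == chunkSize) {
            chunks.add(current); // a real writer would persist the chunk here
            current = new ArrayList<>();
            subscription.request(chunkSize); // only then ask for the next chunk
        }
    }

    @Override
    public void onError(Throwable error) {
        done.completeExceptionally(error);
    }

    @Override
    public void onComplete() {
        if (!current.isEmpty()) {
            chunks.add(current); // trailing partial chunk
        }
        done.complete(chunks);
    }
}

class FlowChunkingDemo {
    static List<List<Integer>> run(List<Integer> items, int chunkSize) {
        final ChunkingSubscriber<Integer> subscriber = new ChunkingSubscriber<>(chunkSize);
        try (SubmissionPublisher<Integer> publisher = new SubmissionPublisher<>()) {
            publisher.subscribe(subscriber);
            items.forEach(publisher::submit);
        } // close() signals onComplete after the buffered items are delivered
        return subscriber.completion().toCompletableFuture().join();
    }
}
```

The end of the chunked step is again a CompletionStage, so it composes with batchlet-style steps that are plain functions returning a CompletionStage.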

To rephrase this part: once you have a reactive API you don't need a synchronous API.

A job definition is a supplier or a function of CompletionStage<BatchResult>, which represents the end of the job. Which of the two depends on whether we inject the configuration or let users read it from their own environment, and on how we want to wrap the CompletionStage instances to instrument them.

In CDI land it can look like an observer. This sample uses instrumentation from the root instance, which is probably the simplest, but it is just there to illustrate one possible API:
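A minimal sketch of the "function" variant, to illustrate the idea only: BatchResult is modeled here as a made-up enum, the configuration is a plain String, and JobAsFunctionDemo is a hypothetical name. The runtime owns the root stage, so it decides when the job starts and can decorate the returned stage:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.function.Function;

class JobAsFunctionDemo {
    // Hypothetical result type; the thread does not define BatchResult.
    enum BatchResult { COMPLETED, FAILED }

    // The job definition: receives the root stage carrying the configuration
    // and returns the stage that represents the end of the job.
    static final Function<CompletionStage<String>, CompletionStage<BatchResult>> JOB =
            root -> root
                    .thenApply(String::trim)
                    .thenApply(input -> input.isEmpty() ? BatchResult.FAILED : BatchResult.COMPLETED);

    // The runtime side: creates the root stage, lets the definition build on
    // it, instruments the end of the job, then triggers the execution.
    static BatchResult run(String config) {
        final CompletableFuture<String> root = new CompletableFuture<>();
        final CompletionStage<BatchResult> end = JOB.apply(root)
                .exceptionally(error -> BatchResult.FAILED); // map any failure to a status
        root.complete(config); // nothing runs before this point
        return end.toCompletableFuture().join();
    }
}
```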

    void defineMyBatch(@Observes BatchDefinitionCollector collector) {
        collector.register("my-batch", root -> root
            .thenApply(myCustomBean::readDataSinceLastTime) // will use @Inject EntityManager em; of the app
            .thenApply(myOtherbean::process) // we don't care much as jbatch what it does but it is functionally a processor
            .thenApply(this::toBatchResult));
    }

If we want to be more explicit, extending the default SE API, it can look like:

    // collector is the aggregator enabling to initialize the job repository, reverse pattern is to not initialize it and lookup the instance at usage but it means we can't list the available batches which is a blocker for admins/ops
    void defineMyBatch(@Observes BatchDefinitionCollector collector) {
        collector.register("my-batch", root -> root
            .thenApply("step1", myCustomBean::readDataSinceLastTime) // will use @Inject EntityManager em; of the app and uses the app state (last execution time)
            .thenCompose("step2", myOtherbean::process) // we don't care much as jbatch what it does but it is functionally a processor and signature can look like CompletionStage<X> process(List<Data> list);
            .thenApply("step3", this::toBatchResult));
    }
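To give an idea of what could sit behind such named operators, here is a self-contained sketch. Everything in it is an assumption made for illustration: StepFlow, the in-memory journal standing in for the job repository, and this simplified BatchDefinitionCollector (no CDI, results fixed to String) are not an existing API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Function;

// Hypothetical CompletionStage wrapper whose operators carry a step name.
class StepFlow<T> {
    private final CompletionStage<T> stage;
    private final List<String> journal; // stands in for the job repository

    StepFlow(CompletionStage<T> stage, List<String> journal) {
        this.stage = stage;
        this.journal = journal;
    }

    <R> StepFlow<R> thenApply(String step, Function<? super T, ? extends R> fn) {
        final CompletionStage<R> next = stage.thenApply(value -> {
            journal.add(step + ":STARTED");
            final R result = fn.apply(value);
            journal.add(step + ":COMPLETED");
            return result;
        });
        return new StepFlow<>(next, journal);
    }

    <R> StepFlow<R> thenCompose(String step, Function<? super T, ? extends CompletionStage<R>> fn) {
        final CompletionStage<R> next = stage.thenCompose(value -> {
            journal.add(step + ":STARTED");
            final CompletionStage<R> inner = fn.apply(value);
            // record the outcome once the asynchronous step is done
            return inner.whenComplete((result, error) ->
                    journal.add(step + (error == null ? ":COMPLETED" : ":FAILED")));
        });
        return new StepFlow<>(next, journal);
    }

    CompletionStage<T> unwrap() {
        return stage;
    }
}

// Hypothetical collector: knows every batch up front, so they can be listed.
class BatchDefinitionCollector {
    private final Map<String, Function<StepFlow<Void>, StepFlow<String>>> definitions = new LinkedHashMap<>();

    void register(String name, Function<StepFlow<Void>, StepFlow<String>> definition) {
        definitions.put(name, definition);
    }

    List<String> batchNames() {
        return new ArrayList<>(definitions.keySet());
    }

    // Runs a batch to completion and returns the recorded step events.
    List<String> run(String name) {
        final List<String> journal = new CopyOnWriteArrayList<>();
        final CompletableFuture<Void> root = new CompletableFuture<>();
        final CompletionStage<String> end = definitions.get(name).apply(new StepFlow<>(root, journal)).unwrap();
        root.complete(null); // start the flow
        end.toCompletableFuture().join();
        return journal;
    }
}

class NamedStepsDemo {
    // Asynchronous "processor" with the CompletionStage<X> process(List<Data>) shape.
    static CompletionStage<Integer> countAsync(List<Integer> data) {
        return CompletableFuture.supplyAsync(data::size);
    }

    static List<String> run() {
        final BatchDefinitionCollector collector = new BatchDefinitionCollector();
        collector.register("my-batch", root -> root
                .thenApply("step1", ignored -> List.of(1, 2, 3))
                .thenCompose("step2", NamedStepsDemo::countAsync)
                .thenApply("step3", count -> "processed " + count));
        return collector.run("my-batch");
    }
}
```

Because the wrapper sees every named operator, it is the natural place to persist step state or to check for a stop request before running the next step.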

Of course, which bean does what can be configured. Very concretely, I can register a camel route as the "process" method, and even split it by component to map components/processors onto steps, so that jbatch state tracking reflects the full camel execution.

The gains are clear:

1. it works with modern/current technology stacks

2. it integrates smoothly, and optionally iteratively (in terms of adoption), with any framework by no longer being a container but a set of extension points (almost a library)

The drawback, of course, is that it makes us rethink the whole design. But from what I saw, adoption was very low when 1.0 came out, and it got almost abandoned around the time microservices were adopted, due to the new programming style Java took on in the meantime. So I think it is worth evaluating this and proposing to cover this scope, even if it means a new spec and the jbatch 1.x API style moving to maintenance mode, as CMP did when JPA appeared.

Hope it is a bit clearer.

On Wed, 24 Mar 2021 at 17:56, Reza Rahman <reza_rahman@xxxxxxxxx> wrote:

I think the best approach here is an actual working implementation that has some adoption, ideally by a major vendor in the batch processing space. Until then, it feels far too premature to be talking about putting any of this into a specification. With the limited bandwidth we all have, it’s best we focus on some relatively modest enhancements to Jakarta Batch for now.

Reza Rahman
Jakarta EE Ambassador, Author, Blogger, Speaker

Please note views expressed here are my own as an individual community member and do not reflect the views of my employer.

On Mar 24, 2021, at 12:01 PM, Scott Kurz <skurz@xxxxxxxxxx> wrote:

Hi Romain,

Your idea to make batch more "reactive" seemed worth breaking out into a separate thread.

Thanks for offering to explain more. It's hard for me to comment since this isn't very concrete to me.

If I had to paraphrase, it sounds like you're talking about "rebasing" the API to take advantage of patterns made possible by Java SE developments like Java 8 Streams (java.util.stream.*), the Concurrency APIs (CompletionStage/CompletableFuture), and lambdas when appropriate, that would also facilitate async NIO and more "reactive" usages.

Would it be possible for you to point to an example of a "job" (batch-like flow) constructed as you were describing, but without the batch container-provided persistence and "controller" (providing stop) ?

I don't think I have a great mental model here but I wonder.. could we imagine a series of options like:

1. static, XML job definition
2. programmatic, synchronous job definition
3. programmatic, async / reactive definition

The first we have today; the second provides a similar flow between container and batch artifacts, and the third more aligns with your description.

If so, perhaps these all could fit together in one spec. But maybe premature to speculate.

But I think we need some sample code snippets (if not prototype) to give us something to focus on and discuss further?

Thanks,
------------------------------------------------------
Scott Kurz
WebSphere / Open Liberty Batch and Developer Experience
skurz@xxxxxxxxxx
--------------------------------------------------------

"jakartabatch-dev" <jakartabatch-dev-bounces@xxxxxxxxxxx> wrote on 03/23/2021 11:07:54 AM:

> From: Romain Manni-Bucau <rmannibucau@xxxxxxxxx>
> To: jakartabatch developer discussions <jakartabatch-dev@xxxxxxxxxxx>
> Date: 03/23/2021 11:08 AM
> Subject: [EXTERNAL] Re: [jakartabatch-dev] Kick off conversation on next Jakarta Batch release - Jakarta EE 10 ? Batch + CDI integration?
> Sent by: "jakartabatch-dev" <jakartabatch-dev-bounces@xxxxxxxxxxx>
>
> Hi Scott,
>
> Maybe coming from nowhere but personally I'd like jbatch 2 to not inherit from jbatch 1 and be reactive driven.
> The overall idea is to be able to use custom flows more easily (by combining CompletionStages).
> This has a ton of advantages:
>
> 1. Makes the batch writer responsible for the flow (rather than a mix between the runtime and the writer),
> 2. Keeps the composability of existing batch "components",
> 3. Enables working with any IoC without having to do any integration (the batch writer injects what it needs),
> 4. No need for any batch/job repository (in code veritas est ;)),
> 5. Enables being more efficient in the more and more common "remoting first" batches we meet with microservices (i.e. leverage NIO more than in the current synchronous API),
> 6. Makes testing way easier: myBatch.userStart().toCompletableFuture().get();
>
> The main missing piece is the persistence of the steps + overall batch state(s) and some controller (to stop a running batch for example).
>
> Indeed JBatch 2 can be a BatchController { saveStepState(...); saveBatchState(...) } etc but thinking more in terms of events sounds more natural (we can start with CDI events since it is about jakarta, but implementations can wire events to whatever they desire).
>
> To rephrase it a bit more clearly: JBatch wouldn't be a runtime with very low value as of today, but the actual infrastructure API (persistence mainly and state "isBatch(key, STOPPED)").
>
> In terms of API the only small challenges are to enable injecting these persistent steps properly into a promise flow, and probably to make it easier to orchestrate chunking and so on (the Stream API kind of enables it but it misses some Iteration object to make it easier), but overall it does not sound crazy:
>
> final var batch = jbatch.startingBatch("my-batch");
> final var batchPromise = batch.wrapStep("step 1 name", fetchData())
>     .thenApply(read -> batch.wrapStep("step 2 name", processData(read)));
>
> and so on.

_______________________________________________
jakartabatch-dev mailing list
jakartabatch-dev@xxxxxxxxxxx
To unsubscribe from this list, visit https://www.eclipse.org/mailman/listinfo/jakartabatch-dev

Follow-Ups:
- Re: [jakartabatch-dev] "Reactive Batch" ideas
  - From: Michael Minella

References:
- [jakartabatch-dev] "Reactive Batch" ideas
  - From: Scott Kurz
- Re: [jakartabatch-dev] "Reactive Batch" ideas
  - From: Reza Rahman
