Re: [recommenders-dev] [Jayes] Jayes and Apache Spark Integration
Hi Ekin,
There is no integration with Spark that I know of. The algorithm used is basically message passing, so it could be distributed, and also parallelized to a degree. But you would have to rewrite a lot, if not most, of the code.
But there are a few caveats:
- Nodes have different sizes: central nodes with many parents take up most of the space, so distributing the nodes may not achieve anything, for either space or time, depending on your model. The code recommender models had that property at the time I wrote Jayes. In other cases, of course, this may work.
- Double-precision values are used in the computations, and a lot of multiplication happens, so values can underflow, which results in a NumericalInstabilityException. Larger models are more at risk because more small values get multiplied together. I would expect a Spark implementation to handle this better (for example, by using a different number type) in order to support the large models it is meant for. Not all big models will necessarily have that problem, but it's something to be aware of.
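To make the first caveat concrete: a node's conditional probability table has one entry per combination of its own outcome and its parents' outcomes, so it grows exponentially with the number of parents (and the junction-tree clique holding a node together with its parents is at least that large). A minimal sketch of that arithmetic, using made-up node sizes rather than any real model:

```java
import java.util.Collections;
import java.util.List;

public class CptSize {
    // Entries in a node's conditional probability table:
    // its own outcome count times the outcome count of every parent.
    static long cptEntries(int outcomes, List<Integer> parentOutcomes) {
        long size = outcomes;
        for (int p : parentOutcomes) {
            size = Math.multiplyExact(size, p); // throws on overflow instead of wrapping
        }
        return size;
    }

    public static void main(String[] args) {
        // Hypothetical model: a binary leaf with one binary parent,
        // versus a central binary node with 20 binary parents.
        long leaf = cptEntries(2, List.of(2));
        long central = cptEntries(2, Collections.nCopies(20, 2));
        System.out.println("leaf entries:    " + leaf);     // 4
        System.out.println("central entries: " + central);  // 2097152 (2^21)
        // At 8 bytes per double, the central table alone is 16 MiB.
        System.out.println("central bytes:   " + central * Double.BYTES);
    }
}
```

With numbers like these, one node dominates the memory, so spreading nodes across machines would leave one machine doing nearly all the work.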
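The underflow in the second caveat is easy to reproduce, and one common remedy is to keep values in log space. The sketch below is a generic illustration of that technique, not something Jayes does (Jayes, as described above, reports the problem with a NumericalInstabilityException). Because marginalization adds probabilities as well as multiplying them, a log-space implementation also needs a log-sum-exp helper:

```java
public class UnderflowDemo {
    // Multiplies n copies of p directly in double precision.
    static double directProduct(double p, int n) {
        double product = 1.0;
        for (int i = 0; i < n; i++) product *= p;
        return product;
    }

    // The same product, accumulated as a sum of natural logarithms.
    static double logProduct(double p, int n) {
        double logSum = 0.0;
        for (int i = 0; i < n; i++) logSum += Math.log(p);
        return logSum;
    }

    // Computes log(exp(a) + exp(b)) without leaving log space.
    static double logSumExp(double a, double b) {
        double max = Math.max(a, b);
        if (max == Double.NEGATIVE_INFINITY) return max; // both terms are log(0)
        return max + Math.log(Math.exp(a - max) + Math.exp(b - max));
    }

    public static void main(String[] args) {
        // The true value is 1e-400, below the smallest positive double (about 4.9e-324).
        System.out.println(directProduct(0.1, 400));             // prints 0.0
        System.out.println(logProduct(0.1, 400) / Math.log(10)); // roughly -400
        System.out.println(Math.exp(logSumExp(Math.log(0.25), Math.log(0.5)))); // roughly 0.75
    }
}
```

Log space costs extra `exp`/`log` calls. Rescaling (normalizing) messages after each step is another common option; which one suits a Spark port would need to be measured.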
The design decisions for Jayes:
- Trade space for performance (things are cached, etc.)
- Exact inference
- One inference at a time (JunctionTreeAlgorithm is not thread-safe)
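Since the inference object is not thread-safe, a caller that wants concurrent queries would need a separate instance per thread. A generic sketch of that pattern with `ThreadLocal` follows. `NonThreadSafeEngine` and its methods are hypothetical stand-ins for any stateful inference object; they are not the Jayes API:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PerThreadEngine {
    // Hypothetical stand-in for a stateful, non-thread-safe inference object.
    static class NonThreadSafeEngine {
        private final Map<String, String> evidence = new HashMap<>();
        void setEvidence(String node, String outcome) { evidence.put(node, outcome); }
        void clear() { evidence.clear(); }
        String query() { return evidence.toString(); }
    }

    // Each thread lazily gets its own engine, so no mutable state is shared.
    static final ThreadLocal<NonThreadSafeEngine> ENGINE =
            ThreadLocal.withInitial(NonThreadSafeEngine::new);

    static String infer(String node, String outcome) {
        NonThreadSafeEngine engine = ENGINE.get();
        engine.clear(); // drop evidence left over from this thread's previous query
        engine.setEvidence(node, outcome);
        return engine.query();
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<String>> results = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            final int id = i;
            results.add(pool.submit(() -> infer("n" + id, "true")));
        }
        for (Future<String> f : results) System.out.println(f.get());
        pool.shutdown();
    }
}
```

Because Jayes trades space for performance, each per-thread copy also carries its own caches, so memory use grows with the number of threads.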
Also, regarding the memory leak: I would be surprised if there was one. There are not many places where something could leak, and memory consumption was tested. Then again, feel free to prove me wrong ;-)
The error looks like the model is just too big.
The first place you should look is your model: can it be simplified? Are there independence relations that you can somehow exploit?
Hope this helps a little. If you decide to try to port Jayes to Spark, I would be interested in hearing about the results and your experience.
Regards, Michael