
Re: [triquetrum-dev] Hadoop support in Triquetrum [Was Re: Introduction]

Good question.

https://wiki.eclipse.org/Triquetrum/FAQ#What_is_the_relationship_between_Triquetrum_and_Kepler.3F says

What is the relationship between Triquetrum and Kepler?

Ptolemy II is the execution engine used by both Triquetrum and the Kepler Scientific Workflow System. We see Triquetrum as a possible way forward for Kepler to use a more common build infrastructure and to otherwise leverage the Eclipse ecosystem. One of the goals of Triquetrum is for Triquetrum to be able to open at least some Kepler models.

In theory, there is nothing stopping Triquetrum from being able to open all the Kepler actors.

In practice, there are some limitations.

The first issue is that Kepler uses a compressed format for storing models and other files.  However, Kepler can export models as plain .moml files, which can be read by Triquetrum.  Some Kepler-specific annotations, such as provenance, would not be imported into Triquetrum until someone did the necessary conversion.
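For example, once a model has been exported as plain MoML, Ptolemy II's MoMLParser can read it back in.  Here is a minimal sketch; the file name is just a placeholder:

    import ptolemy.kernel.util.NamedObj;
    import ptolemy.moml.MoMLParser;

    public class ReadExportedModel {
        public static void main(String[] args) throws Exception {
            // Parse a model that was exported from Kepler as plain (uncompressed) MoML.
            MoMLParser parser = new MoMLParser();
            NamedObj top = parser.parseFile("exportedFromKepler.xml");
            System.out.println("Parsed model: " + top.getFullName());
        }
    }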

There could be UI issues in the dialogs.

Kepler uses SVG for icons, and Triquetrum's icon rendering supports only a subset of SVG.

I'm not sure if we want to support all Kepler actors because not all Kepler actors are robust and useful. 

Next week,  I want to try getting some parts of Kepler working in Triquetrum.  My guess is that it will be fairly straightforward to get something to work.

Adding individual actors (Matlab, which is a Ptolemy II actor, or R, which is a Kepler actor) from Kepler to Triquetrum will probably not be that hard.  It is primarily a question of adding the Kepler classes to the classpath and updating the Triquetrum actor palette.  However, there will likely be complications.
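To make that concrete, here is a hypothetical sketch of what instantiating a Kepler actor by class name could look like once the Kepler jars are on the classpath.  The R actor's class name below is an assumption; check the Kepler sources for the exact name:

    import ptolemy.actor.TypedCompositeActor;
    import ptolemy.kernel.util.Workspace;
    import ptolemy.moml.MoMLChangeRequest;

    public class AddKeplerActor {
        public static void main(String[] args) throws Exception {
            TypedCompositeActor top = new TypedCompositeActor(new Workspace("top"));
            // Assumed class name for Kepler's R actor; this only resolves if the
            // Kepler jars are on the classpath.
            String moml = "<entity name=\"R\" class=\"org.ecoinformatics.seek.R.RExpression\"/>";
            top.requestChange(new MoMLChangeRequest(AddKeplerActor.class, top, moml));
            System.out.println("Entities in the model: " + top.entityList());
        }
    }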

If an actor uses only the Ptolemy classes, then it should work in Ptolemy, Kepler and Triquetrum.  Kepler actors that use other Kepler classes will probably work as well, provided those classes are on the classpath.  So, to answer your question: yes, the Ptolemy engine does make actors reusable across Ptolemy, Kepler and Triquetrum.
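As an illustration, here is a minimal sketch of an actor that depends only on the Ptolemy II classes, so it should be usable unchanged from Ptolemy, Kepler or Triquetrum (the actor itself is made up for this example):

    import ptolemy.actor.TypedAtomicActor;
    import ptolemy.actor.TypedIOPort;
    import ptolemy.data.StringToken;
    import ptolemy.data.type.BaseType;
    import ptolemy.kernel.CompositeEntity;
    import ptolemy.kernel.util.IllegalActionException;
    import ptolemy.kernel.util.NameDuplicationException;

    public class HelloActor extends TypedAtomicActor {
        public TypedIOPort output;

        public HelloActor(CompositeEntity container, String name)
                throws IllegalActionException, NameDuplicationException {
            super(container, name);
            output = new TypedIOPort(this, "output", false, true);
            output.setTypeEquals(BaseType.STRING);
        }

        @Override
        public void fire() throws IllegalActionException {
            super.fire();
            // Produce one string token each time the director fires this actor.
            output.send(0, new StringToken("Hello from a Ptolemy-only actor"));
        }
    }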

_Christopher



On 5/24/16 9:08 AM, Michele Gabusi wrote:

Thank you,

great, I'll take a look soon.

Just a question: when you say that you aim to support Kepler, do you mean that in the future one could import a Kepler project and execute it by also importing all the actors defined by Kepler (such as MapReduce, R, Matlab, etc.), or do you aim to support Kepler's DDP engine?

Honestly, it's not completely clear to me whether the Ptolemy engine makes the same actors fully reusable (out of the box) in Ptolemy, Triquetrum and Kepler.

Thanks again,

Cheers,

Michele.
On 24/05/2016 17:07, Christopher Brooks wrote:
Hi Michele,
Right, I remember meeting you.  I've taken the liberty of cc'ing the Triquetrum mailing list.

At this time, Triquetrum does not include direct support for Hadoop.

Triquetrum uses Ptolemy II as its execution engine.  The Kepler Scientific Workflow System (https://kepler-project.org/) also uses Ptolemy II as its execution engine.  (BTW - when I refer to Kepler, I'm referring to the Kepler Scientific Workflow System, which predates Kepler the Eclipse release and probably predates the Kepler Lua package.)

Kepler (Scientific Workflow System) does have support for Hadoop, see

http://users.sdsc.edu/~jianwu/JianwuWang_files/ICCS-bioKepler.pdf

https://kepler-project.org/developers/interest-groups/distributed/configuring-hadoop-for-biokepler-or-ddp-suite

My understanding of how Kepler's Distributed Data-Parallel (DDP) suite works is that it presents a facade over the different big data systems.  If I remember correctly, there are directors, such as the Stratosphere Director, that support a limited set of data types.  These directors handle the glue to the different big data systems.
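Conceptually (this is just the facade idea, not Kepler's actual API), the pattern looks something like the following, where the same engine-neutral job description gets dispatched to whichever back end is configured:

    import java.util.List;
    import java.util.Map;

    // Conceptual sketch of the DDP facade idea; the interface and classes here
    // are hypothetical and do not correspond to actual Kepler classes.
    interface DdpEngine {
        List<String> run(List<String> records, Map<String, String> config);
    }

    class HadoopEngine implements DdpEngine {
        public List<String> run(List<String> records, Map<String, String> config) {
            // Translate the engine-neutral job description into a Hadoop job here.
            return records;
        }
    }

    class StratosphereEngine implements DdpEngine {
        public List<String> run(List<String> records, Map<String, String> config) {
            // Translate the same description into a Stratosphere (now Apache Flink) job here.
            return records;
        }
    }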

Our goal with Triquetrum is to be able to support Kepler, but I have not yet tried it.

Triquetrum does support adding actors; see https://wiki.eclipse.org/Triquetrum/Extending_Triquetrum

One thing that might be missing: Kepler has a tabbed parameter editor that Triquetrum does not yet support.


Two things are happening soon about Triquetrum:

On Tuesday, June 7, I'll be giving a poster about Triquetrum and Kepler at ICCS in San Diego.

On Wednesday, June 8, Erwin will be presenting about Triquetrum at EclipseCon France: https://www.eclipsecon.org/france2016/session/triquetrum-integrating-workflows-scientific-software

I'll see about trying to get the Kepler DDP work to run in Triquetrum next week, but I make no promises.

_Christopher






On 5/24/16 7:46 AM, Michele Gabusi wrote:

Dear Christopher,

I hope you remember me and our quick meeting at EclipseCon NA 2016.

Jay Jay kindly introduced me and my company (Engineering Group, an Italian IT company) as new guest members of the Eclipse Science WG.  Our core business today focuses on cloud infrastructure and analytics development (Python and R analytics on Hadoop).  From our quick conversation at EclipseCon, I gathered that the Triquetrum project did not yet support any actors or connectors for the Hadoop environment.  If I remember correctly, code generation for (big) data processing workflows on the Hadoop ecosystem was not implemented at that time.  Did I understand correctly?  I was wondering whether you have collected any contributions or feedback on that side during recent months.

In addition, some interesting platforms such as Apache NiFi or Talend Big Data have recently grown up and seem quite promising.  Do you see Big Data/Hadoop integration as an appealing direction for Triquetrum development?

Thank you,

All the best,

Michele.


....

--

Michele Gabusi
Data Mining and Business Analytics
Big Data Competency Center
michele.gabusi@xxxxxx

Engineering Ingegneria Informatica
Corso Stati Uniti, 23/C - 35127 Padova - Italy
Tel. +39-049.8283549
Fax +39-049.8283569
www.eng.it


-- 
Christopher Brooks, PMP                       University of California
Academic Program Manager & Software Engineer  US Mail: 337 Cory Hall
CHESS/iCyPhy/Ptolemy/TerraSwarm               Berkeley, CA 94720-1774
cxh@xxxxxxxxxxxxxxxxx, 707.332.0670           (Office: 545Q Cory)
