Re: [triquetrum-dev] Hadoop support in Triquetrum [Was Re: Introduction]

From: Christopher Brooks <cxh@xxxxxxxxxxxxxxxxx>
Date: Wed, 1 Jun 2016 17:54:34 -0700

OK, I spent some time on this. See https://github.com/eclipse/triquetrum/issues/84 and https://wiki.eclipse.org/Triquetrum/Kepler

There are two issues:

1) The ExecutionChoice actor does not properly split the action code from the UI code, so it has dependencies on classes like KeplerGraphFrame, which extends ActorGraphFrame. ActorGraphFrame is not a part of Triquetrum.

2) A number of classes appear not to be exported from the Ptolemy modules.

#1 is a pretty big deal and would require refactoring the Kepler Execution Choice Actor. I won't have the time to do that right now.
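
One common way to approach the kind of split described in issue 1 is to have the actor code depend on a small interface instead of a concrete frame class. The following is only a minimal sketch of that idea, not the actual Kepler or Triquetrum code: ExecutionChoiceView, ExecutionChoiceLogic and RecordingView are hypothetical names invented for illustration, and the real Execution Choice actor is likely more involved.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical callback interface: the only thing the actor knows about any UI. */
interface ExecutionChoiceView {
    void showSubWorkflow(String choiceName);
}

/** Hypothetical actor-side logic. It never references a concrete frame class. */
class ExecutionChoiceLogic {
    private final List<String> choices = new ArrayList<>();
    private ExecutionChoiceView view; // stays null when running without a UI

    void setView(ExecutionChoiceView view) {
        this.view = view;
    }

    /** Registers a choice and notifies the view only if one is attached. */
    void addChoice(String name) {
        choices.add(name);
        if (view != null) {
            view.showSubWorkflow(name);
        }
    }

    /** Returns the selected choice, or fails if it was never registered. */
    String select(String name) {
        if (!choices.contains(name)) {
            throw new IllegalArgumentException("Unknown choice: " + name);
        }
        return name;
    }
}

/** Stand-in for a UI implementation; it just records what it was asked to show. */
class RecordingView implements ExecutionChoiceView {
    final List<String> shown = new ArrayList<>();

    @Override
    public void showSubWorkflow(String choiceName) {
        shown.add(choiceName);
    }
}

class ExecutionChoiceSketch {
    public static void main(String[] args) {
        // Headless use: no view attached, the logic still works.
        ExecutionChoiceLogic headless = new ExecutionChoiceLogic();
        headless.addChoice("local");
        System.out.println(headless.select("local"));

        // With a view attached, the UI side is notified through the interface.
        RecordingView view = new RecordingView();
        ExecutionChoiceLogic withUi = new ExecutionChoiceLogic();
        withUi.setView(view);
        withUi.addChoice("hadoop");
        System.out.println(view.shown);
    }
}
```

With this shape, a host that has no equivalent of a given frame class could supply its own view implementation, or none at all.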

_Christopher

On 5/24/16 9:22 AM, Christopher Brooks wrote:

Good question.

https://wiki.eclipse.org/Triquetrum/FAQ#What_is_the_relationship_between_Triquetrum_and_Kepler.3F says

What is the relationship between Triquetrum and Kepler?

Ptolemy II is the execution engine used by both Triquetrum and the Kepler Scientific Workflow System. We see Triquetrum as a possible way forward for Kepler to use a more common build infrastructure and to otherwise leverage the Eclipse ecosystem. One of the goals of Triquetrum is for Triquetrum to be able to open at least some Kepler models.

In theory, there is nothing stopping Triquetrum from being able to open all the Kepler actors.

In practice, there are some limitations.

The first issue is that Kepler uses a compressed format for storing models and other files. However, Kepler can export models as .moml files, which can be read in by Triquetrum. Some Kepler-specific annotations such as provenance would not be imported into Triquetrum until someone did the necessary conversion.

There could be UI issues in the dialogs.

Kepler uses SVG for icons. The Triquetrum icons support a subset of SVG.

I'm not sure if we want to support all Kepler actors because not all Kepler actors are robust and useful.

Next week, I want to try getting some parts of Kepler working in Triquetrum. My guess is that it will be fairly straightforward to get something to work.

Adding individual actors (Matlab, which is a Ptolemy II actor, or R, which is a Kepler actor) from Kepler to Triquetrum will probably not be that hard. It is primarily a question of adding the Kepler classes to the classpath and updating the Triquetrum actor palette. However, there will likely be complications.
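
As a rough illustration of the classpath half of this, the sketch below checks which candidate actor classes can actually be resolved by a class loader. It uses only the standard Class.forName call; the class name ActorClasspathCheck and the example actor name org.example.kepler.HypotheticalActor are assumptions made up for the example, not real Kepler or Triquetrum classes, and it says nothing about how the Triquetrum palette itself is updated.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ActorClasspathCheck {

    /** Returns the candidate class names that the given loader cannot resolve. */
    static List<String> missingClasses(List<String> candidates, ClassLoader loader) {
        List<String> missing = new ArrayList<>();
        for (String name : candidates) {
            try {
                // initialize=false: resolve the class without running static initializers.
                Class.forName(name, false, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                // LinkageError covers cases such as a superclass that is not available.
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> candidates = Arrays.asList(
                "java.util.ArrayList",                   // present in any JRE
                "org.example.kepler.HypotheticalActor"); // hypothetical, expected to be missing
        ClassLoader loader = ActorClasspathCheck.class.getClassLoader();
        System.out.println("Not loadable: " + missingClasses(candidates, loader));
    }
}
```

A check like this would only report what is visible to one class loader; in an OSGi-based host, visibility also depends on which packages each bundle exports and imports.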

If an actor uses the Ptolemy classes then it should work in Ptolemy, Kepler and Triquetrum. Kepler actors that use other Kepler classes will probably work as well. So, the Ptolemy engine does make actors fully reusable in Ptolemy, Kepler and Triquetrum.

_Christopher

On 5/24/16 9:08 AM, Michele Gabusi wrote:
Thank you,

great, I'll take a look soon.

Just a question: when you say that you aim to support Kepler, do you mean that in the future one may import a Kepler project and execute it just by importing all the actors defined by Kepler as well (such as MapReduce, R, Matlab, etc.), or do you aim to support Kepler's DDP engine?

Honestly, it's not completely clear to me whether the Ptolemy engine makes the same actors fully reusable (out of the box) in Ptolemy, Triquetrum and Kepler.

Thanks again,

Cheers,
Michele.

On 24/05/2016 17:07, Christopher Brooks wrote:
Hi Michele,
Right, I remember meeting you. I've taken the liberty of cc'ing the Triquetrum mailing list.

At this time, Triquetrum does not include direct support for Hadoop.

Triquetrum uses Ptolemy II as its execution engine. The Kepler Scientific Workflow System (https://kepler-project.org/) also uses Ptolemy II as its execution engine. (BTW, when I refer to Kepler, I'm referring to the Kepler Scientific Workflow System, which predates Kepler the Eclipse release, and probably predates the Kepler Lua package.)

Kepler (Scientific Workflow System) does have support for Hadoop, see

http://users.sdsc.edu/~jianwu/JianwuWang_files/ICCS-bioKepler.pdf

https://kepler-project.org/developers/interest-groups/distributed/configuring-hadoop-for-biokepler-or-ddp-suite

My understanding of how Kepler's Distributed Data-Parallel (DDP) works is that it presents a facade for the different big data systems. If I remember correctly, there are directors such as the Stratosphere Director that support a limited set of data types. These directors handle the glue for the different big data systems.

Our goal with Triquetrum is to be able to support Kepler, but I have not yet tried it.

Triquetrum does support adding actors, see https://wiki.eclipse.org/Triquetrum/Extending_Triquetrum

One thing that might be missing is that Kepler has a tabbed parameter editor that is not yet supported.

Two Triquetrum-related events are coming up soon:

On Tuesday, June 7, I'll be presenting a poster about Triquetrum and Kepler at ICCS in San Diego.

On Wednesday, June 8, Erwin will be presenting about Triquetrum at EclipseCon France: https://www.eclipsecon.org/france2016/session/triquetrum-integrating-workflows-scientific-software

I'll see about trying to get the Kepler DDP work to run in Triquetrum next week, but I make no promises.

_Christopher

On 5/24/16 7:46 AM, Michele Gabusi wrote:

Dear Christopher,

I hope you remember me and our quick meeting at EclipseCon NA 2016.

Jay Jay kindly introduced me and my company (Engineering Group, an Italian IT company) as new guest members of the Eclipse Science WG. Our core business today focuses on cloud infrastructure and analytics development (Python and R analytics on Hadoop). At the end of our quick conversation at EclipseCon, I understood that the Triquetrum project did not yet support any actors or connectors for the Hadoop environment. If I remember correctly, code generation for (big) data processing workflows on the Hadoop ecosystem was not implemented at that time. Did I understand correctly? I was wondering whether you have collected any contributions or feedback on that front in recent months.

In addition, some interesting platforms such as Apache NiFi and Talend Big Data have recently matured and seem quite promising. Do you see Big Data/Hadoop integration as an appealing direction for Triquetrum development?

Thank you,

All the best,

Michele.

....

--

Michele Gabusi
Data Mining and Business Analytics
Big Data Competency Center
michele.gabusi@xxxxxx

Engineering Group
Corso Stati Uniti, 23/C - 35127 Padova - Italy
Tel. +39-049.8283549
Fax +39-049.8283569
www.eng.it
-- 
Christopher Brooks, PMP                       University of California
Academic Program Manager & Software Engineer  US Mail: 337 Cory Hall
CHESS/iCyPhy/Ptolemy/TerraSwarm               Berkeley, CA 94720-1774
cxh@xxxxxxxxxxxxxxxxx, 707.332.0670           (Office: 545Q Cory)
