[triquetrum-dev] Hadoop support in Triquetrum [Was Re: Introduction]


From: Christopher Brooks <cxh@xxxxxxxxxxxxxxxxx>
Date: Tue, 24 May 2016 08:07:51 -0700
Delivered-to: triquetrum-dev@xxxxxxxxxxx
List-archive: <https://dev.eclipse.org/mailman/private/triquetrum-dev>

Hi Michele,
Right, I remember meeting you. I've taken the liberty of cc'ing the Triquetrum mailing list.

At this time, Triquetrum does not include direct support for Hadoop.

Triquetrum uses Ptolemy II as its execution engine. The Kepler Scientific Workflow System (https://kepler-project.org/) also uses Ptolemy II as its execution engine. (BTW, when I refer to Kepler, I mean the Kepler Scientific Workflow System, which predates Kepler the Eclipse release and probably predates the Kepler Lua package.)

Kepler (Scientific Workflow System) does have support for Hadoop; see:

http://users.sdsc.edu/~jianwu/JianwuWang_files/ICCS-bioKepler.pdf

https://kepler-project.org/developers/interest-groups/distributed/configuring-hadoop-for-biokepler-or-ddp-suite

My understanding of how Kepler's Distributed Data-Parallel (DDP) works is that it presents a facade for the different big data systems. If I remember correctly, there are directors such as the Stratosphere Director that support a limited set of data types. These directors handle the glue for the different big data systems.

Our goal with Triquetrum is to be able to support Kepler, but I have not yet tried it.

Triquetrum does support adding actors; see https://wiki.eclipse.org/Triquetrum/Extending_Triquetrum

One thing that might be missing: Kepler has a tabbed parameter editor, which Triquetrum does not yet support.

Two Triquetrum-related events are coming up soon:

On Tuesday, June 7, I'll be presenting a poster about Triquetrum and Kepler at ICCS in San Diego.

On Wednesday, June 8, Erwin will be presenting about Triquetrum at EclipseCon France: https://www.eclipsecon.org/france2016/session/triquetrum-integrating-workflows-scientific-software

I'll see about trying to get the Kepler DDP work to run in Triquetrum next week, but I make no promises.

_Christopher

On 5/24/16 7:46 AM, Michele Gabusi wrote:

Dear Christopher,

I hope you remember me and our quick meeting at EclipseCon NA 2016.

Jay Jay kindly introduced me and my company (Engineering Group, an Italian IT company) as new guest members of the Eclipse Science WG. Our core business today focuses on Cloud infrastructures & Analytics development (Python and R analytics on Hadoop). At the end of our quick conversation at EclipseCon, I understood that the Triquetrum project did not yet support any actors or connectors for the Hadoop environment. If I remember correctly, code generation for (Big) data processing workflows on the Hadoop ecosystem was not implemented at that time. Did I understand correctly? I was wondering whether you have received any contributions or feedback on that front in recent months.

In addition, some interesting platforms such as Apache NiFi and Talend Big Data have emerged recently and look quite promising. Do you see Big Data/Hadoop integration as an appealing direction for Triquetrum development?

Thank you,

All the best,

Michele.

....

--

Michele Gabusi
Data Mining and Business Analytics
Big Data Competency Center
michele.gabusi@xxxxxx

Engineering Group
Corso Stati Uniti, 23/C - 35127 Padova - Italy
Tel. +39-049.8283549
Fax +39-049.8283569
www.eng.it

-- 
Christopher Brooks, PMP                       University of California
Academic Program Manager & Software Engineer  US Mail: 337 Cory Hall
CHESS/iCyPhy/Ptolemy/TerraSwarm               Berkeley, CA 94720-1774
cxh@xxxxxxxxxxxxxxxxx, 707.332.0670           (Office: 545Q Cory)
