Hi Michele,
Right, I remember meeting you. I've taken the liberty of cc'ing the
Triquetrum mailing list.
At this time, Triquetrum does not include direct support for Hadoop.
Triquetrum uses Ptolemy II as its execution engine. The Kepler
Scientific Workflow System (https://kepler-project.org/) also uses
Ptolemy II as its execution engine. (BTW - when I refer to Kepler,
I'm referring to the Kepler Scientific Workflow System, which
predates Kepler the Eclipse release, and probably predates the
Kepler Lua package.)
Kepler (Scientific Workflow System) does have support for Hadoop;
see:
http://users.sdsc.edu/~jianwu/JianwuWang_files/ICCS-bioKepler.pdf
https://kepler-project.org/developers/interest-groups/distributed/configuring-hadoop-for-biokepler-or-ddp-suite
My understanding of how Kepler's Distributed Data-Parallel (DDP)
support works is that it presents a facade for the different big
data systems. If I remember correctly, there
are directors such as the Stratosphere Director that support a
limited set of data types. These directors handle the glue for the
different big data systems.
Our goal with Triquetrum is to be able to support Kepler, but I have
not yet tried it.
Triquetrum does support adding actors; see
https://wiki.eclipse.org/Triquetrum/Extending_Triquetrum
One thing that might be missing is that Kepler has a tabbed
parameter editor that is not yet supported.
Two Triquetrum events are coming up soon:
On Tuesday, June 7, I'll be presenting a poster about Triquetrum and
Kepler at ICCS in San Diego.
On Wednesday, June 8, Erwin will be presenting Triquetrum at
EclipseCon France:
https://www.eclipsecon.org/france2016/session/triquetrum-integrating-workflows-scientific-software
I'll see about trying to get the Kepler DDP work to run in
Triquetrum next week, but I make no promises.
_Christopher
On 5/24/16 7:46 AM, Michele Gabusi wrote:
Dear Christopher,
I hope you remember me and our quick meeting at EclipseCon NA
2016.
Jay Jay kindly introduced me and my company (Engineering Group,
an Italian IT company) as new guest members of the Eclipse Science
WG. Our core business today focuses on Cloud infrastructure and
Analytics development (Python and R analytics on Hadoop). At the
end of our quick conversation at EclipseCon, I saw that the
Triquetrum project did not yet support any actors or connectors
for the Hadoop environment. If I remember correctly, code
generation for (Big) data processing workflows on the Hadoop
ecosystem was not implemented at that time. Did I understand
correctly? I was wondering whether you have collected any
contributions or feedback on that side in recent months.
In addition, some interesting platforms such as Apache NiFi and
Talend Big Data have recently matured and seem quite promising.
Do you see Big Data/Hadoop integration as an appealing direction
for Triquetrum development?
Thank you,
All the best,
Michele.
....
--
Michele Gabusi
Data Mining and Business Analytics
Big Data Competency Center
michele.gabusi@xxxxxx
Engineering Group
Corso Stati Uniti, 23/C - 35127 Padova - Italy
Tel. +39-049.8283549
Fax +39-049.8283569
www.eng.it
--
Christopher Brooks, PMP                       University of California
Academic Program Manager & Software Engineer  US Mail: 337 Cory Hall
CHESS/iCyPhy/Ptolemy/TerraSwarm               Berkeley, CA 94720-1774
cxh@xxxxxxxxxxxxxxxxx, 707.332.0670           (Office: 545Q Cory)