The Future of Scientific Workflows and What That Means for Eclipse ICE


Workflows and workflow management systems are widely used to solve scientific problems because they can automate and streamline complex tasks. These scientific workflows can be used to solve big data problems by executing many tasks simultaneously, to bring together human and computer problem-solving abilities in sophisticated analysis workflows, and to conduct large parametric studies for the purposes of quantifying uncertainty or determining sensitivity to changes. Scientific workflows are poised to change the way we solve problems and the ways in which we develop next-generation software, including Eclipse projects such as Eclipse ICE and Triquetrum.


The future of scientific workflows is far from clear, and numerous grand challenges in the field require computer scientists to rethink the way workflows are executed. These include executing workflows across heterogeneous hardware; distributing parts of a workflow across geographic locations; supporting multiple types of science; and seamlessly switching between fast parallel execution, large optimization loops, and gathering human (or AI) feedback at runtime. These challenges must be met in the context of computing systems that have different security policies and authentication mechanisms.


Arguably, no single software development effort could write a workflow engine that addresses all of these issues to the degree needed for future problems. The problem space is vast, its requirements are diverse, and many existing tools already address those requirements in part. In light of these facts, research scientists and members of the Eclipse ICE development team at Oak Ridge National Laboratory have started developing a new version of ICE that supports the same types of workflows as ICE 2.0 but leverages existing workflow management systems for missing functionality.


Eclipse ICE 3.0 ("ICE III") is a full redesign and nearly full re-implementation of the entire ICE platform that uses workflow aggregation and microservices to combine the core ICE framework with other workflow management systems. This builds, in principle, on previous work by the ICE and Triquetrum teams, which showed that ICE's service-based workflow model allows it to aggregate Triquetrum workflows into its own workflow catalog. ICE III will use microservices in lieu of OSGi services, a design decision based on the diversity of languages, libraries, and technologies used in science projects. All public interfaces and the data model in ICE III will be described in the Resource Description Framework (RDF). RDF is a framework that simplifies the process of describing resources of any type and is extensible enough to handle full ontological descriptions of all the workflow execution models that ICE III will process using its own and other workflow engines. The redesign is also an opportunity to address long-standing concerns with ICE 2.0, including a troublesome build system, cross-language compatibility, portability, and scalability.


ICE III represents a radical departure from the previous versions of ICE, but its design is dictated by a bright future for scientific workflows that promises to solve big problems of unprecedented complexity, importance, and value to the community.


About the Author