Introducing EMF-IncQuery: super-fast incremental query evaluation over EMF models

EMF-IncQuery is a state-of-the-art incremental graph query framework based on:

  1. an expressive declarative language that developers can use to capture queries
  2. a scalable query engine to execute live queries over models
  3. extensive automation that seamlessly integrates incremental queries into EMF applications

Why EMF-IncQuery?

The leading industrial modeling ecosystem, the Eclipse Modeling Framework (EMF), provides different ways of querying the contents of models. These approaches range from hand-coded model traversal to high-level declarative constraint languages such as Eclipse OCL. However, industrial experience shows strong evidence of scalability problems in complex query evaluation over large EMF models, and manual query optimization is time-consuming to implement on a case-by-case basis.

To overcome this limitation, the EMF-IncQuery framework proposes to capture queries over EMF models declaratively and to execute them efficiently, without manual coding, using incremental graph pattern matching techniques. The benefits of EMF-IncQuery with respect to the state of the art in querying EMF models include:

  • a high-level declarative query language based on graph patterns, supported by an advanced Xtext-based editor,
  • a highly efficient query engine capable of evaluating queries over models with millions of elements within a few milliseconds,
  • an advanced integrated development environment to ease the construction and validation of model queries.

In addition, EMF-IncQuery efficiently addresses several shortcomings of the EMF API (such as backward navigation or instance enumeration).
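
The idea behind incremental evaluation can be illustrated with a minimal plain-Java sketch. It is not the EMF-IncQuery API and does not represent the engine's actual algorithm: `Element`, `MatchListener` and `LiveQuery` are hypothetical names introduced only for this illustration, and the query is reduced to a condition on a single object, whereas real graph patterns span several interrelated objects. The point is only that the result set is cached and adjusted per model change instead of being recomputed by traversing the whole model.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical model element (assumption for this sketch, not an EMF EObject). */
final class Element {
    final String name;
    int featureCount; // mutable on purpose, to simulate model changes
    Element(String name) { this.name = name; }
}

/** Hypothetical callback interface for result change notifications. */
interface MatchListener<T> {
    void matchAppeared(T element);
    void matchDisappeared(T element);
}

/** Hypothetical live query: keeps a cached result set up to date. */
final class LiveQuery<T> {
    private final Predicate<T> condition;
    private final Set<T> matches = new LinkedHashSet<>();
    private final List<MatchListener<T>> listeners = new ArrayList<>();

    LiveQuery(Predicate<T> condition) { this.condition = condition; }

    void addListener(MatchListener<T> listener) { listeners.add(listener); }

    /** To be called when an element is added to the model. */
    void elementAdded(T e) { elementChanged(e); }

    /** To be called when an element is removed from the model. */
    void elementRemoved(T e) {
        if (matches.remove(e)) {
            for (MatchListener<T> l : listeners) l.matchDisappeared(e);
        }
    }

    /** To be called when an element changes: only this element is re-checked. */
    void elementChanged(T e) {
        boolean matchesNow = condition.test(e);
        if (matchesNow && matches.add(e)) {
            for (MatchListener<T> l : listeners) l.matchAppeared(e);
        } else if (!matchesNow && matches.remove(e)) {
            for (MatchListener<T> l : listeners) l.matchDisappeared(e);
        }
    }

    /** Reading the result is a cache lookup, not a model traversal. */
    Set<T> getAllMatches() { return Collections.unmodifiableSet(matches); }
}
```

In this sketch the cost of a change depends on the changed element only, not on the size of the model, and listeners learn about appearing and disappearing matches as they happen. The trade-off is the memory needed to keep the cached results.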

Writing my first EMF-IncQuery graph patterns

For the query language, we reuse the concept of graph patterns (a concept similar to Datalog/Prolog clauses) as a concise and easy way to specify complex structural model queries. These graph-based queries can capture interrelated constellations of EMF objects, with the following benefits:

  • the language is expressive and provides powerful features such as negation or counting,
  • graph patterns are composable and reusable,
  • queries can be evaluated with great freedom, i.e. input and output parameters can be selected at run-time,
  • some frequently encountered shortcomings of EMF’s interfaces are addressed:
    • easy and efficient enumeration of all instances of a class regardless of location,
    • simple backwards navigation along all kinds of references (even without eOpposite),
    • finding objects based on attribute value.
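
To make the three shortcomings listed above more concrete, here is a minimal plain-Java sketch of the kind of index that can answer such questions without traversing the model. It is an illustration under stated assumptions, not EMF or EMF-IncQuery code: `Node` and `ModelIndex` are hypothetical names, types and references are identified by plain strings, and the only attribute indexed is the name.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for a model object (assumption, not an EMF EObject). */
final class Node {
    final String type;
    final String name;
    Node(String type, String name) { this.type = type; this.name = name; }
}

/** Hypothetical index that is updated on each model change. */
final class ModelIndex {
    private final Map<String, Set<Node>> byType = new HashMap<>();
    private final Map<String, Set<Node>> byName = new HashMap<>();
    // target -> reference name -> sources pointing at the target
    private final Map<Node, Map<String, Set<Node>>> incoming = new HashMap<>();

    void addNode(Node n) {
        byType.computeIfAbsent(n.type, k -> new LinkedHashSet<>()).add(n);
        byName.computeIfAbsent(n.name, k -> new LinkedHashSet<>()).add(n);
    }

    void addReference(Node source, String reference, Node target) {
        incoming.computeIfAbsent(target, k -> new HashMap<>())
                .computeIfAbsent(reference, k -> new LinkedHashSet<>())
                .add(source);
    }

    void removeReference(Node source, String reference, Node target) {
        Map<String, Set<Node>> refs = incoming.get(target);
        if (refs != null && refs.containsKey(reference)) {
            refs.get(reference).remove(source);
        }
    }

    /** Enumerates all instances of a type, regardless of where they are contained. */
    Set<Node> instancesOf(String type) {
        return byType.getOrDefault(type, Collections.emptySet());
    }

    /** Backward navigation: who points at the target along the given reference? */
    Set<Node> sourcesOf(Node target, String reference) {
        return incoming.getOrDefault(target, Collections.emptyMap())
                       .getOrDefault(reference, Collections.emptySet());
    }

    /** Finds objects by attribute value (here: by name). */
    Set<Node> withName(String name) {
        return byName.getOrDefault(name, Collections.emptySet());
    }
}
```

Each lookup is a hash-map access, so its cost does not grow with the size of the model. The price is that every model change has to be reported to the index, which is the same trade-off an incremental query engine makes.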

Example

To illustrate the pattern language of EMF-IncQuery, we present a set of patterns in Figure 1(a) for identifying empty classes in UML (from our UML case study): classes that do not have operations or properties (not even in their superclasses), which suggests an incomplete model. However, if the name of a class ends with the string “Empty”, we consider the class empty by design, so it is not returned.

Figure 1: (a) example graph pattern code (b) UML class model

  • The SupEmpty class is not considered empty because of its name, while the classes B, C and D either define or inherit the property called refers.
  • The pattern superClass in Line 1 consists only of structural constraints: it describes the direct superclass relation by a generalization node (local variable gen) that is connected both to the classes referenced as sub and sup.
  • The pattern hasOperation in Line 7 is the disjunction of two bodies. The first body states that the selected class cl holds an Operation. The second body uses the transitive closure of the relation defined by the superClass pattern (Line 10) to select the indirect superclasses of the selected class, and then states that the superclass owner holds an Operation.
  • Finally, the pattern emptyClass in Line 14 selects classes without operations and properties by evaluating two corresponding neg calls: the parent query matches only if the called patterns cannot be matched in the underlying model. (For simplicity, the hasProperty pattern is omitted, as it works exactly like the presented hasOperation.) The second parameters of the pattern calls are single-use variables (starting with the ‘_’ symbol), so these negative application conditions (NACs) are simple non-existence checks. The check expression in Line 18 reuses the Java method String.endsWith on a local variable.
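
For readers who prefer to see the semantics spelled out imperatively, the patterns described above can be paraphrased in plain Java. This is only a sketch of what the query means, not the pattern code of Figure 1(a) and not the UML2 or EMF-IncQuery API: `UmlClass` and `EmptyClassQuery` are hypothetical names, and operations and properties are reduced to lists of strings. Note that this version recomputes the result by traversal on every call, which is exactly the work the incremental engine avoids.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical, simplified stand-in for a UML class (assumption for this sketch). */
final class UmlClass {
    final String name;
    final List<UmlClass> superClasses = new ArrayList<>(); // direct superclasses
    final List<String> operations = new ArrayList<>();
    final List<String> properties = new ArrayList<>();
    UmlClass(String name) { this.name = name; }
}

final class EmptyClassQuery {
    /** Transitive closure of the direct superclass relation. */
    static Set<UmlClass> allSuperClasses(UmlClass cl) {
        Set<UmlClass> seen = new LinkedHashSet<>();
        collect(cl, seen);
        return seen;
    }

    private static void collect(UmlClass cl, Set<UmlClass> seen) {
        for (UmlClass sup : cl.superClasses) {
            if (seen.add(sup)) collect(sup, seen); // the check also guards against cycles
        }
    }

    /** Two alternatives: the class holds an operation, or some superclass does. */
    static boolean hasOperation(UmlClass cl) {
        if (!cl.operations.isEmpty()) return true;
        for (UmlClass sup : allSuperClasses(cl)) {
            if (!sup.operations.isEmpty()) return true;
        }
        return false;
    }

    /** Same structure as hasOperation, for properties. */
    static boolean hasProperty(UmlClass cl) {
        if (!cl.properties.isEmpty()) return true;
        for (UmlClass sup : allSuperClasses(cl)) {
            if (!sup.properties.isEmpty()) return true;
        }
        return false;
    }

    /** Two non-existence checks plus the name check. */
    static List<UmlClass> emptyClasses(List<UmlClass> model) {
        List<UmlClass> result = new ArrayList<>();
        for (UmlClass cl : model) {
            if (!hasOperation(cl) && !hasProperty(cl) && !cl.name.endsWith("Empty")) {
                result.add(cl);
            }
        }
        return result;
    }
}
```

Compared with the declarative patterns, the traversal order, the cycle guard and the duplicated structure of hasOperation and hasProperty all have to be written by hand here.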

What is under the hood - the development environment

The development workflow of the EMF-IncQuery framework focuses on the specification and evaluation of queries and the automatic generation of integration code for plugging into existing EMF-based applications. As depicted in Figure 2, the development environment offers three major components: (1) the Graph Pattern Editor, (2) the Query Result Explorer and (3) the Pattern Code Generator.

Figure 2: EMF-IncQuery architecture overview


Graph pattern editor - The EMF-IncQuery development environment provides an Xtext-based editor for the pattern language with syntax highlighting, code completion and well-formedness validation. The editor is tightly integrated with the other components: the code generator is integrated into the Eclipse builder framework, and is executed after changes in pattern definitions are saved (unless Eclipse automatic builders are turned off), while the Query Explorer updates the displayed query results.

Query result explorer - To evaluate complex model queries, EMF-IncQuery provides the Query Result Explorer. This component visualizes live query results of both interpretative and generated pattern matchers in a generic view, and provides a quick feedback cycle during transformation development.

Pattern code generator - The environment also helps to integrate queries into a Java application by maintaining a project with pattern-specific generated matcher code. The generated matcher is semantically equivalent to the interpretative one, but it provides an easy-to-integrate, type-safe Java API and applies some performance optimizations. Furthermore, the generator may also produce code for various integration components, such as the data binding support, the validation framework or query-based features.

To see it in action, Figure 3 shows the EMF-IncQuery development environment during the development of an EMF-UML2 based validation plug-in (as in our pattern example). The model and plug-in projects in use are shown on the left. As EMF-IncQuery projects are plug-in projects, their management relies on existing Eclipse features. On the right, the Query Editor is open next to the Papyrus UML editor in the middle, which contains a sample model for evaluating the queries under development. Finally, at the bottom of the screen, the Query Explorer has already loaded the model and the queries from the editors, and it reacts to changes in any of the editors.

Figure 3: EMF-IncQuery development environment

Cool features based on incremental query evaluation

It is important to see that incremental query evaluation is an enabling technology for many advanced modeling features that benefit from very fast query re-evaluation. The following list gives an overview of some of them:

  • On-the-fly model validation: EMF-IncQuery provides facilities to create validation rules based on the pattern language of the framework. These rules can be evaluated on various EMF instance models, and when constraints are violated, markers are automatically created in the Eclipse Problems View. This has been used in an AUTOSAR context to validate large models loaded into the editor.
  • EMF query-based features: EMF-IncQuery supports the definition of efficient, incrementally maintained, well-behaved derived features in EMF. Well-behaved means that a derived feature acts like any ordinary EStructuralFeature and sends proper Notifications whenever its value changes. The approach uses incremental evaluation to calculate the values of derived features and provides automated code generation for integration into existing applications. It is used extensively in our Massif project to provide easier navigation on Matlab Simulink models.
  • Model visualization made easy: The goal of the IncQuery Viewers component is to help develop model-driven user interfaces by populating and updating model viewers with the results of model queries. The Viewers component can bind the results of queries to various visualization frameworks such as JFace and GEF4 Zest, and it also incorporates a specific Sirius integration to support interpreted expressions.
  • Incremental model synchronization: A common, recurring task in any tool, framework or application based on model-driven concepts is to capture and process not only the underlying models, but also their changes as a stream of events (operations that affect models). We generalized this approach to provide a common conceptual framework for defining reactive model transformations/processors based on an event-driven virtual machine (EVM) architecture. This approach provides the basis for our Viatra reactive model transformation engine (a short introduction).

Summary

EMF-IncQuery is an incremental graph query engine based on a declarative pattern language to capture and execute live queries over models. It provides up-to-date results and result change notifications as the models evolve. EMF-IncQuery is powered by a highly scalable engine capable of executing complex queries over large models (10M+ elements) in a few milliseconds.