Automatic generation of random ecore model instances [message #1840333]
Mon, 12 April 2021 18:36
houda houda   Messages: 12   Registered: February 2021
Junior Member
Hello everyone,
I've been working on a model-to-model transformation using Ecore models and ATL as the transformation language.
To test my transformation, I generated a few XMI instances of my input model using the Sample Reflective Ecore Model Editor.
Now, however, I need a large number of instances so that I can run more tests and validate the correctness of my transformation. What I need is a method, a tool, or some other solution that automatically generates random, distinct instances of my input model as test cases.
I learned about EcoreFactory, which allows instances to be created programmatically. However, after investigating it, it does not seem able to generate instances randomly or automatically, as the values of the different attributes must be written manually.
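For context, here is roughly what I have in mind (an untested sketch; the use of the reflective API and the data types handled are my own guesses):

import java.util.Random;
import org.eclipse.emf.ecore.EAttribute;
import org.eclipse.emf.ecore.EClass;
import org.eclipse.emf.ecore.EObject;
import org.eclipse.emf.ecore.EcorePackage;
import org.eclipse.emf.ecore.util.EcoreUtil;

public class RandomInstanceSketch {
    private static final Random RANDOM = new Random();

    // Create one instance of the given EClass and fill its attributes with random values.
    public static EObject createRandomInstance(EClass eClass) {
        EObject instance = EcoreUtil.create(eClass);
        for (EAttribute attribute : eClass.getEAllAttributes()) {
            if (!attribute.isChangeable() || attribute.isMany()) {
                continue;  // skip derived/multi-valued attributes in this sketch
            }
            if (attribute.getEAttributeType() == EcorePackage.Literals.ESTRING) {
                instance.eSet(attribute, "str" + RANDOM.nextInt(1000));
            } else if (attribute.getEAttributeType() == EcorePackage.Literals.EINT) {
                instance.eSet(attribute, RANDOM.nextInt(100));
            }
            // other data types, references and containment would still need handling
        }
        return instance;
    }
}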
Please correct me if I am misunderstanding EcoreFactory or the reflective API, and help me figure out how to make this work for my context.
If that is not possible, please suggest other options for automatically generating random instances to run tests with.
Thank you in advance.
Re: Automatic generation of random ecore model instances [message #1840364 is a reply to message #1840333]
Tue, 13 April 2021 09:04
Ed Willink   Messages: 7681   Registered: July 2009
Senior Member
Hi
This is a standard research topic on which you can find many articles in the literature; try Googling "test model generator". However, such tools tend to be metamodel-blind and so take an unpalatably long time.
Using random models seems like a good idea, but it is absolutely not what you want.
Mindlessly random test models have a very low chance of hitting interesting test cases, so your testing will take forever and perhaps "find" nothing.
"find". Given a random test model how can you tell whether the result is good or bad? Ok; a smoke test is useful, but unless you have an Oracle to predict the correct output for comparison, or a sub-Oracle to check e.g. that size(Min+Nin) == size(Qout), your tests reveal little. If you are lucky, you have a golden brick reference implementation that you can compare against. More likely you have a flaky legacy implementation. There will always be tolerable differences so you will need an intelligent comparator. For my transformations, I try to ensure that they can be bidirectional so that I can do a round-trip test and so use the reverse transformation as an oracle for the forward and vice-versa. To accommodate acceptable differences I run the result and reference model through a normalizer that may zero optional fields, alphabeticise unordered collections etc to make comparison viable.
Intelligently random test models are the goal of much of the better research, relying on the observation that the commonest errors are off-by-one errors, or, more generally, recognizing that each test model is a point in the hyperspace of all possible models. The hyperspace is partitioned by hyperplanes, each of which separates good from bad. Consequently, an intelligently random generator will pick test models that are marginally good and marginally bad. While the hyperplane for e.g. "age >= 0" is easy to identify, the full set of hyperplanes is very challenging. IMHO this is where the tools go wrong, attempting to reformulate the metamodels and OCL into the language of a SAT solver. My experience implementing a QVTr compiler convinces me that metamodels are very, very restrictive; consequently much of the arbitrary generality of SAT constraints is inappropriate for metamodels; the migration to SAT solvers requires the metamodel characteristics to be reformulated as additional problem dimensions rather than exploited as restrictive solutions. Anyway, I have yet to see a test model generator that is not really, really slow for big models. Of course you don't want big models for functionality testing. The minimum model that locates the marginal point on the hyperplane will do.
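(As a trivial, hand-rolled illustration of marginal models: for a hypothetical Person EClass whose age attribute carries the constraint age >= 0, the interesting values straddle the boundary.)

// personClass is assumed to be the Person EClass of a hypothetical metamodel
int[] marginalAges = { -1, 0, 1 };   // just bad, marginally good, comfortably good
for (int age : marginalAges) {
    EObject person = EcoreUtil.create(personClass);
    person.eSet(personClass.getEStructuralFeature("age"), age);
    // ... add each person to a test model and run the transformation on it
}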
For my own purposes, such as Fig. 5 of [1], I want a family of structurally similar models ranging from 100 to 10,000,000 model elements so that I can see whether performance scales linearly or not. I absolutely do not want random models. A custom generator for structurally similar models is pretty easy to write for the small metamodels that my tests have instrumented so far. The models must be generated on the fly, since the time and space needed to load a 10,000,000-element model may dwarf all useful testing. For larger metamodels you have the challenge of how the N elements are to be realized over perhaps 10 different metamodel multiplicities. If you allocate too much of N to depth, your transformation may well hit a StackOverflowError. If you allocate too much of N to width, you may find that you are just demonstrating that EMF performance for large Sets implemented by Lists is really disappointing.
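(A sketch of that kind of on-the-fly generator; the Node EClass and its children containment reference are stand-ins for whatever structure your metamodel actually has, and depth/width control how the element budget is spent.)

// imports: java.util.List, org.eclipse.emf.ecore.*, org.eclipse.emf.ecore.util.EcoreUtil
EObject buildTree(EClass nodeClass, EReference childrenRef, int depth, int width) {
    EObject node = EcoreUtil.create(nodeClass);
    if (depth > 0) {
        @SuppressWarnings("unchecked")
        List<EObject> children = (List<EObject>) node.eGet(childrenRef);  // containment list
        for (int i = 0; i < width; i++) {
            children.add(buildTree(nodeClass, childrenRef, depth - 1, width));
        }
    }
    return node;
}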
Of course model elements are only part of the story. Nearly all models have Strings and Integers for which some values may be significant. E.g. if checking URIs you need some strings with sensitive %, \, $ combinations, and you need to ensure that you go through a transformation path that can detect the erroneous double encoding of a URI. The chances of any random model generator happening to generate a suitable String in a suitable context are infinitesimal.
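(Illustrative only: seeding string attributes from a small pool of deliberately awkward values is far more productive than uniformly random characters; extend the pool with whatever is significant for your metamodel.)

String[] trickyStrings = {
    "",                       // empty string
    "plain",                  // the boring case
    "already%20encoded",      // exposes erroneous double URI encoding
    "back\\slash$and%signs",  // escape-sensitive characters
};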
I recommend a suite of judicious manual test models with a bidirectional Oracle. For Eclipse QVTo there is a test coverage tool that may help you assess your test models.
Regards
Ed Willink
[1] http://www.eclipse.org/mmt/qvt/docs/BigMDE2016/QVTcFirstResults.pdf
Re: Automatic generation of random ecore model instances [message #1840579 is a reply to message #1840364]
Mon, 19 April 2021 17:37
houda houda   Messages: 12   Registered: February 2021
Junior Member
Hello Ed,
I would like to start by thanking you for the detailed answer and the insightful information; it took me several days of reading online to follow up on it and reach some conclusions.
Ed Willink wrote on Tue, 13 April 2021 09:04
[...] unless you have an Oracle to predict the correct output for comparison, or a sub-Oracle to check e.g. that size(Min+Nin) == size(Qout), your tests reveal little. [...] For my transformations, I try to ensure that they can be bidirectional so that I can do a round-trip test and use the reverse transformation as an oracle for the forward direction and vice versa. [...]
I am now stuck on the test oracle topic. I could not figure out whether the oracle is something I build manually, in the same way that I created my first XMI instances with the reflective Ecore model editor, or whether it has to be generated automatically by some sort of code; I could not find a proper answer to this question.
The round-trip test is out of scope for me: I used ATL for my transformations, which is unidirectional, and making it bidirectional would require implementing a second transformation from the target model back to the source model, which is not an option for me right now.
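(My current understanding, written as a sketch; the file names and the structural comparison via EcoreUtil.equals are my own assumptions about what a manually provided oracle would boil down to.)

// imports: org.eclipse.emf.common.util.URI, org.eclipse.emf.ecore.EObject,
//          org.eclipse.emf.ecore.resource.ResourceSet, org.eclipse.emf.ecore.resource.impl.ResourceSetImpl,
//          org.eclipse.emf.ecore.util.EcoreUtil, org.eclipse.emf.ecore.xmi.impl.XMIResourceFactoryImpl
// expected.xmi: a hand-written reference output; actual.xmi: what the ATL transformation produced
ResourceSet rs = new ResourceSetImpl();
rs.getResourceFactoryRegistry().getExtensionToFactoryMap()
  .put("xmi", new XMIResourceFactoryImpl());   // needed when running standalone
EObject expected = rs.getResource(URI.createFileURI("expected.xmi"), true).getContents().get(0);
EObject actual = rs.getResource(URI.createFileURI("actual.xmi"), true).getContents().get(0);
boolean ok = EcoreUtil.equals(expected, actual);  // structural comparison of the two models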
Ed Willink wrote on Tue, 13 April 2021 09:04
[...] Anyway, I have yet to see a test model generator that is not really, really slow for big models. Of course you don't want big models for functionality testing. [...] I recommend a suite of judicious manual test models with a bidirectional Oracle. For Eclipse QVTo there is a test coverage tool that may help you assess your test models.
I agree with your thinking about the quality of test models. In fact, I don't need hundreds of test models to test my transformations; furthermore, my models aren't big and have a very limited number of elements. However, I don't seem to be able to implement my own test model generator for my unidirectional ATL transformations, and the tools I did find (similar to the one you mentioned for QVTo) are mostly research prototypes that were abandoned and are no longer maintained, so they do not produce any results.
Regards.