Hi Radim,
thanks for the info. I have a couple of remarks, sprinkled below.
/Thomas
On Tue, 2019-08-20 at 15:48 +0200, Radim Hopp wrote:
Hi,
At today's Che SoS meeting, there was a question about what exactly the PR check is doing and who is responsible for analyzing failures.
What is the PR check actually doing?
* The declarative pipeline is, in my opinion, quite readable, but I'll try to summarize it briefly:
- Builds Che from the PR branch & pushes the image to Docker Hub
Che is not built from the eclipse/che repository alone, but from many moving parts (che-theia, plugin-repository, devfile-repository, chectl, etc.). If we have dependent changes in more than one of them, we would have to rebuild those parts first in order to build a working system. We used to have rules detecting the same branch name in multiple repositories: our build would then rebuild those repos before "che". I think we'll need a similar mechanism in order to make things reliable.
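To make the idea concrete, here is a minimal sketch of such a branch-matching mechanism. Everything in it is hypothetical (the repo list, the function name, how branch lists are collected); it is not the actual CI code, just the decision logic: given the PR's branch name and the branches that exist in each dependent repository, work out which repos have to be rebuilt from that branch before "che" itself.

```python
# Hypothetical sketch: decide which dependent repos must be rebuilt
# from the PR branch before building che itself.

DEPENDENT_REPOS = ["che-theia", "chectl"]  # example subset, not exhaustive

def repos_to_rebuild(pr_branch, branches_by_repo):
    """Return the dependent repos that have a branch matching the PR branch.

    branches_by_repo maps a repo name to the set of branch names it
    currently has (e.g. collected beforehand via `git ls-remote --heads`).
    """
    return [repo for repo in DEPENDENT_REPOS
            if pr_branch in branches_by_repo.get(repo, set())]

if __name__ == "__main__":
    branches = {
        "che-theia": {"master", "fix-debug-adapter"},
        "chectl": {"master"},
    }
    # A PR branch named "fix-debug-adapter" would trigger a rebuild
    # of che-theia first, then che.
    print(repos_to_rebuild("fix-debug-adapter", branches))
```

The pipeline would then build the matched repos from that branch (in dependency order) before building che, instead of pulling their published master artifacts.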
What to do when tests fail? First of all, try to run them again; unfortunately we haven't achieved 100% stability yet (even though we are pretty close when there is no major problem like an infrastructure outage). If it fails again, please contact the QE team on Eclipse Mattermost in the "Eclipse Che QE" room (with the @here or @all handle).
The trouble with reports of happy-path test failures is that it is pretty much impossible to guess the cause of a failure from the information in the bug report (example: https://github.com/eclipse/che/issues/14283). In the end, most failures in the Happy Path test look like Java language support is not working, even when the problem is completely unrelated. I don't think the languages team analyzing 90% of the failures is a desirable outcome.

We should work toward pinpointing failure locations more precisely. I can see two ways to achieve this. The first would be to make it easy to reproduce the test failure in a way that lets a person poke at the system and debug it; it is currently not clear to me how I would set up a system that accurately reproduces the environment where the failure occurs. The second would be to increase logging to a degree where we can make a reasonable guess at the cause of a failure by looking at the logs post-mortem.
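To illustrate the second option: even a crude post-mortem classifier over the collected logs could route a failure to the right team instead of defaulting everything to Java language support. The patterns and component names below are purely hypothetical, made up for illustration; the real ones would have to be distilled from actual Che test logs.

```python
import re

# Hypothetical mapping from log patterns to the component most likely
# at fault; real patterns would come from actual Che failure logs.
PATTERNS = [
    (re.compile(r"ImagePullBackOff|ErrImagePull"), "infrastructure/registry"),
    (re.compile(r"plugin broker.*(failed|timeout)", re.I), "plugin-broker"),
    (re.compile(r"jdt\.ls|language server", re.I), "java-language-support"),
]

def guess_component(log_text, default="unclassified"):
    """Return the component of the first pattern that matches the log."""
    for pattern, component in PATTERNS:
        if pattern.search(log_text):
            return component
    return default

# e.g. a log containing "ImagePullBackOff" would be routed to the
# infrastructure/registry bucket rather than to the languages team.
```

This only works if the pipeline collects enough logs (pod events, broker output, language-server output) for such patterns to show up at all, which is exactly the logging investment argued for above.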
/Thomas