Re: [epp-dev] 2020-12 - Retrospective

Hi Jonah,

Thanks a lot for your answers.

On Wed, Dec 16, 2020 at 3:20 AM Jonah Graham <jonah@xxxxxxxxxxxxxxxx> wrote:
TL;DR - most of the time is spent dealing with projects that are themselves under-resourced and no longer participating with the enthusiasm they once had! This applies to the whole of simrel + epp.

For the 2020-12 release cycle, the biggest single use of my time (by far!) was trying to fix the JAXB issue that caused problems when different sets of plug-ins were installed. The conclusion of this seems to be that Mylyn (not wikitext) is just going to be dropped from simrel, and therefore all the packages. It took me days (5+) to resolve and test this issue and continued right through RC2 as the final (known) problem was discovered in RC1.

For that, I would blame SimRel, its "kindness" and its false promises. Until now, EPP has assumed that SimRel bits could be taken together and would work; this time, it has become clear that this is not the case: SimRel is not really a reliable source of consistently working bits. Because SimRel doesn't actively prune projects that lack enthusiasm, it is becoming a kind of potpourri of projects, many of which are below industrial grade in intrinsic quality or in support quality.
The fact that EPP got such a bug, and that it was detected only in EPP, months later, rather than by SimRel itself before it released, is a bad smell.
I don't think any technology can fix that in general. It's an organizational issue to bring to the Planning Council. As you are probably aware, I personally advocate for progressively dropping SimRel, as it doesn't appear profitable these days, and as I believe dropping it would allow us to focus more energy on EPP, where I believe there are more quality checks and a better ROI. This story strengthens my case against SimRel ;)

However, in parallel with the general analysis, I'm curious: is the dependency issue likely to happen again? Is there a technical root cause (e.g. in p2) that could be addressed to reduce the chances of such an issue happening again? Are there automated tests we could put in EPP that could have detected this issue since 2020-06? Would it be worth adding automated tests for such potential issues in the future?

The thing about the above problem is that builds have always been green, despite there being problems since the 2020-06 release. The first step on my release process is "Ensure that the CI build is green. [...]

That echoes my question above: could we have checks to detect the issue and make builds red?
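
As one possible shape for such a check, here is a minimal Python sketch. It assumes (this is not confirmed anywhere in this thread) that the problem class is the same Java package being exported by more than one bundle within a package, and that the exports have already been extracted from each bundle's manifest into a mapping. The function name and the bundle names in the usage note are hypothetical.

```python
from collections import defaultdict


def find_duplicate_exports(bundle_exports):
    """Return {package: [bundles]} for every package exported by 2+ bundles.

    bundle_exports maps a bundle name to the list of Java packages it
    exports (assumed to be extracted from the manifests beforehand).
    """
    providers = defaultdict(set)
    for bundle, packages in bundle_exports.items():
        for package in packages:
            providers[package].add(bundle)
    # Keep only packages that have more than one provider.
    return {
        package: sorted(bundles)
        for package, bundles in providers.items()
        if len(bundles) > 1
    }
```

For example, calling it with `{"bundle.a": ["javax.xml.bind"], "bundle.b": ["javax.xml.bind"]}` reports `javax.xml.bind` with both bundles. A build step could turn red when the result is non-empty, or only when it contains a package from a watch list. Multiple providers of a package can be legitimate in OSGi, so the output is a list of candidates to review rather than a verdict.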
 
This basically means that someone (me for now) has to own making sure that the packages build. 

Should we have a policy in EPP that mandates the package maintainers to look at those issues when their packages are affected?
SimRel usually disables contributions that are causing trouble. In EPP, we could do something similar: when there is a known issue with a package, we comment out its lines in the pom.xml so it's excluded from future builds, and we warn on the mailing list that the package has been disabled. The package maintainers then have to deal with the resolution; it's not in your hands any more ;)

 
- Updating the N&N links in the epp.website.xml files. This activity ends up serving two functions: first, I collect all the relevant N&N links into epp.website.xml; second, I end up reviewing the state of simrel. There are currently 18 unique N&N links, but I had to send emails out to 4 of the projects because their N&N links were somehow broken (e.g. simply 404, or fully missing PMI entry). This could be improved in one of two ways - 1. all projects have a generic N&N entry, like wwd[1] or 2. use the Eclipse web API to automate this step[2], but that would require projects to still have up to date N&N links.
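
Detecting the broken links (the "simply 404" case) could probably be automated independently of where the links come from. The following is a minimal Python sketch using only the standard library. It assumes the N&N URLs have already been collected into a list (how they are read out of epp.website.xml is not shown), and the function names are hypothetical.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=10.0):
    """Return the HTTP status code for url, or None if no response was obtained."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status such as 404.
        return error.code
    except (urllib.error.URLError, OSError, ValueError):
        # DNS failure, refused connection, timeout, or malformed URL.
        return None


def find_broken_links(urls, get_status=http_status):
    """Return {url: status} for links that failed or answered with 4xx/5xx."""
    broken = {}
    for url in urls:
        status = get_status(url)
        if status is None or status >= 400:
            broken[url] = status
    return broken
```

The `get_status` parameter is there so the check can be exercised without network access. Some servers reject `HEAD` requests, so a real job might need to fall back to `GET` before reporting a link as broken. This would not catch a missing PMI entry, only links that exist but do not resolve.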

Do you think we should keep this N&N info for individual packages? There is an aggregated N&N for all SimRel projects; isn't that enough? Do you need this "intermediary" granularity between SimRel and projects? Would it hurt if we got rid of package-specific N&N links?

- The build itself takes a long time. In particular the promotion job. While I can theoretically press run and then come back to it, it is annoying that the promotion job spends 20 minutes downloading the artifacts from the build being promoted, does some massaging on them, and then spends another 20 minutes copying the results to download.eclipse.org. (I also imagine that every EPP Thursday the webmasters are cursing the network spike - but maybe it doesn't register?). 

Do I get it right that:
1. we could do the "massaging" on every EPP build, so built artifacts are always ready for promotion;
2. we could try to publish the "snapshots" to download.eclipse.org on every build (that would probably make the snapshot builds 20 minutes longer);
3. promoting would then be sending a `ssh cp ...` command to download.eclipse.org and would become very fast?

Would that save time overall? Is it OK to have snapshots taking 20+ more minutes if it saves 20 minutes on promotion?
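
To make step 3 concrete, here is a rough Python sketch of what such a promotion could look like if the snapshot is already on the download server. The host and directory arguments are placeholders, not the real EPP layout, and the function names are hypothetical.

```python
import shlex
import subprocess


def build_promote_command(host, staged_dir, release_dir):
    """Build the ssh argument list that copies staged_dir to release_dir on host."""
    # Quote the paths because the remote shell parses the command string.
    remote_command = "cp -r {} {}".format(
        shlex.quote(staged_dir), shlex.quote(release_dir)
    )
    return ["ssh", host, remote_command]


def promote(host, staged_dir, release_dir, dry_run=True):
    """Run the remote copy, or only return the command when dry_run is True."""
    command = build_promote_command(host, staged_dir, release_dir)
    if not dry_run:
        # check=True raises CalledProcessError if ssh or cp fails.
        subprocess.run(command, check=True)
    return command
```

Since the copy happens entirely on the remote side, no artifacts would travel over the network at promotion time; the cost moves to the snapshot builds, which is the trade-off asked about above.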
