Re: [che-dev] Single Host on OpenShift

Thank you Lukas for the detailed analysis. 

We are, in a sense, implementing an alternative ingress on top of the existing one. The default ingress controller for Kubernetes is the nginx controller; on OpenShift it's haproxy. Those are the choices those teams made 5 years ago or so. It may be interesting to ask them what they would choose if they had to start from scratch today. Their use cases will be more generic than ours, but we may get some useful answers anyway.

On Mon, Jul 13, 2020 at 6:05 PM Lukas Krejci <lkrejci@xxxxxxxxxx> wrote:
Good point,

https://github.com/eclipse/che/issues/12914#issuecomment-657646329

On Monday, July 13, 2020 5:15:37 PM CEST Brad Micklea wrote:
> Thanks Lukas, good summary. Is this summary of findings and decision
> captured in a GH issue so that it's visible for those outside the mailing
> list, and can be revisited if needed?
>
> On Mon, Jul 13, 2020 at 11:05 AM Lukas Krejci <lkrejci@xxxxxxxxxx> wrote:
> > Hi all,
> >
> > So we finally concluded all of our tests. Last time I informed you that we
> > had ruled out Envoy because of its somewhat surprisingly slower performance
> > under high load compared to our other candidate reverse proxies, and because
> > problems were somewhat harder to debug (due to the distributed nature of its
> > configuration). Since then we have also ruled out HAProxy from our list of
> > candidates, mainly because of ease-of-use concerns.
> >
> > That left us with having to make a choice between Traefik and Nginx. For that
> > we felt it was necessary to confirm our original findings with another run of
> > performance tests, because we saw some oddly high response times with Nginx
> > leading us to think there might have been some environmental influence. Also,
> > having more data would give us more confidence in our findings.
> >
> > We found this:
> >
> > 1) As the static load increases, Nginx has less and less of a performance
> > advantage over Traefik, and under a very high load Nginx starts to show rare
> > but severe erratic behavior: a couple of requests lasting over 16 minutes and
> > a high ratio of error responses in short bursts (500 or even corrupt
> > responses).
> >
> > 2) When dynamically reconfiguring the reverse proxies (to simulate adding new
> > workspaces), Traefik seems to have a slight edge over Nginx:
> >   a) Nginx again shows some odd outliers, making its p99 response time a
> >      third slower than Traefik's (while p95 is roughly the same).
> >   b) Traefik is faster in establishing a new route, but the difference gets
> >      smaller with increased static load on the servers.
> >
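A small illustration of why p99 can differ while p95 stays the same, as in 2a above. This is only a sketch: it uses the nearest-rank percentile definition, which is an assumption about how such figures might be computed, and the response times are made-up numbers, not data from the tests.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical response times in milliseconds (illustrative only).
steady = [10] * 94 + [20] * 6                # no outliers
spiky = [10] * 94 + [20] * 3 + [400] * 3     # a handful of slow outliers

for name, data in (("steady", steady), ("spiky", spiky)):
    print(name, "p95 =", percentile(data, 95), "p99 =", percentile(data, 99))
```

Both series have the same p95, but the three slow requests in the second series are enough to move its p99, so rare outliers show up in the higher percentile only.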
> > 3) Nginx seems to be slightly faster at handling websocket traffic.
> >
> > 4) Traefik cannot correctly handle path rewrites in Set-Cookie headers, while
> > Nginx can. After discussing this, we concluded that this is not a blocker for
> > us because applications generally need to be aware whether they are being
> > deployed behind a reverse proxy and need to handle this in a way that Traefik
> > supports (X-Forwarded-For headers and the like).
> >
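For readers unfamiliar with the cookie issue in point 4, here is a rough sketch of what "rewriting the path in a Set-Cookie header" means when a gateway exposes a backend under a path prefix. The function name, the `/workspace1` prefix and the cookie values are hypothetical, and this is not how either proxy implements it (Nginx exposes the behavior through its `proxy_cookie_path` directive).

```python
def rewrite_cookie_path(set_cookie: str, prefix: str) -> str:
    """Prepend the gateway's public path prefix to the Path attribute of a
    Set-Cookie header value. Cookies without a Path attribute are returned
    unchanged, since the browser then derives the path from the request URL."""
    prefix = prefix.strip("/")
    parts = [p.strip() for p in set_cookie.split(";")]
    rewritten = [parts[0]]  # the first part is always the cookie's name=value
    for part in parts[1:]:
        name, sep, value = part.partition("=")
        if sep and name.strip().lower() == "path":
            backend = value.strip().strip("/")
            part = "Path=/" + "/".join(s for s in (prefix, backend) if s)
        rewritten.append(part)
    return "; ".join(rewritten)


# Hypothetical example: the backend sets a cookie for /app, but the browser
# reaches that backend through the gateway at /workspace1/app.
print(rewrite_cookie_path("session=abc; Path=/app; HttpOnly", "/workspace1/"))
```

Without such a rewrite the browser stores the cookie for `/app` on the shared host, where another workspace served under the same host could set or receive a cookie with the same name and path.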
> > Given the overall comparable results with both of the solutions, we decided
> > on Traefik because of its more predictable and stable performance and ease of
> > use.
> >
> > At the same time, given the similarities in how the two solutions are
> > configured, we feel confident that if we needed to change our minds later,
> > once we have properly integrated the solution into Che as a whole, it would
> > not be difficult to swap them around.
> >
> > Lukas
> >
> > On Thursday, July 2, 2020 12:55:43 PM CEST Lukas Krejci wrote:
> > > To follow up on this,
> > >
> > > we have finally finished our performance tests with Envoy and while it
> > > offers a very nice option for dynamic reconfiguration, we have found it
> > > performing significantly slower under highly dynamic load (e.g. when we
> > > simulated adding new workspaces) than the others (traefik, nginx, haproxy).
> > >
> > > We have not yet made a team decision but IMHO for the reasons above, we're
> > > going to be looking at the other alternatives.
> > >
> > > On Saturday, June 6, 2020 9:47:09 PM CEST Lukas Krejci wrote:
> > > > We have not! I will definitely look into it.
> > > >
> > > > On Saturday, June 6, 2020 9:22:10 AM CEST Gorkem Ercan wrote:
> > > > > Have you considered Envoy[1] as an alternative?
> > > > > Knative Kourier has a similar usage which uses envoy underneath.
> > > > >
> > > > > [1] https://www.envoyproxy.io/
> > > > > [2] https://github.com/knative/net-kourier
> > > > >
> > > > > On Tue, Jun 2, 2020 at 8:22 AM Lukas Krejci <lkrejci@xxxxxxxxxx> wrote:
> > > > > > Hi all,
> > > > > >
> > > > > > I am following up on the topic of enabling single-host on OpenShift.
> > > > > > We have concluded the performance tests and I would like to present to
> > > > > > you the results that we have found.
> > > > > >
> > > > > > tl;dr There is no clear winning solution.
> > > > > >
> > > > > > In our testing we concentrated on 3 areas: the performance of routing
> > > > > > of the HTTP traffic, the performance of Websocket communication, and
> > > > > > correct handling of cookies under path rewriting.
> > > > > >
> > > > > > We were trying to choose between 3 candidates for the HTTP gateway
> > > > > > that we identified in the prior POCs:
> > > > > >
> > > > > > * HAProxy
> > > > > > * Nginx
> > > > > > * Traefik
> > > > > >
> > > > > > Unfortunately, none of them came out of our testing with flying
> > > > > > colors.
> > > > > >
> > > > > > Generally, HTTP and websockets have somewhat unsurprisingly very
> > > > > > similar performance profile in each of the solutions so I won't be
> > > > > > discussing them separately.
> > > > > >
> > > > > > == HAProxy
> > > > > > Pros:
> > > > > > * fast and hardware-efficient even under high load
> > > > > > Cons:
> > > > > > * Some issues with live reconfiguration
> > > > > > ** The slowest to establish a new route within the gateway
> > > > > > ** Rare routing errors
> > > > > >
> > > > > > == Nginx
> > > > > > Pros:
> > > > > > * fast and hardware-efficient under moderate load
> > > > > > * Stable under live reconfiguration
> > > > > > Cons:
> > > > > > * "Flappy" performance under high load - high variance in response times
> > > > > > * Rare routing errors under high load
> > > > > >
> > > > > > == Traefik
> > > > > > Pros:
> > > > > > * Performant
> > > > > > * Best support for live reconfiguration
> > > > > > * Support for OAuth and other "modern" features that we could take
> > > > > > advantage of in the future
> > > > > > Cons:
> > > > > > * BLOCKER - incorrect handling of cookies defined on a specific path.
> > > > > > Such cookie paths are not rewritten along with requests. This is
> > > > > > essentially a security issue because it would enable auth cookie
> > > > > > overwriting/stealing.
> > > > > > * Higher hardware requirements (especially CPU under higher load)
> > > > > >
> > > > > > Our current favorite is Traefik despite the blocking issue of
> > > > > > incorrect cookie handling. We think it might be worth trying to fix
> > > > > > that and get a solution that seems to be the most stable of the 3. If
> > > > > > fixing Traefik proves too difficult, our second choice would probably
> > > > > > be nginx but that would require further testing.
> > > > > >
> > > > > > We will present our findings with all the fancy graphs and discussion
> > > > > > on the next community call.
> > > > > >
> > > > > > We have now concluded our performance testing though and are moving
> > > > > > forward with the actual implementation (and will soon pick the gateway
> > > > > > solution).
> > > > > >
> > > > > > We'd appreciate your feedback and advice on any of the above detailed
> > > > > > pros or cons.
> > > > > >
> > > > > > You can check our progress on this epic at
> > > > > > https://github.com/eclipse/che/issues/12914.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Lukas
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > _______________________________________________
> > > > > > che-dev mailing list
> > > > > > che-dev@xxxxxxxxxxx
> > > > > > To unsubscribe from this list, visit
> > > > > > https://www.eclipse.org/mailman/listinfo/che-dev
> >



