Re: [che-dev] How to collect and persist all workspace logs?

From: Michal Vala <mvala@xxxxxxxxxx>
Date: Mon, 27 Jan 2020 10:23:44 +0100
Delivered-to: che-dev@xxxxxxxxxxx

Hello,

on one side, we're well aware of the architecture direction and we're
trying to honour it. On the other hand, it's still only a plan, which
might change, so it is hard to stick to the current architecture
proposals 100%. We also need to support the current setup for a
possibly significant amount of time. Also, logs should be accessible
primarily through the dashboard, which will not be available in
"lightweight" mode, so this might simply be a feature for full
deployments only.

Anyway, I think our current proposal does not close the door on
either option. We will collect logs to the workspace PV. The tricky
part is how to "archive" them. In our proposal, we will watch the
workspace pod for an exit event and then start a new collector
component pod that will grab the logs from the workspace PV and
provide them to the "backend". I can imagine the backend being a PV
dedicated to logs in the workspace namespace (for lightweight
deployment), or the che-server API (for full deployment), or any other
service, actually. This will depend on the configuration. The question
of the actual storage is still open and we definitely don't want to
hard-wire only one way of doing it.
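
To make the collector idea a bit more concrete, here is a minimal
sketch in Python of what such a component could do once the workspace
pod has exited, assuming the simplest backend mentioned above: a
directory on a PV dedicated to logs. The function name `archive_run`,
the paths and the run-id format are all hypothetical and not part of
Che. The limit of 5 kept runs comes from the requirements in the
original mail, and run ids are assumed to sort chronologically.

```python
import tarfile
import tempfile
from pathlib import Path


def archive_run(log_dir: Path, backend_dir: Path, run_id: str, keep: int = 5) -> Path:
    """Pack all files under log_dir into backend_dir/<run_id>.tar.gz, then prune.

    Hypothetical helper: log_dir stands for the logs directory on the workspace
    PV, backend_dir for a directory on a PV dedicated to archived logs.
    backend_dir is assumed to live outside log_dir.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    backend_dir.mkdir(parents=True, exist_ok=True)
    target = backend_dir / f"{run_id}.tar.gz"
    with tarfile.open(target, "w:gz") as tar:
        for path in sorted(log_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to log_dir so the archive is relocatable.
                tar.add(path, arcname=path.relative_to(log_dir).as_posix())
    # Keep only the newest `keep` archives (assumes run ids sort chronologically).
    archives = sorted(backend_dir.glob("*.tar.gz"))
    for old in archives[:-keep]:
        old.unlink()
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        logs = Path(tmp) / "ws-pv" / "logs"
        logs.mkdir(parents=True)
        (logs / "ws.log").write_text("workspace started\n")
        for n in range(1, 8):
            archive_run(logs, Path(tmp) / "logs-pv", f"run-{n:04d}")
        print(sorted(p.name for p in (Path(tmp) / "logs-pv").iterdir()))
```

A different backend (che-server API or another service) would replace
only the part that writes `target`; the "grab everything from the PV
after exit" step stays the same.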

Using the workspace pod directly might be problematic. Typically we
need logs the most when some nasty crash happens, and sending logs
from the ws pod directly might lead to losing some of them, possibly
at the most critical time. Also, the hw resources, performance and
network bandwidth costs might not be negligible, and they would be
paid for every workspace instance, even when everything works fine
and no one requests the logs.
That's why we chose to write the logs to files on the ws PV, which
comes at minimal to no extra cost, and do the hard work of archiving
them after the ws stops.
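
For the collecting side, the `<command> 2>&1 | tee ws.log` idea from
the original mail could also be done by a small wrapper instead of a
shell pipeline. This is only a sketch in Python under assumptions:
`run_and_tee` and the log path are hypothetical names, and it is not
how Che wraps container commands today. It merges stderr into stdout,
echoes each line so the container's own output is unchanged, and
appends the same line to a file that would live on the ws PV.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Sequence


def run_and_tee(command: Sequence[str], log_file: Path) -> int:
    """Run command; copy its merged stdout/stderr to our stdout and to log_file.

    Hypothetical helper, equivalent in spirit to `<command> 2>&1 | tee -a ws.log`.
    Returns the exit code of the wrapped command.
    """
    log_file.parent.mkdir(parents=True, exist_ok=True)
    with log_file.open("a", encoding="utf-8") as log:
        proc = subprocess.Popen(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # same effect as 2>&1
            text=True,
            encoding="utf-8",
            errors="replace",
        )
        assert proc.stdout is not None  # guaranteed by stdout=PIPE
        for line in proc.stdout:
            sys.stdout.write(line)
            sys.stdout.flush()
            log.write(line)
            log.flush()  # flush per line so little is lost if the pod dies
        return proc.wait()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        code = run_and_tee(
            [sys.executable, "-c", "print('hello from ws')"],
            Path(tmp) / "logs" / "ws.log",
        )
        print("exit code:", code)
```

Flushing after every line trades a little throughput for keeping the
file as complete as possible when a crash happens.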

Does it make sense?

Can you please elaborate on what you mean by "compatible with
typical K8S logging infrastructures"?

Thanks,
Michal

On Thu, Jan 23, 2020 at 1:09 PM David Festal <dfestal@xxxxxxxxxx> wrote:
>
> Hi all,
>
> Just some thoughts about unified workspace logging, the ongoing work on the Workspace CRD, and cloud shell.
>
> According to this EPIC https://github.com/eclipse/che/issues/15425,
> there will be, at some point, the ability to start Che 7 workspaces in a lightweight,
> standalone, and embeddable way, without requiring the presence of the Che master.
>
> One important point mentioned in this EPIC, is the big scalability gain that would be brought,
> in this envisioned K8S-native architecture, by removing the Postgres database in favor of K8S-native Custom Resources
> that will benefit from the highly-scalable etcd storage underpinning K8S clusters.
>
> It seems to me that these 2 points:
> - Speak *against*:
>   - Using the wsmaster server or some sort of required centralized component, either to store the logs, or to collect them.
>     In this regard using the Postgres database seems the worst choice according to the future architectural directions
> - Speak *in favor of*:
>   - Collecting the logs locally in the workspace and sending them from the workspace POD to a logging mechanism,
>     hopefully in a way that would be compatible with typical K8S logging infrastructures.
>
> David.
>
> On Thu, Jan 23, 2020 at 05:09, Michal Vala <mvala@xxxxxxxxxx> wrote:
>>
>> Hello team,
>>
>> we're currently working on improving the diagnostic capabilities[1] of workspaces;
>> more concretely, on how to get all logs from a workspace[2]. We're in the phase of
>> investigating options and prototyping, and we've come up with several variants
>> of how to achieve the goal. We would like to hear your opinions and new ideas.
>>
>> Requirements:
>>   - collect all logs of all containers from the workspace
>>   - stdout/err as well as file logs inside the container
>>   - keep history of last 5 runs of the workspace
>>   - collect logs of crashed workspace
>>   - make logs easily accessible to the user (rest API + dashboard view)
>>
>>
>> I've split the effort into two sections:
>>
>>   ### How to collect:
>>
>>     # log everything to files to mounted PV
>>       - just mount PV and log everything there
>>       - pros
>>         - not much extra overhead, only write stdout/err to the file and mount PV
>>         - don't need extra hw resources (memory/cpu)
>>       - cons
>>         - we might need to override `command` of all containers. They will
>>           have to run with extra parameters to write stdout/err to the file.
>>           Something like `<command> 2>&1 | tee ws.log`
>>
>>     # workspace collector sidecar (kubernetes/client-go app?)
>>       - pros
>>         - per workspace
>>         - dynamic and powerful
>>       - cons
>>         - very custom solution and might be hard to manage/maintain
>>         - unknown performance and hw resources requirements
>>         - hard to handle when the ws crashes
>>         - need more memory per workspace, even if user does not use it and
>>           everything works as expected
>>
>>     # watch and collect from master
>>       - pros
>>         - easy to grab logs and events
>>         - easy to access archived logs
>>       - cons
>>         - only container's stderr/out
>>         - keep the connection to ws
>>         - more network traffic
>>         - increases the memory footprint of the master
>>
>>     # kubernetes native
>>       - change the logging backend of kubernetes [3]
>>       - pros
>>         - standard k8s way, "googleable"
>>       - cons
>>         - depends on kubernetes deployment
>>         - needs extra cluster component/configuration
>>         - only stdout/err of containers
>>
>>     # push logs directly from containers to logging backend
>>       - cons
>>         - customize all components to log to the backend
>>         - performance and hw resources overhead
>>
>>     # collect on workspace exit
>>       - mount PV and log there. When the workspace exits, start a collector pod that
>>           grabs the logs and "archives" them.
>>       - pros
>>         - not much extra overhead
>>       - cons
>>         - don't have logs of running workspace
>>         - custom collector pod
>>
>>
>>   ### Where to store and how to access:
>>
>>     # Workspace PV
>>       - pros
>>         - easy to set quota per user
>>       - cons
>>         - harder to access (need to start some pod at workspace's namespace)
>>         - lost when delete namespace
>>
>>     # Che PV
>>       - pros
>>         - easier to access
>>       - cons
>>         - harder to set quota per user
>>         - harder to scale and manage
>>         - possible performance bottleneck
>>
>>     # PostgreSQL
>>       - pros
>>         - the easiest to access
>>       - cons
>>         - harder to set quota per user
>>         - harder to scale and manage
>>         - possible performance bottleneck
>>
>>
>> There is one remaining and very important question we have not investigated
>> much. We need to somehow configure all plugins/editors and other components to
>> declare where the log files that should be collected are located. Otherwise, we
>> would not be able to find the logs in the containers. We would need to handle that
>> in the plugin's `meta.yaml` as well as in the devfile.
>>
>> What's next?
>>   We would like to investigate and prototype the following solution:
>>     - collect all ws logs into files and store them on a PV in the workspace
>>     - watch ws events from master and, on exit, start the collector pod that will
>>       collect all the logs and pass them to the backend. The logs backend is yet
>>       to be defined. It might be only a PV dedicated to archiving logs, or some new
>>       service, or Che master.
>>     - prototype a new Che master API to access the logs. If we store them in the
>>       workspace's PV, start the collector pod on demand to access the logs.
>>
>>
>> We would very much welcome any opinions or ideas.
>>
>>
>> [1] - https://github.com/eclipse/che/issues/15047
>> [2] - https://github.com/eclipse/che/issues/15134
>> [3] - https://kubernetes.io/docs/concepts/cluster-administration/logging/#sidecar-container-with-a-logging-agent
>>
>
>
> --
>
> David Festal
>
> Principal Software Engineer, DevTools
>
> Red Hat France
>
> dfestal@xxxxxxxxxx
>
>



-- 
Michal Vala
Software Engineer, Eclipse Che
Red Hat Czech
