Re: [che-dev] How to collect and persist all workspace logs?
Fix: the heading of the global collector section should read just
"Global che-server log collector", without the rice, of course... facepalm,
my clipboard went crazy or what...
On Tue, Feb 4, 2020 at 11:10 AM Michal Vala <mvala@xxxxxxxxxx> wrote:
>
> Hello team,
> we've run into trouble implementing the writing of containers' stdout
> logs to files. It is quite unfortunate, as we've spent some time
> analyzing the feature, but that's life.
>
> The original idea was to modify the command of the image so it would
> redirect the output into a file, something like
> `<command> | tee c1.log`. However, that is very hard, if not impossible,
> to achieve. My idea was to pass the pipe via the `args` of the container
> command. This does not work, and I think it is because the arguments are
> passed quoted under the hood, so it becomes something like
> `<command> '| tee c1.log'`, which does not do what we want. To actually
> update the command, we would need to have the full image pulled and then
> somehow inspect it to get the original command and update it. This would
> mean a very deep intervention in the current workspace startup logic,
> with an uncertain result and high risk.
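A minimal sketch of why the pipe gets lost, using Python's `subprocess` as a stand-in for a runtime that execs the command plus its arguments without a shell (that parallel is an assumption for illustration; `echo hello` is a placeholder for `<command>` and the helper names are hypothetical). In exec form the pipe is just one more literal argument; only a shell interprets it.

```python
import os
import subprocess
import tempfile


def run_exec_form(workdir):
    """Exec form: each list item is one literal argv entry, no shell involved."""
    result = subprocess.run(
        ["echo", "hello", "| tee c1.log"],
        cwd=workdir, capture_output=True, text=True, check=True,
    )
    return result.stdout


def run_shell_form(workdir):
    """Shell form: `sh -c` parses the pipe, so tee really writes c1.log."""
    result = subprocess.run(
        ["sh", "-c", "echo hello | tee c1.log"],
        cwd=workdir, capture_output=True, text=True, check=True,
    )
    return result.stdout


with tempfile.TemporaryDirectory() as workdir:
    log_path = os.path.join(workdir, "c1.log")
    exec_out = run_exec_form(workdir)
    exec_created_log = os.path.exists(log_path)
    shell_out = run_shell_form(workdir)
    shell_created_log = os.path.exists(log_path)

print(repr(exec_out), exec_created_log)
print(repr(shell_out), shell_created_log)
```

The exec-form call echoes the pipe text back and creates no file, which matches the behaviour described above. Wrapping the original command in `sh -c` would work only if the original command is known, and that is exactly the information that is missing before the image is pulled.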
>
> So where to go next? We have a few ideas:
>
> # Namespace log collector component (I have a working prototype of this)
> - will run in an extra pod in the namespace of the workspace
> - will watch for workspace pods and, once one is running, start
> following the logs of all its containers and write them to files
> - one instance per namespace
> - lifecycle managed by che-server (can scale down when no workspace
> is running and scale up before first workspace start)
> pros:
> - should be quite gentle with hw resources (TODO: measure),
> especially with many workspaces in the same namespace
> - outlive the workspace lifetime, so we should be able to get all the logs
> - logs could be provided to the backend within the same component
> - should be possible to manage file logs from inside the containers
> with this component
> cons:
> - needs either an extra PVC for logs or the workspace's PVC, the latter
> with the limitation that all workspaces must run on one node and the
> logic must be more complex to reflect the different Che PVC strategies
> - for "namespace per workspace" or "only one workspace per user"
> scenarios, same hw requirements as a sidecar collector
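A rough sketch of the "follow all containers and write them to files" part of such a collector. It is not the prototype mentioned above: the live Kubernetes log streams are replaced by plain iterables, and the function names, container names (`c1`, `c2`) and file layout are all hypothetical.

```python
import tempfile
import threading
from pathlib import Path
from typing import Dict, Iterable


def follow_to_file(lines: Iterable[str], target: Path) -> None:
    """Append every line from one container's log stream to its own file."""
    with target.open("a", encoding="utf-8") as out:
        for line in lines:
            out.write(line.rstrip("\n") + "\n")
            out.flush()  # keep the file useful even if the collector dies


def collect(streams: Dict[str, Iterable[str]], log_dir: Path) -> None:
    """Follow all streams concurrently, one thread and one file per container."""
    log_dir.mkdir(parents=True, exist_ok=True)
    workers = [
        threading.Thread(target=follow_to_file, args=(lines, log_dir / f"{name}.log"))
        for name, lines in streams.items()
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()


# Hypothetical container names and log lines standing in for live streams.
demo_dir = Path(tempfile.mkdtemp())
collect({"c1": ["starting", "ready"], "c2": ["listening"]}, demo_dir)
```

In a real collector each iterable would be a blocking log-follow stream, so one thread (or task) per container keeps a quiet container from stalling the others.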
>
>
> # Workspace log collector sidecar
> - will run in the workspace pod as a sidecar
> - will follow all the container logs of the workspace and write them to PVC
> pros:
> - no issues with PVC access from multiple pods
> - same lifecycle as the workspace, so it's easier to deploy with
> current server logic ("just" add another sidecar)
> - easiest to get file logs from inside the containers as we're in the same pod
> cons:
> - same lifecycle as the workspace, so we're not sure we get all the
> logs before the collector is killed
> - extra hw resources consumed by each workspace
> - we will need another component to send the logs to the backend, as
> we can't rely on the workspace pod managing it in time when the
> workspace crashes
>
>
> # Global che-server log collector
> - che-server will watch and follow the logs of all workspaces and
> write them to PVC/Database/?
> pros:
> - no extra hw resources per workspace/namespace
> - logs are collected directly to the place where they can be
> requested so not much extra coding needed to make them accessible on
> server API
> cons:
> - higher network traffic workspace ⇔ che-server
> - keep the connection to all workspaces open all the time
> - higher hw requirements on che-server
> - hard to impossible to get file logs from inside the containers,
> probably will need another component that will run on-exit inside
> workspace's namespace
>
>
>
> An important question here is how hard a requirement it is to get the
> file logs from inside the containers (e.g. language servers). This
> can be an important factor in deciding which way to go.
>
>
> Thanks!
> Michal
>
>
> On Thu, Jan 23, 2020 at 5:04 AM Michal Vala <mvala@xxxxxxxxxx> wrote:
> >
> > Hello team,
> >
> > we're currently working on improving the diagnostic capabilities[1] of
> > workspaces; more concretely, on how to get all logs from the workspace[2].
> > We're in the phase of investigating options and prototyping, and we've come
> > up with several variants of how to achieve the goal. We would like to know
> > your opinions and new ideas.
> >
> > Requirements:
> > - collect all logs of all containers from the workspace
> > - stdout/err as well as file logs inside the container
> > - keep history of last 5 runs of the workspace
> > - collect logs of crashed workspace
> > - make logs easily accessible to the user (rest API + dashboard view)
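One way the "last 5 runs" requirement could be met on a file-based store, sketched under assumptions: one directory per run, run ids that sort chronologically, and pruning at the start of each new run. The function name and directory layout are hypothetical and not part of any existing Che API.

```python
import shutil
import tempfile
from pathlib import Path

MAX_RUNS = 5  # from the requirement above


def start_run(workspace_logs: Path, run_id: str) -> Path:
    """Create the directory for a new run, then prune the oldest runs."""
    run_dir = workspace_logs / run_id
    run_dir.mkdir(parents=True)
    # Assumes run ids sort chronologically (zero-padded counters or timestamps).
    runs = sorted(p for p in workspace_logs.iterdir() if p.is_dir())
    for old in runs[:-MAX_RUNS]:
        shutil.rmtree(old)
    return run_dir


# Simulate seven workspace starts; only the five newest directories survive.
root = Path(tempfile.mkdtemp())
for n in range(1, 8):
    start_run(root, f"run-{n:04d}")
kept = sorted(p.name for p in root.iterdir())
print(kept)
```

Pruning at start rather than at exit means a crashed run never blocks cleanup, which fits the "collect logs of crashed workspace" requirement.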
> >
> >
> > I've split the effort into two sections:
> >
> > ### How to collect:
> >
> > # log everything to files to mounted PV
> > - just mount PV and log everything there
> > - pros
> > - not much extra overhead, only write stdout/err to the file
> > and mount PV
> > - don't need extra hw resources (memory/cpu)
> > - cons
> > - we might need to override `command` of all containers. They will
> > have to run with extra parameters to write stdout/err to the file.
> > Something like `<command> 2>&1 | tee ws.log`
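For illustration only, the effect of `<command> 2>&1 | tee ws.log` written as a small Python wrapper instead of a shell pipeline. The wrapper name `run_and_tee` is hypothetical and the child command is a placeholder; the sketch only shows what the override would have to do, not how Che would inject it.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import List


def run_and_tee(command: List[str], log_path: Path) -> int:
    """Run command, merge stderr into stdout, copy each line to stdout and log_path."""
    with log_path.open("w", encoding="utf-8") as log:
        proc = subprocess.Popen(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # same effect as 2>&1
            text=True,
        )
        for line in proc.stdout:
            sys.stdout.write(line)  # the container's stdout keeps working as before
            log.write(line)
        return proc.wait()


# Placeholder child process that writes one line to each stream.
log_file = Path(tempfile.mkdtemp()) / "ws.log"
exit_code = run_and_tee(
    [sys.executable, "-c", "import sys; print('out'); print('err', file=sys.stderr)"],
    log_file,
)
```

Because both streams share one pipe, the relative order of stdout and stderr lines in the file depends on the child's buffering, which is also true of the shell version.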
> >
> > # workspace collector sidecar (kubernetes/client-go app?)
> > - pros
> > - per workspace
> > - dynamic and powerful
> > - cons
> > - very custom solution and might be hard to manage/maintain
> > - unknown performance and hw resources requirements
> > - hard to handle when the ws crashes
> > - need more memory per workspace, even if user does not use it and
> > everything works as expected
> >
> > # watch and collect from master
> > - pros
> > - easy to grab logs and events
> > - easy to access archived logs
> > - cons
> > - only container's stderr/out
> > - keep the connection to ws
> > - more network traffic
> > - increases the memory footprint of master
> >
> > # kubernetes native
> > - change the logging backend of kubernetes [3]
> > - pros
> > - standard k8s way, "googleable"
> > - cons
> > - depends on kubernetes deployment
> > - needs extra cluster component/configuration
> > - only stdout/err of containers
> >
> > # push logs directly from containers to logging backend
> > - cons
> > - customize all components to log to the backend
> > - performance and hw resources overhead
> >
> > # collect on workspace exit
> > - mount PV and log there. When workspace exits, start collector pod that
> > grabs the logs and "archive" them.
> > - pros
> > - not much extra overhead
> > - cons
> > - don't have logs of running workspace
> > - custom collector pod
> >
> >
> > ### Where to store and how to access:
> >
> > # Workspace PV
> > - pros
> > - easy to set quota per user
> > - cons
> > - harder to access (need to start some pod in the workspace's namespace)
> > - lost when the namespace is deleted
> >
> > # Che PV
> > - pros
> > - easier to access
> > - cons
> > - harder to set quota per user
> > - harder to scale and manage
> > - possible performance bottleneck
> >
> > # PostgreSQL
> > - pros
> > - the easiest to access
> > - cons
> > - harder to set quota per user
> > - harder to scale and manage
> > - possible performance bottleneck
> >
> >
> > There is one remaining and very important question we have not investigated
> > much. We need to somehow configure all plugins/editors and other components to
> > declare where their log files are, so that they can be collected. Otherwise, we
> > would not be able to find the logs in the containers. We would need to handle
> > that in the plugin's `meta.yaml` as well as in the devfile.
> >
> > What's next?
> > We would like to investigate and prototype the following solution:
> > - collect all ws logs into files and store them in a PV in the workspace
> > - watch ws events from master and, on exit, start the collector pod that will
> > collect all the logs and pass them to the backend. The logs backend is yet
> > to be defined. It might be only a PV dedicated to archiving logs, or some new
> > service, or Che master.
> > - prototype a new Che master API to access the logs. If we store them in the
> > workspace's PV, start the collector pod on demand to access the logs.
> >
> >
> > We would very much welcome any opinions or ideas.
> >
> >
> > [1] - https://github.com/eclipse/che/issues/15047
> > [2] - https://github.com/eclipse/che/issues/15134
> > [3] - https://kubernetes.io/docs/concepts/cluster-administration/logging/#sidecar-container-with-a-logging-agent
>
>
>
> --
> Michal Vala
> Software Engineer, Eclipse Che
> Red Hat Czech
--
Michal Vala
Software Engineer, Eclipse Che
Red Hat Czech