Re: [che-dev] How to collect and persist all workspace logs?

From: Michal Vala <mvala@xxxxxxxxxx>
Date: Tue, 4 Feb 2020 11:53:51 +0100
Delivered-to: che-dev@xxxxxxxxxxx

Fix: the "Global che-server log collector" heading is without the rice,
of course... facepalm. My clipboard went crazy or something...

On Tue, Feb 4, 2020 at 11:10 AM Michal Vala <mvala@xxxxxxxxxx> wrote:
>
> Hello team,
> we've run into trouble implementing the writing of containers' stdout
> logs to files. It is quite unfortunate, as we've spent some time
> analyzing the feature, but that's life.
>
> The original idea was to modify the command of the image so it would
> redirect the output into a file, something like
> `<command> | tee c1.log`. However, that is very hard, if not
> impossible, to achieve. My idea was to pass the `args` to the
> container command. This does not work, and I think that's because the
> arguments are passed in quotes under the hood, so it becomes something
> like `<command> '| tee c1.log'`, which does not do what we want. To
> actually update the command, we would need to have the full image
> pulled and then somehow inspect it to get the original command and
> update it. This would mean a very invasive change to the current
> workspace startup logic, with uncertain results and high risk.
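
The quoting effect described above can be reproduced locally. Kubernetes hands `command` and `args` to the container as an argv array without invoking a shell, so `|` is just another argument. What follows is a minimal sketch, not Che code: it assumes a POSIX system with `echo`, `cat` and `sh` on the PATH, and the function names are hypothetical.

```python
import subprocess


def run_without_shell(command, extra_args):
    """Run `command` with extra argv entries appended; no shell parses them."""
    result = subprocess.run(
        command + extra_args, capture_output=True, text=True, check=True
    )
    return result.stdout


def run_with_shell(command_line):
    """Run the same text through `sh -c`, so `|` is interpreted as a pipe."""
    result = subprocess.run(
        ["sh", "-c", command_line], capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    # "|", "tee" and "c1.log" reach echo as plain arguments and are printed.
    print(run_without_shell(["echo", "hello"], ["|", "tee", "c1.log"]), end="")
    # With a shell in between, a real pipeline is built (cat stands in for tee).
    print(run_with_shell("echo hello | cat"), end="")
```

The first call prints the pipe and the file name literally instead of creating `c1.log`, which matches the `<command> '| tee c1.log'` behaviour seen with `args`.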
>
> So where to go next? We have a few ideas:
>
> # Namespace log collector component (I have a working prototype of this)
>   - will run in an extra pod in the namespace of the workspace
>   - will watch for workspace pods and, once some are running, start
> following the logs of all containers and write them to files
>   - one instance per namespace
>   - lifecycle managed by che-server (can scale down when no workspace
> is running and scale up before the first workspace starts)
> pros:
>   - should be quite gentle with hw resources (TODO: measure),
> especially with many workspaces in the same namespace
>   - outlives the workspace lifetime, so we should be able to get all the logs
>   - logs could be provided to the backend within the same component
>   - should be possible to manage file logs from inside the containers
> with this component
> cons:
>   - needs either an extra PVC for logs or the workspace's PVC, with the
> limitation that all workspaces will need to run on one node and the
> logic will have to be more complex to reflect the different Che PVC
> strategies
>   - for "namespace per workspace" or "only one workspace per user"
> scenarios, same hw requirements as a sidecar collector
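
The file-writing half of such a collector can be sketched independently of how the logs are followed. The prototype's actual design is not shown in this thread, so everything below is an assumption: the `<root>/<workspace>/<pod>/<container>.log` layout and the function names are hypothetical, and the log streams are plain iterables of lines rather than a real Kubernetes log-follow connection.

```python
from pathlib import Path


def log_path(root, workspace_id, pod, container):
    """Hypothetical layout: <root>/<workspace>/<pod>/<container>.log"""
    return Path(root) / workspace_id / pod / f"{container}.log"


def write_streams(root, workspace_id, streams):
    """Append each container's lines to its own file; return the paths written.

    `streams` maps (pod, container) to an iterable of log lines. A real
    collector would follow all streams concurrently; this sketch drains
    them one after another.
    """
    written = []
    for (pod, container), lines in streams.items():
        path = log_path(root, workspace_id, pod, container)
        path.parent.mkdir(parents=True, exist_ok=True)
        with path.open("a", encoding="utf-8") as handle:
            for line in lines:
                # Normalise line endings so every entry ends with one newline.
                handle.write(line.rstrip("\n") + "\n")
        written.append(path)
    return written
```

Opening the files in append mode means a restarted collector keeps adding to the same file instead of truncating earlier output.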
>
>
> # Workspace log collector sidecar
>   - will run in the workspace pod as a sidecar
>   - will follow all the container logs of the workspace and write them to PVC
> pros:
>   - no issues with PVC access from multiple pods
>   - same lifecycle as the workspace, so it's easier to deploy with
> current server logic ("just" add another sidecar)
>   - easiest to get file logs from inside the containers as we're in the same pod
> cons:
>   - same lifecycle as the workspace, so we're not sure we get all the
> logs before the collector is killed
>   - extra hw resources consumed per workspace
>   - we will need another component to send the logs to the backend, as
> we can't rely on the workspace pod managing it in time on workspace
> crash
>
>
> # Global che-server log collectora ze sojového masa
> 200g rýže bas
>   - che-server will watch and follow the logs of all workspaces and
> write them to PVC/Database/?
> pros:
>   - no extra hw resources per workspace/namespace
>   - logs are collected directly to the place where they can be
> requested so not much extra coding needed to make them accessible on
> server API
> cons:
>   - higher network traffic workspace ⇔ che-server
>   - connections to all workspaces must be kept open all the time
>   - higher hw requirements on che-server
>   - hard, if not impossible, to get file logs from inside the containers;
> this will probably need another component that runs on exit inside the
> workspace's namespace
>
>
>
> An important question here is how hard a requirement it is to get the
> file logs from inside the containers (e.g. language servers). This
> can be an important factor in deciding which way to go.
>
>
> Thanks!
> Michal
>
>
> On Thu, Jan 23, 2020 at 5:04 AM Michal Vala <mvala@xxxxxxxxxx> wrote:
> >
> > Hello team,
> >
> > we're currently working on improving diagnosis capabilities[1] of workspaces,
> > more concretely, on how to get all logs from the workspace[2]. We're in the
> > phase of investigating options and prototyping, and we've come up with several
> > variants of how to achieve the goal. We would like to know your opinion and new
> > ideas.
> >
> > Requirements:
> >   - collect all logs of all containers from the workspace
> >   - stdout/err as well as file logs inside the container
> >   - keep history of last 5 runs of the workspace
> >   - collect logs of crashed workspace
> >   - make logs easily accessible to the user (rest API + dashboard view)
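
The "last 5 runs" requirement above implies some pruning step wherever the logs end up. A minimal sketch, assuming each run's logs live in their own directory under a per-workspace folder and that directory names sort chronologically (for example a timestamp or zero-padded counter); the layout and the function name are hypothetical, not something decided in this thread.

```python
import shutil
from pathlib import Path


def prune_old_runs(workspace_logs, keep=5):
    """Delete all but the `keep` newest run directories; return removed names.

    Assumes directory names sort in chronological order.
    """
    runs = sorted(
        (p for p in Path(workspace_logs).iterdir() if p.is_dir()),
        key=lambda p: p.name,
    )
    # Everything except the last `keep` entries is stale.
    stale = runs[:-keep] if keep > 0 else runs
    for run_dir in stale:
        shutil.rmtree(run_dir)
    return [p.name for p in stale]
```

Running this before or after each workspace start would keep the history bounded regardless of which storage option is chosen.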
> >
> >
> > I've split the effort into two sections:
> >
> >   ### How to collect:
> >
> >     # log everything to files on a mounted PV
> >       - just mount PV and log everything there
> >       - pros
> >         - not much extra overhead, only write stdout/err to the file
> > and mount PV
> >         - don't need extra hw resources (memory/cpu)
> >       - cons
> >         - we might need to override `command` of all containers. They will
> >           have to run with extra parameters to write stdout/err to the file.
> >           Something like `<command> 2>&1 | tee ws.log`
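
For illustration, this is roughly what overriding `command` would involve once the original command is known. It is a sketch under assumptions: the image must ship a POSIX `sh` and `tee`, the original `command` and `args` must already be available (they may be defined only inside the image), and `wrap_with_tee` is a hypothetical helper, not an existing Che function.

```python
import shlex


def wrap_with_tee(command, args, log_path):
    """Return `command`/`args` that run the original argv under `sh -c`
    and copy its stdout and stderr to `log_path` via tee."""
    # Quote every original argv entry so the shell re-parses it unchanged.
    original = shlex.join(list(command) + list(args))
    pipeline = f"{original} 2>&1 | tee {shlex.quote(log_path)}"
    return {"command": ["sh", "-c"], "args": [pipeline]}


if __name__ == "__main__":
    spec = wrap_with_tee(["java", "-jar", "app.jar"], ["--port", "8080"], "/logs/ws.log")
    print(spec["args"][0])
```

Two side effects are worth keeping in mind with this approach: the exit status of a plain `sh` pipeline is that of `tee`, not of the original process, and `sh` rather than the original process becomes PID 1 in the container.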
> >
> >     # workspace collector sidecar (kubernetes/client-go app?)
> >       - pros
> >         - per workspace
> >         - dynamic and powerful
> >       - cons
> >         - very custom solution and might be hard to manage/maintain
> >         - unknown performance and hw resources requirements
> >         - hard to handle when the ws crashes
> >         - need more memory per workspace, even if user does not use it and
> >           everything works as expected
> >
> >     # watch and collect from master
> >       - pros
> >         - easy to grab logs and events
> >         - easy to access archived logs
> >       - cons
> >         - only container's stderr/out
> >         - keep the connection to ws
> >         - more network traffic
> >         - increases memory footprint of master
> >
> >     # kubernetes native
> >       - change the logging backend of kubernetes [3]
> >       - pros
> >         - standard k8s way, "googleable"
> >       - cons
> >         - depends on kubernetes deployment
> >         - needs extra cluster component/configuration
> >         - only stdout/err of containers
> >
> >     # push logs directly from containers to logging backend
> >       - cons
> >         - customize all components to log to the backend
> >         - performance and hw resources overhead
> >
> >     # collect on workspace exit
> >       - mount PV and log there. When the workspace exits, start a collector pod
> >           that grabs the logs and "archives" them.
> >       - pros
> >         - not much extra overhead
> >       - cons
> >         - don't have logs of running workspace
> >         - custom collector pod
> >
> >
> >   ### Where to store and how to access:
> >
> >     # Workspace PV
> >       - pros
> >         - easy to set quota per user
> >       - cons
> >         - harder to access (need to start some pod at workspace's namespace)
> >         - lost when the namespace is deleted
> >
> >     # Che PV
> >       - pros
> >         - easier to access
> >       - cons
> >         - harder to set quota per user
> >         - harder to scale and manage
> >         - possible performance bottleneck
> >
> >     # PostgreSQL
> >       - pros
> >         - the easiest to access
> >       - cons
> >         - harder to set quota per user
> >         - harder to scale and manage
> >         - possible performance bottleneck
> >
> >
> > There is one remaining and very important question we have not investigated
> > much. We need to somehow configure all plugins/editors and other components to
> > declare where the log files that should be collected are located. Otherwise, we
> > would not be able to find the logs in the containers. We would need to handle
> > that in the plugin's `meta.yaml` as well as in the devfile.
> >
> > What's next?
> >   We would like to investigate and prototype the following solution:
> >     - collect all ws logs into files and store them in a PV in the workspace
> >     - watch ws events from master and, on exit, start the collector pod that will
> >       collect all the logs and pass them to the backend. The logs backend is
> >       still to be done. It might be only a PV dedicated to archiving logs, or
> >       some new service, or Che master.
> >     - prototype a new Che master API to access the logs. If we store them in the
> >       workspace's PV, start the collector pod on demand to access the logs.
> >
> >
> > We would very much welcome any opinions or ideas.
> >
> >
> > [1] - https://github.com/eclipse/che/issues/15047
> > [2] - https://github.com/eclipse/che/issues/15134
> > [3] - https://kubernetes.io/docs/concepts/cluster-administration/logging/#sidecar-container-with-a-logging-agent
>
>
>
> --
> Michal Vala
> Software Engineer, Eclipse Che
> Red Hat Czech



-- 
Michal Vala
Software Engineer, Eclipse Che
Red Hat Czech
