Hi Michal,
Thanks for this analysis, and sorry for the late reply.
I don't think that building a log collection system from scratch is the right approach (it's painful). What about going in the Kubernetes-native direction you mentioned in your first mail? Grafana Loki or Fluentd are projects that may solve our problem.
Another aspect is that querying workspace logs is more of an admin story than a user one. Tools like Elasticsearch and Grafana provide a good UX for that, so I would NOT build a Che UI component and a new wsmaster API for it. As with monitoring, log collection should be optional, and an admin could choose to enable it.
But anyway, although the admin scenario is important, I believe the original problem we were trying to solve was more of a user problem. We want to make it easy for a user to troubleshoot a workspace that:
- fails to start
- is not behaving correctly (e.g. an LS doesn't work as expected)
We have already made some good progress on troubleshooting (better messages), but there are still some cases where it's hard to figure out what's going on. For those cases, providing the logs to the user would help. But I am not sure that persisting the logs is necessary:
- when an error happens at workspace start, we should provide: wsmaster logs, Kubernetes events, container statuses, and logs from the workspace pod and from the plugin broker.
- at any time while a workspace is running, a user should be able to see/tail or download all the logs (Theia, LS and other plugins) via a specific command within Theia
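
To make the first bullet a bit more concrete, here is a rough sketch of what gathering that data on demand (without persisting anything) could look like. It is only an illustration, not a proposal for the actual wsmaster code: the function name and the shape of the report are made up, and `core_api` is assumed to behave like `CoreV1Api` from the official Kubernetes Python client. It covers events, container statuses and container logs for a single pod; wsmaster logs are not handled here.

```python
def collect_start_failure_diagnostics(core_api, namespace, pod_name, tail_lines=200):
    """Gather events, container statuses and recent logs for one pod.

    Hypothetical helper: `core_api` is assumed to expose the same methods
    as the Kubernetes Python client's CoreV1Api.
    """
    pod = core_api.read_namespaced_pod(pod_name, namespace)

    # Only keep events that refer to this pod.
    events = core_api.list_namespaced_event(
        namespace, field_selector=f"involvedObject.name={pod_name}"
    )

    report = {"events": [], "containers": {}}
    for ev in events.items:
        report["events"].append(f"{ev.type} {ev.reason}: {ev.message}")

    # container_statuses can be None if the pod was never scheduled.
    for status in pod.status.container_statuses or []:
        try:
            logs = core_api.read_namespaced_pod_log(
                pod_name, namespace, container=status.name, tail_lines=tail_lines
            )
        except Exception as exc:
            # A container that never started has no logs; record why instead.
            logs = f"<logs unavailable: {exc}>"
        report["containers"][status.name] = {
            "ready": status.ready,
            "restart_count": status.restart_count,
            "logs": logs,
        }
    return report
```

With the real client, one would presumably pass `kubernetes.client.CoreV1Api()` after loading the cluster configuration, and call the same function once for the workspace pod and once for the plugin broker pod.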