Yes, I do use FileLocker.java. The difficult part is testing and debugging in our cloud environment, where remote debugging of multiple processes is not supported.
Currently it looks like file locking on NFS has some problems. One process sets the lock, but the lock is not synchronized across the network file system immediately,
so the other process manages to lock as well (at least this is our theory).
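Since remote debugging is not available, one way to check this theory might be a small standalone probe that is started on two VMs at about the same time. The sketch below is only an illustration: it uses plain java.nio locking rather than FileLocker.java, and the class name `LockProbe`, the method `probe`, and the lock file path passed on the command line are all hypothetical.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

class LockProbe {
    /**
     * Tries to take an exclusive lock on lockFile and, if successful, holds it
     * for holdMillis. Returns true if the lock was acquired, false if another
     * process already holds it.
     */
    static boolean probe(Path lockFile, long holdMillis)
            throws IOException, InterruptedException {
        try (FileChannel channel = FileChannel.open(lockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            // tryLock() returns null when another process holds the lock.
            FileLock lock = channel.tryLock();
            if (lock == null) {
                return false;
            }
            try {
                Thread.sleep(holdMillis);
                return true;
            } finally {
                lock.release();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to a lock file on the shared file system (hypothetical).
        boolean acquired = probe(Paths.get(args[0]), 10_000);
        System.out.println(System.currentTimeMillis() + " acquired=" + acquired);
    }
}
```

If both VMs print acquired=true while their 10 second hold windows overlap, that would support the theory that the lock is not visible to the other node in time. The timestamps are only comparable if the clocks of the two VMs are reasonably in sync.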
From: orion-dev-bounces@xxxxxxxxxxx [mailto:orion-dev-bounces@xxxxxxxxxxx]
On Behalf Of John Arthorne
Sent: Tuesday, 27 October 2015 16:17
To: orion-dev@xxxxxxxxxxx
Subject: Re: [orion-dev] Orion high availability support
Ok, that sounds reasonable as long as the lock is only at a single user level, and not wider. You have probably already seen the existing FileLocker helper class that we have
to perform this kind of file system locking.
----- Original message -----
From: "Rozenszajn, Sergio" <sergio.rozenszajn@xxxxxxx>
Sent by: orion-dev-bounces@xxxxxxxxxxx
To: Orion developer discussions <orion-dev@xxxxxxxxxxx>
Cc: "Tsentsiper, Oleg" <oleg.tsentsiper@xxxxxxx>
Subject: Re: [orion-dev] Orion high availability support
Date: Tue, Oct 27, 2015 10:05 AM
Hi John,
I've started to work on this topic. Currently the direction is to create a special "process.lock" file in the user's workspace that is used to lock the entire project create/delete process in SimpleMetaStore.java.
We test it by creating projects from different VMs for the same user in parallel.
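A minimal sketch of this direction, assuming plain java.nio file locking instead of the FileLocker helper (whose API is not shown in this thread). The class `WorkspaceLockSketch` and the method `withProcessLock` are hypothetical names, not part of Orion.

```java
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.Callable;

class WorkspaceLockSketch {
    /**
     * Runs the given action while holding an exclusive lock on
     * "process.lock" in the user's workspace directory.
     */
    static <T> T withProcessLock(Path userWorkspace, Callable<T> action)
            throws Exception {
        Path lockFile = userWorkspace.resolve("process.lock");
        try (FileChannel channel = FileChannel.open(lockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             // lock() blocks until the exclusive lock is granted.
             FileLock lock = channel.lock()) {
            // The whole create/delete sequence would go inside the action,
            // so that reading and writing the metadata happen under one lock.
            return action.call();
        }
    }
}
```

Note that a java.nio FileLock is held on behalf of the whole JVM, so threads inside one server still need the existing in-process lock. Whether the lock is honoured across VMs depends on the shared file system.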
Sergio
From: orion-dev-bounces@xxxxxxxxxxx [mailto:orion-dev-bounces@xxxxxxxxxxx]
On Behalf Of John Arthorne
Sent: Tuesday, 27 October 2015 15:01
To: orion-dev@xxxxxxxxxxx
Subject: Re: [orion-dev] Orion high availability support
Currently we do not have maintenance branches for previous releases. However if it is important for you, and you provide the pull requests for the backport, we could release into the stable build
branch that R8.0 was built from:
We don't have any mechanism in place to run builds on old branches, but it would at least give you a stable branch point to build from.
----- Original message -----
From: "Rozenszajn, Sergio" <sergio.rozenszajn@xxxxxxx>
Sent by: orion-dev-bounces@xxxxxxxxxxx
To: Orion developer discussions <orion-dev@xxxxxxxxxxx>
Cc: "Epstein, Tomer" <tomer.epstein@xxxxxxx>, "Tsentsiper, Oleg" <oleg.tsentsiper@xxxxxxx>, "Krigsman, Shoham" <shoham.krigsman@xxxxxxx>
Subject: Re: [orion-dev] Orion high availability support
Date: Sun, Oct 25, 2015 2:39 AM
Hi John,
In our project we work with Orion 8. Will this bug fix be applied to previous versions (8/9), or will it be applied to master (Orion 10) only?
Sergio
From: orion-dev-bounces@xxxxxxxxxxx [mailto:orion-dev-bounces@xxxxxxxxxxx]
On Behalf Of John Arthorne
Sent: Thursday, 22 October 2015 22:14
To: orion-dev@xxxxxxxxxxx
Cc: Epstein, Tomer <tomer.epstein@xxxxxxx>; Tsentsiper, Oleg <oleg.tsentsiper@xxxxxxx>; orion-dev@xxxxxxxxxxx
Subject: Re: [orion-dev] Orion high availability support
I have dug into this a bit, and you are right. A bug fix in Orion 9 prevents any kind of corruption due to concurrent writes:
However, in some cases (including project create/delete) the locking is performed at too low a level, so although there is no data corruption, there is a race condition where a metadata change can be lost. I have entered a new bug report for this:
Are you interested in helping to work on this?
----- Original message -----
From: "Rozenszajn, Sergio" <sergio.rozenszajn@xxxxxxx>
Sent by: orion-dev-bounces@xxxxxxxxxxx
To: Orion developer discussions <orion-dev@xxxxxxxxxxx>
Cc: "Epstein, Tomer" <tomer.epstein@xxxxxxx>, "Tsentsiper, Oleg" <oleg.tsentsiper@xxxxxxx>
Subject: Re: [orion-dev] Orion high availability support
Date: Thu, Oct 22, 2015 9:03 AM
Hi John,
I've followed these steps and I still find issues when trying to update metastore.json from several processes.
In my opinion the problem is in SimpleMetaStore.java.
For example:
Process 1 wants to delete proj1
Process 2 wants to delete proj2
1. Process 1 calls SimpleMetaStore.deleteProject(proj1)
   a. User lock is set
   b. metastore.json is read (fileLocker is set and released)
   c. proj1 is deleted from the file system
   d. proj1 is deleted from the local copy of metastore.json
   e. metastore.json is updated in the file system (fileLocker is set and released).
      Process 1 overwrites the changes from Process 2
   f. User lock is released
2. Process 2 calls SimpleMetaStore.deleteProject(proj2)
   a. User lock is set (no problem, because it is another process)
   b. metastore.json is read (fileLocker is set and released)
   c. proj2 is deleted from the file system
   d. proj2 is deleted from the local copy of metastore.json
   e. metastore.json is updated in the file system (fileLocker is set and released),
      before Process 1 updates the metastore
   f. User lock is released
I don't think the scenario above is supported by the Orion locking mechanism. To support it, the User lock should work across processes (e.g. a FileLocker on User.json).
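The interleaving can be reproduced without any real files. The following sketch is only an in-memory illustration under my own assumptions: the list `stored` stands in for the project list in metastore.json, and the class and method names are hypothetical. Each read and write is atomic on its own (as with the fileLocker), yet one delete is still lost.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LostUpdateDemo {
    // In-memory stand-in for the project list in metastore.json (hypothetical).
    private static List<String> stored;

    private static List<String> read() { return new ArrayList<>(stored); }
    private static void write(List<String> copy) { stored = new ArrayList<>(copy); }
    private static void reset() { stored = new ArrayList<>(Arrays.asList("proj1", "proj2")); }

    /** Each read and write is atomic on its own, but the two deletes interleave. */
    static List<String> interleavedDeletes() {
        reset();
        List<String> copy1 = read();   // Process 1, step b
        List<String> copy2 = read();   // Process 2, step b
        copy1.remove("proj1");         // Process 1, step d
        copy2.remove("proj2");         // Process 2, step d
        write(copy2);                  // Process 2, step e
        write(copy1);                  // Process 1, step e: overwrites Process 2
        return read();
    }

    /** Each process does its whole read-modify-write under one cross-process lock. */
    static List<String> serializedDeletes() {
        reset();
        List<String> copy1 = read();   // Process 1 holds the lock from here ...
        copy1.remove("proj1");
        write(copy1);                  // ... to here
        List<String> copy2 = read();   // Process 2 only starts afterwards
        copy2.remove("proj2");
        write(copy2);
        return read();
    }

    public static void main(String[] args) {
        System.out.println("interleaved: " + interleavedDeletes()); // [proj2]
        System.out.println("serialized:  " + serializedDeletes());  // []
    }
}
```

In the interleaved case proj2 reappears in the metadata although it was already deleted from the file system. In the serialized case both deletes survive, which is what a cross-process User lock would give.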
Best Regards
Sergio
From: orion-dev-bounces@xxxxxxxxxxx [mailto:orion-dev-bounces@xxxxxxxxxxx]
On Behalf Of John Arthorne
Sent: Wednesday, 21 October 2015 20:28
To: orion-dev@xxxxxxxxxxx
Subject: Re: [orion-dev] Orion high availability support
This is certainly supported, but how to do it is not very well documented. I found this blog post that gives some of the explanation:
I have just copied some of the core information into the orion server admin guide:
If you follow these steps and still see problems, please follow up here with more questions!
----- Original message -----
From: "Rozenszajn, Sergio" <sergio.rozenszajn@xxxxxxx>
Sent by: orion-dev-bounces@xxxxxxxxxxx
To: "orion-dev@xxxxxxxxxxx" <orion-dev@xxxxxxxxxxx>
Cc:
Subject: [orion-dev] Orion high availability support
Date: Wed, Oct 21, 2015 2:40 AM
Hi,
We are trying to implement Orion 8 with high availability. We will have multiple VMs that run Orion with a shared file system where the workspace files are maintained. We have turned on FileLocker.locking.
In one of our automatic tests we log on with the same user ID from different sessions/different VMs. In this test each session tries to create multiple projects in parallel.
As a result we get this error:
2015 10 21 05:40:09#+00#ERROR#org.eclipse.orion.server.config##anonymous#http-bio-8041-exec-6#na#hcproxy#orion#web#hcproxy#Meta File Error, cannot read JSON file /mnt/perm_storage/persistent/orion_web/xe/xeee1cc67/P9/P900902/xeee1cc67$P900902-OrionContent.json from disk, reason: A JSONObject text must begin with '{' at character 0
When doing the same from a single VM, it works fine.
I assume we have a locking issue here when updating the metastore.json file from different processes.
I was planning to use a FileLocker lock in SimpleMetaStore.java instead of the ReadWriteLock, to get a file-level lock that protects against cross-process access.
The question is whether you have already noticed this issue and whether there is a proposed solution.
Best Regards
Sergio Rozenszajn