Skip to main content

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] [List Home]
Re: [jgit-dev] stable-4.1 fixes


On Apr 10, 2018, at 4:50 PM, Matthias Sohn <matthias.sohn@xxxxxxxxx> wrote:

On Wed, Apr 11, 2018 at 12:45 AM, Nasser Grainawi <nasser@xxxxxxxxxxxxxx> wrote:

On Apr 10, 2018, at 4:25 PM, Matthias Sohn <matthias.sohn@xxxxxxxxx> wrote:

On Tue, Apr 10, 2018 at 10:36 PM, Nasser Grainawi <nasser@codeaurora.org> wrote:

On Mar 29, 2018, at 10:21 AM, Nasser Grainawi <nasser@xxxxxxxxxxxxxx> wrote:


On Mar 24, 2018, at 1:45 PM, Matthias Sohn <matthias.sohn@xxxxxxxxx> wrote:

On Sat, Mar 24, 2018 at 1:32 AM, David Pursehouse <david.pursehouse@gmail.com> wrote:


On Fri, 23 Mar 2018, 23:01 Matthias Sohn, <matthias.sohn@xxxxxxxxx> wrote:
On Fri, Mar 23, 2018 at 9:52 PM, Nasser Grainawi <nasser@xxxxxxxxxxxxxg> wrote:
Hey folks,

I'm doing some work to stabilize the Gerrit 2.12 release series that we're running in several places still internally. As part of that, I found a few JGit commits that I'd like to backport to the stable-4.1 branch.

You can see the fixes as the latest 12 commits on https://source.codeaurora.org/quic/gerrit4msm/jgit/log/?h=stable-4.1-backports

I cherry-picked the patches you want and pushed them for stable-4.1
 
Please review this series and let me know if you really want 4.1.3 or if you can use 4.5.4 like we do

I'll test out 4.5.4 today and let you know mid-next week. Thanks for telling me about that version and that you have used it in prod with 2.12, that really helps.


Sorry for the delay getting back to you. We tried 4.5.4 and it works pretty well. I found one NFS-related bug and uploaded a fix [1]. We're also seeing MissingObjectExceptions that we think are NFS related, but we're not certain. The setup we have for this instance is a master replicating over git:// to git-daemon through a load-balancer to two slaves sharing an NFS filesystem. The slaves are running a custom git.git (has a few NFS error-avoidance patches) for repacking.


did I get it right that the master also accesses git repositories via NFS ?

No, master has its own disk.

 So jgit running on master is victim to NFS issues on a slave when trying to replicate changes to the slave ?

No, sorry. We're only seeing these errors on the slaves. AFAIK, replication is working fine.


Why do you use slaves sharing the same NFS ?

Mostly because of less replication traffic and the slaves share the effort of repacking, but there's added benefits like knowing the slaves have a consistent view (one can't get out of date). With just 2 slaves it's not as big of a deal, but we have other instances with 8-10 slaves in a site.

 
This is what the exceptions look like (jgit code is the same as [1]):
[2018-04-06 10:42:38,524] [SSH git-upload-pack 'XXXXXXX' (YYYYYYY)] ERROR com.google.gerrit.sshd.BaseCommand : Internal server error (user YYYYYY account ZZZZZZ) during git-upload-pack 'XXXXXXX'
org.eclipse.jgit.transport.UploadPackInternalServerErrorException
        at org.eclipse.jgit.transport.UploadPack.service(UploadPack.java:768)
        at org.eclipse.jgit.transport.UploadPack.upload(UploadPack.java:667)
        at com.google.gerrit.sshd.commands.Upload.runImpl(Upload.java:80)
        at com.google.gerrit.sshd.AbstractGitCommand.service(AbstractGitCommand.java:101)
        at com.google.gerrit.sshd.AbstractGitCommand.access$000(AbstractGitCommand.java:32)
        at com.google.gerrit.sshd.AbstractGitCommand$1.run(AbstractGitCommand.java:70)
        at com.google.gerrit.sshd.BaseCommand$TaskThunk.run(BaseCommand.java:437)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
        at com.google.gerrit.server.git.WorkQueue$Task.run(WorkQueue.java:376)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.eclipse.jgit.transport.WantNotValidException: want 56c76057710dd3ca29e66ef6cdc4b727be781384 not valid
        at org.eclipse.jgit.transport.UploadPack.parseWants(UploadPack.java:1197)
        at org.eclipse.jgit.transport.UploadPack.processHaveLines(UploadPack.java:1072)
        at org.eclipse.jgit.transport.UploadPack.negotiate(UploadPack.java:1052)
        at org.eclipse.jgit.transport.UploadPack.service(UploadPack.java:745)
        ... 14 more
Caused by: org.eclipse.jgit.errors.MissingObjectException: Missing unknown 56c76057710dd3ca29e66ef6cdc4b727be781384
        at org.eclipse.jgit.internal.storage.file.WindowCursor.open(WindowCursor.java:158)
        at org.eclipse.jgit.lib.ObjectReader$1.open(ObjectReader.java:300)
        at org.eclipse.jgit.revwalk.RevWalk$2.next(RevWalk.java:971)
        at org.eclipse.jgit.transport.UploadPack.parseWants(UploadPack.java:1184)
        ... 17 more

We're assuming this error happens due to NFS caching directory entries. We're looking at testing that out by changing the mount options (lookupcache, noac, ac*time), but these are prod hosts, so we don't want to hurt the performance too much. Do you have any ideas on how we'd approach solving this in JGit or a different hypothesis on what could be causing this error?


AFAIR I never saw a WantNotValidException

@Chris you did a lot of testing on NFS when we implemented multi-master on shared NFS, maybe you have an idea ?

-Matthias

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, 
a Linux Foundation Collaborative Project

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, 
a Linux Foundation Collaborative Project


Back to the top