Re: [jgit-dev] RFC: Optimized "single-commit" push

From: Shawn Pearce <spearce@xxxxxxxxxxx>
Date: Sun, 19 Jun 2016 15:18:29 -0700
Delivered-to: jgit-dev@xxxxxxxxxxx

On Fri, Jun 17, 2016 at 8:59 AM, Justin Santa Barbara
<justin@xxxxxxxxxxxx> wrote:
> I am using git as a versioned store in a project I'm working on, with
> a fairly big repo, and using JGit as both the git client (which is
> actually a RESTful server) and the git server (using
> jgit.http.server).  Performance is generally good with frequent server
> GCs and sufficient memory, but a push will sometimes take a long time
> in the "counting objects" phase (30 seconds or more).
>
> Because of my use-case though, the problem is constrained: I am
> pushing a single commit to the remote server, and there are only a
> handful of changed files (typically one).  I created an experimental
> patch that detects this case and optimizes it by directly comparing
> the new commit's tree to the base commit's tree:
>
> https://github.com/justinsb/jgit/commit/9db165e88d162c7f052f6c58784c16d4cd830b3e

Huh. Interesting approach.

Thing is, the PackWriter should already be doing this if you passed it
wants=[commit], have=[commit.getParent(0)]. I suspect it's getting long
counting times because there are other things in the have collection
from the server, and enumerating those is costing more time.

> There are limitations however, which is why I gated it behind a
> boolean option.  The biggest is that if the new files are already
> available on the server on a different branch, we won't reuse them
> (e.g. cherry-picks).
>
> A few questions I would love some feedback on:
>
> 1. Is this something that might be considered for inclusion into jgit?
> 2. Should I instead figure out a way to expose the
> PackWriter.preparePack(Iterator<RevObject>) method, perhaps by passing
> a list containing the known set of objects when doing the push? I
> imagine that would be more general and thus more welcome in jgit
> (though obviously harder to use!)

If we do any of these things, I'd prefer the
preparePack(Iterator<RevObject>) option as it offers more flexibility
to callers to construct a pack the way they want.

But see my comment above. I really think something is wrong here, as
the algorithm you implemented is what PackWriter should be doing
itself for the single have/want case.
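
To make the single have/want case concrete, here is a minimal sketch in
plain Java. It is a toy model under stated assumptions, not JGit code and
not the patch linked above: Node, blob, tree and objectsToSend are
hypothetical names, and string ids stand in for object SHA-1s. It walks the
new commit's tree against the base commit's tree and prunes any subtree
whose id is unchanged, so only the changed path is enumerated.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: hypothetical types, not JGit's object model.
final class SingleCommitDiff {

    // A blob has children == null; a tree maps entry names to nodes.
    static final class Node {
        final String id;                      // stands in for the object's SHA-1
        final TreeMap<String, Node> children; // sorted by name for a stable walk order

        Node(String id, Map<String, Node> children) {
            this.id = id;
            this.children = children == null ? null : new TreeMap<>(children);
        }
    }

    static Node blob(String id) {
        return new Node(id, null);
    }

    static Node tree(String id, Map<String, Node> children) {
        return new Node(id, children);
    }

    // Ids reachable from 'next' that are not found at the same path in 'base'.
    static List<String> objectsToSend(Node base, Node next) {
        List<String> out = new ArrayList<>();
        collect(base, next, out);
        return out;
    }

    private static void collect(Node base, Node next, List<String> out) {
        if (base != null && base.id.equals(next.id)) {
            return; // same id: the receiver already has this object and everything below it
        }
        out.add(next.id);
        if (next.children == null) {
            return; // blob: nothing below it
        }
        for (Map.Entry<String, Node> e : next.children.entrySet()) {
            // Look up the entry with the same name in the base tree, if there is one.
            Node old = (base == null || base.children == null)
                    ? null
                    : base.children.get(e.getKey());
            collect(old, e.getValue(), out);
        }
    }
}
```

For a base tree and a new tree that differ in one blob under dir/, the
result is the new root tree, the new dir tree and the new blob; every
unchanged entry is skipped without being opened. Because the comparison is
by path only, an object that already exists elsewhere (another path or
another branch) is still sent, which is the limitation mentioned in the
quoted text.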


> 3. Am I doing something obviously wrong to cause a slow 'counting
> objects' phase (I expect it is just the repo size - it is currently
> about 250k objects)
>
> Many thanks,
> Justin
