[
Date Prev][
Date Next][
Thread Prev][
Thread Next][
Date Index][
Thread Index]
[
List Home]
Re: [egit-dev] [jgit-dev] speeding up re-indexing
|
Hi Matthias,
your change adds the following line, but it is commented?
// e.setLastModified(lastModified);
in line 826 in method "private void keep(DirCacheEntry e, long lastModified)"
2012/6/4 Markus Duft
<markus.duft@xxxxxxxxxx>
On 05/31/2012 10:44 PM, Matthias Sohn wrote:
>
> How about using a profiler to find out what's going on ?
Hey all!
(please note that all the times i'm noting in texts are approximate measurements of hitting the refresh button in the staging view, which also refreshes all projects. so for cold caches this takes a good amount of the time).
good idea :) i jprofile'd egits re-indexing, and found out that most of the time (with hot caches) is spent finding a container for resources:
23.5% - 516 s - 1 inv. org.eclipse.egit.core.internal.indexdiff.IndexDiffCacheEntry$4.run
23.5% - 516 s - 1 inv. org.eclipse.egit.core.internal.indexdiff.IndexDiffCacheEntry.access$4
23.5% - 516 s - 1 inv. org.eclipse.egit.core.internal.indexdiff.IndexDiffCacheEntry.calcIndexDiff
23.4% - 516 s - 1 inv. org.eclipse.jgit.lib.IndexDiff.diff
23.4% - 515 s - 3 inv. org.eclipse.jgit.treewalk.TreeWalk.next
21.7% - 477 s - 12,005 inv. org.eclipse.jgit.treewalk.TreeWalk.enterSubtree
21.6% - 476 s - 23,995 inv. org.eclipse.jgit.treewalk.AbstractTreeIterator.createSubtreeIterator
20.9% - 460 s - 1,873 inv. org.eclipse.egit.core.AdaptableFileTreeIterator.createSubtreeIterator
20.8% - 458 s - 1,873 inv. org.eclipse.egit.core.IteratorService.findContainer
20.8% - 458 s - 1,873 inv. org.eclipse.core.internal.resources.WorkspaceRoot.findContainersForLocationURI
as a really stupid optimization, i tried to change AdaptableFileTreeIterator.createSubtreeIterator() in a way that it only tries to find a container if a .project file exists (i know i', breaking project-less containers in the workspace). This sped up the whole thing _a lot_ (127 vs 516 seconds):
17.2% - 127 s - 1 inv. org.eclipse.egit.core.internal.indexdiff.IndexDiffCacheEntry$4.run
17.2% - 127 s - 1 inv. org.eclipse.egit.core.internal.indexdiff.IndexDiffCacheEntry.access$4
17.2% - 127 s - 1 inv. org.eclipse.egit.core.internal.indexdiff.IndexDiffCacheEntry.calcIndexDiff
17.2% - 127 s - 1 inv. org.eclipse.jgit.lib.IndexDiff.diff
17.1% - 126 s - 3 inv. org.eclipse.jgit.treewalk.TreeWalk.next
13.0% - 96,737 ms - 12,005 inv. org.eclipse.jgit.treewalk.TreeWalk.enterSubtree
12.9% - 95,484 ms - 23,995 inv. org.eclipse.jgit.treewalk.AbstractTreeIterator.createSubtreeIterator
11.3% - 83,680 ms - 1,873 inv. org.eclipse.egit.core.AdaptableFileTreeIterator.createSubtreeIterator
11.1% - 82,487 ms - 696 inv. org.eclipse.egit.core.IteratorService.findContainer
11.1% - 82,374 ms - 696 inv. org.eclipse.core.internal.resources.WorkspaceRoot.findContainersForLocationURI
also note that of course profiling slowed this down a lot. without profiling the time for re-indexing with hot caches changed from ~20 seconds to ~14 seconds.
now for the bad(er) case: cold caches. it turned out that the IndexDiffFilter seems to do a contentCheck() on (nearly) every file - 33826 times. after debugging this, it seems that jgit thinks all timestamps differ.
interesting is, that org.eclipse.jgit.treewalk.WorkingTreeIterator.compareMetadata(DirCacheEntry).cacheLastModified is nearly always zero, /except/ for files that have been modified in the last commit (tip of current branch)(!!!!).
times look more like this then:
26.1% - 396 s - 1 inv. org.eclipse.egit.core.internal.indexdiff.IndexDiffCacheEntry$4.run
26.1% - 396 s - 1 inv. org.eclipse.egit.core.internal.indexdiff.IndexDiffCacheEntry.access$4
26.1% - 396 s - 1 inv. org.eclipse.egit.core.internal.indexdiff.IndexDiffCacheEntry.calcIndexDiff
26.1% - 395 s - 1 inv. org.eclipse.jgit.lib.IndexDiff.diff
26.0% - 394 s - 3 inv. org.eclipse.jgit.treewalk.TreeWalk.next
19.0% - 288 s - 46,620 inv. org.eclipse.jgit.treewalk.filter.AndTreeFilter$List.include
19.0% - 288 s - 46,620 inv. org.eclipse.jgit.treewalk.filter.IndexDiffFilter.include
18.5% - 281 s - 33,828 inv. org.eclipse.jgit.treewalk.WorkingTreeIterator.isModified
18.0% - 273 s - 33,826 inv. org.eclipse.jgit.treewalk.WorkingTreeIterator.contentCheck
18.0% - 272 s - 33,826 inv. org.eclipse.jgit.treewalk.AbstractTreeIterator.getEntryObjectId
17.9% - 271 s - 33,826 inv. org.eclipse.jgit.treewalk.WorkingTreeIterator.idBuffer
17.9% - 271 s - 33,826 inv. org.eclipse.jgit.treewalk.WorkingTreeIterator.idBufferBlob
17.6% - 267 s - 33,826 inv. org.eclipse.jgit.treewalk.WorkingTreeIterator.computeHash
17.0% - 258 s - 543,873 inv. java.io.InputStream.read
i think the bug is in how the dircache is created, maybe? could it be that only mod-times of entries in the current HEAD commit are applied to the (initial) DCEs? seems so. as soon as i stage a file, this also gets correct last modifier timestamps.
i added this:
if (e.getLastModified() != 0) {
System.out.println("mod: " + e.getLastModified() + ": "
+ e.getPathString());
}
to org.eclipse.jgit.dircache.DirCache.writeTo(OutputStream):666, where the DCEs are written to the real index file, and only files modified in the last commit, and newly staged entries are shown.
even more investigation showed that the only one that seems to set the timestamp on index entries during non-merge "normal" checkout (org.eclipse.jgit.dircache.DirCacheCheckout.checkoutEntry(Repository, File, DirCacheEntry, ObjectReader)) is only called for files that changed - this matches my observations. my stupid approach to fix this is to add a setLastModified() when keep() is called (index entry unmodified). this gives me valid timestamps here and everywhere, pushing down the re-indexing time to ~5 seconds with hot caches, and ~108 seconds (vs. ~500 seconds!) with cold caches.
i pushed the fix i have done [1]; it is probably completely wrong, so correct me please :)
[1] https://git.eclipse.org/r/6232
Regards,
Markus
_______________________________________________
egit-dev mailing list
egit-dev@xxxxxxxxxxx
https://dev.eclipse.org/mailman/listinfo/egit-dev