core
the foundation of the platform

[home] [documents] [downloads] [resources] [planning] [testing]

History Store Modifications
Summary

The history store current implementation is said to be problem-prone (data corruption) and hard to maintain. There is an ongoing effort to replace it with something simpler. One of the options would be the adoption of a third-party provided storage engine, such as a embedded database or a b-tree engine. Another is to provide a very simple home-made solution with as little (conceptual/memory/execution) overhead as possible.

  • 2004/10/29 - tested a file system based solution (take 1). Every project has its own history area. History for a file will be stored in a relative directory similar to where the file is. The difference is that, in order to prevent problems with path length limitations on Windows, every segment is translated to a short fixed length string (using a hash function). This opens opportunity for colisions, and it is an one-way translation, so we need to remember the original path for every state. We did this by keeping an additional .info file for each file state to maintain that information. Result: adding new state is really fast, but any tree traversals are very costly (we have to open all .info files in every directory we look).

  • 2004/11/02 - tested an improved solution (take 2) based on the previous approach. Instead of keeping an individual .info file showing the original location for every state, we keep a index table per directory. This drastically reduces the number of file openings (not mentioning it takes much less space in the file system), having a better performance for tree based operations. The poor performance for copying history is understandable since we are actually copying the states as well, whereas the original implementation links them (maybe we should do the same). The bad performance when clearing the history for a given resource is due to having to delete multiple files/directories in a more complex directory structure than the original implementation (a two-level tree, directories are never deleted). We could improve that by postponing deleting directories to shutdown (making shutdown slower instead).

  • 2004/11/17 - using some ideas from the previous tries and the existing code, we got a solution (take 3) that seems to have all the characteristics we have been looking for: it is much simpler than the current story, and generally performs (sometimes much) better than the current implementation. We are using the existing BlobStore (as the existing implementation) and per-bucket indexes, as in the previous solutions. These indexes are just dictionaries having the file path as key and a pair <UUID,timestamps> for every state belonging to the file as value.

  • 2004/12/13 - the solution above (known as take 3) has been part of recent integration builds and will be in for M4. We are currently investigating its use as a replacement for the previous property store implementation as well. There are concerns w.r.t. scalability in using a single file to store all properties for all files/folders under a given directory (since properties can be as big as 2K). Some sort of segmentation strategy seems necessary to avoid such problems.

  • 2005/02/02 - Eclipse 3.1 M5 - both history store and property store implementations were moved to a new infrastructure. The existing indexed store has been moved to a separate fragment, only used for supporting backward compatibility with old workspaces. Performance tests are part of the org.eclipse.core.tests.resources project.
Performance Comparisons

Following is a comparison of performance (execution time) for the most common use cases:

Use cases
Original (ms)
Take 3 (ms)
add state
498
481
get history for file
1280
16
find deleted members (100 files, 4 states per file) 914 46
find deleted members (20 files, 20 states per file) 906 15
find deleted members (4 files, 100 states per file)
882
15
clear history of a directory tree (100 files, 4 states per file) 367 42
clear history of a directory tree (20 files, 20 states per file) 332 187
clear history of a directory tree (4 files, 100 states per file)
379
179
copy history for a directory tree (100 files, 4 states per file) 3350 3250
copy history for a directory tree (20 files, 20 states per file) 2210 868
copy history for a directory tree (4 files, 100 states per file) 4720 453
apply cleaning policy
tbd
tbd