Java Output Folder

Last revised 17:30 Tuesday October 30, 2001 (recent changes in blue; latest in red)

Work item: "Support for dealing with class files generated by external Java compilers like javac and jikes from an Ant script."

Here's the crux of one problem (from WSAD, via John W.):

In some environments the client has limited flexibility in how they structure their Java projects. Sources must go here; resource files here; mixed resource and class files here; etc.

When clients attempt either, they discover that (a) the Java model and builder ignore any class files in the output folder, and (b) from time to time these files in the output folder get deleted without warning.

The Java builder was designed under the assumption that it "owns" the output folder. The work item, therefore, is to change the Java builder to give clients and users more flexibility as to where they place their source, resource, library class, and generated class files.

Current Functionality

Eclipse 1.0 Java builder has the following characteristics (and inconsistencies):

WSAD usecase - for the record

The (proposed) WSAD scenario is that they have a src/ folder for source code, a classes/ folder for pre-existing class files (extracted from a WAR file), and a bin/ folder for generated classes.

Proposal

The Java builder compiles source files found in the source folders specified on the build classpath and generates class files into the output folder. The Java builder also copies "resource" files from source folders to the output folder (provided that source and output do not coincide). Once in the output folder, the resource files are available at runtime because the output folder is always present on the runtime class path. The proposal is to extend this mechanism.

The following proposal involves:

Output folder ownership

When the output folder does not coincide with a source folder, the Java builder owns the output folder and everything in it. The output folder is taken to contain only files that are "expendable" - either generated class files or copies of files that live in a source or library folder.

Users or clients that add, remove, or replace files in the output folder can expect unpredictable results. If the user or client does tamper with files in the output folder, the Java builder does not attempt to repair the damage. It is the responsibility of the user or client to clean up their mess (by manually requesting a full build).

When the output folder coincides with a source folder, the Java builder only owns the class files in the output folder. Only the class files in the output folder are considered expendable. Users or clients that add, remove, or replace class files in the output folder can expect unpredictable results.

(N.B. This is a restatement of the current behavior. [Verify that damage to output folder is not triggering builds.])

Output folder resource file consolidation

The Java builder provides resource file consolidation, for resource files stored in source folders.

When the output folder does not coincide with a source folder, the Java builder can also be used to consolidate resource files needed at runtime in the output folder. In some cases, this consolidation may be preferred over the alternative of including additional runtime classpath entries for source folders containing resource files.

By flagging a source folder as copied, all non-source, non-class files become eligible to be copied to the output folder. When there are multiple entries on the build classpath specifying copying, eligible files for earlier classpath entries take precedence over ones for later entries.
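The precedence rule can be illustrated with a minimal sketch. The class `ResourceConsolidation` and its methods are hypothetical names, not JDT API. Each source folder flagged as copied is modeled as a map from relative path to file contents, and the folders are passed in build classpath order.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model, not JDT API: each source folder flagged as copied is a
// map from relative path to file contents, given in build classpath order.
class ResourceConsolidation {

    // Only non-source, non-class files are eligible for copying.
    static boolean isEligible(String path) {
        return !path.endsWith(".java") && !path.endsWith(".class");
    }

    // Returns the resources that end up in the output folder. When the same
    // path occurs in several folders, the earliest classpath entry wins.
    static Map<String, String> consolidate(List<Map<String, String>> copiedFolders) {
        Map<String, String> output = new LinkedHashMap<>();
        for (Map<String, String> folder : copiedFolders) {
            for (Map.Entry<String, String> file : folder.entrySet()) {
                if (isEligible(file.getKey())) {
                    output.putIfAbsent(file.getKey(), file.getValue());
                }
            }
        }
        return output;
    }
}
```

Using `putIfAbsent` while walking the folders in classpath order is one simple way to express "earlier entries take precedence"; a real builder would work on workspace resources rather than maps.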

When the output folder coincides with a source folder, the Java builder cannot perform any resource file consolidation (resource files in the output folder belong to the user, not to the Java builder). It is considered an error to specify copying from other source folders.

(N.B. This is different from current behavior in a couple of regards:

)

Output folder class file consolidation

The Java builder also provides class file consolidation, for class files stored in library folders.

The Java builder can also be used to consolidate class files in the output folder, regardless of whether the output folder coincides with a source folder. In some cases, this consolidation may be preferred over the alternative of including additional runtime classpath entries for library folders. Note, however, that this works only when the library folder contains no important resource files needed at runtime (resource files are not copied from library folders, because resource files in the output folder belong to the user rather than to the Java builder).

By flagging a library folder as copied, all class files become eligible to be copied to the output folder. Class files generated in the output folder always take precedence over class files copied from library folders.

(N.B. This is new behavior. Files are not copied from library folders by the current Java builder.)

Semantics

Summary: Resource files (non-source, non-class) may be copied from source folders. Class files may be copied from library folders, but never override generated files. Source files never get copied.
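The copying rules in this summary can be written down as a small predicate. This is only a sketch; `CopyRules`, `FolderKind`, and `FileKind` are hypothetical names introduced for illustration, and files are classified purely by extension.

```java
// Hypothetical names for illustration; none of these types exist in JDT.
class CopyRules {

    enum FolderKind { SOURCE, LIBRARY }

    enum FileKind { SOURCE_FILE, CLASS_FILE, RESOURCE_FILE }

    // Classify a file by its extension.
    static FileKind kindOf(String fileName) {
        if (fileName.endsWith(".java")) return FileKind.SOURCE_FILE;
        if (fileName.endsWith(".class")) return FileKind.CLASS_FILE;
        return FileKind.RESOURCE_FILE;
    }

    // True if a file in a folder flagged as copied may be copied to the
    // output folder.
    static boolean mayCopy(FolderKind folder, String fileName) {
        switch (kindOf(fileName)) {
            case RESOURCE_FILE: return folder == FolderKind.SOURCE;
            case CLASS_FILE:    return folder == FolderKind.LIBRARY;
            default:            return false; // source files never get copied
        }
    }
}
```

The predicate says nothing about precedence: generated class files still win over copied ones, which is a property of the build order rather than of eligibility.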

Output folder invariant:

A full build must achieve the output folder invariant from arbitrary initial conditions. When output and source folders do not coincide, a full build should scrub all existing files from the output folder, regardless of how they got there. When output and source folders do coincide, a full build should scrub all existing class files from the output folder, but leave all other files alone.

Assuming that a user or client is only adding, removing, or changing files in source or library folders, but not tampering with any of the files in the output folder that the Java builder owns, then an incremental build should re-achieve the output folder invariant.

Algorithm:

Full build:
    Scrub all class files from the output folder.
    if performing resource consolidation (requires output folder != source folder)
        Scrub all resource files from the output folder.
    Compile all source files into class files in the output folder.
    Infill/copy eligible class files from library folders into the output folder (no overwriting).
    if performing resource consolidation (requires output folder != source folder)
        Infill/copy eligible resource files from source folders into the output folder.
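
The full build steps can be simulated in memory as a rough sketch. Everything here is an assumption made for illustration: the class `FullBuildSketch`, the representation of the output folder as a map from path to an origin tag, and the `compilerOutput` map that stands in for the compiler.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// In-memory sketch of the full build steps; all names are hypothetical.
// The output folder is a map from relative path to a tag naming its origin.
class FullBuildSketch {

    static Map<String, String> fullBuild(
            Map<String, String> output,               // existing output folder contents
            Map<String, List<String>> compilerOutput, // source path -> class files it compiles to
            List<Set<String>> libraryClassFiles,      // copied library folders, classpath order
            List<Set<String>> sourceResources,        // copied source folders, classpath order
            boolean consolidateResources) {           // requires output folder != source folder
        Map<String, String> result = new TreeMap<>(output);
        // Scrub all class files; scrub resource files only when consolidating.
        result.keySet().removeIf(p -> p.endsWith(".class") || consolidateResources);
        // Compile all source files into class files in the output folder.
        for (Map.Entry<String, List<String>> e : compilerOutput.entrySet()) {
            for (String classFile : e.getValue()) {
                result.put(classFile, "compiled:" + e.getKey());
            }
        }
        // Infill class files from library folders without overwriting.
        for (int i = 0; i < libraryClassFiles.size(); i++) {
            for (String classFile : libraryClassFiles.get(i)) {
                result.putIfAbsent(classFile, "library:" + i);
            }
        }
        // Infill resource files from source folders, earlier entries first.
        if (consolidateResources) {
            for (int i = 0; i < sourceResources.size(); i++) {
                for (String resource : sourceResources.get(i)) {
                    result.putIfAbsent(resource, "source:" + i);
                }
            }
        }
        return result;
    }
}
```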

Incremental build:
    (phase 1) process changes to library folders:
        for add or remove or change file p/x.class in one of the library folders
            if p/x.class in the output folder was not generated by compiler then
                scrub p/x.class from the output folder
                remember to compile source files that depend on p/x
                remember to infill p/x.class
    (phase 2) process changes to source folders:
        for add p/y.java in one of the source folders
            remember to compile source file at path p/y.java
        for remove or change p/y.java in one of the source folders
            scrub any class file p/x.class from the output folder that some p/y.java compiled into last time
            remember to infill p/x.class
            remember to compile source file at path p/y.java
        for add or remove or change resource p/x.other in one of the source folders
            if performing resource consolidation (requires output folder != source folder)
                scrub p/x.other from the output folder
                remember to infill p/x.other
    (phase 3) recompile:
        compile all remembered source files into the output folder (and any dependent source files)
    (phase 4) infill:
        for each hole p/x.class to infill
            copy first-found file p/x.class in a library folder to p/x.class in the output folder (no overwriting)
        if performing resource consolidation (requires output folder != source folder)
            for each hole p/x.other to infill
                copy first-found file p/x.other in a source folder to p/x.other in the output folder
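
The scrub-then-infill pattern of the incremental build can be sketched for the simplest case, the removal of a single source file. The class `IncrementalInfillSketch` and its data structures are hypothetical; recompilation of dependent source files (phase 3) is deliberately left out.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch of the scrub-then-infill pattern of the incremental build, limited
// to removal of one source file. All names are hypothetical.
class IncrementalInfillSketch {

    // output: class file path -> origin tag.
    // generatedBy: source path -> class files it compiled into last time.
    // libraries: copied library folders in classpath order.
    static void sourceRemoved(String sourcePath,
                              Map<String, String> output,
                              Map<String, List<String>> generatedBy,
                              List<Set<String>> libraries) {
        // Phase 2: scrub the class files the source compiled into and
        // remember the holes.
        Set<String> holes = new LinkedHashSet<>();
        List<String> generated = generatedBy.remove(sourcePath);
        if (generated != null) {
            for (String classFile : generated) {
                output.remove(classFile);
                holes.add(classFile);
            }
        }
        // Phase 3 (recompile) is omitted: the removed file has nothing to
        // compile, and dependents are ignored in this sketch.
        // Phase 4: fill each hole from the first library folder that has it.
        for (String hole : holes) {
            for (int i = 0; i < libraries.size(); i++) {
                if (libraries.get(i).contains(hole)) {
                    output.putIfAbsent(hole, "library:" + i);
                    break;
                }
            }
        }
    }
}
```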

How These Changes Solve WSAD's Problem

WSAD would include their classes/ folder on the build classpath as a library folder with class file copying turned on. Doing so means that the pre-compiled class files in the library are available to build against, and will be used whenever there is no corresponding source code in a source folder. By turning class file copying on for that library folder (programmatically - there is no UI), the class files in the library folder are automatically consolidated with the generated class files.

Resource files can always be kept in the same folder as the source files. When the source and output folders do not coincide, the source folder on the classpath could have copying turned on to ensure that resource files were copied to the output folder. When the source and output folders do coincide, further resource file consolidation is not required (or possible) and the source folder on the classpath would have copying turned off. The resource files that normally live in the source folder would automatically be included in the output folder (without copying).

Minimizing Class Files

(This problem is not really an output folder issue.)

WSAD has a special problem. They have class files in a classes/ folder which they obtain from unzipping a WAR file. They have a folder of source code; some of the source code may be brand new; some of the source code may correspond to class files in the classes/ folder. They need to prune from the classes/ directory those class files for which corresponding source is available. This allows them to save only those class files which they actually need.

The heart of this operation is identifying the class files which could have come from a given source file. A source file can be lightly parsed to obtain fully qualified names for all top-level types declared within; e.g., a source file com/example/acme/app/Foo.java might contain types named com.example.acme.app.Foo and com.example.acme.app.FooHelper. Such type names map directly to corresponding class file name patterns; e.g., com.example.acme.app.FooHelper would compile to com/example/acme/app/FooHelper.class and possibly other class files matching com/example/acme/app/FooHelper$*.class.
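The name mapping described above is mechanical, as the following sketch suggests. `ClassFileNames` is a hypothetical helper, not part of JDT; it takes the fully qualified names as given and does not parse source files.

```java
// Hypothetical helper, not JDT API: maps a fully qualified top-level type
// name to the class file names a compiler could produce for it.
class ClassFileNames {

    // com.example.acme.app.FooHelper -> com/example/acme/app/FooHelper.class
    static String primaryClassFile(String qualifiedTypeName) {
        return qualifiedTypeName.replace('.', '/') + ".class";
    }

    // True if classFilePath is the primary class file of the type or one of
    // its nested or anonymous classes (FooHelper$*.class).
    static boolean belongsTo(String qualifiedTypeName, String classFilePath) {
        String base = qualifiedTypeName.replace('.', '/');
        return classFilePath.equals(base + ".class")
            || (classFilePath.startsWith(base + "$") && classFilePath.endsWith(".class"));
    }
}
```

One caveat of matching on the `$` prefix: a top-level type whose own name contains `$` could be mistaken for a nested class of another type, so a pruning tool built on this would want to check the primary class files of all known types first.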

This basic operation can be implemented with the existing JDOM API (or the proposed AST API): simply open the compilation unit and read off the names from the package declaration and top-level type declarations.

Given this basic operation, it is straightforward to walk any set of source files and use it to prune a given set of class files. Source files in some folder in the workspace can be monitored with a resource change listener. It is trivial to delete corresponding class files incrementally as new source files are added.

Conclusion: New API is not required.

Notes Leftover from Earlier Proposals

The following notes are retained as background material. They include some of the other approaches we tried, and problems we ran into.

The Java model has 2 primitive kinds of inputs: Java source files, and Java library class files. The Java builder produces one primary output: generated Java class files. Each Java project has a build classpath listing what kinds of inputs it has and where they can be found, and a designated output folder where generated class files are to be placed. The runtime classpath is computed from the build classpath by substituting the output folder in place of the source folders.
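The relationship between the two classpaths can be sketched as follows. `RuntimeClasspath` and its nested `Entry` type are hypothetical and are not the JDT classpath entry API. The sketch also assumes that the output folder appears only once on the runtime classpath, at the position of the first source folder; the text above does not specify this.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a build classpath; not the JDT classpath entry API.
class RuntimeClasspath {

    static final class Entry {
        final String path;
        final boolean isSource;

        Entry(String path, boolean isSource) {
            this.path = path;
            this.isSource = isSource;
        }
    }

    // Replaces source folders with the output folder. Assumption: the output
    // folder appears once, at the position of the first source folder.
    static List<String> compute(List<Entry> buildClasspath, String outputFolder) {
        List<String> runtime = new ArrayList<>();
        for (Entry entry : buildClasspath) {
            String path = entry.isSource ? outputFolder : entry.path;
            if (!runtime.contains(path)) {
                runtime.add(path);
            }
        }
        return runtime;
    }
}
```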

Java "resource" files, defined to be files other than Java sources and class files, are of no particular interest to the Java model for compiling purposes. However, these resource files are very important to the user, and to the program when it runs. Resource files are routinely co-located with library class files. But it is also convenient for the user if resource files can be either co-located with source code, or segregated in a separate folder.

Ideally, the Java model should not introduce constraints on where inputs and outputs are located. This would give clients and users maximum flexibility with where they locate their files.

The proposal here has 4 separate parts. Taken in conjunction they remove the current constraints that make it difficult for some clients to place their files where they need to be.

[Revised proposal: Rather than write a completely new proposal, I've added a note like this to the end of each subsequent section describing a revised proposal.]

Java Builder Attitude Adjustment

To appreciate the difficulties inherent with the Java builder sharing its output folder with other folk, consider the following workspace containing a Java project. Assume that this project has not been built in quite a while, and the user has been manually inserting and deleting class files in the project's output folder.

Java project p1/
    src/com/example/  (source folder on build classpath)
        Bar.java
        Foo.java
        Quux.java
    bin/com/example/ (output folder)
        Bar.class {SourceFile="Bar.java"}
        Foo.class {SourceFile="Foo.java"}
        Foo$1.class {SourceFile="Foo.java"}
        Internal.class {SourceFile="Foo.java"}
        Main.class {SourceFile="Main.java"}

From this arrangement of files (and looking at the SourceFile attribute embedded in class files), we can infer that:

Java Builder - Obsolete Class File Deletion

If the user were to request a full build of this project, how would the Java builder proceed? Before it compiles any source files, it begins by deleting existing class files that correspond to source files it is about to recompile. Why? Because obsolete class files left around (a) waste storage and (b) would be available at runtime where they could cause the program to run incorrectly.

In this situation, the Java builder deletes the class files corresponding to Bar.java (i.e., Bar.class), to Foo.java (i.e., Foo.class, Foo$1.class, and Internal.class), and to Quux.java (none, in this case). The remaining class file (Main.class) must be retained because it is irreplaceable.
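The selection of class files to delete in this example can be sketched as below. Reading the SourceFile attribute out of a real class file is outside the scope of the sketch, so the attribute values are passed in as a map; `ObsoleteClassFiles` is a hypothetical name, and all files are assumed to belong to the same package.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Sketch of selecting class files to delete before a full build, based on
// the SourceFile attribute. Names are hypothetical; one package is assumed.
class ObsoleteClassFiles {

    // sourceFileOf: class file name -> value of its SourceFile attribute.
    // sources: names of the source files about to be recompiled.
    static Set<String> toDelete(Map<String, String> sourceFileOf, Set<String> sources) {
        Set<String> doomed = new TreeSet<>();
        for (Map.Entry<String, String> e : sourceFileOf.entrySet()) {
            if (sources.contains(e.getValue())) {
                doomed.add(e.getKey());
            }
        }
        return doomed;
    }
}
```

Applied to the workspace above, the sources Bar.java, Foo.java, and Quux.java select Bar.class, Foo.class, Foo$1.class, and Internal.class, and leave Main.class alone.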

The Java builder takes responsibility for deleting obsolete class files in order to support automated incremental recompilation of entire folders of source files. Note that standard Java compilers like javac never ever delete class files; they simply write (or overwrite) class files to the output folder for the source files that they are given to compile. Standard Java compilers do not support incremental recompilation: the user is responsible for deleting any obsolete class files that they bring about.

If the Java builder is free to assume that all class files in the output folder are ones that correspond to source files, then it can simply delete all class files in the output folder at the start of a full build. If it cannot assume this, the builder is forced to look at class files in the output folder to determine whether it has source code for them. This is clearly more expensive than not having to do so. By declaring that it "owns" the output folder, the current builder is able to make this simplifying assumption. Allowing users and clients to place additional class files in the output folder requires throwing out this assumption.

If the user or client is free to manipulate class files in the output folder without the Java builder's involvement, then the builder cannot perform full or incremental builds without looking at and deleting the obsolete class files from the output folder corresponding to source files being compiled.

Under the proposed change, the Java builder would need to look at the class files in the output folder to determine whether it should delete them. The only files in the output folder that the Java builder would be entitled to overwrite or delete are class files which the Java builder would reasonably generate, or did generate, while compiling that project.

This change is not a breaking API change. The old spec said that the Java model/builder owned the output folder, but didn't further specify what all that entailed. The new spec will modify this position to allow clients to store files in the output folder; it will promise that these files are perfectly safe unless they are in the Java builder's direct line of fire.

Java Model - Obsolete Class File Deletion

There is another facet of the obsolete class file problem that the Java builder is not in a position to help with.

If the source file Foo.java were to be deleted, its three class files become obsolete and need to be deleted immediately. Why immediately? Consider what happens if the class files are not deleted immediately. If the user requests a full build, the Java builder is presented with the following workspace:

Java project p1/
    src/com/example/  (source folder on build classpath)
        Bar.java
        Quux.java
    bin/com/example/ (output folder)
        Bar.class {SourceFile="Bar.java"}
        Foo.class {SourceFile="Foo.java"}
        Foo$1.class {SourceFile="Foo.java"}
        Internal.class {SourceFile="Foo.java"}
        Main.class {SourceFile="Main.java"}

Since a full build is requested, the Java builder is not passed a resource delta tree for the project. This means that the Java builder has no way of knowing that Foo.java was just deleted. The Java builder has no choice but to retain the three class files Foo.class, Foo$1.class, and Internal.class, just as it retains Main.class. This too is a consequence of allowing the Java builder to share the output folder with the user's class files.

If the obsolete class files are not deleted in response to the deletion of a source file, these class files will linger around. The Java builder will be unable to get rid of them.

The proposal is to have the Java model monitor source file deletions on an ongoing basis and identify and delete any corresponding obsolete class files in the output folder. This clean up activity must handle the case of source files that disappear while the Java Core plug-in is not activated (this entails registering a Core save participant).

Since deleting (including renaming and moving) a source file is a relatively uncommon thing for a developer to do, the implementation should bet it does not have to do this very often. When a source file is deleted, its package name gives us exactly which subfolder of the output folder might contain corresponding class files that might now be obsolete. In the worst case, the implementation would need to access all class files in that subfolder to determine whether any of them have become obsolete. In cases where there is more than one source folder on the build classpath, and there is therefore the possibility of one source file hiding another by the same name, it is necessary to consult the build classpath to see whether the deleted source file was exposed or buried.

Implementation Tricks

Some observations and implementation tricks that should help reduce the space and time impact of doing this.

When all else fails

A special concern is that the user must be able to recover from crashes or other problems that result in obsolete class files being left behind in the output folder. It can be very bad when this kind of thing happens (and it does happen, despite our best efforts), and can undercut the user's confidence in the Java compiler and IDE. In a large output folder that contains important user files, the user can't just delete the output folder and do a full build. The user has no easy way to distinguish class files with corresponding source from ones without. A simple way to address this need would be to have a command (somewhere in the UI) that would delete all class files in the output folder for which source code is available ("Delete Generated Class Files"). This would at least give the user some help in recovering from these minor disasters.

[Revised proposal: The Java builder remembers the names of the class files it has generated. On full builds, it cleans out all class files that it has on record as having generated; all other class files are left in place. On incremental builds, it selectively cleans out the class files that it has on record as having generated corresponding to the source files that it is going to recompile. There is no need to monitor source file deletions: corresponding generated class files will be deleted on the next full build (because it nukes them all) or next incremental build (because it sees the source file deletion in the delta). The Java builder never looks at class files for their SourceFile attributes. A full build always deletes generated class files, so there's no need for a special UI action.]
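
The bookkeeping behind the revised proposal might look roughly like this. `GeneratedClassRecord` is a hypothetical class; how the real builder persists its record between sessions is not described here and is not modeled.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch of the revised proposal: the builder keeps a record of the class
// files it generated per source file. Names are hypothetical.
class GeneratedClassRecord {

    private final Map<String, Set<String>> generatedBy = new HashMap<>();

    // Called after compiling a source file.
    void record(String sourcePath, Set<String> classFiles) {
        generatedBy.put(sourcePath, new HashSet<>(classFiles));
    }

    // Full build: remove every recorded class file from the output folder.
    void cleanAll(Set<String> outputFolder) {
        for (Set<String> classFiles : generatedBy.values()) {
            outputFolder.removeAll(classFiles);
        }
        generatedBy.clear();
    }

    // Incremental build: remove only the class files recorded for one
    // changed or deleted source file.
    void clean(String sourcePath, Set<String> outputFolder) {
        Set<String> classFiles = generatedBy.remove(sourcePath);
        if (classFiles != null) {
            outputFolder.removeAll(classFiles);
        }
    }
}
```

Class files that were never recorded, such as Main.class in the earlier example, are untouched by either operation.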

Allowing Folders to Play Multiple Roles

The proposed change is to consistently allow the same folder to be used in multiple ways on the same build classpath.

This change is not a breaking change; it would simply allow some classpath configurations that are currently disallowed to be considered legitimate. The API would not need to change.

[Revised proposal: Many parts of the Java model assume that library folders are relatively quiet. Allowing a library folder to coincide with the output folder would invalidate this assumption, which would tend to degrade performance. For instance, the indexer indexes libraries and source folders, but completely ignores the output folder. If the output folder was also a library, it would repeatedly extract indexes for class files generated by the builder.

N.B. This means that the original scenario of library class files in the output folder cannot be done this way. It will need to be addressed in some other way (discussed later on).

The identity criteria for package fragment root handles are based on resources/paths and do not take kind (source vs. binary) into account. This means that a source folder and a library folder at the same path map to the same package fragment root handle! Thus allowing a source folder to coincide with a library folder cannot be supported without revising Java element identity criteria (which is due for an overhaul, but that's a different, and bigger, work item).


]

Completely eliminate resource file copying behavior

The current Java builder copies "resource" files from source folders to the output folder (provided that source and output do not coincide). Once in the output folder, the resource files are available at runtime because the output folder is always present on the runtime class path.

This copying is problematic:

The proposal is to eliminate this copying behavior. The proper way to handle this is to include an additional library entry on the build classpath for any source folders that contain resources. Since library entries are also included on the runtime classpath, the resource files contained therein will be available at runtime.

We would beef up the API specification to explain how the build classpath and the runtime classpath are related, and suggest that one deals with resource files in source folders using library entries. This would be a breaking change for clients or users that rely on the current resource file copying behavior.

The clients that would be most affected are ones that co-locate their resource files with their source files in a folder separate from their output folder. This is a fairly large base of customers that would need to add an additional library entry for their source folder.

It would be simple to write a plug-in that detected and fixed up the Java projects in the workspace as required. By the same token, the same mechanism could be built in to the Java UI. If the user introduces a resource file into a source folder that had none and there is no library entry for that folder on the build classpath, ask the user whether they intend this resource file to be available at runtime.

(JW believes that WSAD will be able to roll with this punch.)

[Revised proposal: Retain copying from source to output folder where necessary.

This eliminates the screw case where resources get copied from one source folder into another source folder, possibly overwriting client data.]

Minimize the opportunity for obsolete class files to have bad effects

The Java compiler should minimize the opportunity for obsolete class files to have bad effects.

Consider the following workspace:

Java project p1/
    src/com/example/  (source folder on build classpath)
        C1.java {package com.example; public class C1 {}}
        C2.java {package com.example; public class C2 extends Secondary {}}
    lib/com/example/ (library folder on build classpath)
        C1.class {from compiling an old version of C1.java
           that read package com.example; public class C1 {}; class Secondary {}}
        C2.class {from compiling an old but unchanged version of C2.java}
        Secondary.class {from compiling the old version of C1.java}
        Quux.class {from compiling Quux.java}

Assume the source folder precedes the library folder on the build classpath (sources should always precede libraries).

When the compiler is compiling both C1.java and C2.java, it should not satisfy the reference to the class com.example.Secondary using the existing Secondary.class because the SourceFile attribute shows that Secondary.class is clearly an output from compiling C1.java, not an input. In general, the compiler should ignore library class files that correspond to source files which are in the process of being recompiled. (In this case, only Quux.class is available to satisfy references.) The Java builder does not do this.

Arguably, the current behavior should be considered a bug. (javac 1.4 (beta) has this bug too.) Fixing this bug should not be a breaking change.

When the SourceFile attribute is not present in a class file, there is no choice but to use the class file.

[Revised proposal: Maintain current behavior.]

Library Copying Proposal

The proposal is to arrange to copy class files from a certain library folder into the output folder. The library folder would have to be represented by a library classpath entry so that the compiler can find any class files it needs to compile source files. Copying the class files to the output folder would unite them with the class files generated by the compiler. Since there may be source code in the source folder corresponding to some of the classes in the library folder, the builder should only use a library class file when no corresponding source is available.

Desired semantics:

S (source folder)
L (library folder)
O (output folder)

Invariant:

x.class in O =
    if some y.java in S generates x.class then
        x.class from compiling y.java in S
    else
        if x.class in L then
            x.class in L
        else
            none
        endif
    endif

Full builds achieve invariant.
Incremental builds maintain invariant.
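
The invariant can be restated as a small function, which may make it easier to check a given output folder against it. `OutputInvariant` is a hypothetical name, and the `generatedFrom` map stands in for knowledge of which source file in S generates which class file.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Sketch of the invariant as a function; names are hypothetical.
class OutputInvariant {

    // generatedFrom: class file name -> source file in S that generates it.
    // library: class file names present in L.
    // Returns where x.class in O must come from, or empty if it must be absent.
    static Optional<String> expectedOrigin(String classFile,
                                           Map<String, String> generatedFrom,
                                           Set<String> library) {
        String source = generatedFrom.get(classFile);
        if (source != null) {
            return Optional.of("compiled from " + source + " in S");
        }
        if (library.contains(classFile)) {
            return Optional.of("copied from L");
        }
        return Optional.empty();
    }
}
```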

Full build:
    Scrub all class files from O.
    Compile all source files in S into class files in O.
    Infill/copy all class files from L to O (no overwriting).

Incremental build:
    (phase 1) process all changes to L:
        for delete or change x.class in L
            if x.class in O was not generated by compiler then scrub x.class from O
        for add or change x.class to L
            remember to infill x.class
    (phase 2) process negative changes to S:
        for delete or change y.java from S
            scrub any class file x.class from O that y.java compiled into
            remember to infill x.class
    (phase 3) process positive changes to S:
        for add or change y.java from S
            compile y.java into O
    (phase 4) Infill/copy indicated class files from L to O (no overwriting).

We will look at ways to implement the above behavior that do not involve changing the Java builder. This would mean that a customer (such as WSAD) that requires library copying would be able to add it themselves; otherwise, we will need to complicate the Java builder (which is complex enough as it is) and integrate the mechanism into JDT Core.

Copying pre-builder

Could the copying of class files from the library folder L to the output folder O be accomplished in a separate incremental project builder that would run before the Java builder?

Assume the Java builder manages its own class files in the output folder and knows nothing of the pre-builder. Conversely, assume that the pre-builder has no access to the insides of the Java builder.

Pre-copying of class files to the output folder cannot handle the case where a source file gets deleted and a pre-existing class file in the library folder should now take its place. The Java builder, which runs last, deletes the class file; the pre-builder has missed its chance and does not get an opportunity to fill that hole. When this happens on a full build, the full build does not achieve the invariant. This is unacceptable.

Here's the nasty case:
    S (source folder): Bar.java (but recently had Foo.java as well)
    L (library folder): Foo.class

On a full build
    Pre-builder runs first:
        Scrubs Foo.class and Bar.class from O.
        Copies in Foo.class from L to O.
    Java Builder runs second:
        Scrubs Foo.class from O (generated by Java builder from Foo.java on last build).
        Compiles Bar.java into Bar.class in O (Foo.java is no longer around).

The output folder should contain a copy of Foo.class from L since there is no equivalent source file that compiles to Foo.class. It doesn't.
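
The nasty case can be replayed with file names only. This is a simulation under simplifying assumptions, not a model of the real builders: `PreBuilderCase` is a hypothetical class, and each folder is just a set of class file names.

```java
import java.util.Set;
import java.util.TreeSet;

// Simulation of the nasty case with file names only; names are hypothetical.
class PreBuilderCase {

    static Set<String> fullBuild(Set<String> output,
                                 Set<String> library,
                                 Set<String> previouslyGenerated,
                                 Set<String> compiledNow) {
        Set<String> o = new TreeSet<>(output);
        // Pre-builder: scrub all class files, then copy in the library class files.
        o.clear();
        o.addAll(library);
        // Java builder: scrub what it generated on the last build, then compile.
        o.removeAll(previouslyGenerated);
        o.addAll(compiledNow);
        return o;
    }
}
```

With O = {Foo.class, Bar.class}, L = {Foo.class}, both class files on record as generated last time, and only Bar.java left to compile, the result contains Bar.class but not Foo.class, which is the violation described above.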

Copying post-builder

Could the copying of class files from the library folder to the output folder be accomplished in a separate incremental project builder that would run after the Java builder?

Again, assume the Java builder manages its own class files in the output folder and knows nothing of the post-builder, and conversely.

Post-copying of class files to the output folder (no overwriting) cannot handle the case where library class files are changed or deleted since the last build, because the post-builder is never in a position to delete or overwrite class files in the output folder (they might have been generated by the Java builder). Once lost, the invariant cannot be reachieved no matter how many full builds you do (you're stuck with stale or obsolete class files). This is unacceptable.

Combination of pre- and post-builder

Could the copying of class files from the library folder to the output folder be accomplished by a pair of separate incremental project builders that run on either side of the Java builder?

Assume the Java builder manages its own class files in the output folder and knows nothing of the pre-builder and post-builder, and the pre- and post-builders have no access to the insides of the Java builder.

Full build:
    Pre-builder runs first:
        Scrubs all class files from O.
    Java Builder runs second:
        Scrubs all class files from O generated by Java builder.
        Compiles all source files into O.
    Post-builder runs third:
        Infill/copy class files from L to O (no overwriting).

Incremental build when L changes:
    Pre-builder runs first:
        For delete or change x.class in L
            Does nothing (FAILs if no corresponding source file)
        For add x.class to L
            Infill/copy x.class from L to O (no overwriting).
    Java Builder runs second:
        Recompiles classes that depend on affected class files in L.
    Post-builder runs third:
        Infill/copy class files from L to O (no overwriting).

Incremental build - changes to source folder:
    Pre-builder runs first:
        Does nothing since library did not change.
    Java Builder runs second:
        Compiles source files into O.
    Post-builder runs third:
        Infill/copy class files from L to O (no overwriting).

An incremental build may fail in the case of a library class file being changed or deleted, leading to stale or obsolete class files in the output folder. Fortunately, a full build always achieves the invariant, and can be used to repair the damage due to changes to the library.

So while the combination of pre- and post-builders is not perfect, it does work in many cases. If the user could do a full build after making changes to the library folder, they would avoid all the problems. The solution has the advantage of not requiring anything special from the Java Core (i.e., WSAD should be able to implement it themselves). An example implementation is available here

Resources in Output Folder

When the source folder and output folder coincide, there is no problem keeping resource files in the output folder since they are not at risk of being overwritten (not with the proposed change to disable resource copying when the source folder and output folder coincide).

When the source folder and output folder do not coincide, keeping resource files in the output folder on a permanent basis encounters two issues:

(1) The first issue is that output folder has no presence in the packages view. Any resources that permanently resided in the output folder would therefore be invisible during regular Java development. One would have to switch to the resource navigator view to access them.

The packages view only shows resource files in source and library folders. Changing the packages view to show resources in the output folder is infeasible. Including the output folder on the classpath as a library folder was discussed at length above and is out of the question. Including the output folder on the classpath as a source folder is an option (in fact, it's exactly what you get when your source and output folders coincide).

(2) The second issue is that resource files in the output folder are in harm's way of resources of the same name being copied from a source folder.

If resources existing in the output folder are given precedence over the ones in source folders, then the ones from source folders would only be copied once and nevermore overwritten. Copies in the output folder would get stale or obsolete; automatic cleanup would not be possible.

On the other hand, if resources existing in source folders are given precedence over the ones in the output folder, then one that exists only in the output folder would be permanently lost if a resource by the same name was ever to be created in a source folder. It is a dangerous practice to allow the user to store important data in a place that could be clobbered by an automatic mechanism that usually operates unseen by the user.

Conclusion: Keeping resource files in the output folder on a permanent basis is not well supported at the UI, and should only be done if the resource files can be considered expendable.