Code Recommenders

The project has been created. Please visit the project page.

Code Recommenders is a proposed open source project under the Eclipse Technology Container Project.

This proposal is in the Project Proposal Phase (as defined in the Eclipse Development Process) and is written to declare its intent and scope. We solicit additional participation and input from the Eclipse community. Please send all feedback to the Eclipse Proposals Forum.

This proposal is structured as follows. Section "Background" gives the motivation for the project and provides some background information about the origins of the proposed project, namely, the Code Recommenders project developed at Darmstadt University of Technology. Section "Scope" outlines the initial set of tools and platforms this project aims to deliver to its users. Section "Initial Contributions" describes the current state of the project and the initial contributions that will be made. Section "Description" gives a little more detail on the intermediate goals. Section "Related Eclipse Projects" describes potential future connections between current Eclipse projects and the Code Recommenders project, as well as likely collaborations. The remaining sections (Committers, Mentors, Interested Parties, Additional Information) describe what their names suggest.

Background

Under the right circumstances, groups are remarkably intelligent and are often better than the smartest person in them.
    - James Surowiecki, "The Wisdom of Crowds"

Application frameworks have become an integral part of today's software development. This is hardly surprising given their promised benefits, such as reduced costs, higher quality, and shorter time to market. But using an application framework is not free of cost. Before frameworks can be used efficiently, software developers have to learn how to use them correctly, which often results in high initial training costs.

To reduce these training costs, framework developers provide diverse documentation addressing different information needs. Tutorials, for instance, describe typical usage scenarios and thus give the application developer an initial insight into the workings of the framework. However, their benefit quickly disappears when the problems to be solved differ from the standard usage scenarios. At this point, API documentation becomes the most important resource for software developers. They scan it for hints relevant to the problem at hand, but if it does not provide the required information, the most costly part of the research begins: developers investigate the source code of other programs that successfully used the framework in a similar way. But learning correct framework usage from these real-world examples is difficult, because the examples also contain application-specific code that obscures what is really important for using the framework. This significantly complicates the understanding process, which once again makes training a challenging and time-consuming task. Nevertheless, the source code of other applications appears to be a valuable source of information. Code search engines such as Google Code Search and Krugle owe their popularity not least to the fact that existing framework documentation seems insufficient to support developers in their daily work.

But despite their widespread use, it is an open question whether code search engines solve the problem of missing documentation in a satisfactory manner. When looking at how developers use code search engines, it turns out that they rarely create a single query and study just a single example. Instead, they typically refine their queries several times, investigate a number of examples, compare them to each other, and try to extract a pattern that underlies all these examples, i.e., a common way of using the API in question.

Although this task is very time-consuming, analyzing example code seems worth doing: example code apparently provides important insights into how to use a given API. This observation raises two questions: can such information be extracted from example code automatically, i.e., without large manual effort? And if valuable information can be found, how can these findings be made accessible to support developers in their daily work?

The Code Recommenders project developed at Darmstadt University of Technology investigates exactly these two questions. In a nutshell, it develops tools that automatically analyze large-scale code repositories, extract various interesting data from them, and integrate this information back into the IDE, where developers reuse it in their daily work. The vision of the project is to create a context-sensitive IDE that learns from its users what is relevant in a given code situation and, in turn, gives this knowledge back to other users. You may think of it as a collaborative way of sharing knowledge through the IDE.

This Eclipse proposal is the next step towards building the next generation of collaborative IDE services, which we call "IDE 2.0", inspired by the success of Web 2.0. The complete vision, including the analogy between IDE 2.0 and Web 2.0, is described in the paper "IDE 2.0: Collective Intelligence in Software Development", published at the Working Conference on the Future of Software Engineering Research (FoSER) 2010.

Scope

One of the major goals of this project is to make a new generation of tool ideas accessible to and usable by the Eclipse community, to further improve these tools based on the user feedback obtained, or even to build completely new tools based on experience and developer needs. So far, a couple of steps towards IDE 2.0 have been accomplished, some of which we describe briefly in Section "Initial Contributions". These tools, however, have yet to prove their usefulness. To allow this evaluation, this project aims to (i) provide a platform for innovative IDE features that leverage the wisdom of the crowds, (ii) build a vibrant community around IDE 2.0 services based on Eclipse, and (iii) provide an open platform that allows every community member to actively contribute to these services and to build and evaluate new tools based on the data contributed by the community itself. The initial scope of this project is to provide tools for the following topics:
  1. Intelligent Code Completion Systems:
    Code completion systems are pretty good at showing a developer all possible completions in a given context. However, these proposals can sometimes be overwhelming for novice developers. The goal is to develop completion engines that leverage information about how other developers used certain types in similar contexts and are thus capable of filtering or rearranging proposals according to some relevance criterion (similar to Mylyn's context model, but learning this relevance judgment from how thousands of users used a given API).
  2. Smart Template Engines:
    The well-known SWT templates are pretty helpful for developers who are not familiar with all the details of SWT. Unfortunately, creating such templates is a tedious and time-consuming task; consequently, the number of such code templates is rather small. However, the code of existing applications contains hundreds of frequently recurring code snippets that can be extracted and shared among developers. This project will provide tools that help developers find, for instance, method call chains for situations like "How do I get an instance of IStatusLineManager inside a ViewPart?" and will allow them to share such templates with other developers.
  3. Crowd-sourced and Usage-Driven API Documentation:
    API documentation, regardless of how much time has been spent writing it, lacks information about how developers actually use these APIs. This information, however, can easily be extracted from code that uses the APIs in question, and could thus be used to enrich existing API documentation with real, usage-driven documentation. Code Recommenders aims to develop tools for finding and sharing this kind of knowledge among developers.
  4. Stacktrace Search Engine:
    Exceptions occur. Apache Maven, for instance, reflects this reality by providing wiki pages for frequently occurring build exceptions, which explain why these exceptions may have occurred during a Maven build and how to fix them. This is a neat idea, but its potential has not been exhausted yet. Currently, the matching between an exception occurring during a build and a wiki page is based on the type of the exception (e.g., BuildException, IllegalArgumentException, etc.). This matching is rather coarse-grained and neglects the fact that the same exception might occur in many different locations and for many different reasons. First experimental results have shown that leveraging much more information, such as the stack frame elements and exception messages, yields a system that is capable of finding very similar exceptions and thus allows building a new kind of search engine for stacktraces. This project aims to develop such a stacktrace search engine and to integrate it into existing web platforms such as the Eclipse forums.
  5. API Misuse / Bug Detector:
    When using unfamiliar APIs, we often misuse them, i.e., we forget to call certain methods, pass wrong parameters to a method call, etc. These mistakes are hard to find and debug. Tools like PMD and FindBugs do a great job of finding issues such as null pointer dereferences or recommending to override hashCode along with equals, but they are of little help when framework-specific usage rules are violated. However, research tools exist that are capable of finding strange API uses, i.e., usages that differ significantly from how most people use a certain API and thus may indicate bugs. This project aims to provide a way to evaluate such tools and will provide an initial system as a baseline.
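Item 2 above mentions finding method call chains, such as the path from a ViewPart to an IStatusLineManager. A minimal sketch of that idea, assuming a hand-written type graph that maps each type to the methods it offers and their return types, is a breadth-first search that returns the shortest chain of calls. A real tool would have to derive this graph from the code base (e.g., via JDT or bytecode analysis), which is not shown here. The class, record and method names in the sketch are hypothetical; only the three Eclipse API calls in the example chain (getViewSite, getActionBars, getStatusLineManager) are real. Records require Java 16 or later.

```java
import java.util.*;

public class CallChainFinder {

    // Hypothetical edge in the type graph: calling `method` on the source type yields `returnType`.
    record Call(String method, String returnType) {}

    // Breadth-first search from `start` to `target`; returns the shortest chain of
    // method calls, or an empty Optional if the target type is unreachable.
    static Optional<List<String>> findChain(Map<String, List<Call>> graph, String start, String target) {
        Map<String, List<String>> chains = new HashMap<>(); // type -> chain that reaches it
        chains.put(start, List.of());
        Deque<String> queue = new ArrayDeque<>(List.of(start));
        while (!queue.isEmpty()) {
            String type = queue.poll();
            if (type.equals(target)) return Optional.of(chains.get(type));
            for (Call call : graph.getOrDefault(type, List.of())) {
                if (!chains.containsKey(call.returnType())) { // visit each type only once
                    List<String> chain = new ArrayList<>(chains.get(type));
                    chain.add(call.method() + "()");
                    chains.put(call.returnType(), chain);
                    queue.add(call.returnType());
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Hand-written, highly incomplete excerpt of the Eclipse UI type graph.
        Map<String, List<Call>> graph = Map.of(
                "ViewPart", List.of(new Call("getViewSite", "IViewSite"), new Call("getTitle", "String")),
                "IViewSite", List.of(new Call("getActionBars", "IActionBars"), new Call("getShell", "Shell")),
                "IActionBars", List.of(new Call("getStatusLineManager", "IStatusLineManager")));
        System.out.println(findChain(graph, "ViewPart", "IStatusLineManager")
                .map(chain -> String.join(".", chain))
                .orElse("no chain found"));
        // prints getViewSite().getActionBars().getStatusLineManager()
    }
}
```

Breadth-first search is chosen here because it naturally returns the shortest chain first; ranking chains by how often they appear in real code, as the item above suggests, would be a natural refinement.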

However, the scope of the Code Recommenders project is not limited to these kinds of tools, and we encourage the community to discuss new ideas for tools that might be helpful for software engineers.
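To make the API misuse detector idea (Scope item 5) more concrete, here is a minimal sketch of one simple way to score "strange" usages, in the spirit of published missing-method-call detectors. It is our illustration, not the baseline system the project will provide. It assumes each usage has already been reduced to the set of methods invoked on one object of a given type. A usage counts as strange when many other usages call the same methods plus exactly one more, and few call exactly the same methods. The class and method names and the open/read/close corpus are hypothetical.

```java
import java.util.*;

public class MissingCallDetector {

    // True if `other` calls exactly the same methods as `usage`.
    static boolean exactlySimilar(Set<String> usage, Set<String> other) {
        return other.equals(usage);
    }

    // True if `other` calls every method of `usage` plus exactly one more.
    static boolean almostSimilar(Set<String> usage, Set<String> other) {
        return other.size() == usage.size() + 1 && other.containsAll(usage);
    }

    // Strangeness in [0, 1]: share of comparable usages that make one additional call.
    static double strangeness(Set<String> usage, List<Set<String>> corpus) {
        long exact = corpus.stream().filter(o -> exactlySimilar(usage, o)).count();
        long almost = corpus.stream().filter(o -> almostSimilar(usage, o)).count();
        return exact + almost == 0 ? 0.0 : (double) almost / (exact + almost);
    }

    // For each extra method called by almost-similar usages, how many of them call it.
    static Map<String, Integer> missingCallCandidates(Set<String> usage, List<Set<String>> corpus) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Set<String> other : corpus) {
            if (!almostSimilar(usage, other)) continue;
            for (String call : other) {
                if (!usage.contains(call)) counts.merge(call, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical corpus: nine clients open, read and close a resource; one forgets close().
        List<Set<String>> corpus = new ArrayList<>();
        for (int i = 0; i < 9; i++) corpus.add(Set.of("open", "read", "close"));
        corpus.add(Set.of("open", "read"));

        Set<String> suspicious = Set.of("open", "read");
        System.out.println(strangeness(suspicious, corpus));           // 0.9
        System.out.println(missingCallCandidates(suspicious, corpus)); // {close=9}
    }
}
```

A threshold on the strangeness score (not shown) would decide which usages are reported to the developer, together with the most frequent candidate for the missing call.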

Initial Contributions

There are dozens of (research) projects that leverage collective intelligence in one way or another, and the Code Recommenders project developed at Darmstadt University of Technology is just one of them. However, an open, vendor-neutral Eclipse project may be the perfect place for these tools to contribute to Eclipse and to evaluate their approaches within a vibrant user community. Every Eclipse incubator project has to start with an initial contribution; ours will consist of two existing recommender components. Each component is described in detail in its own blog post, and we refer interested parties to these blog posts and to the forum for further discussion of these tools.
  1. Intelligent Code Completion
  2. Extended, usage-driven Javadoc

Components such as the stacktrace search engine or the API misuse/bug detector are still under development and will follow when ready.
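The stacktrace search engine (Scope item 4) hinges on matching traces by more than their exception type. The following is a minimal sketch under our own assumptions, not the project's matching algorithm. A trace is reduced to its exception type, message and list of frames. Similarity is a weighted sum of the Jaccard overlap of frames, the Jaccard overlap of message tokens, and a small bonus for equal exception types. The weights (0.6/0.3/0.1) are arbitrary illustrative values, not results from the project's experiments. The class, record and frame names are hypothetical; records require Java 16 or later.

```java
import java.util.*;

public class StackTraceSimilarity {

    // Hypothetical, simplified representation of a reported exception.
    record Trace(String exceptionType, String message, List<String> frames) {}

    // Jaccard similarity |A ∩ B| / |A ∪ B|; defined as 0 when both sets are empty.
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) return 0.0;
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) intersection.size() / union.size();
    }

    // Lower-cased word tokens of an exception message (messages may be null).
    static Set<String> tokens(String message) {
        Set<String> result = new HashSet<>();
        if (message == null) return result;
        for (String token : message.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) result.add(token);
        }
        return result;
    }

    // Assumed weighting: frames dominate, message tokens refine, the type is a small bonus.
    static double similarity(Trace x, Trace y) {
        double frames = jaccard(new HashSet<>(x.frames()), new HashSet<>(y.frames()));
        double message = jaccard(tokens(x.message()), tokens(y.message()));
        double type = x.exceptionType().equals(y.exceptionType()) ? 1.0 : 0.0;
        return 0.6 * frames + 0.3 * message + 0.1 * type;
    }

    // Known traces ordered from most to least similar to the query.
    static List<Trace> search(Trace query, List<Trace> known) {
        List<Trace> ranked = new ArrayList<>(known);
        ranked.sort(Comparator.comparingDouble((Trace t) -> similarity(query, t)).reversed());
        return ranked;
    }

    public static void main(String[] args) {
        // Made-up traces: same type everywhere, but only one shares frames and message.
        Trace query = new Trace("IllegalArgumentException", "Unknown plugin id: com.example.foo",
                List.of("com.example.Registry.lookup", "com.example.Loader.load", "com.example.App.start"));
        Trace sameCause = new Trace("IllegalArgumentException", "Unknown plugin id: com.example.foo",
                List.of("com.example.Registry.lookup", "com.example.Loader.load", "com.example.Cli.run"));
        Trace otherCause = new Trace("IllegalArgumentException", "Argument must not be null",
                List.of("com.example.Parser.parse", "com.example.Cli.run"));
        for (Trace t : search(query, List.of(otherCause, sameCause))) {
            System.out.printf("%.2f  %s%n", similarity(query, t), t.message());
        }
    }
}
```

A type-only matcher would rate both known traces as equally relevant here, whereas the combined score separates them (0.70 vs. 0.10), which is the effect the Scope item describes.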

The proposed namespace of the project will be org.eclipse.recommenders.*.

Description

The goal of the Code Recommenders project is to build IDE tools, such as intelligent code completion and extended API docs, that continuously improve themselves by leveraging implicit and explicit knowledge about how APIs are used by their clients and, in turn, give this information back to other developers to ease their work with new and unfamiliar frameworks and development environments.

Currently, the systems in the initial contribution are fed more or less manually: an administrator collects example applications from large code repositories such as EclipseSource's Yoxos and then starts the analysis and data extraction process to build new models. This approach may be further automated to leverage the existing infrastructure of the Eclipse Marketplace and p2 to continuously scan and update API usages and build up-to-date models for the Eclipse APIs.
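The analysis step described above turns example code into models that a completion engine can query. As a deliberately simplified sketch (the project's actual models are not described in this proposal and are presumably richer, e.g., conditioned on the surrounding code context), one could count, per receiver type, how many observed usages call each method and use these frequencies to filter or rearrange completion proposals as in Scope item 1. The class and method names are hypothetical. The example type and method names mimic SWT's Text widget, but the usages are invented.

```java
import java.util.*;

public class UsageModel {

    // type -> (method -> number of observed usages that call the method)
    private final Map<String, Map<String, Integer>> callCounts = new HashMap<>();
    // type -> number of observed usages of that type
    private final Map<String, Integer> usageCounts = new HashMap<>();

    // Records one usage extracted from example code: the methods invoked on one object of `type`.
    void addUsage(String type, Set<String> calledMethods) {
        usageCounts.merge(type, 1, Integer::sum);
        Map<String, Integer> counts = callCounts.computeIfAbsent(type, t -> new HashMap<>());
        for (String method : calledMethods) counts.merge(method, 1, Integer::sum);
    }

    // Share of observed usages of `type` that call `method`; 0 if the type was never observed.
    double probability(String type, String method) {
        int usages = usageCounts.getOrDefault(type, 0);
        if (usages == 0) return 0.0;
        return (double) callCounts.get(type).getOrDefault(method, 0) / usages;
    }

    // Rearranges proposals by descending probability; List.sort is stable, so ties keep input order.
    List<String> rank(String type, List<String> proposals) {
        List<String> ranked = new ArrayList<>(proposals);
        ranked.sort(Comparator.comparingDouble((String m) -> probability(type, m)).reversed());
        return ranked;
    }

    // Drops proposals below `minProbability`, then ranks the remaining ones.
    List<String> filter(String type, List<String> proposals, double minProbability) {
        List<String> kept = new ArrayList<>();
        for (String m : proposals) {
            if (probability(type, m) >= minProbability) kept.add(m);
        }
        return rank(type, kept);
    }

    public static void main(String[] args) {
        UsageModel model = new UsageModel();
        // Invented usages of a Text widget, as if extracted from three example applications.
        model.addUsage("Text", Set.of("setText", "setLayoutData"));
        model.addUsage("Text", Set.of("setText", "addModifyListener"));
        model.addUsage("Text", Set.of("setText"));

        // Proposals in alphabetical order, as a plain completion list might show them.
        List<String> proposals = List.of("addModifyListener", "getCaretPosition", "setLayoutData", "setText");
        System.out.println(model.rank("Text", proposals));
        // [setText, addModifyListener, setLayoutData, getCaretPosition]
        System.out.println(model.filter("Text", proposals, 0.3));
        // [setText, addModifyListener, setLayoutData]
    }
}
```

Keeping `rank` and `filter` separate mirrors the two options named in Scope item 1: rearranging leaves every proposal available, while filtering hides rarely used ones at the risk of hiding a method the developer actually needs.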

Unfortunately, such a manual approach does not scale well if potentially thousands of (non-Eclipse-based) frameworks are to be supported; it is simply too difficult to find enough example applications to make this approach work. Thus, in the long term, this manual data collection process should be replaced by a community-driven approach in which users can voluntarily share their knowledge about how to use these APIs by giving explicit or implicit feedback (cf. the position paper about user feedback and information sharing). Clearly, special privacy requirements have to be met so that no individual's private data or company's critical data is collected or published. Different models of data sharing have to be developed and discussed with the community.

As one of the first steps, a platform that allows developers to share knowledge will be developed, and the existing tools (i.e., intelligent code completion and usage-driven Javadoc) will be based on these concepts as a proof of concept. A community-driven approach may follow.

Committers

The following individuals are proposed as initial committers to the project:

The Code Recommenders project is developed at Darmstadt University of Technology. The project is led by Marcel Bruch and advised by Mira Mezini. Although the number of initial committers is low, we expect this set to grow quickly. In the past, the project was supported by more than 50 students through various hands-on trainings and bachelor's and master's theses; future contributions will be made directly under the proposed project. Thus, the initial committers will be:
  • Marcel Bruch, Darmstadt University of Technology (Project Lead)
  • Mira Mezini, Darmstadt University of Technology (Project Management)
  • Eric Bodden, Darmstadt University of Technology (Committer)
  • Johannes Lerch (Committer)
  • Dennis Sänger (Committer)
  • Sebastian Proksch (Committer)

We welcome additional committers and contributions.

Mentors

The following Architecture Council members will mentor this project:

Interested Parties

The following individuals, organisations, companies and projects have expressed interest in this project:

  • Chris Aniszczyk, Red Hat
  • Fabian Steeg, University of Cologne
  • Benjamin Muskalla, Tasktop
  • Beyhan Veliev, EclipseSource
  • Holger Staudacher, EclipseSource
  • Zviki Cohen, nWire Software
  • Martin Robillard, McGill University, Montreal, Canada
  • Stefan Lay, SAP AG
  • Matthias Sohn, SAP AG
  • Frederic Madiot, Eclipse MoDisco
  • Maxime Jeanmart, JITT Consulting

Additional Information

Changes to this Document

Date Change
28-October-2010 Document created
22-November-2010 Updated Initial Contributions (added proposed namespace), Interested Parties (added new interested parties), Mentors (added second mentor), Committers (added three initial committers)