When I used the term "bandwidth", I was
referring to time in the human resources sense. The
Eclipse webmaster *may* be able to set you up with a
virtual server, but the EF has limited resources to
provide support and maintenance on that server.
On 01/11/2012 04:16 AM, Marcel Bruch wrote:
I think that there are several things that
need discussion.
1. Which data users (explicitly or
implicitly) provide
2. Under which terms of use this data is used
by us and others
3. Who stores the data
4. Who can access the data and in which
format (degree of anonymization).
1.
The term 'data' subsumes a quite large range
of information.
For Snipmatch this includes code snippets and
maybe usage statistics (what has been used, and
when, in order to update the ranking strategies).
For Extdoc this may include information like
comments, editorial actions, or user ratings.
For Call Completion this includes the models
that have to be delivered to the clients and
information about the JARs they use (e.g.,
file fingerprints).
For Chain Completion this may include usage
statistics (as for Snipmatch, to improve ranking
strategies) and code snippets.
You can think of other information too.
2.
I'd like to say that this is an important
topic that needs solid research. It will
probably require us to get in contact with
lawyers to clarify what's possible/required.
It should be clear that everyone who shares
data (code snippets etc.) must be in the
position to actually be allowed to share it. For
me, it's basically the same as with the Eclipse
Wiki. All users that contribute to it must agree
to its terms of use. Is there a difference? Are
these terms of use reusable for our use case? I
guess I should prepare a detailed description of
what gets collected and provided by whom, to
enable a lawyer to help here?
3.
If I understood correctly, the foundation has
no bandwidth to host these services. In that
case, I have to go back to my university and ask
for permission to host these services somewhere
close to our backbone, or raise some funding to
put a server elsewhere. One question that comes
to mind: If the foundation is not hosting
these services, can we deliver Code Recommenders
with preconfigured URLs that point to external
project servers? For instance, something like
"code.recommenders.org"?
4.
What is needed - and technically feasible? The
raw data may eventually exceed terabytes
(not in the first years, I guess :)).
Honestly, I have no clue yet how much data will be
collected and what information others may be
interested in. What we have in mind is to create
reference data sets for machine learners and SE
researchers, to enable research that creates new
tools and improves algorithms for code search,
code recommendations etc. But these data sets
will, for practical reasons, only include a
subset of the (anonymized) data needed for research
purposes. Would this be satisfactory? Do you think
some kind of agreement is needed?
Is there anything I'm currently not aware of?
Thanks,
Marcel
On 11.01.2012, at 00:08, Wayne Beaton
wrote:
FWIW, the Eclipse Foundation has a
single lawyer on staff. Though we do retain
the services of other lawyers. So I guess,
"lawyers" is generally accurate :-)
The project needs to make a case to the
Eclipse Foundation for capturing and
maintaining this data. We are very concerned
about privacy, and so are many people in the
community. There are actual laws in some
countries that need to be considered as
well.
Since we are a transparent and open
organization, there needs to be
consideration for disseminating the
collected data to other parties. With the
usage data, we tried publishing filtered
data (which excluded anything that could
potentially expose/identify specific users)
with limited success. We failed in this
regard, which is a big reason why we shut
down the UDC.
Unfortunately, the Eclipse Foundation lacks
the bandwidth to maintain this data on your
behalf.
Wayne
On 01/10/2012 05:36 PM, Marcel Bruch wrote:
Sounds good to me. But let's see what the Foundation's lawyers say about this... I'll keep you posted.
On 10.01.2012, at 23:26, Doug Wightman wrote:
Hi Marcel,
I think that's a great idea. For SnipMatch, it would probably make the
most sense to have wording to the effect that the contributor is
verifying that they own the code and is giving a royalty-free license
to use it for any purpose. This would be associated with a checkbox
that must be checked when the code is to be shared publicly. We
currently have something to this effect already built, but the wording
hasn't been run by lawyers.
Doug
On Tue, Jan 10, 2012 at 3:00 PM, Marcel Bruch <bruch@xxxxxxxxxxxxxxxxxx> wrote:
Hi PMC,
Code Recommenders is making good progress and we are confident that we'll
satisfy all major criteria for M5. Extended documentation platform, code
completion engines, and local code search engine are maturing quickly and
the SnipMatch guys will start at the end of January. The Java, RCP/RAP, and
Scout packages have expressed some interest in integrating Code Recommenders
into their packages, and we are working at full blast to make this happen.
One thing that hasn't been discussed in detail is how we deal with the
data users provide, for instance to SnipMatch's community code templates
store or to the extended documentation platform. Is there a special
wiki-like 'terms of usage' needed? Where does this data go? Also, for
stacktrace search or model generation and model download, some data needs to
be delivered to the client and submitted. We started this discussion a while
ago but postponed it.
I'd like to pick up the discussion again - early enough before Juno
arrives. I'm not sure whether this is a discussion for the PMC mailing list
since ultimately it's a decision of the Foundation. But Wayne will know, I
guess.
Thanks,
Marcel
_______________________________________________
recommenders-dev mailing list
recommenders-dev@xxxxxxxxxxx
http://dev.eclipse.org/mailman/listinfo/recommenders-dev
Thanks,
Marcel