Git and submodules via HTTP [message #6747]
Fri, 08 May 2009 08:49
Eclipse User
Originally posted by: alex_blewitt.nospam.yahoo.com
My understanding of a git submodule is that it is essentially a
git-repository-within-a-repository, with a pointer to a different git
repository's tree at a particular commit.
That would allow me (for example) to have a local checked out copy of (say)
http://example.com/parent.git
http://other.example.com/child.git
appear locally as parent/child in my file system. In order to do work on
child.git, any pushes would go to other.example.com; and when I've committed
my changes, I can commit parent.git (which would essentially update a pointer
from child.git/abcdef to child.git/123456 or whatever the tip pointed to).
So my question is, do these get served by nested HTTP requests, or must they
always be separate on the server? In other words, if I checked out 'parent',
would I just see parent.git/childsymlink coming down (and then the git client
would resolve symlink to otherhost/child.git automatically)?
I'm wondering whether I need to support recursive Git repositories in the HTTP
namespace, like whether I'd ever see http://example.com/parent.git/child.git.
I'm also trying to find out how I'd parse the URL to identify a specific git
repo. For example, I could have a server hosting multiple git repositories
e.g.
http://example.com/public/product.git
http://example.com/private/other.git
Then, whether with --export-all or something similar, how would I find out
which part of the URL is the parent repo and which isn't?
Lastly, can I assume that the URL is always going to be of the form
http://example.com/.*/.*\.git/.* ? Or do we have to cater for the .git
extension being missing? I'm just wondering whether I can assume patterns in
the URL to initialise the Repo, or whether I'm going to have to hit the file
system repeatedly looking for (e.g.) public/.git, public.git/ etc. in order
to resolve the repo.
Alex
Re: Git and submodules via HTTP [message #6755 is a reply to message #6747]
Fri, 08 May 2009 11:18
Eclipse User
Originally posted by: j16sdiz.gmail.com
On 8/5/2009 16:49, Alex Blewitt wrote:
[...]
> I'm wondering whether I need to support recursive Git repositories in the HTTP
> namespace, like whether I'd ever see http://example.com/parent.git/child.git.
>
(1) "Recursive" repositories are valid and work.
(2) Server-side repositories should be bare, so there is no valid reason
why somebody would do this -- just ignore it if it is too hard.
> I'm also trying to find out how I'd parse the URL to identify a specific git
> repo. For example, I could have a server hosting multiple git repositories
> e.g.
>
> http://example.com/public/product.git
> http://example.com/private/other.git
>
> And then either with the --export-all or similar, how I'd then find out which
> part of the URL is the parent repo and which isn't.
>
You can't tell -- and you don't have to know.
Submodules are a client-side hack; the server doesn't need any knowledge of them.
[...]
Re: Git and submodules via HTTP [message #6761 is a reply to message #6747]
Fri, 08 May 2009 11:23
Eclipse User
Originally posted by: j16sdiz.gmail.com
On 8/5/2009 16:49, Alex Blewitt wrote:
[...]
> Lastly, can I assume that the URL is always going to be of the form
> http://example.com/.*/.*\.git/.* ? Or do we have to cater for the .git
> extension being missing?
The current HTTP implementation works without the .git suffix.
But I have never seen a real-life repository missing that suffix.
> I'm just wondering whether I can assume patterns in
> the URL to initialise the Repo, or whether I'm going to have to hit the file
> systems repeatedly looking for (e.g.) public/.git, public.git/ etc. in order
> to resolve the repo.
>
Re: Git and submodules via HTTP [message #6767 is a reply to message #6761]
Fri, 08 May 2009 14:59
Shawn O. Pearce Messages: 82 Registered: July 2009
Member
Daniel Cheng wrote:
> On 8/5/2009 16:49, Alex Blewitt wrote:
> [...]
>> Lastly, can I assume that the URL is always going to be of the form
>> http://example.com/.*/.*\.git/.* ? Or do we have to cater for the .git
>> extension being missing?
>
> Current HTTP implementation works without .git suffix.
> But I have never seen any (real-life) repository missing that suffix.
It's rare, but I have seen a few. Most people do have the .git suffix.
>> I'm just wondering whether I can assume patterns in
>> the URL to initialise the Repo, or whether I'm going to have to hit
>> the file systems repeatedly looking for (e.g.) public/.git,
>> public.git/ etc. in order to resolve the repo.
This probably should follow the same rules gitweb.cgi follows for
deciding if a repository is available for web service. I think those
rules are rather sane and work for anyone.
IIRC the rules are like:
- Repository name comes in the "PATH_INFO" field of the request.
  In terms of a Java servlet, we'd need some web.xml mapping of /* into
  the service servlet, and its getPathInfo() method would yield the
  relative repository name.
- A configured base path is prefixed.
  Take the name you got from PATH_INFO and stick a base path in front.
  This gives you a location on disk of where the repository should be.
- Adjust Git suffix.
  Basically you probe around on disk for the following paths, to see if
  any of them are a git repository:
    1) path
    2) path + "/.git"
    3) path + ".git"
    *) else fail
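A minimal Java sketch of the three rules above, assuming the caller supplies the check for "is this a git repository". The class name RepoProbe, the method names, and the isGitRepository predicate are all hypothetical and are not taken from gitweb.cgi or JGit; a real predicate might look for HEAD, objects/ and refs/ on disk.

```java
import java.util.Optional;
import java.util.function.Predicate;

public class RepoProbe {
    /** Tries path, path + "/.git", then path + ".git", in that order. */
    public static Optional<String> probe(String path, Predicate<String> isGitRepository) {
        String[] candidates = { path, path + "/.git", path + ".git" };
        for (String candidate : candidates) {
            if (isGitRepository.test(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty(); // the "else fail" case
    }

    /** Prefixes the configured base path to the PATH_INFO name, then probes. */
    public static Optional<String> resolve(String basePath, String pathInfo,
                                           Predicate<String> isGitRepository) {
        // Drop leading and trailing slashes so the name is relative to basePath.
        String name = pathInfo.replaceAll("^/+", "").replaceAll("/+$", "");
        // Conservative assumption: refuse anything containing ".." so that a
        // request cannot climb out of basePath.
        if (name.isEmpty() || name.contains("..")) {
            return Optional.empty();
        }
        return probe(basePath + "/" + name, isGitRepository);
    }
}
```

Passing the repository check in as a predicate keeps the probing order separate from the filesystem, so the order can be exercised without touching the disk.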
Finally, gitweb.cgi also allows a "project listing". This is a simple
text file of valid repository names, one per line. The base path is
still inserted to get the filesystem path, but if a project listing is
configured, *ONLY* the projects listed in the project listing are
available for service.
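The project listing check could be sketched like this, following only the description above (one repository name per line). ProjectListing and its methods are hypothetical names, and the real gitweb.cgi file format may carry more than a bare name per line, so treat the parsing as an assumption.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ProjectListing {
    // null means no listing is configured, so every repository is allowed.
    private final Set<String> allowed;

    private ProjectListing(Set<String> allowed) {
        this.allowed = allowed;
    }

    /** No project listing configured: nothing is filtered out. */
    public static ProjectListing unrestricted() {
        return new ProjectListing(null);
    }

    /** Builds a listing from the lines of the text file, skipping blank lines. */
    public static ProjectListing fromLines(List<String> lines) {
        Set<String> names = new LinkedHashSet<>();
        for (String line : lines) {
            String name = line.trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return new ProjectListing(names);
    }

    /** With a listing configured, ONLY the listed names are available. */
    public boolean isAvailable(String repositoryName) {
        return allowed == null || allowed.contains(repositoryName);
    }
}
```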
Where this gets ugly for you is, how do you split a REST-like URI such
as "/public/foo/bar/this.git/info/refs" into the repository part and the
part within the repository?
My suggestion is, support only suffixes that are "well known patterns"
that you can fairly easily trim off the end of the PATH_INFO part of the
request, and try the remaining PATH_INFO portion against the rules
above. You may need to try a couple of splits before you find the right
one, but odds are, you'll split correctly the first time out. Most
people don't nest repositories, and most people won't name repositories
that are similar to your suffix naming conventions.
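One way to sketch the suffix-trimming idea in Java. The suffix patterns below are illustrative picks modelled on the files a dumb HTTP client fetches (info/refs, HEAD, loose objects, packs); the list is an assumption, not a complete or authoritative set, and PathSplitter, Split and isRepository are hypothetical names.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PathSplitter {
    /** Result of a split: the repository name and the path inside it. */
    public static final class Split {
        public final String repository;
        public final String rest;

        Split(String repository, String rest) {
            this.repository = repository;
            this.rest = rest;
        }
    }

    // Assumed "well known" suffixes, each anchored at the end of PATH_INFO.
    private static final List<Pattern> KNOWN_SUFFIXES = List.of(
        Pattern.compile("/info/refs$"),
        Pattern.compile("/HEAD$"),
        Pattern.compile("/objects/info/packs$"),
        Pattern.compile("/objects/[0-9a-f]{2}/[0-9a-f]{38}$"),
        Pattern.compile("/objects/pack/pack-[0-9a-f]{40}\\.(pack|idx)$"));

    /** Trims each known suffix in turn and keeps the first split that names a repository. */
    public static Optional<Split> split(String pathInfo, Predicate<String> isRepository) {
        for (Pattern suffix : KNOWN_SUFFIXES) {
            Matcher m = suffix.matcher(pathInfo);
            if (m.find()) {
                String repository = pathInfo.substring(0, m.start());
                if (!repository.isEmpty() && isRepository.test(repository)) {
                    return Optional.of(new Split(repository, m.group()));
                }
            }
        }
        return Optional.empty();
    }
}
```

For the URI from the post, "/public/foo/bar/this.git/info/refs" would split into "/public/foo/bar/this.git" and "/info/refs", provided isRepository accepts the first part.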
Re: Git and submodules via HTTP [message #6772 is a reply to message #6767]
Fri, 08 May 2009 21:37
Eclipse User
Originally posted by: alex_blewitt.nospam.yahoo.com
> Shawn Pearce wrote:
> This probably should follow the same rules gitweb.cgi follows for
> deciding if a repository is available for web service. I think those
> rules are rather sane and work for anyone.
>
> IIRC the rules are like:
>
> - Repository name comes in the "PATH_INFO" field of the request.
>
> In terms of a Java servlet, we'd need some web.xml mapping of /* into
> the service servlet, and its getPathInfo() method would yield the
> relative repository name.
Right, so /* would be picked up by the servlet, but the first part may not be
a git repository, i.e. /[a-z]*/ might not correspond to a .git repo on disk.
There might be a few of them like /foo/bar/other/repo.git which would be
handled by the same servlet. The goal is to parse a URL like /a/b/c/d/e/f and
then figure out whether the repo is based in a or b or c or d, and then from
that which is the relative part (like refs/heads/master) for d or e or f.
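The brute-force version of that goal, walking the path one segment at a time, might look like the sketch below. PrefixWalker, locate and isRepository are hypothetical names; the predicate stands in for whatever on-disk check is used, and it gets called once per segment in the worst case, which is exactly the repeated filesystem probing I'd like to avoid.

```java
import java.util.Arrays;
import java.util.Optional;
import java.util.function.Predicate;

public class PrefixWalker {
    /**
     * Returns {repository, rest} for the shortest prefix of pathInfo that
     * isRepository accepts, or empty if no prefix is a repository.
     */
    public static Optional<String[]> locate(String pathInfo, Predicate<String> isRepository) {
        String[] segments = pathInfo.replaceAll("^/+", "").split("/");
        StringBuilder prefix = new StringBuilder();
        for (int i = 0; i < segments.length; i++) {
            if (i > 0) {
                prefix.append('/');
            }
            prefix.append(segments[i]);
            if (isRepository.test(prefix.toString())) {
                // Everything after the matching prefix is the repository-relative part.
                String rest = String.join("/",
                        Arrays.copyOfRange(segments, i + 1, segments.length));
                return Optional.of(new String[] { prefix.toString(), rest });
            }
        }
        return Optional.empty();
    }
}
```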
> Basically you probe around on disk for the following paths, to see if
> any of them are a git repository:
>
> 1) path
> 2) path + "/.git"
> 3) path + ".git"
> *) else fail
Still not sure how that works because you're assuming that 'path' here is just
the end of the repository; i.e. in the case above that 'f' is the f.git
location. That probably isn't the case, because there'd be a set of local
parts to the repository.
> Where this gets ugly for you is, how do you split a REST-like URI such
> as "/public/foo/bar/this.git/info/refs" into the repository part and the
> part within the repository?
Yes, that's exactly the problem.
> My suggestion is, support only suffixes that are "well known patterns"
> that you can fairly easily trim off the end of the PATH_INFO part of the
> request, and try the remaining PATH_INFO portion against the rules
> above. You may need to try a couple of splits before you find the right
> one, but odds are, you'll split correctly the first time out. Most
> people don't nest repositories, and most people won't name repositories
> that are similar to your suffix naming conventions.
Right - it's easy enough to just say that the repo is /(.*\.git)(/.*) or
similar, use the first part to map to disk and then the last part to be the
relative part.
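As a rough Java sketch of that pattern (GitUrlPattern and split are made-up names, and this assumes the repository always carries the .git suffix and is followed by a slash):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GitUrlPattern {
    // The first group is greedy, so with nested names such as
    // /parent.git/child.git/info/refs it takes everything up to the LAST ".git/".
    private static final Pattern REPO_AND_REST = Pattern.compile("/(.*\\.git)(/.*)");

    /**
     * Returns {repository, relative part}, or null when the path has no
     * ".git/" component to split on.
     */
    public static String[] split(String pathInfo) {
        Matcher m = REPO_AND_REST.matcher(pathInfo);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }
}
```

Note that a bare "/public/product.git" with nothing after it does not match, since the second group needs at least a slash; repositories without the .git suffix are not handled at all.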
The plan is to have a router servlet which makes these decisions, then passes
the request on to a repo servlet which is given a configured Repository and
can then handle the other entries. Restlet has some pretty nice ideas in terms
of how to field this so I may do something similar (but without a dependency
on Restlet itself).
Alex
Re: Git and submodules via HTTP [message #6779 is a reply to message #6755]
Fri, 08 May 2009 21:37
Eclipse User
Originally posted by: alex_blewitt.nospam.yahoo.com
> Daniel Cheng wrote:
>> On 8/5/2009 16:49, Alex Blewitt wrote:
>>
>> And then either with the --export-all or similar, how I'd then find out
>> which part of the URL is the parent repo and which isn't.
>>
>
> Can't tell -- and don't have to know.
I'm thinking of making the URLs part of a REST-based interface to the
repository, so I have to parse http://example.com/foo/bar/other/objects/refs
and then use that to figure out which directory the git repository is based in
(without any a priori knowledge that there's likely to be an objects/ or refs/
underneath the repo).
Alex