Re: [ptp-dev] Questions about PTP SDM debugger

From: Dave Wootton <dwootton@xxxxxxxxxx>
Date: Thu, 28 Aug 2008 22:23:18 -0400

A correction to my previous email: building the tree bottom-up was not the solution to the problem of malicious users trying to get control of a debug session. The tree in the solution I was thinking of was built in a totally different way. The top-level process created the child processes. After invoking them, it opened a port whose number the child processes knew, then waited for the children to connect to that port. The application was structured so that a malicious user could not send data down the tree to the application, only upward to the top-level process. It doesn't look like this approach will work for the SDMs, since ours is totally different: the SDMs are all created first, and then connection requests are initiated downward from the top-level SDM.
Dave
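A minimal sketch of the upward-only structure described above, using Python threads and sockets as stand-ins for processes. Assumptions: everything runs on localhost, the parent opens its listening port before starting the children (the original scheme opened it after invoking them; doing it first avoids a startup race in the sketch), and the port number is handed to each child as an argument. The names `child` and `run_tree` are hypothetical. The property being illustrated is that the parent only reads, so a peer that connects gets no channel back down the tree.

```python
import socket
import threading


def child(port: int, rank: int) -> None:
    """Stand-in for a child process: connect up to the parent and only send."""
    with socket.create_connection(("127.0.0.1", port), timeout=5) as s:
        s.sendall(f"rank {rank} ready\n".encode())


def run_tree(num_children: int) -> list[str]:
    """Stand-in for the top-level process: listen, start children, read upward data."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind(("127.0.0.1", 0))   # port 0: the OS picks a free port (assumption)
        srv.listen(num_children)
        srv.settimeout(10)           # don't hang forever if a child never connects
        port = srv.getsockname()[1]  # the port number the children are told

        threads = [threading.Thread(target=child, args=(port, r))
                   for r in range(num_children)]
        for t in threads:
            t.start()

        messages = []
        for _ in range(num_children):
            conn, _addr = srv.accept()
            with conn:
                # Read until the child closes. The parent never writes back,
                # so nothing travels down the tree over this channel.
                chunks = []
                while data := conn.recv(1024):
                    chunks.append(data)
                messages.append(b"".join(chunks).decode().strip())

        for t in threads:
            t.join()
    return sorted(messages)
```

Calling `run_tree(3)` returns the three children's messages sorted by rank.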

Dave Wootton/Poughkeepsie/IBM@IBMUS
Sent by: ptp-dev-bounces@xxxxxxxxxxx

08/28/2008 02:08 PM

Please respond to
Parallel Tools Platform general developers <ptp-dev@xxxxxxxxxxx>

To	Parallel Tools Platform general developers <ptp-dev@xxxxxxxxxxx>
cc
Subject	Re: [ptp-dev] Questions about PTP SDM debugger

I also prefer a base + (rank mod n) approach. My problem with random port selection with retry is that it introduces delays into the startup process as you build the tree. Depending on tree depth, this could make debugger startup slow. I don't think 65536 is the correct 'n', since you would then scatter PTP ports across the entire user port range, and also, as Daniel points out, because of the case of multiple debugger instances. I was thinking 'n' might be 256 or maybe 512, since it's not very likely a user would ever run that many tasks on a node. There's still a slight possibility of port collisions with other applications, so I'd 'reserve' a few more ports above base+n to use for collisions.

I'm not entirely sure why the parent SDM, which is trying to build the tree downwards, needs to use a random port connection approach. I'm also afraid that if you make 'n' 65536, you will accidentally connect to a port belonging to some other application and either hang because whatever you connected to doesn't understand your protocol or, worse, hang or crash the other application.

You can at least partially solve this by telling the SDMs what 'base' and 'n' (if not hardcoded) are when you start them. Each task picks its port number using the base + (rank mod n) calculation. The proxy also knows 'base', 'n', and each task's rank, so as it builds the routing_file it can fill in the port number for each SDM. Then, as the SDMs build the tree, they try to connect to that port. The question is what to do if there is a collision on a port. The parent SDM can try connecting to ports in the spare port range until it connects to the proper SDM. You still have the problem of connecting to random applications. Also, what's the timeout when you try to connect to a port where nothing is listening? Could it be long enough to make startup time a problem?
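A sketch of the base/n port scheme, reading the formula as base + (rank mod n). The values `BASE`, `N`, and `SPARE`, the 'rank host port' routing line format, and all function names are hypothetical; this is not PTP's actual routing_file format. Both sides use the same candidate list: an SDM binds its preferred port or the first free spare port, and the parent tries those same ports with a bounded connect timeout, keeping only a connection that passes a caller-supplied `validate` check (for example, a handshake).

```python
import socket

BASE = 50000  # hypothetical base port
N = 256       # port slots, one of the values suggested above
SPARE = 8     # hypothetical number of reserved ports above BASE + N


def sdm_port(rank: int, base: int = BASE, n: int = N) -> int:
    """Preferred port for the SDM serving task `rank`: base + (rank mod n)."""
    return base + (rank % n)


def candidate_ports(rank: int, base: int = BASE, n: int = N,
                    spare: int = SPARE) -> list[int]:
    """Preferred port first, then the reserved range [base + n, base + n + spare)."""
    return [sdm_port(rank, base, n)] + list(range(base + n, base + n + spare))


def routing_entries(hosts_by_rank: dict[int, str], base: int = BASE,
                    n: int = N) -> list[str]:
    """Lines a proxy could write to a routing file (format hypothetical)."""
    return [f"{rank} {host} {sdm_port(rank, base, n)}"
            for rank, host in sorted(hosts_by_rank.items())]


def bind_listener(rank: int, host: str = "0.0.0.0", **ports):
    """SDM side: bind the preferred port, falling back to the spare range on collision."""
    for port in candidate_ports(rank, **ports):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            s.bind((host, port))
        except OSError:
            s.close()
            continue  # already in use by another application or SDM
        s.listen()
        return s, port
    raise RuntimeError(f"no free port for rank {rank}")


def connect_to_child(host: str, rank: int, validate, timeout: float = 2.0, **ports):
    """Parent side: try each candidate port with a bounded connect timeout and
    keep the first connection whose peer passes `validate`."""
    for port in candidate_ports(rank, **ports):
        try:
            s = socket.create_connection((host, port), timeout=timeout)
        except OSError:
            continue  # refused, unreachable, or timed out
        if validate(s, rank):
            return s
        s.close()  # wrong peer: another application or another user's SDM
    return None
```

The explicit `timeout` bounds the wait on each port. A refused connection (nothing listening) typically fails at once, while a port whose packets are silently dropped can take the full timeout, so the worst case per child is roughly (1 + SPARE) times the timeout.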
Your handshake to validate that a legitimate SDM is connecting is important, especially since whatever connects to the SDM (the wrong user's SDM, a malicious user) could get control of the debugger on some tasks in the application. If you were to build the tree bottom-up instead of top-down, then, as long as an invalid connection could do no worse than send bad data upstream to the GUI and not grab control of the application, the risk would be lower.

I'm not sure I'd rely on users to pick 'base'. If two users pick the same base, or bases whose port ranges overlap, you still have a problem. A simplistic way to pick base would be a calculation based on the user's uid number (uid # * something mod(256)?). The only other solution I know of is what we did in DPCL, where we had what we called a 'super daemon' that ran as root and handled starting multiple instances of DPCL daemons, but that gets a little complicated.

How do you tell when the tree is built? Is it by each parent SDM keeping track of how many child SDMs it started, counting their responses, and then reporting 'done' up the tree?

Dave
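A sketch of three ideas raised in the message above, each under stated assumptions. First, a per-user base derived from the uid; the constants `low`, `stride`, and `slots` are hypothetical, and uids that are equal modulo `slots` still collide. Second, an HMAC challenge-response handshake in which the listening SDM checks whoever connects, assuming the proxy hands every SDM a random per-session key at launch (nothing in this thread says PTP does this). Third, 'done' counting in which each parent reports upward only after every child it started has answered; the recursion in `report_done` stands in for waiting on child messages, and the k-ary `build_tree` layout is hypothetical.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass, field


def base_for_user(uid: int, low: int = 20000, stride: int = 264,
                  slots: int = 64) -> int:
    """Hypothetical per-user base: `slots` disjoint ranges of width `stride`
    (n = 256 plus 8 spare ports). Uids equal modulo `slots` still collide."""
    return low + (uid % slots) * stride


def make_challenge() -> bytes:
    """Random nonce the listening SDM sends to whoever connects."""
    return secrets.token_bytes(16)


def respond(session_key: bytes, challenge: bytes) -> bytes:
    """Connecting SDM proves it holds the per-session key without sending it."""
    return hmac.new(session_key, challenge, hashlib.sha256).digest()


def verify(session_key: bytes, challenge: bytes, response: bytes) -> bool:
    """Constant-time comparison; drop the connection if this returns False."""
    return hmac.compare_digest(respond(session_key, challenge), response)


@dataclass
class Node:
    rank: int
    children: list["Node"] = field(default_factory=list)


def build_tree(rank: int, size: int, fanout: int = 2) -> Node:
    """Hypothetical layout: children of r are r*fanout+1 .. r*fanout+fanout."""
    first = rank * fanout + 1
    kids = [build_tree(c, size, fanout)
            for c in range(first, first + fanout) if c < size]
    return Node(rank, kids)


def report_done(node: Node) -> int:
    """Each SDM waits for a 'done' count from every child it started, then
    reports its own subtree size upward."""
    return 1 + sum(report_done(child) for child in node.children)
```

Under this scheme the tree is complete when the root's count equals the number of SDMs launched; for example, `report_done(build_tree(0, 10))` returns 10.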
