Re: [ptp-dev] Questions about PTP SDM debugger


A correction to my previous email. Building the tree bottom-up was not what solved the problem of malicious users trying to take control of a debug session. The tree in the solution I was thinking of was built in a totally different way: the top-level process created the child processes, then opened a port whose number the child processes knew, and waited for the children to connect to that port. The structure of the application was such that a malicious user could not send data down the tree to the application, only upward to the top-level process in the tree. It doesn't look like that approach will work for the SDMs, since our approach is totally different: the SDMs are all created first, and then connection requests are initiated downward from the top-level SDM.
Dave


Dave Wootton/Poughkeepsie/IBM@IBMUS
Sent by: ptp-dev-bounces@xxxxxxxxxxx
08/28/2008 02:08 PM

Please respond to: Parallel Tools Platform general developers <ptp-dev@xxxxxxxxxxx>
To: Parallel Tools Platform general developers <ptp-dev@xxxxxxxxxxx>
cc:
Subject: Re: [ptp-dev] Questions about PTP SDM debugger





I also prefer a base + mod(rank)/n approach. My problem with the random
port selection with retry is that you introduce delays in the startup
process as you build the tree. Depending on tree depth, this could cause
debugger startup to be slow. I don't think 65536 is the correct 'n' since
you then end up scattering PTP ports across the entire user port range,
and also, as Daniel points out, because of the multiple debugger instances
case. I was thinking 'n' might be 256 or maybe 512 since it's not very
likely a user would ever run that many tasks on a node. There's still the
slight possibility of port collisions because of other applications, so
I'd 'reserve' a few more ports above base+n to use for collisions.
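
As a rough sketch of the port calculation described above, reading
"base + mod(rank)/n" as base + (rank mod n). The function names, the
default n of 256 and the spare count of 8 are illustrative assumptions,
not values taken from the PTP source.

```python
def sdm_port(rank, base, n=256):
    """Primary listening port for the SDM serving the given task rank.

    Assumes the formula means base + (rank mod n).
    """
    return base + (rank % n)


def spare_ports(base, n=256, spares=8):
    """Ports reserved just above base+n, to fall back on after a collision.

    The number of spares is a hypothetical choice.
    """
    return list(range(base + n, base + n + spares))
```

With base 50000 and n 256, rank 300 maps to port 50044, and the spare
range starts at 50256.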

I'm not entirely sure why the parent SDM which is trying to build the tree
downwards needs to use a random port connection approach. I'm also afraid
that if you make 'n' 65536, that you will accidentally connect to a random
port from some other application and either get yourself hung because
whatever you connected to doesn't understand your protocol, or worse, you
hang or crash the other application.

You can at least partially solve this by telling the SDMs what 'base' and
'n' (if not hardcoded) are when you start the SDMs. Each task picks its
port number using the base + mod(rank)/n calculation. The proxy also knows
what 'base' and 'n' are, as well as task rank, so as it builds the
routing_file it can fill in the port number for each SDM. Then as SDMs
build the tree, they try to connect to that port.
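
A minimal sketch of how the proxy could fill in the port for each SDM
while it builds the routing information. The (rank, host, port) tuple
layout is a hypothetical stand-in; the real routing_file format is not
shown here and may differ.

```python
def build_routing_entries(hosts, base, n=256):
    """Return one (rank, host, port) entry per SDM.

    hosts[i] is assumed to be the hostname for task rank i. The port is
    derived as base + (rank mod n), the same rule each SDM would apply
    locally, so both sides agree without exchanging port numbers.
    """
    return [(rank, host, base + (rank % n)) for rank, host in enumerate(hosts)]
```

For example, build_routing_entries(["node0", "node0", "node1"], 50000)
gives ports 50000, 50001 and 50002 for ranks 0, 1 and 2.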

The question is what do you do if you have a port collision? The parent
SDM can try connecting to ports in the spare port range until it connects
to the proper SDM. You still have the problem of connecting to random
applications. Also, what's the timeout when you try to connect to a port
where nothing is listening? Could it be long enough to make startup time
a problem?
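
One way to bound the cost of each attempt is an explicit connect
timeout. The sketch below tries the primary port and then the spare
ports in order. The function name and the 0.5 second default are
assumptions for illustration. A closed port on a reachable host is
normally refused almost immediately; it is the case where packets are
silently dropped that waits out the full timeout.

```python
import socket


def connect_to_child(host, candidate_ports, timeout=0.5):
    """Try each candidate port in order and return (socket, port).

    Returns (None, None) if no port accepts the connection. A successful
    connect only shows that something is listening; the peer still has
    to be validated as the expected SDM afterwards.
    """
    for port in candidate_ports:
        try:
            sock = socket.create_connection((host, port), timeout=timeout)
        except OSError:
            # Refused, unreachable, or timed out: move to the next port.
            continue
        return sock, port
    return None, None
```

The worst-case delay per child is then roughly the timeout multiplied by
the number of candidate ports, which gives a way to estimate the effect
on startup time for a given tree depth.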

Your handshake to validate that a legitimate SDM is connecting is
important, especially since whatever connects (wrong user's SDM, malicious
user) to the SDM could get control of the debugger on some tasks in the
application. If you were to build the tree bottom-up instead of top-down,
then as long as an invalid connection could do no worse than send bad data
upstream to the GUI and not grab control of the application, the risk is
less.
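
As an illustration of what such a handshake could look like, here is a
challenge-response sketch based on an HMAC over a per-session secret.
This is an assumed design, not the handshake PTP uses: it presumes the
launcher hands the same secret to every SDM in the session through some
channel other users cannot read.

```python
import hashlib
import hmac
import os


def make_challenge():
    """Random nonce the listening SDM sends to whoever connected."""
    return os.urandom(16)


def answer_challenge(secret, challenge):
    """Response the connecting side computes from the shared secret."""
    return hmac.new(secret, challenge, hashlib.sha256).digest()


def verify_answer(secret, challenge, answer):
    """True only if the peer proved knowledge of the session secret."""
    expected = hmac.new(secret, challenge, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(expected, answer)
```

A peer that fails verify_answer would simply be disconnected before any
debugger command is accepted from it.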

I'm not sure I'd rely on users to pick 'base'. If two users pick the same
base, or one that causes overlap of port numbers, then you still have a
problem.
A simplistic way to pick base would be a calculation based on the user's
uid number (uid # * something mod(256)?). The only other way I know how to
solve the problem is what we did in DPCL, where we had what we called a
'super daemon' that ran as root and handled the issue of starting multiple
instances of DPCL daemons, but that gets a little complicated.
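
A sketch of one possible uid-based calculation. It is an interpretation
of the idea above rather than the exact formula: the floor of 20000, the
128 slots and the block size of 264 (256 ports plus 8 spares) are all
hypothetical values chosen so that every block stays below 65536.

```python
def base_for_uid(uid, floor=20000, slots=128, block=264):
    """Map a uid to the start of a per-user block of ports.

    Two uids that are equal modulo `slots` still land on the same base,
    so this reduces collisions between users but does not rule them out.
    """
    return floor + (uid % slots) * block
```

For example, uid 1000 maps to base 47456 and uid 1001 to 47720, while
uid 1128 collides with uid 1000.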

How do you tell when the tree is built? Is this by each parent SDM keeping
track of how many child SDMs it started and counting responses then
reporting 'done' up the tree?
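
If completion is tracked that way, the bookkeeping in each parent SDM
might look like the sketch below. The class and method names are
hypothetical and the messaging between SDMs is left out.

```python
class TreeNode:
    """Tracks which child SDMs have reported their subtree as built."""

    def __init__(self, expected_children):
        self.expected = expected_children
        self.reported = set()

    def child_done(self, child_id):
        """Record a 'done' report; duplicates from one child count once."""
        self.reported.add(child_id)
        return self.is_done()

    def is_done(self):
        """True once every expected child has reported.

        A leaf (zero expected children) is done immediately and can
        report 'done' to its parent as soon as it is connected.
        """
        return len(self.reported) >= self.expected
```

The top-level SDM would consider the tree built when its own is_done()
becomes true.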
Dave



