Re: [ptp-dev] Questions about PTP SDM debugger



From: Daniel Felix Ferber <dfferber@xxxxxxxxxxxxxxxxxx>
Date: Thu, 28 Aug 2008 14:18:27 -0300
Delivered-to: ptp-dev@xxxxxxxxxxx
Organization: IBM
User-agent: Thunderbird 2.0.0.16 (X11/20080723)

Greg and Dave,

I think Greg's suggestion for launching the SDM is reasonable. But are we considering race conditions? I am afraid that this approach might exhibit several failure patterns, depending on how long each SDM takes to start.

For example: the servers and the master are started at nearly the same time. All servers bind to a port as you described. The master receives the routing file and starts connecting to its children, which in turn connect to the grandchildren, and so on. What happens if a child is slow to start for some reason? Its parent will try to connect (but the child will not be listening yet), then try the next ports, and will never retry the port that the child is actually listening on. I saw this happen, and that is why the launcher currently starts the master after the servers, instead of the opposite as described in the specification. I think other race conditions might be possible.

There is another issue in the strategy for launching the SDM master. After starting the SDM master, the launcher starts listening on a socket to which the SDM master is expected to connect. The port number is passed as a parameter to the SDM master. However, the SDM master may start before the launcher has created the socket. The SDM master will then try to connect and, on failure, try the next ports. That does not make sense in this situation, since the port number passed to the SDM master is guaranteed to be the port where the launcher is listening. Therefore, the SDM master should not try the next ports, but retry the same port.
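
The "retry the same port" behaviour can be illustrated with a minimal Python sketch. This is not the SDM code; the function name, the number of attempts, and the delay are assumptions chosen only for illustration.

```python
import socket
import time


def connect_same_port(host, port, attempts=20, delay=0.1):
    """Retry one known host:port until the listener appears or attempts run out."""
    last_error = None
    for _ in range(attempts):
        try:
            # create_connection raises OSError (e.g. ConnectionRefusedError)
            # while nothing is listening on the port yet.
            return socket.create_connection((host, port), timeout=1.0)
        except OSError as exc:
            last_error = exc
            time.sleep(delay)  # give the listener time to come up
    raise ConnectionError(
        f"no listener on {host}:{port} after {attempts} attempts"
    ) from last_error
```

The point of the sketch is that the port never changes between attempts: only time passes, so a listener that comes up late is still found.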

Another concern: does the handshake take the job ID into account? There could be a scenario where two users start debug sessions on the same machine at the same time. One might then accidentally connect to the other's SDM server if they are listening on the same port range.
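
To make the idea concrete, here is a rough Python sketch of a handshake that carries the job ID. The wire format ("HELLO <job_id>" answered by "OK" or "REJECT") and all function names are invented for this example; they are not the existing SDM handshake.

```python
import socket


def read_line(sock, limit=256):
    """Read bytes up to and including a newline (or until EOF / limit)."""
    data = b""
    while not data.endswith(b"\n") and len(data) < limit:
        chunk = sock.recv(1)
        if not chunk:
            break
        data += chunk
    return data


def send_hello(sock, job_id):
    """Connecting side: announce our job ID and report whether it was accepted."""
    sock.sendall(f"HELLO {job_id}\n".encode("ascii"))
    return read_line(sock) == b"OK\n"


def check_hello(sock, expected_job_id):
    """Listening side: accept the peer only if it belongs to the same job."""
    ok = read_line(sock) == f"HELLO {expected_job_id}\n".encode("ascii")
    sock.sendall(b"OK\n" if ok else b"REJECT\n")
    return ok
```

With a check like this, a server that is reached by another user's session on a shared port range would reject the connection instead of joining the wrong tree.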

I agree that using a base port number is better than using a random number for each process. I think it is enough that the base port number is pseudo-random. I would avoid a fixed port number because it could cause port collisions between two simultaneous debug sessions. I understand that the SDM servers will know how to handle such a collision, but their startup will take longer. By choosing the base port randomly, we reduce the probability of collisions.
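
A small Python sketch of choosing such a base port follows. Seeding the generator with the job ID, the span of 64 consecutive ports, and the use of the IANA dynamic range 49152-65535 are all assumptions made for the example, not something the launcher does today.

```python
import random


def choose_base_port(job_id, span=64, low=49152, high=65535):
    """Pick a pseudo-random base port so that base..base+span-1 stays within [low, high]."""
    rng = random.Random(job_id)  # same job ID -> same base port on every process
    return rng.randint(low, high - span + 1)
```

Because the generator is seeded, every process of one debug session that knows the job ID derives the same base port, while two different sessions are unlikely to pick overlapping ranges.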

My comments about who should write the hostfile: I see Dave's concerns. I really did not consider that the amount of data to be transmitted would become that large. Couldn't we establish a standard file format to be used by all debuggers? Then the file could be written by the proxy, regardless of which debugger is being used. I don't have a really good idea for this issue yet.


Best regards,
Daniel Felix Ferber


Greg Watson wrote:

Good, I'm glad we're in agreement :-). Daniel, do you have any comments on this?
Regarding the port numbers, this is not how I had intended the debugger startup to work, so I want to change this at some point. My approach is as follows, but any other suggestions would be welcome.
1. The SDM servers are given a "base" port number. At startup, they attempt to bind to this port. If that fails, they try to bind to base_port+1 after waiting a short random period (this is to avoid servers started on the same node chasing each other up the port numbers). An alternative to this would be to bind to ((base_port+rank) % 65536)+1024. A third alternative would be to use a pseudorandom number generator seeded by the rank.
2. When the SDM master receives the routing file, it can then determine the location of its children, so it attempts to connect to each in turn using the same port generation mechanism as in #1.
3. Once the connection is established to the server, a handshake is used to swap credentials, etc., then the routing file is sent. The routing file could be successively pruned as it propagates up the tree to reduce bandwidth.
4. Once the server receives the routing file, it does the same as #2.
5. This continues until all connections have been established, or there was a timeout or some other error.
Greg
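
The bind step in #1 above can be sketched in Python as follows. This is only an illustration under assumptions: the function names, the limit of 32 attempts, and the maximum wait are made up, and the real SDM is not written this way. Note that the rank-based alternative as written can yield values above 65535 (for example when (base_port+rank) % 65536 is 65000), so the sketch folds the sum into 1024..65535 instead.

```python
import random
import socket
import time


def bind_from_base(base_port, max_tries=32, host="", max_wait=0.05):
    """Try base_port, base_port+1, ... and return (listening socket, port)."""
    for offset in range(max_tries):
        port = base_port + offset
        if port > 65535:
            break  # ran off the end of the valid port range
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
            sock.listen(1)
            return sock, port
        except OSError:
            sock.close()
            # Short random wait so servers started on the same node
            # do not chase each other up the port numbers.
            time.sleep(random.uniform(0, max_wait))
    raise OSError(f"no free port within {max_tries} tries from {base_port}")


def rank_port(base_port, rank):
    """Rank-based alternative, kept inside the range 1024..65535."""
    return 1024 + (base_port + rank) % (65536 - 1024)
```

A connecting parent would have to walk the same sequence of ports, which is where the timing between a parent and a slow child starts to matter.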

On Aug 28, 2008, at 8:46 AM, Dave Wootton wrote:
Greg
I think the proxy should be responsible for building the routing file, in
order to keep the traffic on the connection between the GUI and the proxy
down. With the current approach, you are sending node information across
the connection twice, once to populate the PTP runtime model, then a
second time to create the routing file on the nodes where the SDMs are
running. I'm not sure what the message length for the messages from the
proxy to the GUI is, but for the remote_file you have strlen(task_index)
+ strlen(hostname) + strlen(port_number) + 3 bytes per node. In my case
that's close to 20 bytes per task, minimum. With large numbers of tasks,
this could be a lot of data, and since all of these interactions between
the GUI, the proxy, and the SDMs are a serial process, they slow down
debugger startup.
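
The per-node estimate above can be written out as a short Python sketch. The function names and the sample hostname are hypothetical; the only thing taken from the message is the formula strlen(task_index) + strlen(hostname) + strlen(port_number) + 3.

```python
def routing_entry_bytes(task_index, hostname, port):
    """Bytes for one routing-file entry: three fields plus 3 extra bytes."""
    return len(str(task_index)) + len(hostname) + len(str(port)) + 3


def routing_file_bytes(entries):
    """Total size for an iterable of (task_index, hostname, port) tuples."""
    return sum(routing_entry_bytes(*entry) for entry in entries)
```

For instance, a 4-digit task index, an 8-character hostname, and a 5-digit port give 4 + 8 + 5 + 3 = 20 bytes for that entry, in line with the "close to 20 bytes per task" figure.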
The down side to this is the need for each proxy to implement support for
each of the unique debugger startup sequences it is willing to support, where
you could end up with some proxies not supporting a debugger. If you
implement all of the code in the GUI resource manager side though, I'm not
sure you don't have the same problem, where the RM needs to be aware of
the details of both the debugger startup sequence and the details of a
particular runtime environment/proxy.

The other question I have after seeing the contents of the routing file
you generate is the generation of random port numbers. If you end up
actually using these port numbers, do you run the risk of accidentally
using a port number reserved for some other application, unless you block
out a range of port numbers and only use that range? Even if port numbers
are up for grabs with no expectation of reserved port numbers, what
happens if something else is using your port number?
Dave

