Re: [ptp-dev] Problem with invoking SDM debugger on pSeries Linux

Dave,

I've found a few problems. First, there was a bug in the handling of thread stack frames, which I've now fixed; it was only apparent on linux_ppc64. Second, the version of gdb on the machine does not handle threads correctly. If you attach gdb to a poe process and select a thread, the -stack-info-depth command thinks the stack frame is corrupted. I tried the same thing with gdb 6.8 (the version currently installed is 6.5) and the problem goes away. See the traces below. Finally, you're not passing the correct parameters to the sdm master and child processes. The --debugger and --debugger_path arguments should only be passed to the child processes. Currently you're passing both to the master, which doesn't use them, and not passing --debugger_path to the children.

Greg
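
The argument rule described above can be sketched as follows. This is only an illustration: the function name, the role strings, and the --name=value spelling are assumptions made for the example, and the only option names taken from the message are --debugger and --debugger_path.

```python
# Options that, per the message above, belong only on child SDM command lines.
CHILD_ONLY_OPTIONS = ("--debugger", "--debugger_path")


def build_sdm_args(role, common_args, debugger, debugger_path):
    """Build the argument list for one SDM process (hypothetical helper).

    role is "master" or "child". common_args holds whatever options both
    roles share; none are assumed here, the caller supplies them.
    """
    if role not in ("master", "child"):
        raise ValueError("unknown SDM role: %r" % role)
    args = list(common_args)
    if role == "child":
        # Assumption: options are spelled --name=value.
        for name, value in zip(CHILD_ONLY_OPTIONS, (debugger, debugger_path)):
            args.append("%s=%s" % (name, value))
    return args


if __name__ == "__main__":
    shared = []  # placeholder for options common to master and children
    print(build_sdm_args("master", shared, "gdb", "/home/greg/gdb-6.8/gdb/gdb"))
    print(build_sdm_args("child", shared, "gdb", "/home/greg/gdb-6.8/gdb/gdb"))
```

Keeping the child-only options in one tuple means the master's argument list cannot pick them up by accident.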

**** GDB 6.5-25.el5rh ****

-bash-3.1$ gdb -i mi x
~"GNU gdb Red Hat Linux (6.5-25.el5rh)\n"
~"Copyright (C) 2006 Free Software Foundation, Inc.\n"
~"GDB is free software, covered by the GNU General Public License, and you are\n"
~"welcome to change it and/or distribute copies of it under certain conditions.\n"
~"Type \"show copying\" to see the conditions.\n"
~"There is absolutely no warranty for GDB. Type \"show warranty\" for details.\n"
~"This GDB was configured as \"ppc64-redhat-linux-gnu\"..."
~"Using host libthread_db library \"/lib64/libthread_db.so.1\".\n"
~"\n"
(gdb)
attach 22461
&"attach 22461\n"
~"Attaching to program: /home/greg/x, process 22461\n"
~"Reading symbols from /usr/lib/libmpi_ibm.so..."
~"done.\n"
~"Loaded symbols for /usr/lib/libmpi_ibm.so\n"
~"Reading symbols from /usr/lib/libpoe.so..."
~"done.\n"
~"Loaded symbols for /usr/lib/libpoe.so\n"
~"Reading symbols from /usr/lib/liblapi.so..."
~"done.\n"
~"Loaded symbols for /usr/lib/liblapi.so\n"
~"Reading symbols from /lib/libpthread.so.0..."
~"done.\n"
~"[Thread debugging using libthread_db enabled]\n"
~"[New Thread 268383472 (LWP 22461)]\n"
~"[New Thread 1222243504 (LWP 22473)]\n"
~"[New Thread 1218049200 (LWP 22472)]\n"
~"[New Thread 1205073072 (LWP 22465)]\n"
~"[New Thread 1200878768 (LWP 22464)]\n"
~"[New Thread 1083241648 (LWP 22463)]\n"
~"Loaded symbols for /lib/libpthread.so.0\n"
~"Reading symbols from /lib/libm.so.6..."
~"done.\n"
~"Loaded symbols for /lib/libm.so.6\n"
~"Reading symbols from /lib/libc.so.6..."
~"done.\n"
~"Loaded symbols for /lib/libc.so.6\n"
~"Reading symbols from /usr/lib/libstdc++.so.6..."
~"done.\n"
~"Loaded symbols for /usr/lib/libstdc++.so.6\n"
~"Reading symbols from /lib/libgcc_s.so.1..."
~"done.\n"
~"Loaded symbols for /lib/libgcc_s.so.1\n"
~"Reading symbols from /lib/libdl.so.2..."
~"done.\n"
~"Loaded symbols for /lib/libdl.so.2\n"
~"Reading symbols from /lib/ld.so.1..."
~"done.\n"
~"Loaded symbols for /lib/ld.so.1\n"
~"Reading symbols from /usr/lib/libpnsd.so..."
~"done.\n"
~"Loaded symbols for /usr/lib/libpnsd.so\n"
~"Reading symbols from /usr/lib/liblapiudp.so..."
~"done.\n"
~"Loaded symbols for /usr/lib/liblapiudp.so\n"
~"0x401fc2fc in nanosleep () from /lib/libc.so.6\n"
^done
(gdb)
info threads
&"info threads\n"
~"  6 Thread 1083241648 (LWP 22463)  0x4004082c in do_sigwait ()\n"
~"   from /lib/libpthread.so.0\n"
~" 5 Thread 1200878768 (LWP 22464) 0x4003b014 in pthread_cond_wait@@GLIBC_2.3.2\n"
~"    () from /lib/libpthread.so.0\n"
~" 4 Thread 1205073072 (LWP 22465) 0x4003b014 in pthread_cond_wait@@GLIBC_2.3.2\n"
~"    () from /lib/libpthread.so.0\n"
~" 3 Thread 1218049200 (LWP 22472) 0x4003b014 in pthread_cond_wait@@GLIBC_2.3.2\n"
~"    () from /lib/libpthread.so.0\n"
~" 2 Thread 1222243504 (LWP 22473) 0x4003b5c4 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libpthread.so.0\n"
~"  1 Thread 268383472 (LWP 22461)  0x401fc2fc in nanosleep ()\n"
~"   from /lib/libc.so.6\n"
^done
(gdb)
-stack-info-depth
^done,depth="3"
(gdb)
-thread-select 2
^done,new-thread-id="2",frame={level="0",addr="0x4003b5c4",func="pthread_cond_timedwait@@GLIBC_2.3.2",args=[],from="/lib/libpthread.so.0"}
(gdb)
-stack-info-depth
&"Previous frame inner to this frame (corrupt stack?)\n"
^error,msg="Previous frame inner to this frame (corrupt stack?)"
(gdb)

**** GDB 6.8 ****

-bash-3.1$ /home/greg/gdb-6.8/gdb/gdb -i mi x
~"GNU gdb 6.8\n"
~"Copyright (C) 2008 Free Software Foundation, Inc.\n"
~"License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>\n"
~"This is free software: you are free to change and redistribute it.\n"
~"There is NO WARRANTY, to the extent permitted by law. Type \"show copying\"\n"
~"and \"show warranty\" for details.\n"
~"This GDB was configured as \"powerpc64-unknown-linux-gnu\"...\n"
(gdb)
attach 22461
&"attach 22461\n"
~"Attaching to program: /home/greg/x, process 22461\n"
~"Reading symbols from /usr/lib/libmpi_ibm.so..."
~"done.\n"
~"Loaded symbols for /usr/lib/libmpi_ibm.so\n"
~"Reading symbols from /usr/lib/libpoe.so..."
~"done.\n"
~"Loaded symbols for /usr/lib/libpoe.so\n"
~"Reading symbols from /usr/lib/liblapi.so..."
~"done.\n"
~"Loaded symbols for /usr/lib/liblapi.so\n"
~"Reading symbols from /lib/libpthread.so.0..."
~"done.\n"
~"[Thread debugging using libthread_db enabled]\n"
~"[New Thread 0xfff34f0 (LWP 22461)]\n"
~"[New Thread 0x48d9f4b0 (LWP 22473)]\n"
~"[New Thread 0x4899f4b0 (LWP 22472)]\n"
~"[New Thread 0x47d3f4b0 (LWP 22465)]\n"
~"[New Thread 0x4793f4b0 (LWP 22464)]\n"
~"[New Thread 0x4090f4b0 (LWP 22463)]\n"
~"Loaded symbols for /lib/libpthread.so.0\n"
~"Reading symbols from /lib/libm.so.6..."
~"done.\n"
~"Loaded symbols for /lib/libm.so.6\n"
~"Reading symbols from /lib/libc.so.6..."
~"done.\n"
~"Loaded symbols for /lib/libc.so.6\n"
~"Reading symbols from /usr/lib/libstdc++.so.6..."
~"done.\n"
~"Loaded symbols for /usr/lib/libstdc++.so.6\n"
~"Reading symbols from /lib/libgcc_s.so.1..."
~"done.\n"
~"Loaded symbols for /lib/libgcc_s.so.1\n"
~"Reading symbols from /lib/libdl.so.2..."
~"done.\n"
~"Loaded symbols for /lib/libdl.so.2\n"
~"Reading symbols from /lib/ld.so.1..."
~"done.\n"
~"Loaded symbols for /lib/ld.so.1\n"
~"Reading symbols from /usr/lib/libpnsd.so..."
~"done.\n"
~"Loaded symbols for /usr/lib/libpnsd.so\n"
~"Reading symbols from /usr/lib/liblapiudp.so..."
~"done.\n"
~"Loaded symbols for /usr/lib/liblapiudp.so\n"
~"0x401fc2fc in nanosleep () from /lib/libc.so.6\n"
^done
(gdb)
info threads
&"info threads\n"
~"  6 Thread 0x4090f4b0 (LWP 22463)  0x4004082c in do_sigwait ()\n"
~"   from /lib/libpthread.so.0\n"
~" 5 Thread 0x4793f4b0 (LWP 22464) 0x4003b014 in pthread_cond_wait@@GLIBC_2.3.2\n"
~"    () from /lib/libpthread.so.0\n"
~" 4 Thread 0x47d3f4b0 (LWP 22465) 0x4003b014 in pthread_cond_wait@@GLIBC_2.3.2\n"
~"    () from /lib/libpthread.so.0\n"
~" 3 Thread 0x4899f4b0 (LWP 22472) 0x4003b014 in pthread_cond_wait@@GLIBC_2.3.2\n"
~"    () from /lib/libpthread.so.0\n"
~" 2 Thread 0x48d9f4b0 (LWP 22473) 0x4003b5c4 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libpthread.so.0\n"
~"  1 Thread 0xfff34f0 (LWP 22461)  0x401fc2fc in nanosleep ()\n"
~"   from /lib/libc.so.6\n"
^done
(gdb)
-stack-info-depth
^done,depth="3"
(gdb)
-thread-select 2
^done,new-thread-id="2",frame={level="0",addr="0x4003b5c4",func="pthread_cond_timedwait@@GLIBC_2.3.2",args=[],from="/lib/libpthread.so.0"}
(gdb)
-stack-info-depth
^done,depth="5"
(gdb)
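
The difference between the two traces is the reply to the second -stack-info-depth: ^error under 6.5, ^done,depth="5" under 6.8. A minimal sketch of checking that reply programmatically is shown here. The helper names are hypothetical, and the parser only handles flat name="value" pairs on a single result record, not token prefixes, tuples, or lists.

```python
import re

# Matches a GDB/MI result record such as ^done,depth="3" or ^error,msg="...".
_RESULT_RE = re.compile(r'^\^(done|running|connected|error|exit)(?:,(.*))?$')
# Matches flat name="value" pairs; nested tuples and lists are not handled.
_PAIR_RE = re.compile(r'([A-Za-z_][\w-]*)="((?:[^"\\]|\\.)*)"')


def parse_result_record(line):
    """Return (result_class, {name: value}) for one MI result record line."""
    match = _RESULT_RE.match(line.strip())
    if match is None:
        raise ValueError("not an MI result record: %r" % line)
    result_class, rest = match.group(1), match.group(2) or ""
    return result_class, dict(_PAIR_RE.findall(rest))


def stack_depth(line):
    """Return the frame count from a -stack-info-depth reply, or None on failure."""
    result_class, fields = parse_result_record(line)
    if result_class != "done" or "depth" not in fields:
        return None
    return int(fields["depth"])


if __name__ == "__main__":
    # Replies copied from the two traces above, after -thread-select 2.
    print(stack_depth('^done,depth="5"'))
    print(stack_depth('^error,msg="Previous frame inner to this frame (corrupt stack?)"'))
```

A caller could treat a None result after selecting a thread as a hint that the installed gdb has the thread-handling problem described above.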

On Oct 21, 2008, at 5:48 PM, Greg Watson wrote:

Dave,

I think the ability to see global variables is a function of the gdb version, so it might not be supported on this platform.

How do you have your project set up? Do you have a local copy of the project in your workspace, or are you using RDT?

Greg

On Oct 21, 2008, at 4:07 PM, Dave Wootton wrote:

I forgot to mention: when the processes suspend at main, I can see the local variables in main() and their values in the variables view, and the gdb status for signals in the signals view. I don't see global variables; the 'add global variables' icon is grayed out.

Eclipse tries to open a source window, but fails with the message 'Could not open the editor: Editor could not be initialized.' and the following exception. I don't know whether this exception occurs because I'm running Eclipse remotely from the application and remote file support isn't hooked up, or because of some other problem.

java.lang.NullPointerException
      at org.eclipse.cdt.internal.ui.editor.CEditor.updateScalabilityMode(CEditor.java:1347)
      at org.eclipse.cdt.internal.ui.editor.CEditor.doSetInput(CEditor.java:1294)
      at org.eclipse.ui.texteditor.AbstractTextEditor$19.run(AbstractTextEditor.java:3025)
      at org.eclipse.jface.operation.ModalContext.runInCurrentThread(ModalContext.java:446)
      at org.eclipse.jface.operation.ModalContext.run(ModalContext.java:354)
      at org.eclipse.jface.window.ApplicationWindow$1.run(ApplicationWindow.java:758)
      at org.eclipse.swt.custom.BusyIndicator.showWhile(BusyIndicator.java:70)
      at org.eclipse.jface.window.ApplicationWindow.run(ApplicationWindow.java:755)
      at org.eclipse.ui.internal.WorkbenchWindow.run(WorkbenchWindow.java:2483)
      at org.eclipse.ui.texteditor.AbstractTextEditor.internalInit(AbstractTextEditor.java:3043)
      at org.eclipse.ui.texteditor.AbstractTextEditor.init(AbstractTextEditor.java:3070)
      at org.eclipse.ui.internal.EditorManager.createSite(EditorManager.java:799)
      at org.eclipse.ui.internal.EditorReference.createPartHelper(EditorReference.java:643)
      at org.eclipse.ui.internal.EditorReference.createPart(EditorReference.java:428)
      at org.eclipse.ui.internal.WorkbenchPartReference.getPart(WorkbenchPartReference.java:594)
      at org.eclipse.ui.internal.PartPane.setVisible(PartPane.java:306)
      at org.eclipse.ui.internal.presentations.PresentablePart.setVisible(PresentablePart.java:180)
      at org.eclipse.ui.internal.presentations.util.PresentablePartFolder.select(PresentablePartFolder.java:270)
      at org.eclipse.ui.internal.presentations.util.LeftToRightTabOrder.select(LeftToRightTabOrder.java:65)
      at org.eclipse.ui.internal.presentations.util.TabbedStackPresentation.selectPart(TabbedStackPresentation.java:473)
      at org.eclipse.ui.internal.PartStack.refreshPresentationSelection(PartStack.java:1256)
      at org.eclipse.ui.internal.PartStack.setSelection(PartStack.java:1209)
      at org.eclipse.ui.internal.PartStack.showPart(PartStack.java:1608)
      at org.eclipse.ui.internal.PartStack.add(PartStack.java:499)
      at org.eclipse.ui.internal.EditorStack.add(EditorStack.java:103)
      at org.eclipse.ui.internal.PartStack.add(PartStack.java:485)
      at org.eclipse.ui.internal.EditorStack.add(EditorStack.java:112)
      at org.eclipse.ui.internal.EditorSashContainer.addEditor(EditorSashContainer.java:63)
      at org.eclipse.ui.internal.EditorAreaHelper.addToLayout(EditorAreaHelper.java:217)
      at org.eclipse.ui.internal.EditorAreaHelper.addEditor(EditorAreaHelper.java:207)
      at org.eclipse.ui.internal.EditorManager.createEditorTab(EditorManager.java:779)
      at org.eclipse.ui.internal.EditorManager.openEditorFromDescriptor(EditorManager.java:678)
      at org.eclipse.ui.internal.EditorManager.openEditor(EditorManager.java:639)
      at org.eclipse.ui.internal.WorkbenchPage.busyOpenEditorBatched(WorkbenchPage.java:2817)
      at org.eclipse.ui.internal.WorkbenchPage.busyOpenEditor(WorkbenchPage.java:2729)
      at org.eclipse.ui.internal.WorkbenchPage.access$11(WorkbenchPage.java:2721)
      at org.eclipse.ui.internal.WorkbenchPage$10.run(WorkbenchPage.java:2673)
      at org.eclipse.swt.custom.BusyIndicator.showWhile(BusyIndicator.java:70)
      at org.eclipse.ui.internal.WorkbenchPage.openEditor(WorkbenchPage.java:2668)
      at org.eclipse.ui.internal.WorkbenchPage.openEditor(WorkbenchPage.java:2652)
      at org.eclipse.debug.internal.ui.sourcelookup.SourceLookupFacility$1.run(SourceLookupFacility.java:355)
      at org.eclipse.swt.custom.BusyIndicator.showWhile(BusyIndicator.java:70)
      at org.eclipse.debug.internal.ui.sourcelookup.SourceLookupFacility.openEditor(SourceLookupFacility.java:365)
      at org.eclipse.debug.internal.ui.sourcelookup.SourceLookupFacility.openEditor(SourceLookupFacility.java:274)
      at org.eclipse.debug.internal.ui.sourcelookup.SourceLookupFacility.display(SourceLookupFacility.java:218)
      at org.eclipse.debug.ui.DebugUITools.displaySource(DebugUITools.java:776)
      at org.eclipse.debug.internal.ui.elements.adapters.StackFrameSourceDisplayAdapter$SourceDisplayJob.runInUIThread(StackFrameSourceDisplayAdapter.java:167)
      at org.eclipse.ui.progress.UIJob$1.run(UIJob.java:94)
      at org.eclipse.swt.widgets.RunnableLock.run(RunnableLock.java:35)
      at org.eclipse.swt.widgets.Synchronizer.runAsyncMessages(Synchronizer.java:133)
      at org.eclipse.swt.widgets.Display.runAsyncMessages(Display.java:3800)
      at org.eclipse.swt.widgets.Display.readAndDispatch(Display.java:3425)
      at org.eclipse.ui.internal.Workbench.runEventLoop(Workbench.java:2382)
      at org.eclipse.ui.internal.Workbench.runUI(Workbench.java:2346)
      at org.eclipse.ui.internal.Workbench.access$4(Workbench.java:2198)
      at org.eclipse.ui.internal.Workbench$5.run(Workbench.java:493)
      at org.eclipse.core.databinding.observable.Realm.runWithDefault(Realm.java:288)
      at org.eclipse.ui.internal.Workbench.createAndRunWorkbench(Workbench.java:488)
      at org.eclipse.ui.PlatformUI.createAndRunWorkbench(PlatformUI.java:149)
      at org.eclipse.ui.internal.ide.application.IDEApplication.start(IDEApplication.java:113)
      at org.eclipse.equinox.internal.app.EclipseAppHandle.run(EclipseAppHandle.java:193)
      at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.runApplication(EclipseAppLauncher.java:110)
      at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.start(EclipseAppLauncher.java:79)
      at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:382)
      at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:179)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
      at java.lang.reflect.Method.invoke(Unknown Source)
      at org.eclipse.equinox.launcher.Main.invokeFramework(Main.java:549)
      at org.eclipse.equinox.launcher.Main.basicRun(Main.java:504)
      at org.eclipse.equinox.launcher.Main.run(Main.java:1236)
      at org.eclipse.equinox.launcher.Main.main(Main.java:1212)
Dave



Dave Wootton/Poughkeepsie/IBM@IBMUS
Sent by: ptp-dev-bounces@xxxxxxxxxxx
10/21/2008 03:47 PM
Please respond to
Parallel Tools Platform general developers <ptp-dev@xxxxxxxxxxx>


To
Parallel Tools Platform general developers <ptp-dev@xxxxxxxxxxx>
cc
Parallel Tools Platform general developers <ptp-dev@xxxxxxxxxxx>,
ptp-dev-bounces@xxxxxxxxxxx
Subject
Re: [ptp-dev] Problem with invoking SDM debugger on pSeries Linux






I ran the debugger test again this afternoon and have a debug log from the initial attempt, which was partially successful. I started my proxy and invoked a 2-task application (on 1 node). It got to the point where the PTP debug perspective opened and the debug view showed a partially expanded tree with a node for process 0 and threads 1 and 2 as children of process 0. Thread 1 is expanded and shows as suspended at main(). Thread 2 is collapsed and shows as suspended but with no location. If I expand the thread 2 node then I get an ArrayIndexOutOfBounds exception, as noted in the attached console log. Before expanding thread 2, I issued 'ps' on my proxy node and saw 1 proxy, 3 SDM processes, 2 active gdb processes, two defunct gdb processes and the two application processes, all of which, with the exception of the defunct gdb processes, look right.

This seems to be consistently repeatable: I get the same results each time I start from a fresh instance of Eclipse and my proxy.

I think I have my proxy starting the SDMs properly at this time. I need to spend some time tomorrow looking at exactly when I start the master SDM, since I think I want to start it only after my attach.cfg file is created instead of after the arbitrary delay that I have now. Once I sort that out I expect I will have another patch for you to commit to the PTP 2.1 branch, in addition to the PE and LoadLeveler patches I've already sent in the last week.
Dave





Dave Wootton/Poughkeepsie/IBM@IBMUS
Sent by: ptp-dev-bounces@xxxxxxxxxxx
10/17/2008 07:19 PM
Please respond to
Parallel Tools Platform general developers <ptp-dev@xxxxxxxxxxx>


To
Parallel Tools Platform general developers <ptp-dev@xxxxxxxxxxx>
cc

Subject
Re: [ptp-dev] Problem with invoking SDM debugger on pSeries Linux






Greg
It occurred to me that my problem might be due to my test program not being compiled with -g. I recompiled and it seemed like I got a bit farther along. The GUI got as far as trying to display a stack trace for task 0 (of a 2-task application with both tasks on the same node) suspended at main(), showing the line number for main() and trying to open an editor window, and then crashed with a subscript out of range exception. Unfortunately I did not capture the stack trace, since I thought I could recreate it and get a better trace, and then I couldn't get the debugger to run any more. The editor window failed to open the source file (maybe because I am running remotely on a Windows XP system).

My network connection seems exceptionally sluggish for some reason, which seems to have caused a second problem: I was getting a segmentation violation at line 324 of src/impl/sdm_routing_table_file.c. This was a call to fclose(*routing_file). I'm not sure what should be happening here.

I commented out the fclose() and that got me past the SIGSEGV, but with an intermittent message about too many open files. If I got past the 'too many open files' message then I got to the point where the debugger tried to show a stack trace.

I'll look at this more next week.
Dave



Dave Wootton/Poughkeepsie/IBM@IBMUS
Sent by: ptp-dev-bounces@xxxxxxxxxxx
10/17/2008 01:25 PM
Please respond to
Parallel Tools Platform general developers <ptp-dev@xxxxxxxxxxx>


To
<ptp-dev@xxxxxxxxxxx>
cc

Subject
[ptp-dev] Problem with invoking SDM debugger on pSeries Linux






Greg
I tried invoking the SDM debugger on my RedHat 5 system (the k17sf2p03 system you have access to, and which is up now), and had two problems.

The first is that the code which waits for the routing file has a timeout of 10 seconds, which is apparently too short, since I get a message that SDM timed out waiting for the routing file. I changed both calls that wait for the routing file to wait for 1000 seconds, which fixed that problem.
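
The wait described above, with the timeout made a parameter instead of a fixed 10 seconds, could look roughly like this. It is a sketch in Python, not the SDM code: the function name is hypothetical, and treating a non-empty file as "ready" is an assumption, since a partially written file would also pass that check.

```python
import os
import time


def wait_for_file(path, timeout_s=10.0, poll_interval_s=0.1):
    """Poll until path exists and is non-empty.

    Returns True once the file is there, or False if timeout_s elapses first.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            if os.path.getsize(path) > 0:
                return True
        except OSError:
            pass  # file not created yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval_s)


if __name__ == "__main__":
    # "routing_file" is a placeholder path for illustration.
    if not wait_for_file("routing_file", timeout_s=1000.0):
        print("timed out waiting for the routing file")
```

Polling against a deadline lets a slow network lengthen the wait without adding a fixed delay when the file appears quickly.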

The second problem is that I get some sort of error message that I think is coming from gdb. I'm attaching the logs for both the child SDMs and the master SDMs.

The good news is that I'm making it much farther in SDM than I was a month ago when I last looked at this.


Dave

_______________________________________________
ptp-dev mailing list
ptp-dev@xxxxxxxxxxx
https://dev.eclipse.org/mailman/listinfo/ptp-dev

[attachment "sdm_child.txt" deleted by Dave Wootton/Poughkeepsie/IBM]
[attachment "sdm_master.txt" deleted by Dave Wootton/Poughkeepsie/IBM]

[attachment "debug_1021_log.txt" deleted by Dave Wootton/Poughkeepsie/IBM]