Re: [sumo-user] Unspecified Fatal Error and blank trace logging outputs

Hi Marcelo, hi Jakob,

thanks for the backtraces (they look good)

The problem in this scenario is that MSVehicle::getBoundingBox (this=0x0) is called on a null object from this loop:

        for (AnyVehicleIterator veh = anyVehiclesBegin(); veh != anyVehiclesEnd(); ++veh) {
            MSVehicle* collider = const_cast<MSVehicle*>(*veh);
            //std::cout << "   collider " << collider->getID() << "\n";
            PositionVector colliderBoundary = collider->getBoundingBox();

Thread 1 (Thread 0x7fb4974cd780 (LWP 12544)):
#0  0x0000561970425dcc in MSVehicle::getBoundingBox (this=0x0) at /app/sumo-git/src/microsim/MSVehicle.cpp:5925
#1  0x00005619704c23f5 in MSLane::detectCollisions (this=0x561972d88020, timestep=947000, stage="move") at /app/sumo-git/src/microsim/MSLane.cpp:1358
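So the AnyVehicleIterator yields a null vehicle pointer at this point, presumably as a side effect of the parallel simulation. As a minimal sketch only (not the actual fix, which would have to address why the lane's vehicle containers hold a null pointer in the first place), a defensive guard in that loop could look like this:

        for (AnyVehicleIterator veh = anyVehiclesBegin(); veh != anyVehiclesEnd(); ++veh) {
            MSVehicle* collider = const_cast<MSVehicle*>(*veh);
            // guard against null entries to avoid the segfault shown in the backtrace;
            // the open question is how a null pointer got into the container at all
            if (collider == nullptr) {
                continue;
            }
            PositionVector colliderBoundary = collider->getBoundingBox();
            // ... remainder of the collision detection loop unchanged ...
        }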

Regards, Harald

On 03.03.21 at 13:35, Jakob Erdmann wrote:
Parallelization is a typical source of bugs that are triggered only rarely and seemingly at random. Please try running without setting any parallelization options and check whether the issue persists.
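For example, if your sumo command line sets a threading option such as --threads (assuming that is the parallelization option in use here), drop it so the simulation runs single-threaded, then check whether the crash and the empty traceFile still occur.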
What sumo options were you using?

On Wed, 3 March 2021 at 12:19, Marcelo Andrade Rodrigues D Almeida <md@xxxxxxxxx> wrote:

Here it is


[Attachment: Screenshot from 2021-03-03 08-16-46.png]
(gdb) thread apply all bt

Thread 5 (Thread 0x7fb486f1e700 (LWP 12551)):
#0  0x00007fb496160ad3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x56197311b860) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1  __pthread_cond_wait_common (abstime=0x0, mutex=0x56197311b778, cond=0x56197311b838) at pthread_cond_wait.c:502
#2  __pthread_cond_wait (cond=0x56197311b838, mutex=0x56197311b778) at pthread_cond_wait.c:655
#3  0x00005619704a5196 in FXWorkerThread::run (this=0x56197311b760) at /app/sumo-git/src/utils/foxtools/FXWorkerThread.h:338
#4  0x00007fb4965bdd4f in FX::FXThread::execute(void*) () from /usr/lib/x86_64-linux-gnu/libFOX-1.6.so.0
#5  0x00007fb49615a6db in start_thread (arg=0x7fb486f1e700) at pthread_create.c:463
#6  0x00007fb49554471f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 4 (Thread 0x7fb47ef1c700 (LWP 12553)):
#0  0x00007fb496160ad3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x56197311be00) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1  __pthread_cond_wait_common (abstime=0x0, mutex=0x56197311bd18, cond=0x56197311bdd8) at pthread_cond_wait.c:502
#2  __pthread_cond_wait (cond=0x56197311bdd8, mutex=0x56197311bd18) at pthread_cond_wait.c:655
#3  0x00005619704a5196 in FXWorkerThread::run (this=0x56197311bd00) at /app/sumo-git/src/utils/foxtools/FXWorkerThread.h:338
#4  0x00007fb4965bdd4f in FX::FXThread::execute(void*) () from /usr/lib/x86_64-linux-gnu/libFOX-1.6.so.0
#5  0x00007fb49615a6db in start_thread (arg=0x7fb47ef1c700) at pthread_create.c:463
#6  0x00007fb49554471f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 3 (Thread 0x7fb48af1f700 (LWP 12550)):
#0  0x00007fb496160ad3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x56197301a230) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1  __pthread_cond_wait_common (abstime=0x0, mutex=0x56197301a148, cond=0x56197301a208) at pthread_cond_wait.c:502
#2  __pthread_cond_wait (cond=0x56197301a208, mutex=0x56197301a148) at pthread_cond_wait.c:655
#3  0x00005619704a5196 in FXWorkerThread::run (this=0x56197301a130) at /app/sumo-git/src/utils/foxtools/FXWorkerThread.h:338
#4  0x00007fb4965bdd4f in FX::FXThread::execute(void*) () from /usr/lib/x86_64-linux-gnu/libFOX-1.6.so.0
#5  0x00007fb49615a6db in start_thread (arg=0x7fb48af1f700) at pthread_create.c:463
#6  0x00007fb49554471f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 2 (Thread 0x7fb482f1d700 (LWP 12552)):
#0  0x00007fb496160ad3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x56197311bb30) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1  __pthread_cond_wait_common (abstime=0x0, mutex=0x56197311ba48, cond=0x56197311bb08) at pthread_cond_wait.c:502
#2  __pthread_cond_wait (cond=0x56197311bb08, mutex=0x56197311ba48) at pthread_cond_wait.c:655
#3  0x00005619704a5196 in FXWorkerThread::run (this=0x56197311ba30) at /app/sumo-git/src/utils/foxtools/FXWorkerThread.h:338
#4  0x00007fb4965bdd4f in FX::FXThread::execute(void*) () from /usr/lib/x86_64-linux-gnu/libFOX-1.6.so.0
#5  0x00007fb49615a6db in start_thread (arg=0x7fb482f1d700) at pthread_create.c:463
#6  0x00007fb49554471f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 1 (Thread 0x7fb4974cd780 (LWP 12544)):
#0  0x0000561970425dcc in MSVehicle::getBoundingBox (this=0x0) at /app/sumo-git/src/microsim/MSVehicle.cpp:5925
#1  0x00005619704c23f5 in MSLane::detectCollisions (this=0x561972d88020, timestep=947000, stage="move") at /app/sumo-git/src/microsim/MSLane.cpp:1358
#2  0x00005619704a44af in MSEdgeControl::detectCollisions (this=0x561973113e10, timestep=947000, stage="move") at /app/sumo-git/src/microsim/MSEdgeControl.cpp:339
#3  0x000056197037d531 in MSNet::simulationStep (this=0x561972b2fd20) at /app/sumo-git/src/microsim/MSNet.cpp:636
#4  0x000056197037a18b in MSNet::simulate (this=0x561972b2fd20, start=0, stop=-1000) at /app/sumo-git/src/microsim/MSNet.cpp:378
#5  0x0000561970376ab8 in main (argc=31, argv=0x7ffcfc014bb8) at /app/sumo-git/src/sumo_main.cpp:98
(gdb)



On Tue, Mar 2, 2021 at 3:33 PM Harald Schaefer <fechsaer@xxxxxxxxx> wrote:

Hi Marcelo,

In your example, sumo runs in 5 threads (or lightweight processes, LWPs); google for gdb and threads.

What is the output of info threads?

You can switch between the threads by typing

thread n

You should go to the "right" thread and execute bt there.
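A minimal example session (the thread number is just a placeholder; pick the thread that crashed):

(gdb) info threads
(gdb) thread 1
(gdb) bt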

I think you must ensure that the core file was generated by the same binary that you use with gdb.

Harald

On 02.03.21 at 19:18, Marcelo Andrade Rodrigues D Almeida wrote:
Now I received

"warning: exec file is newer than core file."

I included gdb in the requirements, and the sumo installation came after this point...

I guess I need to retry from scratch with a new core file and a prepared environment


On Tue, Mar 2, 2021 at 3:10 PM Marcelo Andrade Rodrigues D Almeida <md@xxxxxxxxx> wrote:
I think I forgot to undo the debug flag in the last build...  be right back

On Tue, Mar 2, 2021 at 3:04 PM Marcelo Andrade Rodrigues D Almeida <md@xxxxxxxxx> wrote:
The problem was that neither the remote server nor my docker image had gdb installed, so I pulled the core file to my computer.

I have now created a new image with the exact environment and gdb installed to test it. Inspecting sumo-git, I couldn't find any sumo binary, only sumoD.

[Attachment: Screenshot from 2021-03-02 15-00-52.png]

I tried to run with sumoD anyway, but it didn't make any difference (besides the warning and successfully reading the symbols)
[Attachment: Screenshot from 2021-03-02 14-59-41.png]




On Tue, Mar 2, 2021 at 2:25 PM Harald Schaefer <fechsaer@xxxxxxxxx> wrote:

Hi Marcelo,

the name of the binary reported by gdb and the name you gave as an argument to gdb do not match:

You called gdb with ../sumo/bin/sumo

but the core was created by sumo-git/bin/sumo

Greetings, Harald

On 02.03.21 at 18:18, Jakob Erdmann wrote:
Unfortunately, this dump is not very helpful. I'm not sure why that is, because live gdb sessions of the release build usually include at least method names. You could try to build the debug version and trigger the crash with that.
Another suggestion would be to try and trigger the crash without the use of multiprocessing (and also to check whether this fixes traceFile generation).

On Tue, 2 March 2021 at 18:07, Marcelo Andrade Rodrigues D Almeida <md@xxxxxxxxx> wrote:
This is what I found


[Attachment: Screenshot from 2021-03-02 14-05-24.png]

(base) marcelo@Lenovo-Legion-5-15IMH05H:~/code/temp$ gdb ../sumo/bin/sumo core
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ../sumo/bin/sumo...
(No debugging symbols found in ../sumo/bin/sumo)
[New LWP 2143]
[New LWP 2144]
[New LWP 2145]
[New LWP 2147]
[New LWP 2146]
Core was generated by `sumo-git/bin/sumo -n /app/scenario/experimental/Bologna_small-0.29.0/joined/joi'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000055f07528f7a6 in ?? ()
[Current thread is 1 (LWP 2143)]
(gdb) bt
#0  0x000055f07528f7a6 in ?? ()
#1  0x3fde3c2e82e54800 in ?? ()
#2  0x4023ebf47ba9bb80 in ?? ()
#3  0x000055f075b70740 in ?? ()
#4  0x0000000000000000 in ?? ()
(gdb)


On Tue, Mar 2, 2021 at 8:59 AM Marcelo Andrade Rodrigues D Almeida <md@xxxxxxxxx> wrote:
"Could it be that multiple processes are writing to the same traceFile?
I recommend investigation on this front because reproducing the crash in isolation will probably be necessary to fix it."

Unfortunately no. self.path_to_log points to each execution's own path. I can check the files and they are all empty

"what you can try is to enable core dumps in your shell"

Thank you, I'm going to try this


Sincerely,

Marcelo d'Almeida


On Tue, Mar 2, 2021 at 6:30 AM Jakob Erdmann <namdre.sumo@xxxxxxxxx> wrote:
Could it be that multiple processes are writing to the same traceFile?
I recommend investigation on this front because reproducing the crash in isolation will probably be necessary to fix it.

On Mon, 1 March 2021 at 22:53, Harald Schaefer <fechsaer@xxxxxxxxx> wrote:

Hi Marcelo,

what you can try is to enable core dumps in your shell

    ulimit -c unlimited

Then run your test series.

The corefile might be very large, depending on your scenario size.

At the end you should have a file named core in your current working directory.

You can examine this file by

    gdb <path to sumo-bin> core

and type e.g. bt

The stack trace might help the developers of SUMO.

Greetings, Harald

On 01.03.21 at 17:22, Marcelo Andrade Rodrigues D Almeida wrote:
Hi everyone

I'm running traffic light control experiments in the Bologna (joined) scenario, and from time to time I encounter an unspecified fatal error (shown below).

I'm trying to debug it, but:
- Logging the commands generates blank outputs (even with traceGetters enabled)

        trace_file_path = ROOT_DIR + '/' + self.path_to_log + '/' + 'trace_file_log.txt'
        traci.start(sumo_cmd_str, label=self.execution_name, traceFile=trace_file_path, traceGetters=True)


A trivial trace (logging) example works fine, though.

- Debugging the traci sessions is not viable since I cannot tell when the error is going to occur (I have to run the scenario 1600 times total per experiment)



I also updated sumo to the latest nightly build, but with no success.

Is there anything I can try? I'm out of options here

Thank you in advance


Sincerely,

Marcelo d'Almeida


Error:
Process Process-1:22:
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "traffic-light-optimization/algorithm/frap_pub/pipeline.py", line 104, in generator_wrapper
    generator.generate()
  File "traffic-light-optimization/algorithm/frap_pub/generator.py", line 121, in generate
    next_state, reward, done, steps_iterated, next_action = self.env.step(action_list)
  File "traffic-light-optimization/algorithm/frap_pub/sumo_env.py", line 514, in step
    self._inner_step(action)
  File "traffic-light-optimization/algorithm/frap_pub/sumo_env.py", line 559, in _inner_step
    traci_connection.simulationStep()
  File "sumo-git/tools/traci/connection.py", line 302, in simulationStep
    result = self._sendCmd(tc.CMD_SIMSTEP, None, None, "D", step)
  File "sumo-git/tools/traci/connection.py", line 180, in _sendCmd
    return self._sendExact()
  File "sumo-git/tools/traci/connection.py", line 90, in _sendExact
    raise FatalTraCIError("connection closed by SUMO")
traci.exceptions.FatalTraCIError: connection closed by SUMO

_______________________________________________
sumo-user mailing list
sumo-user@xxxxxxxxxxx
To unsubscribe from this list, visit https://www.eclipse.org/mailman/listinfo/sumo-user
