File descriptors leak [message #1839865] |
Tue, 30 March 2021 14:34 |
Alfredo Quesada Messages: 5 Registered: July 2009 |
Junior Member |
After having detected a problem with a C app I developed that uses the mosquitto library, I finally found where the problem is and how to force it.
Attached there's a simple test source file that I've used to show how to force the problem (I've skipped some checks, that was not the point).
First of all, my scenario includes a Raspberry Pi running Raspberry Pi OS (the new name for Raspbian) Buster (2021-01-11 / kernel 5.4.83 and libmosquitto1 1.5.7-1+deb10u1), connected through a switch to my PC, which has 2 network interfaces. The PC hosts the mosquitto server and a DHCP server (DHCP for the internal network including the RBPI).
Under normal circumstances everything works fine and the file descriptors related to the socket used by the library are closed and reused once it's time to reconnect. If I just stop the mosquitto server, the TCP/SYN gets no response and the TCP layer works fine.
However, if I bring down the internal interface on the PC right after stopping mosquitto, in some cases the socket isn't fully closed and the file descriptor remains open. As a result, a new call to mosquitto_connect_async (and indirectly to socket) returns a new file descriptor instead of reusing the old one.
The steps to follow to reproduce the problem are these:
- Start mosquitto (server) with a configuration that allows anonymous users to connect.
- Start the app (RBPI).
- Wait until the connection is established. You should see a debug message main: WAITING_ACK_CONNECT / returnCode 0.
- In another terminal get the PID of the app and execute sudo lsof -p $PID. You should see something like this:
app 8039 pi 5u IPv4 55275 0t0 TCP 192.168.1.100:58976->192.168.1.88:1883 (ESTABLISHED)
- Stop mosquitto and disable the internal interface at the server with sudo ip link set dev ethX down.
- In the output of the app you should see it trying to reconnect, and in some cases you should see mosquitto_connect_async getting a new file descriptor. If you don't, just repeat the process (enable the interface at the PC, start mosquitto again, wait until the app connects, wait a while, then stop mosquitto and disable the interface).
- Once this happens, you can check lsof again and you'll see now multiple entries for those file descriptors that haven't been closed.
Is there any way to fix this or prevent this from happening? As you can imagine, eventually the app runs out of file descriptors and you get an errno 24 (EMFILE -> Too many open files) when you call mosquitto_connect_async. After that there isn't much more you can do, at least not in a thread-safe way.
Regards
Attachment: main.c (Size: 3.37KB)
[Updated on: Tue, 30 March 2021 17:14]
Re: File descriptors leak [message #1839963 is a reply to message #1839962] |
Fri, 02 April 2021 10:29 |
Alfredo Quesada Messages: 5 Registered: July 2009 |
Junior Member |
That's good to hear, thank you :)
Just out of curiosity, where was the problem? Although I didn't properly debug the library because I didn't want to have to recompile it, I analyzed the main parts of the source code and they looked correct. I was starting to think there was a problem with close, since it may not close the file descriptor in certain cases, as stated in the man page.
Well, there's actually a well-known potential leak point there that can't always be fixed as it depends on the implementation and POSIX is not 100% strict on this function's behavior.
Regards
[Updated on: Fri, 02 April 2021 10:30]
Re: File descriptors leak [message #1839985 is a reply to message #1839962] |
Fri, 02 April 2021 21:35 |
Alfredo Quesada Messages: 5 Registered: July 2009 |
Junior Member |
Ok, I just took a look at the patch and I understand the problem. Because a temporary variable was used, the socket it referred to wouldn't be closed if net__try_connect didn't return MOSQ_ERR_SUCCESS.
However, this should have happened (and it was indeed something to be fixed) whenever the server was out of reach, including those cases when it was just offline. IIRC I managed to reproduce the problem only with the put-the-link-down thing, and just stopping the server wasn't enough. I'll try to recheck it next week when I'm back at work in order to confirm it.
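The temporary-variable pattern described above can be sketched roughly as follows. This is not the actual libmosquitto code or patch: try_connect_stub, struct client, connect_leaky and connect_fixed are all hypothetical names, and the stub only simulates a connect helper that opens a socket and then fails.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical client state holding the socket in use. */
struct client { int sock; };

/* Hypothetical stand-in for a connect helper: it opens a socket and
 * then reports failure on request, leaving *sock open for the caller. */
static int try_connect_stub(int *sock, int simulate_failure)
{
    *sock = socket(AF_INET, SOCK_STREAM, 0);
    if (*sock < 0)
        return -1;
    return simulate_failure ? -1 : 0;
}

/* Leaky variant: on failure the descriptor held in tmp is forgotten. */
static int connect_leaky(struct client *c, int simulate_failure)
{
    int tmp = -1;
    int rc = try_connect_stub(&tmp, simulate_failure);
    if (rc == 0)
        c->sock = tmp;
    return rc;                 /* tmp is lost here when rc != 0 */
}

/* Fixed variant: the temporary socket is closed before returning. */
static int connect_fixed(struct client *c, int simulate_failure)
{
    int tmp = -1;
    int rc = try_connect_stub(&tmp, simulate_failure);
    if (rc != 0) {
        if (tmp >= 0)
            close(tmp);        /* release the half-set-up socket */
        return rc;
    }
    c->sock = tmp;
    return 0;
}
```

With the leaky variant every failed attempt costs one descriptor, which matches the growing list of entries seen in lsof.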
Regards
[Updated on: Fri, 02 April 2021 23:24]