Re: [tcf-dev] Problem with ZeroCopy and flushing in TCF agent
Hi Eugene,
The scenario that causes problems looks something like this:
shared memory buffer initially contains [ooooo]
write header to shared memory buffer [Aoooo]
write data to shared memory buffer [Aaooo]
write data to shared memory buffer [Aaaoo]
write data to shared memory buffer [Aaaao]
write data to shared memory buffer [Aaaaa]
write_stream(TCF message header)
json_splice_binary_offset(shared memory buffer)
write_stream(MARKER_EOM)
write header to shared memory buffer [Baaaa]
write data to shared memory buffer [Bbaaa]
write data to shared memory buffer [Bbbaa]
write data to shared memory buffer [Bbbba]
write data to shared memory buffer [Bbbbb]
write_stream(TCF message header)
json_splice_binary_offset(shared memory buffer)
write_stream(MARKER_EOM)
...and so on...
On the receiving side we expect [Aaaaa] [Bbbbb] but we receive [Baaaa][Bbbbb].
The corruption occurs rarely with low traffic, but constantly under heavy load.
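The sequence above can be reproduced deterministically with a small single-threaded sketch. It is not TCF code: PendingRef, queue_by_reference(), deliver() and simulate_early_reuse() are hypothetical names, and deliver() merely stands in for the point at which the transport actually reads the shared bytes. It assumes the send holds a reference to the buffer rather than a copy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BUF_LEN 5

/* Hypothetical stand-in for a zero-copy send: it records where the bytes
 * live instead of copying them. */
typedef struct {
    const char *src;
    size_t len;
} PendingRef;

/* The shared memory buffer from the scenario, initially [ooooo]. */
static char shared_buf[BUF_LEN + 1] = "ooooo";

/* Queue a send by reference; no bytes are read yet. */
PendingRef queue_by_reference(const char *buf, size_t len) {
    PendingRef ref = { buf, len };
    return ref;
}

/* The bytes are only read here, when the transport gets around to them. */
void deliver(const PendingRef *ref, char *out, size_t out_size) {
    assert(ref->len < out_size);
    memcpy(out, ref->src, ref->len);
    out[ref->len] = '\0';
}

/* Replays the scenario: header B is written before message A is delivered. */
void simulate_early_reuse(char *first, char *second, size_t out_size) {
    memcpy(shared_buf, "Aaaaa", BUF_LEN);            /* buffer holds [Aaaaa] */
    PendingRef a = queue_by_reference(shared_buf, BUF_LEN);

    shared_buf[0] = 'B';                             /* buffer holds [Baaaa] */
    deliver(&a, first, out_size);                    /* A is read too late */

    memset(shared_buf + 1, 'b', BUF_LEN - 1);        /* buffer holds [Bbbbb] */
    PendingRef b = queue_by_reference(shared_buf, BUF_LEN);
    deliver(&b, second, out_size);
}
```

Calling simulate_early_reuse() with two 6-byte output buffers leaves "Baaaa" in the first and "Bbbbb" in the second, which matches what the receiving side reports.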
In tcp_splice_block_stream(), after the second splice() to the socket returns without error, are we absolutely sure that the shared bytes 'have left the building', even under high load?
I just tested adding a sleep() after every write_stream(MARKER_EOM), and the count drops from hundreds of corrupted messages to not a single one.
Thanks,
Patrick
On Thu, Jun 16, 2011 at 3:01 PM, Tarassov, Eugene
<eugene.tarassov@xxxxxxxxxxxxx> wrote:
I have run my tests for json_splice_binary_offset() and it worked fine.
Could you investigate and provide details on what exactly is not working in your case?
Thanks,
Eugene
Sent: Thursday, June 16, 2011 11:39 AM
To: TCF Development
Subject: Re: [tcf-dev] Problem with ZeroCopy and flushing in TCF agent
Hi Eugene,
I'm not sure why we need to flush before, but from my testing, it seems we would also need to flush after.
Otherwise the binary data can get corrupted in the time between the splice and the end-of-message flush that happens in another thread.
Patrick
On Thu, Jun 16, 2011 at 2:27 PM, Tarassov, Eugene
<eugene.tarassov@xxxxxxxxxxxxx> wrote:
Hi Patrick,
You should not need to call flush_stream().
json_splice_binary_offset() calls tcp_splice_block_stream(), which contains code to flush the channel before splicing into the socket:
    /* We need to flush the buffer then send our data */
    tcp_flush_with_flags(c, MSG_MORE);
#if ENABLE_OutputQueue
    while (!output_queue_is_empty(&c->out_queue)) {
        cancel_event(done_write_request, &c->wr_req, 1);
        done_write_request(&c->wr_req);
    }
#endif
The code looks correct to me.
Are you suggesting that this code might not work as expected?
Regards,
Eugene
________________________________
From: tcf-dev-bounces@xxxxxxxxxxx [mailto:tcf-dev-bounces@xxxxxxxxxxx] On Behalf Of Patrick Tasse
Sent: Thursday, June 16, 2011 11:07 AM
To: tcf-dev@xxxxxxxxxxx
Subject: [tcf-dev] Problem with ZeroCopy and flushing in TCF agent
Hi,
I am working on an agent plugin that requires the ZeroCopy service to be enabled in the TCF agent.
It writes an Event message to the channel's output stream using write_stream() for some message parameters and json_splice_binary_offset() for the binary data in the message. This method then performs a zero-copy splice.
The calling thread must be assured that the message is sent and that the zero-copied binary data is no longer needed before continuing. In previous code (which was based on an older version of TCF), it was calling the flush_stream() method, but this method does not exist anymore (instead, if I understand correctly, Event messages are automatically flushed once we write MARKER_EOM to the output stream).
However, the flushing mechanism in channel_tcp.c is to post a tcp_flush_event to the channel's event thread. The write_stream(out, MARKER_EOM) method can return before the flushing is even started. Therefore, the zero-copied binary data can be overwritten before the message is sent, and the TCF channel on the other side receives a corrupt message.
Is there a mechanism that an agent plugin could use to block until, or be notified that the flushing has completed?
If there is none, I could help to write and test a patch for it. But I would need advice on how it should be designed.
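One possible shape for such a notification is a completion callback that releases the shared buffer. The sketch below is only an illustration of the idea, not an existing TCF interface: SharedBuffer, PendingSend, buffer_try_write(), submit_zero_copy() and complete_send() are all hypothetical names, and complete_send() stands in for whatever point the channel would consider the spliced bytes safely handed off.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SHARED_BUF_SIZE 64

/* Hypothetical shared buffer that remembers whether a send still needs it. */
typedef struct {
    char data[SHARED_BUF_SIZE];
    bool in_flight;
} SharedBuffer;

typedef void (*SendDoneCallback)(void *arg);

/* Hypothetical record of a queued zero-copy send. */
typedef struct {
    SendDoneCallback on_done;
    void *arg;
} PendingSend;

/* Writes into the buffer only if no queued send still references it. */
bool buffer_try_write(SharedBuffer *buf, const char *src, size_t len) {
    assert(len <= SHARED_BUF_SIZE);
    if (buf->in_flight) return false;
    memcpy(buf->data, src, len);
    return true;
}

/* Completion callback: the buffer may be reused from now on. */
void release_buffer(void *arg) {
    ((SharedBuffer *)arg)->in_flight = false;
}

/* Marks the buffer busy and registers the callback that frees it again. */
void submit_zero_copy(PendingSend *send, SharedBuffer *buf) {
    buf->in_flight = true;
    send->on_done = release_buffer;
    send->arg = buf;
}

/* Stand-in for the transport reporting that the bytes have been sent. */
void complete_send(PendingSend *send) {
    if (send->on_done != NULL) {
        send->on_done(send->arg);
        send->on_done = NULL;
    }
}
```

With this arrangement, a second buffer_try_write() issued between submit_zero_copy() and complete_send() returns false and leaves the buffer untouched, so the plugin can either wait or switch to another buffer instead of overwriting data that is still referenced.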
Thanks,
Patrick
(by the way the link to the tcf-dev mailing list is not working on this page:
http://www.eclipse.org/projects/project_summary.php?projectid=tools.cdt.tcf)
_______________________________________________
tcf-dev mailing list
tcf-dev@xxxxxxxxxxx
http://dev.eclipse.org/mailman/listinfo/tcf-dev