Re: [milo-dev] Netty LEAK message

The password for 20171221124012.zip is "kchQfwAVJaVI1KsR".

<ishida_shigeru@xxxxxxxxxxx> wrote, Thu, 21 Dec 2017 12:40:06 +0900 (JST)

> For reference, I am sending a GCViewer image. The upper graph is from the run
> that caused the OOME; the lower graph is from the current run.
> 
> Please ignore the time in the upper corner of each graph as it is not
> the correct time.
> 
> The server code is also substantially the same, and the way the clients load
> it is unchanged.
> 
> For the time being, I will keep watching the situation.
> 
> <ishida_shigeru@xxxxxxxxxxx> wrote, Thu, 21 Dec 2017 10:57:17 +0900 (JST)
> 
> > I removed the concurrent GC options, added -XX:+HeapDumpOnOutOfMemoryError,
> > and one day has passed; it is running smoothly.
> > 
> > Last time, the OOME appeared after about 9 hours; this time the GC log shows
> > the first Full GC at about 7 hours and the second Full GC at about 14 hours.
> > 
> > Since Used Tenured Heap has dropped back to about the same level at each
> > Full GC, it is doing fine now.
> > 
> > When the OOME occurred after about 9 hours last time, Used Tenured Heap rose
> > suddenly about 1.5 hours after the first Full GC, and the OOME looked like a leak.
> > 
> > The only difference in JVM options since the last OOME is that I added
> > -XX:+HeapDumpOnOutOfMemoryError. The only difference in the code is that I
> > merged e472766. However, since the clients connect with SecurityPolicy=None,
> > I do not think merging e472766 matters.
> > 
> > Compared with the previous OOME run, it is strange that it is now running
> > smoothly, but I will keep watching it as is.
> > 
> > If I can get a heap dump, I will analyze it with MAT.
> > 
> > <kevinherron@xxxxxxxxx> wrote, Tue, 19 Dec 2017 05:51:35 -0800
> > 
> > > If there are no more messages from Netty but you still get the "GC overhead
> > > limit exceeded" warning then it's possible we should be searching for a
> > > traditional heap memory leak rather than a native memory leak caused by
> > > leaking Netty buffers.
> > > 
> > > In that case, using a profiler and turning on the HeapDumpOnOOM JVM flag
> > > may help find it.
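(For reference, one typical way to turn that on; the dump path below is only an illustrative choice:)

    -XX:+HeapDumpOnOutOfMemoryError
    -XX:HeapDumpPath=/tmp/milo-server.hprof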
> > > 
> > > On Mon, Dec 18, 2017 at 9:57 PM, Shigeru Ishida <ishida_shigeru@xxxxxxxxxxx>
> > > wrote:
> > > 
> > > > Additional information.
> > > >
> > > > The operating environment is as follows.
> > > >
> > > > OS: CentOS Linux release 7.3.1611 (Core)
> > > > CPUs: 4
> > > > Mem: 4 GB
> > > > Java: OpenJDK 64-Bit Server VM (build 25.131-b12, mixed mode)
> > > >
> > > > This server accepts requests from 6 clients with the following settings:
> > > >
> > > > - Publishing interval 200 (msec)
> > > > - Sampling interval 100 (msec)
> > > > - NodeId items 350
> > > > - Number of concurrent clients 6
> > > >
> > > > The change from last night is that concurrent GC is now enabled, with the
> > > > following options:
> > > >
> > > > -XX:+UseConcMarkSweepGC
> > > > -XX:+CMSParallelRemarkEnabled
> > > > -XX:+UseParNewGC
> > > >
> > > > As a result, the upward trend of Used Tenured Heap has decreased drastically.
> > > > Concurrent GC tends to make Old generation usage rise more easily, but in this
> > > > case it decreased instead.
> > > >
> > > > In addition, Used Young Heap for the Young Generation has remained at around
> > > > 10% and is constant at the moment.
> > > >
> > > > Even if an OutOfMemoryError does occur, at this rate it will be about a week
> > > > from now.
> > > >
> > > > <ishida_shigeru@xxxxxxxxxxx> wrote, Tue, 19 Dec 2017 13:39:39 +0900 (JST)
> > > >
> > > > > OK.
> > > > >
> > > > > Yesterday I updated Milo to 0.1.6 and Netty to 4.0.54 and restarted.
> > > > > After about 9 hours,
> > > > >
> > > > >   java.lang.OutOfMemoryError: GC overhead limit exceeded
> > > > >
> > > > > was logged, and the GC log in GCViewer looks like a leak.
> > > > > Anyway, I changed the memory & GC parameters and restarted it.
> > > > >
> > > > > I will continue to investigate.
> > > > >
> > > > > <kevinherron@xxxxxxxxx> wrote, Mon, 18 Dec 2017 06:31:39 -0800
> > > > >
> > > > > > Yes, you're right, that buffer would leak if the exception was thrown.
> > > > > >
> > > > > > I've already released 0.1.6 so I'll fix it in 0.2.0. Fortunately I don't
> > > > > > think that exception is likely to ever be thrown, so it would only leak
> > > > > > under the rarest of circumstances.
> > > > > >
> > > > > > On Mon, Dec 18, 2017 at 12:14 AM, Shigeru Ishida <ishida_shigeru@xxxxxxxxxxx> wrote:
> > > > > >
> > > > > > > Hi Kevin,
> > > > > > >
> > > > > > > I looked at the code of Milo version 0.1.6. There is one point I am
> > > > > > > worried about.
> > > > > > >
> > > > > > > ChunkDecoder#decryptChunk()
> > > > > > > L177: ByteBuf plainTextBuffer = BufferUtil.buffer(plainTextBufferSize);
> > > > > > >
> > > > > > > Doesn't this mean that release() will not be called on the plainTextBuffer
> > > > > > > obtained above if an exception is thrown at L200?
> > > > > > >
> > > > > > > I am sorry if I misunderstood.
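(For illustration only, a minimal sketch of the release-on-exception pattern being
discussed here; the method is hypothetical and uses Netty's Unpooled allocator as a
stand-in for BufferUtil.buffer(). It is not the actual Milo code.)

    import io.netty.buffer.ByteBuf;
    import io.netty.buffer.Unpooled;

    class DecryptSketch {

        // Allocate the plaintext buffer and make sure it is released if the
        // decrypt step throws, so the allocation cannot leak on an exception.
        static ByteBuf decryptChunk(ByteBuf chunkBuffer, int plainTextBufferSize) {
            ByteBuf plainTextBuffer = Unpooled.buffer(plainTextBufferSize);
            boolean success = false;
            try {
                plainTextBuffer.writeBytes(chunkBuffer); // placeholder for the real decrypt
                success = true;
                return plainTextBuffer;
            } finally {
                if (!success) {
                    plainTextBuffer.release(); // without this, the buffer would leak
                }
            }
        }
    }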
> > > > > > >
> > > > > > > Regards,
> > > > > > >
> > > > > > > --Shigeru
> > > > > > >
> > > > > > > <kevinherron@xxxxxxxxx> wrote, Sat, 16 Dec 2017 10:51:16 -0800
> > > > > > >
> > > > > > > > Leak fixes have been merged into master (already fixed in dev/0.2.x).
> > > > > > > >
> > > > > > > > I'm releasing 0.1.6 this weekend. It will be the last of the 0.1 series.
> > > > > > > >
> > > > > > > > 0.2.0 will be released by the end of next week, if not this weekend.
> > > > > > > >
> > > > > > > > On Thu, Dec 14, 2017 at 7:19 AM, Kevin Herron <kevinherron@xxxxxxxxx> wrote:
> > > > > > > >
> > > > > > > > >  > As one goal, I expect to run it continuously and stably for about
> > > > > > > > >  > two months.
> > > > > > > > >
> > > > > > > > > Yes, of course. I would expect it to run stable indefinitely. Thanks for
> > > > > > > > > testing.
> > > > > > > > >
> > > > > > > > > The focus of the 0.3 release is going to be on the server SDK.
> > > > > > > > >
> > > > > > > > > On Wed, Dec 13, 2017 at 8:43 PM, Shigeru Ishida <
> > > > > > > > > ishida_shigeru@xxxxxxxxxxx> wrote:
> > > > > > > > >
> > > > > > > > >> One day has passed, and it does not look like a leak from the GC log.
> > > > > > > > >> I changed the settings to output a more detailed GC log and restarted it.
> > > > > > > > >>
> > > > > > > > >> As one goal, I expect to run it continuously and stably for about two
> > > > > > > > >> months.
> > > > > > > > >>
> > > > > > > > >> <kevinherron@xxxxxxxxx> wrote, Wed, 13 Dec 2017 05:21:01 -0800
> > > > > > > > >>
> > > > > > > > >> > Okay, interesting.
> > > > > > > > >> >
> > > > > > > > >> > The other place leaks were fixed was the re-implementation of
> > > > > > > > >> > ChunkEncoder and ChunkDecoder in dev/0.2.x.
> > > > > > > > >> >
> > > > > > > > >> > You can see when encoding/decoding fails they clean up:
> > > > > > > > >> >
> > > > > > > > >> > https://github.com/eclipse/milo/blob/37bee220ae53026db87328cfbb0b664671bff071/opc-ua-stack/stack-core/src/main/java/org/eclipse/milo/opcua/stack/core/channel/ChunkEncoder.java#L86-L93
> > > > > > > > >> > https://github.com/eclipse/milo/blob/37bee220ae53026db87328cfbb0b664671bff071/opc-ua-stack/stack-core/src/main/java/org/eclipse/milo/opcua/stack/core/channel/ChunkDecoder.java#L82-L92
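(A rough sketch of the clean-up-on-failure idea those links show; the buffer list and
method names here are hypothetical placeholders, not the actual Milo code.)

    import io.netty.buffer.ByteBuf;
    import io.netty.util.ReferenceCountUtil;
    import java.util.List;

    class ChunkCleanupSketch {

        // If decoding fails partway through, release every buffer accumulated so
        // far before propagating the error, so no ByteBuf is left un-released.
        static ByteBuf decodeAll(List<ByteBuf> chunkBuffers) {
            try {
                return doDecode(chunkBuffers); // placeholder for the real decode loop
            } catch (Exception e) {
                chunkBuffers.forEach(ReferenceCountUtil::safeRelease);
                throw new RuntimeException("decoding failed", e);
            }
        }

        private static ByteBuf doDecode(List<ByteBuf> chunkBuffers) {
            throw new UnsupportedOperationException("decode body omitted in this sketch");
        }
    }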
> > > > > > > > >> >
> > > > > > > > >> > This won't be a straight-forward cherry pick like the other changes.
> > > > > > > > >> >
> > > > > > > > >> > Keep me updated.
> > > > > > > > >> >
> > > > > > > > >> >
> > > > > > > > >> > On Tue, Dec 12, 2017 at 10:03 PM, Shigeru Ishida <ishida_shigeru@xxxxxxxxxxx> wrote:
> > > > > > > > >> >
> > > > > > > > >> > > Hi Kevin,
> > > > > > > > >> > >
> > > > > > > > >> > > I merged the following commits from dev/0.2.x:
> > > > > > > > >> > >
> > > > > > > > >> > > - 2017/12/09 c1b6978
> > > > > > > > >> > > - 2017/12/13 e4e5915
> > > > > > > > >> > >
> > > > > > > > >> > > Furthermore, I replaced ChunkDecoder.java with the one from
> > > > > > > > >> > > https://github.com/eclipse/milo/tree/release-buffers and ran it.
> > > > > > > > >> > >
> > > > > > > > >> > > The JVM options are the same as before. I also enabled the GC log.
> > > > > > > > >> > >
> > > > > > > > >> > > -Dio.netty.leakDetection.maxRecords=1000
> > > > > > > > >> > > -Dio.netty.leakDetection.level=paranoid
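(For reference, a minimal sketch: the same paranoid leak detection can also be enabled
programmatically through Netty's ResourceLeakDetector, preferably early at startup,
instead of via the -D system property above.)

    import io.netty.util.ResourceLeakDetector;

    class LeakDetectionSetup {

        static void enableParanoidLeakDetection() {
            // Equivalent in effect to -Dio.netty.leakDetection.level=paranoid:
            // track every buffer allocation and report any that are leaked.
            ResourceLeakDetector.setLevel(ResourceLeakDetector.Level.PARANOID);
        }
    }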
> > > > > > > > >> > >
> > > > > > > > >> > > Also, the concurrent client settings are the same:
> > > > > > > > >> > >
> > > > > > > > >> > > - Publishing interval 200 (msec)
> > > > > > > > >> > > - Sampling interval 100 (msec)
> > > > > > > > >> > > - NodeId items 350
> > > > > > > > >> > > - Number of concurrent clients 6
> > > > > > > > >> > >
> > > > > > > > >> > > After 1.5 hours the heap usage was still increasing as before, and
> > > > > > > > >> > > based on past experience the LEAK message would probably appear in
> > > > > > > > >> > > about 4 hours if that trend continued.
> > > > > > > > >> > >
> > > > > > > > >> > > So I changed from Netty 4.1.4 to 4.0.53 (the latest version of 4.0.x).
> > > > > > > > >> > > The frequency of Scavenge GC / Full GC dropped drastically, and the heap
> > > > > > > > >> > > stopped growing after about 8 minutes.
> > > > > > > > >> > >
> > > > > > > > >> > > It has only been running for about 1.5 hours so far, but it looks good.
> > > > > > > > >> > >
> > > > > > > > >> > > It is too early to draw a conclusion, so I will keep it running as is
> > > > > > > > >> > > and watch the situation.
> > > > > > > > >> > >
> > > > > > > > >> > > Also, since this mail was getting too big, I trimmed the quoted mails
> > > > > > > > >> > > except for the most recent ones.
> > > > > > > > >> > >
> > > > > > > > >> > > Regards,
> > > > > > > > >> > >
> > > > > > > > >> > > --Shigeru
> > > > > > > > >> > >
> > > > > > > > >> > > <kevinherron@xxxxxxxxx> wrote, Tue, 12 Dec 2017 17:16:37 -0800
> > > > > > > > >> > >
> > > > > > > > >> > > > I think I've squashed all of the leaks that occur under exceptional
> > > > > > > > >> > > > circumstances in the dev/0.2.x branch:
> > > > > > > > >> > > > https://github.com/eclipse/milo/tree/dev/0.2.x
> > > > > > > > >> > > >
> > > > > > > > >> > > > It's a little difficult to cherry pick the fixes because the 0.2.x
> > > > > > > > >> > > > branch has significant changes.

