Re: [milo-dev] Netty LEAK message

I removed the concurrent GC options and added -XX:+HeapDumpOnOutOfMemoryError.
One day has passed and it is running smoothly.

Last time, the OOME appeared after about 9 hours; this time, the GC log shows
the first Full GC at about 7 hours and the second Full GC at about 14 hours.

Since the Used Tenured Heap has dropped back to the same level at each Full GC,
it is doing fine for now.

When the OOME occurred after about 9 hours last time, the Used Tenured Heap rose
suddenly about 1.5 hours after the first Full GC, and the OOME looked like a leak.

The only JVM option changed since the last OOME occurrence is that I added
-XX:+HeapDumpOnOutOfMemoryError. The only code difference is that I merged
e472766. However, since the clients connect with SecurityPolicy=None, I don't
think merging e472766 matters.

Compared with the previous run that ended in OOME, it is strange that it is now
running smoothly, but I will keep watching it as is.

If I can get a heap dump, I will analyze it with MAT.
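
For reference, if the OOME does not recur, a heap dump for MAT can also be taken
manually with jmap while the server is running (the PID and file name below are
placeholders):

  jmap -dump:live,format=b,file=milo-heap.hprof <PID>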

<kevinherron@xxxxxxxxx> wrote, Tue, 19 Dec 2017 05:51:35 -0800

> If there are no more messages from Netty but you still get the "GC overhead
> limit exceeded" warning then it's possible we should be searching for a
> traditional heap memory leak rather than a native memory leak caused by
> leaking Netty buffers.
> 
> In that case, using a profiler and turning on the HeapDumpOnOOM JVM flag
> may help find it.
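> 
> For example, something like this (the dump path is just a placeholder):
> 
>   -XX:+HeapDumpOnOutOfMemoryError
>   -XX:HeapDumpPath=/path/to/dumps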
> 
> On Mon, Dec 18, 2017 at 9:57 PM, Shigeru Ishida <ishida_shigeru@xxxxxxxxxxx>
> wrote:
> 
> > Additional information.
> >
> > The operating environment is as follows.
> >
> > OS: CentOS Linux release 7.3.1611 (Core)
> > CPUs: 4
> > Mem: 4 GB
> > Java: OpenJDK 64-Bit Server VM (build 25.131-b12, mixed mode)
> >
> > The server accepts requests from 6 clients with the following settings
> > (a sketch of one client connection follows the list):
> >
> > - Publishing interval: 200 ms
> > - Sampling interval: 100 ms
> > - NodeId items: 350
> > - Number of concurrent clients: 6
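> >
> > For reference, a sketch of how one of these client connections might be
> > set up, modeled on Milo's SubscriptionExample (the NodeId, client handle,
> > and queue size below are placeholders, not our actual values):
> >
> >   import static org.eclipse.milo.opcua.stack.core.types.builtin.unsigned.Unsigned.uint;
> >
> >   import java.util.Collections;
> >   import org.eclipse.milo.opcua.sdk.client.OpcUaClient;
> >   import org.eclipse.milo.opcua.sdk.client.api.subscriptions.UaSubscription;
> >   import org.eclipse.milo.opcua.stack.core.AttributeId;
> >   import org.eclipse.milo.opcua.stack.core.types.builtin.NodeId;
> >   import org.eclipse.milo.opcua.stack.core.types.builtin.QualifiedName;
> >   import org.eclipse.milo.opcua.stack.core.types.enumerated.MonitoringMode;
> >   import org.eclipse.milo.opcua.stack.core.types.enumerated.TimestampsToReturn;
> >   import org.eclipse.milo.opcua.stack.core.types.structured.MonitoredItemCreateRequest;
> >   import org.eclipse.milo.opcua.stack.core.types.structured.MonitoringParameters;
> >   import org.eclipse.milo.opcua.stack.core.types.structured.ReadValueId;
> >
> >   class ClientSetup {
> >       // client is assumed to be already configured and connected
> >       static void subscribe(OpcUaClient client) throws Exception {
> >           // publishing interval 200 ms
> >           UaSubscription subscription = client.getSubscriptionManager()
> >               .createSubscription(200.0).get();
> >
> >           // one of the 350 monitored items; the NodeId is a placeholder
> >           ReadValueId readValueId = new ReadValueId(
> >               new NodeId(2, "SomeItem"),
> >               AttributeId.Value.uid(), null, QualifiedName.NULL_VALUE);
> >
> >           // sampling interval 100 ms; handle and queue size are placeholders
> >           MonitoringParameters parameters = new MonitoringParameters(
> >               uint(1), 100.0, null, uint(10), true);
> >
> >           MonitoredItemCreateRequest request = new MonitoredItemCreateRequest(
> >               readValueId, MonitoringMode.Reporting, parameters);
> >
> >           subscription.createMonitoredItems(
> >               TimestampsToReturn.Both, Collections.singletonList(request)).get();
> >       }
> >   }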
> >
> > As a change from last night, concurrent GC has been enabled and is
> > running with these options:
> >
> > -XX:+UseConcMarkSweepGC
> > -XX:+CMSParallelRemarkEnabled
> > -XX:+UseParNewGC
> >
> > As a result, the growth rate of the Used Tenured Heap has decreased
> > drastically. Concurrent GC tends to push Old-generation usage up, but
> > in this case it decreased instead.
> >
> > In addition, the Used Young Heap has stayed at around 10% and has
> > remained constant so far.
> >
> > Even if an OutOfMemoryError does occur, it looks to be about a week
> > away if things keep going like this.
> >
> > <ishida_shigeru@xxxxxxxxxxx> wrote, Tue, 19 Dec 2017 13:39:39 +0900 (JST)
> >
> > > OK.
> > >
> > > Yesterday I updated Milo to 0.1.6 and Netty to 4.0.54 and started it.
> > > After about 9 hours,
> > >
> > >   java.lang.OutOfMemoryError: GC overhead limit exceeded
> > >
> > > appeared in the log, and the GC log in GCView looks like a leak.
> > > Anyway, I changed the memory & GC parameters and restarted it.
> > >
> > > I will continue to investigate.
> > >
> > > <kevinherron@xxxxxxxxx> wrote, Mon, 18 Dec 2017 06:31:39 -0800
> > >
> > > > Yes, you're right, that buffer would leak if the exception was thrown.
> > > >
> > > > I've already released 0.1.6, so I'll fix it in 0.2.0. Fortunately I don't
> > > > think that exception is likely to ever be thrown, so it would only leak
> > > > under the rarest of circumstances.
> > > >
> > > > On Mon, Dec 18, 2017 at 12:14 AM, Shigeru Ishida <ishida_shigeru@xxxxxxxxxxx> wrote:
> > > >
> > > > > Hi Kevin,
> > > > >
> > > > > I looked at the code of Milo version 0.1.6, and there is one point
> > > > > that worries me.
> > > > >
> > > > > ChunkDecoder#decryptChunk()
> > > > > L177: ByteBuf plainTextBuffer = BufferUtil.buffer(plainTextBufferSize);
> > > > >
> > > > > Doesn't this mean that release() will not be called on the
> > > > > plainTextBuffer above if an exception is thrown at L200?
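> > > > >
> > > > > For illustration, a minimal sketch of the release-on-failure guard I
> > > > > have in mind (not the actual Milo code; the readBytes() call merely
> > > > > stands in for the cipher work between L177 and L200):
> > > > >
> > > > >   import io.netty.buffer.ByteBuf;
> > > > >   import io.netty.buffer.Unpooled;
> > > > >
> > > > >   class ReleaseOnFailure {
> > > > >       static ByteBuf decode(ByteBuf chunk, int plainTextBufferSize) {
> > > > >           ByteBuf plainText = Unpooled.buffer(plainTextBufferSize);
> > > > >           boolean ok = false;
> > > > >           try {
> > > > >               // stand-in for the decryption into plainText; may throw
> > > > >               chunk.readBytes(plainText,
> > > > >                   Math.min(plainTextBufferSize, chunk.readableBytes()));
> > > > >               ok = true;
> > > > >               return plainText;
> > > > >           } finally {
> > > > >               if (!ok) {
> > > > >                   // release on the failure path so the buffer is not leaked
> > > > >                   plainText.release();
> > > > >               }
> > > > >           }
> > > > >       }
> > > > >   }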
> > > > >
> > > > > I am sorry if I misunderstood.
> > > > >
> > > > > Regards,
> > > > >
> > > > > --Shigeru
> > > > >
> > > > > <kevinherron@xxxxxxxxx> wrote, Sat, 16 Dec 2017 10:51:16 -0800
> > > > >
> > > > > > Leak fixes have been merged into master (already fixed in dev/0.2.x).
> > > > > >
> > > > > > I'm releasing 0.1.6 this weekend. It will be the last of the 0.1 series.
> > > > > >
> > > > > > 0.2.0 will be released by the end of next week, if not this weekend.
> > > > > >
> > > > > > On Thu, Dec 14, 2017 at 7:19 AM, Kevin Herron <kevinherron@xxxxxxxxx> wrote:
> > > > > >
> > > > > > >  > As one goal, I expect to run it continuously and stably for
> > > > > > >  > about two months.
> > > > > > >
> > > > > > > Yes, of course. I would expect it to run stably indefinitely.
> > > > > > > Thanks for testing.
> > > > > > >
> > > > > > > The focus of the 0.3 release is going to be on the server SDK.
> > > > > > >
> > > > > > > On Wed, Dec 13, 2017 at 8:43 PM, Shigeru Ishida <ishida_shigeru@xxxxxxxxxxx> wrote:
> > > > > > >
> > > > > > >> One day has passed, and it does not look like a leak from the GC log.
> > > > > > >> I changed the settings to output a detailed GC log and restarted it.
> > > > > > >>
> > > > > > >> As one goal, I expect to run it continuously and stably for
> > > > > > >> about two months.
> > > > > > >>
> > > > > > >> <kevinherron@xxxxxxxxx> wrote, Wed, 13 Dec 2017 05:21:01 -0800
> > > > > > >>
> > > > > > >> > Okay, interesting.
> > > > > > >> >
> > > > > > >> > The other place where leaks were fixed was the re-implementation
> > > > > > >> > of ChunkEncoder and ChunkDecoder in dev/0.2.x.
> > > > > > >> >
> > > > > > >> > You can see when encoding/decoding fails they clean up:
> > > > > > >> >
> > > > > > >> > https://github.com/eclipse/milo/blob/37bee220ae53026db87328cfbb0b664671bff071/opc-ua-stack/stack-core/src/main/java/org/eclipse/milo/opcua/stack/core/channel/ChunkEncoder.java#L86-L93
> > > > > > >> > https://github.com/eclipse/milo/blob/37bee220ae53026db87328cfbb0b664671bff071/opc-ua-stack/stack-core/src/main/java/org/eclipse/milo/opcua/stack/core/channel/ChunkDecoder.java#L82-L92
> > > > > > >> >
> > > > > > >> > This won't be a straightforward cherry-pick like the other changes.
> > > > > > >> >
> > > > > > >> > Keep me updated.
> > > > > > >> >
> > > > > > >> >
> > > > > > >> > On Tue, Dec 12, 2017 at 10:03 PM, Shigeru Ishida <ishida_shigeru@xxxxxxxxxxx> wrote:
> > > > > > >> >
> > > > > > >> > > Hi Kevin,
> > > > > > >> > >
> > > > > > >> > > I merged the following commits from dev/0.2.x:
> > > > > > >> > >
> > > > > > >> > > - 2017/12/09 c1b6978
> > > > > > >> > > - 2017/12/13 e4e5915
> > > > > > >> > >
> > > > > > >> > > Furthermore, I replaced ChunkDecoder.java with the one from
> > > > > > >> > > https://github.com/eclipse/milo/tree/release-buffers and ran it.
> > > > > > >> > >
> > > > > > >> > > The JVM options are the same, and I also enabled the GC log.
> > > > > > >> > >
> > > > > > >> > > -Dio.netty.leakDetection.maxRecords=1000
> > > > > > >> > > -Dio.netty.leakDetection.level=paranoid
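> > > > > > >> > >
> > > > > > >> > > For reference, the same detection level can also be set
> > > > > > >> > > programmatically via Netty's ResourceLeakDetector:
> > > > > > >> > >
> > > > > > >> > >   import io.netty.util.ResourceLeakDetector;
> > > > > > >> > >
> > > > > > >> > >   ResourceLeakDetector.setLevel(ResourceLeakDetector.Level.PARANOID);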
> > > > > > >> > >
> > > > > > >> > > Also, the concurrent client settings are the same:
> > > > > > >> > >
> > > > > > >> > > - Publishing interval: 200 ms
> > > > > > >> > > - Sampling interval: 100 ms
> > > > > > >> > > - NodeId items: 350
> > > > > > >> > > - Number of concurrent clients: 6
> > > > > > >> > >
> > > > > > >> > > After 1.5 hours, the increasing trend of the GC heap area remained
> > > > > > >> > > as usual; from past experience, if this trend continues, the LEAK
> > > > > > >> > > message will probably come out in about 4 hours.
> > > > > > >> > >
> > > > > > >> > > Therefore, I changed from Netty 4.1.4 to 4.0.53 (the latest version
> > > > > > >> > > of the 4.0.x line). The frequency of Scavenge GC / Full GC decreased
> > > > > > >> > > drastically, and the heap area stopped increasing after about
> > > > > > >> > > 8 minutes.
> > > > > > >> > >
> > > > > > >> > > I have only been watching it for about 1.5 hours so far, but it
> > > > > > >> > > looks good.
> > > > > > >> > >
> > > > > > >> > > It is too early to state a conclusion, so I will let it keep
> > > > > > >> > > running as is and watch the situation.
> > > > > > >> > >
> > > > > > >> > > Furthermore, since this mail became too big, I trimmed the quoted
> > > > > > >> > > mails other than the most recent one.
> > > > > > >> > >
> > > > > > >> > > Regards,
> > > > > > >> > >
> > > > > > >> > > --Shigeru
> > > > > > >> > >
> > > > > > >> > > <kevinherron@xxxxxxxxx> wrote, Tue, 12 Dec 2017 17:16:37 -0800
> > > > > > >> > >
> > > > > > >> > > > I think I've squashed all of the leaks that occur under
> > > > > > >> > > > exceptional circumstances in the dev/0.2.x branch:
> > > > > > >> > > > https://github.com/eclipse/milo/tree/dev/0.2.x
> > > > > > >> > > >
> > > > > > >> > > > It's a little difficult to cherry-pick the fixes because the
> > > > > > >> > > > 0.2.x branch has significant changes.

