Discussion:
BIND-9.16.1 memory leak?
s***@nethelp.no
2020-04-17 13:45:16 UTC
We have what appears to be a significant memory leak in BIND-9.16.1.

Environment:
FreeBSD 12.1-STABLE.
BIND-9.16.1 installed from packages.
Also uses libuv-1.35.0 installed from packages.
Authoritative only.
Around 800 zones of varying sizes. DNSSEC in use.

Running a ps command for the named process every minute and logging
the result, I see the named virtual memory size (VSZ) increasing at
around 1.2 Mbyte/minute, and the resident size (RSS) increasing at
around 0.85 Mbyte/minute. No problems due to this so far, but pretty
obviously it's not viable in the long run.
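For anyone wanting to reproduce this kind of measurement, the sampling loop described above (run `ps` against named once a minute and log the result) can be sketched as below. This is a minimal illustration, not the script used here. It assumes a `ps` that accepts `-o vsz= -o rss= -p PID` and reports sizes in KiB, which should hold for FreeBSD's ps and Linux procps. The function names are hypothetical. The rate is estimated with a least-squares slope over all samples rather than first-minus-last, so a single noisy reading doesn't dominate.

```python
import subprocess
import time
from typing import List, Tuple


def sample_named_memory(pid: int) -> Tuple[int, int]:
    """Return (VSZ, RSS) in KiB for one process, as reported by ps (assumed units)."""
    out = subprocess.run(
        ["ps", "-o", "vsz=", "-o", "rss=", "-p", str(pid)],
        capture_output=True, text=True, check=True,
    ).stdout.split()
    return int(out[0]), int(out[1])


def growth_rate_per_minute(samples: List[Tuple[float, int]]) -> float:
    """Least-squares slope of (minutes, KiB) samples, in KiB per minute."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    if var == 0:
        raise ValueError("samples must span more than one point in time")
    return cov / var


def monitor(pid: int, minutes: int, interval_s: float = 60.0) -> None:
    """Sample every interval_s seconds; print VSZ/RSS and the running growth rates."""
    vsz_hist: List[Tuple[float, int]] = []
    rss_hist: List[Tuple[float, int]] = []
    start = time.monotonic()
    for _ in range(minutes):
        vsz, rss = sample_named_memory(pid)
        t = (time.monotonic() - start) / 60.0  # elapsed minutes
        vsz_hist.append((t, vsz))
        rss_hist.append((t, rss))
        line = f"t={t:6.1f}min VSZ={vsz}KiB RSS={rss}KiB"
        if len(vsz_hist) >= 2:
            line += (f" dVSZ={growth_rate_per_minute(vsz_hist) / 1024:.2f}MiB/min"
                     f" dRSS={growth_rate_per_minute(rss_hist) / 1024:.2f}MiB/min")
        print(line, flush=True)
        time.sleep(interval_s)
```

Usage would be something like `monitor(pid, minutes=60)`, with `pid` being named's process ID. A steady positive dRSS over hours, on a server whose zone data isn't changing, is the pattern reported here.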

I tried reading the CHANGES from 9.16.2, and didn't see anything which
suggested a fix for a memory leak problem.

Any suggestions?

Steinar Haug, Nethelp consulting, ***@nethelp.no
Karl Pielorz
2020-04-17 15:02:30 UTC
Post by s***@nethelp.no
We have what appears to be a significant memory leak in BIND-9.16.1.
...
Post by s***@nethelp.no
Running a ps command for the named process every minute and logging
the result, I see the named virtual memory size (VSZ) increasing at
around 1.2 Mbyte/minute, and the resident size (RSS) increasing at
around 0.85 Mbyte/minute. No problems due to this so far, but pretty
obviously it's not viable in the long run.
I tried reading the CHANGES from 9.16.2, and didn't see anything which
suggested a fix for a memory leak problem.
Any suggestions?
Hi,

I seem to remember we got 'bitten' by large memory use when moving from a
previous version of bind - do you have 'max-cache-size' set in your config?

As far as I can remember, the 'default' is to take 90% of the memory on the
machine. This is great, unless the machine has lots of other stuff going on
etc. I think we noticed this during bind startup (i.e. from syslog output).

On our boxes we set it to "something sensible" - rather than using the
default.

Might not be your problem - but thought it was worth mentioning / checking.

-Karl
s***@nethelp.no
2020-04-17 15:25:17 UTC
Post by Karl Pielorz
Post by s***@nethelp.no
We have what appears to be a significant memory leak in BIND-9.16.1.
...
Post by Karl Pielorz
I seem to remember we got 'bitten' by large memory use when moving
from a previous version of bind - do you have 'max-cache-size' set in
your config?
Yes. Set to 1G. In reality it shouldn't need a cache at all, since
this is a purely authoritative server (recursion no).

So this doesn't appear to be the problem. But thanks for the
suggestion!

Steinar Haug, Nethelp consulting, ***@nethelp.no
Anand Buddhdev
2020-04-17 15:35:27 UTC
On 17/04/2020 17:02, Karl Pielorz wrote:

Hi Karl,
Post by Karl Pielorz
I seem to remember we got 'bitten' by large memory use when moving from
a previous version of bind - do you have 'max-cache-size' set in your
config?
It's an authoritative-only server, so there is (almost) no caching involved.

Anand
s***@nethelp.no
2020-04-19 12:55:53 UTC
Post by s***@nethelp.no
We have what appears to be a significant memory leak in BIND-9.16.1.
FreeBSD 12.1-STABLE.
BIND-9.16.1 installed from packages.
Also uses libuv-1.35.0 installed from packages.
Authoritative only.
Around 800 zones of varying sizes. DNSSEC in use.
Running a ps command for the named process every minute and logging
the result, I see the named virtual memory size (VSZ) increasing at
around 1.2 Mbyte/minute, and the resident size (RSS) increasing at
around 0.85 Mbyte/minute. No problems due to this so far, but pretty
obviously it's not viable in the long run.
I tried reading the CHANGES from 9.16.2, and didn't see anything which
suggested a fix for a memory leak problem.
I now have a pcap file of queries that I can replay with the "drool"
application, and I'm consistently seeing similar memory leak problems
(i.e. the problems are reproducible). The memory leak rate seems to be
very approximately linear with the query rate - so replaying at 10
times the original speed means we also leak around 10 times as much per
minute.
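If the leak really is proportional to the query rate, then leak rate divided by query rate should stay roughly constant across replay speeds, and that ratio is the memory lost per query. A minimal sketch of that check follows. The function names are hypothetical, and the inputs are assumed to be the measured query rate for each replay run together with the leak rate from the ps sampling. No figures from this thread are assumed.

```python
from typing import List, Tuple


def bytes_per_query(runs: List[Tuple[float, float]]) -> float:
    """Fit leak = k * qps through the origin; return k in bytes leaked per query.

    Each run is (queries_per_second, leak_kib_per_minute).
    """
    # Convert leak to bytes/second so that k comes out in bytes per query.
    pts = [(qps, leak * 1024 / 60.0) for qps, leak in runs]
    num = sum(q * b for q, b in pts)
    den = sum(q * q for q, _ in pts)
    if den == 0:
        raise ValueError("need at least one run with a non-zero query rate")
    return num / den


def max_relative_deviation(runs: List[Tuple[float, float]]) -> float:
    """Largest relative gap between any run's per-query leak and the fitted k.

    Values near 0 mean the leak scales roughly linearly with query rate.
    """
    k = bytes_per_query(runs)
    if k == 0:
        return 0.0
    return max(abs(leak * 1024 / 60.0 / qps - k) / k
               for qps, leak in runs if qps > 0)
```

For the original-speed and 10x replays described above, a small deviation would confirm the "leak per query" view, and `bytes_per_query` gives a number that is easier to compare between servers than a per-minute rate.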

Upgrading to 9.16.2 (and also libuv 1.36.0) makes no difference - the
same memory leak is observed.

Steinar Haug, Nethelp consulting, ***@nethelp.no
Evan Hunt
2020-04-20 06:28:55 UTC
Post by s***@nethelp.no
I now have a pcap file of queries that I can replay with the "drool"
application, and I'm consistently seeing similar memory leak problems
(i.e. the problems are reproducible). The memory leak rate seems to be
very approximately linear with the query rate - so replaying at 10
times the original speed means we also leak around 10 times as much per
minute.
Upgrading to 9.16.2 (and also libuv 1.36.0) makes no difference - the
same memory leak is observed.
Is there anything unusual in your server configuration?
--
Evan Hunt -- ***@isc.org
Internet Systems Consortium, Inc.
Michael Sinatra
2020-04-20 17:46:58 UTC
Post by s***@nethelp.no
We have what appears to be a significant memory leak in BIND-9.16.1.
FreeBSD 12.1-STABLE.
BIND-9.16.1 installed from packages.
Also uses libuv-1.35.0 installed from packages.
Authoritative only.
Around 800 zones of varying sizes. DNSSEC in use.
Additional datum, as I am seeing the same thing:

- FreeBSD 12.1-RELEASE-p3
- BIND-9.16.1 compiled from ports/poudriere via a local package build
server (no options changed, though, so it likely could have been
installed from the FreeBSD package repo).
- Authoritative only
- `rndc status` reports 1058 zones (69 automatic)
- Host is a VM with 16GiB allocated and 4 CPU cores
- named running for approx 2.5 weeks (wall-clock)

Current BIND status (from `top`):

  PID USERNAME THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
 1707 bind      14  52    0  5312M  5260M sigwai 2  34.4H  5.79% named

A recursive-only server, running the same versions of all software, on
an identically-provisioned VM, running for the same amount of wall-clock
time (approximately 2.5 weeks) looks like this:

  PID USERNAME THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
 1485 bind      14  52    0   927M   890M sigwai 3  89.6H 32.86% named

The recursive memory footprint looks normal.

Contrast that with a separate server:

- FreeBSD 11.3-RELEASE-p7
- BIND 9.14.11 compiled from ports/poudriere via a local package build
server (no options changed, though, so it likely could have been
installed from the FreeBSD package repo).
- Authoritative only + recursive only running in a separate jail
- Same configuration as above, only a bit busier
- Host is standalone with 96GiB RAM and 8 cores

In the `top` output below, both the jailed named processes are shown.
The busier one is the authoritative-only:

  PID USERNAME THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
  896 bind      18  52    0   956M   927M sigwai 0  99.2H 30.03% named
 1584 bind      18  52    0  1171M  1080M sigwai 2 166.2H 13.47% named

It definitely looks like a memory leak in 9.16.1 when configured as
authoritative-only. The leak seems slow enough to be manageable, but
the footprint does appear to be growing monotonically (and is still
growing: by another 4M as I wrote this email).

michael
Greg Rivers
2020-06-10 15:37:49 UTC
Post by s***@nethelp.no
We have what appears to be a significant memory leak in BIND-9.16.1.
FreeBSD 12.1-STABLE.
BIND-9.16.1 installed from packages.
Also uses libuv-1.35.0 installed from packages.
Authoritative only.
Around 800 zones of varying sizes. DNSSEC in use.
https://gitlab.isc.org/isc-projects/bind9/-/issues/1893
--
Greg