Discussion:
DNS_RRL_MAX_RATE defined as 1000
程智勇
2020-07-08 06:47:36 UTC
Hi all,

I recently deployed a DNS cluster consisting of one master and two slaves. I enabled response rate limiting (RRL) on the slaves with the following parameters:

rate-limit {
    ipv4-prefix-length 32;
    responses-per-second 250;
    all-per-second 1000;
    min-table-size 1000000;
    max-table-size 5000000;
    log-only no;
};

Even with this configuration, some DNS queries were still dropped by RRL. Looking at rrl.h, I noticed that the maximum RRL rate is defined like this:

#define DNS_RRL_MAX_RATE 1000

And "all-per-second" must not be larger than DNS_RRL_MAX_RATE.
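The constraint described above can be modelled as a simple upper-bound check on the rate-limit options. This is a hedged sketch, not BIND's own configuration code: the post only states the cap for "all-per-second", and the sketch assumes that the other "*-per-second" options are capped at DNS_RRL_MAX_RATE in the same way. That assumption should be confirmed against the BIND source. The function name check_rate_limits is hypothetical.

```python
from __future__ import annotations

# Value quoted from rrl.h in the post above.
DNS_RRL_MAX_RATE = 1000

# Assumption: every "*-per-second" option is capped the same way as all-per-second.
RATE_OPTIONS = (
    "responses-per-second",
    "referrals-per-second",
    "nodata-per-second",
    "nxdomains-per-second",
    "errors-per-second",
    "all-per-second",
)


def check_rate_limits(options: dict[str, int]) -> list[str]:
    """Hypothetical check: return one message per option above the cap."""
    problems = []
    for name in RATE_OPTIONS:
        value = options.get(name)
        # A value equal to the cap is allowed; only larger values are rejected.
        if value is not None and value > DNS_RRL_MAX_RATE:
            problems.append(f"{name} {value} > {DNS_RRL_MAX_RATE}")
    return problems


# The rate options from the rate-limit block in the post (table sizes omitted).
posted = {"responses-per-second": 250, "all-per-second": 1000}
print(check_rate_limits(posted))                    # []
print(check_rate_limits({"all-per-second": 5000}))  # ['all-per-second 5000 > 1000']
```

Under this model the posted configuration already sits exactly at the cap for "all-per-second", so that value cannot be raised any further.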

So could anybody tell me why DNS_RRL_MAX_RATE is defined as 1000? And is there any other way to get around this limit?

Thanks and Regards, Zhiyong Cheng
Tony Finch
2020-07-08 15:45:38 UTC
Post by 程智勇
So could anybody tell me why DNS_RRL_MAX_RATE defined 1000?
RRL is designed for authoritative DNS servers. Legitimate queries come
from recursive resolvers with caches. There should not be more than one
query for each RRset from each resolver per TTL. So a normal response rate
limit is relatively small - I set it to 10.

If you are hitting 1000 queries per second, that implies either there
are 1000 resolvers behind one IP address (which is VERY unlikely); or the
query traffic is abusive.

Are you sure the dropped traffic is legitimate?

Tony.
--
f.anthony.n.finch <***@dotat.at> http://dotat.at/
Channel Islands: West to southwest 4 to 5, occasionally 6 mid-channel
overnight and Thursday morning, occasionally west to northwest 2 to 4 in the
far south of the area. Slight to moderate with a low swell, perhaps
occasionally rather rough mid-channel until late morning. Occasional mist and
fog, especially overnight rain and drizzle at times, especially from Thursday
morning. Moderate to poor or very poor, locally good at times.
Zhiyong Cheng
2020-07-09 03:38:11 UTC
Thanks for the reply :)

We are using the named cluster in our internal network as the authoritative DNS, so there are no caching servers between the clients and the named cluster. Maybe we should add one, but that is another story.

I saw something strange when I tested RRL using queryperf. I generated 10000 qnames into test.txt and queried each qname once. The queryperf output is pasted below:

Statistics:

 Parse input file: once
 Ended due to: reaching end of file

 Queries sent: 10000 queries
 Queries completed: 9820 queries
 Queries lost: 180 queries
 Queries delayed(?): 0 queries

 RTT max: 0.009435 sec
 RTT min: 0.000072 sec
 RTT average: 0.000503 sec
 RTT std deviation: 0.000785 sec
 RTT out of range: 0 queries

 Percentage completed: 98.20%
 Percentage lost: 1.80%

 Started at: Thu Jul 9 11:16:03 2020
 Finished at: Thu Jul 9 11:16:48 2020
 Ran for: 45.300412 seconds

 Queries per second: 216.775070 qps
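The summary figures above are internally consistent. They fit the reading that queryperf computes queries per second as completed queries divided by run time. That reading is inferred from the numbers, not from queryperf's documentation. The short check below reproduces them. An average of about 217 qps over 45 seconds says nothing about how many responses fell into any single second, which is the granularity RRL limits work at.

```python
# Figures copied from the queryperf summary above.
sent = 10000
completed = 9820
lost = 180
run_time_s = 45.300412

# Inferred: "Queries per second" = completed queries / run time.
qps = completed / run_time_s
loss_pct = 100.0 * lost / sent

print(f"{qps:.6f} qps")          # 216.775070 qps
print(f"{loss_pct:.2f}% lost")   # 1.80% lost
```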

The named rate-limiting log is pasted below:

09-Jul-2020 11:16:54.055 rate-limit: info: client @0x7f83b44ed190 10.0.0.10#38722 (anvq.internal): view xxxx: rate limit drop all response to 10.0.0.10/32
09-Jul-2020 11:16:54.055 rate-limit: info: client @0x7f83b4414020 10.0.0.10#38722 (anwi.internal): view xxxx: rate limit drop all response to 10.0.0.10/32
09-Jul-2020 11:16:54.055 rate-limit: info: client @0x7f83b4518840 10.0.0.10#38722 (anvf.internal): view xxxx: rate limit drop all response to 10.0.0.10/32
09-Jul-2020 11:16:54.055 rate-limit: info: client @0x7f83b4552680 10.0.0.10#38722 (anvx.internal): view xxxx: rate limit drop all response to 10.0.0.10/32
09-Jul-2020 11:16:54.055 rate-limit: info: client @0x7f83b44dea00 10.0.0.10#38722 (anwa.internal): view xxxx: rate limit drop all response to 10.0.0.10/32
09-Jul-2020 11:16:54.055 rate-limit: info: client @0x7f83b4487ca0 10.0.0.10#38722 (anva.internal): view xxxx: rate limit drop all response to 10.0.0.10/32
09-Jul-2020 11:16:54.055 rate-limit: info: client @0x7f83b4405890 10.0.0.10#38722 (anwg.internal): view xxxx: rate limit drop all response to 10.0.0.10/32
09-Jul-2020 11:16:54.055 rate-limit: info: client @0x7f83b4526fd0 10.0.0.10#38722 (anvr.internal): view xxxx: rate limit drop all response to 10.0.0.10/32
09-Jul-2020 11:16:54.055 rate-limit: info: client @0x7f83b446ad80 10.0.0.10#38722 (anvs.internal): view xxxx: rate limit drop all response to 10.0.0.10/32
09-Jul-2020 11:16:54.055 rate-limit: info: client @0x7f83b4430f40 10.0.0.10#38722 (anvh.internal): view xxxx: rate limit drop all response to 10.0.0.10/32
09-Jul-2020 11:16:54.055 rate-limit: info: client @0x7f83b44227b0 10.0.0.10#38722 (anvj.internal): view xxxx: rate limit drop all response to 10.0.0.10/32
09-Jul-2020 11:16:54.055 rate-limit: info: client @0x7f83b450a0b0 10.0.0.10#38722 (anvm.internal): view xxxx: rate limit drop all response to 10.0.0.10/32
09-Jul-2020 11:16:54.055 rate-limit: info: client @0x7f83b44a4bc0 10.0.0.10#38722 (anwe.internal): view xxxx: rate limit drop all response to 10.0.0.10/32
09-Jul-2020 11:16:54.055 rate-limit: info: client @0x7f83b4496430 10.0.0.10#38722 (anwh.internal): view xxxx: rate limit drop all response to 10.0.0.10/32

As I understand it, RRL should not limit queries for different qnames from the same client. Is this a misunderstanding on my part, or is my configuration wrong?

The BIND version is:

version: BIND 9.11.4-P2 (Extended Support Version) <id:7107deb>
Tony Finch
2020-07-09 18:11:44 UTC
Post by Zhiyong Cheng
We are using named cluster in our internal network as the authoritative
DNS. So there are no cache servers between clients and named cluster.
Maybe we should add one but it is just another story.
Sorry, I wasn't completely clear: I was not saying that your authoritative
servers should have a cache. I was saying that all the legitimate clients
of your servers (the resolvers at ISPs around the Internet) have caches.
Post by Zhiyong Cheng
To my mind the RRL should not limit queries with different qnames from
the same client. So is it my misunderstanding or wrong config?
If you are querying for nonexistent names then RRL will treat the NXDOMAIN
responses as equivalent, so it will rate-limit them. RRL limits responses,
not queries. You can configure a different `nxdomains-per-second` limit if
you want.
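Tony's point that RRL limits responses, not queries, can be illustrated with a toy model. In this model, each response is charged to an accounting key, and every NXDOMAIN answer from one zone shares a single key regardless of its qname. This is a simplified sketch, not BIND's algorithm. It uses plain one-second fixed windows instead of BIND's real accounting. The functions rrl_key and simulate, the FixedWindowLimiter class and the zone name "internal" are all hypothetical. The key classes "response" and "nxdomain" stand in for responses-per-second and nxdomains-per-second.

```python
from __future__ import annotations

import ipaddress
from collections import defaultdict


def rrl_key(client_ip: str, qname: str, rcode: str, zone: str, prefix_len: int = 32):
    """Hypothetical accounting key: (client prefix, response class, name)."""
    net = ipaddress.ip_network(f"{client_ip}/{prefix_len}", strict=False)
    if rcode == "NXDOMAIN":
        # Simplification: all NXDOMAIN answers from one zone are "equivalent",
        # so the qname is replaced by the zone name.
        return (str(net), "nxdomain", zone)
    return (str(net), "response", qname)


class FixedWindowLimiter:
    """Allow at most limits[class] responses per key in each whole second."""

    def __init__(self, limits: dict[str, int]):
        self.limits = limits
        self.counts = defaultdict(int)  # (key, second) -> responses sent

    def allow(self, key, now: float) -> bool:
        slot = (key, int(now))
        if self.counts[slot] >= self.limits[key[1]]:
            return False  # over the limit for this key: drop
        self.counts[slot] += 1
        return True


def simulate(qnames, rcode, limits):
    """Send every qname once from one client within the same second; count drops."""
    limiter = FixedWindowLimiter(limits)
    dropped = 0
    for qname in qnames:
        key = rrl_key("10.0.0.10", qname, rcode, zone="internal")
        if not limiter.allow(key, now=0.0):
            dropped += 1
    return dropped


names = [f"name{i}.internal" for i in range(100)]
limits = {"response": 10, "nxdomain": 10}
print(simulate(names, "NOERROR", limits))   # 0: each qname has its own key
print(simulate(names, "NXDOMAIN", limits))  # 90: one shared key, only 10 allowed
print(simulate(names, "NXDOMAIN", {"response": 10, "nxdomain": 100}))  # 0
```

The last line mirrors Tony's suggestion: raising only the NXDOMAIN class limit stops the drops without loosening the limit on positive answers.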

Tony.
Zhiyong Cheng
2020-07-10 12:15:31 UTC
Post by Tony Finch
Post by Zhiyong Cheng
We are using named cluster in our internal network as the authoritative
DNS. So there are no cache servers between clients and named cluster.
Maybe we should add one but it is just another story.
Sorry, I wasn't completely clear: I was not saying that your authoritative
servers should have a cache. I was saying that all the legitimate clients
of your servers (the resolvers at ISPs around the Internet) have caches.
All of these authoritative servers serve only our private clients, so there
are no ISP resolvers involved.

I read the Bv9ARM again and noticed this note in it:

 This mechanism is intended for authoritative DNS servers. It can be used on
 recursive servers but can slow applications such as SMTP servers (mail
 receivers) and HTTP clients (web browsers) that repeatedly request the same
 domains. When possible, closing "open" recursive servers is better.

So this implies that I should not use RRL on my authoritative servers:
all the clients inside our IDC query them directly, and RRL is not
intended for that scenario.
Post by Tony Finch
Post by Zhiyong Cheng
To my mind the RRL should not limit queries with different qnames from
the same client. So is it my misunderstanding or wrong config?
If you are querying for nonexistent names then RRL will treat the NXDOMAIN
responses as equivalent, so it will rate-limit them. RRL limits responses,
not queries. You can configure a different `nxdomains-per-second` limit if
you want.
That’s it! All of my queries were being treated as equivalent. Thanks for your
patience :)
Zhiyong Cheng
