Discussion:
DNS security, amplification attacks and recursion
Michael De Roover
2020-07-07 13:00:13 UTC
Hello,

Recently I discussed with a friend of mine the idea of NTP and DNS in
the context of denial of service attacks. In NTP this amplification
attack is done with the monlist command (that should honestly never have
been publicly available due to its purpose being pretty much entirely
debugging-related). The DNS version was rather unclear to me however.

Said friend said to me that he tested my authoritative name servers and
found them to be not vulnerable. I don't run the latest and greatest of
BIND at all, I mean it's Debian distribution packages we're talking
about there... But they were set up to be exclusively authoritative.
They do not respond to recursive queries. It appears that the test of
whether a server is "vulnerable" or not has to do with this. The command
used to test this was apparently "dig +short test.openresolver.com TXT
@your.name.server". That's simply a recursive query of what appears to
be an arbitrary record to me.

This also meant that supposedly the recursive DNS servers from Google,
Cloudflare and Quad9 were all considered vulnerable. I find this very
hard to believe. Authoritative name servers may not need a huge DNS
infrastructure for a small-ish zone (say under 1k records), but
recursors on the scale of Google and Cloudflare in particular (not sure
how popular Quad9 is so far).. those use massive infrastructure
including anycast and everything! I'd consider it safe to assume that
their servers are at least on the order of 100Gbps cumulatively, if not
more. If these were vulnerable to amplification attacks just because
they allow recursion, wouldn't skids be jumping on this like there's no
tomorrow? It doesn't make any sense to me.

This seems to be not very well documented online (or more likely my
search terms aren't right), so yeah... I wonder why the idea of
recursion became associated with a vulnerable server in the first place.
--
Met vriendelijke groet / Best regards,
Michael De Roover
Stephane Bortzmeyer
2020-07-07 13:22:16 UTC
On Tue, Jul 07, 2020 at 03:00:13PM +0200,
The command used to test this was apparently "dig +short
ANY instead of TXT may be more efficient (especially with +dnssec), if
the goal is to get the maximum amplification. Of course, if the server
implements RFC 8482, ANY won't help.
Authoritative name servers may not need a huge DNS infrastructure
for a small-ish zone (say under 1k records), but recursors on the
scale of Google and Cloudflare in particular (not sure how popular
Quad9 is so far).. those use massive infrastructure including
anycast and everything! I'd consider it safe to assume that their
servers are at least on the order of 100Gbps cumulatively, if not
more.
This is precisely what makes them dangerous. They are good reflectors
(good from the point of view of the attacker). On the other hand, they
typically implement various forms of rate-limiting, and they are
monitored closely by knowledgeable professionals, so they may not be
good reflectors after all.
If these would be vulnerable to amplification attacks just because
they allow recursion,
They're not vulnerable, this attack works by reflection (just like the
NTP attack you mentioned) so they are not the potential victims, they
could be used as helpers.
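To make the reflection point concrete, here is a minimal Python sketch of the arithmetic behind an amplification attack. The attacker sends small queries with the victim's address spoofed as the source, and the reflector delivers the large responses to the victim. The byte counts below are hypothetical placeholders, not measurements of any real server.

```python
def amplification_factor(query_bytes: int, response_bytes: int) -> float:
    """Bytes delivered to the victim per byte the attacker sends."""
    if query_bytes <= 0:
        raise ValueError("query size must be positive")
    return response_bytes / query_bytes


def victim_traffic_bps(attacker_bps: float, query_bytes: int, response_bytes: int) -> float:
    """Traffic reaching the spoofed victim address, assuming the reflector
    answers every query (no rate limiting) and nothing is lost on the way."""
    return attacker_bps * amplification_factor(query_bytes, response_bytes)


# Hypothetical sizes: a 64-byte query that draws a 3200-byte response.
print(amplification_factor(64, 3200))            # 50.0
print(victim_traffic_bps(10e6, 64, 3200) / 1e6)  # 500.0 (Mbit/s out for 10 Mbit/s in)
```

This is also why rate limiting on the reflector matters: it caps the response side of the ratio no matter how many spoofed queries arrive.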
Tony Finch
2020-07-07 14:06:25 UTC
Post by Michael De Roover
Said friend said to me that he tested my authoritative name servers and
found them to be not vulnerable. [snip] They do not respond to recursive
queries. It appears that the test of whether a server is "vulnerable" or
not has to do with this. The command used to test this was apparently
OK, that is all right and correct, but there is (of course) a bit more to
this issue.

As you already know, the most basic thing to avoid is not being an open
recursive server. Out of the box, BIND has a recursion ACL that only
allows queries from directly connected networks, so you won't have this
problem without making an explicit configuration mistake. Normally for an
authoritative-only server, you should set `recursion no` to lock it down
more tightly.

An auth-only server can also be used for amplification attacks that use
its authoritative zones - these attacks don't have to use recursion.
There are a few ways to mitigate auth-only amplification attacks.

Response rate limiting is very effective. Start off by putting the
following in your options{} section, and look in the BIND ARM for other
directives you can put in the rate-limit{} section.

rate-limit {
responses-per-second 10;
};
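The idea behind responses-per-second can be sketched as a toy limiter that counts identical responses per client netblock per second and drops the excess. This is NOT BIND's algorithm: BIND also has slip (sending truncated replies so legitimate clients retry over TCP), sliding-window accounting, separate limits for errors and NXDOMAIN, and IPv6 prefixes. The /24 grouping is my assumption, which I believe matches BIND's default ipv4-prefix-length, but check the ARM.

```python
import ipaddress
from collections import defaultdict


class SimpleRRL:
    """Toy responses-per-second limiter for IPv4 clients (not BIND's algorithm).

    No slip, no sliding window, no error buckets, and the counter table is
    never pruned, so this is for illustration only.
    """

    def __init__(self, responses_per_second: int, ipv4_prefix_len: int = 24):
        self.limit = responses_per_second
        self.prefix_len = ipv4_prefix_len
        # (whole second, client netblock, qname, qtype) -> responses sent
        self.counts = defaultdict(int)

    def allow(self, now: float, client_ip: str, qname: str, qtype: str) -> bool:
        # Count per netblock rather than per address, so spreading spoofed
        # victim addresses across a /24 does not evade the limit.
        net = ipaddress.IPv4Network(f"{client_ip}/{self.prefix_len}", strict=False)
        key = (int(now), net, qname.lower(), qtype.upper())
        if self.counts[key] >= self.limit:
            return False  # over the limit: drop this identical response
        self.counts[key] += 1
        return True


rrl = SimpleRRL(responses_per_second=10)
sent = sum(rrl.allow(100.0, "192.0.2.7", "example.com", "ANY") for _ in range(50))
print(sent)                                                    # 10
print(rrl.allow(100.0, "192.0.2.99", "example.com", "ANY"))    # False (same /24)
print(rrl.allow(100.0, "198.51.100.1", "example.com", "ANY"))  # True (other /24)
print(rrl.allow(101.0, "192.0.2.7", "example.com", "ANY"))     # True (next second)
```

A caching resolver asking for the same name a few times a second never notices a limit like this; a flood of spoofed queries for one name does.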

Especially if you have DNSSEC signed zones then there are a few extra
things you can do to reduce the size of your response packets, which
reduces the attacker's amplification factor, and makes you less likely to
be abused.

Set a maximum UDP packet size, to suppress fragmented packets. The DNS
flag day 2020 campaign will make this a standard setting. For a long time
I have used:

max-udp-size 1420;

https://dnsflagday.net/2020/

A downside of small UDP responses is more truncated packets and more
queries over TCP, but there are still more ways to reduce response size
which also reduce truncation.
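Roughly, the server can send a UDP answer up to the smaller of the client's advertised EDNS buffer size and its own max-udp-size; anything bigger goes out with the TC bit set and the client retries over TCP. The sketch below is a simplification: real servers trim sections before truncating, and how BIND combines max-udp-size with its other EDNS settings is not modelled here.

```python
from typing import Optional

CLASSIC_UDP_LIMIT = 512  # RFC 1035 limit for DNS over UDP without EDNS


def udp_payload_limit(client_edns_bufsize: Optional[int], server_max_udp: int) -> int:
    """Largest UDP response the server will send to this client (simplified)."""
    if client_edns_bufsize is None:  # client did not send EDNS at all
        return CLASSIC_UDP_LIMIT
    # RFC 6891: advertised sizes below 512 are treated as 512.
    return max(CLASSIC_UDP_LIMIT, min(client_edns_bufsize, server_max_udp))


def is_truncated(response_bytes: int, client_edns_bufsize: Optional[int],
                 server_max_udp: int = 1420) -> bool:
    """True if the full answer does not fit: the server sets TC=1 and a
    well-behaved client retries the query over TCP."""
    return response_bytes > udp_payload_limit(client_edns_bufsize, server_max_udp)


print(udp_payload_limit(4096, 1420))  # 1420
print(is_truncated(3000, 4096))       # True  -> retry over TCP
print(is_truncated(900, 4096))        # False -> full answer over UDP
print(is_truncated(900, None))        # True  (no EDNS: 512-byte limit)
```

Every truncated answer costs a TCP round trip, which is why shrinking responses (the next few settings) matters as much as capping them.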

Reduce the size of responses to ANY queries, which are a favourite tool of
amplification attacks. There's basically no downside to this one, in my
opinion, but I'm biased because I implemented it.

minimal-any yes;
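As I understand the BIND documentation, minimal-any makes a positive UDP answer to an ANY query carry only one of the RRsets at the name (plus its signatures). Here is a sketch of that behaviour over a hypothetical in-memory zone; RRSIGs are ignored, and picking the smallest RRset is this sketch's own choice, not necessarily what BIND does.

```python
from typing import Dict, List

# Hypothetical zone data: RRset type -> rdata strings at one owner name.
Zone = Dict[str, List[str]]


def answer_any(rrsets: Zone, over_udp: bool, minimal_any: bool = True) -> Zone:
    """Sketch of a minimal-any style response to an ANY query."""
    if not minimal_any or not over_udp or not rrsets:
        return dict(rrsets)  # full answer (or an empty NODATA answer)
    # Return a single RRset; choosing the smallest is an arbitrary choice here.
    smallest = min(rrsets, key=lambda t: sum(len(rd) for rd in rrsets[t]))
    return {smallest: rrsets[smallest]}


zone = {
    "A": ["192.0.2.1"],
    "MX": ["10 mail.example.com."],
    "TXT": ["v=spf1 -all", "x" * 200],
}
print(answer_any(zone, over_udp=True))        # {'A': ['192.0.2.1']}
print(len(answer_any(zone, over_udp=False)))  # 3 (TCP gets everything)
```

The answer is still a valid, cacheable NOERROR response, so resolvers have no reason to retry.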

You can also reduce the size of other answers. In theory this option might
force resolvers to make more queries to get records that by default would
appear in the additional section, but I think in practice resolvers make
these queries anyway because of RFC 2181 trustworthiness logic, and
because applications (such as SMTP servers) find it easier to query
directly than use additional records. So on my auth servers I set:

minimal-responses yes;

If you are signing zones with DNSSEC, consider doing an algorithm
rollover to ECDSA p256 (algorithm 13) because this has much smaller
signatures than RSA. Algorithm rollovers are not particularly easy,
because you need a good grasp of the DNSSEC key timing parameters and how
and when to swap over your DS records. (There used to be even more
gotchas, so it is getting easier, honest!)
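The size difference is easy to put numbers on. An RSA signature is as long as the key's modulus, and an ECDSA P-256 signature is r and s at 32 bytes each (RFC 6605). The sketch assumes a 2048-bit RSA key and a hypothetical response carrying three signed RRsets; the fixed RRSIG fields and signer name are the same for both algorithms, so they are left out.

```python
RSA_2048_SIG_BYTES = 2048 // 8   # RSA signature length equals the modulus length
ECDSA_P256_SIG_BYTES = 2 * 32    # r and s, 32 bytes each (RFC 6605)


def signature_bytes(signed_rrsets: int, sig_bytes: int) -> int:
    """Signature payload for a response with one RRSIG per signed RRset."""
    return signed_rrsets * sig_bytes


# Hypothetical response with 3 signed RRsets.
rsa = signature_bytes(3, RSA_2048_SIG_BYTES)
ecdsa = signature_bytes(3, ECDSA_P256_SIG_BYTES)
print(rsa, ecdsa, rsa - ecdsa)   # 768 192 576
```

Saving several hundred bytes per answer can be the difference between fitting under max-udp-size and falling back to TCP.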

Finally, there's the built-in _bind CHAOS view. This has very strict
response rate limiting by default, but if you want to be super careful
you can set `version none` and `hostname none` to lock it down further.
(I don't bother with this.)

Here endeth the brain dump.

Tony.
--
f.anthony.n.finch <***@dotat.at> http://dotat.at/
Mull of Galloway to Mull of Kintyre including the Firth of Clyde and North
Channel: Variable, 2 to 4. Moderate at first near the Mull of Kintyre,
otherwise smooth or slight. Showers. Mainly good.
@lbutlr
2020-07-07 17:28:18 UTC
On 07 Jul 2020, at 08:06, Tony Finch <***@dotat.at> wrote:

Excellent post, and a nice summary of some best practices.

I have a couple of questions.
Post by Tony Finch
Response rate limiting is very effective. Start off by putting the
following in your options{} section, and look in the BIND ARM for other
directives you can put in the rate-limit{} section.
rate-limit { responses-per-second 10; };
Does that apply to local queries as well? (For example, a mail server may easily make a whole lot of queries to 127.0.0.1, and rate limiting it would at the very least affect logging and could delay mail if the MTA cannot verify DNS.)

Do these settings also need to be applied to the secondary servers?
--
What's another word for Thesaurus?
Tony Finch
2020-07-07 17:58:35 UTC
Post by @lbutlr
Post by Tony Finch
rate-limit { responses-per-second 10; };
Does that apply to local queries as well (for example, a mail server may
easily make a whole lot of queries to 127.0.0.1, and rate limiting it
would at the very least affect logging and could delay mail if the MTA
cannot verify DNS.)
I don't recommend using response rate limiting on recursive servers.

The principle behind RRL is that auth servers are queried by resolvers
with caches, and a correctly-functioning cache will not repeat the same
query very frequently, so it is reasonable to apply a rate limit on the
auth servers.

Recursive servers, on the other hand, are often queried by stub resolvers
without caches. The query rate is then entirely driven by the application
workload, and you can't apply a rate limit on the recursive server without
causing serious trouble for the application.

It can be especially bad because traditional cacheless stub resolvers are
not good at error recovery, and when RRL hits, their retry strategy is
likely to increase the query rate observed by the server, making things
worse.
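A toy model shows why rate limiting a cacheless client backfires. It assumes every response over the limit is dropped, the stub retries each dropped query a fixed number of times, and the retries are themselves over the limit. Real stub retry logic and real RRL are both more complicated, so treat the numbers as illustrative only.

```python
def offered_load(app_qps: float, limit_qps: float, retries_per_drop: int) -> float:
    """Toy model: queries per second a rate-limited server sees from a
    cacheless stub whose application generates `app_qps` real queries."""
    dropped = max(0.0, app_qps - limit_qps)
    return float(app_qps + dropped * retries_per_drop)


print(offered_load(5, 10, 2))   # 5.0   (under the limit: no retries)
print(offered_load(50, 10, 2))  # 130.0 (50 real queries become 130 sent)
```

Once the limit bites, the server sees more traffic than the application actually wants, which is the opposite of what the limit was for.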

If you are running an oldskool multi-purpose server that is recursive for
its own daemons but authoritative for others, then you can use the
`rate-limit { exempt-clients }` option so that RRL doesn't apply to
recursive clients. But I wouldn't recommend a setup like this. (My auth
servers have their /etc/resolv.conf pointing at my recursive service.)
Post by @lbutlr
Do these settings also need to be applied to the secondary servers?
The settings I described are for public authoritative servers, i.e. ones
that appear in NS records. These servers can be primary or secondary (but
are usually secondary).

Secondary servers don't necessarily appear in NS records, and if they
don't they are less likely to be exposed to this kind of attack.

Tony.
--
f.anthony.n.finch <***@dotat.at> http://dotat.at/
Southeast Iceland: Westerly or southwesterly, 3 to 5, becoming variable 3 or
less later in north. Moderate. Showers. Good.
Michael De Roover
2020-07-07 18:06:29 UTC
Post by Tony Finch
An auth-only server can also be used for amplification attacks that use
its authoritative zones - these attacks don't have to use recursion.
There are a few ways to mitigate auth-only amplification attacks.
Response rate limiting is very effective. Start off by putting the
following in your options{} section, and look in the BIND ARM for other
directives you can put in the rate-limit{} section.
rate-limit {
responses-per-second 10;
};
That's a really useful option to have, I didn't know about this yet. It
seems like that could take care of the brunt of amplification attacks
already. Definitely going to add this in, thanks!
Post by Tony Finch
Set a maximum UDP packet size, to suppress fragmented packets. The DNS
flag day 2020 campaign will make this a standard setting. For a long time
max-udp-size 1420;
https://dnsflagday.net/2020/
A downside of small UDP responses is more truncated packets and more
queries over TCP, but there are still more ways to reduce response size
which also reduce truncation.
Interesting, I wasn't aware of this campaign. I don't know if I'm
knowledgeable enough on UDP to be able to make educated decisions on
this myself but I look forward to its eventual release.
Post by Tony Finch
Reduce the size of responses to ANY queries, which are a favourite tool of
amplification attacks. There's basically no downside to this one, in my
opinion, but I'm biased because I implemented it.
minimal-any yes;
I've heard of these ANY queries being preferred for amplification
attacks as well, since the responses are often so large... I don't think
that there would be any downsides to this either, in fact I've never
actually seen a legitimate application use it... Probably best to lock
down indeed.
Post by Tony Finch
You can also reduce the size of other answers. In theory this option might
force resolvers to make more queries to get records that by default would
appear in the additional section, but I think in practice resolvers make
these queries anyway because of RFC 2181 trustworthiness logic, and
because applications (such as SMTP servers) find it easier to query
minimal-responses yes;
Hmm, for the authoritative name servers this might be a good idea yeah..
Those are authoritative only (i.e. `recursion no`). So for clients
querying those, the NS records served in the additional section at least
should already be known to the client anyway... I mean that's why
they're there to begin with, so they must already know that information
from the DNS servers higher up the chain. And another query if needed,
saves traffic either way I suppose.

Thanks a lot for the detailed reply, I really appreciate it :)
--
Met vriendelijke groet / Best regards,
Michael De Roover
Brett Delmage
2020-07-07 18:21:13 UTC
Post by Tony Finch
Reduce the size of responses to ANY queries, which are a favourite tool of
amplification attacks. There's basically no downside to this one, in my
opinion, but I'm biased because I implemented it.
minimal-any yes;
Why only reduce and not eliminate?

Can ANY responses be disabled completely with an option?

This article at cloudflare
https://blog.cloudflare.com/deprecating-dns-any-meta-query-type/
states that they have deprecated it because it wasn't being used. They
should know! This was posted over 5 years ago, in 2015.

Brett
Shumon Huque
2020-07-07 18:31:26 UTC
Post by Brett Delmage
Post by Tony Finch
Reduce the size of responses to ANY queries, which are a favourite tool of
amplification attacks. There's basically no downside to this one, in my
opinion, but I'm biased because I implemented it.
minimal-any yes;
Why only reduce and not eliminate?
Can ANY responses be disabled completely with an option?
This article at cloudflare
https://blog.cloudflare.com/deprecating-dns-any-meta-query-type/
states that they have deprecated it because it wasn't being used. They
should know! This was posted over 5 years ago, in 2015.
Cloudflare themselves now implement the "minimal any" behavior described
in this spec:

https://tools.ietf.org/html/rfc8482

Responding to ANY with NOTIMP, REFUSED, or unknown RCODEs, or not
responding at all, results in undesirable follow-on behaviour from DNS
resolvers (mostly aggressive retries).

Shumon.

---
$ dig @ns1.cloudflare.com. cloudflare.com. ANY

;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 54526
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;cloudflare.com. IN ANY

;; ANSWER SECTION:
cloudflare.com. 3789 IN HINFO "RFC8482" ""
Brett Delmage
2020-07-07 18:42:12 UTC
Post by Shumon Huque
Cloudflare themselves now implement the "minimal any" behavior described
    https://tools.ietf.org/html/rfc8482
cloudflare.com.         3789    IN      HINFO   "RFC8482" ""
Gee, that's a pretty minimal answer! Thanks.
@lbutlr
2020-07-07 20:05:53 UTC
Post by Tony Finch
max-udp-size 1420;
https://dnsflagday.net/2020/
Interesting, I wasn't aware of this campaign. I don't know if I'm knowledgeable enough on UDP to be able to make educated decisions on this myself but I look forward to its eventual release.
The URL has a good explanation of this setting and it looks like 1420 is a more than adequate packet size.

From the page:
An EDNS buffer size of 1232 bytes will avoid fragmentation on nearly all current networks. This is based on an MTU of 1280, which is required by the IPv6 specification, minus 48 bytes for the IPv6 and UDP headers.

Since 1420 is still well under the MTU on most connections (usually 1500, sometimes 1492) and well above the required minimum, I suspect this is fine as well. I've gone ahead and added it to my named.conf with a comment linking to Tony's message.
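The flag day arithmetic generalises to any path MTU: subtract the IP header and the 8-byte UDP header. The sketch assumes no IPv6 extension headers, no IPv4 options and no tunnel overhead, which real paths may add.

```python
IPV6_HEADER = 40   # fixed IPv6 header, no extension headers
IPV4_HEADER = 20   # IPv4 header without options
UDP_HEADER = 8


def max_dns_payload(mtu: int, ip_header: int) -> int:
    """Largest DNS message that fits in one unfragmented UDP datagram."""
    return mtu - ip_header - UDP_HEADER


print(max_dns_payload(1280, IPV6_HEADER))  # 1232: minimum IPv6 MTU, the flag day value
print(max_dns_payload(1492, IPV6_HEADER))  # 1444: e.g. PPPoE links
print(max_dns_payload(1500, IPV6_HEADER))  # 1452: plain Ethernet, IPv6
print(max_dns_payload(1500, IPV4_HEADER))  # 1472: plain Ethernet, IPv4
```

So 1420 fits on the 1492 and 1500 paths with 24 to 32 bytes of headroom for IPv6, but a path with extra encapsulation closer to the 1280 minimum could still fragment it. That is the trade-off between 1420 and the more conservative 1232.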
--
"Are you pondering what I'm pondering?"
"I think so, Mr. Brain, but if the sun'll come out tomorrow, what's
it doing right now?"
Tony Finch
2020-07-07 20:31:04 UTC
Post by Brett Delmage
Post by Tony Finch
minimal-any yes;
Why only reduce and not eliminate?
The reason is a bit subtle. If an ANY query comes via a recursive
resolver, it is much better to give the resolver an answer so that it will
put an entry in its cache. The cache entry will stop more ANY queries from
being sent from the resolver to the upstream auth server, as long as its
TTL lasts.

If the auth server does not answer, or sends a REFUSED error, the resolver
is likely to retry, which increases worthless traffic rather than
suppressing it, and the resolver may decide the auth server is lame which
will cause knock-on problems for legitimate queries.

There are some scenarios where reflection attacks go through multiple
servers. If you can get cache entries into those servers then the
attack traffic gets suppressed closer to its source. There have been quite
a lot of attacks that work like this:

* an ISP has a huge number of customers with crappy home routers, that
can act as open recursive resolvers

* an arsehole decides to use these crappy home routers in a reflection /
amplification DDoS attack

* the crappy home routers forward the attack queries to their ISP's
recursive servers; these recursive servers are legitimate and well
configured but suffer from bad client devices

* the recursive servers resolve the queries against some third party
authoritative servers

If the recursive servers cache the responses, then the auth servers should
not be much affected by the attack: most of the traffic is answered from the
ISP caches, and maybe the home router caches if they have them.

But if the auth servers don't answer, or send REFUSED errors, then the
recursive servers are going to keep retrying queries, and thereby relay a
very large proportion of the attack traffic to the auth servers. Sadness
will follow.
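A toy model of one ISP resolver shows the scale of the difference. It assumes the resolver caches a real answer for its full TTL, caches nothing when the auth server refuses or stays silent, and retries each failed query a fixed number of times. The attack duration, query rate and TTL below are hypothetical.

```python
import math


def auth_queries_from_resolver(duration_s: float, client_qps: float, ttl_s: float,
                               answered: bool, retries_per_query: int = 2) -> int:
    """Toy model: queries one resolver sends to a third-party auth server while
    abused clients keep repeating the same ANY query."""
    if answered:
        # One upstream query per TTL period; the rest come from the cache.
        return math.ceil(duration_s / ttl_s)
    # No cacheable answer: every client query goes upstream, plus retries.
    return int(duration_s * client_qps * (1 + retries_per_query))


# Hypothetical: a 10-minute attack, 1000 queries/s from home routers, TTL 300 s.
print(auth_queries_from_resolver(600, 1000, 300, answered=True))   # 2
print(auth_queries_from_resolver(600, 1000, 300, answered=False))  # 1800000
```

Multiply the second figure by the number of ISP resolvers involved and the TCP capacity problem described below follows naturally.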

Note that RRL does not help in this scenario, because from the auth
server's point of view the ISP resolvers are legitimate clients, which RRL
can observe from their retry behaviour. RRL is designed for attacks where
the spoofed queries go direct to the auth server, which is not happening
in this case.

When this happened to us (when my servers were the third party auth
servers) the DDoS attack was hitting a very large number of ISPs, so our
auth servers were getting ANY queries via huge numbers of recursive
servers. Extra unfortunately, the ANY response was too big to fit in UDP,
so all the resolvers were trying to query over TCP. And our auth servers
did not have enough TCP capacity to handle the load. Much sadness. (It
didn't take us offline because our off-site auth servers were differently
configured and able to keep answering.)

So I implemented minimal-any to stop it from happening again.

Tony.
--
f.anthony.n.finch <***@dotat.at> http://dotat.at/
Fisher, German Bight: Westerly veering northwesterly 4 to 6, decreasing 3
later in south German Bight. Moderate, occasionally rough at first. Mainly
fair. Mainly good.
Brett Delmage
2020-07-07 22:17:33 UTC
Post by Tony Finch
Post by Brett Delmage
Post by Tony Finch
minimal-any yes;
Why only reduce and not eliminate?
The reason is a bit subtle. If an ANY query comes via a recursive
resolver, it is much better to give the resolver an answer so that it will
put an entry in its cache...
This is a very interesting and clear explanation. Thanks for taking the
time to share this Tony. TIL :-)

Brett
