Description
Describe the bug
When unbound's cache contains nameserver address records only for an unreachable address family (AF), it does not recover by fetching the glue records from the parent zone. The example below uses IPv4 as the reachable AF.
To reproduce
Steps to reproduce the behavior:
1:
docker run --rm -it debian:latest bash
apt-get update && apt-get -y install dnsutils iproute2 unbound
/etc/init.d/unbound start
host www.cloudflare.net 127.0.0.1
for RNAME in ns{1..5}.cloudflare.net; do unbound-control flush_type $RNAME A; done
unbound-control flush www.cloudflare.net
host www.cloudflare.net 127.0.0.1
(at time of writing, nameservers are ns1-5)
2:
unbound-control set_option do-ip6 no
host www.cloudflare.net 127.0.0.1
3:
unbound-control flush cloudflare.net
host www.cloudflare.net 127.0.0.1
Expected behavior
After steps 1 and 2, unbound returns a SERVFAIL. I would have expected it to recover by looking up A records for the nameservers.
After step 3, unbound returns correct information as expected.
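To compare the result after each step without reading the output by eye, the `host` output can be classified mechanically. This is only a rough sketch: `classify_host_output` is a hypothetical helper name, and it assumes `host` prints either a `(SERVFAIL)` line or `has address` / `has IPv6 address` lines, as in the output quoted in this report.

```shell
#!/bin/sh
# Hypothetical helper (not part of unbound or dnsutils):
# reads the output of `host NAME RESOLVER` on stdin and prints one word.
classify_host_output() {
    out=$(cat)
    case "$out" in
        # Check for failure first, so a partial answer still counts as a failure.
        *"(SERVFAIL)"*) echo "SERVFAIL" ;;
        *"has address"*|*"has IPv6 address"*) echo "resolved" ;;
        *) echo "other" ;;
    esac
}
```

Usage after any of the steps: `host www.cloudflare.net 127.0.0.1 2>&1 | classify_host_output`. With the behaviour described here it would print `SERVFAIL` after steps 1 and 2, and `resolved` after step 3.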
System:
- Unbound version: 1.17.1, debian package 1.17.1-2+deb12u3
- OS: debian 12.12
- unbound -V output:
Version 1.17.1
Configure line: --build=x86_64-linux-gnu --prefix=/usr --includedir=${prefix}/include --mandir=${prefix}/share/man --infodir=${prefix}/share/info --sysconfdir=/etc --localstatedir=/var --disable-option-checking --disable-silent-rules --libdir=${prefix}/lib/x86_64-linux-gnu --runstatedir=/run --disable-maintainer-mode --disable-dependency-tracking --with-pythonmodule --with-pyunbound --enable-subnet --enable-dnstap --enable-systemd --with-libnghttp2 --with-chroot-dir= --with-dnstap-socket-path=/run/dnstap.sock --disable-rpath --with-pidfile=/run/unbound.pid --with-libevent --enable-tfo-client --with-rootkey-file=/usr/share/dns/root.key --enable-tfo-server
Linked libs: libevent 2.1.12-stable (it uses epoll), OpenSSL 3.0.17 1 Jul 2025
Linked modules: dns64 python subnetcache respip validator iterator
TCP Fastopen feature available
BSD licensed, see LICENSE in source package for details.
Report bugs to unbound-bugs@nlnetlabs.nl or https://github.com/NLnetLabs/unbound/issues
Additional information
This somewhat contrived example was encountered in the wild on our production IPv4-only AWS environment. On these servers, there is nothing that explicitly removes records from unbound's cache. This is happening across all instances; a notable feature of these instances is that they are doing a lot of lookups (in general, instance polling, not eyeball serving). Debugging led to the discovery of the above situation.
In-the-wild observations were:
- unbound returning SERVFAIL for perfectly sensible queries:
$ host v0n1.nic.ai.
Host v0n1.nic.ai not found: 2(SERVFAIL)
- with debugging on, the following output:
Sep 22 20:29:29 app-017414dc42502e76a unbound[3727]: [3727:0] debug: sending to target: <ai.> 2001:500:a2::1#53
Sep 22 20:29:29 app-017414dc42502e76a unbound[3727]: [3727:0] info: skipping target due to dependency cycle (harden-glue: no may fix some of the cycles) v2n1.nic.ai. A IN
Sep 22 20:29:29 app-017414dc42502e76a unbound[3727]: [3727:0] info: skipping target due to dependency cycle (harden-glue: no may fix some of the cycles) v0n1.nic.ai. A IN
Sep 22 20:29:29 app-017414dc42502e76a unbound[3727]: [3727:0] info: skipping target due to dependency cycle (harden-glue: no may fix some of the cycles) v2n1.nic.ai. A IN
Sep 22 20:29:29 app-017414dc42502e76a unbound[3727]: [3727:0] info: skipping target due to dependency cycle (harden-glue: no may fix some of the cycles) v0n2.nic.ai. A IN
Sep 22 20:29:29 app-017414dc42502e76a unbound[3727]: [3727:0] info: skipping target due to dependency cycle (harden-glue: no may fix some of the cycles) v0n1.nic.ai. A IN
Sep 22 20:29:29 app-017414dc42502e76a unbound[3727]: [3727:0] info: skipping target due to dependency cycle (harden-glue: no may fix some of the cycles) v0n0.nic.ai. A IN
Sep 22 20:29:29 app-017414dc42502e76a unbound[3727]: [3727:0] error: SERVFAIL <v0n1.nic.ai. A IN>: misc failure
- the following cache contents:
app-017414dc42502e76a$ sudo unbound-control dump_cache | grep -E '^[vn01234]+.nic.ai.'
v2n0.nic.ai. 39725 IN AAAA 2001:500:a4::1
v2n1.nic.ai. 62089 IN AAAA 2001:500:a5::1
v2n1.nic.ai. 62089 IN A 199.115.157.1
v0n3.nic.ai. 39725 IN AAAA 2001:500:a3::1
v0n1.nic.ai. 39725 IN AAAA 2001:500:a1::1
v0n0.nic.ai. 39725 IN AAAA 2001:500:a0::1
v0n2.nic.ai. 39725 IN AAAA 2001:500:a2::1
(despite an A record for v2n1 being in the cache, by luck it was not chosen for communication)
- another exemplar was the hostname api.intacct.com - this service communicates with that hostname and was throwing resolution errors. Investigation showed:
app-06b168c1985a53b48: ~ $ host api.intacct.com
Host api.intacct.com not found: 2(SERVFAIL)
app-06b168c1985a53b48: ~ $ host -t CNAME api.intacct.com
api.intacct.com is an alias for api.bal.intacct.com.
app-06b168c1985a53b48: ~ $ host -t CNAME api.bal.intacct.com.
api.bal.intacct.com is an alias for api.intacct.com.cdn.cloudflare.net.
app-06b168c1985a53b48: ~ $ host api.intacct.com.cdn.cloudflare.net.
Host api.intacct.com.cdn.cloudflare.net not found: 2(SERVFAIL)
and looking at the cloudflare.net nameservers revealed again the lack of IPv4 records:
app-06b168c1985a53b48: ~ $ sudo unbound-control dump_cache | grep cloudflare.net
ns1.cloudflare.net. 82255 IN AAAA 2606:4700:57:1:4e4b:27d7:a29f:3cfa
ns1.cloudflare.net. 82255 IN AAAA 2803:f800:52:1:1df9:cfa1:ac40:28fa
ns1.cloudflare.net. 82255 IN AAAA 2a06:98c1:56:1:2693:f8af:6ca2:c6fa
ns3.cloudflare.net. 82255 IN AAAA 2606:4700:57:1:cba:4d93:a29f:3cfc
ns3.cloudflare.net. 82255 IN AAAA 2803:f800:52:1:b216:ccbe:ac40:28fc
ns3.cloudflare.net. 82255 IN AAAA 2a06:98c1:56:1:2cfa:9492:6ca2:c6fc
ns4.cloudflare.net. 82255 IN AAAA 2606:4700:57:1:fd14:b411:a29f:3cfd
ns4.cloudflare.net. 82255 IN AAAA 2803:f800:52:1:e69e:46f0:ac40:28fd
ns4.cloudflare.net. 82255 IN AAAA 2a06:98c1:56:1:45a3:de4a:6ca2:c6fd
ns5.cloudflare.net. 82255 IN AAAA 2606:4700:57:1:cb7b:a51d:a29f:3cfe
ns5.cloudflare.net. 82255 IN AAAA 2803:f800:52:1:d48b:2cd:ac40:28fe
ns5.cloudflare.net. 82255 IN AAAA 2a06:98c1:56:1:3b6c:6602:6ca2:c6fe
cloudflare.net. 82255 IN NS ns1.cloudflare.net.
cloudflare.net. 82255 IN NS ns2.cloudflare.net.
cloudflare.net. 82255 IN NS ns3.cloudflare.net.
cloudflare.net. 82255 IN NS ns4.cloudflare.net.
cloudflare.net. 82255 IN NS ns5.cloudflare.net.
api.bal.intacct.com. 27 IN CNAME api.intacct.com.cdn.cloudflare.net.
cloudflare.net. 82255 IN DS 2371 13 2 90F710A107DA51ED78125D30A68704CF3C0308AFD01BFCD7057D4BD03B62C68B
cloudflare.net. 82255 IN RRSIG DS 13 2 86400 20250926030816 20250919015816 33296 net. FRUmszXtDMcaSYNZcvKgO9NBZHALgaU9rMsz1x0WFsjghFo4nWQuGxE0yQ9TS6hac/Vu9hytVfugNdnDSMQYcg== ;{id = 33296}
ns2.cloudflare.net. 82255 IN AAAA 2606:4700:57:1:e6ea:b99c:a29f:3cfb
ns2.cloudflare.net. 82255 IN AAAA 2803:f800:52:1:4da:414e:ac40:28fb
ns2.cloudflare.net. 82255 IN AAAA 2a06:98c1:56:1:ca7d:4c29:6ca2:c6fb
- clearing all zone records made lookups function normally:
app-06b168c1985a53b48: ~ $ host api.intacct.com
Host api.intacct.com not found: 2(SERVFAIL)
app-06b168c1985a53b48: ~ $ sudo unbound-control flush_zone cloudflare.net
ok removed 7 rrsets, 0 messages and 0 key entries
app-06b168c1985a53b48: ~ $ host api.intacct.com
api.intacct.com is an alias for api.bal.intacct.com.
api.bal.intacct.com is an alias for api.intacct.com.cdn.cloudflare.net.
api.intacct.com.cdn.cloudflare.net has address 104.16.254.232
api.intacct.com.cdn.cloudflare.net has address 104.16.255.232
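The cache state seen in these observations (nameserver names with AAAA records but no A record) can be searched for across the whole cache rather than one zone at a time. The following is a minimal sketch, not a fix: `v6_only_nameservers` is a hypothetical helper name, and it assumes the `name TTL IN TYPE rdata` record layout shown in the `dump_cache` excerpts in this report.

```shell
#!/bin/sh
# Hypothetical helper (not part of unbound): reads `unbound-control dump_cache`
# output on stdin and prints every NS target that has AAAA records cached
# but no A record, one name per line, sorted.
v6_only_nameservers() {
    awk '
        # Only consider resource-record lines: numeric TTL in field 2, class IN.
        $2 ~ /^[0-9]+$/ && $3 == "IN" && $4 == "NS"   { ns[$5] = 1 }
        $2 ~ /^[0-9]+$/ && $3 == "IN" && $4 == "A"    { has_a[$1] = 1 }
        $2 ~ /^[0-9]+$/ && $3 == "IN" && $4 == "AAAA" { has_aaaa[$1] = 1 }
        END {
            for (name in ns)
                if ((name in has_aaaa) && !(name in has_a))
                    print name
        }
    ' | sort
}
```

Usage: `sudo unbound-control dump_cache | v6_only_nameservers`. On an IPv4-only host, any name it prints is a nameserver that unbound can currently reach only over the unreachable address family. Note that it only reports names that also appear as NS targets in the cache, so it would not flag the nic.ai hosts from a dump filtered the way the excerpt above was.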
The version of unbound on our Debian 12.12 servers is:
Version 1.23.1
Configure line: --build=x86_64-linux-gnu --prefix=/usr --includedir=${prefix}/include --mandir=${prefix}/share/man --infodir=${prefix}/share/info --sysconfdir=/etc --localstatedir=/var --disable-option-checking --disable-silent-rules --libdir=${prefix}/lib/x86_64-linux-gnu --runstatedir=/run --disable-maintainer-mode --disable-dependency-tracking --with-pythonmodule --with-pyunbound --enable-subnet --enable-dnstap --enable-systemd --with-libnghttp2 --with-chroot-dir= --with-dnstap-socket-path=/run/dnstap.sock --disable-rpath --with-pidfile=/run/unbound.pid --with-libevent --enable-tfo-client --with-rootkey-file=/usr/share/dns/root.key --enable-tfo-server
Linked libs: libevent 2.1.12-stable (it uses epoll), OpenSSL 3.0.17 1 Jul 2025
Linked modules: dns64 python subnetcache respip validator iterator
TCP Fastopen feature available
BSD licensed, see LICENSE in source package for details.
Report bugs to unbound-bugs@nlnetlabs.nl or https://github.com/NLnetLabs/unbound/issues
and our configuration there is:
server:
    cache-max-negative-ttl: 5
    do-not-query-localhost: no
    edns-buffer-size: 1280
    log-servfail: yes
    prefetch: yes
    val-bogus-ttl: 1
    val-log-level: 2
    interface: ::1
    interface: 169.254.0.53
    access-control: 0.0.0.0/0 allow_snoop
remote-control:
    control-enable: yes
    control-interface: /run/unbound.ctl
include-toplevel: /etc/unbound/zones.conf.d/*.conf