unable to recover v4 glue when cache only has v6 records for nameservers but v6 is not usable #1348

@Supermathie

Describe the bug
When unbound's cache contains nameserver address records only for an unreachable address family (AF), it cannot recover by fetching the glue records from the parent zone. The example below uses IPv4 as the reachable AF, so the cache holds only AAAA records for the nameservers.

To reproduce
Steps to reproduce the behavior:

1:

docker run --rm -it debian:latest bash
apt-get update && apt-get -y install dnsutils iproute2 unbound
/etc/init.d/unbound start
host www.cloudflare.net 127.0.0.1
for RNAME in ns{1..5}.cloudflare.net; do unbound-control flush_type $RNAME A; done
unbound-control flush www.cloudflare.net
host www.cloudflare.net 127.0.0.1

(at time of writing, nameservers are ns1-5)

2:

unbound-control set_option do-ip6 no
host www.cloudflare.net 127.0.0.1

3:

unbound-control flush cloudflare.net
host www.cloudflare.net 127.0.0.1

Expected behavior
After steps 1 and 2, unbound returns SERVFAIL. I would have expected it to recover by looking up A records for the nameservers.

After step 3, unbound returns correct information, as expected.
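
To make the outcome of each step easier to compare, the output of `host` can be reduced to a single status word. This is only a minimal sketch: the function name `classify_host_output` is made up for illustration, and it assumes the usual `host` output wording ("has address", "has IPv6 address", "SERVFAIL") as seen elsewhere in this report.

```shell
#!/bin/sh
# Hypothetical helper (not part of unbound or bind-utils).
# Reads `host` output on stdin and prints one of:
#   SERVFAIL  - any line mentions SERVFAIL (takes precedence)
#   OK        - at least one address line was returned
#   OTHER     - neither of the above (e.g. only CNAME lines, NXDOMAIN)
classify_host_output() {
  awk '
    /SERVFAIL/               { r = "SERVFAIL" }
    / has (IPv6 )?address /  { if (r == "") r = "OK" }
    END                      { print (r == "" ? "OTHER" : r) }
  '
}
```

Usage, run after each of the steps above: `host www.cloudflare.net 127.0.0.1 | classify_host_output`.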

System:

  • Unbound version: 1.17.1, debian package 1.17.1-2+deb12u3
  • OS: debian 12.12
  • unbound -V output:
    Version 1.17.1
    
    Configure line: --build=x86_64-linux-gnu --prefix=/usr --includedir=${prefix}/include --mandir=${prefix}/share/man --infodir=${prefix}/share/info --sysconfdir=/etc --localstatedir=/var --disable-option-checking --disable-silent-rules --libdir=${prefix}/lib/x86_64-linux-gnu --runstatedir=/run --disable-maintainer-mode --disable-dependency-tracking --with-pythonmodule --with-pyunbound --enable-subnet --enable-dnstap --enable-systemd --with-libnghttp2 --with-chroot-dir= --with-dnstap-socket-path=/run/dnstap.sock --disable-rpath --with-pidfile=/run/unbound.pid --with-libevent --enable-tfo-client --with-rootkey-file=/usr/share/dns/root.key --enable-tfo-server
    Linked libs: libevent 2.1.12-stable (it uses epoll), OpenSSL 3.0.17 1 Jul 2025
    Linked modules: dns64 python subnetcache respip validator iterator
    TCP Fastopen feature available
    
    BSD licensed, see LICENSE in source package for details.
    Report bugs to unbound-bugs@nlnetlabs.nl or https://github.com/NLnetLabs/unbound/issues
    

Additional information
The reproduction above is somewhat contrived, but the underlying problem was encountered in the wild in our production IPv4-only AWS environment. Nothing on these servers explicitly removes records from unbound's cache. The problem occurs on all instances; a notable feature of these instances is that they perform a lot of lookups (mostly instance polling rather than serving end-user traffic). Debugging led to the discovery of the situation described above.

In-the-wild observations were:

  • unbound returning SERVFAIL for perfectly sensible queries:

    $ host v0n1.nic.ai.
    Host v0n1.nic.ai not found: 2(SERVFAIL)
    
  • with debugging on, the following output:

    Sep 22 20:29:29 app-017414dc42502e76a unbound[3727]: [3727:0] debug: sending to target: <ai.> 2001:500:a2::1#53
    
    Sep 22 20:29:29 app-017414dc42502e76a unbound[3727]: [3727:0] info: skipping target due to dependency cycle (harden-glue: no may fix some of the cycles) v2n1.nic.ai. A IN
    Sep 22 20:29:29 app-017414dc42502e76a unbound[3727]: [3727:0] info: skipping target due to dependency cycle (harden-glue: no may fix some of the cycles) v0n1.nic.ai. A IN
    Sep 22 20:29:29 app-017414dc42502e76a unbound[3727]: [3727:0] info: skipping target due to dependency cycle (harden-glue: no may fix some of the cycles) v2n1.nic.ai. A IN
    Sep 22 20:29:29 app-017414dc42502e76a unbound[3727]: [3727:0] info: skipping target due to dependency cycle (harden-glue: no may fix some of the cycles) v0n2.nic.ai. A IN
    Sep 22 20:29:29 app-017414dc42502e76a unbound[3727]: [3727:0] info: skipping target due to dependency cycle (harden-glue: no may fix some of the cycles) v0n1.nic.ai. A IN
    Sep 22 20:29:29 app-017414dc42502e76a unbound[3727]: [3727:0] info: skipping target due to dependency cycle (harden-glue: no may fix some of the cycles) v0n0.nic.ai. A IN
    Sep 22 20:29:29 app-017414dc42502e76a unbound[3727]: [3727:0] error: SERVFAIL <v0n1.nic.ai. A IN>: misc failure
    
  • the following cache contents:

    app-017414dc42502e76a$ sudo unbound-control dump_cache | grep -E '^[vn01234]+.nic.ai.'
    v2n0.nic.ai.    39725   IN      AAAA    2001:500:a4::1
    v2n1.nic.ai.    62089   IN      AAAA    2001:500:a5::1
    v2n1.nic.ai.    62089   IN      A       199.115.157.1
    v0n3.nic.ai.    39725   IN      AAAA    2001:500:a3::1
    v0n1.nic.ai.    39725   IN      AAAA    2001:500:a1::1
    v0n0.nic.ai.    39725   IN      AAAA    2001:500:a0::1
    v0n2.nic.ai.    39725   IN      AAAA    2001:500:a2::1
    

    (despite an A record for v2n1 being in the cache, by luck it was not chosen for communication)

  • another example was the hostname api.intacct.com: one of our services communicates with that hostname and was throwing resolution errors. Investigation showed:

    app-06b168c1985a53b48: ~ $ host api.intacct.com
    Host api.intacct.com not found: 2(SERVFAIL)
    
    app-06b168c1985a53b48: ~ $ host -t CNAME api.intacct.com
    api.intacct.com is an alias for api.bal.intacct.com.
    
    app-06b168c1985a53b48: ~ $ host -t CNAME api.bal.intacct.com.
    api.bal.intacct.com is an alias for api.intacct.com.cdn.cloudflare.net.
    
    app-06b168c1985a53b48: ~ $ host api.intacct.com.cdn.cloudflare.net.
    Host api.intacct.com.cdn.cloudflare.net not found: 2(SERVFAIL)
    

    Looking at the cached records for the cloudflare.net nameservers again revealed the lack of IPv4 (A) records:

    app-06b168c1985a53b48: ~ $ sudo unbound-control dump_cache | grep cloudflare.net
    ns1.cloudflare.net.	82255	IN	AAAA	2606:4700:57:1:4e4b:27d7:a29f:3cfa
    ns1.cloudflare.net.	82255	IN	AAAA	2803:f800:52:1:1df9:cfa1:ac40:28fa
    ns1.cloudflare.net.	82255	IN	AAAA	2a06:98c1:56:1:2693:f8af:6ca2:c6fa
    ns3.cloudflare.net.	82255	IN	AAAA	2606:4700:57:1:cba:4d93:a29f:3cfc
    ns3.cloudflare.net.	82255	IN	AAAA	2803:f800:52:1:b216:ccbe:ac40:28fc
    ns3.cloudflare.net.	82255	IN	AAAA	2a06:98c1:56:1:2cfa:9492:6ca2:c6fc
    ns4.cloudflare.net.	82255	IN	AAAA	2606:4700:57:1:fd14:b411:a29f:3cfd
    ns4.cloudflare.net.	82255	IN	AAAA	2803:f800:52:1:e69e:46f0:ac40:28fd
    ns4.cloudflare.net.	82255	IN	AAAA	2a06:98c1:56:1:45a3:de4a:6ca2:c6fd
    ns5.cloudflare.net.	82255	IN	AAAA	2606:4700:57:1:cb7b:a51d:a29f:3cfe
    ns5.cloudflare.net.	82255	IN	AAAA	2803:f800:52:1:d48b:2cd:ac40:28fe
    ns5.cloudflare.net.	82255	IN	AAAA	2a06:98c1:56:1:3b6c:6602:6ca2:c6fe
    cloudflare.net.	82255	IN	NS	ns1.cloudflare.net.
    cloudflare.net.	82255	IN	NS	ns2.cloudflare.net.
    cloudflare.net.	82255	IN	NS	ns3.cloudflare.net.
    cloudflare.net.	82255	IN	NS	ns4.cloudflare.net.
    cloudflare.net.	82255	IN	NS	ns5.cloudflare.net.
    api.bal.intacct.com.	27	IN	CNAME	api.intacct.com.cdn.cloudflare.net.
    cloudflare.net.	82255	IN	DS	2371 13 2 90F710A107DA51ED78125D30A68704CF3C0308AFD01BFCD7057D4BD03B62C68B
    cloudflare.net.	82255	IN	RRSIG	DS 13 2 86400 20250926030816 20250919015816 33296 net. FRUmszXtDMcaSYNZcvKgO9NBZHALgaU9rMsz1x0WFsjghFo4nWQuGxE0yQ9TS6hac/Vu9hytVfugNdnDSMQYcg== ;{id = 33296}
    ns2.cloudflare.net.	82255	IN	AAAA	2606:4700:57:1:e6ea:b99c:a29f:3cfb
    ns2.cloudflare.net.	82255	IN	AAAA	2803:f800:52:1:4da:414e:ac40:28fb
    ns2.cloudflare.net.	82255	IN	AAAA	2a06:98c1:56:1:ca7d:4c29:6ca2:c6fb
    
  • clearing all zone records made lookups function normally:

    app-06b168c1985a53b48: ~ $ host api.intacct.com
    Host api.intacct.com not found: 2(SERVFAIL)
    
    app-06b168c1985a53b48: ~ $ sudo unbound-control flush_zone cloudflare.net
    ok removed 7 rrsets, 0 messages and 0 key entries
    
    app-06b168c1985a53b48: ~ $ host api.intacct.com
    api.intacct.com is an alias for api.bal.intacct.com.
    api.bal.intacct.com is an alias for api.intacct.com.cdn.cloudflare.net.
    api.intacct.com.cdn.cloudflare.net has address 104.16.254.232
    api.intacct.com.cdn.cloudflare.net has address 104.16.255.232
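
The cache inspection above was done by eye with `grep`. As a rough sketch of how the same check could be automated, the function below reads `unbound-control dump_cache` output on stdin and prints each nameserver that has an AAAA rrset but no A rrset in the cache, together with the zone that delegates to it. The function name `v6_only_nameservers` is hypothetical, and the parsing assumes the `name TTL IN TYPE rdata` layout shown in the dumps above.

```shell
#!/bin/sh
# Hypothetical helper: find cached nameservers that only have AAAA records.
# Input:  `unbound-control dump_cache` output on stdin.
# Output: one "zone nameserver" pair per line, sorted.
v6_only_nameservers() {
  awk '
    # Only consider rrset lines: field 2 is a numeric TTL, field 3 is the class.
    # This skips "msg ..." lines and ";rrset ..." comment lines in the dump.
    $2 !~ /^[0-9]+$/ || $3 != "IN" { next }
    $4 == "NS"   { ns[$5] = $1 }        # NS target -> delegating zone (last seen wins)
    $4 == "A"    { has_a[$1] = 1 }
    $4 == "AAAA" { has_aaaa[$1] = 1 }
    END {
      for (name in ns)
        if ((name in has_aaaa) && !(name in has_a))
          print ns[name], name
    }
  ' | sort
}
```

Usage: `sudo unbound-control dump_cache | v6_only_nameservers`. On an IPv4-only host, any zone whose nameservers all appear in this list is a candidate for the `flush_zone` workaround shown above. A nameserver with neither A nor AAAA cached is not reported, since unbound has nothing stale to prefer in that case.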
    

The version of unbound on our Debian 12.12 servers is:

Version 1.23.1

Configure line: --build=x86_64-linux-gnu --prefix=/usr --includedir=${prefix}/include --mandir=${prefix}/share/man --infodir=${prefix}/share/info --sysconfdir=/etc --localstatedir=/var --disable-option-checking --disable-silent-rules --libdir=${prefix}/lib/x86_64-linux-gnu --runstatedir=/run --disable-maintainer-mode --disable-dependency-tracking --with-pythonmodule --with-pyunbound --enable-subnet --enable-dnstap --enable-systemd --with-libnghttp2 --with-chroot-dir= --with-dnstap-socket-path=/run/dnstap.sock --disable-rpath --with-pidfile=/run/unbound.pid --with-libevent --enable-tfo-client --with-rootkey-file=/usr/share/dns/root.key --enable-tfo-server
Linked libs: libevent 2.1.12-stable (it uses epoll), OpenSSL 3.0.17 1 Jul 2025
Linked modules: dns64 python subnetcache respip validator iterator
TCP Fastopen feature available

BSD licensed, see LICENSE in source package for details.
Report bugs to unbound-bugs@nlnetlabs.nl or https://github.com/NLnetLabs/unbound/issues

and our configuration there is:

server:
  cache-max-negative-ttl: 5
  do-not-query-localhost: no
  edns-buffer-size: 1280
  log-servfail: yes
  prefetch: yes
  val-bogus-ttl: 1
  val-log-level: 2

  interface: ::1
  interface: 169.254.0.53

  access-control: 0.0.0.0/0 allow_snoop

remote-control:
  control-enable: yes
  control-interface: /run/unbound.ctl

include-toplevel: /etc/unbound/zones.conf.d/*.conf
