Follow @Openwall on Twitter for new release announcements and other news
[<prev] [<thread-prev] [day] [month] [year] [list]
Message-ID: <87ik9wzumr.fsf@gmail.com>
Date: Sat, 11 Apr 2026 21:10:20 -0700
From: Collin Funk <collin.funk1@...il.com>
To: oss-security@...ts.openwall.com
Cc: Vahagn Vardanian <vahagn@...rays.io>,  Paul Eggert <eggert@...ucla.edu>
Subject: Re: GNU tar: listing/extraction desynchronization
 allows hidden file injection

Solar Designer <solar@...nwall.com> writes:

>> === BUG #03  (HIGH) ===
>>     Signed integer overflow in PAX v0.0/v0.1 sparse offset+numbytes
>> 
>> Files:   src/xheader.c:1426 (sparse_numbytes_decoder)
>>          src/xheader.c:1480 (sparse_map_decoder)
>> Impact:  Signed 64-bit integer overflow in offset+numbytes arithmetic.
>>          Undefined behavior that can corrupt internal state.
>> 
>> Description:
>> 
>> The PAX v1.0 decoder in sparse.c already checks:
>> 
>>   if (INT_ADD_OVERFLOW (sp.offset, u))
>>     ...
>> 
>> But the PAX v0.0 and v0.1 decoders in xheader.c do not.  A crafted
>> extended header with offset=9223372036854775800 and numbytes=100
>> passes individual range checks but overflows signed off_t when summed.
>> 
>> UBSan output:
>> 
>>   xheader.c:1473: runtime error: signed integer overflow:
>>     9223372036854775800 + 100 cannot be represented in type 'long int'
>> 
>> Proposed fix:
>> 
>> Add the same INT_ADD_OVERFLOW check after assigning numbytes in both
>> sparse_numbytes_decoder() and sparse_map_decoder(), and return early
>> with an error if the check fires.

Both of the functions they reference check that the value fits in off_t.
A commit from 16 years ago shows the relevant code [1], but even before
that overflows where checked.

>> === BUG #04  (LOW) ===
>>     FLEXNSIZEOF overflow in create_placeholder_file
>> 
>> File:    src/extract.c:1444
>> Impact:  Theoretical heap buffer overflow if link_name is near SIZE_MAX.
>> 
>> Description:
>> 
>> create_placeholder_file() computes:
>> 
>>   xmalloc (FLEXNSIZEOF (struct delayed_link, target,
>>                          strlen (current_stat_info.link_name) + 1));
>> 
>> If strlen(link_name) is near SIZE_MAX, adding 1 overflows size_t, and
>> FLEXNSIZEOF computes a small allocation size.  The subsequent strcpy
>> writes past the buffer.  PAX extended headers allow arbitrarily long
>> linkpath values, so the attacker controls the length.  In practice
>> this requires the system to have nearly SIZE_MAX bytes of memory
>> available, so it is mostly theoretical.
>> 
>> Proposed fix:
>> 
>>   size_t link_len = strlen (current_stat_info.link_name);
>>   if (SIZE_MAX - sizeof (struct delayed_link) <= link_len)
>>     xalloc_die ();

I think it is safe to assume that no one has SIZE_MAX bytes of memory.

>> === BUG #09  (MEDIUM) ===
>>     strcmp overread past the non-NUL-terminated magic field
>> 
>> File:    src/list.c:565, 632  (read_header, decode_header)
>> Impact:  Buffer overread past the 6-byte magic field boundary.
>>          Format misdetection with crafted headers.
>> 
>> Description:
>> 
>>   strcmp (h->magic, TMAGIC)                    // line 565
>>   strcmp (header->header.magic, TMAGIC)        // line 632
>> 
>> TMAGIC is "ustar\0" (6 bytes including NUL).  The magic field in the
>> header struct is exactly 6 bytes (char magic[6]).  If the field
>> contains "ustar\x01" (no NUL terminator), strcmp reads past byte 5
>> into the adjacent version[2] and uname[32] fields until it finds a NUL.
>> 
>> ASan (with tight heap allocator) reports:
>> 
>>   ERROR: AddressSanitizer: heap-buffer-overflow
>>     READ of size 1 ... in __strcmp_sse42
>> 
>> Proposed fix:
>> 
>>   memcmp (h->magic, TMAGIC, sizeof TMAGIC)

This is just copied from Paul's commit in 2025 [2].

>> === BUG #10  (MEDIUM) ===
>>     Incomplete destructor for delayed_link_table hash entries
>> 
>> File:    src/extract.c:1478  (create_placeholder_file)
>> Impact:  Memory leak on hash collision/replacement.
>> 
>> Description:
>> 
>>   hash_initialize (0, 0, dl_hash, dl_compare, free)
>> 
>> The destructor callback is just free(), which frees the top-level
>> struct delayed_link but not its sub-allocations: the sources linked
>> list, cntx_name, acls_a_ptr, acls_d_ptr, and xattr_map.  When a hash
>> collision causes an old entry to be evicted, these are leaked.
>> 
>> Proposed fix:
>> 
>> Write a proper destructor:
>> 
>>   static void
>>   free_delayed_link (void *entry)
>>   {
>>     struct delayed_link *p = entry;
>>     struct string_list *s, *next;
>>     for (s = p->sources; s; s = next)
>>       {
>>         next = s->next;
>>         free (s);
>>       }
>>     free (p->cntx_name);
>>     free (p->acls_a_ptr);
>>     free (p->acls_d_ptr);
>>     xattr_map_free (&p->xattr_map);
>>     free (p);
>>   }
>> 
>> And pass free_delayed_link instead of free to hash_initialize.

GNU tar never calls hash_free, and therefore will never call the free
function given to hash_initialize. It is explained in a comment and the
commit message [3]:

   if (false)
     {
       /* There is little point to freeing, as we are about to exit,
	  and freeing is more likely to cause than cure trouble.  */
       hash_free (delayed_link_table);
       delayed_link_table = NULL;
     }

>> === BUG #13  (LOW) ===
>>     Integer truncation in decode_record
>> 
>> File:    src/xheader.c:630  (decode_record)
>> Impact:  Implicit narrowing from ptrdiff_t to int; UBSan
>>          -fsanitize=implicit-conversion fires.
>> 
>> Description:
>> 
>>   int len_len = len_lim - p;    // line 630
>> 
>> len_lim - p is ptrdiff_t.  If the difference exceeds INT_MAX, the
>> implicit conversion to int wraps to a negative value.  The value is
>> only used to format an error message, so the practical impact is
>> limited to a garbled diagnostic, but UBSan rightfully flags it.
>> 
>> Proposed fix:
>> 
>>   int len_len = (int) (len_lim - p < 1000 ? len_lim - p : 1000);
>> 
>> This clamps the value to a reasonable range for the error message.

This proposal was just copied from Paul's commit in 2024 [4].

>> === SUMMARY TABLE ===
>> 
>>   #   Severity  File          Function / Line              Short description
>>   --  --------  -----------   ---------------------------  --------------------------
>>   01  CRITICAL  sparse.c:593  sparse_extract_file          Archive stream desync -> file injection
>>   14  CRITICAL  (same root cause as #01)                   Full RCE chain demo
>>   02  HIGH      sparse.c:803  oldgnu_add_sparse            Overlapping sparse regions -> corruption
>>   03  HIGH      xheader.c     sparse_numbytes/map_decoder  Missing INT_ADD_OVERFLOW
>>   04  LOW       extract.c:1444 create_placeholder_file     FLEXNSIZEOF size_t overflow
>>   05  MEDIUM    extract.c:566 delay_set_stat               Memory leak on reuse path
>>   06  MEDIUM    extract.c:996 apply_nonancestor_delayed..  Shallow xattr_map copy (UAF)
>>   07  MEDIUM    extract.c:985 apply_nonancestor_delayed..  Uninitialized tar_stat_info
>>   08  MEDIUM    extract.c:1898 apply_delayed_link          Uninitialized tar_stat_info
>>   09  MEDIUM    list.c:565    read_header / decode_header  strcmp past magic field
>>   10  MEDIUM    extract.c:1478 create_placeholder_file     Incomplete hash destructor
>>   11  LOW       extract.c:364 check_time                   Signed overflow in time diff
>>   12  LOW       extract.c:953 apply_nonancestor_delayed..  file_name_len=0 underflow
>>   13  LOW       xheader.c:630 decode_record                ptrdiff_t -> int truncation

I didn't look much at the others since I am not very familiar with tar.
Hopefully Paul can quickly tell if they are bogus or not.

>> === REPRODUCTION ===
>> 
>> All bugs were confirmed on tar 1.35 (commit 8e3c8fa17 from
>> https://git.savannah.gnu.org/cgit/tar.git).  Bugs #01, #02, and #14
>> are reproducible without sanitizers on any 64-bit Linux system.  The
>> remaining bugs require ASan, UBSan, or MSan to observe.

This commit hash does not exist, which does not inspire much confidence:

    $ git log 8e3c8fa17
    fatal: ambiguous argument '8e3c8fa17': unknown revision or path not in the working tree.
    Use '--' to separate paths from revisions, like this:
    'git <command> [<revision>...] -- [<file>...]'

Collin

[1] https://git.savannah.gnu.org/cgit/tar.git/commit/?id=a59c819beb4886ee43f16dfd80ec1151fda1abe6
[2] https://git.savannah.gnu.org/cgit/tar.git/commit/?id=c11084bcc2d7d9976570a12263b81d2488066115
[3] https://git.savannah.gnu.org/cgit/tar.git/commit/?id=258d1c44e5ee7c58b28bf0000e9d737df6081885
[4] https://git.savannah.gnu.org/cgit/tar.git/commit/?id=d1e72a536f26188230a147d948b9057714fd0b6b

Powered by blists - more mailing lists

Please check out the Open Source Software Security Wiki, which is counterpart to this mailing list.

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.