Date: Mon, 10 Jun 2024 14:47:04 +0300
From: Valery Ushakov <>
Subject: Re: Different results with regex.h between Musl and Libc

On Mon, Jun 10, 2024 at 05:38:36 +0000, Nigel Kukard wrote:

> Musl output (Alpine 3.20), musl-1.2.5-r1...
> The input '37' matches the pattern '^([0-9]*)?\.?([0-9]*)?$'
> Match 0: 37
> Match 1:
> Match 2: 37
> Glibc output (ArchLinux), glibc 2.39+r52+gf8e4623421-1...
> The input '37' matches the pattern '^([0-9]*)?\.?([0-9]*)?$'
> Match 0: 37
> Match 1: 37
> Match 2:

I'm not sure what POSIX requires here.  The closest I can find after
skimming through "9. Regular Expressions" is 9.4.6 that ends with:

  An ERE matching a single character repeated by an '*', '?', or an
  interval expression shall not match a null expression unless this is
  the only match for the repetition or it is necessary to satisfy the
  exact or minimum number of occurrences for the interval expression.

I'm not sure what to read into the absence of the usual "or an ERE
enclosed in parentheses" chorus here.

> printf("Match %d: %.*s\n", i, matches[i].rm_eo - matches[i].rm_so, input + matches[i].rm_so);

Nit-pick: regoff_t may be wider than int, which is what '*' expects.
E.g. your test program prints nothing for all those %.* on
NetBSD/macppc (with the appropriate cast it prints 37/37/<empty>), as
regoff_t is 64-bit there (very old POSIX required regoff_t to be at
least as wide as off_t).  It will probably crash on a little-endian
32-bit NetBSD system, because the zero most significant word of the
64-bit regoff_t will be interpreted as the argument for %s.

