Date: Fri, 7 Sep 2018 12:08:47 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Cc: Steffen Nurpmeso <steffen@...oden.eu>
Subject: Re: Regex: behaviour of ? after () atom

On Fri, Sep 07, 2018 at 06:00:46PM +0200, Steffen Nurpmeso wrote:
> Rich Felker wrote in <20180907153302.GM1878@...ghtrain.aerifal.cx>:
>  |On Fri, Sep 07, 2018 at 05:25:17PM +0200, Steffen Nurpmeso wrote:
>  |> Rich Felker wrote in <20180907151821.GL1878@...ghtrain.aerifal.cx>:
>  |>|On Fri, Sep 07, 2018 at 03:38:05PM +0200, Steffen Nurpmeso wrote:
>  |>|> Hello.
>  |>|> 
>  |>|> In perl this is
>  |>|> 
>  |>|>   $x="print 1 2";
>  |>|>   if($x =~ /^(:[[:space:]]+)?([^[:space:]]+)(.*)$/){ 
>  |>|>     print "<$0> -> <$1> <$2> <$3>\n"
>  |>|>   }
>  |>|> 
>  |>|> and the result is
>  |>|> 
>  |>|>   </tmp/t.pl> -> <> <print> < 1 2>
>  |>|> 
>  |>|> Now the same on AlpineLinux edge and musl-1.1.19-r10 with the MUA
>  |>|> i maintain, which uses the normal regex stuff and calls it via
>  |>|> 
>  |>|>   echo eins=$3
>  |>|>          vput vexpr i regex "${3}" \
>  |>|>             '^(:[[:space:]]+)?([^[:space:]]+)(.*)$'  \
>  |>|>             '<\$0> -> <\$1> <\$2> <\$3>'
>  |>|>   echo i=$i
>  |>|> 
>  |>|> which in C code does 
>  |>|> 
>  |>|>       if((reflrv = regcomp(&re, argv[2], reflrv))){
>  |>|>           ...
>  |>|>          goto jestr;
>  |>|>       }
>  |>|>   fprintf(stderr, "GOING for <%s> -> <%s> %u\n",
>  |>|>   argv[1],argv[2],n_NELEM(rema));
>  |>|>       reflrv = regexec(&re, argv[1], n_NELEM(rema), rema, 0);
>  |>|> 
>  |>|> and overall prints
>  |>|> 
>  |>|>   eins=print 1 2
>  |>|>   GOING for <print 1 2> -> <^(:[[:space:]]+)?([^[:space:]]+)(.*)$> 17
>  |>|>   i=<print 1 2> -> <> <> <>
>  |>|> 
>  |>|> It works correctly if i remove the ()? atom, so i thought i should
>  |>|> report that.
>  |>|
>  |>|What is the value of the flags argument you passed to regcomp?
>  |>|
>  |> 
>  |> REG_EXTENDED, optional REG_ICASE:
>  |> 
>  |>       reflrv = REG_EXTENDED;
>  |>       if(f & a_ICASE)
>  |>          reflrv |= REG_ICASE;
>  |>       if((reflrv = regcomp(&re, argv[2], reflrv))){
>  |
>  |OK, it looks like that should work, and seemed to work here when I
>  |passed the regex to grep -E linked with musl's regex. Can you provide
>  |a minimal self-contained C program to demonstrate the issue you're
>  |having?
> 
> Happy user that i am, here something for tests/:
> 
>   #include <stdio.h>
>   #include <regex.h>
>   int main(void){
>           regmatch_t rema[1 + 21];
>           regex_t re;
>           int i;
>           
>           i = REG_EXTENDED;
>           if((i = regcomp(&re, "^(:[[:space:]]+)?([^[:space:]]+)(.*)$", i)))
>                   return 2;
>           i = regexec(&re, "print 1 2", 21, rema, 0);
>           regfree(&re);
>           if(i == REG_NOMATCH)
>                   return 3;
>           for(i = 1; i < 21 && rema[i].rm_so != -1; ++i)
>                   ;
>           return (i == 3) ? 0 : 4;
>   }       
> 
> i is 1 here.
> 
>  |BTW which "()?" are you talking about? The whole first parenthesized
>  |subexpression and the ? after it? I wouldn't call that an atom, but
>  |nothing seems wrong with it.
> 
> I have read regex(7) first just in case something intellectual had
> to be said.  Otherwise i am all for Finnish tango.

Your stopping condition is just wrong -- you're stopping after seeing
that the first subexpression does not match anything, and failing to
inspect the others. A subexpression that did not participate in the
match is reported with rm_so and rm_eo of -1, but later ones can still
match. If you get rid of that stopping condition and add code to print
the rest, you'll see (each line is i, rm_so, rm_eo):

1 -1 -1
2 0 5
3 5 9
4 -1 -1
5 -1 -1
6 -1 -1
...

Also, for what it's worth, there's no reason to store expressions
temporarily in variables like this:

>           i = REG_EXTENDED;
>           if((i = regcomp(&re, "^(:[[:space:]]+)?([^[:space:]]+)(.*)$", i)))

Just do:

          if((i = regcomp(&re, "^(:[[:space:]]+)?([^[:space:]]+)(.*)$", REG_EXTENDED)))

etc.

Rich
