musl - Re: Fix strverscmp

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20121205182101.d997eb6e.idunham@lavabit.com>
Date: Wed, 5 Dec 2012 18:21:01 -0800
From: Isaac Dunham <idunham@...abit.com>
To: musl@...ts.openwall.com
Subject: Re: Fix strverscmp

On Wed, 5 Dec 2012 20:00:00 -0500
Rich Felker <dalias@...ifal.cx> wrote:

> > This should not be equal, but not for the reason I'd expected: a
> > leading 0 is supposed to be interpreted as "0.0". Decimal points are
> > not factored in...
> 
> My understanding of the code is that, after it hits the first place
> where the strings differ, it switches to reading a digit string as a
> decimal number, and the result is the difference of those decimal
> numbers. I just used the decimal point as an example because it
> terminates the loop.
More-or-less, except that if the character is a number, it is expected to backtrack to the start of the longest substring which is purely numeric (the specification is rather insane, I think)

> I also don't understand what you mean about leading zero. If leading
> zeros are not considered equal to the number without leading zeros,
They are explicitly stated to be different in the man page:
"
What  this  function does is the following.  If both strings are equal,
return 0.  Otherwise find the position between two bytes with the prop-
erty  that  before  it  both strings are equal, while directly after it
there is a difference.  Find the largest consecutive digit strings con-
taining  (or  starting at, or ending at) this position.  If one or both
of these is empty, then  return  what  strcmp(3)  would  have  returned
(numerical  ordering  of  byte  values).  Otherwise, compare both digit
strings numerically, where digit strings with one or more leading zeros
are  interpreted  as  if they have a decimal point in front (so that in
particular digit strings with more  leading  zeros  come  before  digit
strings  with fewer leading zeros).  Thus, the ordering is 000, 00, 01,
010, 09, 0, 1, 9, 10.
"
> then a simple algorithm would be to, on the first mismatching byte,
> remember the difference, then count the number of consecutive digits
> starting at that position in both strings. If this count is the same
> (possibly zero) for both strings, return the saved byte difference. If
> the count is different, consider the string with fewer digits to be
> less than the one with more digits. This is trivial to implement with
> no arithmetic, but I'm not sure it matches the original semantics.
Hmm...
I'm getting the idea that that may actually work...in which case my last version is unneeded.
Except, it breaks here:
00123
001145 <-should be the lesser (the leading zeros)

OTOH, it could be done by recording the zero while walking up the chain:
/*NOT tested*/
while (*l && *r && l[0]==r[0]){
	if (l[0]='0'){
		nozero=1;
	} else if (!isdigit(l[0])) {
		nozero=0;
	}
	l++; r++;
}
if ((isdigit(l[0]) && isdigit(r[0]) && nozero) {
	//return the one with the longer substring of numbers
} else {
	return (l[0] -  r[0]);
}
-- 
Isaac Dunham <idunham@...abit.com>
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.