musl - Re: Re: [PATCH 0/4] Refactor and expand string functions.

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20130205140535.GE20323@brightrain.aerifal.cx>
Date: Tue, 5 Feb 2013 09:05:35 -0500
From: Rich Felker <dalias@...ifal.cx>
To: musl@...ts.openwall.com
Subject: Re: Re: [PATCH 0/4] Refactor and expand string functions.

On Tue, Feb 05, 2013 at 12:19:10PM +0100, Szabolcs Nagy wrote:
> * Nathan McSween <nwmcsween@...il.com> [2013-02-04 20:25:53 -0800]:
> > /**
> >  * memchr - Word sized c standard memchr.
> >  * @s: Source
> >  * @c: Character
> >  * @n: Max size of @s
> >  */
> > void *memchr(const void *s, int c, size_t n)
> > {
> > 	const unsigned char *cs = (const unsigned char *)s;
> > 	const size_t *w;
> > 
> > 	c = (unsigned char)c;
> > 
> > 	for (; (uintptr_t)cs % sizeof(size_t); cs++, n--) {
> > 		if (!n) return NULL;
> > 		if (*cs == c) return (void *)cs;
> > 	}
> > 
> > 	for (w = (const size_t *)cs; !word_has_char(*w, c); w++, n--)
> > 		if (!n) return NULL;
> > 	for (cs = (const unsigned char *)w; *cs != c; cs++, n--)
> > 		i
> 
> did you test this code?
> 
> w++ but n-- does not seem right

You mean it should be n -= sizeof(size_t) or whatever?

> > #define WORD_LSB_ONE ((size_t)-1 / (unsigned char)-1)
> > #define WORD_MSB_ONE (WORD_LSB_ONE * ((unsigned char)-1 / 2 + 1))
> > 
> > /**
> >  * word_has_zero - Word has a zero character
> >  * @w: Word
> >  */
> > static inline char word_has_zero(size_t w)
> > {
> > 	return !!((w - WORD_LSB_ONE) & (~w & WORD_MSB_ONE));
> > }
> > 
> > /**
> >  * word_has_char - Word has a character
> >  * @w: Word
> >  */
> > static inline char word_has_char(size_t w, char c)
> > {
> > 	return !!((w - WORD_LSB_ONE)
> > 		  & ((~w & WORD_MSB_ONE)^(WORD_LSB_ONE
> 
> the code is mostly ok: don't use "char c" argument, use
> unsigned char or int, same for return value
> 
> but i don't like the style of this
> 
> maybe it's just me but the WORD_LSB_ONE is a long name
> that does not help me understand what it is, it could
> be W1 vs W128 and the same amount of explanation would
> be embedded in the name (but much shorter and thus easier
> to recognize, things fit on one line etc, it assumes 8bit
> char though)

Honestly I would prefer the whole inline function be replaced with
just a single-line macro that could be pasted where needed. IMO this
makes the code more readable because you don't have to go looking at
other functions.

> the comments are bad (lot of text with no content)

They're doxygen format I believe.

> and the code actually does magic that may need explanation
> it's rather frustrating when you try to understand what's
> going on and have to read lot of useless comments..

I agree. I don't like comments except to explain (in natural language)
what's going on in a part of the code where the *reason* for doing
something is non-obvious. The only thing non-obvious in these files is
word_has_char, but it's obvious what it does from the name. (Still, I
prefer the original macro names, but that's just me; dunno what others
think.)

> i dont see how word_has_char works, the name suggests that
> it tests if w has c in it, but that's not what it does

That's what it's supposed to do.

> (and the comment is actively misleading: only shows one arg,
> which demonstrates why i dislike these kinds of comments,
> thye cause more harm than good)

:)

Rich
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.