Date: Fri, 26 May 2023 22:51:19 +0200
From: Jₑₙₛ Gustedt <jens.gustedt@...ia.fr>
To: Rich Felker <dalias@...c.org>
Cc: musl@...ts.openwall.com
Subject: Re: [C23 printf 2/3] C23: implement the wN length specifiers
 for printf

Rich,

on Fri, 26 May 2023 16:31:07 -0400 you (Rich Felker <dalias@...c.org>)
wrote:

> On Fri, May 26, 2023 at 09:41:03PM +0200, Jens Gustedt wrote:
> > These are mandatory for C23 and concern all types for which the
> > platform has `int_leastN_t` and `uint_leastN_t`. For musl these
> > types always coincide with `intN_t` and `uintN_t` and are always
> > present for N equal to 8, 16, 32 and 64.
> > 
> > They can be added for general use since all lowercase letters were
> > previously reserved.
> > 
> > Nevertheless, users that use these modifiers will see a lot of
> > warnings from compilers in the beginning. This is because the
> > compilers have not yet integrated this form of a specifier into
> > their corresponding extensions (gcc attributes). So unfortunately
> > also testing this feature may be a bit noisy for the moment.
> > 
> > The only architecture-dependent choice is the type for N == 64,
> > which may be `long` or `long long`. We just mimic the test that is
> > done in other places to compare `UINTPTR_MAX` and `UINT64_MAX` to
> > determine that.
> > ---
> >  src/stdio/vfprintf.c  | 18 ++++++++++++++++--
> >  src/stdio/vfwprintf.c | 18 ++++++++++++++++--
> >  2 files changed, 32 insertions(+), 4 deletions(-)
> > 
> > diff --git a/src/stdio/vfprintf.c b/src/stdio/vfprintf.c
> > index cbc79783..1a516663 100644
> > --- a/src/stdio/vfprintf.c
> > +++ b/src/stdio/vfprintf.c
> > @@ -33,7 +33,7 @@
> >  
> >  enum {
> >  	BARE, LPRE, LLPRE, HPRE, HHPRE, BIGLPRE,
> > -	ZTPRE, JPRE,
> > +	ZTPRE, JPRE, WPRE,
> >  	STOP,
> >  	PTR, INT, UINT, ULLONG,
> >  	LONG, ULONG,
> > @@ -57,7 +57,7 @@ static const unsigned char states[]['z'-'A'+1] = {
> >  		S('s') = PTR, S('S') = PTR, S('p') = UIPTR, S('n') = PTR,
> >  		S('m') = NOARG,
> >  		S('l') = LPRE, S('h') = HPRE, S('L') = BIGLPRE,
> > -		S('z') = ZTPRE, S('j') = JPRE, S('t') = ZTPRE,
> > +		S('z') = ZTPRE, S('j') = JPRE, S('t') = ZTPRE, S('w') = WPRE,
> >  	}, { /* 1: l-prefixed */
> >  		S('b') = ULONG, S('B') = ULONG,
> >  		S('d') = LONG, S('i') = LONG,
> > @@ -525,8 +525,22 @@ static int printf_core(FILE *f, const char *fmt, va_list *ap, union arg *nl_arg,
> >  		st=0;
> >  		do {
> >  			if (OOB(*s)) goto inval;
> > +		wpre:
> >  			ps=st;
> >  			st=states[st]S(*s++);
> > +			if (st == WPRE) {
> > +				switch (getint(&s)) {
> > +				case 8:  st = HHPRE; goto wpre;
> > +				case 16: st = HPRE; goto wpre;
> > +				case 32: st = BARE; goto wpre;
> > +#if UINTPTR_MAX >= UINT64_MAX
> > +				case 64: st = LPRE; goto wpre;
> > +#else
> > +				case 64: st = LLPRE; goto wpre;
> > +#endif
> > +				default: goto inval;
> > +				}
> > +			}
> >  		} while (st-1<STOP);
> >  		if (!st) goto inval;  
> 
> I don't see how this works. While you're in this new WPRE state,
> you're accessing an element of the states[] array with a potentially
> out-of-bounds index, because you skipped over the bounds check to
> ensure that the index is valid.

Ah, OK, the `wpre` label should probably move up a line, so that the
`OOB` check runs again after the jump.
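
To illustrate the intent, here is a rough standalone toy, not musl
code. `TOY_OOB`, `toy_states` and `toy_parse` are made-up names that
only stand in for `OOB()`, `states[]` and the loop in `printf_core`,
and the toy ignores the value of N. The point is merely that the
re-entry label sits before the bounds check, so every table lookup is
validated, including the one after the digits of a wN prefix.

```c
/* Toy stand-in for the OOB() check: true if c is outside 'A'..'z'. */
#define TOY_OOB(c) ((unsigned)(c) - 'A' > 'z' - 'A')

/* T_BARE must be 0: unset table entries then read as "no transition". */
enum { T_BARE, T_WPRE, T_INT };

/* One-row toy table: from the bare state, 'd' ends the specifier and
 * 'w' starts a width prefix. Everything else stays 0 (invalid). */
static const unsigned char toy_states[1]['z' - 'A' + 1] = {
	[T_BARE] = { ['d' - 'A'] = T_INT, ['w' - 'A'] = T_WPRE },
};

/* Returns T_INT for an accepted specifier, -1 otherwise. */
static int toy_parse(const char *s)
{
	unsigned st = T_BARE;
next:	/* re-entry point placed BEFORE the bounds check */
	if (TOY_OOB(*s)) return -1;
	st = toy_states[st][*s++ - 'A'];
	if (st == T_WPRE) {
		while (*s >= '0' && *s <= '9') s++;	/* skip the digits of N */
		st = T_BARE;
		goto next;	/* the next character is checked again */
	}
	return st == T_INT ? T_INT : -1;
}
```

With the label in this position, a truncated format such as "w32" is
rejected by the check instead of indexing the table with the
terminating null byte.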

> I'm not clear why you're doing that
> instead of just continuing the loop.
> 
> My preference would be not adding any code at all here and using the
> existing state machine, adding state transitions for the new prefixes
> to it, but that would require expanding the states table to start at
> '1' instead of 'A', and to have a couple more intermediate states. I'm
> not sure how large that would get. There's a good chance it's
> comparable to the size of any added code, though.

I am not sure that I understand the alternative that you are proposing.

The difficulty that led to this is the "BARE" state, which now
becomes accessible with a "w32" prefix. Then later, the "BARE" state
assumes (for the stop condition of the loop) that there had not been a
prefix, I think.

> One minor thing with the implementation using getint(): it accepts
> leading zeros, which are not valid here.

right, we should catch that.
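
One possible shape for such a check, as a sketch only: `parse_wn` is a
hypothetical helper, not an existing musl function, and it is assumed
here that it would be called in place of `getint()` for the digits
that follow `w`.

```c
/* Hypothetical helper: parse the N of a wN prefix.
 * Returns N and advances *s past the digits on success.
 * Returns -1 and leaves *s untouched if there is no digit,
 * if the number starts with '0', or if it is absurdly large. */
static int parse_wn(const char **s)
{
	const char *p = *s;
	int n = 0;

	/* The first character must be 1..9: rejects "", "0" and "08". */
	if (*p < '1' || *p > '9') return -1;

	while (*p >= '0' && *p <= '9') {
		/* No supported width comes close to this; bail out
		 * before the multiplication below could overflow. */
		if (n > 100) return -1;
		n = 10*n + (*p++ - '0');
	}
	*s = p;
	return n;
}
```

The caller would still have to map the returned value (8, 16, 32 or
64) to a state and treat anything else as invalid, as the `switch` in
the patch does.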

Thanks
Jₑₙₛ

-- 
:: ICube :::::::::::::::::::::::::::::: deputy director ::
:: Université de Strasbourg :::::::::::::::::::::: ICPS ::
:: INRIA Nancy Grand Est :::::::::::::::::::::::: Camus ::
:: :::::::::::::::::::::::::::::::::::: ☎ +33 368854536 ::
:: https://icube-icps.unistra.fr/index.php/Jens_Gustedt ::
