Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Tue, 30 May 2023 08:46:36 +0200
From: Jₑₙₛ Gustedt <jens.gustedt@...ia.fr>
To: Rich Felker <dalias@...c.org>
Cc: musl@...ts.openwall.com
Subject: Re: [C23 printf 2/3] C23: implement the wN length specifiers
 for printf

Rich,

on Mon, 29 May 2023 21:48:22 -0400 you (Rich Felker <dalias@...c.org>)
wrote:

> On Mon, May 29, 2023 at 09:21:55PM +0200, Jₑₙₛ Gustedt wrote:
> > Rich,
> > 
> > on Mon, 29 May 2023 11:46:40 -0400 you (Rich Felker
> > <dalias@...c.org>) wrote:
> >   
> > > On Mon, May 29, 2023 at 09:14:13AM +0200, Jₑₙₛ Gustedt wrote:  
>  [...]  
>  [...]  
>  [...]  
> > > 
> > > OK I think I can communicate better with code than natural
> > > language text, so here's a diff, completely untested, of what I
> > > had in mind.  
> > 
> > that's ... ugh ... not so prety, I think
> > 
> > In my current version I track the desired width, if there is w
> > specifier, and then do the adjustments after the loop. That takes
> > indeed care of undefined character sequences.
> > 
> > I find that much better readable, and also easier to extend (later
> > there comes the `wf` case and the `128`, and perhaps some day
> > `256`)  
> 
> It sounds like the core issue is that you don't like the state machine
> approach to how musl's printf processes format specifiers.

It is well suited for simple grammars, I agree with that, but here the
grammar is becomming more complex. Be it just for the fact that you'd
have to enlargen the set of possible values to match decimal digits.

> Personally,
> I like it because there's an obvious structured way to validate that
> it's accepting exactly the right things and nothing else, vs an
> approach like what you tried where you ended up accepting a lot of
> bogus specifiers.
> 
> One alternative I would consider is doing something like what you did,
> but moving it outside of/before the state machine loop, so it's not
> mixing the w* processing with the state machine. This avoids accepting
> bogus repeated w32 prefixes and similar (because there is no loop) and
> lets you get by with just adding one PLAIN state to have it start in
> (rather than BARE) after w32. I expect the overall size would be
> similar. Concept attached.

I'll post what I have in a minute. It has the advantage over yours
that it doesn't do the switch on the width inside the automaton and
also that it doesn't have to increase the rows of the matrix.


Thanks
Jₑₙₛ

-- 
:: ICube :::::::::::::::::::::::::::::: deputy director ::
:: Université de Strasbourg :::::::::::::::::::::: ICPS ::
:: INRIA Nancy Grand Est :::::::::::::::::::::::: Camus ::
:: :::::::::::::::::::::::::::::::::::: ☎ +33 368854536 ::
:: https://icube-icps.unistra.fr/index.php/Jens_Gustedt ::

Content of type "application/pgp-signature" skipped

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.