Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Tue, 30 May 2023 13:28:33 -0400
From: Rich Felker <dalias@...c.org>
To: Jₑₙₛ Gustedt <jens.gustedt@...ia.fr>
Cc: musl@...ts.openwall.com
Subject: Re: [C23 printf 2/3] C23: implement the wN length specifiers
 for printf

On Tue, May 30, 2023 at 08:46:36AM +0200, Jₑₙₛ Gustedt wrote:
> Rich,
> 
> on Mon, 29 May 2023 21:48:22 -0400 you (Rich Felker <dalias@...c.org>)
> wrote:
> 
> > On Mon, May 29, 2023 at 09:21:55PM +0200, Jₑₙₛ Gustedt wrote:
> > > Rich,
> > > 
> > > on Mon, 29 May 2023 11:46:40 -0400 you (Rich Felker
> > > <dalias@...c.org>) wrote:
> > >   
> > > > On Mon, May 29, 2023 at 09:14:13AM +0200, Jₑₙₛ Gustedt wrote:  
> >  [...]  
> >  [...]  
> >  [...]  
> > > > 
> > > > OK I think I can communicate better with code than natural
> > > > language text, so here's a diff, completely untested, of what I
> > > > had in mind.  
> > > 
> > > that's ... ugh ... not so prety, I think
> > > 
> > > In my current version I track the desired width, if there is w
> > > specifier, and then do the adjustments after the loop. That takes
> > > indeed care of undefined character sequences.
> > > 
> > > I find that much better readable, and also easier to extend (later
> > > there comes the `wf` case and the `128`, and perhaps some day
> > > `256`)  
> > 
> > It sounds like the core issue is that you don't like the state machine
> > approach to how musl's printf processes format specifiers.
> 
> It is well suited for simple grammars, I agree with that, but here the
> grammar is becomming more complex. Be it just for the fact that you'd
> have to enlargen the set of possible values to match decimal digits.

I don't think it's really any more complex. It's just a few gratuitous
aliases that have a very small number of edge paths. The wf ones
almost entirely collapse with the w ones, and if we wanted to get rid
of the gratuitous separate hh/h loading, they'd entirely collapse. But
the version I posted as code is probably enough smaller to be
perferable. I guess I should take a look at that and see...

> > Personally,
> > I like it because there's an obvious structured way to validate that
> > it's accepting exactly the right things and nothing else, vs an
> > approach like what you tried where you ended up accepting a lot of
> > bogus specifiers.
> > 
> > One alternative I would consider is doing something like what you did,
> > but moving it outside of/before the state machine loop, so it's not
> > mixing the w* processing with the state machine. This avoids accepting
> > bogus repeated w32 prefixes and similar (because there is no loop) and
> > lets you get by with just adding one PLAIN state to have it start in
> > (rather than BARE) after w32. I expect the overall size would be
> > similar. Concept attached.
> 
> I'll post what I have in a minute. It has the advantage over yours
> that it doesn't do the switch on the width inside the automaton and
> also that it doesn't have to increase the rows of the matrix.

I don't understand how that's supposed to be an improvement. It
duplicates the state machine logic for the final types with a bunch of
conditionals in code (mainly to handle %n vs integer specifiers)
rather than letting the state machine spit out the right type as its
final state naturally.

Mine didn't have any switch inside the automaton. It did it before the
automaton processing (since the 'w' can only appear at the beginning,
it can be thought of as just setting the initial state).

Rich

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.