Date: Wed, 21 Oct 2015 21:07:32 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: locale

I first brought this up off-list, but I think it should be in here:

On Wed, Oct 21, 2015 at 09:21:27AM +0200, magnum wrote:
> On 2015-10-19 22:05, Solar Designer wrote:
> >On Mon, Oct 19, 2015 at 02:51:36AM +0200, magnum wrote:
> >>On 2015-10-18 15:06, Solar Designer wrote:
> >>>BTW, magnum, can we please get rid of the UTF-8 char for degrees?  Don't
> >>>assume everyone has their terminal set to UTF-8 all the time, especially
> >>>as it's a totally unnecessary assumption here.
> >>
> >>I made it configurable but it still defaults to UTF-8. I dislike the
> >>idea of dropping it by default - users might not realize that "GPU:73C"
> >>is a temp reading at all.
> >
> >Maybe check the current locale and default to plain "C" if the current
> >locale is not UTF-8?  To avoid checking env vars explicitly, maybe use
> >mbrtowc() and see what it returns for the UTF-8 character under the
> >current locale?
> 
> I'm now checking/setting locale (if autoconf says I can) and fall back 
> to skipping the degree sign. Let me know if it misbehaves.

This is:

https://github.com/magnumripper/JohnTheRipper/issues/1841
https://github.com/magnumripper/JohnTheRipper/commit/5acb98062d25efb319e9ac4dbd04555693b1d739

Looking at these changes, I realize that my idea was probably bad:
initializing the locale with setlocale() affects lots of things,
including the ctype macros.  With some cracking modes, this might affect
what candidate passwords they generate.  IIRC, we avoided using the
ctype macros in our wordlist rules engine, but now that I grep e.g. for
"islower", I find uses in dynamic_compiler.c, jumbo.c, mask.c.

While we might later choose to add initializing locale to JtR for other
reasons, I think DEGREE_SIGN alone isn't a sufficient reason, and if we
do add locale support, we should make it consistent: initialize it all
the time and do so early on, and not only do it for OpenCL and CUDA
formats like the current code does.

For now, maybe we should in fact check env vars explicitly to decide on
DEGREE_SIGN.
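
A minimal sketch of such an explicit check, without calling setlocale()
at all. The helper names locale_name_is_utf8() and env_wants_utf8() are
hypothetical and not part of JtR. The sketch assumes the usual POSIX
precedence (LC_ALL, then LC_CTYPE, then LANG, skipping unset or empty
variables) and treats a name without an explicit codeset as non-UTF-8,
which may be too conservative on systems where such names resolve to a
UTF-8 locale:

```c
#include <stdlib.h>
#include <string.h>
#include <strings.h>	/* strncasecmp (POSIX) */

/* Hypothetical helper: does a locale name such as "en_US.UTF-8" or
 * "ca_ES.utf8@valencia" carry a UTF-8 codeset suffix? */
static int locale_name_is_utf8(const char *name)
{
	const char *codeset, *end;
	size_t len;

	if (!name || !(codeset = strchr(name, '.')))
		return 0;	/* no codeset given: assume not UTF-8 */
	codeset++;

	/* The codeset ends at an optional "@modifier" */
	end = strchr(codeset, '@');
	len = end ? (size_t)(end - codeset) : strlen(codeset);

	return (len == 5 && !strncasecmp(codeset, "UTF-8", 5)) ||
	       (len == 4 && !strncasecmp(codeset, "UTF8", 4));
}

/* Hypothetical helper: look at the first non-empty variable among
 * LC_ALL, LC_CTYPE, LANG, i.e. the one that would determine LC_CTYPE. */
static int env_wants_utf8(void)
{
	static const char *const vars[] = { "LC_ALL", "LC_CTYPE", "LANG" };
	size_t i;

	for (i = 0; i < sizeof(vars) / sizeof(vars[0]); i++) {
		const char *val = getenv(vars[i]);

		if (val && *val)
			return locale_name_is_utf8(val);
	}
	return 0;	/* nothing set: plain "C" */
}
```

With this, DEGREE_SIGN could default to the UTF-8 character only when
env_wants_utf8() returns non-zero, leaving the process locale untouched.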

A possibly acceptable hack (for jumbo) is to do something like:

	setlocale(LC_ALL, "");
	... check for UTF-8 here ...
	setlocale(LC_CTYPE, "C");

so that ctype macros are unaffected by the current locale (since our
uses of them appear to be of the kind where we prefer consistency over
customization; arguably, this means they are misuses).  But we'll need
to do it all the time, and early on, to ensure consistent behavior
regardless of whether an OpenCL or CUDA format is run.
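
One way the three steps above might be filled in, using the mbrtowc()
probe suggested earlier in the thread for the middle step. This is only
a sketch: the function name detect_utf8_then_pin_ctype() is made up, and
comparing the decoded value against 0xB0 assumes that wchar_t holds
Unicode code points (as on systems defining __STDC_ISO_10646__):

```c
#include <locale.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: returns 1 if the environment's locale decodes the
 * UTF-8 encoding of U+00B0 (degree sign) as that single character. */
static int detect_utf8_then_pin_ctype(void)
{
	static const char degree[] = "\xC2\xB0";
	mbstate_t state;
	wchar_t wc = 0;
	size_t n;
	int is_utf8;

	/* Adopt the user's locale; on failure the "C" locale stays active */
	setlocale(LC_ALL, "");

	memset(&state, 0, sizeof(state));
	n = mbrtowc(&wc, degree, sizeof(degree) - 1, &state);

	/* Checking only n == 2 is not enough: another multibyte encoding
	 * might also accept these two bytes as one (different) character.
	 * The wc comparison assumes wchar_t values are Unicode. */
	is_utf8 = (n == 2 && wc == 0xB0);

	/* Keep the ctype macros independent of the user's locale */
	setlocale(LC_CTYPE, "C");

	return is_utf8;
}
```

Note that after this call the other categories (LC_NUMERIC, LC_COLLATE,
and so on) remain set from the environment; only LC_CTYPE is pinned.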

Also, the current checks for strchr(setlocale(LC_ALL, NULL), '.') do not
tell us whether the locale is UTF-8 or not.  We'll need to do better.
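
For example, "ru_RU.KOI8-R" contains a '.' but is not UTF-8.  A sketch of
one possible improvement is to ask the C library for the codeset of the
active locale via POSIX nl_langinfo(CODESET) instead of parsing the
locale name.  The helper names below are hypothetical, and the set of
spellings accepted for UTF-8 is an assumption:

```c
#include <langinfo.h>	/* nl_langinfo, CODESET (POSIX) */
#include <locale.h>
#include <strings.h>	/* strcasecmp (POSIX) */

/* Hypothetical helper: compare a codeset name against the spellings of
 * UTF-8 assumed here to be in common use. */
static int codeset_is_utf8(const char *codeset)
{
	return codeset &&
	    (!strcasecmp(codeset, "UTF-8") || !strcasecmp(codeset, "UTF8"));
}

/* Hypothetical helper: query the codeset of the currently active
 * LC_CTYPE locale rather than inferring it from the locale name. */
static int current_codeset_is_utf8(void)
{
	return codeset_is_utf8(nl_langinfo(CODESET));
}
```

This only reflects the environment if setlocale() with an empty locale
string was called first; in the initial "C" locale it reports a non-UTF-8
codeset regardless of the env vars.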

Alexander