john-dev - RE: Found cp1251 issue (and likely 8850-1) or many code pages.

Message-ID: <025201cc4c91$8a280830$9e781890$@net>
Date: Wed, 27 Jul 2011 14:15:25 -0500
From: "jfoug" <jfoug@....net>
To: <john-dev@...ts.openwall.com>
Subject: RE: Found cp1251 issue (and likely 8850-1) or many code pages.

>From: Frank Dittrich [mailto:frank_dittrich@...mail.com]
>
>On 27.07.2011 19:02, jfoug wrote:
>> The behavior I am working towards, is that when we upcase a string
>> with B5 in it (for cp1251/8859-1), that there will be a xB5 left in
>> the upcased string in the end.
>
>Hi Jim,
>
>I think this behavior is correct for all hash algorithms that have been
>"invented" prior to unicode.

So it is likely that we will have to put some logic/data into the fmt_main
structure (flags or something similar) that gives hints to the code in
Unicode.c on exactly how to proceed.  That way, Unicode.c can quickly check
the format and know what to do.  If the format expects to see xDF -> SS (the
current Unicode logic, as far as I could find), then that is what gets done.
Other formats may have other specifics that change the behavior of john's
Unicode.c.
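
A rough sketch of how such a per-format hint could steer the casing code,
assuming ISO-8859-1 input.  The struct, the flag name and the function are
all made up for illustration; none of them are taken from john's fmt_main or
Unicode.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hint bit; this name is NOT from john's source. */
#define HINT_EXPAND_SHARP_S 0x01u /* format expects 0xDF to upcase to "SS" */

/* Hypothetical stand-in for a per-format descriptor that carries hints. */
struct format_hints {
    const char *label;
    unsigned int case_flags;
};

/*
 * Upcase a NUL-terminated ISO-8859-1 string according to the format's hints.
 * Returns the number of bytes written (not counting the NUL), or (size_t)-1
 * if the output buffer is too small.
 */
size_t upcase_with_hints(const struct format_hints *fmt,
                         const unsigned char *in,
                         unsigned char *out, size_t outsize)
{
    size_t n = 0;

    assert(fmt != NULL && in != NULL && out != NULL);
    if (outsize == 0)
        return (size_t)-1;

    for (; *in; in++) {
        unsigned char c = *in;

        /* Only formats that ask for it get the 0xDF -> "SS" expansion. */
        if (c == 0xDF && (fmt->case_flags & HINT_EXPAND_SHARP_S)) {
            if (n + 2 >= outsize)
                return (size_t)-1;
            out[n++] = 'S';
            out[n++] = 'S';
            continue;
        }

        if (c >= 'a' && c <= 'z')
            c = (unsigned char)(c - 0x20);
        else if (c >= 0xE0 && c <= 0xFE && c != 0xF7)
            c = (unsigned char)(c - 0x20); /* 0xF7 is the division sign */
        /* Everything else, including 0xB5, 0xDF and 0xFF, is left alone. */

        if (n + 1 >= outsize)
            return (size_t)-1;
        out[n++] = c;
    }
    out[n] = 0;
    return n;
}
```

With the flag clear, xDF stays in the upcased string; with it set, the
string grows by one byte per xDF, which is why the caller passes a buffer
size.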

I am glad to hear that leaving alone the lower case chars that have no
matching upper case in the code page (even if Unicode DOES have one) is the
right choice.  I built uc() / lc() to work that way, so we should be good.
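
To make the "leave it alone" rule concrete, here is a minimal sketch for
cp1251.  It is not john's uc(); the function names are made up, and the
table is deliberately partial: it covers ASCII, the main Cyrillic block
(0xE0..0xFF -> 0xC0..0xDF) and 0xB8 -> 0xA8, and omits the remaining cp1251
pairs below 0xC0 for brevity.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Upcase one cp1251 byte, using only pairs that exist inside cp1251.
 * Partial on purpose: the other lower/upper pairs below 0xC0 are omitted.
 */
unsigned char cp1251_upcase_byte(unsigned char c)
{
    if (c >= 'a' && c <= 'z')
        return (unsigned char)(c - 0x20);
    if (c >= 0xE0)                      /* Cyrillic small a..ya */
        return (unsigned char)(c - 0x20);
    if (c == 0xB8)                      /* Cyrillic small io */
        return 0xA8;
    /*
     * No upper case in this code page: return the byte unchanged.
     * This covers 0xB5 (micro sign), whose Unicode upper case, U+039C,
     * has no cp1251 encoding.
     */
    return c;
}

/* Upcase a NUL-terminated cp1251 string in place. */
void cp1251_upcase(unsigned char *s)
{
    assert(s != NULL);
    for (; *s; s++)
        *s = cp1251_upcase_byte(*s);
}
```

The point is only the final return: a byte with no in-page upper case comes
back as the same byte, so an xB5 in the input is still an xB5 in the output.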

Once a string is converted into Unicode in john and then acted upon (with
Unicode functions), code page logic no longer applies.  The only way it
'would' is if we then convert back into the code page.  However, characters
with no equivalent in the code page would not convert properly (just like in
perl), so this is simply something people need to keep in mind when
modifying john.  All manipulation logic needs to be done in the CP; then, if
the string needs to be in Unicode to proceed, the very last step is
CP -> Unicode.  Rules already require this, since they are 8 bit.  I have
8 bit casing working properly for each CP, so formats that want up/down
casing can do it in the CP prior to converting to UTF16.
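
The ordering described above (case in the CP first, CP -> Unicode as the
very last step) can be sketched like this for ISO-8859-1, where the
conversion is trivial because each byte value equals its Unicode code point.
The function names are hypothetical and this is not the code in Unicode.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Upcase one ISO-8859-1 byte using only pairs that exist in the code page. */
unsigned char latin1_upcase_byte(unsigned char c)
{
    if (c >= 'a' && c <= 'z')
        return (unsigned char)(c - 0x20);
    if (c >= 0xE0 && c <= 0xFE && c != 0xF7) /* 0xF7 is the division sign */
        return (unsigned char)(c - 0x20);
    return c; /* 0xB5, 0xDF and 0xFF have no in-page upper case */
}

/*
 * Upcase in the code page, then widen to UTF-16 code units as the last step.
 * Returns the number of code units written (not counting the terminating 0),
 * or (size_t)-1 if the output buffer is too small.
 */
size_t latin1_upcase_to_utf16(const unsigned char *in,
                              uint16_t *out, size_t outlen)
{
    size_t n = 0;

    assert(in != NULL && out != NULL);
    if (outlen == 0)
        return (size_t)-1;

    for (; *in; in++) {
        unsigned char up;

        if (n + 1 >= outlen)
            return (size_t)-1;
        up = latin1_upcase_byte(*in); /* step 1: 8 bit casing in the CP */
        out[n++] = (uint16_t)up;      /* step 2: CP -> Unicode, done last */
    }
    out[n] = 0;
    return n;
}
```

Doing it in the other order would give different results for a few bytes:
Unicode upcasing maps U+00FF to U+0178 and U+00B5 to U+039C, neither of
which can be converted back into ISO-8859-1.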

Very good information.  We just have to be careful to implement it that way
(and to document the proper way).  Then, when we DO find anomalies (the xDF
'may' be one of these), we find out exactly WHAT the original password hash
code did, and make sure we do the same.

Even after this first release, I am SURE there will be changes required,
because some of the assumptions that have been made will turn out to be
wrong.  However, the groundwork is solidly done, and it will be easy to
modify behavior once it is known that an assumption was not correct for the
actual hash in the wild.

Jim.
