Date: Mon, 27 Feb 2012 02:52:15 +0400
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: -ext:keyboard with 8-bit chars

magnum -

Thank you for reporting this!

On Sun, Feb 26, 2012 at 06:11:20PM +0100, magnum wrote:
> I tried making a custom keyboard external mode for producing German
> keyboard output in iso-8859-1. I doubled the array sizes per the comment
> and also changed the while loop that initializes mc accordingly. At
> first I just entered the characters as '??' and so on, and took care that
> john.conf was encoded in iso-8859-1. But I got a segmentation fault when
> running.
> 
> After some head scratching I tried defining the 8-bit characters as hex
> instead, and this works. Is this requirement a bug or a known
> limitation? A non-jumbo build shows the same limitation.

It's not a known limitation - I was not aware of it before (or at least
I don't recall it) - and it's also not exactly a bug.  ... Or maybe it is
one, given that we have a comment that talks about adding umlauts but
does not mention this detail.

Here's what this is about:

In C, which the external mode language is similar to, when you assign
a character constant to an int type variable, you may or may not get
sign extension depending on whether the char type is signed or not in a
given C implementation on a given platform and with given compiler
settings. %-)  John's external mode behaves as if char were signed (even
though it has no such type; it merely lets you write constants using the
same syntax as you would for char).  So it acts as a valid implementation
of C in this respect (even though it does not even try to in some other
respects).  That said, we may want to have
it behave the way a C implementation would for char being unsigned (also
valid) - this may be more convenient for us, as you have found out.
To make this change, you may e.g. edit the two instances of "value =
c_getchar(1)" to "value = (unsigned char)c_getchar(1)" in c_getint().
I did not test this change.
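
To illustrate the sign extension issue outside of John, here is a small
plain C sketch.  It is not code from compiler.c; the helper names are
made up for this example, and it assumes a typical two's-complement
platform.  0xf6 is the ISO-8859-1 code for o-umlaut.

```c
/*
 * Hypothetical helpers (not part of John) showing the two valid ways a
 * C implementation may widen an 8-bit character constant to int.
 */

/* Widen as if char were signed: bytes >= 0x80 become negative. */
static int widen_as_signed(unsigned char byte)
{
	return (int)(signed char)byte;
}

/* Widen as if char were unsigned: the result stays in 0..255. */
static int widen_as_unsigned(unsigned char byte)
{
	return (int)byte;
}
```

With the first helper, 0xf6 becomes -10; with the second, it stays 246.
ASCII characters come out the same either way, which is why the
difference only shows up with 8-bit characters.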

When you simply assign 8-bit characters e.g. to elements of word[], it
does not matter whether they get sign-extended or not because John only
uses the lower 8 bits.  However, Keyboard mode itself uses those chars
as array indices, hence the problem.  We may adjust the Keyboard mode to
avoid the problem e.g. by using " & 0xff" on the three references to k[]
further down in init() (and only in init(), so there's no performance
impact from this change).  I've just tested this, and it works.
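
As a rough sketch of why the mask helps (plain C again, not the actual
Keyboard mode source; the function name is an assumption for this
example): a sign-extended 8-bit character is a negative int, so using it
directly as an index lands before the start of the array, whereas
masking with 0xff brings it back into the 0 to 255 range.

```c
/*
 * Hypothetical helper (not from john.conf): turn a possibly
 * sign-extended character value into a safe index for a 256-entry
 * table.  Assumes two's-complement ints.
 */
static int safe_index(int c)
{
	return c & 0xff;
}
```

For example, safe_index(-10) gives 246 (0xf6), while values that are
already in the 0 to 255 range pass through unchanged.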

In fact, I think we should adjust the Keyboard sample to ship with this
change already made, along with the m[] and mc[] size change and the
mc[] initialization loop change that you made.  (The comment may then be
dropped.)  I will probably commit this change to the main tree and think
about also adjusting compiler.c to treat char constants as unsigned.

> Another thing, the comment "This sample can be enhanced to infer the
> rest of the indices here". What exactly is missing, and what would it
> change? Are we not resuming correctly with current code?

We only infer the length and id[0], but not id[1] onwards (those are
currently set to 0).  This means that when we interrupt and --restore a
Keyboard mode run, it tries some previously-tested candidate passwords for a
second time.  If we add code to infer the rest of the indices in the
same way that we do for id[0], we'll be continuing from almost exactly
the place where we interrupted (skipping only the last incomplete set of
passwords that were in-flight at the time of interruption, which is
something John must be doing and so it does).  The downside is that the
sample will become longer and more complicated.  But we can try to
implement this and see.

Alexander
