john-dev - Big Endian problem (combo utf8 / oracle / mssql)

Message-ID: <4DD933E1.9050407@bredband.net>
Date: Sun, 22 May 2011 18:03:45 +0200
From: magnum <rawsmooth@...dband.net>
To: john-dev@...ts.openwall.com
Subject: Big Endian problem (combo utf8 / oracle / mssql)

I am taking this to the list; I believe we have been lazy and have had a 
little too much discussion off it.

I agree we should decide that all these functions are working with LE, 
document that clearly and do the needed workarounds elsewhere. The code 
for the NET*LM* formats was originally in each file plus some in 
smbencrypt.c. I moved all such stuff to unicode.c. The original code had 
remarks that it was supposed to always return LE regardless of machine 
type, as "NT Unicode" is always UTF-16LE.
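
To make the "always LE" convention concrete, here is a minimal sketch of 
byte-wise accessors. The names get_le16 and put_le16 are hypothetical 
(they are not taken from unicode.c); the point is only that going 
through bytes gives the same result on LE and BE machines.

```c
#include <stdint.h>

/* Hypothetical helper: store one UTF-16 code unit as UTF-16LE bytes. */
static void put_le16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xff);   /* low byte first */
    p[1] = (uint8_t)(v >> 8);     /* high byte second */
}

/* Hypothetical helper: read one UTF-16LE code unit as a native value. */
static uint16_t get_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}
```

Reading the same two bytes through a native 16-bit pointer would instead 
give a byte-swapped value on a BE machine.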

The Oracle format is special in that its internal format is BE. I 
remember those SHIFTL and SHIFTR; I wondered whether that would end up 
right. As we always get LE from the conversion, and always want BE, the 
shifts should be the same on any machine, just as you did it.
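
A minimal sketch of that LE-to-BE step, done on bytes so that the host 
byte order never matters. The function names are hypothetical, and 
uppercasing only when the high byte is zero is my assumption about what 
"uppercase (if ASCII, for now)" should mean; this is not the actual 
oracle_set_salt() code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: ASCII-only uppercase. */
static uint8_t ascii_upper(uint8_t c)
{
    return (c >= 'a' && c <= 'z') ? (uint8_t)(c - 'a' + 'A') : c;
}

/*
 * Convert 'len' UTF-16LE code units (raw bytes) to UTF-16BE in place.
 * Code units whose high byte is zero get their low byte uppercased
 * (assumption: only ASCII is case-folded).
 */
static void utf16le_to_be_upper(uint8_t *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        uint8_t lo = buf[2 * i];      /* LE: low byte comes first */
        uint8_t hi = buf[2 * i + 1];

        if (hi == 0)
            lo = ascii_upper(lo);
        buf[2 * i] = hi;              /* BE: high byte comes first */
        buf[2 * i + 1] = lo;
    }
}
```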

Were all these formats working correctly in john-1.7.6-jumbo-12 on BE 
systems? It appears to me they can't have been!?

magnum


On 2011-05-22 17:30, JFoug wrote:
> Ok, here is the problem. I have oracle and mssql working, but I think I
> will need to 'undo' what I have done.
>
> The problem is the plaintowcs function (I think). This function is
> returning LE format all the time. I am not sure if this is what it is
> SUPPOSED to do or not, but if so, then we need to make some adjustments
> to the places where it is used. Here is an example (from the
> oracle_set_salt() function):
>
> l = plaintowcs((UTF16 *)&out[1], (SALT_SIZE>>1) - 1, (UTF8 *)salt, l-2);
> if (l <= 0)
>     l = strlen16(&out[1]);
> // This must be stored in big endian, and uppercase (if ASCII, for now)
> for (i = 1; i <= l; i++) {
>     out[i] = ((upper(out[i]&0xff) ENDIAN_SHIFT_L) | (out[i] ENDIAN_SHIFT_R));
> }
>
> This fails on BE systems. That is because there the SHIFTL and SHIFTR
> are no-ops and thus do not shift (they assume that the data is already
> in the right format). However, it is in LE format.
>
> This code 'works' with the current plaintowcs:
>
> l = plaintowcs((UTF16 *)&out[1], (SALT_SIZE>>1) - 1, (UTF8 *)salt, l-2);
> if (l <= 0)
>     l = strlen16(&out[1]);
> // This must be stored in big endian, and uppercase (if ASCII, for now)
> for (i = 1; i <= l; i++) {
>     out[i] = ((upper(out[i]&0xff) << 8) | (out[i] >> 8));
> }
>
> It simply forces conversion from the 'known' LE format into BE. NOTE
> that this is BE within each 2-byte word. It is NOT 32-bit BE. But it
> appears to be what the format 'wants'.
>
>
> Now, to the question. I am pretty sure this is going to be the same
> 'bug' in the other -utf8 work. How do we proceed forward on this bug
> fix? Do we make plaintowcs return BE? That will likely open up other
> cans of worms. Or do we fix the fires knowing that plaintowcs always
> returns LE format, and simply document that fact?
>
> For me, I would rather ALWAYS have it return in one known format. Then
> if the data needs to be in a machine format, it can be converted. I
> think that makes for fewer conditionals (but I am not fully sure). There
> will also be times and places where we have to swap the order of UTF
> characters (and byte swap the 16's), to get data into 4 byte BE format.
> Even if the function returned data in 16 bit BE, we would still have to
> swap. However, if we returned the data in proper LE format, we can
> simply call some 'common' 32 bit LE->BE swap function and it works. If
> the data was in 16 bit BE, we would have to first call a 16 bit BE->LE,
> then call 32 bit LE->BE (it can also be done by swapping adjacent 16 bit
> words).
>
> My vote is to leave plaintowcs alone, and KNOW the data is in fully
> proper LE format. But I wanted to hear from you, since you have spent
> more time grinding out the code for the unicode crap.
>
> Jim.
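
The relation Jim describes between a 32-bit swap and the 16-bit 
operations can be sketched as follows. All helper names are 
hypothetical and not from the john source; the sketch only shows that a 
full 32-bit byte swap equals a byte swap inside each 16-bit half 
followed by exchanging the two halves.

```c
#include <stdint.h>

/* Swap the two bytes of a 16-bit value. */
static uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Exchange the two 16-bit halves of a 32-bit value. */
static uint32_t swap_halves(uint32_t v)
{
    return (v << 16) | (v >> 16);
}

/* Plain 32-bit byte swap (LE <-> BE). */
static uint32_t swap32(uint32_t v)
{
    return (v << 24) | ((v & 0xff00u) << 8) |
           ((v >> 8) & 0xff00u) | (v >> 24);
}

/* Same result, built from 16-bit byte swaps plus a half exchange. */
static uint32_t swap32_via_16(uint32_t v)
{
    uint32_t lo = swap16((uint16_t)(v & 0xffffu));
    uint32_t hi = swap16((uint16_t)(v >> 16));

    return swap_halves((hi << 16) | lo);
}
```

So data that already has each 16-bit unit byte-swapped only needs 
swap_halves(), while proper LE data needs the single swap32().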