Date: Fri, 2 Apr 2010 01:20:08 +0400
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: rule and encoding wordlist

On Wed, Mar 31, 2010 at 09:23:47PM +0200, websiteaccess@...il.com wrote:
> I use JTR 1.7.5 with latest patches, os X, terminal is UTF-8.
>
> With following rule (below) and a wordlist (1 word "tro") encoded
> Western (Windows Latin 1) , end of line Windows (CRLF)
>
> >\r[00-9A-C] A\p0[0-9A-D],him, $1
>
> I get
>
> iMac-de-xxx-xx:run xxxxx$ ./john -w:testmot.txt -rules -stdout
> himtro1
> thimro1
> trhimo1
> trohim1
> words: 4 time: 0:00:00:00 100.00% (ETA: Wed Mar 31 21:10:40

Looks good. However, if you actually have any 8-bit character of the iso-8859-1 encoding in a wordlist entry, then it may/will be displayed improperly on your UTF-8 terminal, and indeed it will be tested in the iso-8859-1 encoding against your hashes (which may or may not be what you want).

> With the same rule but this time my wordlist is unicode UTF-8 , end of
> line Windows (CRLF) or Unix (LF)
>
> I get :
>
> john -w:testmot.txt -rules -stdout
> himtro1
> ?him??tro1
> ?him?tro1
> himtro1
> thimro1
> trhimo1
> trohim1
> words: 7 time: 0:00:00:00 100.00% (ETA: Wed Mar 31 21:12:24 2010)
>
> As you can see some unwanted "?" are now included in the word
> generated.

Did you mean the question mark character, or did some other character get replaced by a question mark when you sent your message?

Anyhow, JtR does not support multi-byte characters (in fact, it is unaware of what character encoding your wordlist is in). Most of the time, this is not a problem for UTF-8, because JtR will simply pass any UTF-8 characters from a wordlist into its password hashing routines verbatim (treating the multi-byte characters as multiple single-byte characters, which works just fine). However, when you try to insert strings at arbitrary character positions, you have a problem.
The rule you have mentioned may try inserting the string "him" in between individual bytes of a single multi-byte UTF-8 character, thereby breaking that character. In your sample JtR output above, the input word appears to be treated as being 6 single-byte characters long. Since the word itself does not appear to contain any multi-byte characters, my guess is that your wordlist file starts with a BOM (byte order mark, U+FEFF), which is 3 bytes (EF BB BF) in UTF-8. It is this character that gets broken by having "him" inserted in between its bytes. The 3 extra byte positions also account for the extra output lines (7 candidates instead of 4).

> I use very large wordlist (up to 60 gigas) , I can't reencode them.

You also posted another message:

http://www.openwall.com/lists/john-users/2010/04/01/2

stating that you "reencoded file with Unicode (UTF-8, no BOM)". That's nice; however, please be aware that this won't keep the rule you mentioned from breaking other multi-byte UTF-8 characters and producing extra output lines. You might not care, though: hopefully there are relatively few wordlist entries with multi-byte UTF-8 characters, and the only impact is that JtR tries some extra candidate passwords (it will also generate and try all of the correct/intended ones).

A way to avoid this problem would be to keep your wordlist in the iso-8859-1 encoding (which uses single-byte 8-bit characters) and to have an external filter() convert candidates from iso-8859-1 to UTF-8. The filter() would be applied after the rules, thereby sidestepping the lack of UTF-8 support in the rules engine while still testing UTF-8 encoded strings against your hashes. Someone should write such an external filter().

Alexander

P.S. I'd like to remind you that whenever you need to post a follow-up to your own posting, you need to "reply" to that posting (either to the copy you received via the list or to the one you sent; you do save sent messages in a "Sent" folder, don't you?). Don't post such follow-ups to the list anew.
Doing so starts a new thread in the list archives, which is what happened this time (you started two separate threads for the same topic).