Date: Thu, 23 Jul 2015 19:24:20 +0200
From: Marek Wrzosek <marek.wrzosek@...il.com>
To: john-users@...ts.openwall.com
Subject: Re: Bleeding jumbo now defaults to UTF-8

On 22.07.2015 at 18:23, magnum wrote:
> On 2015-07-22 16:34, Marek Wrzosek wrote:
>> What is the one proper way to use --inc=utf8 in the new bleeding-jumbo?
>> I mean, which encoding option should we use: --input-encoding=utf-8,
>> --target-encoding=utf-8, --internal-encoding=utf-8 or just
>> --encoding=utf-8? None of them seems to work with --inc=utf8.
>> For --inc=latin1, --target-encoding=cp1252 is mandatory for the pot
>> file to be UTF-8 only and not mixed with other encodings.
>
> The thing that mandates what encoding to use is what actual encoding was
> used by the system producing the hashes in the first place. If it's
> UCS-2/UTF-16 (e.g. NT or MSSQL) you can use any encoding, but if not, you
> *need* to tell JtR what -target-enc to use (unless it's your
> default).
>
> After the above is established: will you give your input in some *other*
> encoding than your target (or default) encoding? In the case of
> incremental mode that would not make any sense: you must use an
> incremental mode that corresponds with your encoding (any other approach
> would be slow). So instead of -target-encoding, just use -enc (a.k.a.
> -input-enc) and do not specify any -target-enc (or set it to the same
> value, which is the default).
>
> Now, if you targeted old web hashes and picked -enc=latin1, you can use
> -inc=latin1. The default is -inc=ascii so it will always work, but
> things like "-inc=utf8 -enc=latin1" will definitely produce garbage.
>
> -internal-encoding does not apply to incremental mode. It's only used in
> the case of "utf8 wordlist -> rules -> utf8/16 hashes" and for "mask
> mode -> utf8/16 hashes" (if your mask contains non-ASCII).
>
>> PS. Without any encoding options there are characters that are not
>> valid UTF-8. The same with --enc=raw. Is there a bug in the utf8
>> incremental mode after defaulting to UTF-8?
>
> Incremental mode was not written with multi-byte charsets like UTF-8 in
> mind, so it will sometimes produce some worthless invalid characters. You
> can add "-ext:filter_utf8" to filter them out, but for fast formats it's
> better to just ignore them: the filter is much slower than the waste it
> mitigates.
>
> magnum
>

So -inc=utf8 is producing garbage because incremental mode isn't ready
for charsets like UTF-8. Why is utf8.chr available then? What is the
format of a .chr file? Is it possible to adapt the current incremental
mode code to generate Unicode characters from plane 0 using UTF-16 (e.g.
using short instead of char)? If yes, then making UTF-8 from them should
be simple: just encode them in that format (using bit fields or bit
shifts, whichever is faster).

-- 
Marek Wrzosek
marek.wrzosek@...il.com
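
To illustrate the last question, here is a minimal sketch of the bit-shifting step only: turning a single plane 0 (BMP) code point, i.e. one 16-bit unit, into its UTF-8 bytes. It is written in Python for brevity rather than in JtR's C, and the names bmp_to_utf8 and encode_candidate are hypothetical; nothing here is taken from John the Ripper's source, and it says nothing about how incremental mode or .chr files actually work.

```python
def bmp_to_utf8(cp: int) -> bytes:
    """Encode one plane 0 code point (0..0xFFFF) as UTF-8 with shifts and masks."""
    if not 0 <= cp <= 0xFFFF:
        raise ValueError("code point is outside plane 0")
    if 0xD800 <= cp <= 0xDFFF:
        # Surrogates are UTF-16 plumbing, not characters; UTF-8 forbids them.
        raise ValueError("surrogate code units cannot be encoded")
    if cp < 0x80:
        # 7 bits: 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:
        # 11 bits: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    # 16 bits: 1110xxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xE0 | (cp >> 12),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


def encode_candidate(units) -> bytes:
    """Encode a candidate given as a sequence of 16-bit code points (assumed layout)."""
    return b"".join(bmp_to_utf8(u) for u in units)


if __name__ == "__main__":
    # "zł" as two 16-bit units: U+007A and U+0142
    print(encode_candidate([0x7A, 0x142]).hex())  # 7ac582
```

A generator that works on 16-bit units and encodes afterwards can only emit well-formed UTF-8, so no filter pass would be needed; the cost is one to three output bytes per unit on every candidate.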