Message-ID: <20201103191642.GA31169@openwall.com>
Date: Tue, 3 Nov 2020 20:16:42 +0100
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: Rules characters unicode support.

In addition to what magnum wrote:

On Tue, Nov 03, 2020 at 03:48:57PM +0100, François wrote:
> character substitutions from ASCII to Unicode were hitting some results (a
> few hits on a large leak) for example:
> seé
> suü
> scç
> soö
> saã
> soø
> snñ
> saå
> should I just try to use the A"..." command for my niche finding ?

BTW, you can:

/e Dp Ap"é"

This is three commands: search for one character, delete the found
character, insert a possibly multi-character string (in our case, just a
multi-byte character) in the former character's place.

You can also specify the multi-byte character via its hex codes, which
makes the .conf file format character set agnostic (so you can have any
character set active in your text editor, and it won't matter):

/e Dp Ap"\xc3\xa9"

However, the rules are indeed not character set agnostic - as written
above, the rule produces UTF-8.

A difference from the "s" command is that the above rule will find and
replace only the first match, whereas "s" would find and replace all.
You can reduce this difference by writing multiple rules like this:

/e Dp Ap"\xc3\xa9"
/e Dp Ap"\xc3\xa9" /e Dp Ap"\xc3\xa9"
/e Dp Ap"\xc3\xa9" /e Dp Ap"\xc3\xa9" /e Dp Ap"\xc3\xa9"

You can also choose which instances of the character you replace, e.g.
to replace only the second:

%2e Dp Ap"\xc3\xa9"

Alexander
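
To make the difference between the first-match rule, the n-th match
rule, and "s" concrete, here is a rough Python model of what the rules
above do to a candidate's bytes. It is only a sketch, not John's actual
implementation: the helper names replace_nth and replace_all are made up
for illustration, and returning None when there are too few matches is
my assumption about how a failed "/" or "%" search rejects the word.

```python
from typing import Optional

E_ACUTE = b"\xc3\xa9"  # UTF-8 bytes for "é", same as "\xc3\xa9" in the rules


def replace_nth(word: bytes, old: bytes, new: bytes, n: int = 1) -> Optional[bytes]:
    """Model of '%Ne Dp Ap"..."' (or '/e Dp Ap"..."' when n == 1).

    Replaces only the n-th occurrence (1-based) of the single byte `old`.
    Returns None when the word has fewer than n occurrences (assumed to
    correspond to the rule rejecting the word).
    """
    pos = -1
    for _ in range(n):
        pos = word.find(old, pos + 1)
        if pos < 0:
            return None
    # Delete the found byte and insert the multi-byte string in its place.
    return word[:pos] + new + word[pos + len(old):]


def replace_all(word: bytes, old: bytes, new: bytes) -> bytes:
    """Model of an "s"-style substitution: every occurrence is replaced."""
    return word.replace(old, new)


if __name__ == "__main__":
    w = b"eleven"
    once = replace_nth(w, b"e", E_ACUTE)            # first match only
    twice = replace_nth(once, b"e", E_ACUTE)        # same rule applied twice
    second = replace_nth(w, b"e", E_ACUTE, n=2)     # second match only
    for candidate in (once, twice, second, replace_all(w, b"e", E_ACUTE)):
        print(candidate.decode("utf-8"))
```

Applying the first-match step twice is what the two-command rule line
above amounts to in this model: the inserted bytes contain no "e", so
the second search moves on to the next original "e".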