Date: Mon, 25 Jul 2011 22:39:11 +0200
From: Frank Dittrich <frank_dittrich@...mail.com>
To: john-dev@...ts.openwall.com
Subject: Re: Character encoding 'how-to' and patch 0009

On 25.07.2011 16:26, JimF wrote:
> For a simple '8-bit' fixed-size character encoding (wide char encodings
> are not covered in this how-to):
>
> 1. Build arrays of to-upper and to-lower values in rules.c. These
> arrays have to be the upper and matching lower case values, listed in
> the same order. If there are upper-case-only or lower-case-only
> letters, build a separate array for them.

I assume you mean characters which don't have a corresponding upper- or
lower-case character within the code page in question.
E.g., Ÿ (Unicode code point U+0178) is the upper-case character for ÿ
(Unicode code point U+00FF), but only ÿ (latin small letter y with
diaeresis) is part of ISO-8859-1 (Latin-1).
It is not clear to me whether or not ÿ should be converted to Ÿ when
applying rule u.
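
Step 1 of the how-to (paired upper/lower arrays plus a separate list for
unpaired letters) can be sketched as follows. This is a minimal Python
sketch, not the C code in rules.c; it assumes ISO-8859-1 as the target
code page, and the names UPPER, LOWER, LOWER_ONLY and build_case_tables()
are hypothetical. Bytes that appear in neither paired list, including ÿ
and ß, map to themselves, which is one possible answer to the rule u
question above.

```python
# Hypothetical paired lists for ISO-8859-1: UPPER[i] and LOWER[i] are the
# two case forms of the same letter, listed in the same order.
UPPER = (bytes(range(0x41, 0x5B))     # A-Z
         + bytes(range(0xC0, 0xD7))   # À-Ö (0xD7 is ×, not a letter)
         + bytes(range(0xD8, 0xDF)))  # Ø-Þ
LOWER = (bytes(range(0x61, 0x7B))     # a-z
         + bytes(range(0xE0, 0xF7))   # à-ö (0xF7 is ÷, not a letter)
         + bytes(range(0xF8, 0xFF)))  # ø-þ

# Separate list for letters whose partner is not in the code page;
# case conversion leaves them unchanged.
LOWER_ONLY = bytes([0xB5, 0xDF, 0xFF])  # µ, ß, ÿ


def build_case_tables(upper, lower):
    """Expand the paired lists into two 256-entry translation tables."""
    if len(upper) != len(lower):
        raise ValueError("upper and lower lists must pair up one-to-one")
    to_upper = bytearray(range(256))  # identity: unlisted bytes stay as-is
    to_lower = bytearray(range(256))
    for u, l in zip(upper, lower):
        to_upper[l] = u
        to_lower[u] = l
    return bytes(to_upper), bytes(to_lower)


TO_UPPER, TO_LOWER = build_case_tables(UPPER, LOWER)

# bytes.translate() applies a 256-byte table, roughly what rules u and l need.
word = "straße ÿ".encode("latin-1")
print(word.translate(TO_UPPER).decode("latin-1"))  # prints STRAßE ÿ
```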

Another example: ß (U+00DF, latin small letter sharp s, aka German
Eszett) is a lower-case character which traditionally has no upper-case
form. Even though ẞ (U+1E9E, latin capital letter sharp s) was recently
added in Unicode 5.1, hardly any user knows that this letter exists,
let alone how to enter it.
As far as I know, this character is meant either for small caps fonts
or for writing EVERYTHING IN UPPER CASE...
(With a German keyboard layout, you cannot enter this character by
pressing <shift>-<ß>.)
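
Which letters are affected can be checked mechanically. The sketch below
uses a hypothetical helper, lacks_upper_partner() (not part of John the
Ripper), that relies on Python's built-in Unicode case mapping and codecs.
For ISO-8859-1 it should report µ (0xB5), ß (0xDF) and ÿ (0xFF).

```python
def lacks_upper_partner(b, encoding="latin-1"):
    """Return True if byte b is a letter whose Unicode upper-case form
    cannot be written as a single byte in the given code page."""
    try:
        ch = bytes([b]).decode(encoding)
    except UnicodeDecodeError:
        return False  # byte is undefined in this code page
    up = ch.upper()   # Python applies Unicode full case mapping ('ß' -> 'SS')
    if up == ch:
        return False  # no upper-case mapping at all (digits, symbols, ...)
    try:
        return len(up.encode(encoding)) != 1
    except UnicodeEncodeError:
        return True   # the partner exists in Unicode but not in this code page


# List the affected ISO-8859-1 letters with their Unicode upper-case forms.
for b in range(256):
    if lacks_upper_partner(b):
        ch = bytes([b]).decode("latin-1")
        print(f"0x{b:02X} {ch} -> {ch.upper()!r}")
```

The answer depends on the code page: Windows-1252 has Ÿ at 0x9F, so ÿ
does have a partner there, while ß still does not.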

> 5. within unicode.c, add code into utf16toplain() to handle the
> conversion from utf16 back into the 8 bit character set.
>

What about Unicode characters which don't have a representation in the
single-byte code page?
(Maybe I could find out by just reading the source code...)
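
One possible policy for unrepresentable characters is sketched below;
what unicode.c actually does (substitute, truncate or reject the word) is
not stated here, so treat this as an assumption. The hypothetical
utf16_to_latin1() (not the real utf16toplain()) takes a list of UTF-16
code units, copies code points below U+0100 straight through (ISO-8859-1
byte values equal the first 256 Unicode code points), and either replaces
anything else with '?' or raises an error. A surrogate pair is consumed as
one character, so it yields a single replacement.

```python
def utf16_to_latin1(units, policy="replace", replacement=b"?"):
    """Convert a sequence of UTF-16 code units to ISO-8859-1 bytes.

    policy="replace": substitute each unrepresentable character
    policy="strict":  raise ValueError on the first unrepresentable character
    """
    out = bytearray()
    i = 0
    while i < len(units):
        u = units[i]
        cp, step = u, 1
        # A high surrogate followed by a low surrogate encodes one code point
        # above U+FFFF; consume both units as a single character.
        if (0xD800 <= u <= 0xDBFF and i + 1 < len(units)
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            cp = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            step = 2
        if cp < 0x100:
            out.append(cp)  # Latin-1 byte values equal these code points
        elif policy == "strict":
            raise ValueError(f"U+{cp:04X} has no ISO-8859-1 equivalent")
        else:
            out += replacement
        i += step
    return bytes(out)


units = [ord(c) for c in "Ÿes, €uro"]  # BMP-only text: one unit per character
print(utf16_to_latin1(units))           # prints b'?es, ?uro'
```

Replacing is convenient when generating candidates, but it can map
several different Unicode strings onto the same 8-bit string; raising an
error avoids that ambiguity at the cost of skipping the word.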


Frank
