Date: Sun, 31 Jul 2011 17:03:26 -0500
From: "jfoug" <jfoug@....net>
To: <john-dev@...ts.openwall.com>
Subject: Character casing question for U+0131

I am asking the list as a whole, in the hope that someone knows the answer.
If this can be settled easily, then all formats can conform to a single
method.

OK, in cp850 the character at 0xD5 is U+0131, the lowercase dotless 'i'.
In Unicode this character DOES upcase: it upcases to the normal capital 'I'
(U+0049).  The upcase is therefore non-circular, i.e. it does not round-trip.
In cp850, lc(uc(char(0xD5))) != char(0xD5); instead it == char(0x69), the
ordinary dotted 'i'.
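
The round trip described above can be checked outside of John with a few lines of Python, using its built-in cp850 codec and Unicode case mapping. This is only an illustrative sketch: the helper name roundtrip_cp850 is made up here, and it says nothing about how any particular hash format or OS code page routine behaves.

```python
def roundtrip_cp850(byte_value):
    """Upcase one cp850 byte via Unicode, then downcase the result.

    Returns (upcased_byte, redowncased_byte). Hypothetical helper for
    illustration only; it assumes both results are representable in cp850.
    """
    ch = bytes([byte_value]).decode("cp850")   # 0xD5 -> U+0131 (dotless i)
    upper = ch.upper()                         # Unicode upcase
    lower = upper.lower()                      # Unicode downcase of that result
    return upper.encode("cp850")[0], lower.encode("cp850")[0]


up, down = roundtrip_cp850(0xD5)
print(hex(up), hex(down))   # 0x49 0x69
```

So, going through Unicode, 0xD5 upcases to 0x49 ('I') and comes back down as 0x69 ('i'), not as 0xD5.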

If this is the proper behavior, then what I have right now in Unicode.c and
rules.c is correct.  I handle cp850 as a special case: after the normal
building of the upcase/downcase arrays, I change one element in the upcase
array so that character 0xD5 upcases to character 0x49.  This works great,
AS LONG AS the actual formats and/or the OS code page logic behave the same
way (for cp850).
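
For readers who want to see the shape of that special case, here is a minimal sketch in Python. It is not the code from Unicode.c or rules.c: the function names are hypothetical, and the assumption is that the generic pass only records pairs that round-trip cleanly inside cp850, so the one-way 0xD5 mapping has to be patched in afterwards.

```python
def build_cp850_case_tables():
    """Build 256-entry upcase/downcase tables from clean case pairs only.

    Hypothetical sketch: a pair is recorded only if the Unicode upcase is a
    single character, round-trips back on downcase, and exists in cp850.
    """
    up = list(range(256))
    down = list(range(256))
    for b in range(256):
        ch = bytes([b]).decode("cp850")
        hi = ch.upper()
        if len(hi) != 1 or hi == ch or hi.lower() != ch:
            continue                      # no upcase, or not a circular pair
        try:
            hb = hi.encode("cp850")[0]
        except UnicodeEncodeError:
            continue                      # upcase is not representable in cp850
        up[b] = hb
        down[hb] = b
    return up, down


def patch_dotless_i(up):
    """Apply the one-way special case: 0xD5 (dotless i) upcases to 0x49 ('I')."""
    up[0xD5] = 0x49
    return up


up, down = build_cp850_case_tables()
assert up[0xD5] == 0xD5        # the generic pass leaves dotless i untouched
patch_dotless_i(up)
assert up[0xD5] == 0x49        # after the patch it upcases to 'I'
assert down[up[0xD5]] == 0x69  # and 'I' still downcases to the dotted 'i'
```

Only the upcase table is touched; the downcase entry for 0x49 stays 0x69, which is exactly the non-circular behavior in question.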

Hopefully someone has the answer to this.

Jim.